# Website Change Detection Monitor - Technical Specification
## Product Overview
A SaaS platform that monitors web pages for changes and alerts users when meaningful updates occur. The core value proposition is "signal over noise" - detecting real changes while filtering out irrelevant updates.
## Core Value Propositions
- **Easy to understand**: "I watch pages so you don't have to"
- **Smart filtering**: Automatically ignores timestamps, cookie banners, and noise
- **Keyword intelligence**: Alert on specific content appearing/disappearing
- **SEO-friendly**: Captures long-tail keywords (e.g., "monitor job posting changes")
---
## Feature Specification by Phase
### MVP Features (Launch Fast)
#### 1. URL Monitoring
- **Track URLs**: Add any public web page by URL
- **Frequency options**: 5min / 30min / 6hr / 24hr intervals
- **Change detection methods**:
  - Content hash comparison (fast, binary change detection)
  - Text diff (character/line level differences)
- **Storage**: Last 5-10 snapshots per URL
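
A minimal Python sketch of how the two detection methods could work together: the hash acts as a cheap "did anything change" gate, and the diff runs only when the hashes differ. SHA-256 and the standard-library `difflib` are assumptions (the spec names neither), and `content_hash` and `detect_change` are hypothetical helper names.

```python
import difflib
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the page text (hash choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_change(previous: str, current: str) -> list[str]:
    """Return unified-diff lines, or an empty list when the hashes match."""
    if content_hash(previous) == content_hash(current):
        return []  # fast path: identical content, skip the diff entirely
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


old = "Price: $29.99\nIn stock"
new = "Price: $24.99\nIn stock"
diff_lines = detect_change(old, new)
```

Here `diff_lines` holds the unified diff between the two snapshots, while comparing a snapshot with itself returns an empty list.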
#### 2. Alert System
- **Email notifications**: Send alert when change detected
- **Alert content**:
  - Timestamp of change
  - Link to diff view
  - Change severity indicator
- **Basic throttling**: Max 1 alert per check interval
#### 3. Change Viewing
- **History timeline**: Chronological list of all checks
  - Status (changed/unchanged/error)
  - Timestamp
  - Response code
- **Diff viewer**:
  - Side-by-side or unified view
  - Highlighted additions/deletions
  - Character-level diff for precision
- **Snapshot storage**: Full HTML snapshots for history
#### 4. Reliability
- **Retry logic**: 2-3 attempts on timeout/5xx errors
- **Error alerts**: Notify if page becomes unavailable
- **Status tracking**:
  - HTTP response codes
  - Timeout detection
  - Robot/CAPTCHA blocking detection
- **Run logs**: Detailed history of each check attempt
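
The retry and run-log behavior could look roughly like the following Python sketch. It assumes a hypothetical `fetch` callable that returns an HTTP status code or raises `TimeoutError`; the exponential backoff between attempts is also an assumption, since the spec only fixes the attempt count.

```python
import time
from typing import Callable, Optional


def check_with_retry(
    fetch: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[Optional[int], list[str]]:
    """Call fetch() until it returns a non-5xx status or attempts run out.

    Returns the last HTTP status (None if the final attempt timed out)
    and a run log with one entry per attempt.
    """
    run_log: list[str] = []
    status: Optional[int] = None
    for attempt in range(1, max_attempts + 1):
        try:
            status = fetch()
        except TimeoutError:
            status = None
            run_log.append(f"attempt {attempt}: timeout")
        else:
            run_log.append(f"attempt {attempt}: HTTP {status}")
            if not 500 <= status <= 599:
                return status, run_log  # success or a non-retryable error
        if attempt < max_attempts:
            # Exponential backoff between attempts (assumed policy).
            sleep(base_delay * 2 ** (attempt - 1))
    return status, run_log
```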
---
### V1 Features (People Pay)
#### 5. Noise Reduction (Differentiator)
- **Automatic filtering**:
  - Cookie consent banners
  - Timestamps and "last updated" text
  - Current date/time displays
  - Session IDs in URLs
  - Rotating content (recommendations, ads)
- **Custom ignore rules**:
  - Text pattern matching (regex)
  - CSS selector exclusion
  - Ignore numeric changes except in specific areas
  - Whitelist/blacklist mode
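
One way to approach regex-based noise filtering is to strip known-noisy patterns from the extracted text before hashing or diffing it. The Python sketch below is illustrative only: the default patterns and the `strip_noise` helper are assumptions, and a real filter would need a much broader pattern set.

```python
import re
from typing import Sequence

# Illustrative defaults only; each pattern is an assumption, not part of the spec.
DEFAULT_IGNORE_PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",         # ISO dates such as 2024-05-01
    r"\b\d{1,2}:\d{2}(?::\d{2})?\b",  # clock times such as 14:05 or 14:05:33
    r"(?i)last updated:?[^\n]*",      # "Last updated ..." text up to end of line
]


def strip_noise(text: str, extra_patterns: Sequence[str] = ()) -> str:
    """Remove noisy fragments so that only meaningful text is compared."""
    for pattern in [*DEFAULT_IGNORE_PATTERNS, *extra_patterns]:
        text = re.sub(pattern, "", text)
    # Collapse runs of spaces/tabs left behind by the removals.
    return re.sub(r"[ \t]+", " ", text).strip()


monday = "Price: $10\nLast updated: 2024-05-01 14:05"
tuesday = "Price: $10\nLast updated: 2024-05-02 09:30"
repriced = "Price: $12\nLast updated: 2024-05-02 09:30"
```

With these inputs, `monday` and `tuesday` normalize to the same string, so no change would be reported, while `repriced` still differs. User-defined ignore rules could be passed in through `extra_patterns`.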
#### 6. Selective Monitoring
- **Element monitoring**: Track only specific page sections
  - Visual point-and-click selector
  - CSS selector input (power users)
  - XPath support
- **Multiple elements**: Track different sections separately
- **Element naming**: Label tracked elements for clarity
#### 7. Keyword-Based Alerts (High Value)
- **Keyword rules**:
  - Alert when keyword appears
  - Alert when keyword disappears
  - Alert when keyword count changes
  - Threshold-based alerts ("fewer than 5 items")
- **Regex support**: Advanced pattern matching
- **Multiple keywords**: AND/OR logic combinations
- **Case sensitivity options**
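
The keyword rules above could be evaluated by counting occurrences in the previous and current snapshots. This Python sketch is one possible shape, not a fixed design: `KeywordRule`, the mode names, and the `any_triggered`/`all_triggered` helpers (standing in for OR/AND logic) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeywordRule:
    keyword: str
    mode: str  # "appears", "disappears", "count_changes", or "count_below"
    case_sensitive: bool = False
    threshold: int = 0  # only used by "count_below"


def count_occurrences(text: str, rule: KeywordRule) -> int:
    """Count non-overlapping occurrences of the rule's keyword."""
    if rule.case_sensitive:
        return text.count(rule.keyword)
    return text.lower().count(rule.keyword.lower())


def rule_triggered(rule: KeywordRule, previous: str, current: str) -> bool:
    before = count_occurrences(previous, rule)
    after = count_occurrences(current, rule)
    if rule.mode == "appears":
        return before == 0 and after > 0
    if rule.mode == "disappears":
        return before > 0 and after == 0
    if rule.mode == "count_changes":
        return before != after
    if rule.mode == "count_below":
        return after < rule.threshold
    raise ValueError(f"unknown rule mode: {rule.mode}")


def any_triggered(rules: list[KeywordRule], previous: str, current: str) -> bool:
    """OR combination: at least one rule fires."""
    return any(rule_triggered(r, previous, current) for r in rules)


def all_triggered(rules: list[KeywordRule], previous: str, current: str) -> bool:
    """AND combination: every rule fires."""
    return all(rule_triggered(r, previous, current) for r in rules)
```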
#### 8. Advanced Alerting
- **Digest mode**: Daily/weekly summary of all changes
- **Quiet hours**: No alerts during specified times
- **Alert throttling**: Configurable limits
  - Max alerts per hour/day
  - Cooldown periods
- **Severity filtering**: Only alert on major changes
- **Multiple channels per monitor**: Email + Slack, etc.
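
Quiet hours and throttling both reduce to a yes/no decision before an alert is sent. A minimal Python sketch, assuming hypothetical helpers `in_quiet_hours` and `should_send` and an in-memory list of past send times (a real service would read these from the database):

```python
from datetime import datetime, time, timedelta


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside the quiet window [start, end)."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 07:00.
    return now >= start or now < end


def should_send(
    now: datetime,
    sent_at: list[datetime],
    max_per_hour: int,
    cooldown: timedelta,
) -> bool:
    """Apply an hourly cap and a cooldown since the most recent alert."""
    recent = [t for t in sent_at if now - t < timedelta(hours=1)]
    if len(recent) >= max_per_hour:
        return False  # hourly cap reached
    if sent_at and now - max(sent_at) < cooldown:
        return False  # still cooling down after the last alert
    return True
```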
---
### V2 Features (Market Winner)
#### 9. Visual Change Detection
- **Screenshot capture**: Full-page screenshots
- **Image diff**: Visual highlighting of changed areas
- **Pixel-perfect detection**: Layout shift detection
- **Before/After carousel**: Easy visual comparison
- **Screenshot retention**: Based on plan tier
#### 10. AI-Powered Summaries
- **Change summarization**: Natural language description
  - "Price changed from $29.99 to $24.99"
  - "New 'Out of Stock' banner added"
- **Change classification**:
  - Price change
  - Availability change
  - Policy/text update
  - Layout-only change
- **Smart alerts**: Only notify for meaningful changes
- **Summary in alert**: No need to view diff for simple changes
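
To make the target output concrete, here is a deliberately simple rule-based stand-in for the price-change summary, written in Python. It is not the AI approach this section describes; it only shows the expected input and output shape. The `PRICE_RE` pattern (US-dollar amounts only) and the `summarize_price_change` helper are assumptions.

```python
import re
from typing import Optional

# Matches US-dollar amounts such as $29.99 or $5 (an illustrative assumption).
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")


def summarize_price_change(previous: str, current: str) -> Optional[str]:
    """Return a one-line summary when exactly one price is present and it changed."""
    old_prices = PRICE_RE.findall(previous)
    new_prices = PRICE_RE.findall(current)
    if len(old_prices) == 1 and len(new_prices) == 1 and old_prices != new_prices:
        return f"Price changed from {old_prices[0]} to {new_prices[0]}"
    return None  # ambiguous or unchanged: fall back to the full diff


summary = summarize_price_change("Widget $29.99", "Widget $24.99")
```

Pages with several prices return `None` here, which is the kind of ambiguity a model-based summarizer would be expected to resolve.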
#### 11. Complex Page Support
- **JavaScript rendering**: Headless browser mode
- **Authentication**:
  - Basic auth
  - Session cookies
  - Login flow automation
- **Dynamic content**: Wait for AJAX/lazy loading
- **SPA support**: Monitor client-side rendered apps
#### 12. Integrations
- **Slack**: Channel notifications
- **Discord**: Webhook alerts
- **Microsoft Teams**: Connector integration
- **Webhook**: Generic POST for automation tools
- **RSS feed**: Per-monitor or global feed
- **Zapier/Make**: Pre-built integrations
- **API**: Programmatic access to monitors and history
---
### Power User Features (Teams & Scale)
#### 13. Organization
- **Folders/Projects**: Hierarchical organization
- **Tags**: Multi-dimensional categorization
- **Search**: Full-text search across monitors and history
- **Filters**: Status, tags, frequency, last changed
- **Bulk operations**:
  - Import URLs from CSV
  - Export history
  - Bulk pause/resume
  - Bulk delete
#### 14. Collaboration
- **Team workspaces**: Shared monitor collections
- **Role-based access**:
  - Admin: Full control
  - Editor: Create/edit monitors
  - Viewer: Read-only access
- **Assignment**: Assign monitors to team members
- **Comments**: Annotate changes
- **Audit log**: Track team actions
#### 15. Advanced Scheduling
- **Custom schedules**: Cron-like expressions
- **Business hours only**: Skip nights/weekends
- **Timezone-aware**: Different times per monitor
- **Geo-distributed checks**: Monitor from multiple regions
- **Adaptive frequency**: Check more often during active periods
---
## Technical Architecture
### Core Components
#### Frontend
- **Tech stack**: React/Next.js + TypeScript
- **UI components**: Tailwind CSS + shadcn/ui
- **State management**: React Query for API data
- **Real-time**: WebSocket for live updates (optional)
#### Backend
- **API**: Node.js/Express or Python/FastAPI
- **Database**: PostgreSQL for relational data
- **Queue system**: Redis + Bull/BullMQ for job scheduling
- **Storage**: S3-compatible for snapshots and screenshots
#### Monitoring Engine
- **Fetcher**: Axios/Got for simple pages
- **Browser**: Puppeteer/Playwright for JS-heavy sites
- **Differ**: jsdiff or custom algorithm
- **Scheduler**: Distributed job queue with priority
- **Rate limiting**: Per-domain backoff
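
Per-domain backoff could be tracked with a small failure counter keyed by hostname, as in this Python sketch. The `DomainBackoff` class, its default delays, and the doubling policy are assumptions; in the architecture above this state would more likely live in Redis than in process memory.

```python
from urllib.parse import urlparse


class DomainBackoff:
    """Track consecutive failures per domain and derive a wait time."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 300.0) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._failures: dict[str, int] = {}

    def record(self, url: str, ok: bool) -> None:
        """Reset the counter on success, increment it on failure."""
        domain = urlparse(url).netloc
        if ok:
            self._failures.pop(domain, None)
        else:
            self._failures[domain] = self._failures.get(domain, 0) + 1

    def delay_for(self, url: str) -> float:
        """Seconds to wait before the next request to this URL's domain."""
        failures = self._failures.get(urlparse(url).netloc, 0)
        if failures == 0:
            return 0.0
        # Double the delay per consecutive failure, capped at max_delay.
        return min(self.max_delay, self.base_delay * 2 ** (failures - 1))
```

Because the counter is keyed by domain rather than by URL, a failure on one page slows down checks for every monitor on the same site.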
#### Alert System
- **Email**: SendGrid/Postmark
- **Queue**: Separate alert queue for reliability
- **Templates**: Customizable alert formats
- **Delivery tracking**: Open/click tracking
### Data Models
#### Monitor
```
id, user_id, url, name, frequency, element_selector,
ignore_rules, keyword_rules, alert_settings, status,
created_at, last_checked_at, last_changed_at
```
#### Snapshot
```
id, monitor_id, html_content, screenshot_url,
content_hash, http_status, error_message,
created_at, changed_from_previous
```
#### Alert
```
id, monitor_id, snapshot_id, alert_type,
delivered_at, delivery_status, channels
```
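
The three models above could be expressed as Python dataclasses, for example when using the Python/FastAPI backend option. The field names come from the lists above; every type, default, and unit shown here is an assumption, since the spec does not define them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Monitor:
    id: int
    user_id: int
    url: str
    name: str
    frequency: int  # minutes between checks (unit is an assumption)
    element_selector: Optional[str] = None
    ignore_rules: list[str] = field(default_factory=list)
    keyword_rules: list[str] = field(default_factory=list)
    alert_settings: dict = field(default_factory=dict)
    status: str = "active"  # assumed default value
    created_at: Optional[datetime] = None
    last_checked_at: Optional[datetime] = None
    last_changed_at: Optional[datetime] = None


@dataclass
class Snapshot:
    id: int
    monitor_id: int
    html_content: str
    content_hash: str
    screenshot_url: Optional[str] = None
    http_status: Optional[int] = None
    error_message: Optional[str] = None
    created_at: Optional[datetime] = None
    changed_from_previous: bool = False


@dataclass
class Alert:
    id: int
    monitor_id: int
    snapshot_id: int
    alert_type: str
    channels: list[str] = field(default_factory=list)
    delivered_at: Optional[datetime] = None
    delivery_status: str = "pending"  # assumed default value
```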
---
## Monetization & Plan Gating
### Free Tier
- 5 monitors
- 1-hour minimum frequency
- 7-day history retention
- Email alerts only
- Basic noise filtering
### Pro Tier ($19-29/month)
- 50 monitors
- 5-minute frequency
- 90-day history
- All alert channels
- Advanced filtering + keywords
- Screenshot snapshots
### Business Tier ($99-149/month)
- 200 monitors
- 1-minute frequency
- 1-year history
- API access
- Team collaboration (5 seats)
- Priority support
- JS rendering included
### Enterprise Tier (Custom)
- Unlimited monitors
- Custom frequency
- Unlimited history
- Dedicated infrastructure
- SLA guarantees
- SSO/SAML
- Custom integrations
### Add-ons
- Extra monitors: $5 per 10
- Extended history: $10/month
- Additional team seats: $15/seat
- JS rendering credits: $20/100 pages
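
Plan gating could be enforced with a small limits table checked whenever a monitor is created. The Python sketch below uses the Free, Pro, and Business numbers listed above; the `PlanLimits` structure and `can_create_monitor` helper are hypothetical, and Enterprise and add-ons are left out because their limits are custom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanLimits:
    max_monitors: int
    min_frequency_minutes: int
    history_days: int


# Values taken from the tier lists above; "1-year history" is modeled as 365 days.
PLANS = {
    "free": PlanLimits(max_monitors=5, min_frequency_minutes=60, history_days=7),
    "pro": PlanLimits(max_monitors=50, min_frequency_minutes=5, history_days=90),
    "business": PlanLimits(max_monitors=200, min_frequency_minutes=1, history_days=365),
}


def can_create_monitor(
    plan: str, existing_monitors: int, frequency_minutes: int
) -> tuple[bool, str]:
    """Return (allowed, reason) for a new monitor under the given plan."""
    limits = PLANS[plan]
    if existing_monitors >= limits.max_monitors:
        return False, f"{plan} plan allows at most {limits.max_monitors} monitors"
    if frequency_minutes < limits.min_frequency_minutes:
        return False, (
            f"{plan} plan requires intervals of at least "
            f"{limits.min_frequency_minutes} minutes"
        )
    return True, "ok"
```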
---
## Success Metrics
### Product KPIs
- Monitors created per user
- Check success rate (>99%)
- False positive rate (<5%)
- Alert open rate (>40%)
- User retention (D7, D30, M3)
### Business KPIs
- Free → Paid conversion (target >10%)
- Churn rate (target <5% monthly)
- Average monitors per paid user (target >15)
- CAC below 3x monthly recurring revenue per customer (payback within roughly three months)
- Net promoter score (target >50)
---
## Security & Compliance
### Security
- **Authentication**: JWT + refresh tokens
- **2FA**: TOTP support
- **Encryption**: At rest (database) and in transit (TLS)
- **API keys**: Scoped, revocable
- **Rate limiting**: Per user and IP
### Privacy
- **GDPR**: Data export, deletion, consent management
- **Data retention**: Configurable, automatic cleanup
- **No tracking**: Don't store personal data from monitored pages
- **Anonymization**: Strip cookies/sessions from snapshots
### Reliability
- **Uptime SLA**: 99.9% for paid plans
- **Status page**: Public incident tracking
- **Backups**: Daily encrypted backups
- **Disaster recovery**: 24-hour RTO
---
## Competitive Differentiation
### vs. Visualping/ChangeTower
- **Better noise filtering**: AI-powered content classification
- **Smarter alerts**: Keyword-based + summarization
- **Better UX**: Cleaner UI, faster setup
### vs. Distill.io
- **Team features**: Built for collaboration from day one
- **More integrations**: Wider ecosystem support
- **Better pricing**: More generous free tier
### vs. Wachete
- **Modern tech**: Faster, more reliable
- **Visual diff**: Screenshot comparison
- **API-first**: Better for automation