Website Change Detection Monitor - Technical Specification

Product Overview

A SaaS platform that monitors web pages for changes and alerts users when meaningful updates occur. The core value proposition is "signal over noise": detecting real changes while filtering out irrelevant ones.

Core Value Propositions

  • Easy to understand: "I watch pages so you don't have to"
  • Smart filtering: Automatically ignores timestamps, cookie banners, and noise
  • Keyword intelligence: Alert on specific content appearing/disappearing
  • SEO-friendly: Captures long-tail keywords (e.g., "monitor job posting changes")

Feature Specification by Phase

MVP Features (Launch Fast)

1. URL Monitoring

  • Track URLs: Add any public web page by URL
  • Frequency options: 5min / 30min / 6hr / 24hr intervals (the shortest available interval depends on plan tier)
  • Change detection methods:
    • Content hash comparison (fast, binary change detection)
    • Text diff (character/line level differences)
  • Storage: Last 5-10 snapshots per URL
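The two detection methods above can be combined: a cheap hash comparison decides whether anything changed, and a text diff runs only when it did. The following is a minimal sketch using only the Python standard library; the function names `content_hash` and `detect_change` are illustrative and not part of this spec.

```python
import difflib
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the page text (fast changed/unchanged check)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_change(previous: str, current: str) -> list:
    """Return unified-diff lines, or an empty list when the hashes match."""
    if content_hash(previous) == content_hash(current):
        return []
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


old = "Price: $29.99\nIn stock"
new = "Price: $24.99\nIn stock"
for line in detect_change(old, new):
    print(line)
```

Storing the hash alongside each snapshot (see `content_hash` in the Snapshot model) would let most checks skip the diff step entirely.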

2. Alert System

  • Email notifications: Send alert when change detected
  • Alert content:
    • Timestamp of change
    • Link to diff view
    • Change severity indicator
  • Basic throttling: Max 1 alert per check interval

3. Change Viewing

  • History timeline: Chronological list of all checks
    • Status (changed/unchanged/error)
    • Timestamp
    • Response code
  • Diff viewer:
    • Side-by-side or unified view
    • Highlighted additions/deletions
    • Character-level diff for precision
  • Snapshot storage: Full HTML snapshots for history

4. Reliability

  • Retry logic: 2-3 attempts on timeout/5xx errors
  • Error alerts: Notify if page becomes unavailable
  • Status tracking:
    • HTTP response codes
    • Timeout detection
    • Robot/CAPTCHA blocking detection
  • Run logs: Detailed history of each check attempt
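One way the retry and run-log requirements could fit together is sketched below. It assumes a hypothetical `fetch` callable that returns an HTTP status code or raises `TimeoutError`; the exponential backoff between attempts is an assumption, since the spec only fixes the attempt count.

```python
import time
from typing import Callable, Optional

RETRYABLE_STATUSES = range(500, 600)  # only 5xx responses are retried


def check_with_retry(
    fetch: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call fetch() up to max_attempts times; return (final_status, run_log)."""
    log = []
    status: Optional[int] = None
    for attempt in range(1, max_attempts + 1):
        try:
            status = fetch()
        except TimeoutError:
            status = None
            log.append(f"attempt {attempt}: timeout")
        else:
            log.append(f"attempt {attempt}: HTTP {status}")
            if status not in RETRYABLE_STATUSES:
                return status, log
        if attempt < max_attempts:
            # Assumed policy: wait 1s, 2s, 4s, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    return status, log
```

A final status of `None` (all attempts timed out) or a 5xx code would then trigger the error alert, and the returned log lines would feed the per-check run log.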

V1 Features (People Pay)

5. Noise Reduction (Differentiator)

  • Automatic filtering:
    • Cookie consent banners
    • Timestamps and "last updated" text
    • Current date/time displays
    • Session IDs in URLs
    • Rotating content (recommendations, ads)
  • Custom ignore rules:
    • Text pattern matching (regex)
    • CSS selector exclusion
    • Ignore numeric changes except in specific areas
    • Whitelist/blacklist mode
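As an illustration of regex-based ignore rules, the sketch below strips volatile fragments from page text before it is hashed or diffed. The default patterns are assumptions chosen for the example, not a vetted rule set, and `normalize` is a hypothetical helper name.

```python
import re

# Assumed built-in patterns; a real rule set would need tuning per site.
DEFAULT_IGNORE_PATTERNS = [
    r"last updated:?[^\n]*",                         # "Last updated ..." text
    r"\b\d{4}-\d{2}-\d{2}\b",                        # ISO dates such as 2024-05-01
    r"\b\d{1,2}:\d{2}(?::\d{2})?(?:\s?(?:AM|PM))?",  # clock times
    r"[?&](?:sessionid|sid|phpsessid)=[\w-]+",       # session IDs in URLs
]


def normalize(text: str, extra_patterns=()) -> str:
    """Remove ignored fragments, then tidy the whitespace they leave behind."""
    for pattern in [*DEFAULT_IGNORE_PATTERNS, *extra_patterns]:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


a = "Price: $29.99\nLast updated: 2024-05-01 10:15"
b = "Price: $29.99\nLast updated: 2024-05-02 11:45"
print(normalize(a) == normalize(b))
```

Comparing normalized text rather than raw text means two snapshots that differ only in a timestamp produce the same hash. User-defined patterns can be passed through `extra_patterns`.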

6. Selective Monitoring

  • Element monitoring: Track only specific page sections
    • Visual point-and-click selector
    • CSS selector input (power users)
    • XPath support
  • Multiple elements: Track different sections separately
  • Element naming: Label tracked elements for clarity

7. Keyword-Based Alerts (High Value)

  • Keyword rules:
    • Alert when keyword appears
    • Alert when keyword disappears
    • Alert when keyword count changes
    • Threshold-based alerts ("fewer than 5 items")
  • Regex support: Advanced pattern matching
  • Multiple keywords: AND/OR logic combinations
  • Case sensitivity options
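The keyword rules above reduce to comparing match counts between two snapshots. Below is a minimal sketch; the `KeywordRule` shape and the mode names are assumptions made for the example, and threshold and regex rules are left out for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class KeywordRule:
    keyword: str
    mode: str  # "appears", "disappears", or "count_changes"
    case_sensitive: bool = False


def count_matches(rule: KeywordRule, text: str) -> int:
    """Count literal occurrences of the keyword, honoring case sensitivity."""
    flags = 0 if rule.case_sensitive else re.IGNORECASE
    return len(re.findall(re.escape(rule.keyword), text, flags))


def rule_fires(rule: KeywordRule, previous: str, current: str) -> bool:
    """Evaluate one rule against the previous and current snapshot text."""
    before = count_matches(rule, previous)
    after = count_matches(rule, current)
    if rule.mode == "appears":
        return before == 0 and after > 0
    if rule.mode == "disappears":
        return before > 0 and after == 0
    if rule.mode == "count_changes":
        return before != after
    raise ValueError(f"unknown mode: {rule.mode}")


def rules_fire(rules, previous: str, current: str, combine: str = "or") -> bool:
    """Combine several rules with AND/OR logic."""
    results = [rule_fires(rule, previous, current) for rule in rules]
    return all(results) if combine == "and" else any(results)
```

For example, a rule `KeywordRule("out of stock", "appears")` fires when the previous snapshot reads "In stock" and the current one reads "Out of Stock".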

8. Advanced Alerting

  • Digest mode: Daily/weekly summary of all changes
  • Quiet hours: No alerts during specified times
  • Alert throttling: Configurable limits
    • Max alerts per hour/day
    • Cooldown periods
  • Severity filtering: Only alert on major changes
  • Multiple channels per monitor: Email + Slack, etc.
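Quiet hours and cooldown periods can be checked together before an alert is queued. The sketch below assumes all datetimes are already in the user's local time; the function names are hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def in_quiet_hours(now: datetime, start: time, end: time) -> bool:
    """True when now falls in the quiet window; handles windows crossing midnight."""
    current = now.time()
    if start <= end:
        return start <= current < end
    return current >= start or current < end


def should_send(
    now: datetime,
    last_sent: Optional[datetime],
    cooldown: timedelta,
    quiet_start: time,
    quiet_end: time,
) -> bool:
    """Suppress alerts during quiet hours or while the cooldown is still running."""
    if in_quiet_hours(now, quiet_start, quiet_end):
        return False
    if last_sent is not None and now - last_sent < cooldown:
        return False
    return True
```

Suppressed alerts do not have to be dropped; one option is to hold them for the next digest.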

V2 Features (Market Winner)

9. Visual Change Detection

  • Screenshot capture: Full-page screenshots
  • Image diff: Visual highlighting of changed areas
  • Pixel-level detection: Catches layout shifts that text diffs miss
  • Before/After carousel: Easy visual comparison
  • Screenshot retention: Based on plan tier

10. AI-Powered Summaries

  • Change summarization: Natural language description
    • "Price changed from $29.99 to $24.99"
    • "New 'Out of Stock' banner added"
  • Change classification:
    • Price change
    • Availability change
    • Policy/text update
    • Layout-only change
  • Smart alerts: Only notify for meaningful changes
  • Summary in alert: No need to view diff for simple changes

11. Complex Page Support

  • JavaScript rendering: Headless browser mode
  • Authentication:
    • Basic auth
    • Session cookies
    • Login flow automation
  • Dynamic content: Wait for AJAX/lazy loading
  • SPA support: Monitor client-side rendered apps

12. Integrations

  • Slack: Channel notifications
  • Discord: Webhook alerts
  • Microsoft Teams: Connector integration
  • Webhook: Generic POST for automation tools
  • RSS feed: Per-monitor or global feed
  • Zapier/Make: Pre-built integrations
  • API: Programmatic access to monitors and history
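For the generic webhook, a change notification could be a JSON body sent via HTTP POST. The sketch below only builds the request; the payload field names and the example URL are assumptions, since the spec does not define a payload schema.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_webhook_request(webhook_url, monitor_name, page_url, summary, changed_at):
    """Build (but do not send) a JSON POST describing a detected change."""
    payload = {
        "event": "monitor.changed",  # hypothetical event name
        "monitor": monitor_name,
        "url": page_url,
        "summary": summary,
        "changed_at": changed_at.isoformat(),
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_webhook_request(
    "https://hooks.example.com/changes",
    "Pricing page",
    "https://example.com/pricing",
    "Price changed from $29.99 to $24.99",
    datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
)
# urllib.request.urlopen(request, timeout=10) would deliver it.
```

Because Slack, Discord, and automation tools such as Zapier can all receive incoming webhooks, a single delivery path like this could back several of the integrations listed above, with per-service payload formatting layered on top.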

Power User Features (Teams & Scale)

13. Organization

  • Folders/Projects: Hierarchical organization
  • Tags: Multi-dimensional categorization
  • Search: Full-text search across monitors and history
  • Filters: Status, tags, frequency, last changed
  • Bulk operations:
    • Import URLs from CSV
    • Export history
    • Bulk pause/resume
    • Bulk delete

14. Collaboration

  • Team workspaces: Shared monitor collections
  • Role-based access:
    • Admin: Full control
    • Editor: Create/edit monitors
    • Viewer: Read-only access
  • Assignment: Assign monitors to team members
  • Comments: Annotate changes
  • Audit log: Track team actions

15. Advanced Scheduling

  • Custom schedules: Cron-like expressions
  • Business hours only: Skip nights/weekends
  • Timezone-aware: Different times per monitor
  • Geo-distributed checks: Monitor from multiple regions
  • Adaptive frequency: Check more often during active periods
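A business-hours, timezone-aware schedule comes down to converting the current UTC time into the monitor's timezone before deciding whether to run. The sketch below uses a fixed UTC offset to stay self-contained; a production version would more likely use named zones (for example via `zoneinfo`) so that daylight saving time is handled. The 09:00 to 17:00 window is an assumed default.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def is_business_time(
    utc_now: datetime,
    tz: tzinfo,
    open_at: time = time(9, 0),
    close_at: time = time(17, 0),
) -> bool:
    """True on Monday-Friday between open_at and close_at in the monitor's timezone."""
    local = utc_now.astimezone(tz)
    return local.weekday() < 5 and open_at <= local.time() < close_at


# Fixed offset used for illustration only (UTC-5, no daylight saving).
utc_minus_5 = timezone(timedelta(hours=-5))
monday_afternoon_utc = datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc)
print(is_business_time(monday_afternoon_utc, utc_minus_5))
```

The scheduler would call a check like this when a job comes due and skip or defer the run when it returns `False`.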

Technical Architecture

Core Components

Frontend

  • Tech stack: React/Next.js + TypeScript
  • UI components: Tailwind CSS + shadcn/ui
  • State management: React Query for API data
  • Real-time: WebSocket for live updates (optional)

Backend

  • API: Node.js/Express or Python/FastAPI
  • Database: PostgreSQL for relational data
  • Queue system: Redis + Bull/BullMQ for job scheduling
  • Storage: S3-compatible for snapshots and screenshots

Monitoring Engine

  • Fetcher: Axios/Got for simple pages
  • Browser: Puppeteer/Playwright for JS-heavy sites
  • Differ: jsdiff or custom algorithm
  • Scheduler: Distributed job queue with priority
  • Rate limiting: Per-domain backoff
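Per-domain backoff means a failure on one URL slows down every monitor on the same host, while other hosts are unaffected. Below is a minimal in-memory sketch; the class name, the doubling policy, and the 300-second cap are assumptions, and a distributed deployment would keep this state in Redis rather than in process memory.

```python
from urllib.parse import urlparse


class DomainBackoff:
    """Track consecutive failures per host and derive a wait time from them."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 300.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._failures = {}

    @staticmethod
    def domain_of(url: str) -> str:
        return urlparse(url).hostname or ""

    def record(self, url: str, ok: bool) -> None:
        """Reset the host's counter on success, increment it on failure."""
        domain = self.domain_of(url)
        if ok:
            self._failures.pop(domain, None)
        else:
            self._failures[domain] = self._failures.get(domain, 0) + 1

    def delay_for(self, url: str) -> float:
        """Seconds to wait before the next request to this URL's host."""
        failures = self._failures.get(self.domain_of(url), 0)
        if failures == 0:
            return 0.0
        return min(self.max_delay, self.base_delay * 2 ** (failures - 1))
```

The scheduler could consult `delay_for` before dispatching a job and requeue the job with that delay when the value is nonzero.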

Alert System

  • Email: SendGrid/Postmark
  • Queue: Separate alert queue for reliability
  • Templates: Customizable alert formats
  • Delivery tracking: Open/click tracking

Data Models

Monitor

id, user_id, url, name, frequency, element_selector,
ignore_rules, keyword_rules, alert_settings, status,
created_at, last_checked_at, last_changed_at

Snapshot

id, monitor_id, html_content, screenshot_url,
content_hash, http_status, error_message,
created_at, changed_from_previous

Alert

id, monitor_id, snapshot_id, alert_type,
delivered_at, delivery_status, channels
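The three models above can be written as typed records. In the sketch below the field names come from the lists above, while every type, default, and unit (for example, `frequency` in seconds) is an assumption, since the spec names the fields without typing them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Monitor:
    id: int
    user_id: int
    url: str
    name: str
    frequency: int  # assumed unit: seconds between checks
    element_selector: Optional[str] = None
    ignore_rules: list = field(default_factory=list)
    keyword_rules: list = field(default_factory=list)
    alert_settings: dict = field(default_factory=dict)
    status: str = "active"  # assumed default value
    created_at: Optional[datetime] = None
    last_checked_at: Optional[datetime] = None
    last_changed_at: Optional[datetime] = None


@dataclass
class Snapshot:
    id: int
    monitor_id: int
    html_content: str
    content_hash: str
    http_status: Optional[int] = None
    screenshot_url: Optional[str] = None
    error_message: Optional[str] = None
    created_at: Optional[datetime] = None
    changed_from_previous: bool = False


@dataclass
class Alert:
    id: int
    monitor_id: int
    snapshot_id: int
    alert_type: str
    channels: list = field(default_factory=list)
    delivery_status: str = "pending"  # assumed default value
    delivered_at: Optional[datetime] = None
```

In PostgreSQL, `ignore_rules`, `keyword_rules`, and `alert_settings` would map naturally to JSONB columns, with `monitor_id` and `snapshot_id` as foreign keys.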

Monetization & Plan Gating

Free Tier

  • 5 monitors
  • 1-hour minimum frequency
  • 7-day history retention
  • Email alerts only
  • Basic noise filtering

Pro Tier ($19-29/month)

  • 50 monitors
  • 5-minute frequency
  • 90-day history
  • All alert channels
  • Advanced filtering + keywords
  • Screenshot snapshots

Business Tier ($99-149/month)

  • 200 monitors
  • 1-minute frequency
  • 1-year history
  • API access
  • Team collaboration (5 seats)
  • Priority support
  • JS rendering included

Enterprise Tier (Custom)

  • Unlimited monitors
  • Custom frequency
  • Unlimited history
  • Dedicated infrastructure
  • SLA guarantees
  • SSO/SAML
  • Custom integrations

Add-ons

  • Extra monitors: $5 per 10
  • Extended history: $10/month
  • Additional team seats: $15/seat
  • JS rendering credits: $20/100 pages
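Plan gating could be driven by a single limits table that the API consults before creating or updating a monitor. The sketch below encodes the monitor counts, minimum intervals, and history windows from the tiers above. The table layout and function name are assumptions, and the Enterprise tier is omitted because its limits are custom.

```python
PLAN_LIMITS = {
    "free": {"max_monitors": 5, "min_interval_seconds": 3600, "history_days": 7},
    "pro": {"max_monitors": 50, "min_interval_seconds": 300, "history_days": 90},
    "business": {"max_monitors": 200, "min_interval_seconds": 60, "history_days": 365},
}

MONITORS_PER_ADDON_PACK = 10  # "Extra monitors: $5 per 10"


def can_create_monitor(
    plan: str,
    current_monitors: int,
    interval_seconds: int,
    addon_packs: int = 0,
) -> bool:
    """Check the monitor quota (including add-on packs) and the plan's minimum interval."""
    limits = PLAN_LIMITS[plan]
    allowed = limits["max_monitors"] + MONITORS_PER_ADDON_PACK * addon_packs
    return (
        current_monitors < allowed
        and interval_seconds >= limits["min_interval_seconds"]
    )


print(can_create_monitor("free", current_monitors=4, interval_seconds=3600))
```

Keeping the limits in data rather than scattered through conditionals makes pricing changes a one-line edit.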

Success Metrics

Product KPIs

  • Monitors created per user
  • Check success rate (>99%)
  • False positive rate (<5%)
  • Alert open rate (>40%)
  • User retention (D7, D30, M3)

Business KPIs

  • Free → Paid conversion (target >10%)
  • Churn rate (target <5% monthly)
  • Average monitors per paid user (target >15)
  • CAC < 3x MRR per customer (acquisition cost recovered within 3 months)
  • Net promoter score (target >50)

Security & Compliance

Security

  • Authentication: JWT + refresh tokens
  • 2FA: TOTP support
  • Encryption: At rest (database) and in transit (TLS)
  • API keys: Scoped, revocable
  • Rate limiting: Per user and IP

Privacy

  • GDPR: Data export, deletion, consent management
  • Data retention: Configurable, automatic cleanup
  • No tracking: Don't store personal data from monitored pages
  • Anonymization: Strip cookies/sessions from snapshots

Reliability

  • Uptime SLA: 99.9% for paid plans
  • Status page: Public incident tracking
  • Backups: Daily encrypted backups
  • Disaster recovery: 24-hour RTO

Competitive Differentiation

vs. Visualping/ChangeTower

  • Better noise filtering: AI-powered content classification
  • Smarter alerts: Keyword-based + summarization
  • Better UX: Cleaner UI, faster setup

vs. Distill.io

  • Team features: Built for collaboration from day one
  • More integrations: Wider ecosystem support
  • Better pricing: More generous free tier

vs. Wachete

  • Modern tech: Faster, more reliable
  • Visual diff: Screenshot comparison
  • API-first: Better for automation