Website Change Detection Monitor - Claude Context

Project Overview

This is a Website Change Detection Monitor SaaS application. The core value proposition is helping users track changes on web pages they care about, with intelligent noise filtering to ensure only meaningful changes trigger alerts.

Tagline: "I watch pages so you don't have to"


Key Differentiators

  1. Smart Noise Filtering: Unlike competitors, we automatically filter out cookie banners, timestamps, rotating ads, and other irrelevant changes
  2. Keyword-Based Alerts: Users can be notified when specific words/phrases appear or disappear (e.g., "sold out", "hiring", "$99")
  3. Simple but Powerful: Easy enough for non-technical users, powerful enough for professionals
  4. SEO-Friendly Market: Many long-tail search keywords to target (e.g., "monitor job postings", "track competitor prices")

Architecture Overview

Frontend:

  • Next.js 14+ (App Router)
  • TypeScript
  • Tailwind CSS + shadcn/ui components
  • React Query for server-state management (data fetching and caching)
  • Zod for validation

Backend:

  • Node.js + Express OR Python + FastAPI
  • PostgreSQL for relational data
  • Redis + Bull/BullMQ for job queuing
  • Puppeteer/Playwright for JS-heavy sites

Infrastructure:

  • Vercel/Railway for frontend hosting
  • Render/Railway/AWS for backend
  • AWS S3 or Cloudflare R2 for snapshot storage
  • Upstash Redis or managed Redis

Third-Party Services:

  • Stripe for billing
  • SendGrid/Postmark for emails
  • Sentry for error tracking
  • PostHog/Mixpanel for analytics

Project Structure

/website-monitor
├── /frontend (Next.js)
│   ├── /app
│   │   ├── /dashboard
│   │   ├── /monitors
│   │   ├── /settings
│   │   └── /auth
│   ├── /components
│   │   ├── /ui (shadcn components)
│   │   ├── /monitors
│   │   └── /diff-viewer
│   ├── /lib
│   │   ├── api-client.ts
│   │   ├── auth.ts
│   │   └── utils.ts
│   └── /public
├── /backend
│   ├── /src
│   │   ├── /routes
│   │   ├── /controllers
│   │   ├── /models
│   │   ├── /services
│   │   │   ├── fetcher.ts
│   │   │   ├── differ.ts
│   │   │   ├── scheduler.ts
│   │   │   └── alerter.ts
│   │   ├── /jobs
│   │   └── /utils
│   ├── /db
│   │   └── /migrations
│   └── /tests
├── /docs
│   ├── spec.md
│   ├── task.md
│   ├── actions.md
│   └── claude.md (this file)
└── README.md

Core Entities & Data Models

User

{
  id: string
  email: string
  passwordHash: string
  plan: 'free' | 'pro' | 'business' | 'enterprise'
  stripeCustomerId: string
  createdAt: Date
  lastLoginAt: Date
}

Monitor

{
  id: string
  userId: string
  url: string
  name: string
  frequency: number // minutes
  status: 'active' | 'paused' | 'error'

  // Advanced features
  elementSelector?: string
  ignoreRules?: {
    type: 'css' | 'regex' | 'text'
    value: string
  }[]
  keywordRules?: {
    keyword: string
    type: 'appears' | 'disappears' | 'count'
    threshold?: number
  }[]

  // Metadata
  lastCheckedAt?: Date
  lastChangedAt?: Date
  consecutiveErrors: number
  createdAt: Date
}
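
The status and consecutiveErrors fields suggest a simple failure policy. The following is a minimal sketch of one way to update them after each check. The threshold MAX_CONSECUTIVE_ERRORS and the helper recordCheckResult are hypothetical names introduced here; the source does not specify when a monitor should move to 'error'.

```typescript
// Hypothetical threshold; the source does not specify a value.
const MAX_CONSECUTIVE_ERRORS = 5

type MonitorStatus = 'active' | 'paused' | 'error'

interface MonitorHealth {
  status: MonitorStatus
  consecutiveErrors: number
}

// Returns the next health state after a check; does not mutate the input.
function recordCheckResult(health: MonitorHealth, succeeded: boolean): MonitorHealth {
  // Paused monitors are left untouched: the pause was a deliberate choice.
  if (health.status === 'paused') return health

  // Any successful check resets the counter and reactivates the monitor.
  if (succeeded) return { status: 'active', consecutiveErrors: 0 }

  const consecutiveErrors = health.consecutiveErrors + 1
  const status: MonitorStatus =
    consecutiveErrors >= MAX_CONSECUTIVE_ERRORS ? 'error' : health.status
  return { status, consecutiveErrors }
}
```

Keeping the function pure makes it easy to unit test alongside the differ and filter.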

Snapshot

{
  id: string
  monitorId: string
  htmlContent: string
  contentHash: string
  screenshotUrl?: string

  // Status
  httpStatus: number
  responseTime: number
  changed: boolean
  changePercentage?: number

  // Errors
  errorMessage?: string

  // Metadata
  createdAt: Date
}

Alert

{
  id: string
  monitorId: string
  snapshotId: string
  userId: string

  // Alert details
  type: 'change' | 'error' | 'keyword'
  title: string
  summary?: string

  // Delivery
  channels: ('email' | 'slack' | 'webhook')[]
  deliveredAt?: Date
  readAt?: Date

  createdAt: Date
}

Key Algorithms & Logic

Change Detection

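import { diffLines } from 'diff'

// Simple hash comparison for binary change detection
const changed = previousHash !== currentHash

// Text diff for detailed comparison
const diff = diffLines(previousText, currentText)

// Each diff part carries a line count; added/removed parts are the changed lines
const countLines = (parts: typeof diff) =>
  parts.reduce((sum, part) => sum + (part.count ?? 0), 0)
const changedLines = countLines(diff.filter(part => part.added || part.removed))
const totalLines = countLines(diff)
const changePercentage = totalLines === 0 ? 0 : (changedLines / totalLines) * 100

// Severity calculation
const severity =
  changePercentage > 50 ? 'major' :
  changePercentage > 10 ? 'medium' : 'minor'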
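
The hash comparison above needs a contentHash for each snapshot. This is a minimal sketch of how it could be computed, assuming SHA-256 over whitespace-normalized text. Both the algorithm choice and the helper names normalizeText and computeContentHash are assumptions, not something the source specifies.

```typescript
import { createHash } from 'node:crypto'

// Collapse runs of whitespace so formatting-only changes do not alter the hash.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, ' ').trim()
}

// SHA-256 hex digest of the normalized text.
function computeContentHash(text: string): string {
  return createHash('sha256').update(normalizeText(text), 'utf8').digest('hex')
}
```

Hashing the noise-filtered text rather than the raw HTML would keep the binary change check consistent with the detailed diff.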

Noise Filtering

import * as cheerio from 'cheerio'

// Remove common noise patterns
function filterNoise(html: string): string {
  // Remove ISO 8601 timestamps (e.g. 2026-01-16T09:30:00)
  html = html.replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/g, '')

  // Remove cookie banners (common selectors)
  const noisySelectors = [
    '.cookie-banner',
    '#cookie-notice',
    '[class*="consent"]',
    // ... more patterns
  ]

  // Parse and remove elements
  const $ = cheerio.load(html)
  noisySelectors.forEach(sel => $(sel).remove())

  return $.html()
}
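
Per-monitor ignoreRules can be layered on top of the built-in filter. The sketch below covers only the 'regex' and 'text' rule types, applied to already-extracted text; 'css' rules need a DOM pass like the one above and are skipped. The function name applyTextIgnoreRules is hypothetical.

```typescript
interface IgnoreRule {
  type: 'css' | 'regex' | 'text'
  value: string
}

// Applies 'regex' and 'text' rules to extracted page text.
// 'css' rules require a parsed DOM and are intentionally skipped here.
function applyTextIgnoreRules(text: string, rules: IgnoreRule[]): string {
  let result = text
  for (const rule of rules) {
    if (rule.type === 'text') {
      // Literal match: remove every occurrence of the exact string.
      result = result.split(rule.value).join('')
    } else if (rule.type === 'regex') {
      try {
        result = result.replace(new RegExp(rule.value, 'g'), '')
      } catch {
        // Invalid user-supplied pattern: skip the rule rather than fail the check.
      }
    }
  }
  return result
}
```

User-supplied patterns can be slow on large pages, so a production version would likely also bound pattern length or execution time.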

Keyword Detection

function checkKeywords(
  previousText: string,
  currentText: string,
  rules: KeywordRule[]
): KeywordMatch[] {
  const matches: KeywordMatch[] = []

  for (const rule of rules) {
    const prevMatch = previousText.includes(rule.keyword)
    const currMatch = currentText.includes(rule.keyword)

    if (rule.type === 'appears' && !prevMatch && currMatch) {
      matches.push({ rule, type: 'appeared' })
    }
    if (rule.type === 'disappears' && prevMatch && !currMatch) {
      matches.push({ rule, type: 'disappeared' })
    }

    // Count logic...
  }

  return matches
}
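
The source leaves the 'count' rule type unspecified. As one possible reading only, the sketch below treats it as "the number of occurrences changed by at least threshold". The helpers countOccurrences and countRuleTriggered, and the default threshold of 1, are assumptions.

```typescript
// Counts non-overlapping, case-sensitive occurrences of keyword in text.
function countOccurrences(text: string, keyword: string): number {
  if (keyword.length === 0) return 0
  let count = 0
  let index = text.indexOf(keyword)
  while (index !== -1) {
    count++
    index = text.indexOf(keyword, index + keyword.length)
  }
  return count
}

// Assumed semantics (not from the source): a 'count' rule fires when the
// occurrence count differs between snapshots by at least `threshold`.
function countRuleTriggered(
  previousText: string,
  currentText: string,
  keyword: string,
  threshold = 1
): boolean {
  const delta = Math.abs(
    countOccurrences(currentText, keyword) - countOccurrences(previousText, keyword)
  )
  return delta >= threshold
}
```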

Development Guidelines

When Working on This Project

  1. Prioritize MVP: Focus on core features before adding complexity
  2. Performance matters: Diffing and fetching should be fast (<2s)
  3. Noise reduction is key: This is our competitive advantage
  4. User feedback loop: Build in ways to learn from false positives
  5. Security first: Never store credentials in plain text, sanitize all URLs
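
For the "sanitize all URLs" guideline, the following is a minimal sketch of a validation step run before a monitor is saved. The function name validateMonitorUrl and the exact rejection rules are assumptions; it uses only the standard URL class.

```typescript
// Returns a normalized URL string, or null if the URL should be rejected.
function validateMonitorUrl(input: string): string | null {
  let url: URL
  try {
    url = new URL(input.trim())
  } catch {
    return null // not parseable as an absolute URL
  }

  // Only plain web pages; no file:, ftp:, javascript:, etc.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return null
  // Embedded credentials should never be stored with a monitor.
  if (url.username !== '' || url.password !== '') return null

  const host = url.hostname.toLowerCase()
  if (host === 'localhost' || host.endsWith('.localhost') || host === '[::1]') return null

  // Reject loopback, private, and link-local IPv4 literals.
  const ipv4 = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(host)
  if (ipv4) {
    const a = Number(ipv4[1])
    const b = Number(ipv4[2])
    const blocked =
      a === 0 || a === 10 || a === 127 ||
      (a === 169 && b === 254) ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168)
    if (blocked) return null
  }

  url.hash = '' // fragments never reach the server
  return url.toString()
}
```

This checks only the literal hostname. A hostname that resolves to a private address would still pass, so the fetcher would need its own check at request time.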

Code Style

  • Use TypeScript strict mode
  • Write unit tests for core algorithms (differ, filter, keyword)
  • Use async/await, avoid callbacks
  • Prefer functional programming patterns
  • Comment complex logic, especially regex patterns

API Design Principles

  • RESTful endpoints
  • Use proper HTTP status codes
  • Return consistent error format:
    {
      "error": "monitor_not_found",
      "message": "Monitor with id 123 not found",
      "details": {}
    }
    
  • Paginate list endpoints (monitors, snapshots, alerts)
  • Version API if breaking changes needed (/v1/monitors)
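
The error format and pagination principles above can be captured in two small helpers. This is a sketch only: apiError, paginate, the Page shape, and the page-size cap of 100 are hypothetical, and cursor-based pagination would fit the principles equally well.

```typescript
interface ApiError {
  error: string
  message: string
  details: Record<string, unknown>
}

// Builds an error body in the consistent format shown above.
function apiError(
  error: string,
  message: string,
  details: Record<string, unknown> = {}
): ApiError {
  return { error, message, details }
}

interface Page<T> {
  items: T[]
  page: number
  pageSize: number
  total: number
}

const MAX_PAGE_SIZE = 100 // hypothetical cap

// Truncates to an integer and clamps into [min, max]; NaN falls back to min.
function clampInt(value: number, min: number, max: number): number {
  if (Number.isNaN(value)) return min
  return Math.min(Math.max(Math.trunc(value), min), max)
}

// In-memory page-based pagination; a real endpoint would push
// the offset and limit down into the database query.
function paginate<T>(all: T[], page = 1, pageSize = 20): Page<T> {
  const safePage = clampInt(page, 1, Number.MAX_SAFE_INTEGER)
  const safeSize = clampInt(pageSize, 1, MAX_PAGE_SIZE)
  const start = (safePage - 1) * safeSize
  return {
    items: all.slice(start, start + safeSize),
    page: safePage,
    pageSize: safeSize,
    total: all.length,
  }
}
```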

Common Tasks & Commands

When Starting Development

# Clone and setup
git clone <repo>
cd website-monitor

# Install dependencies (subshells keep you in the repo root)
(cd frontend && npm install)
(cd backend && npm install)

# Setup environment
cp .env.example .env
# Edit .env with your values

# Start database
docker-compose up -d postgres redis

# Run migrations
(cd backend && npm run migrate)

# Start dev servers (each in its own terminal, from the repo root)
cd frontend && npm run dev
cd backend && npm run dev

Running Tests

# Frontend tests
(cd frontend && npm test)

# Backend tests
(cd backend && npm test)

# E2E tests
npm run test:e2e

Deployment

# Build frontend
cd frontend && npm run build

# Deploy frontend (Vercel)
vercel deploy --prod

# Deploy backend (tag the image with the registry so the push finds it)
docker build -t <registry>/monitor-api .
docker push <registry>/monitor-api
# Deploy to Railway/Render/AWS

Key User Flows to Support

When building features, always consider these primary use cases:

  1. Job seeker monitoring career pages (most common)

    • Needs: Fast frequency (5 min), keyword alerts, instant notifications
  2. Price tracking for e-commerce (high value)

    • Needs: Element selection, numeric comparison, reliable alerts
  3. Competitor monitoring (B2B focus)

    • Needs: Multiple monitors, digest mode, AI summaries
  4. Stock/availability tracking (urgent)

    • Needs: Fastest frequency (1 min), SMS alerts, auto-pause
  5. Policy/regulation monitoring (professional)

    • Needs: Long-term history, team sharing, AI summaries
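
The "numeric comparison" need in the price tracking flow could start from something like the sketch below. It assumes US-style dollar amounts only; parsePrice and priceDropped are hypothetical helpers, and a real version would need to handle other currencies and number formats.

```typescript
// Extracts the first dollar amount such as "$1,299.99" from text; null if none.
function parsePrice(text: string): number | null {
  const match = /\$\s*(\d[\d,]*(?:\.\d+)?)/.exec(text)
  if (!match) return null
  const raw = match[1]
  if (raw === undefined) return null
  return Number(raw.replace(/,/g, ''))
}

// True only when both snapshots contain a price and the current one is lower.
function priceDropped(previousText: string, currentText: string): boolean {
  const previous = parsePrice(previousText)
  const current = parsePrice(currentText)
  return previous !== null && current !== null && current < previous
}
```

Combined with elementSelector, the text passed in would be just the price element rather than the whole page.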

Integration Points

Email Service (SendGrid/Postmark)

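async function sendChangeAlert(monitor: Monitor, snapshot: Snapshot) {
  // Monitor only stores userId, so load the user to get the email address
  const user = await db.users.findById(monitor.userId)
  const diffUrl = `https://app.example.com/monitors/${monitor.id}/diff/${snapshot.id}`

  await emailService.send({
    to: user.email,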
    subject: `Change detected: ${monitor.name}`,
    template: 'change-alert',
    data: {
      monitorName: monitor.name,
      url: monitor.url,
      timestamp: snapshot.createdAt,
      diffUrl,
      changePercentage: snapshot.changePercentage
    }
  })
}

Stripe Billing

async function handleSubscription(userId: string, plan: string) {
  const user = await db.users.findById(userId)

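  // Create a new subscription for the chosen plan
  // (changing an existing one would go through stripe.subscriptions.update instead)
  const subscription = await stripe.subscriptions.create({
    customer: user.stripeCustomerId,
    items: [{ price: PRICE_IDS[plan] }]
  })

  // Update user plan (subscriptionId would need to be added to the User model above)
  await db.users.update(userId, {
    plan,
    subscriptionId: subscription.id
  })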
}

Job Queue (Bull)

// Schedule monitor checks
async function scheduleMonitor(monitor: Monitor) {
  await monitorQueue.add(
    'check-monitor',
    { monitorId: monitor.id },
    {
      repeat: {
        every: monitor.frequency * 60 * 1000 // convert to ms
      },
      jobId: `monitor-${monitor.id}`
    }
  )
}

// Process checks
monitorQueue.process('check-monitor', async (job) => {
  const { monitorId } = job.data
  await checkMonitor(monitorId)
})

Testing Strategy

Unit Tests

  • Diff algorithms
  • Noise filtering
  • Keyword matching
  • Ignore rules application

Integration Tests

  • API endpoints
  • Database operations
  • Job queue processing

E2E Tests

  • User registration & login
  • Monitor creation & management
  • Alert delivery
  • Subscription changes

Performance Tests

  • Fetch speed with various page sizes
  • Diff calculation speed
  • Concurrent monitor checks
  • Database query performance

Deployment Checklist

Before deploying to production:

  • Environment variables configured
  • Database migrations run
  • SSL certificates configured
  • Email deliverability tested
  • Payment processing tested (Stripe test mode → live mode)
  • Error tracking configured (Sentry)
  • Monitoring & alerts set up (uptime, error rate, queue health)
  • Backup strategy implemented
  • Rate limiting configured
  • GDPR compliance (privacy policy, data export/deletion)
  • Security headers configured
  • API documentation updated

Troubleshooting Common Issues

"Monitor keeps triggering false alerts"

  • Check if noise filtering is working
  • Review ignore rules for the monitor
  • Look at diff to identify changing element
  • Add custom ignore rule for that element

"Some pages aren't being monitored correctly"

  • Check if page requires JavaScript rendering
  • Try enabling headless browser mode
  • Check if page requires authentication
  • Look for CAPTCHA or bot detection

"Alerts aren't being delivered"

  • Check email service status
  • Verify email isn't going to spam
  • Check alert queue for errors
  • Verify user's alert settings

"System is slow/overloaded"

  • Check Redis queue health
  • Look for monitors with very high frequency
  • Check database query performance
  • Consider scaling workers horizontally

Metrics to Track

Technical Metrics

  • Average check duration
  • Diff calculation time
  • Check success rate
  • Alert delivery rate
  • Queue processing lag

Product Metrics

  • Active monitors per user
  • Alerts sent per day
  • False positive rate (from user feedback)
  • Feature adoption (keywords, elements, integrations)

Business Metrics

  • Free → Paid conversion rate
  • Monthly churn rate
  • Average revenue per user (ARPU)
  • Customer acquisition cost (CAC)
  • Lifetime value (LTV)

Resources & Documentation

External Documentation

Internal Documentation

  • See spec.md for complete feature specifications
  • See task.md for development roadmap
  • See actions.md for user workflows and use cases

Future Considerations

Potential Enhancements

  • Mobile app (React Native or Progressive Web App)
  • Browser extension for quick monitor addition
  • AI-powered change importance scoring
  • Collaborative features (team annotations, approval workflows)
  • Marketplace for monitor templates
  • Affiliate program for power users

Scaling Considerations

  • Distributed workers across multiple regions
  • Caching layer for frequently accessed pages
  • Database sharding by user
  • Separate queue for high-frequency monitors
  • CDN for snapshot storage

Notes for Claude

When working on this project:

  1. Always reference these docs: spec.md, task.md, actions.md, and this file
  2. MVP mindset: Implement the simplest solution that works first
  3. User-centric: Consider the user workflows in actions.md when building features
  4. Security-conscious: Validate URLs, sanitize inputs, encrypt sensitive data
  5. Performance-aware: Optimize for speed, especially diff calculation
  6. Ask clarifying questions: If requirements are ambiguous, ask before implementing
  7. Test as you go: Write tests for core functionality
  8. Document decisions: Update these docs when making architectural decisions

Common Questions & Answers

Q: Should we support authenticated pages in MVP? A: No, save for V2. Focus on public pages first.

Q: What diff library should we use? A: diff on npm (the jsdiff project) for JavaScript, difflib for Python.

Q: How do we handle CAPTCHA? A: For MVP, just alert the user. For V2, consider residential proxies or browser fingerprinting.

Q: Should we store full HTML or just text? A: Store both: full HTML for accuracy, extracted text for diffing performance.

Q: What's the minimum viable frequency? A: 5 minutes for paid users, 1 hour for free tier.


Quick Reference

Key Files

  • spec.md - Feature specifications
  • task.md - Development tasks and roadmap
  • actions.md - User workflows and use cases
  • claude.md - This file (project context)

Key Concepts

  • Noise reduction - Core differentiator
  • Keyword alerts - High-value feature
  • Element selection - Monitor specific parts
  • Change severity - Classify importance

Pricing Tiers

  • Free: 5 monitors, 1hr frequency
  • Pro: 50 monitors, 5min frequency, $19-29/mo
  • Business: 200 monitors, 1min frequency, teams, $99-149/mo
  • Enterprise: Unlimited, custom pricing
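
The tiers above map naturally onto a limits table that the API can enforce. The sketch below is one way to do it; PLAN_LIMITS, canCreateMonitor, and effectiveFrequency are hypothetical names, and the Enterprise frequency is a placeholder because the source lists it only as custom.

```typescript
type Plan = 'free' | 'pro' | 'business' | 'enterprise'

interface PlanLimits {
  maxMonitors: number // Infinity means unlimited
  minFrequencyMinutes: number
}

const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  free: { maxMonitors: 5, minFrequencyMinutes: 60 },
  pro: { maxMonitors: 50, minFrequencyMinutes: 5 },
  business: { maxMonitors: 200, minFrequencyMinutes: 1 },
  // Assumption: the source says only "Unlimited, custom pricing" for Enterprise.
  enterprise: { maxMonitors: Infinity, minFrequencyMinutes: 1 },
}

// True if a user on `plan` with `currentCount` monitors may add another.
function canCreateMonitor(plan: Plan, currentCount: number): boolean {
  return currentCount < PLAN_LIMITS[plan].maxMonitors
}

// Raises a requested check interval to the shortest one the plan allows.
function effectiveFrequency(plan: Plan, requestedMinutes: number): number {
  return Math.max(requestedMinutes, PLAN_LIMITS[plan].minFrequencyMinutes)
}
```

Clamping the interval silently is one option; rejecting the request with a validation error is another and may be clearer to users.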

Last updated: 2026-01-16