# Website Change Detection Monitor - Claude Context

## Project Overview

This is a **Website Change Detection Monitor SaaS** application. The core value proposition is helping users track changes on web pages they care about, with intelligent noise filtering so that only meaningful changes trigger alerts.

**Updated Tagline (2026-01-18)**: "Less noise. More signal. Proof included."

**Previous**: "I watch pages so you don't have to" (too generic, doesn't communicate value)

**Target Market**: SEO & Growth Teams (SMB → Mid-Market) who monitor competitor pages, SERP changes, and policy updates

---

## Key Differentiators (Market-Validated)

1. **Superior Noise Filtering** 🔥
   - Automatically filter cookie banners, timestamps, rotating ads, session IDs
   - Custom ignore rules (CSS selectors, regex, text patterns)
   - **Market Evidence**: Distill & Fluxguard emphasize this as core differentiator

2. **Keyword-Based Alerts** 🔥
   - Alert when specific words appear/disappear (e.g., "sold out", "hiring", "$99")
   - Threshold-based triggers, regex support
   - **Market Evidence**: High-value feature across all competitors

3. **Workflow Integrations** 🔥 NEW PRIORITY
   - Webhooks (MVP), Slack (V1), Teams/Discord (V2)
   - Alerts in your existing tools, not just email
   - **Market Evidence**: Shown prominently by Visualping, Wachete, ChangeDetection

4. **Proof & History** 🔥
   - Compare versions, audit-proof snapshots, full history
   - Messaging: "Prove changes" not just "see changes"
   - **Market Evidence**: Sken & Fluxguard highlight "versions kept"

5. **Use-Case-Focused Marketing**
   - Primary: SEO Monitoring, Competitor Tracking, Policy/Compliance
   - Secondary: Stock/Restock, Job Postings
   - **Market Evidence**: All competitors segment by use case

---

## Architecture Overview

### Tech Stack (Recommended)

**Frontend**:
- Next.js 14+ (App Router)
- TypeScript
- Tailwind CSS + shadcn/ui components
- React Query for server-state management (data fetching and caching)
- Zod for validation

**Backend**:
- Node.js + Express OR Python + FastAPI
- PostgreSQL for relational data
- Redis + Bull/BullMQ for job queuing
- Puppeteer/Playwright for JS-heavy sites

**Infrastructure**:
- Vercel/Railway for frontend hosting
- Render/Railway/AWS for backend
- AWS S3 or Cloudflare R2 for snapshot storage
- Upstash Redis or managed Redis

**Third-Party Services**:
- Stripe for billing
- SendGrid/Postmark for emails
- Sentry for error tracking
- PostHog/Mixpanel for analytics

---

## Project Structure

```
/website-monitor
├── /frontend (Next.js)
│   ├── /app
│   │   ├── /dashboard
│   │   ├── /monitors
│   │   ├── /settings
│   │   └── /auth
│   ├── /components
│   │   ├── /ui (shadcn components)
│   │   ├── /monitors
│   │   └── /diff-viewer
│   ├── /lib
│   │   ├── api-client.ts
│   │   ├── auth.ts
│   │   └── utils.ts
│   └── /public
├── /backend
│   ├── /src
│   │   ├── /routes
│   │   ├── /controllers
│   │   ├── /models
│   │   ├── /services
│   │   │   ├── fetcher.ts
│   │   │   ├── differ.ts
│   │   │   ├── scheduler.ts
│   │   │   └── alerter.ts
│   │   ├── /jobs
│   │   └── /utils
│   ├── /db
│   │   └── /migrations
│   └── /tests
├── /docs
│   ├── spec.md
│   ├── task.md
│   ├── actions.md
│   └── claude.md (this file)
└── README.md
```

---

## Core Entities & Data Models

### User

```typescript
interface User {
  id: string
  email: string
  passwordHash: string
  plan: 'free' | 'pro' | 'business' | 'enterprise'
  stripeCustomerId: string
  createdAt: Date
  lastLoginAt: Date
}
```

### Monitor

```typescript
interface Monitor {
  id: string
  userId: string
  url: string
  name: string
  frequency: number // minutes
  status: 'active' | 'paused' | 'error'

  // Advanced features
  elementSelector?: string
  ignoreRules?: {
    type: 'css' | 'regex' | 'text'
    value: string
  }[]
  keywordRules?: {
    keyword: string
    type: 'appears' | 'disappears' | 'count'
    threshold?: number
  }[]

  // Metadata
  lastCheckedAt?: Date
  lastChangedAt?: Date
  consecutiveErrors: number
  createdAt: Date
}
```

### Snapshot

```typescript
interface Snapshot {
  id: string
  monitorId: string
  htmlContent: string
  contentHash: string
  screenshotUrl?: string

  // Status
  httpStatus: number
  responseTime: number
  changed: boolean
  changePercentage?: number

  // Errors
  errorMessage?: string

  // Metadata
  createdAt: Date
}
```

### Alert

```typescript
interface Alert {
  id: string
  monitorId: string
  snapshotId: string
  userId: string

  // Alert details
  type: 'change' | 'error' | 'keyword'
  title: string
  summary?: string

  // Delivery
  channels: ('email' | 'slack' | 'webhook')[]
  deliveredAt?: Date
  readAt?: Date
  createdAt: Date
}
```

---

## Key Algorithms & Logic

### Change Detection

```typescript
import { diffLines } from 'diff'

// Simple hash comparison for binary change detection
const changed = previousHash !== currentHash

// Text diff for detailed comparison
const diff = diffLines(previousText, currentText)

// Assumed definition: share of diffed lines that were added or removed
const lineCount = (parts: typeof diff) =>
  parts.reduce((sum, part) => sum + (part.count ?? 0), 0)
const totalLines = lineCount(diff)
const changedLines = lineCount(diff.filter(part => part.added || part.removed))
const changePercentage = totalLines === 0 ? 0 : (changedLines / totalLines) * 100

// Severity calculation
const severity =
  changePercentage > 50 ? 'major' : changePercentage > 10 ? 'medium' : 'minor'
```

### Noise Filtering

```typescript
import * as cheerio from 'cheerio'

// Remove common noise patterns
function filterNoise(html: string): string {
  // Remove ISO 8601 timestamps (e.g. 2026-01-18T09:30:00)
  html = html.replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/g, '')

  // Remove cookie banners (common selectors)
  const noisySelectors = [
    '.cookie-banner',
    '#cookie-notice',
    '[class*="consent"]',
    // ... more patterns
  ]

  // Parse and remove elements
  const $ = cheerio.load(html)
  noisySelectors.forEach(sel => $(sel).remove())

  return $.html()
}
```

### Keyword Detection

```typescript
function checkKeywords(
  previousText: string,
  currentText: string,
  rules: KeywordRule[]
): KeywordMatch[] {
  // Explicit type so the array is not inferred as any[] under strict mode
  const matches: KeywordMatch[] = []

  for (const rule of rules) {
    const prevMatch = previousText.includes(rule.keyword)
    const currMatch = currentText.includes(rule.keyword)

    if (rule.type === 'appears' && !prevMatch && currMatch) {
      matches.push({ rule, type: 'appeared' })
    }

    if (rule.type === 'disappears' && prevMatch && !currMatch) {
      matches.push({ rule, type: 'disappeared' })
    }

    // Count logic...
  }

  return matches
}
```

---

## Development Guidelines

### When Working on This Project

1. **Prioritize MVP**: Focus on core features before adding complexity
2. **Performance matters**: Diffing and fetching should be fast (<2s)
3. **Noise reduction is key**: This is our competitive advantage
4. **User feedback loop**: Build in ways to learn from false positives
5. **Security first**: Never store credentials in plain text, sanitize all URLs

### Code Style

- Use TypeScript strict mode
- Write unit tests for core algorithms (differ, filter, keyword)
- Use async/await, avoid callbacks
- Prefer functional programming patterns
- Comment complex logic, especially regex patterns

### API Design Principles

- RESTful endpoints
- Use proper HTTP status codes
- Return a consistent error format:
  ```json
  {
    "error": "monitor_not_found",
    "message": "Monitor with id 123 not found",
    "details": {}
  }
  ```
- Paginate list endpoints (monitors, snapshots, alerts)
- Version the API if breaking changes are needed (`/v1/monitors`)

---

## Common Tasks & Commands

### When Starting Development

```bash
# Clone and setup
git clone <repo-url>
cd website-monitor

# Install dependencies (subshells keep you in the project root)
(cd frontend && npm install)
(cd backend && npm install)

# Setup environment
cp .env.example .env
# Edit .env with your values

# Start database
docker-compose up -d postgres redis

# Run migrations
(cd backend && npm run migrate)

# Start dev servers (run each in its own terminal, from the project root)
cd frontend && npm run dev
cd backend && npm run dev
```

### Running Tests

```bash
# Frontend tests
(cd frontend && npm test)

# Backend tests
(cd backend && npm test)

# E2E tests
npm run test:e2e
```

### Deployment

```bash
# Build frontend
(cd frontend && npm run build)

# Deploy frontend (Vercel)
vercel deploy --prod

# Deploy backend
docker build -t monitor-api .
docker tag monitor-api <registry>/monitor-api
docker push <registry>/monitor-api

# Deploy to Railway/Render/AWS
```

---

## Key User Flows to Support

When building features, always consider these primary use cases:

1. **Job seeker monitoring career pages** (most common)
   - Needs: Fast frequency (5 min), keyword alerts, instant notifications

2. **Price tracking for e-commerce** (high value)
   - Needs: Element selection, numeric comparison, reliable alerts

3. **Competitor monitoring** (B2B focus)
   - Needs: Multiple monitors, digest mode, AI summaries

4. **Stock/availability tracking** (urgent)
   - Needs: Fastest frequency (1 min), SMS alerts, auto-pause

5. **Policy/regulation monitoring** (professional)
   - Needs: Long-term history, team sharing, AI summaries

---

## Integration Points

### Email Service (SendGrid/Postmark)

```typescript
// Assumes the monitor is loaded together with its owning user
async function sendChangeAlert(monitor: Monitor, snapshot: Snapshot) {
  const diffUrl = `https://app.example.com/monitors/${monitor.id}/diff/${snapshot.id}`

  await emailService.send({
    to: monitor.user.email,
    subject: `Change detected: ${monitor.name}`,
    template: 'change-alert',
    data: {
      monitorName: monitor.name,
      url: monitor.url,
      timestamp: snapshot.createdAt,
      diffUrl,
      changePercentage: snapshot.changePercentage
    }
  })
}
```

### Stripe Billing

```typescript
async function handleSubscription(userId: string, plan: string) {
  const user = await db.users.findById(userId)

  // Create the subscription (updating an existing one is not handled here)
  const subscription = await stripe.subscriptions.create({
    customer: user.stripeCustomerId,
    items: [{ price: PRICE_IDS[plan] }]
  })

  // Update user plan
  await db.users.update(userId, {
    plan,
    subscriptionId: subscription.id
  })
}
```

### Job Queue (Bull)

```typescript
// Schedule monitor checks
async function scheduleMonitor(monitor: Monitor) {
  await monitorQueue.add(
    'check-monitor',
    { monitorId: monitor.id },
    {
      repeat: {
        every: monitor.frequency * 60 * 1000 // convert minutes to ms
      },
      jobId: `monitor-${monitor.id}`
    }
  )
}

// Process checks
monitorQueue.process('check-monitor', async (job) => {
  const { monitorId } = job.data
  await checkMonitor(monitorId)
})
```

---

## Testing Strategy

### Unit Tests

- Diff algorithms
- Noise filtering
- Keyword matching
- Ignore rules application

### Integration Tests

- API endpoints
- Database operations
- Job queue processing

### E2E Tests

- User registration & login
- Monitor creation & management
- Alert delivery
- Subscription changes

### Performance Tests

- Fetch speed with various page sizes
- Diff calculation speed
- Concurrent monitor checks
- Database query performance

---

## Deployment Checklist

Before deploying to production:

- [ ] Environment variables configured
- [ ] Database migrations run
- [ ] SSL certificates configured
- [ ] Email deliverability tested
- [ ] Payment processing tested (Stripe test mode → live mode)
- [ ] Error tracking configured (Sentry)
- [ ] Monitoring & alerts set up (uptime, error rate, queue health)
- [ ] Backup strategy implemented
- [ ] Rate limiting configured
- [ ] GDPR compliance (privacy policy, data export/deletion)
- [ ] Security headers configured
- [ ] API documentation updated

---

## Troubleshooting Common Issues

### "Monitor keeps triggering false alerts"

- Check if noise filtering is working
- Review ignore rules for the monitor
- Look at diff to identify changing element
- Add custom ignore rule for that element

### "Some pages aren't being monitored correctly"

- Check if page requires JavaScript rendering
- Try enabling headless browser mode
- Check if page requires authentication
- Look for CAPTCHA or bot detection

### "Alerts aren't being delivered"

- Check email service status
- Verify email isn't going to spam
- Check alert queue for errors
- Verify user's alert settings

### "System is slow/overloaded"

- Check Redis queue health
- Look for monitors with very high frequency
- Check database query performance
- Consider scaling workers horizontally

---

## Metrics to Track

### Technical Metrics

- Average check duration
- Diff calculation time
- Check success rate
- Alert delivery rate
- Queue processing lag

### Product Metrics

- Active monitors per user
- Alerts sent per day
- False positive rate (from user feedback)
- Feature adoption (keywords, elements, integrations)

### Business Metrics

- Free → Paid conversion rate
- Monthly churn rate
- Average revenue per user (ARPU)
- Customer acquisition cost (CAC)
- Lifetime value (LTV)

---

## Resources & Documentation

### External Documentation

- [Next.js Docs](https://nextjs.org/docs)
- [Tailwind CSS](https://tailwindcss.com/docs)
- [Playwright Docs](https://playwright.dev)
- [Bull Queue](https://github.com/OptimalBits/bull)
- [Stripe API](https://stripe.com/docs/api)

### Internal Documentation

- See `spec.md` for complete feature specifications
- See `task.md` for development roadmap
- See `actions.md` for user workflows and use cases

---

## Future Considerations

### Potential Enhancements

- Mobile app (React Native or Progressive Web App)
- Browser extension for quick monitor addition
- AI-powered change importance scoring
- Collaborative features (team annotations, approval workflows)
- Marketplace for monitor templates
- Affiliate program for power users

### Scaling Considerations

- Distributed workers across multiple regions
- Caching layer for frequently accessed pages
- Database sharding by user
- Separate queue for high-frequency monitors
- CDN for snapshot storage

---

## Notes for Claude

When working on this project:

1. **Always reference these docs**: spec.md, task.md, actions.md, and this file
2. **MVP mindset**: Implement the simplest solution that works first
3. **User-centric**: Consider the user workflows in actions.md when building features
4. **Security-conscious**: Validate URLs, sanitize inputs, encrypt sensitive data
5. **Performance-aware**: Optimize for speed, especially diff calculation
6. **Ask clarifying questions**: If requirements are ambiguous, ask before implementing
7. **Test as you go**: Write tests for core functionality
8. **Document decisions**: Update these docs when making architectural decisions

### Common Questions & Answers

**Q: Should we support authenticated pages in MVP?**
A: No, save for V2. Focus on public pages first.

**Q: What diff library should we use?**
A: `diff` on npm (the jsdiff library) for JavaScript, `difflib` for Python.

**Q: How do we handle CAPTCHA?**
A: For MVP, just alert the user. For V2, consider residential proxies or browser fingerprinting.

**Q: Should we store full HTML or just text?**
A: Store both: full HTML for accuracy, extracted text for diffing performance.
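The "store both" answer above could look roughly like this in code. This is a minimal sketch, not the project's implementation: `extractText`, `buildStoredContent`, `StoredContent`, and the `textContent` field are hypothetical names that do not appear in the data models, and the regex-based tag stripping is a crude stand-in for a real HTML parser such as cheerio.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical shape for illustration; htmlContent and contentHash mirror the
// Snapshot model, textContent is an assumed extra field.
interface StoredContent {
  htmlContent: string // full HTML, kept for accuracy and audit
  textContent: string // extracted text, used as diff input
  contentHash: string // SHA-256 hex digest of the extracted text
}

// Naive extraction: drop script/style blocks, strip remaining tags,
// then collapse runs of whitespace into single spaces.
function extractText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
}

// Keep the raw HTML untouched and derive the text and hash from it.
function buildStoredContent(html: string): StoredContent {
  const textContent = extractText(html)
  const contentHash = createHash('sha256').update(textContent).digest('hex')
  return { htmlContent: html, textContent, contentHash }
}
```

Hashing the extracted text rather than the raw HTML means markup-only changes leave `contentHash` unchanged. Whether that is the desired behavior is a design choice this sketch does not settle.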

**Q: What's the minimum viable frequency?**
A: 5 minutes for paid users, 1 hour for free tier.

---

## Quick Reference

### Key Files

- `spec.md` - Feature specifications
- `task.md` - Development tasks and roadmap
- `actions.md` - User workflows and use cases
- `claude.md` - This file (project context)

### Key Concepts

- **Noise reduction** - Core differentiator
- **Keyword alerts** - High-value feature
- **Element selection** - Monitor specific parts
- **Change severity** - Classify importance

### Pricing Tiers (Under Review - See findings_market.md)

- **Free**: 5 monitors, 1hr frequency
- **Pro**: 50 monitors, 5min frequency, $19-29/mo
- **Business**: 200 monitors, 1min frequency, teams, $99-149/mo
- **Enterprise**: Unlimited, custom pricing

**Note:** Considering switch to "checks/month" model instead of "monitors + frequency" for fairer pricing

---

## Competitive Positioning (Updated 2026-01-18)

### Market Landscape

We compete with established players (Visualping, Distill, Fluxguard) and budget options (Sken.io, ChangeDetection.io).

### vs. Visualping

- **Their Strength**: Enterprise trust ("85% Fortune 500"), broad features
- **Our Angle**: "Better noise control + fairer pricing – without the enterprise bloat"
- **Messaging**: "Built for teams who need results, not demos"

### vs. Distill.io

- **Their Strength**: Conditions/filters, established user base
- **Our Angle**: "Team features built-in + modern UX – not stuck in 2015"
- **Messaging**: "Collaboration-first, not an afterthought"

### vs. Fluxguard

- **Their Strength**: AI summaries, enterprise focus, sales-led
- **Our Angle**: "Self-serve pricing + instant setup – no demo calls required"
- **Messaging**: "AI-powered intelligence without the enterprise tax"

### vs. ChangeDetection.io / Sken.io

- **Their Strength**: Low price ($3-9/mo), simple
- **Our Angle**: "Advanced features (keywords, integrations, teams) without complexity"
- **Messaging**: "Powerful, but still simple"

### How We Win

1. **Superior noise control** (multi-layer filtering)
2. **Workflow integrations** (Slack/Teams/Webhooks early)
3. **Use-case marketing** (SEO, Competitor, Policy segments)
4. **Modern UX** (not stuck in legacy design)
5. **Fair pricing** (considering checks/month model)

---

*Last updated: 2026-01-18*