# Website Change Detection Monitor - Claude Context

## Project Overview

This is a **Website Change Detection Monitor SaaS** application. The core value proposition is helping users track changes on web pages they care about, with intelligent noise filtering to ensure only meaningful changes trigger alerts.

**Tagline**: "I watch pages so you don't have to"

---

## Key Differentiators

1. **Smart Noise Filtering**: Unlike competitors, we automatically filter out cookie banners, timestamps, rotating ads, and other irrelevant changes
2. **Keyword-Based Alerts**: Users can be notified when specific words/phrases appear or disappear (e.g., "sold out", "hiring", "$99")
3. **Simple but Powerful**: Easy enough for non-technical users, powerful enough for professionals
4. **SEO-Friendly Market**: The space has many long-tail search keywords to target (e.g., "monitor job postings", "track competitor prices")

---

## Architecture Overview

### Tech Stack (Recommended)

**Frontend**:
- Next.js 14+ (App Router)
- TypeScript
- Tailwind CSS + shadcn/ui components
- React Query (TanStack Query) for server-state management and data fetching
- Zod for validation

**Backend**:
- Node.js + Express OR Python + FastAPI
- PostgreSQL for relational data
- Redis + Bull/BullMQ for job queuing
- Puppeteer/Playwright for JS-heavy sites

**Infrastructure**:
- Vercel/Railway for frontend hosting
- Render/Railway/AWS for backend
- AWS S3 or Cloudflare R2 for snapshot storage
- Upstash Redis or managed Redis

**Third-Party Services**:
- Stripe for billing
- SendGrid/Postmark for emails
- Sentry for error tracking
- PostHog/Mixpanel for analytics

---

## Project Structure

```
/website-monitor
├── /frontend (Next.js)
│   ├── /app
│   │   ├── /dashboard
│   │   ├── /monitors
│   │   ├── /settings
│   │   └── /auth
│   ├── /components
│   │   ├── /ui (shadcn components)
│   │   ├── /monitors
│   │   └── /diff-viewer
│   ├── /lib
│   │   ├── api-client.ts
│   │   ├── auth.ts
│   │   └── utils.ts
│   └── /public
├── /backend
│   ├── /src
│   │   ├── /routes
│   │   ├── /controllers
│   │   ├── /models
│   │   ├── /services
│   │   │   ├── fetcher.ts
│   │   │   ├── differ.ts
│   │   │   ├── scheduler.ts
│   │   │   └── alerter.ts
│   │   ├── /jobs
│   │   └── /utils
│   ├── /db
│   │   └── /migrations
│   └── /tests
├── /docs
│   ├── spec.md
│   ├── task.md
│   ├── actions.md
│   └── claude.md (this file)
└── README.md
```

---

## Core Entities & Data Models

### User
```typescript
{
  id: string
  email: string
  passwordHash: string
  plan: 'free' | 'pro' | 'business' | 'enterprise'
  stripeCustomerId: string
  createdAt: Date
  lastLoginAt: Date
}
```

### Monitor
```typescript
{
  id: string
  userId: string
  url: string
  name: string
  frequency: number // minutes
  status: 'active' | 'paused' | 'error'

  // Advanced features
  elementSelector?: string
  ignoreRules?: {
    type: 'css' | 'regex' | 'text'
    value: string
  }[]
  keywordRules?: {
    keyword: string
    type: 'appears' | 'disappears' | 'count'
    threshold?: number
  }[]

  // Metadata
  lastCheckedAt?: Date
  lastChangedAt?: Date
  consecutiveErrors: number
  createdAt: Date
}
```

### Snapshot
```typescript
{
  id: string
  monitorId: string
  htmlContent: string
  contentHash: string
  screenshotUrl?: string

  // Status
  httpStatus: number
  responseTime: number
  changed: boolean
  changePercentage?: number

  // Errors
  errorMessage?: string

  // Metadata
  createdAt: Date
}
```

### Alert
```typescript
{
  id: string
  monitorId: string
  snapshotId: string
  userId: string

  // Alert details
  type: 'change' | 'error' | 'keyword'
  title: string
  summary?: string

  // Delivery
  channels: ('email' | 'slack' | 'webhook')[]
  deliveredAt?: Date
  readAt?: Date

  createdAt: Date
}
```

---

## Key Algorithms & Logic

### Change Detection
```typescript
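import { diffLines } from 'diff'

// previousHash/currentHash and previousText/currentText come from the two snapshots being compared
// Simple hash comparison for binary change detection
const changed = previousHash !== currentHash

// Text diff for detailed comparison
const diff = diffLines(previousText, currentText)
// Sum the line counts of added and removed hunks (a modified line counts twice)
const changedLines = diff
  .filter(part => part.added || part.removed)
  .reduce((sum, part) => sum + (part.count ?? 0), 0)
const totalLines = Math.max(previousText.split('\n').length, 1) // avoid division by zero
const changePercentage = Math.min((changedLines / totalLines) * 100, 100)

// Severity calculation
const severity =
  changePercentage > 50 ? 'major' :
  changePercentage > 10 ? 'medium' : 'minor'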
```
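
The hash comparison above presupposes a stable `contentHash`. A minimal sketch of one way to compute it, assuming SHA-256 over whitespace-normalized text; `normalizeText` and `computeContentHash` are hypothetical helper names, not part of the existing codebase:

```typescript
import { createHash } from 'node:crypto'

// Hypothetical helper: collapse whitespace so that pure formatting changes
// (indentation, trailing newlines) do not alter the hash.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, ' ').trim()
}

// Hypothetical helper: SHA-256 hex digest of the normalized text.
function computeContentHash(text: string): string {
  return createHash('sha256').update(normalizeText(text), 'utf8').digest('hex')
}

const previousHash = computeContentHash('Price:  $99\n')
const currentHash = computeContentHash('Price: $99')
const changed = previousHash !== currentHash // false: only whitespace differs
```

Normalizing before hashing trades a little sensitivity for fewer false positives; whether that trade is right depends on the monitor.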

### Noise Filtering
```typescript
import * as cheerio from 'cheerio'

// Remove common noise patterns
function filterNoise(html: string): string {
  // Remove ISO 8601 timestamps such as 2026-01-16T09:30:00
  html = html.replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/g, '')

  // Remove cookie banners (common selectors)
  const noisySelectors = [
    '.cookie-banner',
    '#cookie-notice',
    '[class*="consent"]',
    // ... more patterns
  ]

  // Parse and remove elements
  const $ = cheerio.load(html)
  noisySelectors.forEach(sel => $(sel).remove())

  return $.html()
}
```
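
Beyond the built-in patterns, each monitor carries its own `ignoreRules`. The sketch below shows one way the `regex` and `text` rule types could be applied to already-extracted text. `applyIgnoreRules` is a hypothetical name, and `css` rules are assumed to be handled earlier at the HTML stage, since they need a DOM:

```typescript
// Mirrors the shape of Monitor.ignoreRules from the data model.
type IgnoreRule = { type: 'css' | 'regex' | 'text'; value: string }

// Hypothetical helper: strip user-defined noise from extracted text.
function applyIgnoreRules(text: string, rules: IgnoreRule[]): string {
  return rules.reduce((result, rule) => {
    switch (rule.type) {
      case 'regex':
        try {
          return result.replace(new RegExp(rule.value, 'g'), '')
        } catch {
          // Skip an invalid user-supplied pattern instead of failing the check.
          return result
        }
      case 'text':
        // Remove every literal occurrence of the value.
        return result.split(rule.value).join('')
      default:
        // 'css' rules are assumed to be applied at the HTML stage.
        return result
    }
  }, text)
}

const cleaned = applyIgnoreRules('Updated 12:30 | Price $99 | Ad: Buy now', [
  { type: 'regex', value: '\\d{2}:\\d{2}' },
  { type: 'text', value: 'Ad: Buy now' },
])
```

User-supplied regular expressions can be slow on adversarial input, so a production version would likely need a timeout or a pattern allow-list.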

### Keyword Detection
```typescript
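// Rule shape mirrors Monitor.keywordRules; the match shape is inferred from usage below
type KeywordRule = { keyword: string; type: 'appears' | 'disappears' | 'count'; threshold?: number }
type KeywordMatch = { rule: KeywordRule; type: 'appeared' | 'disappeared' }

function checkKeywords(
  previousText: string,
  currentText: string,
  rules: KeywordRule[]
): KeywordMatch[] {
  const matches: KeywordMatch[] = []

  for (const rule of rules) {
    // Note: String.prototype.includes is case-sensitive
    const prevMatch = previousText.includes(rule.keyword)
    const currMatch = currentText.includes(rule.keyword)

    if (rule.type === 'appears' && !prevMatch && currMatch) {
      matches.push({ rule, type: 'appeared' })
    }
    if (rule.type === 'disappears' && prevMatch && !currMatch) {
      matches.push({ rule, type: 'disappeared' })
    }

    // Count logic...
  }

  return matches
}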
```
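
The `count` rule type is left open above. One possible reading, offered only as an assumption, is that the rule fires when the number of occurrences crosses from below `threshold` to at or above it. `countOccurrences` and `countRuleTriggered` are hypothetical names:

```typescript
// Count non-overlapping, case-sensitive occurrences of a keyword.
function countOccurrences(text: string, keyword: string): number {
  if (keyword.length === 0) return 0
  return text.split(keyword).length - 1
}

// Assumed semantics (not specified in the source): a 'count' rule fires when
// the occurrence count crosses from below `threshold` to at or above it.
function countRuleTriggered(
  previousText: string,
  currentText: string,
  keyword: string,
  threshold = 1
): boolean {
  const prevCount = countOccurrences(previousText, keyword)
  const currCount = countOccurrences(currentText, keyword)
  return prevCount < threshold && currCount >= threshold
}
```

Firing only on the crossing, rather than on every check where the count is high, avoids repeating the same alert while the page stays unchanged.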

---

## Development Guidelines

### When Working on This Project

1. **Prioritize MVP**: Focus on core features before adding complexity
2. **Performance matters**: Diffing and fetching should be fast (<2s)
3. **Noise reduction is key**: This is our competitive advantage
4. **User feedback loop**: Build in ways to learn from false positives
5. **Security first**: Never store credentials in plain text, sanitize all URLs
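
As an illustration of the URL sanitizing mentioned in point 5, here is a minimal sketch using the standard `URL` class. `validateMonitorUrl` and its error codes are hypothetical, and the hostname checks are a first line of defence only; a real deployment would also need to check the resolved IP addresses:

```typescript
// Hypothetical validator: accepts only public-looking http(s) URLs.
function validateMonitorUrl(input: string): URL {
  let url: URL
  try {
    url = new URL(input.trim())
  } catch {
    throw new Error('invalid_url')
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error('unsupported_protocol')
  }
  // Reject embedded credentials such as http://user:pass@host
  if (url.username || url.password) {
    throw new Error('credentials_in_url')
  }
  const host = url.hostname
  // Loopback, link-local and private IPv4 ranges, plus IPv6 loopback
  const isPrivate =
    host === 'localhost' ||
    host === '[::1]' ||
    /^127\./.test(host) ||
    /^10\./.test(host) ||
    /^192\.168\./.test(host) ||
    /^169\.254\./.test(host) ||
    /^172\.(1[6-9]|2\d|3[01])\./.test(host)
  if (isPrivate) {
    throw new Error('private_address')
  }
  return url
}
```

Rejecting private and loopback hosts matters because the fetcher runs inside our own network, so an unchecked URL could be used to probe internal services.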

### Code Style

- Use TypeScript strict mode
- Write unit tests for core algorithms (differ, filter, keyword)
- Use async/await, avoid callbacks
- Prefer functional programming patterns
- Comment complex logic, especially regex patterns

### API Design Principles

- RESTful endpoints
- Use proper HTTP status codes
- Return consistent error format:
  ```json
  {
    "error": "monitor_not_found",
    "message": "Monitor with id 123 not found",
    "details": {}
  }
  ```
- Paginate list endpoints (monitors, snapshots, alerts)
- Version API if breaking changes needed (/v1/monitors)
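
A minimal sketch of how the error format and pagination could be typed. `ApiError` mirrors the JSON shape above; the `Page` envelope, the `paginate` helper, and the 100-item page cap are assumptions for illustration:

```typescript
// Mirrors the JSON error shape shown above.
interface ApiError {
  error: string
  message: string
  details: Record<string, unknown>
}

// Hypothetical envelope for paginated list endpoints.
interface Page<T> {
  items: T[]
  page: number
  pageSize: number
  total: number
  totalPages: number
}

function apiError(
  error: string,
  message: string,
  details: Record<string, unknown> = {}
): ApiError {
  return { error, message, details }
}

// In-memory pagination for illustration; a real endpoint would push
// LIMIT/OFFSET (or a cursor) down to the database query.
function paginate<T>(all: T[], page = 1, pageSize = 20): Page<T> {
  const safeSize = Math.min(Math.max(1, Math.floor(pageSize)), 100)
  const totalPages = Math.max(1, Math.ceil(all.length / safeSize))
  const safePage = Math.min(Math.max(1, Math.floor(page)), totalPages)
  const start = (safePage - 1) * safeSize
  return {
    items: all.slice(start, start + safeSize),
    page: safePage,
    pageSize: safeSize,
    total: all.length,
    totalPages,
  }
}
```

Clamping `page` and `pageSize` keeps out-of-range requests from returning errors or unbounded result sets.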

---

## Common Tasks & Commands

### When Starting Development
```bash
# Clone and setup
git clone <repo>
cd website-monitor

# Install dependencies (subshells keep the working directory at the repo root)
(cd frontend && npm install)
(cd backend && npm install)

# Setup environment (assumes .env.example lives at the repo root)
cp .env.example .env
# Edit .env with your values

# Start database
docker-compose up -d postgres redis

# Run migrations
(cd backend && npm run migrate)

# Start dev servers (each command blocks, so run them in separate terminals)
(cd frontend && npm run dev)
(cd backend && npm run dev)
```

### Running Tests
```bash
# Frontend tests (subshells keep the working directory at the repo root)
(cd frontend && npm test)

# Backend tests
(cd backend && npm test)

# E2E tests
npm run test:e2e
```

### Deployment
```bash
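# Build frontend
cd frontend && npm run build

# Deploy frontend (Vercel)
vercel deploy --prod

# Deploy backend (assumes the Dockerfile lives in /backend)
cd ../backend
# Tag the image with the registry so the push below can find it
docker build -t <registry>/monitor-api .
docker push <registry>/monitor-api
# Deploy to Railway/Render/AWS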
```

---

## Key User Flows to Support

When building features, always consider these primary use cases:

1. **Job seeker monitoring career pages** (most common)
   - Needs: Fast frequency (5 min), keyword alerts, instant notifications

2. **Price tracking for e-commerce** (high value)
   - Needs: Element selection, numeric comparison, reliable alerts

3. **Competitor monitoring** (B2B focus)
   - Needs: Multiple monitors, digest mode, AI summaries

4. **Stock/availability tracking** (urgent)
   - Needs: Fastest frequency (1 min), SMS alerts, auto-pause

5. **Policy/regulation monitoring** (professional)
   - Needs: Long-term history, team sharing, AI summaries
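
For the price-tracking flow, the "numeric comparison" need could be sketched as follows. `parsePrice` and `comparePrices` are hypothetical helpers, and the parsing assumes US-style dollar amounts such as "$1,299.00":

```typescript
// Hypothetical helper: pull the first dollar amount out of a text fragment.
// Assumes US-style formatting; other currencies and locales need more care.
function parsePrice(text: string): number | null {
  // "$", optional space, digits with optional thousands commas, optional cents
  const match = text.match(/\$\s?(\d[\d,]*(?:\.\d{1,2})?)/)
  if (!match) return null
  return Number(match[1].replace(/,/g, ''))
}

type PriceChange = {
  previous: number
  current: number
  delta: number
  percent: number
}

// Returns null when either price is missing or the previous price is zero.
function comparePrices(previousText: string, currentText: string): PriceChange | null {
  const previous = parsePrice(previousText)
  const current = parsePrice(currentText)
  if (previous === null || current === null || previous === 0) return null
  const delta = current - previous
  return { previous, current, delta, percent: (delta / previous) * 100 }
}
```

Comparing parsed numbers rather than raw text lets an alert say "price dropped 20%" instead of only "text changed".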

---

## Integration Points

### Email Service (SendGrid/Postmark)
```typescript
async function sendChangeAlert(monitor: Monitor, snapshot: Snapshot) {
  // Monitor stores only userId, so load the user to get the email address
  const user = await db.users.findById(monitor.userId)
  const diffUrl = `https://app.example.com/monitors/${monitor.id}/diff/${snapshot.id}`

  await emailService.send({
    to: user.email,
    subject: `Change detected: ${monitor.name}`,
    template: 'change-alert',
    data: {
      monitorName: monitor.name,
      url: monitor.url,
      timestamp: snapshot.createdAt,
      diffUrl,
      changePercentage: snapshot.changePercentage
    }
  })
}
```

### Stripe Billing
```typescript
async function handleSubscription(userId: string, plan: string) {
  const user = await db.users.findById(userId)

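  // Create a new subscription (changing an existing one would go through
  // stripe.subscriptions.update instead)
  const subscription = await stripe.subscriptions.create({
    customer: user.stripeCustomerId,
    items: [{ price: PRICE_IDS[plan] }]
  })

  // Update user plan (assumes a subscriptionId field, which the User model
  // above does not yet list)
  await db.users.update(userId, {
    plan,
    subscriptionId: subscription.id
  })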
}
```

### Job Queue (Bull)
```typescript
// Schedule monitor checks
async function scheduleMonitor(monitor: Monitor) {
  await monitorQueue.add(
    'check-monitor',
    { monitorId: monitor.id },
    {
      repeat: {
        every: monitor.frequency * 60 * 1000 // convert to ms
      },
      jobId: `monitor-${monitor.id}`
    }
  )
}

// Process checks
monitorQueue.process('check-monitor', async (job) => {
  const { monitorId } = job.data
  await checkMonitor(monitorId)
})
```
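
The `consecutiveErrors` and `status` fields on `Monitor` suggest that a monitor moves to `error` after repeated failed checks. A minimal sketch of that transition as a pure function; the threshold of 5 and the name `nextHealth` are assumptions, not taken from the spec:

```typescript
// Subset of the Monitor model relevant to error tracking.
type MonitorHealth = {
  status: 'active' | 'paused' | 'error'
  consecutiveErrors: number
}

// Assumed threshold, not taken from the spec.
const MAX_CONSECUTIVE_ERRORS = 5

// Pure function: returns the next health state after one check attempt.
function nextHealth(current: MonitorHealth, checkSucceeded: boolean): MonitorHealth {
  // Paused monitors are not checked, so their state is left alone.
  if (current.status === 'paused') return current
  // Any success resets the counter and clears an error status.
  if (checkSucceeded) return { status: 'active', consecutiveErrors: 0 }
  const consecutiveErrors = current.consecutiveErrors + 1
  return {
    status: consecutiveErrors >= MAX_CONSECUTIVE_ERRORS ? 'error' : current.status,
    consecutiveErrors,
  }
}
```

Keeping the transition pure makes it easy to unit test; the job processor would persist the returned state after each `checkMonitor` call.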

---

## Testing Strategy

### Unit Tests
- Diff algorithms
- Noise filtering
- Keyword matching
- Ignore rules application

### Integration Tests
- API endpoints
- Database operations
- Job queue processing

### E2E Tests
- User registration & login
- Monitor creation & management
- Alert delivery
- Subscription changes

### Performance Tests
- Fetch speed with various page sizes
- Diff calculation speed
- Concurrent monitor checks
- Database query performance

---

## Deployment Checklist

Before deploying to production:

- [ ] Environment variables configured
- [ ] Database migrations run
- [ ] SSL certificates configured
- [ ] Email deliverability tested
- [ ] Payment processing tested (Stripe test mode → live mode)
- [ ] Error tracking configured (Sentry)
- [ ] Monitoring & alerts set up (uptime, error rate, queue health)
- [ ] Backup strategy implemented
- [ ] Rate limiting configured
- [ ] GDPR compliance (privacy policy, data export/deletion)
- [ ] Security headers configured
- [ ] API documentation updated

---

## Troubleshooting Common Issues

### "Monitor keeps triggering false alerts"
- Check if noise filtering is working
- Review ignore rules for the monitor
- Look at diff to identify changing element
- Add custom ignore rule for that element

### "Some pages aren't being monitored correctly"
- Check if page requires JavaScript rendering
- Try enabling headless browser mode
- Check if page requires authentication
- Look for CAPTCHA or bot detection

### "Alerts aren't being delivered"
- Check email service status
- Verify email isn't going to spam
- Check alert queue for errors
- Verify user's alert settings

### "System is slow/overloaded"
- Check Redis queue health
- Look for monitors with very high frequency
- Check database query performance
- Consider scaling workers horizontally

---

## Metrics to Track

### Technical Metrics
- Average check duration
- Diff calculation time
- Check success rate
- Alert delivery rate
- Queue processing lag

### Product Metrics
- Active monitors per user
- Alerts sent per day
- False positive rate (from user feedback)
- Feature adoption (keywords, elements, integrations)

### Business Metrics
- Free → Paid conversion rate
- Monthly churn rate
- Average revenue per user (ARPU)
- Customer acquisition cost (CAC)
- Lifetime value (LTV)

---

## Resources & Documentation

### External Documentation
- [Next.js Docs](https://nextjs.org/docs)
- [Tailwind CSS](https://tailwindcss.com/docs)
- [Playwright Docs](https://playwright.dev)
- [Bull Queue](https://github.com/OptimalBits/bull)
- [Stripe API](https://stripe.com/docs/api)

### Internal Documentation
- See `spec.md` for complete feature specifications
- See `task.md` for development roadmap
- See `actions.md` for user workflows and use cases

---

## Future Considerations

### Potential Enhancements
- Mobile app (React Native or Progressive Web App)
- Browser extension for quick monitor addition
- AI-powered change importance scoring
- Collaborative features (team annotations, approval workflows)
- Marketplace for monitor templates
- Affiliate program for power users

### Scaling Considerations
- Distributed workers across multiple regions
- Caching layer for frequently accessed pages
- Database sharding by user
- Separate queue for high-frequency monitors
- CDN for snapshot storage

---

## Notes for Claude

When working on this project:

1. **Always reference these docs**: spec.md, task.md, actions.md, and this file
2. **MVP mindset**: Implement the simplest solution that works first
3. **User-centric**: Consider the user workflows in actions.md when building features
4. **Security-conscious**: Validate URLs, sanitize inputs, encrypt sensitive data
5. **Performance-aware**: Optimize for speed, especially diff calculation
6. **Ask clarifying questions**: If requirements are ambiguous, ask before implementing
7. **Test as you go**: Write tests for core functionality
8. **Document decisions**: Update these docs when making architectural decisions

### Common Questions & Answers

**Q: Should we support authenticated pages in MVP?**
A: No, save for V2. Focus on public pages first.

**Q: What diff library should we use?**
A: `diff` (the npm package, also known as jsdiff) for JavaScript, `difflib` for Python.

**Q: How do we handle CAPTCHA?**
A: For MVP, just alert the user. For V2, consider residential proxies or browser fingerprinting.

**Q: Should we store full HTML or just text?**
A: Store both: full HTML for accuracy, extracted text for diffing performance.

**Q: What's the minimum viable frequency?**
A: 5 minutes for paid users (1 minute on the Business tier, per the pricing tiers below), 1 hour for the free tier.

---

## Quick Reference

### Key Files
- `spec.md` - Feature specifications
- `task.md` - Development tasks and roadmap
- `actions.md` - User workflows and use cases
- `claude.md` - This file (project context)

### Key Concepts
- **Noise reduction** - Core differentiator
- **Keyword alerts** - High-value feature
- **Element selection** - Monitor specific parts
- **Change severity** - Classify importance

### Pricing Tiers
- **Free**: 5 monitors, 1hr frequency
- **Pro**: 50 monitors, 5min frequency, $19-29/mo
- **Business**: 200 monitors, 1min frequency, teams, $99-149/mo
- **Enterprise**: Unlimited, custom pricing
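
The tier limits above could be enforced with a small lookup table. In this sketch the numbers are copied from the list, `Infinity` stands in for "Unlimited", and the enterprise minimum frequency is an assumption because that tier is custom. `PLAN_LIMITS` and `checkMonitorAllowed` are hypothetical names:

```typescript
type Plan = 'free' | 'pro' | 'business' | 'enterprise'

type PlanLimits = { maxMonitors: number; minFrequencyMinutes: number }

// Values copied from the pricing tiers; the enterprise row is an assumption.
const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  free: { maxMonitors: 5, minFrequencyMinutes: 60 },
  pro: { maxMonitors: 50, minFrequencyMinutes: 5 },
  business: { maxMonitors: 200, minFrequencyMinutes: 1 },
  enterprise: { maxMonitors: Infinity, minFrequencyMinutes: 1 },
}

// Returns an error code when the request exceeds the plan, or null when allowed.
function checkMonitorAllowed(
  plan: Plan,
  existingMonitors: number,
  frequencyMinutes: number
): string | null {
  const limits = PLAN_LIMITS[plan]
  if (existingMonitors >= limits.maxMonitors) return 'monitor_limit_reached'
  if (frequencyMinutes < limits.minFrequencyMinutes) return 'frequency_too_high'
  return null
}
```

Returning a code rather than throwing lets the API layer map the result onto the consistent error format described under API Design Principles.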

---

*Last updated: 2026-01-16*