# Website Change Detection Monitor - Claude Context
## Project Overview
This is a **Website Change Detection Monitor SaaS** application. The core value proposition is helping users track changes on web pages they care about, with intelligent noise filtering to ensure only meaningful changes trigger alerts.
**Tagline**: "I watch pages so you don't have to"
---
## Key Differentiators
1. **Smart Noise Filtering**: Unlike competitors, we automatically filter out cookie banners, timestamps, rotating ads, and other irrelevant changes
2. **Keyword-Based Alerts**: Users can be notified when specific words/phrases appear or disappear (e.g., "sold out", "hiring", "$99")
3. **Simple but Powerful**: Easy enough for non-technical users, powerful enough for professionals
4. **SEO-Optimized Market**: Tons of long-tail keywords (e.g., "monitor job postings", "track competitor prices")
---
## Architecture Overview
### Tech Stack (Recommended)
**Frontend**:
- Next.js 14+ (App Router)
- TypeScript
- Tailwind CSS + shadcn/ui components
- React Query (TanStack Query) for server-state fetching and caching
- Zod for validation
**Backend**:
- Node.js + Express OR Python + FastAPI
- PostgreSQL for relational data
- Redis + Bull/BullMQ for job queuing
- Puppeteer/Playwright for JS-heavy sites
**Infrastructure**:
- Vercel/Railway for frontend hosting
- Render/Railway/AWS for backend
- AWS S3 or Cloudflare R2 for snapshot storage
- Upstash Redis or managed Redis
**Third-Party Services**:
- Stripe for billing
- SendGrid/Postmark for emails
- Sentry for error tracking
- PostHog/Mixpanel for analytics
---
## Project Structure
```
/website-monitor
├── /frontend (Next.js)
│ ├── /app
│ │ ├── /dashboard
│ │ ├── /monitors
│ │ ├── /settings
│ │ └── /auth
│ ├── /components
│ │ ├── /ui (shadcn components)
│ │ ├── /monitors
│ │ └── /diff-viewer
│ ├── /lib
│ │ ├── api-client.ts
│ │ ├── auth.ts
│ │ └── utils.ts
│ └── /public
├── /backend
│ ├── /src
│ │ ├── /routes
│ │ ├── /controllers
│ │ ├── /models
│ │ ├── /services
│ │ │ ├── fetcher.ts
│ │ │ ├── differ.ts
│ │ │ ├── scheduler.ts
│ │ │ └── alerter.ts
│ │ ├── /jobs
│ │ └── /utils
│ ├── /db
│ │ └── /migrations
│ └── /tests
├── /docs
│ ├── spec.md
│ ├── task.md
│ ├── actions.md
│ └── claude.md (this file)
└── README.md
```
---
## Core Entities & Data Models
### User
```typescript
interface User {
  id: string
  email: string
  passwordHash: string
  plan: 'free' | 'pro' | 'business' | 'enterprise'
  stripeCustomerId: string
  createdAt: Date
  lastLoginAt: Date
}
```
### Monitor
```typescript
interface Monitor {
  id: string
  userId: string
  url: string
  name: string
  frequency: number // minutes
  status: 'active' | 'paused' | 'error'
  // Advanced features
  elementSelector?: string
  ignoreRules?: {
    type: 'css' | 'regex' | 'text'
    value: string
  }[]
  keywordRules?: {
    keyword: string
    type: 'appears' | 'disappears' | 'count'
    threshold?: number
  }[]
  // Metadata
  lastCheckedAt?: Date
  lastChangedAt?: Date
  consecutiveErrors: number
  createdAt: Date
}
```
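The `status` and `consecutiveErrors` fields suggest a small state machine around each check. The sketch below is one possible reading, not a confirmed design: `MonitorHealth`, `recordCheckResult`, and the threshold `MAX_CONSECUTIVE_ERRORS = 5` are hypothetical names and values chosen for illustration.

```typescript
type MonitorStatus = 'active' | 'paused' | 'error'

// Subset of the Monitor model that matters for health tracking.
interface MonitorHealth {
  status: MonitorStatus
  consecutiveErrors: number
}

// Hypothetical threshold: flag the monitor after this many failed checks in a row.
const MAX_CONSECUTIVE_ERRORS = 5

// Returns the updated health state after one check; does not mutate the input.
function recordCheckResult(monitor: MonitorHealth, ok: boolean): MonitorHealth {
  // Paused monitors are left untouched (assumed: they are not checked at all).
  if (monitor.status === 'paused') return monitor
  // Any successful check clears the error streak and reactivates the monitor.
  if (ok) return { status: 'active', consecutiveErrors: 0 }
  const consecutiveErrors = monitor.consecutiveErrors + 1
  const status: MonitorStatus =
    consecutiveErrors >= MAX_CONSECUTIVE_ERRORS ? 'error' : monitor.status
  return { status, consecutiveErrors }
}
```

Keeping the transition in a pure function makes it straightforward to unit test without a database.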
### Snapshot
```typescript
interface Snapshot {
  id: string
  monitorId: string
  htmlContent: string
  contentHash: string
  screenshotUrl?: string
  // Status
  httpStatus: number
  responseTime: number
  changed: boolean
  changePercentage?: number
  // Errors
  errorMessage?: string
  // Metadata
  createdAt: Date
}
```
### Alert
```typescript
interface Alert {
  id: string
  monitorId: string
  snapshotId: string
  userId: string
  // Alert details
  type: 'change' | 'error' | 'keyword'
  title: string
  summary?: string
  // Delivery
  channels: ('email' | 'slack' | 'webhook')[]
  deliveredAt?: Date
  readAt?: Date
  createdAt: Date
}
```
---
## Key Algorithms & Logic
### Change Detection
```typescript
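import { diffLines } from 'diff'

// Simple hash comparison for binary change detection
const changed = previousHash !== currentHash
// Text diff for detailed comparison
const diff = diffLines(previousText, currentText)
// Sum line counts over diff parts; counting unchanged + added + removed lines
// as the total is one possible definition that keeps the ratio within 0-100
const countLines = (parts: typeof diff) =>
  parts.reduce((sum, part) => sum + (part.count ?? 0), 0)
const changedLines = countLines(diff.filter(part => part.added || part.removed))
const totalLines = countLines(diff)
const changePercentage = totalLines === 0 ? 0 : (changedLines / totalLines) * 100
// Severity calculation
const severity =
  changePercentage > 50 ? 'major' :
  changePercentage > 10 ? 'medium' : 'minor'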
```
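The hash comparison above needs a stable `contentHash`. Below is a minimal sketch of how one might compute it with Node's built-in `crypto` module. The whitespace normalization step and the helper names (`normalizeText`, `computeContentHash`, `classifySeverity`) are assumptions for illustration, not part of the spec.

```typescript
import { createHash } from 'node:crypto'

type Severity = 'minor' | 'medium' | 'major'

// Collapse runs of whitespace so formatting-only differences do not change the hash.
// (Assumed normalization; adjust or drop it if whitespace changes should count.)
function normalizeText(text: string): string {
  return text.replace(/\s+/g, ' ').trim()
}

// SHA-256 hex digest of the normalized text, usable as Snapshot.contentHash.
function computeContentHash(text: string): string {
  return createHash('sha256').update(normalizeText(text), 'utf8').digest('hex')
}

// Same thresholds as the severity calculation above, wrapped in a function.
function classifySeverity(changePercentage: number): Severity {
  if (changePercentage > 50) return 'major'
  if (changePercentage > 10) return 'medium'
  return 'minor'
}
```

Hashing the filtered text rather than the raw HTML would keep noise from flipping the `changed` flag, though the order of those steps is a design choice left to the implementation.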
### Noise Filtering
```typescript
import * as cheerio from 'cheerio'

// Remove common noise patterns
function filterNoise(html: string): string {
  // Remove ISO 8601 timestamps (e.g. 2026-01-16T09:30:00)
  html = html.replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/g, '')
  // Remove cookie banners (common selectors)
  const noisySelectors = [
    '.cookie-banner',
    '#cookie-notice',
    '[class*="consent"]',
    // ... more patterns
  ]
  // Parse and remove elements
  const $ = cheerio.load(html)
  noisySelectors.forEach(sel => $(sel).remove())
  return $.html()
}
```
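Per-monitor `ignoreRules` can be layered on top of the built-in noise filter. The following is a dependency-free sketch for the `regex` and `text` rule types, applied to already-extracted text. The function name `applyIgnoreRules` is hypothetical, and it assumes `css` rules are handled earlier, where a parsed DOM is available.

```typescript
// Mirrors the ignoreRules shape on the Monitor model.
interface IgnoreRule {
  type: 'css' | 'regex' | 'text'
  value: string
}

// Strip user-defined noise from extracted page text.
// 'css' rules are skipped here because they need a DOM, not a string.
function applyIgnoreRules(text: string, rules: IgnoreRule[]): string {
  let result = text
  for (const rule of rules) {
    if (rule.type === 'text') {
      // Literal match: remove every occurrence of the exact string.
      result = result.split(rule.value).join('')
    } else if (rule.type === 'regex') {
      try {
        result = result.replace(new RegExp(rule.value, 'g'), '')
      } catch {
        // A pattern that does not compile is skipped rather than failing the check.
      }
    }
  }
  return result
}
```

User-supplied regular expressions can be slow on adversarial input, so a production version would likely validate or time-box them.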
### Keyword Detection
```typescript
function checkKeywords(
  previousText: string,
  currentText: string,
  rules: KeywordRule[]
): KeywordMatch[] {
  // Typed explicitly so strict mode does not infer any[]
  const matches: KeywordMatch[] = []
  for (const rule of rules) {
    // Note: includes() is case-sensitive
    const prevMatch = previousText.includes(rule.keyword)
    const currMatch = currentText.includes(rule.keyword)
    if (rule.type === 'appears' && !prevMatch && currMatch) {
      matches.push({ rule, type: 'appeared' })
    }
    if (rule.type === 'disappears' && prevMatch && !currMatch) {
      matches.push({ rule, type: 'disappeared' })
    }
    // Count logic...
  }
  return matches
}
```
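The `count` rule type is left open above. One possible interpretation, sketched here purely as an assumption, is to fire when the number of keyword occurrences changes by at least `threshold` between snapshots. The helper names `countOccurrences` and `countRuleTriggered` are hypothetical.

```typescript
// Count non-overlapping, case-sensitive occurrences of a keyword.
function countOccurrences(text: string, keyword: string): number {
  if (keyword === '') return 0
  return text.split(keyword).length - 1
}

// Assumed semantics for a 'count' rule: trigger when the occurrence count
// moves up or down by at least `threshold` (defaulting to 1).
function countRuleTriggered(
  previousText: string,
  currentText: string,
  keyword: string,
  threshold: number = 1
): boolean {
  const previous = countOccurrences(previousText, keyword)
  const current = countOccurrences(currentText, keyword)
  return Math.abs(current - previous) >= threshold
}
```

Other readings are equally plausible, for example triggering only when the absolute count crosses the threshold, so the intended behavior should be confirmed against `spec.md`.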
---
## Development Guidelines
### When Working on This Project
1. **Prioritize MVP**: Focus on core features before adding complexity
2. **Performance matters**: Diffing and fetching should be fast (<2s)
3. **Noise reduction is key**: This is our competitive advantage
4. **User feedback loop**: Build in ways to learn from false positives
5. **Security first**: Never store credentials in plain text, sanitize all URLs
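For the URL sanitizing mentioned in the guidelines above, a minimal sketch using the standard `URL` class might look like the following. The function name `validateMonitorUrl` and the exact rejection rules are assumptions. Hostname checks alone do not stop DNS-based tricks, so the fetcher would still need to verify the resolved IP address.

```typescript
// Returns a normalized URL string, or null if the input should be rejected.
function validateMonitorUrl(input: string): string | null {
  let url: URL
  try {
    url = new URL(input.trim())
  } catch {
    return null // not parseable as an absolute URL
  }
  // Only plain web URLs are monitored.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return null
  // Reject embedded credentials such as http://user:pass@host/.
  if (url.username !== '' || url.password !== '') return null

  const host = url.hostname.toLowerCase()
  if (host === 'localhost' || host.endsWith('.local') || host === '[::1]') return null

  // Reject loopback, private, and link-local IPv4 literals.
  const ipv4 = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(host)
  if (ipv4) {
    const a = Number(ipv4[1])
    const b = Number(ipv4[2])
    const isInternal =
      a === 0 || a === 10 || a === 127 ||
      (a === 192 && b === 168) ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 169 && b === 254)
    if (isInternal) return null
  }

  url.hash = '' // fragments are never sent to the server
  return url.toString()
}
```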
### Code Style
- Use TypeScript strict mode
- Write unit tests for core algorithms (differ, filter, keyword)
- Use async/await, avoid callbacks
- Prefer functional programming patterns
- Comment complex logic, especially regex patterns
### API Design Principles
- RESTful endpoints
- Use proper HTTP status codes
- Return consistent error format:
```json
{
  "error": "monitor_not_found",
  "message": "Monitor with id 123 not found",
  "details": {}
}
```
- Paginate list endpoints (monitors, snapshots, alerts)
- Version API if breaking changes needed (/v1/monitors)
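The error format and pagination principles above can be captured in a couple of small helpers. This is an illustrative sketch only: `apiError`, `Page`, and `paginate` are hypothetical names, the page-size cap of 100 is an assumption, and a real endpoint would paginate in the database query rather than in memory.

```typescript
// Shape of the consistent error body shown above.
interface ApiError {
  error: string
  message: string
  details: Record<string, unknown>
}

function apiError(
  error: string,
  message: string,
  details: Record<string, unknown> = {}
): ApiError {
  return { error, message, details }
}

// Envelope for list endpoints (monitors, snapshots, alerts).
interface Page<T> {
  items: T[]
  page: number
  pageSize: number
  total: number
  totalPages: number
}

// In-memory pagination with 1-based pages; out-of-range inputs are clamped.
function paginate<T>(all: T[], page: number = 1, pageSize: number = 20): Page<T> {
  const size = Math.min(Math.max(1, Math.floor(pageSize)), 100) // assumed cap
  const totalPages = Math.max(1, Math.ceil(all.length / size))
  const current = Math.min(Math.max(1, Math.floor(page)), totalPages)
  const start = (current - 1) * size
  return {
    items: all.slice(start, start + size),
    page: current,
    pageSize: size,
    total: all.length,
    totalPages,
  }
}
```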
---
## Common Tasks & Commands
### When Starting Development
```bash
# Clone and setup
git clone <repo>
cd website-monitor
# Install dependencies (subshells keep the working directory at the repo root)
(cd frontend && npm install)
(cd backend && npm install)
# Setup environment
cp .env.example .env
# Edit .env with your values
# Start database
docker-compose up -d postgres redis
# Run migrations
(cd backend && npm run migrate)
# Start dev servers (each command blocks, so run them in separate terminals)
(cd frontend && npm run dev)
(cd backend && npm run dev)
```
### Running Tests
```bash
# Frontend tests
(cd frontend && npm test)
# Backend tests
(cd backend && npm test)
# E2E tests
npm run test:e2e
```
### Deployment
```bash
# Build frontend
cd frontend && npm run build
# Deploy frontend (Vercel)
vercel deploy --prod
# Deploy backend (tag the image with the registry so the push can find it)
cd ../backend
docker build -t <registry>/monitor-api .
docker push <registry>/monitor-api
# Deploy to Railway/Render/AWS
```
---
## Key User Flows to Support
When building features, always consider these primary use cases:
1. **Job seeker monitoring career pages** (most common)
- Needs: Fast frequency (5 min), keyword alerts, instant notifications
2. **Price tracking for e-commerce** (high value)
- Needs: Element selection, numeric comparison, reliable alerts
3. **Competitor monitoring** (B2B focus)
- Needs: Multiple monitors, digest mode, AI summaries
4. **Stock/availability tracking** (urgent)
- Needs: Fastest frequency (1 min), SMS alerts, auto-pause
5. **Policy/regulation monitoring** (professional)
- Needs: Long-term history, team sharing, AI summaries
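The numeric comparison needed for price tracking could start from something like the sketch below. It is a rough illustration under stated assumptions: prices use a leading `$`, `€`, or `£` symbol with US-style separators, and the helper names `parsePrice` and `comparePrices` are hypothetical.

```typescript
// Extract the first price from a text fragment, or null if none is found.
// Assumes a leading currency symbol and US-style formatting (1,299.50).
function parsePrice(text: string): number | null {
  const match = /[$€£]\s?(\d[\d,]*(?:\.\d{1,2})?)/.exec(text)
  if (!match) return null
  const value = Number.parseFloat(match[1].replace(/,/g, ''))
  return Number.isNaN(value) ? null : value
}

type PriceTrend = 'dropped' | 'rose' | 'unchanged' | 'unknown'

// Compare the first price found in two snapshots of the selected element.
function comparePrices(previousText: string, currentText: string): PriceTrend {
  const previous = parsePrice(previousText)
  const current = parsePrice(currentText)
  if (previous === null || current === null) return 'unknown'
  if (current < previous) return 'dropped'
  if (current > previous) return 'rose'
  return 'unchanged'
}
```

Combined with `elementSelector`, this would let an alert say that a price dropped rather than only that the text changed.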
---
## Integration Points
### Email Service (SendGrid/Postmark)
```typescript
async function sendChangeAlert(monitor: Monitor, snapshot: Snapshot) {
  // Monitor only stores userId, so load the owner to get the email address
  const user = await db.users.findById(monitor.userId)
  const diffUrl = `https://app.example.com/monitors/${monitor.id}/diff/${snapshot.id}`
  await emailService.send({
    to: user.email,
    subject: `Change detected: ${monitor.name}`,
    template: 'change-alert',
    data: {
      monitorName: monitor.name,
      url: monitor.url,
      timestamp: snapshot.createdAt,
      diffUrl,
      changePercentage: snapshot.changePercentage
    }
  })
}
```
### Stripe Billing
```typescript
async function handleSubscription(userId: string, plan: string) {
  const user = await db.users.findById(userId)
  // Create a new subscription; changing an existing one would go through
  // stripe.subscriptions.update instead
  const subscription = await stripe.subscriptions.create({
    customer: user.stripeCustomerId,
    items: [{ price: PRICE_IDS[plan] }]
  })
  // Update user plan (subscriptionId is not yet part of the User model above)
  await db.users.update(userId, {
    plan,
    subscriptionId: subscription.id
  })
}
```
### Job Queue (Bull)
```typescript
// Schedule monitor checks
async function scheduleMonitor(monitor: Monitor) {
  await monitorQueue.add(
    'check-monitor',
    { monitorId: monitor.id },
    {
      repeat: {
        every: monitor.frequency * 60 * 1000 // convert to ms
      },
      jobId: `monitor-${monitor.id}`
    }
  )
}
// Process checks
monitorQueue.process('check-monitor', async (job) => {
  const { monitorId } = job.data
  await checkMonitor(monitorId)
})
```
---
## Testing Strategy
### Unit Tests
- Diff algorithms
- Noise filtering
- Keyword matching
- Ignore rules application
### Integration Tests
- API endpoints
- Database operations
- Job queue processing
### E2E Tests
- User registration & login
- Monitor creation & management
- Alert delivery
- Subscription changes
### Performance Tests
- Fetch speed with various page sizes
- Diff calculation speed
- Concurrent monitor checks
- Database query performance
---
## Deployment Checklist
Before deploying to production:
- [ ] Environment variables configured
- [ ] Database migrations run
- [ ] SSL certificates configured
- [ ] Email deliverability tested
- [ ] Payment processing tested (Stripe test mode → live mode)
- [ ] Error tracking configured (Sentry)
- [ ] Monitoring & alerts set up (uptime, error rate, queue health)
- [ ] Backup strategy implemented
- [ ] Rate limiting configured
- [ ] GDPR compliance (privacy policy, data export/deletion)
- [ ] Security headers configured
- [ ] API documentation updated
---
## Troubleshooting Common Issues
### "Monitor keeps triggering false alerts"
- Check if noise filtering is working
- Review ignore rules for the monitor
- Look at diff to identify changing element
- Add custom ignore rule for that element
### "Some pages aren't being monitored correctly"
- Check if page requires JavaScript rendering
- Try enabling headless browser mode
- Check if page requires authentication
- Look for CAPTCHA or bot detection
### "Alerts aren't being delivered"
- Check email service status
- Verify email isn't going to spam
- Check alert queue for errors
- Verify user's alert settings
### "System is slow/overloaded"
- Check Redis queue health
- Look for monitors with very high frequency
- Check database query performance
- Consider scaling workers horizontally
---
## Metrics to Track
### Technical Metrics
- Average check duration
- Diff calculation time
- Check success rate
- Alert delivery rate
- Queue processing lag
### Product Metrics
- Active monitors per user
- Alerts sent per day
- False positive rate (from user feedback)
- Feature adoption (keywords, elements, integrations)
### Business Metrics
- Free → Paid conversion rate
- Monthly churn rate
- Average revenue per user (ARPU)
- Customer acquisition cost (CAC)
- Lifetime value (LTV)
---
## Resources & Documentation
### External Documentation
- [Next.js Docs](https://nextjs.org/docs)
- [Tailwind CSS](https://tailwindcss.com/docs)
- [Playwright Docs](https://playwright.dev)
- [Bull Queue](https://github.com/OptimalBits/bull)
- [Stripe API](https://stripe.com/docs/api)
### Internal Documentation
- See `spec.md` for complete feature specifications
- See `task.md` for development roadmap
- See `actions.md` for user workflows and use cases
---
## Future Considerations
### Potential Enhancements
- Mobile app (React Native or Progressive Web App)
- Browser extension for quick monitor addition
- AI-powered change importance scoring
- Collaborative features (team annotations, approval workflows)
- Marketplace for monitor templates
- Affiliate program for power users
### Scaling Considerations
- Distributed workers across multiple regions
- Caching layer for frequently accessed pages
- Database sharding by user
- Separate queue for high-frequency monitors
- CDN for snapshot storage
---
## Notes for Claude
When working on this project:
1. **Always reference these docs**: spec.md, task.md, actions.md, and this file
2. **MVP mindset**: Implement the simplest solution that works first
3. **User-centric**: Consider the user workflows in actions.md when building features
4. **Security-conscious**: Validate URLs, sanitize inputs, encrypt sensitive data
5. **Performance-aware**: Optimize for speed, especially diff calculation
6. **Ask clarifying questions**: If requirements are ambiguous, ask before implementing
7. **Test as you go**: Write tests for core functionality
8. **Document decisions**: Update these docs when making architectural decisions
### Common Questions & Answers
**Q: Should we support authenticated pages in MVP?**
A: No, save for V2. Focus on public pages first.
**Q: What diff library should we use?**
A: `diff` on npm (the jsdiff project) for JavaScript, `difflib` for Python.
**Q: How do we handle CAPTCHA?**
A: For MVP, just alert the user. For V2, consider residential proxies or browser fingerprinting.
**Q: Should we store full HTML or just text?**
A: Store both: full HTML for accuracy, extracted text for diffing performance.
**Q: What's the minimum viable frequency?**
A: 1 hour for the free tier and 5 minutes for Pro; Business goes down to 1 minute (see Pricing Tiers).
---
## Quick Reference
### Key Files
- `spec.md` - Feature specifications
- `task.md` - Development tasks and roadmap
- `actions.md` - User workflows and use cases
- `claude.md` - This file (project context)
### Key Concepts
- **Noise reduction** - Core differentiator
- **Keyword alerts** - High-value feature
- **Element selection** - Monitor specific parts
- **Change severity** - Classify importance
### Pricing Tiers
- **Free**: 5 monitors, 1hr frequency
- **Pro**: 50 monitors, 5min frequency, $19-29/mo
- **Business**: 200 monitors, 1min frequency, teams, $99-149/mo
- **Enterprise**: Unlimited, custom pricing
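
These tier limits could be enforced from a single lookup table. The sketch below encodes the numbers listed above. The names `PLAN_LIMITS`, `clampFrequency`, and `canCreateMonitor` are hypothetical, and the 1-minute floor for Enterprise is an assumption, since that tier is only described as unlimited with custom pricing.

```typescript
type Plan = 'free' | 'pro' | 'business' | 'enterprise'

interface PlanLimits {
  maxMonitors: number
  minFrequencyMinutes: number
}

const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  free: { maxMonitors: 5, minFrequencyMinutes: 60 },
  pro: { maxMonitors: 50, minFrequencyMinutes: 5 },
  business: { maxMonitors: 200, minFrequencyMinutes: 1 },
  // Assumed values: "Unlimited" monitors, same frequency floor as Business.
  enterprise: { maxMonitors: Infinity, minFrequencyMinutes: 1 },
}

// Raise a requested check interval to the fastest one the plan allows.
function clampFrequency(plan: Plan, requestedMinutes: number): number {
  return Math.max(PLAN_LIMITS[plan].minFrequencyMinutes, Math.round(requestedMinutes))
}

// True if the user may add one more monitor under their current plan.
function canCreateMonitor(plan: Plan, currentMonitorCount: number): boolean {
  return currentMonitorCount < PLAN_LIMITS[plan].maxMonitors
}
```

Applying `clampFrequency` before a monitor is scheduled would keep `Monitor.frequency` consistent with the user's plan even if the client sends a lower value.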
---
*Last updated: 2026-01-16*