# Website Change Detection Monitor - Claude Context

## Project Overview

This is a **Website Change Detection Monitor SaaS** application. The core value proposition is helping users track changes on web pages they care about, with intelligent noise filtering to ensure only meaningful changes trigger alerts.

**Tagline**: "I watch pages so you don't have to"

---

## Key Differentiators

1. **Smart Noise Filtering**: Unlike competitors, we automatically filter out cookie banners, timestamps, rotating ads, and other irrelevant changes
2. **Keyword-Based Alerts**: Users can be notified when specific words/phrases appear or disappear (e.g., "sold out", "hiring", "$99")
3. **Simple but Powerful**: Easy enough for non-technical users, powerful enough for professionals
4. **SEO-Friendly Market**: Many long-tail search keywords to target (e.g., "monitor job postings", "track competitor prices")

---

## Architecture Overview

### Tech Stack (Recommended)

**Frontend**:
- Next.js 14+ (App Router)
- TypeScript
- Tailwind CSS + shadcn/ui components
- React Query (TanStack Query) for server state and data fetching
- Zod for validation

**Backend**:
- Node.js + Express OR Python + FastAPI
- PostgreSQL for relational data
- Redis + Bull/BullMQ for job queuing
- Puppeteer/Playwright for JS-heavy sites

**Infrastructure**:
- Vercel/Railway for frontend hosting
- Render/Railway/AWS for backend
- AWS S3 or Cloudflare R2 for snapshot storage
- Upstash Redis or managed Redis

**Third-Party Services**:
- Stripe for billing
- SendGrid/Postmark for emails
- Sentry for error tracking
- PostHog/Mixpanel for analytics

---

## Project Structure

```
/website-monitor
├── /frontend (Next.js)
│   ├── /app
│   │   ├── /dashboard
│   │   ├── /monitors
│   │   ├── /settings
│   │   └── /auth
│   ├── /components
│   │   ├── /ui (shadcn components)
│   │   ├── /monitors
│   │   └── /diff-viewer
│   ├── /lib
│   │   ├── api-client.ts
│   │   ├── auth.ts
│   │   └── utils.ts
│   └── /public
├── /backend
│   ├── /src
│   │   ├── /routes
│   │   ├── /controllers
│   │   ├── /models
│   │   ├── /services
│   │   │   ├── fetcher.ts
│   │   │   ├── differ.ts
│   │   │   ├── scheduler.ts
│   │   │   └── alerter.ts
│   │   ├── /jobs
│   │   └── /utils
│   ├── /db
│   │   └── /migrations
│   └── /tests
├── /docs
│   ├── spec.md
│   ├── task.md
│   ├── actions.md
│   └── claude.md (this file)
└── README.md
```

---

## Core Entities & Data Models

### User
```typescript
interface User {
  id: string
  email: string
  passwordHash: string
  plan: 'free' | 'pro' | 'business' | 'enterprise'
  stripeCustomerId: string
  createdAt: Date
  lastLoginAt: Date
}
```

### Monitor
```typescript
interface Monitor {
  id: string
  userId: string
  url: string
  name: string
  frequency: number // minutes
  status: 'active' | 'paused' | 'error'

  // Advanced features
  elementSelector?: string
  ignoreRules?: {
    type: 'css' | 'regex' | 'text'
    value: string
  }[]
  keywordRules?: {
    keyword: string
    type: 'appears' | 'disappears' | 'count'
    threshold?: number
  }[]

  // Metadata
  lastCheckedAt?: Date
  lastChangedAt?: Date
  consecutiveErrors: number
  createdAt: Date
}
```

### Snapshot
```typescript
interface Snapshot {
  id: string
  monitorId: string
  htmlContent: string
  contentHash: string
  screenshotUrl?: string

  // Status
  httpStatus: number
  responseTime: number
  changed: boolean
  changePercentage?: number

  // Errors
  errorMessage?: string

  // Metadata
  createdAt: Date
}
```

### Alert
```typescript
interface Alert {
  id: string
  monitorId: string
  snapshotId: string
  userId: string

  // Alert details
  type: 'change' | 'error' | 'keyword'
  title: string
  summary?: string

  // Delivery
  channels: ('email' | 'slack' | 'webhook')[]
  deliveredAt?: Date
  readAt?: Date

  createdAt: Date
}
```

---

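## Key Algorithms & Logic

### Change Detection
```typescript
import { diffLines } from 'diff'

// Simple hash comparison for binary change detection
const changed = previousHash !== currentHash

// Text diff for detailed comparison
const diff = diffLines(previousText, currentText)
// Sum the line counts of added and removed hunks (an edited line counts in both)
const changedLines = diff
  .filter(part => part.added || part.removed)
  .reduce((sum, part) => sum + (part.count ?? 0), 0)
const totalLines = previousText.split('\n').length + currentText.split('\n').length
const changePercentage = (changedLines / totalLines) * 100

// Severity calculation
const severity =
  changePercentage > 50 ? 'major' :
  changePercentage > 10 ? 'medium' : 'minor'
```
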
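The percentage and severity steps above can be tried without any dependencies. The following is a minimal sketch, not the project's differ: it swaps `diffLines` for a plain per-line occurrence count, so reordered lines are treated as unchanged. The helper names `countLines`, `missingLines`, `changePercentage`, and `classifySeverity` are assumptions made for illustration; only the 50 and 10 thresholds come from the snippet above.

```typescript
type Severity = 'minor' | 'medium' | 'major'

// Count how often each line occurs in a text.
function countLines(text: string): Map<string, number> {
  const counts = new Map<string, number>()
  for (const line of text.split('\n')) {
    counts.set(line, (counts.get(line) ?? 0) + 1)
  }
  return counts
}

// Number of lines present in `from` that are missing (or less frequent) in `to`.
function missingLines(from: Map<string, number>, to: Map<string, number>): number {
  let missing = 0
  from.forEach((count, line) => {
    missing += Math.max(0, count - (to.get(line) ?? 0))
  })
  return missing
}

// Removed plus added lines, relative to the combined length of both versions.
function changePercentage(previousText: string, currentText: string): number {
  const prev = countLines(previousText)
  const curr = countLines(currentText)
  const changedLines = missingLines(prev, curr) + missingLines(curr, prev)
  const totalLines = previousText.split('\n').length + currentText.split('\n').length
  return (changedLines / totalLines) * 100
}

// Same thresholds as the severity calculation above.
function classifySeverity(percentage: number): Severity {
  if (percentage > 50) return 'major'
  if (percentage > 10) return 'medium'
  return 'minor'
}
```

For example, replacing one line of a four-line page counts one removed and one added line out of eight, which this sketch reports as 25% and classifies as `medium`.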
### Noise Filtering
```typescript
import * as cheerio from 'cheerio'

// Remove common noise patterns
function filterNoise(html: string): string {
  // Remove ISO 8601 timestamps such as 2026-01-16T09:30:00
  html = html.replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/g, '')

  // Remove cookie banners (common selectors)
  const noisySelectors = [
    '.cookie-banner',
    '#cookie-notice',
    '[class*="consent"]',
    // ... more patterns
  ]

  // Parse and remove elements
  const $ = cheerio.load(html)
  noisySelectors.forEach(sel => $(sel).remove())

  return $.html()
}
```

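Per-monitor `ignoreRules` from the Monitor model could be applied in the same spirit. The sketch below is one possible approach, not the project's implementation: it handles only the `regex` and `text` rule types on already-extracted text and leaves `css` rules to the DOM-based step above. The function name `applyTextIgnoreRules` is hypothetical.

```typescript
// Shape taken from the Monitor model's ignoreRules field.
type IgnoreRule = { type: 'css' | 'regex' | 'text'; value: string }

// Strip every match of each 'regex' or 'text' rule from the text, in rule order.
function applyTextIgnoreRules(text: string, rules: IgnoreRule[]): string {
  return rules.reduce((acc, rule) => {
    switch (rule.type) {
      case 'text':
        // split/join removes all literal occurrences without regex escaping
        return acc.split(rule.value).join('')
      case 'regex':
        // Throws if the user-supplied pattern is invalid; callers should validate it
        return acc.replace(new RegExp(rule.value, 'g'), '')
      default:
        // 'css' rules need a parsed DOM and are skipped in this sketch
        return acc
    }
  }, text)
}
```

Because user-supplied patterns can be invalid or slow, validating them when the rule is saved is probably safer than at check time.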
### Keyword Detection
```typescript
// Shapes assumed from the Monitor model's keywordRules field
type KeywordRule = { keyword: string; type: 'appears' | 'disappears' | 'count'; threshold?: number }
type KeywordMatch = { rule: KeywordRule; type: 'appeared' | 'disappeared' }

function checkKeywords(
  previousText: string,
  currentText: string,
  rules: KeywordRule[]
): KeywordMatch[] {
  const matches: KeywordMatch[] = []

  for (const rule of rules) {
    // Case-sensitive substring checks
    const prevMatch = previousText.includes(rule.keyword)
    const currMatch = currentText.includes(rule.keyword)

    if (rule.type === 'appears' && !prevMatch && currMatch) {
      matches.push({ rule, type: 'appeared' })
    }
    if (rule.type === 'disappears' && prevMatch && !currMatch) {
      matches.push({ rule, type: 'disappeared' })
    }

    // Count logic...
  }

  return matches
}
```

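The count logic is left open above. One possible reading, offered only as an assumption, is that a `count` rule fires when the number of occurrences reaches `threshold` in the current text but was below it in the previous text, so the alert is sent once rather than on every check. The helper names `countOccurrences` and `countRuleTriggered` are hypothetical.

```typescript
// Count non-overlapping, case-sensitive occurrences of a keyword.
function countOccurrences(text: string, keyword: string): number {
  if (keyword.length === 0) return 0
  let count = 0
  let index = text.indexOf(keyword)
  while (index !== -1) {
    count += 1
    index = text.indexOf(keyword, index + keyword.length)
  }
  return count
}

// Assumed semantics: trigger only when the count crosses the threshold upward.
function countRuleTriggered(
  previousText: string,
  currentText: string,
  keyword: string,
  threshold: number
): boolean {
  const previousCount = countOccurrences(previousText, keyword)
  const currentCount = countOccurrences(currentText, keyword)
  return previousCount < threshold && currentCount >= threshold
}
```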
---

## Development Guidelines

### When Working on This Project

1. **Prioritize MVP**: Focus on core features before adding complexity
2. **Performance matters**: Diffing and fetching should be fast (<2s)
3. **Noise reduction is key**: This is our competitive advantage
4. **User feedback loop**: Build in ways to learn from false positives
5. **Security first**: Never store credentials in plain text, sanitize all URLs

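As an illustration of the URL sanitizing mentioned in the list above, here is a minimal sketch of a validator that accepts only `http`/`https` URLs and rejects obvious local or private targets. The function names, the `reason` codes, and the exact list of blocked ranges are assumptions rather than the project's actual rules.

```typescript
// True for IPv4 literals in loopback, link-local, or private ranges.
function isPrivateIPv4(host: string): boolean {
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/)
  if (!m) return false
  const a = Number(m[1])
  const b = Number(m[2])
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  )
}

type UrlCheck = { ok: true; url: string } | { ok: false; reason: string }

// Hypothetical validator for user-submitted monitor URLs.
function validateMonitorUrl(input: string): UrlCheck {
  let parsed: URL
  try {
    parsed = new URL(input)
  } catch {
    return { ok: false, reason: 'invalid_url' }
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, reason: 'unsupported_protocol' }
  }
  const host = parsed.hostname.toLowerCase()
  if (host === 'localhost' || host.endsWith('.localhost') || host === '[::1]' || isPrivateIPv4(host)) {
    return { ok: false, reason: 'private_host' }
  }
  return { ok: true, url: parsed.toString() }
}
```

This only inspects the URL string. A hostname that resolves to a private address would still pass, so a production fetcher would likely also need to check the resolved IP before connecting.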
### Code Style

- Use TypeScript strict mode
- Write unit tests for core algorithms (differ, filter, keyword)
- Use async/await, avoid callbacks
- Prefer functional programming patterns
- Comment complex logic, especially regex patterns

### API Design Principles

- RESTful endpoints
- Use proper HTTP status codes
- Return consistent error format:
  ```json
  {
    "error": "monitor_not_found",
    "message": "Monitor with id 123 not found",
    "details": {}
  }
  ```
- Paginate list endpoints (monitors, snapshots, alerts)
- Version API if breaking changes needed (/v1/monitors)

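The error format and pagination principles above could be captured in shared types so every endpoint returns the same shapes. This is a sketch under assumptions: the `ApiError` fields mirror the JSON example, while the `Page` envelope (`items`, `page`, `pageSize`, `total`) and the helper names are hypothetical, since the document does not define a pagination format. Pages are 1-based here.

```typescript
// Mirrors the JSON error example: a machine-readable code, a message, and details.
interface ApiError {
  error: string
  message: string
  details: Record<string, unknown>
}

function apiError(
  error: string,
  message: string,
  details: Record<string, unknown> = {}
): ApiError {
  return { error, message, details }
}

// Hypothetical pagination envelope for list endpoints.
interface Page<T> {
  items: T[]
  page: number
  pageSize: number
  total: number
}

// In-memory pagination with 1-based page numbers.
function paginate<T>(all: T[], page: number, pageSize: number): Page<T> {
  const start = (page - 1) * pageSize
  return { items: all.slice(start, start + pageSize), page, pageSize, total: all.length }
}
```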
---

## Common Tasks & Commands

### When Starting Development
```bash
# Clone and setup
git clone <repo>
cd website-monitor

# Install dependencies (subshells keep the working directory at the repo root)
(cd frontend && npm install)
(cd backend && npm install)

# Setup environment
cp .env.example .env
# Edit .env with your values

# Start database
docker-compose up -d postgres redis

# Run migrations
(cd backend && npm run migrate)

# Start dev servers (run each in its own terminal)
(cd frontend && npm run dev)
(cd backend && npm run dev)
```

### Running Tests
```bash
# Frontend tests
(cd frontend && npm test)

# Backend tests
(cd backend && npm test)

# E2E tests
npm run test:e2e
```

### Deployment
```bash
# Build frontend
(cd frontend && npm run build)

# Deploy frontend (Vercel)
vercel deploy --prod

# Deploy backend (tag the image with the registry so the push succeeds)
docker build -t <registry>/monitor-api .
docker push <registry>/monitor-api
# Deploy to Railway/Render/AWS
```

---

## Key User Flows to Support

When building features, always consider these primary use cases:

1. **Job seeker monitoring career pages** (most common)
   - Needs: Fast frequency (5 min), keyword alerts, instant notifications

2. **Price tracking for e-commerce** (high value)
   - Needs: Element selection, numeric comparison, reliable alerts

3. **Competitor monitoring** (B2B focus)
   - Needs: Multiple monitors, digest mode, AI summaries

4. **Stock/availability tracking** (urgent)
   - Needs: Fastest frequency (1 min), SMS alerts, auto-pause

5. **Policy/regulation monitoring** (professional)
   - Needs: Long-term history, team sharing, AI summaries

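The numeric comparison needed for the price-tracking flow above might look like the following. This is a rough sketch that assumes US-style dollar prices such as `$1,299.50` in the text of the selected element; other currencies and formats are out of scope, and the function names `parsePrice` and `priceDropped` are hypothetical.

```typescript
// Extract the first dollar amount from a text, or null if none is found.
function parsePrice(text: string): number | null {
  // "$", optional whitespace, digits with optional thousands separators, optional decimals
  const match = text.match(/\$\s*(\d[\d,]*(?:\.\d+)?)/)
  if (!match) return null
  const value = Number(match[1].replace(/,/g, ''))
  return Number.isNaN(value) ? null : value
}

// True only when both versions contain a price and the current one is lower.
function priceDropped(previousText: string, currentText: string): boolean {
  const previous = parsePrice(previousText)
  const current = parsePrice(currentText)
  return previous !== null && current !== null && current < previous
}
```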
---

## Integration Points

### Email Service (SendGrid/Postmark)
```typescript
async function sendChangeAlert(monitor: Monitor, snapshot: Snapshot) {
  const diffUrl = `https://app.example.com/monitors/${monitor.id}/diff/${snapshot.id}`

  // Monitor stores only userId, so load the owner to get the email address
  const user = await db.users.findById(monitor.userId)

  await emailService.send({
    to: user.email,
    subject: `Change detected: ${monitor.name}`,
    template: 'change-alert',
    data: {
      monitorName: monitor.name,
      url: monitor.url,
      timestamp: snapshot.createdAt,
      diffUrl,
      changePercentage: snapshot.changePercentage
    }
  })
}
```

### Stripe Billing
```typescript
async function handleSubscription(userId: string, plan: string) {
  const user = await db.users.findById(userId)

  // Create the subscription (updating an existing one is not handled here)
  const subscription = await stripe.subscriptions.create({
    customer: user.stripeCustomerId,
    items: [{ price: PRICE_IDS[plan] }]
  })

  // Update user plan
  await db.users.update(userId, {
    plan,
    subscriptionId: subscription.id
  })
}
```

### Job Queue (Bull)
```typescript
// Schedule monitor checks
async function scheduleMonitor(monitor: Monitor) {
  await monitorQueue.add(
    'check-monitor',
    { monitorId: monitor.id },
    {
      repeat: {
        every: monitor.frequency * 60 * 1000 // convert to ms
      },
      jobId: `monitor-${monitor.id}`
    }
  )
}

// Process checks
monitorQueue.process('check-monitor', async (job) => {
  const { monitorId } = job.data
  await checkMonitor(monitorId)
})
```

---

## Testing Strategy

### Unit Tests
- Diff algorithms
- Noise filtering
- Keyword matching
- Ignore rules application

### Integration Tests
- API endpoints
- Database operations
- Job queue processing

### E2E Tests
- User registration & login
- Monitor creation & management
- Alert delivery
- Subscription changes

### Performance Tests
- Fetch speed with various page sizes
- Diff calculation speed
- Concurrent monitor checks
- Database query performance

---

## Deployment Checklist

Before deploying to production:

- [ ] Environment variables configured
- [ ] Database migrations run
- [ ] SSL certificates configured
- [ ] Email deliverability tested
- [ ] Payment processing tested (Stripe test mode → live mode)
- [ ] Error tracking configured (Sentry)
- [ ] Monitoring & alerts set up (uptime, error rate, queue health)
- [ ] Backup strategy implemented
- [ ] Rate limiting configured
- [ ] GDPR compliance (privacy policy, data export/deletion)
- [ ] Security headers configured
- [ ] API documentation updated

---

## Troubleshooting Common Issues

### "Monitor keeps triggering false alerts"
- Check if noise filtering is working
- Review ignore rules for the monitor
- Look at diff to identify changing element
- Add custom ignore rule for that element

### "Some pages aren't being monitored correctly"
- Check if page requires JavaScript rendering
- Try enabling headless browser mode
- Check if page requires authentication
- Look for CAPTCHA or bot detection

### "Alerts aren't being delivered"
- Check email service status
- Verify email isn't going to spam
- Check alert queue for errors
- Verify user's alert settings

### "System is slow/overloaded"
- Check Redis queue health
- Look for monitors with very high frequency
- Check database query performance
- Consider scaling workers horizontally

---

## Metrics to Track

### Technical Metrics
- Average check duration
- Diff calculation time
- Check success rate
- Alert delivery rate
- Queue processing lag

### Product Metrics
- Active monitors per user
- Alerts sent per day
- False positive rate (from user feedback)
- Feature adoption (keywords, elements, integrations)

### Business Metrics
- Free → Paid conversion rate
- Monthly churn rate
- Average revenue per user (ARPU)
- Customer acquisition cost (CAC)
- Lifetime value (LTV)

---

## Resources & Documentation

### External Documentation
- [Next.js Docs](https://nextjs.org/docs)
- [Tailwind CSS](https://tailwindcss.com/docs)
- [Playwright Docs](https://playwright.dev)
- [Bull Queue](https://github.com/OptimalBits/bull)
- [Stripe API](https://stripe.com/docs/api)

### Internal Documentation
- See `spec.md` for complete feature specifications
- See `task.md` for development roadmap
- See `actions.md` for user workflows and use cases

---

## Future Considerations

### Potential Enhancements
- Mobile app (React Native or Progressive Web App)
- Browser extension for quick monitor addition
- AI-powered change importance scoring
- Collaborative features (team annotations, approval workflows)
- Marketplace for monitor templates
- Affiliate program for power users

### Scaling Considerations
- Distributed workers across multiple regions
- Caching layer for frequently accessed pages
- Database sharding by user
- Separate queue for high-frequency monitors
- CDN for snapshot storage

---

## Notes for Claude

When working on this project:

1. **Always reference these docs**: spec.md, task.md, actions.md, and this file
2. **MVP mindset**: Implement the simplest solution that works first
3. **User-centric**: Consider the user workflows in actions.md when building features
4. **Security-conscious**: Validate URLs, sanitize inputs, encrypt sensitive data
5. **Performance-aware**: Optimize for speed, especially diff calculation
6. **Ask clarifying questions**: If requirements are ambiguous, ask before implementing
7. **Test as you go**: Write tests for core functionality
8. **Document decisions**: Update these docs when making architectural decisions

### Common Questions & Answers

**Q: Should we support authenticated pages in MVP?**
A: No, save for V2. Focus on public pages first.

**Q: What diff library should we use?**
A: `diff` (the npm package name for jsdiff) for JavaScript, `difflib` for Python.

**Q: How do we handle CAPTCHA?**
A: For MVP, just alert the user. For V2, consider residential proxies or browser fingerprinting.

**Q: Should we store full HTML or just text?**
A: Store both: full HTML for accuracy, extracted text for diffing performance.

**Q: What's the minimum viable frequency?**
A: 1 hour for the free tier and 5 minutes for Pro; the Business tier goes down to 1 minute (see Pricing Tiers).

---

## Quick Reference

### Key Files
- `spec.md` - Feature specifications
- `task.md` - Development tasks and roadmap
- `actions.md` - User workflows and use cases
- `claude.md` - This file (project context)

### Key Concepts
- **Noise reduction** - Core differentiator
- **Keyword alerts** - High-value feature
- **Element selection** - Monitor specific parts
- **Change severity** - Classify importance

### Pricing Tiers
- **Free**: 5 monitors, 1hr frequency
- **Pro**: 50 monitors, 5min frequency, $19-29/mo
- **Business**: 200 monitors, 1min frequency, teams, $99-149/mo
- **Enterprise**: Unlimited, custom pricing

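The tier limits above could be enforced from a single lookup table when a monitor is created or edited. The sketch below is illustrative only: `PLAN_LIMITS` and `canCreateMonitor` are hypothetical names, and the 1-minute minimum frequency for Enterprise is an assumption, since the list says only "Unlimited".

```typescript
type Plan = 'free' | 'pro' | 'business' | 'enterprise'

interface PlanLimits {
  maxMonitors: number
  minFrequencyMinutes: number
}

// Limits taken from the Pricing Tiers list; the Enterprise frequency is assumed.
const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  free: { maxMonitors: 5, minFrequencyMinutes: 60 },
  pro: { maxMonitors: 50, minFrequencyMinutes: 5 },
  business: { maxMonitors: 200, minFrequencyMinutes: 1 },
  enterprise: { maxMonitors: Infinity, minFrequencyMinutes: 1 },
}

// True when the user is under the monitor cap and the requested interval is allowed.
function canCreateMonitor(
  plan: Plan,
  currentMonitorCount: number,
  frequencyMinutes: number
): boolean {
  const limits = PLAN_LIMITS[plan]
  return (
    currentMonitorCount < limits.maxMonitors &&
    frequencyMinutes >= limits.minFrequencyMinutes
  )
}
```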
---

*Last updated: 2026-01-16*