website-monitor/claude.md

681 lines
18 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Website Change Detection Monitor - Claude Context
## Project Overview
This is a **Website Change Detection Monitor SaaS** application. The core value proposition is helping users track changes on web pages they care about, with intelligent noise filtering to ensure only meaningful changes trigger alerts.
**Updated Tagline (2026-01-18)**: "Less noise. More signal. Proof included."
**Previous**: "I watch pages so you don't have to" (too generic, doesn't communicate value)
**Target Market**: SEO & Growth Teams (SMB → Mid-Market) who monitor competitor pages, SERP changes, and policy updates
---
## Key Differentiators (Market-Validated)
1. **Superior Noise Filtering** 🔥
- Automatically filter cookie banners, timestamps, rotating ads, session IDs
- Custom ignore rules (CSS selectors, regex, text patterns)
- **Market Evidence**: Distill & Fluxguard emphasize this as core differentiator
2. **Keyword-Based Alerts** 🔥
- Alert when specific words appear/disappear (e.g., "sold out", "hiring", "$99")
- Threshold-based triggers, regex support
- **Market Evidence**: High-value feature across all competitors
3. **Workflow Integrations** 🔥 NEW PRIORITY
- Webhooks (MVP), Slack (V1), Teams/Discord (V2)
- Alerts in your existing tools, not just email
- **Market Evidence**: Shown prominently by Visualping, Wachete, ChangeDetection
4. **Proof & History** 🔥
- Compare versions, audit-proof snapshots, full history
- Messaging: "Prove changes" not just "see changes"
- **Market Evidence**: Sken & Fluxguard highlight "versions kept"
5. **Use-Case-Focused Marketing**
- Primary: SEO Monitoring, Competitor Tracking, Policy/Compliance
- Secondary: Stock/Restock, Job Postings
- **Market Evidence**: All competitors segment by use case
---
## Architecture Overview
### Tech Stack (Recommended)
**Frontend**:
- Next.js 14+ (App Router)
- TypeScript
- Tailwind CSS + shadcn/ui components
- React Query for state management
- Zod for validation
**Backend**:
- Node.js + Express OR Python + FastAPI
- PostgreSQL for relational data
- Redis + Bull/BullMQ for job queuing
- Puppeteer/Playwright for JS-heavy sites
**Infrastructure**:
- Vercel/Railway for frontend hosting
- Render/Railway/AWS for backend
- AWS S3 or Cloudflare R2 for snapshot storage
- Upstash Redis or managed Redis
**Third-Party Services**:
- Stripe for billing
- SendGrid/Postmark for emails
- Sentry for error tracking
- PostHog/Mixpanel for analytics
---
## Project Structure
```
/website-monitor
├── /frontend (Next.js)
│ ├── /app
│ │ ├── /dashboard
│ │ ├── /monitors
│ │ ├── /settings
│ │ └── /auth
│ ├── /components
│ │ ├── /ui (shadcn components)
│ │ ├── /monitors
│ │ └── /diff-viewer
│ ├── /lib
│ │ ├── api-client.ts
│ │ ├── auth.ts
│ │ └── utils.ts
│ └── /public
├── /backend
│ ├── /src
│ │ ├── /routes
│ │ ├── /controllers
│ │ ├── /models
│ │ ├── /services
│ │ │ ├── fetcher.ts
│ │ │ ├── differ.ts
│ │ │ ├── scheduler.ts
│ │ │ └── alerter.ts
│ │ ├── /jobs
│ │ └── /utils
│ ├── /db
│ │ └── /migrations
│ └── /tests
├── /docs
│ ├── spec.md
│ ├── task.md
│ ├── actions.md
│ └── claude.md (this file)
└── README.md
```
---
## Core Entities & Data Models
### User
```typescript
{
id: string
email: string
passwordHash: string
plan: 'free' | 'pro' | 'business' | 'enterprise'
stripeCustomerId: string
createdAt: Date
lastLoginAt: Date
}
```
### Monitor
```typescript
{
id: string
userId: string
url: string
name: string
frequency: number // minutes
status: 'active' | 'paused' | 'error'
// Advanced features
elementSelector?: string
ignoreRules?: {
type: 'css' | 'regex' | 'text'
value: string
}[]
keywordRules?: {
keyword: string
type: 'appears' | 'disappears' | 'count'
threshold?: number
}[]
// Metadata
lastCheckedAt?: Date
lastChangedAt?: Date
consecutiveErrors: number
createdAt: Date
}
```
### Snapshot
```typescript
{
id: string
monitorId: string
htmlContent: string
contentHash: string
screenshotUrl?: string
// Status
httpStatus: number
responseTime: number
changed: boolean
changePercentage?: number
// Errors
errorMessage?: string
// Metadata
createdAt: Date
}
```
### Alert
```typescript
{
id: string
monitorId: string
snapshotId: string
userId: string
// Alert details
type: 'change' | 'error' | 'keyword'
title: string
summary?: string
// Delivery
channels: ('email' | 'slack' | 'webhook')[]
deliveredAt?: Date
readAt?: Date
createdAt: Date
}
```
---
## Key Algorithms & Logic
### Change Detection
```typescript
// Simple hash comparison for binary change detection
const changed = previousHash !== currentHash
// Text diff for detailed comparison
const diff = diffLines(previousText, currentText)
const changePercentage = (changedLines / totalLines) * 100
// Severity calculation
const severity =
changePercentage > 50 ? 'major' :
changePercentage > 10 ? 'medium' : 'minor'
```
### Noise Filtering
```typescript
// Remove common noise patterns
function filterNoise(html: string): string {
// Remove timestamps
html = html.replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/g, '')
// Remove cookie banners (common selectors)
const noisySelectors = [
'.cookie-banner',
'#cookie-notice',
'[class*="consent"]',
// ... more patterns
]
// Parse and remove elements
const $ = cheerio.load(html)
noisySelectors.forEach(sel => $(sel).remove())
return $.html()
}
```
### Keyword Detection
```typescript
function checkKeywords(
previousText: string,
currentText: string,
rules: KeywordRule[]
): KeywordMatch[] {
const matches = []
for (const rule of rules) {
const prevMatch = previousText.includes(rule.keyword)
const currMatch = currentText.includes(rule.keyword)
if (rule.type === 'appears' && !prevMatch && currMatch) {
matches.push({ rule, type: 'appeared' })
}
if (rule.type === 'disappears' && prevMatch && !currMatch) {
matches.push({ rule, type: 'disappeared' })
}
// Count logic...
}
return matches
}
```
---
## Development Guidelines
### When Working on This Project
1. **Prioritize MVP**: Focus on core features before adding complexity
2. **Performance matters**: Diffing and fetching should be fast (<2s)
3. **Noise reduction is key**: This is our competitive advantage
4. **User feedback loop**: Build in ways to learn from false positives
5. **Security first**: Never store credentials in plain text, sanitize all URLs
### Code Style
- Use TypeScript strict mode
- Write unit tests for core algorithms (differ, filter, keyword)
- Use async/await, avoid callbacks
- Prefer functional programming patterns
- Comment complex logic, especially regex patterns
### API Design Principles
- RESTful endpoints
- Use proper HTTP status codes
- Return consistent error format:
```json
{
"error": "monitor_not_found",
"message": "Monitor with id 123 not found",
"details": {}
}
```
- Paginate list endpoints (monitors, snapshots, alerts)
- Version API if breaking changes needed (/v1/monitors)
---
## Common Tasks & Commands
### When Starting Development
```bash
# Clone and setup
git clone <repo>
cd website-monitor
# Install dependencies
cd frontend && npm install
cd ../backend && npm install
# Setup environment
cp .env.example .env
# Edit .env with your values
# Start database
docker-compose up -d postgres redis
# Run migrations
cd backend && npm run migrate
# Start dev servers
cd frontend && npm run dev
cd backend && npm run dev
```
### Running Tests
```bash
# Frontend tests
cd frontend && npm test
# Backend tests
cd backend && npm test
# E2E tests
npm run test:e2e
```
### Deployment
```bash
# Build frontend
cd frontend && npm run build
# Deploy frontend (Vercel)
vercel deploy --prod
# Deploy backend
docker build -t monitor-api .
docker push <registry>/monitor-api
# Deploy to Railway/Render/AWS
```
---
## Key User Flows to Support
When building features, always consider these primary use cases:
1. **Job seeker monitoring career pages** (most common)
- Needs: Fast frequency (5 min), keyword alerts, instant notifications
2. **Price tracking for e-commerce** (high value)
- Needs: Element selection, numeric comparison, reliable alerts
3. **Competitor monitoring** (B2B focus)
- Needs: Multiple monitors, digest mode, AI summaries
4. **Stock/availability tracking** (urgent)
- Needs: Fastest frequency (1 min), SMS alerts, auto-pause
5. **Policy/regulation monitoring** (professional)
- Needs: Long-term history, team sharing, AI summaries
---
## Integration Points
### Email Service (SendGrid/Postmark)
```typescript
async function sendChangeAlert(monitor: Monitor, snapshot: Snapshot) {
const diffUrl = `https://app.example.com/monitors/${monitor.id}/diff/${snapshot.id}`
await emailService.send({
to: monitor.user.email,
subject: `Change detected: ${monitor.name}`,
template: 'change-alert',
data: {
monitorName: monitor.name,
url: monitor.url,
timestamp: snapshot.createdAt,
diffUrl,
changePercentage: snapshot.changePercentage
}
})
}
```
### Stripe Billing
```typescript
async function handleSubscription(userId: string, plan: string) {
const user = await db.users.findById(userId)
// Create or update subscription
const subscription = await stripe.subscriptions.create({
customer: user.stripeCustomerId,
items: [{ price: PRICE_IDS[plan] }]
})
// Update user plan
await db.users.update(userId, {
plan,
subscriptionId: subscription.id
})
}
```
### Job Queue (Bull)
```typescript
// Schedule monitor checks
async function scheduleMonitor(monitor: Monitor) {
await monitorQueue.add(
'check-monitor',
{ monitorId: monitor.id },
{
repeat: {
every: monitor.frequency * 60 * 1000 // convert to ms
},
jobId: `monitor-${monitor.id}`
}
)
}
// Process checks
monitorQueue.process('check-monitor', async (job) => {
const { monitorId } = job.data
await checkMonitor(monitorId)
})
```
---
## Testing Strategy
### Unit Tests
- Diff algorithms
- Noise filtering
- Keyword matching
- Ignore rules application
### Integration Tests
- API endpoints
- Database operations
- Job queue processing
### E2E Tests
- User registration & login
- Monitor creation & management
- Alert delivery
- Subscription changes
### Performance Tests
- Fetch speed with various page sizes
- Diff calculation speed
- Concurrent monitor checks
- Database query performance
---
## Deployment Checklist
Before deploying to production:
- [ ] Environment variables configured
- [ ] Database migrations run
- [ ] SSL certificates configured
- [ ] Email deliverability tested
- [ ] Payment processing tested (Stripe test mode live mode)
- [ ] Error tracking configured (Sentry)
- [ ] Monitoring & alerts set up (uptime, error rate, queue health)
- [ ] Backup strategy implemented
- [ ] Rate limiting configured
- [ ] GDPR compliance (privacy policy, data export/deletion)
- [ ] Security headers configured
- [ ] API documentation updated
---
## Troubleshooting Common Issues
### "Monitor keeps triggering false alerts"
- Check if noise filtering is working
- Review ignore rules for the monitor
- Look at diff to identify changing element
- Add custom ignore rule for that element
### "Some pages aren't being monitored correctly"
- Check if page requires JavaScript rendering
- Try enabling headless browser mode
- Check if page requires authentication
- Look for CAPTCHA or bot detection
### "Alerts aren't being delivered"
- Check email service status
- Verify email isn't going to spam
- Check alert queue for errors
- Verify user's alert settings
### "System is slow/overloaded"
- Check Redis queue health
- Look for monitors with very high frequency
- Check database query performance
- Consider scaling workers horizontally
---
## Metrics to Track
### Technical Metrics
- Average check duration
- Diff calculation time
- Check success rate
- Alert delivery rate
- Queue processing lag
### Product Metrics
- Active monitors per user
- Alerts sent per day
- False positive rate (from user feedback)
- Feature adoption (keywords, elements, integrations)
### Business Metrics
- Free Paid conversion rate
- Monthly churn rate
- Average revenue per user (ARPU)
- Customer acquisition cost (CAC)
- Lifetime value (LTV)
---
## Resources & Documentation
### External Documentation
- [Next.js Docs](https://nextjs.org/docs)
- [Tailwind CSS](https://tailwindcss.com/docs)
- [Playwright Docs](https://playwright.dev)
- [Bull Queue](https://github.com/OptimalBits/bull)
- [Stripe API](https://stripe.com/docs/api)
### Internal Documentation
- See `spec.md` for complete feature specifications
- See `task.md` for development roadmap
- See `actions.md` for user workflows and use cases
---
## Future Considerations
### Potential Enhancements
- Mobile app (React Native or Progressive Web App)
- Browser extension for quick monitor addition
- AI-powered change importance scoring
- Collaborative features (team annotations, approval workflows)
- Marketplace for monitor templates
- Affiliate program for power users
### Scaling Considerations
- Distributed workers across multiple regions
- Caching layer for frequently accessed pages
- Database sharding by user
- Separate queue for high-frequency monitors
- CDN for snapshot storage
---
## Notes for Claude
When working on this project:
1. **Always reference these docs**: spec.md, task.md, actions.md, and this file
2. **MVP mindset**: Implement the simplest solution that works first
3. **User-centric**: Consider the user workflows in actions.md when building features
4. **Security-conscious**: Validate URLs, sanitize inputs, encrypt sensitive data
5. **Performance-aware**: Optimize for speed, especially diff calculation
6. **Ask clarifying questions**: If requirements are ambiguous, ask before implementing
7. **Test as you go**: Write tests for core functionality
8. **Document decisions**: Update these docs when making architectural decisions
### Common Questions & Answers
**Q: Should we support authenticated pages in MVP?**
A: No, save for V2. Focus on public pages first.
**Q: What diff library should we use?**
A: `diff` (npm) or `jsdiff` for JavaScript, `difflib` for Python.
**Q: How do we handle CAPTCHA?**
A: For MVP, just alert the user. For V2, consider residential proxies or browser fingerprinting.
**Q: Should we store full HTML or just text?**
A: Store both: full HTML for accuracy, extracted text for diffing performance.
**Q: What's the minimum viable frequency?**
A: 5 minutes for paid users, 1 hour for free tier.
---
## Quick Reference
### Key Files
- `spec.md` - Feature specifications
- `task.md` - Development tasks and roadmap
- `actions.md` - User workflows and use cases
- `claude.md` - This file (project context)
### Key Concepts
- **Noise reduction** - Core differentiator
- **Keyword alerts** - High-value feature
- **Element selection** - Monitor specific parts
- **Change severity** - Classify importance
### Pricing Tiers (Under Review - See findings_market.md)
- **Free**: 5 monitors, 1hr frequency
- **Pro**: 50 monitors, 5min frequency, $19-29/mo
- **Business**: 200 monitors, 1min frequency, teams, $99-149/mo
- **Enterprise**: Unlimited, custom pricing
**Note:** Considering switch to "checks/month" model instead of "monitors + frequency" for fairer pricing
---
## Competitive Positioning (Updated 2026-01-18)
### Market Landscape
We compete with established players (Visualping, Distill, Fluxguard) and budget options (Sken.io, ChangeDetection.io).
### vs. Visualping
- **Their Strength**: Enterprise trust ("85% Fortune 500"), broad features
- **Our Angle**: "Better noise control + fairer pricing without the enterprise bloat"
- **Messaging**: "Built for teams who need results, not demos"
### vs. Distill.io
- **Their Strength**: Conditions/filters, established user base
- **Our Angle**: "Team features built-in + modern UX not stuck in 2015"
- **Messaging**: "Collaboration-first, not an afterthought"
### vs. Fluxguard
- **Their Strength**: AI summaries, enterprise focus, sales-led
- **Our Angle**: "Self-serve pricing + instant setup no demo calls required"
- **Messaging**: "AI-powered intelligence without the enterprise tax"
### vs. ChangeDetection.io / Sken.io
- **Their Strength**: Low price ($3-9/mo), simple
- **Our Angle**: "Advanced features (keywords, integrations, teams) without complexity"
- **Messaging**: "Powerful, but still simple"
### How We Win
1. **Superior noise control** (multi-layer filtering)
2. **Workflow integrations** (Slack/Teams/Webhooks early)
3. **Use-case marketing** (SEO, Competitor, Policy segments)
4. **Modern UX** (not stuck in legacy design)
5. **Fair pricing** (considering checks/month model)
---
*Last updated: 2026-01-18*