Automated FDA regulatory monitoring system that uses the Federal Register API and eCFR API to detect regulatory changes and generate AI-powered summaries for quick impact assessment. This project was part of my internship at Accelerated Biosciences in Taiwan.
Overview
Problem: Manually monitoring FDA regulatory publications and CFR regulation changes is time-consuming and risks missing important updates that could impact business operations.
Solution: Automated tool that monitors both Federal Register announcements and CFR regulation text via official government APIs, detects changes, cross-references them, and provides AI-generated summaries with email notifications.
Key Features
Federal Register Monitoring
- Automated Detection: Weekly checks of Federal Register for FDA publications
- Intelligent Filtering: 3-stage hybrid system (keywords → scoring → AI review)
- Topic Focused: Filters for cell therapy, gene therapy, stem cells, exosomes, secretomes
- 4-Tier Relevance: HIGH/MEDIUM/LOW/MINIMAL priority classification
- AI Summarization: Claude-powered summaries with key points and regulatory impact
CFR Regulation Monitoring
- Snapshot System: Periodic snapshots of 21 CFR Parts 1271, 600, 610, 312, 601
- Change Detection: Automatically detects ADDED, MODIFIED, REMOVED, RENAMED sections
- Diff Analysis: Shows before/after text comparison
- AI Impact Analysis: Relevance scoring (0-100) with detailed impact assessment
- Cross-Referencing: Links CFR changes to Federal Register documents
- Timeline Building: Shows regulatory progression (PRORULE → RULE → CFR change)
- Classification: Identifies expected vs unexpected changes
Integration & Delivery
- Interactive Workflow: Single fda workflow command for complete monitoring
- Email Digests: Combined FR documents + CFR changes with HTML rendering
- No Web Scraping: Uses official government APIs (Federal Register + eCFR)
- Historical Tracking: SQLite database maintains complete regulatory history
- Unified CLI: Single command-line tool for all operations
Usage
Primary Command: Interactive Workflow
The recommended way to use FDA Watch & Digest is through the interactive workflow:
python fda.py workflow
What it does:
- Step 1: Federal Register Detection
  - Asks: “How many days back should I check?” (default: 7 days)
  - Fetches documents from Federal Register API
  - Applies 3-stage relevance filtering (keywords → scoring → AI review)
  - Shows top HIGH priority documents
  - Auto-saves to database
- Step 2: CFR Regulation Monitoring
  - Creates snapshots of monitored CFR parts (1271, 600, 610, 312, 601)
  - Detects changes between snapshots
  - Runs AI analysis on detected changes
  - Reports relevance distribution
- Step 3: AI Summarization
  - Shows document counts (HIGH + MEDIUM relevance)
  - Displays estimated API cost
  - Asks: “Generate AI summaries? (Y/n)”
  - Processes documents with progress indicator
  - Generates 2-paragraph summaries (context + impact, no action items)
- Step 4: Email Delivery
  - Generates email digest with FR documents + CFR changes
  - Shows counts by relevance level
  - Asks: “Preview / Send / Skip?”
  - Opens preview in browser if requested
  - Sends email if confirmed (even if no changes)
- Summary: Shows what was accomplished (documents saved, CFR changes detected, summaries generated, email sent)
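The fetch in Step 1 could be sketched as follows. This is a minimal illustration, not the project's actual code: the function name build_fr_query and the page size are assumptions, while the endpoint and the conditions[...] parameters follow the public Federal Register API.

```python
from datetime import date, timedelta
from typing import Optional
from urllib.parse import urlencode

FR_DOCUMENTS_URL = "https://www.federalregister.gov/api/v1/documents.json"


def build_fr_query(days_back: int = 7, today: Optional[date] = None) -> str:
    """Build a search URL for FDA documents published in the last `days_back` days.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    today = today or date.today()
    since = today - timedelta(days=days_back)
    params = [
        ("conditions[agencies][]", "food-and-drug-administration"),
        ("conditions[publication_date][gte]", since.isoformat()),
        ("order", "newest"),
        ("per_page", "100"),  # assumed page size
    ]
    return f"{FR_DOCUMENTS_URL}?{urlencode(params)}"


url = build_fr_query(days_back=7, today=date(2025, 11, 17))
print(url)
```

The resulting URL can be passed to requests.get(url, timeout=30); the JSON response is expected to carry the matching documents under a results key.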
Quick Mode (for automation/cron):
python fda.py workflow --quick
- Uses default 7 days lookback
- Auto-summarizes HIGH + MEDIUM documents
- Auto-sends email (even if no updates)
- No interactive prompts
Individual Commands
For granular control, use individual commands:
# Document detection
python fda.py run --days 60 # Detect and save new documents
# Testing & validation
python fda.py filter --days 60 # Test filtering system (with AI)
python fda.py filter --no-ai # Test without AI review (faster)
python fda.py test # Comprehensive system test
python fda.py smtp # Test SMTP connection only
# AI Summarization
python fda.py summarize # Summarize HIGH & MEDIUM docs (recommended)
python fda.py summarize --all # Summarize all relevance levels
python fda.py summarize --limit 5 # Limit to 5 documents
python fda.py summarize --relevance HIGH # Only HIGH relevance
# Email operations
python fda.py preview # Preview email in browser
python fda.py send # Send email digest
# Database viewing
python fda.py show # View all documents
python fda.py show --relevance HIGH # Filter by relevance
python fda.py show --type RULE # Filter by document type
python fda.py show --limit 10 # Limit results
# CFR Monitoring (NEW)
python fda.py cfr snapshot # Create CFR snapshots for all monitored parts
python fda.py cfr snapshot --parts 1271 # Snapshot specific part only
python fda.py cfr detect # Detect changes between snapshots
python fda.py cfr analyze # Run AI analysis on detected changes
python fda.py cfr analyze --limit 10 # Analyze up to 10 changes
python fda.py cfr show # View recent CFR changes
python fda.py cfr show --part 1271 # Show changes for specific part
python fda.py cfr show --relevance HIGH # Filter by relevance
python fda.py cfr show --verbose # Show detailed analysis
# Help
python fda.py --help # Show all commands
python fda.py workflow --help # Help for specific command
Filtering System
The system uses a 3-stage hybrid approach to identify relevant documents:
Stage 1: Keyword Pre-screening
- Searches entire document (title + abstract + body text)
- Case-insensitive matching
- Keywords: cell therapy, gene therapy, stem cells, exosomes, secretomes, regenerative medicine
- Documents without keywords are rejected and not stored
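A minimal sketch of the Stage 1 pre-screen, assuming plain substring matching over the lowercased text. The function names are hypothetical, and the real tool may match more loosely (for example, singular forms such as "stem cell").

```python
KEYWORDS = (
    "cell therapy",
    "gene therapy",
    "stem cells",
    "exosomes",
    "secretomes",
    "regenerative medicine",
)


def matched_keywords(title: str, abstract: str, body: str) -> list:
    """Return the keywords found anywhere in the document, case-insensitively."""
    text = " ".join((title, abstract, body)).lower()
    return [kw for kw in KEYWORDS if kw in text]


def passes_prescreen(title: str, abstract: str, body: str) -> bool:
    """Documents with no keyword hit are rejected and never stored."""
    return bool(matched_keywords(title, abstract, body))


print(matched_keywords("Gene Therapy Guidance", "", "Covers exosomes."))
print(passes_prescreen("Food Labeling", "", "Sodium limits."))
```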
Stage 2: Weighted Relevance Scoring
- Title matches: 10 points per keyword
- Abstract matches: 5 points per keyword
- Body matches: 1 point per keyword
- Document type bonuses:
- RULE: +3 points (critical regulations)
- PRORULE: +2 points (proposed changes)
- NOTICE: +0 points
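The Stage 2 weights can be sketched like this. Two assumptions are made here that the text above does not settle: points are counted once per distinct keyword per field (not per occurrence), and the document type bonus is only added when at least one keyword matched.

```python
KEYWORDS = (
    "cell therapy",
    "gene therapy",
    "stem cells",
    "exosomes",
    "secretomes",
    "regenerative medicine",
)
FIELD_WEIGHTS = {"title": 10, "abstract": 5, "body": 1}
TYPE_BONUS = {"RULE": 3, "PRORULE": 2, "NOTICE": 0}


def relevance_score(title: str, abstract: str, body: str, doc_type: str) -> int:
    """Weighted keyword score plus a document type bonus (hypothetical helper)."""
    fields = {"title": title, "abstract": abstract, "body": body}
    score = 0
    for name, text in fields.items():
        lowered = text.lower()
        hits = sum(1 for kw in KEYWORDS if kw in lowered)  # distinct keywords
        score += hits * FIELD_WEIGHTS[name]
    if score > 0:
        # Assumption: no bonus for documents without any keyword hit.
        score += TYPE_BONUS.get(doc_type, 0)
    return score


# Title: 1 keyword (10) + abstract: 2 keywords (10) + RULE bonus (3)
print(relevance_score("Gene Therapy Products", "Stem cells and gene therapy", "", "RULE"))
```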
Stage 3: AI Validation (Claude)
- Only reviews borderline cases (MEDIUM/LOW relevance)
- HIGH relevance documents (≥15 pts) skip AI review (already clearly relevant)
- MINIMAL relevance documents (1-4 pts) skip AI review (clearly not relevant)
- Validates actual relevance vs. keyword spam
4-Tier Classification
- HIGH (≥15 pts): Direct regulatory impact
- MEDIUM (10-14 pts): May affect compliance
- LOW (5-9 pts): Potentially relevant
- MINIMAL (1-4 pts): Brief mentions
- REJECTED (0 pts): No keywords - not stored
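The thresholds above, together with the Stage 3 rule that only MEDIUM and LOW documents go to AI review, map to a small lookup. The function names are hypothetical.

```python
def classify(score: int) -> str:
    """Map a relevance score to its tier using the thresholds listed above."""
    if score >= 15:
        return "HIGH"
    if score >= 10:
        return "MEDIUM"
    if score >= 5:
        return "LOW"
    if score >= 1:
        return "MINIMAL"
    return "REJECTED"


def needs_ai_review(score: int) -> bool:
    """Only borderline tiers are sent to Claude; HIGH and MINIMAL skip review."""
    return classify(score) in {"MEDIUM", "LOW"}


for pts in (23, 12, 7, 3, 0):
    print(pts, classify(pts), needs_ai_review(pts))
```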
AI Summarization
Uses Claude Haiku 4.5 to generate concise, regulatory-focused summaries.
Summary Structure
Each AI-generated summary includes:
- KEY POINTS: 2-3 bullet points highlighting the most important takeaways
- SUMMARY: Exactly 2 paragraphs:
- First paragraph: What the regulation/guidance is about and why it was issued (context and background)
- Second paragraph: Regulatory impact on cell/gene therapy companies, what changes, deadlines, and comment periods
Note: Summaries focus on explaining the regulation itself, not prescribing action items. They provide regulatory context and impact assessment, letting recipients determine appropriate responses.
Cost Optimization
- Default: Only HIGH and MEDIUM relevance documents are summarized
- Rationale: Highest regulatory impact documents require detailed analysis
- Estimated cost: ~$0.01-0.03 per summary (Claude Haiku 4.5)
- Typical usage: 2-3 summaries per week = ~$0.10-0.30/month
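A back-of-the-envelope check of the monthly figure, assuming 52/12 weeks per month and the per-summary range quoted above. It lands in the same ballpark as the rough estimate (about $0.09 to $0.39 per month).

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: average month length


def monthly_cost_range(summaries_per_week=(2, 3), cost_per_summary=(0.01, 0.03)):
    """Return (low, high) estimated monthly spend in USD."""
    low = summaries_per_week[0] * WEEKS_PER_MONTH * cost_per_summary[0]
    high = summaries_per_week[1] * WEEKS_PER_MONTH * cost_per_summary[1]
    return low, high


low, high = monthly_cost_range()
print(f"${low:.2f} to ${high:.2f} per month")
```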
CFR Regulation Monitoring
The system now monitors actual CFR regulation text changes, not just Federal Register announcements.
How It Works
- Snapshot Creation (fda cfr snapshot)
  - Fetches current text of monitored CFR parts from eCFR API
  - Creates SHA-256 hash of content
  - Stores in database with date
  - Monitored parts: 21 CFR 1271 (HCT/Ps), 600/610 (Biologics), 312 (INDs), 601 (Licensing)
- Change Detection (fda cfr detect)
  - Compares two most recent snapshots for each part
  - Identifies ADDED, MODIFIED, REMOVED, RENAMED sections
  - Generates before/after text comparison
  - Calculates similarity scores
- AI Analysis (fda cfr analyze)
  - Relevance scoring (0-100) based on cell/gene therapy impact
  - Detailed impact analysis (2,000+ characters)
  - Compliance action recommendations (4,000+ characters)
  - Classification by relevance level (HIGH/MEDIUM/LOW/MINIMAL)
- Cross-Referencing (automatic)
  - Links CFR changes to Federal Register documents
  - Multi-strategy matching (direct references, keywords, part-level)
  - Builds timelines showing regulatory progression
  - Classifies as expected (announced in FR) or unexpected (silent change)
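The snapshot hashing and change detection steps can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: a snapshot is modeled as a dict mapping section number to text, RENAMED is inferred when a removed and an added section have identical text, and similarity uses difflib.SequenceMatcher from the standard library.

```python
import hashlib
from difflib import SequenceMatcher


def content_hash(text: str) -> str:
    """SHA-256 hex digest of a section's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(old: dict, new: dict) -> list:
    """Compare two snapshots ({section: text}) and list the differences."""
    changes = []
    removed = {sid for sid in old if sid not in new}
    added = {sid for sid in new if sid not in old}

    # Assumption: identical text under a new section number means RENAMED.
    removed_by_hash = {content_hash(old[sid]): sid for sid in removed}
    for sid in sorted(added):
        previous = removed_by_hash.get(content_hash(new[sid]))
        if previous is not None:
            changes.append({"type": "RENAMED", "section": sid, "previous": previous})
            removed.discard(previous)
        else:
            changes.append({"type": "ADDED", "section": sid})

    for sid in sorted(removed):
        changes.append({"type": "REMOVED", "section": sid})

    for sid in sorted(old.keys() & new.keys()):
        if content_hash(old[sid]) != content_hash(new[sid]):
            ratio = SequenceMatcher(None, old[sid], new[sid]).ratio()
            changes.append(
                {"type": "MODIFIED", "section": sid, "similarity": round(ratio, 3)}
            )
    return changes


# Made-up section texts, for illustration only.
old_snapshot = {"1271.1": "Scope.", "1271.3": "Definitions.", "1271.90": "Exceptions."}
new_snapshot = {
    "1271.1": "Scope.",
    "1271.3": "Definitions, revised.",
    "1271.95": "Exceptions.",
    "1271.100": "New requirements.",
}
changes = detect_changes(old_snapshot, new_snapshot)
for change in changes:
    print(change)
```

Comparing hashes first keeps the common case (no change) cheap; the more expensive text comparison only runs for sections whose hashes differ.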
CFR Change Data
Each detected CFR change includes:
- Citation: e.g., “21 CFR § 1271.90”
- Change Type: ADDED, MODIFIED, REMOVED, or RENAMED
- Before/After Text: Full text comparison
- Diff Summary: Brief description of what changed
- Relevance Score: 0-100 (AI-generated)
- Impact Analysis: Regulatory implications and affected companies
- Compliance Notes: Specific actions required, deadlines
- Related Documents: Federal Register documents that announced the change
- Timeline: Chronological progression (PRORULE → RULE → CFR change)
- Classification: Expected (announced) vs Unexpected (silent)
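One way to model this record is a dataclass. The field names below are hypothetical (the actual database schema is not shown here), and deriving the classification from whether any related Federal Register document was found is an assumption based on the expected/unexpected definition above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CFRChange:
    """Illustrative container for one detected CFR change."""

    citation: str                      # e.g. "21 CFR § 1271.90"
    change_type: str                   # ADDED, MODIFIED, REMOVED, or RENAMED
    before_text: Optional[str] = None
    after_text: Optional[str] = None
    diff_summary: str = ""
    relevance_score: Optional[int] = None  # 0-100, filled in by AI analysis
    impact_analysis: str = ""
    compliance_notes: str = ""
    related_documents: List[str] = field(default_factory=list)
    timeline: List[str] = field(default_factory=list)

    @property
    def classification(self) -> str:
        # Assumption: a change with no linked FR document is a silent change.
        return "EXPECTED" if self.related_documents else "UNEXPECTED"


change = CFRChange(
    citation="21 CFR § 1271.90",
    change_type="MODIFIED",
    before_text="old text",
    after_text="new text",
)
print(change.classification)
```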
Email Integration
CFR changes are automatically included in email digests:
- Displayed in separate section after Federal Register documents
- Color-coded by change type (green=ADDED, yellow=MODIFIED, red=REMOVED)
- Collapsible before/after text sections
- Expandable AI analysis and compliance notes
- Related Federal Register documents listed with confidence levels
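A rough sketch of how one change could be rendered for the HTML digest. The markup is an assumption: the function name is hypothetical, the gray fallback for change types without a listed color is invented for the example, and the details element is only one way to get a collapsible section (support varies by email client).

```python
from html import escape

CHANGE_COLORS = {"ADDED": "green", "MODIFIED": "yellow", "REMOVED": "red"}


def render_change(citation: str, change_type: str, before: str, after: str) -> str:
    """Return an HTML fragment for one CFR change, color-coded by type."""
    color = CHANGE_COLORS.get(change_type, "gray")  # assumed fallback color
    return (
        f'<div style="border-left: 4px solid {color}; padding-left: 8px;">'
        f"<strong>{escape(citation)}</strong> ({escape(change_type)})"
        "<details><summary>Before / after</summary>"
        f"<pre>{escape(before)}</pre><pre>{escape(after)}</pre>"
        "</details></div>"
    )


fragment = render_change("21 CFR § 1271.90", "ADDED", "", "<b>new text</b>")
print(fragment)
```

Escaping the regulation text before embedding it keeps stray angle brackets or ampersands in the CFR text from breaking the email layout.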
Technical Architecture
Data Flow
Federal Register API
│
▼
┌──────────────────────────┐
│ Change Detector │ Compares with database
│ (detect new docs) │ Identifies what's new
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ Relevance Filter │ 3-stage hybrid system:
│ (3-stage hybrid) │ 1. Keywords
│ │ 2. Scoring
│ │ 3. AI validation
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ SQLite Database │ Stores relevant docs
│ (persistent storage) │ Tracks relevance scores
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ Document Summarizer │ Claude AI generates
│ (Claude AI) │ actionable summaries
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ Email Notifier │ Sends priority-based
│ (SMTP) │ digest to stakeholders
└──────────────────────────┘
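The "compares with database" step at the top of the flow can be sketched with SQLite's INSERT OR IGNORE, keyed on the Federal Register document number. The table layout and function names are assumptions for illustration, not the project's schema.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a minimal documents table (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               document_number TEXT PRIMARY KEY,
               title TEXT NOT NULL,
               doc_type TEXT,
               relevance TEXT,
               score INTEGER
           )"""
    )


def save_new(conn: sqlite3.Connection, docs: list) -> list:
    """Insert unseen documents and return only those that were new."""
    new_docs = []
    for doc in docs:
        cur = conn.execute(
            "INSERT OR IGNORE INTO documents VALUES (?, ?, ?, ?, ?)",
            (doc["document_number"], doc["title"], doc["doc_type"],
             doc["relevance"], doc["score"]),
        )
        if cur.rowcount == 1:  # 0 means the primary key already existed
            new_docs.append(doc)
    conn.commit()
    return new_docs


conn = sqlite3.connect(":memory:")
init_db(conn)
doc = {
    "document_number": "2025-00001",  # made-up number for the example
    "title": "Example gene therapy rule",
    "doc_type": "RULE",
    "relevance": "HIGH",
    "score": 23,
}
first_run = save_new(conn, [doc])
second_run = save_new(conn, [doc])
print(len(first_run), len(second_run))
```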
Technology Stack
- Language: Python 3.11+
- APIs: Federal Register API (no auth required), eCFR API
- AI: Anthropic Claude Haiku 4.5
- Database: SQLite
- Email: SMTP (Gmail or custom)
- HTTP Client: requests
- Scheduling: Cron or Python schedule (optional)
Development Status
Current Phase: Production Ready ✅
Completed Features
- Federal Register API integration
- 3-stage relevance filtering system
- Change detection and deduplication
- SQLite database with relevance tracking
- AI-powered summarization (Claude Haiku 4.5)
- Email notifications (HTML + plain text)
- Interactive workflow orchestration
- Comprehensive CLI tool
- System testing and validation
- eCFR API integration (monitor regulation text changes)
Roadmap
- Automated scheduling (cron integration)
- Web dashboard for viewing history
- Multi-user support with preferences
- Advanced analytics and trend analysis
Why This Approach?
Advantages over web scraping:
- ✅ Reliable - Official government API, well-maintained
- ✅ Free - No authentication or API keys required
- ✅ Structured - Clean JSON responses, easy to parse
- ✅ Complete - Access to historical data and metadata
- ✅ Fast - No HTML parsing or page rendering
- ✅ Legal - Using official public APIs as intended
Regulatory monitoring best practices:
- Federal Register is the authoritative source for new regulations
- Proposed rules give advance warning before final implementation
- This is where regulatory professionals look first
Last Updated: 2025-11-17
Status: Production Ready
Version: 1.0