FDA Watch & Digest

Automated FDA regulatory monitoring system that uses the Federal Register API and eCFR API to detect regulatory changes and generate AI-powered summaries for quick impact assessment. This project was part of my Internship at Accelerated Biosciences in Taiwan.

Overview

Problem: Manually monitoring FDA regulatory publications and CFR regulation changes is time-consuming and risks missing important updates that could impact business operations.

Solution: Automated tool that monitors both Federal Register announcements and CFR regulation text via official government APIs, detects changes, cross-references them, and provides AI-generated summaries with email notifications.

Key Features

Federal Register Monitoring

Automated Detection: Weekly checks of Federal Register for FDA publications
Intelligent Filtering: 3-stage hybrid system (keywords → scoring → AI review)
Topic Focused: Filters for cell therapy, gene therapy, stem cells, exosomes, secretomes
4-Tier Relevance: HIGH/MEDIUM/LOW/MINIMAL priority classification
AI Summarization: Claude-powered summaries with key points and action items

CFR Regulation Monitoring

Snapshot System: Periodic snapshots of 21 CFR Parts 1271, 600, 610, 312, 601
Change Detection: Automatically detects ADDED, MODIFIED, REMOVED, RENAMED sections
Diff Analysis: Shows before/after text comparison
AI Impact Analysis: Relevance scoring (0-100) with detailed impact assessment
Cross-Referencing: Links CFR changes to Federal Register documents
Timeline Building: Shows regulatory progression (PRORULE → RULE → CFR change)
Classification: Identifies expected vs unexpected changes

Integration & Delivery

Interactive Workflow: Single fda workflow command for complete monitoring
Email Digests: Combined FR documents + CFR changes with HTML rendering
No Web Scraping: Uses official government APIs (Federal Register + eCFR)
Historical Tracking: SQLite database maintains complete regulatory history
Unified CLI: Single command-line tool for all operations

Usage

Primary Command: Interactive Workflow

The recommended way to use FDA Watch & Digest is through the interactive workflow:

python fda.py workflow

What it does:

Step 1: Federal Register Detection
- Asks: “How many days back should I check?” (default: 7 days)
- Fetches documents from Federal Register API
- Applies 3-stage relevance filtering (keywords → scoring → AI review)
- Shows top HIGH priority documents
- Auto-saves to database
Step 2: CFR Regulation Monitoring
- Creates snapshots of monitored CFR parts (1271, 600, 610, 312, 601)
- Detects changes between snapshots
- Runs AI analysis on detected changes
- Reports relevance distribution
Step 3: AI Summarization
- Shows document counts (HIGH + MEDIUM relevance)
- Displays estimated API cost
- Asks: “Generate AI summaries? (Y/n)”
- Processes documents with progress indicator
- Generates 2-paragraph summaries (context + impact, no action items)
Step 4: Email Delivery
- Generates email digest with FR documents + CFR changes
- Shows counts by relevance level
- Asks: “Preview / Send / Skip?”
- Opens preview in browser if requested
- Sends email if confirmed (even if no changes)
Summary: Shows what was accomplished (documents saved, CFR changes detected, summaries generated, email sent)

Quick Mode (for automation/cron):

python fda.py workflow --quick

Uses default 7 days lookback
Auto-summarizes HIGH + MEDIUM documents
Auto-sends email (even if no updates)
No interactive prompts

Individual Commands

For granular control, use individual commands:

# Document detection
python fda.py run --days 60          # Detect and save new documents
 
# Testing & validation
python fda.py filter --days 60       # Test filtering system (with AI)
python fda.py filter --no-ai         # Test without AI review (faster)
python fda.py test                   # Comprehensive system test
python fda.py smtp                   # Test SMTP connection only
 
# AI Summarization
python fda.py summarize              # Summarize HIGH & MEDIUM docs (recommended)
python fda.py summarize --all        # Summarize all relevance levels
python fda.py summarize --limit 5    # Limit to 5 documents
python fda.py summarize --relevance HIGH  # Only HIGH relevance
 
# Email operations
python fda.py preview                # Preview email in browser
python fda.py send                   # Send email digest
 
# Database viewing
python fda.py show                   # View all documents
python fda.py show --relevance HIGH  # Filter by relevance
python fda.py show --type RULE       # Filter by document type
python fda.py show --limit 10        # Limit results
 
# CFR Monitoring (NEW)
python fda.py cfr snapshot           # Create CFR snapshots for all monitored parts
python fda.py cfr snapshot --parts 1271  # Snapshot specific part only
python fda.py cfr detect             # Detect changes between snapshots
python fda.py cfr analyze            # Run AI analysis on detected changes
python fda.py cfr analyze --limit 10 # Analyze up to 10 changes
python fda.py cfr show               # View recent CFR changes
python fda.py cfr show --part 1271   # Show changes for specific part
python fda.py cfr show --relevance HIGH  # Filter by relevance
python fda.py cfr show --verbose     # Show detailed analysis
 
# Help
python fda.py --help                 # Show all commands
python fda.py workflow --help        # Help for specific command

Filtering System

The system uses a 3-stage hybrid approach to identify relevant documents:

Stage 1: Keyword Pre-screening

Searches entire document (title + abstract + body text)
Case-insensitive matching
Keywords: cell therapy, gene therapy, stem cells, exosomes, secretomes, regenerative medicine
Documents without keywords are rejected and not stored

Stage 2: Weighted Relevance Scoring

Title matches: 10 points per keyword
Abstract matches: 5 points per keyword
Body matches: 1 point per keyword
Document type bonuses:
- RULE: +3 points (critical regulations)
- PRORULE: +2 points (proposed changes)
- NOTICE: +0 points

Stage 3: AI Validation (Claude)

Only reviews borderline cases (MEDIUM/LOW relevance)
HIGH relevance documents (≥15 pts) skip AI review (already clearly relevant)
MINIMAL relevance documents (1-4 pts) skip AI review (clearly not relevant)
Validates actual relevance vs. keyword spam

4-Tier Classification

HIGH (≥15 pts): Direct regulatory impact
MEDIUM (10-14 pts): May affect compliance
LOW (5-9 pts): Potentially relevant
MINIMAL (1-4 pts): Brief mentions
REJECTED (0 pts): No keywords - not stored

AI Summarization

Uses Claude Haiku 4.5 to generate concise, regulatory-focused summaries.

Summary Structure

Each AI-generated summary includes:

KEY POINTS: 2-3 bullet points highlighting the most important takeaways
SUMMARY: Exactly 2 paragraphs:
- First paragraph: What the regulation/guidance is about and why it was issued (context and background)
- Second paragraph: Regulatory impact on cell/gene therapy companies, what changes, deadlines, and comment periods Note: Summaries focus on explaining the regulation itself, not prescribing action items. They provide regulatory context and impact assessment, letting recipients determine appropriate responses.

Cost Optimization

Default: Only HIGH and MEDIUM relevance documents are summarized
Rationale: Highest regulatory impact documents require detailed analysis
Estimated cost: ~$0.01-0.03 per summary (Claude Haiku 4.5)
Typical usage: 2-3 summaries per week = ~$0.10-0.30/month

CFR Regulation Monitoring

The system now monitors actual CFR regulation text changes, not just Federal Register announcements.

How It Works

Snapshot Creation (fda cfr snapshot)
- Fetches current text of monitored CFR parts from eCFR API
- Creates SHA-256 hash of content
- Stores in database with date
- Monitored parts: 21 CFR 1271 (HCT/Ps), 600/610 (Biologics), 312 (INDs), 601 (Licensing)
Change Detection (fda cfr detect)
- Compares two most recent snapshots for each part
- Identifies ADDED, MODIFIED, REMOVED, RENAMED sections
- Generates before/after text comparison
- Calculates similarity scores
AI Analysis (fda cfr analyze)
- Relevance scoring (0-100) based on cell/gene therapy impact
- Detailed impact analysis (2,000+ characters)
- Compliance action recommendations (4,000+ characters)
- Classification by relevance level (HIGH/MEDIUM/LOW/MINIMAL)
Cross-Referencing (automatic)
- Links CFR changes to Federal Register documents
- Multi-strategy matching (direct references, keywords, part-level)
- Builds timelines showing regulatory progression
- Classifies as expected (announced in FR) or unexpected (silent change)

CFR Change Data

Each detected CFR change includes:

Citation: e.g., “21 CFR § 1271.90”
Change Type: ADDED, MODIFIED, REMOVED, or RENAMED
Before/After Text: Full text comparison
Diff Summary: Brief description of what changed
Relevance Score: 0-100 (AI-generated)
Impact Analysis: Regulatory implications and affected companies
Compliance Notes: Specific actions required, deadlines
Related Documents: Federal Register documents that announced the change
Timeline: Chronological progression (PRORULE → RULE → CFR change)
Classification: Expected (announced) vs Unexpected (silent)

Email Integration

CFR changes are automatically included in email digests:

Displayed in separate section after Federal Register documents
Color-coded by change type (green=ADDED, yellow=MODIFIED, red=REMOVED)
Collapsible before/after text sections
Expandable AI analysis and compliance notes
Related Federal Register documents listed with confidence levels

Technical Architecture

Data Flow

Federal Register API
        │
        ▼
┌──────────────────────────┐
│   Change Detector        │  Compares with database
│   (detect new docs)      │  Identifies what's new
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│  Relevance Filter        │  3-stage hybrid system:
│  (3-stage hybrid)        │  1. Keywords  
│                          │  2. Scoring
│                          │  3. AI validation
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│   SQLite Database        │  Stores relevant docs
│   (persistent storage)   │  Tracks relevance scores
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│  Document Summarizer     │  Claude AI generates
│  (Claude AI)             │  actionable summaries
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│  Email Notifier          │  Sends priority-based
│  (SMTP)                  │  digest to stakeholders
└──────────────────────────┘

Technology Stack

Language: Python 3.11+
APIs: Federal Register API (no auth required)
AI: Anthropic Claude Haiku 4.5
Database: SQLite
Email: SMTP (Gmail or custom)
HTTP Client: requests
Scheduling: Cron or Python schedule (optional)

Development Status

Current Phase: Production Ready ✅

Completed Features

Federal Register API integration
3-stage relevance filtering system
Change detection and deduplication
SQLite database with relevance tracking
AI-powered summarization (Claude Haiku 4.5)
Email notifications (HTML + plain text)
Interactive workflow orchestration
Comprehensive CLI tool
System testing and validation

Roadmap

Automated scheduling (cron integration)
eCFR API integration (monitor regulation text changes)
Web dashboard for viewing history
Multi-user support with preferences
Advanced analytics and trend analysis

Why This Approach?

Advantages over web scraping:

✅ Reliable - Official government API, well-maintained
✅ Free - No authentication or API keys required
✅ Structured - Clean JSON responses, easy to parse
✅ Complete - Access to historical data and metadata
✅ Fast - No HTML parsing or page rendering
✅ Legal - Using official public APIs as intended

Regulatory monitoring best practices:

Federal Register is the authoritative source for new regulations
Proposed rules give advance warning before final implementation
This is where regulatory professionals look first

Last Updated: 2025-11-17 Status: Production Ready Version: 1.0

index

Franco's Digital Garden

Explorer

FDA Watch & Digest

Overview

Key Features

Federal Register Monitoring

CFR Regulation Monitoring

Integration & Delivery

Usage

Primary Command: Interactive Workflow

Individual Commands

Filtering System

Stage 1: Keyword Pre-screening

Stage 2: Weighted Relevance Scoring

Stage 3: AI Validation (Claude)

4-Tier Classification

AI Summarization

Summary Structure

Cost Optimization

CFR Regulation Monitoring

How It Works

CFR Change Data

Email Integration

Technical Architecture

Data Flow

Technology Stack

Development Status

Completed Features

Roadmap

Why This Approach?

Graph View

Table of Contents

Backlinks