News Domain Completion - Task Implementation Guide
Overview
Complete the final 5% of the news domain by implementing OpenRouter-only LLM sentiment analysis, vector embeddings, and APScheduler job execution. This builds on infrastructure that is already roughly 95% complete, running on a PostgreSQL + TimescaleDB + pgvectorscale stack.
Total Estimated Time: 12-16 hours with AI assistance
Target Completion: 3-4 days
Test Coverage Requirement: Maintain >85%
Architecture Pattern: Database → Entity → Repository → Service → Scheduling
Implementation Phases
Phase 1: Foundation (4-7 hours)
Database and entity layer enhancements for LLM integration
Phase 2: Data Access (2-3 hours)
Repository layer enhancements for vector and job operations
Phase 3: LLM Integration (5-8 hours)
OpenRouter clients and service integration
Phase 4: Scheduling (4-6 hours)
Job scheduling and CLI integration
Phase 5: Validation (3-5 hours)
Testing, documentation, and monitoring
Task Breakdown
Phase 1: Foundation
T001: Database Migration - NewsJobConfig Table
Priority: Critical | Duration: 1-2 hours | Dependencies: None
Description: Create database migration for news job configurations table with proper indexes
Acceptance Criteria:
- news_job_configs table created with UUID primary key
- JSONB fields for symbols and categories with validation
- Proper indexes for enabled/frequency queries
- Migration script tests with rollback capability
Implementation Details:
```python
# Migration structure
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.sql import func
from alembic import op

def upgrade():
    op.create_table(
        'news_job_configs',
        sa.Column('id', postgresql.UUID(), primary_key=True),
        sa.Column('name', sa.String(255), nullable=False),
        sa.Column('symbols', postgresql.JSONB(), nullable=False),
        sa.Column('categories', postgresql.JSONB(), nullable=False),
        sa.Column('frequency_cron', sa.String(100), nullable=False),
        sa.Column('enabled', sa.Boolean(), default=True),
        sa.Column('last_run', sa.DateTime(timezone=True)),
        # server_default (not default) so the database fills timestamps on insert
        sa.Column('created_at', sa.DateTime(timezone=True), server_default=func.now()),
        sa.Column('updated_at', sa.DateTime(timezone=True), server_default=func.now())
    )
    # Indexes
    op.create_index('idx_news_jobs_enabled_frequency', 'news_job_configs',
                    ['enabled', 'frequency_cron'])
    op.create_index('idx_news_jobs_last_run', 'news_job_configs',
                    ['last_run'], postgresql_where=sa.text('enabled = true'))
```
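JSONB columns do not enforce payload shape, so the "JSONB fields ... with validation" criterion has to be met in application code before rows reach this table. A minimal sketch of that check (the helper name `validate_symbols_payload` is illustrative, not an existing function in the codebase):

```python
import json
from typing import List

def validate_symbols_payload(raw: str) -> List[str]:
    """Parse the JSONB 'symbols' payload and reject malformed entries."""
    payload = json.loads(raw)
    if not isinstance(payload, list) or not payload:
        raise ValueError("symbols must be a non-empty JSON array")
    for symbol in payload:
        # Same rule the NewsJobConfig entity enforces: uppercase letters only
        if not (isinstance(symbol, str) and symbol.isalpha() and symbol.isupper()):
            raise ValueError(f"invalid symbol: {symbol!r}")
    return payload
```

The same rule appears again in the NewsJobConfig entity (T003); duplicating it at the storage boundary guards against rows written outside the entity layer.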
Files to Modify:
/Users/martinrichards/code/TradingAgents/tradingagents/data/migrations/add_news_job_configs.py
Test Requirements:
- Migration up/down tests
- Index performance validation
- Constraint validation tests
T002: Enhance NewsArticle Entity - Sentiment and Embeddings
Priority: Critical | Duration: 2-3 hours | Dependencies: T001
Description: Add LLM sentiment fields and embedding validation to NewsArticle entity
Acceptance Criteria:
- Add sentiment_score, sentiment_confidence, sentiment_label fields
- Add title_embedding and content_embedding vector fields
- Enhanced validate() method with sentiment range checks
- Updated transformations for vector handling
- Embedding dimension validation (1536)
Implementation Details:
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class NewsArticle:
    # Existing fields...

    # LLM sentiment fields
    sentiment_score: Optional[float] = None        # [-1.0, 1.0]
    sentiment_confidence: Optional[float] = None   # [0.0, 1.0]
    sentiment_label: Optional[str] = None          # "positive", "negative", "neutral"

    # Vector embedding fields
    title_embedding: Optional[List[float]] = None    # 1536 dimensions
    content_embedding: Optional[List[float]] = None  # 1536 dimensions

    def validate(self) -> Dict[str, List[str]]:
        errors = super().validate()

        # Sentiment validation
        if self.sentiment_score is not None:
            if not -1.0 <= self.sentiment_score <= 1.0:
                errors["sentiment_score"] = ["Must be between -1.0 and 1.0"]
        if self.sentiment_confidence is not None:
            if not 0.0 <= self.sentiment_confidence <= 1.0:
                errors["sentiment_confidence"] = ["Must be between 0.0 and 1.0"]

        # Vector dimension validation ("field_name" avoids shadowing dataclasses.field)
        for field_name, vector in [("title_embedding", self.title_embedding),
                                   ("content_embedding", self.content_embedding)]:
            if vector is not None and len(vector) != 1536:
                errors[field_name] = ["Must be exactly 1536 dimensions"]
        return errors

    def to_record(self) -> Dict[str, Any]:
        record = super().to_record()
        # Convert vectors to pgvector format if present
        if self.title_embedding:
            record["title_embedding"] = self.title_embedding
        if self.content_embedding:
            record["content_embedding"] = self.content_embedding
        return record
```
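When the database driver has no registered codec for the pgvector type, a common approach is to serialize the float list into pgvector's text literal at the to_record boundary. A hypothetical helper (not an existing function in the codebase) sketching that conversion:

```python
from typing import List

def to_pgvector_literal(vector: List[float]) -> str:
    """Serialize a float list into pgvector's text literal, e.g. '[0.1,0.2]'.

    pgvector accepts this form wherever a ::vector cast is applied.
    """
    return "[" + ",".join(repr(float(v)) for v in vector) + "]"
```

If the driver does register a vector codec (asyncpg supports custom type codecs), the raw list can be passed through unchanged and this helper is unnecessary.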
Files to Modify:
/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_article.py
Test Requirements:
- Sentiment validation tests (range checks)
- Vector dimension validation tests
- Transformation method tests
- Business rule violation tests
T003: Create NewsJobConfig Entity
Priority: Critical | Duration: 1-2 hours | Dependencies: T001
Description: Implement NewsJobConfig entity for scheduled job management
Acceptance Criteria:
- NewsJobConfig dataclass with all required fields
- Business rule validation for job configuration
- Cron expression validation for frequency
- Symbol list validation
- JSON serialization for database storage
Implementation Details:
```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional
from uuid import UUID

@dataclass
class NewsJobConfig:
    id: Optional[UUID] = None
    name: str = ""
    symbols: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    frequency_cron: str = ""
    enabled: bool = True
    last_run: Optional[datetime] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

    def validate(self) -> Dict[str, List[str]]:
        errors: Dict[str, List[str]] = {}

        # Name validation
        if not self.name or len(self.name) > 255:
            errors["name"] = ["Name required and must be <= 255 characters"]

        # Symbol validation (append so one bad symbol doesn't overwrite another)
        if not self.symbols:
            errors["symbols"] = ["At least one symbol required"]
        for symbol in self.symbols:
            if not symbol.isupper() or not symbol.isalpha():
                errors.setdefault("symbols", []).append(
                    f"Invalid symbol {symbol!r}: symbols must be uppercase letters only")

        # Cron validation
        try:
            from croniter import croniter
            if not croniter.is_valid(self.frequency_cron):
                errors["frequency_cron"] = ["Invalid cron expression"]
        except ImportError:
            # Fallback validation for simple intervals
            if self.frequency_cron not in ["hourly", "daily", "weekly"]:
                errors["frequency_cron"] = ["Invalid frequency"]
        return errors
```
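The "JSON serialization for database storage" criterion is not shown above. A sketch of the to_record/from_record pair the repository layer would rely on, using a trimmed copy of the dataclass purely for illustration (the real entity carries more fields):

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import UUID, uuid4

@dataclass
class NewsJobConfig:
    id: Optional[UUID] = None
    name: str = ""
    symbols: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    frequency_cron: str = ""
    enabled: bool = True

    def to_record(self) -> Dict:
        """Shape the entity for the JSONB-backed news_job_configs table."""
        return {
            "id": str(self.id or uuid4()),
            "name": self.name,
            "symbols": json.dumps(self.symbols),        # JSONB column
            "categories": json.dumps(self.categories),  # JSONB column
            "frequency_cron": self.frequency_cron,
            "enabled": self.enabled,
        }

    @classmethod
    def from_record(cls, record: Dict) -> "NewsJobConfig":
        return cls(
            id=UUID(record["id"]),
            name=record["name"],
            symbols=json.loads(record["symbols"]),
            categories=json.loads(record["categories"]),
            frequency_cron=record["frequency_cron"],
            enabled=record["enabled"],
        )
```

The key property is that `from_record(cfg.to_record())` round-trips losslessly, which the serialization/deserialization tests listed below can assert directly.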
Files to Create:
/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_job_config.py
Test Requirements:
- Job configuration validation tests
- Schedule parsing tests
- Symbol validation tests
- Serialization/deserialization tests
Phase 2: Data Access
T004: Enhance NewsRepository - Vector and Job Operations
Priority: Critical | Duration: 2-3 hours | Dependencies: T002, T003
Description: Add vector similarity search and NewsJobConfig CRUD operations
Acceptance Criteria:
- Vector similarity search with cosine distance
- Batch embedding update operations
- NewsJobConfig CRUD methods
- Optimized query performance for vector operations
- Proper async connection handling
Implementation Details:
```python
import json
from typing import List
from uuid import uuid4

class NewsRepository:
    # Existing methods...

    async def find_similar_articles(self,
                                    embedding: List[float],
                                    limit: int = 10,
                                    threshold: float = 0.8) -> List[NewsArticle]:
        """Find articles similar to a given embedding using cosine distance."""
        # asyncpg-style $n placeholders; $1 can be reused for the same value
        query = """
            SELECT *, 1 - (title_embedding <=> $1::vector) AS similarity
            FROM news_articles
            WHERE title_embedding IS NOT NULL
              AND 1 - (title_embedding <=> $1::vector) > $2
            ORDER BY title_embedding <=> $1::vector
            LIMIT $3
        """
        async with self._get_connection() as conn:
            rows = await conn.fetch(query, embedding, threshold, limit)
            return [NewsArticle.from_record(dict(row)) for row in rows]

    async def batch_update_embeddings(self, articles: List[NewsArticle]) -> None:
        """Efficiently update embeddings for multiple articles."""
        if not articles:
            return
        query = """
            UPDATE news_articles
            SET title_embedding = $1, content_embedding = $2, updated_at = now()
            WHERE id = $3
        """
        async with self._get_connection() as conn:
            await conn.executemany(query, [
                (article.title_embedding, article.content_embedding, article.id)
                for article in articles
                if article.id and (article.title_embedding or article.content_embedding)
            ])

    # NewsJobConfig CRUD operations
    async def create_job_config(self, config: NewsJobConfig) -> NewsJobConfig:
        """Create a new job configuration."""
        query = """
            INSERT INTO news_job_configs (id, name, symbols, categories, frequency_cron, enabled)
            VALUES ($1, $2, $3, $4, $5, $6)
            RETURNING *
        """
        config.id = config.id or uuid4()
        async with self._get_connection() as conn:
            row = await conn.fetchrow(query,
                                      config.id, config.name, json.dumps(config.symbols),
                                      json.dumps(config.categories),
                                      config.frequency_cron, config.enabled)
            return NewsJobConfig.from_record(dict(row))

    async def get_active_job_configs(self) -> List[NewsJobConfig]:
        """Get all enabled job configurations."""
        query = "SELECT * FROM news_job_configs WHERE enabled = true"
        async with self._get_connection() as conn:
            rows = await conn.fetch(query)
            return [NewsJobConfig.from_record(dict(row)) for row in rows]
```
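For readers less familiar with pgvector's operators: `<=>` computes cosine distance, so the query's `1 - (title_embedding <=> $1::vector)` is cosine similarity, and ordering by `<=>` ascending returns the most similar rows first. A pure-Python sketch of the same quantities, useful for writing the mock-data tests below:

```python
import math
from typing import List

def cosine_distance(a: List[float], b: List[float]) -> float:
    """The quantity pgvector's <=> operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def similarity(a: List[float], b: List[float]) -> float:
    """The 'similarity' column the repository query exposes."""
    return 1.0 - cosine_distance(a, b)
```

Identical vectors yield similarity 1.0 and orthogonal vectors 0.0, which is why the default `threshold=0.8` reads as "fairly close in direction".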
Files to Modify:
/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/repositories/news_repository.py
Test Requirements:
- Vector similarity search tests with mock data
- Batch operation performance tests
- Job config CRUD tests
- Database connection pooling tests
Phase 3: LLM Integration
T005: OpenRouter Client - Sentiment Analysis
Priority: Critical | Duration: 2-3 hours | Dependencies: T002
Description: Implement OpenRouter client for LLM sentiment analysis
Acceptance Criteria:
- OpenRouter API integration for sentiment analysis
- Structured prompts for financial news sentiment
- Response parsing with Pydantic models
- Error handling with graceful fallbacks
- Retry logic with exponential backoff
Implementation Details:
```python
import logging

import aiohttp

logger = logging.getLogger(__name__)

class OpenRouterSentimentClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = config.quick_think_llm
        self.base_url = "https://openrouter.ai/api/v1"

    async def analyze_sentiment(self, title: str, content: str) -> SentimentResult:
        """Analyze the sentiment of a news article."""
        prompt = f"""
        Analyze the sentiment of this financial news article:

        Title: {title}
        Content: {content[:1000]}...

        Provide sentiment analysis as JSON:
        {{
            "score": float between -1.0 (very negative) and 1.0 (very positive),
            "confidence": float between 0.0 and 1.0,
            "label": "positive" | "negative" | "neutral",
            "reasoning": "brief explanation"
        }}
        """
        try:
            async with aiohttp.ClientSession() as session:
                response = await self._make_request(session, prompt)
                return self._parse_sentiment_response(response)
        except Exception as e:
            logger.warning(f"LLM sentiment analysis failed: {e}")
            return self._fallback_sentiment(title, content)

    def _fallback_sentiment(self, title: str, content: str) -> SentimentResult:
        """Simple keyword-based sentiment as a fallback."""
        positive_words = ["gain", "profit", "up", "growth", "buy"]
        negative_words = ["loss", "down", "decline", "sell", "drop"]
        text = (title + " " + content).lower()
        pos_count = sum(word in text for word in positive_words)
        neg_count = sum(word in text for word in negative_words)
        if pos_count > neg_count:
            return SentimentResult(score=0.3, confidence=0.5, label="positive")
        elif neg_count > pos_count:
            return SentimentResult(score=-0.3, confidence=0.5, label="negative")
        return SentimentResult(score=0.0, confidence=0.5, label="neutral")
```
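The `SentimentResult` model and `_parse_sentiment_response` helper are assumed above but not shown. A sketch of both, under the assumption that the LLM reply may wrap the JSON in prose or markdown fences and may return out-of-range values that should be clamped rather than rejected:

```python
import json
import re
from dataclasses import dataclass

@dataclass
class SentimentResult:
    score: float        # [-1.0, 1.0]
    confidence: float   # [0.0, 1.0]
    label: str          # "positive" | "negative" | "neutral"
    reasoning: str = ""

def parse_sentiment_response(text: str) -> SentimentResult:
    """Extract the JSON object from an LLM reply and clamp values into range."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON object in LLM response")
    data = json.loads(match.group(0))
    return SentimentResult(
        score=max(-1.0, min(1.0, float(data["score"]))),
        confidence=max(0.0, min(1.0, float(data["confidence"]))),
        label=str(data.get("label", "neutral")),
        reasoning=str(data.get("reasoning", "")),
    )
```

Raising on a missing JSON object (instead of guessing) lets `analyze_sentiment`'s existing except-branch route the article to `_fallback_sentiment`.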
Files to Create:
/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_sentiment_client.py
Test Requirements:
- Sentiment analysis API tests with VCR
- Error handling tests
- Response parsing tests
- Fallback mechanism tests
T006: OpenRouter Client - Vector Embeddings
Priority: Critical | Duration: 1-2 hours | Dependencies: T002
Description: Implement OpenRouter client for vector embeddings generation
Acceptance Criteria:
- OpenRouter embeddings API integration
- Text preprocessing for embedding generation
- Batch processing for multiple articles
- 1536-dimensional vector validation
- Proper error handling and retries
Implementation Details:
```python
import logging
from typing import List, Tuple

import aiohttp

logger = logging.getLogger(__name__)

class OpenRouterEmbeddingsClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = "openai/text-embedding-ada-002"  # Via OpenRouter
        self.base_url = "https://openrouter.ai/api/v1"

    async def generate_embeddings(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for multiple texts."""
        if not texts:
            return []
        try:
            async with aiohttp.ClientSession() as session:
                response = await self._make_embeddings_request(session, texts)
                embeddings = self._parse_embeddings_response(response)
                # Validate dimensions
                for i, embedding in enumerate(embeddings):
                    if len(embedding) != 1536:
                        raise ValueError(
                            f"Invalid embedding dimension at index {i}: {len(embedding)}")
                return embeddings
        except Exception as e:
            logger.error(f"Embeddings generation failed: {e}")
            # Return zero vectors as fallback
            return [[0.0] * 1536 for _ in texts]

    async def generate_article_embeddings(self, article: NewsArticle) -> Tuple[List[float], List[float]]:
        """Generate embeddings for an article's title and content."""
        texts = []
        # Prepare texts for embedding
        if article.title:
            texts.append(self._preprocess_text(article.title))
        if article.summary:
            # Combine title and summary for a comprehensive content embedding
            combined_text = f"{article.title} {article.summary}"
            texts.append(self._preprocess_text(combined_text))
        if not texts:
            return [0.0] * 1536, [0.0] * 1536
        embeddings = await self.generate_embeddings(texts)
        title_embedding = embeddings[0] if len(embeddings) > 0 else [0.0] * 1536
        content_embedding = embeddings[1] if len(embeddings) > 1 else [0.0] * 1536
        return title_embedding, content_embedding

    def _preprocess_text(self, text: str) -> str:
        """Preprocess text for optimal embedding generation."""
        # Collapse whitespace and limit length
        cleaned = " ".join(text.split())
        return cleaned[:8000]  # Stay under the embedding model's input limit
```
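The acceptance criteria call for retries with exponential backoff, which the client above does not yet show. A minimal stdlib-only sketch of the retry wrapper both OpenRouter clients could share (the name `with_retries` is illustrative; production code would typically also add jitter and retry only on transient error types):

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def with_retries(coro_factory: Callable[[], Awaitable[T]],
                       max_attempts: int = 3,
                       base_delay: float = 0.5) -> T:
    """Await coro_factory(), retrying with exponential backoff on failure.

    Delays follow base_delay * 2**(attempt - 1); the last failure re-raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts:
                raise
            await asyncio.sleep(base_delay * (2 ** (attempt - 1)))
    raise RuntimeError("unreachable")  # satisfies type checkers
```

A call site would then read `await with_retries(lambda: self._make_embeddings_request(session, texts))`, keeping the zero-vector fallback for the case where all attempts fail.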
Files to Create:
/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_embeddings_client.py
Test Requirements:
- Embeddings API tests with VCR
- Batch processing tests
- Vector dimension validation tests
- Text preprocessing tests
T007: Enhance NewsService - LLM Integration
Priority: Critical | Duration: 2-3 hours | Dependencies: T005, T006
Description: Integrate OpenRouter LLM clients into NewsService workflow
Acceptance Criteria:
- Replace keyword sentiment with LLM analysis
- Add embedding generation to article processing
- End-to-end article processing pipeline
- Proper error handling and fallback strategies
- Integration with existing service methods
Implementation Details:
```python
class NewsService:
    def __init__(self,
                 repository: NewsRepository,
                 config: TradingAgentsConfig):
        self.repository = repository
        self.config = config
        self.sentiment_client = OpenRouterSentimentClient(config)
        self.embeddings_client = OpenRouterEmbeddingsClient(config)

    async def process_articles_with_llm(self, articles: List[NewsArticle]) -> List[NewsArticle]:
        """Process articles with LLM sentiment analysis and embeddings."""
        processed_articles = []
        for article in articles:
            try:
                # Generate sentiment analysis
                sentiment_result = await self.sentiment_client.analyze_sentiment(
                    article.title, article.summary or ""
                )
                # Generate embeddings
                title_embedding, content_embedding = \
                    await self.embeddings_client.generate_article_embeddings(article)

                # Update article with LLM results
                article.sentiment_score = sentiment_result.score
                article.sentiment_confidence = sentiment_result.confidence
                article.sentiment_label = sentiment_result.label
                article.title_embedding = title_embedding
                article.content_embedding = content_embedding
                processed_articles.append(article)
            except Exception as e:
                logger.warning(f"Failed to process article {article.id}: {e}")
                # Keep the article without LLM enrichment
                processed_articles.append(article)
        return processed_articles

    async def collect_and_process_news(self, symbols: List[str]) -> List[NewsArticle]:
        """Complete pipeline: collect → process → store with LLM analysis."""
        # Collect raw articles (existing functionality)
        raw_articles = await self.collect_news_articles(symbols)

        # Process with LLM
        processed_articles = await self.process_articles_with_llm(raw_articles)

        # Store processed articles
        stored_articles = []
        for article in processed_articles:
            stored_article = await self.repository.create_article(article)
            stored_articles.append(stored_article)

        # Batch update embeddings for efficiency
        articles_with_embeddings = [a for a in stored_articles
                                    if a.title_embedding or a.content_embedding]
        if articles_with_embeddings:
            await self.repository.batch_update_embeddings(articles_with_embeddings)
        return stored_articles
```
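`process_articles_with_llm` awaits each article sequentially, so a large batch pays full API latency per article. If throughput becomes a concern, the loop body can be fanned out with bounded concurrency; a stdlib sketch of the pattern (the helper name `process_concurrently` and the limit of 5 are illustrative choices, not values from the codebase):

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

async def process_concurrently(items: List[T],
                               worker: Callable[[T], Awaitable[R]],
                               max_concurrency: int = 5) -> List[R]:
    """Run worker(item) for every item, at most max_concurrency at a time.

    Results come back in input order, matching the sequential loop's behavior.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(item: T) -> R:
        async with sem:
            return await worker(item)

    return await asyncio.gather(*(bounded(item) for item in items))
```

The per-article try/except from the service method would move into the worker so one failing article still cannot sink the batch, and the semaphore keeps the OpenRouter rate limit in view.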
Files to Modify:
/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/news_service.py
Test Requirements:
- Integration tests with mocked LLM clients
- Article processing pipeline tests
- Error handling and fallback tests
- Performance tests for batch operations
Phase 4: Scheduling
T008: APScheduler Integration - Job Scheduling
Priority: High | Duration: 3-4 hours | Dependencies: T003, T004, T007
Description: Implement scheduled news collection using APScheduler
Acceptance Criteria:
- APScheduler setup with PostgreSQL job store
- Scheduled job execution with proper error handling
- Job configuration loading and validation
- Status monitoring and failure recovery
- CLI integration for job management
Implementation Details:
```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)

class ScheduledNewsCollector:
    def __init__(self,
                 news_service: NewsService,
                 repository: NewsRepository,
                 config: TradingAgentsConfig):
        self.news_service = news_service
        self.repository = repository
        self.config = config
        self.scheduler = None

    async def initialize_scheduler(self):
        """Initialize APScheduler with a PostgreSQL job store."""
        from apscheduler.schedulers.asyncio import AsyncIOScheduler
        from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

        jobstore = SQLAlchemyJobStore(url=self.config.database_url,
                                      tablename='apscheduler_jobs')
        self.scheduler = AsyncIOScheduler()
        self.scheduler.add_jobstore(jobstore, 'default')

    async def load_job_configurations(self):
        """Load and schedule all active job configurations."""
        job_configs = await self.repository.get_active_job_configs()
        for config in job_configs:
            try:
                await self._schedule_job(config)
            except Exception as e:
                logger.error(f"Failed to schedule job {config.name}: {e}")

    async def _schedule_job(self, job_config: NewsJobConfig):
        """Schedule a single job configuration."""
        from apscheduler.triggers.cron import CronTrigger

        job_id = f"news_collection_{job_config.id}"
        # Remove existing job if present
        if self.scheduler.get_job(job_id):
            self.scheduler.remove_job(job_id)
        # Add new job
        trigger = CronTrigger.from_crontab(job_config.frequency_cron)
        self.scheduler.add_job(
            self._execute_news_collection,
            trigger=trigger,
            id=job_id,
            args=[job_config],
            name=f"News collection: {job_config.name}",
            replace_existing=True
        )

    async def _execute_news_collection(self, job_config: NewsJobConfig):
        """Execute news collection for a job configuration."""
        try:
            logger.info(f"Starting news collection job: {job_config.name}")
            # Collect and process news
            articles = await self.news_service.collect_and_process_news(job_config.symbols)
            # Update the job's last-run timestamp
            job_config.last_run = datetime.now(timezone.utc)
            await self.repository.update_job_config(job_config)
            logger.info(f"Completed news collection job: {job_config.name}, "
                        f"collected {len(articles)} articles")
        except Exception as e:
            logger.error(f"News collection job failed: {job_config.name}, error: {e}")
            # Could implement notification/alerting here

    async def start_scheduler(self):
        """Start the scheduler."""
        if not self.scheduler:
            await self.initialize_scheduler()
        await self.load_job_configurations()
        self.scheduler.start()
        logger.info("News collection scheduler started")

    async def stop_scheduler(self):
        """Stop the scheduler."""
        if self.scheduler:
            self.scheduler.shutdown(wait=True)
            logger.info("News collection scheduler stopped")
```
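One gap worth noting: the NewsJobConfig entity's fallback validation accepts the simple frequencies "hourly", "daily", and "weekly", but `CronTrigger.from_crontab` only accepts crontab strings, so those values would fail at scheduling time. A small translation step closes the gap; the helper name `resolve_cron` and the chosen crontab equivalents are illustrative:

```python
# Illustrative mapping from the simple frequencies the entity accepts
# to crontab strings CronTrigger.from_crontab understands.
SIMPLE_FREQUENCIES = {
    "hourly": "0 * * * *",   # top of every hour
    "daily": "0 0 * * *",    # midnight every day
    "weekly": "0 0 * * 0",   # midnight every Sunday
}

def resolve_cron(frequency: str) -> str:
    """Translate a simple frequency name; pass real cron expressions through."""
    return SIMPLE_FREQUENCIES.get(frequency, frequency)
```

`_schedule_job` would then build its trigger from `resolve_cron(job_config.frequency_cron)` instead of the raw field.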
Files to Create:
/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/scheduled_news_collector.py
Test Requirements:
- Job scheduling tests with test scheduler
- Job execution tests with mocked dependencies
- Error handling and retry tests
- Job configuration validation tests
T009: CLI Integration - Job Management Commands
Priority: Medium | Duration: 1-2 hours | Dependencies: T008
Description: Add CLI commands for news job management and manual execution
Acceptance Criteria:
- CLI commands for job creation/management
- Manual job execution commands
- Job status and monitoring commands
- Integration with existing CLI structure
- Proper error handling and user feedback
Implementation Details:
```python
# Add to cli/commands/news_commands.py
# Note: click does not run coroutine callbacks natively, so these async
# commands must be bridged to asyncio (e.g. via asyncio.run in a wrapper).

@click.group()
def news():
    """News domain management commands"""
    pass

@news.group()
def job():
    """Job management commands"""
    pass

@job.command()
@click.option('--name', required=True, help='Job name')
@click.option('--symbols', required=True, help='Comma-separated stock symbols')
@click.option('--frequency', required=True, help='Cron expression or simple frequency')
@click.option('--categories', help='Comma-separated news categories')
async def create(name: str, symbols: str, frequency: str, categories: str):
    """Create a new news collection job"""
    try:
        symbol_list = [s.strip().upper() for s in symbols.split(',')]
        category_list = [c.strip() for c in categories.split(',')] if categories else []

        config = NewsJobConfig(
            name=name,
            symbols=symbol_list,
            categories=category_list,
            frequency_cron=frequency,
            enabled=True
        )
        # Validate configuration
        errors = config.validate()
        if errors:
            click.echo(f"❌ Invalid configuration: {errors}")
            return

        # Create job
        repository = NewsRepository(get_database_config())
        created_config = await repository.create_job_config(config)
        click.echo(f"✅ Created job: {created_config.name} (ID: {created_config.id})")
    except Exception as e:
        click.echo(f"❌ Failed to create job: {e}")

@job.command()
async def list():
    """List all job configurations"""
    try:
        repository = NewsRepository(get_database_config())
        configs = await repository.get_all_job_configs()
        if not configs:
            click.echo("No jobs configured")
            return

        click.echo("\n📋 News Collection Jobs:")
        click.echo("=" * 60)
        for config in configs:
            status = "🟢 Enabled" if config.enabled else "🔴 Disabled"
            last_run = config.last_run.strftime("%Y-%m-%d %H:%M") if config.last_run else "Never"
            click.echo(f"{config.name}")
            click.echo(f"  Status: {status}")
            click.echo(f"  Symbols: {', '.join(config.symbols)}")
            click.echo(f"  Schedule: {config.frequency_cron}")
            click.echo(f"  Last Run: {last_run}")
            click.echo()
    except Exception as e:
        click.echo(f"❌ Failed to list jobs: {e}")

@job.command()
@click.argument('job_id', type=str)
async def run(job_id: str):
    """Manually execute a job"""
    try:
        repository = NewsRepository(get_database_config())
        config = await repository.get_job_config(UUID(job_id))
        if not config:
            click.echo(f"❌ Job not found: {job_id}")
            return

        click.echo(f"🚀 Running job: {config.name}")
        # Execute job
        service = NewsService(repository, get_trading_config())
        articles = await service.collect_and_process_news(config.symbols)
        click.echo(f"✅ Completed: collected {len(articles)} articles")
    except Exception as e:
        click.echo(f"❌ Job execution failed: {e}")
```
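Because click invokes command callbacks synchronously, the async commands above need a small bridge that runs each coroutine to completion. A common sketch (the decorator name `coro` is a convention, not a click API), applied between `@job.command()` and the `async def`:

```python
import asyncio
import functools

def coro(f):
    """Wrap an async click callback so click can invoke it synchronously.

    Usage:
        @job.command()
        @coro
        async def list(): ...
    """
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return asyncio.run(f(*args, **kwargs))
    return wrapper
```

Each invocation gets a fresh event loop via `asyncio.run`, which is fine for one-shot CLI commands; a long-lived process would instead share a loop.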
Files to Modify:
/Users/martinrichards/code/TradingAgents/cli/commands/news_commands.py
Test Requirements:
- CLI command tests with mocked services
- User input validation tests
- Output formatting tests
Phase 5: Validation
T010: Integration Tests - End-to-End Workflow
Priority: High | Duration: 2-3 hours | Dependencies: T007, T008
Description: Comprehensive integration tests for complete news domain workflow
Acceptance Criteria:
- End-to-end workflow tests from RSS to vector storage
- Agent integration tests via AgentToolkit
- Performance tests for daily collection volumes
- Error recovery and fallback tests
- Test coverage maintained above 85%
Implementation Details:
```python
# tests/domains/news/integration/test_news_workflow.py

class TestNewsWorkflowIntegration:

    @pytest.mark.asyncio
    async def test_complete_news_processing_pipeline(self, test_db, mock_openrouter):
        """Test the complete pipeline from RSS to vector storage."""
        # Setup
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)

        # Mock OpenRouter responses
        mock_openrouter.sentiment_response = {
            "score": 0.7,
            "confidence": 0.85,
            "label": "positive"
        }
        mock_openrouter.embeddings_response = [[0.1] * 1536]

        # Execute pipeline
        articles = await service.collect_and_process_news(["AAPL"])

        # Verify results
        assert len(articles) > 0
        assert all(a.sentiment_score is not None for a in articles)
        assert all(a.title_embedding is not None for a in articles)

        # Verify database storage
        stored_articles = await repository.get_articles_by_symbol("AAPL")
        assert len(stored_articles) == len(articles)

        # Test vector similarity search
        similar = await repository.find_similar_articles(
            articles[0].title_embedding, limit=5
        )
        assert len(similar) > 0

    @pytest.mark.asyncio
    async def test_agent_toolkit_integration(self, test_db):
        """Test integration with AgentToolkit for RAG queries."""
        from tradingagents.agents.libs.toolkit import AgentToolkit

        # Setup with real data
        toolkit = AgentToolkit(test_db)

        # Test news context retrieval
        context = await toolkit.get_news_context("AAPL", days=7)
        assert "articles" in context
        assert "sentiment_summary" in context

        # Test vector similarity for context
        similar_context = await toolkit.get_similar_news(
            "Apple earnings beat expectations", limit=5
        )
        assert len(similar_context) <= 5

    @pytest.mark.asyncio
    async def test_scheduler_integration(self, test_db):
        """Test APScheduler integration with job management."""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)
        scheduler = ScheduledNewsCollector(service, repository, config)

        # Create test job configuration
        job_config = NewsJobConfig(
            name="test_job",
            symbols=["AAPL"],
            frequency_cron="0 */6 * * *",  # Every 6 hours
            enabled=True
        )
        await repository.create_job_config(job_config)

        # Test scheduler initialization
        await scheduler.initialize_scheduler()
        await scheduler.load_job_configurations()

        # Verify the job was scheduled
        assert scheduler.scheduler.get_job(f"news_collection_{job_config.id}") is not None

        # Test manual job execution
        await scheduler._execute_news_collection(job_config)

        # Verify execution updated last_run
        updated_config = await repository.get_job_config(job_config.id)
        assert updated_config.last_run is not None

    @pytest.mark.asyncio
    async def test_error_recovery_and_fallbacks(self, test_db):
        """Test error handling and fallback mechanisms."""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)

        # Test with a failing LLM client
        with patch.object(service.sentiment_client, 'analyze_sentiment',
                          side_effect=Exception("API Error")):
            articles = await service.collect_and_process_news(["AAPL"])
            # Should still process articles via the fallback path
            assert len(articles) > 0
            # Should have fallback sentiment values
            assert any(a.sentiment_score is not None for a in articles)

    @pytest.mark.asyncio
    async def test_performance_benchmarks(self, test_db):
        """Test that performance meets requirements."""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)

        # Create test articles with embeddings
        test_articles = await self._create_test_articles_with_embeddings(repository, count=1000)

        # Test query performance (< 100 ms requirement)
        start_time = time.time()
        articles = await repository.get_recent_articles_by_symbol("AAPL", days=30)
        query_time = (time.time() - start_time) * 1000
        assert query_time < 100, f"Query took {query_time}ms, should be < 100ms"

        # Test vector similarity performance (< 1 s requirement)
        start_time = time.time()
        similar = await repository.find_similar_articles(
            test_articles[0].title_embedding, limit=10
        )
        vector_time = (time.time() - start_time) * 1000
        assert vector_time < 1000, f"Vector search took {vector_time}ms, should be < 1s"
```
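The `mock_openrouter` fixture used in the pipeline test is assumed but not defined here. One way to sketch it is a small stub object with settable responses, registered as a pytest fixture in `conftest.py` (the class name and attribute names mirror the test's usage; everything else is an illustrative assumption):

```python
import asyncio  # used when driving the stub outside an event loop

class MockOpenRouter:
    """Minimal stand-in for both OpenRouter clients in integration tests."""

    def __init__(self):
        # Defaults; tests override these attributes per scenario
        self.sentiment_response = {"score": 0.0, "confidence": 0.0, "label": "neutral"}
        self.embeddings_response = [[0.0] * 1536]

    async def analyze_sentiment(self, title, content):
        return self.sentiment_response

    async def generate_embeddings(self, texts):
        # One configured vector per input text
        return self.embeddings_response * len(texts)

# In conftest.py the fixture would be roughly:
# @pytest.fixture
# def mock_openrouter(monkeypatch):
#     stub = MockOpenRouter()
#     ...patch the real clients onto stub's methods...
#     return stub
```

Because the stub is attribute-driven, each test can state its expected LLM behavior inline, keeping the integration tests deterministic without VCR cassettes.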
Files to Create:
/Users/martinrichards/code/TradingAgents/tests/domains/news/integration/test_news_workflow.py
Test Requirements:
- Full workflow integration tests
- AgentToolkit integration tests
- Performance benchmark tests
- Error scenario tests
T011: Documentation and Monitoring
Priority: Medium | Duration: 1-2 hours | Dependencies: T010
Description: Update documentation and add monitoring for new functionality
Acceptance Criteria:
- Updated API documentation for new methods
- Job scheduling configuration examples
- Performance monitoring dashboard queries
- Troubleshooting guide for common issues
- Agent integration documentation
Files to Modify:
/Users/martinrichards/code/TradingAgents/docs/domains/news.md
/Users/martinrichards/code/TradingAgents/docs/api-reference.md
Test Requirements:
- Documentation accuracy validation
- Configuration example testing
Parallel Development Opportunities
AI Agent Collaboration Points
Tasks T005 & T006 can be developed in parallel:
- Both are independent OpenRouter client implementations
- Different LLM capabilities (sentiment vs embeddings)
- Can be tested independently with VCR cassettes
Phase 1 Tasks (T001, T002, T003) have minimal dependencies:
- T002 and T003 both depend on T001 but can be developed simultaneously
- Entity layer changes are independent of each other
Critical Path Analysis
Critical Path: T001 → T002/T003 → T004 → T005/T006 → T007 → T008
Parallel Opportunities:
- Foundation Phase: T002 + T003 (after T001)
- LLM Integration: T005 + T006 (after T002)
- Testing: Unit tests alongside implementation
Risk Mitigation Strategies
LLM API Dependencies:
- Implement comprehensive fallback strategies
- Use VCR for deterministic testing
- Mock clients for unit tests
Database Performance:
- Test with realistic data volumes
- Monitor query performance during development
- Use proper indexes for vector operations
Integration Complexity:
- Build incrementally with testing at each step
- Maintain backward compatibility
- Use feature flags for gradual rollout
Success Metrics
Technical Metrics:
- Test coverage >85% maintained
- Query performance <100ms
- Vector search performance <1s
- Zero breaking changes to AgentToolkit
Functional Metrics:
- Successful OpenRouter-only LLM integration
- Scheduled jobs executing reliably
- Agent context enriched with sentiment and similarity
Quality Metrics:
- All acceptance criteria met
- Comprehensive error handling
- Production-ready monitoring and documentation
Implementation Guidelines
TDD Approach
Every task follows: Write test → Write code → Refactor
Layered Architecture Pattern
Strict adherence to: Database → Entity → Repository → Service → Scheduling
Error Handling Strategy
Graceful fallbacks for all LLM API dependencies
Performance Requirements
Async operations with proper connection pooling throughout
Testing Strategy
Unit tests + Integration tests + VCR for external API calls
This comprehensive task breakdown provides clear implementation guidance for completing the final 5% of the news domain while maintaining architectural consistency and leveraging AI-assisted development patterns.