# News Domain Completion - Task Implementation Guide

## Overview

Complete the final 5% of the news domain by implementing OpenRouter-only LLM sentiment analysis, vector embeddings, and APScheduler job execution. This builds on the 95% complete infrastructure running on the PostgreSQL + TimescaleDB + pgvectorscale stack.

- **Total Estimated Time**: 12-16 hours with AI assistance
- **Target Completion**: 3-4 days
- **Test Coverage Requirement**: Maintain >85%
- **Architecture Pattern**: Database → Entity → Repository → Service → Scheduling

## Implementation Phases

### Phase 1: Foundation (4-7 hours)
Database and entity layer enhancements for LLM integration

### Phase 2: Data Access (2-3 hours)
Repository layer enhancements for vector and job operations

### Phase 3: LLM Integration (5-8 hours)
OpenRouter clients and service integration

### Phase 4: Scheduling (4-6 hours)
Job scheduling and CLI integration

### Phase 5: Validation (3-5 hours)
Testing, documentation, and monitoring

---

## Task Breakdown

### Phase 1: Foundation

#### T001: Database Migration - NewsJobConfig Table
**Priority**: Critical | **Duration**: 1-2 hours | **Dependencies**: None

**Description**: Create the database migration for the news job configurations table, including its indexes

**Acceptance Criteria**:
- [ ] `news_job_configs` table created with UUID primary key
- [ ] JSONB fields for symbols and categories with validation
- [ ] Proper indexes for enabled/frequency queries
- [ ] Migration script tests with rollback capability

**Implementation Details**:
```python
# Migration structure
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql


def upgrade():
    op.create_table(
        'news_job_configs',
        sa.Column('id', postgresql.UUID(as_uuid=True), primary_key=True),
        sa.Column('name', sa.String(255), nullable=False),
        sa.Column('symbols', postgresql.JSONB(), nullable=False),
        sa.Column('categories', postgresql.JSONB(), nullable=False),
        sa.Column('frequency_cron', sa.String(100), nullable=False),
        # server_default (not default) so the defaults are part of the table DDL
        sa.Column('enabled', sa.Boolean(), server_default=sa.text('true')),
        sa.Column('last_run', sa.DateTime(timezone=True)),
        sa.Column('created_at', sa.DateTime(timezone=True), server_default=sa.func.now()),
        sa.Column('updated_at', sa.DateTime(timezone=True), server_default=sa.func.now()),
    )

    # Indexes
    op.create_index('idx_news_jobs_enabled_frequency', 'news_job_configs',
                    ['enabled', 'frequency_cron'])
    op.create_index('idx_news_jobs_last_run', 'news_job_configs', ['last_run'],
                    postgresql_where=sa.text('enabled = true'))


def downgrade():
    # Rollback path required by the acceptance criteria
    op.drop_index('idx_news_jobs_last_run', table_name='news_job_configs')
    op.drop_index('idx_news_jobs_enabled_frequency', table_name='news_job_configs')
    op.drop_table('news_job_configs')
```

**Files to Modify**:
- `/Users/martinrichards/code/TradingAgents/tradingagents/data/migrations/add_news_job_configs.py`

**Test Requirements**:
- Migration up/down tests
- Index performance validation
- Constraint validation tests

---

#### T002: Enhance NewsArticle Entity - Sentiment and Embeddings
**Priority**: Critical | **Duration**: 2-3 hours | **Dependencies**: T001

**Description**: Add LLM sentiment fields and embedding validation to the NewsArticle entity

**Acceptance Criteria**:
- [ ] Add `sentiment_score`, `sentiment_confidence`, `sentiment_label` fields
- [ ] Add `title_embedding` and `content_embedding` vector fields
- [ ] Enhanced `validate()` method with sentiment range checks
- [ ] Updated transformations for vector handling
- [ ] Embedding dimension validation (1536)

**Implementation Details**:
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

EMBEDDING_DIM = 1536


@dataclass
class NewsArticle:
    # Existing fields...

    # LLM sentiment fields
    sentiment_score: Optional[float] = None       # [-1.0, 1.0]
    sentiment_confidence: Optional[float] = None  # [0.0, 1.0]
    sentiment_label: Optional[str] = None         # "positive", "negative", "neutral"

    # Vector embedding fields
    title_embedding: Optional[List[float]] = None    # 1536 dimensions
    content_embedding: Optional[List[float]] = None  # 1536 dimensions

    def validate(self) -> Dict[str, List[str]]:
        # ...existing field validation populates `errors`...
        errors: Dict[str, List[str]] = {}

        # Sentiment validation
        if self.sentiment_score is not None:
            if not -1.0 <= self.sentiment_score <= 1.0:
                errors["sentiment_score"] = ["Must be between -1.0 and 1.0"]

        if self.sentiment_confidence is not None:
            if not 0.0 <= self.sentiment_confidence <= 1.0:
                errors["sentiment_confidence"] = ["Must be between 0.0 and 1.0"]

        # Vector dimension validation
        for name, vector in [("title_embedding", self.title_embedding),
                             ("content_embedding", self.content_embedding)]:
            if vector is not None and len(vector) != EMBEDDING_DIM:
                errors[name] = [f"Must be exactly {EMBEDDING_DIM} dimensions"]

        return errors

    def to_record(self) -> Dict[str, Any]:
        # ...existing field serialization populates `record`...
        record: Dict[str, Any] = {}

        # Pass vectors through as float lists when present; conversion to the
        # pgvector type is left to the database driver
        if self.title_embedding is not None:
            record["title_embedding"] = self.title_embedding
        if self.content_embedding is not None:
            record["content_embedding"] = self.content_embedding

        return record
```

**Files to Modify**:
- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_article.py`

**Test Requirements**:
- Sentiment validation tests (range checks)
- Vector dimension validation tests
- Transformation method tests
- Business rule violation tests

---

#### T003: Create NewsJobConfig Entity
**Priority**: Critical | **Duration**: 1-2 hours | **Dependencies**: T001

**Description**: Implement the NewsJobConfig entity for scheduled job management

**Acceptance Criteria**:
- [ ] NewsJobConfig dataclass with all required fields
- [ ] Business rule validation for job configuration
- [ ] Cron expression validation for frequency
- [ ] Symbol list validation
- [ ] JSON
  serialization for database storage

**Implementation Details**:
```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional
from uuid import UUID


@dataclass
class NewsJobConfig:
    id: Optional[UUID] = None
    name: str = ""
    symbols: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    frequency_cron: str = ""
    enabled: bool = True
    last_run: Optional[datetime] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

    def validate(self) -> Dict[str, List[str]]:
        errors: Dict[str, List[str]] = {}

        # Name validation
        if not self.name or len(self.name) > 255:
            errors["name"] = ["Name required and must be <= 255 characters"]

        # Symbol validation
        if not self.symbols:
            errors["symbols"] = ["At least one symbol required"]
        elif any(not (s.isalpha() and s.isupper()) for s in self.symbols):
            errors["symbols"] = ["Symbols must be uppercase letters only"]

        # Cron validation
        try:
            from croniter import croniter
            if not croniter.is_valid(self.frequency_cron):
                errors["frequency_cron"] = ["Invalid cron expression"]
        except ImportError:
            # Fallback validation for simple intervals. These values are not
            # crontab strings, so they would need mapping to cron before
            # reaching CronTrigger.from_crontab in T008
            if self.frequency_cron not in ["hourly", "daily", "weekly"]:
                errors["frequency_cron"] = ["Invalid frequency"]

        return errors
```

**Files to Create**:
- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_job_config.py`

**Test Requirements**:
- Job configuration validation tests
- Schedule parsing tests
- Symbol validation tests
- Serialization/deserialization tests

---

### Phase 2: Data Access

#### T004: Enhance NewsRepository - Vector and Job Operations
**Priority**: Critical | **Duration**: 2-3 hours | **Dependencies**: T002, T003

**Description**: Add vector similarity search and NewsJobConfig CRUD operations

**Acceptance Criteria**:
- [ ] Vector similarity search with cosine distance
- [ ] Batch embedding update operations
- [ ] NewsJobConfig CRUD methods
- [ ] Optimized query performance for vector operations
- [ ] Proper async connection handling

**Implementation Details**:
```python
import json
from typing import List
from uuid import uuid4


class NewsRepository:
    # Existing methods...

    # The conn.fetch / fetchrow / executemany calls below follow asyncpg's API,
    # so the queries use its positional placeholders ($1, $2, ...). Passing
    # Python lists as vectors assumes pgvector's asyncpg codec is registered on
    # the connection (pgvector.asyncpg.register_vector).

    async def find_similar_articles(self, embedding: List[float], limit: int = 10,
                                    threshold: float = 0.8) -> List[NewsArticle]:
        """Find articles similar to the given embedding using cosine distance"""
        query = """
            SELECT *, 1 - (title_embedding <=> $1::vector) AS similarity
            FROM news_articles
            WHERE title_embedding IS NOT NULL
              AND 1 - (title_embedding <=> $1::vector) > $2
            ORDER BY title_embedding <=> $1::vector
            LIMIT $3
        """

        async with self._get_connection() as conn:
            rows = await conn.fetch(query, embedding, threshold, limit)
            return [NewsArticle.from_record(dict(row)) for row in rows]

    async def batch_update_embeddings(self, articles: List[NewsArticle]) -> None:
        """Efficiently update embeddings for multiple articles"""
        if not articles:
            return

        # COALESCE keeps a stored embedding when only the other one is supplied
        query = """
            UPDATE news_articles
            SET title_embedding = COALESCE($1, title_embedding),
                content_embedding = COALESCE($2, content_embedding),
                updated_at = now()
            WHERE id = $3
        """

        async with self._get_connection() as conn:
            await conn.executemany(query, [
                (article.title_embedding, article.content_embedding, article.id)
                for article in articles
                if article.id and (article.title_embedding or article.content_embedding)
            ])

    # NewsJobConfig CRUD operations
    # (update_job_config, get_job_config and get_all_job_configs, used by
    # T008 and T009, are not shown here)
    async def create_job_config(self, config: NewsJobConfig) -> NewsJobConfig:
        """Create new job configuration"""
        query = """
            INSERT INTO news_job_configs (id, name, symbols, categories, frequency_cron, enabled)
            VALUES ($1, $2, $3, $4, $5, $6)
            RETURNING *
        """

        config.id = config.id or uuid4()

        async with self._get_connection() as conn:
            row = await conn.fetchrow(query, config.id, config.name,
                                      json.dumps(config.symbols),
                                      json.dumps(config.categories),
                                      config.frequency_cron, config.enabled)
            return NewsJobConfig.from_record(dict(row))

    async def get_active_job_configs(self) -> List[NewsJobConfig]:
        """Get all enabled job configurations"""
        query = "SELECT * FROM news_job_configs WHERE enabled = true"

        async with self._get_connection() as conn:
            rows = await conn.fetch(query)
            return [NewsJobConfig.from_record(dict(row)) for row in rows]
```

**Files to Modify**:
- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/repositories/news_repository.py`

**Test Requirements**:
- Vector similarity search tests with mock data
- Batch operation performance tests
- Job config CRUD tests
- Database connection pooling tests

---

### Phase 3: LLM Integration

#### T005: OpenRouter Client - Sentiment Analysis
**Priority**: Critical | **Duration**: 2-3 hours | **Dependencies**: T002

**Description**: Implement the OpenRouter client for LLM sentiment analysis

**Acceptance Criteria**:
- [ ] OpenRouter API integration for sentiment analysis
- [ ] Structured prompts for financial news sentiment
- [ ] Response parsing with Pydantic models
- [ ] Error handling with graceful fallbacks
- [ ] Retry logic with exponential backoff

**Implementation Details**:
```python
import logging
import re
from typing import Optional

import aiohttp
from pydantic import BaseModel

logger = logging.getLogger(__name__)


class SentimentResult(BaseModel):
    # Shape inferred from the JSON schema requested in the prompt below
    score: float
    confidence: float
    label: str
    reasoning: Optional[str] = None


class OpenRouterSentimentClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = config.quick_think_llm
        self.base_url = "https://openrouter.ai/api/v1"

    async def analyze_sentiment(self, title: str, content: str) -> SentimentResult:
        """Analyze sentiment of news article"""
        prompt = f"""
        Analyze the sentiment of this financial news article:

        Title: {title}
        Content: {content[:1000]}...

        Provide sentiment analysis as JSON:
        {{
            "score": float between -1.0 (very negative) and 1.0 (very positive),
            "confidence": float between 0.0 and 1.0,
            "label": "positive" | "negative" | "neutral",
            "reasoning": "brief explanation"
        }}
        """

        try:
            # _make_request and _parse_sentiment_response are not shown here
            async with aiohttp.ClientSession() as session:
                response = await self._make_request(session, prompt)
                return self._parse_sentiment_response(response)
        except Exception as e:
            logger.warning(f"LLM sentiment analysis failed: {e}")
            return self._fallback_sentiment(title, content)

    def _fallback_sentiment(self, title: str, content: str) -> SentimentResult:
        """Keyword-based fallback sentiment analysis"""
        positive_words = ["gain", "profit", "up", "growth", "buy"]
        negative_words = ["loss", "down", "decline", "sell", "drop"]

        # Match whole words so that "up" does not match "supply" or "group"
        tokens = set(re.findall(r"[a-z]+", f"{title} {content}".lower()))
        pos_count = sum(word in tokens for word in positive_words)
        neg_count = sum(word in tokens for word in negative_words)

        if pos_count > neg_count:
            return SentimentResult(score=0.3, confidence=0.5, label="positive")
        elif neg_count > pos_count:
            return SentimentResult(score=-0.3, confidence=0.5, label="negative")
        else:
            return SentimentResult(score=0.0, confidence=0.5, label="neutral")
```

**Files to Create**:
- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_sentiment_client.py`

**Test Requirements**:
- Sentiment analysis API tests with VCR
- Error handling tests
- Response parsing tests
- Fallback mechanism tests

---

#### T006: OpenRouter Client - Vector Embeddings
**Priority**: Critical | **Duration**: 1-2 hours | **Dependencies**: T002

**Description**: Implement the OpenRouter client for vector embedding generation

**Acceptance Criteria**:
- [ ] OpenRouter embeddings API integration
- [ ] Text preprocessing for embedding generation
- [ ] Batch processing for multiple articles
- [ ] 1536-dimensional vector validation
- [ ] Proper error handling and retries

**Implementation Details**:
```python
import logging
from typing import List, Optional, Tuple

import aiohttp

logger = logging.getLogger(__name__)

EMBEDDING_DIM = 1536


class OpenRouterEmbeddingsClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = "openai/text-embedding-ada-002"  # Via OpenRouter
        self.base_url = "https://openrouter.ai/api/v1"

    async def generate_embeddings(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for multiple texts"""
        if not texts:
            return []

        try:
            # _make_embeddings_request and _parse_embeddings_response are not shown here
            async with aiohttp.ClientSession() as session:
                response = await self._make_embeddings_request(session, texts)
                embeddings = self._parse_embeddings_response(response)

            # Validate dimensions
            for i, embedding in enumerate(embeddings):
                if len(embedding) != EMBEDDING_DIM:
                    raise ValueError(f"Invalid embedding dimension at index {i}: {len(embedding)}")

            return embeddings

        except Exception as e:
            logger.error(f"Embeddings generation failed: {e}")
            # Return zero vectors as fallback. Cosine similarity is undefined
            # for a zero vector, so these should be kept out of similarity search
            return [[0.0] * EMBEDDING_DIM for _ in texts]

    async def generate_article_embeddings(self, article: NewsArticle) -> Tuple[List[float], List[float]]:
        """Generate embeddings for article title and content"""
        # Prepare texts for embedding, tracking which ones are present so a
        # missing title cannot shift the content vector into the title slot
        title_text: Optional[str] = None
        content_text: Optional[str] = None
        if article.title:
            title_text = self._preprocess_text(article.title)
        if article.summary:
            # Combine title and summary for comprehensive embedding
            content_text = self._preprocess_text(f"{article.title or ''} {article.summary}")

        texts = [t for t in (title_text, content_text) if t is not None]
        if not texts:
            return [0.0] * EMBEDDING_DIM, [0.0] * EMBEDDING_DIM

        embeddings = iter(await self.generate_embeddings(texts))
        title_embedding = next(embeddings) if title_text is not None else [0.0] * EMBEDDING_DIM
        content_embedding = next(embeddings) if content_text is not None else [0.0] * EMBEDDING_DIM

        return title_embedding, content_embedding

    def _preprocess_text(self, text: str) -> str:
        """Preprocess text for optimal embedding generation"""
        # Remove extra whitespace and limit length
        cleaned = " ".join(text.split())
        # Character cap; the model's own input limit is counted in tokens
        return cleaned[:8000]
```

**Files to Create**:
- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_embeddings_client.py`

**Test Requirements**:

-
  Embeddings API tests with VCR
- Batch processing tests
- Vector dimension validation tests
- Text preprocessing tests

---

#### T007: Enhance NewsService - LLM Integration
**Priority**: Critical | **Duration**: 2-3 hours | **Dependencies**: T005, T006

**Description**: Integrate the OpenRouter LLM clients into the NewsService workflow

**Acceptance Criteria**:
- [ ] Replace keyword sentiment with LLM analysis
- [ ] Add embedding generation to article processing
- [ ] End-to-end article processing pipeline
- [ ] Proper error handling and fallback strategies
- [ ] Integration with existing service methods

**Implementation Details**:
```python
import logging
from typing import List

logger = logging.getLogger(__name__)


class NewsService:
    def __init__(self, repository: NewsRepository, config: TradingAgentsConfig):
        self.repository = repository
        self.config = config
        self.sentiment_client = OpenRouterSentimentClient(config)
        self.embeddings_client = OpenRouterEmbeddingsClient(config)

    async def process_articles_with_llm(self, articles: List[NewsArticle]) -> List[NewsArticle]:
        """Process articles with LLM sentiment analysis and embeddings"""
        processed_articles = []

        for article in articles:
            try:
                # Generate sentiment analysis
                sentiment_result = await self.sentiment_client.analyze_sentiment(
                    article.title, article.summary or ""
                )

                # Generate embeddings
                title_embedding, content_embedding = (
                    await self.embeddings_client.generate_article_embeddings(article)
                )

                # Update article with LLM results
                article.sentiment_score = sentiment_result.score
                article.sentiment_confidence = sentiment_result.confidence
                article.sentiment_label = sentiment_result.label
                article.title_embedding = title_embedding
                article.content_embedding = content_embedding

                processed_articles.append(article)

            except Exception as e:
                logger.warning(f"Failed to process article {article.id}: {e}")
                # Add article without LLM processing
                processed_articles.append(article)

        return processed_articles

    async def collect_and_process_news(self, symbols: List[str]) -> List[NewsArticle]:
        """Complete pipeline: collect → process → store with LLM analysis"""
        # Collect raw articles (existing functionality)
        raw_articles = await self.collect_news_articles(symbols)

        # Process with LLM
        processed_articles = await self.process_articles_with_llm(raw_articles)

        # Store processed articles
        stored_articles = []
        for article in processed_articles:
            stored_article = await self.repository.create_article(article)
            stored_articles.append(stored_article)

        # Batch update embeddings for efficiency
        articles_with_embeddings = [
            a for a in stored_articles if a.title_embedding or a.content_embedding
        ]
        if articles_with_embeddings:
            await self.repository.batch_update_embeddings(articles_with_embeddings)

        return stored_articles
```

**Files to Modify**:
- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/news_service.py`

**Test Requirements**:
- Integration tests with mocked LLM clients
- Article processing pipeline tests
- Error handling and fallback tests
- Performance tests for batch operations

---

### Phase 4: Scheduling

#### T008: APScheduler Integration - Job Scheduling
**Priority**: High | **Duration**: 3-4 hours | **Dependencies**: T003, T004, T007

**Description**: Implement scheduled news collection using APScheduler

**Acceptance Criteria**:
- [ ] APScheduler setup with PostgreSQL job store
- [ ] Scheduled job execution with proper error handling
- [ ] Job configuration loading and validation
- [ ] Status monitoring and failure recovery
- [ ] CLI integration for job management

**Implementation Details**:
```python
import logging
from datetime import datetime, timezone

from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger

logger = logging.getLogger(__name__)


class ScheduledNewsCollector:
    def __init__(self, news_service: NewsService, repository: NewsRepository,
                 config: TradingAgentsConfig):
        self.news_service = news_service
        self.repository = repository
        self.config = config
        self.scheduler = None

    async def initialize_scheduler(self):
        """Initialize APScheduler with PostgreSQL job store"""
        # SQLAlchemyJobStore creates a synchronous engine, so database_url must
        # use a sync driver rather than an async one
        jobstore = SQLAlchemyJobStore(url=self.config.database_url,
                                      tablename='apscheduler_jobs')

        self.scheduler = AsyncIOScheduler()
        self.scheduler.add_jobstore(jobstore, 'default')

    async def load_job_configurations(self):
        """Load and schedule all active job configurations"""
        job_configs = await self.repository.get_active_job_configs()

        for config in job_configs:
            try:
                await self._schedule_job(config)
            except Exception as e:
                logger.error(f"Failed to schedule job {config.name}: {e}")

    async def _schedule_job(self, job_config: NewsJobConfig):
        """Schedule a single job configuration"""
        job_id = f"news_collection_{job_config.id}"

        # Remove existing job if present
        if self.scheduler.get_job(job_id):
            self.scheduler.remove_job(job_id)

        # Add new job. A persistent job store serializes the job's arguments,
        # so everything passed in `args` must be picklable
        trigger = CronTrigger.from_crontab(job_config.frequency_cron)
        self.scheduler.add_job(
            self._execute_news_collection,
            trigger=trigger,
            id=job_id,
            args=[job_config],
            name=f"News collection: {job_config.name}",
            replace_existing=True
        )

    async def _execute_news_collection(self, job_config: NewsJobConfig):
        """Execute news collection for a job configuration"""
        try:
            logger.info(f"Starting news collection job: {job_config.name}")

            # Collect and process news
            articles = await self.news_service.collect_and_process_news(job_config.symbols)

            # Update job last run timestamp
            job_config.last_run = datetime.now(timezone.utc)
            await self.repository.update_job_config(job_config)

            logger.info(f"Completed news collection job: {job_config.name}, "
                        f"collected {len(articles)} articles")

        except Exception as e:
            logger.error(f"News collection job failed: {job_config.name}, error: {e}")
            # Could implement notification/alerting here

    async def start_scheduler(self):
        """Start the scheduler"""
        if not self.scheduler:
            await self.initialize_scheduler()

        await self.load_job_configurations()
        self.scheduler.start()
        logger.info("News collection scheduler started")

    async def stop_scheduler(self):
        """Stop the scheduler"""
        if self.scheduler:
            self.scheduler.shutdown(wait=True)
            logger.info("News collection scheduler stopped")
```

**Files to Create**:
- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/scheduled_news_collector.py`

**Test Requirements**:
- Job scheduling tests with test scheduler
- Job execution tests with mocked dependencies
- Error handling and retry tests
- Job configuration validation tests

---

#### T009: CLI Integration - Job Management Commands
**Priority**: Medium | **Duration**: 1-2 hours | **Dependencies**: T008

**Description**: Add CLI commands for news job management and manual execution

**Acceptance Criteria**:
- [ ] CLI commands for job creation/management
- [ ] Manual job execution commands
- [ ] Job status and monitoring commands
- [ ] Integration with existing CLI structure
- [ ] Proper error handling and user feedback

**Implementation Details**:
```python
# Add to cli/commands/news_commands.py
import asyncio
import functools
from typing import Optional
from uuid import UUID

import click


def run_async(f):
    """Run an async command body to completion; click does not await coroutines."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return asyncio.run(f(*args, **kwargs))
    return wrapper


@click.group()
def news():
    """News domain management commands"""
    pass


@news.group()
def job():
    """Job management commands"""
    pass


@job.command()
@click.option('--name', required=True, help='Job name')
@click.option('--symbols', required=True, help='Comma-separated stock symbols')
@click.option('--frequency', required=True, help='Cron expression or simple frequency')
@click.option('--categories', help='Comma-separated news categories')
@run_async
async def create(name: str, symbols: str, frequency: str, categories: Optional[str]):
    """Create a new news collection job"""
    try:
        symbol_list = [s.strip().upper() for s in symbols.split(',')]
        category_list = [c.strip() for c in categories.split(',')] if categories else []

        config = NewsJobConfig(
            name=name,
            symbols=symbol_list,
            categories=category_list,
            frequency_cron=frequency,
            enabled=True
        )

        # Validate configuration
        errors = config.validate()
        if errors:
            click.echo(f"❌ Invalid configuration: {errors}")
            return

        # Create job
        repository = NewsRepository(get_database_config())
        created_config = await repository.create_job_config(config)

        click.echo(f"✅ Created job: {created_config.name} (ID: {created_config.id})")

    except Exception as e:
        click.echo(f"❌ Failed to create job: {e}")


# Registered as "list" without shadowing the built-in list type
@job.command(name="list")
@run_async
async def list_jobs():
    """List all job configurations"""
    try:
        repository = NewsRepository(get_database_config())
        configs = await repository.get_all_job_configs()

        if not configs:
            click.echo("No jobs configured")
            return

        click.echo("\n📋 News Collection Jobs:")
        click.echo("=" * 60)

        for config in configs:
            status = "🟢 Enabled" if config.enabled else "🔴 Disabled"
            last_run = config.last_run.strftime("%Y-%m-%d %H:%M") if config.last_run else "Never"

            click.echo(f"{config.name}")
            click.echo(f"  Status: {status}")
            click.echo(f"  Symbols: {', '.join(config.symbols)}")
            click.echo(f"  Schedule: {config.frequency_cron}")
            click.echo(f"  Last Run: {last_run}")
            click.echo()

    except Exception as e:
        click.echo(f"❌ Failed to list jobs: {e}")


@job.command()
@click.argument('job_id', type=str)
@run_async
async def run(job_id: str):
    """Manually execute a job"""
    try:
        repository = NewsRepository(get_database_config())
        config = await repository.get_job_config(UUID(job_id))

        if not config:
            click.echo(f"❌ Job not found: {job_id}")
            return

        click.echo(f"🚀 Running job: {config.name}")

        # Execute job
        service = NewsService(repository, get_trading_config())
        articles = await service.collect_and_process_news(config.symbols)

        click.echo(f"✅ Completed: collected {len(articles)} articles")

    except Exception as e:
        click.echo(f"❌ Job execution failed: {e}")
```

**Files to Modify**:
- `/Users/martinrichards/code/TradingAgents/cli/commands/news_commands.py`

**Test Requirements**:
- CLI command tests with mocked services
- User input validation tests
- Output formatting tests

---

### Phase 5: Validation

#### T010: Integration Tests - End-to-End Workflow
**Priority**: High | **Duration**: 2-3 hours | **Dependencies**: T007, T008

**Description**: Comprehensive integration tests for the complete news domain workflow

**Acceptance Criteria**:
- [ ]
  End-to-end workflow tests from RSS to vector storage
- [ ] Agent integration tests via AgentToolkit
- [ ] Performance tests for daily collection volumes
- [ ] Error recovery and fallback tests
- [ ] Test coverage maintained above 85%

**Implementation Details**:
```python
# tests/domains/news/integration/test_news_workflow.py
import time
from unittest.mock import patch

import pytest


class TestNewsWorkflowIntegration:

    @pytest.mark.asyncio
    async def test_complete_news_processing_pipeline(self, test_db, mock_openrouter):
        """Test complete pipeline from RSS to vector storage"""
        # Setup
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)

        # Mock OpenRouter responses
        mock_openrouter.sentiment_response = {
            "score": 0.7, "confidence": 0.85, "label": "positive"
        }
        mock_openrouter.embeddings_response = [[0.1] * 1536]

        # Execute pipeline
        articles = await service.collect_and_process_news(["AAPL"])

        # Verify results
        assert len(articles) > 0
        assert all(a.sentiment_score is not None for a in articles)
        assert all(a.title_embedding is not None for a in articles)

        # Verify database storage
        stored_articles = await repository.get_articles_by_symbol("AAPL")
        assert len(stored_articles) == len(articles)

        # Test vector similarity search
        similar = await repository.find_similar_articles(
            articles[0].title_embedding, limit=5
        )
        assert len(similar) > 0

    @pytest.mark.asyncio
    async def test_agent_toolkit_integration(self, test_db):
        """Test integration with AgentToolkit for RAG queries"""
        from tradingagents.agents.libs.toolkit import AgentToolkit

        # Setup with real data
        toolkit = AgentToolkit(test_db)

        # Test news context retrieval
        context = await toolkit.get_news_context("AAPL", days=7)
        assert "articles" in context
        assert "sentiment_summary" in context

        # Test vector similarity for context
        similar_context = await toolkit.get_similar_news(
            "Apple earnings beat expectations", limit=5
        )
        assert len(similar_context) <= 5

    @pytest.mark.asyncio
    async def test_scheduler_integration(self, test_db):
        """Test APScheduler integration with job management"""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)
        scheduler = ScheduledNewsCollector(service, repository, config)

        # Create test job configuration
        job_config = NewsJobConfig(
            name="test_job",
            symbols=["AAPL"],
            frequency_cron="0 */6 * * *",  # Every 6 hours
            enabled=True
        )
        await repository.create_job_config(job_config)

        # Test scheduler initialization
        await scheduler.initialize_scheduler()
        await scheduler.load_job_configurations()

        # Verify job was scheduled
        assert scheduler.scheduler.get_job(f"news_collection_{job_config.id}") is not None

        # Test manual job execution
        await scheduler._execute_news_collection(job_config)

        # Verify execution updated last_run
        updated_config = await repository.get_job_config(job_config.id)
        assert updated_config.last_run is not None

    @pytest.mark.asyncio
    async def test_error_recovery_and_fallbacks(self, test_db):
        """Test error handling and fallback mechanisms"""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)

        # Fail the underlying API call so the client's keyword fallback runs.
        # Patching analyze_sentiment itself would bypass that fallback.
        with patch.object(service.sentiment_client, '_make_request',
                          side_effect=Exception("API Error")):
            articles = await service.collect_and_process_news(["AAPL"])

        # Should still process articles with fallback
        assert len(articles) > 0
        # Should have fallback sentiment values
        assert any(a.sentiment_score is not None for a in articles)

    @pytest.mark.asyncio
    async def test_performance_benchmarks(self, test_db):
        """Test performance meets requirements"""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)

        # Create test articles with embeddings
        test_articles = await self._create_test_articles_with_embeddings(repository, count=1000)

        # Test query performance (< 100ms requirement)
        start_time = time.perf_counter()
        articles = await repository.get_recent_articles_by_symbol("AAPL", days=30)
        query_time = (time.perf_counter() - start_time) * 1000

        assert query_time < 100, f"Query took {query_time}ms, should be < 100ms"

        # Test vector similarity performance (< 1s requirement)
        start_time = time.perf_counter()
        similar = await repository.find_similar_articles(
            test_articles[0].title_embedding, limit=10
        )
        vector_time = (time.perf_counter() - start_time) * 1000

        assert vector_time < 1000, f"Vector search took {vector_time}ms, should be < 1s"
```

**Files to Create**:
- `/Users/martinrichards/code/TradingAgents/tests/domains/news/integration/test_news_workflow.py`

**Test Requirements**:
- Full workflow integration tests
- AgentToolkit integration tests
- Performance benchmark tests
- Error scenario tests

---

#### T011: Documentation and Monitoring
**Priority**: Medium | **Duration**: 1-2 hours | **Dependencies**: T010

**Description**: Update documentation and add monitoring for the new functionality

**Acceptance Criteria**:
- [ ] Updated API documentation for new methods
- [ ] Job scheduling configuration examples
- [ ] Performance monitoring dashboard queries
- [ ] Troubleshooting guide for common issues
- [ ] Agent integration documentation

**Files to Modify**:
- `/Users/martinrichards/code/TradingAgents/docs/domains/news.md`
- `/Users/martinrichards/code/TradingAgents/docs/api-reference.md`

**Test Requirements**:
- Documentation accuracy validation
- Configuration example testing

---

## Parallel Development Opportunities

### AI Agent Collaboration Points

**Tasks T005 & T006** can be developed in parallel:
- Both are independent OpenRouter client implementations
- Different LLM capabilities (sentiment vs embeddings)
- Can be tested independently with VCR cassettes

**Phase 1 Tasks (T001, T002, T003)** have minimal dependencies:
- T002 and T003 both depend on T001 but can be developed simultaneously
- Entity layer changes are independent of each other

### Critical Path Analysis

**Critical Path**: T001 → T002/T003 → T004 → T005/T006 → T007 → T008

**Parallel Opportunities**:

1.
   **Foundation Phase**: T002 + T003 (after T001)
2. **LLM Integration**: T005 + T006 (after T002)
3. **Testing**: Unit tests alongside implementation

### Risk Mitigation Strategies

**LLM API Dependencies**:
- Implement comprehensive fallback strategies
- Use VCR for deterministic testing
- Mock clients for unit tests

**Database Performance**:
- Test with realistic data volumes
- Monitor query performance during development
- Use proper indexes for vector operations

**Integration Complexity**:
- Build incrementally with testing at each step
- Maintain backward compatibility
- Use feature flags for gradual rollout

---

## Success Metrics

**Technical Metrics**:
- Test coverage >85% maintained
- Query performance <100ms
- Vector search performance <1s
- Zero breaking changes to AgentToolkit

**Functional Metrics**:
- Successful OpenRouter-only LLM integration
- Scheduled jobs executing reliably
- Agent context enriched with sentiment and similarity

**Quality Metrics**:
- All acceptance criteria met
- Comprehensive error handling
- Production-ready monitoring and documentation

---

## Implementation Guidelines

### TDD Approach
**Every task follows**: Write test → Write code → Refactor

### Layered Architecture Pattern
**Strict adherence to**: Database → Entity → Repository → Service → Scheduling

### Error Handling Strategy
**Graceful fallbacks** for all LLM API dependencies

### Performance Requirements
**Async operations** with proper connection pooling throughout

### Testing Strategy
**Unit tests + Integration tests + VCR** for external API calls

---

This comprehensive task breakdown provides clear implementation guidance for completing the final 5% of the news domain while maintaining architectural consistency and leveraging AI-assisted development patterns.
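
The error handling strategy above (graceful fallbacks for all LLM API dependencies), together with the T005 criterion of retry logic with exponential backoff, could be sketched as one small helper. This is a minimal sketch, not part of the planned module layout: the name `retry_with_backoff` and all of its parameters are hypothetical, and the delay values are placeholders rather than tuned settings.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    fallback: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await `operation` up to `max_attempts` times, then return `fallback()`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    for attempt in range(max_attempts):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts - 1:
                break  # out of attempts, use the fallback below
            # Exponential backoff: base_delay * 2^attempt, capped at max_delay,
            # plus up to 10% random jitter so parallel callers do not retry in lockstep
            delay = min(max_delay, base_delay * (2 ** attempt))
            await sleep(delay + random.uniform(0, delay * 0.1))
    return fallback()
```

Under these assumptions, a client such as `OpenRouterSentimentClient` could pass its request coroutine as `operation` and its keyword-based `_fallback_sentiment` call as `fallback`, so transient API errors are retried before the fallback value is used. The injectable `sleep` parameter exists only to keep unit tests fast and deterministic.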