TradingAgents/docs/specs/news/tasks.md


News Domain Completion - Task Implementation Guide

Overview

Complete the final 5% of the news domain by implementing OpenRouter-only LLM sentiment analysis, vector embeddings, and APScheduler job execution. This builds on infrastructure that is already ~95% complete, backed by a PostgreSQL + TimescaleDB + pgvectorscale stack.

Total Estimated Time: 12-16 hours with AI assistance
Target Completion: 3-4 days
Test Coverage Requirement: Maintain >85%
Architecture Pattern: Database → Entity → Repository → Service → Scheduling

Implementation Phases

Phase 1: Foundation (4-7 hours)

Database and entity layer enhancements for LLM integration

Phase 2: Data Access (2-3 hours)

Repository layer enhancements for vector and job operations

Phase 3: LLM Integration (5-8 hours)

OpenRouter clients and service integration

Phase 4: Scheduling (4-6 hours)

Job scheduling and CLI integration

Phase 5: Validation (3-5 hours)

Testing, documentation, and monitoring


Task Breakdown

Phase 1: Foundation

T001: Database Migration - NewsJobConfig Table

Priority: Critical | Duration: 1-2 hours | Dependencies: None

Description: Create database migration for news job configurations table with proper indexes

Acceptance Criteria:

  • news_job_configs table created with UUID primary key
  • JSONB fields for symbols and categories with validation
  • Proper indexes for enabled/frequency queries
  • Migration script tests with rollback capability

Implementation Details:

# Migration structure (Alembic)
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

def upgrade():
    op.create_table(
        'news_job_configs',
        sa.Column('id', postgresql.UUID(), primary_key=True),
        sa.Column('name', sa.String(255), nullable=False),
        sa.Column('symbols', postgresql.JSONB(), nullable=False),
        sa.Column('categories', postgresql.JSONB(), nullable=False),
        sa.Column('frequency_cron', sa.String(100), nullable=False),
        sa.Column('enabled', sa.Boolean(), server_default=sa.text('true'), nullable=False),
        sa.Column('last_run', sa.DateTime(timezone=True)),
        sa.Column('created_at', sa.DateTime(timezone=True), server_default=sa.func.now()),
        sa.Column('updated_at', sa.DateTime(timezone=True), server_default=sa.func.now())
    )
    
    # Indexes
    op.create_index('idx_news_jobs_enabled_frequency', 'news_job_configs', 
                   ['enabled', 'frequency_cron'])
    op.create_index('idx_news_jobs_last_run', 'news_job_configs', 
                   ['last_run'], postgresql_where=sa.text('enabled = true'))

def downgrade():
    # Rollback capability required by the acceptance criteria
    op.drop_index('idx_news_jobs_last_run', table_name='news_job_configs')
    op.drop_index('idx_news_jobs_enabled_frequency', table_name='news_job_configs')
    op.drop_table('news_job_configs')

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/tradingagents/data/migrations/add_news_job_configs.py

Test Requirements:

  • Migration up/down tests
  • Index performance validation
  • Constraint validation tests

T002: Enhance NewsArticle Entity - Sentiment and Embeddings

Priority: Critical | Duration: 2-3 hours | Dependencies: T001

Description: Add LLM sentiment fields and embedding validation to NewsArticle entity

Acceptance Criteria:

  • Add sentiment_score, sentiment_confidence, sentiment_label fields
  • Add title_embedding and content_embedding vector fields
  • Enhanced validate() method with sentiment range checks
  • Updated transformations for vector handling
  • Embedding dimension validation (1536)

Implementation Details:

@dataclass
class NewsArticle:
    # Existing fields...
    
    # LLM sentiment fields
    sentiment_score: Optional[float] = None  # [-1.0, 1.0]
    sentiment_confidence: Optional[float] = None  # [0.0, 1.0]
    sentiment_label: Optional[str] = None  # "positive", "negative", "neutral"
    
    # Vector embedding fields
    title_embedding: Optional[List[float]] = None  # 1536 dimensions
    content_embedding: Optional[List[float]] = None  # 1536 dimensions
    
    def validate(self) -> Dict[str, List[str]]:
        errors = super().validate()
        
        # Sentiment validation
        if self.sentiment_score is not None:
            if not -1.0 <= self.sentiment_score <= 1.0:
                errors["sentiment_score"] = ["Must be between -1.0 and 1.0"]
        
        if self.sentiment_confidence is not None:
            if not 0.0 <= self.sentiment_confidence <= 1.0:
                errors["sentiment_confidence"] = ["Must be between 0.0 and 1.0"]
        
        # Vector dimension validation
        for field, vector in [("title_embedding", self.title_embedding), 
                             ("content_embedding", self.content_embedding)]:
            if vector is not None and len(vector) != 1536:
                errors[field] = ["Must be exactly 1536 dimensions"]
        
        return errors
    
    def to_record(self) -> Dict[str, Any]:
        record = super().to_record()
        # Convert vectors to pgvector format if present
        if self.title_embedding:
            record["title_embedding"] = self.title_embedding
        if self.content_embedding:
            record["content_embedding"] = self.content_embedding
        return record

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_article.py

Test Requirements:

  • Sentiment validation tests (range checks)
  • Vector dimension validation tests
  • Transformation method tests
  • Business rule violation tests

T003: Create NewsJobConfig Entity

Priority: Critical | Duration: 1-2 hours | Dependencies: T001

Description: Implement NewsJobConfig entity for scheduled job management

Acceptance Criteria:

  • NewsJobConfig dataclass with all required fields
  • Business rule validation for job configuration
  • Cron expression validation for frequency
  • Symbol list validation
  • JSON serialization for database storage

Implementation Details:

@dataclass
class NewsJobConfig:
    id: Optional[UUID] = None
    name: str = ""
    symbols: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)  
    frequency_cron: str = ""
    enabled: bool = True
    last_run: Optional[datetime] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    
    def validate(self) -> Dict[str, List[str]]:
        errors = {}
        
        # Name validation
        if not self.name or len(self.name) > 255:
            errors["name"] = ["Name required and must be <= 255 characters"]
        
        # Symbol validation
        if not self.symbols:
            errors["symbols"] = ["At least one symbol required"]
        else:
            # Collect all invalid symbols instead of overwriting the error
            # list on each loop iteration
            invalid = [s for s in self.symbols if not (s.isupper() and s.isalpha())]
            if invalid:
                errors["symbols"] = [f"Symbols must be uppercase letters only: {invalid}"]
        
        # Cron validation
        try:
            from croniter import croniter
            if not croniter.is_valid(self.frequency_cron):
                errors["frequency_cron"] = ["Invalid cron expression"]
        except ImportError:
            # Fallback validation for simple interval names; these must be
            # normalized to 5-field cron expressions before scheduling, since
            # T008 uses CronTrigger.from_crontab
            if self.frequency_cron not in ["hourly", "daily", "weekly"]:
                errors["frequency_cron"] = ["Invalid frequency"]
        
        return errors
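The JSON serialization criterion above is not covered by the sketch. One minimal, hypothetical `to_record`/`from_record` pair (field and column names assumed to mirror the T001 migration; the class below is a trimmed stand-in for the real `NewsJobConfig`) could look like:

```python
# Hypothetical serialization helpers for NewsJobConfig: symbols/categories
# are JSONB columns, so they round-trip through json.dumps/json.loads.
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List
from uuid import UUID, uuid4

@dataclass
class NewsJobConfig:
    id: UUID = field(default_factory=uuid4)
    name: str = ""
    symbols: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    frequency_cron: str = ""
    enabled: bool = True

    def to_record(self) -> Dict[str, Any]:
        """Flatten to a row dict; list fields become JSON strings for JSONB columns"""
        return {
            "id": str(self.id),
            "name": self.name,
            "symbols": json.dumps(self.symbols),
            "categories": json.dumps(self.categories),
            "frequency_cron": self.frequency_cron,
            "enabled": self.enabled,
        }

    @classmethod
    def from_record(cls, record: Dict[str, Any]) -> "NewsJobConfig":
        """Rebuild from a row dict; JSONB values may come back as str or list"""
        def as_list(value):
            return json.loads(value) if isinstance(value, str) else list(value)
        return cls(
            id=UUID(record["id"]) if isinstance(record["id"], str) else record["id"],
            name=record["name"],
            symbols=as_list(record["symbols"]),
            categories=as_list(record["categories"]),
            frequency_cron=record["frequency_cron"],
            enabled=record["enabled"],
        )

config = NewsJobConfig(name="morning", symbols=["AAPL"], frequency_cron="0 9 * * *")
assert NewsJobConfig.from_record(config.to_record()) == config
```

The round-trip assertion at the end is the property the serialization/deserialization tests should pin down.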

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_job_config.py

Test Requirements:

  • Job configuration validation tests
  • Schedule parsing tests
  • Symbol validation tests
  • Serialization/deserialization tests

Phase 2: Data Access

T004: Enhance NewsRepository - Vector and Job Operations

Priority: Critical | Duration: 2-3 hours | Dependencies: T002, T003

Description: Add vector similarity search and NewsJobConfig CRUD operations

Acceptance Criteria:

  • Vector similarity search with cosine distance
  • Batch embedding update operations
  • NewsJobConfig CRUD methods
  • Optimized query performance for vector operations
  • Proper async connection handling

Implementation Details:

class NewsRepository:
    # Existing methods...
    
    async def find_similar_articles(self, 
                                  embedding: List[float], 
                                  limit: int = 10,
                                  threshold: float = 0.8) -> List[NewsArticle]:
        """Find articles similar to given embedding using cosine distance"""
        # asyncpg-style $n placeholders (conn.fetch/fetchrow are asyncpg APIs);
        # assumes the pgvector codec is registered on the connection
        query = """
        SELECT *, 1 - (title_embedding <=> $1::vector) AS similarity
        FROM news_articles 
        WHERE title_embedding IS NOT NULL
        AND 1 - (title_embedding <=> $1::vector) > $2
        ORDER BY title_embedding <=> $1::vector
        LIMIT $3
        """
        
        async with self._get_connection() as conn:
            rows = await conn.fetch(query, embedding, threshold, limit)
            return [NewsArticle.from_record(dict(row)) for row in rows]
    
    async def batch_update_embeddings(self, 
                                    articles: List[NewsArticle]) -> None:
        """Efficiently update embeddings for multiple articles"""
        if not articles:
            return
        
        query = """
        UPDATE news_articles 
        SET title_embedding = $1, content_embedding = $2, updated_at = now()
        WHERE id = $3
        """
        
        async with self._get_connection() as conn:
            await conn.executemany(query, [
                (article.title_embedding, article.content_embedding, article.id)
                for article in articles
                if article.id and (article.title_embedding or article.content_embedding)
            ])
    
    # NewsJobConfig CRUD operations
    async def create_job_config(self, config: NewsJobConfig) -> NewsJobConfig:
        """Create new job configuration"""
        query = """
        INSERT INTO news_job_configs (id, name, symbols, categories, frequency_cron, enabled)
        VALUES ($1, $2, $3, $4, $5, $6)
        RETURNING *
        """
        
        config.id = config.id or uuid4()
        async with self._get_connection() as conn:
            row = await conn.fetchrow(query, 
                config.id, config.name, json.dumps(config.symbols),
                json.dumps(config.categories), config.frequency_cron, config.enabled)
            return NewsJobConfig.from_record(dict(row))
    
    async def get_active_job_configs(self) -> List[NewsJobConfig]:
        """Get all enabled job configurations"""
        query = "SELECT * FROM news_job_configs WHERE enabled = true"
        async with self._get_connection() as conn:
            rows = await conn.fetch(query)
            return [NewsJobConfig.from_record(dict(row)) for row in rows]
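For reference, pgvector's `<=>` operator returns cosine distance, so the `1 - (a <=> b)` expression in the query above is cosine similarity. A pure-Python sketch of the score being computed:

```python
import math
from typing import List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Mirrors 1 - (a <=> b): dot(a, b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Identical direction -> similarity 1.0; orthogonal -> 0.0
assert abs(cosine_similarity([1.0, 0.0], [2.0, 0.0]) - 1.0) < 1e-9
assert abs(cosine_similarity([1.0, 0.0], [0.0, 3.0])) < 1e-9
```

This is also why the default `threshold=0.8` is a similarity floor, not a distance ceiling.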

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/repositories/news_repository.py

Test Requirements:

  • Vector similarity search tests with mock data
  • Batch operation performance tests
  • Job config CRUD tests
  • Database connection pooling tests

Phase 3: LLM Integration

T005: OpenRouter Client - Sentiment Analysis

Priority: Critical | Duration: 2-3 hours | Dependencies: T002

Description: Implement OpenRouter client for LLM sentiment analysis

Acceptance Criteria:

  • OpenRouter API integration for sentiment analysis
  • Structured prompts for financial news sentiment
  • Response parsing with Pydantic models
  • Error handling with graceful fallbacks
  • Retry logic with exponential backoff

Implementation Details:

class OpenRouterSentimentClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = config.quick_think_llm
        self.base_url = "https://openrouter.ai/api/v1"
        
    async def analyze_sentiment(self, title: str, content: str) -> SentimentResult:
        """Analyze sentiment of news article"""
        prompt = f"""
        Analyze the sentiment of this financial news article:
        
        Title: {title}
        Content: {content[:1000]}...
        
        Provide sentiment analysis as JSON:
        {{
            "score": float between -1.0 (very negative) and 1.0 (very positive),
            "confidence": float between 0.0 and 1.0,
            "label": "positive" | "negative" | "neutral",
            "reasoning": "brief explanation"
        }}
        """
        
        try:
            async with aiohttp.ClientSession() as session:
                response = await self._make_request(session, prompt)
                return self._parse_sentiment_response(response)
        except Exception as e:
            logger.warning(f"LLM sentiment analysis failed: {e}")
            return self._fallback_sentiment(title, content)
    
    def _fallback_sentiment(self, title: str, content: str) -> SentimentResult:
        """Keyword-based fallback sentiment analysis"""
        # Simple keyword-based sentiment as fallback; match whole words so
        # e.g. "up" does not match inside "supply" (requires `import re`)
        positive_words = {"gain", "profit", "up", "growth", "buy"}
        negative_words = {"loss", "down", "decline", "sell", "drop"}
        
        words = re.findall(r"[a-z]+", (title + " " + content).lower())
        pos_count = sum(word in positive_words for word in words)
        neg_count = sum(word in negative_words for word in words)
        
        if pos_count > neg_count:
            return SentimentResult(score=0.3, confidence=0.5, label="positive")
        elif neg_count > pos_count:
            return SentimentResult(score=-0.3, confidence=0.5, label="negative")
        else:
            return SentimentResult(score=0.0, confidence=0.5, label="neutral")
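The retry-with-exponential-backoff criterion above is not shown in the sketch. One minimal pattern (the `with_retries` helper and its parameters are hypothetical, not from the source) wraps the request coroutine with exponentially growing delays plus jitter:

```python
import asyncio
import random

async def with_retries(coro_factory, max_attempts: int = 3, base_delay: float = 0.5):
    """Retry an async callable with exponential backoff plus jitter"""
    for attempt in range(max_attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # base_delay, 2*base_delay, 4*base_delay, ... plus up to 100ms jitter
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: a flaky call that fails twice, then succeeds on the third attempt
attempts = {"n": 0}
async def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

assert asyncio.run(with_retries(flaky, base_delay=0.01)) == "ok"
assert attempts["n"] == 3
```

In the client this would wrap `self._make_request(...)`, with the final exception falling through to `_fallback_sentiment`.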

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_sentiment_client.py

Test Requirements:

  • Sentiment analysis API tests with VCR
  • Error handling tests
  • Response parsing tests
  • Fallback mechanism tests

T006: OpenRouter Client - Vector Embeddings

Priority: Critical | Duration: 1-2 hours | Dependencies: T002

Description: Implement OpenRouter client for vector embeddings generation

Acceptance Criteria:

  • OpenRouter embeddings API integration
  • Text preprocessing for embedding generation
  • Batch processing for multiple articles
  • 1536-dimensional vector validation
  • Proper error handling and retries

Implementation Details:

class OpenRouterEmbeddingsClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = "openai/text-embedding-ada-002"  # Via OpenRouter
        self.base_url = "https://openrouter.ai/api/v1"
        
    async def generate_embeddings(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for multiple texts"""
        if not texts:
            return []
            
        try:
            async with aiohttp.ClientSession() as session:
                response = await self._make_embeddings_request(session, texts)
                embeddings = self._parse_embeddings_response(response)
                
                # Validate dimensions
                for i, embedding in enumerate(embeddings):
                    if len(embedding) != 1536:
                        raise ValueError(f"Invalid embedding dimension at index {i}: {len(embedding)}")
                
                return embeddings
        except Exception as e:
            logger.error(f"Embeddings generation failed: {e}")
            # Return zero vectors as fallback; note zero vectors have no
            # direction, so they should be excluded from similarity search
            return [[0.0] * 1536 for _ in texts]
    
    async def generate_article_embeddings(self, article: NewsArticle) -> Tuple[List[float], List[float]]:
        """Generate embeddings for article title and content"""
        # Always embed both slots so the title/content positions stay aligned
        # even when one of the fields is missing
        title_text = self._preprocess_text(article.title or "")
        # Combine title and summary for a comprehensive content embedding
        content_text = self._preprocess_text(
            f"{article.title or ''} {article.summary or ''}".strip()
        )
        
        if not title_text and not content_text:
            return [0.0] * 1536, [0.0] * 1536
            
        embeddings = await self.generate_embeddings([title_text, content_text])
        title_embedding = embeddings[0] if len(embeddings) > 0 else [0.0] * 1536
        content_embedding = embeddings[1] if len(embeddings) > 1 else [0.0] * 1536
        
        return title_embedding, content_embedding
    
    def _preprocess_text(self, text: str) -> str:
        """Preprocess text for optimal embedding generation"""
        # Remove extra whitespace and limit length
        cleaned = " ".join(text.split())
        return cleaned[:8000]  # OpenAI embedding limit
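Beyond per-text truncation, batch requests for many articles should also be kept under provider request limits. A simple batching helper (the batch size of 64 is an assumption, not a documented OpenRouter limit) keeps each embeddings request bounded:

```python
from typing import List

def batch_texts(texts: List[str], batch_size: int = 64) -> List[List[str]]:
    """Split texts into fixed-size batches for separate embeddings requests"""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

batches = batch_texts([f"article {i}" for i in range(150)], batch_size=64)
assert [len(b) for b in batches] == [64, 64, 22]
```

`generate_embeddings` could iterate these batches and concatenate the results, preserving input order.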

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_embeddings_client.py

Test Requirements:

  • Embeddings API tests with VCR
  • Batch processing tests
  • Vector dimension validation tests
  • Text preprocessing tests

T007: Enhance NewsService - LLM Integration

Priority: Critical | Duration: 2-3 hours | Dependencies: T005, T006

Description: Integrate OpenRouter LLM clients into NewsService workflow

Acceptance Criteria:

  • Replace keyword sentiment with LLM analysis
  • Add embedding generation to article processing
  • End-to-end article processing pipeline
  • Proper error handling and fallback strategies
  • Integration with existing service methods

Implementation Details:

class NewsService:
    def __init__(self, 
                 repository: NewsRepository,
                 config: TradingAgentsConfig):
        self.repository = repository
        self.config = config
        self.sentiment_client = OpenRouterSentimentClient(config)
        self.embeddings_client = OpenRouterEmbeddingsClient(config)
    
    async def process_articles_with_llm(self, articles: List[NewsArticle]) -> List[NewsArticle]:
        """Process articles with LLM sentiment analysis and embeddings"""
        processed_articles = []
        
        for article in articles:
            try:
                # Generate sentiment analysis
                sentiment_result = await self.sentiment_client.analyze_sentiment(
                    article.title, article.summary or ""
                )
                
                # Generate embeddings
                title_embedding, content_embedding = await self.embeddings_client.generate_article_embeddings(article)
                
                # Update article with LLM results
                article.sentiment_score = sentiment_result.score
                article.sentiment_confidence = sentiment_result.confidence
                article.sentiment_label = sentiment_result.label
                article.title_embedding = title_embedding
                article.content_embedding = content_embedding
                
                processed_articles.append(article)
                
            except Exception as e:
                logger.warning(f"Failed to process article {article.id}: {e}")
                # Add article without LLM processing
                processed_articles.append(article)
        
        return processed_articles
    
    async def collect_and_process_news(self, symbols: List[str]) -> List[NewsArticle]:
        """Complete pipeline: collect → process → store with LLM analysis"""
        # Collect raw articles (existing functionality)
        raw_articles = await self.collect_news_articles(symbols)
        
        # Process with LLM
        processed_articles = await self.process_articles_with_llm(raw_articles)
        
        # Store processed articles
        stored_articles = []
        for article in processed_articles:
            stored_article = await self.repository.create_article(article)
            stored_articles.append(stored_article)
        
        # Batch update embeddings for efficiency
        articles_with_embeddings = [a for a in stored_articles 
                                  if a.title_embedding or a.content_embedding]
        if articles_with_embeddings:
            await self.repository.batch_update_embeddings(articles_with_embeddings)
        
        return stored_articles
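`process_articles_with_llm` above awaits each article sequentially, so latency grows linearly with the batch. If throughput matters, a hedged sketch of bounded concurrency with `asyncio.gather` and a semaphore (the helper name and the limit of 5 are assumptions, not from the source):

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

async def process_bounded(items: List[T],
                          worker: Callable[[T], Awaitable[R]],
                          limit: int = 5) -> List[R]:
    """Run worker over items concurrently, with at most `limit` in flight"""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(i) for i in items))

# Demo: square numbers with at most 2 concurrent "calls"
async def square(n: int) -> int:
    await asyncio.sleep(0)
    return n * n

results = asyncio.run(process_bounded([1, 2, 3, 4], square, limit=2))
assert results == [1, 4, 9, 16]
```

The semaphore keeps the number of simultaneous OpenRouter requests within rate limits while still overlapping network waits.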

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/news_service.py

Test Requirements:

  • Integration tests with mocked LLM clients
  • Article processing pipeline tests
  • Error handling and fallback tests
  • Performance tests for batch operations

Phase 4: Scheduling

T008: APScheduler Integration - Job Scheduling

Priority: High | Duration: 3-4 hours | Dependencies: T003, T004, T007

Description: Implement scheduled news collection using APScheduler

Acceptance Criteria:

  • APScheduler setup with PostgreSQL job store
  • Scheduled job execution with proper error handling
  • Job configuration loading and validation
  • Status monitoring and failure recovery
  • CLI integration for job management

Implementation Details:

class ScheduledNewsCollector:
    def __init__(self, 
                 news_service: NewsService,
                 repository: NewsRepository,
                 config: TradingAgentsConfig):
        self.news_service = news_service
        self.repository = repository
        self.config = config
        self.scheduler = None
        
    async def initialize_scheduler(self):
        """Initialize APScheduler with PostgreSQL job store"""
        from apscheduler.schedulers.asyncio import AsyncIOScheduler
        from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
        
        jobstore = SQLAlchemyJobStore(url=self.config.database_url, 
                                     tablename='apscheduler_jobs')
        
        self.scheduler = AsyncIOScheduler()
        self.scheduler.add_jobstore(jobstore, 'default')
        
    async def load_job_configurations(self):
        """Load and schedule all active job configurations"""
        job_configs = await self.repository.get_active_job_configs()
        
        for config in job_configs:
            try:
                await self._schedule_job(config)
            except Exception as e:
                logger.error(f"Failed to schedule job {config.name}: {e}")
    
    async def _schedule_job(self, job_config: NewsJobConfig):
        """Schedule a single job configuration"""
        job_id = f"news_collection_{job_config.id}"
        
        # Remove existing job if present
        if self.scheduler.get_job(job_id):
            self.scheduler.remove_job(job_id)
        
        # Add new job
        from apscheduler.triggers.cron import CronTrigger
        trigger = CronTrigger.from_crontab(job_config.frequency_cron)
        
        self.scheduler.add_job(
            self._execute_news_collection,
            trigger=trigger,
            id=job_id,
            args=[job_config],
            name=f"News collection: {job_config.name}",
            replace_existing=True
        )
        
    async def _execute_news_collection(self, job_config: NewsJobConfig):
        """Execute news collection for a job configuration"""
        try:
            logger.info(f"Starting news collection job: {job_config.name}")
            
            # Collect and process news
            articles = await self.news_service.collect_and_process_news(job_config.symbols)
            
            # Update job last run timestamp
            job_config.last_run = datetime.now(timezone.utc)
            await self.repository.update_job_config(job_config)
            
            logger.info(f"Completed news collection job: {job_config.name}, "
                       f"collected {len(articles)} articles")
                       
        except Exception as e:
            logger.error(f"News collection job failed: {job_config.name}, error: {e}")
            # Could implement notification/alerting here
            
    async def start_scheduler(self):
        """Start the scheduler"""
        if not self.scheduler:
            await self.initialize_scheduler()
            
        await self.load_job_configurations()
        self.scheduler.start()
        logger.info("News collection scheduler started")
        
    async def stop_scheduler(self):
        """Stop the scheduler"""
        if self.scheduler:
            self.scheduler.shutdown(wait=True)
            logger.info("News collection scheduler stopped")

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/scheduled_news_collector.py

Test Requirements:

  • Job scheduling tests with test scheduler
  • Job execution tests with mocked dependencies
  • Error handling and retry tests
  • Job configuration validation tests

T009: CLI Integration - Job Management Commands

Priority: Medium | Duration: 1-2 hours | Dependencies: T008

Description: Add CLI commands for news job management and manual execution

Acceptance Criteria:

  • CLI commands for job creation/management
  • Manual job execution commands
  • Job status and monitoring commands
  • Integration with existing CLI structure
  • Proper error handling and user feedback

Implementation Details:

# Add to cli/commands/news_commands.py
# click invokes command callbacks synchronously, so async bodies need an
# explicit bridge to the event loop or they are never awaited
import asyncio
import functools

def run_async(f):
    """Bridge an async command body into click's synchronous invocation"""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return asyncio.run(f(*args, **kwargs))
    return wrapper

@click.group()
def news():
    """News domain management commands"""
    pass

@news.group() 
def job():
    """Job management commands"""
    pass

@job.command()
@click.option('--name', required=True, help='Job name')
@click.option('--symbols', required=True, help='Comma-separated stock symbols')
@click.option('--frequency', required=True, help='Cron expression or simple frequency')
@click.option('--categories', help='Comma-separated news categories')
@run_async
async def create(name: str, symbols: str, frequency: str, categories: str):
    """Create a new news collection job"""
    try:
        symbol_list = [s.strip().upper() for s in symbols.split(',')]
        category_list = [c.strip() for c in categories.split(',')] if categories else []
        
        config = NewsJobConfig(
            name=name,
            symbols=symbol_list,
            categories=category_list,
            frequency_cron=frequency,
            enabled=True
        )
        
        # Validate configuration
        errors = config.validate()
        if errors:
            click.echo(f"❌ Invalid configuration: {errors}")
            return
            
        # Create job
        repository = NewsRepository(get_database_config())
        created_config = await repository.create_job_config(config)
        
        click.echo(f"✅ Created job: {created_config.name} (ID: {created_config.id})")
        
    except Exception as e:
        click.echo(f"❌ Failed to create job: {e}")

@job.command(name='list')
@run_async
async def list_jobs():
    """List all job configurations"""
    try:
        repository = NewsRepository(get_database_config())
        configs = await repository.get_all_job_configs()
        
        if not configs:
            click.echo("No jobs configured")
            return
            
        click.echo("\n📋 News Collection Jobs:")
        click.echo("=" * 60)
        
        for config in configs:
            status = "🟢 Enabled" if config.enabled else "🔴 Disabled"
            last_run = config.last_run.strftime("%Y-%m-%d %H:%M") if config.last_run else "Never"
            
            click.echo(f"{config.name}")
            click.echo(f"  Status: {status}")
            click.echo(f"  Symbols: {', '.join(config.symbols)}")
            click.echo(f"  Schedule: {config.frequency_cron}")
            click.echo(f"  Last Run: {last_run}")
            click.echo()
            
    except Exception as e:
        click.echo(f"❌ Failed to list jobs: {e}")

@job.command()
@click.argument('job_id', type=str)
@run_async
async def run(job_id: str):
    """Manually execute a job"""
    try:
        repository = NewsRepository(get_database_config())
        config = await repository.get_job_config(UUID(job_id))
        
        if not config:
            click.echo(f"❌ Job not found: {job_id}")
            return
            
        click.echo(f"🚀 Running job: {config.name}")
        
        # Execute job
        service = NewsService(repository, get_trading_config())
        articles = await service.collect_and_process_news(config.symbols)
        
        click.echo(f"✅ Completed: collected {len(articles)} articles")
        
    except Exception as e:
        click.echo(f"❌ Job execution failed: {e}")

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/cli/commands/news_commands.py

Test Requirements:

  • CLI command tests with mocked services
  • User input validation tests
  • Output formatting tests

Phase 5: Validation

T010: Integration Tests - End-to-End Workflow

Priority: High | Duration: 2-3 hours | Dependencies: T007, T008

Description: Comprehensive integration tests for complete news domain workflow

Acceptance Criteria:

  • End-to-end workflow tests from RSS to vector storage
  • Agent integration tests via AgentToolkit
  • Performance tests for daily collection volumes
  • Error recovery and fallback tests
  • Test coverage maintained above 85%

Implementation Details:

# tests/domains/news/integration/test_news_workflow.py
class TestNewsWorkflowIntegration:
    
    @pytest.mark.asyncio
    async def test_complete_news_processing_pipeline(self, test_db, mock_openrouter):
        """Test complete pipeline from RSS to vector storage"""
        # Setup
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)
        
        # Mock OpenRouter responses
        mock_openrouter.sentiment_response = {
            "score": 0.7,
            "confidence": 0.85, 
            "label": "positive"
        }
        mock_openrouter.embeddings_response = [[0.1] * 1536]
        
        # Execute pipeline
        articles = await service.collect_and_process_news(["AAPL"])
        
        # Verify results
        assert len(articles) > 0
        assert all(a.sentiment_score is not None for a in articles)
        assert all(a.title_embedding is not None for a in articles)
        
        # Verify database storage
        stored_articles = await repository.get_articles_by_symbol("AAPL")
        assert len(stored_articles) == len(articles)
        
        # Test vector similarity search
        similar = await repository.find_similar_articles(
            articles[0].title_embedding, limit=5
        )
        assert len(similar) > 0
    
    @pytest.mark.asyncio
    async def test_agent_toolkit_integration(self, test_db):
        """Test integration with AgentToolkit for RAG queries"""
        from tradingagents.agents.libs.toolkit import AgentToolkit
        
        # Setup with real data
        toolkit = AgentToolkit(test_db)
        
        # Test news context retrieval
        context = await toolkit.get_news_context("AAPL", days=7)
        assert "articles" in context
        assert "sentiment_summary" in context
        
        # Test vector similarity for context
        similar_context = await toolkit.get_similar_news(
            "Apple earnings beat expectations", limit=5
        )
        assert len(similar_context) <= 5
    
    @pytest.mark.asyncio  
    async def test_scheduler_integration(self, test_db):
        """Test APScheduler integration with job management"""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)
        scheduler = ScheduledNewsCollector(service, repository, config)
        
        # Create test job configuration
        job_config = NewsJobConfig(
            name="test_job",
            symbols=["AAPL"],
            frequency_cron="0 */6 * * *",  # Every 6 hours
            enabled=True
        )
        await repository.create_job_config(job_config)
        
        # Test scheduler initialization
        await scheduler.initialize_scheduler()
        await scheduler.load_job_configurations()
        
        # Verify job was scheduled
        assert scheduler.scheduler.get_job(f"news_collection_{job_config.id}") is not None
        
        # Test manual job execution
        await scheduler._execute_news_collection(job_config)
        
        # Verify execution updated last_run
        updated_config = await repository.get_job_config(job_config.id)
        assert updated_config.last_run is not None
        
    @pytest.mark.asyncio
    async def test_error_recovery_and_fallbacks(self, test_db):
        """Test error handling and fallback mechanisms"""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)
        
        # Test with failing LLM client
        with patch.object(service.sentiment_client, 'analyze_sentiment', side_effect=Exception("API Error")):
            articles = await service.collect_and_process_news(["AAPL"])
            
            # Should still process articles via the fallback path
            assert len(articles) > 0
            # Every article should receive a fallback sentiment value
            assert all(a.sentiment_score is not None for a in articles)
    
    @pytest.mark.asyncio
    async def test_performance_benchmarks(self, test_db):
        """Test performance meets requirements"""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        
        # Create test articles with embeddings
        test_articles = await self._create_test_articles_with_embeddings(repository, count=1000)
        
        # Test query performance (< 100ms requirement)
        start_time = time.perf_counter()
        articles = await repository.get_recent_articles_by_symbol("AAPL", days=30)
        query_time = (time.perf_counter() - start_time) * 1000
        
        assert query_time < 100, f"Query took {query_time:.1f}ms, should be < 100ms"
        
        # Test vector similarity performance (< 1s requirement)
        start_time = time.perf_counter()
        similar = await repository.find_similar_articles(
            test_articles[0].title_embedding, limit=10
        )
        vector_time = (time.perf_counter() - start_time) * 1000
        
        assert vector_time < 1000, f"Vector search took {vector_time:.1f}ms, should be < 1s"

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tests/domains/news/integration/test_news_workflow.py

Test Requirements:

  • Full workflow integration tests
  • AgentToolkit integration tests
  • Performance benchmark tests
  • Error scenario tests

T011: Documentation and Monitoring

Priority: Medium | Duration: 1-2 hours | Dependencies: T010

Description: Update documentation and add monitoring for new functionality

Acceptance Criteria:

  • Updated API documentation for new methods
  • Job scheduling configuration examples
  • Performance monitoring dashboard queries
  • Troubleshooting guide for common issues
  • Agent integration documentation
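
The "performance monitoring dashboard queries" item could be backed by a small in-process latency recorder that repository calls are wrapped in. This is a hypothetical sketch (the names `track_latency` and `latency_summary` are not part of the existing codebase); a real deployment would export these numbers to a metrics backend instead of an in-memory dict.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-process metrics store; replace with a real metrics
# backend (Prometheus, StatsD, ...) in production.
_latencies: dict = defaultdict(list)

@contextmanager
def track_latency(operation: str):
    """Record the wall-clock duration (in ms) of a named operation."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _latencies[operation].append((time.perf_counter() - start) * 1000)

def latency_summary(operation: str) -> dict:
    """Summarize recorded latencies for one operation."""
    samples = sorted(_latencies[operation])
    if not samples:
        return {"count": 0}
    return {
        "count": len(samples),
        "max_ms": samples[-1],
        "p95_ms": samples[max(int(len(samples) * 0.95) - 1, 0)],
    }
```

Usage would look like `with track_latency("recent_articles"): await repository.get_recent_articles_by_symbol(...)`, with the summaries feeding the dashboard.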

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/docs/domains/news.md
  • /Users/martinrichards/code/TradingAgents/docs/api-reference.md

Test Requirements:

  • Documentation accuracy validation
  • Configuration example testing

Parallel Development Opportunities

AI Agent Collaboration Points

Tasks T005 & T006 can be developed in parallel:

  • Both are independent OpenRouter client implementations
  • Different LLM capabilities (sentiment vs embeddings)
  • Can be tested independently with VCR cassettes

Phase 1 Tasks (T001, T002, T003) have minimal dependencies:

  • T002 and T003 both depend on T001 but can be developed simultaneously
  • Entity layer changes are independent of each other

Critical Path Analysis

Critical Path: T001 → T002/T003 → T004 → T005/T006 → T007 → T008

Parallel Opportunities:

  1. Foundation Phase: T002 + T003 (after T001)
  2. LLM Integration: T005 + T006 (after T002)
  3. Testing: Unit tests alongside implementation

Risk Mitigation Strategies

LLM API Dependencies:

  • Implement comprehensive fallback strategies
  • Use VCR for deterministic testing
  • Mock clients for unit tests
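
The "mock clients" bullet can be illustrated with `unittest.mock.AsyncMock`. The `SentimentPipeline` class below is a stand-in, not the real `NewsService`; it only assumes the sentiment client exposes an async `analyze_sentiment` method, which is the shape the tasks above describe.

```python
import asyncio
from unittest.mock import AsyncMock

class SentimentPipeline:
    """Minimal stand-in for NewsService's sentiment path (illustrative only)."""

    def __init__(self, client):
        self.client = client

    async def score(self, text: str) -> float:
        return await self.client.analyze_sentiment(text)

def test_sentiment_uses_client():
    # AsyncMock lets us unit-test async call wiring with no network access
    client = AsyncMock()
    client.analyze_sentiment.return_value = 0.8

    pipeline = SentimentPipeline(client)
    score = asyncio.run(pipeline.score("Apple beats earnings"))

    assert score == 0.8
    client.analyze_sentiment.assert_awaited_once_with("Apple beats earnings")
```

The same pattern applies to the embeddings client, keeping unit tests deterministic while VCR cassettes cover the real OpenRouter wire format.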

Database Performance:

  • Test with realistic data volumes
  • Monitor query performance during development
  • Use proper indexes for vector operations
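
When validating vector-index behavior, a brute-force reference implementation is useful: the pgvectorscale index should return (approximately) the same neighbors as an exact scan. Assuming cosine similarity is the metric behind `find_similar_articles`, a pure-Python cross-check looks like this:

```python
import math

def cosine_similarity(a, b) -> float:
    """Exact cosine similarity, for cross-checking indexed query results."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, candidates: dict, k: int = 5) -> list:
    """Brute-force nearest neighbours; the vector index approximates this."""
    ranked = sorted(
        candidates.items(),
        key=lambda kv: cosine_similarity(query, kv[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]
```

Running `top_k` over a small sample of stored embeddings and comparing against the repository's indexed results catches both wrong distance operators and missing indexes early.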

Integration Complexity:

  • Build incrementally with testing at each step
  • Maintain backward compatibility
  • Use feature flags for gradual rollout
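
A minimal environment-variable flag gate is enough for the gradual rollout described above. The `NEWS_FEATURE_*` naming is an assumption for illustration; in this codebase the flags would more likely live on `TradingAgentsConfig`.

```python
import os

def flag_enabled(name: str, default: bool = False) -> bool:
    """Read a boolean feature flag from the environment (hypothetical naming)."""
    raw = os.getenv(f"NEWS_FEATURE_{name.upper()}", str(default))
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

New code paths can then be guarded with `if flag_enabled("embeddings"): ...`, letting sentiment and embedding generation ship dark and be enabled per environment.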

Success Metrics

Technical Metrics:

  • Test coverage >85% maintained
  • Query performance <100ms
  • Vector search performance <1s
  • Zero breaking changes to AgentToolkit

Functional Metrics:

  • Successful OpenRouter-only LLM integration
  • Scheduled jobs executing reliably
  • Agent context enriched with sentiment and similarity

Quality Metrics:

  • All acceptance criteria met
  • Comprehensive error handling
  • Production-ready monitoring and documentation

Implementation Guidelines

TDD Approach

Every task follows: Write test → Write code → Refactor

Layered Architecture Pattern

Strict adherence to: Database → Entity → Repository → Service → Scheduling

Error Handling Strategy

Graceful fallbacks for all LLM API dependencies
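
A sketch of the fallback pattern, assuming (as elsewhere in this document) an async client with an `analyze_sentiment` method and a neutral score of `0.0` as the fallback value — the exact default is a design choice, not an existing constant:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

NEUTRAL_SENTIMENT = 0.0  # assumed neutral default when the LLM is unavailable

async def sentiment_with_fallback(client, text: str) -> float:
    """Try the LLM client; degrade to a neutral score on any failure."""
    try:
        return await client.analyze_sentiment(text)
    except Exception as exc:
        logger.warning("Sentiment analysis failed, using neutral fallback: %s", exc)
        return NEUTRAL_SENTIMENT

class FailingClient:
    """Stub that simulates an OpenRouter outage."""

    async def analyze_sentiment(self, text: str) -> float:
        raise RuntimeError("API Error")
```

This is exactly the behavior the `test_error_recovery_and_fallbacks` integration test above asserts: articles keep flowing with a non-null sentiment score even when the API is down.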

Performance Requirements

Async operations with proper connection pooling throughout
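
One way to honor the pool's connection limit from application code is to bound concurrency with a semaphore, so a burst of news-collection tasks never requests more simultaneous sessions than the pool holds. `gather_bounded` is an illustrative helper, not an existing API:

```python
import asyncio

async def gather_bounded(coros, limit: int = 10):
    """Run coroutines concurrently, but never more than `limit` at once —
    mirroring how a connection pool caps simultaneous database sessions."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))
```

For example, embedding generation for a batch of articles could be issued as `await gather_bounded([embed(a) for a in articles], limit=pool_size)`.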

Testing Strategy

Unit tests + Integration tests + VCR for external API calls


This comprehensive task breakdown provides clear implementation guidance for completing the final 5% of the news domain while maintaining architectural consistency and leveraging AI-assisted development patterns.