TradingAgents/docs/specs/news/tasks.md


News Domain Completion - Task Implementation Guide

Overview

Complete the final 5% of the news domain by implementing OpenRouter-only LLM sentiment analysis, vector embeddings, and APScheduler job execution. This builds on infrastructure that is already ~95% complete, backed by a PostgreSQL + TimescaleDB + pgvectorscale stack.

Total Estimated Time: 12-16 hours with AI assistance
Target Completion: 3-4 days
Test Coverage Requirement: Maintain >85%
Architecture Pattern: Database → Entity → Repository → Service → Scheduling

Implementation Phases

Phase 1: Foundation (4-7 hours)

Database and entity layer enhancements for LLM integration

Phase 2: Data Access (2-3 hours)

Repository layer enhancements for vector and job operations

Phase 3: LLM Integration (5-8 hours)

OpenRouter clients and service integration

Phase 4: Scheduling (4-6 hours)

Job scheduling and CLI integration

Phase 5: Validation (3-5 hours)

Testing, documentation, and monitoring


Task Breakdown

Phase 1: Foundation

T001: Database Migration - NewsJobConfig Table

Priority: Critical | Duration: 1-2 hours | Dependencies: None

Description: Create database migration for news job configurations table with proper indexes

Acceptance Criteria:

  • news_job_configs table created with UUID primary key
  • JSONB fields for symbols and categories with validation
  • Proper indexes for enabled/frequency queries
  • Migration script tests with rollback capability

Implementation Details:

# Migration structure (Alembic)
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

def upgrade():
    op.create_table(
        'news_job_configs',
        sa.Column('id', postgresql.UUID(), primary_key=True),
        sa.Column('name', sa.String(255), nullable=False),
        sa.Column('symbols', postgresql.JSONB(), nullable=False),
        sa.Column('categories', postgresql.JSONB(), nullable=False),
        sa.Column('frequency_cron', sa.String(100), nullable=False),
        sa.Column('enabled', sa.Boolean(), server_default=sa.text('true'), nullable=False),
        sa.Column('last_run', sa.DateTime(timezone=True)),
        sa.Column('created_at', sa.DateTime(timezone=True), server_default=sa.func.now()),
        sa.Column('updated_at', sa.DateTime(timezone=True), server_default=sa.func.now())
    )
    
    # Indexes
    op.create_index('idx_news_jobs_enabled_frequency', 'news_job_configs', 
                   ['enabled', 'frequency_cron'])
    op.create_index('idx_news_jobs_last_run', 'news_job_configs', 
                   ['last_run'], postgresql_where=sa.text('enabled = true'))

def downgrade():
    # Rollback capability required by the acceptance criteria
    op.drop_index('idx_news_jobs_last_run', table_name='news_job_configs')
    op.drop_index('idx_news_jobs_enabled_frequency', table_name='news_job_configs')
    op.drop_table('news_job_configs')

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/tradingagents/data/migrations/add_news_job_configs.py

Test Requirements:

  • Migration up/down tests
  • Index performance validation
  • Constraint validation tests

T002: Enhance NewsArticle Entity - Sentiment and Embeddings

Priority: Critical | Duration: 2-3 hours | Dependencies: T001

Description: Add LLM sentiment fields and embedding validation to NewsArticle entity

Acceptance Criteria:

  • Add sentiment_score, sentiment_confidence, sentiment_label fields
  • Add title_embedding and content_embedding vector fields
  • Enhanced validate() method with sentiment range checks
  • Updated transformations for vector handling
  • Embedding dimension validation (1536)

Implementation Details:

@dataclass
class NewsArticle:
    # Existing fields...
    
    # LLM sentiment fields
    sentiment_score: Optional[float] = None  # [-1.0, 1.0]
    sentiment_confidence: Optional[float] = None  # [0.0, 1.0]
    sentiment_label: Optional[str] = None  # "positive", "negative", "neutral"
    
    # Vector embedding fields
    title_embedding: Optional[List[float]] = None  # 1536 dimensions
    content_embedding: Optional[List[float]] = None  # 1536 dimensions
    
    def validate(self) -> Dict[str, List[str]]:
        errors = super().validate()
        
        # Sentiment validation
        if self.sentiment_score is not None:
            if not -1.0 <= self.sentiment_score <= 1.0:
                errors["sentiment_score"] = ["Must be between -1.0 and 1.0"]
        
        if self.sentiment_confidence is not None:
            if not 0.0 <= self.sentiment_confidence <= 1.0:
                errors["sentiment_confidence"] = ["Must be between 0.0 and 1.0"]
        
        # Vector dimension validation
        for field, vector in [("title_embedding", self.title_embedding), 
                             ("content_embedding", self.content_embedding)]:
            if vector is not None and len(vector) != 1536:
                errors[field] = ["Must be exactly 1536 dimensions"]
        
        return errors
    
    def to_record(self) -> Dict[str, Any]:
        record = super().to_record()
        # Convert vectors to pgvector format if present
        if self.title_embedding:
            record["title_embedding"] = self.title_embedding
        if self.content_embedding:
            record["content_embedding"] = self.content_embedding
        return record

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_article.py

Test Requirements:

  • Sentiment validation tests (range checks)
  • Vector dimension validation tests
  • Transformation method tests
  • Business rule violation tests

T003: Create NewsJobConfig Entity

Priority: Critical | Duration: 1-2 hours | Dependencies: T001

Description: Implement NewsJobConfig entity for scheduled job management

Acceptance Criteria:

  • NewsJobConfig dataclass with all required fields
  • Business rule validation for job configuration
  • Cron expression validation for frequency
  • Symbol list validation
  • JSON serialization for database storage

Implementation Details:

@dataclass
class NewsJobConfig:
    id: Optional[UUID] = None
    name: str = ""
    symbols: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)  
    frequency_cron: str = ""
    enabled: bool = True
    last_run: Optional[datetime] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    
    def validate(self) -> Dict[str, List[str]]:
        errors = {}
        
        # Name validation
        if not self.name or len(self.name) > 255:
            errors["name"] = ["Name required and must be <= 255 characters"]
        
        # Symbol validation
        if not self.symbols:
            errors["symbols"] = ["At least one symbol required"]
        else:
            # Collect all invalid symbols instead of overwriting the error
            # list on each loop iteration
            invalid = [s for s in self.symbols if not (s.isupper() and s.isalpha())]
            if invalid:
                errors["symbols"] = [f"Symbols must be uppercase letters only: {invalid}"]
        
        # Cron validation
        try:
            from croniter import croniter
            if not croniter.is_valid(self.frequency_cron):
                errors["frequency_cron"] = ["Invalid cron expression"]
        except ImportError:
            # Fallback validation for simple interval names; these must be
            # normalized to 5-field cron expressions before scheduling, since
            # T008 uses CronTrigger.from_crontab
            if self.frequency_cron not in ["hourly", "daily", "weekly"]:
                errors["frequency_cron"] = ["Invalid frequency"]
        
        return errors
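The JSON serialization criterion above is not covered by the sketch. One minimal, hypothetical `to_record`/`from_record` pair (field and column names assumed to mirror the T001 migration; the class below is a trimmed stand-in for the real `NewsJobConfig`) could look like:

```python
# Hypothetical serialization helpers for NewsJobConfig: symbols/categories
# are JSONB columns, so they round-trip through json.dumps/json.loads.
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List
from uuid import UUID, uuid4

@dataclass
class NewsJobConfig:
    id: UUID = field(default_factory=uuid4)
    name: str = ""
    symbols: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    frequency_cron: str = ""
    enabled: bool = True

    def to_record(self) -> Dict[str, Any]:
        """Flatten to a row dict; list fields become JSON strings for JSONB columns"""
        return {
            "id": str(self.id),
            "name": self.name,
            "symbols": json.dumps(self.symbols),
            "categories": json.dumps(self.categories),
            "frequency_cron": self.frequency_cron,
            "enabled": self.enabled,
        }

    @classmethod
    def from_record(cls, record: Dict[str, Any]) -> "NewsJobConfig":
        """Rebuild from a row dict; JSONB values may come back as str or list"""
        def as_list(value):
            return json.loads(value) if isinstance(value, str) else list(value)
        return cls(
            id=UUID(record["id"]) if isinstance(record["id"], str) else record["id"],
            name=record["name"],
            symbols=as_list(record["symbols"]),
            categories=as_list(record["categories"]),
            frequency_cron=record["frequency_cron"],
            enabled=record["enabled"],
        )

config = NewsJobConfig(name="morning", symbols=["AAPL"], frequency_cron="0 9 * * *")
assert NewsJobConfig.from_record(config.to_record()) == config
```

The round-trip assertion at the end is the property the serialization/deserialization tests should pin down.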

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_job_config.py

Test Requirements:

  • Job configuration validation tests
  • Schedule parsing tests
  • Symbol validation tests
  • Serialization/deserialization tests

Phase 2: Data Access

T004: Enhance NewsRepository - Vector and Job Operations

Priority: Critical | Duration: 2-3 hours | Dependencies: T002, T003

Description: Add vector similarity search and NewsJobConfig CRUD operations

Acceptance Criteria:

  • Vector similarity search with cosine distance
  • Batch embedding update operations
  • NewsJobConfig CRUD methods
  • Optimized query performance for vector operations
  • Proper async connection handling

Implementation Details:

class NewsRepository:
    # Existing methods...
    
    async def find_similar_articles(self, 
                                  embedding: List[float], 
                                  limit: int = 10,
                                  threshold: float = 0.8) -> List[NewsArticle]:
        """Find articles similar to given embedding using cosine distance"""
        # asyncpg-style $n placeholders (conn.fetch/fetchrow are asyncpg APIs);
        # assumes the pgvector codec is registered on the connection
        query = """
        SELECT *, 1 - (title_embedding <=> $1::vector) AS similarity
        FROM news_articles 
        WHERE title_embedding IS NOT NULL
        AND 1 - (title_embedding <=> $1::vector) > $2
        ORDER BY title_embedding <=> $1::vector
        LIMIT $3
        """
        
        async with self._get_connection() as conn:
            rows = await conn.fetch(query, embedding, threshold, limit)
            return [NewsArticle.from_record(dict(row)) for row in rows]
    
    async def batch_update_embeddings(self, 
                                    articles: List[NewsArticle]) -> None:
        """Efficiently update embeddings for multiple articles"""
        if not articles:
            return
        
        query = """
        UPDATE news_articles 
        SET title_embedding = $1, content_embedding = $2, updated_at = now()
        WHERE id = $3
        """
        
        async with self._get_connection() as conn:
            await conn.executemany(query, [
                (article.title_embedding, article.content_embedding, article.id)
                for article in articles
                if article.id and (article.title_embedding or article.content_embedding)
            ])
    
    # NewsJobConfig CRUD operations
    async def create_job_config(self, config: NewsJobConfig) -> NewsJobConfig:
        """Create new job configuration"""
        query = """
        INSERT INTO news_job_configs (id, name, symbols, categories, frequency_cron, enabled)
        VALUES ($1, $2, $3, $4, $5, $6)
        RETURNING *
        """
        
        config.id = config.id or uuid4()
        async with self._get_connection() as conn:
            row = await conn.fetchrow(query, 
                config.id, config.name, json.dumps(config.symbols),
                json.dumps(config.categories), config.frequency_cron, config.enabled)
            return NewsJobConfig.from_record(dict(row))
    
    async def get_active_job_configs(self) -> List[NewsJobConfig]:
        """Get all enabled job configurations"""
        query = "SELECT * FROM news_job_configs WHERE enabled = true"
        async with self._get_connection() as conn:
            rows = await conn.fetch(query)
            return [NewsJobConfig.from_record(dict(row)) for row in rows]
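For reference, pgvector's `<=>` operator returns cosine distance, so the `1 - (a <=> b)` expression in the query above is cosine similarity. A pure-Python sketch of the score being computed:

```python
import math
from typing import List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Mirrors 1 - (a <=> b): dot(a, b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Identical direction -> similarity 1.0; orthogonal -> 0.0
assert abs(cosine_similarity([1.0, 0.0], [2.0, 0.0]) - 1.0) < 1e-9
assert abs(cosine_similarity([1.0, 0.0], [0.0, 3.0])) < 1e-9
```

This is also why the default `threshold=0.8` is a similarity floor, not a distance ceiling.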

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/repositories/news_repository.py

Test Requirements:

  • Vector similarity search tests with mock data
  • Batch operation performance tests
  • Job config CRUD tests
  • Database connection pooling tests

Phase 3: LLM Integration

T005: OpenRouter Client - Sentiment Analysis

Priority: Critical | Duration: 2-3 hours | Dependencies: T002

Description: Implement OpenRouter client for LLM sentiment analysis

Acceptance Criteria:

  • OpenRouter API integration for sentiment analysis
  • Structured prompts for financial news sentiment
  • Response parsing with Pydantic models
  • Error handling with graceful fallbacks
  • Retry logic with exponential backoff

Implementation Details:

class OpenRouterSentimentClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = config.quick_think_llm
        self.base_url = "https://openrouter.ai/api/v1"
        
    async def analyze_sentiment(self, title: str, content: str) -> SentimentResult:
        """Analyze sentiment of news article"""
        prompt = f"""
        Analyze the sentiment of this financial news article:
        
        Title: {title}
        Content: {content[:1000]}...
        
        Provide sentiment analysis as JSON:
        {{
            "score": float between -1.0 (very negative) and 1.0 (very positive),
            "confidence": float between 0.0 and 1.0,
            "label": "positive" | "negative" | "neutral",
            "reasoning": "brief explanation"
        }}
        """
        
        try:
            async with aiohttp.ClientSession() as session:
                response = await self._make_request(session, prompt)
                return self._parse_sentiment_response(response)
        except Exception as e:
            logger.warning(f"LLM sentiment analysis failed: {e}")
            return self._fallback_sentiment(title, content)
    
    def _fallback_sentiment(self, title: str, content: str) -> SentimentResult:
        """Keyword-based fallback sentiment analysis"""
        # Simple keyword-based sentiment as fallback; match whole words so
        # e.g. "up" does not match inside "supply" (requires `import re`)
        positive_words = {"gain", "profit", "up", "growth", "buy"}
        negative_words = {"loss", "down", "decline", "sell", "drop"}
        
        words = re.findall(r"[a-z]+", (title + " " + content).lower())
        pos_count = sum(word in positive_words for word in words)
        neg_count = sum(word in negative_words for word in words)
        
        if pos_count > neg_count:
            return SentimentResult(score=0.3, confidence=0.5, label="positive")
        elif neg_count > pos_count:
            return SentimentResult(score=-0.3, confidence=0.5, label="negative")
        else:
            return SentimentResult(score=0.0, confidence=0.5, label="neutral")
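The retry-with-exponential-backoff criterion above is not shown in the sketch. One minimal pattern (the `with_retries` helper and its parameters are hypothetical, not from the source) wraps the request coroutine with exponentially growing delays plus jitter:

```python
import asyncio
import random

async def with_retries(coro_factory, max_attempts: int = 3, base_delay: float = 0.5):
    """Retry an async callable with exponential backoff plus jitter"""
    for attempt in range(max_attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # base_delay, 2*base_delay, 4*base_delay, ... plus up to 100ms jitter
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: a flaky call that fails twice, then succeeds on the third attempt
attempts = {"n": 0}
async def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

assert asyncio.run(with_retries(flaky, base_delay=0.01)) == "ok"
assert attempts["n"] == 3
```

In the client this would wrap `self._make_request(...)`, with the final exception falling through to `_fallback_sentiment`.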

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_sentiment_client.py

Test Requirements:

  • Sentiment analysis API tests with VCR
  • Error handling tests
  • Response parsing tests
  • Fallback mechanism tests

T006: OpenRouter Client - Vector Embeddings

Priority: Critical | Duration: 1-2 hours | Dependencies: T002

Description: Implement OpenRouter client for vector embeddings generation

Acceptance Criteria:

  • OpenRouter embeddings API integration
  • Text preprocessing for embedding generation
  • Batch processing for multiple articles
  • 1536-dimensional vector validation
  • Proper error handling and retries

Implementation Details:

class OpenRouterEmbeddingsClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = "openai/text-embedding-ada-002"  # Via OpenRouter
        self.base_url = "https://openrouter.ai/api/v1"
        
    async def generate_embeddings(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for multiple texts"""
        if not texts:
            return []
            
        try:
            async with aiohttp.ClientSession() as session:
                response = await self._make_embeddings_request(session, texts)
                embeddings = self._parse_embeddings_response(response)
                
                # Validate dimensions
                for i, embedding in enumerate(embeddings):
                    if len(embedding) != 1536:
                        raise ValueError(f"Invalid embedding dimension at index {i}: {len(embedding)}")
                
                return embeddings
        except Exception as e:
            logger.error(f"Embeddings generation failed: {e}")
            # Return zero vectors as fallback; note zero vectors have no
            # direction, so they should be excluded from similarity search
            return [[0.0] * 1536 for _ in texts]
    
    async def generate_article_embeddings(self, article: NewsArticle) -> Tuple[List[float], List[float]]:
        """Generate embeddings for article title and content"""
        # Always embed both slots so the title/content positions stay aligned
        # even when one of the fields is missing
        title_text = self._preprocess_text(article.title or "")
        # Combine title and summary for a comprehensive content embedding
        content_text = self._preprocess_text(
            f"{article.title or ''} {article.summary or ''}".strip()
        )
        
        if not title_text and not content_text:
            return [0.0] * 1536, [0.0] * 1536
            
        embeddings = await self.generate_embeddings([title_text, content_text])
        title_embedding = embeddings[0] if len(embeddings) > 0 else [0.0] * 1536
        content_embedding = embeddings[1] if len(embeddings) > 1 else [0.0] * 1536
        
        return title_embedding, content_embedding
    
    def _preprocess_text(self, text: str) -> str:
        """Preprocess text for optimal embedding generation"""
        # Remove extra whitespace and limit length
        cleaned = " ".join(text.split())
        return cleaned[:8000]  # OpenAI embedding limit
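Beyond per-text truncation, batch requests for many articles should also be kept under provider request limits. A simple batching helper (the batch size of 64 is an assumption, not a documented OpenRouter limit) keeps each embeddings request bounded:

```python
from typing import List

def batch_texts(texts: List[str], batch_size: int = 64) -> List[List[str]]:
    """Split texts into fixed-size batches for separate embeddings requests"""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

batches = batch_texts([f"article {i}" for i in range(150)], batch_size=64)
assert [len(b) for b in batches] == [64, 64, 22]
```

`generate_embeddings` could iterate these batches and concatenate the results, preserving input order.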

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_embeddings_client.py

Test Requirements:

  • Embeddings API tests with VCR
  • Batch processing tests
  • Vector dimension validation tests
  • Text preprocessing tests

T007: Enhance NewsService - LLM Integration

Priority: Critical | Duration: 2-3 hours | Dependencies: T005, T006

Description: Integrate OpenRouter LLM clients into NewsService workflow

Acceptance Criteria:

  • Replace keyword sentiment with LLM analysis
  • Add embedding generation to article processing
  • End-to-end article processing pipeline
  • Proper error handling and fallback strategies
  • Integration with existing service methods

Implementation Details:

class NewsService:
    def __init__(self, 
                 repository: NewsRepository,
                 config: TradingAgentsConfig):
        self.repository = repository
        self.config = config
        self.sentiment_client = OpenRouterSentimentClient(config)
        self.embeddings_client = OpenRouterEmbeddingsClient(config)
    
    async def process_articles_with_llm(self, articles: List[NewsArticle]) -> List[NewsArticle]:
        """Process articles with LLM sentiment analysis and embeddings"""
        processed_articles = []
        
        for article in articles:
            try:
                # Generate sentiment analysis
                sentiment_result = await self.sentiment_client.analyze_sentiment(
                    article.title, article.summary or ""
                )
                
                # Generate embeddings
                title_embedding, content_embedding = await self.embeddings_client.generate_article_embeddings(article)
                
                # Update article with LLM results
                article.sentiment_score = sentiment_result.score
                article.sentiment_confidence = sentiment_result.confidence
                article.sentiment_label = sentiment_result.label
                article.title_embedding = title_embedding
                article.content_embedding = content_embedding
                
                processed_articles.append(article)
                
            except Exception as e:
                logger.warning(f"Failed to process article {article.id}: {e}")
                # Add article without LLM processing
                processed_articles.append(article)
        
        return processed_articles
    
    async def collect_and_process_news(self, symbols: List[str]) -> List[NewsArticle]:
        """Complete pipeline: collect → process → store with LLM analysis"""
        # Collect raw articles (existing functionality)
        raw_articles = await self.collect_news_articles(symbols)
        
        # Process with LLM
        processed_articles = await self.process_articles_with_llm(raw_articles)
        
        # Store processed articles
        stored_articles = []
        for article in processed_articles:
            stored_article = await self.repository.create_article(article)
            stored_articles.append(stored_article)
        
        # Batch update embeddings for efficiency
        articles_with_embeddings = [a for a in stored_articles 
                                  if a.title_embedding or a.content_embedding]
        if articles_with_embeddings:
            await self.repository.batch_update_embeddings(articles_with_embeddings)
        
        return stored_articles
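`process_articles_with_llm` above awaits each article sequentially, so latency grows linearly with the batch. If throughput matters, a hedged sketch of bounded concurrency with `asyncio.gather` and a semaphore (the helper name and the limit of 5 are assumptions, not from the source):

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

async def process_bounded(items: List[T],
                          worker: Callable[[T], Awaitable[R]],
                          limit: int = 5) -> List[R]:
    """Run worker over items concurrently, with at most `limit` in flight"""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(i) for i in items))

# Demo: square numbers with at most 2 concurrent "calls"
async def square(n: int) -> int:
    await asyncio.sleep(0)
    return n * n

results = asyncio.run(process_bounded([1, 2, 3, 4], square, limit=2))
assert results == [1, 4, 9, 16]
```

The semaphore keeps the number of simultaneous OpenRouter requests within rate limits while still overlapping network waits.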

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/news_service.py

Test Requirements:

  • Integration tests with mocked LLM clients
  • Article processing pipeline tests
  • Error handling and fallback tests
  • Performance tests for batch operations

Phase 4: Scheduling

T008: APScheduler Integration - Job Scheduling

Priority: High | Duration: 3-4 hours | Dependencies: T003, T004, T007

Description: Implement scheduled news collection using APScheduler

Acceptance Criteria:

  • APScheduler setup with PostgreSQL job store
  • Scheduled job execution with proper error handling
  • Job configuration loading and validation
  • Status monitoring and failure recovery
  • CLI integration for job management

Implementation Details:

class ScheduledNewsCollector:
    def __init__(self, 
                 news_service: NewsService,
                 repository: NewsRepository,
                 config: TradingAgentsConfig):
        self.news_service = news_service
        self.repository = repository
        self.config = config
        self.scheduler = None
        
    async def initialize_scheduler(self):
        """Initialize APScheduler with PostgreSQL job store"""
        from apscheduler.schedulers.asyncio import AsyncIOScheduler
        from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
        
        jobstore = SQLAlchemyJobStore(url=self.config.database_url, 
                                     tablename='apscheduler_jobs')
        
        self.scheduler = AsyncIOScheduler()
        self.scheduler.add_jobstore(jobstore, 'default')
        
    async def load_job_configurations(self):
        """Load and schedule all active job configurations"""
        job_configs = await self.repository.get_active_job_configs()
        
        for config in job_configs:
            try:
                await self._schedule_job(config)
            except Exception as e:
                logger.error(f"Failed to schedule job {config.name}: {e}")
    
    async def _schedule_job(self, job_config: NewsJobConfig):
        """Schedule a single job configuration"""
        job_id = f"news_collection_{job_config.id}"
        
        # Remove existing job if present
        if self.scheduler.get_job(job_id):
            self.scheduler.remove_job(job_id)
        
        # Add new job
        from apscheduler.triggers.cron import CronTrigger
        trigger = CronTrigger.from_crontab(job_config.frequency_cron)
        
        self.scheduler.add_job(
            self._execute_news_collection,
            trigger=trigger,
            id=job_id,
            args=[job_config],
            name=f"News collection: {job_config.name}",
            replace_existing=True
        )
        
    async def _execute_news_collection(self, job_config: NewsJobConfig):
        """Execute news collection for a job configuration"""
        try:
            logger.info(f"Starting news collection job: {job_config.name}")
            
            # Collect and process news
            articles = await self.news_service.collect_and_process_news(job_config.symbols)
            
            # Update job last run timestamp
            job_config.last_run = datetime.now(timezone.utc)
            await self.repository.update_job_config(job_config)
            
            logger.info(f"Completed news collection job: {job_config.name}, "
                       f"collected {len(articles)} articles")
                       
        except Exception as e:
            logger.error(f"News collection job failed: {job_config.name}, error: {e}")
            # Could implement notification/alerting here
            
    async def start_scheduler(self):
        """Start the scheduler"""
        if not self.scheduler:
            await self.initialize_scheduler()
            
        await self.load_job_configurations()
        self.scheduler.start()
        logger.info("News collection scheduler started")
        
    async def stop_scheduler(self):
        """Stop the scheduler"""
        if self.scheduler:
            self.scheduler.shutdown(wait=True)
            logger.info("News collection scheduler stopped")

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/scheduled_news_collector.py

Test Requirements:

  • Job scheduling tests with test scheduler
  • Job execution tests with mocked dependencies
  • Error handling and retry tests
  • Job configuration validation tests

T009: CLI Integration - Job Management Commands

Priority: Medium | Duration: 1-2 hours | Dependencies: T008

Description: Add CLI commands for news job management and manual execution

Acceptance Criteria:

  • CLI commands for job creation/management
  • Manual job execution commands
  • Job status and monitoring commands
  • Integration with existing CLI structure
  • Proper error handling and user feedback

Implementation Details:

# Add to cli/commands/news_commands.py
# click invokes command callbacks synchronously, so async bodies need an
# explicit bridge to the event loop or they are never awaited
import asyncio
import functools

def run_async(f):
    """Bridge an async command body into click's synchronous invocation"""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return asyncio.run(f(*args, **kwargs))
    return wrapper

@click.group()
def news():
    """News domain management commands"""
    pass

@news.group() 
def job():
    """Job management commands"""
    pass

@job.command()
@click.option('--name', required=True, help='Job name')
@click.option('--symbols', required=True, help='Comma-separated stock symbols')
@click.option('--frequency', required=True, help='Cron expression or simple frequency')
@click.option('--categories', help='Comma-separated news categories')
@run_async
async def create(name: str, symbols: str, frequency: str, categories: str):
    """Create a new news collection job"""
    try:
        symbol_list = [s.strip().upper() for s in symbols.split(',')]
        category_list = [c.strip() for c in categories.split(',')] if categories else []
        
        config = NewsJobConfig(
            name=name,
            symbols=symbol_list,
            categories=category_list,
            frequency_cron=frequency,
            enabled=True
        )
        
        # Validate configuration
        errors = config.validate()
        if errors:
            click.echo(f"❌ Invalid configuration: {errors}")
            return
            
        # Create job
        repository = NewsRepository(get_database_config())
        created_config = await repository.create_job_config(config)
        
        click.echo(f"✅ Created job: {created_config.name} (ID: {created_config.id})")
        
    except Exception as e:
        click.echo(f"❌ Failed to create job: {e}")

@job.command(name='list')
@run_async
async def list_jobs():
    """List all job configurations"""
    try:
        repository = NewsRepository(get_database_config())
        configs = await repository.get_all_job_configs()
        
        if not configs:
            click.echo("No jobs configured")
            return
            
        click.echo("\n📋 News Collection Jobs:")
        click.echo("=" * 60)
        
        for config in configs:
            status = "🟢 Enabled" if config.enabled else "🔴 Disabled"
            last_run = config.last_run.strftime("%Y-%m-%d %H:%M") if config.last_run else "Never"
            
            click.echo(f"{config.name}")
            click.echo(f"  Status: {status}")
            click.echo(f"  Symbols: {', '.join(config.symbols)}")
            click.echo(f"  Schedule: {config.frequency_cron}")
            click.echo(f"  Last Run: {last_run}")
            click.echo()
            
    except Exception as e:
        click.echo(f"❌ Failed to list jobs: {e}")

@job.command()
@click.argument('job_id', type=str)
@run_async
async def run(job_id: str):
    """Manually execute a job"""
    try:
        repository = NewsRepository(get_database_config())
        config = await repository.get_job_config(UUID(job_id))
        
        if not config:
            click.echo(f"❌ Job not found: {job_id}")
            return
            
        click.echo(f"🚀 Running job: {config.name}")
        
        # Execute job
        service = NewsService(repository, get_trading_config())
        articles = await service.collect_and_process_news(config.symbols)
        
        click.echo(f"✅ Completed: collected {len(articles)} articles")
        
    except Exception as e:
        click.echo(f"❌ Job execution failed: {e}")

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/cli/commands/news_commands.py

Test Requirements:

  • CLI command tests with mocked services
  • User input validation tests
  • Output formatting tests

Phase 5: Validation

T010: Integration Tests - End-to-End Workflow

Priority: High | Duration: 2-3 hours | Dependencies: T007, T008

Description: Comprehensive integration tests for complete news domain workflow

Acceptance Criteria:

  • End-to-end workflow tests from RSS to vector storage
  • Agent integration tests via AgentToolkit
  • Performance tests for daily collection volumes
  • Error recovery and fallback tests
  • Test coverage maintained above 85%

Implementation Details:

# tests/domains/news/integration/test_news_workflow.py
class TestNewsWorkflowIntegration:
    
    @pytest.mark.asyncio
    async def test_complete_news_processing_pipeline(self, test_db, mock_openrouter):
        """Test complete pipeline from RSS to vector storage"""
        # Setup
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)
        
        # Mock OpenRouter responses
        mock_openrouter.sentiment_response = {
            "score": 0.7,
            "confidence": 0.85, 
            "label": "positive"
        }
        mock_openrouter.embeddings_response = [[0.1] * 1536]
        
        # Execute pipeline
        articles = await service.collect_and_process_news(["AAPL"])
        
        # Verify results
        assert len(articles) > 0
        assert all(a.sentiment_score is not None for a in articles)
        assert all(a.title_embedding is not None for a in articles)
        
        # Verify database storage
        stored_articles = await repository.get_articles_by_symbol("AAPL")
        assert len(stored_articles) == len(articles)
        
        # Test vector similarity search
        similar = await repository.find_similar_articles(
            articles[0].title_embedding, limit=5
        )
        assert len(similar) > 0
    
    @pytest.mark.asyncio
    async def test_agent_toolkit_integration(self, test_db):
        """Test integration with AgentToolkit for RAG queries"""
        from tradingagents.agents.libs.toolkit import AgentToolkit
        
        # Setup with real data
        toolkit = AgentToolkit(test_db)
        
        # Test news context retrieval
        context = await toolkit.get_news_context("AAPL", days=7)
        assert "articles" in context
        assert "sentiment_summary" in context
        
        # Test vector similarity for context
        similar_context = await toolkit.get_similar_news(
            "Apple earnings beat expectations", limit=5
        )
        assert len(similar_context) <= 5
    
    @pytest.mark.asyncio  
    async def test_scheduler_integration(self, test_db):
        """Test APScheduler integration with job management"""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)
        scheduler = ScheduledNewsCollector(service, repository, config)
        
        # Create test job configuration
        job_config = NewsJobConfig(
            name="test_job",
            symbols=["AAPL"],
            frequency_cron="0 */6 * * *",  # Every 6 hours
            enabled=True
        )
        await repository.create_job_config(job_config)
        
        # Test scheduler initialization
        await scheduler.initialize_scheduler()
        await scheduler.load_job_configurations()
        
        # Verify job was scheduled
        assert scheduler.scheduler.get_job(f"news_collection_{job_config.id}") is not None
        
        # Test manual job execution
        await scheduler._execute_news_collection(job_config)
        
        # Verify execution updated last_run
        updated_config = await repository.get_job_config(job_config.id)
        assert updated_config.last_run is not None
        
    @pytest.mark.asyncio
    async def test_error_recovery_and_fallbacks(self, test_db):
        """Test error handling and fallback mechanisms"""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)
        
        # Test with failing LLM client
        with patch.object(service.sentiment_client, 'analyze_sentiment', side_effect=Exception("API Error")):
            articles = await service.collect_and_process_news(["AAPL"])
            
            # Should still process articles via the fallback path
            assert len(articles) > 0
            # Every article should receive a fallback sentiment value
            assert all(a.sentiment_score is not None for a in articles)
    
    @pytest.mark.asyncio
    async def test_performance_benchmarks(self, test_db):
        """Test performance meets requirements"""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        
        # Create test articles with embeddings
        test_articles = await self._create_test_articles_with_embeddings(repository, count=1000)
        
        # Test query performance (< 100ms requirement)
        start_time = time.perf_counter()
        articles = await repository.get_recent_articles_by_symbol("AAPL", days=30)
        query_time = (time.perf_counter() - start_time) * 1000
        
        assert query_time < 100, f"Query took {query_time:.1f}ms, should be < 100ms"
        
        # Test vector similarity performance (< 1s requirement)
        start_time = time.perf_counter()
        similar = await repository.find_similar_articles(
            test_articles[0].title_embedding, limit=10
        )
        vector_time = (time.perf_counter() - start_time) * 1000
        
        assert vector_time < 1000, f"Vector search took {vector_time:.1f}ms, should be < 1s"

Files to Create:

  • /Users/martinrichards/code/TradingAgents/tests/domains/news/integration/test_news_workflow.py

Test Requirements:

  • Full workflow integration tests
  • AgentToolkit integration tests
  • Performance benchmark tests
  • Error scenario tests

T011: Documentation and Monitoring

Priority: Medium | Duration: 1-2 hours | Dependencies: T010

Description: Update documentation and add monitoring for new functionality

Acceptance Criteria:

  • Updated API documentation for new methods
  • Job scheduling configuration examples
  • Performance monitoring dashboard queries
  • Troubleshooting guide for common issues
  • Agent integration documentation
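
The "performance monitoring dashboard queries" item could be backed by a small in-process latency recorder that repository calls are wrapped in. This is a hypothetical sketch (the names `track_latency` and `latency_summary` are not part of the existing codebase); a real deployment would export these numbers to a metrics backend instead of an in-memory dict.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-process metrics store; replace with a real metrics
# backend (Prometheus, StatsD, ...) in production.
_latencies: dict = defaultdict(list)

@contextmanager
def track_latency(operation: str):
    """Record the wall-clock duration (in ms) of a named operation."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _latencies[operation].append((time.perf_counter() - start) * 1000)

def latency_summary(operation: str) -> dict:
    """Summarize recorded latencies for one operation."""
    samples = sorted(_latencies[operation])
    if not samples:
        return {"count": 0}
    return {
        "count": len(samples),
        "max_ms": samples[-1],
        "p95_ms": samples[max(int(len(samples) * 0.95) - 1, 0)],
    }
```

Usage would look like `with track_latency("recent_articles"): await repository.get_recent_articles_by_symbol(...)`, with the summaries feeding the dashboard.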

Files to Modify:

  • /Users/martinrichards/code/TradingAgents/docs/domains/news.md
  • /Users/martinrichards/code/TradingAgents/docs/api-reference.md

Test Requirements:

  • Documentation accuracy validation
  • Configuration example testing

Parallel Development Opportunities

AI Agent Collaboration Points

Tasks T005 & T006 can be developed in parallel:

  • Both are independent OpenRouter client implementations
  • Different LLM capabilities (sentiment vs embeddings)
  • Can be tested independently with VCR cassettes

Phase 1 Tasks (T001, T002, T003) have minimal dependencies:

  • T002 and T003 both depend on T001 but can be developed simultaneously
  • Entity layer changes are independent of each other

Critical Path Analysis

Critical Path: T001 → T002/T003 → T004 → T005/T006 → T007 → T008

Parallel Opportunities:

  1. Foundation Phase: T002 + T003 (after T001)
  2. LLM Integration: T005 + T006 (after T002)
  3. Testing: Unit tests alongside implementation

Risk Mitigation Strategies

LLM API Dependencies:

  • Implement comprehensive fallback strategies
  • Use VCR for deterministic testing
  • Mock clients for unit tests
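
The "mock clients" bullet can be illustrated with `unittest.mock.AsyncMock`. The `SentimentPipeline` class below is a stand-in, not the real `NewsService`; it only assumes the sentiment client exposes an async `analyze_sentiment` method, which is the shape the tasks above describe.

```python
import asyncio
from unittest.mock import AsyncMock

class SentimentPipeline:
    """Minimal stand-in for NewsService's sentiment path (illustrative only)."""

    def __init__(self, client):
        self.client = client

    async def score(self, text: str) -> float:
        return await self.client.analyze_sentiment(text)

def test_sentiment_uses_client():
    # AsyncMock lets us unit-test async call wiring with no network access
    client = AsyncMock()
    client.analyze_sentiment.return_value = 0.8

    pipeline = SentimentPipeline(client)
    score = asyncio.run(pipeline.score("Apple beats earnings"))

    assert score == 0.8
    client.analyze_sentiment.assert_awaited_once_with("Apple beats earnings")
```

The same pattern applies to the embeddings client, keeping unit tests deterministic while VCR cassettes cover the real OpenRouter wire format.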

Database Performance:

  • Test with realistic data volumes
  • Monitor query performance during development
  • Use proper indexes for vector operations
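
When validating vector-index behavior, a brute-force reference implementation is useful: the pgvectorscale index should return (approximately) the same neighbors as an exact scan. Assuming cosine similarity is the metric behind `find_similar_articles`, a pure-Python cross-check looks like this:

```python
import math

def cosine_similarity(a, b) -> float:
    """Exact cosine similarity, for cross-checking indexed query results."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, candidates: dict, k: int = 5) -> list:
    """Brute-force nearest neighbours; the vector index approximates this."""
    ranked = sorted(
        candidates.items(),
        key=lambda kv: cosine_similarity(query, kv[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]
```

Running `top_k` over a small sample of stored embeddings and comparing against the repository's indexed results catches both wrong distance operators and missing indexes early.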

Integration Complexity:

  • Build incrementally with testing at each step
  • Maintain backward compatibility
  • Use feature flags for gradual rollout
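
A minimal environment-variable flag gate is enough for the gradual rollout described above. The `NEWS_FEATURE_*` naming is an assumption for illustration; in this codebase the flags would more likely live on `TradingAgentsConfig`.

```python
import os

def flag_enabled(name: str, default: bool = False) -> bool:
    """Read a boolean feature flag from the environment (hypothetical naming)."""
    raw = os.getenv(f"NEWS_FEATURE_{name.upper()}", str(default))
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

New code paths can then be guarded with `if flag_enabled("embeddings"): ...`, letting sentiment and embedding generation ship dark and be enabled per environment.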

Success Metrics

Technical Metrics:

  • Test coverage >85% maintained
  • Query performance <100ms
  • Vector search performance <1s
  • Zero breaking changes to AgentToolkit

Functional Metrics:

  • Successful OpenRouter-only LLM integration
  • Scheduled jobs executing reliably
  • Agent context enriched with sentiment and similarity

Quality Metrics:

  • All acceptance criteria met
  • Comprehensive error handling
  • Production-ready monitoring and documentation

Implementation Guidelines

TDD Approach

Every task follows: Write test → Write code → Refactor

Layered Architecture Pattern

Strict adherence to: Database → Entity → Repository → Service → Scheduling

Error Handling Strategy

Graceful fallbacks for all LLM API dependencies
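
A sketch of the fallback pattern, assuming (as elsewhere in this document) an async client with an `analyze_sentiment` method and a neutral score of `0.0` as the fallback value — the exact default is a design choice, not an existing constant:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

NEUTRAL_SENTIMENT = 0.0  # assumed neutral default when the LLM is unavailable

async def sentiment_with_fallback(client, text: str) -> float:
    """Try the LLM client; degrade to a neutral score on any failure."""
    try:
        return await client.analyze_sentiment(text)
    except Exception as exc:
        logger.warning("Sentiment analysis failed, using neutral fallback: %s", exc)
        return NEUTRAL_SENTIMENT

class FailingClient:
    """Stub that simulates an OpenRouter outage."""

    async def analyze_sentiment(self, text: str) -> float:
        raise RuntimeError("API Error")
```

This is exactly the behavior the `test_error_recovery_and_fallbacks` integration test above asserts: articles keep flowing with a non-null sentiment score even when the API is down.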

Performance Requirements

Async operations with proper connection pooling throughout
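
One way to honor the pool's connection limit from application code is to bound concurrency with a semaphore, so a burst of news-collection tasks never requests more simultaneous sessions than the pool holds. `gather_bounded` is an illustrative helper, not an existing API:

```python
import asyncio

async def gather_bounded(coros, limit: int = 10):
    """Run coroutines concurrently, but never more than `limit` at once —
    mirroring how a connection pool caps simultaneous database sessions."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))
```

For example, embedding generation for a batch of articles could be issued as `await gather_bounded([embed(a) for a in articles], limit=pool_size)`.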

Testing Strategy

Unit tests + Integration tests + VCR for external API calls


This comprehensive task breakdown provides clear implementation guidance for completing the final 5% of the news domain while maintaining architectural consistency and leveraging AI-assisted development patterns.