# News Domain Technical Design

## Overview

This document details the technical design for completing the final 5% of the News domain implementation. The existing infrastructure is 95% complete with Google News collection, article scraping, and basic storage implemented. The remaining work focuses on **scheduled execution**, **LLM-powered sentiment analysis**, and **vector embeddings** using OpenRouter as the unified LLM provider.

## Architecture Overview

### Component Relationships

```mermaid
graph TD
    A[APScheduler] --> B[ScheduledNewsCollector]
    B --> C[NewsService]
    C --> D[GoogleNewsClient]
    C --> E[ArticleScraperClient]
    C --> F[OpenRouter LLM Client]
    C --> G[OpenRouter Embeddings Client]
    C --> H[NewsRepository]
    H --> I[PostgreSQL + TimescaleDB + pgvectorscale]
    J[News Analysts] --> K[AgentToolkit]
    K --> C
    K --> H
```

### Data Flow Architecture

1. **Scheduled Collection Flow**
   ```
   APScheduler → ScheduledNewsCollector → NewsService.update_company_news()
   → GoogleNewsClient → ArticleScraperClient → OpenRouter (sentiment + embeddings)
   → NewsRepository.upsert_batch() → PostgreSQL
   ```

2.
   **Agent Query Flow**
   ```
   News Analyst → AgentToolkit → NewsService.find_similar_articles()
   → NewsRepository (semantic search) → pgvectorscale vector similarity
   ```

### Key Design Principles

- **Leverage Existing 95%**: Build on the proven GoogleNewsClient and ArticleScraperClient infrastructure
- **OpenRouter Unified**: Single API for both sentiment analysis and embeddings
- **Best-Effort Processing**: LLM failures don't block article storage
- **Vector-Enhanced Search**: Semantic similarity for News Analysts
- **Fault-Tolerant Scheduling**: Robust error handling and monitoring

## Domain Model

### Enhanced NewsArticle Entity

The existing `NewsArticle` entity requires enhancements for structured sentiment and vector support:

```python
import datetime
from typing import Any, Dict, List, Literal, Optional

# Pydantic v1-style API (`validator`, `regex=`, `min_items=`)
from pydantic import BaseModel, Field, validator


class SentimentScore(BaseModel):
    """Structured sentiment analysis result"""
    sentiment: Literal["positive", "negative", "neutral"]
    # Only the 0..1 range is enforced here. The 0.5 reliability threshold is
    # applied by NewsArticle.has_reliable_sentiment(), so the low-confidence
    # fallback score (0.3) produced after an LLM failure can still be built.
    confidence: float = Field(ge=0.0, le=1.0)
    reasoning: str


class NewsArticle(BaseModel):
    """Enhanced NewsArticle entity with sentiment and vector support"""

    # Existing fields (95% complete)
    headline: str
    url: str = Field(..., regex=r'^https?://')
    source: str
    published_date: datetime.datetime
    summary: Optional[str] = None
    entities: List[str] = Field(default_factory=list)
    author: Optional[str] = None
    category: Optional[str] = None

    # Enhanced fields (final 5%)
    sentiment_score: Optional[SentimentScore] = None
    title_embedding: Optional[List[float]] = Field(None, min_items=1536, max_items=1536)
    content_embedding: Optional[List[float]] = Field(None, min_items=1536, max_items=1536)

    # Metadata
    created_at: datetime.datetime = Field(default_factory=datetime.datetime.now)
    updated_at: datetime.datetime = Field(default_factory=datetime.datetime.now)

    @validator('content_embedding', 'title_embedding')
    def validate_embeddings(cls, v):
        if v and len(v) != 1536:
            raise ValueError("Embeddings must be 1536 dimensions for OpenRouter compatibility")
        return v

    def has_reliable_sentiment(self) -> bool:
        """Check if sentiment analysis is reliable (confidence >= 0.5)"""
        return bool(self.sentiment_score and self.sentiment_score.confidence >= 0.5)

    def to_record(self) -> Dict[str, Any]:
        """Convert to database record format"""
        record = self.dict()
        # Convert sentiment to JSONB format
        if self.sentiment_score:
            record['sentiment_score'] = self.sentiment_score.dict()
        return record

    @classmethod
    def from_record(cls, record: Dict[str, Any]) -> 'NewsArticle':
        """Create entity from database record"""
        if record.get('sentiment_score'):
            record['sentiment_score'] = SentimentScore(**record['sentiment_score'])
        return cls(**record)
```

### New NewsJobConfig Entity

Configuration entity for scheduled news collection:

```python
from pydantic import BaseModel, Field, validator
from typing import List


class NewsJobConfig(BaseModel):
    """Configuration for scheduled news collection jobs"""
    tickers: List[str] = Field(..., min_items=1, max_items=50)
    schedule_hour: int = Field(..., ge=0, le=23)
    sentiment_model: str = Field(default="anthropic/claude-3.5-haiku")
    embedding_model: str = Field(default="text-embedding-3-large")
    max_articles_per_ticker: int = Field(default=20, ge=5, le=100)
    lookback_days: int = Field(default=7, ge=1, le=30)

    @validator('tickers')
    def validate_tickers(cls, v):
        # Ensure uppercase stock symbols
        return [ticker.upper().strip() for ticker in v]

    @validator('sentiment_model')
    def validate_sentiment_model(cls, v):
        # Ensure OpenRouter model format
        if '/' not in v:
            raise ValueError("Model must be in OpenRouter format (provider/model)")
        return v

    def to_cron_expression(self) -> str:
        """Convert to cron expression for APScheduler"""
        return f"0 {self.schedule_hour} * * *"  # Daily at specified hour
```

## Database Design

### Schema Enhancements

The
existing `news_articles` table requires minimal modifications to support the final 5%:

```sql
-- Existing table structure (95% complete)
CREATE TABLE IF NOT EXISTS news_articles (
    id SERIAL PRIMARY KEY,
    headline TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,
    source TEXT NOT NULL,
    published_date TIMESTAMPTZ NOT NULL,
    summary TEXT,
    entities TEXT[] DEFAULT '{}',
    sentiment_score JSONB,            -- Enhanced for structured format
    author TEXT,
    category TEXT,
    title_embedding vector(1536),     -- New: pgvector type, indexed via pgvectorscale
    content_embedding vector(1536),   -- New: pgvector type, indexed via pgvectorscale
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- New indexes for final 5% performance
-- GIN index so that `entities @> ARRAY[...]` filters can use it
-- (a B-tree index on the array column cannot serve the @> operator)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_news_articles_entities
    ON news_articles USING GIN (entities);

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_news_articles_published_date
    ON news_articles (published_date DESC);

-- pgvectorscale exposes its index through the `diskann` access method
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_news_articles_title_embedding
    ON news_articles USING diskann (title_embedding vector_cosine_ops);

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_news_articles_content_embedding
    ON news_articles USING diskann (content_embedding vector_cosine_ops);

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_news_articles_sentiment
    ON news_articles ((sentiment_score->>'sentiment'))
    WHERE sentiment_score IS NOT NULL;
```

### Query Patterns

**Time-based News Queries (News Analysts)**

```sql
-- Optimized for Agent queries: recent news for specific ticker
SELECT headline, summary, sentiment_score, published_date
FROM news_articles
WHERE entities @> ARRAY[$1::text]
  AND published_date >= NOW() - INTERVAL '30 days'
ORDER BY published_date DESC
LIMIT 20;
```

**Semantic Similarity Queries (Vector Search)**

```sql
-- Find similar articles using pgvectorscale
-- (<=> is cosine distance, so 1 - distance is cosine similarity)
SELECT headline, url, summary,
       1 - (title_embedding <=> $1::vector) AS similarity_score
FROM news_articles
WHERE entities @> ARRAY[$2::text]
  AND title_embedding IS NOT NULL
ORDER BY title_embedding <=> $1::vector
LIMIT 10;
```

**Batch Upsert Operations (Daily Collection)**

```sql
-- Efficient upsert
-- for daily news collection
INSERT INTO news_articles
    (headline, url, source, published_date, summary, entities,
     sentiment_score, title_embedding, content_embedding)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
ON CONFLICT (url) DO UPDATE SET
    headline = EXCLUDED.headline,
    summary = EXCLUDED.summary,
    entities = EXCLUDED.entities,
    sentiment_score = EXCLUDED.sentiment_score,
    title_embedding = EXCLUDED.title_embedding,
    content_embedding = EXCLUDED.content_embedding,
    updated_at = NOW();
```

## API Integration

### OpenRouter Unified Client

Single OpenRouter integration for both sentiment analysis and embeddings:

```python
from typing import List, Optional, Dict, Any

import httpx

from tradingagents.config import TradingAgentsConfig


class OpenRouterClient:
    """Unified OpenRouter client for sentiment analysis and embeddings"""

    def __init__(self, config: TradingAgentsConfig):
        self.config = config
        self.base_url = "https://openrouter.ai/api/v1"
        self.headers = {
            "Authorization": f"Bearer {config.openrouter_api_key}",
            "Content-Type": "application/json"
        }

    async def analyze_sentiment(self, text: str, model: Optional[str] = None) -> SentimentScore:
        """Generate structured sentiment analysis using LLM"""
        model = model or self.config.quick_think_llm

        # Truncate for token limits. This is done outside the f-string so the
        # comment does not leak into the prompt text.
        article_text = text[:2000]

        prompt = f"""Analyze the sentiment of this news article text and respond with ONLY a JSON object:

Article: {article_text}

Required JSON format:
{{
    "sentiment": "positive|negative|neutral",
    "confidence": 0.0-1.0,
    "reasoning": "brief explanation"
}}"""

        payload = {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.1,  # Low temperature for consistent structured output
            "max_tokens": 200
        }

        async with httpx.AsyncClient() as client:
            try:
                response = await client.post(
                    f"{self.base_url}/chat/completions",
                    headers=self.headers,
                    json=payload,
                    timeout=30.0
                )
                response.raise_for_status()

                result = response.json()
                content = result["choices"][0]["message"]["content"].strip()

                # Parse JSON response
                import json
                sentiment_data = json.loads(content)
                return SentimentScore(**sentiment_data)

            except Exception as e:
                # Best-effort: return neutral sentiment on failure
                return SentimentScore(
                    sentiment="neutral",
                    confidence=0.3,  # Below reliability threshold
                    reasoning=f"Analysis failed: {str(e)[:100]}"
                )

    async def generate_embeddings(self, texts: List[str],
                                  model: Optional[str] = None) -> List[Optional[List[float]]]:
        """Generate embeddings for multiple texts (None entries on failure)"""
        model = model or "text-embedding-3-large"

        # Truncate texts to avoid token limits
        truncated_texts = [text[:8000] for text in texts]

        payload = {
            "model": model,
            "input": truncated_texts,
            # text-embedding-3-large returns 3072-dimensional vectors by default,
            # while the schema uses vector(1536). This assumes the endpoint honors
            # the OpenAI-style `dimensions` parameter.
            "dimensions": 1536
        }

        async with httpx.AsyncClient() as client:
            try:
                response = await client.post(
                    f"{self.base_url}/embeddings",
                    headers=self.headers,
                    json=payload,
                    timeout=60.0
                )
                response.raise_for_status()

                result = response.json()
                return [item["embedding"] for item in result["data"]]

            except Exception:
                # Return None embeddings on failure (stored as NULL in DB)
                return [None] * len(texts)
```

### Enhanced NewsService Integration

Update existing NewsService to integrate LLM capabilities:

```python
import datetime
from typing import List, Optional


class NewsService:
    """Enhanced NewsService with LLM sentiment and embeddings (final 5%)"""

    def __init__(self,
                 repository: NewsRepository,
                 google_client: GoogleNewsClient,
                 scraper_client: ArticleScraperClient,
                 openrouter_client: OpenRouterClient):
        self.repository = repository
        self.google_client = google_client
        self.scraper_client = scraper_client
        self.openrouter_client = openrouter_client

    async def update_company_news(self,
                                  symbol: str,
                                  lookback_days: int = 7,
                                  max_articles: int = 20,
                                  include_sentiment: bool = True,
                                  include_embeddings: bool = True) -> List[NewsArticle]:
        """Enhanced method with LLM sentiment analysis and embeddings"""

        # Step 1: Use existing 95% infrastructure for collection
        cutoff_date = datetime.datetime.now() - datetime.timedelta(days=lookback_days)

        # Fetch from Google News (existing)
        google_results = await self.google_client.fetch_company_news(symbol, max_articles)

        articles = []
        for result in google_results:
            if result.published_date < cutoff_date:
                continue

            # Scrape full content (existing)
            scraped_content = await self.scraper_client.scrape_article(result.url)

            # Create base article (existing pattern)
            article = NewsArticle(
                headline=result.title,
                url=result.url,
                source=result.source,
                published_date=result.published_date,
                summary=scraped_content.summary if scraped_content else result.description,
                entities=[symbol],
                author=scraped_content.author if scraped_content else None
            )

            # Step 2: NEW - Add LLM sentiment analysis
            if include_sentiment and scraped_content and scraped_content.content:
                article.sentiment_score = await self.openrouter_client.analyze_sentiment(
                    scraped_content.content
                )

            articles.append(article)

        # Step 3: NEW - Batch generate embeddings
        if include_embeddings and articles:
            titles = [a.headline for a in articles]
            contents = [a.summary or a.headline for a in articles]

            title_embeddings = await self.openrouter_client.generate_embeddings(titles)
            content_embeddings = await self.openrouter_client.generate_embeddings(contents)

            for i, article in enumerate(articles):
                if i < len(title_embeddings) and title_embeddings[i]:
                    article.title_embedding = title_embeddings[i]
                if i < len(content_embeddings) and content_embeddings[i]:
                    article.content_embedding = content_embeddings[i]

        # Step 4: Batch persist (existing pattern)
        await self.repository.upsert_batch(articles)

        return articles

    async def find_similar_articles(self,
                                    query_text: str,
                                    symbol: Optional[str] = None,
                                    limit: int = 10) -> List[NewsArticle]:
        """NEW: Semantic similarity search for News Analysts"""

        # Generate query embedding
        query_embeddings = await self.openrouter_client.generate_embeddings([query_text])
        if not query_embeddings[0]:
            # Fallback to text search
            return await self.repository.find_by_text_search(query_text, symbol, limit)

        return await self.repository.find_similar_articles(
            query_embeddings[0], symbol, limit
        )
```

## Job Scheduling Architecture

### APScheduler Integration

Robust scheduled execution using APScheduler:

```python
import datetime
import logging
from typing import Any, Dict

from apscheduler.executors.asyncio import AsyncIOExecutor
from apscheduler.jobstores.memory import MemoryJobStore
from apscheduler.schedulers.asyncio import AsyncIOScheduler
# Optional persistent job store for production (requires the `redis` package):
# from apscheduler.jobstores.redis import RedisJobStore


class ScheduledNewsCollector:
    """Orchestrates scheduled news collection jobs"""

    def __init__(self,
                 news_service: NewsService,
                 config: TradingAgentsConfig,
                 job_config: NewsJobConfig):
        self.news_service = news_service
        self.config = config
        self.job_config = job_config

        # Configure APScheduler
        jobstores = {
            'default': MemoryJobStore()  # Use Redis for production
        }
        executors = {
            'default': AsyncIOExecutor(),
        }
        job_defaults = {
            'coalesce': False,          # Don't combine missed jobs
            'max_instances': 1,         # One job per ticker at a time
            'misfire_grace_time': 300   # 5 minute grace period
        }

        self.scheduler = AsyncIOScheduler(
            jobstores=jobstores,
            executors=executors,
            job_defaults=job_defaults,
            timezone='UTC'
        )

    async def start(self):
        """Start the scheduler and register jobs"""
        for ticker in self.job_config.tickers:
            # Schedule daily collection for each ticker
            self.scheduler.add_job(
                func=self._collect_ticker_news,
                trigger='cron',
                hour=self.job_config.schedule_hour,
                minute=0,
                args=[ticker],
                id=f"news_collection_{ticker}",
                replace_existing=True,
                max_instances=1
            )

        self.scheduler.start()
        logging.info(f"Started news collection scheduler for {len(self.job_config.tickers)} tickers")

    async def stop(self):
        """Gracefully stop the scheduler"""
        if self.scheduler.running:
            self.scheduler.shutdown(wait=True)

    async def _collect_ticker_news(self, ticker: str):
        """Execute news collection for a single ticker"""
        start_time = datetime.datetime.now()

        try:
            logging.info(f"Starting news collection for {ticker}")

            articles = await self.news_service.update_company_news(
                symbol=ticker,
                lookback_days=self.job_config.lookback_days,
                max_articles=self.job_config.max_articles_per_ticker,
                include_sentiment=True,
                include_embeddings=True
            )

            # Log metrics
            sentiment_count = sum(1 for a in articles if a.has_reliable_sentiment())
            embedding_count = sum(1 for a in articles if a.title_embedding)
            duration = (datetime.datetime.now() - start_time).total_seconds()

            logging.info(
                f"Completed news collection for {ticker}: "
                f"{len(articles)} articles, {sentiment_count} with sentiment, "
                f"{embedding_count} with embeddings in {duration:.1f}s"
            )

        except Exception as e:
            logging.error(f"News collection failed for {ticker}: {str(e)}")
            # Don't raise - let scheduler continue with other tickers

    def get_job_status(self) -> Dict[str, Any]:
        """Get status of all scheduled jobs"""
        jobs = self.scheduler.get_jobs()
        return {
            "scheduler_running": self.scheduler.running,
            "job_count": len(jobs),
            "jobs": [
                {
                    "id": job.id,
                    "next_run": job.next_run_time.isoformat() if job.next_run_time else None,
                    "trigger": str(job.trigger)
                }
                for job in jobs
            ]
        }
```

### Error Handling and Monitoring

Comprehensive error handling for production reliability:

```python
import datetime
import logging
from collections import defaultdict


class NewsCollectionMonitor:
    """Monitor and handle news collection job failures"""

    def __init__(self, collector: ScheduledNewsCollector):
        self.collector = collector
        self.failure_counts = defaultdict(int)
        self.max_failures = 3

    async def handle_job_failure(self, ticker: str, error: Exception):
        """Handle job failure with exponential backoff"""
        self.failure_counts[ticker] += 1

        if self.failure_counts[ticker] >= self.max_failures:
            logging.error(f"Max failures reached for {ticker}, disabling job")
            self.collector.scheduler.remove_job(f"news_collection_{ticker}")
            # Could send alert here
        else:
            # Schedule retry with exponential backoff (2, then 4 minutes)
            delay_minutes = 2 ** self.failure_counts[ticker]
            retry_time = datetime.datetime.now() + datetime.timedelta(minutes=delay_minutes)

            self.collector.scheduler.add_job(
                func=self.collector._collect_ticker_news,
                trigger='date',
                run_date=retry_time,
                args=[ticker],
                id=f"news_retry_{ticker}_{int(retry_time.timestamp())}",
                max_instances=1
            )

    def reset_failure_count(self, ticker: str):
        """Reset failure count on successful job"""
        if ticker in self.failure_counts:
            del self.failure_counts[ticker]
```

## Implementation Strategy

### Phase 1: Entity and Database Enhancements (Week 1)

**Deliverables:**
- [ ] Enhanced `NewsArticle` entity with `SentimentScore` and vector support
- [ ] New `NewsJobConfig` entity with validation
- [ ] Database migration for vector indexes and sentiment_score JSONB enhancement
- [ ] Repository method `find_similar_articles()` with pgvectorscale integration

**Testing Focus:**
- Unit tests for entity validation and serialization
- Repository integration tests with vector similarity queries
- Database migration verification

### Phase 2: OpenRouter Integration (Week 2)

**Deliverables:**
- [ ] `OpenRouterClient` with sentiment analysis and embeddings
- [ ] Enhanced `NewsService.update_company_news()` with LLM integration
- [ ] Error handling for LLM failures (best-effort approach)
- [ ] Integration tests with OpenRouter API (using pytest-vcr)

**Testing Focus:**
- Mock OpenRouter responses for consistent testing
- Error handling scenarios (API failures, malformed responses)
- Embedding dimension validation

### Phase 3: Job Scheduling System (Week 3)

**Deliverables:**
- [ ] `ScheduledNewsCollector` with APScheduler integration
- [ ] `NewsCollectionMonitor` for error handling and retries
- [ ] Configuration management for job scheduling
- [ ] Graceful startup and shutdown procedures

**Testing Focus:**
- Scheduler lifecycle testing
- Job execution and failure handling
- Configuration validation

### Phase 4: Testing and Performance Optimization (Week 4)

**Deliverables:**
- [ ] Complete test coverage maintaining >85% threshold
- [ ] Performance optimization for vector queries
- [ ] Documentation and deployment guides
- [ ] Integration with existing News Analyst AgentToolkit

**Testing Focus:**
- End-to-end integration tests
- Performance benchmarks for vector similarity
  queries
- Load testing for scheduled job execution

## Testing Strategy

### Test Architecture

Following the existing pragmatic TDD approach with mock boundaries:

```
tests/domains/news/
├── __init__.py
├── test_news_entities.py          # Entity validation and serialization
├── test_news_service.py           # Mock repository and OpenRouter client
├── test_news_repository.py        # PostgreSQL test database
├── test_openrouter_client.py      # pytest-vcr for API responses
├── test_scheduled_collector.py    # Mock APScheduler and services
└── integration/
    ├── test_sentiment_pipeline.py     # End-to-end sentiment analysis
    ├── test_embedding_pipeline.py     # End-to-end embedding generation
    └── test_scheduled_execution.py    # Full job execution cycle
```

### Key Test Categories

**Entity Tests (Fast Unit Tests)**

```python
import datetime

import pytest
from pydantic import ValidationError


def test_news_article_sentiment_validation():
    """Test sentiment score validation and reliability checks"""
    # Valid sentiment
    sentiment = SentimentScore(
        sentiment="positive",
        confidence=0.8,
        reasoning="Strong positive language"
    )

    article = NewsArticle(
        headline="Test headline",
        url="https://example.com",
        source="Test Source",
        published_date=datetime.datetime.now(),
        sentiment_score=sentiment
    )

    assert article.has_reliable_sentiment()

    # Low confidence sentiment
    low_confidence = SentimentScore(
        sentiment="neutral",
        confidence=0.3,
        reasoning="Ambiguous language"
    )
    article.sentiment_score = low_confidence
    assert not article.has_reliable_sentiment()


def test_news_article_vector_validation():
    """Test vector embedding validation"""
    # Valid 1536-dimension embedding
    valid_embedding = [0.1] * 1536
    article = NewsArticle(
        headline="Test",
        url="https://example.com",
        source="Test",
        published_date=datetime.datetime.now(),
        title_embedding=valid_embedding
    )
    assert len(article.title_embedding) == 1536

    # Invalid dimension should raise ValidationError
    with pytest.raises(ValidationError):
        NewsArticle(
            headline="Test",
            url="https://example.com",
            source="Test",
            published_date=datetime.datetime.now(),
            title_embedding=[0.1] * 512  # Wrong dimension
        )
```

**Service Integration Tests (Mock Boundaries)**

```python
@pytest.mark.asyncio
async def test_news_service_with_sentiment_analysis(mock_openrouter_client,
                                                    mock_repository,
                                                    mock_google_client,
                                                    mock_scraper_client):
    """Test NewsService integration with mocked LLM client"""
    # Mock successful sentiment analysis
    mock_sentiment = SentimentScore(
        sentiment="positive",
        confidence=0.9,
        reasoning="Optimistic financial outlook"
    )
    mock_openrouter_client.analyze_sentiment.return_value = mock_sentiment

    # Mock embeddings: the same list is returned for both the title call and
    # the content call, one entry per article
    mock_openrouter_client.generate_embeddings.return_value = [
        [0.1] * 1536,  # first article
        [0.2] * 1536   # second article
    ]

    service = NewsService(
        repository=mock_repository,
        google_client=mock_google_client,
        scraper_client=mock_scraper_client,
        openrouter_client=mock_openrouter_client
    )

    articles = await service.update_company_news("AAPL", include_sentiment=True)

    # Verify LLM integration
    assert len(articles) > 0
    assert articles[0].sentiment_score == mock_sentiment
    assert articles[0].title_embedding == [0.1] * 1536
    assert mock_openrouter_client.analyze_sentiment.called
    assert mock_openrouter_client.generate_embeddings.called
```

**Repository Integration Tests (Real Database)**

```python
@pytest.mark.asyncio
async def test_repository_vector_similarity_search(test_db):
    """Test vector similarity search with real pgvectorscale"""
    repository = NewsRepository(test_db)

    # Insert articles with embeddings
    article1 = NewsArticle(
        headline="Apple reports strong iPhone sales",
        url="https://example.com/1",
        source="TechNews",
        published_date=datetime.datetime.now(),
        entities=["AAPL"],
        title_embedding=[0.1, 0.2] + [0.0] * 1534  # Similar to query
    )

    article2 = NewsArticle(
        headline="Microsoft launches new Azure features",
        url="https://example.com/2",
        source="CloudNews",
        published_date=datetime.datetime.now(),
        entities=["MSFT"],
        title_embedding=[0.9, 0.8] + [0.0] * 1534  # Different from query
    )

    await repository.upsert_batch([article1, article2])

    # Query with similar
    # embedding
    query_embedding = [0.15, 0.25] + [0.0] * 1534
    similar_articles = await repository.find_similar_articles(
        query_embedding, symbol="AAPL", limit=1
    )

    assert len(similar_articles) == 1
    assert similar_articles[0].headline == "Apple reports strong iPhone sales"
```

**API Integration Tests (pytest-vcr)**

```python
@pytest.mark.vcr
@pytest.mark.asyncio
async def test_openrouter_sentiment_analysis():
    """Test real OpenRouter API calls with VCR cassettes"""
    config = TradingAgentsConfig.from_env()
    client = OpenRouterClient(config)

    test_text = "Apple's quarterly earnings exceeded expectations with strong iPhone sales."
    sentiment = await client.analyze_sentiment(test_text)

    assert isinstance(sentiment, SentimentScore)
    assert sentiment.sentiment in ["positive", "negative", "neutral"]
    assert 0.0 <= sentiment.confidence <= 1.0
    assert len(sentiment.reasoning) > 0


@pytest.mark.vcr
@pytest.mark.asyncio
async def test_openrouter_embeddings_generation():
    """Test real OpenRouter embeddings API with VCR"""
    config = TradingAgentsConfig.from_env()
    client = OpenRouterClient(config)

    texts = ["Apple stock rises", "Market volatility increases"]
    embeddings = await client.generate_embeddings(texts)

    assert len(embeddings) == 2
    assert all(len(emb) == 1536 for emb in embeddings)
    assert all(isinstance(val, float) for emb in embeddings for val in emb)
```

### Coverage Requirements

Maintain existing >85% coverage with new components:

- **Entity Layer**: 95% coverage (comprehensive validation testing)
- **Service Layer**: 90% coverage (mock external dependencies)
- **Repository Layer**: 85% coverage (real database integration tests)
- **Client Layer**: 80% coverage (pytest-vcr for API calls)
- **Integration Tests**: End-to-end scenarios covering complete workflows

### Performance Testing

```python
import random
import time


@pytest.mark.performance
@pytest.mark.asyncio
async def test_vector_similarity_performance(test_db):
    """Ensure vector similarity queries perform under 100ms"""
    repository = NewsRepository(test_db)

    # Insert
    # 1000 articles with embeddings
    articles = [create_test_article_with_embedding() for _ in range(1000)]
    await repository.upsert_batch(articles)

    query_embedding = [random.random() for _ in range(1536)]

    # perf_counter is monotonic, which makes it the safer clock for timing
    start_time = time.perf_counter()
    results = await repository.find_similar_articles(query_embedding, limit=10)
    duration = time.perf_counter() - start_time

    assert duration < 0.1  # Under 100ms
    assert len(results) == 10
```

## Integration Points

### News Analyst AgentToolkit Integration

The completed News domain plugs into the existing News Analyst agents through the toolkit:

```python
from typing import Any, Dict, List, Optional


class NewsAnalystToolkit:
    """Enhanced toolkit with semantic search capabilities"""

    def __init__(self, news_service: NewsService):
        self.news_service = news_service

    async def get_relevant_news(self,
                                ticker: str,
                                query: Optional[str] = None,
                                days_back: int = 30) -> List[Dict[str, Any]]:
        """Get news with optional semantic search"""
        if query:
            # Use semantic similarity search
            articles = await self.news_service.find_similar_articles(
                query_text=query,
                symbol=ticker,
                limit=20
            )
        else:
            # Use time-based search (existing)
            articles = await self.news_service.find_recent_news(
                symbol=ticker,
                days_back=days_back
            )

        return [
            {
                "headline": article.headline,
                "summary": article.summary,
                "published_date": article.published_date.isoformat(),
                "sentiment": article.sentiment_score.sentiment if article.sentiment_score else "unknown",
                "confidence": article.sentiment_score.confidence if article.sentiment_score else 0.0,
                "source": article.source,
                "url": article.url
            }
            for article in articles
        ]
```

### Configuration Integration

The new settings extend the existing `TradingAgentsConfig`:

```python
import os

# Enhanced configuration for news domain completion
config = TradingAgentsConfig(
    # Existing LLM configuration
    llm_provider="openrouter",
    openrouter_api_key=os.getenv("OPENROUTER_API_KEY"),
    quick_think_llm="anthropic/claude-3.5-haiku",  # For sentiment analysis

    # New news-specific settings
    news_collection_enabled=True,
    news_schedule_hour=6,  # UTC
    news_sentiment_enabled=True,
    news_embeddings_enabled=True,
    news_max_articles_per_ticker=20,

    # Database (existing)
    database_url=os.getenv("DATABASE_URL"),
)

# Job configuration
news_job_config = NewsJobConfig(
    tickers=["AAPL", "GOOGL", "MSFT", "TSLA", "NVDA"],
    schedule_hour=6,  # 6 AM UTC daily collection
    sentiment_model=config.quick_think_llm,
    embedding_model="text-embedding-3-large",
    max_articles_per_ticker=20
)
```

This design completes the final 5% of the News domain while leveraging the existing 95% infrastructure, maintaining architectural consistency, and providing the robust scheduled execution, LLM-powered sentiment analysis, and vector embeddings needed for advanced News Analyst capabilities.
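
### Supplementary Sketch: Tolerant Sentiment Parsing

The best-effort principle depends on `OpenRouterClient.analyze_sentiment()` coping with imperfect model output. Models sometimes wrap the JSON in a code fence or add surrounding prose, in which case a bare `json.loads` fails and the article falls back to neutral. The following is a minimal sketch of a more tolerant parsing step, not part of the existing codebase: `parse_sentiment_response`, `_fallback`, and `ALLOWED_SENTIMENTS` are hypothetical names, and the function returns a plain dict (rather than `SentimentScore`) so that it runs with the standard library alone.

```python
import json
import re
from typing import Any, Dict

# Hypothetical constant mirroring the Literal values on SentimentScore.
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def _fallback(reason: str) -> Dict[str, Any]:
    """Neutral result with zero confidence, i.e. below the 0.5 reliability threshold."""
    return {
        "sentiment": "neutral",
        "confidence": 0.0,
        "reasoning": f"Analysis failed: {reason[:100]}",
    }


def parse_sentiment_response(content: str) -> Dict[str, Any]:
    """Extract a sentiment dict from raw LLM output, falling back to neutral."""
    # Take the span from the first '{' to the last '}'; this skips code
    # fences and any prose the model placed around the JSON object.
    match = re.search(r"\{.*\}", content, flags=re.DOTALL)
    if match is None:
        return _fallback("no JSON object found")

    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        return _fallback(str(exc))

    sentiment = str(data.get("sentiment", "")).lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        return _fallback(f"unexpected sentiment {sentiment!r}")

    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        return _fallback("confidence is not a number")

    # Clamp into [0, 1] so the value satisfies Field(ge=0.0, le=1.0).
    confidence = min(1.0, max(0.0, confidence))

    return {
        "sentiment": sentiment,
        "confidence": confidence,
        "reasoning": str(data.get("reasoning", "")),
    }
```

If adopted, the returned dict could be passed to `SentimentScore(**result)` in place of the direct `json.loads(content)` call. Whether to clamp out-of-range confidence values or reject them outright is a design choice; clamping keeps more articles scored, while rejecting is stricter about malformed output.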