# News Domain Completion - Task Implementation Guide

## Overview

Complete the final 5% of the news domain by implementing OpenRouter-only LLM sentiment analysis, vector embeddings, and APScheduler job execution. This builds on the 95%-complete infrastructure: a PostgreSQL + TimescaleDB + pgvectorscale stack.

**Total Estimated Time**: 12-16 hours with AI assistance (the phase estimates below sum to 18-29 hours unassisted)
**Target Completion**: 3-4 days
**Test Coverage Requirement**: Maintain >85%
**Architecture Pattern**: Database → Entity → Repository → Service → Scheduling
## Implementation Phases

### Phase 1: Foundation (4-7 hours)

Database and entity layer enhancements for LLM integration

### Phase 2: Data Access (2-3 hours)

Repository layer enhancements for vector and job operations

### Phase 3: LLM Integration (5-8 hours)

OpenRouter clients and service integration

### Phase 4: Scheduling (4-6 hours)

Job scheduling and CLI integration

### Phase 5: Validation (3-5 hours)

Testing, documentation, and monitoring
---

## Task Breakdown

### Phase 1: Foundation
#### T001: Database Migration - NewsJobConfig Table

**Priority**: Critical | **Duration**: 1-2 hours | **Dependencies**: None

**Description**: Create database migration for the news job configurations table with proper indexes

**Acceptance Criteria**:

- [ ] `news_job_configs` table created with UUID primary key
- [ ] JSONB fields for symbols and categories with validation
- [ ] Proper indexes for enabled/frequency queries
- [ ] Migration script tests with rollback capability

**Implementation Details**:
```python
# Migration structure (Alembic)
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql


def upgrade():
    op.create_table(
        'news_job_configs',
        sa.Column('id', postgresql.UUID(), primary_key=True),
        sa.Column('name', sa.String(255), nullable=False),
        sa.Column('symbols', postgresql.JSONB(), nullable=False),
        sa.Column('categories', postgresql.JSONB(), nullable=False),
        sa.Column('frequency_cron', sa.String(100), nullable=False),
        sa.Column('enabled', sa.Boolean(), server_default=sa.true()),
        sa.Column('last_run', sa.DateTime(timezone=True)),
        sa.Column('created_at', sa.DateTime(timezone=True), server_default=sa.func.now()),
        sa.Column('updated_at', sa.DateTime(timezone=True), server_default=sa.func.now()),
    )

    # Indexes
    op.create_index('idx_news_jobs_enabled_frequency', 'news_job_configs',
                    ['enabled', 'frequency_cron'])
    op.create_index('idx_news_jobs_last_run', 'news_job_configs',
                    ['last_run'], postgresql_where=sa.text('enabled = true'))


def downgrade():
    # Rollback path required by the acceptance criteria
    op.drop_index('idx_news_jobs_last_run', table_name='news_job_configs')
    op.drop_index('idx_news_jobs_enabled_frequency', table_name='news_job_configs')
    op.drop_table('news_job_configs')
```

**Files to Modify**:

- `/Users/martinrichards/code/TradingAgents/tradingagents/data/migrations/add_news_job_configs.py`

**Test Requirements**:

- Migration up/down tests
- Index performance validation
- Constraint validation tests

---
#### T002: Enhance NewsArticle Entity - Sentiment and Embeddings

**Priority**: Critical | **Duration**: 2-3 hours | **Dependencies**: T001

**Description**: Add LLM sentiment fields and embedding validation to the NewsArticle entity

**Acceptance Criteria**:

- [ ] Add `sentiment_score`, `sentiment_confidence`, `sentiment_label` fields
- [ ] Add `title_embedding` and `content_embedding` vector fields
- [ ] Enhanced `validate()` method with sentiment range checks
- [ ] Updated transformations for vector handling
- [ ] Embedding dimension validation (1536)

**Implementation Details**:
```python
@dataclass
class NewsArticle:
    # Existing fields...

    # LLM sentiment fields
    sentiment_score: Optional[float] = None        # [-1.0, 1.0]
    sentiment_confidence: Optional[float] = None   # [0.0, 1.0]
    sentiment_label: Optional[str] = None          # "positive", "negative", "neutral"

    # Vector embedding fields
    title_embedding: Optional[List[float]] = None    # 1536 dimensions
    content_embedding: Optional[List[float]] = None  # 1536 dimensions

    def validate(self) -> Dict[str, List[str]]:
        errors = super().validate()

        # Sentiment validation
        if self.sentiment_score is not None:
            if not -1.0 <= self.sentiment_score <= 1.0:
                errors["sentiment_score"] = ["Must be between -1.0 and 1.0"]

        if self.sentiment_confidence is not None:
            if not 0.0 <= self.sentiment_confidence <= 1.0:
                errors["sentiment_confidence"] = ["Must be between 0.0 and 1.0"]

        # Vector dimension validation (field_name avoids shadowing dataclasses.field)
        for field_name, vector in [("title_embedding", self.title_embedding),
                                   ("content_embedding", self.content_embedding)]:
            if vector is not None and len(vector) != 1536:
                errors[field_name] = ["Must be exactly 1536 dimensions"]

        return errors

    def to_record(self) -> Dict[str, Any]:
        record = super().to_record()
        # Pass vectors through for pgvector storage if present
        if self.title_embedding:
            record["title_embedding"] = self.title_embedding
        if self.content_embedding:
            record["content_embedding"] = self.content_embedding
        return record
```
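If the database driver has no pgvector codec registered, the plain float lists emitted by `to_record()` must be serialized into pgvector's text format before they hit SQL. A minimal sketch (the helper name `to_pgvector_literal` is introduced here for illustration, not part of the existing codebase):

```python
from typing import List


def to_pgvector_literal(embedding: List[float]) -> str:
    """Serialize a float list into pgvector's text format, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# A 1536-dimension vector becomes a bracketed, comma-separated literal
print(to_pgvector_literal([0.1, 0.2, 0.3]))  # → [0.1,0.2,0.3]
```

With asyncpg, registering the codec from the `pgvector` Python package (`pgvector.asyncpg.register_vector`) should make this manual step unnecessary.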

**Files to Modify**:

- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_article.py`

**Test Requirements**:

- Sentiment validation tests (range checks)
- Vector dimension validation tests
- Transformation method tests
- Business rule violation tests

---
#### T003: Create NewsJobConfig Entity

**Priority**: Critical | **Duration**: 1-2 hours | **Dependencies**: T001

**Description**: Implement the NewsJobConfig entity for scheduled job management

**Acceptance Criteria**:

- [ ] NewsJobConfig dataclass with all required fields
- [ ] Business rule validation for job configuration
- [ ] Cron expression validation for frequency
- [ ] Symbol list validation
- [ ] JSON serialization for database storage

**Implementation Details**:
```python
@dataclass
class NewsJobConfig:
    id: Optional[UUID] = None
    name: str = ""
    symbols: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    frequency_cron: str = ""
    enabled: bool = True
    last_run: Optional[datetime] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

    def validate(self) -> Dict[str, List[str]]:
        errors: Dict[str, List[str]] = {}

        # Name validation
        if not self.name or len(self.name) > 255:
            errors["name"] = ["Name required and must be <= 255 characters"]

        # Symbol validation (append rather than overwrite so every bad symbol is reported)
        if not self.symbols:
            errors["symbols"] = ["At least one symbol required"]
        else:
            for symbol in self.symbols:
                if not symbol.isupper() or not symbol.isalpha():
                    errors.setdefault("symbols", []).append(
                        f"Invalid symbol {symbol!r}: symbols must be uppercase letters only")

        # Cron validation
        try:
            from croniter import croniter
            if not croniter.is_valid(self.frequency_cron):
                errors["frequency_cron"] = ["Invalid cron expression"]
        except ImportError:
            # Fallback validation for simple intervals
            if self.frequency_cron not in ["hourly", "daily", "weekly"]:
                errors["frequency_cron"] = ["Invalid frequency"]

        return errors
```
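When croniter is unavailable, the fallback above accepts only three keywords. A slightly stronger fallback can at least check the five-field shape of a standard cron expression — a sketch only (`looks_like_cron` is a name introduced here); real parsing should still go through croniter:

```python
def looks_like_cron(expr: str) -> bool:
    """Cheap structural check: five whitespace-separated fields of cron characters."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    allowed = set("0123456789*/,-")
    return all(field and set(field) <= allowed for field in fields)


print(looks_like_cron("0 */6 * * *"))    # → True
print(looks_like_cron("every 6 hours"))  # → False
```

This accepts some invalid expressions (it does not range-check field values), which is acceptable for a last-resort fallback that exists only when the real validator cannot be imported.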

**Files to Create**:

- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/entities/news_job_config.py`

**Test Requirements**:

- Job configuration validation tests
- Schedule parsing tests
- Symbol validation tests
- Serialization/deserialization tests

---

### Phase 2: Data Access
#### T004: Enhance NewsRepository - Vector and Job Operations

**Priority**: Critical | **Duration**: 2-3 hours | **Dependencies**: T002, T003

**Description**: Add vector similarity search and NewsJobConfig CRUD operations

**Acceptance Criteria**:

- [ ] Vector similarity search with cosine distance
- [ ] Batch embedding update operations
- [ ] NewsJobConfig CRUD methods
- [ ] Optimized query performance for vector operations
- [ ] Proper async connection handling

**Test Requirements** are listed after the implementation sketch below.

**Implementation Details**:
```python
class NewsRepository:
    # Existing methods...
    # Note: the connection methods below (fetch/fetchrow/executemany) are asyncpg's,
    # so queries use $N placeholders; passing lists to ::vector assumes the pgvector
    # codec is registered on the connection.

    async def find_similar_articles(self,
                                    embedding: List[float],
                                    limit: int = 10,
                                    threshold: float = 0.8) -> List[NewsArticle]:
        """Find articles similar to the given embedding using cosine distance."""
        query = """
            SELECT *, 1 - (title_embedding <=> $1::vector) AS similarity
            FROM news_articles
            WHERE title_embedding IS NOT NULL
              AND 1 - (title_embedding <=> $1::vector) > $2
            ORDER BY title_embedding <=> $1::vector
            LIMIT $3
        """

        async with self._get_connection() as conn:
            rows = await conn.fetch(query, embedding, threshold, limit)
            return [NewsArticle.from_record(dict(row)) for row in rows]

    async def batch_update_embeddings(self,
                                      articles: List[NewsArticle]) -> None:
        """Efficiently update embeddings for multiple articles."""
        if not articles:
            return

        query = """
            UPDATE news_articles
            SET title_embedding = $1, content_embedding = $2, updated_at = now()
            WHERE id = $3
        """

        async with self._get_connection() as conn:
            await conn.executemany(query, [
                (article.title_embedding, article.content_embedding, article.id)
                for article in articles
                if article.id and (article.title_embedding or article.content_embedding)
            ])

    # NewsJobConfig CRUD operations
    async def create_job_config(self, config: NewsJobConfig) -> NewsJobConfig:
        """Create a new job configuration."""
        query = """
            INSERT INTO news_job_configs (id, name, symbols, categories, frequency_cron, enabled)
            VALUES ($1, $2, $3, $4, $5, $6)
            RETURNING *
        """

        config.id = config.id or uuid4()
        async with self._get_connection() as conn:
            row = await conn.fetchrow(
                query,
                config.id, config.name, json.dumps(config.symbols),
                json.dumps(config.categories), config.frequency_cron, config.enabled)
            return NewsJobConfig.from_record(dict(row))

    async def get_active_job_configs(self) -> List[NewsJobConfig]:
        """Get all enabled job configurations."""
        query = "SELECT * FROM news_job_configs WHERE enabled = true"
        async with self._get_connection() as conn:
            rows = await conn.fetch(query)
            return [NewsJobConfig.from_record(dict(row)) for row in rows]
```
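The similarity score in `find_similar_articles` is `1 - (a <=> b)`, where `<=>` is pgvector's cosine-distance operator. A pure-Python reference implementation is useful when asserting against the SQL results in tests (a sketch, not part of the repository API):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """What pgvector returns as 1 - (a <=> b): dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


assert abs(cosine_similarity([1.0, 0.0], [1.0, 0.0]) - 1.0) < 1e-9  # identical → 1.0
assert abs(cosine_similarity([1.0, 0.0], [0.0, 1.0]) - 0.0) < 1e-9  # orthogonal → 0.0
```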

**Files to Modify**:

- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/repositories/news_repository.py`

**Test Requirements**:

- Vector similarity search tests with mock data
- Batch operation performance tests
- Job config CRUD tests
- Database connection pooling tests

---

### Phase 3: LLM Integration
#### T005: OpenRouter Client - Sentiment Analysis

**Priority**: Critical | **Duration**: 2-3 hours | **Dependencies**: T002

**Description**: Implement an OpenRouter client for LLM sentiment analysis

**Acceptance Criteria**:

- [ ] OpenRouter API integration for sentiment analysis
- [ ] Structured prompts for financial news sentiment
- [ ] Response parsing with Pydantic models
- [ ] Error handling with graceful fallbacks
- [ ] Retry logic with exponential backoff

**Implementation Details**:
```python
class OpenRouterSentimentClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = config.quick_think_llm
        self.base_url = "https://openrouter.ai/api/v1"

    async def analyze_sentiment(self, title: str, content: str) -> SentimentResult:
        """Analyze the sentiment of a news article."""
        prompt = f"""
        Analyze the sentiment of this financial news article:

        Title: {title}
        Content: {content[:1000]}...

        Provide sentiment analysis as JSON:
        {{
            "score": float between -1.0 (very negative) and 1.0 (very positive),
            "confidence": float between 0.0 and 1.0,
            "label": "positive" | "negative" | "neutral",
            "reasoning": "brief explanation"
        }}
        """

        try:
            async with aiohttp.ClientSession() as session:
                response = await self._make_request(session, prompt)
                return self._parse_sentiment_response(response)
        except Exception as e:
            logger.warning(f"LLM sentiment analysis failed: {e}")
            return self._fallback_sentiment(title, content)

    def _fallback_sentiment(self, title: str, content: str) -> SentimentResult:
        """Keyword-based fallback sentiment analysis."""
        # Simple keyword counting as a last resort when the API is unavailable
        positive_words = ["gain", "profit", "up", "growth", "buy"]
        negative_words = ["loss", "down", "decline", "sell", "drop"]

        text = (title + " " + content).lower()
        pos_count = sum(word in text for word in positive_words)
        neg_count = sum(word in text for word in negative_words)

        if pos_count > neg_count:
            return SentimentResult(score=0.3, confidence=0.5, label="positive")
        elif neg_count > pos_count:
            return SentimentResult(score=-0.3, confidence=0.5, label="negative")
        else:
            return SentimentResult(score=0.0, confidence=0.5, label="neutral")
```
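The acceptance criteria call for retry with exponential backoff, which the sketch above does not show inside `_make_request`. One way to factor it out — the `with_retries` helper is an illustration, not existing code:

```python
import asyncio
import random


async def with_retries(make_call, max_attempts: int = 4, base_delay: float = 0.5):
    """Await make_call(), retrying failures with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the fallback path
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

Inside `analyze_sentiment`, the request line would become `response = await with_retries(lambda: self._make_request(session, prompt))`, leaving the keyword fallback as the last resort.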

**Files to Create**:

- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_sentiment_client.py`

**Test Requirements**:

- Sentiment analysis API tests with VCR
- Error handling tests
- Response parsing tests
- Fallback mechanism tests

---
#### T006: OpenRouter Client - Vector Embeddings

**Priority**: Critical | **Duration**: 1-2 hours | **Dependencies**: T002

**Description**: Implement an OpenRouter client for vector embedding generation

**Acceptance Criteria**:

- [ ] OpenRouter embeddings API integration
- [ ] Text preprocessing for embedding generation
- [ ] Batch processing for multiple articles
- [ ] 1536-dimensional vector validation
- [ ] Proper error handling and retries

**Implementation Details**:
```python
class OpenRouterEmbeddingsClient:
    def __init__(self, config: TradingAgentsConfig):
        self.api_key = config.openrouter_api_key
        self.model = "openai/text-embedding-ada-002"  # Served via OpenRouter
        self.base_url = "https://openrouter.ai/api/v1"

    async def generate_embeddings(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for multiple texts."""
        if not texts:
            return []

        try:
            async with aiohttp.ClientSession() as session:
                response = await self._make_embeddings_request(session, texts)
                embeddings = self._parse_embeddings_response(response)

                # Validate dimensions
                for i, embedding in enumerate(embeddings):
                    if len(embedding) != 1536:
                        raise ValueError(f"Invalid embedding dimension at index {i}: {len(embedding)}")

                return embeddings
        except Exception as e:
            logger.error(f"Embeddings generation failed: {e}")
            # Return zero vectors as fallback
            return [[0.0] * 1536 for _ in texts]

    async def generate_article_embeddings(self, article: NewsArticle) -> Tuple[List[float], List[float]]:
        """Generate embeddings for an article's title and content."""
        title_text = self._preprocess_text(article.title) if article.title else ""
        # Combine title and summary for a comprehensive content embedding
        content_text = (self._preprocess_text(f"{article.title or ''} {article.summary}")
                        if article.summary else "")

        if not title_text and not content_text:
            return [0.0] * 1536, [0.0] * 1536

        # Always send both slots so index 0 is the title and index 1 the content,
        # even when one of them is missing
        embeddings = await self.generate_embeddings([title_text or " ", content_text or " "])
        title_embedding = embeddings[0] if len(embeddings) > 0 else [0.0] * 1536
        content_embedding = embeddings[1] if len(embeddings) > 1 else [0.0] * 1536

        return title_embedding, content_embedding

    def _preprocess_text(self, text: str) -> str:
        """Preprocess text for optimal embedding generation."""
        # Collapse whitespace and limit length
        cleaned = " ".join(text.split())
        return cleaned[:8000]  # OpenAI embedding input limit
```
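Batch processing (per the acceptance criteria) should also cap how many texts go into a single embeddings request. A minimal chunking helper — illustrative only; the appropriate batch size is an assumption and should be checked against the provider's documented limits:

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: List[T], size: int) -> Iterator[List[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 5 texts with a batch size of 2 → three requests
print(list(chunked(["t1", "t2", "t3", "t4", "t5"], 2)))
# → [['t1', 't2'], ['t3', 't4'], ['t5']]
```

`generate_embeddings` could then loop over `chunked(texts, batch_size)` and concatenate the per-batch results in order.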

**Files to Create**:

- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/clients/openrouter_embeddings_client.py`

**Test Requirements**:

- Embeddings API tests with VCR
- Batch processing tests
- Vector dimension validation tests
- Text preprocessing tests

---
#### T007: Enhance NewsService - LLM Integration

**Priority**: Critical | **Duration**: 2-3 hours | **Dependencies**: T005, T006

**Description**: Integrate the OpenRouter LLM clients into the NewsService workflow

**Acceptance Criteria**:

- [ ] Replace keyword sentiment with LLM analysis
- [ ] Add embedding generation to article processing
- [ ] End-to-end article processing pipeline
- [ ] Proper error handling and fallback strategies
- [ ] Integration with existing service methods

**Implementation Details**:
```python
class NewsService:
    def __init__(self,
                 repository: NewsRepository,
                 config: TradingAgentsConfig):
        self.repository = repository
        self.config = config
        self.sentiment_client = OpenRouterSentimentClient(config)
        self.embeddings_client = OpenRouterEmbeddingsClient(config)

    async def process_articles_with_llm(self, articles: List[NewsArticle]) -> List[NewsArticle]:
        """Process articles with LLM sentiment analysis and embeddings."""
        processed_articles = []

        for article in articles:
            try:
                # Generate sentiment analysis
                sentiment_result = await self.sentiment_client.analyze_sentiment(
                    article.title, article.summary or ""
                )

                # Generate embeddings
                title_embedding, content_embedding = \
                    await self.embeddings_client.generate_article_embeddings(article)

                # Update article with LLM results
                article.sentiment_score = sentiment_result.score
                article.sentiment_confidence = sentiment_result.confidence
                article.sentiment_label = sentiment_result.label
                article.title_embedding = title_embedding
                article.content_embedding = content_embedding

                processed_articles.append(article)

            except Exception as e:
                logger.warning(f"Failed to process article {article.id}: {e}")
                # Keep the article without LLM enrichment
                processed_articles.append(article)

        return processed_articles

    async def collect_and_process_news(self, symbols: List[str]) -> List[NewsArticle]:
        """Complete pipeline: collect → process → store with LLM analysis."""
        # Collect raw articles (existing functionality)
        raw_articles = await self.collect_news_articles(symbols)

        # Process with LLM
        processed_articles = await self.process_articles_with_llm(raw_articles)

        # Store processed articles
        stored_articles = []
        for article in processed_articles:
            stored_article = await self.repository.create_article(article)
            stored_articles.append(stored_article)

        # Batch update embeddings for efficiency
        articles_with_embeddings = [a for a in stored_articles
                                    if a.title_embedding or a.content_embedding]
        if articles_with_embeddings:
            await self.repository.batch_update_embeddings(articles_with_embeddings)

        return stored_articles
```

**Files to Modify**:

- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/news_service.py`

**Test Requirements**:

- Integration tests with mocked LLM clients
- Article processing pipeline tests
- Error handling and fallback tests
- Performance tests for batch operations

---

### Phase 4: Scheduling
#### T008: APScheduler Integration - Job Scheduling

**Priority**: High | **Duration**: 3-4 hours | **Dependencies**: T003, T004, T007

**Description**: Implement scheduled news collection using APScheduler

**Acceptance Criteria**:

- [ ] APScheduler setup with PostgreSQL job store
- [ ] Scheduled job execution with proper error handling
- [ ] Job configuration loading and validation
- [ ] Status monitoring and failure recovery
- [ ] CLI integration for job management

**Implementation Details**:
```python
class ScheduledNewsCollector:
    def __init__(self,
                 news_service: NewsService,
                 repository: NewsRepository,
                 config: TradingAgentsConfig):
        self.news_service = news_service
        self.repository = repository
        self.config = config
        self.scheduler = None

    async def initialize_scheduler(self):
        """Initialize APScheduler with a PostgreSQL job store."""
        from apscheduler.schedulers.asyncio import AsyncIOScheduler
        from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

        jobstore = SQLAlchemyJobStore(url=self.config.database_url,
                                      tablename='apscheduler_jobs')

        self.scheduler = AsyncIOScheduler()
        self.scheduler.add_jobstore(jobstore, 'default')

    async def load_job_configurations(self):
        """Load and schedule all active job configurations."""
        job_configs = await self.repository.get_active_job_configs()

        for config in job_configs:
            try:
                await self._schedule_job(config)
            except Exception as e:
                logger.error(f"Failed to schedule job {config.name}: {e}")

    async def _schedule_job(self, job_config: NewsJobConfig):
        """Schedule a single job configuration."""
        from apscheduler.triggers.cron import CronTrigger

        job_id = f"news_collection_{job_config.id}"

        # Remove existing job if present
        if self.scheduler.get_job(job_id):
            self.scheduler.remove_job(job_id)

        # Add new job
        trigger = CronTrigger.from_crontab(job_config.frequency_cron)

        self.scheduler.add_job(
            self._execute_news_collection,
            trigger=trigger,
            id=job_id,
            args=[job_config],
            name=f"News collection: {job_config.name}",
            replace_existing=True
        )

    async def _execute_news_collection(self, job_config: NewsJobConfig):
        """Execute news collection for a job configuration."""
        try:
            logger.info(f"Starting news collection job: {job_config.name}")

            # Collect and process news
            articles = await self.news_service.collect_and_process_news(job_config.symbols)

            # Update the job's last-run timestamp
            job_config.last_run = datetime.now(timezone.utc)
            await self.repository.update_job_config(job_config)

            logger.info(f"Completed news collection job: {job_config.name}, "
                        f"collected {len(articles)} articles")

        except Exception as e:
            logger.error(f"News collection job failed: {job_config.name}, error: {e}")
            # Could implement notification/alerting here

    async def start_scheduler(self):
        """Start the scheduler."""
        if not self.scheduler:
            await self.initialize_scheduler()

        await self.load_job_configurations()
        self.scheduler.start()
        logger.info("News collection scheduler started")

    async def stop_scheduler(self):
        """Stop the scheduler."""
        if self.scheduler:
            self.scheduler.shutdown(wait=True)
            logger.info("News collection scheduler stopped")
```

**Files to Create**:

- `/Users/martinrichards/code/TradingAgents/tradingagents/domains/news/services/scheduled_news_collector.py`

**Test Requirements**:

- Job scheduling tests with a test scheduler
- Job execution tests with mocked dependencies
- Error handling and retry tests
- Job configuration validation tests

---
#### T009: CLI Integration - Job Management Commands

**Priority**: Medium | **Duration**: 1-2 hours | **Dependencies**: T008

**Description**: Add CLI commands for news job management and manual execution

**Acceptance Criteria**:

- [ ] CLI commands for job creation/management
- [ ] Manual job execution commands
- [ ] Job status and monitoring commands
- [ ] Integration with existing CLI structure
- [ ] Proper error handling and user feedback

**Implementation Details**:
```python
# Add to cli/commands/news_commands.py
# Note: click does not await coroutines natively, so each async body is driven
# with asyncio.run() from a synchronous command callback.
import asyncio


@click.group()
def news():
    """News domain management commands"""
    pass


@news.group()
def job():
    """Job management commands"""
    pass


@job.command()
@click.option('--name', required=True, help='Job name')
@click.option('--symbols', required=True, help='Comma-separated stock symbols')
@click.option('--frequency', required=True, help='Cron expression or simple frequency')
@click.option('--categories', help='Comma-separated news categories')
def create(name: str, symbols: str, frequency: str, categories: str):
    """Create a new news collection job"""
    async def _create():
        symbol_list = [s.strip().upper() for s in symbols.split(',')]
        category_list = [c.strip() for c in categories.split(',')] if categories else []

        config = NewsJobConfig(
            name=name,
            symbols=symbol_list,
            categories=category_list,
            frequency_cron=frequency,
            enabled=True
        )

        # Validate configuration
        errors = config.validate()
        if errors:
            click.echo(f"❌ Invalid configuration: {errors}")
            return

        # Create job
        repository = NewsRepository(get_database_config())
        created_config = await repository.create_job_config(config)
        click.echo(f"✅ Created job: {created_config.name} (ID: {created_config.id})")

    try:
        asyncio.run(_create())
    except Exception as e:
        click.echo(f"❌ Failed to create job: {e}")


@job.command(name='list')
def list_jobs():
    """List all job configurations"""
    async def _list():
        repository = NewsRepository(get_database_config())
        configs = await repository.get_all_job_configs()

        if not configs:
            click.echo("No jobs configured")
            return

        click.echo("\n📋 News Collection Jobs:")
        click.echo("=" * 60)

        for config in configs:
            status = "🟢 Enabled" if config.enabled else "🔴 Disabled"
            last_run = config.last_run.strftime("%Y-%m-%d %H:%M") if config.last_run else "Never"

            click.echo(f"{config.name}")
            click.echo(f"  Status: {status}")
            click.echo(f"  Symbols: {', '.join(config.symbols)}")
            click.echo(f"  Schedule: {config.frequency_cron}")
            click.echo(f"  Last Run: {last_run}")
            click.echo()

    try:
        asyncio.run(_list())
    except Exception as e:
        click.echo(f"❌ Failed to list jobs: {e}")


@job.command()
@click.argument('job_id', type=str)
def run(job_id: str):
    """Manually execute a job"""
    async def _run():
        repository = NewsRepository(get_database_config())
        config = await repository.get_job_config(UUID(job_id))

        if not config:
            click.echo(f"❌ Job not found: {job_id}")
            return

        click.echo(f"🚀 Running job: {config.name}")

        # Execute job
        service = NewsService(repository, get_trading_config())
        articles = await service.collect_and_process_news(config.symbols)
        click.echo(f"✅ Completed: collected {len(articles)} articles")

    try:
        asyncio.run(_run())
    except Exception as e:
        click.echo(f"❌ Job execution failed: {e}")
```
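Since click invokes command callbacks synchronously, every async command body must be driven by an event loop; as the commands multiply, that `asyncio.run` plumbing can be factored into a small decorator (a sketch — the `coro` name is introduced here):

```python
import asyncio
import functools


def coro(f):
    """Wrap an async click callback so click can invoke it synchronously."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return asyncio.run(f(*args, **kwargs))
    return wrapper


# Usage: place @coro between @job.command() and the async def
@coro
async def ping() -> str:
    return "pong"


print(ping())  # → pong
```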

**Files to Modify**:

- `/Users/martinrichards/code/TradingAgents/cli/commands/news_commands.py`

**Test Requirements**:

- CLI command tests with mocked services
- User input validation tests
- Output formatting tests

---

### Phase 5: Validation
#### T010: Integration Tests - End-to-End Workflow

**Priority**: High | **Duration**: 2-3 hours | **Dependencies**: T007, T008

**Description**: Comprehensive integration tests for the complete news domain workflow

**Acceptance Criteria**:

- [ ] End-to-end workflow tests from RSS to vector storage
- [ ] Agent integration tests via AgentToolkit
- [ ] Performance tests for daily collection volumes
- [ ] Error recovery and fallback tests
- [ ] Test coverage maintained above 85%

**Implementation Details**:
```python
# tests/domains/news/integration/test_news_workflow.py
class TestNewsWorkflowIntegration:

    @pytest.mark.asyncio
    async def test_complete_news_processing_pipeline(self, test_db, mock_openrouter):
        """Test the complete pipeline from RSS to vector storage."""
        # Setup
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)

        # Mock OpenRouter responses
        mock_openrouter.sentiment_response = {
            "score": 0.7,
            "confidence": 0.85,
            "label": "positive"
        }
        mock_openrouter.embeddings_response = [[0.1] * 1536]

        # Execute pipeline
        articles = await service.collect_and_process_news(["AAPL"])

        # Verify results
        assert len(articles) > 0
        assert all(a.sentiment_score is not None for a in articles)
        assert all(a.title_embedding is not None for a in articles)

        # Verify database storage
        stored_articles = await repository.get_articles_by_symbol("AAPL")
        assert len(stored_articles) == len(articles)

        # Test vector similarity search
        similar = await repository.find_similar_articles(
            articles[0].title_embedding, limit=5
        )
        assert len(similar) > 0

    @pytest.mark.asyncio
    async def test_agent_toolkit_integration(self, test_db):
        """Test integration with AgentToolkit for RAG queries."""
        from tradingagents.agents.libs.toolkit import AgentToolkit

        # Setup with real data
        toolkit = AgentToolkit(test_db)

        # Test news context retrieval
        context = await toolkit.get_news_context("AAPL", days=7)
        assert "articles" in context
        assert "sentiment_summary" in context

        # Test vector similarity for context
        similar_context = await toolkit.get_similar_news(
            "Apple earnings beat expectations", limit=5
        )
        assert len(similar_context) <= 5

    @pytest.mark.asyncio
    async def test_scheduler_integration(self, test_db):
        """Test APScheduler integration with job management."""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)
        scheduler = ScheduledNewsCollector(service, repository, config)

        # Create test job configuration
        job_config = NewsJobConfig(
            name="test_job",
            symbols=["AAPL"],
            frequency_cron="0 */6 * * *",  # Every 6 hours
            enabled=True
        )
        await repository.create_job_config(job_config)

        # Test scheduler initialization
        await scheduler.initialize_scheduler()
        await scheduler.load_job_configurations()

        # Verify job was scheduled
        assert scheduler.scheduler.get_job(f"news_collection_{job_config.id}") is not None

        # Test manual job execution
        await scheduler._execute_news_collection(job_config)

        # Verify execution updated last_run
        updated_config = await repository.get_job_config(job_config.id)
        assert updated_config.last_run is not None

    @pytest.mark.asyncio
    async def test_error_recovery_and_fallbacks(self, test_db):
        """Test error handling and fallback mechanisms."""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)
        service = NewsService(repository, config)

        # Test with a failing LLM client
        with patch.object(service.sentiment_client, 'analyze_sentiment',
                          side_effect=Exception("API Error")):
            articles = await service.collect_and_process_news(["AAPL"])

            # Should still process articles with fallback
            assert len(articles) > 0
            # Should have fallback sentiment values
            assert any(a.sentiment_score is not None for a in articles)

    @pytest.mark.asyncio
    async def test_performance_benchmarks(self, test_db):
        """Test that performance meets requirements."""
        config = TradingAgentsConfig.from_test_config()
        repository = NewsRepository(test_db)

        # Create test articles with embeddings
        test_articles = await self._create_test_articles_with_embeddings(repository, count=1000)

        # Test query performance (< 100ms requirement)
        start_time = time.time()
        articles = await repository.get_recent_articles_by_symbol("AAPL", days=30)
        query_time = (time.time() - start_time) * 1000

        assert query_time < 100, f"Query took {query_time}ms, should be < 100ms"

        # Test vector similarity performance (< 1s requirement)
        start_time = time.time()
        similar = await repository.find_similar_articles(
            test_articles[0].title_embedding, limit=10
        )
        vector_time = (time.time() - start_time) * 1000

        assert vector_time < 1000, f"Vector search took {vector_time}ms, should be < 1s"
```

**Files to Create**:
- `/Users/martinrichards/code/TradingAgents/tests/domains/news/integration/test_news_workflow.py`

**Test Requirements**:
- Full workflow integration tests
- AgentToolkit integration tests
- Performance benchmark tests
- Error scenario tests

---

#### T011: Documentation and Monitoring
**Priority**: Medium | **Duration**: 1-2 hours | **Dependencies**: T010

**Description**: Update documentation and add monitoring for new functionality

**Acceptance Criteria**:
- [ ] Updated API documentation for new methods
- [ ] Job scheduling configuration examples
- [ ] Performance monitoring dashboard queries
- [ ] Troubleshooting guide for common issues
- [ ] Agent integration documentation
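
For the configuration-examples criterion, the documented snippets could mirror the `NewsJobConfig` fields used throughout this guide (`name`, `symbols`, `frequency_cron`, `enabled`); a minimal sketch, using a stand-in dataclass rather than the project's real entity:

```python
from dataclasses import dataclass, field
from typing import List
from uuid import UUID, uuid4


@dataclass
class NewsJobConfig:
    """Stand-in mirroring the T001 entity; the real class lives in the news domain."""
    name: str
    symbols: List[str]
    frequency_cron: str
    enabled: bool = True
    id: UUID = field(default_factory=uuid4)


# Example configurations worth documenting:
hourly_tech = NewsJobConfig(
    name="tech_news_hourly",
    symbols=["AAPL", "MSFT", "NVDA"],
    frequency_cron="0 * * * *",  # top of every hour
)
nightly_full = NewsJobConfig(
    name="full_universe_nightly",
    symbols=["SPY"],
    frequency_cron="0 2 * * *",  # 02:00 daily
    enabled=False,               # staged but not yet live
)
```

The five-field cron strings follow standard crontab order (minute, hour, day-of-month, month, day-of-week), matching the `"0 */6 * * *"` schedule used in the integration test above.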

**Files to Modify**:
- `/Users/martinrichards/code/TradingAgents/docs/domains/news.md`
- `/Users/martinrichards/code/TradingAgents/docs/api-reference.md`

**Test Requirements**:
- Documentation accuracy validation
- Configuration example testing

---

## Parallel Development Opportunities

### AI Agent Collaboration Points

**Tasks T005 & T006** can be developed in parallel:
- Both are independent OpenRouter client implementations
- Different LLM capabilities (sentiment vs embeddings)
- Can be tested independently with VCR cassettes

**Phase 1 Tasks (T001, T002, T003)** have minimal dependencies:
- T002 and T003 both depend on T001 but can be developed simultaneously
- Entity layer changes are independent of each other

### Critical Path Analysis

**Critical Path**: T001 → T002/T003 → T004 → T005/T006 → T007 → T008

**Parallel Opportunities**:
1. **Foundation Phase**: T002 + T003 (after T001)
2. **LLM Integration**: T005 + T006 (after T002)
3. **Testing**: Unit tests alongside implementation

### Risk Mitigation Strategies

**LLM API Dependencies**:
- Implement comprehensive fallback strategies
- Use VCR for deterministic testing
- Mock clients for unit tests
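
The mock-client bullet can lean entirely on the standard library; a sketch using `unittest.mock.AsyncMock` against a hypothetical sentiment client (the `analyze_sentiment` method name matches the patch target in the integration tests above):

```python
import asyncio
from unittest.mock import AsyncMock


async def demo():
    # Hypothetical sentiment client; AsyncMock lets us await it without a real API.
    client = AsyncMock()
    client.analyze_sentiment.return_value = {"score": 0.42, "label": "positive"}

    result = await client.analyze_sentiment("AAPL beats earnings expectations")

    # The mock records the await, so call patterns can be asserted in unit tests.
    client.analyze_sentiment.assert_awaited_once()
    return result


result = asyncio.run(demo())
```

This keeps unit tests fully offline, with VCR cassettes reserved for the integration layer.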

**Database Performance**:
- Test with realistic data volumes
- Monitor query performance during development
- Use proper indexes for vector operations

**Integration Complexity**:
- Build incrementally with testing at each step
- Maintain backward compatibility
- Use feature flags for gradual rollout
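
A feature flag for gradual rollout can start as a simple environment-driven gate; the flag name and wiring below are illustrative, not part of the project's actual configuration API:

```python
import os


def llm_sentiment_enabled() -> bool:
    """Gate the new OpenRouter sentiment path behind an env flag (illustrative name)."""
    return os.environ.get("NEWS_LLM_SENTIMENT_ENABLED", "false").lower() in {"1", "true", "yes"}


def score_article(title: str) -> float:
    if llm_sentiment_enabled():
        raise NotImplementedError("would call the OpenRouter sentiment client here")
    return 0.0  # neutral score while the flag is off


score = score_article("Fed holds rates steady")
```

Flipping the flag per environment lets the LLM path ship dark and roll out incrementally without a code change.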

---

## Success Metrics

**Technical Metrics**:
- Test coverage >85% maintained
- Query performance <100ms
- Vector search performance <1s
- Zero breaking changes to AgentToolkit

**Functional Metrics**:
- Successful OpenRouter-only LLM integration
- Scheduled jobs executing reliably
- Agent context enriched with sentiment and similarity

**Quality Metrics**:
- All acceptance criteria met
- Comprehensive error handling
- Production-ready monitoring and documentation

---

## Implementation Guidelines

### TDD Approach
**Every task follows**: Write test → Write code → Refactor

### Layered Architecture Pattern
**Strict adherence to**: Database → Entity → Repository → Service → Scheduling

### Error Handling Strategy
**Graceful fallbacks** for all LLM API dependencies
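
The graceful-fallback pattern can be sketched as a thin wrapper that degrades to a neutral score when the LLM call raises; the client and score shapes below are illustrative, not the actual T005 client API:

```python
import asyncio
import logging

logger = logging.getLogger("news.sentiment")

NEUTRAL_SENTIMENT = 0.0  # fallback score when the LLM is unavailable


async def analyze_with_fallback(client, text: str) -> float:
    """Return an LLM sentiment score, degrading to neutral on any API failure."""
    try:
        return await client.analyze_sentiment(text)
    except Exception as exc:  # broad by design: an API outage must not break collection
        logger.warning("Sentiment fallback engaged: %s", exc)
        return NEUTRAL_SENTIMENT


class FailingClient:
    """Simulates an OpenRouter outage for the sketch."""
    async def analyze_sentiment(self, text: str) -> float:
        raise RuntimeError("API Error")


score = asyncio.run(analyze_with_fallback(FailingClient(), "AAPL rallies"))
```

This is the behavior `test_error_recovery_and_fallbacks` exercises: articles still flow through the pipeline with a non-null sentiment value.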

### Performance Requirements
**Async operations** with proper connection pooling throughout
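
Connection pooling itself comes from the database driver (`asyncpg.create_pool` in this stack), but the same principle of bounding in-flight async operations also protects the LLM APIs; a stdlib-only sketch with `asyncio.Semaphore`:

```python
import asyncio


async def fetch_one(i: int) -> int:
    await asyncio.sleep(0)  # stand-in for a pooled DB query or an LLM request
    return i * 2


async def bounded_gather(n_tasks: int, limit: int = 5) -> list:
    """Run many async operations with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(i: int) -> int:
        async with sem:  # acquire a slot, mirroring a pool checkout
            return await fetch_one(i)

    return await asyncio.gather(*(guarded(i) for i in range(n_tasks)))


results = asyncio.run(bounded_gather(10))
```

`asyncio.gather` preserves submission order, so results line up with their inputs even though execution is concurrent.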

### Testing Strategy
**Unit tests + Integration tests + VCR** for external API calls

---

This comprehensive task breakdown provides clear implementation guidance for completing the final 5% of the news domain while maintaining architectural consistency and leveraging AI-assisted development patterns.