# Technical Standards - TradingAgents

## Database Architecture

### Core Stack: PostgreSQL + TimescaleDB + pgvectorscale

**Primary Database**: PostgreSQL 16+ with the TimescaleDB and pgvector extensions

- **TimescaleDB**: optimized storage and querying for time-series financial data (prices, volumes, news timestamps)
- **pgvector/pgvectorscale**: vector embedding storage and similarity search for RAG-powered agents
- **Connection**: the asyncpg driver for high-performance async operations

**Database URL Pattern**:

```python
# Development (matches the docker-compose credentials below)
DATABASE_URL = "postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents"

# Production (placeholder values)
DATABASE_URL = "postgresql+asyncpg://username:password@host:port/database"
```

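SQLAlchemy expects special characters in URL credentials (such as `@`, `:` or `/`) to be percent-encoded, otherwise the URL is mis-parsed. A minimal sketch of assembling the production URL from separate parts; `build_database_url` and the idea of separate host/user/password values are assumptions, since this standard only documents a ready-made `DATABASE_URL`:

```python
from urllib.parse import quote


def build_database_url(
    user: str, password: str, host: str, port: int, database: str
) -> str:
    """Assemble an asyncpg URL, percent-encoding the credentials.

    Hypothetical helper: not part of the TradingAgents codebase.
    """
    # safe="" makes quote() encode "/" too, which it leaves alone by default
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql+asyncpg://{user_enc}:{password_enc}@{host}:{port}/{database}"
```

For example, a password of `p@ss:w/rd` becomes `p%40ss%3Aw%2Frd` in the URL.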
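**Required Extensions**:

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```

Note that `uuid-ossp` provides `uuid_generate_v1()` and `uuid_generate_v4()` but no `uuid7()`. The `DEFAULT uuid7()` columns below assume a `uuid7()` SQL function is installed separately, or that IDs are generated by the application.
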
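Where IDs are generated in the application rather than by a database default, a time-ordered UUIDv7 can be built with the standard library using the RFC 9562 layout: a 48-bit Unix millisecond timestamp, a 4-bit version, 12 random bits, a 2-bit variant, and 62 random bits. A minimal sketch; `uuid7` here is a local helper, since older Python versions' `uuid` module has no UUIDv7 generator:

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Return a UUIDv7 (timestamp-first, so later IDs sort after earlier ones)."""
    unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; 74 are used
    rand_a = (rand >> 62) & 0xFFF                # 12 bits after the version field
    rand_b = rand & ((1 << 62) - 1)              # 62 bits after the variant field

    value = (unix_ms & ((1 << 48) - 1)) << 80    # bits 127..80: timestamp
    value |= 0x7 << 76                           # bits 79..76: version 7
    value |= rand_a << 64                        # bits 75..64: rand_a
    value |= 0b10 << 62                          # bits 63..62: RFC variant
    value |= rand_b                              # bits 61..0: rand_b
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, IDs from later milliseconds sort after earlier ones, which keeps B-tree inserts close to append-only. IDs created within the same millisecond are not ordered relative to each other in this sketch.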
### Schema Design Standards

**Time-Series Tables (TimescaleDB)**:

```sql
-- Market data with time-based partitioning
CREATE TABLE market_data (
    id UUID NOT NULL DEFAULT uuid7(),
    symbol VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    price DECIMAL(18,8),
    volume BIGINT,
    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- TimescaleDB requires every unique constraint, including the primary key,
    -- to contain the partitioning column
    PRIMARY KEY (id, timestamp)
);

-- Convert to hypertable for time-series optimization
SELECT create_hypertable('market_data', 'timestamp');

-- Indexes for common query patterns
CREATE INDEX ON market_data (symbol, timestamp DESC);
```

**Vector-Enabled Tables**:

```sql
-- News articles with embeddings
CREATE TABLE news_articles (
    id UUID PRIMARY KEY DEFAULT uuid7(),
    headline TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,            -- Deduplication key (upsert conflict target)
    symbol VARCHAR(20),                  -- Queried by idx_symbol_date below
    published_date DATE NOT NULL,
    sentiment_score DOUBLE PRECISION,    -- Expected range [-1, 1]
    title_embedding VECTOR(1536),        -- OpenAI embedding size
    content_embedding VECTOR(1536),
    -- Kept as a regular table: a hypertable on published_date would require
    -- the UNIQUE (url) constraint to include published_date as well
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Vector similarity index (IVFFlat derives its lists from existing rows,
-- so create it after the table has been populated)
CREATE INDEX ON news_articles USING ivfflat (title_embedding vector_cosine_ops);
```

**Composite Indexes for Query Optimization**:

```sql
-- Common query patterns
CREATE INDEX idx_symbol_date ON news_articles (symbol, published_date);
CREATE INDEX idx_published_date ON news_articles (published_date);
-- No separate index on url: the UNIQUE constraint already creates one
```

### Connection Management

**Async Session Factory**:

```python
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine


class DatabaseManager:
    def __init__(self, database_url: str, echo: bool = False):
        # Ensure the asyncpg driver is used
        if not database_url.startswith("postgresql+asyncpg://"):
            database_url = database_url.replace("postgresql://", "postgresql+asyncpg://", 1)

        self.engine = create_async_engine(
            database_url,
            echo=echo,
            pool_recycle=3600,  # Recycle connections after one hour
            pool_pre_ping=True,  # Test connections before handing them out
        )

        self.AsyncSessionLocal = async_sessionmaker(
            bind=self.engine,
            class_=AsyncSession,
            autoflush=False,
            # Keep attributes loaded after commit; with async sessions an
            # expired attribute would trigger implicit I/O on next access
            expire_on_commit=False,
        )
```

**Session Context Management**:

```python
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

from sqlalchemy.ext.asyncio import AsyncSession


class DatabaseManager:
    # ... __init__ as shown above ...

    @asynccontextmanager
    async def get_session(self) -> AsyncGenerator[AsyncSession, None]:
        """Yield a session; commit on success, roll back on error, always close."""
        session = self.AsyncSessionLocal()
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
        finally:
            await session.close()
```

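Because `get_session` only calls `commit()`, `rollback()` and `close()` on the session, its transaction semantics can be checked without a database. A minimal sketch, assuming a hypothetical `FakeSession` that records calls and a `FakeManager` that mirrors the `get_session` body with the fake as its session factory:

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in that records which lifecycle methods were awaited."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def commit(self) -> None:
        self.calls.append("commit")

    async def rollback(self) -> None:
        self.calls.append("rollback")

    async def close(self) -> None:
        self.calls.append("close")


class FakeManager:
    """Same get_session body as DatabaseManager, with the fake session factory."""

    def __init__(self) -> None:
        self.AsyncSessionLocal = FakeSession

    @asynccontextmanager
    async def get_session(self):
        session = self.AsyncSessionLocal()
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
        finally:
            await session.close()


async def run_scenarios() -> tuple[list[str], list[str]]:
    manager = FakeManager()

    async with manager.get_session() as ok_session:
        pass  # Body succeeds: commit, then close

    try:
        async with manager.get_session() as failed_session:
            raise ValueError("simulated failure")  # Body fails: rollback, then close
    except ValueError:
        pass

    return ok_session.calls, failed_session.calls


success_calls, failure_calls = asyncio.run(run_scenarios())
print(success_calls)  # ['commit', 'close']
print(failure_calls)  # ['rollback', 'close']
```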
## LLM Integration Standards

### OpenRouter as Unified Provider

**Configuration**:

```python
# Environment variables
OPENROUTER_API_KEY = "your_openrouter_key"
LLM_PROVIDER = "openrouter"
DEEP_THINK_LLM = "openai/gpt-4o"  # Complex analysis
QUICK_THINK_LLM = "openai/gpt-4o-mini"  # Fast responses
BACKEND_URL = "https://openrouter.ai/api/v1"
```

**Model Selection Strategy**:

- **Deep Think**: complex reasoning, debates, risk analysis (`openai/gpt-4o`, `anthropic/claude-3.5-sonnet`)
- **Quick Think**: data formatting, simple queries (`openai/gpt-4o-mini`, `anthropic/claude-3-haiku`)

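The deep/quick split can be expressed as a small routing table, so agents request a task category rather than a model ID. A minimal sketch; the `TaskKind` categories and the `select_model` helper are hypothetical, and the default model IDs are the ones listed above:

```python
from enum import Enum


class TaskKind(Enum):
    """Hypothetical task categories mirroring the strategy above."""

    REASONING = "reasoning"
    DEBATE = "debate"
    RISK_ANALYSIS = "risk_analysis"
    FORMATTING = "formatting"
    SIMPLE_QUERY = "simple_query"


DEEP_THINK_TASKS = {TaskKind.REASONING, TaskKind.DEBATE, TaskKind.RISK_ANALYSIS}


def select_model(
    task: TaskKind,
    deep_think_llm: str = "openai/gpt-4o",
    quick_think_llm: str = "openai/gpt-4o-mini",
) -> str:
    """Route deep-think tasks to the larger model and everything else to the quick one."""
    return deep_think_llm if task in DEEP_THINK_TASKS else quick_think_llm
```

Passing the `DEEP_THINK_LLM`/`QUICK_THINK_LLM` values as arguments keeps routing configurable; the cost-optimized setup simply passes the same model for both.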
**Cost Optimization**:

```python
# Development/testing configuration
config = TradingAgentsConfig(
    llm_provider="openrouter",
    deep_think_llm="openai/gpt-4o-mini",  # Lower cost
    quick_think_llm="openai/gpt-4o-mini",  # Same model for both roles
    max_debate_rounds=1,  # Fewer API calls
    online_tools=False,  # Use cached data
)
```

### Agent Integration Patterns

**Anti-Corruption Layer**:

```python
from datetime import date


class AgentToolkit:
    """Mediates between LLM agents and domain services"""

    def __init__(self, config: TradingAgentsConfig):
        self.config = config
        # Assumed to map domain names (e.g. "news") to service instances
        self.services = self._initialize_services()

    async def get_news_context(self, symbol: str, date: date) -> dict:
        """Convert domain models to structured LLM context"""
        articles = await self.services["news"].get_articles(symbol, date)

        return {
            "articles": [article.to_dict() for article in articles],
            "count": len(articles),
            "data_quality": self._assess_data_quality(articles),
            "source_distribution": self._analyze_sources(articles),
        }
```

## Layered Architecture Enforcement

### Standard Layer Pattern

**Data Flow**: `Request → Router → Service → Repository → Entity → Database`

**Component Responsibilities**:

1. **Entity (Domain Model)**:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class NewsArticle:
    """Domain entity with business rules and transformations.

    NewsArticleEntity is the SQLAlchemy ORM model (see Vector Integration below).
    """

    headline: str
    url: str
    published_date: date
    sentiment_score: float | None = None

    def to_entity(self, symbol: str | None = None) -> NewsArticleEntity:
        """Transform to database model"""
        return NewsArticleEntity(
            headline=self.headline,
            url=self.url,
            published_date=self.published_date,
            sentiment_score=self.sentiment_score,  # Preserve the score on writes
            symbol=symbol,
        )

    @staticmethod
    def from_entity(entity: NewsArticleEntity) -> NewsArticle:
        """Transform from database model"""
        return NewsArticle(
            headline=entity.headline,
            url=entity.url,
            published_date=entity.published_date,
            sentiment_score=entity.sentiment_score,
        )

    def validate(self) -> list[str]:
        """Business rule validation"""
        errors = []
        if not self.headline.strip():
            errors.append("Headline cannot be empty")
        if not self.url.startswith(("http://", "https://")):
            errors.append("Invalid URL format")
        return errors
```

2. **Repository (Data Access)**:

```python
# Without this, `list[...]` annotations after `def list` resolve to the method
from __future__ import annotations

from datetime import date

from sqlalchemy import and_, select
from sqlalchemy.dialects.postgresql import insert


class NewsRepository:
    """Handles data persistence with async operations"""

    def __init__(self, database_manager: DatabaseManager):
        self.db_manager = database_manager

    async def list(self, symbol: str, date: date) -> list[NewsArticle]:
        """Fetch a symbol's articles for one publication date, newest first"""
        async with self.db_manager.get_session() as session:
            result = await session.execute(
                select(NewsArticleEntity)
                .where(and_(
                    NewsArticleEntity.symbol == symbol,
                    NewsArticleEntity.published_date == date,
                ))
                .order_by(NewsArticleEntity.created_at.desc())
            )
            entities = result.scalars().all()
            return [NewsArticle.from_entity(e) for e in entities]

    async def upsert_batch(self, articles: list[NewsArticle], symbol: str) -> list[NewsArticle]:
        """Insert or update many articles in one statement"""
        if not articles:
            return []

        # Plain dicts: an ORM instance's __dict__ also carries _sa_instance_state
        rows = [
            {
                "headline": a.headline,
                "url": a.url,
                "published_date": a.published_date,
                "sentiment_score": a.sentiment_score,
                "symbol": symbol,
            }
            for a in articles
        ]

        async with self.db_manager.get_session() as session:
            # PostgreSQL ON CONFLICT gives an atomic upsert keyed on url
            stmt = insert(NewsArticleEntity).values(rows)
            upsert_stmt = stmt.on_conflict_do_update(
                index_elements=["url"],
                # Update mutable columns only; id and created_at keep their values
                set_={
                    "headline": stmt.excluded.headline,
                    "published_date": stmt.excluded.published_date,
                    "sentiment_score": stmt.excluded.sentiment_score,
                    "symbol": stmt.excluded.symbol,
                },
            ).returning(NewsArticleEntity)

            result = await session.scalars(
                upsert_stmt, execution_options={"populate_existing": True}
            )
            return [NewsArticle.from_entity(e) for e in result.all()]
```

3. **Service (Business Logic)**:

```python
from __future__ import annotations

import logging
from datetime import date

logger = logging.getLogger(__name__)


class NewsService:
    """Orchestrates business operations"""

    def __init__(self, repository: NewsRepository, clients: dict):
        self.repository = repository
        self.clients = clients  # External news clients keyed by source name

    async def get_articles(self, symbol: str, date: date) -> list[NewsArticle]:
        """Read stored articles; degrade to an empty list on failure"""
        try:
            articles = await self.repository.list(symbol, date)
            logger.info("Retrieved %d articles for %s", len(articles), symbol)
            return articles
        except Exception:
            # logger.exception records the traceback along with the message
            logger.exception("Failed to get articles for %s", symbol)
            return []  # Graceful degradation

    async def update_articles(self, symbol: str, date: date) -> int:
        """Coordinated data refresh; returns the number of rows written"""
        new_articles = await self._fetch_from_sources(symbol, date)  # Defined elsewhere
        if new_articles:
            stored = await self.repository.upsert_batch(new_articles, symbol)
            return len(stored)
        return 0
```

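`_fetch_from_sources` is left unspecified. Two constraints shape it: one failing client should not abort the refresh, and PostgreSQL rejects an `INSERT ... ON CONFLICT DO UPDATE` that would affect the same row twice, so a batch holding two articles with the same `url` fails as a whole. A minimal sketch, assuming each client exposes an async `fetch(symbol, day)` returning objects with a `url` attribute, and that the last occurrence of a URL should win (a policy choice, not something this standard specifies):

```python
import asyncio
import logging
from dataclasses import dataclass
from datetime import date
from typing import Any

logger = logging.getLogger(__name__)


@dataclass
class RawArticle:
    """Hypothetical stand-in for what a news client returns."""

    headline: str
    url: str


async def fetch_from_sources(clients: dict[str, Any], symbol: str, day: date) -> list[RawArticle]:
    """Query every client concurrently, skip failing clients, then dedupe by URL."""
    names = list(clients)
    results = await asyncio.gather(
        *(clients[name].fetch(symbol, day) for name in names),
        return_exceptions=True,  # A failing client yields its exception as a value
    )

    by_url: dict[str, RawArticle] = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            logger.warning("News client %s failed for %s: %s", name, symbol, result)
            continue
        for article in result:
            # Later entries replace earlier ones; each URL keeps its first position
            by_url[article.url] = article
    return list(by_url.values())
```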
### Domain Isolation

**Three Core Domains**:

1. **News Domain** (`tradingagents/domains/news/`)
2. **Market Data Domain** (`tradingagents/domains/marketdata/`)
3. **Social Media Domain** (`tradingagents/domains/socialmedia/`)

**Domain Boundary Rules**:

- Domains communicate through service interfaces only
- No direct database access between domains
- Shared types live in `tradingagents/types/`
- Domain events provide loose coupling

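The standard does not prescribe an event mechanism. A minimal in-process sketch, assuming a hypothetical `EventBus` and `ArticlesStored` event, shows how the news domain could announce new data without importing the other domains:

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArticlesStored:
    """Hypothetical event published by the news domain after an upsert."""

    symbol: str
    day: date
    count: int


Handler = Callable[[object], Awaitable[None]]


class EventBus:
    """In-process publish/subscribe keyed by exact event type."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    async def publish(self, event: object) -> None:
        # Handlers run concurrently; the publisher never learns who is listening
        handlers = self._handlers.get(type(event), [])
        await asyncio.gather(*(handler(event) for handler in handlers))
```

In this sketch a handler exception propagates to the publisher; a production version might log and isolate handler failures instead.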
## Vector Integration and RAG Patterns

### Vector Embedding Storage

**OpenAI Embeddings (1536 dimensions)**:

```python
from pgvector.sqlalchemy import Vector
from sqlalchemy import select
from sqlalchemy.orm import Mapped, mapped_column

from tradingagents.lib.database import Base


# Entity definition (other columns omitted)
class NewsArticleEntity(Base):
    __tablename__ = "news_articles"

    title_embedding: Mapped[list[float] | None] = mapped_column(
        Vector(1536), nullable=True
    )
    content_embedding: Mapped[list[float] | None] = mapped_column(
        Vector(1536), nullable=True
    )


# Similarity search (a NewsRepository method)
async def find_similar_articles(self, query_embedding: list[float], limit: int = 10) -> list[NewsArticle]:
    async with self.db_manager.get_session() as session:
        result = await session.execute(
            select(NewsArticleEntity)
            .where(NewsArticleEntity.title_embedding.is_not(None))  # Skip unembedded rows
            .order_by(NewsArticleEntity.title_embedding.cosine_distance(query_embedding))
            .limit(limit)
        )
        return [NewsArticle.from_entity(e) for e in result.scalars()]
```

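pgvector's `cosine_distance` (the `<=>` operator) is one minus cosine similarity, so ascending order puts the most similar articles first. A small pure-Python illustration of the value being sorted on, for intuition only; the database computes it natively:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cos(theta): 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)  # Undefined for zero vectors


query = [1.0, 0.0]
candidates = {"same": [2.0, 0.0], "orthogonal": [0.0, 3.0], "opposite": [-1.0, 0.0]}
ranked = sorted(candidates, key=lambda name: cosine_distance(query, candidates[name]))
print(ranked)  # ['same', 'orthogonal', 'opposite']
```

Vector magnitude does not affect the result (`[2.0, 0.0]` is at distance 0 from `[1.0, 0.0]`), which is why cosine distance suits embedding comparison.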
### RAG Context Assembly

**Agent Context Pattern**:

```python
import asyncio
from datetime import date


async def build_agent_context(self, symbol: str, date: date) -> dict:
    """Assemble multi-source context for agents"""
    # The three sources are independent, so fetch them concurrently
    news_articles, market_data, social_data = await asyncio.gather(
        self.news_service.get_articles(symbol, date),  # Recent news
        self.market_service.get_recent_data(symbol, days=30),  # Market data
        self.social_service.get_sentiment(symbol, date),  # Social sentiment
    )

    # Average only scored articles, and avoid dividing by zero when there is no news
    scores = [a.sentiment_score for a in news_articles if a.sentiment_score is not None]
    sentiment_avg = sum(scores) / len(scores) if scores else None

    return {
        "news": {
            "articles": [a.to_dict() for a in news_articles],
            "sentiment_avg": sentiment_avg,
            "sources": sorted({a.source for a in news_articles}),  # Assumes a `source` attribute
        },
        "market": {
            "current_price": market_data.current_price,
            "volatility": market_data.volatility_30d,
            "volume_trend": market_data.volume_trend,
        },
        "social": {
            "reddit_sentiment": social_data.reddit_score,
            "twitter_mentions": social_data.twitter_mentions,
        },
        "context_quality": self._assess_context_quality(news_articles, market_data, social_data),
    }
```

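## Migration and Deployment Standards

### Database Migrations

**Alembic Configuration**:

```python
# alembic/env.py (online, async mode only)
import asyncio

from alembic import context
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import NullPool

from tradingagents.lib.database import Base

target_metadata = Base.metadata  # Enables --autogenerate


def do_run_migrations(connection: Connection) -> None:
    """Synchronous migration body, executed via run_sync"""
    context.configure(connection=connection, target_metadata=target_metadata)
    with context.begin_transaction():
        context.run_migrations()


def run_async_migrations() -> None:
    database_url = context.config.get_main_option("sqlalchemy.url")

    # Ensure asyncpg driver
    if database_url.startswith("postgresql://"):
        database_url = database_url.replace("postgresql://", "postgresql+asyncpg://", 1)

    # NullPool: a migration run needs one short-lived connection
    engine = create_async_engine(database_url, poolclass=NullPool)

    async def run() -> None:
        async with engine.connect() as connection:
            await connection.run_sync(do_run_migrations)
        await engine.dispose()

    asyncio.run(run())


run_async_migrations()
```
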
**TimescaleDB-Specific Migrations**:

```python
"""Add TimescaleDB hypertable

Revision ID: 001
"""
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql

revision = '001'
down_revision = None


def upgrade():
    # Create table first
    op.create_table(
        'market_data',
        sa.Column('id', postgresql.UUID(), nullable=False),
        sa.Column('symbol', sa.String(20), nullable=False),
        sa.Column('timestamp', sa.TIMESTAMP(timezone=True), nullable=False),
        sa.Column('price', sa.Numeric(18, 8)),
        sa.Column('volume', sa.BigInteger()),
        # The partitioning column must be part of the primary key
        sa.PrimaryKeyConstraint('id', 'timestamp'),
    )

    # Convert to hypertable
    op.execute("SELECT create_hypertable('market_data', 'timestamp');")

    # Add indexes
    op.create_index('idx_market_symbol_time', 'market_data', ['symbol', 'timestamp'])


def downgrade():
    op.drop_table('market_data')
```

### Docker Configuration

**Development Environment**:

```yaml
# docker-compose.yml
services:
  timescaledb:
    build: ./db
    container_name: tradingagents_timescaledb
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: tradingagents
      POSTGRES_DB: tradingagents
    ports:
      - "5432:5432"
    volumes:
      # Init scripts run only when the data directory is empty (first start)
      - ./seed.sql:/docker-entrypoint-initdb.d/seed.sql
      - timescale_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d tradingagents"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  timescale_data:  # Named volumes must be declared at the top level
```

### Environment Configuration

**Required Environment Variables**:

```bash
# Database
DATABASE_URL=postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents

# OpenRouter LLM
OPENROUTER_API_KEY=your_openrouter_key
LLM_PROVIDER=openrouter
DEEP_THINK_LLM=openai/gpt-4o
QUICK_THINK_LLM=openai/gpt-4o-mini
BACKEND_URL=https://openrouter.ai/api/v1

# Application
TRADINGAGENTS_RESULTS_DIR=./results
TRADINGAGENTS_DATA_DIR=./data
DEFAULT_LOOKBACK_DAYS=30
ONLINE_TOOLS=true

# Performance
MAX_DEBATE_ROUNDS=1
MAX_RISK_DISCUSS_ROUNDS=1
```

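Environment values arrive as strings, so flags such as `ONLINE_TOOLS=true` and counts such as `MAX_DEBATE_ROUNDS=1` need explicit parsing. A minimal sketch of a typed loader for a subset of the variables above; the `AppSettings` class, its defaults, and the accepted boolean spellings are assumptions, not the actual `TradingAgentsConfig`:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Strict boolean parsing: unknown spellings raise instead of defaulting."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


@dataclass(frozen=True)
class AppSettings:
    database_url: str
    openrouter_api_key: str
    deep_think_llm: str = "openai/gpt-4o"
    quick_think_llm: str = "openai/gpt-4o-mini"
    default_lookback_days: int = 30
    online_tools: bool = True
    max_debate_rounds: int = 1

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AppSettings":
        # Required variables raise KeyError when missing; the rest fall back to defaults
        return cls(
            database_url=env["DATABASE_URL"],
            openrouter_api_key=env["OPENROUTER_API_KEY"],
            deep_think_llm=env.get("DEEP_THINK_LLM", cls.deep_think_llm),
            quick_think_llm=env.get("QUICK_THINK_LLM", cls.quick_think_llm),
            default_lookback_days=int(env.get("DEFAULT_LOOKBACK_DAYS", "30")),
            online_tools=parse_bool(env.get("ONLINE_TOOLS", "true")),
            max_debate_rounds=int(env.get("MAX_DEBATE_ROUNDS", "1")),
        )
```

Failing fast on a malformed value (for example `ONLINE_TOOLS=ture`) surfaces configuration mistakes at startup rather than as silently disabled tools.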
## Quality Gates

### Database Performance

**Query Performance Standards**:

- Simple queries: < 100ms
- Complex aggregations: < 500ms
- Vector similarity searches: < 1s
- Batch operations: < 5s for 1000 records

**Monitoring Queries**:

```sql
-- Query performance monitoring (requires the pg_stat_statements extension,
-- loaded via shared_preload_libraries)
SELECT query, mean_exec_time, calls, total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100  -- milliseconds
ORDER BY mean_exec_time DESC;

-- TimescaleDB chunk information (chunk_relation_size was removed in TimescaleDB 2.x)
SELECT * FROM chunks_detailed_size('market_data');
```

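The thresholds above can also be enforced in tests. A minimal sketch, assuming a hypothetical `LATENCY_BUDGETS_MS` table that mirrors the list and a `within_budget` helper that times any awaitable:

```python
import asyncio
import time
from collections.abc import Awaitable
from typing import TypeVar

T = TypeVar("T")

# Budgets from the Query Performance Standards, in milliseconds
LATENCY_BUDGETS_MS = {
    "simple": 100,
    "aggregation": 500,
    "vector_search": 1_000,
    "batch_1000": 5_000,
}


async def within_budget(kind: str, operation: Awaitable[T]) -> tuple[T, float, bool]:
    """Await `operation`; return its result, elapsed ms, and whether it met the budget."""
    start = time.perf_counter()
    result = await operation
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms, elapsed_ms < LATENCY_BUDGETS_MS[kind]
```

In a test this might wrap a repository call, e.g. `rows, ms, ok = await within_budget("simple", repository.list("AAPL", day))`, followed by `assert ok`.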
### Connection Health

**Health Check Implementation**:

```python
import time

from sqlalchemy import text


async def health_check() -> dict:
    """Health check for the database and the LLM API"""
    checks = {}

    # Database connectivity with round-trip latency (db_manager: module-level DatabaseManager)
    try:
        start = time.perf_counter()
        async with db_manager.get_session() as session:
            await session.execute(text("SELECT 1"))
        latency_ms = round((time.perf_counter() - start) * 1000, 1)
        checks["database"] = {"status": "healthy", "latency_ms": latency_ms}
    except Exception as e:
        checks["database"] = {"status": "unhealthy", "error": str(e)}

    # OpenRouter API
    try:
        # Placeholder: no request is made yet, so this always reports healthy
        # until a real call to BACKEND_URL is added here
        checks["llm_api"] = {"status": "healthy"}
    except Exception as e:
        checks["llm_api"] = {"status": "unhealthy", "error": str(e)}

    return checks
```

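A hung dependency can make the health check itself hang. One way to bound each probe, sketched with hypothetical `run_check` and `run_health_checks` helpers built on `asyncio.wait_for`, is to give every probe its own timeout and run all probes concurrently:

```python
import asyncio
import time
from collections.abc import Awaitable, Callable

Probe = Callable[[], Awaitable[object]]


async def run_check(name: str, probe: Probe, timeout_s: float = 2.0) -> tuple[str, dict]:
    """Run one probe with a timeout; report status and latency instead of raising."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(probe(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return name, {"status": "unhealthy", "error": f"timed out after {timeout_s}s"}
    except Exception as e:
        return name, {"status": "unhealthy", "error": str(e)}
    latency_ms = round((time.perf_counter() - start) * 1000, 1)
    return name, {"status": "healthy", "latency_ms": latency_ms}


async def run_health_checks(probes: dict[str, Probe]) -> dict:
    """Run all probes concurrently and collect their results by name."""
    results = await asyncio.gather(*(run_check(n, p) for n, p in probes.items()))
    return dict(results)
```

With this shape, the total time of a health request is bounded by the slowest timeout rather than the sum of all probe latencies.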
### Data Quality Enforcement

**Validation Pipeline**:

```python
from datetime import date


class DataQualityValidator:
    """Ensures data meets quality standards before storage"""

    def validate_news_article(self, article: NewsArticle) -> list[str]:
        errors = []

        # Business rules
        if not article.headline.strip():
            errors.append("Empty headline")

        if len(article.headline) > 500:
            errors.append("Headline too long")

        # Explicit None check instead of relying on truthiness (0.0 is falsy)
        if article.sentiment_score is not None and not (-1 <= article.sentiment_score <= 1):
            errors.append("Invalid sentiment score range")

        # Data freshness
        if article.published_date > date.today():
            errors.append("Future publication date")

        return errors
```

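Applied before `upsert_batch`, a validator can split a fetched batch into storable articles and rejects with their reasons. A minimal sketch; `partition_valid` is a hypothetical helper that accepts any validator function, such as `DataQualityValidator().validate_news_article`:

```python
from collections.abc import Callable, Iterable
from typing import TypeVar

T = TypeVar("T")


def partition_valid(
    items: Iterable[T], validate: Callable[[T], list[str]]
) -> tuple[list[T], list[tuple[T, list[str]]]]:
    """Split items into (valid, rejected); rejected entries keep their error messages."""
    valid: list[T] = []
    rejected: list[tuple[T, list[str]]] = []
    for item in items:
        errors = validate(item)
        if errors:
            rejected.append((item, errors))
        else:
            valid.append(item)
    return valid, rejected
```

Logging the rejected list, rather than silently dropping it, keeps data-quality problems visible.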
This document sets the shared conventions for the TradingAgents codebase (storage, LLM integration, layering, migrations, and quality gates) so that financial data processing and AI agent operations stay consistent and meet the performance targets above.