Technical Standards - TradingAgents
Database Architecture
Core Stack: PostgreSQL + TimescaleDB + pgvectorscale
Primary Database: PostgreSQL 16+ with TimescaleDB and pgvector extensions
- TimescaleDB: Optimized for time-series financial data (prices, volumes, news timestamps)
- pgvector/pgvectorscale: Vector embeddings for RAG-powered agents
- Connection: asyncpg driver for high-performance async operations
Database URL Pattern:
# Development
DATABASE_URL = "postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents"
# Production
DATABASE_URL = "postgresql+asyncpg://username:password@host:port/database"
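The scheme normalization implied above (a plain `postgresql://` DSN must be rewritten to the `postgresql+asyncpg://` dialect before SQLAlchemy will use asyncpg) can be captured in a small helper; the function name is illustrative, not part of the codebase:

```python
def ensure_asyncpg(url: str) -> str:
    """Rewrite a plain postgresql:// DSN to use the asyncpg dialect."""
    if url.startswith("postgresql://"):
        # Replace only the scheme, not any later occurrence in the string
        return url.replace("postgresql://", "postgresql+asyncpg://", 1)
    return url
```

Calling this at configuration-load time keeps the rest of the code agnostic to how the DSN was supplied.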
Required Extensions:
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
Schema Design Standards
Time-Series Tables (TimescaleDB):
-- Market data with time-based partitioning
CREATE TABLE market_data (
    id UUID NOT NULL DEFAULT uuid_generate_v4(),
    symbol VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    price DECIMAL(18,8),
    volume BIGINT,
    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- TimescaleDB requires the partitioning column in every unique constraint,
    -- so the primary key must include the timestamp
    PRIMARY KEY (id, timestamp)
);
-- Convert to hypertable for time-series optimization
SELECT create_hypertable('market_data', 'timestamp');
-- Indexes for common query patterns
CREATE INDEX ON market_data (symbol, timestamp DESC);
Vector-Enabled Tables:
-- News articles with embeddings
CREATE TABLE news_articles (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    symbol VARCHAR(20), -- Ticker association used by the composite indexes below
    headline TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL, -- Deduplication key
    published_date DATE NOT NULL,
    title_embedding VECTOR(1536), -- OpenAI embedding dimension
    content_embedding VECTOR(1536),
    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Vector similarity index (build after bulk-loading data; pgvector recommends
-- lists ≈ rows / 1000 for tables under ~1M rows)
CREATE INDEX ON news_articles USING ivfflat (title_embedding vector_cosine_ops) WITH (lists = 100);
Composite Indexes for Query Optimization:
-- Common query patterns
CREATE INDEX idx_symbol_date ON news_articles (symbol, published_date);
CREATE INDEX idx_published_date ON news_articles (published_date);
-- No separate index on url is needed: the UNIQUE constraint already creates one
Connection Management
Async Session Factory:
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

class DatabaseManager:
    def __init__(self, database_url: str, echo: bool = False):
        # Ensure asyncpg driver
        if database_url.startswith("postgresql://"):
            database_url = database_url.replace("postgresql://", "postgresql+asyncpg://", 1)
        self.engine = create_async_engine(
            database_url,
            echo=echo,
            pool_recycle=3600,   # 1-hour connection recycling
            pool_pre_ping=True,  # Connection health checks
        )
        self.AsyncSessionLocal = async_sessionmaker(
            bind=self.engine,
            class_=AsyncSession,
            autoflush=False,
            expire_on_commit=False,  # Allow attribute access after commit
        )
Session Context Management:
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

@asynccontextmanager
async def get_session(self) -> AsyncGenerator[AsyncSession, None]:
    """Type-checker friendly session management"""
    session = self.AsyncSessionLocal()
    try:
        yield session
        await session.commit()
    except Exception:
        await session.rollback()
        raise
    finally:
        await session.close()
LLM Integration Standards
OpenRouter as Unified Provider
Configuration:
# Environment variables
OPENROUTER_API_KEY = "your_openrouter_key"
LLM_PROVIDER = "openrouter"
DEEP_THINK_LLM = "openai/gpt-4o" # Complex analysis
QUICK_THINK_LLM = "openai/gpt-4o-mini" # Fast responses
BACKEND_URL = "https://openrouter.ai/api/v1"
Model Selection Strategy:
- Deep Think: complex reasoning, debates, risk analysis (openai/gpt-4o, anthropic/claude-3.5-sonnet)
- Quick Think: data formatting, simple queries (openai/gpt-4o-mini, anthropic/claude-3-haiku)
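The two-tier split can be expressed as a small routing helper. The task-category names below are assumptions for illustration; the model IDs mirror the defaults above:

```python
# Hypothetical task categories that warrant the more expensive model tier
DEEP_THINK_TASKS = {"debate", "risk_analysis", "research_report"}

def select_model(task: str,
                 deep_model: str = "openai/gpt-4o",
                 quick_model: str = "openai/gpt-4o-mini") -> str:
    """Return the OpenRouter model ID appropriate for a task category."""
    return deep_model if task in DEEP_THINK_TASKS else quick_model
```

Centralizing the routing decision keeps individual agents from hard-coding model names.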
Cost Optimization:
# Development/testing configuration
config = TradingAgentsConfig(
llm_provider="openrouter",
deep_think_llm="openai/gpt-4o-mini", # Lower cost
quick_think_llm="openai/gpt-4o-mini", # Consistent model
max_debate_rounds=1, # Reduce API calls
online_tools=False, # Use cached data
)
Agent Integration Patterns
Anti-Corruption Layer:
class AgentToolkit:
    """Mediates between LLM agents and domain services"""

    def __init__(self, config: TradingAgentsConfig):
        self.config = config
        self.services = self._initialize_services()

    async def get_news_context(self, symbol: str, date: date) -> dict:
        """Convert domain models to structured LLM context"""
        # The "news" key assumes _initialize_services registers the news service
        articles = await self.services["news"].get_articles(symbol, date)
        return {
            "articles": [article.to_dict() for article in articles],
            "count": len(articles),
            "data_quality": self._assess_data_quality(articles),
            "source_distribution": self._analyze_sources(articles),
        }
Layered Architecture Enforcement
Standard Layer Pattern
Data Flow: Request → Router → Service → Repository → Entity → Database
Component Responsibilities:
- Entity (Domain Model):
@dataclass
class NewsArticle:
"""Domain entity with business rules and transformations"""
headline: str
url: str
published_date: date
sentiment_score: float | None = None
    def to_entity(self, symbol: str | None = None) -> NewsArticleEntity:
        """Transform to database model"""
        return NewsArticleEntity(
            headline=self.headline,
            url=self.url,
            published_date=self.published_date,
            sentiment_score=self.sentiment_score,
            symbol=symbol
        )
@staticmethod
def from_entity(entity: NewsArticleEntity) -> 'NewsArticle':
"""Transform from database model"""
return NewsArticle(
headline=entity.headline,
url=entity.url,
published_date=entity.published_date,
sentiment_score=entity.sentiment_score
)
def validate(self) -> list[str]:
"""Business rule validation"""
errors = []
if not self.headline.strip():
errors.append("Headline cannot be empty")
if not self.url.startswith(("http://", "https://")):
errors.append("Invalid URL format")
return errors
- Repository (Data Access):
class NewsRepository:
"""Handles data persistence with async operations"""
def __init__(self, database_manager: DatabaseManager):
self.db_manager = database_manager
async def list(self, symbol: str, date: date) -> list[NewsArticle]:
"""Query with proper error handling and logging"""
async with self.db_manager.get_session() as session:
result = await session.execute(
select(NewsArticleEntity)
.filter(and_(
NewsArticleEntity.symbol == symbol,
NewsArticleEntity.published_date == date
))
.order_by(NewsArticleEntity.published_date.desc())
)
entities = result.scalars().all()
return [NewsArticle.from_entity(e) for e in entities]
    async def upsert_batch(self, articles: list[NewsArticle], symbol: str) -> list[NewsArticle]:
        """Bulk operations for performance"""
        if not articles:
            return []
        async with self.db_manager.get_session() as session:
            # Use PostgreSQL ON CONFLICT for atomic upserts
            # (`insert` is sqlalchemy.dialects.postgresql.insert)
            stmt = insert(NewsArticleEntity).values([
                {
                    "headline": a.headline,
                    "url": a.url,
                    "published_date": a.published_date,
                    "sentiment_score": a.sentiment_score,
                    "symbol": symbol,
                }
                for a in articles
            ])
            upsert_stmt = stmt.on_conflict_do_update(
                index_elements=["url"],
                # Exclude the primary key and the conflict target from the update set
                set_={c.name: c for c in stmt.excluded if c.name not in ("id", "url")},
            ).returning(NewsArticleEntity)
            result = await session.execute(upsert_stmt)
            entities = result.scalars().all()
            return [NewsArticle.from_entity(e) for e in entities]
- Service (Business Logic):
class NewsService:
"""Orchestrates business operations"""
def __init__(self, repository: NewsRepository, clients: dict):
self.repository = repository
self.clients = clients
async def get_articles(self, symbol: str, date: date) -> list[NewsArticle]:
"""Business logic with error handling"""
try:
articles = await self.repository.list(symbol, date)
logger.info(f"Retrieved {len(articles)} articles for {symbol}")
return articles
except Exception as e:
logger.error(f"Failed to get articles for {symbol}: {e}")
return [] # Graceful degradation
async def update_articles(self, symbol: str, date: date) -> int:
"""Coordinated data refresh"""
new_articles = await self._fetch_from_sources(symbol, date)
if new_articles:
stored = await self.repository.upsert_batch(new_articles, symbol)
return len(stored)
return 0
Domain Isolation
Three Core Domains:
- News Domain (tradingagents/domains/news/)
- Market Data Domain (tradingagents/domains/marketdata/)
- Social Media Domain (tradingagents/domains/socialmedia/)
Domain Boundary Rules:
- Domains communicate through service interfaces only
- No direct database access between domains
- Shared types in tradingagents/types/
- Domain events for loose coupling
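The "domain events for loose coupling" rule can be sketched with a minimal synchronous event bus. The `ArticlesIngested` event and `EventBus` class below are illustrative, not existing project code:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ArticlesIngested:
    """Hypothetical event published by the news domain after an ingest run."""
    symbol: str
    count: int

class EventBus:
    """Minimal synchronous pub/sub; domains subscribe to events, not to each other."""
    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event) -> None:
        # Dispatch to every handler registered for this event's concrete type
        for handler in self._handlers[type(event)]:
            handler(event)
```

With this shape, the market data domain can react to news ingestion without importing the news domain's repositories or touching its tables.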
Vector Integration and RAG Patterns
Vector Embedding Storage
OpenAI Embeddings (1536 dimensions):
# Entity definition
class NewsArticleEntity(Base):
title_embedding: Mapped[list[float] | None] = mapped_column(
Vector(1536), nullable=True
)
content_embedding: Mapped[list[float] | None] = mapped_column(
Vector(1536), nullable=True
)
# Similarity search
async def find_similar_articles(self, query_embedding: list[float], limit: int = 10) -> list[NewsArticle]:
async with self.db_manager.get_session() as session:
result = await session.execute(
select(NewsArticleEntity)
.order_by(NewsArticleEntity.title_embedding.cosine_distance(query_embedding))
.limit(limit)
)
return [NewsArticle.from_entity(e) for e in result.scalars()]
RAG Context Assembly
Agent Context Pattern:
async def build_agent_context(self, symbol: str, date: date) -> dict:
"""Assemble multi-source context for agents"""
# Recent news with embeddings
news_articles = await self.news_service.get_articles(symbol, date)
# Market data
market_data = await self.market_service.get_recent_data(symbol, days=30)
# Social sentiment
social_data = await self.social_service.get_sentiment(symbol, date)
return {
"news": {
"articles": [a.to_dict() for a in news_articles],
            "sentiment_avg": (sum(a.sentiment_score or 0 for a in news_articles) / len(news_articles)) if news_articles else 0.0,
"sources": list({a.source for a in news_articles})
},
"market": {
"current_price": market_data.current_price,
"volatility": market_data.volatility_30d,
"volume_trend": market_data.volume_trend
},
"social": {
"reddit_sentiment": social_data.reddit_score,
"twitter_mentions": social_data.twitter_mentions
},
"context_quality": self._assess_context_quality(news_articles, market_data, social_data)
}
Migration and Deployment Standards
Database Migrations
Alembic Configuration:
# alembic/env.py
import asyncio

from alembic import context
from sqlalchemy.ext.asyncio import create_async_engine

from tradingagents.lib.database import Base

def do_run_migrations(connection):
    """Sync callback executed inside the async connection."""
    context.configure(connection=connection, target_metadata=Base.metadata)
    with context.begin_transaction():
        context.run_migrations()

def run_async_migrations():
    database_url = context.config.get_main_option("sqlalchemy.url")
    # Ensure asyncpg driver
    if database_url.startswith("postgresql://"):
        database_url = database_url.replace("postgresql://", "postgresql+asyncpg://", 1)
    engine = create_async_engine(database_url)

    async def run():
        async with engine.connect() as connection:
            await connection.run_sync(do_run_migrations)
        await engine.dispose()

    asyncio.run(run())
TimescaleDB-Specific Migrations:
"""Add TimescaleDB hypertable
Revision ID: 001
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

def upgrade():
    # Create table first
    op.create_table(
        'market_data',
        sa.Column('id', postgresql.UUID(), nullable=False),
        sa.Column('symbol', sa.String(20), nullable=False),
        sa.Column('timestamp', sa.TIMESTAMP(timezone=True), nullable=False),
        sa.Column('price', sa.Numeric(18, 8)),
        # TimescaleDB requires the partitioning column in the primary key
        sa.PrimaryKeyConstraint('id', 'timestamp')
    )
    # Convert to hypertable
    op.execute("SELECT create_hypertable('market_data', 'timestamp');")
    # Add indexes
    op.create_index('idx_market_symbol_time', 'market_data', ['symbol', 'timestamp'])
Docker Configuration
Development Environment:
# docker-compose.yml
services:
  timescaledb:
    build: ./db
    container_name: tradingagents_timescaledb
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: tradingagents
      POSTGRES_DB: tradingagents
    ports:
      - "5432:5432"
    volumes:
      - ./seed.sql:/docker-entrypoint-initdb.d/seed.sql
      - timescale_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d tradingagents"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  timescale_data:
Environment Configuration
Required Environment Variables:
# Database
DATABASE_URL=postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents
# OpenRouter LLM
OPENROUTER_API_KEY=your_openrouter_key
LLM_PROVIDER=openrouter
DEEP_THINK_LLM=openai/gpt-4o
QUICK_THINK_LLM=openai/gpt-4o-mini
BACKEND_URL=https://openrouter.ai/api/v1
# Application
TRADINGAGENTS_RESULTS_DIR=./results
TRADINGAGENTS_DATA_DIR=./data
DEFAULT_LOOKBACK_DAYS=30
ONLINE_TOOLS=true
# Performance
MAX_DEBATE_ROUNDS=1
MAX_RISK_DISCUSS_ROUNDS=1
Quality Gates
Database Performance
Query Performance Standards:
- Simple queries: < 100ms
- Complex aggregations: < 500ms
- Vector similarity searches: < 1s
- Batch operations: < 5s for 1000 records
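The budgets above can be surfaced in code with a small timing guard; the budget names and the warning mechanism below are illustrative, not an existing project API:

```python
import time
from contextlib import contextmanager

# Budgets in milliseconds, mirroring the standards above
QUERY_BUDGETS_MS = {
    "simple": 100,
    "aggregation": 500,
    "vector_search": 1_000,
    "batch_1000": 5_000,
}

@contextmanager
def query_budget(kind: str):
    """Time a block and report when it exceeds its budget."""
    start = time.perf_counter()
    yield
    elapsed_ms = (time.perf_counter() - start) * 1000
    budget = QUERY_BUDGETS_MS[kind]
    if elapsed_ms > budget:
        # In practice this would go to structured logging / metrics
        print(f"{kind} query took {elapsed_ms:.1f} ms (budget {budget} ms)")
```

Wrapping repository calls in `with query_budget("simple"): ...` makes budget regressions visible in logs long before they show up in `pg_stat_statements`.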
Monitoring Queries:
-- Query performance monitoring (requires the pg_stat_statements extension)
SELECT query, mean_exec_time, calls, total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;
-- TimescaleDB chunk information (TimescaleDB 2.x)
SELECT * FROM chunks_detailed_size('market_data');
Connection Health
Health Check Implementation:
import time

async def health_check() -> dict:
    """Comprehensive system health check"""
    checks = {}
    # Database connectivity (with measured round-trip latency)
    try:
        start = time.perf_counter()
        async with db_manager.get_session() as session:
            await session.execute(text("SELECT 1"))
        checks["database"] = {
            "status": "healthy",
            "latency_ms": round((time.perf_counter() - start) * 1000, 1),
        }
    except Exception as e:
        checks["database"] = {"status": "unhealthy", "error": str(e)}
# OpenRouter API
try:
# Test API connection
checks["llm_api"] = {"status": "healthy"}
except Exception as e:
checks["llm_api"] = {"status": "unhealthy", "error": str(e)}
return checks
Data Quality Enforcement
Validation Pipeline:
class DataQualityValidator:
"""Ensures data meets quality standards before storage"""
def validate_news_article(self, article: NewsArticle) -> list[str]:
errors = []
# Business rules
if not article.headline.strip():
errors.append("Empty headline")
if len(article.headline) > 500:
errors.append("Headline too long")
        if article.sentiment_score is not None and not (-1 <= article.sentiment_score <= 1):
errors.append("Invalid sentiment score range")
# Data freshness
if article.published_date > date.today():
errors.append("Future publication date")
return errors
This technical standards document provides the foundation for maintaining consistency across the TradingAgents codebase while ensuring optimal performance for financial data processing and AI agent operations.