# Technical Standards - TradingAgents

## Database Architecture

### Core Stack: PostgreSQL + TimescaleDB + pgvectorscale

**Primary Database**: PostgreSQL 16+ with the TimescaleDB and pgvector extensions

- **TimescaleDB**: Optimized for time-series financial data (prices, volumes, news timestamps)
- **pgvector/pgvectorscale**: Vector embeddings for RAG-powered agents
- **Connection**: asyncpg driver for high-performance async operations

**Database URL Pattern**:

```python
# Development
DATABASE_URL = "postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents"

# Production
DATABASE_URL = "postgresql+asyncpg://username:password@host:port/database"
```

**Required Extensions**:

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```

### Schema Design Standards

**Time-Series Tables (TimescaleDB)**:

```sql
-- Market data with time-based partitioning
CREATE TABLE market_data (
    id UUID NOT NULL DEFAULT uuid_generate_v4(),
    symbol VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    price DECIMAL(18,8),
    volume BIGINT,
    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- TimescaleDB requires the partitioning column in every unique constraint,
    -- so the primary key must include "timestamp"
    PRIMARY KEY (id, timestamp)
);

-- Convert to hypertable for time-series optimization
SELECT create_hypertable('market_data', 'timestamp');

-- Index for common query patterns
CREATE INDEX ON market_data (symbol, timestamp DESC);
```

**Vector-Enabled Tables**:

```sql
-- News articles with embeddings
CREATE TABLE news_articles (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    symbol VARCHAR(20),
    headline TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,        -- Deduplication key
    published_date DATE NOT NULL,
    title_embedding VECTOR(1536),    -- OpenAI embedding size
    content_embedding VECTOR(1536),
    -- TimescaleDB partitioning on published_date
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Vector similarity index
CREATE INDEX ON news_articles USING ivfflat (title_embedding vector_cosine_ops);
```

**Composite Indexes for Query Optimization**:

```sql
-- Common query patterns
CREATE INDEX idx_symbol_date ON news_articles (symbol, published_date);
CREATE INDEX idx_published_date ON news_articles (published_date);
-- Note: the UNIQUE constraint on url already creates an index, so no
-- separate index on url is needed
```

### Connection Management

**Async Session Factory**:

```python
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine


class DatabaseManager:
    def __init__(self, database_url: str, echo: bool = False):
        # Ensure asyncpg driver
        if not database_url.startswith("postgresql+asyncpg://"):
            database_url = database_url.replace("postgresql://", "postgresql+asyncpg://")

        self.engine = create_async_engine(
            database_url,
            echo=echo,
            pool_recycle=3600,   # 1-hour connection recycling
            pool_pre_ping=True,  # Connection health checks
        )
        self.AsyncSessionLocal = async_sessionmaker(
            bind=self.engine,
            class_=AsyncSession,
            autocommit=False,
            autoflush=False,
        )
```

**Session Context Management** (a method on `DatabaseManager`):

```python
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager


@asynccontextmanager
async def get_session(self) -> AsyncGenerator[AsyncSession, None]:
    """Type-checker friendly session management"""
    session = self.AsyncSessionLocal()
    try:
        yield session
        await session.commit()
    except Exception:
        await session.rollback()
        raise
    finally:
        await session.close()
```
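For reference, a minimal sketch of how the factory and context manager fit together; it assumes `DatabaseManager` is importable as defined above and that `DATABASE_URL` is set in the environment:

```python
import asyncio
import os

from sqlalchemy import text


async def main() -> None:
    # Commit/rollback/close are handled by the context manager
    db_manager = DatabaseManager(os.environ["DATABASE_URL"])
    async with db_manager.get_session() as session:
        result = await session.execute(text("SELECT version()"))
        print(result.scalar_one())


asyncio.run(main())
```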
## LLM Integration Standards

### OpenRouter as Unified Provider

**Configuration**:

```python
# Environment variables
OPENROUTER_API_KEY = "your_openrouter_key"
LLM_PROVIDER = "openrouter"
DEEP_THINK_LLM = "openai/gpt-4o"        # Complex analysis
QUICK_THINK_LLM = "openai/gpt-4o-mini"  # Fast responses
BACKEND_URL = "https://openrouter.ai/api/v1"
```

**Model Selection Strategy**:

- **Deep Think**: Complex reasoning, debates, risk analysis (`openai/gpt-4o`, `anthropic/claude-3.5-sonnet`)
- **Quick Think**: Data formatting, simple queries (`openai/gpt-4o-mini`, `anthropic/claude-3-haiku`)

**Cost Optimization**:

```python
# Development/testing configuration
config = TradingAgentsConfig(
    llm_provider="openrouter",
    deep_think_llm="openai/gpt-4o-mini",   # Lower cost
    quick_think_llm="openai/gpt-4o-mini",  # Consistent model
    max_debate_rounds=1,                   # Reduce API calls
    online_tools=False,                    # Use cached data
)
```

### Agent Integration Patterns

**Anti-Corruption Layer**:

```python
from datetime import date


class AgentToolkit:
    """Mediates between LLM agents and domain services"""

    def __init__(self, config: TradingAgentsConfig):
        self.config = config
        # Domain services keyed by domain name ("news", "marketdata", ...)
        self.services = self._initialize_services()

    async def get_news_context(self, symbol: str, date: date) -> dict:
        """Convert domain models to structured LLM context"""
        articles = await self.services["news"].get_articles(symbol, date)
        return {
            "articles": [article.to_dict() for article in articles],
            "count": len(articles),
            "data_quality": self._assess_data_quality(articles),
            "source_distribution": self._analyze_sources(articles),
        }
```
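`_assess_data_quality` and `_analyze_sources` are referenced above but not defined. One possible sketch of these toolkit methods — the thresholds, labels, and the `source` attribute are illustrative assumptions, not project standards:

```python
from collections import Counter


def _assess_data_quality(self, articles: list) -> str:
    """Illustrative heuristic: grade context richness by article count."""
    if len(articles) >= 10:
        return "high"
    if len(articles) >= 3:
        return "medium"
    return "low" if articles else "empty"


def _analyze_sources(self, articles: list) -> dict[str, int]:
    """Count articles per source so agents can weigh source diversity."""
    return dict(Counter(getattr(a, "source", "unknown") for a in articles))
```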
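Because OpenRouter exposes an OpenAI-compatible API, the standard `openai` client can be pointed at the `BACKEND_URL` from the configuration above. A minimal sketch; the prompt is illustrative:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # QUICK_THINK_LLM
    messages=[{"role": "user", "content": "Summarize today's AAPL headlines."}],
)
print(response.choices[0].message.content)
```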
## Layered Architecture Enforcement

### Standard Layer Pattern

**Data Flow**: `Request → Router → Service → Repository → Entity → Database`

**Component Responsibilities** (a wiring example for all three layers follows the list):

1. **Entity (Domain Model)**:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class NewsArticle:
    """Domain entity with business rules and transformations"""

    headline: str
    url: str
    published_date: date
    sentiment_score: float | None = None

    def to_entity(self, symbol: str | None = None) -> NewsArticleEntity:
        """Transform to database model"""
        return NewsArticleEntity(
            headline=self.headline,
            url=self.url,
            published_date=self.published_date,
            symbol=symbol,
        )

    @staticmethod
    def from_entity(entity: NewsArticleEntity) -> 'NewsArticle':
        """Transform from database model"""
        return NewsArticle(
            headline=entity.headline,
            url=entity.url,
            published_date=entity.published_date,
            sentiment_score=entity.sentiment_score,
        )

    def validate(self) -> list[str]:
        """Business rule validation"""
        errors = []
        if not self.headline.strip():
            errors.append("Headline cannot be empty")
        if not self.url.startswith(("http://", "https://")):
            errors.append("Invalid URL format")
        return errors
```

2. **Repository (Data Access)**:

```python
from sqlalchemy import and_, select
from sqlalchemy.dialects.postgresql import insert


class NewsRepository:
    """Handles data persistence with async operations"""

    def __init__(self, database_manager: DatabaseManager):
        self.db_manager = database_manager

    async def list(self, symbol: str, date: date) -> list[NewsArticle]:
        """Query with proper error handling and logging"""
        async with self.db_manager.get_session() as session:
            result = await session.execute(
                select(NewsArticleEntity)
                .filter(and_(
                    NewsArticleEntity.symbol == symbol,
                    NewsArticleEntity.published_date == date,
                ))
                .order_by(NewsArticleEntity.published_date.desc())
            )
            entities = result.scalars().all()
            return [NewsArticle.from_entity(e) for e in entities]

    async def upsert_batch(self, articles: list[NewsArticle], symbol: str) -> list[NewsArticle]:
        """Bulk operations for performance"""
        if not articles:
            return []

        async with self.db_manager.get_session() as session:
            # Use PostgreSQL ON CONFLICT for atomic upserts. Build plain dicts
            # rather than passing entity __dict__s, which carry SQLAlchemy
            # internal state (_sa_instance_state)
            stmt = insert(NewsArticleEntity).values([
                {
                    "headline": article.headline,
                    "url": article.url,
                    "published_date": article.published_date,
                    "symbol": symbol,
                }
                for article in articles
            ])
            upsert_stmt = stmt.on_conflict_do_update(
                index_elements=["url"],
                # Update only mutable columns; leave id/created_at untouched
                set_={
                    "headline": stmt.excluded.headline,
                    "published_date": stmt.excluded.published_date,
                    "symbol": stmt.excluded.symbol,
                },
            ).returning(NewsArticleEntity)

            result = await session.execute(upsert_stmt)
            entities = result.scalars().all()
            return [NewsArticle.from_entity(e) for e in entities]
```

3. **Service (Business Logic)**:

```python
import logging

logger = logging.getLogger(__name__)


class NewsService:
    """Orchestrates business operations"""

    def __init__(self, repository: NewsRepository, clients: dict):
        self.repository = repository
        self.clients = clients

    async def get_articles(self, symbol: str, date: date) -> list[NewsArticle]:
        """Business logic with error handling"""
        try:
            articles = await self.repository.list(symbol, date)
            logger.info(f"Retrieved {len(articles)} articles for {symbol}")
            return articles
        except Exception as e:
            logger.error(f"Failed to get articles for {symbol}: {e}")
            return []  # Graceful degradation

    async def update_articles(self, symbol: str, date: date) -> int:
        """Coordinated data refresh"""
        new_articles = await self._fetch_from_sources(symbol, date)
        if new_articles:
            stored = await self.repository.upsert_batch(new_articles, symbol)
            return len(stored)
        return 0
```
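As referenced above, a minimal sketch wiring the three layers together. It assumes the `NewsArticleEntity` mapping and `DATABASE_URL` environment variable exist as defined elsewhere in this document; the symbol and article values are illustrative:

```python
import asyncio
import os
from datetime import date


async def main() -> None:
    # Layer wiring: manager -> repository -> service
    db_manager = DatabaseManager(os.environ["DATABASE_URL"])
    repository = NewsRepository(db_manager)
    service = NewsService(repository, clients={})

    article = NewsArticle(
        headline="Example Corp beats earnings estimates",
        url="https://example.com/news/1",
        published_date=date.today(),
    )
    assert article.validate() == []  # Entity-level business rules

    await repository.upsert_batch([article], symbol="EXMP")
    stored = await service.get_articles("EXMP", date.today())
    print(f"{len(stored)} article(s) stored")


asyncio.run(main())
```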
### Domain Isolation

**Three Core Domains**:

1. **News Domain** (`tradingagents/domains/news/`)
2. **Market Data Domain** (`tradingagents/domains/marketdata/`)
3. **Social Media Domain** (`tradingagents/domains/socialmedia/`)

**Domain Boundary Rules**:

- Domains communicate through service interfaces only
- No direct database access between domains
- Shared types in `tradingagents/types/`
- Domain events for loose coupling

## Vector Integration and RAG Patterns

### Vector Embedding Storage

**OpenAI Embeddings (1536 dimensions)**:

```python
from pgvector.sqlalchemy import Vector
from sqlalchemy.orm import Mapped, mapped_column


# Entity definition
class NewsArticleEntity(Base):
    title_embedding: Mapped[list[float] | None] = mapped_column(
        Vector(1536), nullable=True
    )
    content_embedding: Mapped[list[float] | None] = mapped_column(
        Vector(1536), nullable=True
    )


# Similarity search (repository method)
async def find_similar_articles(self, query_embedding: list[float], limit: int = 10) -> list[NewsArticle]:
    async with self.db_manager.get_session() as session:
        result = await session.execute(
            select(NewsArticleEntity)
            .order_by(NewsArticleEntity.title_embedding.cosine_distance(query_embedding))
            .limit(limit)
        )
        return [NewsArticle.from_entity(e) for e in result.scalars()]
```

### RAG Context Assembly

**Agent Context Pattern**:

```python
async def build_agent_context(self, symbol: str, date: date) -> dict:
    """Assemble multi-source context for agents"""
    # Recent news with embeddings
    news_articles = await self.news_service.get_articles(symbol, date)

    # Market data
    market_data = await self.market_service.get_recent_data(symbol, days=30)

    # Social sentiment
    social_data = await self.social_service.get_sentiment(symbol, date)

    # Guard against division by zero when no articles are available
    sentiment_avg = (
        sum(a.sentiment_score or 0 for a in news_articles) / len(news_articles)
        if news_articles else 0.0
    )

    return {
        "news": {
            "articles": [a.to_dict() for a in news_articles],
            "sentiment_avg": sentiment_avg,
            "sources": list({a.source for a in news_articles}),
        },
        "market": {
            "current_price": market_data.current_price,
            "volatility": market_data.volatility_30d,
            "volume_trend": market_data.volume_trend,
        },
        "social": {
            "reddit_sentiment": social_data.reddit_score,
            "twitter_mentions": social_data.twitter_mentions,
        },
        "context_quality": self._assess_context_quality(news_articles, market_data, social_data),
    }
```
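To tie embedding generation to the similarity search above, a minimal sketch using OpenAI's embeddings endpoint directly (chat completions go through OpenRouter in this stack, but embedding generation here is an assumption, as is the choice of `text-embedding-3-small`, which matches the 1536-dimension columns):

```python
import os

from openai import AsyncOpenAI


async def search_similar(repository: NewsRepository, query: str) -> list[NewsArticle]:
    """Embed the query text, then rank stored articles by cosine distance."""
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.embeddings.create(
        model="text-embedding-3-small",  # 1536 dimensions, matching VECTOR(1536)
        input=query,
    )
    query_embedding = response.data[0].embedding
    return await repository.find_similar_articles(query_embedding, limit=5)
```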
## Migration and Deployment Standards

### Database Migrations

**Alembic Configuration**:

```python
# alembic/env.py
import asyncio

from alembic import context
from sqlalchemy.ext.asyncio import create_async_engine

from tradingagents.lib.database import Base


def do_run_migrations(connection):
    context.configure(connection=connection, target_metadata=Base.metadata)
    with context.begin_transaction():
        context.run_migrations()


def run_async_migrations():
    config = context.config
    database_url = config.get_main_option("sqlalchemy.url")

    # Ensure asyncpg driver
    if database_url.startswith("postgresql://"):
        database_url = database_url.replace("postgresql://", "postgresql+asyncpg://")

    engine = create_async_engine(database_url)

    async def run():
        async with engine.begin() as connection:
            await connection.run_sync(do_run_migrations)

    asyncio.run(run())
```

**TimescaleDB-Specific Migrations**:

```python
"""Add TimescaleDB hypertable

Revision ID: 001
"""
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql


def upgrade():
    # Create table first; the primary key must include the partitioning column
    op.create_table(
        'market_data',
        sa.Column('id', postgresql.UUID(), nullable=False),
        sa.Column('symbol', sa.String(20), nullable=False),
        sa.Column('timestamp', sa.TIMESTAMP(timezone=True), nullable=False),
        sa.Column('price', sa.Numeric(18, 8)),
        sa.PrimaryKeyConstraint('id', 'timestamp')
    )

    # Convert to hypertable
    op.execute("SELECT create_hypertable('market_data', 'timestamp');")

    # Add indexes
    op.create_index('idx_market_symbol_time', 'market_data', ['symbol', 'timestamp'])
```

### Docker Configuration

**Development Environment**:

```yaml
# docker-compose.yml
services:
  timescaledb:
    build: ./db
    container_name: tradingagents_timescaledb
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: tradingagents
      POSTGRES_DB: tradingagents
    ports:
      - "5432:5432"
    volumes:
      - ./seed.sql:/docker-entrypoint-initdb.d/seed.sql
      - timescale_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d tradingagents"]
      interval: 30s
      timeout: 10s
      retries: 3

# Named volumes must be declared at the top level
volumes:
  timescale_data:
```

### Environment Configuration

**Required Environment Variables**:

```bash
# Database
DATABASE_URL=postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents

# OpenRouter LLM
OPENROUTER_API_KEY=your_openrouter_key
LLM_PROVIDER=openrouter
DEEP_THINK_LLM=openai/gpt-4o
QUICK_THINK_LLM=openai/gpt-4o-mini
BACKEND_URL=https://openrouter.ai/api/v1

# Application
TRADINGAGENTS_RESULTS_DIR=./results
TRADINGAGENTS_DATA_DIR=./data
DEFAULT_LOOKBACK_DAYS=30
ONLINE_TOOLS=true

# Performance
MAX_DEBATE_ROUNDS=1
MAX_RISK_DISCUSS_ROUNDS=1
```

## Quality Gates

### Database Performance

**Query Performance Standards**:

- Simple queries: < 100 ms
- Complex aggregations: < 500 ms
- Vector similarity searches: < 1 s
- Batch operations: < 5 s for 1,000 records

**Monitoring Queries**:

```sql
-- Query performance monitoring (requires the pg_stat_statements extension)
SELECT query, mean_exec_time, calls, total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;

-- TimescaleDB chunk sizes (TimescaleDB 2.x)
SELECT * FROM chunks_detailed_size('market_data');
```

### Connection Health

**Health Check Implementation**:

```python
import time

from sqlalchemy import text


async def health_check() -> dict:
    """Comprehensive system health check"""
    checks = {}

    # Database connectivity
    try:
        start = time.perf_counter()
        async with db_manager.get_session() as session:
            await session.execute(text("SELECT 1"))
        latency_ms = (time.perf_counter() - start) * 1000
        checks["database"] = {"status": "healthy", "latency_ms": round(latency_ms, 1)}
    except Exception as e:
        checks["database"] = {"status": "unhealthy", "error": str(e)}

    # OpenRouter API
    try:
        # Test API connection (placeholder: issue a lightweight request here)
        checks["llm_api"] = {"status": "healthy"}
    except Exception as e:
        checks["llm_api"] = {"status": "unhealthy", "error": str(e)}

    return checks
```

### Data Quality Enforcement

**Validation Pipeline**:

```python
from datetime import date


class DataQualityValidator:
    """Ensures data meets quality standards before storage"""

    def validate_news_article(self, article: NewsArticle) -> list[str]:
        errors = []

        # Business rules
        if not article.headline.strip():
            errors.append("Empty headline")
        if len(article.headline) > 500:
            errors.append("Headline too long")
        # Compare against None explicitly so a score of 0.0 is still range-checked
        if article.sentiment_score is not None and not (-1 <= article.sentiment_score <= 1):
            errors.append("Invalid sentiment score range")

        # Data freshness
        if article.published_date > date.today():
            errors.append("Future publication date")

        return errors
```

This technical standards document provides the foundation for maintaining consistency across the TradingAgents codebase while ensuring optimal performance for financial data processing and AI agent operations.