16 KiB

Raw Blame History

Technical Standards - TradingAgents

Database Architecture

Core Stack: PostgreSQL + TimescaleDB + pgvectorscale

Primary Database: PostgreSQL 16+ with TimescaleDB and pgvector extensions

TimescaleDB: Optimized for time-series financial data (prices, volumes, news timestamps)
pgvector/pgvectorscale: Vector embeddings for RAG-powered agents
Connection: asyncpg driver for high-performance async operations

Database URL Pattern:

# Development
DATABASE_URL = "postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents"

# Production
DATABASE_URL = "postgresql+asyncpg://username:password@host:port/database"

Required Extensions:

CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

Schema Design Standards

Time-Series Tables (TimescaleDB):

-- Market data with time-based partitioning
CREATE TABLE market_data (
    id UUID PRIMARY KEY DEFAULT uuid7(),
    symbol VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    price DECIMAL(18,8),
    volume BIGINT,
    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Convert to hypertable for time-series optimization
SELECT create_hypertable('market_data', 'timestamp');

-- Indexes for common query patterns
CREATE INDEX ON market_data (symbol, timestamp DESC);

Vector-Enabled Tables:

-- News articles with embeddings
CREATE TABLE news_articles (
    id UUID PRIMARY KEY DEFAULT uuid7(),
    headline TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,  -- Deduplication key
    published_date DATE NOT NULL,
    title_embedding VECTOR(1536),  -- OpenAI embedding size
    content_embedding VECTOR(1536),
    -- TimescaleDB partitioning on published_date
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Vector similarity index
CREATE INDEX ON news_articles USING ivfflat (title_embedding vector_cosine_ops);

Composite Indexes for Query Optimization:

-- Common query patterns
CREATE INDEX idx_symbol_date ON news_articles (symbol, published_date);
CREATE INDEX idx_published_date ON news_articles (published_date);
CREATE INDEX idx_url_unique ON news_articles (url);

Connection Management

Async Session Factory:

from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

class DatabaseManager:
    def __init__(self, database_url: str, echo: bool = False):
        # Ensure asyncpg driver
        if not database_url.startswith("postgresql+asyncpg://"):
            database_url = database_url.replace("postgresql://", "postgresql+asyncpg://")
        
        self.engine = create_async_engine(
            database_url,
            echo=echo,
            pool_recycle=3600,  # 1-hour connection recycling
            pool_pre_ping=True,  # Connection health checks
        )
        
        self.AsyncSessionLocal = async_sessionmaker(
            bind=self.engine,
            class_=AsyncSession,
            autocommit=False,
            autoflush=False,
        )

Session Context Management:

@asynccontextmanager
async def get_session(self) -> AsyncGenerator[AsyncSession, None]:
    """Type-checker friendly session management"""
    session = self.AsyncSessionLocal()
    try:
        yield session
        await session.commit()
    except Exception:
        await session.rollback()
        raise
    finally:
        await session.close()

LLM Integration Standards

OpenRouter as Unified Provider

Configuration:

# Environment variables
OPENROUTER_API_KEY = "your_openrouter_key"
LLM_PROVIDER = "openrouter"
DEEP_THINK_LLM = "openai/gpt-4o"      # Complex analysis
QUICK_THINK_LLM = "openai/gpt-4o-mini" # Fast responses
BACKEND_URL = "https://openrouter.ai/api/v1"

Model Selection Strategy:

Deep Think: Complex reasoning, debates, risk analysis (openai/gpt-4o, anthropic/claude-3.5-sonnet)
Quick Think: Data formatting, simple queries (openai/gpt-4o-mini, anthropic/claude-3-haiku)

Cost Optimization:

# Development/testing configuration
config = TradingAgentsConfig(
    llm_provider="openrouter",
    deep_think_llm="openai/gpt-4o-mini",     # Lower cost
    quick_think_llm="openai/gpt-4o-mini",    # Consistent model
    max_debate_rounds=1,                     # Reduce API calls
    online_tools=False,                      # Use cached data
)

Agent Integration Patterns

Anti-Corruption Layer:

class AgentToolkit:
    """Mediates between LLM agents and domain services"""
    
    def __init__(self, config: TradingAgentsConfig):
        self.config = config
        self.services = self._initialize_services()
    
    async def get_news_context(self, symbol: str, date: date) -> dict:
        """Convert domain models to structured LLM context"""
        articles = await self.news_service.get_articles(symbol, date)
        
        return {
            "articles": [article.to_dict() for article in articles],
            "count": len(articles),
            "data_quality": self._assess_data_quality(articles),
            "source_distribution": self._analyze_sources(articles)
        }

Layered Architecture Enforcement

Standard Layer Pattern

Data Flow: Request → Router → Service → Repository → Entity → Database

Component Responsibilities:

Entity (Domain Model):

@dataclass
class NewsArticle:
    """Domain entity with business rules and transformations"""
    
    headline: str
    url: str
    published_date: date
    sentiment_score: float | None = None
    
    def to_entity(self, symbol: str | None = None) -> NewsArticleEntity:
        """Transform to database model"""
        return NewsArticleEntity(
            headline=self.headline,
            url=self.url,
            published_date=self.published_date,
            symbol=symbol
        )
    
    @staticmethod
    def from_entity(entity: NewsArticleEntity) -> 'NewsArticle':
        """Transform from database model"""
        return NewsArticle(
            headline=entity.headline,
            url=entity.url,
            published_date=entity.published_date,
            sentiment_score=entity.sentiment_score
        )
    
    def validate(self) -> list[str]:
        """Business rule validation"""
        errors = []
        if not self.headline.strip():
            errors.append("Headline cannot be empty")
        if not self.url.startswith(("http://", "https://")):
            errors.append("Invalid URL format")
        return errors

Repository (Data Access):

class NewsRepository:
    """Handles data persistence with async operations"""
    
    def __init__(self, database_manager: DatabaseManager):
        self.db_manager = database_manager
    
    async def list(self, symbol: str, date: date) -> list[NewsArticle]:
        """Query with proper error handling and logging"""
        async with self.db_manager.get_session() as session:
            result = await session.execute(
                select(NewsArticleEntity)
                .filter(and_(
                    NewsArticleEntity.symbol == symbol,
                    NewsArticleEntity.published_date == date
                ))
                .order_by(NewsArticleEntity.published_date.desc())
            )
            entities = result.scalars().all()
            return [NewsArticle.from_entity(e) for e in entities]
    
    async def upsert_batch(self, articles: list[NewsArticle], symbol: str) -> list[NewsArticle]:
        """Bulk operations for performance"""
        if not articles:
            return []
        
        async with self.db_manager.get_session() as session:
            # Use PostgreSQL ON CONFLICT for atomic upserts
            stmt = insert(NewsArticleEntity).values([
                article.to_entity(symbol).__dict__ for article in articles
            ])
            upsert_stmt = stmt.on_conflict_do_update(
                index_elements=["url"],
                set_={k: stmt.excluded[k] for k in stmt.excluded.keys()}
            ).returning(NewsArticleEntity)
            
            result = await session.execute(upsert_stmt)
            entities = result.scalars().all()
            return [NewsArticle.from_entity(e) for e in entities]

Service (Business Logic):

class NewsService:
    """Orchestrates business operations"""
    
    def __init__(self, repository: NewsRepository, clients: dict):
        self.repository = repository
        self.clients = clients
    
    async def get_articles(self, symbol: str, date: date) -> list[NewsArticle]:
        """Business logic with error handling"""
        try:
            articles = await self.repository.list(symbol, date)
            logger.info(f"Retrieved {len(articles)} articles for {symbol}")
            return articles
        except Exception as e:
            logger.error(f"Failed to get articles for {symbol}: {e}")
            return []  # Graceful degradation
    
    async def update_articles(self, symbol: str, date: date) -> int:
        """Coordinated data refresh"""
        new_articles = await self._fetch_from_sources(symbol, date)
        if new_articles:
            stored = await self.repository.upsert_batch(new_articles, symbol)
            return len(stored)
        return 0

Domain Isolation

Three Core Domains:

News Domain (tradingagents/domains/news/)
Market Data Domain (tradingagents/domains/marketdata/)
Social Media Domain (tradingagents/domains/socialmedia/)

Domain Boundary Rules:

Domains communicate through service interfaces only
No direct database access between domains
Shared types in tradingagents/types/
Domain events for loose coupling

Vector Integration and RAG Patterns

Vector Embedding Storage

OpenAI Embeddings (1536 dimensions):

# Entity definition
class NewsArticleEntity(Base):
    title_embedding: Mapped[list[float] | None] = mapped_column(
        Vector(1536), nullable=True
    )
    content_embedding: Mapped[list[float] | None] = mapped_column(
        Vector(1536), nullable=True
    )

# Similarity search
async def find_similar_articles(self, query_embedding: list[float], limit: int = 10) -> list[NewsArticle]:
    async with self.db_manager.get_session() as session:
        result = await session.execute(
            select(NewsArticleEntity)
            .order_by(NewsArticleEntity.title_embedding.cosine_distance(query_embedding))
            .limit(limit)
        )
        return [NewsArticle.from_entity(e) for e in result.scalars()]

RAG Context Assembly

Agent Context Pattern:

async def build_agent_context(self, symbol: str, date: date) -> dict:
    """Assemble multi-source context for agents"""
    
    # Recent news with embeddings
    news_articles = await self.news_service.get_articles(symbol, date)
    
    # Market data
    market_data = await self.market_service.get_recent_data(symbol, days=30)
    
    # Social sentiment
    social_data = await self.social_service.get_sentiment(symbol, date)
    
    return {
        "news": {
            "articles": [a.to_dict() for a in news_articles],
            "sentiment_avg": sum(a.sentiment_score or 0 for a in news_articles) / len(news_articles),
            "sources": list({a.source for a in news_articles})
        },
        "market": {
            "current_price": market_data.current_price,
            "volatility": market_data.volatility_30d,
            "volume_trend": market_data.volume_trend
        },
        "social": {
            "reddit_sentiment": social_data.reddit_score,
            "twitter_mentions": social_data.twitter_mentions
        },
        "context_quality": self._assess_context_quality(news_articles, market_data, social_data)
    }

Migration and Deployment Standards

Database Migrations

Alembic Configuration:

# alembic/env.py
import asyncio
from sqlalchemy.ext.asyncio import create_async_engine
from tradingagents.lib.database import Base

def run_async_migrations():
    config = context.config
    database_url = config.get_main_option("sqlalchemy.url")
    
    # Ensure asyncpg driver
    if database_url.startswith("postgresql://"):
        database_url = database_url.replace("postgresql://", "postgresql+asyncpg://")
    
    engine = create_async_engine(database_url)
    
    async def do_run_migrations():
        async with engine.begin() as connection:
            await connection.run_sync(do_run_migrations_sync)
    
    asyncio.run(do_run_migrations())

TimescaleDB-Specific Migrations:

"""Add TimescaleDB hypertable

Revision ID: 001
"""

def upgrade():
    # Create table first
    op.create_table(
        'market_data',
        sa.Column('id', postgresql.UUID(), nullable=False),
        sa.Column('symbol', sa.String(20), nullable=False),
        sa.Column('timestamp', sa.TIMESTAMP(timezone=True), nullable=False),
        sa.Column('price', sa.Numeric(18, 8)),
        sa.PrimaryKeyConstraint('id')
    )
    
    # Convert to hypertable
    op.execute("SELECT create_hypertable('market_data', 'timestamp');")
    
    # Add indexes
    op.create_index('idx_market_symbol_time', 'market_data', ['symbol', 'timestamp'])

Docker Configuration

Development Environment:

# docker-compose.yml
services:
  timescaledb:
    build: ./db
    container_name: tradingagents_timescaledb
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: tradingagents
      POSTGRES_DB: tradingagents
    ports:
      - "5432:5432"
    volumes:
      - ./seed.sql:/docker-entrypoint-initdb.d/seed.sql
      - timescale_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d tradingagents"]
      interval: 30s
      timeout: 10s
      retries: 3

Environment Configuration

Required Environment Variables:

# Database
DATABASE_URL=postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents

# OpenRouter LLM
OPENROUTER_API_KEY=your_openrouter_key
LLM_PROVIDER=openrouter
DEEP_THINK_LLM=openai/gpt-4o
QUICK_THINK_LLM=openai/gpt-4o-mini
BACKEND_URL=https://openrouter.ai/api/v1

# Application
TRADINGAGENTS_RESULTS_DIR=./results
TRADINGAGENTS_DATA_DIR=./data
DEFAULT_LOOKBACK_DAYS=30
ONLINE_TOOLS=true

# Performance
MAX_DEBATE_ROUNDS=1
MAX_RISK_DISCUSS_ROUNDS=1

Quality Gates

Database Performance

Query Performance Standards:

Simple queries: < 100ms
Complex aggregations: < 500ms
Vector similarity searches: < 1s
Batch operations: < 5s for 1000 records

Monitoring Queries:

-- Query performance monitoring
SELECT query, mean_exec_time, calls, total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;

-- TimescaleDB chunk information
SELECT * FROM chunk_relation_size('market_data');

Connection Health

Health Check Implementation:

async def health_check() -> dict:
    """Comprehensive system health check"""
    checks = {}
    
    # Database connectivity
    try:
        async with db_manager.get_session() as session:
            await session.execute(text("SELECT 1"))
        checks["database"] = {"status": "healthy", "latency_ms": None}
    except Exception as e:
        checks["database"] = {"status": "unhealthy", "error": str(e)}
    
    # OpenRouter API
    try:
        # Test API connection
        checks["llm_api"] = {"status": "healthy"}
    except Exception as e:
        checks["llm_api"] = {"status": "unhealthy", "error": str(e)}
    
    return checks

Data Quality Enforcement

Validation Pipeline:

class DataQualityValidator:
    """Ensures data meets quality standards before storage"""
    
    def validate_news_article(self, article: NewsArticle) -> list[str]:
        errors = []
        
        # Business rules
        if not article.headline.strip():
            errors.append("Empty headline")
        
        if len(article.headline) > 500:
            errors.append("Headline too long")
        
        if article.sentiment_score and not (-1 <= article.sentiment_score <= 1):
            errors.append("Invalid sentiment score range")
        
        # Data freshness
        if article.published_date > date.today():
            errors.append("Future publication date")
        
        return errors

This technical standards document provides the foundation for maintaining consistency across the TradingAgents codebase while ensuring optimal performance for financial data processing and AI agent operations.

16 KiB Raw Blame History