# Technical Standards - TradingAgents
## Database Architecture
### Core Stack: PostgreSQL + TimescaleDB + pgvectorscale
**Primary Database**: PostgreSQL 16+ with TimescaleDB and pgvector extensions
- **TimescaleDB**: Optimized for time-series financial data (prices, volumes, news timestamps)
- **pgvector/pgvectorscale**: Vector embeddings for RAG-powered agents
- **Connection**: asyncpg driver for high-performance async operations
**Database URL Pattern**:
```python
# Development
DATABASE_URL = "postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents"
# Production
DATABASE_URL = "postgresql+asyncpg://username:password@host:port/database"
```
**Required Extensions**:
```sql
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;  -- pgvectorscale, as named in the stack above
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```
### Schema Design Standards
**Time-Series Tables (TimescaleDB)**:
```sql
-- Market data with time-based partitioning
CREATE TABLE market_data (
    id UUID DEFAULT uuid_generate_v4(),
    symbol VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    price DECIMAL(18,8),
    volume BIGINT,
    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- Hypertables require the partitioning column in every unique constraint
    PRIMARY KEY (id, timestamp)
);

-- Convert to hypertable for time-series optimization
SELECT create_hypertable('market_data', 'timestamp');

-- Indexes for common query patterns
CREATE INDEX ON market_data (symbol, timestamp DESC);
```
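
Once converted, the hypertable supports TimescaleDB's time-series aggregates. A minimal query sketch using `time_bucket`, `first`, and `last` (the five-minute bucket and one-day window are illustrative choices, not project standards):

```python
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

# Illustrative OHLC-style aggregation over the hypertable.
CANDLE_QUERY = text("""
    SELECT time_bucket('5 minutes', timestamp) AS bucket,
           first(price, timestamp) AS open,
           max(price)              AS high,
           min(price)              AS low,
           last(price, timestamp)  AS close,
           sum(volume)             AS total_volume
    FROM market_data
    WHERE symbol = :symbol
      AND timestamp > NOW() - INTERVAL '1 day'
    GROUP BY bucket
    ORDER BY bucket
""")

async def get_candles(session: AsyncSession, symbol: str):
    result = await session.execute(CANDLE_QUERY, {"symbol": symbol})
    return result.all()
```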
**Vector-Enabled Tables**:
```sql
-- News articles with embeddings
CREATE TABLE news_articles (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    symbol VARCHAR(20),                  -- Ticker the article was fetched for
    headline TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,            -- Deduplication key
    published_date DATE NOT NULL,
    sentiment_score DOUBLE PRECISION,    -- Populated by downstream analysis
    title_embedding VECTOR(1536),        -- OpenAI embedding size
    content_embedding VECTOR(1536),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Vector similarity index
CREATE INDEX ON news_articles USING ivfflat (title_embedding vector_cosine_ops);
```
**Composite Indexes for Query Optimization**:
```sql
-- Common query patterns
CREATE INDEX idx_symbol_date ON news_articles (symbol, published_date);
CREATE INDEX idx_published_date ON news_articles (published_date);
-- url is already indexed by its UNIQUE constraint; no separate index needed
```
### Connection Management
**Async Session Factory**:
```python
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

class DatabaseManager:
    def __init__(self, database_url: str, echo: bool = False):
        # Ensure asyncpg driver
        if not database_url.startswith("postgresql+asyncpg://"):
            database_url = database_url.replace("postgresql://", "postgresql+asyncpg://")
        self.engine = create_async_engine(
            database_url,
            echo=echo,
            pool_recycle=3600,   # 1-hour connection recycling
            pool_pre_ping=True,  # Connection health checks
        )
        self.AsyncSessionLocal = async_sessionmaker(
            bind=self.engine,
            class_=AsyncSession,
            autoflush=False,
            expire_on_commit=False,  # Keep ORM objects usable after commit in async code
        )
```
**Session Context Management**:
```python
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

# Method of DatabaseManager (continued from above)
@asynccontextmanager
async def get_session(self) -> AsyncGenerator[AsyncSession, None]:
    """Type-checker friendly session management"""
    session = self.AsyncSessionLocal()
    try:
        yield session
        await session.commit()
    except Exception:
        await session.rollback()
        raise
    finally:
        await session.close()
```
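
A usage sketch for the context manager (the helper name and query are illustrative): the commit/rollback/close lifecycle stays inside `get_session`, so callers only express the query.

```python
from sqlalchemy import text

async def fetch_latest_price(db_manager: DatabaseManager, symbol: str) -> float | None:
    """Illustrative caller: commit on success and rollback on error are handled by get_session."""
    async with db_manager.get_session() as session:
        result = await session.execute(
            text("SELECT price FROM market_data "
                 "WHERE symbol = :symbol ORDER BY timestamp DESC LIMIT 1"),
            {"symbol": symbol},
        )
        row = result.first()
        return float(row.price) if row else None
```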
## LLM Integration Standards
### OpenRouter as Unified Provider
**Configuration**:
```python
# Environment variables
OPENROUTER_API_KEY = "your_openrouter_key"
LLM_PROVIDER = "openrouter"
DEEP_THINK_LLM = "openai/gpt-4o" # Complex analysis
QUICK_THINK_LLM = "openai/gpt-4o-mini" # Fast responses
BACKEND_URL = "https://openrouter.ai/api/v1"
```
**Model Selection Strategy**:
- **Deep Think**: Complex reasoning, debates, risk analysis (`openai/gpt-4o`, `anthropic/claude-3.5-sonnet`)
- **Quick Think**: Data formatting, simple queries (`openai/gpt-4o-mini`, `anthropic/claude-3-haiku`)
**Cost Optimization**:
```python
# Development/testing configuration
config = TradingAgentsConfig(
    llm_provider="openrouter",
    deep_think_llm="openai/gpt-4o-mini",   # Lower cost
    quick_think_llm="openai/gpt-4o-mini",  # Consistent model
    max_debate_rounds=1,                   # Reduce API calls
    online_tools=False,                    # Use cached data
)
```
### Agent Integration Patterns
**Anti-Corruption Layer**:
```python
class AgentToolkit:
    """Mediates between LLM agents and domain services"""

    def __init__(self, config: TradingAgentsConfig):
        self.config = config
        # Expected to wire per-domain services, e.g. self.news_service
        self.services = self._initialize_services()

    async def get_news_context(self, symbol: str, date: date) -> dict:
        """Convert domain models to structured LLM context"""
        articles = await self.news_service.get_articles(symbol, date)
        return {
            "articles": [article.to_dict() for article in articles],
            "count": len(articles),
            "data_quality": self._assess_data_quality(articles),
            "source_distribution": self._analyze_sources(articles),
        }
```
## Layered Architecture Enforcement
### Standard Layer Pattern
**Data Flow**: `Request → Router → Service → Repository → Entity → Database`
**Component Responsibilities**:
1. **Entity (Domain Model)**:
```python
from dataclasses import dataclass
from datetime import date

@dataclass
class NewsArticle:
    """Domain entity with business rules and transformations"""
    headline: str
    url: str
    published_date: date
    sentiment_score: float | None = None

    def to_entity(self, symbol: str | None = None) -> NewsArticleEntity:
        """Transform to database model"""
        return NewsArticleEntity(
            headline=self.headline,
            url=self.url,
            published_date=self.published_date,
            sentiment_score=self.sentiment_score,
            symbol=symbol,
        )

    @staticmethod
    def from_entity(entity: NewsArticleEntity) -> 'NewsArticle':
        """Transform from database model"""
        return NewsArticle(
            headline=entity.headline,
            url=entity.url,
            published_date=entity.published_date,
            sentiment_score=entity.sentiment_score,
        )

    def validate(self) -> list[str]:
        """Business rule validation"""
        errors = []
        if not self.headline.strip():
            errors.append("Headline cannot be empty")
        if not self.url.startswith(("http://", "https://")):
            errors.append("Invalid URL format")
        return errors
```
2. **Repository (Data Access)**:
```python
from datetime import date

from sqlalchemy import and_, select
from sqlalchemy.dialects.postgresql import insert

class NewsRepository:
    """Handles data persistence with async operations"""

    def __init__(self, database_manager: DatabaseManager):
        self.db_manager = database_manager

    async def list(self, symbol: str, date: date) -> list[NewsArticle]:
        """Query with proper error handling and logging"""
        async with self.db_manager.get_session() as session:
            result = await session.execute(
                select(NewsArticleEntity)
                .filter(and_(
                    NewsArticleEntity.symbol == symbol,
                    NewsArticleEntity.published_date == date,
                ))
                .order_by(NewsArticleEntity.published_date.desc())
            )
            entities = result.scalars().all()
            return [NewsArticle.from_entity(e) for e in entities]

    async def upsert_batch(self, articles: list[NewsArticle], symbol: str) -> list[NewsArticle]:
        """Bulk operations for performance"""
        if not articles:
            return []
        async with self.db_manager.get_session() as session:
            # Plain dicts: passing ORM __dict__ would leak _sa_instance_state into the statement
            rows = [
                {
                    "headline": a.headline,
                    "url": a.url,
                    "published_date": a.published_date,
                    "sentiment_score": a.sentiment_score,
                    "symbol": symbol,
                }
                for a in articles
            ]
            # Use PostgreSQL ON CONFLICT for atomic upserts, keyed on the unique url
            stmt = insert(NewsArticleEntity).values(rows)
            upsert_stmt = stmt.on_conflict_do_update(
                index_elements=["url"],
                # Keep id/url/created_at from the existing row; update the rest
                set_={c.name: c for c in stmt.excluded
                      if c.name not in ("id", "url", "created_at")},
            ).returning(NewsArticleEntity)
            result = await session.execute(upsert_stmt)
            entities = result.scalars().all()
            return [NewsArticle.from_entity(e) for e in entities]
```
3. **Service (Business Logic)**:
```python
import logging

logger = logging.getLogger(__name__)

class NewsService:
    """Orchestrates business operations"""

    def __init__(self, repository: NewsRepository, clients: dict):
        self.repository = repository
        self.clients = clients

    async def get_articles(self, symbol: str, date: date) -> list[NewsArticle]:
        """Business logic with error handling"""
        try:
            articles = await self.repository.list(symbol, date)
            logger.info(f"Retrieved {len(articles)} articles for {symbol}")
            return articles
        except Exception as e:
            logger.error(f"Failed to get articles for {symbol}: {e}")
            return []  # Graceful degradation

    async def update_articles(self, symbol: str, date: date) -> int:
        """Coordinated data refresh"""
        new_articles = await self._fetch_from_sources(symbol, date)
        if new_articles:
            stored = await self.repository.upsert_batch(new_articles, symbol)
            return len(stored)
        return 0
```
### Domain Isolation
**Three Core Domains**:
1. **News Domain** (`tradingagents/domains/news/`)
2. **Market Data Domain** (`tradingagents/domains/marketdata/`)
3. **Social Media Domain** (`tradingagents/domains/socialmedia/`)
**Domain Boundary Rules**:
- Domains communicate through service interfaces only
- No direct database access between domains
- Shared types in `tradingagents/types/`
- Domain events for loose coupling (see the sketch below)
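
The event mechanism itself is not prescribed here; a minimal in-process sketch of what it could look like (`ArticlesUpdated` and `EventBus` are illustrative names, not existing project classes):

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Awaitable, Callable

@dataclass(frozen=True)
class ArticlesUpdated:
    """Illustrative domain event published by the news domain."""
    symbol: str
    trade_date: date
    article_count: int
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class EventBus:
    """Minimal in-process pub/sub; swap for a broker if domains become separate services."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[..., Awaitable[None]]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[..., Awaitable[None]]) -> None:
        self._handlers[event_type].append(handler)

    async def publish(self, event: object) -> None:
        # Deliver to every handler registered for this event type
        for handler in self._handlers[type(event)]:
            await handler(event)
```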
## Vector Integration and RAG Patterns
### Vector Embedding Storage
**OpenAI Embeddings (1536 dimensions)**:
```python
from pgvector.sqlalchemy import Vector
from sqlalchemy.orm import Mapped, mapped_column

# Entity definition
class NewsArticleEntity(Base):
    title_embedding: Mapped[list[float] | None] = mapped_column(
        Vector(1536), nullable=True
    )
    content_embedding: Mapped[list[float] | None] = mapped_column(
        Vector(1536), nullable=True
    )

# Similarity search (repository method)
async def find_similar_articles(self, query_embedding: list[float], limit: int = 10) -> list[NewsArticle]:
    async with self.db_manager.get_session() as session:
        result = await session.execute(
            select(NewsArticleEntity)
            .order_by(NewsArticleEntity.title_embedding.cosine_distance(query_embedding))
            .limit(limit)
        )
        return [NewsArticle.from_entity(e) for e in result.scalars()]
```
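
Producing `query_embedding` happens outside the repository. A hedged sketch using the official `openai` client (the model name is an assumption; any model emitting 1536-dimension vectors matches the `VECTOR(1536)` columns, and this assumes embeddings are fetched from OpenAI directly rather than via OpenRouter):

```python
from openai import AsyncOpenAI

# Assumes OPENAI_API_KEY is set; text-embedding-3-small emits 1536-dim vectors.
embedding_client = AsyncOpenAI()

async def embed_query(query: str) -> list[float]:
    response = await embedding_client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    )
    return response.data[0].embedding

# Usage:
#   query_embedding = await embed_query("semiconductor supply chain disruptions")
#   articles = await repository.find_similar_articles(query_embedding, limit=5)
```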
### RAG Context Assembly
**Agent Context Pattern**:
```python
async def build_agent_context(self, symbol: str, date: date) -> dict:
    """Assemble multi-source context for agents"""
    # Recent news with embeddings
    news_articles = await self.news_service.get_articles(symbol, date)
    # Market data
    market_data = await self.market_service.get_recent_data(symbol, days=30)
    # Social sentiment
    social_data = await self.social_service.get_sentiment(symbol, date)
    return {
        "news": {
            "articles": [a.to_dict() for a in news_articles],
            # Guard against division by zero when no articles were found
            "sentiment_avg": (
                sum(a.sentiment_score or 0 for a in news_articles) / len(news_articles)
                if news_articles else 0.0
            ),
            "sources": list({a.source for a in news_articles}),
        },
        "market": {
            "current_price": market_data.current_price,
            "volatility": market_data.volatility_30d,
            "volume_trend": market_data.volume_trend,
        },
        "social": {
            "reddit_sentiment": social_data.reddit_score,
            "twitter_mentions": social_data.twitter_mentions,
        },
        "context_quality": self._assess_context_quality(news_articles, market_data, social_data),
    }
```
## Migration and Deployment Standards
### Database Migrations
**Alembic Configuration**:
```python
# alembic/env.py
import asyncio

from alembic import context
from sqlalchemy.ext.asyncio import create_async_engine

from tradingagents.lib.database import Base

def do_run_migrations_sync(connection):
    """Synchronous migration body, executed inside the async connection."""
    context.configure(connection=connection, target_metadata=Base.metadata)
    with context.begin_transaction():
        context.run_migrations()

def run_async_migrations():
    config = context.config
    database_url = config.get_main_option("sqlalchemy.url")
    # Ensure asyncpg driver
    if database_url.startswith("postgresql://"):
        database_url = database_url.replace("postgresql://", "postgresql+asyncpg://")
    engine = create_async_engine(database_url)

    async def do_run_migrations():
        async with engine.begin() as connection:
            await connection.run_sync(do_run_migrations_sync)
        await engine.dispose()

    asyncio.run(do_run_migrations())
```
**TimescaleDB-Specific Migrations**:
```python
"""Add TimescaleDB hypertable
Revision ID: 001
"""
def upgrade():
# Create table first
op.create_table(
'market_data',
sa.Column('id', postgresql.UUID(), nullable=False),
sa.Column('symbol', sa.String(20), nullable=False),
sa.Column('timestamp', sa.TIMESTAMP(timezone=True), nullable=False),
sa.Column('price', sa.Numeric(18, 8)),
sa.PrimaryKeyConstraint('id')
)
# Convert to hypertable
op.execute("SELECT create_hypertable('market_data', 'timestamp');")
# Add indexes
op.create_index('idx_market_symbol_time', 'market_data', ['symbol', 'timestamp'])
```
### Docker Configuration
**Development Environment**:
```yaml
# docker-compose.yml
services:
  timescaledb:
    build: ./db
    container_name: tradingagents_timescaledb
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: tradingagents
      POSTGRES_DB: tradingagents
    ports:
      - "5432:5432"
    volumes:
      - ./seed.sql:/docker-entrypoint-initdb.d/seed.sql
      - timescale_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d tradingagents"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  timescale_data:
```
### Environment Configuration
**Required Environment Variables**:
```bash
# Database
DATABASE_URL=postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents
# OpenRouter LLM
OPENROUTER_API_KEY=your_openrouter_key
LLM_PROVIDER=openrouter
DEEP_THINK_LLM=openai/gpt-4o
QUICK_THINK_LLM=openai/gpt-4o-mini
BACKEND_URL=https://openrouter.ai/api/v1
# Application
TRADINGAGENTS_RESULTS_DIR=./results
TRADINGAGENTS_DATA_DIR=./data
DEFAULT_LOOKBACK_DAYS=30
ONLINE_TOOLS=true
# Performance
MAX_DEBATE_ROUNDS=1
MAX_RISK_DISCUSS_ROUNDS=1
```
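
One way to bind these variables to typed configuration is `pydantic-settings`; a sketch under the assumption that the project adopts it (the class and field names mirror a subset of the variables above but are not existing project code):

```python
from pydantic_settings import BaseSettings

class TradingAgentsSettings(BaseSettings):
    """Illustrative env loader; assumes pydantic-settings as a dependency.
    Field names match env vars case-insensitively (DATABASE_URL -> database_url)."""
    database_url: str = "postgresql+asyncpg://postgres:tradingagents@localhost:5432/tradingagents"
    openrouter_api_key: str = ""
    llm_provider: str = "openrouter"
    deep_think_llm: str = "openai/gpt-4o"
    quick_think_llm: str = "openai/gpt-4o-mini"
    backend_url: str = "https://openrouter.ai/api/v1"
    online_tools: bool = True
    max_debate_rounds: int = 1
    max_risk_discuss_rounds: int = 1

settings = TradingAgentsSettings()  # reads process env at import time
```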
## Quality Gates
### Database Performance
**Query Performance Standards** (see the test sketch after this list):
- Simple queries: < 100ms
- Complex aggregations: < 500ms
- Vector similarity searches: < 1s
- Batch operations: < 5s for 1000 records
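
These budgets can be enforced in integration tests. A minimal sketch using pytest with pytest-asyncio (the `db_manager` fixture and the sample query are assumptions, not existing test code):

```python
import time

import pytest
from sqlalchemy import text

@pytest.mark.asyncio
async def test_simple_query_under_budget(db_manager):
    """Illustrative latency-budget check against a test database."""
    async with db_manager.get_session() as session:
        start = time.perf_counter()
        await session.execute(
            text("SELECT * FROM market_data WHERE symbol = :s "
                 "ORDER BY timestamp DESC LIMIT 100"),
            {"s": "AAPL"},
        )
        elapsed_ms = (time.perf_counter() - start) * 1000
    assert elapsed_ms < 100, f"simple query took {elapsed_ms:.1f}ms (budget: 100ms)"
```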
**Monitoring Queries**:
```sql
-- Query performance monitoring (requires the pg_stat_statements extension)
SELECT query, mean_exec_time, calls, total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;

-- TimescaleDB chunk information (TimescaleDB 2.x)
SELECT * FROM chunks_detailed_size('market_data');
```
### Connection Health
**Health Check Implementation**:
```python
import time

from sqlalchemy import text

async def health_check() -> dict:
    """Comprehensive system health check"""
    checks = {}
    # Database connectivity (with round-trip latency)
    try:
        start = time.perf_counter()
        async with db_manager.get_session() as session:
            await session.execute(text("SELECT 1"))
        latency_ms = round((time.perf_counter() - start) * 1000, 1)
        checks["database"] = {"status": "healthy", "latency_ms": latency_ms}
    except Exception as e:
        checks["database"] = {"status": "unhealthy", "error": str(e)}
    # OpenRouter API
    try:
        # Test API connection (e.g. a minimal models-list request)
        checks["llm_api"] = {"status": "healthy"}
    except Exception as e:
        checks["llm_api"] = {"status": "unhealthy", "error": str(e)}
    return checks
```
### Data Quality Enforcement
**Validation Pipeline**:
```python
from datetime import date

class DataQualityValidator:
    """Ensures data meets quality standards before storage"""

    def validate_news_article(self, article: NewsArticle) -> list[str]:
        errors = []
        # Business rules
        if not article.headline.strip():
            errors.append("Empty headline")
        if len(article.headline) > 500:
            errors.append("Headline too long")
        # Explicit None check so a valid score of 0.0 is still range-checked
        if article.sentiment_score is not None and not (-1 <= article.sentiment_score <= 1):
            errors.append("Invalid sentiment score range")
        # Data freshness
        if article.published_date > date.today():
            errors.append("Future publication date")
        return errors
```
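
The validator slots naturally into the write path. A sketch of `NewsService.update_articles` filtering invalid articles before the upsert (where validation runs is a design assumption, not a mandate):

```python
# Illustrative wiring: validate before persisting; reject-and-log rather than failing the batch.
async def update_articles(self, symbol: str, date: date) -> int:
    new_articles = await self._fetch_from_sources(symbol, date)
    validator = DataQualityValidator()
    valid_articles = []
    for article in new_articles:
        errors = validator.validate_news_article(article)
        if errors:
            logger.warning(f"Rejected article {article.url}: {errors}")
        else:
            valid_articles.append(article)
    if valid_articles:
        stored = await self.repository.upsert_batch(valid_articles, symbol)
        return len(stored)
    return 0
```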
This technical standards document provides the foundation for maintaining consistency across the TradingAgents codebase while ensuring optimal performance for financial data processing and AI agent operations.