# Style Guide - TradingAgents
## Python Code Style
### Formatting with Ruff
**Configuration** (pyproject.toml):
```toml
[tool.ruff]
target-version = "py313"
line-length = 88
fix = true
extend-exclude = [
    "migrations/",
    "alembic/versions/",
    ".env",
    "venv/",
    ".venv/",
]

[tool.ruff.lint]
select = [
    "E",   # pycodestyle errors
    "W",   # pycodestyle warnings
    "F",   # Pyflakes
    "I",   # isort
    "B",   # flake8-bugbear
    "C4",  # flake8-comprehensions
    "UP",  # pyupgrade
    "ERA", # eradicate
    "PIE", # flake8-pie
    "SIM", # flake8-simplify
    "TCH", # flake8-type-checking
    "ARG", # flake8-unused-arguments
    "PTH", # flake8-use-pathlib
    "FIX", # flake8-fixme
    "TD",  # flake8-todos
]
ignore = [
    "E501",   # Line too long (handled by formatter)
    "B008",   # Do not perform function calls in argument defaults
    "B904",   # Use `raise ... from ...` for exception chaining
    "TD002",  # Missing author in TODO
    "TD003",  # Missing issue link on line following TODO
    "FIX002", # Line contains TODO
]

[tool.ruff.lint.per-file-ignores]
"tests/**/*.py" = [
    "S101",    # Use of assert detected
    "ARG001",  # Unused function argument
    "FBT001",  # Boolean positional arg
    "PLR2004", # Magic value used in comparison
]
"migrations/**/*.py" = [
    "ERA001", # Found commented-out code
]

[tool.ruff.lint.isort]
known-first-party = ["tradingagents"]
force-sort-within-sections = true
```
### Type Hints and Annotations
**Modern Type Syntax** (Python 3.13):
```python
# Use built-in generics (no typing.List, typing.Dict)
def process_articles(articles: list[NewsArticle]) -> dict[str, int]:
    """Process articles and return symbol counts"""
    counts: dict[str, int] = {}
    for article in articles:
        symbol = article.symbol or "UNKNOWN"
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


# Union types with |
def get_article(article_id: str | int) -> NewsArticle | None:
    """Get article by ID (string or integer)"""
    if isinstance(article_id, str):
        return get_by_url(article_id)
    return get_by_id(article_id)


# Optional with explicit None
def calculate_sentiment(text: str, model: str | None = None) -> float | None:
    """Calculate sentiment score"""
    if not text.strip():
        return None
    # Implementation
    return 0.5
```
**Type Annotations for Complex Types**:
```python
from typing import TypeVar, Generic, Protocol, TypedDict, Awaitable
from collections.abc import Callable, AsyncGenerator
from datetime import date, datetime

# Type variables
T = TypeVar('T')
ArticleT = TypeVar('ArticleT', bound='NewsArticle')


# Protocol for type checking
class Repository(Protocol[T]):
    async def list(self, symbol: str, date: date) -> list[T]:
        ...

    async def upsert(self, item: T) -> T:
        ...


# TypedDict for structured data
class ArticleData(TypedDict):
    headline: str
    url: str
    published_date: str
    sentiment_score: float | None


# Callable types
ProcessorFunc = Callable[[list[NewsArticle]], Awaitable[dict[str, int]]]
```
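The `Repository` protocol above relies on structural typing: any class whose method signatures match satisfies it, with no inheritance required. A minimal sketch of how the protocol is consumed (the `InMemoryNewsRepository` class and `count_for_day` helper are hypothetical, shown only for illustration):

```python
from datetime import date

from tradingagents.domains.news.news_repository import NewsArticle


class InMemoryNewsRepository:
    """Satisfies Repository[NewsArticle] structurally - no base class needed."""

    def __init__(self) -> None:
        self._items: list[NewsArticle] = []

    async def list(self, symbol: str, date: date) -> list[NewsArticle]:
        # Date filtering omitted for brevity; matching on symbol only
        return [a for a in self._items if a.symbol == symbol]

    async def upsert(self, item: NewsArticle) -> NewsArticle:
        self._items.append(item)
        return item


async def count_for_day(repo: Repository[NewsArticle], symbol: str, day: date) -> int:
    """Accepts any object that structurally matches the Repository protocol."""
    return len(await repo.list(symbol, day))
```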
### Docstring Standards
**Google Style Docstrings**:
```python
class NewsRepository:
    """Repository for news article data access with PostgreSQL backend.

    Handles CRUD operations for news articles with support for batch operations,
    vector similarity search, and TimescaleDB time-series optimization.

    Attributes:
        db_manager: AsyncIO database connection manager

    Example:
        >>> db_manager = DatabaseManager("postgresql://...")
        >>> repo = NewsRepository(db_manager)
        >>> articles = await repo.list("AAPL", date(2024, 1, 15))
    """

    def __init__(self, database_manager: DatabaseManager) -> None:
        """Initialize repository with database connection.

        Args:
            database_manager: Async database connection manager with
                PostgreSQL + TimescaleDB + pgvector support.
        """
        self.db_manager = database_manager

    async def upsert_batch(
        self,
        articles: list[NewsArticle],
        symbol: str,
        *,
        chunk_size: int = 1000,
    ) -> list[NewsArticle]:
        """Batch insert or update articles with deduplication.

        Uses PostgreSQL ON CONFLICT for atomic upserts based on URL uniqueness.
        Processes articles in chunks to optimize memory usage for large datasets.

        Args:
            articles: News articles to store
            symbol: Stock symbol to associate with articles
            chunk_size: Number of articles to process per database transaction.
                Defaults to 1000 for optimal PostgreSQL performance.

        Returns:
            List of stored articles with database-generated metadata

        Raises:
            IntegrityError: If URL constraint violations occur
            DatabaseConnectionError: If database is unavailable

        Example:
            >>> articles = [NewsArticle("Title", "https://...", ...)]
            >>> stored = await repo.upsert_batch(articles, "AAPL")
            >>> assert len(stored) == len(articles)
        """
        if not articles:
            return []
        # Implementation...
```
**Module-Level Docstrings**:
```python
"""
News repository with PostgreSQL + TimescaleDB backend.

This module provides data access patterns for financial news articles with
support for:

- Time-series queries optimized by TimescaleDB
- Vector similarity search using pgvector
- Bulk operations with PostgreSQL-specific optimizations
- Async/await patterns for high-performance I/O

Example Usage:
    from tradingagents.domains.news.news_repository import NewsRepository
    from tradingagents.lib.database import DatabaseManager

    db = DatabaseManager("postgresql+asyncpg://...")
    repo = NewsRepository(db)

    # Get articles for a symbol and date
    articles = await repo.list("AAPL", date(2024, 1, 15))

    # Batch store new articles
    new_articles = [...]
    stored = await repo.upsert_batch(new_articles, "AAPL")
"""

from __future__ import annotations
```
### Variable and Function Naming
**Snake Case for Everything**:
```python
# Variables
article_count = len(articles)
sentiment_threshold = 0.5
openrouter_api_key = os.getenv("OPENROUTER_API_KEY")


# Functions
def calculate_portfolio_risk(positions: list[Position]) -> float:
    """Calculate portfolio-wide risk metrics"""


async def fetch_news_articles(symbol: str, date: date) -> list[NewsArticle]:
    """Fetch news articles from external APIs"""


# Private methods
def _validate_sentiment_score(score: float | None) -> bool:
    """Internal validation for sentiment scores"""


# Constants
MAX_ARTICLES_PER_REQUEST = 100
DEFAULT_LOOKBACK_DAYS = 30
OPENAI_EMBEDDING_DIMENSIONS = 1536
```
**Descriptive Names Over Short Names**:
```python
# Good - Clear intent
async def update_articles_for_symbol(symbol: str, target_date: date) -> int:
    successful_count = 0
    failed_count = 0
    for news_source in self.configured_sources:
        try:
            articles = await news_source.fetch(symbol, target_date)
            stored_articles = await self.repository.upsert_batch(articles, symbol)
            successful_count += len(stored_articles)
        except Exception as e:
            failed_count += 1
            logger.warning(f"Failed to fetch from {news_source.name}: {e}")
    return successful_count


# Avoid - Unclear abbreviations
async def upd_arts(sym: str, dt: date) -> int:
    cnt = 0
    for src in self.srcs:
        arts = await src.get(sym, dt)
        cnt += len(arts)
    return cnt
```
### Import Organization
**Import Order with isort**:
```python
# 1. Standard library imports
import asyncio
import logging
import uuid
from datetime import date, datetime
from pathlib import Path
from typing import Any

# 2. Third-party imports
import aiohttp
import pytest
from sqlalchemy import select, and_
from sqlalchemy.ext.asyncio import AsyncSession

# 3. First-party imports
from tradingagents.config import TradingAgentsConfig
from tradingagents.domains.news.news_repository import NewsArticle, NewsRepository
from tradingagents.lib.database import DatabaseManager

# 4. Relative imports (avoid when possible)
from .google_news_client import GoogleNewsClient
```
**Import Aliases**:
```python
# Standard aliases for common packages
import pandas as pd
import numpy as np
from datetime import datetime as dt, date

# Avoid long module paths
from tradingagents.domains.news.news_repository import (
    NewsArticle,
    NewsArticleEntity,
    NewsRepository,
)

# Type-only imports for forward references
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from tradingagents.agents.trading_agent import TradingAgent
```
## Database Naming Conventions
### Table Names
**Snake Case with Domain Prefix**:
```sql
-- Domain-prefixed tables
news_articles -- Core news data
news_article_embeddings -- Vector embeddings (if separate)
market_data_daily -- Daily market prices
market_data_intraday -- Intraday tick data
social_media_posts -- Social media content
social_sentiment_scores -- Sentiment analysis results
-- Agent-specific tables
agent_decisions -- Trading decisions
agent_portfolios -- Portfolio states
agent_memories -- RAG memory store
```
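On the Python side, the corresponding SQLAlchemy entity keeps a PascalCase class name while pinning the snake_case, domain-prefixed table name via `__tablename__`. A minimal sketch, assuming SQLAlchemy 2.0 declarative models (column list abbreviated; the real `NewsArticleEntity` will differ):

```python
import uuid
from datetime import date

from sqlalchemy import Date, Text, text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class NewsArticleEntity(Base):
    """PascalCase entity class mapped to a snake_case, domain-prefixed table."""

    __tablename__ = "news_articles"

    # Server-side uuid7() default mirrors the DDL in the next subsection
    id: Mapped[uuid.UUID] = mapped_column(
        primary_key=True, server_default=text("uuid7()")
    )
    headline: Mapped[str] = mapped_column(Text, nullable=False)
    url: Mapped[str] = mapped_column(Text, unique=True, nullable=False)
    published_date: Mapped[date] = mapped_column(Date, nullable=False)
```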
### Column Names
**Descriptive Snake Case**:
```sql
-- Good - Clear and consistent
CREATE TABLE news_articles (
    id UUID PRIMARY KEY DEFAULT uuid7(),
    headline TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,
    published_date DATE NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Foreign key relationships
    symbol VARCHAR(20) REFERENCES stocks(symbol),
    source_id UUID REFERENCES news_sources(id),

    -- Metrics and scores
    sentiment_score DECIMAL(3,2) CHECK (sentiment_score BETWEEN -1 AND 1),
    readability_score INTEGER CHECK (readability_score BETWEEN 0 AND 100),

    -- Vector embeddings
    title_embedding VECTOR(1536),
    content_embedding VECTOR(1536)
);

-- Avoid - Unclear abbreviations
CREATE TABLE art (
    id UUID,
    ttl TEXT,        -- title?
    dt DATE,         -- published_date?
    scr DECIMAL,     -- score? source?
    emb VECTOR(1536) -- embedding?
);
```
### Index Names
**Descriptive with Purpose**:
```sql
-- Pattern: idx_{table}_{columns}_{purpose}
CREATE INDEX idx_news_articles_symbol_date_lookup
    ON news_articles (symbol, published_date);

CREATE INDEX idx_news_articles_published_date_timeseries
    ON news_articles (published_date DESC);

CREATE INDEX idx_news_articles_url_unique
    ON news_articles (url);

-- Vector indexes with algorithm
CREATE INDEX idx_news_articles_title_embedding_cosine
    ON news_articles USING ivfflat (title_embedding vector_cosine_ops);

-- Partial indexes for specific queries
CREATE INDEX idx_news_articles_recent_high_sentiment
    ON news_articles (published_date, sentiment_score)
    WHERE published_date > CURRENT_DATE - INTERVAL '30 days'
        AND sentiment_score > 0.5;
```
## API Design Patterns
### RESTful URL Structure
**Resource-Based URLs**:
```text
# Good - Resource-oriented
GET /api/v1/symbols/AAPL/articles?date=2024-01-15 # Get articles
POST /api/v1/symbols/AAPL/articles # Create articles
PUT /api/v1/articles/{article_id} # Update article
DELETE /api/v1/articles/{article_id} # Delete article
GET /api/v1/symbols/AAPL/market-data?start=2024-01-01&end=2024-01-31
POST /api/v1/trading/decisions # Create trading decision
GET /api/v1/agents/portfolios/{portfolio_id} # Get portfolio state
# Avoid - Action-oriented
POST /api/v1/getArticles # Should be GET
POST /api/v1/updateSymbolData # Should be PUT
GET /api/v1/performTradingAnalysis # Should be POST
```
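A minimal FastAPI router sketch for the resource-oriented style above (the paths follow the examples, but the handler names, parameters, and request/response shapes are illustrative, not the project's actual endpoints):

```python
from datetime import date

from fastapi import APIRouter

router = APIRouter(prefix="/api/v1")


@router.get("/symbols/{symbol}/articles")
async def list_articles(symbol: str, date: date | None = None):
    """GET reads a collection scoped to its parent resource."""
    ...


@router.post("/symbols/{symbol}/articles", status_code=201)
async def create_articles(symbol: str, articles: list[dict]):
    """POST creates resources under the collection URL."""
    ...


@router.delete("/articles/{article_id}", status_code=204)
async def delete_article(article_id: str) -> None:
    """DELETE removes a single resource addressed by ID."""
    ...
```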
**Query Parameter Standards**:
```python
from datetime import date

from pydantic import BaseModel, Field, validator


class ArticleQueryParams(BaseModel):
    """Query parameters for article endpoints"""

    # Date filtering
    date: date | None = None
    start_date: date | None = Field(None, alias="start")
    end_date: date | None = Field(None, alias="end")

    # Pagination
    limit: int = Field(default=50, ge=1, le=1000)
    offset: int = Field(default=0, ge=0)

    # Filtering
    sources: list[str] | None = Field(None, description="Filter by news sources")
    min_sentiment: float | None = Field(None, ge=-1.0, le=1.0)
    max_sentiment: float | None = Field(None, ge=-1.0, le=1.0)

    # Search
    query: str | None = Field(None, max_length=200)

    @validator('end_date')
    def end_date_after_start(cls, v, values):
        if v and values.get('start_date') and v < values['start_date']:
            raise ValueError('end_date must be after start_date')
        return v
```
### Response Formats
**Consistent JSON Structure**:
```python
from datetime import datetime
from typing import Generic, TypeVar

from pydantic import BaseModel, Field

T = TypeVar('T')


class APIResponse(BaseModel, Generic[T]):
    """Standard API response wrapper"""

    data: T | None = None
    success: bool = True
    message: str | None = None
    errors: list[str] = []

    # Metadata
    request_id: str | None = None
    timestamp: str = Field(default_factory=lambda: datetime.utcnow().isoformat())


class PaginatedResponse(APIResponse[list[T]], Generic[T]):
    """Paginated response with metadata"""

    pagination: dict[str, int] = Field(default_factory=dict)

    @classmethod
    def create(
        cls,
        data: list[T],
        total: int,
        limit: int,
        offset: int,
    ) -> 'PaginatedResponse[T]':
        return cls(
            data=data,
            pagination={
                "total": total,
                "limit": limit,
                "offset": offset,
                "has_more": offset + len(data) < total,
            },
        )


# Usage example
@app.get("/api/v1/symbols/{symbol}/articles")
async def get_articles(
    symbol: str,
    params: ArticleQueryParams = Depends(),
    db: AsyncSession = Depends(get_db_session),
) -> PaginatedResponse[ArticleData]:
    """Get news articles for a symbol"""
    # Query implementation
    articles, total = await article_service.get_paginated(
        symbol=symbol,
        limit=params.limit,
        offset=params.offset,
        date_filter=params.date,
    )
    return PaginatedResponse.create(
        data=[ArticleData.from_entity(a) for a in articles],
        total=total,
        limit=params.limit,
        offset=params.offset,
    )
```
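For reference, a successful paginated response serialized from this wrapper has roughly the following shape (values are illustrative only):

```python
# Approximate JSON body produced by PaginatedResponse.create(...)
example_body = {
    "data": [
        {
            "headline": "Example headline",
            "url": "https://example.com/article",
            "published_date": "2024-01-15",
            "sentiment_score": 0.42,
        }
    ],
    "success": True,
    "message": None,
    "errors": [],
    "request_id": "req-123",  # illustrative
    "timestamp": "2024-01-15T12:00:00",
    "pagination": {"total": 128, "limit": 50, "offset": 0, "has_more": True},
}
```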
## Documentation Standards
### Code Comments
**When to Comment**:
```python
class NewsRepository:
    async def upsert_batch(self, articles: list[NewsArticle], symbol: str) -> list[NewsArticle]:
        # Don't comment obvious code
        if not articles:
            return []

        # DO comment complex business logic:
        # Use PostgreSQL ON CONFLICT for atomic upsert operations.
        # This prevents race conditions when multiple processes
        # are updating the same articles simultaneously.
        stmt = insert(NewsArticleEntity).values(entity_data_list)
        upsert_stmt = stmt.on_conflict_do_update(
            index_elements=["url"],  # Deduplication key
            set_={
                # Update all fields except ID and created_at
                **{col: stmt.excluded[col] for col in updateable_columns},
                "updated_at": func.now(),
            },
        )

        # DO comment performance optimizations:
        # a batch size of 1000 optimizes PostgreSQL memory usage
        # while avoiding transaction timeouts for large datasets.
        for chunk in chunks(entity_data_list, 1000):
            result = await session.execute(upsert_stmt)
```
**TODO Comments**:
```python
# TODO(martin): Implement caching layer for frequently accessed articles
# TODO(martin): Add vector similarity search for related articles
# FIXME(martin): Handle edge case where published_date is in future
# HACK(martin): Temporary workaround for API rate limiting - remove after v2.0
```
### README Structure
**Repository README.md Template**:
````markdown
# TradingAgents - Multi-Agent Financial Analysis
Brief description of what the project does and why it exists.
## Quick Start
```bash
# 1. Setup environment
export OPENROUTER_API_KEY="your_key"
mise run docker # Start PostgreSQL
# 2. Install and run
mise run install
mise run dev # Interactive CLI
```
## Architecture
High-level overview with diagrams if helpful.
## Development
### Prerequisites
- Python 3.13+
- PostgreSQL 16+ with TimescaleDB
- OpenRouter API access
### Setup
```bash
mise run install # Install dependencies
mise run test # Run test suite
mise run format # Format code
```
### Testing
Details about test strategy and running tests.
## Configuration
Environment variables and configuration options.
## Contributing
Link to contributing guidelines.
````
### Commit Message Conventions
**Conventional Commits Format**:
```
type(scope): description

[optional body]

[optional footer(s)]
```
**Types**:
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes (formatting, missing semicolons, etc.)
- `refactor`: Code refactoring
- `test`: Adding missing tests or correcting existing tests
- `chore`: Changes to build process or auxiliary tools
**Examples**:
```
feat(news): add vector similarity search for related articles

Implements pgvector-based similarity search using OpenAI embeddings.
Articles can now find related content based on semantic similarity
rather than just keyword matching.

- Add title_embedding and content_embedding columns
- Implement cosine similarity search in NewsRepository
- Add vector index for performance optimization

Closes #123

---

fix(database): handle connection timeouts in async sessions

Connection pooling was causing timeouts under high load.
Added proper timeout handling and connection recycling.

- Set pool_recycle=3600 for connection health
- Add retry logic for transient connection errors
- Improve error logging for debugging

---

test(news): add integration tests for batch upsert operations

Covers edge cases for duplicate URL handling and large batch processing.

---

docs(api): update OpenAPI spec for news endpoints

- Add pagination parameters
- Document error response formats
- Include example requests and responses
```
### Code Organization
**File and Directory Structure**:
```
tradingagents/
├── __init__.py
├── config.py                      # Application configuration
├── main.py                        # Entry point
│
├── domains/                       # Domain-driven design
│   ├── __init__.py
│   ├── news/                      # News domain
│   │   ├── __init__.py
│   │   ├── news_service.py        # Business logic
│   │   ├── news_repository.py     # Data access
│   │   ├── google_news_client.py  # External API
│   │   └── models.py              # Domain models
│   ├── marketdata/                # Market data domain
│   └── socialmedia/               # Social media domain
├── agents/                        # LLM agents
│   ├── __init__.py
│   ├── trading_agent.py
│   ├── analyst_agent.py
│   └── libs/                      # Agent utilities
│       ├── __init__.py
│       └── agent_toolkit.py
├── lib/                           # Shared utilities
│   ├── __init__.py
│   ├── database.py                # Database connection
│   ├── logging.py                 # Logging configuration
│   └── utils.py                   # Common utilities
└── types/                         # Shared type definitions
    ├── __init__.py
    ├── common.py
    └── financial.py
```
This style guide ensures consistent, maintainable code across the TradingAgents project while leveraging modern Python features and database optimization techniques.