# Style Guide - TradingAgents
## Python Code Style
### Formatting with Ruff
**Configuration** (pyproject.toml):
```toml
[tool.ruff]
target-version = "py313"
line-length = 88
fix = true
extend-exclude = [
    "migrations/",
    "alembic/versions/",
    ".env",
    "venv/",
    ".venv/",
]

[tool.ruff.lint]
select = [
    "E",   # pycodestyle errors
    "W",   # pycodestyle warnings
    "F",   # Pyflakes
    "I",   # isort
    "B",   # flake8-bugbear
    "C4",  # flake8-comprehensions
    "UP",  # pyupgrade
    "ERA", # eradicate
    "PIE", # flake8-pie
    "SIM", # flake8-simplify
    "TCH", # flake8-type-checking
    "ARG", # flake8-unused-arguments
    "PTH", # flake8-use-pathlib
    "FIX", # flake8-fixme
    "TD",  # flake8-todos
]
ignore = [
    "E501",   # Line too long (handled by formatter)
    "B008",   # Do not perform function calls in argument defaults
    "B904",   # Use `raise ... from ...` for exception chaining
    "TD002",  # Missing author in TODO
    "TD003",  # Missing issue link on line following TODO
    "FIX002", # Line contains TODO
]

[tool.ruff.lint.per-file-ignores]
"tests/**/*.py" = [
    "S101",    # Use of assert detected
    "ARG001",  # Unused function argument
    "FBT001",  # Boolean positional arg
    "PLR2004", # Magic value used in comparison
]
"migrations/**/*.py" = [
    "ERA001", # Found commented-out code
]

[tool.ruff.lint.isort]
known-first-party = ["tradingagents"]
force-sort-within-sections = true
```
### Type Hints and Annotations
**Modern Type Syntax** (Python 3.13):
```python
# Use built-in generics (no typing.List, typing.Dict)
def process_articles(articles: list[NewsArticle]) -> dict[str, int]:
    """Process articles and return symbol counts"""
    counts: dict[str, int] = {}
    for article in articles:
        symbol = article.symbol or "UNKNOWN"
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


# Union types with |
def get_article(article_id: str | int) -> NewsArticle | None:
    """Get article by ID (string or integer)"""
    if isinstance(article_id, str):
        return get_by_url(article_id)
    return get_by_id(article_id)


# Optional with explicit None
def calculate_sentiment(text: str, model: str | None = None) -> float | None:
    """Calculate sentiment score"""
    if not text.strip():
        return None
    # Implementation
    return 0.5
```
**Type Annotations for Complex Types**:
```python
from typing import TypeVar, Generic, Protocol, TypedDict, Awaitable
from collections.abc import Callable, AsyncGenerator
from datetime import date, datetime

# Type variables
T = TypeVar('T')
ArticleT = TypeVar('ArticleT', bound='NewsArticle')


# Protocol for type checking
class Repository(Protocol[T]):
    async def list(self, symbol: str, date: date) -> list[T]:
        ...

    async def upsert(self, item: T) -> T:
        ...


# TypedDict for structured data
class ArticleData(TypedDict):
    headline: str
    url: str
    published_date: str
    sentiment_score: float | None


# Callable types
ProcessorFunc = Callable[[list[NewsArticle]], Awaitable[dict[str, int]]]
```
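The `Repository` protocol above relies on structural typing: any class whose method signatures match satisfies it, with no inheritance required. A minimal sketch of how the protocol is consumed (the `InMemoryNewsRepository` class and `count_for_day` helper are hypothetical, shown only for illustration):

```python
from datetime import date

from tradingagents.domains.news.news_repository import NewsArticle


class InMemoryNewsRepository:
    """Satisfies Repository[NewsArticle] structurally - no base class needed."""

    def __init__(self) -> None:
        self._items: list[NewsArticle] = []

    async def list(self, symbol: str, date: date) -> list[NewsArticle]:
        # Date filtering omitted for brevity; matching on symbol only
        return [a for a in self._items if a.symbol == symbol]

    async def upsert(self, item: NewsArticle) -> NewsArticle:
        self._items.append(item)
        return item


async def count_for_day(repo: Repository[NewsArticle], symbol: str, day: date) -> int:
    """Accepts any object that structurally matches the Repository protocol."""
    return len(await repo.list(symbol, day))
```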
### Docstring Standards
**Google Style Docstrings**:
```python
class NewsRepository:
    """Repository for news article data access with PostgreSQL backend.

    Handles CRUD operations for news articles with support for batch operations,
    vector similarity search, and TimescaleDB time-series optimization.

    Attributes:
        db_manager: AsyncIO database connection manager

    Example:
        >>> db_manager = DatabaseManager("postgresql://...")
        >>> repo = NewsRepository(db_manager)
        >>> articles = await repo.list("AAPL", date(2024, 1, 15))
    """

    def __init__(self, database_manager: DatabaseManager) -> None:
        """Initialize repository with database connection.

        Args:
            database_manager: Async database connection manager with
                PostgreSQL + TimescaleDB + pgvector support.
        """
        self.db_manager = database_manager

    async def upsert_batch(
        self,
        articles: list[NewsArticle],
        symbol: str,
        *,
        chunk_size: int = 1000,
    ) -> list[NewsArticle]:
        """Batch insert or update articles with deduplication.

        Uses PostgreSQL ON CONFLICT for atomic upserts based on URL uniqueness.
        Processes articles in chunks to optimize memory usage for large datasets.

        Args:
            articles: News articles to store
            symbol: Stock symbol to associate with articles
            chunk_size: Number of articles to process per database transaction.
                Defaults to 1000 for optimal PostgreSQL performance.

        Returns:
            List of stored articles with database-generated metadata

        Raises:
            IntegrityError: If URL constraint violations occur
            DatabaseConnectionError: If database is unavailable

        Example:
            >>> articles = [NewsArticle("Title", "https://...", ...)]
            >>> stored = await repo.upsert_batch(articles, "AAPL")
            >>> assert len(stored) == len(articles)
        """
        if not articles:
            return []
        # Implementation...
```
**Module-Level Docstrings**:
```python
"""
News repository with PostgreSQL + TimescaleDB backend.

This module provides data access patterns for financial news articles with
support for:

- Time-series queries optimized by TimescaleDB
- Vector similarity search using pgvector
- Bulk operations with PostgreSQL-specific optimizations
- Async/await patterns for high-performance I/O

Example Usage:
    from tradingagents.domains.news.news_repository import NewsRepository
    from tradingagents.lib.database import DatabaseManager

    db = DatabaseManager("postgresql+asyncpg://...")
    repo = NewsRepository(db)

    # Get articles for a symbol and date
    articles = await repo.list("AAPL", date(2024, 1, 15))

    # Batch store new articles
    new_articles = [...]
    stored = await repo.upsert_batch(new_articles, "AAPL")
"""

from __future__ import annotations
```
### Variable and Function Naming
**Snake Case for Everything**:
```python
# Variables
article_count = len(articles)
sentiment_threshold = 0.5
openrouter_api_key = os.getenv("OPENROUTER_API_KEY")


# Functions
def calculate_portfolio_risk(positions: list[Position]) -> float:
    """Calculate portfolio-wide risk metrics"""


async def fetch_news_articles(symbol: str, date: date) -> list[NewsArticle]:
    """Fetch news articles from external APIs"""


# Private methods
def _validate_sentiment_score(score: float | None) -> bool:
    """Internal validation for sentiment scores"""


# Constants
MAX_ARTICLES_PER_REQUEST = 100
DEFAULT_LOOKBACK_DAYS = 30
OPENAI_EMBEDDING_DIMENSIONS = 1536
```
**Descriptive Names Over Short Names**:
```python
# Good - Clear intent
async def update_articles_for_symbol(symbol: str, target_date: date) -> int:
    successful_count = 0
    failed_count = 0
    for news_source in self.configured_sources:
        try:
            articles = await news_source.fetch(symbol, target_date)
            stored_articles = await self.repository.upsert_batch(articles, symbol)
            successful_count += len(stored_articles)
        except Exception as e:
            failed_count += 1
            logger.warning(f"Failed to fetch from {news_source.name}: {e}")
    return successful_count


# Avoid - Unclear abbreviations
async def upd_arts(sym: str, dt: date) -> int:
    cnt = 0
    for src in self.srcs:
        arts = await src.get(sym, dt)
        cnt += len(arts)
    return cnt
```
### Import Organization
**Import Order with isort**:
```python
# 1. Standard library imports
import asyncio
import logging
import uuid
from datetime import date, datetime
from pathlib import Path
from typing import Any

# 2. Third-party imports
import aiohttp
import pytest
from sqlalchemy import select, and_
from sqlalchemy.ext.asyncio import AsyncSession

# 3. First-party imports
from tradingagents.config import TradingAgentsConfig
from tradingagents.domains.news.news_repository import NewsArticle, NewsRepository
from tradingagents.lib.database import DatabaseManager

# 4. Relative imports (avoid when possible)
from .google_news_client import GoogleNewsClient
```
**Import Aliases**:
```python
# Standard aliases for common packages
import pandas as pd
import numpy as np
from datetime import datetime as dt, date

# Avoid long module paths
from tradingagents.domains.news.news_repository import (
    NewsArticle,
    NewsArticleEntity,
    NewsRepository,
)

# Type-only imports for forward references
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from tradingagents.agents.trading_agent import TradingAgent
```
## Database Naming Conventions
### Table Names
**Snake Case with Domain Prefix**:
```sql
-- Domain-prefixed tables
news_articles -- Core news data
news_article_embeddings -- Vector embeddings (if separate)
market_data_daily -- Daily market prices
market_data_intraday -- Intraday tick data
social_media_posts -- Social media content
social_sentiment_scores -- Sentiment analysis results
-- Agent-specific tables
agent_decisions -- Trading decisions
agent_portfolios -- Portfolio states
agent_memories -- RAG memory store
```
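On the Python side, the corresponding SQLAlchemy entity keeps a PascalCase class name while pinning the snake_case, domain-prefixed table name via `__tablename__`. A minimal sketch, assuming SQLAlchemy 2.0 declarative models (column list abbreviated; the real `NewsArticleEntity` will differ):

```python
import uuid
from datetime import date

from sqlalchemy import Date, Text, text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class NewsArticleEntity(Base):
    """PascalCase entity class mapped to a snake_case, domain-prefixed table."""

    __tablename__ = "news_articles"

    # Server-side uuid7() default mirrors the DDL in the next subsection
    id: Mapped[uuid.UUID] = mapped_column(
        primary_key=True, server_default=text("uuid7()")
    )
    headline: Mapped[str] = mapped_column(Text, nullable=False)
    url: Mapped[str] = mapped_column(Text, unique=True, nullable=False)
    published_date: Mapped[date] = mapped_column(Date, nullable=False)
```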
### Column Names
**Descriptive Snake Case**:
```sql
-- Good - Clear and consistent
CREATE TABLE news_articles (
    id UUID PRIMARY KEY DEFAULT uuid7(),
    headline TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,
    published_date DATE NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Foreign key relationships
    symbol VARCHAR(20) REFERENCES stocks(symbol),
    source_id UUID REFERENCES news_sources(id),

    -- Metrics and scores
    sentiment_score DECIMAL(3,2) CHECK (sentiment_score BETWEEN -1 AND 1),
    readability_score INTEGER CHECK (readability_score BETWEEN 0 AND 100),

    -- Vector embeddings
    title_embedding VECTOR(1536),
    content_embedding VECTOR(1536)
);

-- Avoid - Unclear abbreviations
CREATE TABLE art (
    id UUID,
    ttl TEXT,        -- title?
    dt DATE,         -- published_date?
    scr DECIMAL,     -- score? source?
    emb VECTOR(1536) -- embedding?
);
```
### Index Names
**Descriptive with Purpose**:
```sql
-- Pattern: idx_{table}_{columns}_{purpose}
CREATE INDEX idx_news_articles_symbol_date_lookup
    ON news_articles (symbol, published_date);

CREATE INDEX idx_news_articles_published_date_timeseries
    ON news_articles (published_date DESC);

CREATE INDEX idx_news_articles_url_unique
    ON news_articles (url);

-- Vector indexes with algorithm
CREATE INDEX idx_news_articles_title_embedding_cosine
    ON news_articles USING ivfflat (title_embedding vector_cosine_ops);

-- Partial indexes for specific queries
CREATE INDEX idx_news_articles_recent_high_sentiment
    ON news_articles (published_date, sentiment_score)
    WHERE published_date > CURRENT_DATE - INTERVAL '30 days'
        AND sentiment_score > 0.5;
```
## API Design Patterns
### RESTful URL Structure
**Resource-Based URLs**:
```text
# Good - Resource-oriented
GET /api/v1/symbols/AAPL/articles?date=2024-01-15 # Get articles
POST /api/v1/symbols/AAPL/articles # Create articles
PUT /api/v1/articles/{article_id} # Update article
DELETE /api/v1/articles/{article_id} # Delete article
GET /api/v1/symbols/AAPL/market-data?start=2024-01-01&end=2024-01-31
POST /api/v1/trading/decisions # Create trading decision
GET /api/v1/agents/portfolios/{portfolio_id} # Get portfolio state
# Avoid - Action-oriented
POST /api/v1/getArticles # Should be GET
POST /api/v1/updateSymbolData # Should be PUT
GET /api/v1/performTradingAnalysis # Should be POST
```
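A minimal FastAPI router sketch for the resource-oriented style above (the paths follow the examples, but the handler names, parameters, and request/response shapes are illustrative, not the project's actual endpoints):

```python
from datetime import date

from fastapi import APIRouter

router = APIRouter(prefix="/api/v1")


@router.get("/symbols/{symbol}/articles")
async def list_articles(symbol: str, date: date | None = None):
    """GET reads a collection scoped to its parent resource."""
    ...


@router.post("/symbols/{symbol}/articles", status_code=201)
async def create_articles(symbol: str, articles: list[dict]):
    """POST creates resources under the collection URL."""
    ...


@router.delete("/articles/{article_id}", status_code=204)
async def delete_article(article_id: str) -> None:
    """DELETE removes a single resource addressed by ID."""
    ...
```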
**Query Parameter Standards**:
```python
from datetime import date

from pydantic import BaseModel, Field, validator


class ArticleQueryParams(BaseModel):
    """Query parameters for article endpoints"""

    # Date filtering
    date: date | None = None
    start_date: date | None = Field(None, alias="start")
    end_date: date | None = Field(None, alias="end")

    # Pagination
    limit: int = Field(default=50, ge=1, le=1000)
    offset: int = Field(default=0, ge=0)

    # Filtering
    sources: list[str] | None = Field(None, description="Filter by news sources")
    min_sentiment: float | None = Field(None, ge=-1.0, le=1.0)
    max_sentiment: float | None = Field(None, ge=-1.0, le=1.0)

    # Search
    query: str | None = Field(None, max_length=200)

    @validator('end_date')
    def end_date_after_start(cls, v, values):
        if v and values.get('start_date') and v < values['start_date']:
            raise ValueError('end_date must be after start_date')
        return v
```
### Response Formats
**Consistent JSON Structure**:
```python
from datetime import datetime
from typing import Generic, TypeVar

from pydantic import BaseModel, Field

T = TypeVar('T')


class APIResponse(BaseModel, Generic[T]):
    """Standard API response wrapper"""

    data: T | None = None
    success: bool = True
    message: str | None = None
    errors: list[str] = []

    # Metadata
    request_id: str | None = None
    timestamp: str = Field(default_factory=lambda: datetime.utcnow().isoformat())


class PaginatedResponse(APIResponse[list[T]], Generic[T]):
    """Paginated response with metadata"""

    pagination: dict[str, int] = Field(default_factory=dict)

    @classmethod
    def create(
        cls,
        data: list[T],
        total: int,
        limit: int,
        offset: int,
    ) -> 'PaginatedResponse[T]':
        return cls(
            data=data,
            pagination={
                "total": total,
                "limit": limit,
                "offset": offset,
                "has_more": offset + len(data) < total,
            },
        )


# Usage example
@app.get("/api/v1/symbols/{symbol}/articles")
async def get_articles(
    symbol: str,
    params: ArticleQueryParams = Depends(),
    db: AsyncSession = Depends(get_db_session),
) -> PaginatedResponse[ArticleData]:
    """Get news articles for a symbol"""
    # Query implementation
    articles, total = await article_service.get_paginated(
        symbol=symbol,
        limit=params.limit,
        offset=params.offset,
        date_filter=params.date,
    )
    return PaginatedResponse.create(
        data=[ArticleData.from_entity(a) for a in articles],
        total=total,
        limit=params.limit,
        offset=params.offset,
    )
```
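For reference, a successful paginated response serialized from this wrapper has roughly the following shape (values are illustrative only):

```python
# Approximate JSON body produced by PaginatedResponse.create(...)
example_body = {
    "data": [
        {
            "headline": "Example headline",
            "url": "https://example.com/article",
            "published_date": "2024-01-15",
            "sentiment_score": 0.42,
        }
    ],
    "success": True,
    "message": None,
    "errors": [],
    "request_id": "req-123",  # illustrative
    "timestamp": "2024-01-15T12:00:00",
    "pagination": {"total": 128, "limit": 50, "offset": 0, "has_more": True},
}
```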
## Documentation Standards
### Code Comments
**When to Comment**:
```python
class NewsRepository:
    async def upsert_batch(self, articles: list[NewsArticle], symbol: str) -> list[NewsArticle]:
        # Don't comment obvious code
        if not articles:
            return []

        # DO comment complex business logic:
        # Use PostgreSQL ON CONFLICT for atomic upsert operations.
        # This prevents race conditions when multiple processes
        # are updating the same articles simultaneously.
        stmt = insert(NewsArticleEntity).values(entity_data_list)
        upsert_stmt = stmt.on_conflict_do_update(
            index_elements=["url"],  # Deduplication key
            set_={
                # Update all fields except ID and created_at
                **{col: stmt.excluded[col] for col in updateable_columns},
                "updated_at": func.now(),
            },
        )

        # DO comment performance optimizations:
        # a batch size of 1000 optimizes PostgreSQL memory usage
        # while avoiding transaction timeouts for large datasets.
        for chunk in chunks(entity_data_list, 1000):
            result = await session.execute(upsert_stmt)
```
**TODO Comments**:
```python
# TODO(martin): Implement caching layer for frequently accessed articles
# TODO(martin): Add vector similarity search for related articles
# FIXME(martin): Handle edge case where published_date is in future
# HACK(martin): Temporary workaround for API rate limiting - remove after v2.0
```
### README Structure
**Repository README.md Template**:
````markdown
# TradingAgents - Multi-Agent Financial Analysis
Brief description of what the project does and why it exists.
## Quick Start
```bash
# 1. Setup environment
export OPENROUTER_API_KEY="your_key"
mise run docker # Start PostgreSQL
# 2. Install and run
mise run install
mise run dev # Interactive CLI
```
## Architecture
High-level overview with diagrams if helpful.
## Development
### Prerequisites
- Python 3.13+
- PostgreSQL 16+ with TimescaleDB
- OpenRouter API access
### Setup
```bash
mise run install # Install dependencies
mise run test # Run test suite
mise run format # Format code
```
### Testing
Details about test strategy and running tests.
## Configuration
Environment variables and configuration options.
## Contributing
Link to contributing guidelines.
````
### Commit Message Conventions
**Conventional Commits Format**:
```
type(scope): description

[optional body]

[optional footer(s)]
```
**Types**:
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes (formatting, missing semicolons, etc.)
- `refactor`: Code refactoring
- `test`: Adding missing tests or correcting existing tests
- `chore`: Changes to build process or auxiliary tools
**Examples**:
```
feat(news): add vector similarity search for related articles

Implements pgvector-based similarity search using OpenAI embeddings.
Articles can now find related content based on semantic similarity
rather than just keyword matching.

- Add title_embedding and content_embedding columns
- Implement cosine similarity search in NewsRepository
- Add vector index for performance optimization

Closes #123

---

fix(database): handle connection timeouts in async sessions

Connection pooling was causing timeouts under high load.
Added proper timeout handling and connection recycling.

- Set pool_recycle=3600 for connection health
- Add retry logic for transient connection errors
- Improve error logging for debugging

---

test(news): add integration tests for batch upsert operations

Covers edge cases for duplicate URL handling and large batch processing.

---

docs(api): update OpenAPI spec for news endpoints

- Add pagination parameters
- Document error response formats
- Include example requests and responses
```
### Code Organization
**File and Directory Structure**:
```
tradingagents/
├── __init__.py
├── config.py                      # Application configuration
├── main.py                        # Entry point
│
├── domains/                       # Domain-driven design
│   ├── __init__.py
│   ├── news/                      # News domain
│   │   ├── __init__.py
│   │   ├── news_service.py        # Business logic
│   │   ├── news_repository.py     # Data access
│   │   ├── google_news_client.py  # External API
│   │   └── models.py              # Domain models
│   ├── marketdata/                # Market data domain
│   └── socialmedia/               # Social media domain
├── agents/                        # LLM agents
│   ├── __init__.py
│   ├── trading_agent.py
│   ├── analyst_agent.py
│   └── libs/                      # Agent utilities
│       ├── __init__.py
│       └── agent_toolkit.py
├── lib/                           # Shared utilities
│   ├── __init__.py
│   ├── database.py                # Database connection
│   ├── logging.py                 # Logging configuration
│   └── utils.py                   # Common utilities
└── types/                         # Shared type definitions
    ├── __init__.py
    ├── common.py
    └── financial.py
```
This style guide ensures consistent, maintainable code across the TradingAgents project while leveraging modern Python features and database optimization techniques.