
LLM Integration Architecture

This document describes how TradingAgents integrates with different Large Language Model (LLM) providers through a unified abstraction layer.

Overview

TradingAgents supports multiple LLM providers through a flexible configuration system that allows switching between providers without code changes.

Supported Providers

OpenAI

  • Models: GPT-4o, GPT-4o-mini, o4-mini (default), o1-preview
  • Strengths: Strong reasoning, reliable, extensive fine-tuning
  • Use Case: Default choice for production
  • API Key: OPENAI_API_KEY
  • Endpoint: https://api.openai.com/v1

Anthropic

  • Models: Claude Sonnet 4, Claude Opus 4
  • Strengths: Strong reasoning, long context windows, excellent instruction following
  • Use Case: Alternative to OpenAI, good for complex analysis
  • API Key: ANTHROPIC_API_KEY
  • Endpoint: https://api.anthropic.com

OpenRouter

  • Models: Unified access to 100+ models from multiple providers
  • Strengths: Single API for multiple providers, competitive pricing
  • Use Case: Flexibility, cost optimization, accessing diverse models
  • API Key: OPENROUTER_API_KEY (plus OPENAI_API_KEY for embeddings)
  • Endpoint: https://openrouter.ai/api/v1

Google Generative AI

  • Models: Gemini 2.0 Flash, Gemini Pro
  • Strengths: Fast inference, multimodal capabilities
  • Use Case: Cost-effective alternative, multimodal analysis
  • API Key: GOOGLE_API_KEY
  • Endpoint: Built-in (no custom endpoint)

Ollama

  • Models: Local models (Llama, Mistral, etc.)
  • Strengths: No API costs, data privacy, offline operation
  • Use Case: Development, experimentation, privacy-sensitive analysis
  • API Key: None (local)
  • Endpoint: http://localhost:11434/v1

Provider Abstraction

Configuration-Driven Selection

LLM providers are selected through configuration:

config = {
    "llm_provider": "openai",  # Provider selection
    "deep_think_llm": "o4-mini",  # Model for complex reasoning
    "quick_think_llm": "gpt-4o-mini",  # Model for fast tasks
    "backend_url": "https://api.openai.com/v1"
}
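The keys above are the whole switching surface: changing providers means overriding a few dictionary entries. As a minimal sketch (the `DEFAULT_CONFIG` dict and `make_config` helper here are illustrative, not the project's actual code):

```python
# Sketch only: switching providers is a config change, not a code change.
# DEFAULT_CONFIG mirrors the keys shown above; make_config is hypothetical.
DEFAULT_CONFIG = {
    "llm_provider": "openai",
    "deep_think_llm": "o4-mini",
    "quick_think_llm": "gpt-4o-mini",
    "backend_url": "https://api.openai.com/v1",
}

def make_config(**overrides):
    """Return a copy of the default config with selected keys overridden."""
    config = dict(DEFAULT_CONFIG)
    unknown = set(overrides) - set(config)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    config.update(overrides)
    return config

# Switch to Anthropic without touching any other code:
anthropic_config = make_config(
    llm_provider="anthropic",
    deep_think_llm="claude-sonnet-4-20250514",
    quick_think_llm="claude-sonnet-4-20250514",
    backend_url="https://api.anthropic.com",
)
```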

Initialization Logic

The TradingAgentsGraph class handles provider initialization:

if config["llm_provider"].lower() in ("openai", "ollama"):
    from langchain_openai import ChatOpenAI

    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"]
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"]
    )

elif config["llm_provider"].lower() == "anthropic":
    from langchain_anthropic import ChatAnthropic

    self.deep_thinking_llm = ChatAnthropic(
        model=config["deep_think_llm"],
        base_url=config["backend_url"]
    )
    self.quick_thinking_llm = ChatAnthropic(
        model=config["quick_think_llm"],
        base_url=config["backend_url"]
    )

elif config["llm_provider"].lower() == "openrouter":
    import os

    from langchain_openai import ChatOpenAI

    openrouter_key = os.getenv("OPENROUTER_API_KEY")
    if not openrouter_key:
        raise ValueError("OPENROUTER_API_KEY required")

    default_headers = {
        "HTTP-Referer": "https://github.com/TauricResearch/TradingAgents",
        "X-Title": "TradingAgents"
    }

    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers
    )

elif config["llm_provider"].lower() == "google":
    from langchain_google_genai import ChatGoogleGenerativeAI

    self.deep_thinking_llm = ChatGoogleGenerativeAI(
        model=config["deep_think_llm"]
    )
    self.quick_thinking_llm = ChatGoogleGenerativeAI(
        model=config["quick_think_llm"]
    )

Location: tradingagents/graph/trading_graph.py
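The branching above can be condensed into a small factory. The sketch below is illustrative only (it mirrors the dispatch, but omits the OpenRouter key/header handling shown earlier and is not the project's actual code); the class names come from the `langchain-openai`, `langchain-anthropic`, and `langchain-google-genai` packages:

```python
# Illustrative factory mirroring the provider branching above.
def build_llms(config):
    provider = config["llm_provider"].lower()
    if provider in ("openai", "ollama", "openrouter"):
        # OpenRouter's api_key and attribution headers omitted for brevity.
        from langchain_openai import ChatOpenAI as LLM
        kwargs = {"base_url": config["backend_url"]}
    elif provider == "anthropic":
        from langchain_anthropic import ChatAnthropic as LLM
        kwargs = {"base_url": config["backend_url"]}
    elif provider == "google":
        from langchain_google_genai import ChatGoogleGenerativeAI as LLM
        kwargs = {}  # Google uses its built-in endpoint
    else:
        raise ValueError(f"Unsupported llm_provider: {config['llm_provider']}")
    deep = LLM(model=config["deep_think_llm"], **kwargs)
    quick = LLM(model=config["quick_think_llm"], **kwargs)
    return deep, quick
```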

Model Selection Strategy

Two-Tier Model Approach

TradingAgents uses two types of LLMs for different tasks:

Deep Thinking LLM

  • Purpose: Complex reasoning, strategic analysis, debate moderation
  • Characteristics: Larger models, slower, more expensive, higher quality
  • Use Cases:
    • Researcher debate moderation
    • Trading decision synthesis
    • Risk assessment evaluation
  • Recommended Models:
    • OpenAI: o4-mini, o1-preview
    • Anthropic: claude-sonnet-4, claude-opus-4
    • OpenRouter: anthropic/claude-sonnet-4.5

Quick Thinking LLM

  • Purpose: Fast analysis, data summarization, routine tasks
  • Characteristics: Smaller models, faster, cost-effective
  • Use Cases:
    • Analyst report generation
    • Data interpretation
    • Tool calling
  • Recommended Models:
    • OpenAI: gpt-4o-mini, gpt-4o
    • Anthropic: claude-sonnet-4
    • OpenRouter: openai/gpt-4o-mini

Model Selection Guidelines

For Production:

config["deep_think_llm"] = "o1-preview"      # Best reasoning
config["quick_think_llm"] = "gpt-4o-mini"    # Cost-effective

For Development/Testing:

config["deep_think_llm"] = "o4-mini"         # Fast and cheaper
config["quick_think_llm"] = "gpt-4o-mini"    # Consistent quality

For Cost Optimization:

config["llm_provider"] = "openrouter"
config["deep_think_llm"] = "anthropic/claude-sonnet-4.5"
config["quick_think_llm"] = "openai/gpt-4o-mini"

Provider-Specific Configuration

OpenAI Configuration

config = {
    "llm_provider": "openai",
    "deep_think_llm": "o4-mini",
    "quick_think_llm": "gpt-4o-mini",
    "backend_url": "https://api.openai.com/v1"
}

Environment:

export OPENAI_API_KEY=sk-your_key_here

Anthropic Configuration

config = {
    "llm_provider": "anthropic",
    "deep_think_llm": "claude-sonnet-4-20250514",
    "quick_think_llm": "claude-sonnet-4-20250514",
    "backend_url": "https://api.anthropic.com"
}

Environment:

export ANTHROPIC_API_KEY=sk-ant-your_key_here

OpenRouter Configuration

config = {
    "llm_provider": "openrouter",
    "deep_think_llm": "anthropic/claude-sonnet-4.5",
    "quick_think_llm": "openai/gpt-4o-mini",
    "backend_url": "https://openrouter.ai/api/v1"
}

Environment:

export OPENROUTER_API_KEY=sk-or-v1-your_key_here
export OPENAI_API_KEY=sk-your_key_here  # Required for embeddings

Note: OpenRouter uses provider/model-name format:

  • anthropic/claude-sonnet-4.5
  • openai/gpt-4o
  • google/gemini-pro

Google Generative AI Configuration

config = {
    "llm_provider": "google",
    "deep_think_llm": "gemini-2.0-flash",
    "quick_think_llm": "gemini-2.0-flash"
}

Environment:

export GOOGLE_API_KEY=your_key_here

Ollama Configuration

config = {
    "llm_provider": "ollama",
    "deep_think_llm": "mistral",
    "quick_think_llm": "mistral",
    "backend_url": "http://localhost:11434/v1"
}

Prerequisites:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull model
ollama pull mistral

# Start Ollama server
ollama serve

Error Handling

Rate Limit Handling

Unified rate limit error handling across providers:

from tradingagents.utils.exceptions import LLMRateLimitError

try:
    response = llm.invoke(messages)
except LLMRateLimitError as e:
    print(f"Rate limit hit: {e.message}")
    if e.retry_after:
        print(f"Retry after {e.retry_after} seconds")

Location: tradingagents/utils/exceptions.py
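A small retry wrapper makes rate limits recoverable rather than fatal. The sketch below is generic (the exception class is passed in, so it works with `LLMRateLimitError` from `tradingagents.utils.exceptions` as shown above, or with any provider SDK error):

```python
import time

def invoke_with_retry(call, *, exceptions, max_retries=3, base_delay=2.0):
    """Retry `call()` on rate-limit errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except exceptions as e:
            if attempt == max_retries - 1:
                raise
            # Prefer a server-suggested delay when the error carries one.
            delay = getattr(e, "retry_after", None) or base_delay * (2 ** attempt)
            time.sleep(delay)

# Usage sketch with TradingAgents:
#   from tradingagents.utils.exceptions import LLMRateLimitError
#   response = invoke_with_retry(lambda: llm.invoke(messages),
#                                exceptions=LLMRateLimitError)
```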

Provider-Specific Errors

Each provider may raise different errors:

OpenAI:

  • RateLimitError → Retry after specified time
  • InvalidRequestError → Check model name, parameters
  • AuthenticationError → Verify API key

Anthropic:

  • RateLimitError → Retry with backoff
  • InvalidRequestError → Check message format
  • APIError → Server-side issues

OpenRouter:

  • Follows OpenAI error format
  • Additional headers required for attribution

Fallback Strategy

Implement provider fallback for resilience:

providers = ["openai", "anthropic", "openrouter"]

for provider in providers:
    try:
        config["llm_provider"] = provider
        # Note: backend_url and model names must also match the provider.
        ta = TradingAgentsGraph(config=config)
        result = ta.propagate(ticker, date)
        break
    except LLMRateLimitError:
        continue
else:
    raise RuntimeError("All providers rate-limited")

Cost Optimization

Model Cost Comparison

Deep Thinking Tasks:

| Provider   | Model           | Cost/1M Tokens (Input/Output) |
|------------|-----------------|-------------------------------|
| OpenAI     | o4-mini         | $1.50 / $6.00                 |
| OpenAI     | o1-preview      | $15.00 / $60.00               |
| Anthropic  | claude-sonnet-4 | $3.00 / $15.00                |
| OpenRouter | Varies by model | Check OpenRouter pricing      |

Quick Thinking Tasks:

| Provider | Model            | Cost/1M Tokens (Input/Output) |
|----------|------------------|-------------------------------|
| OpenAI   | gpt-4o-mini      | $0.15 / $0.60                 |
| OpenAI   | gpt-4o           | $2.50 / $10.00                |
| Google   | gemini-2.0-flash | Free tier available           |
| Ollama   | Local models     | Free (local)                  |
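With the prices above, a back-of-envelope run cost is simple arithmetic: tokens times price per million. The token counts below are made up for illustration:

```python
# Per-run cost estimate from the price tables above.
# Prices are USD per 1M tokens (input, output); token counts are illustrative.
PRICES = {
    "o4-mini": (1.50, 6.00),
    "o1-preview": (15.00, 60.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

def run_cost(model, input_tokens, output_tokens):
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. 200k input + 40k output tokens on the quick model:
quick = run_cost("gpt-4o-mini", 200_000, 40_000)  # 0.03 + 0.024 = $0.054
deep = run_cost("o4-mini", 50_000, 20_000)        # 0.075 + 0.12 = $0.195
```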

Cost Reduction Strategies

  1. Use Smaller Models for Simple Tasks

    config["quick_think_llm"] = "gpt-4o-mini"  # Instead of gpt-4o
    
  2. Reduce Debate Rounds

    config["max_debate_rounds"] = 1  # Instead of 2-3
    
  3. Use OpenRouter for Competitive Pricing

    config["llm_provider"] = "openrouter"
    
  4. Cache LLM Responses

    # Implemented in agent memory system
    memory.store_analysis(ticker, date, result)
    
  5. Use Ollama for Development

    config["llm_provider"] = "ollama"  # No API costs
    

Embeddings

Embedding Provider

TradingAgents uses OpenAI embeddings for vector storage (memory system):

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

Important: Even when using non-OpenAI LLM providers (Anthropic, Google, etc.), OPENAI_API_KEY is still required for embeddings.

Alternative Embedding Providers

For fully offline operation, consider:

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

Note: This requires updating the memory initialization code.

Performance Considerations

Latency

Provider Latency (Approximate):

  • OpenAI: 1-3 seconds per request
  • Anthropic: 1-2 seconds per request
  • Google: 0.5-1.5 seconds per request
  • OpenRouter: Varies by underlying model
  • Ollama: 0.5-5 seconds (depends on local hardware)

Throughput

Concurrent Requests:

  • OpenAI: Tier-based limits (20-5000 RPM)
  • Anthropic: Tier-based limits (50-2000 RPM)
  • OpenRouter: Model-specific limits
  • Ollama: Limited by local GPU/CPU

Caching

LangChain provides built-in caching:

from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))

Best Practices

  1. Set API Keys as Environment Variables: Never hardcode keys
  2. Use Two-Tier Model Strategy: Deep/quick thinking separation
  3. Implement Error Handling: Catch rate limits and retry
  4. Monitor Costs: Track token usage and expenses
  5. Test with Cheaper Models: Use o4-mini/gpt-4o-mini for development
  6. Cache When Possible: Avoid redundant API calls
  7. Use OpenRouter for Flexibility: Easy switching between providers
  8. Implement Timeouts: Prevent hanging requests
  9. Log API Usage: Track which models are called
  10. Consider Local Models: Ollama for sensitive data or development
