
LLM Integration Architecture

This document describes how TradingAgents integrates with different Large Language Model (LLM) providers through a unified abstraction layer.

Overview

TradingAgents supports multiple LLM providers through a flexible configuration system that allows switching between providers without code changes.

Supported Providers

OpenAI

  • Models: GPT-4o, GPT-4o-mini, o4-mini (default), o1-preview
  • Strengths: Strong reasoning, reliable, extensive fine-tuning
  • Use Case: Default choice for production
  • API Key: OPENAI_API_KEY
  • Endpoint: https://api.openai.com/v1

Anthropic

  • Models: Claude Sonnet 4, Claude Opus 4
  • Strengths: Strong reasoning, long context windows, excellent instruction following
  • Use Case: Alternative to OpenAI, good for complex analysis
  • API Key: ANTHROPIC_API_KEY
  • Endpoint: https://api.anthropic.com

OpenRouter

  • Models: Unified access to 100+ models from multiple providers
  • Strengths: Single API for multiple providers, competitive pricing
  • Use Case: Flexibility, cost optimization, accessing diverse models
  • API Key: OPENROUTER_API_KEY (plus OPENAI_API_KEY for embeddings)
  • Endpoint: https://openrouter.ai/api/v1

Google Generative AI

  • Models: Gemini 2.0 Flash, Gemini Pro
  • Strengths: Fast inference, multimodal capabilities
  • Use Case: Cost-effective alternative, multimodal analysis
  • API Key: GOOGLE_API_KEY
  • Endpoint: Built-in (no custom endpoint)

Ollama

  • Models: Local models (Llama, Mistral, etc.)
  • Strengths: No API costs, data privacy, offline operation
  • Use Case: Development, experimentation, privacy-sensitive analysis
  • API Key: None (local)
  • Endpoint: http://localhost:11434/v1

Provider Abstraction

Configuration-Driven Selection

LLM providers are selected through configuration:

config = {
    "llm_provider": "openai",  # Provider selection
    "deep_think_llm": "o4-mini",  # Model for complex reasoning
    "quick_think_llm": "gpt-4o-mini",  # Model for fast tasks
    "backend_url": "https://api.openai.com/v1"
}
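The keys above are the whole switching surface: changing providers means overriding a few dictionary entries. As a minimal sketch (the `DEFAULT_CONFIG` dict and `make_config` helper here are illustrative, not the project's actual code):

```python
# Sketch only: switching providers is a config change, not a code change.
# DEFAULT_CONFIG mirrors the keys shown above; make_config is hypothetical.
DEFAULT_CONFIG = {
    "llm_provider": "openai",
    "deep_think_llm": "o4-mini",
    "quick_think_llm": "gpt-4o-mini",
    "backend_url": "https://api.openai.com/v1",
}

def make_config(**overrides):
    """Return a copy of the default config with selected keys overridden."""
    config = dict(DEFAULT_CONFIG)
    unknown = set(overrides) - set(config)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    config.update(overrides)
    return config

# Switch to Anthropic without touching any other code:
anthropic_config = make_config(
    llm_provider="anthropic",
    deep_think_llm="claude-sonnet-4-20250514",
    quick_think_llm="claude-sonnet-4-20250514",
    backend_url="https://api.anthropic.com",
)
```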

Initialization Logic

The TradingAgentsGraph class handles provider initialization:

if config["llm_provider"].lower() in ("openai", "ollama"):
    from langchain_openai import ChatOpenAI

    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"]
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"]
    )

elif config["llm_provider"].lower() == "anthropic":
    from langchain_anthropic import ChatAnthropic

    self.deep_thinking_llm = ChatAnthropic(
        model=config["deep_think_llm"],
        base_url=config["backend_url"]
    )
    self.quick_thinking_llm = ChatAnthropic(
        model=config["quick_think_llm"],
        base_url=config["backend_url"]
    )

elif config["llm_provider"].lower() == "openrouter":
    import os

    from langchain_openai import ChatOpenAI

    openrouter_key = os.getenv("OPENROUTER_API_KEY")
    if not openrouter_key:
        raise ValueError("OPENROUTER_API_KEY required")

    default_headers = {
        "HTTP-Referer": "https://github.com/TauricResearch/TradingAgents",
        "X-Title": "TradingAgents"
    }

    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers
    )

elif config["llm_provider"].lower() == "google":
    from langchain_google_genai import ChatGoogleGenerativeAI

    self.deep_thinking_llm = ChatGoogleGenerativeAI(
        model=config["deep_think_llm"]
    )
    self.quick_thinking_llm = ChatGoogleGenerativeAI(
        model=config["quick_think_llm"]
    )

Location: tradingagents/graph/trading_graph.py
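The branching above can be condensed into a small factory. The sketch below is illustrative only (it mirrors the dispatch, but omits the OpenRouter key/header handling shown earlier and is not the project's actual code); the class names come from the `langchain-openai`, `langchain-anthropic`, and `langchain-google-genai` packages:

```python
# Illustrative factory mirroring the provider branching above.
def build_llms(config):
    provider = config["llm_provider"].lower()
    if provider in ("openai", "ollama", "openrouter"):
        # OpenRouter's api_key and attribution headers omitted for brevity.
        from langchain_openai import ChatOpenAI as LLM
        kwargs = {"base_url": config["backend_url"]}
    elif provider == "anthropic":
        from langchain_anthropic import ChatAnthropic as LLM
        kwargs = {"base_url": config["backend_url"]}
    elif provider == "google":
        from langchain_google_genai import ChatGoogleGenerativeAI as LLM
        kwargs = {}  # Google uses its built-in endpoint
    else:
        raise ValueError(f"Unsupported llm_provider: {config['llm_provider']}")
    deep = LLM(model=config["deep_think_llm"], **kwargs)
    quick = LLM(model=config["quick_think_llm"], **kwargs)
    return deep, quick
```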

Model Selection Strategy

Two-Tier Model Approach

TradingAgents uses two types of LLMs for different tasks:

Deep Thinking LLM

  • Purpose: Complex reasoning, strategic analysis, debate moderation
  • Characteristics: Larger models, slower, more expensive, higher quality
  • Use Cases:
    • Researcher debate moderation
    • Trading decision synthesis
    • Risk assessment evaluation
  • Recommended Models:
    • OpenAI: o4-mini, o1-preview
    • Anthropic: claude-sonnet-4, claude-opus-4
    • OpenRouter: anthropic/claude-sonnet-4.5

Quick Thinking LLM

  • Purpose: Fast analysis, data summarization, routine tasks
  • Characteristics: Smaller models, faster, cost-effective
  • Use Cases:
    • Analyst report generation
    • Data interpretation
    • Tool calling
  • Recommended Models:
    • OpenAI: gpt-4o-mini, gpt-4o
    • Anthropic: claude-sonnet-4
    • OpenRouter: openai/gpt-4o-mini

Model Selection Guidelines

For Production:

config["deep_think_llm"] = "o1-preview"      # Best reasoning
config["quick_think_llm"] = "gpt-4o-mini"    # Cost-effective

For Development/Testing:

config["deep_think_llm"] = "o4-mini"         # Fast and cheaper
config["quick_think_llm"] = "gpt-4o-mini"    # Consistent quality

For Cost Optimization:

config["llm_provider"] = "openrouter"
config["deep_think_llm"] = "anthropic/claude-sonnet-4.5"
config["quick_think_llm"] = "openai/gpt-4o-mini"

Provider-Specific Configuration

OpenAI Configuration

config = {
    "llm_provider": "openai",
    "deep_think_llm": "o4-mini",
    "quick_think_llm": "gpt-4o-mini",
    "backend_url": "https://api.openai.com/v1"
}

Environment:

export OPENAI_API_KEY=sk-your_key_here

Anthropic Configuration

config = {
    "llm_provider": "anthropic",
    "deep_think_llm": "claude-sonnet-4-20250514",
    "quick_think_llm": "claude-sonnet-4-20250514",
    "backend_url": "https://api.anthropic.com"
}

Environment:

export ANTHROPIC_API_KEY=sk-ant-your_key_here

OpenRouter Configuration

config = {
    "llm_provider": "openrouter",
    "deep_think_llm": "anthropic/claude-sonnet-4.5",
    "quick_think_llm": "openai/gpt-4o-mini",
    "backend_url": "https://openrouter.ai/api/v1"
}

Environment:

export OPENROUTER_API_KEY=sk-or-v1-your_key_here
export OPENAI_API_KEY=sk-your_key_here  # Required for embeddings

Note: OpenRouter uses provider/model-name format:

  • anthropic/claude-sonnet-4.5
  • openai/gpt-4o
  • google/gemini-pro

Google Generative AI Configuration

config = {
    "llm_provider": "google",
    "deep_think_llm": "gemini-2.0-flash",
    "quick_think_llm": "gemini-2.0-flash"
}

Environment:

export GOOGLE_API_KEY=your_key_here

Ollama Configuration

config = {
    "llm_provider": "ollama",
    "deep_think_llm": "mistral",
    "quick_think_llm": "mistral",
    "backend_url": "http://localhost:11434/v1"
}

Prerequisites:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull model
ollama pull mistral

# Start Ollama server
ollama serve

Error Handling

Rate Limit Handling

Unified rate limit error handling across providers:

from tradingagents.utils.exceptions import LLMRateLimitError

try:
    response = llm.invoke(messages)
except LLMRateLimitError as e:
    print(f"Rate limit hit: {e.message}")
    if e.retry_after:
        print(f"Retry after {e.retry_after} seconds")

Location: tradingagents/utils/exceptions.py
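A small retry wrapper makes rate limits recoverable rather than fatal. The sketch below is generic (the exception class is passed in, so it works with `LLMRateLimitError` from `tradingagents.utils.exceptions` as shown above, or with any provider SDK error):

```python
import time

def invoke_with_retry(call, *, exceptions, max_retries=3, base_delay=2.0):
    """Retry `call()` on rate-limit errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except exceptions as e:
            if attempt == max_retries - 1:
                raise
            # Prefer a server-suggested delay when the error carries one.
            delay = getattr(e, "retry_after", None) or base_delay * (2 ** attempt)
            time.sleep(delay)

# Usage sketch with TradingAgents:
#   from tradingagents.utils.exceptions import LLMRateLimitError
#   response = invoke_with_retry(lambda: llm.invoke(messages),
#                                exceptions=LLMRateLimitError)
```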

Provider-Specific Errors

Each provider may raise different errors:

OpenAI:

  • RateLimitError → Retry after specified time
  • InvalidRequestError → Check model name, parameters
  • AuthenticationError → Verify API key

Anthropic:

  • RateLimitError → Retry with backoff
  • InvalidRequestError → Check message format
  • APIError → Server-side issues

OpenRouter:

  • Follows OpenAI error format
  • Additional headers required for attribution

Fallback Strategy

Implement provider fallback for resilience:

providers = ["openai", "anthropic", "openrouter"]

for provider in providers:
    try:
        config["llm_provider"] = provider
        # Note: backend_url and model names must also match the provider.
        ta = TradingAgentsGraph(config=config)
        result = ta.propagate(ticker, date)
        break
    except LLMRateLimitError:
        continue
else:
    raise RuntimeError("All providers rate-limited")

Cost Optimization

Model Cost Comparison

Deep Thinking Tasks:

| Provider   | Model           | Cost/1M Tokens (Input/Output) |
|------------|-----------------|-------------------------------|
| OpenAI     | o4-mini         | $1.50 / $6.00                 |
| OpenAI     | o1-preview      | $15.00 / $60.00               |
| Anthropic  | claude-sonnet-4 | $3.00 / $15.00                |
| OpenRouter | Varies by model | Check OpenRouter pricing      |

Quick Thinking Tasks:

| Provider | Model            | Cost/1M Tokens (Input/Output) |
|----------|------------------|-------------------------------|
| OpenAI   | gpt-4o-mini      | $0.15 / $0.60                 |
| OpenAI   | gpt-4o           | $2.50 / $10.00                |
| Google   | gemini-2.0-flash | Free tier available           |
| Ollama   | Local models     | Free (local)                  |
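With the prices above, a back-of-envelope run cost is simple arithmetic: tokens times price per million. The token counts below are made up for illustration:

```python
# Per-run cost estimate from the price tables above.
# Prices are USD per 1M tokens (input, output); token counts are illustrative.
PRICES = {
    "o4-mini": (1.50, 6.00),
    "o1-preview": (15.00, 60.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

def run_cost(model, input_tokens, output_tokens):
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. 200k input + 40k output tokens on the quick model:
quick = run_cost("gpt-4o-mini", 200_000, 40_000)  # 0.03 + 0.024 = $0.054
deep = run_cost("o4-mini", 50_000, 20_000)        # 0.075 + 0.12 = $0.195
```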

Cost Reduction Strategies

  1. Use Smaller Models for Simple Tasks

    config["quick_think_llm"] = "gpt-4o-mini"  # Instead of gpt-4o
    
  2. Reduce Debate Rounds

    config["max_debate_rounds"] = 1  # Instead of 2-3
    
  3. Use OpenRouter for Competitive Pricing

    config["llm_provider"] = "openrouter"
    
  4. Cache LLM Responses

    # Implemented in agent memory system
    memory.store_analysis(ticker, date, result)
    
  5. Use Ollama for Development

    config["llm_provider"] = "ollama"  # No API costs
    

Embeddings

Embedding Provider

TradingAgents uses OpenAI embeddings for vector storage (memory system):

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

Important: Even when using non-OpenAI LLM providers (Anthropic, Google, etc.), OPENAI_API_KEY is still required for embeddings.

Alternative Embedding Providers

For fully offline operation, consider:

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

Note: This requires updating the memory initialization code.

Performance Considerations

Latency

Provider Latency (Approximate):

  • OpenAI: 1-3 seconds per request
  • Anthropic: 1-2 seconds per request
  • Google: 0.5-1.5 seconds per request
  • OpenRouter: Varies by underlying model
  • Ollama: 0.5-5 seconds (depends on local hardware)

Throughput

Concurrent Requests:

  • OpenAI: Tier-based limits (20-5000 RPM)
  • Anthropic: Tier-based limits (50-2000 RPM)
  • OpenRouter: Model-specific limits
  • Ollama: Limited by local GPU/CPU

Caching

LangChain provides built-in caching:

from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))

Best Practices

  1. Set API Keys as Environment Variables: Never hardcode keys
  2. Use Two-Tier Model Strategy: Deep/quick thinking separation
  3. Implement Error Handling: Catch rate limits and retry
  4. Monitor Costs: Track token usage and expenses
  5. Test with Cheaper Models: Use o4-mini/gpt-4o-mini for development
  6. Cache When Possible: Avoid redundant API calls
  7. Use OpenRouter for Flexibility: Easy switching between providers
  8. Implement Timeouts: Prevent hanging requests
  9. Log API Usage: Track which models are called
  10. Consider Local Models: Ollama for sensitive data or development
