LLM Integration Architecture
This document describes how TradingAgents integrates with different Large Language Model (LLM) providers through a unified abstraction layer.
Overview
TradingAgents supports multiple LLM providers through a flexible configuration system that allows switching between providers without code changes.
Supported Providers
OpenAI
- Models: GPT-4o, GPT-4o-mini, o4-mini (default), o1-preview
- Strengths: Strong reasoning, reliable, extensive fine-tuning
- Use Case: Default choice for production
- API Key: `OPENAI_API_KEY`
- Endpoint: `https://api.openai.com/v1`
Anthropic
- Models: Claude Sonnet 4, Claude Opus 4
- Strengths: Strong reasoning, long context windows, excellent instruction following
- Use Case: Alternative to OpenAI, good for complex analysis
- API Key: `ANTHROPIC_API_KEY`
- Endpoint: `https://api.anthropic.com`
OpenRouter
- Models: Unified access to 100+ models from multiple providers
- Strengths: Single API for multiple providers, competitive pricing
- Use Case: Flexibility, cost optimization, accessing diverse models
- API Key: `OPENROUTER_API_KEY` (plus `OPENAI_API_KEY` for embeddings)
- Endpoint: `https://openrouter.ai/api/v1`
Google Generative AI
- Models: Gemini 2.0 Flash, Gemini Pro
- Strengths: Fast inference, multimodal capabilities
- Use Case: Cost-effective alternative, multimodal analysis
- API Key: `GOOGLE_API_KEY`
- Endpoint: Built-in (no custom endpoint)
Ollama
- Models: Local models (Llama, Mistral, etc.)
- Strengths: No API costs, data privacy, offline operation
- Use Case: Development, experimentation, privacy-sensitive analysis
- API Key: None (local)
- Endpoint: `http://localhost:11434/v1`
Provider Abstraction
Configuration-Driven Selection
LLM providers are selected through configuration:
```python
config = {
    "llm_provider": "openai",          # Provider selection
    "deep_think_llm": "o4-mini",       # Model for complex reasoning
    "quick_think_llm": "gpt-4o-mini",  # Model for fast tasks
    "backend_url": "https://api.openai.com/v1"
}
```
Initialization Logic
The TradingAgentsGraph class handles provider initialization:
```python
import os  # needed for os.getenv below

if config["llm_provider"].lower() in ("openai", "ollama"):
    from langchain_openai import ChatOpenAI
    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"]
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"]
    )
elif config["llm_provider"].lower() == "anthropic":
    from langchain_anthropic import ChatAnthropic
    self.deep_thinking_llm = ChatAnthropic(
        model=config["deep_think_llm"],
        base_url=config["backend_url"]
    )
    self.quick_thinking_llm = ChatAnthropic(
        model=config["quick_think_llm"],
        base_url=config["backend_url"]
    )
elif config["llm_provider"].lower() == "openrouter":
    from langchain_openai import ChatOpenAI
    openrouter_key = os.getenv("OPENROUTER_API_KEY")
    if not openrouter_key:
        raise ValueError("OPENROUTER_API_KEY required")
    default_headers = {
        "HTTP-Referer": "https://github.com/TauricResearch/TradingAgents",
        "X-Title": "TradingAgents"
    }
    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers
    )
elif config["llm_provider"].lower() == "google":
    from langchain_google_genai import ChatGoogleGenerativeAI
    self.deep_thinking_llm = ChatGoogleGenerativeAI(
        model=config["deep_think_llm"]
    )
    self.quick_thinking_llm = ChatGoogleGenerativeAI(
        model=config["quick_think_llm"]
    )
```
Location: tradingagents/graph/trading_graph.py
Model Selection Strategy
Two-Tier Model Approach
TradingAgents uses two types of LLMs for different tasks:
Deep Thinking LLM
- Purpose: Complex reasoning, strategic analysis, debate moderation
- Characteristics: Larger models, slower, more expensive, higher quality
- Use Cases:
- Researcher debate moderation
- Trading decision synthesis
- Risk assessment evaluation
- Recommended Models:
- OpenAI: o4-mini, o1-preview
- Anthropic: claude-sonnet-4, claude-opus-4
- OpenRouter: anthropic/claude-sonnet-4.5
Quick Thinking LLM
- Purpose: Fast analysis, data summarization, routine tasks
- Characteristics: Smaller models, faster, cost-effective
- Use Cases:
- Analyst report generation
- Data interpretation
- Tool calling
- Recommended Models:
- OpenAI: gpt-4o-mini, gpt-4o
- Anthropic: claude-sonnet-4
- OpenRouter: openai/gpt-4o-mini
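The two-tier split can be sketched as a simple routing table. The task names and the `DEEP_TASKS` set below are illustrative, not the actual identifiers used inside TradingAgents:

```python
# Hypothetical sketch of two-tier model routing: complex-reasoning tasks go
# to the deep-thinking model, everything else to the quick-thinking model.
DEEP_TASKS = {"debate_moderation", "decision_synthesis", "risk_assessment"}

def select_model_tier(task: str) -> str:
    """Return which configured LLM key a task should use."""
    return "deep_think_llm" if task in DEEP_TASKS else "quick_think_llm"

print(select_model_tier("debate_moderation"))  # deep_think_llm
print(select_model_tier("report_generation"))  # quick_think_llm
```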
Model Selection Guidelines
For Production:
```python
config["deep_think_llm"] = "o1-preview"   # Best reasoning
config["quick_think_llm"] = "gpt-4o-mini" # Cost-effective
```
For Development/Testing:
```python
config["deep_think_llm"] = "o4-mini"      # Faster and cheaper
config["quick_think_llm"] = "gpt-4o-mini" # Consistent quality
```
For Cost Optimization:
```python
config["llm_provider"] = "openrouter"
config["deep_think_llm"] = "anthropic/claude-sonnet-4.5"
config["quick_think_llm"] = "openai/gpt-4o-mini"
```
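The three profiles above can be bundled into a small preset helper. The function and preset names are hypothetical (not part of TradingAgents); the values are the ones quoted in this section:

```python
# Named configuration presets mirroring the guidelines above.
PRESETS = {
    "production": {"deep_think_llm": "o1-preview", "quick_think_llm": "gpt-4o-mini"},
    "development": {"deep_think_llm": "o4-mini", "quick_think_llm": "gpt-4o-mini"},
    "cost_optimized": {
        "llm_provider": "openrouter",
        "deep_think_llm": "anthropic/claude-sonnet-4.5",
        "quick_think_llm": "openai/gpt-4o-mini",
    },
}

def build_config(base: dict, preset: str) -> dict:
    """Overlay a named preset onto a base config without mutating the base."""
    return {**base, **PRESETS[preset]}

config = build_config({"llm_provider": "openai"}, "development")
```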
Provider-Specific Configuration
OpenAI Configuration
```python
config = {
    "llm_provider": "openai",
    "deep_think_llm": "o4-mini",
    "quick_think_llm": "gpt-4o-mini",
    "backend_url": "https://api.openai.com/v1"
}
```
Environment:
```shell
export OPENAI_API_KEY=sk-your_key_here
```
Anthropic Configuration
```python
config = {
    "llm_provider": "anthropic",
    "deep_think_llm": "claude-sonnet-4-20250514",
    "quick_think_llm": "claude-sonnet-4-20250514",
    "backend_url": "https://api.anthropic.com"
}
```
Environment:
```shell
export ANTHROPIC_API_KEY=sk-ant-your_key_here
```
OpenRouter Configuration
```python
config = {
    "llm_provider": "openrouter",
    "deep_think_llm": "anthropic/claude-sonnet-4.5",
    "quick_think_llm": "openai/gpt-4o-mini",
    "backend_url": "https://openrouter.ai/api/v1"
}
```
Environment:
```shell
export OPENROUTER_API_KEY=sk-or-v1-your_key_here
export OPENAI_API_KEY=sk-your_key_here  # Required for embeddings
```
Note: OpenRouter uses the provider/model-name format, for example:
- `anthropic/claude-sonnet-4.5`
- `openai/gpt-4o`
- `google/gemini-pro`
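Splitting such an identifier into its two parts is a one-liner. This helper is purely illustrative; TradingAgents passes the full string through to OpenRouter unchanged:

```python
# Hypothetical helper splitting OpenRouter's provider/model-name identifiers.
def split_openrouter_model(model_id: str) -> tuple[str, str]:
    provider, _, name = model_id.partition("/")
    return provider, name

split_openrouter_model("anthropic/claude-sonnet-4.5")
# → ("anthropic", "claude-sonnet-4.5")
```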
Google Generative AI Configuration
```python
config = {
    "llm_provider": "google",
    "deep_think_llm": "gemini-2.0-flash",
    "quick_think_llm": "gemini-2.0-flash"
}
```
Environment:
```shell
export GOOGLE_API_KEY=your_key_here
```
Ollama Configuration
```python
config = {
    "llm_provider": "ollama",
    "deep_think_llm": "mistral",
    "quick_think_llm": "mistral",
    "backend_url": "http://localhost:11434/v1"
}
```
Prerequisites:
```shell
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull model
ollama pull mistral

# Start Ollama server
ollama serve
```
Error Handling
Rate Limit Handling
Unified rate limit error handling across providers:
```python
from tradingagents.utils.exceptions import LLMRateLimitError

try:
    response = llm.invoke(messages)
except LLMRateLimitError as e:
    print(f"Rate limit hit: {e.message}")
    if e.retry_after:
        print(f"Retry after {e.retry_after} seconds")
Location: tradingagents/utils/exceptions.py
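A retry loop with exponential backoff is the usual response to this error. The sketch below uses a local `RateLimitedError` stand-in so it is self-contained; it mirrors the `retry_after` field of `LLMRateLimitError` but is not the real class:

```python
import time

# Generic retry-with-backoff sketch for the rate-limit pattern above.
# RateLimitedError is a local stand-in for LLMRateLimitError.
class RateLimitedError(Exception):
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after

def invoke_with_retry(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying rate-limited attempts with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitedError as e:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Prefer the server-suggested delay; otherwise back off exponentially.
            sleep(e.retry_after or base_delay * (2 ** attempt))
```

The injectable `sleep` parameter keeps the helper testable without real waiting.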
Provider-Specific Errors
Each provider may raise different errors:
OpenAI:
- `RateLimitError` → Retry after specified time
- `InvalidRequestError` → Check model name, parameters
- `AuthenticationError` → Verify API key
Anthropic:
- `RateLimitError` → Retry with backoff
- `InvalidRequestError` → Check message format
- `APIError` → Server-side issues
OpenRouter:
- Follows OpenAI error format
- Additional headers required for attribution
Fallback Strategy
Implement provider fallback for resilience:
```python
providers = ["openai", "anthropic", "openrouter"]

for provider in providers:
    try:
        config["llm_provider"] = provider
        ta = TradingAgentsGraph(config=config)
        result = ta.propagate(ticker, date)
        break
    except LLMRateLimitError:
        continue
else:
    raise RuntimeError("All providers hit rate limits")
```
Cost Optimization
Model Cost Comparison
Deep Thinking Tasks:
| Provider | Model | Cost/1M Tokens (Input/Output) |
|---|---|---|
| OpenAI | o4-mini | $1.50 / $6.00 |
| OpenAI | o1-preview | $15.00 / $60.00 |
| Anthropic | claude-sonnet-4 | $3.00 / $15.00 |
| OpenRouter | Varies by model | Check OpenRouter pricing |
Quick Thinking Tasks:
| Provider | Model | Cost/1M Tokens (Input/Output) |
|---|---|---|
| OpenAI | gpt-4o-mini | $0.15 / $0.60 |
| OpenAI | gpt-4o | $2.50 / $10.00 |
| Google | gemini-2.0-flash | Free tier available |
| Ollama | Local models | Free (local) |
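The per-million-token prices above translate into a quick cost estimator. This is a sketch: the price table is copied from this document and will drift as providers reprice:

```python
# USD per 1M tokens: (input, output), from the tables above.
PRICES = {
    "o4-mini": (1.50, 6.00),
    "o1-preview": (15.00, 60.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. a 10k-input / 2k-output call on gpt-4o-mini costs about $0.0027
cost = estimate_cost("gpt-4o-mini", 10_000, 2_000)
```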
Cost Reduction Strategies
1. Use Smaller Models for Simple Tasks

   ```python
   config["quick_think_llm"] = "gpt-4o-mini"  # Instead of gpt-4o
   ```

2. Reduce Debate Rounds

   ```python
   config["max_debate_rounds"] = 1  # Instead of 2-3
   ```

3. Use OpenRouter for Competitive Pricing

   ```python
   config["llm_provider"] = "openrouter"
   ```

4. Cache LLM Responses

   ```python
   # Implemented in agent memory system
   memory.store_analysis(ticker, date, result)
   ```

5. Use Ollama for Development

   ```python
   config["llm_provider"] = "ollama"  # No API costs
   ```
Embeddings
Embedding Provider
TradingAgents uses OpenAI embeddings for vector storage (memory system):
```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
```
Important: Even when using non-OpenAI LLM providers (Anthropic, Google, etc.), OPENAI_API_KEY is still required for embeddings.
Alternative Embedding Providers
For fully offline operation, consider:
```python
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```
Note: This requires updating the memory initialization code.
Performance Considerations
Latency
Provider Latency (Approximate):
- OpenAI: 1-3 seconds per request
- Anthropic: 1-2 seconds per request
- Google: 0.5-1.5 seconds per request
- OpenRouter: Varies by underlying model
- Ollama: 0.5-5 seconds (depends on local hardware)
Throughput
Concurrent Requests:
- OpenAI: Tier-based limits (20-5000 RPM)
- Anthropic: Tier-based limits (50-2000 RPM)
- OpenRouter: Model-specific limits
- Ollama: Limited by local GPU/CPU
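Staying under a tier's RPM cap can be handled client-side with a sliding-window limiter. This is a minimal sketch, not part of TradingAgents; `clock` and `sleep` are injectable so the behavior can be tested without real waiting:

```python
import time
from collections import deque

class RpmLimiter:
    """Block requests so that at most `rpm` are sent in any 60 s window."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests in the last 60 s

    def acquire(self):
        """Block until one more request fits under the per-minute cap."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()  # drop requests older than the window
        if len(self.sent) >= self.rpm:
            self.sleep(60 - (now - self.sent[0]))  # wait for oldest to age out
            return self.acquire()
        self.sent.append(now)
```

Call `limiter.acquire()` before each LLM request; the limiter sleeps only when the window is full.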
Caching
LangChain provides built-in caching:
```python
from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
```
Best Practices
- Set API Keys as Environment Variables: Never hardcode keys
- Use Two-Tier Model Strategy: Deep/quick thinking separation
- Implement Error Handling: Catch rate limits and retry
- Monitor Costs: Track token usage and expenses
- Test with Cheaper Models: Use o4-mini/gpt-4o-mini for development
- Cache When Possible: Avoid redundant API calls
- Use OpenRouter for Flexibility: Easy switching between providers
- Implement Timeouts: Prevent hanging requests
- Log API Usage: Track which models are called
- Consider Local Models: Ollama for sensitive data or development