# LLM Integration Architecture
This document describes how TradingAgents integrates with different Large Language Model (LLM) providers through a unified abstraction layer.
## Overview
TradingAgents supports multiple LLM providers through a flexible configuration system that allows switching between providers without code changes.
## Supported Providers
### OpenAI
- **Models**: GPT-4o, GPT-4o-mini, o4-mini (default), o1-preview
- **Strengths**: Strong reasoning, reliable, extensive fine-tuning support
- **Use Case**: Default choice for production
- **API Key**: `OPENAI_API_KEY`
- **Endpoint**: `https://api.openai.com/v1`
### Anthropic
- **Models**: Claude Sonnet 4, Claude Opus 4
- **Strengths**: Strong reasoning, long context windows, excellent instruction following
- **Use Case**: Alternative to OpenAI, good for complex analysis
- **API Key**: `ANTHROPIC_API_KEY`
- **Endpoint**: `https://api.anthropic.com`
### OpenRouter
- **Models**: Unified access to 100+ models from multiple providers
- **Strengths**: Single API for multiple providers, competitive pricing
- **Use Case**: Flexibility, cost optimization, accessing diverse models
- **API Key**: `OPENROUTER_API_KEY` (plus `OPENAI_API_KEY` for embeddings)
- **Endpoint**: `https://openrouter.ai/api/v1`
### Google Generative AI
- **Models**: Gemini 2.0 Flash, Gemini Pro
- **Strengths**: Fast inference, multimodal capabilities
- **Use Case**: Cost-effective alternative, multimodal analysis
- **API Key**: `GOOGLE_API_KEY`
- **Endpoint**: Built-in (no custom endpoint)
### Ollama
- **Models**: Local models (Llama, Mistral, etc.)
- **Strengths**: No API costs, data privacy, offline operation
- **Use Case**: Development, experimentation, privacy-sensitive analysis
- **API Key**: None (local)
- **Endpoint**: `http://localhost:11434/v1`
## Provider Abstraction
### Configuration-Driven Selection
LLM providers are selected through configuration:
```python
config = {
    "llm_provider": "openai",          # Provider selection
    "deep_think_llm": "o4-mini",       # Model for complex reasoning
    "quick_think_llm": "gpt-4o-mini",  # Model for fast tasks
    "backend_url": "https://api.openai.com/v1",
}
```
### Initialization Logic
The `TradingAgentsGraph` class handles provider initialization:
```python
import os  # needed for os.getenv below

if config["llm_provider"].lower() in ("openai", "ollama"):
    from langchain_openai import ChatOpenAI
    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
    )
elif config["llm_provider"].lower() == "anthropic":
    from langchain_anthropic import ChatAnthropic
    self.deep_thinking_llm = ChatAnthropic(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
    )
    self.quick_thinking_llm = ChatAnthropic(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
    )
elif config["llm_provider"].lower() == "openrouter":
    from langchain_openai import ChatOpenAI
    openrouter_key = os.getenv("OPENROUTER_API_KEY")
    if not openrouter_key:
        raise ValueError("OPENROUTER_API_KEY required")
    default_headers = {
        "HTTP-Referer": "https://github.com/TauricResearch/TradingAgents",
        "X-Title": "TradingAgents",
    }
    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers,
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers,
    )
elif config["llm_provider"].lower() == "google":
    from langchain_google_genai import ChatGoogleGenerativeAI
    self.deep_thinking_llm = ChatGoogleGenerativeAI(
        model=config["deep_think_llm"]
    )
    self.quick_thinking_llm = ChatGoogleGenerativeAI(
        model=config["quick_think_llm"]
    )
```
Location: `tradingagents/graph/trading_graph.py`
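The per-provider requirements above (which providers need `backend_url`, which keys every config must carry) can be checked up front before any SDK import. A minimal validation sketch; the `validate_llm_config` helper is hypothetical, not part of the codebase:

```python
REQUIRED_KEYS = {"llm_provider", "deep_think_llm", "quick_think_llm"}
URL_PROVIDERS = {"openai", "ollama", "anthropic", "openrouter"}  # need backend_url

def validate_llm_config(config: dict) -> str:
    """Return the normalized provider name, or raise ValueError on a bad config."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    provider = config["llm_provider"].lower()
    if provider not in URL_PROVIDERS | {"google"}:
        raise ValueError(f"unknown llm_provider: {provider}")
    if provider in URL_PROVIDERS and not config.get("backend_url"):
        raise ValueError(f"'{provider}' requires 'backend_url'")
    return provider
```

Failing fast here produces a clearer error than a connection failure deep inside a LangChain call.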
## Model Selection Strategy
### Two-Tier Model Approach
TradingAgents uses two types of LLMs for different tasks:
#### Deep Thinking LLM
- **Purpose**: Complex reasoning, strategic analysis, debate moderation
- **Characteristics**: Larger models, slower, more expensive, higher quality
- **Use Cases**:
- Researcher debate moderation
- Trading decision synthesis
- Risk assessment evaluation
- **Recommended Models**:
- OpenAI: o4-mini, o1-preview
- Anthropic: claude-sonnet-4, claude-opus-4
- OpenRouter: anthropic/claude-sonnet-4.5
#### Quick Thinking LLM
- **Purpose**: Fast analysis, data summarization, routine tasks
- **Characteristics**: Smaller models, faster, cost-effective
- **Use Cases**:
- Analyst report generation
- Data interpretation
- Tool calling
- **Recommended Models**:
- OpenAI: gpt-4o-mini, gpt-4o
- Anthropic: claude-sonnet-4
- OpenRouter: openai/gpt-4o-mini
### Model Selection Guidelines
**For Production:**
```python
config["deep_think_llm"] = "o1-preview" # Best reasoning
config["quick_think_llm"] = "gpt-4o-mini" # Cost-effective
```
**For Development/Testing:**
```python
config["deep_think_llm"] = "o4-mini" # Fast and cheaper
config["quick_think_llm"] = "gpt-4o-mini" # Consistent quality
```
**For Cost Optimization:**
```python
config["llm_provider"] = "openrouter"
config["deep_think_llm"] = "anthropic/claude-sonnet-4.5"
config["quick_think_llm"] = "openai/gpt-4o-mini"
```
## Provider-Specific Configuration
### OpenAI Configuration
```python
config = {
    "llm_provider": "openai",
    "deep_think_llm": "o4-mini",
    "quick_think_llm": "gpt-4o-mini",
    "backend_url": "https://api.openai.com/v1",
}
```
Environment:
```bash
export OPENAI_API_KEY=sk-your_key_here
```
### Anthropic Configuration
```python
config = {
    "llm_provider": "anthropic",
    "deep_think_llm": "claude-sonnet-4-20250514",
    "quick_think_llm": "claude-sonnet-4-20250514",
    "backend_url": "https://api.anthropic.com",
}
```
Environment:
```bash
export ANTHROPIC_API_KEY=sk-ant-your_key_here
```
### OpenRouter Configuration
```python
config = {
    "llm_provider": "openrouter",
    "deep_think_llm": "anthropic/claude-sonnet-4.5",
    "quick_think_llm": "openai/gpt-4o-mini",
    "backend_url": "https://openrouter.ai/api/v1",
}
```
Environment:
```bash
export OPENROUTER_API_KEY=sk-or-v1-your_key_here
export OPENAI_API_KEY=sk-your_key_here # Required for embeddings
```
**Note**: OpenRouter uses `provider/model-name` format:
- `anthropic/claude-sonnet-4.5`
- `openai/gpt-4o`
- `google/gemini-pro`
### Google Generative AI Configuration
```python
config = {
    "llm_provider": "google",
    "deep_think_llm": "gemini-2.0-flash",
    "quick_think_llm": "gemini-2.0-flash",
}
```
Environment:
```bash
export GOOGLE_API_KEY=your_key_here
```
### Ollama Configuration
```python
config = {
    "llm_provider": "ollama",
    "deep_think_llm": "mistral",
    "quick_think_llm": "mistral",
    "backend_url": "http://localhost:11434/v1",
}
```
Prerequisites:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull model
ollama pull mistral
# Start Ollama server
ollama serve
```
## Error Handling
### Rate Limit Handling
Unified rate limit error handling across providers:
```python
from tradingagents.utils.exceptions import LLMRateLimitError
try:
    response = llm.invoke(messages)
except LLMRateLimitError as e:
    print(f"Rate limit hit: {e.message}")
    if e.retry_after:
        print(f"Retry after {e.retry_after} seconds")
```
Location: `tradingagents/utils/exceptions.py`
### Provider-Specific Errors
Each provider may raise different errors:
**OpenAI:**
- `RateLimitError` → Retry after specified time
- `InvalidRequestError` → Check model name, parameters
- `AuthenticationError` → Verify API key
**Anthropic:**
- `RateLimitError` → Retry with backoff
- `InvalidRequestError` → Check message format
- `APIError` → Server-side issues
**OpenRouter:**
- Follows OpenAI error format
- Additional headers required for attribution
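Whatever the provider, the retryable cases above can be handled by one generic retry-with-backoff wrapper. A standard-library-only sketch (in real use you would pass `LLMRateLimitError` as the retryable type rather than the broad default shown here):

```python
import random
import time

def with_retries(call, *, max_attempts=4, base_delay=1.0, retryable=(Exception,)):
    """Call `call()`, retrying with exponential backoff and jitter on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # delays of base, 2*base, 4*base, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Usage would look like `with_retries(lambda: llm.invoke(messages), retryable=(LLMRateLimitError,))`.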
### Fallback Strategy
Implement provider fallback for resilience:
```python
providers = ["openai", "anthropic", "openrouter"]

for provider in providers:
    try:
        # Note: backend_url and model names must also be switched to
        # values valid for each provider, not just llm_provider itself.
        config["llm_provider"] = provider
        ta = TradingAgentsGraph(config=config)
        result = ta.propagate(ticker, date)
        break
    except LLMRateLimitError:
        continue
else:
    raise RuntimeError("All providers hit rate limits")
```
## Cost Optimization
### Model Cost Comparison
**Deep Thinking Tasks:**
| Provider | Model | Cost/1M Tokens (Input/Output) |
|----------|-------|-------------------------------|
| OpenAI | o4-mini | $1.50 / $6.00 |
| OpenAI | o1-preview | $15.00 / $60.00 |
| Anthropic | claude-sonnet-4 | $3.00 / $15.00 |
| OpenRouter | Varies by model | Check OpenRouter pricing |
**Quick Thinking Tasks:**
| Provider | Model | Cost/1M Tokens (Input/Output) |
|----------|-------|-------------------------------|
| OpenAI | gpt-4o-mini | $0.15 / $0.60 |
| OpenAI | gpt-4o | $2.50 / $10.00 |
| Google | gemini-2.0-flash | Free tier available |
| Ollama | Local models | Free (local) |
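The rates above translate directly into a per-call cost estimate. A small sketch (prices copied from the tables; re-check current provider pricing before relying on these numbers):

```python
# Prices per 1M tokens as (input_rate, output_rate) in USD, from the tables above.
PRICES = {
    "o4-mini": (1.50, 6.00),
    "o1-preview": (15.00, 60.00),
    "claude-sonnet-4": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at the per-million-token rates above."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a run with 500k input and 100k output tokens on o4-mini costs about $1.35, versus roughly $13.50 on o1-preview.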
### Cost Reduction Strategies
1. **Use Smaller Models for Simple Tasks**
```python
config["quick_think_llm"] = "gpt-4o-mini" # Instead of gpt-4o
```
2. **Reduce Debate Rounds**
```python
config["max_debate_rounds"] = 1 # Instead of 2-3
```
3. **Use OpenRouter for Competitive Pricing**
```python
config["llm_provider"] = "openrouter"
```
4. **Cache LLM Responses**
```python
# Implemented in agent memory system
memory.store_analysis(ticker, date, result)
```
5. **Use Ollama for Development**
```python
config["llm_provider"] = "ollama" # No API costs
```
## Embeddings
### Embedding Provider
TradingAgents uses OpenAI embeddings for vector storage (memory system):
```python
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
```
**Important**: Even when using non-OpenAI LLM providers (Anthropic, Google, etc.), `OPENAI_API_KEY` is still required for embeddings.
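A fail-fast check at startup turns this requirement into a clear error instead of a confusing failure deep in the memory system. A sketch; the `require_embedding_key` helper is hypothetical:

```python
import os

def require_embedding_key(env=os.environ):
    """Fail fast if the key the memory system's embeddings need is absent."""
    if not env.get("OPENAI_API_KEY"):
        raise EnvironmentError(
            "OPENAI_API_KEY is required for embeddings, even when the chat "
            "LLM provider is Anthropic, Google, or OpenRouter"
        )
```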
### Alternative Embedding Providers
For fully offline operation, consider:
```python
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```
Note: This requires updating the memory initialization code.
## Performance Considerations
### Latency
**Provider Latency (Approximate):**
- OpenAI: 1-3 seconds per request
- Anthropic: 1-2 seconds per request
- Google: 0.5-1.5 seconds per request
- OpenRouter: Varies by underlying model
- Ollama: 0.5-5 seconds (depends on local hardware)
### Throughput
**Concurrent Requests:**
- OpenAI: Tier-based limits (20-5000 RPM)
- Anthropic: Tier-based limits (50-2000 RPM)
- OpenRouter: Model-specific limits
- Ollama: Limited by local GPU/CPU
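On the client side, a semaphore keeps in-flight requests under the provider's concurrency tier. A thread-based sketch (the limit of 4 and the doubling task stand in for real LLM calls):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

limiter = threading.Semaphore(4)  # cap in-flight LLM requests at 4

def call_with_limit(fn, *args, **kwargs):
    """Run fn under the shared concurrency cap."""
    with limiter:
        return fn(*args, **kwargs)

# Submit many tasks; at most 4 execute "against the API" at once.
with ThreadPoolExecutor(max_workers=16) as pool:
    futures = [pool.submit(call_with_limit, lambda i=i: i * 2) for i in range(10)]
    results = [f.result() for f in futures]
```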
### Caching
LangChain provides built-in caching:
```python
from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
```
## Best Practices
1. **Set API Keys as Environment Variables**: Never hardcode keys
2. **Use Two-Tier Model Strategy**: Deep/quick thinking separation
3. **Implement Error Handling**: Catch rate limits and retry
4. **Monitor Costs**: Track token usage and expenses
5. **Test with Cheaper Models**: Use o4-mini/gpt-4o-mini for development
6. **Cache When Possible**: Avoid redundant API calls
7. **Use OpenRouter for Flexibility**: Easy switching between providers
8. **Implement Timeouts**: Prevent hanging requests
9. **Log API Usage**: Track which models are called
10. **Consider Local Models**: Ollama for sensitive data or development
## References
- [Multi-Agent System](multi-agent-system.md)
- [Configuration Guide](../guides/configuration.md)
- [Adding LLM Provider Guide](../guides/adding-llm-provider.md)
- [TradingGraph API](../api/trading-graph.md)