# LLM Integration Architecture

This document describes how TradingAgents integrates with different Large Language Model (LLM) providers through a unified abstraction layer.

## Overview

TradingAgents supports multiple LLM providers through a flexible configuration system that allows switching between providers without code changes.

## Supported Providers

### OpenAI

- **Models**: GPT-4o, GPT-4o-mini, o4-mini (default), o1-preview
- **Strengths**: Strong reasoning, reliable, mature fine-tuning support
- **Use Case**: Default choice for production
- **API Key**: `OPENAI_API_KEY`
- **Endpoint**: `https://api.openai.com/v1`

### Anthropic

- **Models**: Claude Sonnet 4, Claude Opus 4
- **Strengths**: Strong reasoning, long context windows, excellent instruction following
- **Use Case**: Alternative to OpenAI, good for complex analysis
- **API Key**: `ANTHROPIC_API_KEY`
- **Endpoint**: `https://api.anthropic.com`

### OpenRouter

- **Models**: Unified access to 100+ models from multiple providers
- **Strengths**: Single API for multiple providers, competitive pricing
- **Use Case**: Flexibility, cost optimization, access to diverse models
- **API Key**: `OPENROUTER_API_KEY` (plus `OPENAI_API_KEY` for embeddings)
- **Endpoint**: `https://openrouter.ai/api/v1`

### Google Generative AI

- **Models**: Gemini 2.0 Flash, Gemini Pro
- **Strengths**: Fast inference, multimodal capabilities
- **Use Case**: Cost-effective alternative, multimodal analysis
- **API Key**: `GOOGLE_API_KEY`
- **Endpoint**: Built-in (no custom endpoint)

### Ollama

- **Models**: Local models (Llama, Mistral, etc.)
- **Strengths**: No API costs, data privacy, offline operation
- **Use Case**: Development, experimentation, privacy-sensitive analysis
- **API Key**: None (local)
- **Endpoint**: `http://localhost:11434/v1`

## Provider Abstraction

### Configuration-Driven Selection

LLM providers are selected through configuration:

```python
config = {
    "llm_provider": "openai",           # Provider selection
    "deep_think_llm": "o4-mini",        # Model for complex reasoning
    "quick_think_llm": "gpt-4o-mini",   # Model for fast tasks
    "backend_url": "https://api.openai.com/v1",
}
```
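
This dictionary is passed to the graph at construction time. A minimal usage sketch (the import path is inferred from the `Location` note below; `propagate` appears in the fallback example later in this document):

```python
from tradingagents.graph.trading_graph import TradingAgentsGraph

# Build the agent graph with the provider/model choices above
ta = TradingAgentsGraph(config=config)
result = ta.propagate("NVDA", "2024-05-10")  # ticker, analysis date
```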

### Initialization Logic

The `TradingAgentsGraph` class handles provider initialization:

```python
import os

if config["llm_provider"].lower() in ("openai", "ollama"):
    from langchain_openai import ChatOpenAI

    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
    )

elif config["llm_provider"].lower() == "anthropic":
    from langchain_anthropic import ChatAnthropic

    self.deep_thinking_llm = ChatAnthropic(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
    )
    self.quick_thinking_llm = ChatAnthropic(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
    )

elif config["llm_provider"].lower() == "openrouter":
    from langchain_openai import ChatOpenAI

    openrouter_key = os.getenv("OPENROUTER_API_KEY")
    if not openrouter_key:
        raise ValueError("OPENROUTER_API_KEY required")

    # Attribution headers recommended by OpenRouter
    default_headers = {
        "HTTP-Referer": "https://github.com/TauricResearch/TradingAgents",
        "X-Title": "TradingAgents",
    }

    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers,
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers,
    )

elif config["llm_provider"].lower() == "google":
    from langchain_google_genai import ChatGoogleGenerativeAI

    self.deep_thinking_llm = ChatGoogleGenerativeAI(
        model=config["deep_think_llm"]
    )
    self.quick_thinking_llm = ChatGoogleGenerativeAI(
        model=config["quick_think_llm"]
    )
```

Location: `tradingagents/graph/trading_graph.py`

## Model Selection Strategy

### Two-Tier Model Approach

TradingAgents uses two types of LLMs for different tasks:

#### Deep Thinking LLM

- **Purpose**: Complex reasoning, strategic analysis, debate moderation
- **Characteristics**: Larger models, slower, more expensive, higher quality
- **Use Cases**:
  - Researcher debate moderation
  - Trading decision synthesis
  - Risk assessment evaluation
- **Recommended Models**:
  - OpenAI: o4-mini, o1-preview
  - Anthropic: claude-sonnet-4, claude-opus-4
  - OpenRouter: anthropic/claude-sonnet-4.5

#### Quick Thinking LLM

- **Purpose**: Fast analysis, data summarization, routine tasks
- **Characteristics**: Smaller models, faster, cost-effective
- **Use Cases**:
  - Analyst report generation
  - Data interpretation
  - Tool calling
- **Recommended Models**:
  - OpenAI: gpt-4o-mini, gpt-4o
  - Anthropic: claude-sonnet-4
  - OpenRouter: openai/gpt-4o-mini

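To make the division of labor concrete, here is a minimal sketch of the two tiers in use (model names follow the recommendations above; the prompts are illustrative):

```python
from langchain_openai import ChatOpenAI

deep_llm = ChatOpenAI(model="o4-mini")       # complex reasoning
quick_llm = ChatOpenAI(model="gpt-4o-mini")  # fast, cheap summarization

# Quick tier: turn raw data into an analyst report
report = quick_llm.invoke("Summarize this price history for AAPL: ...")

# Deep tier: synthesize a trading decision from the report
decision = deep_llm.invoke(
    f"Given this analyst report, argue for BUY/HOLD/SELL:\n{report.content}"
)
print(decision.content)
```
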
### Model Selection Guidelines

**For Production:**

```python
config["deep_think_llm"] = "o1-preview"   # Best reasoning
config["quick_think_llm"] = "gpt-4o-mini" # Cost-effective
```

**For Development/Testing:**

```python
config["deep_think_llm"] = "o4-mini"      # Faster and cheaper
config["quick_think_llm"] = "gpt-4o-mini" # Consistent quality
```

**For Cost Optimization:**

```python
config["llm_provider"] = "openrouter"
config["deep_think_llm"] = "anthropic/claude-sonnet-4.5"
config["quick_think_llm"] = "openai/gpt-4o-mini"
```

## Provider-Specific Configuration

### OpenAI Configuration

```python
config = {
    "llm_provider": "openai",
    "deep_think_llm": "o4-mini",
    "quick_think_llm": "gpt-4o-mini",
    "backend_url": "https://api.openai.com/v1",
}
```

Environment:
```bash
export OPENAI_API_KEY=sk-your_key_here
```

### Anthropic Configuration

```python
config = {
    "llm_provider": "anthropic",
    "deep_think_llm": "claude-sonnet-4-20250514",
    "quick_think_llm": "claude-sonnet-4-20250514",
    "backend_url": "https://api.anthropic.com",
}
```

Environment:
```bash
export ANTHROPIC_API_KEY=sk-ant-your_key_here
```

### OpenRouter Configuration

```python
config = {
    "llm_provider": "openrouter",
    "deep_think_llm": "anthropic/claude-sonnet-4.5",
    "quick_think_llm": "openai/gpt-4o-mini",
    "backend_url": "https://openrouter.ai/api/v1",
}
```

Environment:
```bash
export OPENROUTER_API_KEY=sk-or-v1-your_key_here
export OPENAI_API_KEY=sk-your_key_here  # Required for embeddings
```

**Note**: OpenRouter uses `provider/model-name` format:
- `anthropic/claude-sonnet-4.5`
- `openai/gpt-4o`
- `google/gemini-pro`

### Google Generative AI Configuration

```python
config = {
    "llm_provider": "google",
    "deep_think_llm": "gemini-2.0-flash",
    "quick_think_llm": "gemini-2.0-flash",
}
```

Environment:
```bash
export GOOGLE_API_KEY=your_key_here
```

### Ollama Configuration

```python
config = {
    "llm_provider": "ollama",
    "deep_think_llm": "mistral",
    "quick_think_llm": "mistral",
    "backend_url": "http://localhost:11434/v1",
}
```

Prerequisites:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull model
ollama pull mistral

# Start Ollama server
ollama serve
```

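Since Ollama's `/v1` endpoint speaks the OpenAI protocol, you can smoke-test it before pointing TradingAgents at it. A sketch, assuming the server is running and `mistral` is pulled (Ollama ignores the API key, but the client requires a non-empty value):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="mistral",
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # placeholder; Ollama does not check it
)
print(llm.invoke("Say hello").content)
```
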
## Error Handling

### Rate Limit Handling

Unified rate limit error handling across providers:

```python
from tradingagents.utils.exceptions import LLMRateLimitError

try:
    response = llm.invoke(messages)
except LLMRateLimitError as e:
    print(f"Rate limit hit: {e.message}")
    if e.retry_after:
        print(f"Retry after {e.retry_after} seconds")
```

Location: `tradingagents/utils/exceptions.py`

### Provider-Specific Errors

Each provider may raise different errors:

**OpenAI:**
- `RateLimitError` → Retry after specified time
- `InvalidRequestError` → Check model name, parameters
- `AuthenticationError` → Verify API key

**Anthropic:**
- `RateLimitError` → Retry with backoff
- `InvalidRequestError` → Check message format
- `APIError` → Server-side issues

**OpenRouter:**
- Follows OpenAI error format
- Additional headers required for attribution

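Both providers recommend retrying with backoff. A minimal sketch built on the unified exception above (the helper name is illustrative):

```python
import time

from tradingagents.utils.exceptions import LLMRateLimitError

def invoke_with_backoff(llm, messages, max_retries=3):
    """Retry rate-limited calls, honoring the server's retry_after when set."""
    for attempt in range(max_retries):
        try:
            return llm.invoke(messages)
        except LLMRateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(e.retry_after or 2 ** attempt)  # 1s, 2s, 4s, ...
```
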
### Fallback Strategy

Implement provider fallback for resilience:

```python
providers = ["openai", "anthropic", "openrouter"]

for provider in providers:
    try:
        # Note: backend_url and model names must also match the provider
        config["llm_provider"] = provider
        ta = TradingAgentsGraph(config=config)
        result = ta.propagate(ticker, date)
        break
    except LLMRateLimitError:
        continue
else:
    raise RuntimeError("All providers were rate-limited")
```

## Cost Optimization

### Model Cost Comparison

**Deep Thinking Tasks:**

| Provider | Model | Cost/1M Tokens (Input/Output) |
|----------|-------|-------------------------------|
| OpenAI | o4-mini | $1.50 / $6.00 |
| OpenAI | o1-preview | $15.00 / $60.00 |
| Anthropic | claude-sonnet-4 | $3.00 / $15.00 |
| OpenRouter | Varies by model | Check OpenRouter pricing |

**Quick Thinking Tasks:**

| Provider | Model | Cost/1M Tokens (Input/Output) |
|----------|-------|-------------------------------|
| OpenAI | gpt-4o-mini | $0.15 / $0.60 |
| OpenAI | gpt-4o | $2.50 / $10.00 |
| Google | gemini-2.0-flash | Free tier available |
| Ollama | Local models | Free (local) |

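Estimating a run's cost from these prices is simple arithmetic; a small sketch (token counts are illustrative):

```python
def call_cost(input_tokens: int, output_tokens: int,
              in_price: float, out_price: float) -> float:
    """Cost in USD; prices are per 1M tokens, as in the tables above."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# One deep-thinking call on o4-mini: 8k tokens in, 2k out
print(call_cost(8_000, 2_000, 1.50, 6.00))  # 0.024 -> about 2.4 cents
```
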
### Cost Reduction Strategies

1. **Use Smaller Models for Simple Tasks**
   ```python
   config["quick_think_llm"] = "gpt-4o-mini"  # Instead of gpt-4o
   ```

2. **Reduce Debate Rounds**
   ```python
   config["max_debate_rounds"] = 1  # Instead of 2-3
   ```

3. **Use OpenRouter for Competitive Pricing**
   ```python
   config["llm_provider"] = "openrouter"
   ```

4. **Cache LLM Responses**
   ```python
   # Implemented in agent memory system
   memory.store_analysis(ticker, date, result)
   ```

5. **Use Ollama for Development**
   ```python
   config["llm_provider"] = "ollama"  # No API costs
   ```

## Embeddings

### Embedding Provider

TradingAgents uses OpenAI embeddings for vector storage (memory system):

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
```
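
Usage is a single call, for example embedding a past trading situation for similarity search:

```python
# text-embedding-3-small returns 1536-dimensional vectors
vector = embeddings.embed_query("NVDA beat earnings; guidance raised")
print(len(vector))  # 1536
```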

**Important**: Even when using non-OpenAI LLM providers (Anthropic, Google, etc.), `OPENAI_API_KEY` is still required for embeddings.

### Alternative Embedding Providers

For fully offline operation, consider:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```

Note: This requires updating the memory initialization code.

## Performance Considerations

### Latency

**Provider Latency (Approximate):**
- OpenAI: 1-3 seconds per request
- Anthropic: 1-2 seconds per request
- Google: 0.5-1.5 seconds per request
- OpenRouter: Varies by underlying model
- Ollama: 0.5-5 seconds (depends on local hardware)

### Throughput

**Concurrent Requests:**
- OpenAI: Tier-based limits (20-5000 RPM)
- Anthropic: Tier-based limits (50-2000 RPM)
- OpenRouter: Model-specific limits
- Ollama: Limited by local GPU/CPU

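A simple way to stay under these per-minute limits is to cap concurrency on the client side. A sketch using LangChain's async interface (`ainvoke` is available on chat models; the limit of 5 is illustrative):

```python
import asyncio

async def bounded_invoke(llm, prompts, max_concurrency=5):
    # Cap in-flight requests so bursts stay under provider RPM limits
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with sem:
            return await llm.ainvoke(prompt)

    return await asyncio.gather(*(one(p) for p in prompts))

# results = asyncio.run(bounded_invoke(quick_llm, report_prompts))
```
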
### Caching

LangChain provides built-in caching:

```python
from langchain_community.cache import SQLiteCache  # `langchain.cache` in older LangChain versions
from langchain.globals import set_llm_cache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
```

## Best Practices

1. **Set API Keys as Environment Variables**: Never hardcode keys
2. **Use Two-Tier Model Strategy**: Deep/quick thinking separation
3. **Implement Error Handling**: Catch rate limits and retry
4. **Monitor Costs**: Track token usage and expenses (see the cost-tracking sketch after this list)
5. **Test with Cheaper Models**: Use o4-mini/gpt-4o-mini for development
6. **Cache When Possible**: Avoid redundant API calls
7. **Use OpenRouter for Flexibility**: Easy switching between providers
8. **Implement Timeouts**: Prevent hanging requests (see the timeout sketch after this list)
9. **Log API Usage**: Track which models are called
10. **Consider Local Models**: Ollama for sensitive data or development

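For timeouts (practice 8), the LangChain chat model constructors accept per-request limits; a sketch with `langchain_openai` (the values are illustrative):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=60,      # seconds before a request is abandoned
    max_retries=2,   # bounded automatic retries
)
```

For cost monitoring (practice 4), LangChain's OpenAI callback aggregates token usage and cost for everything invoked inside its block:

```python
from langchain_community.callbacks import get_openai_callback

with get_openai_callback() as cb:
    llm.invoke("Summarize AAPL's last earnings call.")
print(cb.total_tokens, cb.total_cost)
```
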
## References

- [Multi-Agent System](multi-agent-system.md)
- [Configuration Guide](../guides/configuration.md)
- [Adding LLM Provider Guide](../guides/adding-llm-provider.md)
- [TradingGraph API](../api/trading-graph.md)