# LLM Integration Architecture
This document describes how TradingAgents integrates with different Large Language Model (LLM) providers through a unified abstraction layer.
## Overview
TradingAgents supports multiple LLM providers through a flexible configuration system that allows switching between providers without code changes.
## Supported Providers
### OpenAI
- **Models**: GPT-4o, GPT-4o-mini, o4-mini (default), o1-preview
- **Strengths**: Strong reasoning, reliable, extensive fine-tuning support
- **Use Case**: Default choice for production
- **API Key**: `OPENAI_API_KEY`
- **Endpoint**: `https://api.openai.com/v1`
### Anthropic
- **Models**: Claude Sonnet 4, Claude Opus 4
- **Strengths**: Strong reasoning, long context windows, excellent instruction following
- **Use Case**: Alternative to OpenAI, good for complex analysis
- **API Key**: `ANTHROPIC_API_KEY`
- **Endpoint**: `https://api.anthropic.com`
### OpenRouter
- **Models**: Unified access to 100+ models from multiple providers
- **Strengths**: Single API for multiple providers, competitive pricing
- **Use Case**: Flexibility, cost optimization, accessing diverse models
- **API Key**: `OPENROUTER_API_KEY` (plus `OPENAI_API_KEY` for embeddings)
- **Endpoint**: `https://openrouter.ai/api/v1`
### Google Generative AI
- **Models**: Gemini 2.0 Flash, Gemini Pro
- **Strengths**: Fast inference, multimodal capabilities
- **Use Case**: Cost-effective alternative, multimodal analysis
- **API Key**: `GOOGLE_API_KEY`
- **Endpoint**: Built-in (no custom endpoint)
### Ollama
- **Models**: Local models (Llama, Mistral, etc.)
- **Strengths**: No API costs, data privacy, offline operation
- **Use Case**: Development, experimentation, privacy-sensitive analysis
- **API Key**: None (local)
- **Endpoint**: `http://localhost:11434/v1`
## Provider Abstraction
### Configuration-Driven Selection
LLM providers are selected through configuration:
```python
config = {
    "llm_provider": "openai",          # Provider selection
    "deep_think_llm": "o4-mini",       # Model for complex reasoning
    "quick_think_llm": "gpt-4o-mini",  # Model for fast tasks
    "backend_url": "https://api.openai.com/v1",
}
```
### Initialization Logic
The `TradingAgentsGraph` class handles provider initialization:
```python
import os  # needed for os.getenv below

if config["llm_provider"].lower() in ("openai", "ollama"):
    from langchain_openai import ChatOpenAI
    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
    )
elif config["llm_provider"].lower() == "anthropic":
    from langchain_anthropic import ChatAnthropic
    self.deep_thinking_llm = ChatAnthropic(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
    )
    self.quick_thinking_llm = ChatAnthropic(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
    )
elif config["llm_provider"].lower() == "openrouter":
    from langchain_openai import ChatOpenAI
    openrouter_key = os.getenv("OPENROUTER_API_KEY")
    if not openrouter_key:
        raise ValueError("OPENROUTER_API_KEY required")
    default_headers = {
        "HTTP-Referer": "https://github.com/TauricResearch/TradingAgents",
        "X-Title": "TradingAgents",
    }
    self.deep_thinking_llm = ChatOpenAI(
        model=config["deep_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers,
    )
    self.quick_thinking_llm = ChatOpenAI(
        model=config["quick_think_llm"],
        base_url=config["backend_url"],
        api_key=openrouter_key,
        default_headers=default_headers,
    )
elif config["llm_provider"].lower() == "google":
    from langchain_google_genai import ChatGoogleGenerativeAI
    self.deep_thinking_llm = ChatGoogleGenerativeAI(
        model=config["deep_think_llm"]
    )
    self.quick_thinking_llm = ChatGoogleGenerativeAI(
        model=config["quick_think_llm"]
    )
```
Location: `tradingagents/graph/trading_graph.py`
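The per-provider requirements above (which providers need `backend_url`, which keys every config must carry) can be checked up front before any SDK import. A minimal validation sketch; the `validate_llm_config` helper is hypothetical, not part of the codebase:

```python
REQUIRED_KEYS = {"llm_provider", "deep_think_llm", "quick_think_llm"}
URL_PROVIDERS = {"openai", "ollama", "anthropic", "openrouter"}  # need backend_url

def validate_llm_config(config: dict) -> str:
    """Return the normalized provider name, or raise ValueError on a bad config."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    provider = config["llm_provider"].lower()
    if provider not in URL_PROVIDERS | {"google"}:
        raise ValueError(f"unknown llm_provider: {provider}")
    if provider in URL_PROVIDERS and not config.get("backend_url"):
        raise ValueError(f"'{provider}' requires 'backend_url'")
    return provider
```

Failing fast here produces a clearer error than a connection failure deep inside a LangChain call.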
## Model Selection Strategy
### Two-Tier Model Approach
TradingAgents uses two types of LLMs for different tasks:
#### Deep Thinking LLM
- **Purpose**: Complex reasoning, strategic analysis, debate moderation
- **Characteristics**: Larger models, slower, more expensive, higher quality
- **Use Cases**:
- Researcher debate moderation
- Trading decision synthesis
- Risk assessment evaluation
- **Recommended Models**:
- OpenAI: o4-mini, o1-preview
- Anthropic: claude-sonnet-4, claude-opus-4
- OpenRouter: anthropic/claude-sonnet-4.5
#### Quick Thinking LLM
- **Purpose**: Fast analysis, data summarization, routine tasks
- **Characteristics**: Smaller models, faster, cost-effective
- **Use Cases**:
- Analyst report generation
- Data interpretation
- Tool calling
- **Recommended Models**:
- OpenAI: gpt-4o-mini, gpt-4o
- Anthropic: claude-sonnet-4
- OpenRouter: openai/gpt-4o-mini
### Model Selection Guidelines
**For Production:**
```python
config["deep_think_llm"] = "o1-preview" # Best reasoning
config["quick_think_llm"] = "gpt-4o-mini" # Cost-effective
```
**For Development/Testing:**
```python
config["deep_think_llm"] = "o4-mini" # Fast and cheaper
config["quick_think_llm"] = "gpt-4o-mini" # Consistent quality
```
**For Cost Optimization:**
```python
config["llm_provider"] = "openrouter"
config["deep_think_llm"] = "anthropic/claude-sonnet-4.5"
config["quick_think_llm"] = "openai/gpt-4o-mini"
```
## Provider-Specific Configuration
### OpenAI Configuration
```python
config = {
    "llm_provider": "openai",
    "deep_think_llm": "o4-mini",
    "quick_think_llm": "gpt-4o-mini",
    "backend_url": "https://api.openai.com/v1",
}
```
Environment:
```bash
export OPENAI_API_KEY=sk-your_key_here
```
### Anthropic Configuration
```python
config = {
    "llm_provider": "anthropic",
    "deep_think_llm": "claude-sonnet-4-20250514",
    "quick_think_llm": "claude-sonnet-4-20250514",
    "backend_url": "https://api.anthropic.com",
}
```
Environment:
```bash
export ANTHROPIC_API_KEY=sk-ant-your_key_here
```
### OpenRouter Configuration
```python
config = {
    "llm_provider": "openrouter",
    "deep_think_llm": "anthropic/claude-sonnet-4.5",
    "quick_think_llm": "openai/gpt-4o-mini",
    "backend_url": "https://openrouter.ai/api/v1",
}
```
Environment:
```bash
export OPENROUTER_API_KEY=sk-or-v1-your_key_here
export OPENAI_API_KEY=sk-your_key_here # Required for embeddings
```
**Note**: OpenRouter uses `provider/model-name` format:
- `anthropic/claude-sonnet-4.5`
- `openai/gpt-4o`
- `google/gemini-pro`
### Google Generative AI Configuration
```python
config = {
    "llm_provider": "google",
    "deep_think_llm": "gemini-2.0-flash",
    "quick_think_llm": "gemini-2.0-flash",
}
```
Environment:
```bash
export GOOGLE_API_KEY=your_key_here
```
### Ollama Configuration
```python
config = {
    "llm_provider": "ollama",
    "deep_think_llm": "mistral",
    "quick_think_llm": "mistral",
    "backend_url": "http://localhost:11434/v1",
}
```
Prerequisites:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull model
ollama pull mistral
# Start Ollama server
ollama serve
```
## Error Handling
### Rate Limit Handling
Unified rate limit error handling across providers:
```python
from tradingagents.utils.exceptions import LLMRateLimitError
try:
    response = llm.invoke(messages)
except LLMRateLimitError as e:
    print(f"Rate limit hit: {e.message}")
    if e.retry_after:
        print(f"Retry after {e.retry_after} seconds")
```
Location: `tradingagents/utils/exceptions.py`
### Provider-Specific Errors
Each provider may raise different errors:
**OpenAI:**
- `RateLimitError` → Retry after specified time
- `InvalidRequestError` → Check model name, parameters
- `AuthenticationError` → Verify API key
**Anthropic:**
- `RateLimitError` → Retry with backoff
- `InvalidRequestError` → Check message format
- `APIError` → Server-side issues
**OpenRouter:**
- Follows OpenAI error format
- Additional headers required for attribution
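Whatever the provider, the retryable cases above can be handled by one generic retry-with-backoff wrapper. A standard-library-only sketch (in real use you would pass `LLMRateLimitError` as the retryable type rather than the broad default shown here):

```python
import random
import time

def with_retries(call, *, max_attempts=4, base_delay=1.0, retryable=(Exception,)):
    """Call `call()`, retrying with exponential backoff and jitter on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # delays of base, 2*base, 4*base, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Usage would look like `with_retries(lambda: llm.invoke(messages), retryable=(LLMRateLimitError,))`.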
### Fallback Strategy
Implement provider fallback for resilience:
```python
providers = ["openai", "anthropic", "openrouter"]

for provider in providers:
    try:
        # Note: backend_url and model names must also be switched to
        # values valid for each provider, not just llm_provider itself.
        config["llm_provider"] = provider
        ta = TradingAgentsGraph(config=config)
        result = ta.propagate(ticker, date)
        break
    except LLMRateLimitError:
        continue
else:
    raise RuntimeError("All providers hit rate limits")
```
## Cost Optimization
### Model Cost Comparison
**Deep Thinking Tasks:**
| Provider | Model | Cost/1M Tokens (Input/Output) |
|----------|-------|-------------------------------|
| OpenAI | o4-mini | $1.50 / $6.00 |
| OpenAI | o1-preview | $15.00 / $60.00 |
| Anthropic | claude-sonnet-4 | $3.00 / $15.00 |
| OpenRouter | Varies by model | Check OpenRouter pricing |
**Quick Thinking Tasks:**
| Provider | Model | Cost/1M Tokens (Input/Output) |
|----------|-------|-------------------------------|
| OpenAI | gpt-4o-mini | $0.15 / $0.60 |
| OpenAI | gpt-4o | $2.50 / $10.00 |
| Google | gemini-2.0-flash | Free tier available |
| Ollama | Local models | Free (local) |
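The rates above translate directly into a per-call cost estimate. A small sketch (prices copied from the tables; re-check current provider pricing before relying on these numbers):

```python
# Prices per 1M tokens as (input_rate, output_rate) in USD, from the tables above.
PRICES = {
    "o4-mini": (1.50, 6.00),
    "o1-preview": (15.00, 60.00),
    "claude-sonnet-4": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at the per-million-token rates above."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a run with 500k input and 100k output tokens on o4-mini costs about $1.35, versus roughly $13.50 on o1-preview.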
### Cost Reduction Strategies
1. **Use Smaller Models for Simple Tasks**
```python
config["quick_think_llm"] = "gpt-4o-mini" # Instead of gpt-4o
```
2. **Reduce Debate Rounds**
```python
config["max_debate_rounds"] = 1 # Instead of 2-3
```
3. **Use OpenRouter for Competitive Pricing**
```python
config["llm_provider"] = "openrouter"
```
4. **Cache LLM Responses**
```python
# Implemented in agent memory system
memory.store_analysis(ticker, date, result)
```
5. **Use Ollama for Development**
```python
config["llm_provider"] = "ollama" # No API costs
```
## Embeddings
### Embedding Provider
TradingAgents uses OpenAI embeddings for vector storage (memory system):
```python
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
```
**Important**: Even when using non-OpenAI LLM providers (Anthropic, Google, etc.), `OPENAI_API_KEY` is still required for embeddings.
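A fail-fast check at startup turns this requirement into a clear error instead of a confusing failure deep in the memory system. A sketch; the `require_embedding_key` helper is hypothetical:

```python
import os

def require_embedding_key(env=os.environ):
    """Fail fast if the key the memory system's embeddings need is absent."""
    if not env.get("OPENAI_API_KEY"):
        raise EnvironmentError(
            "OPENAI_API_KEY is required for embeddings, even when the chat "
            "LLM provider is Anthropic, Google, or OpenRouter"
        )
```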
### Alternative Embedding Providers
For fully offline operation, consider:
```python
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```
Note: This requires updating the memory initialization code.
## Performance Considerations
### Latency
**Provider Latency (Approximate):**
- OpenAI: 1-3 seconds per request
- Anthropic: 1-2 seconds per request
- Google: 0.5-1.5 seconds per request
- OpenRouter: Varies by underlying model
- Ollama: 0.5-5 seconds (depends on local hardware)
### Throughput
**Concurrent Requests:**
- OpenAI: Tier-based limits (20-5000 RPM)
- Anthropic: Tier-based limits (50-2000 RPM)
- OpenRouter: Model-specific limits
- Ollama: Limited by local GPU/CPU
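On the client side, a semaphore keeps in-flight requests under the provider's concurrency tier. A thread-based sketch (the limit of 4 and the doubling task stand in for real LLM calls):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

limiter = threading.Semaphore(4)  # cap in-flight LLM requests at 4

def call_with_limit(fn, *args, **kwargs):
    """Run fn under the shared concurrency cap."""
    with limiter:
        return fn(*args, **kwargs)

# Submit many tasks; at most 4 execute "against the API" at once.
with ThreadPoolExecutor(max_workers=16) as pool:
    futures = [pool.submit(call_with_limit, lambda i=i: i * 2) for i in range(10)]
    results = [f.result() for f in futures]
```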
### Caching
LangChain provides built-in caching:
```python
from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
```
## Best Practices
1. **Set API Keys as Environment Variables**: Never hardcode keys
2. **Use Two-Tier Model Strategy**: Deep/quick thinking separation
3. **Implement Error Handling**: Catch rate limits and retry
4. **Monitor Costs**: Track token usage and expenses
5. **Test with Cheaper Models**: Use o4-mini/gpt-4o-mini for development
6. **Cache When Possible**: Avoid redundant API calls
7. **Use OpenRouter for Flexibility**: Easy switching between providers
8. **Implement Timeouts**: Prevent hanging requests
9. **Log API Usage**: Track which models are called
10. **Consider Local Models**: Ollama for sensitive data or development
## References
- [Multi-Agent System](multi-agent-system.md)
- [Configuration Guide](../guides/configuration.md)
- [Adding LLM Provider Guide](../guides/adding-llm-provider.md)
- [TradingGraph API](../api/trading-graph.md)