# Embedding Provider Separation - Implementation Summary
## Overview
This document summarizes the changes made to separate embedding configuration from chat model configuration in the TradingAgents framework.
## Branch Information
- **Branch Name**: `feature/separate-embedding-client`
- **Base Branch**: `main`
- **Status**: Ready for review/merge
## Problem Statement
Previously, the TradingAgents memory system used the same `backend_url` for both chat models and embeddings. This caused critical failures when:
1. Using **OpenRouter** for chat (OpenRouter does not expose an OpenAI-compatible embeddings endpoint)
2. Using **Anthropic/Google** for chat (neither serves embeddings through the chat `backend_url`)
3. The embedding endpoint returned HTML error pages instead of JSON
4. Users wanted to mix providers (e.g., OpenRouter for chat, OpenAI for embeddings)
**Example Error**:
```python
AttributeError: 'str' object has no attribute 'data'
# Caused by: OpenRouter returned an HTML page instead of embedding JSON
```
## Solution
Implemented a comprehensive separation of embedding and chat model configurations with three key features:
### 1. Separate Embedding Client Configuration
New configuration parameters independent of chat LLM settings:
```python
config = {
    # Chat LLM settings
    "llm_provider": "openrouter",
    "backend_url": "https://openrouter.ai/api/v1",

    # NEW: Separate embedding settings
    "embedding_provider": "openai",
    "embedding_backend_url": "https://api.openai.com/v1",
    "embedding_model": "text-embedding-3-small",
    "enable_memory": True,
}
```
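A config like this is passed straight to the graph constructor (as the integration tests later in this document do). A minimal sketch, assuming `tradingagents/default_config.py` exposes a `DEFAULT_CONFIG` dict to merge overrides into:

```python
from tradingagents.default_config import DEFAULT_CONFIG  # assumed export name
from tradingagents.graph.trading_graph import TradingAgentsGraph

# Hypothetical pattern: overlay the embedding overrides on the framework defaults.
full_config = {**DEFAULT_CONFIG, **config}
graph = TradingAgentsGraph(["market"], config=full_config)
```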
### 2. Multiple Provider Support
- **OpenAI**: Production-grade embeddings (recommended)
- **Ollama**: Local embeddings for offline/development use
- **None**: Disable memory system entirely
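For illustration, a fully local setup and a memory-off setup might look like the sketch below; the `nomic-embed-text` model name is taken from the manual testing scenarios later in this document, and the exact values are assumptions rather than mandated defaults:

```python
# Hypothetical all-local configuration: Ollama for both chat and embeddings.
local_config = {
    "llm_provider": "ollama",
    "backend_url": "http://localhost:11434/v1",
    "embedding_provider": "ollama",
    "embedding_backend_url": "http://localhost:11434/v1",
    "embedding_model": "nomic-embed-text",
    "enable_memory": True,
}

# Turning the memory system off entirely (same keys the unit tests use).
no_memory_config = {
    "embedding_provider": "none",
    "enable_memory": False,
}
```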
### 3. Graceful Fallback
- System continues to operate when embeddings fail
- Comprehensive error logging
- Memory operations return empty results instead of crashing
- Agents function without historical context when memory is disabled
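The upshot for callers: no try/except is needed around memory lookups. A sketch of the consumer-side contract, using the module path and method names listed under Files Modified (the situation string is illustrative):

```python
from tradingagents.agents.utils.memory import FinancialSituationMemory

memory = FinancialSituationMemory("bull_memory", config)

# Returns [] when memory is disabled or the embedding backend fails,
# so this call is safe regardless of provider health.
past_lessons = memory.get_memories("Rates rising while megacap tech sells off")
if not past_lessons:
    pass  # agents simply proceed without historical context
```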
## Files Modified
### Core Framework
1. **`tradingagents/default_config.py`**
- Added 4 new configuration parameters for embeddings (`embedding_provider`, `embedding_backend_url`, `embedding_model`, `enable_memory`)
- Maintains backward compatibility with existing configs
2. **`tradingagents/agents/utils/memory.py`**
- Complete refactor of `FinancialSituationMemory` class
- Added provider-specific initialization logic
- Implemented graceful error handling
- Added `is_enabled()` method
- Added comprehensive logging
- All methods now return safe defaults on failure
3. **`tradingagents/graph/trading_graph.py`**
- Added `_configure_embeddings()` method for smart defaults
- Separated chat LLM initialization from embedding setup
- Added memory status logging
- Updated `reflect_and_remember()` to respect memory settings
### CLI/User Interface
4. **`cli/utils.py`**
- Added `select_embedding_provider()` function
- Returns tuple: (provider, backend_url, model)
- Interactive selection with clear descriptions
- Code formatting improvements
5. **`cli/main.py`**
- Added Step 7: Embedding Provider selection
- Updated `get_user_selections()` to include embedding settings
- Updated `run_analysis()` to configure embedding from user selections
- Improved formatting and code style consistency
### Documentation
6. **`docs/EMBEDDING_CONFIGURATION.md`** (NEW)
- Comprehensive guide for embedding configuration
- Common scenarios and examples
- Troubleshooting section
- API reference
- Migration guide
7. **`docs/EMBEDDING_MIGRATION.md`** (THIS FILE)
- Implementation summary
- Technical details
- Testing recommendations
## Technical Details
### Configuration Priority
The system applies configuration in this order:
1. **Explicit user configuration** (highest priority)
2. **Provider-specific defaults**
3. **Fallback defaults** (lowest priority)
Example logic:
```python
def _configure_embeddings(self):
    if "embedding_provider" not in self.config:
        self.config["embedding_provider"] = "openai"  # Safe default
    if "embedding_backend_url" not in self.config:
        if self.config["embedding_provider"] == "ollama":
            self.config["embedding_backend_url"] = "http://localhost:11434/v1"
        else:
            self.config["embedding_backend_url"] = "https://api.openai.com/v1"
```
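Concretely, a user who configures only the chat side still ends up with working embedding defaults; the resolution below follows directly from the logic above:

```python
# Input: chat-only config with no embedding keys set.
config = {
    "llm_provider": "openrouter",
    "backend_url": "https://openrouter.ai/api/v1",
}

# After _configure_embeddings() runs, the effective config also contains:
#   "embedding_provider":    "openai"                     (safe default)
#   "embedding_backend_url": "https://api.openai.com/v1"  (non-Ollama branch)
```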
### Error Handling Strategy
Memory system implements defensive programming:
```python
def get_embedding(self, text: str) -> Optional[List[float]]:
    if not self.enabled or not self.client:
        return None  # Safe fallback
    try:
        response = self.client.embeddings.create(...)
        return response.data[0].embedding
    except Exception as e:
        logger.error(f"Failed to get embedding: {e}")
        return None  # Never crash, return None
```
All callers handle `None` gracefully:
```python
def add_situations(...):
    for situation in situations:
        embedding = self.get_embedding(situation)
        if embedding is None:
            logger.warning("Skipping situation due to embedding failure")
            continue  # Skip this item, process others
```
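Put together, a hypothetical round trip through the memory API looks like this; the assumption that `add_situations` accepts an iterable of situation strings comes from the loop shown above, not from a documented signature:

```python
memory = FinancialSituationMemory("risk_memory", config)

# Items whose embeddings fail are skipped, per the loop above.
memory.add_situations([
    "Fed signals higher-for-longer rates",
    "Earnings beat but guidance cut",
])

# Similar past situations, or [] if memory is disabled or degraded.
matches = memory.get_memories("Hawkish Fed commentary ahead of earnings")
```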
### Backward Compatibility
Existing configurations continue to work without modification:
**Old config** (still works):
```python
config = {
    "llm_provider": "openai",
    "backend_url": "https://api.openai.com/v1",
}
# Embeddings auto-configured to use OpenAI
```
**New config** (explicit control):
```python
config = {
    "llm_provider": "openrouter",
    "backend_url": "https://openrouter.ai/api/v1",
    "embedding_provider": "openai",
    "embedding_backend_url": "https://api.openai.com/v1",
}
# Full control over both chat and embeddings
```
## Testing Recommendations
### Unit Tests
```python
# Test memory initialization with different providers
def test_memory_openai_provider():
    config = {
        "embedding_provider": "openai",
        "embedding_backend_url": "https://api.openai.com/v1",
        "enable_memory": True,
    }
    memory = FinancialSituationMemory("test", config)
    assert memory.is_enabled()


def test_memory_disabled():
    config = {"embedding_provider": "none", "enable_memory": False}
    memory = FinancialSituationMemory("test", config)
    assert not memory.is_enabled()
    assert memory.get_memories("test") == []


def test_memory_graceful_failure():
    config = {
        "embedding_provider": "openai",
        "embedding_backend_url": "https://invalid-url.example/v1",
        "enable_memory": True,
    }
    memory = FinancialSituationMemory("test", config)
    # Should disable itself on connection failure
    result = memory.get_memories("test")
    assert result == []
```
### Integration Tests
```python
# Test full graph with different configurations
def test_graph_with_openrouter_and_openai_embeddings():
    config = {
        "llm_provider": "openrouter",
        "backend_url": "https://openrouter.ai/api/v1",
        "embedding_provider": "openai",
        "embedding_backend_url": "https://api.openai.com/v1",
    }
    graph = TradingAgentsGraph(["market"], config=config)
    # Should initialize without errors
    assert graph.bull_memory.is_enabled()


def test_graph_with_disabled_memory():
    config = {
        "llm_provider": "openai",
        "backend_url": "https://api.openai.com/v1",
        "enable_memory": False,
    }
    graph = TradingAgentsGraph(["market"], config=config)
    # Should work without memory
    assert not graph.bull_memory.is_enabled()
```
### Manual Testing Scenarios
1. **OpenRouter + OpenAI embeddings**

   ```bash
   export OPENROUTER_API_KEY="sk-or-..."
   export OPENAI_API_KEY="sk-..."
   python -m cli.main
   # Select OpenRouter for chat, OpenAI for embeddings
   ```

2. **All Ollama (local)**

   ```bash
   ollama pull llama3.1
   ollama pull nomic-embed-text
   python -m cli.main
   # Select Ollama for both chat and embeddings
   ```

3. **Disabled memory**

   ```bash
   python -m cli.main
   # Select any chat provider, disable memory
   # Verify agents work without errors
   ```
## Breaking Changes
**None** - This is a backward-compatible enhancement.
Existing code continues to work without modification. New features are opt-in.
## Dependencies
No new dependencies added. Uses existing packages:
- `openai` (already required)
- `chromadb` (already required)
## Performance Impact
- **Minimal**: Embedding initialization is a one-time cost
- **Memory**: No additional memory overhead when disabled
- **Latency**: No impact on chat model latency
- **Cost**: Allows optimization by choosing cheaper embedding providers
## Security Considerations
- API keys for different providers should be stored separately
- Follow least-privilege principle: use separate keys for chat vs embeddings
- Situation text is sent to the configured embedding provider (confirm this meets your compliance requirements)
Example `.env`:
```bash
# Separate keys for different services
OPENAI_API_KEY="sk-..." # For embeddings
OPENROUTER_API_KEY="sk-or-..." # For chat models
```
## Future Enhancements
Potential improvements for future versions:
1. **Additional embedding providers**:
- HuggingFace embeddings
- Cohere embeddings
- Azure OpenAI embeddings
2. **Embedding caching** (a sketch follows this list):
- Cache embeddings to disk
- Reduce API calls for repeated situations
3. **Embedding fine-tuning**:
- Support for custom fine-tuned embedding models
- Domain-specific financial embeddings
4. **Async embeddings**:
- Batch embedding requests
- Parallel processing for large memory operations
5. **Embedding quality metrics**:
- Track similarity score distributions
- Alert on low-quality matches
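To make enhancement 2 concrete, here is a minimal sketch of disk-backed caching keyed by a hash of the input text; the cache location, key scheme, and helper name are all illustrative, not part of the current implementation:

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List, Optional

CACHE_DIR = Path(".embedding_cache")  # hypothetical location

def cached_embedding(
    text: str, get_embedding: Callable[[str], Optional[List[float]]]
) -> Optional[List[float]]:
    """Return a cached embedding if present; otherwise compute and store it."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    embedding = get_embedding(text)  # e.g. FinancialSituationMemory.get_embedding
    if embedding is not None:  # never cache failures
        path.write_text(json.dumps(embedding))
    return embedding
```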
## Migration Checklist
For users upgrading to this version:
- [ ] Review current configuration
- [ ] Identify chat provider (OpenRouter, Anthropic, etc.)
- [ ] Decide on embedding strategy:
- [ ] Use OpenAI for embeddings (recommended)
- [ ] Use Ollama for local embeddings
- [ ] Disable memory if not needed
- [ ] Update `.env` with necessary API keys
- [ ] Test configuration in development
- [ ] Monitor logs for embedding-related warnings
- [ ] Verify memory is working as expected
## Rollback Plan
If issues arise:
1. **Immediate**: Set `enable_memory: False` to disable embeddings (see the snippet after this list)
2. **Code**: Remove embedding-specific config, system uses defaults
3. **Branch**: Revert to previous commit before this feature
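For step 1, the kill switch is the single flag used throughout this document:

```python
config["enable_memory"] = False  # memory lookups now return empty results
```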
## Support
For questions or issues:
1. Check `docs/EMBEDDING_CONFIGURATION.md` for detailed guide
2. Review error logs for specific failure messages
3. Try with `enable_memory: False` to isolate issue
4. Open GitHub issue with:
- Configuration used
- Error messages/logs
- Provider information
## Conclusion
This implementation successfully addresses the embedding/chat provider separation issue while maintaining backward compatibility and adding robust error handling. The system now supports flexible provider configurations and gracefully handles failures.
**Key Achievements**:
- ✅ Separate embedding and chat configurations
- ✅ Multiple embedding provider support
- ✅ Graceful degradation on failures
- ✅ Backward compatible
- ✅ Comprehensive documentation
- ✅ CLI integration
- ✅ Zero new dependencies