# Embedding Provider Separation - Implementation Summary

## Overview
This document summarizes the changes made to separate embedding configuration from chat model configuration in the TradingAgents framework.
## Branch Information

- **Branch Name:** `feature/separate-embedding-client`
- **Base Branch:** `main`
- **Status:** Ready for review/merge

## Problem Statement
Previously, the TradingAgents memory system used the same backend_url for both chat models and embeddings. This caused critical failures when:
- Using OpenRouter for chat (doesn't support OpenAI embedding endpoints)
- Using Anthropic/Google for chat (don't provide embeddings)
- The embedding endpoint returned HTML error pages instead of JSON
- Users wanted to mix providers (e.g., OpenRouter for chat, OpenAI for embeddings)
**Example error:**

```
AttributeError: 'str' object has no attribute 'data'
# Caused by: OpenRouter returned HTML page instead of embedding JSON
```

## Solution
Implemented a comprehensive separation of embedding and chat model configurations with three key features:
### 1. Separate Embedding Client Configuration
New configuration parameters independent of chat LLM settings:
```python
config = {
    # Chat LLM settings
    "llm_provider": "openrouter",
    "backend_url": "https://openrouter.ai/api/v1",

    # NEW: Separate embedding settings
    "embedding_provider": "openai",
    "embedding_backend_url": "https://api.openai.com/v1",
    "embedding_model": "text-embedding-3-small",
    "enable_memory": True,
}
```
### 2. Multiple Provider Support

- **OpenAI:** Production-grade embeddings (recommended)
- **Ollama:** Local embeddings for offline/development use
- **None:** Disable memory system entirely
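The three options above map to config fragments like the following. These are illustrative sketches based on the key names introduced in this document; the exact model names and URLs are the defaults discussed elsewhere in this guide, not hard requirements.

```python
# OpenAI (recommended for production)
openai_cfg = {
    "embedding_provider": "openai",
    "embedding_backend_url": "https://api.openai.com/v1",
    "embedding_model": "text-embedding-3-small",
    "enable_memory": True,
}

# Ollama (local/offline development)
ollama_cfg = {
    "embedding_provider": "ollama",
    "embedding_backend_url": "http://localhost:11434/v1",
    "embedding_model": "nomic-embed-text",
    "enable_memory": True,
}

# None (memory system disabled entirely)
disabled_cfg = {
    "embedding_provider": "none",
    "enable_memory": False,
}
```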
### 3. Graceful Fallback
- System continues to operate when embeddings fail
- Comprehensive error logging
- Memory operations return empty results instead of crashing
- Agents function without historical context when memory is disabled
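The degradation contract implied by these bullets can be sketched as below. `StubMemory` stands in for the real `FinancialSituationMemory`, and the `build_context` helper and `"recommendation"` field are illustrative, not framework API:

```python
class StubMemory:
    """Stand-in for FinancialSituationMemory to show the fallback contract."""

    def __init__(self, enabled: bool):
        self._enabled = enabled

    def is_enabled(self) -> bool:
        return self._enabled

    def get_memories(self, situation: str) -> list:
        # Real implementation queries the vector store; returns [] when disabled
        return [{"recommendation": "hold"}] if self._enabled else []


def build_context(memory: StubMemory, situation: str) -> str:
    if not memory.is_enabled():
        return ""  # Agent proceeds without historical context
    matches = memory.get_memories(situation)
    return "\n".join(m["recommendation"] for m in matches)
```

With memory disabled the agent simply receives an empty context string instead of crashing.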
## Files Modified

### Core Framework
- `tradingagents/default_config.py`
  - Added 4 new configuration parameters for embeddings
  - Maintains backward compatibility with existing configs
- `tradingagents/agents/utils/memory.py`
  - Complete refactor of the `FinancialSituationMemory` class
  - Added provider-specific initialization logic
  - Implemented graceful error handling
  - Added `is_enabled()` method
  - Added comprehensive logging
  - All methods now return safe defaults on failure
- `tradingagents/graph/trading_graph.py`
  - Added `_configure_embeddings()` method for smart defaults
  - Separated chat LLM initialization from embedding setup
  - Added memory status logging
  - Updated `reflect_and_remember()` to respect memory settings
### CLI/User Interface

- `cli/utils.py`
  - Added `select_embedding_provider()` function
    - Returns tuple: `(provider, backend_url, model)`
    - Interactive selection with clear descriptions
  - Code formatting improvements
- `cli/main.py`
  - Added Step 7: Embedding Provider selection
  - Updated `get_user_selections()` to include embedding settings
  - Updated `run_analysis()` to configure embeddings from user selections
  - Improved formatting and code style consistency
### Documentation

- `docs/EMBEDDING_CONFIGURATION.md` (NEW)
  - Comprehensive guide for embedding configuration
  - Common scenarios and examples
  - Troubleshooting section
  - API reference
  - Migration guide
- `docs/EMBEDDING_MIGRATION.md` (THIS FILE)
  - Implementation summary
  - Technical details
  - Testing recommendations
## Technical Details

### Configuration Priority
The system applies configuration in this order:
1. Explicit user configuration (highest priority)
2. Provider-specific defaults
3. Fallback defaults (lowest priority)
Example logic:

```python
def _configure_embeddings(self):
    if "embedding_provider" not in self.config:
        self.config["embedding_provider"] = "openai"  # Safe default
    if "embedding_backend_url" not in self.config:
        if self.config["embedding_provider"] == "ollama":
            self.config["embedding_backend_url"] = "http://localhost:11434/v1"
        else:
            self.config["embedding_backend_url"] = "https://api.openai.com/v1"
```
### Error Handling Strategy
Memory system implements defensive programming:
```python
def get_embedding(self, text: str) -> Optional[List[float]]:
    if not self.enabled or not self.client:
        return None  # Safe fallback
    try:
        response = self.client.embeddings.create(...)
        return response.data[0].embedding
    except Exception as e:
        logger.error(f"Failed to get embedding: {e}")
        return None  # Never crash, return None
```
All callers handle None gracefully:
```python
def add_situations(...):
    for situation in situations:
        embedding = self.get_embedding(situation)
        if embedding is None:
            logger.warning("Skipping situation due to embedding failure")
            continue  # Skip this item, process others
```
### Backward Compatibility
Existing configurations continue to work without modification:
**Old config (still works):**

```python
config = {
    "llm_provider": "openai",
    "backend_url": "https://api.openai.com/v1",
}
# Embeddings auto-configured to use OpenAI
```
**New config (explicit control):**

```python
config = {
    "llm_provider": "openrouter",
    "backend_url": "https://openrouter.ai/api/v1",
    "embedding_provider": "openai",
    "embedding_backend_url": "https://api.openai.com/v1",
}
# Full control over both chat and embeddings
```
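The default-filling behaviour that keeps old configs working can be sketched as a free function mirroring the `_configure_embeddings()` logic shown earlier (the function name here is illustrative; the framework implements this as a method on the graph):

```python
def fill_embedding_defaults(config: dict) -> dict:
    """Return a copy of config with embedding defaults applied."""
    config = dict(config)  # don't mutate the caller's dict
    config.setdefault("embedding_provider", "openai")
    if "embedding_backend_url" not in config:
        if config["embedding_provider"] == "ollama":
            config["embedding_backend_url"] = "http://localhost:11434/v1"
        else:
            config["embedding_backend_url"] = "https://api.openai.com/v1"
    return config


# An old-style config gains the new keys without any user changes:
old = {"llm_provider": "openai", "backend_url": "https://api.openai.com/v1"}
resolved = fill_embedding_defaults(old)
```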
## Testing Recommendations

### Unit Tests
```python
# Test memory initialization with different providers
def test_memory_openai_provider():
    config = {
        "embedding_provider": "openai",
        "embedding_backend_url": "https://api.openai.com/v1",
        "enable_memory": True,
    }
    memory = FinancialSituationMemory("test", config)
    assert memory.is_enabled()


def test_memory_disabled():
    config = {"embedding_provider": "none", "enable_memory": False}
    memory = FinancialSituationMemory("test", config)
    assert not memory.is_enabled()
    assert memory.get_memories("test") == []


def test_memory_graceful_failure():
    config = {
        "embedding_provider": "openai",
        "embedding_backend_url": "https://invalid-url.example/v1",
        "enable_memory": True,
    }
    memory = FinancialSituationMemory("test", config)
    # Should disable itself on connection failure
    result = memory.get_memories("test")
    assert result == []
```
### Integration Tests
```python
# Test full graph with different configurations
def test_graph_with_openrouter_and_openai_embeddings():
    config = {
        "llm_provider": "openrouter",
        "backend_url": "https://openrouter.ai/api/v1",
        "embedding_provider": "openai",
        "embedding_backend_url": "https://api.openai.com/v1",
    }
    graph = TradingAgentsGraph(["market"], config=config)
    # Should initialize without errors
    assert graph.bull_memory.is_enabled()


def test_graph_with_disabled_memory():
    config = {
        "llm_provider": "openai",
        "backend_url": "https://api.openai.com/v1",
        "enable_memory": False,
    }
    graph = TradingAgentsGraph(["market"], config=config)
    # Should work without memory
    assert not graph.bull_memory.is_enabled()
```
### Manual Testing Scenarios
1. **OpenRouter + OpenAI embeddings**

   ```bash
   export OPENROUTER_API_KEY="sk-or-..."
   export OPENAI_API_KEY="sk-..."
   python -m cli.main
   # Select OpenRouter for chat, OpenAI for embeddings
   ```

2. **All Ollama (local)**

   ```bash
   ollama pull llama3.1
   ollama pull nomic-embed-text
   python -m cli.main
   # Select Ollama for both chat and embeddings
   ```

3. **Disabled memory**

   ```bash
   python -m cli.main
   # Select any chat provider, disable memory
   # Verify agents work without errors
   ```
## Breaking Changes

**None.** This is a backward-compatible enhancement.
Existing code continues to work without modification. New features are opt-in.
## Dependencies
No new dependencies added. Uses existing packages:
- `openai` (already required)
- `chromadb` (already required)
## Performance Impact
- Minimal: Embedding initialization is one-time cost
- Memory: No additional memory overhead when disabled
- Latency: No impact on chat model latency
- Cost: Allows optimization by choosing cheaper embedding providers
## Security Considerations
- API keys for different providers should be stored separately
- Follow least-privilege principle: use separate keys for chat vs embeddings
- Embedding data sent to configured provider (ensure compliance)
Example `.env`:

```bash
# Separate keys for different services
OPENAI_API_KEY="sk-..."        # For embeddings
OPENROUTER_API_KEY="sk-or-..." # For chat models
```
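A least-privilege lookup keeping chat and embedding credentials separate might look like the following. The env var names match the `.env` example above; the helper itself is illustrative and not part of the framework:

```python
import os
from typing import Optional

# Map (role, provider) to the env var holding its key; extend as needed.
KEY_ENV = {
    ("chat", "openrouter"): "OPENROUTER_API_KEY",
    ("chat", "openai"): "OPENAI_API_KEY",
    ("embedding", "openai"): "OPENAI_API_KEY",
}


def api_key(role: str, provider: str) -> Optional[str]:
    """Return the API key for a (role, provider) pair, or None (e.g. local Ollama)."""
    var = KEY_ENV.get((role, provider))
    return os.environ.get(var) if var else None
```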
## Future Enhancements
Potential improvements for future versions:
1. **Additional embedding providers**
   - HuggingFace embeddings
   - Cohere embeddings
   - Azure OpenAI embeddings
2. **Embedding caching**
   - Cache embeddings to disk
   - Reduce API calls for repeated situations
3. **Embedding fine-tuning**
   - Support for custom fine-tuned embedding models
   - Domain-specific financial embeddings
4. **Async embeddings**
   - Batch embedding requests
   - Parallel processing for large memory operations
5. **Embedding quality metrics**
   - Track similarity score distributions
   - Alert on low-quality matches
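The disk-cache idea floated above could be sketched as follows. Nothing here exists in the framework yet; the class and its layout (one JSON file per `(model, text)` hash) are hypothetical:

```python
import hashlib
import json
import os
from typing import List, Optional


class EmbeddingCache:
    """Hypothetical on-disk cache to avoid re-embedding repeated situations."""

    def __init__(self, path: str = ".embedding_cache"):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _file(self, text: str, model: str) -> str:
        # Key on model + text so switching models never returns stale vectors
        digest = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
        return os.path.join(self.path, digest + ".json")

    def get(self, text: str, model: str) -> Optional[List[float]]:
        try:
            with open(self._file(text, model)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None  # cache miss; caller falls through to the API

    def put(self, text: str, model: str, embedding: List[float]) -> None:
        with open(self._file(text, model), "w") as f:
            json.dump(embedding, f)
```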
## Migration Checklist
For users upgrading to this version:
- Review current configuration
- Identify chat provider (OpenRouter, Anthropic, etc.)
- Decide on embedding strategy:
  - Use OpenAI for embeddings (recommended)
  - Use Ollama for local embeddings
  - Disable memory if not needed
- Update `.env` with necessary API keys
- Test configuration in development
- Monitor logs for embedding-related warnings
- Verify memory is working as expected
## Rollback Plan
If issues arise:
- **Immediate:** Set `enable_memory: False` to disable embeddings
- **Code:** Remove embedding-specific config; the system falls back to defaults
- **Branch:** Revert to the commit before this feature
## Support
For questions or issues:
- Check `docs/EMBEDDING_CONFIGURATION.md` for the detailed guide
- Review error logs for specific failure messages
- Try with `enable_memory: False` to isolate the issue
- Open a GitHub issue with:
  - Configuration used
  - Error messages/logs
  - Provider information
## Conclusion
This implementation successfully addresses the embedding/chat provider separation issue while maintaining backward compatibility and adding robust error handling. The system now supports flexible provider configurations and gracefully handles failures.
**Key Achievements:**
- ✅ Separate embedding and chat configurations
- ✅ Multiple embedding provider support
- ✅ Graceful degradation on failures
- ✅ Backward compatible
- ✅ Comprehensive documentation
- ✅ CLI integration
- ✅ Zero new dependencies