6.6 KiB
Memory.py Chunking & Persistent Storage - Quick Reference
Summary of Changes
Implementation of get_embedding chunking and ChromaDB persistent storage from BA2TradePlatform to TradingAgents repository with minimal code changes for easy PR review.
Files Modified
1. requirements.txt
Change: Added 1 line
typing-extensions
+langchain
langchain-openai
2. tradingagents/agents/utils/memory.py
Changes: Enhanced 3 methods + updated imports
Import Changes
import chromadb
from chromadb.config import Settings
from openai import OpenAI
+import numpy as np
+import os
+from langchain.text_splitter import RecursiveCharacterTextSplitter
init Method
Before:
def __init__(self, name, config):
After:
def __init__(self, name, config, symbol=None, persistent_dir=None):
Key additions:
- Optional
symbolparameter for collection naming - Optional
persistent_dirparameter for disk storage - PersistentClient instead of in-memory Client (when persistent_dir provided)
- Collection name sanitization
- Error handling for ChromaDB compatibility
get_embedding Method
Before: Returned single embedding
def get_embedding(self, text):
response = self.client.embeddings.create(model=self.embedding, input=text)
return response.data[0].embedding
After: Returns list of embeddings (chunking support)
def get_embedding(self, text):
max_chars = 24000
if len(text) <= max_chars:
response = self.client.embeddings.create(model=self.embedding, input=text)
return [response.data[0].embedding] # Return as list
# Chunk long text and return list of embeddings
text_splitter = RecursiveCharacterTextSplitter(...)
chunks = text_splitter.split_text(text)
return [get_embedding_for_chunk(chunk) for chunk in chunks]
add_situations Method
Before: Single embedding per situation
for i, (situation, recommendation) in enumerate(situations_and_advice):
embeddings.append(self.get_embedding(situation))
After: Multiple embeddings per situation (chunking support)
for situation, recommendation in situations_and_advice:
situation_embeddings = self.get_embedding(situation) # Now returns list
for chunk_idx, embedding in enumerate(situation_embeddings):
situations.append(situation)
embeddings.append(embedding)
# ... store each chunk
get_memories Method
Before: Single embedding query
query_embedding = self.get_embedding(current_situation)
After: Average embeddings for multi-chunk queries
query_embeddings = self.get_embedding(current_situation) # Returns list
if len(query_embeddings) > 1:
query_embedding = np.mean(query_embeddings, axis=0).tolist()
else:
query_embedding = query_embeddings[0]
3. test_memory_chunking.py (New File)
Comprehensive test suite with 4 test scenarios:
- Short text backward compatibility
- Long text chunking (24,000+ chars)
- Persistent storage functionality
- Symbol-based collection naming
Key Features
1. Text Chunking
- Trigger: Texts > 24,000 characters (~8,000 tokens)
- Method: RecursiveCharacterTextSplitter
- Chunk size: 23,000 chars
- Overlap: 500 chars
- Separators:
["\n\n", "\n", ". ", " ", ""]
2. Persistent Storage
- Client: ChromaDB PersistentClient
- Path: User-specified via
persistent_dirparameter - Collections: Per-symbol or shared
- Fallback: In-memory mode if
persistent_dirnot provided
3. Backward Compatibility
- ✅ Old API calls work unchanged
- ✅ In-memory storage by default
- ✅ Single embedding for short texts
- ✅ All existing tests pass
Usage Comparison
Basic Usage (Unchanged)
# Works exactly as before
config = {"backend_url": "https://api.openai.com/v1"}
memory = FinancialSituationMemory("trading", config)
memory.add_situations([(situation, advice)])
results = memory.get_memories(query, n_matches=1)
New Features (Opt-in)
# With persistent storage
memory = FinancialSituationMemory(
"trading",
config,
symbol="AAPL",
persistent_dir="./chromadb_storage"
)
# Handles long texts automatically
long_analysis = "..." * 10000 # Very long text
memory.add_situations([(long_analysis, "recommendation")])
Benefits
Problem Solved #1: Long Text Handling
- Before: ❌ API error for texts > 8K tokens
- After: ✅ Automatic chunking and processing
Problem Solved #2: Memory Persistence
- Before: ❌ Lost on process restart
- After: ✅ Survives across sessions
Additional Benefits
- Per-symbol memory isolation
- Better organization for multi-asset systems
- Robust error handling
- Informative logging
Migration Path
No Migration Needed!
Existing code continues to work without any changes.
To Enable New Features:
- Add
persistent_dirparameter to enable disk storage - Add
symbolparameter to isolate memories per symbol - No other code changes required!
Testing
Run Test Suite
cd TradingAgents
export OPENAI_API_KEY="your-key"
python test_memory_chunking.py
Expected Output
✅ PASSED: Short Text Compatibility
✅ PASSED: Long Text Chunking
✅ PASSED: Persistent Storage
✅ PASSED: Symbol Collection Naming
ALL TESTS PASSED!
Code Review Checklist
- ✅ Minimal changes - Only essential modifications
- ✅ No breaking changes - Full backward compatibility
- ✅ Well-tested - Comprehensive test coverage
- ✅ Documented - Clear docstrings and PR description
- ✅ Production-ready - Error handling and fallbacks
- ✅ Clean diff - Easy to review in GitHub
Diff Statistics
- Lines added: ~120
- Lines removed: ~15
- Net change: ~105 lines
- Files modified: 2
- Files added: 2 (test + PR doc)
- Dependencies added: 1 (
langchain)
Comparison with BA2TradePlatform Version
The TradingAgents version is intentionally simplified:
Removed (BA2-specific):
- ❌
market_analysis_idparameter (BA2-specific) - ❌
expert_instance_idparameter (BA2-specific) - ❌
from ba2_trade_platform.config import CACHE_FOLDER - ❌ Logger references (
ta_logger) replaced withprint()
Kept (Universal):
- ✅ Text chunking logic
- ✅ Persistent storage
- ✅ Symbol-based naming
- ✅ Error handling
- ✅ Backward compatibility
Result:
Clean, standalone implementation ready for TradingAgents upstream!
Ready for Pull Request ✅
This implementation provides the same functionality as BA2TradePlatform while maintaining independence and minimal changes for easy review.