Time Series Cache System for Financial Data
An intelligent caching system for TradingAgents that optimizes financial API calls through smart time series data management.
🚀 Overview
The Time Series Cache system provides intelligent caching for financial data APIs, automatically managing:
- Date Range Optimization: Detects overlapping queries and fetches only missing data
- Multiple Data Types: OHLCV, news, fundamentals, technical indicators, insider data
- Storage Efficiency: Uses Parquet format with SQLite indexing for fast retrieval
- Cache Management: Built-in statistics, cleanup, and monitoring tools
📊 Key Features
✅ Intelligent Gap Detection
- Automatically identifies what data is already cached
- Only fetches missing date ranges from APIs
- Seamlessly merges cached and new data
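The gap-detection idea above can be sketched in a few lines. This is a minimal illustration, not the cache's actual implementation: `find_gaps` is a hypothetical helper, and it assumes cached segments are stored as inclusive (start, end) date pairs.

```python
from datetime import date, timedelta

def find_gaps(start, end, cached):
    """Return the sub-ranges of [start, end] not covered by `cached`.

    `cached` is a list of (start, end) date pairs, inclusive on both ends.
    """
    gaps = []
    cursor = start  # first day not yet known to be covered
    for seg_start, seg_end in sorted(cached):
        if seg_end < cursor:
            continue  # segment lies entirely before the cursor
        if seg_start > end:
            break  # segment lies entirely after the request
        if seg_start > cursor:
            gaps.append((cursor, seg_start - timedelta(days=1)))
        cursor = max(cursor, seg_end + timedelta(days=1))
        if cursor > end:
            break
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps

cached = [(date(2024, 1, 1), date(2024, 1, 15))]
gaps = find_gaps(date(2024, 1, 10), date(2024, 1, 25), cached)
print(gaps)  # [(datetime.date(2024, 1, 16), datetime.date(2024, 1, 25))]
```

With January 1 to 15 already cached, a request for January 10 to 25 leaves only January 16 to 25 to fetch from the API.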
✅ Multiple Data Type Support
- OHLCV Data: Price and volume data from YFinance
- News Data: Finnhub news, Google News
- Technical Indicators: RSI, MACD, SMA, etc.
- Insider Data: SEC insider transactions and sentiment
- Fundamentals: Financial statements and ratios
✅ Performance Optimization
- Fast Storage: Parquet files for data, SQLite for indexing
- Memory Efficient: Loads only requested date ranges
- Parallel Safe: Thread-safe operations for concurrent access
✅ Cache Management
- Performance statistics and monitoring
- Automated cleanup of old data
- Symbol-specific and date-based clearing
🔧 Installation & Setup
The cache system is integrated into TradingAgents dataflows. No additional setup required!
Cache files are stored in: data_cache/time_series/
📖 Usage Examples
Basic OHLCV Data Caching
from tradingagents.dataflows import get_YFin_data_cached
# First call - fetches from API and caches
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
# Second call - uses cache (much faster!)
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
# Overlapping range - only fetches new dates
data = get_YFin_data_cached("AAPL", "2024-01-10", "2024-01-25")
Window-Based Data Retrieval
from tradingagents.dataflows import get_YFin_data_window_cached
# Get the 30 days of data leading up to the given date
data = get_YFin_data_window_cached("TSLA", "2024-01-15", 30)
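To show how a window request might map onto an explicit date range, here is a small sketch. `window_to_range` is a hypothetical helper, and it assumes the window counts calendar days back from the given date; the real function may count trading days or treat the endpoints differently.

```python
from datetime import datetime, timedelta

def window_to_range(curr_date: str, look_back_days: int) -> tuple[str, str]:
    """Convert (date, look-back) into an inclusive (start, end) pair of ISO dates."""
    end = datetime.strptime(curr_date, "%Y-%m-%d")
    start = end - timedelta(days=look_back_days)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")

print(window_to_range("2024-01-15", 30))  # ('2023-12-16', '2024-01-15')
```

Once a window is expressed as a start and end date, it can share cached segments with ordinary date-range queries.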
News Data Caching
from tradingagents.dataflows import get_finnhub_news_cached, get_google_news_cached
# Cache Finnhub news
news = get_finnhub_news_cached("AAPL", "2024-01-15", 7)
# Cache Google News
google_news = get_google_news_cached("stock market", "2024-01-15", 7)
Technical Indicators Caching
from tradingagents.dataflows import get_technical_indicators_cached
# Cache RSI calculations
rsi_data = get_technical_indicators_cached("AAPL", "rsi", "2024-01-15", 20)
# Cache MACD calculations
macd_data = get_technical_indicators_cached("AAPL", "macd", "2024-01-15", 30)
Cache Performance Monitoring
from tradingagents.dataflows import get_cache_statistics
# Get comprehensive cache stats
stats = get_cache_statistics()
print(stats)
# Output example:
# ## Financial Data Cache Statistics
#
# **Cache Performance:**
# - Total Entries: 42
# - Cache Size: 15.67 MB
# - Hit Ratio: 78.1%
# - Cache Hits: 89
# - Cache Misses: 25
# - API Calls Saved: 64
Cache Management
from tradingagents.dataflows import clear_cache_data
# Clear cache for specific symbol
clear_cache_data(symbol="AAPL")
# Clear data older than 30 days
clear_cache_data(older_than_days=30)
# Clear old data for specific symbol
clear_cache_data(symbol="AAPL", older_than_days=7)
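The three calls above suggest that the filters combine: when both a symbol and an age are given, only entries matching both are cleared. A minimal sketch of that selection logic follows. `select_for_removal` and the entry dictionaries are illustrative assumptions, not the cache's real data model.

```python
from datetime import datetime, timedelta

def select_for_removal(entries, now, symbol=None, older_than_days=None):
    """Return the entries that match every filter that was supplied."""
    cutoff = None if older_than_days is None else now - timedelta(days=older_than_days)
    selected = []
    for entry in entries:
        if symbol is not None and entry["symbol"] != symbol:
            continue  # wrong symbol
        if cutoff is not None and entry["created_at"] >= cutoff:
            continue  # too recent to clear
        selected.append(entry)
    return selected

entries = [
    {"symbol": "AAPL", "created_at": datetime(2024, 1, 1)},
    {"symbol": "AAPL", "created_at": datetime(2024, 1, 14)},
    {"symbol": "TSLA", "created_at": datetime(2024, 1, 1)},
]
now = datetime(2024, 1, 15)
stale_aapl = select_for_removal(entries, now, symbol="AAPL", older_than_days=7)
print(len(stale_aapl))  # 1
```

In this sketch, calling the helper with no filters selects every entry, so a real implementation may want to require at least one argument.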
🏗️ Architecture
Core Components
- TimeSeriesCache: Main cache engine with intelligent date range management
- CachedApiWrappers: Integration layer with existing financial APIs
- Interface Functions: Drop-in replacements for existing API calls
Data Flow
API Request → Cache Check → Gap Detection → API Fetch (if needed) → Cache Store → Return Data
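The flow above can be traced with a toy in-memory cache. This is a sketch only: `ToyDailyCache` and `fake_api` are hypothetical stand-ins, and the real TimeSeriesCache persists to Parquet and SQLite rather than a dictionary.

```python
from datetime import date, timedelta

class ToyDailyCache:
    """Illustrative in-memory stand-in; not the real TimeSeriesCache."""

    def __init__(self):
        self._rows = {}      # (symbol, day) -> value
        self.api_calls = 0   # how many times the fetcher was invoked

    def fetch_with_cache(self, symbol, start, end, fetch_fn):
        days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
        # Cache check + gap detection: which requested days are missing?
        missing = [d for d in days if (symbol, d) not in self._rows]
        if missing:
            # API fetch: one call spanning the first to the last missing day.
            self.api_calls += 1
            fetched = fetch_fn(symbol, missing[0], missing[-1])
            # Cache store.
            for day, value in fetched.items():
                self._rows[(symbol, day)] = value
        # Return data assembled from the cache.
        return {d: self._rows[(symbol, d)] for d in days if (symbol, d) in self._rows}

def fake_api(symbol, start, end):
    """Hypothetical fetcher returning one dummy value per calendar day."""
    n = (end - start).days + 1
    return {start + timedelta(days=i): float(i) for i in range(n)}

cache = ToyDailyCache()
cache.fetch_with_cache("AAPL", date(2024, 1, 1), date(2024, 1, 15), fake_api)
cache.fetch_with_cache("AAPL", date(2024, 1, 1), date(2024, 1, 15), fake_api)
print(cache.api_calls)  # 1
```

The second identical request never reaches the fetcher, which is the behavior the cached interface functions are meant to provide.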
Storage Structure
data_cache/time_series/
├── cache_index.db # SQLite index for fast lookups
├── ohlcv/ # OHLCV data files
│ ├── AAPL_abc123.parquet
│ └── TSLA_def456.parquet
├── news/ # News data files
├── indicators/ # Technical indicators
├── insider/ # Insider trading data
└── sentiment/ # Sentiment analysis data
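To illustrate how a SQLite index can point at Parquet files like those in the tree above, here is a sketch using the standard `sqlite3` module. The table name, columns, and `overlapping_files` helper are assumptions made for illustration; the actual schema of cache_index.db is not documented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cache_entries (
        symbol     TEXT NOT NULL,
        data_type  TEXT NOT NULL,
        start_date TEXT NOT NULL,  -- ISO dates sort correctly as text
        end_date   TEXT NOT NULL,
        file_path  TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO cache_entries VALUES (?, ?, ?, ?, ?)",
    ("AAPL", "ohlcv", "2024-01-01", "2024-01-15", "ohlcv/AAPL_abc123.parquet"),
)

def overlapping_files(conn, symbol, data_type, start, end):
    """Return paths of cached segments that overlap [start, end]."""
    rows = conn.execute(
        """
        SELECT file_path FROM cache_entries
        WHERE symbol = ? AND data_type = ?
          AND start_date <= ? AND end_date >= ?
        ORDER BY start_date
        """,
        (symbol, data_type, end, start),
    ).fetchall()
    return [row[0] for row in rows]

print(overlapping_files(conn, "AAPL", "ohlcv", "2024-01-10", "2024-01-25"))
# ['ohlcv/AAPL_abc123.parquet']
```

Keeping only metadata in SQLite means a lookup can decide which Parquet files to open without reading any of them.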
📈 Performance Benefits
Speed Improvements
- Cache Hits: Served from local disk, often 10-100x faster than API calls depending on network latency
- Gap Filling: Only fetches missing data
- Batch Operations: Efficient for overlapping queries
Cost Savings
- Reduced API Calls: Can reduce API usage by 60-90%
- Rate Limit Friendly: Avoids redundant API requests
- Bandwidth Efficient: Local storage reduces network usage
Example Performance
# First call: ~2.5 seconds (API fetch + cache)
data1 = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
# Second call: ~0.05 seconds (cache hit)
data2 = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
# 50x speed improvement!
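The timings above will vary by machine and network, so it is worth measuring them yourself. A small helper along these lines works for any callable; `timed` and `slow_square` are hypothetical names, with `slow_square` standing in for an API-backed call.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

def slow_square(x):
    """Stand-in for an API-backed call."""
    time.sleep(0.05)
    return x * x

result, elapsed = timed(slow_square, 7)
print(f"{result} in {elapsed:.3f}s")
```

Wrapping the first and second call to a cached function with `timed` gives a direct before-and-after comparison.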
🔄 Migration Guide
Replace Existing Functions
| Old Function | New Cached Function |
|---|---|
| get_YFin_data() | get_YFin_data_cached() |
| get_YFin_data_window() | get_YFin_data_window_cached() |
| get_finnhub_news() | get_finnhub_news_cached() |
| get_google_news() | get_google_news_cached() |
Example Migration
# Before
from tradingagents.dataflows import get_YFin_data
data = get_YFin_data("AAPL", "2024-01-01", "2024-01-15")
# After
from tradingagents.dataflows import get_YFin_data_cached
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
# Same interface, better performance!
🛠️ Advanced Configuration
Custom Cache Directory
from tradingagents.dataflows.time_series_cache import TimeSeriesCache
# Create cache with custom directory
cache = TimeSeriesCache(cache_dir="/path/to/custom/cache")
Direct Cache API
from datetime import datetime
import pandas as pd
from tradingagents.dataflows.time_series_cache import get_cache, DataType
cache = get_cache()
# Check cache coverage
gaps, cached = cache.check_cache_coverage(
    "AAPL",
    DataType.OHLCV,
    datetime(2024, 1, 1),
    datetime(2024, 1, 15),
)
# Fetch with custom function
def my_fetch_function(symbol, start_date, end_date):
    # Your custom API fetch logic; must return a DataFrame
    return pd.DataFrame(...)
data = cache.fetch_with_cache(
    "AAPL",
    DataType.OHLCV,
    datetime(2024, 1, 1),
    datetime(2024, 1, 15),
    my_fetch_function,
)
🧪 Testing
Run the demo script to test the caching system:
python demo_time_series_cache.py
This will demonstrate:
- OHLCV data caching performance
- News data caching
- Technical indicators caching
- Cache statistics and management
🔍 Troubleshooting
Common Issues
Cache directory permissions
# Ensure write permissions
chmod 755 data_cache/time_series/
SQLite database locked
- Restart Python process
- Check for concurrent access
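If lock errors persist under concurrent access, two general SQLite mitigations are a longer busy timeout and write-ahead logging. The sketch below uses only the standard `sqlite3` module; `open_index` is a hypothetical helper, and whether the cache opens its index this way is an assumption, not something this document confirms.

```python
import os
import sqlite3
import tempfile

def open_index(path, timeout_s=30.0):
    """Open a SQLite file with a busy timeout and WAL journaling."""
    # timeout: how long a connection waits on a lock before raising an error.
    conn = sqlite3.connect(path, timeout=timeout_s)
    # WAL lets readers proceed while one writer is active.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode

with tempfile.TemporaryDirectory() as tmp:
    conn, mode = open_index(os.path.join(tmp, "demo_index.db"))
    print(mode)
    conn.close()
```

The pragma returns the journal mode actually in effect, so printing it confirms whether the switch to WAL succeeded on your filesystem.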
Missing data dependencies
# Install required packages (sqlite3 ships with Python's standard library)
pip install pandas pyarrow
Debug Mode
import logging
from tradingagents.dataflows import get_YFin_data_cached
logging.basicConfig(level=logging.INFO)
# Cache operations will now show detailed logs
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
📋 Cache Statistics Explained
| Metric | Description |
|---|---|
| Total Entries | Number of cached data segments |
| Cache Size | Total disk space used (MB) |
| Hit Ratio | % of requests served from cache |
| Cache Hits | Number of successful cache retrievals |
| Cache Misses | Number of requests that required an API call |
| API Calls Saved | Estimated API calls avoided |
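As a quick check on how these metrics relate, the hit ratio can be derived from the hit and miss counts. The sketch below assumes the ratio is hits divided by total requests (hits plus misses); `hit_ratio` is a hypothetical helper, not part of the cache API.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Percentage of requests served from cache; 0.0 when there were none."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

print(f"{hit_ratio(3, 1):.1f}%")  # 75.0%
```

A ratio that stays low over time usually means queries rarely overlap, so the cache has little to reuse.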
🤝 Contributing
The cache system is designed to be extensible. To add new data types:
- Add new DataType enum value
- Create wrapper function in cached_api_wrappers.py
- Add interface function in interface.py
- Update exports in __init__.py
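The first two steps listed above might look roughly like this. Everything here is hypothetical: the enum below only mirrors the idea of DataType (its real members and values may differ), `OPTIONS` is an invented example type, and `get_options_cached` takes the cache's fetch function as a parameter so the sketch runs on its own.

```python
from enum import Enum

class DataType(Enum):
    """Hypothetical mirror of the cache's DataType enum; real members may differ."""
    OHLCV = "ohlcv"
    NEWS = "news"
    INDICATORS = "indicators"
    OPTIONS = "options"  # the newly added data type

def get_options_cached(symbol, start_date, end_date, fetch_with_cache, fetch_fn):
    """Hypothetical wrapper: route the new data type through the cache."""
    return fetch_with_cache(symbol, DataType.OPTIONS, start_date, end_date, fetch_fn)

def passthrough(symbol, data_type, start, end, fetch_fn):
    """Stub standing in for cache.fetch_with_cache; it never caches."""
    return fetch_fn(symbol, start, end)

rows = get_options_cached(
    "AAPL", "2024-01-01", "2024-01-15", passthrough,
    lambda symbol, start, end: [(symbol, start, end)],
)
print(rows)  # [('AAPL', '2024-01-01', '2024-01-15')]
```

In the real code base the wrapper would call the shared cache instance directly rather than receive it as an argument.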
📚 Related Documentation
💡 Pro Tip: Monitor cache performance regularly with get_cache_statistics() to optimize your data retrieval patterns!