319 lines
8.3 KiB
Markdown
319 lines
8.3 KiB
Markdown
# Time Series Cache System for Financial Data
|
|
|
|
An intelligent caching system for TradingAgents that optimizes financial API calls through smart time series data management.
|
|
|
|
## 🚀 Overview
|
|
|
|
The Time Series Cache system provides intelligent caching for financial data APIs, automatically managing:
|
|
- **Date Range Optimization**: Detects overlapping queries and fetches only missing data
|
|
- **Multiple Data Types**: OHLCV, news, fundamentals, technical indicators, insider data
|
|
- **Storage Efficiency**: Uses Parquet format with SQLite indexing for fast retrieval
|
|
- **Cache Management**: Built-in statistics, cleanup, and monitoring tools
|
|
|
|
## 📊 Key Features
|
|
|
|
### ✅ Intelligent Gap Detection
|
|
- Automatically identifies what data is already cached
|
|
- Only fetches missing date ranges from APIs
|
|
- Seamlessly merges cached and new data
|
|
|
|
### ✅ Multiple Data Type Support
|
|
- **OHLCV Data**: Price, volume data from YFinance
|
|
- **News Data**: Finnhub news, Google News
|
|
- **Technical Indicators**: RSI, MACD, SMA, etc.
|
|
- **Insider Data**: SEC insider transactions and sentiment
|
|
- **Fundamentals**: Financial statements and ratios
|
|
|
|
### ✅ Performance Optimization
|
|
- **Fast Storage**: Parquet files for data, SQLite for indexing
|
|
- **Memory Efficient**: Loads only requested date ranges
|
|
- **Parallel Safe**: Thread-safe operations for concurrent access
|
|
|
|
### ✅ Cache Management
|
|
- Performance statistics and monitoring
|
|
- Automated cleanup of old data
|
|
- Symbol-specific and date-based clearing
|
|
|
|
## 🔧 Installation & Setup
|
|
|
|
The cache system is integrated into TradingAgents dataflows. No additional setup required!
|
|
|
|
Cache files are stored in: `data_cache/time_series/`
|
|
|
|
## 📖 Usage Examples
|
|
|
|
### Basic OHLCV Data Caching
|
|
|
|
```python
|
|
from tradingagents.dataflows import get_YFin_data_cached
|
|
|
|
# First call - fetches from API and caches
|
|
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
|
|
|
|
# Second call - uses cache (much faster!)
|
|
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
|
|
|
|
# Overlapping range - only fetches new dates
|
|
data = get_YFin_data_cached("AAPL", "2024-01-10", "2024-01-25")
|
|
```
|
|
|
|
### Window-Based Data Retrieval
|
|
|
|
```python
|
|
from tradingagents.dataflows import get_YFin_data_window_cached
|
|
|
|
# Get 30 days of data before current date
|
|
data = get_YFin_data_window_cached("TSLA", "2024-01-15", 30)
|
|
```
|
|
|
|
### News Data Caching
|
|
|
|
```python
|
|
from tradingagents.dataflows import get_finnhub_news_cached, get_google_news_cached
|
|
|
|
# Cache Finnhub news
|
|
news = get_finnhub_news_cached("AAPL", "2024-01-15", 7)
|
|
|
|
# Cache Google News
|
|
google_news = get_google_news_cached("stock market", "2024-01-15", 7)
|
|
```
|
|
|
|
### Technical Indicators Caching
|
|
|
|
```python
|
|
from tradingagents.dataflows import get_technical_indicators_cached
|
|
|
|
# Cache RSI calculations
|
|
rsi_data = get_technical_indicators_cached("AAPL", "rsi", "2024-01-15", 20)
|
|
|
|
# Cache MACD calculations
|
|
macd_data = get_technical_indicators_cached("AAPL", "macd", "2024-01-15", 30)
|
|
```
|
|
|
|
### Cache Performance Monitoring
|
|
|
|
```python
|
|
from tradingagents.dataflows import get_cache_statistics
|
|
|
|
# Get comprehensive cache stats
|
|
stats = get_cache_statistics()
|
|
print(stats)
|
|
|
|
# Output example:
|
|
# ## Financial Data Cache Statistics
|
|
#
|
|
# **Cache Performance:**
|
|
# - Total Entries: 42
|
|
# - Cache Size: 15.67 MB
|
|
# - Hit Ratio: 78.3%
|
|
# - Cache Hits: 89
|
|
# - Cache Misses: 25
|
|
# - API Calls Saved: 64
|
|
```
|
|
|
|
### Cache Management
|
|
|
|
```python
|
|
from tradingagents.dataflows import clear_cache_data
|
|
|
|
# Clear cache for specific symbol
|
|
clear_cache_data(symbol="AAPL")
|
|
|
|
# Clear data older than 30 days
|
|
clear_cache_data(older_than_days=30)
|
|
|
|
# Clear old data for specific symbol
|
|
clear_cache_data(symbol="AAPL", older_than_days=7)
|
|
```
|
|
|
|
## 🏗️ Architecture
|
|
|
|
### Core Components
|
|
|
|
1. **TimeSeriesCache**: Main cache engine with intelligent date range management
|
|
2. **CachedApiWrappers**: Integration layer with existing financial APIs
|
|
3. **Interface Functions**: Drop-in replacements for existing API calls
|
|
|
|
### Data Flow
|
|
|
|
```
|
|
API Request → Cache Check → Gap Detection → API Fetch (if needed) → Cache Store → Return Data
|
|
```
|
|
|
|
### Storage Structure
|
|
|
|
```
|
|
data_cache/time_series/
|
|
├── cache_index.db # SQLite index for fast lookups
|
|
├── ohlcv/ # OHLCV data files
|
|
│ ├── AAPL_abc123.parquet
|
|
│ └── TSLA_def456.parquet
|
|
├── news/ # News data files
|
|
├── indicators/ # Technical indicators
|
|
├── insider/ # Insider trading data
|
|
└── sentiment/ # Sentiment analysis data
|
|
```
|
|
|
|
## 📈 Performance Benefits
|
|
|
|
### Speed Improvements
|
|
- **Cache Hits**: 10-100x faster than API calls
|
|
- **Gap Filling**: Only fetches missing data
|
|
- **Batch Operations**: Efficient for overlapping queries
|
|
|
|
### Cost Savings
|
|
- **Reduced API Calls**: Can reduce API usage by 60-90%
|
|
- **Rate Limit Friendly**: Avoids redundant API requests
|
|
- **Bandwidth Efficient**: Local storage reduces network usage
|
|
|
|
### Example Performance
|
|
|
|
```python
|
|
# First call: ~2.5 seconds (API fetch + cache)
|
|
data1 = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
|
|
|
|
# Second call: ~0.05 seconds (cache hit)
|
|
data2 = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
|
|
|
|
# 50x speed improvement!
|
|
```
|
|
|
|
## 🔄 Migration Guide
|
|
|
|
### Replace Existing Functions
|
|
|
|
| Old Function | New Cached Function |
|
|
|--------------|-------------------|
|
|
| `get_YFin_data()` | `get_YFin_data_cached()` |
|
|
| `get_YFin_data_window()` | `get_YFin_data_window_cached()` |
|
|
| `get_finnhub_news()` | `get_finnhub_news_cached()` |
|
|
| `get_google_news()` | `get_google_news_cached()` |
|
|
|
|
### Example Migration
|
|
|
|
```python
|
|
# Before
|
|
from tradingagents.dataflows import get_YFin_data
|
|
data = get_YFin_data("AAPL", "2024-01-01", "2024-01-15")
|
|
|
|
# After
|
|
from tradingagents.dataflows import get_YFin_data_cached
|
|
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
|
|
|
|
# Same interface, better performance!
|
|
```
|
|
|
|
## 🛠️ Advanced Configuration
|
|
|
|
### Custom Cache Directory
|
|
|
|
```python
|
|
from tradingagents.dataflows.time_series_cache import TimeSeriesCache
|
|
|
|
# Create cache with custom directory
|
|
cache = TimeSeriesCache(cache_dir="/path/to/custom/cache")
|
|
```
|
|
|
|
### Direct Cache API
|
|
|
|
```python
|
|
from tradingagents.dataflows.time_series_cache import get_cache, DataType
|
|
from datetime import datetime
|
|
|
|
cache = get_cache()
|
|
|
|
# Check cache coverage
|
|
gaps, cached = cache.check_cache_coverage(
|
|
"AAPL",
|
|
DataType.OHLCV,
|
|
datetime(2024, 1, 1),
|
|
datetime(2024, 1, 15)
|
|
)
|
|
|
|
# Fetch with custom function
|
|
def my_fetch_function(symbol, start_date, end_date):
|
|
# Your custom API fetch logic
|
|
return pd.DataFrame(...)
|
|
|
|
data = cache.fetch_with_cache(
|
|
"AAPL",
|
|
DataType.OHLCV,
|
|
datetime(2024, 1, 1),
|
|
datetime(2024, 1, 15),
|
|
my_fetch_function
|
|
)
|
|
```
|
|
|
|
## 🧪 Testing
|
|
|
|
Run the demo script to test the caching system:
|
|
|
|
```bash
|
|
python demo_time_series_cache.py
|
|
```
|
|
|
|
This will demonstrate:
|
|
- OHLCV data caching performance
|
|
- News data caching
|
|
- Technical indicators caching
|
|
- Cache statistics and management
|
|
|
|
## 🔍 Troubleshooting
|
|
|
|
### Common Issues
|
|
|
|
**Cache directory permissions**
|
|
```bash
|
|
# Ensure write permissions
|
|
chmod 755 data_cache/time_series/
|
|
```
|
|
|
|
**SQLite database locked**
|
|
- Restart Python process
|
|
- Check for concurrent access
|
|
|
|
**Missing data dependencies**
|
|
```bash
|
|
# Install required packages
|
|
pip install pandas pyarrow sqlite3
|
|
```
|
|
|
|
### Debug Mode
|
|
|
|
```python
|
|
import logging
|
|
logging.basicConfig(level=logging.INFO)
|
|
|
|
# Cache operations will now show detailed logs
|
|
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")
|
|
```
|
|
|
|
## 📋 Cache Statistics Explained
|
|
|
|
| Metric | Description |
|
|
|--------|-------------|
|
|
| **Total Entries** | Number of cached data segments |
|
|
| **Cache Size** | Total disk space used (MB) |
|
|
| **Hit Ratio** | % of requests served from cache |
|
|
| **Cache Hits** | Number of successful cache retrievals |
|
|
| **Cache Misses** | Number of API calls required |
|
|
| **API Calls Saved** | Estimated API calls avoided |
|
|
|
|
## 🤝 Contributing
|
|
|
|
The cache system is designed to be extensible. To add new data types:
|
|
|
|
1. Add new `DataType` enum value
|
|
2. Create wrapper function in `cached_api_wrappers.py`
|
|
3. Add interface function in `interface.py`
|
|
4. Update exports in `__init__.py`
|
|
|
|
## 📚 Related Documentation
|
|
|
|
- [TradingAgents API Documentation](./README.md)
|
|
- [Financial Data Configuration](./tradingagents/dataflows/config.py)
|
|
- [Agent Utilities](./tradingagents/agents/utils/)
|
|
|
|
---
|
|
|
|
**💡 Pro Tip**: Monitor cache performance regularly with `get_cache_statistics()` to optimize your data retrieval patterns! |