TradingAgents/TIME_SERIES_CACHE_README.md

8.3 KiB

Time Series Cache System for Financial Data

An intelligent caching system for TradingAgents that optimizes financial API calls through smart time series data management.

🚀 Overview

The Time Series Cache system provides intelligent caching for financial data APIs, automatically managing:

  • Date Range Optimization: Detects overlapping queries and fetches only missing data
  • Multiple Data Types: OHLCV, news, fundamentals, technical indicators, insider data
  • Storage Efficiency: Uses Parquet format with SQLite indexing for fast retrieval
  • Cache Management: Built-in statistics, cleanup, and monitoring tools

📊 Key Features

Intelligent Gap Detection

  • Automatically identifies what data is already cached
  • Only fetches missing date ranges from APIs
  • Seamlessly merges cached and new data

Multiple Data Type Support

  • OHLCV Data: Price, volume data from YFinance
  • News Data: Finnhub news, Google News
  • Technical Indicators: RSI, MACD, SMA, etc.
  • Insider Data: SEC insider transactions and sentiment
  • Fundamentals: Financial statements and ratios

Performance Optimization

  • Fast Storage: Parquet files for data, SQLite for indexing
  • Memory Efficient: Loads only requested date ranges
  • Parallel Safe: Thread-safe operations for concurrent access

Cache Management

  • Performance statistics and monitoring
  • Automated cleanup of old data
  • Symbol-specific and date-based clearing

🔧 Installation & Setup

The cache system is integrated into TradingAgents dataflows. No additional setup required!

Cache files are stored in: data_cache/time_series/

📖 Usage Examples

Basic OHLCV Data Caching

from tradingagents.dataflows import get_YFin_data_cached

# First call - fetches from API and caches
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")

# Second call - uses cache (much faster!)
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")

# Overlapping range - only fetches new dates
data = get_YFin_data_cached("AAPL", "2024-01-10", "2024-01-25")

Window-Based Data Retrieval

from tradingagents.dataflows import get_YFin_data_window_cached

# Get 30 days of data before current date
data = get_YFin_data_window_cached("TSLA", "2024-01-15", 30)

News Data Caching

from tradingagents.dataflows import get_finnhub_news_cached, get_google_news_cached

# Cache Finnhub news
news = get_finnhub_news_cached("AAPL", "2024-01-15", 7)

# Cache Google News
google_news = get_google_news_cached("stock market", "2024-01-15", 7)

Technical Indicators Caching

from tradingagents.dataflows import get_technical_indicators_cached

# Cache RSI calculations
rsi_data = get_technical_indicators_cached("AAPL", "rsi", "2024-01-15", 20)

# Cache MACD calculations  
macd_data = get_technical_indicators_cached("AAPL", "macd", "2024-01-15", 30)

Cache Performance Monitoring

from tradingagents.dataflows import get_cache_statistics

# Get comprehensive cache stats
stats = get_cache_statistics()
print(stats)

# Output example:
# ## Financial Data Cache Statistics
# 
# **Cache Performance:**
# - Total Entries: 42
# - Cache Size: 15.67 MB
# - Hit Ratio: 78.3%
# - Cache Hits: 89
# - Cache Misses: 25
# - API Calls Saved: 64

Cache Management

from tradingagents.dataflows import clear_cache_data

# Clear cache for specific symbol
clear_cache_data(symbol="AAPL")

# Clear data older than 30 days
clear_cache_data(older_than_days=30)

# Clear old data for specific symbol
clear_cache_data(symbol="AAPL", older_than_days=7)

🏗️ Architecture

Core Components

  1. TimeSeriesCache: Main cache engine with intelligent date range management
  2. CachedApiWrappers: Integration layer with existing financial APIs
  3. Interface Functions: Drop-in replacements for existing API calls

Data Flow

API Request → Cache Check → Gap Detection → API Fetch (if needed) → Cache Store → Return Data

Storage Structure

data_cache/time_series/
├── cache_index.db          # SQLite index for fast lookups
├── ohlcv/                  # OHLCV data files
│   ├── AAPL_abc123.parquet
│   └── TSLA_def456.parquet
├── news/                   # News data files
├── indicators/             # Technical indicators
├── insider/               # Insider trading data
└── sentiment/             # Sentiment analysis data

📈 Performance Benefits

Speed Improvements

  • Cache Hits: 10-100x faster than API calls
  • Gap Filling: Only fetches missing data
  • Batch Operations: Efficient for overlapping queries

Cost Savings

  • Reduced API Calls: Can reduce API usage by 60-90%
  • Rate Limit Friendly: Avoids redundant API requests
  • Bandwidth Efficient: Local storage reduces network usage

Example Performance

# First call: ~2.5 seconds (API fetch + cache)
data1 = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")

# Second call: ~0.05 seconds (cache hit)
data2 = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")

# 50x speed improvement!

🔄 Migration Guide

Replace Existing Functions

Old Function New Cached Function
get_YFin_data() get_YFin_data_cached()
get_YFin_data_window() get_YFin_data_window_cached()
get_finnhub_news() get_finnhub_news_cached()
get_google_news() get_google_news_cached()

Example Migration

# Before
from tradingagents.dataflows import get_YFin_data
data = get_YFin_data("AAPL", "2024-01-01", "2024-01-15")

# After  
from tradingagents.dataflows import get_YFin_data_cached
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")

# Same interface, better performance!

🛠️ Advanced Configuration

Custom Cache Directory

from tradingagents.dataflows.time_series_cache import TimeSeriesCache

# Create cache with custom directory
cache = TimeSeriesCache(cache_dir="/path/to/custom/cache")

Direct Cache API

from tradingagents.dataflows.time_series_cache import get_cache, DataType
from datetime import datetime

cache = get_cache()

# Check cache coverage
gaps, cached = cache.check_cache_coverage(
    "AAPL", 
    DataType.OHLCV, 
    datetime(2024, 1, 1), 
    datetime(2024, 1, 15)
)

# Fetch with custom function
def my_fetch_function(symbol, start_date, end_date):
    # Your custom API fetch logic
    return pd.DataFrame(...)

data = cache.fetch_with_cache(
    "AAPL", 
    DataType.OHLCV,
    datetime(2024, 1, 1),
    datetime(2024, 1, 15), 
    my_fetch_function
)

🧪 Testing

Run the demo script to test the caching system:

python demo_time_series_cache.py

This will demonstrate:

  • OHLCV data caching performance
  • News data caching
  • Technical indicators caching
  • Cache statistics and management

🔍 Troubleshooting

Common Issues

Cache directory permissions

# Ensure write permissions
chmod 755 data_cache/time_series/

SQLite database locked

  • Restart Python process
  • Check for concurrent access

Missing data dependencies

# Install required packages
pip install pandas pyarrow sqlite3

Debug Mode

import logging
logging.basicConfig(level=logging.INFO)

# Cache operations will now show detailed logs
data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15")

📋 Cache Statistics Explained

Metric Description
Total Entries Number of cached data segments
Cache Size Total disk space used (MB)
Hit Ratio % of requests served from cache
Cache Hits Number of successful cache retrievals
Cache Misses Number of API calls required
API Calls Saved Estimated API calls avoided

🤝 Contributing

The cache system is designed to be extensible. To add new data types:

  1. Add new DataType enum value
  2. Create wrapper function in cached_api_wrappers.py
  3. Add interface function in interface.py
  4. Update exports in __init__.py

💡 Pro Tip: Monitor cache performance regularly with get_cache_statistics() to optimize your data retrieval patterns!