# Time Series Cache System for Financial Data An intelligent caching system for TradingAgents that optimizes financial API calls through smart time series data management. ## ๐Ÿš€ Overview The Time Series Cache system provides intelligent caching for financial data APIs, automatically managing: - **Date Range Optimization**: Detects overlapping queries and fetches only missing data - **Multiple Data Types**: OHLCV, news, fundamentals, technical indicators, insider data - **Storage Efficiency**: Uses Parquet format with SQLite indexing for fast retrieval - **Cache Management**: Built-in statistics, cleanup, and monitoring tools ## ๐Ÿ“Š Key Features ### โœ… Intelligent Gap Detection - Automatically identifies what data is already cached - Only fetches missing date ranges from APIs - Seamlessly merges cached and new data ### โœ… Multiple Data Type Support - **OHLCV Data**: Price, volume data from YFinance - **News Data**: Finnhub news, Google News - **Technical Indicators**: RSI, MACD, SMA, etc. - **Insider Data**: SEC insider transactions and sentiment - **Fundamentals**: Financial statements and ratios ### โœ… Performance Optimization - **Fast Storage**: Parquet files for data, SQLite for indexing - **Memory Efficient**: Loads only requested date ranges - **Parallel Safe**: Thread-safe operations for concurrent access ### โœ… Cache Management - Performance statistics and monitoring - Automated cleanup of old data - Symbol-specific and date-based clearing ## ๐Ÿ”ง Installation & Setup The cache system is integrated into TradingAgents dataflows. No additional setup required! Cache files are stored in: `data_cache/time_series/` ## ๐Ÿ“– Usage Examples ### Basic OHLCV Data Caching ```python from tradingagents.dataflows import get_YFin_data_cached # First call - fetches from API and caches data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15") # Second call - uses cache (much faster!) data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15") # Overlapping range - only fetches new dates data = get_YFin_data_cached("AAPL", "2024-01-10", "2024-01-25") ``` ### Window-Based Data Retrieval ```python from tradingagents.dataflows import get_YFin_data_window_cached # Get 30 days of data before current date data = get_YFin_data_window_cached("TSLA", "2024-01-15", 30) ``` ### News Data Caching ```python from tradingagents.dataflows import get_finnhub_news_cached, get_google_news_cached # Cache Finnhub news news = get_finnhub_news_cached("AAPL", "2024-01-15", 7) # Cache Google News google_news = get_google_news_cached("stock market", "2024-01-15", 7) ``` ### Technical Indicators Caching ```python from tradingagents.dataflows import get_technical_indicators_cached # Cache RSI calculations rsi_data = get_technical_indicators_cached("AAPL", "rsi", "2024-01-15", 20) # Cache MACD calculations macd_data = get_technical_indicators_cached("AAPL", "macd", "2024-01-15", 30) ``` ### Cache Performance Monitoring ```python from tradingagents.dataflows import get_cache_statistics # Get comprehensive cache stats stats = get_cache_statistics() print(stats) # Output example: # ## Financial Data Cache Statistics # # **Cache Performance:** # - Total Entries: 42 # - Cache Size: 15.67 MB # - Hit Ratio: 78.3% # - Cache Hits: 89 # - Cache Misses: 25 # - API Calls Saved: 64 ``` ### Cache Management ```python from tradingagents.dataflows import clear_cache_data # Clear cache for specific symbol clear_cache_data(symbol="AAPL") # Clear data older than 30 days clear_cache_data(older_than_days=30) # Clear old data for specific symbol clear_cache_data(symbol="AAPL", older_than_days=7) ``` ## ๐Ÿ—๏ธ Architecture ### Core Components 1. **TimeSeriesCache**: Main cache engine with intelligent date range management 2. **CachedApiWrappers**: Integration layer with existing financial APIs 3. **Interface Functions**: Drop-in replacements for existing API calls ### Data Flow ``` API Request โ†’ Cache Check โ†’ Gap Detection โ†’ API Fetch (if needed) โ†’ Cache Store โ†’ Return Data ``` ### Storage Structure ``` data_cache/time_series/ โ”œโ”€โ”€ cache_index.db # SQLite index for fast lookups โ”œโ”€โ”€ ohlcv/ # OHLCV data files โ”‚ โ”œโ”€โ”€ AAPL_abc123.parquet โ”‚ โ””โ”€โ”€ TSLA_def456.parquet โ”œโ”€โ”€ news/ # News data files โ”œโ”€โ”€ indicators/ # Technical indicators โ”œโ”€โ”€ insider/ # Insider trading data โ””โ”€โ”€ sentiment/ # Sentiment analysis data ``` ## ๐Ÿ“ˆ Performance Benefits ### Speed Improvements - **Cache Hits**: 10-100x faster than API calls - **Gap Filling**: Only fetches missing data - **Batch Operations**: Efficient for overlapping queries ### Cost Savings - **Reduced API Calls**: Can reduce API usage by 60-90% - **Rate Limit Friendly**: Avoids redundant API requests - **Bandwidth Efficient**: Local storage reduces network usage ### Example Performance ```python # First call: ~2.5 seconds (API fetch + cache) data1 = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15") # Second call: ~0.05 seconds (cache hit) data2 = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15") # 50x speed improvement! ``` ## ๐Ÿ”„ Migration Guide ### Replace Existing Functions | Old Function | New Cached Function | |--------------|-------------------| | `get_YFin_data()` | `get_YFin_data_cached()` | | `get_YFin_data_window()` | `get_YFin_data_window_cached()` | | `get_finnhub_news()` | `get_finnhub_news_cached()` | | `get_google_news()` | `get_google_news_cached()` | ### Example Migration ```python # Before from tradingagents.dataflows import get_YFin_data data = get_YFin_data("AAPL", "2024-01-01", "2024-01-15") # After from tradingagents.dataflows import get_YFin_data_cached data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15") # Same interface, better performance! ``` ## ๐Ÿ› ๏ธ Advanced Configuration ### Custom Cache Directory ```python from tradingagents.dataflows.time_series_cache import TimeSeriesCache # Create cache with custom directory cache = TimeSeriesCache(cache_dir="/path/to/custom/cache") ``` ### Direct Cache API ```python from tradingagents.dataflows.time_series_cache import get_cache, DataType from datetime import datetime cache = get_cache() # Check cache coverage gaps, cached = cache.check_cache_coverage( "AAPL", DataType.OHLCV, datetime(2024, 1, 1), datetime(2024, 1, 15) ) # Fetch with custom function def my_fetch_function(symbol, start_date, end_date): # Your custom API fetch logic return pd.DataFrame(...) data = cache.fetch_with_cache( "AAPL", DataType.OHLCV, datetime(2024, 1, 1), datetime(2024, 1, 15), my_fetch_function ) ``` ## ๐Ÿงช Testing Run the demo script to test the caching system: ```bash python demo_time_series_cache.py ``` This will demonstrate: - OHLCV data caching performance - News data caching - Technical indicators caching - Cache statistics and management ## ๐Ÿ” Troubleshooting ### Common Issues **Cache directory permissions** ```bash # Ensure write permissions chmod 755 data_cache/time_series/ ``` **SQLite database locked** - Restart Python process - Check for concurrent access **Missing data dependencies** ```bash # Install required packages pip install pandas pyarrow sqlite3 ``` ### Debug Mode ```python import logging logging.basicConfig(level=logging.INFO) # Cache operations will now show detailed logs data = get_YFin_data_cached("AAPL", "2024-01-01", "2024-01-15") ``` ## ๐Ÿ“‹ Cache Statistics Explained | Metric | Description | |--------|-------------| | **Total Entries** | Number of cached data segments | | **Cache Size** | Total disk space used (MB) | | **Hit Ratio** | % of requests served from cache | | **Cache Hits** | Number of successful cache retrievals | | **Cache Misses** | Number of API calls required | | **API Calls Saved** | Estimated API calls avoided | ## ๐Ÿค Contributing The cache system is designed to be extensible. To add new data types: 1. Add new `DataType` enum value 2. Create wrapper function in `cached_api_wrappers.py` 3. Add interface function in `interface.py` 4. Update exports in `__init__.py` ## ๐Ÿ“š Related Documentation - [TradingAgents API Documentation](./README.md) - [Financial Data Configuration](./tradingagents/dataflows/config.py) - [Agent Utilities](./tradingagents/agents/utils/) --- **๐Ÿ’ก Pro Tip**: Monitor cache performance regularly with `get_cache_statistics()` to optimize your data retrieval patterns!