docs: add filter OHLCV cache design spec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Youssef Aitousarrah 2026-04-15 12:31:47 -07:00
parent e50cbbd0ff
commit 883e7ea352
1 changed file with 61 additions and 0 deletions


@ -0,0 +1,61 @@
# Filter Stage OHLCV Cache Design
**Date:** 2026-04-15
**Status:** Approved
## Problem
The discovery filter stage makes ~120 sequential per-ticker yfinance HTTP calls per run (~60 candidates × 2 calls each), which triggers "Too Many Requests" rate-limit errors. In the 2026-04-15 run, only 36/61 candidates got prices. The rest were silently dropped, which left the ranker with just 11 candidates and produced only 2 final picks.
The three offending call sites in `filter.py`:
1. `get_stock_price(ticker)` — fallback per-ticker price fetch
2. `check_intraday_movement(ticker)` → `yf.Ticker().history(period="1d", interval="1m")`
3. `check_if_price_reacted(ticker)` → `yf.Ticker().history(period="1mo")`

The OHLCV cache (1y daily bars, populated nightly) already contains all the data these checks need. Discovery runs at 7:30am ET (pre-market), so yesterday's close is the correct "current price" anyway.
## Solution
Load the OHLCV cache once at the start of the filter stage for the ~60 candidate tickers, then replace all three yfinance call sites with cache lookups.
## Scope
**File changed:** `tradingagents/dataflows/discovery/filter.py` only.
**Unchanged:** `_fetch_batch_volume()`, `_fetch_batch_news()`, `get_fundamentals()`, scanner code, OHLCV cache internals.
## Design
### Cache load
At the top of `CandidateFilter.filter()`, before the per-candidate loop:
```python
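# Parquet OHLCV cache location: config override if set, else the default directory
cache_dir = self.config.get("discovery", {}).get("ohlcv_cache_dir", "data/ohlcv_cache")
candidate_tickers = [c["ticker"] for c in candidates if c.get("ticker")]
# Load 1y of daily bars for all candidates in one call instead of per-ticker yfinance requests
ohlcv_data = download_ohlcv_cached(candidate_tickers, period="1y", cache_dir=cache_dir)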
```
`ohlcv_data` is `Dict[str, DataFrame]` — ticker → daily OHLCV. Loading ~60 tickers from the parquet cache takes under 1 second.
### Replacement map
| Was | Becomes |
|---|---|
| `_fetch_batch_prices()` + `get_stock_price()` fallback | `ohlcv_data[ticker]["Close"].iloc[-1]` |
| `check_intraday_movement()` via `history(1d, 1m)` | `(close[-1] - close[-2]) / close[-2] * 100` (last 2 daily closes) |
| `check_if_price_reacted()` via `history(1mo)` | `(close[-1] - close[-N]) / close[-N] * 100` where N = `recent_movement_lookback_days` |
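
A minimal sketch of the three lookups in the replacement map follows. It assumes each `ohlcv_data[ticker]` frame has a `Close` column sorted oldest to newest. The helper names `cached_price`, `daily_change_pct` and `lookback_change_pct` are hypothetical and are not functions in `filter.py`. The lookback helper follows the table's `close[-N]` formula literally, so it spans N - 1 trading days. Whether that matches the current `check_if_price_reacted()` semantics is an assumption to check against the existing code.

```python
import pandas as pd


# Hypothetical helpers illustrating the replacement map; not part of filter.py.
def cached_price(df: pd.DataFrame) -> float:
    """Most recent daily close (pre-market, this is the prior session's close)."""
    return float(df["Close"].iloc[-1])


def daily_change_pct(df: pd.DataFrame) -> float:
    """Percent change between the last two daily closes (stands in for the 1m intraday check)."""
    close = df["Close"]
    if len(close) < 2:
        raise ValueError("need at least 2 daily closes")
    return float((close.iloc[-1] - close.iloc[-2]) / close.iloc[-2] * 100)


def lookback_change_pct(df: pd.DataFrame, n: int) -> float:
    """Percent change from close[-n] to close[-1], following the table's formula literally.

    close[-n] sits n - 1 trading days before close[-1].
    """
    close = df["Close"]
    if not 1 <= n <= len(close):
        raise ValueError(f"need 1 <= n <= {len(close)}, got {n}")
    return float((close.iloc[-1] - close.iloc[-n]) / close.iloc[-n] * 100)
```

All three helpers read from the same in-memory frame, so none of them makes a network call. For example, with daily closes 100, 102, 101, 104 and 110, `cached_price` returns 110.0 and `lookback_change_pct(df, 5)` measures the move from 100 to 110.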
### Fallback
If a ticker is missing from `ohlcv_data` (e.g., newly listed and not yet in the nightly prefetch), fall through to the existing `get_stock_price()` call. Cache misses take the same path as today, so this change drops no candidate that the current code would keep.
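
The fallback rule can be sketched as a single lookup. This sketch assumes the existing `get_stock_price()` is passed in as `fallback`; injecting it is a hypothetical choice made here for testability. `resolve_price` is an illustrative name. Treating an empty frame or an all-NaN `Close` column as a cache miss is an added defensive assumption that the spec does not state.

```python
from typing import Callable, Dict, Optional

import pandas as pd


# Hypothetical helper; `fallback` stands in for the existing get_stock_price().
def resolve_price(
    ticker: str,
    ohlcv_data: Dict[str, pd.DataFrame],
    fallback: Callable[[str], Optional[float]],
) -> Optional[float]:
    """Prefer the cached last close; call the live fetcher only on a cache miss."""
    df = ohlcv_data.get(ticker)
    if df is not None and not df.empty and "Close" in df.columns:
        close = df["Close"].dropna()
        if not close.empty:
            return float(close.iloc[-1])
    # Cache miss (e.g., newly listed ticker): same per-ticker path as before the change
    return fallback(ticker)
```

With this shape, the per-ticker HTTP call only happens for tickers the nightly prefetch did not cover.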
### Remove
`_fetch_batch_prices()` method — no longer needed once the cache covers current price.
## Expected outcome
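- No yfinance rate-limit errors from the price and movement checks during filter enrichment (only uncached tickers still hit the `get_stock_price()` fallback)
- Candidates that pass the scan are no longer silently dropped for missing prices, so the ranker sees nearly all of them (was 11/63, should be ~60)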
- No change to filter logic or thresholds — only the data source changes