docs: add filter OHLCV cache design spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
e50cbbd0ff
commit
883e7ea352
|
|
@ -0,0 +1,61 @@
|
|||
# Filter Stage OHLCV Cache Design
|
||||
|
||||
**Date:** 2026-04-15
|
||||
**Status:** Approved
|
||||
|
||||
## Problem
|
||||
|
||||
The discovery filter stage fires ~120 sequential per-ticker yfinance HTTP calls for each run (~60 candidates × 2 calls each), causing "Too Many Requests" rate limit errors. In the 2026-04-15 run, only 36/61 candidates got prices — the rest were silently dropped, leaving the ranker with just 11 candidates and producing only 2 final picks.
|
||||
|
||||
The three offending call sites in `filter.py`:
|
||||
1. `get_stock_price(ticker)` — fallback per-ticker price fetch
|
||||
2. `check_intraday_movement(ticker)` — `yf.Ticker().history(period="1d", interval="1m")`
|
||||
3. `check_if_price_reacted(ticker)` — `yf.Ticker().history(period="1mo")`
|
||||
|
||||
The OHLCV cache (1y daily bars, populated nightly) already contains all the data these checks need. Since discovery runs at 7:30am ET (pre-market), yesterday's close is the correct "current price" anyway.
|
||||
|
||||
## Solution
|
||||
|
||||
Load the OHLCV cache once at the start of the filter stage for the ~60 candidate tickers, then replace all three yfinance call sites with cache lookups.
|
||||
|
||||
## Scope
|
||||
|
||||
**File changed:** `tradingagents/dataflows/discovery/filter.py` only.
|
||||
|
||||
**Unchanged:** `_fetch_batch_volume()`, `_fetch_batch_news()`, `get_fundamentals()`, scanner code, OHLCV cache internals.
|
||||
|
||||
## Design
|
||||
|
||||
### Cache load
|
||||
|
||||
At the top of `CandidateFilter.filter()`, before the per-candidate loop:
|
||||
|
||||
```python
|
||||
cache_dir = self.config.get("discovery", {}).get("ohlcv_cache_dir", "data/ohlcv_cache")
|
||||
candidate_tickers = [c["ticker"] for c in candidates if c.get("ticker")]
|
||||
ohlcv_data = download_ohlcv_cached(candidate_tickers, period="1y", cache_dir=cache_dir)
|
||||
```
|
||||
|
||||
`ohlcv_data` is `Dict[str, DataFrame]` — ticker → daily OHLCV. Loading ~60 tickers from the parquet cache takes under 1 second.
|
||||
|
||||
### Replacement map
|
||||
|
||||
| Was | Becomes |
|
||||
|---|---|
|
||||
| `_fetch_batch_prices()` + `get_stock_price()` fallback | `ohlcv_data[ticker]["Close"].iloc[-1]` |
|
||||
| `check_intraday_movement()` via `history(1d, 1m)` | `(close[-1] - close[-2]) / close[-2] * 100` (last 2 daily closes) |
|
||||
| `check_if_price_reacted()` via `history(1mo)` | `(close[-1] - close[-N]) / close[-N] * 100` where N = `recent_movement_lookback_days` |
|
||||
|
||||
### Fallback
|
||||
|
||||
If a ticker is missing from `ohlcv_data` (e.g. newly listed, not yet in nightly prefetch), fall through to the existing `get_stock_price()` call. No candidates are lost that weren't already being lost before this change.
|
||||
|
||||
### Remove
|
||||
|
||||
`_fetch_batch_prices()` method — no longer needed once the cache covers current price.
|
||||
|
||||
## Expected outcome
|
||||
|
||||
- Zero yfinance rate limit errors during filter enrichment
|
||||
- All candidates that pass the scan reach the ranker (was 11/63, should be ~60)
|
||||
- No change to filter logic or thresholds — only the data source changes
|
||||
Loading…
Reference in New Issue