- tradingagents/dataflows/universe.py: single source of truth for ticker
universe; all scanners now call load_universe(config) instead of
duplicating the 3-level fallback chain with hardcoded "data/tickers.txt"
- scripts/prefetch_ohlcv.py: nightly script using existing ohlcv_cache.py
incremental logic; first run downloads 1y history, subsequent runs append
only new trading days
- .github/workflows/prefetch.yml: runs at 01:00 UTC daily, before all other
workflows; commits updated parquet to repo
- Updated 6 scanners: minervini, high_52w_breakout, ml_signal, options_flow,
sector_rotation, technical_breakout — removed duplicate DEFAULT_TICKER_FILE
constants and _load_tickers_from_file() functions
- minervini, high_52w_breakout, technical_breakout: replace yf.download()
with download_ohlcv_cached() — reads from prefetched cache instead of
hitting yfinance at discovery time
- default_config.py: added discovery.ohlcv_cache_dir config key
- data/ohlcv_cache/: initial 1y backfill (588 tickers, 5.4MB parquet)
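A minimal sketch of what a shared load_universe(config) could look like. The commit does not spell out the three fallback levels, so the order used here (explicit list, configured file, default file) and the config keys "tickers" and "ticker_file" are assumptions for illustration only.

```python
from pathlib import Path

# The old hardcoded path named in the commit; kept here only as the last resort.
DEFAULT_TICKER_FILE = "data/tickers.txt"


def _clean(raw) -> list[str]:
    """Upper-case, strip, drop blanks and comments, de-duplicate in order."""
    seen = {}
    for item in raw:
        symbol = item.strip().upper()
        if symbol and not symbol.startswith("#"):
            seen.setdefault(symbol, None)
    return list(seen)


def load_universe(config: dict) -> list[str]:
    """Return the ticker universe, trying three sources in order (assumed)."""
    discovery = config.get("discovery", {})

    # Level 1 (assumed): an explicit list in the config wins.
    explicit = discovery.get("tickers")
    if explicit:
        return _clean(explicit)

    # Levels 2 and 3 (assumed): a configured file, then the default file.
    for candidate in (discovery.get("ticker_file"), DEFAULT_TICKER_FILE):
        if candidate and Path(candidate).is_file():
            return _clean(Path(candidate).read_text().splitlines())

    return []
```

Keeping the fallback in one function means a scanner only needs load_universe(config) and never touches a path constant of its own.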
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements ShortSqueezeScanner wrapping existing get_short_interest() in finviz_scraper.py.
Research finding: high raw short interest (SI) predicts negative long-term returns in the
academic literature; the edge is in using SI as a squeeze-risk flag when combined with
earnings_calendar or options_flow catalysts. Directly addresses the pending
earnings_calendar hypothesis (APLD at 30.6% SI was the strongest setup).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs causing zero recommendations:
1. risk_metrics.py was untracked — importing it raised ModuleNotFoundError which
was caught by the outer try/except in filter.py, silently dropping all 32
candidates that reached the fundamental risk check stage.
2. Minervini scanner at max_tickers=200 took >5 min to download 200 tickers x 1y
of OHLCV data. Future.cancel() cannot stop a thread that is already running, so the
download kept running as a zombie thread for 20 more minutes after the pipeline
completed, holding the Python process alive until the 30-min workflow timeout
killed the entire job.
Reducing to 50 tickers brings the download to ~75s, well under the 300s global
scanner timeout.
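The cancellation behaviour behind bug 2 can be reproduced with the standard library alone. In this sketch slow_download is a stand-in for the blocking download (a hypothetical name, not the scanner's code); calling cancel() on its future once the worker has started has no effect.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

release = threading.Event()   # stands in for the slow download finishing
started = threading.Event()


def slow_download() -> str:
    started.set()
    release.wait(timeout=5)   # placeholder for the blocking network call
    return "done"


executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_download)
started.wait(timeout=5)       # make sure the worker is actually running

cancelled = future.cancel()   # False: a running future cannot be cancelled
still_running = future.running()

release.set()                 # let the worker finish so the demo exits cleanly
result = future.result(timeout=5)
executor.shutdown(wait=True)
```

Because the work cannot be interrupted from outside, the practical options are to make it shorter (as this commit does) or to give the underlying call its own timeout.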
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yf.download(592 tickers, period=1y) takes 20+ minutes in CI, causing
the 30-minute job timeout to trigger. Add max_tickers=200 (configurable)
to limit the batch download to the first N tickers from the file. The
concurrent scanner pool already has a 5-min global timeout, but the hung
download thread monopolises network connections and starves the filter stage.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
minervini.py existed but was never committed. Without the file on the
remote, the __init__.py import added in the previous fix causes an
ImportError in CI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add permissions: contents: write so git push works (was failing with 403)
- Add continue-on-error: true on discovery step so partial output still commits
- Change all commit/tracking/position steps to if: always() so they run regardless of discovery outcome
- Use commit-then-pull-rebase-then-push pattern to handle branch divergence
- Fix minervini scanner missing from scanners/__init__.py (enabled in config but never loaded)
- Fix .gitignore: results/* + !results/discovery/ so CI run logs can be committed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Same issue as options_flow: early exit on candidate count discards strong
signals that happen to be later in iteration order.
insider_buying: Dict iteration order matched OpenInsider HTML scrape order,
not signal quality. Now scores by cluster buys + C-suite + dollar value,
then takes top N.
technical_breakout: Stopped at limit*2 in file order despite data already
being batch-downloaded (zero API cost to check all). Removed early exit,
scan full universe, sort by volume_multiple.
sector_rotation: Checked laggards in arbitrary dict order, spending API
calls on random tickers. Now sorts by most-negative 5d return first so
the strongest laggard candidates are checked before hitting the budget.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the scanner stopped as soon as self.limit candidates were found
from as_completed() futures. Since futures complete in non-deterministic
network-latency order, this was equivalent to random sampling: fast-to-respond
tickers won regardless of how strong their options signal was.
Fix: collect all candidates from the full universe, then sort by options_score
(unusual strike count weighted 1.5x for calls to favor bullish flow) before
applying the limit. The top-N strongest signals are now always returned.
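A rough sketch of the collect-then-rank approach described above. The 1.5x call weight comes from this message; the field names and the exact way calls and puts are combined into options_score are assumptions.

```python
from dataclasses import dataclass

CALL_WEIGHT = 1.5  # from the commit message: calls weighted 1.5x


@dataclass
class OptionsCandidate:
    ticker: str
    unusual_calls: int  # hypothetical field: count of unusual call strikes
    unusual_puts: int   # hypothetical field: count of unusual put strikes

    @property
    def options_score(self) -> float:
        # Assumed formula; the real scanner may combine these differently.
        return CALL_WEIGHT * self.unusual_calls + self.unusual_puts


def top_candidates(candidates, limit):
    """Rank the full candidate set first, then apply the limit."""
    ranked = sorted(candidates, key=lambda c: c.options_score, reverse=True)
    return ranked[:limit]


found = [
    OptionsCandidate("AAA", unusual_calls=1, unusual_puts=0),  # score 1.5
    OptionsCandidate("BBB", unusual_calls=4, unusual_puts=1),  # score 7.0
    OptionsCandidate("CCC", unusual_calls=0, unusual_puts=3),  # score 3.0
]
best = top_candidates(found, limit=2)
```

The order in which candidates arrive no longer matters, because the limit is applied only after sorting.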
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. executor.shutdown(wait=True) still blocked after global timeout (critical)
The previous fix added timeout= to as_completed() but used `with
ThreadPoolExecutor() as executor`, whose __exit__ calls shutdown(wait=True).
This meant the process still hung waiting for stuck threads (ml_signal) even
after the TimeoutError was caught. Fixed by creating the executor explicitly
and calling shutdown(wait=False) in a finally block.
2. ml_signal hangs on every run — "Batch-downloading 592 tickers (1y)..." never
completes. Root cause: a single yfinance request for 592 tickers × 1 year of
daily OHLCV is a very large payload that regularly times out at the network
layer. Fixed by:
- Reducing default lookback from "1y" to "6mo" (halves download size)
- Splitting downloads into 150-ticker chunks so a slow chunk doesn't kill
the whole scan (partial results are still returned)
3. C (Citigroup) and other single-letter NYSE tickers rejected as invalid.
validate_ticker_format used ^[A-Z]{2,5}$ requiring at least 2 letters.
Real tickers like C, A, F, T, X, M are 1 letter. Fixed to ^[A-Z]{1,5}$.
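The executor pattern from item 1 might look roughly like the sketch below. run_scanners and its arguments are hypothetical names; only the combination of as_completed(timeout=...) with an explicit shutdown(wait=False) in a finally block is taken from this message.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def run_scanners(scanners: dict, global_timeout: float) -> tuple[dict, list]:
    """Run scanner callables concurrently; give up on stragglers after the budget."""
    results, unfinished = {}, []
    # Created explicitly: a `with` block would call shutdown(wait=True) on exit.
    executor = ThreadPoolExecutor(max_workers=len(scanners) or 1)
    try:
        futures = {executor.submit(fn): name for name, fn in scanners.items()}
        try:
            for future in as_completed(futures, timeout=global_timeout):
                results[futures[future]] = future.result()
        except FuturesTimeout:
            unfinished = [n for f, n in futures.items() if not f.done()]
    finally:
        # wait=False returns immediately instead of joining stuck workers.
        executor.shutdown(wait=False)
    return results, unfinished


stop = threading.Event()
start = time.monotonic()
results, unfinished = run_scanners(
    {"fast": lambda: 11, "stuck": lambda: stop.wait(timeout=3)},
    global_timeout=0.3,
)
elapsed = time.monotonic() - start
stop.set()  # release the stuck worker so the demo process can exit
```

Note that shutdown(wait=False) only unblocks the caller. In recent CPython versions the worker threads are still joined at interpreter exit, so a truly hung download can keep the process alive after the pipeline has returned.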
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues caused the agent to get stuck after the last log message
from a completed scanner (e.g. "✓ reddit_trending: 11 candidates"):
1. `as_completed()` had no global timeout. If a scanner thread blocked
in a non-interruptible I/O call, `as_completed()` waited forever
because it only yields a future once it has finished — the per-future
`future.result(timeout=N)` call was never even reached.
Fixed by passing `timeout=global_timeout` to `as_completed()` so
the outer iterator raises TimeoutError after a capped wall-clock
budget, then logs which scanners didn't complete and continues.
2. `SectorRotationScanner` called `get_ticker_info()` (one HTTP request
per ticker) in a serial loop for up to 100 tickers from a 592-ticker
file, easily exceeding the 30 s per-scanner budget.
Fixed by batch-downloading close prices for all tickers in a single
`download_history()` call, computing 5-day returns locally, and only
calling `get_ticker_info()` for the small subset of laggard tickers
(<2% 5d move) that actually need a sector label.
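The local 5-day return step in fix 2 can be sketched without any network calls. The data below is made up and stands in for a single batched download; reading "<2% 5d move" as a signed return below 2% is an assumption, as are the function names.

```python
def five_day_return(closes: list[float]) -> float:
    """Return over the last five trading days: last close vs. the close five bars back."""
    return closes[-1] / closes[-6] - 1.0


def find_laggards(history: dict[str, list[float]], threshold: float = 0.02) -> list[str]:
    """Tickers whose 5-day return is below the threshold, weakest first."""
    returns = {
        ticker: five_day_return(closes)
        for ticker, closes in history.items()
        if len(closes) >= 6  # need six closes to span five daily moves
    }
    laggards = [t for t, r in returns.items() if r < threshold]
    return sorted(laggards, key=returns.__getitem__)


# Made-up closes standing in for one batched price download.
history = {
    "AAA": [100, 101, 102, 103, 104, 110],  # +10%: not a laggard
    "BBB": [100, 100, 100, 100, 100, 101],  # +1%: laggard
    "CCC": [100, 99, 98, 97, 96, 95],       # -5%: laggard
    "DDD": [50, 51],                        # too little history: skipped
}
laggards = find_laggards(history)
```

Only the tickers in laggards would then need a per-ticker get_ticker_info() call, which is what keeps the scanner inside its time budget.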
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Call get_finviz_insider_buying with return_structured=True and deduplicate=False
to get all raw transaction dicts instead of parsing markdown
- Group transactions by ticker for cluster detection (2+ unique insiders = CRITICAL)
- Smart priority: CEO/CFO + >$100K = CRITICAL, director + >$50K = HIGH, etc.
- Preserve insider_name, insider_title, transaction_value, num_insiders_buying in output
- Rich context strings: "CEO John Smith purchased $250K of AAPL shares"
- Update finviz_scraper alias to pass through return_structured and deduplicate params
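A simplified sketch of the grouping and priority rules listed above. The thresholds for CRITICAL and HIGH come from this message; the MEDIUM fallback stands in for the tiers abbreviated as "etc." and is an assumption, as are the function names.

```python
from collections import defaultdict

C_SUITE = ("CEO", "CFO")
RANK = {"MEDIUM": 0, "HIGH": 1, "CRITICAL": 2}


def _single_priority(tx: dict) -> str:
    """Priority of one transaction judged on title and dollar value."""
    title, value = tx["insider_title"].upper(), tx["transaction_value"]
    if any(role in title for role in C_SUITE) and value > 100_000:
        return "CRITICAL"
    if "DIRECTOR" in title and value > 50_000:
        return "HIGH"
    return "MEDIUM"  # placeholder for the tiers the commit abbreviates as "etc."


def prioritize(transactions: list[dict]) -> dict[str, dict]:
    """Group raw insider buys by ticker and assign a priority to each ticker."""
    by_ticker = defaultdict(list)
    for tx in transactions:
        by_ticker[tx["ticker"]].append(tx)

    summary = {}
    for ticker, txs in by_ticker.items():
        insiders = {tx["insider_name"] for tx in txs}
        if len(insiders) >= 2:
            priority = "CRITICAL"  # cluster buy: 2+ unique insiders
        else:
            priority = max((_single_priority(tx) for tx in txs), key=RANK.get)
        summary[ticker] = {"priority": priority, "num_insiders_buying": len(insiders)}
    return summary


sample = [
    {"ticker": "AAA", "insider_name": "A. One", "insider_title": "Director", "transaction_value": 20_000},
    {"ticker": "AAA", "insider_name": "B. Two", "insider_title": "VP", "transaction_value": 10_000},
    {"ticker": "BBB", "insider_name": "C. Three", "insider_title": "CEO", "transaction_value": 250_000},
    {"ticker": "CCC", "insider_name": "D. Four", "insider_title": "Director", "transaction_value": 60_000},
    {"ticker": "DDD", "insider_name": "E. Five", "insider_title": "Director", "transaction_value": 10_000},
]
summary = prioritize(sample)
```

Cluster detection counts unique insider names, which is why the raw transactions need to arrive with deduplicate=False.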
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add GitHub Actions workflow for daily discovery (8:30 AM ET, weekdays)
- Add headless run_daily_discovery.py script for scheduling
- Expand options_flow scanner to use tickers.txt with parallel execution
- Add recommendation history section to Performance page with filters and charts
- Fix strategy name normalization (momentum/Momentum/Momentum-Hype → momentum)
- Fix strategy metrics to count all recs, not just evaluated ones
- Add error handling to Streamlit page rendering
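One possible shape for the strategy name normalization, counting every recommendation per canonical name. The alias table and function names are hypothetical; only the three momentum variants come from this message.

```python
from collections import Counter

# Hypothetical alias table: variants that differ by more than case.
ALIASES = {"momentum-hype": "momentum"}


def normalize_strategy(name: str) -> str:
    """Map a raw strategy label to its canonical lower-case key."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def count_by_strategy(recommendations: list[dict]) -> Counter:
    """Count all recommendations per strategy, evaluated or not."""
    return Counter(normalize_strategy(rec["strategy"]) for rec in recommendations)


recs = [
    {"strategy": "momentum", "evaluated": True},
    {"strategy": "Momentum", "evaluated": False},
    {"strategy": "Momentum-Hype", "evaluated": False},
]
counts = count_by_strategy(recs)
```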
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ML signal scanner results table logging
- Add log_prompts_console config flag for prompt visibility control
- Expand ranker investment thesis to 4-6 sentence structured reasoning
- Linter auto-formatting across modified files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major additions:
- ML win probability scanner: scans ticker universe using trained
LightGBM/TabPFN model, surfaces candidates with P(WIN) above threshold
- 30-feature engineering pipeline (20 base + 10 interaction features)
computed from OHLCV data via stockstats + pandas
- Triple-barrier labeling for training data generation
- Dataset builder and training script with calibration analysis
- Discovery enrichment: confluence scoring, short interest extraction,
earnings estimates, options signal normalization, quant pre-score
- Configurable prompt logging (log_prompts_console flag)
- Enhanced ranker investment thesis (4-6 sentence reasoning)
- Typed DiscoveryConfig dataclass for all discovery settings
- Console price charts for visual ticker analysis
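For readers unfamiliar with triple-barrier labeling, here is a minimal sketch of the general technique on closing prices. The barrier widths, horizon, and function name are illustrative defaults, not the values used by the training script.

```python
def triple_barrier_label(prices, entry, take_profit=0.05, stop_loss=0.03, horizon=10):
    """Label one entry: +1 if the upper barrier is hit first, -1 if the lower, 0 on timeout."""
    entry_price = prices[entry]
    upper = entry_price * (1 + take_profit)   # profit-taking barrier
    lower = entry_price * (1 - stop_loss)     # stop-loss barrier
    # The slice end is the vertical (time) barrier.
    for price in prices[entry + 1 : entry + 1 + horizon]:
        if price >= upper:
            return 1
        if price <= lower:
            return -1
    return 0


winner = triple_barrier_label([100, 101, 103, 106, 104], entry=0)
loser = triple_barrier_label([100, 99, 96, 110], entry=0)
timeout = triple_barrier_label([100, 101, 100, 102, 101], entry=0, horizon=3)
```

Labels of +1 would map naturally to the WIN class that the scanner's P(WIN) refers to, though the exact mapping used in training is not stated here.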
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Created nested "filters" section for all filter-stage settings
(min_average_volume, same-day movers, recent movers, etc.)
- Created nested "enrichment" section for batch news settings
- Updated CandidateFilter to read from new nested structure
- Added backward compatibility fallback for old flat config
- Improved config organization and clarity
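The backward compatibility fallback could be as small as the sketch below. get_filter_setting is a hypothetical helper, and the real CandidateFilter may nest the "filters" section differently.

```python
def get_filter_setting(config: dict, key: str, default=None):
    """Read a filter setting from the nested section, falling back to the flat key."""
    nested = config.get("filters", {})
    if key in nested:
        return nested[key]
    # Old flat layout: the key sits at the top level of the config.
    return config.get(key, default)


new_style = {"filters": {"min_average_volume": 500_000}}
old_style = {"min_average_volume": 250_000}

from_new = get_filter_setting(new_style, "min_average_volume")
from_old = get_filter_setting(old_style, "min_average_volume")
missing = get_filter_setting({}, "min_average_volume", default=100_000)
```

Checking the nested section first means a config that carries both layouts resolves to the new value.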
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>