Commit Graph

31 Commits

Author SHA1 Message Date
Youssef Aitousarrah e15e2df7a5 feat(cache): unified ticker universe + nightly OHLCV prefetch
- tradingagents/dataflows/universe.py: single source of truth for ticker
  universe; all scanners now call load_universe(config) instead of
  duplicating the 3-level fallback chain with hardcoded "data/tickers.txt"

- scripts/prefetch_ohlcv.py: nightly script using existing ohlcv_cache.py
  incremental logic; first run downloads 1y history, subsequent runs append
  only new trading days

- .github/workflows/prefetch.yml: runs at 01:00 UTC daily, before all other
  workflows; commits updated parquet to repo

- Updated 6 scanners: minervini, high_52w_breakout, ml_signal, options_flow,
  sector_rotation, technical_breakout — removed duplicate DEFAULT_TICKER_FILE
  constants and _load_tickers_from_file() functions

- minervini, high_52w_breakout, technical_breakout: replace yf.download()
  with download_ohlcv_cached() — reads from prefetched cache instead of
  hitting yfinance at discovery time

- default_config.py: added discovery.ohlcv_cache_dir config key

- data/ohlcv_cache/: initial 1y backfill (588 tickers, 5.4MB parquet)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 16:18:52 -07:00
github-actions[bot] 17e77f036f research(autonomous): 2026-04-14 — automated research run 2026-04-14 13:47:21 -07:00
github-actions[bot] f862e91870 learn(iterate): 2026-04-14 — automated iteration run 2026-04-14 07:25:30 +00:00
Aitous 79a58a540c
Merge pull request #14 from Aitous/iterate/current
learn(iterate): automated improvements — 2026-04-13
2026-04-13 12:06:56 -07:00
github-actions[bot] 48a1c1672f research(autonomous): 2026-04-13 — automated research run 2026-04-13 09:06:49 +00:00
github-actions[bot] 17e45df41a learn(iterate): 2026-04-13 — automated iteration run 2026-04-13 07:52:59 +00:00
Youssef Aitousarrah f73681cf1c research(short-squeeze): 2026-04-12 — new short_squeeze scanner; high SI (>20%) as squeeze-risk discovery for cross-scanner confluence
Implements ShortSqueezeScanner wrapping existing get_short_interest() in finviz_scraper.py.
Research finding: raw high SI predicts negative long-term returns (academic); edge is using
SI as a squeeze-risk flag when combined with earnings_calendar or options_flow catalysts.
Directly addresses earnings_calendar pending hypothesis (APLD 30.6% SI was strongest setup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:10:36 -07:00
Youssef Aitousarrah a51d6193f8 research(short-squeeze): 2026-04-12 — new short_squeeze scanner; high SI (>20%) as squeeze-risk discovery for cross-scanner confluence
Implements ShortSqueezeScanner wrapping existing get_short_interest() in finviz_scraper.py.
Research finding: raw high SI predicts negative long-term returns (academic); edge is using
SI as a squeeze-risk flag when combined with earnings_calendar or options_flow catalysts.
Directly addresses earnings_calendar pending hypothesis (APLD 30.6% SI was strongest setup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:10:30 -07:00
Youssef Aitousarrah 612366fa45 learn(iterate): 2026-04-12 — document social_dd/early_accumulation; split social_dd from social_hype in ranker (55% 30d win rate vs 14.3%)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:03:03 -07:00
Youssef Aitousarrah 2d8b91b709 learn(iterate): 2026-04-12 — surface worst-performing strategies in ranker context; LLM now sees news_catalyst (0% 7d win rate) and social_hype (14.3%) as explicit penalties
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 18:04:46 -07:00
Youssef Aitousarrah 7ec0e52b98 learn(iterate): 2026-04-12 — raise score threshold 55→65; minervini leads; insider_buying staleness pattern identified
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 18:04:46 -07:00
github-actions[bot] f17b2e4e02 learn(iterate): 2026-04-11 — automated iteration run 2026-04-11 06:55:28 +00:00
Youssef Aitousarrah 704af1a855 fix(discovery): commit risk_metrics.py and reduce Minervini max_tickers to 50
Two bugs causing zero recommendations:

1. risk_metrics.py was untracked — importing it raised ModuleNotFoundError which
   was caught by the outer try/except in filter.py, silently dropping all 32
   candidates that reached the fundamental risk check stage.

2. Minervini scanner at max_tickers=200 took >5 min to download 200 tickers x 1y
   of OHLCV data. ThreadPoolExecutor.cancel() cannot kill a running thread, so the
   download kept running as a zombie thread for 20 more minutes after the pipeline
   completed, holding the Python process alive until the 30-min workflow timeout
   killed the entire job.

   Reducing to 50 tickers brings the download to ~75s, well under the 300s global
   scanner timeout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 16:24:29 -07:00
Youssef Aitousarrah 957b009da1 fix(minervini): cap ticker universe to prevent CI timeout
yf.download(592 tickers, period=1y) takes 20+ minutes in CI, causing
the 30-minute job timeout to trigger. Add max_tickers=200 (configurable)
to limit the batch download to the first N tickers from the file. The
concurrent scanner pool already has a 5-min global timeout, but the hung
download thread monopolises network connections and starves the filter stage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 14:24:22 -07:00
Youssef Aitousarrah b68a43ec0d feat(scanners): add minervini scanner to registry
minervini.py existed but was never committed. Without the file on the
remote, the __init__.py import added in the previous fix causes an
ImportError in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:51:42 -07:00
Youssef Aitousarrah 32d89c3bfc fix(ci): restore daily discovery workflow
- Add permissions: contents: write so git push works (was failing with 403)
- Add continue-on-error: true on discovery step so partial output still commits
- Change all commit/tracking/position steps to if: always() so they run regardless of discovery outcome
- Use commit-then-pull-rebase-then-push pattern to handle branch divergence
- Fix minervini scanner missing from scanners/__init__.py (enabled in config but never loaded)
- Fix .gitignore: results/* + !results/discovery/ so CI run logs can be committed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:46:21 -07:00
Youssef Aitousarrah 719a2d3f4e fix(scanners): rank by signal quality before limiting in 3 more scanners
Same issue as options_flow: early exit on candidate count discards strong
signals that happen to be later in iteration order.

insider_buying: Dict iteration order matched OpenInsider HTML scrape order,
not signal quality. Now scores by cluster buys + C-suite + dollar value,
then takes top N.

technical_breakout: Stopped at limit*2 in file order despite data already
being batch-downloaded (zero API cost to check all). Removed early exit,
scan full universe, sort by volume_multiple.

sector_rotation: Checked laggards in arbitrary dict order, spending API
calls on random tickers. Now sorts by most-negative 5d return first so
the strongest laggard candidates are checked before hitting the budget.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:46:21 -07:00
Youssef Aitousarrah 136fa47645 fix(options_flow): scan full universe before applying limit, rank by signal strength
Previously the scanner stopped as soon as self.limit candidates were found
from as_completed() futures. Since futures complete in non-deterministic
network-latency order, this was equivalent to random sampling — fast-to-
respond tickers won regardless of how strong their options signal was.

Fix: collect all candidates from the full universe, then sort by options_score
(unusual strike count weighted 1.5x for calls to favor bullish flow) before
applying the limit. The top-N strongest signals are now always returned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:46:21 -07:00
Youssef Aitousarrah c792b17ab6 fix(discovery): fix three scanner hang/validation bugs found in ranker_debug.log
1. executor.shutdown(wait=True) still blocked after global timeout (critical)
   The previous fix added timeout= to as_completed() but used `with
   ThreadPoolExecutor() as executor`, whose __exit__ calls shutdown(wait=True).
   This meant the process still hung waiting for stuck threads (ml_signal) even
   after the TimeoutError was caught.  Fixed by creating the executor explicitly
   and calling shutdown(wait=False) in a finally block.

2. ml_signal hangs on every run — "Batch-downloading 592 tickers (1y)..." never
   completes. Root cause: a single yfinance request for 592 tickers × 1 year of
   daily OHLCV is a very large payload that regularly times out at the network
   layer. Fixed by:
   - Reducing default lookback from "1y" to "6mo" (halves download size)
   - Splitting downloads into 150-ticker chunks so a slow chunk doesn't kill
     the whole scan (partial results are still returned)

3. C (Citigroup) and other single-letter NYSE tickers rejected as invalid.
   validate_ticker_format used ^[A-Z]{2,5}$ requiring at least 2 letters.
   Real tickers like C, A, F, T, X, M are 1 letter. Fixed to ^[A-Z]{1,5}$.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 22:35:42 -08:00
Youssef Aitousarrah ce2a6ef8fa fix(discovery): fix infinite hang when a scanner thread blocks indefinitely
Two issues caused the agent to get stuck after the last log message
from a completed scanner (e.g. "✓ reddit_trending: 11 candidates"):

1. `as_completed()` had no global timeout. If a scanner thread blocked
   in a non-interruptible I/O call, `as_completed()` waited forever
   because it only yields a future once it has finished — the per-future
   `future.result(timeout=N)` call was never even reached.
   Fixed by passing `timeout=global_timeout` to `as_completed()` so
   the outer iterator raises TimeoutError after a capped wall-clock
   budget, then logs which scanners didn't complete and continues.

2. `SectorRotationScanner` called `get_ticker_info()` (one HTTP request
   per ticker) in a serial loop for up to 100 tickers from a 592-ticker
   file, easily exceeding the 30 s per-scanner budget.
   Fixed by batch-downloading close prices for all tickers in a single
   `download_history()` call, computing 5-day returns locally, and only
   calling `get_ticker_info()` for the small subset of laggard tickers
   (<2% 5d move) that actually need a sector label.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 22:14:53 -08:00
Youssef Aitousarrah ec8309a34e Update 2026-02-20 08:38:15 -08:00
Youssef Aitousarrah 1c20dc8c90 feat: improve all 9 scanners and add 3 new scanners
Phase 1 - Fix existing scanners:
- Options flow: apply min_premium filter, scan 3 expirations
- Volume accumulation: distinguish accumulation vs distribution
- Reddit DD: use LLM quality score for priority (skip <60)
- Reddit trending: add mention counts, scale priority by volume
- Semantic news: include headlines, add catalyst classification
- Earnings calendar: add pre-earnings accumulation + EPS estimates
- Market movers: add price ($5) and volume (500K) validation
- ML signal: raise min_win_prob from 35% to 50%

Phase 2 - New scanners:
- Analyst upgrades: monitors rating changes via Alpha Vantage
- Technical breakout: volume-confirmed breakouts above 20d high
- Sector rotation: finds laggards in accelerating sectors

All 12 scanners register with valid Strategy enum values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 08:36:18 -08:00
Youssef Aitousarrah 573b756b4b fix(insider-buying): preserve transaction details, add cluster detection and smart priority
- Call get_finviz_insider_buying with return_structured=True and deduplicate=False
  to get all raw transaction dicts instead of parsing markdown
- Group transactions by ticker for cluster detection (2+ unique insiders = CRITICAL)
- Smart priority: CEO/CFO + >$100K = CRITICAL, director + >$50K = HIGH, etc.
- Preserve insider_name, insider_title, transaction_value, num_insiders_buying in output
- Rich context strings: "CEO John Smith purchased $250K of AAPL shares"
- Update finviz_scraper alias to pass through return_structured and deduplicate params

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 08:36:18 -08:00
Youssef Aitousarrah 6831339b78 Remore unused code and improve the UI 2026-02-16 14:17:43 -08:00
Youssef Aitousarrah 8d3205043e Update 2026-02-16 14:17:41 -08:00
Youssef Aitousarrah f4aceef857 feat: add daily discovery workflow, recommendation history, and scanner improvements
- Add GitHub Actions workflow for daily discovery (8:30 AM ET, weekdays)
- Add headless run_daily_discovery.py script for scheduling
- Expand options_flow scanner to use tickers.txt with parallel execution
- Add recommendation history section to Performance page with filters and charts
- Fix strategy name normalization (momentum/Momentum/Momentum-Hype → momentum)
- Fix strategy metrics to count all recs, not just evaluated ones
- Add error handling to Streamlit page rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 22:07:02 -08:00
Youssef Aitousarrah 8ebb42114d Add recommendations folder so that the UI can display it 4 2026-02-10 22:28:52 -08:00
Youssef Aitousarrah cb5ae49501 chore: linter formatting + ML scanner logging, prompt control, ranker reasoning
- Add ML signal scanner results table logging
- Add log_prompts_console config flag for prompt visibility control
- Expand ranker investment thesis to 4-6 sentence structured reasoning
- Linter auto-formatting across modified files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 23:04:38 -08:00
Youssef Aitousarrah 43bdd6de11 feat: discovery pipeline enhancements with ML signal scanner
Major additions:
- ML win probability scanner: scans ticker universe using trained
  LightGBM/TabPFN model, surfaces candidates with P(WIN) above threshold
- 30-feature engineering pipeline (20 base + 10 interaction features)
  computed from OHLCV data via stockstats + pandas
- Triple-barrier labeling for training data generation
- Dataset builder and training script with calibration analysis
- Discovery enrichment: confluence scoring, short interest extraction,
  earnings estimates, options signal normalization, quant pre-score
- Configurable prompt logging (log_prompts_console flag)
- Enhanced ranker investment thesis (4-6 sentence reasoning)
- Typed DiscoveryConfig dataclass for all discovery settings
- Console price charts for visual ticker analysis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 22:53:42 -08:00
Youssef Aitousarrah f1178b4a57 refactor: organize discovery config into dedicated filter/enrichment sections
- Created nested "filters" section for all filter-stage settings
  (min_average_volume, same-day movers, recent movers, etc.)
- Created nested "enrichment" section for batch news settings
- Updated CandidateFilter to read from new nested structure
- Added backward compatibility fallback for old flat config
- Improved config organization and clarity

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-06 08:22:39 -08:00
Youssef Aitousarrah 369f8c444b feat: discovery system code quality improvements and concurrent execution
Implement comprehensive code quality improvements and performance optimizations
for the discovery pipeline based on code review findings.

## Key Improvements

### 1. Common Utilities (DRY Principle)
- Created `tradingagents/dataflows/discovery/common_utils.py`
- Extracted ticker parsing logic (eliminates 40+ lines of duplication)
- Centralized stopwords list (71 common non-ticker words)
- Added ReDoS protection (100KB text length limit)
- Provides `validate_candidate_structure()` for output validation

### 2. Scanner Output Validation
- Two-layer validation approach:
  - Registration-time: Check scanner class structure
  - Runtime: Validate each candidate dictionary
- Added `scan_with_validation()` wrapper in BaseScanner
- Validates required keys: ticker, source, context, priority
- Graceful error handling with structured logging

### 3. Configuration-Driven Design
- Moved magic numbers to `default_config.py`:
  - `ticker_universe`: Top 20 liquid options tickers
  - `min_volume`: 1000 (options flow threshold)
  - `min_transaction_value`: $25,000 (insider buying filter)
- Fixed hardcoded absolute paths to relative paths
- Improved portability across development environments

### 4. Concurrent Scanner Execution (37% Performance Gain)
- Implemented ThreadPoolExecutor for parallel scanner execution
- Configuration: `scanner_execution.concurrent`, `max_workers`, `timeout_seconds`
- Performance: 42s vs 67s (37% faster with 8 scanners)
- Thread-safe state management (each scanner gets copy)
- Per-scanner timeout with graceful degradation
- Error isolation (one failure doesn't stop others)

### 5. Error Handling Improvements
- Changed bare `except:` to `except Exception:` (avoid catching KeyboardInterrupt)
- Added structured logging with `exc_info=True` and extra fields
- Implemented graceful degradation throughout pipeline

## Files Changed

### Core Implementation
- `tradingagents/__init__.py` (NEW) - Package initialization
- `tradingagents/default_config.py` - Scanner execution config, magic numbers
- `tradingagents/graph/discovery_graph.py` - Concurrent execution logic
- `tradingagents/dataflows/discovery/common_utils.py` (NEW) - Shared utilities
- `tradingagents/dataflows/discovery/scanner_registry.py` - Validation wrapper
- `tradingagents/dataflows/discovery/scanners/*.py` - Use common utilities

### Testing & Documentation
- `tests/test_concurrent_scanners.py` (NEW) - Comprehensive test suite
- `verify_concurrent_execution.py` (NEW) - Performance verification
- `CONCURRENT_EXECUTION.md` (NEW) - Implementation documentation

## Test Results

All tests passing (exit code 0):
-  Concurrent execution: 42s, 66-69 candidates
-  Sequential fallback: 56-67s, 65-68 candidates
-  Timeout handling: Graceful degradation with 1s timeout
-  Error isolation: Individual failures don't cascade

## Performance Impact

- Scanner execution: 37% faster (42s vs 67s)
- Time saved: ~25 seconds per discovery run
- At scale: 4+ minutes saved daily in production
- Same candidate quality (65-69 tickers in both modes)

## Breaking Changes

None. Concurrent execution is opt-in via config flag.
Sequential mode remains available as fallback.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 23:27:01 -08:00