Commit Graph

54 Commits

Author SHA1 Message Date
github-actions[bot] 1dd00e467f research(autonomous): 2026-04-14 — automated research run 2026-04-14 21:17:06 +00:00
github-actions[bot] 17e77f036f research(autonomous): 2026-04-14 — automated research run 2026-04-14 13:47:21 -07:00
github-actions[bot] f862e91870 learn(iterate): 2026-04-14 — automated iteration run 2026-04-14 07:25:30 +00:00
Aitous 79a58a540c
Merge pull request #14 from Aitous/iterate/current
learn(iterate): automated improvements — 2026-04-13
2026-04-13 12:06:56 -07:00
github-actions[bot] 48a1c1672f research(autonomous): 2026-04-13 — automated research run 2026-04-13 09:06:49 +00:00
github-actions[bot] 17e45df41a learn(iterate): 2026-04-13 — automated iteration run 2026-04-13 07:52:59 +00:00
Youssef Aitousarrah f73681cf1c research(short-squeeze): 2026-04-12 — new short_squeeze scanner; high SI (>20%) as squeeze-risk discovery for cross-scanner confluence
Implements ShortSqueezeScanner wrapping existing get_short_interest() in finviz_scraper.py.
Research finding: raw high SI predicts negative long-term returns (academic); edge is using
SI as a squeeze-risk flag when combined with earnings_calendar or options_flow catalysts.
Directly addresses earnings_calendar pending hypothesis (APLD 30.6% SI was strongest setup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:10:36 -07:00
Youssef Aitousarrah a51d6193f8 research(short-squeeze): 2026-04-12 — new short_squeeze scanner; high SI (>20%) as squeeze-risk discovery for cross-scanner confluence
Implements ShortSqueezeScanner wrapping existing get_short_interest() in finviz_scraper.py.
Research finding: raw high SI predicts negative long-term returns (academic); edge is using
SI as a squeeze-risk flag when combined with earnings_calendar or options_flow catalysts.
Directly addresses earnings_calendar pending hypothesis (APLD 30.6% SI was strongest setup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:10:30 -07:00
Youssef Aitousarrah 612366fa45 learn(iterate): 2026-04-12 — document social_dd/early_accumulation; split social_dd from social_hype in ranker (55% 30d win rate vs 14.3%)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:03:03 -07:00
Youssef Aitousarrah 2d8b91b709 learn(iterate): 2026-04-12 — surface worst-performing strategies in ranker context; LLM now sees news_catalyst (0% 7d win rate) and social_hype (14.3%) as explicit penalties
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 18:04:46 -07:00
Youssef Aitousarrah 7ec0e52b98 learn(iterate): 2026-04-12 — raise score threshold 55→65; minervini leads; insider_buying staleness pattern identified
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 18:04:46 -07:00
github-actions[bot] f17b2e4e02 learn(iterate): 2026-04-11 — automated iteration run 2026-04-11 06:55:28 +00:00
Youssef Aitousarrah c09cc7ec25 fix(y_finance): make suppress_yfinance_warnings thread-safe
The previous implementation redirected sys.stderr to /dev/null using a
context manager. This is not thread-safe: 8 concurrent scanner threads each
mutate sys.stderr, and when one thread's context manager closes the devnull
file, another thread that captured devnull as its saved stderr attempts to
write to the closed fd and raises "I/O operation on closed file".

This corrupted sys.stderr state caused _fetch_batch_prices to fail and
all per-ticker get_stock_price fallback calls to return None, resulting in
every candidate being dropped with "no data available".

Fix by suppressing at the Python logging level instead of redirecting
sys.stderr. Logger.setLevel() is protected by internal locks and is safe
to call from concurrent threads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 16:37:03 -07:00
Youssef Aitousarrah 704af1a855 fix(discovery): commit risk_metrics.py and reduce Minervini max_tickers to 50
Two bugs causing zero recommendations:

1. risk_metrics.py was untracked — importing it raised ModuleNotFoundError which
   was caught by the outer try/except in filter.py, silently dropping all 32
   candidates that reached the fundamental risk check stage.

2. Minervini scanner at max_tickers=200 took >5 min to download 200 tickers x 1y
   of OHLCV data. ThreadPoolExecutor.cancel() cannot kill a running thread, so the
   download kept running as a zombie thread for 20 more minutes after the pipeline
   completed, holding the Python process alive until the 30-min workflow timeout
   killed the entire job.

   Reducing to 50 tickers brings the download to ~75s, well under the 300s global
   scanner timeout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 16:24:29 -07:00
Youssef Aitousarrah 2e79c2245f fix(filter): only filter upside intraday movers, not downside
The same-day mover filter used abs() to check intraday movement, which
filtered both gap-ups AND gap-downs. On volatile/crash days (e.g. 2026-04-07)
all stocks dropped >10% from open, causing every candidate to be filtered and
leaving zero recommendations.

The filter's purpose is to avoid chasing stocks that already ran up. A stock
down 20% intraday is not "stale" — it should be evaluated on its merits.
Changed threshold check from abs(pct) >= threshold to pct >= threshold so only
upside movers are filtered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 12:51:41 -07:00
Youssef Aitousarrah 957b009da1 fix(minervini): cap ticker universe to prevent CI timeout
yf.download(592 tickers, period=1y) takes 20+ minutes in CI, causing
the 30-minute job timeout to trigger. Add max_tickers=200 (configurable)
to limit the batch download to the first N tickers from the file. The
concurrent scanner pool already has a 5-min global timeout, but the hung
download thread monopolises network connections and starves the filter stage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 14:24:22 -07:00
Youssef Aitousarrah b68a43ec0d feat(scanners): add minervini scanner to registry
minervini.py existed but was never committed. Without the file on the
remote, the __init__.py import added in the previous fix causes an
ImportError in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:51:42 -07:00
Youssef Aitousarrah 32d89c3bfc fix(ci): restore daily discovery workflow
- Add permissions: contents: write so git push works (was failing with 403)
- Add continue-on-error: true on discovery step so partial output still commits
- Change all commit/tracking/position steps to if: always() so they run regardless of discovery outcome
- Use commit-then-pull-rebase-then-push pattern to handle branch divergence
- Fix minervini scanner missing from scanners/__init__.py (enabled in config but never loaded)
- Fix .gitignore: results/* + !results/discovery/ so CI run logs can be committed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:46:21 -07:00
Youssef Aitousarrah 719a2d3f4e fix(scanners): rank by signal quality before limiting in 3 more scanners
Same issue as options_flow: early exit on candidate count discards strong
signals that happen to be later in iteration order.

insider_buying: Dict iteration order matched OpenInsider HTML scrape order,
not signal quality. Now scores by cluster buys + C-suite + dollar value,
then takes top N.

technical_breakout: Stopped at limit*2 in file order despite data already
being batch-downloaded (zero API cost to check all). Removed early exit,
scan full universe, sort by volume_multiple.

sector_rotation: Checked laggards in arbitrary dict order, spending API
calls on random tickers. Now sorts by most-negative 5d return first so
the strongest laggard candidates are checked before hitting the budget.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:46:21 -07:00
Youssef Aitousarrah 136fa47645 fix(options_flow): scan full universe before applying limit, rank by signal strength
Previously the scanner stopped as soon as self.limit candidates were found
from as_completed() futures. Since futures complete in non-deterministic
network-latency order, this was equivalent to random sampling — fast-to-
respond tickers won regardless of how strong their options signal was.

Fix: collect all candidates from the full universe, then sort by options_score
(unusual strike count weighted 1.5x for calls to favor bullish flow) before
applying the limit. The top-N strongest signals are now always returned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:46:21 -07:00
Youssef Aitousarrah 61b731ac28 fix(filter): replace tqdm with logger in batch news functions to fix I/O error
tqdm writes to stderr immediately on __enter__, before any loop iteration.
In Streamlit's thread/subprocess context stderr can be a closed pipe, causing
'I/O operation on closed file' which _run_call catches and returns {} — so
the entire news enrichment step was silently skipped every run.

Replaced tqdm progress bars with logger.info() calls in:
- get_batch_stock_news_google() in openai.py
- get_batch_stock_news_openai() in openai.py
- Reddit DD parallel evaluation in reddit_api.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:46:21 -07:00
Youssef Aitousarrah c792b17ab6 fix(discovery): fix three scanner hang/validation bugs found in ranker_debug.log
1. executor.shutdown(wait=True) still blocked after global timeout (critical)
   The previous fix added timeout= to as_completed() but used `with
   ThreadPoolExecutor() as executor`, whose __exit__ calls shutdown(wait=True).
   This meant the process still hung waiting for stuck threads (ml_signal) even
   after the TimeoutError was caught.  Fixed by creating the executor explicitly
   and calling shutdown(wait=False) in a finally block.

2. ml_signal hangs on every run — "Batch-downloading 592 tickers (1y)..." never
   completes. Root cause: a single yfinance request for 592 tickers × 1 year of
   daily OHLCV is a very large payload that regularly times out at the network
   layer. Fixed by:
   - Reducing default lookback from "1y" to "6mo" (halves download size)
   - Splitting downloads into 150-ticker chunks so a slow chunk doesn't kill
     the whole scan (partial results are still returned)

3. C (Citigroup) and other single-letter NYSE tickers rejected as invalid.
   validate_ticker_format used ^[A-Z]{2,5}$ requiring at least 2 letters.
   Real tickers like C, A, F, T, X, M are 1 letter. Fixed to ^[A-Z]{1,5}$.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 22:35:42 -08:00
Youssef Aitousarrah ce2a6ef8fa fix(discovery): fix infinite hang when a scanner thread blocks indefinitely
Two issues caused the agent to get stuck after the last log message
from a completed scanner (e.g. "✓ reddit_trending: 11 candidates"):

1. `as_completed()` had no global timeout. If a scanner thread blocked
   in a non-interruptible I/O call, `as_completed()` waited forever
   because it only yields a future once it has finished — the per-future
   `future.result(timeout=N)` call was never even reached.
   Fixed by passing `timeout=global_timeout` to `as_completed()` so
   the outer iterator raises TimeoutError after a capped wall-clock
   budget, then logs which scanners didn't complete and continues.

2. `SectorRotationScanner` called `get_ticker_info()` (one HTTP request
   per ticker) in a serial loop for up to 100 tickers from a 592-ticker
   file, easily exceeding the 30 s per-scanner budget.
   Fixed by batch-downloading close prices for all tickers in a single
   `download_history()` call, computing 5-day returns locally, and only
   calling `get_ticker_info()` for the small subset of laggard tickers
   (<2% 5d move) that actually need a sector label.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 22:14:53 -08:00
Youssef Aitousarrah 8e2392029a Update 2026-02-20 08:39:37 -08:00
Youssef Aitousarrah ec8309a34e Update 2026-02-20 08:38:15 -08:00
Youssef Aitousarrah 1c20dc8c90 feat: improve all 9 scanners and add 3 new scanners
Phase 1 - Fix existing scanners:
- Options flow: apply min_premium filter, scan 3 expirations
- Volume accumulation: distinguish accumulation vs distribution
- Reddit DD: use LLM quality score for priority (skip <60)
- Reddit trending: add mention counts, scale priority by volume
- Semantic news: include headlines, add catalyst classification
- Earnings calendar: add pre-earnings accumulation + EPS estimates
- Market movers: add price ($5) and volume (500K) validation
- ML signal: raise min_win_prob from 35% to 50%

Phase 2 - New scanners:
- Analyst upgrades: monitors rating changes via Alpha Vantage
- Technical breakout: volume-confirmed breakouts above 20d high
- Sector rotation: finds laggards in accelerating sectors

All 12 scanners register with valid Strategy enum values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 08:36:18 -08:00
Youssef Aitousarrah 573b756b4b fix(insider-buying): preserve transaction details, add cluster detection and smart priority
- Call get_finviz_insider_buying with return_structured=True and deduplicate=False
  to get all raw transaction dicts instead of parsing markdown
- Group transactions by ticker for cluster detection (2+ unique insiders = CRITICAL)
- Smart priority: CEO/CFO + >$100K = CRITICAL, director + >$50K = HIGH, etc.
- Preserve insider_name, insider_title, transaction_value, num_insiders_buying in output
- Rich context strings: "CEO John Smith purchased $250K of AAPL shares"
- Update finviz_scraper alias to pass through return_structured and deduplicate params

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 08:36:18 -08:00
Youssef Aitousarrah 6831339b78 Remore unused code and improve the UI 2026-02-16 14:17:43 -08:00
Youssef Aitousarrah 8d3205043e Update 2026-02-16 14:17:41 -08:00
Youssef Aitousarrah f4aceef857 feat: add daily discovery workflow, recommendation history, and scanner improvements
- Add GitHub Actions workflow for daily discovery (8:30 AM ET, weekdays)
- Add headless run_daily_discovery.py script for scheduling
- Expand options_flow scanner to use tickers.txt with parallel execution
- Add recommendation history section to Performance page with filters and charts
- Fix strategy name normalization (momentum/Momentum/Momentum-Hype → momentum)
- Fix strategy metrics to count all recs, not just evaluated ones
- Add error handling to Streamlit page rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 22:07:02 -08:00
Youssef Aitousarrah 8ebb42114d Add recommendations folder so that the UI can display it 4 2026-02-10 22:28:52 -08:00
Youssef Aitousarrah cb5ae49501 chore: linter formatting + ML scanner logging, prompt control, ranker reasoning
- Add ML signal scanner results table logging
- Add log_prompts_console config flag for prompt visibility control
- Expand ranker investment thesis to 4-6 sentence structured reasoning
- Linter auto-formatting across modified files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 23:04:38 -08:00
Youssef Aitousarrah 43bdd6de11 feat: discovery pipeline enhancements with ML signal scanner
Major additions:
- ML win probability scanner: scans ticker universe using trained
  LightGBM/TabPFN model, surfaces candidates with P(WIN) above threshold
- 30-feature engineering pipeline (20 base + 10 interaction features)
  computed from OHLCV data via stockstats + pandas
- Triple-barrier labeling for training data generation
- Dataset builder and training script with calibration analysis
- Discovery enrichment: confluence scoring, short interest extraction,
  earnings estimates, options signal normalization, quant pre-score
- Configurable prompt logging (log_prompts_console flag)
- Enhanced ranker investment thesis (4-6 sentence reasoning)
- Typed DiscoveryConfig dataclass for all discovery settings
- Console price charts for visual ticker analysis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 22:53:42 -08:00
Youssef Aitousarrah f1178b4a57 refactor: organize discovery config into dedicated filter/enrichment sections
- Created nested "filters" section for all filter-stage settings
  (min_average_volume, same-day movers, recent movers, etc.)
- Created nested "enrichment" section for batch news settings
- Updated CandidateFilter to read from new nested structure
- Added backward compatibility fallback for old flat config
- Improved config organization and clarity

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-06 08:22:39 -08:00
Youssef Aitousarrah 369f8c444b feat: discovery system code quality improvements and concurrent execution
Implement comprehensive code quality improvements and performance optimizations
for the discovery pipeline based on code review findings.

## Key Improvements

### 1. Common Utilities (DRY Principle)
- Created `tradingagents/dataflows/discovery/common_utils.py`
- Extracted ticker parsing logic (eliminates 40+ lines of duplication)
- Centralized stopwords list (71 common non-ticker words)
- Added ReDoS protection (100KB text length limit)
- Provides `validate_candidate_structure()` for output validation

### 2. Scanner Output Validation
- Two-layer validation approach:
  - Registration-time: Check scanner class structure
  - Runtime: Validate each candidate dictionary
- Added `scan_with_validation()` wrapper in BaseScanner
- Validates required keys: ticker, source, context, priority
- Graceful error handling with structured logging

### 3. Configuration-Driven Design
- Moved magic numbers to `default_config.py`:
  - `ticker_universe`: Top 20 liquid options tickers
  - `min_volume`: 1000 (options flow threshold)
  - `min_transaction_value`: $25,000 (insider buying filter)
- Fixed hardcoded absolute paths to relative paths
- Improved portability across development environments

### 4. Concurrent Scanner Execution (37% Performance Gain)
- Implemented ThreadPoolExecutor for parallel scanner execution
- Configuration: `scanner_execution.concurrent`, `max_workers`, `timeout_seconds`
- Performance: 42s vs 67s (37% faster with 8 scanners)
- Thread-safe state management (each scanner gets copy)
- Per-scanner timeout with graceful degradation
- Error isolation (one failure doesn't stop others)

### 5. Error Handling Improvements
- Changed bare `except:` to `except Exception:` (avoid catching KeyboardInterrupt)
- Added structured logging with `exc_info=True` and extra fields
- Implemented graceful degradation throughout pipeline

## Files Changed

### Core Implementation
- `tradingagents/__init__.py` (NEW) - Package initialization
- `tradingagents/default_config.py` - Scanner execution config, magic numbers
- `tradingagents/graph/discovery_graph.py` - Concurrent execution logic
- `tradingagents/dataflows/discovery/common_utils.py` (NEW) - Shared utilities
- `tradingagents/dataflows/discovery/scanner_registry.py` - Validation wrapper
- `tradingagents/dataflows/discovery/scanners/*.py` - Use common utilities

### Testing & Documentation
- `tests/test_concurrent_scanners.py` (NEW) - Comprehensive test suite
- `verify_concurrent_execution.py` (NEW) - Performance verification
- `CONCURRENT_EXECUTION.md` (NEW) - Implementation documentation

## Test Results

All tests passing (exit code 0):
-  Concurrent execution: 42s, 66-69 candidates
-  Sequential fallback: 56-67s, 65-68 candidates
-  Timeout handling: Graceful degradation with 1s timeout
-  Error isolation: Individual failures don't cascade

## Performance Impact

- Scanner execution: 37% faster (42s vs 67s)
- Time saved: ~25 seconds per discovery run
- At scale: 4+ minutes saved daily in production
- Same candidate quality (65-69 tickers in both modes)

## Breaking Changes

None. Concurrent execution is opt-in via config flag.
Sequential mode remains available as fallback.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 23:27:01 -08:00
Youssef Aitousarrah 2376fc74a1 Update 2025-12-11 00:23:28 -08:00
Youssef Aitousarrah ea4ee9176b Update 2025-12-09 23:16:53 -08:00
Youssef Aitousarrah ccc78c694b Update 2025-12-06 15:39:49 -08:00
Youssef Aitousarrah 5cf57e5d97 Update 2025-12-02 20:49:42 -08:00
Youssef Aitousarrah 9ee66746a5 Fix: Stop after all primary vendors in multi-vendor config
When multiple primary vendors are configured (e.g., 'reddit,alpha_vantage'),
the system now correctly stops after attempting all primary vendors instead
of continuing through all fallback vendors.

Changes:
- Track which primary vendors have been attempted in a list
- Add stopping condition when all primary vendors are attempted
- Preserve existing single-vendor behavior (stop after first success)

This prevents unnecessary API calls and ensures predictable behavior.
2025-11-23 17:09:15 -08:00
Edward Sun 7bb2941b07 optimized yfin fetching to be much faster 2025-10-06 19:58:01 -07:00
Edward Sun c07dcf026b added fallbacks for tools 2025-10-03 22:40:09 -07:00
luohy15 d23fb539e9 minor fix 2025-09-30 13:27:48 +08:00
luohy15 b01051b9f4 Switch default data vendor
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 12:43:27 +08:00
luohy15 6211b1132a Improve Alpha Vantage indicator column parsing with robust mapping
- Replace hardcoded column indices with column name lookup
- Add mapping for all supported indicators to their expected CSV column names
- Handle missing columns gracefully with descriptive error messages
- Strip whitespace from header parsing for reliability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-26 23:36:36 +08:00
luohy15 8b04ec307f minor fix 2025-09-26 23:25:33 +08:00
luohy15 0ab323c2c6 Add Alpha Vantage API integration as primary data provider
- Replace FinnHub with Alpha Vantage API in README documentation
- Implement comprehensive Alpha Vantage modules:
  - Stock data (daily OHLCV with date filtering)
  - Technical indicators (SMA, EMA, MACD, RSI, Bollinger Bands, ATR)
  - Fundamental data (overview, balance sheet, cashflow, income statement)
  - News and sentiment data with insider transactions
- Update news analyst tools to use ticker-based news search
- Integrate Alpha Vantage vendor methods into interface routing
- Maintain backward compatibility with existing vendor system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-26 22:57:50 +08:00
luohy15 a6734d71bc WIP 2025-09-26 16:17:50 +08:00
Max Wong 43aa9c5d09
Local Ollama (#53)
- Fix typo 'Start' 'End'
- Add llama3.1 selection
- Use 'quick_think_llm' model instead of hard-coding GPT
2025-06-26 00:27:01 -04:00
Edward Sun da84ef43aa main works, cli bugs 2025-06-15 22:20:59 -07:00