Commit Graph

10 Commits

Author SHA1 Message Date
Youssef Aitousarrah c09cc7ec25 fix(y_finance): make suppress_yfinance_warnings thread-safe
The previous implementation redirected sys.stderr to /dev/null using a
context manager. This is not thread-safe: 8 concurrent scanner threads each
mutate sys.stderr, and when one thread's context manager closes the devnull
file, another thread that captured devnull as its saved stderr attempts to
write to the closed fd and raises "I/O operation on closed file".

This corrupted sys.stderr state caused _fetch_batch_prices to fail and
all per-ticker get_stock_price fallback calls to return None, resulting in
every candidate being dropped with "no data available".

Fix by suppressing at the Python logging level instead of redirecting
sys.stderr. Logger.setLevel() is protected by internal locks and is safe
to call from concurrent threads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 16:37:03 -07:00
Youssef Aitousarrah 2e79c2245f fix(filter): only filter upside intraday movers, not downside
The same-day mover filter used abs() to check intraday movement, which
filtered both gap-ups AND gap-downs. On volatile/crash days (e.g. 2026-04-07)
all stocks dropped >10% from open, causing every candidate to be filtered and
leaving zero recommendations.

The filter's purpose is to avoid chasing stocks that already ran up. A stock
down 20% intraday is not "stale" — it should be evaluated on its merits.
Changed threshold check from abs(pct) >= threshold to pct >= threshold so only
upside movers are filtered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 12:51:41 -07:00
Youssef Aitousarrah 43bdd6de11 feat: discovery pipeline enhancements with ML signal scanner
Major additions:
- ML win probability scanner: scans ticker universe using trained
  LightGBM/TabPFN model, surfaces candidates with P(WIN) above threshold
- 30-feature engineering pipeline (20 base + 10 interaction features)
  computed from OHLCV data via stockstats + pandas
- Triple-barrier labeling for training data generation
- Dataset builder and training script with calibration analysis
- Discovery enrichment: confluence scoring, short interest extraction,
  earnings estimates, options signal normalization, quant pre-score
- Configurable prompt logging (log_prompts_console flag)
- Enhanced ranker investment thesis (4-6 sentence reasoning)
- Typed DiscoveryConfig dataclass for all discovery settings
- Console price charts for visual ticker analysis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 22:53:42 -08:00
Youssef Aitousarrah 369f8c444b feat: discovery system code quality improvements and concurrent execution
Implement comprehensive code quality improvements and performance optimizations
for the discovery pipeline based on code review findings.

## Key Improvements

### 1. Common Utilities (DRY Principle)
- Created `tradingagents/dataflows/discovery/common_utils.py`
- Extracted ticker parsing logic (eliminates 40+ lines of duplication)
- Centralized stopwords list (71 common non-ticker words)
- Added ReDoS protection (100KB text length limit)
- Provides `validate_candidate_structure()` for output validation

### 2. Scanner Output Validation
- Two-layer validation approach:
  - Registration-time: Check scanner class structure
  - Runtime: Validate each candidate dictionary
- Added `scan_with_validation()` wrapper in BaseScanner
- Validates required keys: ticker, source, context, priority
- Graceful error handling with structured logging

### 3. Configuration-Driven Design
- Moved magic numbers to `default_config.py`:
  - `ticker_universe`: Top 20 liquid options tickers
  - `min_volume`: 1000 (options flow threshold)
  - `min_transaction_value`: $25,000 (insider buying filter)
- Fixed hardcoded absolute paths to relative paths
- Improved portability across development environments

### 4. Concurrent Scanner Execution (37% Performance Gain)
- Implemented ThreadPoolExecutor for parallel scanner execution
- Configuration: `scanner_execution.concurrent`, `max_workers`, `timeout_seconds`
- Performance: 42s vs 67s (37% faster with 8 scanners)
- Thread-safe state management (each scanner gets copy)
- Per-scanner timeout with graceful degradation
- Error isolation (one failure doesn't stop others)

### 5. Error Handling Improvements
- Changed bare `except:` to `except Exception:` (avoid catching KeyboardInterrupt)
- Added structured logging with `exc_info=True` and extra fields
- Implemented graceful degradation throughout pipeline

## Files Changed

### Core Implementation
- `tradingagents/__init__.py` (NEW) - Package initialization
- `tradingagents/default_config.py` - Scanner execution config, magic numbers
- `tradingagents/graph/discovery_graph.py` - Concurrent execution logic
- `tradingagents/dataflows/discovery/common_utils.py` (NEW) - Shared utilities
- `tradingagents/dataflows/discovery/scanner_registry.py` - Validation wrapper
- `tradingagents/dataflows/discovery/scanners/*.py` - Use common utilities

### Testing & Documentation
- `tests/test_concurrent_scanners.py` (NEW) - Comprehensive test suite
- `verify_concurrent_execution.py` (NEW) - Performance verification
- `CONCURRENT_EXECUTION.md` (NEW) - Implementation documentation

## Test Results

All tests passing (exit code 0):
-  Concurrent execution: 42s, 66-69 candidates
-  Sequential fallback: 56-67s, 65-68 candidates
-  Timeout handling: Graceful degradation with 1s timeout
-  Error isolation: Individual failures don't cascade

## Performance Impact

- Scanner execution: 37% faster (42s vs 67s)
- Time saved: ~25 seconds per discovery run
- At scale: 4+ minutes saved daily in production
- Same candidate quality (65-69 tickers in both modes)

## Breaking Changes

None. Concurrent execution is opt-in via config flag.
Sequential mode remains available as fallback.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 23:27:01 -08:00
Youssef Aitousarrah 2376fc74a1 Update 2025-12-11 00:23:28 -08:00
Youssef Aitousarrah ea4ee9176b Update 2025-12-09 23:16:53 -08:00
Youssef Aitousarrah 5cf57e5d97 Update 2025-12-02 20:49:42 -08:00
Edward Sun 7bb2941b07 optimized yfin fetching to be much faster 2025-10-06 19:58:01 -07:00
Edward Sun c07dcf026b added fallbacks for tools 2025-10-03 22:40:09 -07:00
luohy15 b01051b9f4 Switch default data vendor
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 12:43:27 +08:00