2.0 KiB
2.0 KiB
Semantic News Scanner
Current Understanding
Currently regex-based extraction, not semantic. Headline text IS included in
candidate context via news_headline field (improved from prior version).
Catalyst classification from headline keywords maps to priority:
- CRITICAL: FDA approval, acquisition, merger, breakthrough
- HIGH: upgrade, beat, contract win, patent, guidance raise
- MEDIUM: downgrade, miss, lawsuit, investigation, recall, warning
P&L data shows news_catalyst is the worst-performing strategy: -17.5% avg 30d
return, 0% 7d win rate, 12.5% 1d win rate. Root cause: MEDIUM-priority candidates
(negative catalysts — downgrades, lawsuits, recalls) are included in the candidate
pool and frequently get through to recommendations with a bullish framing. Scanner
now restricted to CRITICAL-only to eliminate negative-catalyst contamination.
Evidence Log
2026-04-11 — P&L review
- 8 recommendations, 1d win rate 12.5%, 7d win rate 0% (worst of all strategies).
- Avg 30d return: -17.5%. Avg 1d return: -4.19%. Avg 7d return: -8.79%.
- Sample shows WTI (W&T Offshore) appearing twice (Apr 3 and Apr 6) as news_catalyst based on geopolitical oil price spike — both marked as "high" risk. The spike reversed, consistent with the -17.5% 30d outcome.
- Root issue 1: MEDIUM-priority keywords include negative events (downgrade, miss, lawsuit) that generate candidates with inherently negative thesis.
- Root issue 2: CRITICAL/HIGH keywords like "upgrade" and "patent" overlap with noise in global news feeds that mention these terms incidentally.
- Fix applied: only emit candidates when headline matches CRITICAL-priority keywords. Eliminates the negative-catalyst false positives.
- Confidence: medium (8 data points; market downturn may amplify losses)
Pending Hypotheses
- Would embedding-based semantic matching outperform keyword regex?
- Does catalyst classification (FDA vs earnings vs acquisition) affect hit rate?
- Do CRITICAL-only candidates (post-fix) outperform CRITICAL+HIGH baseline?