Merge pull request #13 from Aitous/research/current

research: short squeeze scanner — 2026-04-12
2026-04-12 19:21:25 -07:00 · 2026-04-12 19:21:25 -07:00 · ddb2a010e7
parent 50889fd667 f73681cf1c
commit ddb2a010e7
8 changed files with 254 additions and 1 deletions
--- a/docs/iterations/LEARNINGS.md
+++ b/docs/iterations/LEARNINGS.md
@ -10,7 +10,16 @@
 | analyst_upgrades | scanners/analyst_upgrades.md | 2026-04-12 | 50% 7d win rate (breakeven); cross-scanner confluence with options_flow is positive signal |
 | earnings_calendar | scanners/earnings_calendar.md | 2026-04-12 | Appears as earnings_play; 38.1% 1d, 37.7% 7d — poor; best setups require high short interest |
 | pipeline/scoring | pipeline/scoring.md | 2026-04-12 | stats summary now surfaces worst performers; news_catalyst 0% 7d, social_hype 14.3% 7d — worst strategies |
+| early_accumulation | scanners/early_accumulation.md | 2026-04-12 | Sub-threshold (score=60); no catalyst → structurally score-capped by ranker |
+| social_dd | scanners/social_dd.md | 2026-04-12 | Sub-threshold (score=56); BUT 55% 30d win rate — diverges from social_hype; ranker may be suppressing it incorrectly |
 | volume_accumulation | scanners/volume_accumulation.md | — | No data yet |
+| short_squeeze | scanners/short_squeeze.md | — | No data yet — new scanner, research: high SI (>20%) + catalyst = squeeze risk; not a directional signal alone |
+
+## Research
+
+| Title | File | Date | Summary |
+|-------|------|------|---------|
+| Short Interest Squeeze Scanner | research/2026-04-12-short-interest-squeeze.md | 2026-04-12 | High SI (>20%) + DTC >5 as squeeze-risk discovery; implemented as short_squeeze scanner |
 | reddit_dd | scanners/reddit_dd.md | — | No data yet |
 | reddit_trending | scanners/reddit_trending.md | — | No data yet |
 | semantic_news | scanners/semantic_news.md | — | No data yet |
--- a/docs/iterations/research/2026-04-12-short-interest-squeeze.md
+++ b/docs/iterations/research/2026-04-12-short-interest-squeeze.md
@ -0,0 +1,66 @@
+# Research: Short Interest Squeeze Scanner
+
+**Date:** 2026-04-12
+**Mode:** autonomous
+
+## Summary
+
+Stocks with high short interest (>20% of float) and high days-to-cover (DTC >5) face elevated squeeze
+risk when a positive catalyst arrives — earnings beat, news, or unusual options activity. Academic
+literature confirms that *decreases* in short interest predict positive future returns (14.6% annualized
+for distressed firms), while raw high SI alone is actually a negative long-term indicator. The edge
+here is not buying high-SI blindly, but using high SI + catalyst as a squeeze-risk scanner: a
+discovery tool that surfaces stocks where short sellers are structurally vulnerable.
+
+## Sources Reviewed
+
+- QuantifiedStrategies (short squeeze backtest): Short squeeze strategies alone backtested poorly —
+  rarity and randomness of squeezes prevent a reliable standalone edge
+- Alpha Architect (DTC & short covering): DTC is a better predictor of poor returns than raw SI;
+  long-short strategy using DTC generated 1.2% monthly return; short covering (SI decrease) signals
+  informed belief change
+- QuantPedia / academic: SI decrease in distressed firms predicts +14.6% annualized risk-adjusted
+  return; short sellers are informed traders whose exit signals conviction shift
+- Scanz / practitioner screeners: Consensus thresholds — SI% of float > 10% (moderate), >20%
+  (high), DTC > 5 (high squeeze pressure)
+- tosindicators.com: "Upcoming earnings with high short interest" scan is a common institutional
+  approach — validates the earnings_calendar pending hypothesis
+- earnings_calendar.md (internal): Pending hypothesis that SI > 20% pre-earnings produces better
+  outcomes; APLD (30.6% SI, score=75) was the strongest recent earnings setup
+- social_dd.md (internal): GME scan (15.7% SI, score=56) showed 55% 30d win rate — best 30d
+  performer in pipeline
+
+## Fit Evaluation
+
+| Dimension | Score | Notes |
+|-----------|-------|-------|
+| Data availability | ✅ | `get_short_interest(return_structured=True)` in `finviz_scraper.py` fully integrated |
+| Complexity | trivial | Wrap existing function, map to `{ticker, source, context, priority}` format |
+| Signal uniqueness | low overlap | No existing standalone short-interest scanner; social_dd uses SI as one factor among many |
+| Evidence quality | qualitative | Academic support for DTC as predictor; practitioner consensus on thresholds |
+
+## Recommendation
+
+**Implement** — The data source is already integrated and the signal fills a genuine gap. The scanner
+should NOT simply buy high-SI stocks (negative long-term returns). Instead, it surfaces squeeze
+candidates for downstream ranker scoring: stocks where short sellers are structurally vulnerable and
+any catalyst could force rapid covering. The ranker then assigns final conviction based on cross-
+scanner signals (options flow, earnings, news). This directly addresses the earnings_calendar pending
+hypothesis (SI > 20% pre-earnings).
+
+## Proposed Scanner Spec
+
+- **Scanner name:** `short_squeeze`
+- **Data source:** `tradingagents/dataflows/finviz_scraper.py` → `get_short_interest(return_structured=True)`
+- **Signal logic:**
+  - Fetch Finviz tickers with SI > 15% of float, verified by Yahoo Finance
+  - CRITICAL: SI >= 30% (extreme squeeze risk — one catalyst away from violent covering)
+  - HIGH: SI >= 20% (high squeeze potential — elevated squeeze risk)
+  - MEDIUM: SI >= 15% (moderate squeeze potential — worth watching)
+  - Context string includes: SI%, DTC if available, squeeze signal label
+- **Priority rules:**
+  - CRITICAL if `short_interest_pct >= 30` (extreme_squeeze_risk)
+  - HIGH if `short_interest_pct >= 20` (high_squeeze_potential)
+  - MEDIUM otherwise (moderate_squeeze_potential)
+- **Context format:** `"Short interest {SI:.1f}% of float — {signal_label} | squeeze risk if catalyst arrives"`
+- **Strategy tag:** `short_squeeze`
--- a/docs/iterations/scanners/early_accumulation.md
+++ b/docs/iterations/scanners/early_accumulation.md
@ -0,0 +1,24 @@
+# Early Accumulation Scanner
+
+## Current Understanding
+Detects quiet accumulation patterns: rising OBV, price above 50/200 SMA, low ATR
+(low volatility), and bullish MACD crossover — without requiring a strong near-term
+catalyst. Designed for slow-grind setups rather than explosive moves. The absence of
+an immediate catalyst structurally limits the LLM's score assignment, since the ranker
+rewards urgency and specificity. This may cause systematic under-scoring relative to
+true edge.
+
+## Evidence Log
+
+### 2026-04-12 — Fast-loop (2026-04-12 run)
+- Single appearance: FRT (Federal Realty Investment Trust), score=60, conf=6, risk_level=low.
+- Thesis: +1.55% daily price move, OBV 12.3M rising, MACD crossover, ATR 1.7% (low risk).
+- Score sub-threshold (60 < 65). Key weakness per thesis: "lack of immediate catalysts" and overbought Stochastic (88.7).
+- Pattern observation: early_accumulation may be structurally score-capped by ranker's catalyst-weighting. A score of 60 with conf=6 on a low-risk setup may represent miscalibration rather than poor edge.
+- 0 mature recommendations (no recommendation generated from this appearance).
+- Confidence: low (single data point, no outcome data)
+
+## Pending Hypotheses
+- [ ] Does early_accumulation systematically score 55-65 due to ranker penalizing "no catalyst"? If so, the scoring.md penalty logic may need adjustment.
+- [ ] Do early_accumulation setups produce better 30d returns than 7d returns (slow-grind nature)?
+- [ ] Is the overbought Stochastic reading a reliable short-term timing filter to delay entry?
--- a/docs/iterations/scanners/short_squeeze.md
+++ b/docs/iterations/scanners/short_squeeze.md
@ -0,0 +1,20 @@
+# Short Squeeze Scanner
+
+## Current Understanding
+Identifies stocks with structurally high short interest (>15% of float by default, CRITICAL at >30%)
+where short sellers are vulnerable to forced covering on any positive catalyst. The scanner uses
+Finviz for discovery (screener filters) + Yahoo Finance for exact SI% verification.
+
+Key distinction: High SI alone predicts *negative* long-term returns on average (academic consensus).
+The scanner is a squeeze-risk flag, not a directional buy signal. Value comes from cross-scanner
+confluence: a stock appearing here AND in options_flow or earnings_calendar is significantly stronger
+than either signal alone.
+
+## Evidence Log
+
+_(populated by /iterate runs)_
+
+## Pending Hypotheses
+- [ ] Does short_squeeze + options_flow confluence produce better 7d win rate than either scanner alone?
+- [ ] Does short_squeeze + earnings_calendar (SI>20%) produce better outcomes than earnings alone? (See earnings_calendar.md pending hypothesis)
+- [ ] Is there a volume threshold (e.g., market cap <$2B small-cap) that sharpens the signal?
--- a/docs/iterations/scanners/social_dd.md
+++ b/docs/iterations/scanners/social_dd.md
@ -0,0 +1,26 @@
+# Social DD Scanner
+
+## Current Understanding
+Identifies speculative momentum setups driven by high social sentiment scores and
+elevated short interest (potential short squeeze). Despite a speculative surface-level
+profile, early P&L data shows 55% 30d win rate and the only scanner positive at 30d
+(+0.94% avg 30d return). This DIVERGES from `social_hype` (14.3% 7d win rate) —
+`social_dd` likely includes more fundamental corroboration (short interest, OBV, MACD)
+versus pure social sentiment. Current ranker prompt groups them together, which may be
+incorrect. Setups currently score below 65 and are filtered by the score threshold.
+
+## Evidence Log
+
+### 2026-04-12 — Fast-loop (2026-04-08 run)
+- Single appearance: GME, score=56, conf=5, risk_level=speculative.
+- Thesis: Social DD score 75/100 + 15.7% short interest + bullish MACD crossover.
+- Score sub-threshold (56 < 65). Negative signals in thesis: weak fundamentals (-13.9% revenue growth), insider selling $330k.
+- **Critical context from scoring.md P&L review**: social_dd historically shows 55% 30d win rate, +0.94% avg 30d — the only scanner positive at 30d. This suggests the scanner has real edge but requires a longer holding period than 1-7 days.
+- Current ranker prompt groups social_dd with social_hype as "SPECULATIVE" — this may cause social_dd to be systematically under-scored, suppressing a legitimate slow-win strategy.
+- 0 mature recommendations from discovery pipeline (no recommendation generated from this appearance).
+- Confidence: medium (outcome data from scoring.md gives P&L context, but very few appearances in discovery pipeline)
+
+## Pending Hypotheses
+- [ ] Does the ranker's "social_dd / social_hype → SPECULATIVE" grouping suppress social_dd scores, causing us to miss 30d winners?
+- [ ] Should social_dd get a separate ranker treatment from social_hype, given divergent 30d outcomes?
+- [ ] At what social score threshold (>75? >85?) does the setup reliably score ≥65 to generate recommendations?
--- a/tradingagents/dataflows/discovery/ranker.py
+++ b/tradingagents/dataflows/discovery/ranker.py
@ -302,7 +302,8 @@ Each candidate was discovered by a specific scanner. Evaluate them using the cri
 - **options_flow**: Focus on put/call ratio, absolute call VOLUME vs open interest, premium size, and whether flow aligns with the technical trend. Unusually low P/C ratios (<0.1) with high volume are strongest.
 - **momentum / technical_breakout**: Focus on volume confirmation (>2x average), trend alignment (above key SMAs), and whether momentum is accelerating or fading. Avoid chasing extended moves (RSI >80).
 - **earnings_play**: Focus on short interest (squeeze potential), pre-earnings accumulation signals, analyst estimate trends, and historical earnings surprise rate. Binary risk must be acknowledged.
- **social_dd / social_hype**: Treat as SPECULATIVE. Require corroborating technical or fundamental evidence. Pure social sentiment without data backing should score low.
+- **social_dd**: Has shown 55% 30d win rate — strongest long-hold scanner. These setups combine social sentiment WITH technical confirmation (OBV, short interest, MACD). Score based on quality of technical/fundamental corroboration. A strong OBV + high short interest + bullish MACD warrants 65-75. DO NOT conflate with social_hype.
+- **social_hype**: Treat as SPECULATIVE (14.3% 7d win rate, -4.84% avg 7d return). Require strong corroborating evidence. Pure social sentiment without data backing should score below 50.
 - **short_squeeze**: Focus on short interest %, days to cover, cost to borrow, and whether a catalyst exists to trigger covering. High SI alone is not enough.
 - **contrarian_value**: Focus on oversold technicals (RSI <30), fundamental support (earnings stability), and a clear reason why the selloff is overdone.
 - **news_catalyst**: Focus on the materiality of the news, whether it's already priced in (check intraday move), and the timeline of impact.
--- a/tradingagents/dataflows/discovery/scanners/init.py
+++ b/tradingagents/dataflows/discovery/scanners/init.py
@ -13,6 +13,7 @@ from . import (
    reddit_trending,  # noqa: F401
    sector_rotation,  # noqa: F401
    semantic_news,  # noqa: F401
+    short_squeeze,  # noqa: F401
    technical_breakout,  # noqa: F401
    volume_accumulation,  # noqa: F401
 )
--- a/tradingagents/dataflows/discovery/scanners/short_squeeze.py
+++ b/tradingagents/dataflows/discovery/scanners/short_squeeze.py
@ -0,0 +1,106 @@
+"""Short interest squeeze-risk scanner.
+
+Surfaces stocks with structurally elevated short interest where any positive
+catalyst (earnings beat, news, options activity) could force rapid short covering.
+
+Research basis: docs/iterations/research/2026-04-12-short-interest-squeeze.md
+Key insight: High SI alone predicts *negative* long-term returns (mean reversion);
+the edge is using high SI as a squeeze-risk flag for downstream cross-scanner
+ranker scoring, not as a directional buy signal on its own.
+"""
+
+from typing import Any, Dict, List
+
+from tradingagents.dataflows.discovery.scanner_registry import SCANNER_REGISTRY, BaseScanner
+from tradingagents.dataflows.discovery.utils import Priority
+from tradingagents.utils.logger import get_logger
+
+logger = get_logger(__name__)
+
+_SIGNAL_LABELS = {
+    "extreme_squeeze_risk": "extreme squeeze risk",
+    "high_squeeze_potential": "high squeeze potential",
+    "moderate_squeeze_potential": "moderate squeeze potential",
+    "low_squeeze_potential": "low squeeze potential",
+}
+
+
+class ShortSqueezeScanner(BaseScanner):
+    """Scan for stocks with high short interest and elevated squeeze risk."""
+
+    name = "short_squeeze"
+    pipeline = "edge"
+    strategy = "short_squeeze"
+
+    def __init__(self, config: Dict[str, Any]):
+        super().__init__(config)
+        self.min_short_interest_pct = self.scanner_config.get("min_short_interest_pct", 15.0)
+        self.min_days_to_cover = self.scanner_config.get("min_days_to_cover", 2.0)
+        self.top_n = self.scanner_config.get("top_n", 20)
+
+    def scan(self, state: Dict[str, Any]) -> List[Dict[str, Any]]:
+        if not self.is_enabled():
+            return []
+
+        logger.info(f"📉 Scanning short interest (SI >{self.min_short_interest_pct}%)...")
+
+        try:
+            from tradingagents.dataflows.finviz_scraper import get_short_interest
+
+            raw = get_short_interest(
+                min_short_interest_pct=self.min_short_interest_pct,
+                min_days_to_cover=self.min_days_to_cover,
+                top_n=self.top_n,
+                return_structured=True,
+            )
+
+            if not raw:
+                logger.info("No short squeeze candidates found")
+                return []
+
+            logger.info(f"Found {len(raw)} high short interest candidates")
+
+            candidates = []
+            for item in raw:
+                ticker = item.get("ticker", "").upper().strip()
+                if not ticker:
+                    continue
+
+                si_pct = item.get("short_interest_pct", 0)
+                signal = item.get("signal", "low_squeeze_potential")
+                label = _SIGNAL_LABELS.get(signal, signal)
+
+                # Priority based on squeeze intensity
+                if signal == "extreme_squeeze_risk":
+                    priority = Priority.CRITICAL.value
+                elif signal == "high_squeeze_potential":
+                    priority = Priority.HIGH.value
+                else:
+                    priority = Priority.MEDIUM.value
+
+                context = (
+                    f"Short interest {si_pct:.1f}% of float — {label}"
+                    " | squeeze risk elevates if catalyst arrives"
+                )
+
+                candidates.append(
+                    {
+                        "ticker": ticker,
+                        "source": self.name,
+                        "context": context,
+                        "priority": priority,
+                        "strategy": self.strategy,
+                        "short_interest_pct": si_pct,
+                        "squeeze_signal": signal,
+                    }
+                )
+
+            candidates = candidates[: self.limit]
+            return candidates
+
+        except Exception as e:
+            logger.warning(f"⚠️  Short squeeze scanner failed: {e}")
+            return []
+
+
+SCANNER_REGISTRY.register(ShortSqueezeScanner)