Merge pull request #13 from Aitous/research/current

research: short squeeze scanner — 2026-04-12
This commit is contained in:
Aitous 2026-04-12 19:21:25 -07:00 committed by GitHub
commit ddb2a010e7
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
8 changed files with 254 additions and 1 deletions

View File

@ -10,7 +10,16 @@
| analyst_upgrades | scanners/analyst_upgrades.md | 2026-04-12 | 50% 7d win rate (breakeven); cross-scanner confluence with options_flow is positive signal |
| earnings_calendar | scanners/earnings_calendar.md | 2026-04-12 | Appears as earnings_play; 38.1% 1d, 37.7% 7d — poor; best setups require high short interest |
| pipeline/scoring | pipeline/scoring.md | 2026-04-12 | stats summary now surfaces worst performers; news_catalyst 0% 7d, social_hype 14.3% 7d — worst strategies |
| early_accumulation | scanners/early_accumulation.md | 2026-04-12 | Sub-threshold (score=60); no catalyst → structurally score-capped by ranker |
| social_dd | scanners/social_dd.md | 2026-04-12 | Sub-threshold (score=56); BUT 55% 30d win rate — diverges from social_hype; ranker may be suppressing it incorrectly |
| volume_accumulation | scanners/volume_accumulation.md | — | No data yet |
| short_squeeze | scanners/short_squeeze.md | — | No data yet — new scanner, research: high SI (>20%) + catalyst = squeeze risk; not a directional signal alone |
## Research
| Title | File | Date | Summary |
|-------|------|------|---------|
| Short Interest Squeeze Scanner | research/2026-04-12-short-interest-squeeze.md | 2026-04-12 | High SI (>20%) + DTC >5 as squeeze-risk discovery; implemented as short_squeeze scanner |
| reddit_dd | scanners/reddit_dd.md | — | No data yet |
| reddit_trending | scanners/reddit_trending.md | — | No data yet |
| semantic_news | scanners/semantic_news.md | — | No data yet |

View File

@ -0,0 +1,66 @@
# Research: Short Interest Squeeze Scanner
**Date:** 2026-04-12
**Mode:** autonomous
## Summary
Stocks with high short interest (>20% of float) and high days-to-cover (DTC >5) face elevated squeeze
risk when a positive catalyst arrives — earnings beat, news, or unusual options activity. Academic
literature confirms that *decreases* in short interest predict positive future returns (14.6% annualized
for distressed firms), while raw high SI alone is actually a negative long-term indicator. The edge
here is not buying high-SI blindly, but using high SI + catalyst as a squeeze-risk scanner: a
discovery tool that surfaces stocks where short sellers are structurally vulnerable.
## Sources Reviewed
- QuantifiedStrategies (short squeeze backtest): Short squeeze strategies alone backtested poorly —
rarity and randomness of squeezes prevent a reliable standalone edge
- Alpha Architect (DTC & short covering): DTC is a better predictor of poor returns than raw SI;
long-short strategy using DTC generated 1.2% monthly return; short covering (SI decrease) signals
informed belief change
- QuantPedia / academic: SI decrease in distressed firms predicts +14.6% annualized risk-adjusted
return; short sellers are informed traders whose exit signals conviction shift
- Scanz / practitioner screeners: Consensus thresholds — SI% of float > 10% (moderate), >20%
(high), DTC > 5 (high squeeze pressure)
- tosindicators.com: "Upcoming earnings with high short interest" scan is a common institutional
approach — validates the earnings_calendar pending hypothesis
- earnings_calendar.md (internal): Pending hypothesis that SI > 20% pre-earnings produces better
outcomes; APLD (30.6% SI, score=75) was the strongest recent earnings setup
- social_dd.md (internal): GME scan (15.7% SI, score=56) showed 55% 30d win rate — best 30d
performer in pipeline
## Fit Evaluation
| Dimension | Score | Notes |
|-----------|-------|-------|
| Data availability | ✅ | `get_short_interest(return_structured=True)` in `finviz_scraper.py` fully integrated |
| Complexity | trivial | Wrap existing function, map to `{ticker, source, context, priority}` format |
| Signal uniqueness | low overlap | No existing standalone short-interest scanner; social_dd uses SI as one factor among many |
| Evidence quality | qualitative | Academic support for DTC as predictor; practitioner consensus on thresholds |
## Recommendation
**Implement** — The data source is already integrated and the signal fills a genuine gap. The scanner
should NOT simply buy high-SI stocks (negative long-term returns). Instead, it surfaces squeeze
candidates for downstream ranker scoring: stocks where short sellers are structurally vulnerable and
any catalyst could force rapid covering. The ranker then assigns final conviction based on cross-
scanner signals (options flow, earnings, news). This directly addresses the earnings_calendar pending
hypothesis (SI > 20% pre-earnings).
## Proposed Scanner Spec
- **Scanner name:** `short_squeeze`
- **Data source:** `tradingagents/dataflows/finviz_scraper.py``get_short_interest(return_structured=True)`
- **Signal logic:**
- Fetch Finviz tickers with SI > 15% of float, verified by Yahoo Finance
- CRITICAL: SI >= 30% (extreme squeeze risk — one catalyst away from violent covering)
- HIGH: SI >= 20% (high squeeze potential — elevated squeeze risk)
- MEDIUM: SI >= 15% (moderate squeeze potential — worth watching)
- Context string includes: SI%, DTC if available, squeeze signal label
- **Priority rules:**
- CRITICAL if `short_interest_pct >= 30` (extreme_squeeze_risk)
- HIGH if `short_interest_pct >= 20` (high_squeeze_potential)
- MEDIUM otherwise (moderate_squeeze_potential)
- **Context format:** `"Short interest {SI:.1f}% of float — {signal_label} | squeeze risk if catalyst arrives"`
- **Strategy tag:** `short_squeeze`

View File

@ -0,0 +1,24 @@
# Early Accumulation Scanner
## Current Understanding
Detects quiet accumulation patterns: rising OBV, price above 50/200 SMA, low ATR
(low volatility), and bullish MACD crossover — without requiring a strong near-term
catalyst. Designed for slow-grind setups rather than explosive moves. The absence of
an immediate catalyst structurally limits the LLM's score assignment, since the ranker
rewards urgency and specificity. This may cause systematic under-scoring relative to
true edge.
## Evidence Log
### 2026-04-12 — Fast-loop (2026-04-12 run)
- Single appearance: FRT (Federal Realty Investment Trust), score=60, conf=6, risk_level=low.
- Thesis: +1.55% daily price move, OBV 12.3M rising, MACD crossover, ATR 1.7% (low risk).
- Score sub-threshold (60 < 65). Key weakness per thesis: "lack of immediate catalysts" and overbought Stochastic (88.7).
- Pattern observation: early_accumulation may be structurally score-capped by ranker's catalyst-weighting. A score of 60 with conf=6 on a low-risk setup may represent miscalibration rather than poor edge.
- 0 mature recommendations (no recommendation generated from this appearance).
- Confidence: low (single data point, no outcome data)
## Pending Hypotheses
- [ ] Does early_accumulation systematically score 55-65 due to ranker penalizing "no catalyst"? If so, the scoring.md penalty logic may need adjustment.
- [ ] Do early_accumulation setups produce better 30d returns than 7d returns (slow-grind nature)?
- [ ] Is the overbought Stochastic reading a reliable short-term timing filter to delay entry?

View File

@ -0,0 +1,20 @@
# Short Squeeze Scanner
## Current Understanding
Identifies stocks with structurally high short interest (>15% of float by default, CRITICAL at >30%)
where short sellers are vulnerable to forced covering on any positive catalyst. The scanner uses
Finviz for discovery (screener filters) + Yahoo Finance for exact SI% verification.
Key distinction: High SI alone predicts *negative* long-term returns on average (academic consensus).
The scanner is a squeeze-risk flag, not a directional buy signal. Value comes from cross-scanner
confluence: a stock appearing here AND in options_flow or earnings_calendar is significantly stronger
than either signal alone.
## Evidence Log
_(populated by /iterate runs)_
## Pending Hypotheses
- [ ] Does short_squeeze + options_flow confluence produce better 7d win rate than either scanner alone?
- [ ] Does short_squeeze + earnings_calendar (SI>20%) produce better outcomes than earnings alone? (See earnings_calendar.md pending hypothesis)
- [ ] Is there a volume threshold (e.g., market cap <$2B small-cap) that sharpens the signal?

View File

@ -0,0 +1,26 @@
# Social DD Scanner
## Current Understanding
Identifies speculative momentum setups driven by high social sentiment scores and
elevated short interest (potential short squeeze). Despite a speculative surface-level
profile, early P&L data shows 55% 30d win rate and the only scanner positive at 30d
(+0.94% avg 30d return). This DIVERGES from `social_hype` (14.3% 7d win rate) —
`social_dd` likely includes more fundamental corroboration (short interest, OBV, MACD)
versus pure social sentiment. Current ranker prompt groups them together, which may be
incorrect. Setups currently score below 65 and are filtered by the score threshold.
## Evidence Log
### 2026-04-12 — Fast-loop (2026-04-08 run)
- Single appearance: GME, score=56, conf=5, risk_level=speculative.
- Thesis: Social DD score 75/100 + 15.7% short interest + bullish MACD crossover.
- Score sub-threshold (56 < 65). Negative signals in thesis: weak fundamentals (-13.9% revenue growth), insider selling $330k.
- **Critical context from scoring.md P&L review**: social_dd historically shows 55% 30d win rate, +0.94% avg 30d — the only scanner positive at 30d. This suggests the scanner has real edge but requires a longer holding period than 1-7 days.
- Current ranker prompt groups social_dd with social_hype as "SPECULATIVE" — this may cause social_dd to be systematically under-scored, suppressing a legitimate slow-win strategy.
- 0 mature recommendations from discovery pipeline (no recommendation generated from this appearance).
- Confidence: medium (outcome data from scoring.md gives P&L context, but very few appearances in discovery pipeline)
## Pending Hypotheses
- [ ] Does the ranker's "social_dd / social_hype → SPECULATIVE" grouping suppress social_dd scores, causing us to miss 30d winners?
- [ ] Should social_dd get a separate ranker treatment from social_hype, given divergent 30d outcomes?
- [ ] At what social score threshold (>75? >85?) does the setup reliably score ≥65 to generate recommendations?

View File

@ -302,7 +302,8 @@ Each candidate was discovered by a specific scanner. Evaluate them using the cri
- **options_flow**: Focus on put/call ratio, absolute call VOLUME vs open interest, premium size, and whether flow aligns with the technical trend. Unusually low P/C ratios (<0.1) with high volume are strongest.
- **momentum / technical_breakout**: Focus on volume confirmation (>2x average), trend alignment (above key SMAs), and whether momentum is accelerating or fading. Avoid chasing extended moves (RSI >80).
- **earnings_play**: Focus on short interest (squeeze potential), pre-earnings accumulation signals, analyst estimate trends, and historical earnings surprise rate. Binary risk must be acknowledged.
- **social_dd / social_hype**: Treat as SPECULATIVE. Require corroborating technical or fundamental evidence. Pure social sentiment without data backing should score low.
- **social_dd**: Has shown 55% 30d win rate strongest long-hold scanner. These setups combine social sentiment WITH technical confirmation (OBV, short interest, MACD). Score based on quality of technical/fundamental corroboration. A strong OBV + high short interest + bullish MACD warrants 65-75. DO NOT conflate with social_hype.
- **social_hype**: Treat as SPECULATIVE (14.3% 7d win rate, -4.84% avg 7d return). Require strong corroborating evidence. Pure social sentiment without data backing should score below 50.
- **short_squeeze**: Focus on short interest %, days to cover, cost to borrow, and whether a catalyst exists to trigger covering. High SI alone is not enough.
- **contrarian_value**: Focus on oversold technicals (RSI <30), fundamental support (earnings stability), and a clear reason why the selloff is overdone.
- **news_catalyst**: Focus on the materiality of the news, whether it's already priced in (check intraday move), and the timeline of impact.

View File

@ -13,6 +13,7 @@ from . import (
reddit_trending, # noqa: F401
sector_rotation, # noqa: F401
semantic_news, # noqa: F401
short_squeeze, # noqa: F401
technical_breakout, # noqa: F401
volume_accumulation, # noqa: F401
)

View File

@ -0,0 +1,106 @@
"""Short interest squeeze-risk scanner.
Surfaces stocks with structurally elevated short interest where any positive
catalyst (earnings beat, news, options activity) could force rapid short covering.
Research basis: docs/iterations/research/2026-04-12-short-interest-squeeze.md
Key insight: High SI alone predicts *negative* long-term returns (mean reversion);
the edge is using high SI as a squeeze-risk flag for downstream cross-scanner
ranker scoring, not as a directional buy signal on its own.
"""
from typing import Any, Dict, List
from tradingagents.dataflows.discovery.scanner_registry import SCANNER_REGISTRY, BaseScanner
from tradingagents.dataflows.discovery.utils import Priority
from tradingagents.utils.logger import get_logger
logger = get_logger(__name__)
_SIGNAL_LABELS = {
"extreme_squeeze_risk": "extreme squeeze risk",
"high_squeeze_potential": "high squeeze potential",
"moderate_squeeze_potential": "moderate squeeze potential",
"low_squeeze_potential": "low squeeze potential",
}
class ShortSqueezeScanner(BaseScanner):
"""Scan for stocks with high short interest and elevated squeeze risk."""
name = "short_squeeze"
pipeline = "edge"
strategy = "short_squeeze"
def __init__(self, config: Dict[str, Any]):
super().__init__(config)
self.min_short_interest_pct = self.scanner_config.get("min_short_interest_pct", 15.0)
self.min_days_to_cover = self.scanner_config.get("min_days_to_cover", 2.0)
self.top_n = self.scanner_config.get("top_n", 20)
def scan(self, state: Dict[str, Any]) -> List[Dict[str, Any]]:
if not self.is_enabled():
return []
logger.info(f"📉 Scanning short interest (SI >{self.min_short_interest_pct}%)...")
try:
from tradingagents.dataflows.finviz_scraper import get_short_interest
raw = get_short_interest(
min_short_interest_pct=self.min_short_interest_pct,
min_days_to_cover=self.min_days_to_cover,
top_n=self.top_n,
return_structured=True,
)
if not raw:
logger.info("No short squeeze candidates found")
return []
logger.info(f"Found {len(raw)} high short interest candidates")
candidates = []
for item in raw:
ticker = item.get("ticker", "").upper().strip()
if not ticker:
continue
si_pct = item.get("short_interest_pct", 0)
signal = item.get("signal", "low_squeeze_potential")
label = _SIGNAL_LABELS.get(signal, signal)
# Priority based on squeeze intensity
if signal == "extreme_squeeze_risk":
priority = Priority.CRITICAL.value
elif signal == "high_squeeze_potential":
priority = Priority.HIGH.value
else:
priority = Priority.MEDIUM.value
context = (
f"Short interest {si_pct:.1f}% of float — {label}"
" | squeeze risk elevates if catalyst arrives"
)
candidates.append(
{
"ticker": ticker,
"source": self.name,
"context": context,
"priority": priority,
"strategy": self.strategy,
"short_interest_pct": si_pct,
"squeeze_signal": signal,
}
)
candidates = candidates[: self.limit]
return candidates
except Exception as e:
logger.warning(f"⚠️ Short squeeze scanner failed: {e}")
return []
SCANNER_REGISTRY.register(ShortSqueezeScanner)