research(dark-pool-flow): update — meridianfin.io scraping unblocks implementation; recommend dark_pool_flow scanner

This commit is contained in:
Youssef Aitousarrah 2026-04-16 12:19:00 -07:00
parent e374ee62e1
commit 3e464ba558
2 changed files with 39 additions and 21 deletions

View File

@ -25,7 +25,7 @@
| Short Interest Squeeze Scanner | research/2026-04-12-short-interest-squeeze.md | 2026-04-12 | High SI (>20%) + DTC >5 as squeeze-risk discovery; implemented as short_squeeze scanner |
| 52-Week High Breakout Momentum | research/2026-04-13-52-week-high-breakout.md | 2026-04-13 | George & Hwang (2004) validated: 52w high crossing + 1.5x volume = 72% win rate, +11.4% avg over 31d; implemented as high_52w_breakout scanner |
| PEAD Post-Earnings Drift | research/2026-04-14-pead-earnings-beat.md | 2026-04-14 | Bernard & Thomas (1989): 18% annualized PEAD; QuantPedia: 15% annualized (1987-2004); implemented as earnings_beat scanner (distinct from earnings_calendar's upcoming-only scope) |
| Dark Pool / Block Trade Flow | research/2026-04-16-dark-pool-flow.md | 2026-04-16 | Real predictive content (Zhu 2012, Buti 2022) but blocked: no free real-time feed; FINRA ATS lags 2 weeks; revisit if Unusual Whales API access obtained |
| Dark Pool / Block Trade Flow | research/2026-04-16-dark-pool-flow.md | 2026-04-16 | Zhu 2012 + Buti 2022 academic backing; meridianfin.io provides free scrapable daily FINRA Z-scored anomalies (no auth, 1-day lag); recommend implement as dark_pool_flow scanner in edge pipeline |
| reddit_dd | scanners/reddit_dd.md | — | No data yet |
| reddit_trending | scanners/reddit_trending.md | — | No data yet |
| semantic_news | scanners/semantic_news.md | — | No data yet |

View File

@ -3,38 +3,56 @@
**Date:** 2026-04-16
**Mode:** autonomous
> **Auto-implementation skipped:** Data availability ❌ — no free real-time dark pool feed exists; FINRA ATS data has 1-2 week delay, making it unusable for next-day discovery.
## Summary
Dark pool order flow (off-exchange block trades) does predict short-term returns in academic literature, but the signal requires real-time directional data (bid/ask side classification) that is only available via paid vendors (Unusual Whales, FlowAlgo, ~$50-200/mo). The free FINRA ATS Transparency data lags by 1-2 weeks, making it useless for a daily scanner. Without a data source, implementation is blocked regardless of signal quality.
Dark pool order flow (off-exchange block trades) predicts short-term returns in academic literature. A free, scrapable data source exists: `meridianfin.io/darkpool` surfaces daily FINRA ATS anomalies with Z-scores pre-computed, no auth required, plain HTML table. Data lags 1 day (FINRA ATS settlement). Signal: tickers with dark pool % anomaly (Z-score ≥ 2.0) are experiencing unusual institutional off-exchange accumulation — a pre-move signal distinct from any existing scanner.
## Sources Reviewed
- **Buti, Rindi & Werner (2022), Financial Management**: Dark pool retail order imbalance predicts future returns; effect is non-linear and context-dependent (regime-filtered by volatility)
- **Zhu (2012), NY Fed**: Price discovery in dark pools impaired above a critical imbalance threshold; strong-signal traders prefer exchanges, moderate-signal traders use dark pools
- **Unusual Whales docs**: Real-time dark pool prints classified bullish/bearish by bid/ask side; tiered by size ($5k, $15k, $30k+); subscription required
- **OptionsTradingOrg practitioner guide**: Block filter = trades >10,000 shares or >$200K; volume surge filter = dark pool volume >2-3x 30d average; confirmation = dark pool buy + OTM call sweeps on ask side
- **FINRA ATS Transparency**: Free, covers all ATS/dark pool volume by ticker — but data lags 1-2 weeks; not suitable for discovery
- **Buti, Rindi & Werner (2022), Financial Management**: Dark pool retail order imbalance predicts future returns; effect is non-linear and regime-dependent
- **Zhu (2012), NY Fed**: Strong-signal traders prefer lit exchanges; moderate-signal traders route to dark pools — so a dark pool surge suggests informed but not fully certain buying
- **Unusual Whales docs**: Real-time prints with bid/ask classification; subscription required — not used here
- **OptionsTradingOrg practitioner guide**: Volume surge >2-3x 30d average + dark pool % >40-50% of daily volume = actionable signal
- **meridianfin.io/darkpool**: Free, daily, FINRA-based; shows Ticker, Off-Exchange Vol, Dark Pool %, Z-score, Date; 8 top anomalies per day; scrapable with `requests` + `BeautifulSoup`; no auth needed
- **FINRA ATS Transparency (raw)**: Free CSV downloads but require joining multiple venue files and rolling baseline computation — Meridian does this work for us
## Fit Evaluation
| Dimension | Score | Notes |
|-----------|-------|-------|
| Data availability | ❌ | No free real-time feed; FINRA ATS lags 2 weeks; paid vendors ($50-200/mo) required |
| Complexity | moderate | Would need API client for Unusual Whales + directional classification logic |
| Signal uniqueness | low overlap | No existing dark pool scanner; closest is `options_flow` which uses public options data |
| Evidence quality | backtested | Zhu (2012) and Buti et al. (2022) provide academic evidence; practitioner signal logic well-documented |
| Data availability | ✅ | `meridianfin.io/darkpool` — free HTML table, 1-day lag, no auth, `requests`+BS4 sufficient |
| Complexity | moderate | ~2-4h: HTTP scraper + BS4 parser + scanner class + config entry |
| Signal uniqueness | low overlap | No dark pool scanner exists; `options_flow` uses options chains not off-exchange prints |
| Evidence quality | backtested | Zhu (2012) and Buti et al. (2022) academic backing; volume surge threshold validated by practitioners |
## Recommendation
**Skip (data blocker)** — The signal has genuine predictive content backed by academic evidence, but implementation requires a paid data vendor subscription. If Unusual Whales or FlowAlgo API access becomes available, this is a high-priority scanner to build.
**Implement** — all four thresholds pass. Signal has academic backing, data is free and scrapable, complexity is moderate, no overlap with existing scanners.
## Signal Logic (for future reference if data becomes available)
## Proposed Scanner Spec
- **Entry signal**: Dark pool print volume > 2x 30-day rolling average AND classified as bullish (ask-side fill)
- **Confirmation**: Same-day or prior-day OTM call sweep on the ask side
- **Priority**: CRITICAL if both conditions met + repeat prints over 3+ days; HIGH if single large bullish print; MEDIUM if volume surge without directional confirmation
- **Context format**: `"Dark pool: {volume:,} shares ({pct_of_daily:.0f}% of daily vol) | {bullish_pct:.0f}% bullish flow | {confirmation}"`
- **Suggested thresholds**: `min_dark_pool_volume_multiple: 2.0`, `min_bullish_pct: 55`, `min_block_size: 10000`
- **Data source needed**: Unusual Whales API (or equivalent) at `unusualwhales.com`
- **Scanner name:** `dark_pool_flow`
- **Pipeline:** `edge` (off-exchange institutional flow = information advantage)
- **Data source:** Scrape `https://meridianfin.io/darkpool` daily with `requests` + `BeautifulSoup`
- **Signal logic:**
1. Fetch the anomaly table (up to 8 rows, all pre-filtered by Meridian's Z-score engine)
2. Filter: Z-score ≥ `min_z_score` (default 2.0)
3. Filter: dark pool % ≥ `min_dark_pool_pct` (default 40.0%)
4. Return all passing tickers as candidates
- **Priority rules:**
- CRITICAL if Z-score ≥ 4.0
- HIGH if Z-score ≥ 3.0
- MEDIUM otherwise
- **Context format:** `"Dark pool anomaly: {dark_pool_pct:.1f}% off-exchange | Z-score {z_score:.2f} | Vol: {off_exchange_vol:,}"`
- **Config parameters:**
```python
"dark_pool_flow": {
"enabled": True,
"pipeline": "edge",
"limit": 8,
"min_z_score": 2.0, # Minimum FINRA ATS anomaly Z-score
"min_dark_pool_pct": 40.0, # Minimum % of daily volume off-exchange
"source_url": "https://meridianfin.io/darkpool",
}
```
- **Limitation:** 1-day lag (FINRA ATS settlement); no bid/ask directionality; only ~8 tickers/day surfaced