From fa322bf78497a793290866c097c698d2f94b14c2 Mon Sep 17 00:00:00 2001
From: Youssef Aitousarrah <youssef.aitousarrah@gmail.com>
Date: Mon, 13 Apr 2026 12:40:23 -0700
Subject: [PATCH] hypothesis(social_dd): register ranker suppression
 statistical hypothesis
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Analysis of 25 social_dd picks shows a 60% 30d win rate (avg +2.32%) vs a
41.7% 7d win rate (avg -1.92%). Score suppression is not the primary issue
(avg score 71.5; 22/25 picks score >= 65). The root cause is an evaluation
horizon mismatch: the ranker is calibrated on 7d outcomes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
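Reviewer note: the social_dd.md change in this patch suggests that fixing the
horizon mismatch may require a scanner-level `eval_horizon` config field. The
following is a minimal sketch of how that could look, not the repository's
actual implementation. Only the `eval_horizon` idea comes from the patch; the
names `EVAL_HORIZON_DAYS`, `DEFAULT_EVAL_HORIZON_DAYS`, `Pick`,
`horizon_return`, and `win_rate` are hypothetical, as is the assumption that
each pick stores its returns keyed by horizon in days.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical defaults: every scanner is judged at 7d unless overridden.
DEFAULT_EVAL_HORIZON_DAYS = 7
# Hypothetical per-scanner override, mirroring the proposed `eval_horizon` field.
EVAL_HORIZON_DAYS: Dict[str, int] = {"social_dd": 30}


@dataclass
class Pick:
    scanner: str
    # Percent return keyed by horizon in days; a horizon is absent until mature.
    returns: Dict[int, float] = field(default_factory=dict)


def horizon_return(pick: Pick) -> Optional[float]:
    """Return the pick's outcome at its scanner's evaluation horizon, if mature."""
    horizon = EVAL_HORIZON_DAYS.get(pick.scanner, DEFAULT_EVAL_HORIZON_DAYS)
    return pick.returns.get(horizon)


def win_rate(picks: List[Pick]) -> Optional[float]:
    """Fraction of mature picks with a positive return at their own horizon."""
    outcomes = [r for r in map(horizon_return, picks) if r is not None]
    if not outcomes:
        return None  # no mature picks, so no win rate can be reported
    return sum(r > 0 for r in outcomes) / len(outcomes)
```

With this shape, a social_dd pick that is down at 7d but up at 30d counts as a
win, while scanners without an override keep being judged at 7d. Picks that
have not yet reached their horizon are excluded instead of counted as losses.
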
 docs/iterations/hypotheses/active.json | 18 ++++++++++++++++++
 docs/iterations/scanners/social_dd.md  | 15 ++++++++++++---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/docs/iterations/hypotheses/active.json b/docs/iterations/hypotheses/active.json
index c9ca222f..9f6efc1a 100644
--- a/docs/iterations/hypotheses/active.json
+++ b/docs/iterations/hypotheses/active.json
@@ -22,5 +22,23 @@
       "baseline_scanner": "insider_buying",
       "conclusion": null
     }
+    ,{
+      "id": "social_dd-ranker-suppression",
+      "scanner": "social_dd",
+      "title": "Does ranker suppression cause us to miss social_dd 30d winners?",
+      "description": "social_dd shows a 60% 30d win rate (+2.32% avg) but only a 41.7% 7d win rate (-1.92% avg). Hypothesis: the ranker and recommendation system evaluate at a 7d horizon, unfairly penalizing a slow-win scanner. Most picks (22/25) already score >=65, so score suppression is not the primary issue; horizon mismatch is.",
+      "branch": null,
+      "pr_number": null,
+      "status": "pending",
+      "priority": 0,
+      "expected_impact": "medium",
+      "hypothesis_type": "statistical",
+      "created_at": "2026-04-13",
+      "min_days": 0,
+      "days_elapsed": 0,
+      "picks_log": [],
+      "baseline_scanner": "social_dd",
+      "conclusion": null
+    }
   ]
 }
\ No newline at end of file
diff --git a/docs/iterations/scanners/social_dd.md b/docs/iterations/scanners/social_dd.md
index b4d0c000..8bd80a0c 100644
--- a/docs/iterations/scanners/social_dd.md
+++ b/docs/iterations/scanners/social_dd.md
@@ -20,7 +20,16 @@ incorrect. Setups currently score below 65 and are filtered by the score thresho
 - 0 mature recommendations from discovery pipeline (no recommendation generated from this appearance).
 - Confidence: medium (outcome data from scoring.md gives P&L context, but very few appearances in discovery pipeline)
 
+### 2026-04-13 — Statistical analysis (n=25 picks)
+- Avg score: 71.5 — most picks (22/25) already score ≥65. Ranker suppression is an outlier case, not systematic.
+- 7d win rate: 41.7%, avg 7d return: -1.92% — poor short-term.
+- 30d win rate: 60.0%, avg 30d return: +2.32% — confirmed slow-win profile.
+- High-conf (≥7, n=9): 30d win rate 55.6% — high confidence does not add meaningful edge over base rate.
+- **Key insight**: the evaluation horizon mismatch is the real issue. Downstream recommendation scoring and ranker calibration use 7d outcomes, which penalize social_dd unfairly. The scanner works — but only at 30d.
+- Confidence: high (n=25, consistent with prior 55% 30d finding)
+
 ## Pending Hypotheses
-- [ ] Does the ranker's "social_dd / social_hype → SPECULATIVE" grouping suppress social_dd scores, causing us to miss 30d winners?
-- [ ] Should social_dd get a separate ranker treatment from social_hype, given divergent 30d outcomes?
-- [ ] At what social score threshold (>75? >85?) does the setup reliably score ≥65 to generate recommendations?
+- [x] Does the ranker's "social_dd / social_hype → SPECULATIVE" grouping suppress social_dd scores? → **Partially false**: avg score is 71.5, suppression affects only 3/25 picks. Not the primary issue.
+- [ ] Should social_dd get a separate ranker treatment from social_hype, given divergent 30d outcomes? → Still open. social_hype has a 14.3% 7d win rate vs a 60% 30d win rate for social_dd (note the differing horizons), which suggests they are fundamentally different signals.
+- [ ] Fix evaluation horizon: ranker and recommendation system should assess social_dd at 30d, not 7d. This may require a scanner-level `eval_horizon` config field.
+- [ ] At what social score threshold (>75? >85?) does the setup reliably score ≥65 to generate recommendations? → Lower priority now that suppression is not the main issue.