Merge branch 'feature/hypothesis-backtesting'

commit 704e257dd9
@@ -0,0 +1,159 @@
# /backtest-hypothesis

Test a hypothesis about a scanner improvement using branch-per-hypothesis isolation.

**Usage:** `/backtest-hypothesis "<description of the hypothesis>"`

**Example:** `/backtest-hypothesis "options_flow: scan 3 expirations instead of 1 to capture institutional 30+ DTE positioning"`

---

## Step 1: Read Current Registry

Read `docs/iterations/hypotheses/active.json`. Note:

- How many hypotheses currently have `status: "running"`
- The `max_active` limit (default 5)
- Any existing `pending` entries

Also read `docs/iterations/LEARNINGS.md` and the relevant scanner domain file in
`docs/iterations/scanners/` to understand the current baseline.
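
As a sketch, the registry read can be reduced to a small helper. The field names (`status`, `max_active`, `hypotheses`) follow the `active.json` schema shown later in 3b-vii; the helper itself is illustrative, not part of the repo.

```python
# Illustrative helper: summarize the registry fields that Step 1 asks you to note.
def summarize_registry(registry: dict) -> dict:
    hypotheses = registry.get("hypotheses", [])
    running = sum(1 for h in hypotheses if h.get("status") == "running")
    pending = sum(1 for h in hypotheses if h.get("status") == "pending")
    max_active = registry.get("max_active", 5)
    return {
        "running": running,
        "pending": pending,
        "max_active": max_active,
        "has_capacity": running < max_active,
    }

summary = summarize_registry({
    "max_active": 5,
    "hypotheses": [
        {"id": "a", "status": "running"},
        {"id": "b", "status": "pending"},
    ],
})
```

In practice the input dict would come from loading `docs/iterations/hypotheses/active.json` with `json.load`.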

## Step 2: Classify the Hypothesis

Determine whether this is:

**Statistical** — answerable from existing data in `data/recommendations/performance_database.json`
without any code change. Examples:

- "Does high confidence (≥8) predict better 30d returns?"
- "Are options_flow picks that are ITM outperforming OTM ones?"

**Implementation** — requires a code change and forward-testing period. Examples:

- "Scan 3 expirations instead of 1"
- "Apply a premium filter of $50K instead of $25K"

## Step 3a: Statistical Path

If statistical: run the analysis now against `data/recommendations/performance_database.json`.
Write the finding to the relevant scanner domain file under **Evidence Log**. Print a summary.
Done — no branch needed.
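
For example, the first statistical question above could be answered with a few lines of Python. The record fields used here (`confidence`, `return_30d`) are assumptions about the shape of `performance_database.json`, so treat this as a sketch rather than a working query:

```python
# Sketch only: the record fields are assumed, not a documented schema.
def win_rate(records, predicate):
    """Percent of matching records with a positive 30d return."""
    matched = [r for r in records if predicate(r) and r.get("return_30d") is not None]
    if not matched:
        return None
    wins = sum(1 for r in matched if r["return_30d"] > 0)
    return round(wins / len(matched) * 100, 1)

sample = [
    {"confidence": 9, "return_30d": 4.2},
    {"confidence": 8, "return_30d": 2.1},
    {"confidence": 5, "return_30d": -0.5},
    {"confidence": 4, "return_30d": -2.3},
]
high = win_rate(sample, lambda r: r.get("confidence", 0) >= 8)
low = win_rate(sample, lambda r: r.get("confidence", 0) < 8)
```

The comparison of `high` against `low` is the finding to record in the Evidence Log.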

## Step 3b: Implementation Path

### 3b-i: Capacity check

Count running hypotheses from `active.json`. If fewer than `max_active` are running, proceed.
If at capacity: add the new hypothesis as `status: "pending"` — running experiments are NEVER
paused mid-streak. Inform the user which slot it is queued behind and when it will likely start.

### 3b-ii: Score the hypothesis

Assign a `priority` score (1–9) using these factors:

| Factor | Score |
|---|---|
| Scanner 30d win rate < 40% | +3 |
| Change touches 1 file, 1 parameter | +2 |
| Directly addresses a weak spot in LEARNINGS.md | +2 |
| Scanner generates ≥2 picks/day (data accrues fast) | +1 |
| Supported by external research (arXiv, Alpha Architect, etc.) | +1 |
| Contradictory evidence or unclear direction | −2 |
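
As a sketch, the table can be applied mechanically. The boolean inputs are assumptions about how each factor would be established, and clamping to 1–9 is inferred from the stated score range:

```python
# Illustrative scoring helper; each branch mirrors one row of the factor table.
def priority_score(win_rate_30d, files_touched, params_touched,
                   addresses_weak_spot, picks_per_day,
                   has_external_research, has_contradictory_evidence):
    score = 0
    if win_rate_30d is not None and win_rate_30d < 40:
        score += 3  # scanner is underperforming
    if files_touched == 1 and params_touched == 1:
        score += 2  # minimal, easily reversible change
    if addresses_weak_spot:
        score += 2  # targets a documented weak spot in LEARNINGS.md
    if picks_per_day >= 2:
        score += 1  # data accrues fast
    if has_external_research:
        score += 1
    if has_contradictory_evidence:
        score -= 2
    return max(1, min(9, score))  # clamp to the 1-9 range (assumed)

score = priority_score(35.0, 1, 1, True, 2.5, False, False)
```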

### 3b-iii: Determine min_days

Set `min_days` based on the scanner's typical picks-per-day rate:

- ≥2 picks/day → 14 days
- 1 pick/day → 21 days
- <1 pick/day → 30 days
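
These rules translate directly into code; treating "1 pick/day" as a ≥1 boundary for the middle bucket is an assumption:

```python
def min_days_for(picks_per_day: float) -> int:
    # Thresholds from the 3b-iii rules above.
    if picks_per_day >= 2:
        return 14
    if picks_per_day >= 1:
        return 21
    return 30
```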

### 3b-iv: Create the branch and implement the code change

```bash
BRANCH="hypothesis/<scanner>-<slug>"
git checkout -b "$BRANCH"
```

Make the minimal code change that implements the hypothesis. Read the scanner file first.
Only change what the hypothesis requires — do not refactor surrounding code.

```bash
git add tradingagents/
git commit -m "hypothesis(<scanner>): <title>"
```

### 3b-v: Create picks tracking file on the branch

Create `docs/iterations/hypotheses/<id>/picks.json` on the hypothesis branch:

```json
{
  "hypothesis_id": "<id>",
  "scanner": "<scanner>",
  "picks": []
}
```

```bash
mkdir -p docs/iterations/hypotheses/<id>
# write the JSON above to docs/iterations/hypotheses/<id>/picks.json, then:
git add docs/iterations/hypotheses/<id>/picks.json
git commit -m "hypothesis(<scanner>): add picks tracker"
git push -u origin "$BRANCH"
```

### 3b-vi: Open a draft PR

```bash
gh pr create \
  --title "hypothesis(<scanner>): <title>" \
  --body "**Hypothesis:** <description>

**Expected impact:** <high/medium/low>
**Min days:** <N>
**Priority:** <score>/9

*This is an automated hypothesis experiment. It will be auto-concluded after ${MIN_DAYS} days of data.*" \
  --draft \
  --base main
```

Note the PR number from the output.

### 3b-vii: Update active.json on main

Check out `main`, then update `docs/iterations/hypotheses/active.json` to add the new entry:

```json
{
  "id": "<scanner>-<slug>",
  "scanner": "<scanner>",
  "title": "<title>",
  "description": "<description>",
  "branch": "hypothesis/<scanner>-<slug>",
  "pr_number": <N>,
  "status": "running",
  "priority": <score>,
  "expected_impact": "<high|medium|low>",
  "hypothesis_type": "implementation",
  "created_at": "<YYYY-MM-DD>",
  "min_days": <N>,
  "days_elapsed": 0,
  "picks_log": [],
  "baseline_scanner": "<scanner>",
  "conclusion": null
}
```

```bash
git checkout main
git add docs/iterations/hypotheses/active.json
git commit -m "feat(hypotheses): register hypothesis <id>"
git push origin main
```

## Step 4: Print Summary

Print a confirmation:

- Hypothesis ID and branch name
- Status: running or pending
- Expected conclusion date (created_at + min_days)
- PR link (if running)
- Priority score and why
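
The expected conclusion date is just date arithmetic on the registry fields:

```python
from datetime import datetime, timedelta

def expected_conclusion(created_at: str, min_days: int) -> str:
    """created_at + min_days, both taken from the active.json entry."""
    start = datetime.strptime(created_at, "%Y-%m-%d")
    return (start + timedelta(days=min_days)).strftime("%Y-%m-%d")

end = expected_conclusion("2026-04-01", 14)
```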
@@ -0,0 +1,74 @@
name: Hypothesis Runner

on:
  schedule:
    # 8:00 AM UTC daily — runs after iterate (06:00 UTC)
    - cron: "0 8 * * *"
  workflow_dispatch:
    inputs:
      hypothesis_id:
        description: "Run a specific hypothesis ID only (blank = all running)"
        required: false
        default: ""

env:
  PYTHON_VERSION: "3.10"

jobs:
  run-hypotheses:
    runs-on: ubuntu-latest
    environment: TradingAgent
    timeout-minutes: 60
    permissions:
      contents: write
      pull-requests: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GH_TOKEN }}

      - name: Set up git identity
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
          cache: pip

      - name: Install dependencies
        run: pip install --upgrade pip && pip install -e .

      - name: Run hypothesis experiments
        env:
          GH_TOKEN: ${{ secrets.GH_TOKEN }}
          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          FINNHUB_API_KEY: ${{ secrets.FINNHUB_API_KEY }}
          ALPHA_VANTAGE_API_KEY: ${{ secrets.ALPHA_VANTAGE_API_KEY }}
          FMP_API_KEY: ${{ secrets.FMP_API_KEY }}
          REDDIT_CLIENT_ID: ${{ secrets.REDDIT_CLIENT_ID }}
          REDDIT_CLIENT_SECRET: ${{ secrets.REDDIT_CLIENT_SECRET }}
          TRADIER_API_KEY: ${{ secrets.TRADIER_API_KEY }}
          FILTER_ID: ${{ inputs.hypothesis_id }}
        run: |
          python scripts/run_hypothesis_runner.py

      - name: Commit active.json updates
        env:
          GH_TOKEN: ${{ secrets.GH_TOKEN }}
        run: |
          git add docs/iterations/hypotheses/active.json docs/iterations/hypotheses/concluded/ || true
          if git diff --cached --quiet; then
            echo "No registry changes"
          else
            git commit -m "chore(hypotheses): update registry $(date -u +%Y-%m-%d)"
            git pull --rebase origin main
            git push origin main
          fi
@@ -0,0 +1,4 @@
{
  "max_active": 5,
  "hypotheses": []
}
@@ -0,0 +1,164 @@
#!/usr/bin/env python3
"""
Hypothesis comparison — computes 7d returns for hypothesis picks and
compares them against the baseline scanner in performance_database.json.

Usage (called by hypothesis-runner.yml after min_days elapsed):
    python scripts/compare_hypothesis.py \
        --hypothesis-id options_flow-scan-3-expirations \
        --picks-json '[{"date": "2026-04-01", "ticker": "AAPL", ...}]' \
        --scanner options_flow \
        --db-path data/recommendations/performance_database.json

Prints a JSON conclusion to stdout.
"""

import argparse
import json
import sys
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional, Tuple

ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(ROOT))

from tradingagents.dataflows.y_finance import download_history

_MIN_EVALUATED = 5
_WIN_RATE_DELTA_THRESHOLD = 5.0
_AVG_RETURN_DELTA_THRESHOLD = 1.0


def compute_7d_return(ticker: str, pick_date: str) -> Tuple[Optional[float], Optional[bool]]:
    """Fetch 7-day return for a pick using yfinance. Returns (pct, is_win) or (None, None)."""
    try:
        entry_dt = datetime.strptime(pick_date, "%Y-%m-%d")
        exit_dt = entry_dt + timedelta(days=10)
        df = download_history(
            ticker,
            start=entry_dt.strftime("%Y-%m-%d"),
            end=exit_dt.strftime("%Y-%m-%d"),
        )
        if df.empty or len(df) < 2:
            return None, None
        close = df["Close"]
        entry_price = float(close.iloc[0])
        exit_idx = min(6, len(close) - 1)
        exit_price = float(close.iloc[exit_idx])
        if entry_price <= 0:
            return None, None
        ret = (exit_price - entry_price) / entry_price * 100
        return round(ret, 4), ret > 0
    except Exception:
        return None, None


def enrich_picks_with_returns(picks: list) -> list:
    """Compute 7d return for each pick older than the 14-day cutoff that lacks return_7d."""
    cutoff = (datetime.utcnow() - timedelta(days=14)).strftime("%Y-%m-%d")
    for pick in picks:
        if pick.get("return_7d") is not None:
            continue
        if pick.get("date", "9999-99-99") > cutoff:
            continue
        ret, win = compute_7d_return(pick["ticker"], pick["date"])
        pick["return_7d"] = ret
        pick["win_7d"] = win
    return picks


def compute_metrics(picks: list) -> dict:
    """Compute win rate and avg return. Only picks with non-None return_7d are evaluated."""
    evaluated = [p for p in picks if p.get("return_7d") is not None]
    if not evaluated:
        return {"count": len(picks), "evaluated": 0, "win_rate": None, "avg_return": None}
    wins = sum(1 for p in evaluated if p.get("win_7d"))
    avg_ret = sum(p["return_7d"] for p in evaluated) / len(evaluated)
    return {
        "count": len(picks),
        "evaluated": len(evaluated),
        "win_rate": round(wins / len(evaluated) * 100, 1),
        "avg_return": round(avg_ret, 2),
    }


def load_baseline_metrics(scanner: str, db_path: str) -> dict:
    """Load baseline metrics for a scanner from performance_database.json."""
    path = Path(db_path)
    if not path.exists():
        return {"count": 0, "win_rate": None, "avg_return": None}
    try:
        with open(path) as f:
            db = json.load(f)
    except Exception:
        return {"count": 0, "win_rate": None, "avg_return": None}
    picks = []
    for recs in db.get("recommendations_by_date", {}).values():
        for rec in (recs if isinstance(recs, list) else []):
            if rec.get("strategy_match") == scanner and rec.get("return_7d") is not None:
                picks.append(rec)
    return compute_metrics(picks)


def make_decision(hypothesis: dict, baseline: dict) -> Tuple[str, str]:
    """Decide accepted/rejected. Requires _MIN_EVALUATED evaluated picks."""
    evaluated = hypothesis.get("evaluated", 0)
    if evaluated < _MIN_EVALUATED:
        return (
            "rejected",
            f"Insufficient data: only {evaluated} evaluated picks (need {_MIN_EVALUATED})",
        )
    hyp_wr = hypothesis.get("win_rate")
    hyp_ret = hypothesis.get("avg_return")
    base_wr = baseline.get("win_rate")
    base_ret = baseline.get("avg_return")
    reasons = []
    if hyp_wr is not None and base_wr is not None:
        delta_wr = hyp_wr - base_wr
        if delta_wr > _WIN_RATE_DELTA_THRESHOLD:
            reasons.append(
                f"win rate improved by {delta_wr:+.1f}pp ({base_wr:.1f}% → {hyp_wr:.1f}%)"
            )
    if hyp_ret is not None and base_ret is not None:
        delta_ret = hyp_ret - base_ret
        if delta_ret > _AVG_RETURN_DELTA_THRESHOLD:
            reasons.append(
                f"avg return improved by {delta_ret:+.2f}% ({base_ret:+.2f}% → {hyp_ret:+.2f}%)"
            )
    if reasons:
        return "accepted", "; ".join(reasons)
    # Guard both operands before formatting: the original only checked the
    # hypothesis side, which raised on a None baseline.
    wr_str = (
        f"{hyp_wr:.1f}% vs baseline {base_wr:.1f}%"
        if hyp_wr is not None and base_wr is not None
        else "no win rate data"
    )
    ret_str = (
        f"{hyp_ret:+.2f}% vs baseline {base_ret:+.2f}%"
        if hyp_ret is not None and base_ret is not None
        else "no return data"
    )
    return "rejected", f"No significant improvement — win rate: {wr_str}; avg return: {ret_str}"


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--hypothesis-id", required=True)
    parser.add_argument("--picks-json", required=True)
    parser.add_argument("--scanner", required=True)
    parser.add_argument("--db-path", default="data/recommendations/performance_database.json")
    args = parser.parse_args()
    picks = json.loads(args.picks_json)
    picks = enrich_picks_with_returns(picks)
    hyp_metrics = compute_metrics(picks)
    base_metrics = load_baseline_metrics(args.scanner, args.db_path)
    decision, reason = make_decision(hyp_metrics, base_metrics)
    result = {
        "hypothesis_id": args.hypothesis_id,
        "decision": decision,
        "reason": reason,
        "hypothesis": hyp_metrics,
        "baseline": base_metrics,
        "enriched_picks": picks,
    }
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
@@ -0,0 +1,399 @@
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Hypothesis Runner — orchestrates daily experiment cycles.
|
||||||
|
|
||||||
|
For each running hypothesis in active.json:
|
||||||
|
1. Creates a git worktree for the hypothesis branch
|
||||||
|
2. Runs the daily discovery pipeline in that worktree
|
||||||
|
3. Extracts picks from the discovery result, appends to picks.json
|
||||||
|
4. Commits and pushes picks to hypothesis branch
|
||||||
|
5. Removes worktree
|
||||||
|
6. Updates active.json (days_elapsed, picks_log)
|
||||||
|
7. If days_elapsed >= min_days: concludes the hypothesis
|
||||||
|
|
||||||
|
After all hypotheses: promotes highest-priority pending → running if a slot opened.
|
||||||
|
|
||||||
|
Environment variables:
|
||||||
|
FILTER_ID — if set, only run the hypothesis with this ID
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
ROOT = Path(__file__).resolve().parent.parent
|
||||||
|
sys.path.insert(0, str(ROOT))
|
||||||
|
|
||||||
|
ACTIVE_JSON = ROOT / "docs/iterations/hypotheses/active.json"
|
||||||
|
CONCLUDED_DIR = ROOT / "docs/iterations/hypotheses/concluded"
|
||||||
|
DB_PATH = ROOT / "data/recommendations/performance_database.json"
|
||||||
|
TODAY = datetime.utcnow().strftime("%Y-%m-%d")
|
||||||
|
|
||||||
|
|
||||||
|
def load_registry() -> dict:
|
||||||
|
with open(ACTIVE_JSON) as f:
|
||||||
|
return json.load(f)
|
||||||
|
|
||||||
|
|
||||||
|
def save_registry(registry: dict) -> None:
|
||||||
|
with open(ACTIVE_JSON, "w") as f:
|
||||||
|
json.dump(registry, f, indent=2)
|
||||||
|
|
||||||
|
|
||||||
|
def run(cmd: list, cwd: str = None, check: bool = True) -> subprocess.CompletedProcess:
|
||||||
|
print(f" $ {' '.join(cmd)}", flush=True)
|
||||||
|
return subprocess.run(cmd, cwd=cwd or str(ROOT), check=check, capture_output=False)
|
||||||
|
|
||||||
|
|
||||||
|
def extract_picks(worktree: str, scanner: str) -> list:
|
||||||
|
"""Extract picks for the given scanner from the most recent discovery result in the worktree."""
|
||||||
|
results_dir = Path(worktree) / "results" / "discovery" / TODAY
|
||||||
|
if not results_dir.exists():
|
||||||
|
print(f" No discovery results for {TODAY} in worktree", flush=True)
|
||||||
|
return []
|
||||||
|
picks = []
|
||||||
|
for run_dir in sorted(results_dir.iterdir()):
|
||||||
|
result_file = run_dir / "discovery_result.json"
|
||||||
|
if not result_file.exists():
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
with open(result_file) as f:
|
||||||
|
data = json.load(f)
|
||||||
|
for item in data.get("final_ranking", []):
|
||||||
|
if item.get("strategy_match") == scanner:
|
||||||
|
picks.append(
|
||||||
|
{
|
||||||
|
"date": TODAY,
|
||||||
|
"ticker": item["ticker"],
|
||||||
|
"score": item.get("final_score"),
|
||||||
|
"confidence": item.get("confidence"),
|
||||||
|
"scanner": scanner,
|
||||||
|
"return_7d": None,
|
||||||
|
"win_7d": None,
|
||||||
|
}
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
print(f" Warning: could not read {result_file}: {e}", flush=True)
|
||||||
|
return picks
|
||||||
|
|
||||||
|
|
||||||
|
def load_picks_from_branch(hypothesis_id: str, branch: str) -> list:
|
||||||
|
"""Load picks.json from the hypothesis branch using git show."""
|
||||||
|
picks_path = f"docs/iterations/hypotheses/{hypothesis_id}/picks.json"
|
||||||
|
result = subprocess.run(
|
||||||
|
["git", "show", f"{branch}:{picks_path}"],
|
||||||
|
cwd=str(ROOT),
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
if result.returncode != 0:
|
||||||
|
return []
|
||||||
|
try:
|
||||||
|
return json.loads(result.stdout).get("picks", [])
|
||||||
|
except Exception:
|
||||||
|
return []
|
||||||
|
|
||||||
|
|
||||||
|
def save_picks_to_worktree(worktree: str, hypothesis_id: str, scanner: str, picks: list) -> None:
|
||||||
|
"""Write updated picks.json into the worktree and commit."""
|
||||||
|
picks_dir = Path(worktree) / "docs" / "iterations" / "hypotheses" / hypothesis_id
|
||||||
|
picks_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
picks_file = picks_dir / "picks.json"
|
||||||
|
payload = {"hypothesis_id": hypothesis_id, "scanner": scanner, "picks": picks}
|
||||||
|
picks_file.write_text(json.dumps(payload, indent=2))
|
||||||
|
run(["git", "add", str(picks_file)], cwd=worktree)
|
||||||
|
result = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=worktree)
|
||||||
|
if result.returncode != 0:
|
||||||
|
run(
|
||||||
|
["git", "commit", "-m", f"chore(hypotheses): picks {TODAY} for {hypothesis_id}"],
|
||||||
|
cwd=worktree,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def run_hypothesis(hyp: dict) -> bool:
|
||||||
|
"""Run one hypothesis experiment cycle. Returns True if the experiment concluded."""
|
||||||
|
hid = hyp["id"]
|
||||||
|
# Validate id to prevent path traversal in worktree path
|
||||||
|
if not re.fullmatch(r"[a-zA-Z0-9_\-]+", hid):
|
||||||
|
print(f" Skipping hypothesis with invalid id: {hid!r}", flush=True)
|
||||||
|
return False
|
||||||
|
branch = hyp["branch"]
|
||||||
|
scanner = hyp["scanner"]
|
||||||
|
worktree = f"/tmp/hyp-{hid}"
|
||||||
|
|
||||||
|
print(f"\n── Hypothesis: {hid} ──", flush=True)
|
||||||
|
|
||||||
|
run(["git", "fetch", "origin", branch], check=False)
|
||||||
|
run(["git", "worktree", "add", worktree, branch])
|
||||||
|
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
[
|
||||||
|
sys.executable,
|
||||||
|
"scripts/run_daily_discovery.py",
|
||||||
|
"--date",
|
||||||
|
TODAY,
|
||||||
|
"--no-update-positions",
|
||||||
|
],
|
||||||
|
cwd=worktree,
|
||||||
|
check=False,
|
||||||
|
)
|
||||||
|
if result.returncode != 0:
|
||||||
|
print(f" Discovery failed for {hid}, skipping picks update", flush=True)
|
||||||
|
else:
|
||||||
|
new_picks = extract_picks(worktree, scanner)
|
||||||
|
existing_picks = load_picks_from_branch(hid, branch)
|
||||||
|
seen = {(p["date"], p["ticker"]) for p in existing_picks}
|
||||||
|
merged = existing_picks + [p for p in new_picks if (p["date"], p["ticker"]) not in seen]
|
||||||
|
save_picks_to_worktree(worktree, hid, scanner, merged)
|
||||||
|
run(["git", "push", "origin", f"HEAD:{branch}"], cwd=worktree)
|
||||||
|
|
||||||
|
if TODAY not in hyp.get("picks_log", []):
|
||||||
|
hyp.setdefault("picks_log", []).append(TODAY)
|
||||||
|
hyp["days_elapsed"] = len(hyp["picks_log"])
|
||||||
|
|
||||||
|
if hyp["days_elapsed"] >= hyp["min_days"]:
|
||||||
|
return conclude_hypothesis(hyp)
|
||||||
|
|
||||||
|
finally:
|
||||||
|
run(["git", "worktree", "remove", "--force", worktree], check=False)
|
||||||
|
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
def llm_analysis(hyp: dict, conclusion: dict, scanner_domain: str) -> Optional[str]:
|
||||||
|
"""
|
||||||
|
Ask Claude to interpret the experiment results and provide richer context.
|
||||||
|
|
||||||
|
Returns a markdown string to embed in the PR comment, or None if the API
|
||||||
|
call fails or ANTHROPIC_API_KEY is not set.
|
||||||
|
|
||||||
|
The LLM does NOT override the programmatic decision — it adds nuance:
|
||||||
|
sample-size caveats, market-condition context, follow-up hypotheses.
|
||||||
|
"""
|
||||||
|
api_key = os.environ.get("ANTHROPIC_API_KEY")
|
||||||
|
if not api_key:
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
import anthropic
|
||||||
|
except ImportError:
|
||||||
|
print(" anthropic SDK not installed, skipping LLM analysis", flush=True)
|
||||||
|
return None
|
||||||
|
|
||||||
|
hyp_metrics = conclusion["hypothesis"]
|
||||||
|
base_metrics = conclusion["baseline"]
|
||||||
|
decision = conclusion["decision"]
|
||||||
|
|
||||||
|
prompt = f"""You are analyzing the results of a scanner hypothesis experiment for an automated trading discovery system.
|
||||||
|
|
||||||
|
## Hypothesis
|
||||||
|
**ID:** {hyp["id"]}
|
||||||
|
**Title:** {hyp.get("title", "")}
|
||||||
|
**Description:** {hyp.get("description", hyp.get("title", ""))}
|
||||||
|
**Scanner:** {hyp["scanner"]}
|
||||||
|
**Period:** {hyp.get("created_at")} → {TODAY} ({hyp.get("days_elapsed")} days)
|
||||||
|
|
||||||
|
## Statistical Results
|
||||||
|
**Decision (programmatic):** {decision}
|
||||||
|
**Reason:** {conclusion["reason"]}
|
||||||
|
|
||||||
|
| Metric | Baseline | Experiment | Delta |
|
||||||
|
|---|---|---|---|
|
||||||
|
| 7d win rate | {base_metrics.get("win_rate") or "—"}% | {hyp_metrics.get("win_rate") or "—"}% | {_delta_str(hyp_metrics.get("win_rate"), base_metrics.get("win_rate"), "pp")} |
|
||||||
|
| Avg 7d return | {base_metrics.get("avg_return") or "—"}% | {hyp_metrics.get("avg_return") or "—"}% | {_delta_str(hyp_metrics.get("avg_return"), base_metrics.get("avg_return"), "%")} |
|
||||||
|
| Picks evaluated | {base_metrics.get("evaluated", base_metrics.get("count", "—"))} | {hyp_metrics.get("evaluated", hyp_metrics.get("count", "—"))} | — |
|
||||||
|
|
||||||
|
## Scanner Domain Knowledge
|
||||||
|
{scanner_domain}
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
Provide a concise analysis (3–5 sentences) covering:
|
||||||
|
1. Whether the sample size is sufficient to trust the result, or if more data is needed
|
||||||
|
2. Any caveats about the measurement period (e.g., unusual market conditions)
|
||||||
|
3. What the numbers suggest about the underlying hypothesis — even if the decision is "rejected", is the direction meaningful?
|
||||||
|
4. One concrete follow-up hypothesis worth testing next
|
||||||
|
|
||||||
|
Be direct. Do not restate the numbers — interpret them. Do not recommend merging or closing the PR."""
|
||||||
|
|
||||||
|
try:
|
||||||
|
client = anthropic.Anthropic(api_key=api_key)
|
||||||
|
message = client.messages.create(
|
||||||
|
model="claude-haiku-4-5-20251001",
|
||||||
|
max_tokens=512,
|
||||||
|
messages=[{"role": "user", "content": prompt}],
|
||||||
|
)
|
||||||
|
return message.content[0].text.strip()
|
||||||
|
except Exception as e:
|
||||||
|
print(f" LLM analysis failed: {e}", flush=True)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def conclude_hypothesis(hyp: dict) -> bool:
|
||||||
|
"""Run comparison, write conclusion doc, close/merge PR. Returns True."""
|
||||||
|
hid = hyp["id"]
|
||||||
|
scanner = hyp["scanner"]
|
||||||
|
branch = hyp["branch"]
|
||||||
|
|
||||||
|
print(f"\n Concluding {hid}...", flush=True)
|
||||||
|
|
||||||
|
picks = load_picks_from_branch(hid, branch)
|
||||||
|
if not picks:
|
||||||
|
conclusion = {
|
||||||
|
"decision": "rejected",
|
||||||
|
"reason": "No picks were collected during the experiment period",
|
||||||
|
"hypothesis": {"count": 0, "evaluated": 0, "win_rate": None, "avg_return": None},
|
||||||
|
"baseline": {"count": 0, "win_rate": None, "avg_return": None},
|
||||||
|
}
|
||||||
|
else:
|
||||||
|
result = subprocess.run(
|
||||||
|
[
|
||||||
|
sys.executable,
|
||||||
|
"scripts/compare_hypothesis.py",
|
||||||
|
"--hypothesis-id",
|
||||||
|
hid,
|
||||||
|
"--picks-json",
|
||||||
|
json.dumps(picks),
|
||||||
|
"--scanner",
|
||||||
|
scanner,
|
||||||
|
"--db-path",
|
||||||
|
str(DB_PATH),
|
||||||
|
],
|
||||||
|
cwd=str(ROOT),
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
if result.returncode != 0:
|
||||||
|
print(f" compare_hypothesis.py failed: {result.stderr}", flush=True)
|
||||||
|
return False
|
||||||
|
conclusion = json.loads(result.stdout)
|
||||||
|
|
||||||
|
decision = conclusion["decision"]
|
||||||
|
hyp_metrics = conclusion["hypothesis"]
|
||||||
|
base_metrics = conclusion["baseline"]
|
||||||
|
|
||||||
|
# Load scanner domain knowledge (may not exist yet — that's fine)
|
||||||
|
scanner_domain_path = ROOT / "docs" / "iterations" / "scanners" / f"{scanner}.md"
|
||||||
|
scanner_domain = scanner_domain_path.read_text() if scanner_domain_path.exists() else ""
|
||||||
|
|
||||||
|
# Optional LLM analysis — enriches the conclusion without overriding the decision
|
||||||
|
analysis = llm_analysis(hyp, conclusion, scanner_domain)
|
||||||
|
analysis_section = f"\n\n## Analysis\n{analysis}" if analysis else ""
|
||||||
|
|
||||||
|
period_start = hyp.get("created_at", TODAY)
|
||||||
|
concluded_doc = CONCLUDED_DIR / f"{TODAY}-{hid}.md"
|
||||||
|
concluded_doc.write_text(
|
||||||
|
f"# Hypothesis: {hyp['title']}\n\n"
|
||||||
|
f"**Scanner:** {scanner}\n"
|
||||||
|
f"**Branch:** {branch}\n"
|
||||||
|
f"**Period:** {period_start} → {TODAY} ({hyp['days_elapsed']} days)\n"
|
||||||
|
f"**Outcome:** {'accepted ✅' if decision == 'accepted' else 'rejected ❌'}\n\n"
|
||||||
|
f"## Hypothesis\n{hyp.get('description', hyp['title'])}\n\n"
|
||||||
|
f"## Results\n\n"
|
||||||
|
f"| Metric | Baseline | Experiment | Delta |\n"
|
||||||
|
f"|---|---|---|---|\n"
|
||||||
|
f"| 7d win rate | {base_metrics.get('win_rate') or '—'}% | "
|
||||||
|
f"{hyp_metrics.get('win_rate') or '—'}% | "
|
||||||
|
f"{_delta_str(hyp_metrics.get('win_rate'), base_metrics.get('win_rate'), 'pp')} |\n"
|
||||||
|
f"| Avg return | {base_metrics.get('avg_return') or '—'}% | "
|
||||||
|
f"{hyp_metrics.get('avg_return') or '—'}% | "
|
||||||
|
f"{_delta_str(hyp_metrics.get('avg_return'), base_metrics.get('avg_return'), '%')} |\n"
|
||||||
|
f"| Picks | {base_metrics.get('count', '—')} | {hyp_metrics.get('count', '—')} | — |\n\n"
|
||||||
|
f"## Decision\n{conclusion['reason']}\n"
|
||||||
|
f"{analysis_section}\n\n"
|
||||||
|
f"## Action\n"
|
||||||
|
f"{'Ready to merge — awaiting manual review.' if decision == 'accepted' else 'Experiment concluded — awaiting manual review before closing.'}\n"
|
||||||
|
)
|
||||||
|
|
||||||
|
run(["git", "add", str(concluded_doc)], check=False)
|
||||||
|
|
||||||
|
pr = hyp.get("pr_number")
|
||||||
|
if pr:
|
||||||
|
# Mark PR ready for review (removes draft status) and post conclusion as a comment.
|
||||||
|
# The PR is NOT merged or closed automatically — the user reviews and decides.
|
||||||
|
outcome_emoji = "✅ accepted" if decision == "accepted" else "❌ rejected"
|
||||||
|
analysis_block = f"\n\n**Analysis**\n{analysis}" if analysis else ""
|
||||||
|
comment = (
|
||||||
|
f"**Hypothesis concluded: {outcome_emoji}**\n\n"
|
||||||
|
f"{conclusion['reason']}\n\n"
|
||||||
|
f"| Metric | Baseline | Experiment |\n"
|
||||||
|
f"|---|---|---|\n"
|
||||||
|
f"| 7d win rate | {base_metrics.get('win_rate') or '—'}% | {hyp_metrics.get('win_rate') or '—'}% |\n"
|
||||||
|
f"| Avg return | {base_metrics.get('avg_return') or '—'}% | {hyp_metrics.get('avg_return') or '—'}% |\n"
|
||||||
|
f"{analysis_block}\n\n"
|
||||||
|
f"{'Merge this PR to apply the change.' if decision == 'accepted' else 'Close this PR to discard the experiment.'}"
|
||||||
|
)
|
||||||
|
subprocess.run(
|
||||||
|
["gh", "pr", "ready", str(pr)],
|
||||||
|
cwd=str(ROOT),
|
||||||
|
check=False,
|
||||||
|
)
|
||||||
|
subprocess.run(
|
||||||
|
["gh", "pr", "comment", str(pr), "--body", comment],
|
||||||
|
cwd=str(ROOT),
|
||||||
|
check=False,
|
||||||
|
)
|
||||||
|
|
||||||
|
hyp["status"] = "concluded"
|
||||||
|
hyp["conclusion"] = decision
|
||||||
|
|
||||||
|
print(f" {hid}: {decision} — {conclusion['reason']}", flush=True)
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def _delta_str(hyp_val, base_val, unit: str) -> str:
|
||||||
|
if hyp_val is None or base_val is None:
|
||||||
|
return "—"
|
||||||
|
delta = hyp_val - base_val
|
||||||
|
sign = "+" if delta >= 0 else ""
|
||||||
|
return f"{sign}{delta:.1f}{unit}"
|
||||||
|


def promote_pending(registry: dict) -> None:
    """Promote the highest-priority pending hypothesis to running if a slot is open."""
    running_count = sum(1 for h in registry["hypotheses"] if h["status"] == "running")
    max_active = registry.get("max_active", 5)
    if running_count >= max_active:
        return
    pending = [h for h in registry["hypotheses"] if h["status"] == "pending"]
    if not pending:
        return
    to_promote = max(pending, key=lambda h: h.get("priority", 0))
    to_promote["status"] = "running"
    print(f"\n Promoted pending hypothesis to running: {to_promote['id']}", flush=True)


def main():
    registry = load_registry()
    filter_id = os.environ.get("FILTER_ID", "").strip()

    hypotheses = registry.get("hypotheses", [])
    running = [
        h
        for h in hypotheses
        if h["status"] == "running" and (not filter_id or h["id"] == filter_id)
    ]

    if not running:
        print("No running hypotheses to process.", flush=True)
    else:
        run(["git", "worktree", "prune"], check=False)
        for hyp in running:
            try:
                run_hypothesis(hyp)
            except Exception as e:
                print(f" Error processing {hyp['id']}: {e}", flush=True)

    promote_pending(registry)
    save_registry(registry)
    print("\nRegistry updated.", flush=True)


if __name__ == "__main__":
    main()
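The promotion rule above (fill an open slot with the highest-priority pending entry) can be exercised on a toy registry. This is a minimal, self-contained sketch — `promote_one` is an illustrative stand-in for `promote_pending`, not a function from the repository:

```python
# Minimal stand-in for promote_pending: promote the highest-priority
# pending hypothesis when fewer than max_active are running.
def promote_one(registry: dict) -> None:
    running = sum(1 for h in registry["hypotheses"] if h["status"] == "running")
    if running >= registry.get("max_active", 5):
        return
    pending = [h for h in registry["hypotheses"] if h["status"] == "pending"]
    if not pending:
        return
    max(pending, key=lambda h: h.get("priority", 0))["status"] = "running"


registry = {
    "max_active": 2,
    "hypotheses": [
        {"id": "a", "status": "running", "priority": 5},
        {"id": "b", "status": "pending", "priority": 3},
        {"id": "c", "status": "pending", "priority": 8},
    ],
}
promote_one(registry)
# One slot is open, so "c" (priority 8) is promoted; "b" stays pending.
statuses = {h["id"]: h["status"] for h in registry["hypotheses"]}
```

Because only one slot is open, a second call would be a no-op until a running hypothesis concludes.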
@ -0,0 +1,135 @@
"""Tests for the hypothesis comparison script."""
import json
import sys
from datetime import date, timedelta
from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest

sys.path.insert(0, str(Path(__file__).parent.parent))

from scripts.compare_hypothesis import (
    compute_metrics,
    compute_7d_return,
    load_baseline_metrics,
    make_decision,
)
# ── compute_metrics ──────────────────────────────────────────────────────────


def test_compute_metrics_empty():
    result = compute_metrics([])
    assert result == {"count": 0, "evaluated": 0, "win_rate": None, "avg_return": None}


def test_compute_metrics_all_wins():
    picks = [
        {"return_7d": 5.0, "win_7d": True},
        {"return_7d": 3.0, "win_7d": True},
    ]
    result = compute_metrics(picks)
    assert result["win_rate"] == 100.0
    assert result["avg_return"] == 4.0
    assert result["evaluated"] == 2


def test_compute_metrics_mixed():
    picks = [
        {"return_7d": 10.0, "win_7d": True},
        {"return_7d": -5.0, "win_7d": False},
        {"return_7d": None, "win_7d": None},  # pending — excluded
    ]
    result = compute_metrics(picks)
    assert result["win_rate"] == 50.0
    assert result["avg_return"] == 2.5
    assert result["evaluated"] == 2
    assert result["count"] == 3
# ── compute_7d_return ────────────────────────────────────────────────────────


def test_compute_7d_return_positive():
    import pandas as pd

    close_data = [100.0, 101.0, 102.0, 103.0, 104.0, 110.0]
    mock_df = pd.DataFrame({"Close": close_data})

    with patch("scripts.compare_hypothesis.download_history", return_value=mock_df):
        ret, win = compute_7d_return("AAPL", "2026-03-01")

    assert ret == pytest.approx(10.0, rel=0.01)
    assert win is True


def test_compute_7d_return_empty_data():
    import pandas as pd

    mock_df = pd.DataFrame()

    with patch("scripts.compare_hypothesis.download_history", return_value=mock_df):
        ret, win = compute_7d_return("AAPL", "2026-03-01")

    assert ret is None
    assert win is None
# ── load_baseline_metrics ────────────────────────────────────────────────────


def test_load_baseline_metrics(tmp_path):
    db = {
        "recommendations_by_date": {
            "2026-03-01": [
                {"strategy_match": "options_flow", "return_7d": 5.0, "win_7d": True},
                {"strategy_match": "options_flow", "return_7d": -2.0, "win_7d": False},
                {"strategy_match": "reddit_dd", "return_7d": 3.0, "win_7d": True},
            ]
        }
    }
    db_file = tmp_path / "performance_database.json"
    db_file.write_text(json.dumps(db))

    result = load_baseline_metrics("options_flow", str(db_file))

    assert result["win_rate"] == 50.0
    assert result["avg_return"] == 1.5
    assert result["count"] == 2


def test_load_baseline_metrics_missing_file(tmp_path):
    result = load_baseline_metrics("options_flow", str(tmp_path / "missing.json"))
    assert result == {"count": 0, "win_rate": None, "avg_return": None}
# ── make_decision ────────────────────────────────────────────────────────────


def test_make_decision_accepted_by_win_rate():
    hyp = {"win_rate": 60.0, "avg_return": 0.5, "evaluated": 10}
    baseline = {"win_rate": 50.0, "avg_return": 0.5}
    decision, reason = make_decision(hyp, baseline)
    assert decision == "accepted"
    assert "win rate" in reason.lower()


def test_make_decision_accepted_by_return():
    hyp = {"win_rate": 52.0, "avg_return": 3.0, "evaluated": 10}
    baseline = {"win_rate": 50.0, "avg_return": 1.5}
    decision, reason = make_decision(hyp, baseline)
    assert decision == "accepted"
    assert "return" in reason.lower()


def test_make_decision_rejected():
    hyp = {"win_rate": 48.0, "avg_return": 0.2, "evaluated": 10}
    baseline = {"win_rate": 50.0, "avg_return": 1.0}
    decision, reason = make_decision(hyp, baseline)
    assert decision == "rejected"


def test_make_decision_insufficient_data():
    hyp = {"win_rate": 80.0, "avg_return": 5.0, "evaluated": 2}
    baseline = {"win_rate": 50.0, "avg_return": 1.0}
    decision, reason = make_decision(hyp, baseline)
    assert decision == "rejected"
    assert "insufficient" in reason.lower()
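The tests above pin down the decision contract without showing the implementation. A sketch consistent with them follows — the 5-point win-rate margin, 1.0-point return margin, and 5-pick minimum are assumptions for illustration, not the repository's actual thresholds, and `make_decision_sketch` is a hypothetical stand-in:

```python
def make_decision_sketch(hyp: dict, baseline: dict) -> tuple[str, str]:
    # Reject outright when too few picks have been evaluated (assumed minimum: 5).
    if hyp.get("evaluated", 0) < 5:
        return "rejected", "insufficient data: fewer than 5 evaluated picks"
    # Accept on a clear win-rate improvement (assumed margin: 5 percentage points).
    if (hyp["win_rate"] is not None and baseline["win_rate"] is not None
            and hyp["win_rate"] >= baseline["win_rate"] + 5.0):
        return "accepted", "win rate improved vs baseline"
    # Otherwise accept on a clear average-return improvement (assumed margin: 1.0 point).
    if (hyp["avg_return"] is not None and baseline["avg_return"] is not None
            and hyp["avg_return"] >= baseline["avg_return"] + 1.0):
        return "accepted", "avg return improved vs baseline"
    return "rejected", "no significant improvement over baseline"


d1, _ = make_decision_sketch(
    {"win_rate": 60.0, "avg_return": 0.5, "evaluated": 10},
    {"win_rate": 50.0, "avg_return": 0.5},
)
d2, r2 = make_decision_sketch(
    {"win_rate": 80.0, "avg_return": 5.0, "evaluated": 2},
    {"win_rate": 50.0, "avg_return": 1.0},
)
```

Note how the insufficient-data guard runs first: even a large apparent edge is rejected when only two picks have resolved, which is what `test_make_decision_insufficient_data` asserts.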
@ -0,0 +1,73 @@
"""Tests for the hypotheses dashboard page data loading."""
import json
import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).parent.parent))

from tradingagents.ui.pages.hypotheses import (
    load_active_hypotheses,
    load_concluded_hypotheses,
    days_until_ready,
)
def test_load_active_hypotheses(tmp_path):
    active = {
        "max_active": 5,
        "hypotheses": [
            {
                "id": "options_flow-test",
                "title": "Test hypothesis",
                "scanner": "options_flow",
                "status": "running",
                "priority": 7,
                "days_elapsed": 5,
                "min_days": 14,
                "created_at": "2026-04-01",
                "picks_log": ["2026-04-01"] * 5,
                "conclusion": None,
            }
        ],
    }
    f = tmp_path / "active.json"
    f.write_text(json.dumps(active))
    result = load_active_hypotheses(str(f))
    assert len(result) == 1
    assert result[0]["id"] == "options_flow-test"


def test_load_active_hypotheses_missing_file(tmp_path):
    result = load_active_hypotheses(str(tmp_path / "missing.json"))
    assert result == []


def test_load_concluded_hypotheses(tmp_path):
    doc = tmp_path / "2026-04-10-options_flow-test.md"
    doc.write_text(
        "# Hypothesis: Test\n\n"
        "**Scanner:** options_flow\n"
        "**Period:** 2026-03-27 → 2026-04-10 (14 days)\n"
        "**Outcome:** accepted ✅\n"
    )
    results = load_concluded_hypotheses(str(tmp_path))
    assert len(results) == 1
    assert results[0]["filename"] == doc.name
    assert results[0]["outcome"] == "accepted ✅"


def test_load_concluded_hypotheses_empty_dir(tmp_path):
    results = load_concluded_hypotheses(str(tmp_path))
    assert results == []


def test_days_until_ready_has_days_left():
    hyp = {"days_elapsed": 5, "min_days": 14}
    assert days_until_ready(hyp) == 9


def test_days_until_ready_past_due():
    hyp = {"days_elapsed": 15, "min_days": 14}
    assert days_until_ready(hyp) == 0
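The concluded-file format these tests rely on is plain markdown with `**Field:**` lines. The anchored-regex extraction that `load_concluded_hypotheses` performs can be sketched standalone (the `field` helper name is illustrative; the patterns match the ones used in the page module):

```python
import re

# A concluded-hypothesis document in the format the tests construct.
doc = (
    "# Hypothesis: Test\n\n"
    "**Scanner:** options_flow\n"
    "**Outcome:** accepted ✅\n"
)


def field(text: str, pattern: str) -> str:
    # re.MULTILINE anchors ^/$ at each line, so one file-wide search suffices.
    m = re.search(pattern, text, re.MULTILINE)
    return m.group(1).strip() if m else ""


scanner = field(doc, r"^\*\*Scanner:\*\* (.+)$")
outcome = field(doc, r"^\*\*Outcome:\*\* (.+)$")
```

A missing field simply yields the empty string, which the page layer then renders as an em-dash placeholder.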
@ -52,7 +52,7 @@ def render_sidebar():
     # Navigation
     page = st.radio(
         "Navigation",
-        options=["Overview", "Signals", "Portfolio", "Performance", "Config"],
+        options=["Overview", "Signals", "Portfolio", "Performance", "Hypotheses", "Config"],
         label_visibility="collapsed",
     )
@ -116,6 +116,7 @@ def route_page(page):
         "Signals": pages.todays_picks,
         "Portfolio": pages.portfolio,
         "Performance": pages.performance,
+        "Hypotheses": pages.hypotheses,
         "Config": pages.settings,
     }
     module = page_map.get(page)
@ -39,6 +39,12 @@ except Exception as _e:
     _logger.error("Failed to import settings page: %s", _e, exc_info=True)
     settings = None

+try:
+    from tradingagents.ui.pages import hypotheses
+except Exception as _e:
+    _logger.error("Failed to import hypotheses page: %s", _e, exc_info=True)
+    hypotheses = None
+
 __all__ = [
     "home",
@ -46,4 +52,5 @@ __all__ = [
     "portfolio",
     "performance",
     "settings",
+    "hypotheses",
 ]
@ -0,0 +1,171 @@
"""
Hypotheses dashboard page — tracks active and concluded experiments.

Reads docs/iterations/hypotheses/active.json and the concluded/ directory.
No external API calls; all data is file-based.
"""

import json
import re
from pathlib import Path
from typing import Any, Dict, List

import streamlit as st

from tradingagents.ui.theme import COLORS, page_header

_REPO_ROOT = Path(__file__).parent.parent.parent.parent
_ACTIVE_JSON = _REPO_ROOT / "docs/iterations/hypotheses/active.json"
_CONCLUDED_DIR = _REPO_ROOT / "docs/iterations/hypotheses/concluded"
def load_active_hypotheses(active_path: str = str(_ACTIVE_JSON)) -> List[Dict[str, Any]]:
    """Load all hypotheses from active.json. Returns [] if file missing."""
    path = Path(active_path)
    if not path.exists():
        return []
    try:
        with open(path) as f:
            data = json.load(f)
        return data.get("hypotheses", [])
    except Exception:
        return []
def load_concluded_hypotheses(concluded_dir: str = str(_CONCLUDED_DIR)) -> List[Dict[str, Any]]:
    """
    Load concluded hypothesis metadata by parsing markdown files in concluded/.
    Extracts: filename, title, scanner, period, outcome.
    """
    dir_path = Path(concluded_dir)
    if not dir_path.exists():
        return []
    results = []
    for md_file in sorted(dir_path.glob("*.md"), reverse=True):
        if md_file.name == ".gitkeep":
            continue
        try:
            text = md_file.read_text()
            title = _extract_md_field(text, r"^# Hypothesis: (.+)$")
            scanner = _extract_md_field(text, r"^\*\*Scanner:\*\* (.+)$")
            period = _extract_md_field(text, r"^\*\*Period:\*\* (.+)$")
            outcome = _extract_md_field(text, r"^\*\*Outcome:\*\* (.+)$")
            results.append({
                "filename": md_file.name,
                "title": title or md_file.stem,
                "scanner": scanner or "—",
                "period": period or "—",
                "outcome": outcome or "—",
            })
        except Exception:
            continue
    return results
def _extract_md_field(text: str, pattern: str) -> str:
    """Extract a field value from a markdown line using regex."""
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1).strip() if match else ""


def days_until_ready(hyp: Dict[str, Any]) -> int:
    """Return number of days remaining before hypothesis can conclude (min 0)."""
    return max(0, hyp.get("min_days", 14) - hyp.get("days_elapsed", 0))
def render() -> None:
    """Render the hypotheses tracking page."""
    st.markdown(
        page_header("Hypotheses", "Active experiments & concluded findings"),
        unsafe_allow_html=True,
    )

    hypotheses = load_active_hypotheses()
    concluded = load_concluded_hypotheses()

    if not hypotheses and not concluded:
        st.info(
            "No hypotheses yet. Run `/backtest-hypothesis \"<description>\"` to start an experiment."
        )
        return

    running = [h for h in hypotheses if h["status"] == "running"]
    pending = [h for h in hypotheses if h["status"] == "pending"]

    st.markdown(
        f'<div class="section-title">Active Experiments '
        f'<span class="accent">// {len(running)} running, {len(pending)} pending</span></div>',
        unsafe_allow_html=True,
    )
    if running or pending:
        import pandas as pd

        active_rows = []
        for h in sorted(running + pending, key=lambda x: -x.get("priority", 0)):
            days_left = days_until_ready(h)
            ready_str = "concluding soon" if days_left == 0 else f"{days_left}d left"
            active_rows.append({
                "ID": h["id"],
                "Title": h.get("title", "—"),
                "Scanner": h.get("scanner", "—"),
                "Status": h["status"],
                "Progress": f"{h.get('days_elapsed', 0)}/{h.get('min_days', 14)}d",
                "Picks": len(h.get("picks_log", [])),
                "Ready": ready_str,
                "Priority": h.get("priority", "—"),
            })
        df = pd.DataFrame(active_rows)
        st.dataframe(
            df,
            width="stretch",
            hide_index=True,
            column_config={
                "ID": st.column_config.TextColumn(width="medium"),
                "Title": st.column_config.TextColumn(width="large"),
                "Scanner": st.column_config.TextColumn(width="medium"),
                "Status": st.column_config.TextColumn(width="small"),
                "Progress": st.column_config.TextColumn(width="small"),
                "Picks": st.column_config.NumberColumn(format="%d", width="small"),
                "Ready": st.column_config.TextColumn(width="medium"),
                "Priority": st.column_config.NumberColumn(format="%d/9", width="small"),
            },
        )
    else:
        st.info("No active experiments.")
    st.markdown("<div style='height:1.5rem;'></div>", unsafe_allow_html=True)

    st.markdown(
        f'<div class="section-title">Concluded Experiments '
        f'<span class="accent">// {len(concluded)} total</span></div>',
        unsafe_allow_html=True,
    )

    if concluded:
        import pandas as pd

        concluded_rows = []
        for c in concluded:
            outcome = c["outcome"]
            emoji = "✅" if "accepted" in outcome else "❌"
            concluded_rows.append({
                "Date": c["filename"][:10],
                "Title": c["title"],
                "Scanner": c["scanner"],
                "Period": c["period"],
                "Outcome": emoji,
            })
        cdf = pd.DataFrame(concluded_rows)
        st.dataframe(
            cdf,
            width="stretch",
            hide_index=True,
            column_config={
                "Date": st.column_config.TextColumn(width="small"),
                "Title": st.column_config.TextColumn(width="large"),
                "Scanner": st.column_config.TextColumn(width="medium"),
                "Period": st.column_config.TextColumn(width="medium"),
                "Outcome": st.column_config.TextColumn(width="small"),
            },
        )
    else:
        st.info("No concluded experiments yet.")
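For a hypothesis 5 days into its 14-day observation window, the `Progress` and `Ready` cells in the active-experiments table derive as follows — a standalone sketch of the same arithmetic, reusing the `days_until_ready` clamp defined in the page module:

```python
def days_until_ready(hyp: dict) -> int:
    # Clamp at zero once the minimum observation window has passed.
    return max(0, hyp.get("min_days", 14) - hyp.get("days_elapsed", 0))


hyp = {"days_elapsed": 5, "min_days": 14}
days_left = days_until_ready(hyp)
progress = f"{hyp['days_elapsed']}/{hyp['min_days']}d"
ready = "concluding soon" if days_left == 0 else f"{days_left}d left"
```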