feat: add global search graph scanners

Ahmet Guzererler 2026-03-28 07:02:45 +01:00
parent 3dd396e194
commit 756d8358d7
21 changed files with 895 additions and 31 deletions

View File

@ -0,0 +1,148 @@
---
type: decision
status: active
date: 2026-03-27
agent_author: "codex"
tags: [scanner, gap-detector, ranking, yfinance, finviz]
related_files: [tradingagents/dataflows/yfinance_scanner.py, tradingagents/agents/utils/scanner_tools.py, tradingagents/graph/scanner_graph.py, tradingagents/agents/scanners/macro_synthesis.py]
---
## Context
The scanner now has stronger global-only stages for factor alignment and drift context, but three gaps remain between the current implementation and the target 1-3 month discovery engine:
1. The system does not compute a real market-data gap signal from OHLC data.
2. Candidate overlap is still mostly resolved by LLM synthesis rather than a deterministic ranking layer.
3. True analyst revision diffusion is still missing because we do not yet have the right structured data source.
The user explicitly wants:
- documentation first
- a live integration test before any gap-detector tool is implemented
- true revision diffusion deferred to a future section until a suitable data source exists
## Capability Check
### yfinance
`yfinance` does not expose a dedicated “gap detector” helper or preset screener. However, it does expose the raw ingredients needed to compute one:
- bounded candidate discovery via `yf.screen(...)`
- OHLC data via `yf.download(...)` / `Ticker.history(...)`
This is sufficient to build a deterministic gap detector from:
```text
gap_pct = (today_open - previous_close) / previous_close
```
plus optional confirmation such as relative volume and intraday hold (`close_vs_open`).
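The ingredients above can be combined into a small deterministic helper. The sketch below is illustrative only; the function name, return shape, and the worked numbers are assumptions, not code from this commit:

```python
def compute_gap_signal(today_open, today_close, prev_close,
                       today_volume, avg_volume):
    """Deterministic gap metrics from raw OHLC/volume values.

    Hypothetical helper for illustration; thresholds and naming are
    assumptions, not part of the committed scanner code.
    """
    gap_pct = (today_open - prev_close) / prev_close * 100
    # Confirmation 1: relative volume -- is trading unusually active?
    rel_volume = today_volume / avg_volume if avg_volume else 0.0
    # Confirmation 2: intraday hold -- did price hold above the open?
    close_vs_open = (today_close - today_open) / today_open * 100
    return {
        "gap_pct": gap_pct,
        "rel_volume": rel_volume,
        "close_vs_open": close_vs_open,
    }

# Example: open 110 after a 100 close, on 2x average volume
signal = compute_gap_signal(110.0, 112.0, 100.0, 2_000_000, 1_000_000)
# gap_pct = 10.0, rel_volume = 2.0, close_vs_open ~ 1.82
```

Because every input is a plain number, this stays trivially unit-testable without any network access, which is exactly the property the no-mock live test is meant to validate end to end.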
### Finviz / finvizfinance
In this environment, `finvizfinance` is not installed, so we cannot rely on it for immediate implementation or live local inspection.
Separately, Finviz's screener documentation confirms that the platform has:
- a `Gap` technical field
- `Unusual Volume`
- `Upgrades` / `Downgrades`
- earnings-before / earnings-after signals
This means Finviz remains a viable future gap source, but not the first implementation path here.
## The Decision
### 1. Build the first real gap detector on yfinance, not Finviz
Reason:
- available locally now
- deterministic from raw market data
- no dependence on scraper-specific filter strings
- easier to validate with a no-mock live integration test
### 2. Require a live integration test before the tool exists
The first implementation step is a live test that proves the chosen data path can produce a valid gap calculation from real data without mocks. The tool must only be added after this test passes.
### 3. Add a deterministic candidate-ranking layer
Golden Overlap should not be left entirely to LLM synthesis. We will add a Python ranking layer that rewards names appearing across:
- leading sectors
- market-data gap / continuation signals
- smart-money signals
- geopolitical or macro-theme support
- factor-alignment support
The LLM should explain and package the result, not invent the overlap rule.
### 4. Defer true revision diffusion
True revision diffusion requires structured analyst estimate data. We do not have that source yet, so this stays in an upcoming section rather than being approximated further.
## Planned Implementation
### Phase A — live validation
Add a live integration test that:
1. fetches a bounded real universe from `yfinance` screeners
2. downloads recent OHLC data for that universe
3. computes gap percentage from open vs previous close
4. verifies the data path returns a structurally valid table of candidates or an explicit “no candidates today” result
No mocks are allowed in this test.
### Phase B — real gap detector
After live validation passes:
1. add a `yfinance_scanner` function that computes real gap candidates
2. expose it through `scanner_tools.py`
3. use it inside the drift scanner instead of gap-themed news alone
The first version should stay bounded:
- use a short universe from movers / actives / gainers
- avoid full-market scans
- filter by meaningful absolute gap and liquidity thresholds
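For illustration, a bounded first-pass filter over screener quotes might look like the sketch below. The threshold constants and helper name are hypothetical; the quote-dict keys mirror the shapes used in the offline tests:

```python
MIN_ABS_GAP_PCT = 2.0      # ignore sub-2% gaps as noise (illustrative)
MIN_PRICE = 5.0            # skip illiquid penny names (illustrative)
MIN_AVG_VOLUME = 500_000   # basic liquidity floor (illustrative)

def filter_gap_quotes(quotes):
    """Keep only quotes with a meaningful gap and enough liquidity.

    `quotes` follows the yfinance screener quote-dict shape used in the
    offline tests; keys other than those read here are ignored.
    """
    kept = []
    for q in quotes:
        open_px = q.get("regularMarketOpen")
        prev_close = q.get("regularMarketPreviousClose")
        if not open_px or not prev_close:
            continue
        gap_pct = (open_px - prev_close) / prev_close * 100
        if abs(gap_pct) < MIN_ABS_GAP_PCT:
            continue
        if (q.get("regularMarketPrice") or 0) < MIN_PRICE:
            continue
        if (q.get("averageDailyVolume3Month") or 0) < MIN_AVG_VOLUME:
            continue
        kept.append((q["symbol"], round(gap_pct, 2)))
    return kept
```

Keeping the filter a pure function over already-fetched quotes means the bounded universe is enforced upstream (by the screener call), while the gap and liquidity criteria stay deterministic and easy to test offline.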
### Phase C — candidate-ranking layer
Add a deterministic ranking step before final synthesis:
1. collect candidate tickers from each scanner stream
2. normalize them into a merged Python structure
3. score them by overlap across streams
4. pass the ranked top set into macro synthesis
The scoring model should be explicit and testable.
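A minimal sketch of such a deterministic, testable overlap score follows; the stream labels and weights here are placeholders, and the committed `_build_candidate_rankings` in `macro_synthesis.py` is the actual implementation:

```python
from collections import defaultdict

def score_overlap(streams):
    """Score tickers by weighted appearance across scanner streams.

    `streams` maps a stream label to a (weight, set_of_tickers) pair.
    Illustrative sketch only; labels and weights are placeholders.
    """
    scores = defaultdict(int)
    sources = defaultdict(list)
    for label, (weight, tickers) in streams.items():
        for ticker in tickers:
            scores[ticker] += weight
            sources[ticker].append(label)
    # Rank by total score, then by breadth of sources, deterministically.
    return sorted(
        (
            {"ticker": t, "score": s, "sources": sorted(sources[t])}
            for t, s in scores.items()
        ),
        key=lambda row: (row["score"], len(row["sources"]), row["ticker"]),
        reverse=True,
    )

ranked = score_overlap({
    "drift": (3, {"NVDA", "AAPL"}),
    "factor_alignment": (3, {"NVDA", "MSFT"}),
    "smart_money": (2, {"NVDA"}),
})
# NVDA scores 8 and ranks first
```

The point of the sketch is that the scoring rule is fully inspectable Python: given the same scanner outputs, the ranking is reproducible, which is what lets the LLM explain the overlap instead of inventing it.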
## Upcoming (Deferred)
### True Revision-Diffusion Metric
Deferred until we have a source that exposes:
- upward vs downward estimate revisions
- analyst coverage counts
- multi-period estimate drift
- ideally rating-change metadata
Until then, factor alignment remains a qualitative / news-backed approximation and must not be described as true diffusion.
## Constraints
- Do not implement per-ticker fan-out across the whole market.
- Do not claim revision diffusion exists until the structured data source exists.
- Do not introduce a gap-detector tool before the no-mock live integration test proves the data path.
- Keep overlap scoring deterministic and inspectable in Python.
## Actionable Rules
- Prefer `yfinance` for the first real gap implementation.
- Keep the candidate universe bounded and cheap.
- Use Finviz only as a future enhancement path unless `finvizfinance` is installed and validated.
- Treat revision diffusion as upcoming work, not current capability.

View File

@ -0,0 +1,75 @@
"""Live integration test for the real gap-detection data path.
This test intentionally exercises the raw yfinance path with no mocks before
the scanner tool is relied upon by the agent layer.
"""
import pytest
pytestmark = pytest.mark.integration
@pytest.mark.integration
def test_yfinance_gap_detection_data_path():
import yfinance as yf
screen = yf.screen("MOST_ACTIVES", count=10)
assert isinstance(screen, dict)
quotes = screen.get("quotes", [])
assert quotes, "MOST_ACTIVES returned no quotes"
symbols = []
for quote in quotes:
symbol = quote.get("symbol")
if symbol and symbol not in symbols:
symbols.append(symbol)
if len(symbols) == 5:
break
assert symbols, "No symbols extracted from screen results"
hist = yf.download(
symbols,
period="5d",
interval="1d",
auto_adjust=False,
progress=False,
threads=True,
)
assert not hist.empty, "download returned no OHLC data"
gap_rows = []
for symbol in symbols:
try:
opens = hist["Open"][symbol].dropna()
closes = hist["Close"][symbol].dropna()
except KeyError:
continue
if len(opens) < 1 or len(closes) < 2:
continue
today_open = float(opens.iloc[-1])
prev_close = float(closes.iloc[-2])
if prev_close == 0:
continue
gap_pct = (today_open - prev_close) / prev_close * 100
gap_rows.append((symbol, gap_pct))
assert gap_rows, "Could not compute any real gap percentages from live OHLC data"
assert all(isinstance(symbol, str) and isinstance(gap_pct, float) for symbol, gap_pct in gap_rows)
@pytest.mark.integration
def test_gap_candidates_tool_live():
from tradingagents.agents.utils.scanner_tools import get_gap_candidates
result = get_gap_candidates.invoke({})
assert isinstance(result, str)
assert (
"# Gap Candidates" in result
or "No stocks matched the live gap criteria today." in result
or "No stocks matched the live gap universe today." in result
)

View File

@ -76,12 +76,29 @@ class TestUsageEstimate:
class TestEstimateAnalyze:
def test_default_config_no_av_calls(self):
"""With default config (yfinance primary), AV calls should be 0."""
def test_default_config_uses_yfinance(self):
"""Default analyze path should materially use yfinance."""
est = estimate_analyze()
assert est.vendor_calls.alpha_vantage == 0
assert est.vendor_calls.yfinance > 0
def test_explicit_yfinance_config_has_no_av_calls(self):
"""A pure yfinance config should keep Alpha Vantage at zero."""
cfg = {
"data_vendors": {
"core_stock_apis": "yfinance",
"technical_indicators": "yfinance",
"fundamental_data": "yfinance",
"news_data": "yfinance",
"scanner_data": "yfinance",
"calendar_data": "finnhub",
},
"tool_vendors": {
"get_insider_transactions": "finnhub",
},
}
est = estimate_analyze(config=cfg)
assert est.vendor_calls.alpha_vantage == 0
def test_all_analysts_nonzero_total(self):
est = estimate_analyze(selected_analysts=["market", "news", "fundamentals", "social"])
assert est.vendor_calls.total > 0
@ -141,19 +158,23 @@ class TestEstimateScan:
assert est.vendor_calls.yfinance > 0
def test_finnhub_for_calendars(self):
"""Calendars should always use Finnhub."""
"""Global bounded scanners should add Finnhub earnings-calendar usage."""
est = estimate_scan()
assert est.vendor_calls.finnhub >= 2 # earnings + economic calendar
assert est.vendor_calls.finnhub >= 2
def test_scan_total_reasonable(self):
est = estimate_scan()
# Should be between 15-40 calls total
# Global-only scanner remains bounded despite added nodes.
assert 10 <= est.vendor_calls.total <= 50
def test_notes_have_phases(self):
est = estimate_scan()
phase_notes = [n for n in est.notes if "Phase" in n]
assert len(phase_notes) >= 3 # Phase 1A, 1B, 1C, 2, 3
assert len(phase_notes) >= 5
def test_macro_synthesis_has_no_external_calls(self):
est = estimate_scan()
assert any("Macro Synthesis" in note and "no external tool calls" in note for note in est.notes)
# ──────────────────────────────────────────────────────────────────────────────
@ -201,10 +222,10 @@ class TestFormatEstimate:
assert "yfinance" in text
assert "Total:" in text
def test_no_av_shows_not_needed(self):
est = estimate_analyze() # default config → no AV
def test_default_format_includes_av_assessment(self):
est = estimate_analyze()
text = format_estimate(est)
assert "NOT needed" in text
assert "Alpha Vantage Assessment" in text
def test_av_shows_assessment(self):
av_config = {

View File

@ -0,0 +1,132 @@
from types import SimpleNamespace
from unittest.mock import patch
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.runnables import Runnable
from tradingagents.agents.scanners.drift_scanner import create_drift_scanner
from tradingagents.agents.scanners.factor_alignment_scanner import (
create_factor_alignment_scanner,
)
class MockRunnable(Runnable):
def __init__(self, invoke_responses):
self.invoke_responses = invoke_responses
self.call_count = 0
def invoke(self, input, config=None, **kwargs):
response = self.invoke_responses[self.call_count]
self.call_count += 1
return response
class MockLLM(Runnable):
def __init__(self, invoke_responses):
self.runnable = MockRunnable(invoke_responses)
self.tools_bound = None
def invoke(self, input, config=None, **kwargs):
return self.runnable.invoke(input, config=config, **kwargs)
def bind_tools(self, tools):
self.tools_bound = tools
return self.runnable
def _base_state():
return {
"messages": [HumanMessage(content="Run the market scan.")],
"scan_date": "2026-03-27",
"sector_performance_report": "| Sector | 1-Month % |\n| Technology | +5.0% |",
"market_movers_report": "| Symbol | Change % |\n| NVDA | +4.0% |",
}
def test_factor_alignment_scanner_end_to_end():
llm = MockLLM(
[
AIMessage(
content="",
tool_calls=[
{"name": "get_topic_news", "args": {"topic": "analyst upgrades downgrades", "limit": 3}, "id": "tc1"},
{"name": "get_topic_news", "args": {"topic": "earnings estimate revisions", "limit": 3}, "id": "tc2"},
{"name": "get_earnings_calendar", "args": {"from_date": "2026-03-27", "to_date": "2026-04-17"}, "id": "tc3"},
],
),
AIMessage(content="Factor alignment report with globally surfaced tickers."),
]
)
topic_tool = SimpleNamespace(
name="get_topic_news",
invoke=lambda args: "analyst news" if "analyst" in args["topic"] else "revision news",
)
earnings_tool = SimpleNamespace(
name="get_earnings_calendar",
invoke=lambda args: "earnings calendar",
)
with patch(
"tradingagents.agents.scanners.factor_alignment_scanner.get_topic_news",
topic_tool,
), patch(
"tradingagents.agents.scanners.factor_alignment_scanner.get_earnings_calendar",
earnings_tool,
):
node = create_factor_alignment_scanner(llm)
result = node(_base_state())
assert "Factor alignment report" in result["factor_alignment_report"]
assert result["sender"] == "factor_alignment_scanner"
assert [tool.name for tool in llm.tools_bound] == ["get_topic_news", "get_earnings_calendar"]
def test_drift_scanner_end_to_end():
llm = MockLLM(
[
AIMessage(
content="",
tool_calls=[
{"name": "get_gap_candidates", "args": {}, "id": "tc1"},
{"name": "get_topic_news", "args": {"topic": "earnings beats raised guidance", "limit": 3}, "id": "tc2"},
{"name": "get_earnings_calendar", "args": {"from_date": "2026-03-27", "to_date": "2026-04-10"}, "id": "tc3"},
],
),
AIMessage(content="Drift opportunities report with continuation setups."),
]
)
gap_tool = SimpleNamespace(
name="get_gap_candidates",
invoke=lambda args: "gap candidates table",
)
topic_tool = SimpleNamespace(
name="get_topic_news",
invoke=lambda args: "continuation news",
)
earnings_tool = SimpleNamespace(
name="get_earnings_calendar",
invoke=lambda args: "earnings calendar",
)
with patch(
"tradingagents.agents.scanners.drift_scanner.get_gap_candidates",
gap_tool,
), patch(
"tradingagents.agents.scanners.drift_scanner.get_topic_news",
topic_tool,
), patch(
"tradingagents.agents.scanners.drift_scanner.get_earnings_calendar",
earnings_tool,
):
node = create_drift_scanner(llm)
result = node(_base_state())
assert "Drift opportunities report" in result["drift_opportunities_report"]
assert result["sender"] == "drift_scanner"
assert [tool.name for tool in llm.tools_bound] == [
"get_gap_candidates",
"get_topic_news",
"get_earnings_calendar",
]

View File

@ -48,7 +48,7 @@ SAMPLE_SECTOR_REPORT = """\
class TestExtractTopSectors:
"""Verify _extract_top_sectors parses the table correctly."""
def test_returns_top_3_by_absolute_1month(self):
def test_returns_top_3_by_positive_1month(self):
result = _extract_top_sectors(SAMPLE_SECTOR_REPORT, top_n=3)
assert len(result) == 3
assert result[0] == "energy"
@ -73,8 +73,8 @@ class TestExtractTopSectors:
result = _extract_top_sectors("not a table at all\njust random text", top_n=3)
assert result == VALID_SECTOR_KEYS[:3]
def test_negative_returns_sorted_by_absolute_value(self):
"""Sectors with large negative moves should rank high (big movers)."""
def test_negative_returns_do_not_outrank_positive_tailwinds(self):
"""Large drawdowns should not displace positive leadership."""
report = """\
| Sector | 1-Day % | 1-Week % | 1-Month % | YTD % |
|--------|---------|----------|-----------|-------|
@ -83,7 +83,7 @@ class TestExtractTopSectors:
| Healthcare | +0.05% | +0.10% | +0.50% | +1.00% |
"""
result = _extract_top_sectors(report, top_n=2)
assert result[0] == "energy"
assert result == ["technology", "healthcare"]
def test_all_returned_keys_are_valid(self):
result = _extract_top_sectors(SAMPLE_SECTOR_REPORT, top_n=11)

View File

@ -0,0 +1,38 @@
from tradingagents.agents.scanners.macro_synthesis import (
_build_candidate_rankings,
_extract_rankable_tickers,
_format_horizon_label,
)
def test_format_horizon_label_supported_values():
assert _format_horizon_label(30) == "1 month"
assert _format_horizon_label(60) == "2 months"
assert _format_horizon_label(90) == "3 months"
def test_format_horizon_label_unsupported_defaults_to_one_month():
assert _format_horizon_label(45) == "1 month"
def test_extract_rankable_tickers_filters_noise():
tickers = _extract_rankable_tickers(
"NVDA and AAPL look strong; GDP and JSON are not tickers. MSFT also appears."
)
assert {"NVDA", "AAPL", "MSFT"} <= tickers
assert "GDP" not in tickers
assert "JSON" not in tickers
def test_build_candidate_rankings_rewards_overlap():
state = {
"market_movers_report": "NVDA AAPL",
"smart_money_report": "NVDA",
"factor_alignment_report": "NVDA MSFT",
"drift_opportunities_report": "NVDA AAPL",
"industry_deep_dive_report": "MSFT",
}
ranked = _build_candidate_rankings(state)
assert ranked[0]["ticker"] == "NVDA"
assert ranked[0]["score"] > ranked[1]["score"]

View File

@ -44,6 +44,8 @@ def test_scanner_setup_compiles_graph():
"geopolitical_scanner": MagicMock(),
"market_movers_scanner": MagicMock(),
"sector_scanner": MagicMock(),
"factor_alignment_scanner": MagicMock(),
"drift_scanner": MagicMock(),
"smart_money_scanner": MagicMock(),
"industry_deep_dive": MagicMock(),
"macro_synthesis": MagicMock(),

View File

@ -156,6 +156,46 @@ class TestYfinanceScannerMarketMovers:
assert "Error" in result
class TestYfinanceScannerGapCandidates:
"""Offline tests for get_gap_candidates_yfinance."""
def _quote(self, symbol, open_price, prev_close, volume=2_000_000, avg_volume=1_000_000, price=25.0, change_pct=4.0):
return {
"symbol": symbol,
"shortName": f"{symbol} Inc",
"regularMarketOpen": open_price,
"regularMarketPreviousClose": prev_close,
"regularMarketVolume": volume,
"averageDailyVolume3Month": avg_volume,
"regularMarketPrice": price,
"regularMarketChangePercent": change_pct,
}
def test_returns_gap_table(self):
from tradingagents.dataflows.yfinance_scanner import get_gap_candidates_yfinance
screen_data = {
"quotes": [
self._quote("NVDA", 110.0, 100.0),
self._quote("AAPL", 103.0, 100.0),
]
}
with patch("tradingagents.dataflows.yfinance_scanner.yf.screen", return_value=screen_data):
result = get_gap_candidates_yfinance()
assert "# Gap Candidates" in result
assert "NVDA" in result
def test_returns_no_match_message_when_filters_fail(self):
from tradingagents.dataflows.yfinance_scanner import get_gap_candidates_yfinance
screen_data = {"quotes": [self._quote("AAPL", 100.5, 100.0, avg_volume=10_000_000)]}
with patch("tradingagents.dataflows.yfinance_scanner.yf.screen", return_value=screen_data):
result = get_gap_candidates_yfinance()
assert "No stocks matched the live gap criteria today." in result
# ---------------------------------------------------------------------------
# yfinance scanner — get_market_indices_yfinance
# ---------------------------------------------------------------------------

View File

@ -1,6 +1,8 @@
from .geopolitical_scanner import create_geopolitical_scanner
from .market_movers_scanner import create_market_movers_scanner
from .sector_scanner import create_sector_scanner
from .factor_alignment_scanner import create_factor_alignment_scanner
from .drift_scanner import create_drift_scanner
from .smart_money_scanner import create_smart_money_scanner
from .industry_deep_dive import create_industry_deep_dive
from .macro_synthesis import create_macro_synthesis

View File

@ -0,0 +1,77 @@
from datetime import datetime, timedelta, timezone
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from tradingagents.agents.utils.scanner_tools import (
get_earnings_calendar,
get_gap_candidates,
get_topic_news,
)
from tradingagents.agents.utils.tool_runner import run_tool_loop
def create_drift_scanner(llm):
def drift_scanner_node(state):
scan_date = state["scan_date"]
tools = [get_gap_candidates, get_topic_news, get_earnings_calendar]
market_context = state.get("market_movers_report", "")
sector_context = state.get("sector_performance_report", "")
context_chunks = []
if market_context:
context_chunks.append(f"Market movers context:\n{market_context}")
if sector_context:
context_chunks.append(f"Sector rotation context:\n{sector_context}")
joined_context = "\n\n".join(context_chunks)
context_section = f"\n\n{joined_context}" if context_chunks else ""
try:
start_date = datetime.strptime(scan_date, "%Y-%m-%d").date()
except ValueError:
start_date = datetime.now(timezone.utc).date()
end_date = start_date + timedelta(days=14)
system_message = (
"You are a drift-window scanner focused on 1-3 month continuation setups. "
"Stay global and bounded: use the existing market movers context, then confirm whether those moves "
"look like the start of a sustained drift rather than one-day noise.\n\n"
"You MUST perform these bounded searches:\n"
"1. Call get_gap_candidates to retrieve real market-data gap candidates.\n"
"2. Call get_topic_news for earnings beats, raised guidance, and positive post-event follow-through.\n"
f"3. Call get_earnings_calendar from {start_date.isoformat()} to {end_date.isoformat()}.\n\n"
"Then write a concise report covering:\n"
"(1) which current movers look most likely to sustain a 1-3 month drift,\n"
"(2) which sectors show the cleanest drift setup rather than short-covering noise,\n"
"(3) 5-8 candidate tickers surfaced globally from the mover context plus catalyst confirmation,\n"
"(4) the key evidence for continuation risk versus reversal risk."
f"{context_section}"
)
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a helpful AI assistant, collaborating with other assistants."
" Use the provided tools to progress towards answering the question."
" If you are unable to fully answer, that's OK; another assistant with different tools"
" will help where you left off. Execute what you can to make progress."
" You have access to the following tools: {tool_names}.\n{system_message}"
" For your reference, the current date is {current_date}.",
),
MessagesPlaceholder(variable_name="messages"),
]
)
prompt = prompt.partial(system_message=system_message)
prompt = prompt.partial(tool_names=", ".join([tool.name for tool in tools]))
prompt = prompt.partial(current_date=scan_date)
chain = prompt | llm.bind_tools(tools)
result = run_tool_loop(chain, state["messages"], tools)
return {
"messages": [result],
"drift_opportunities_report": result.content or "",
"sender": "drift_scanner",
}
return drift_scanner_node

View File

@ -0,0 +1,71 @@
from datetime import datetime, timedelta, timezone
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from tradingagents.agents.utils.scanner_tools import get_earnings_calendar, get_topic_news
from tradingagents.agents.utils.tool_runner import run_tool_loop
def create_factor_alignment_scanner(llm):
def factor_alignment_scanner_node(state):
scan_date = state["scan_date"]
tools = [get_topic_news, get_earnings_calendar]
sector_context = state.get("sector_performance_report", "")
sector_section = (
f"\n\nSector rotation context from the Sector Scanner:\n{sector_context}"
if sector_context
else ""
)
try:
start_date = datetime.strptime(scan_date, "%Y-%m-%d").date()
except ValueError:
start_date = datetime.now(timezone.utc).date()
end_date = start_date + timedelta(days=21)
system_message = (
"You are a factor strategist looking for global 1-3 month drift signals from analyst sentiment and "
"earnings revision flow. Stay market-wide: do not deep-dive individual tickers one by one.\n\n"
"You MUST perform these bounded searches:\n"
"1. Call get_topic_news on analyst upgrades/downgrades and recommendation changes.\n"
"2. Call get_topic_news on earnings estimate revisions, raised guidance, and estimate cuts.\n"
f"3. Call get_earnings_calendar from {start_date.isoformat()} to {end_date.isoformat()}.\n\n"
"Then write a concise report covering:\n"
"(1) sectors/themes seeing the strongest positive revision breadth,\n"
"(2) sectors/themes with deteriorating revision pressure,\n"
"(3) 5-8 globally surfaced tickers that appear repeatedly in the analyst/revision flow,\n"
"(4) how this factor evidence aligns or conflicts with the sector-tailwind backdrop.\n"
"Prefer names that show both positive analyst tone and upward earnings expectation drift."
f"{sector_section}"
)
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a helpful AI assistant, collaborating with other assistants."
" Use the provided tools to progress towards answering the question."
" If you are unable to fully answer, that's OK; another assistant with different tools"
" will help where you left off. Execute what you can to make progress."
" You have access to the following tools: {tool_names}.\n{system_message}"
" For your reference, the current date is {current_date}.",
),
MessagesPlaceholder(variable_name="messages"),
]
)
prompt = prompt.partial(system_message=system_message)
prompt = prompt.partial(tool_names=", ".join([tool.name for tool in tools]))
prompt = prompt.partial(current_date=scan_date)
chain = prompt | llm.bind_tools(tools)
result = run_tool_loop(chain, state["messages"], tools)
return {
"messages": [result],
"factor_alignment_report": result.content or "",
"sender": "factor_alignment_scanner",
}
return factor_alignment_scanner_node

View File

@ -40,13 +40,15 @@ _DISPLAY_TO_KEY = {
def _extract_top_sectors(sector_report: str, top_n: int = 3) -> list[str]:
"""Parse the sector performance report and return the *top_n* sector keys
ranked by absolute 1-month performance (largest absolute move first).
ranked by strongest positive 1-month performance.
The sector performance table looks like:
| Technology | +0.45% | +1.20% | +5.67% | +12.3% |
We parse the 1-month column (index 3) and sort by absolute value.
We parse the 1-month column (index 3) and rank sectors by descending
1-month performance so the deep dive follows sector tailwinds. When all
sectors are weak, this still returns the least-bad leaders.
If the report is not a table, it attempts to parse list formats
(bullet points, numbered lists, or plain text).
@ -77,8 +79,7 @@ def _extract_top_sectors(sector_report: str, top_n: int = 3) -> list[str]:
rows.append((key, month_val))
if rows:
# Sort by absolute 1-month move (biggest mover first)
rows.sort(key=lambda r: abs(r[1]), reverse=True)
rows.sort(key=lambda r: r[1], reverse=True)
return [r[0] for r in rows[:top_n]]
# Fallback to parsing text formats: bullet points, numbered lists, plain text
@ -140,6 +141,15 @@ def create_industry_deep_dive(llm):
### Sector Performance Report:
{sector_report or "Not available"}
### Factor Alignment Report:
{state.get("factor_alignment_report", "Not available")}
### Drift Opportunities Report:
{state.get("drift_opportunities_report", "Not available")}
### Smart Money Report:
{state.get("smart_money_report", "Not available")}
"""
sector_list_str = ", ".join(f"'{s}'" for s in top_sectors)

View File

@ -1,19 +1,95 @@
import json
import logging
import re
from collections import defaultdict
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from tradingagents.agents.utils.json_utils import extract_json
logger = logging.getLogger(__name__)
_TICKER_RE = re.compile(r"\b[A-Z]{1,5}\b")
_TICKER_STOPWORDS = {
"A", "I", "AI", "AN", "AND", "ARE", "AS", "AT", "BE", "BY", "END", "ETF",
"GDP", "GICS", "JSON", "LOW", "NFP", "NOT", "NOW", "OIL", "ONLY", "OR",
"THE", "TO", "USD", "VIX", "YTD", "CPI", "PPI", "EPS", "CEO", "CFO", "N/A",
}
def create_macro_synthesis(llm, max_scan_tickers: int = 10):
def _format_horizon_label(scan_horizon_days: int) -> str:
if scan_horizon_days not in (30, 60, 90):
logger.warning(
"macro_synthesis: unsupported scan_horizon_days=%s; defaulting to 30",
scan_horizon_days,
)
scan_horizon_days = 30
if scan_horizon_days == 30:
return "1 month"
if scan_horizon_days == 60:
return "2 months"
return "3 months"
def _extract_rankable_tickers(text: str) -> set[str]:
if not text:
return set()
return {
token
for token in _TICKER_RE.findall(text)
if token not in _TICKER_STOPWORDS and len(token) > 1
}
def _build_candidate_rankings(state: dict, limit: int = 15) -> list[dict[str, object]]:
weighted_sources = [
("market_movers_report", 2, "market_movers"),
("smart_money_report", 2, "smart_money"),
("factor_alignment_report", 3, "factor_alignment"),
("drift_opportunities_report", 3, "drift"),
("industry_deep_dive_report", 1, "industry_deep_dive"),
]
scores: dict[str, int] = defaultdict(int)
sources: dict[str, list[str]] = defaultdict(list)
for state_key, weight, label in weighted_sources:
tickers = _extract_rankable_tickers(state.get(state_key, ""))
for ticker in tickers:
scores[ticker] += weight
sources[ticker].append(label)
ranked = sorted(
(
{
"ticker": ticker,
"score": score,
"sources": sorted(sources[ticker]),
"source_count": len(sources[ticker]),
}
for ticker, score in scores.items()
),
key=lambda row: (row["score"], row["source_count"], row["ticker"]),
reverse=True,
)
return ranked[:limit]
def create_macro_synthesis(llm, max_scan_tickers: int = 10, scan_horizon_days: int = 30):
def macro_synthesis_node(state):
scan_date = state["scan_date"]
horizon_label = _format_horizon_label(scan_horizon_days)
# Inject all previous reports for synthesis — no tools, pure LLM reasoning
smart_money = state.get("smart_money_report", "") or "Not available"
candidate_rankings = _build_candidate_rankings(state)
ranking_section = ""
if candidate_rankings:
ranking_lines = [
f"- {row['ticker']}: score={row['score']} sources={', '.join(row['sources'])}"
for row in candidate_rankings
]
ranking_section = "\n\n### Deterministic Candidate Ranking:\n" + "\n".join(ranking_lines)
all_reports_context = f"""## All Scanner and Research Reports
### Geopolitical Report:
@ -25,17 +101,26 @@ def create_macro_synthesis(llm, max_scan_tickers: int = 10):
### Sector Performance Report:
{state.get("sector_performance_report", "Not available")}
### Factor Alignment Report:
{state.get("factor_alignment_report", "Not available")}
### Drift Opportunities Report:
{state.get("drift_opportunities_report", "Not available")}
### Smart Money Report (Finviz institutional screeners):
{smart_money}
### Industry Deep Dive Report:
{state.get("industry_deep_dive_report", "Not available")}
{ranking_section}
"""
system_message = (
"You are a macro strategist synthesizing all scanner and research reports into a final investment thesis. "
"You have received: geopolitical analysis, market movers analysis, sector performance analysis, "
"smart money institutional screener results, and industry deep dive analysis. "
"A deterministic candidate-ranking snapshot is also provided when available. Treat higher-ranked "
"candidates as preferred because they appeared across more independent scanner streams. "
"## THE GOLDEN OVERLAP (apply when Smart Money Report is available and not 'Not available'):\n"
"Cross-reference the Smart Money tickers with your macro regime thesis. "
"If a Smart Money ticker fits your top-down macro narrative (e.g., an Energy stock with heavy insider "
@ -49,7 +134,7 @@ def create_macro_synthesis(llm, max_scan_tickers: int = 10):
"key_catalysts, and risks. "
"Output your response as valid JSON matching this schema:\n"
"{\n"
' "timeframe": "1 month",\n'
f' "timeframe": "{horizon_label}",\n'
' "executive_summary": "...",\n'
' "macro_context": { "economic_cycle": "...", "central_bank_stance": "...", "geopolitical_risks": [...] },\n'
' "key_themes": [{ "theme": "...", "description": "...", "conviction": "high|medium|low", "timeframe": "..." }],\n'

View File

@ -30,6 +30,8 @@ class ScannerState(MessagesState):
geopolitical_report: Annotated[str, _last_value]
market_movers_report: Annotated[str, _last_value]
sector_performance_report: Annotated[str, _last_value]
factor_alignment_report: Annotated[str, _last_value]
drift_opportunities_report: Annotated[str, _last_value]
smart_money_report: Annotated[str, _last_value]
# Phase 2: Deep dive output

View File

@ -39,6 +39,18 @@ def get_market_indices() -> str:
return route_to_vendor("get_market_indices")
@tool
def get_gap_candidates() -> str:
"""
Get a bounded set of real gap-up candidates derived from live market data.
Uses the configured scanner_data vendor, but currently relies on yfinance.
Returns:
str: Formatted table of gap candidates with gap %, price change, and relative volume
"""
return route_to_vendor("get_gap_candidates")
@tool
def get_sector_performance() -> str:
"""

View File

@ -239,6 +239,17 @@ def estimate_scan(config: dict | None = None) -> UsageEstimate:
_add("get_sector_performance")
est.notes.append("Phase 1C (Sector): 1 sector performance call")
# Phase 1D: Factor Alignment — bounded global revision/sentiment checks
_add("get_topic_news", 2)
_add("get_earnings_calendar")
est.notes.append("Phase 1D (Factor Alignment): ~2 topic news + 1 earnings calendar")
# Phase 1E: Drift Scanner — bounded global continuation checks
_add("get_gap_candidates")
_add("get_topic_news")
_add("get_earnings_calendar")
est.notes.append("Phase 1E (Drift): 1 live gap scan + ~1 topic news + 1 earnings calendar")
# Phase 2: Industry Deep Dive — ~3 industry perf + ~3 topic news
industry_calls = 3
_add("get_industry_performance", industry_calls)
@@ -248,11 +259,8 @@ def estimate_scan(config: dict | None = None) -> UsageEstimate:
f"~{industry_calls} topic news calls"
)
# Phase 3: Macro Synthesis — ~2 topic news + calendars
_add("get_topic_news", 2)
_add("get_earnings_calendar")
_add("get_economic_calendar")
est.notes.append("Phase 3 (Macro Synthesis): ~2 topic news + calendar calls")
# Phase 3: Macro Synthesis — pure LLM reasoning, no external tools
est.notes.append("Phase 3 (Macro Synthesis): no external tool calls")
est.method_breakdown = breakdown
return est
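The estimator hunks above add per-phase call counts via `_add(...)`. A minimal sketch of that accumulation pattern, using a plain `Counter` instead of the repo's `UsageEstimate` (the `estimate_phase_calls` name and dict shape are hypothetical, not from the source):

```python
from collections import Counter


def estimate_phase_calls() -> Counter:
    """Tally expected vendor calls per tool, mirroring the phase notes above."""
    calls = Counter()

    def _add(method: str, n: int = 1) -> None:
        calls[method] += n

    # Phase 1D: Factor Alignment - bounded global revision/sentiment checks
    _add("get_topic_news", 2)
    _add("get_earnings_calendar")
    # Phase 1E: Drift Scanner - bounded global continuation checks
    _add("get_gap_candidates")
    _add("get_topic_news")
    _add("get_earnings_calendar")
    # Phase 3: Macro Synthesis is pure LLM reasoning, so no tool calls are added.
    return calls
```

Summing per-method counters like this is what lets the commit drop the Phase 3 tool calls without touching the other phases' estimates.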

View File

@@ -13,6 +13,7 @@ from .y_finance import (
from .yfinance_news import get_news_yfinance, get_global_news_yfinance
from .yfinance_scanner import (
get_market_movers_yfinance,
get_gap_candidates_yfinance,
get_market_indices_yfinance,
get_sector_performance_yfinance,
get_industry_performance_yfinance,
@@ -89,6 +90,7 @@ TOOLS_CATEGORIES = {
"description": "Market-wide scanner data (movers, indices, sectors, industries)",
"tools": [
"get_market_movers",
"get_gap_candidates",
"get_market_indices",
"get_sector_performance",
"get_industry_performance",
@@ -117,6 +119,7 @@ FALLBACK_ALLOWED = {
"get_market_indices", # SPY/DIA/QQQ quotes are fungible
"get_sector_performance", # ETF-based proxy, same approach
"get_market_movers", # Approximation acceptable for screening
"get_gap_candidates", # Gap math from market data is fungible enough
"get_industry_performance", # ETF-based proxy
}
@@ -168,6 +171,9 @@ VENDOR_METHODS = {
"yfinance": get_market_movers_yfinance,
"alpha_vantage": get_market_movers_alpha_vantage,
},
"get_gap_candidates": {
"yfinance": get_gap_candidates_yfinance,
},
"get_market_indices": {
"finnhub": get_market_indices_finnhub,
"alpha_vantage": get_market_indices_alpha_vantage,
@@ -271,4 +277,3 @@ def route_to_vendor(method: str, *args, **kwargs):
error_msg = f"All vendors failed for '{method}' (tried: {', '.join(tried)})"
raise RuntimeError(error_msg) from last_error

View File

@@ -82,6 +82,116 @@ def get_market_movers_yfinance(
return f"Error fetching market movers for {category}: {str(e)}"
def get_gap_candidates_yfinance() -> str:
"""
Compute real gap candidates from live yfinance screener quotes.
Uses a bounded universe from DAY_GAINERS and MOST_ACTIVES, then calculates
gap percentage from today's open versus the previous close. This is a real
market-data gap calculation, not a news heuristic.
Returns:
Markdown table of bounded gap candidates with liquidity confirmation.
"""
try:
universe = {}
for screener_key in ("DAY_GAINERS", "MOST_ACTIVES"):
data = yf.screen(screener_key, count=25)
if not data or not isinstance(data, dict):
continue
for quote in data.get("quotes", []):
symbol = quote.get("symbol")
if symbol:
universe[symbol] = quote
if not universe:
return "No stocks matched the live gap universe today."
rows = []
for symbol, quote in universe.items():
prev_close = quote.get("regularMarketPreviousClose")
open_price = quote.get("regularMarketOpen")
current_price = quote.get("regularMarketPrice")
volume = quote.get("regularMarketVolume")
avg_volume = quote.get("averageDailyVolume3Month")
change_pct = quote.get("regularMarketChangePercent")
name = quote.get("shortName", quote.get("displayName", "N/A"))
if not isinstance(prev_close, (int, float)) or prev_close == 0:
continue
if not isinstance(open_price, (int, float)):
continue
gap_pct = (open_price - prev_close) / prev_close * 100
rel_volume = None
if isinstance(volume, (int, float)) and isinstance(avg_volume, (int, float)) and avg_volume > 0:
rel_volume = volume / avg_volume
# Bounded long-bias filter for drift setups.
if gap_pct < 2.0:
continue
if rel_volume is not None and rel_volume < 1.25:
continue
if isinstance(current_price, (int, float)) and current_price < 5:
continue
rows.append(
{
"symbol": symbol,
"name": name[:30],
"open": open_price,
"prev_close": prev_close,
"gap_pct": gap_pct,
"price": current_price,
"change_pct": change_pct,
"rel_volume": rel_volume,
}
)
if not rows:
return "No stocks matched the live gap criteria today."
rows.sort(
key=lambda row: (
row["gap_pct"],
row["rel_volume"] if row["rel_volume"] is not None else 0,
),
reverse=True,
)
header = "# Gap Candidates\n"
header += f"# Data retrieved on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n"
lines = [
header,
"| Symbol | Name | Open | Prev Close | Gap % | Price | Change % | Rel Volume |",
"|--------|------|------|------------|-------|-------|----------|------------|",
]
for row in rows[:10]:
open_str = f"${row['open']:.2f}"
prev_str = f"${row['prev_close']:.2f}"
gap_str = f"{row['gap_pct']:+.2f}%"
price = row["price"]
price_str = f"${price:.2f}" if isinstance(price, (int, float)) else "N/A"
change = row["change_pct"]
change_str = f"{change:+.2f}%" if isinstance(change, (int, float)) else "N/A"
rel_volume = row["rel_volume"]
rel_volume_str = f"{rel_volume:.2f}x" if isinstance(rel_volume, (int, float)) else "N/A"
lines.append(
f"| {row['symbol']} | {row['name']} | {open_str} | {prev_str} | {gap_str} | "
f"{price_str} | {change_str} | {rel_volume_str} |"
)
return "\n".join(lines) + "\n"
except requests.exceptions.Timeout:
raise ThirdPartyTimeoutError("Request timed out fetching live gap candidates")
except ThirdPartyTimeoutError:
raise
except Exception as e:
return f"Error fetching live gap candidates: {str(e)}"
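The long-bias filter in `get_gap_candidates_yfinance` reduces to three bounded checks: gap of at least 2%, relative volume of at least 1.25x when both volume fields exist, and price of at least $5. A minimal sketch of just that filter logic on hypothetical quote dicts (field names match the yfinance quote keys used above; `gap_stats` and `passes_gap_filter` are illustrative helpers, not from the source):

```python
def gap_stats(quote: dict):
    """Return (gap_pct, rel_volume) or None when required fields are missing."""
    prev_close = quote.get("regularMarketPreviousClose")
    open_price = quote.get("regularMarketOpen")
    if not isinstance(prev_close, (int, float)) or prev_close == 0:
        return None
    if not isinstance(open_price, (int, float)):
        return None
    gap_pct = (open_price - prev_close) / prev_close * 100
    volume = quote.get("regularMarketVolume")
    avg_volume = quote.get("averageDailyVolume3Month")
    rel_volume = None
    if isinstance(volume, (int, float)) and isinstance(avg_volume, (int, float)) and avg_volume > 0:
        rel_volume = volume / avg_volume
    return gap_pct, rel_volume


def passes_gap_filter(quote: dict) -> bool:
    """Apply the bounded long-bias drift filter: gap >= 2%, rel vol >= 1.25x, price >= $5."""
    stats = gap_stats(quote)
    if stats is None:
        return False
    gap_pct, rel_volume = stats
    if gap_pct < 2.0:
        return False
    if rel_volume is not None and rel_volume < 1.25:
        return False
    price = quote.get("regularMarketPrice")
    if isinstance(price, (int, float)) and price < 5:
        return False
    return True
```

Note the filter deliberately tolerates a missing `rel_volume` rather than rejecting the symbol, so thin screener payloads still surface candidates on gap size alone.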
def get_market_indices_yfinance() -> str:
"""
Get major market indices data.

View File

@@ -110,6 +110,9 @@ DEFAULT_CONFIG = {
# in auto mode. Portfolio holdings are always included regardless.
# Set to 0 or leave unset for the default (10).
"max_auto_tickers": _env_int("MAX_AUTO_TICKERS", 10),
# Scanner synthesis horizon in calendar days. 30/60/90 map cleanly to
# the 1-3 month search-graph variants.
"scan_horizon_days": _env_int("SCAN_HORIZON_DAYS", 30),
# Data vendor configuration
# Category-level configuration (default for all tools in category)
"data_vendors": {
@@ -124,6 +127,8 @@ DEFAULT_CONFIG = {
"tool_vendors": {
# Finnhub free tier provides same data + MSPR aggregate bonus signal
"get_insider_transactions": "finnhub",
# First implementation is yfinance-only until another vendor is validated.
"get_gap_candidates": "yfinance",
},
# Report storage backend
# When mongo_uri is set, reports are persisted in MongoDB (never overwritten).

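The `scan_horizon_days` setting feeds the `horizon_label` interpolated into the synthesis prompt earlier in this commit. One plausible sketch of how 30/60/90 days could map to that label (the `horizon_label_from_days` helper and its rounding rule are assumptions of mine, not confirmed by the source):

```python
def horizon_label_from_days(days: int) -> str:
    """Map a calendar-day horizon to a prompt label like '1 month' or '3 months'."""
    months = max(1, round(days / 30))
    return "1 month" if months == 1 else f"{months} months"
```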
View File

@@ -8,6 +8,8 @@ from tradingagents.agents.scanners import (
create_geopolitical_scanner,
create_market_movers_scanner,
create_sector_scanner,
create_factor_alignment_scanner,
create_drift_scanner,
create_smart_money_scanner,
create_industry_deep_dive,
create_macro_synthesis,
@@ -19,7 +21,8 @@ class ScannerGraph:
"""Orchestrates the macro scanner pipeline.
Phase 1a (parallel): geopolitical_scanner, market_movers_scanner, sector_scanner
Phase 1b (sequential after sector): smart_money_scanner
Phase 1b (bounded global follow-ons): factor_alignment_scanner, smart_money_scanner
Phase 1c (after market + sector): drift_scanner
Phase 2: industry_deep_dive (fan-in from all Phase 1 nodes)
Phase 3: macro_synthesis -> END
"""
@@ -46,14 +49,21 @@
deep_llm = self._create_llm("deep_think")
max_scan_tickers = int(self.config.get("max_auto_tickers", 10))
scan_horizon_days = int(self.config.get("scan_horizon_days", 30))
agents = {
"geopolitical_scanner": create_geopolitical_scanner(quick_llm),
"market_movers_scanner": create_market_movers_scanner(quick_llm),
"sector_scanner": create_sector_scanner(quick_llm),
"factor_alignment_scanner": create_factor_alignment_scanner(quick_llm),
"drift_scanner": create_drift_scanner(quick_llm),
"smart_money_scanner": create_smart_money_scanner(quick_llm),
"industry_deep_dive": create_industry_deep_dive(mid_llm),
"macro_synthesis": create_macro_synthesis(deep_llm, max_scan_tickers=max_scan_tickers),
"macro_synthesis": create_macro_synthesis(
deep_llm,
max_scan_tickers=max_scan_tickers,
scan_horizon_days=scan_horizon_days,
),
}
setup = ScannerGraphSetup(agents)
@@ -148,6 +158,8 @@ class ScannerGraph:
"geopolitical_report": "",
"market_movers_report": "",
"sector_performance_report": "",
"factor_alignment_report": "",
"drift_opportunities_report": "",
"smart_money_report": "",
"industry_deep_dive_report": "",
"macro_scan_summary": "",

View File

@@ -11,8 +11,10 @@ class ScannerGraphSetup:
Phase 1a (parallel from START):
geopolitical_scanner, market_movers_scanner, sector_scanner
Phase 1b (sequential after sector_scanner):
smart_money_scanner runs after sector data is available so it can
use sector rotation context when interpreting Finviz signals
factor_alignment_scanner and smart_money_scanner are bounded global
follow-ons that use sector rotation context
Phase 1c:
drift_scanner runs after both sector and market-movers data exist
Phase 2: industry_deep_dive (fan-in from all Phase 1 nodes)
Phase 3: macro_synthesis -> END
"""
@@ -24,6 +26,8 @@ class ScannerGraphSetup:
- geopolitical_scanner
- market_movers_scanner
- sector_scanner
- factor_alignment_scanner
- drift_scanner
- smart_money_scanner
- industry_deep_dive
- macro_synthesis
@@ -46,12 +50,17 @@
workflow.add_edge(START, "market_movers_scanner")
workflow.add_edge(START, "sector_scanner")
# Phase 1b: smart_money runs after sector (gets sector rotation context)
# Phase 1b: bounded global follow-ons that require sector context
workflow.add_edge("sector_scanner", "factor_alignment_scanner")
workflow.add_edge("sector_scanner", "smart_money_scanner")
workflow.add_edge("sector_scanner", "drift_scanner")
workflow.add_edge("market_movers_scanner", "drift_scanner")
# Fan-in: all Phase 1 nodes must complete before Phase 2
workflow.add_edge("geopolitical_scanner", "industry_deep_dive")
workflow.add_edge("market_movers_scanner", "industry_deep_dive")
workflow.add_edge("factor_alignment_scanner", "industry_deep_dive")
workflow.add_edge("drift_scanner", "industry_deep_dive")
workflow.add_edge("smart_money_scanner", "industry_deep_dive")
# Phase 2 -> Phase 3 -> END
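The edges above make drift_scanner wait on both sector_scanner and market_movers_scanner, while industry_deep_dive fans in from every Phase 1 node. A minimal sketch that models that same edge list as plain data and checks the dependency constraints, without pulling in langgraph (the `EDGES` list mirrors the `add_edge` calls shown; `predecessors` is an illustrative helper):

```python
EDGES = [
    # Phase 1a: parallel fan-out from START
    ("START", "geopolitical_scanner"),
    ("START", "market_movers_scanner"),
    ("START", "sector_scanner"),
    # Phase 1b: bounded global follow-ons that require sector context
    ("sector_scanner", "factor_alignment_scanner"),
    ("sector_scanner", "smart_money_scanner"),
    # Phase 1c: drift needs both sector and market-movers data
    ("sector_scanner", "drift_scanner"),
    ("market_movers_scanner", "drift_scanner"),
    # Fan-in: all Phase 1 nodes must complete before Phase 2
    ("geopolitical_scanner", "industry_deep_dive"),
    ("market_movers_scanner", "industry_deep_dive"),
    ("factor_alignment_scanner", "industry_deep_dive"),
    ("drift_scanner", "industry_deep_dive"),
    ("smart_money_scanner", "industry_deep_dive"),
]


def predecessors(node: str) -> set:
    """Return the set of nodes that must finish before `node` can run."""
    return {src for src, dst in EDGES if dst == node}
```

Because LangGraph treats multiple incoming edges as a barrier, encoding the wiring this way makes it easy to assert in tests that drift_scanner has exactly two gates and the deep dive has five.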