docs: add implementation plans for opt-in fallback (011) and test fixes (012)

Also unblock docs/agent/plans/ from .gitignore — the plans/ rule was intended for generated scan-result plans, not design documents. - 011-opt-in-vendor-fallback.md: fail-fast-by-default with FALLBACK_ALLOWED whitelist for fungible data tools (OHLCV, indices, sector/industry perf, market movers); 4-phase plan covering interface.py change, new tests, ADR docs, and verification - 012-fix-preexisting-test-failures.md: root-cause analysis and exact fixes for all 12 pre-existing failures across 5 test files (now implemented) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 12:12:07 +01:00 · 2026-03-18 12:12:07 +01:00 · ff3e2817bf
parent cf636232aa
commit ff3e2817bf
3 changed files with 261 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -221,6 +221,7 @@ __marimo__/
 # Scan results and execution plans (generated artifacts)
 results/
 plans/
+!docs/agent/plans/

 # Backup files
 *.backup
--- a/docs/agent/plans/011-opt-in-vendor-fallback.md
+++ b/docs/agent/plans/011-opt-in-vendor-fallback.md
@ -0,0 +1,101 @@
+# Plan: Opt-in Vendor Fallback (Fail-Fast by Default)
+
+**Status**: pending
+**ADR**: 011 (to be created)
+**Branch**: claude/objective-galileo
+**Depends on**: PR #16 (Finnhub integration)
+
+## Context
+
+The current `route_to_vendor()` silently tries every available vendor when the primary fails. This is dangerous for trading software — different vendors return different data contracts (e.g., AV news has sentiment scores, yfinance doesn't; stockstats indicator names are incompatible with AV API names). Silent fallback corrupts signal quality without leaving a trace.
+
+**Decision**: Default to fail-fast. Only tools in `FALLBACK_ALLOWED` (where data contracts are vendor-agnostic) get vendor fallback. Everything else raises on primary vendor failure.
+
+## FALLBACK_ALLOWED Whitelist
+
+```python
+FALLBACK_ALLOWED = {
+    "get_stock_data",           # OHLCV is fungible across vendors
+    "get_market_indices",       # SPY/DIA/QQQ quotes are fungible
+    "get_sector_performance",   # ETF-based proxy, same approach
+    "get_market_movers",        # Approximation acceptable for screening
+    "get_industry_performance", # ETF-based proxy
+}
+```
+
+**Explicitly excluded** (data contracts differ across vendors):
+- `get_news` — AV has `ticker_sentiment_score`, `relevance_score`, `overall_sentiment_label`; yfinance has raw headlines only
+- `get_global_news` — same reason as get_news
+- `get_indicators` — stockstats names (`close_50_sma`, `macdh`, `boll_ub`) ≠ AV API names (`SMA`, `MACD`, `BBANDS`)
+- `get_fundamentals` — different fiscal period alignment, different coverage depth
+- `get_balance_sheet` — vendor-specific field schemas
+- `get_cashflow` — vendor-specific field schemas
+- `get_income_statement` — vendor-specific field schemas
+- `get_insider_transactions` — Finnhub provides MSPR aggregate data that AV/yfinance don't
+- `get_topic_news` — different structure/fields across vendors
+- `get_earnings_calendar` — Finnhub-only, nothing to fall back to
+- `get_economic_calendar` — Finnhub-only, nothing to fall back to
+
+## Phase 1: Core Logic Change
+
+- [ ] **1.1** Add `FALLBACK_ALLOWED` set to `tradingagents/dataflows/interface.py` (after `VENDOR_LIST`, ~line 108)
+- [ ] **1.2** Modify `route_to_vendor()`:
+  - Only build extended vendor chain when `method in FALLBACK_ALLOWED`
+  - Otherwise limit attempts to configured primary vendor(s) only
+  - Capture `last_error` and chain into RuntimeError via `from last_error`
+  - Improve error message: `"All vendors failed for '{method}' (tried: {vendors})"`
+
+## Phase 2: Test Updates
+
+- [ ] **2.1** Verify existing fallback tests still pass (`get_stock_data`, `get_market_movers`, `get_sector_performance` are all in `FALLBACK_ALLOWED`)
+- [ ] **2.2** Update `tests/test_e2e_api_integration.py::test_raises_runtime_error_when_all_vendors_fail` — error message changes from `"No available vendor"` to `"All vendors failed for..."`
+- [ ] **2.3** Create `tests/test_vendor_failfast.py` with:
+  - `test_news_fails_fast_no_fallback` — configure AV, make it raise, assert RuntimeError (no silent yfinance fallback)
+  - `test_indicators_fail_fast_no_fallback` — same pattern for indicators
+  - `test_fundamentals_fail_fast_no_fallback` — same for fundamentals
+  - `test_insider_transactions_fail_fast_no_fallback` — configure Finnhub, make it raise, assert RuntimeError
+  - `test_topic_news_fail_fast_no_fallback` — verify no cross-vendor fallback
+  - `test_calendar_fail_fast_single_vendor` — Finnhub-only, verify fail-fast
+  - `test_error_chain_preserved` — verify `RuntimeError.__cause__` is set
+  - `test_error_message_includes_method_and_vendors` — verify debuggable error text
+  - `test_auth_error_propagates` — verify 401/403 errors don't silently retry
+
+## Phase 3: Documentation
+
+- [ ] **3.1** Create `docs/agent/decisions/011-opt-in-vendor-fallback.md`
+  - Context: silent fallback corrupts signal quality
+  - Decision: fail-fast by default, opt-in fallback for fungible data
+  - Constraints: adding to `FALLBACK_ALLOWED` requires verifying data contract compatibility
+  - Actionable Rules: never add news/indicator tools to FALLBACK_ALLOWED
+- [ ] **3.2** Update ADR 002 — mark as `superseded-by: 011`
+- [ ] **3.3** Update ADR 008 — add opt-in fallback rule to vendor fallback section
+- [ ] **3.4** Update ADR 010 — note insider transactions excluded from fallback
+- [ ] **3.5** Update `docs/agent/CURRENT_STATE.md`
+
+## Phase 4: Verification
+
+- [ ] **4.1** Run full offline test suite: `pytest tests/ -v -m "not integration"`
+- [ ] **4.2** Verify zero new failures introduced
+- [ ] **4.3** Smoke test: `python -m cli.main scan --date 2026-03-17`
+
+## Files Changed
+
+| File | Change |
+|---|---|
+| `tradingagents/dataflows/interface.py` | Add `FALLBACK_ALLOWED`, rewrite `route_to_vendor()` |
+| `tests/test_e2e_api_integration.py` | Update error message match pattern |
+| `tests/test_vendor_failfast.py` | **New** — 9 fail-fast tests |
+| `docs/agent/decisions/011-opt-in-vendor-fallback.md` | **New** ADR |
+| `docs/agent/decisions/002-data-vendor-fallback.md` | Mark superseded |
+| `docs/agent/decisions/008-lessons-learned.md` | Add opt-in rule |
+| `docs/agent/decisions/010-finnhub-vendor-integration.md` | Note insider txn exclusion |
+| `docs/agent/CURRENT_STATE.md` | Update progress |
+
+## Edge Cases
+
+| Case | Handling |
+|---|---|
+| Multi-vendor primary config (`"finnhub,alpha_vantage"`) | All comma-separated vendors tried before giving up — works for both modes |
+| Calendar tools (Finnhub-only) | Not in `FALLBACK_ALLOWED`, single-vendor so fail-fast is a no-op |
+| `get_topic_news` | Excluded — different vendors have different news schemas |
+| Composite tools (`get_ttm_analysis`) | Calls `route_to_vendor()` for sub-tools directly — no action needed |
--- a/docs/agent/plans/012-fix-preexisting-test-failures.md
+++ b/docs/agent/plans/012-fix-preexisting-test-failures.md
@ -0,0 +1,159 @@
+# Plan: Fix Pre-existing Test Failures
+
+**Status**: complete
+**Branch**: claude/objective-galileo
+**Principle**: Tests that fail due to API rate limits are OK to fail — but they must state WHY. Never skip or artificially pass. Fix real bugs only.
+
+## Failures to Fix (12 total, 5 test files)
+
+---
+
+### 1. `tests/test_config_wiring.py` — 4 tests
+
+**Root cause**: `callable()` returns `False` on LangChain `StructuredTool` objects. The `@tool` decorator creates a `StructuredTool` instance, not a plain function.
+
+**Failing lines**: 28, 32, 36, 40 — all `assert callable(X)`
+
+**Fix**: Replace `assert callable(X)` with `assert hasattr(X, "invoke")` — this is the correct way to check LangChain tools are invocable.
+
+```python
+# BEFORE (broken)
+assert callable(get_ttm_analysis)
+
+# AFTER (correct)
+assert hasattr(get_ttm_analysis, "invoke")
+```
+
+- [x] Fix line 28: `get_ttm_analysis`
+- [x] Fix line 32: `get_peer_comparison`
+- [x] Fix line 36: `get_sector_relative`
+- [x] Fix line 40: `get_macro_regime`
+
+---
+
+### 2. `tests/test_env_override.py` — 2 tests
+
+**Root cause**: `importlib.reload()` re-runs `load_dotenv()` which reads `TRADINGAGENTS_*` vars from the user's `.env` file even after stripping them from `os.environ`. `patch.dict(clear=True)` removes the keys but doesn't prevent `load_dotenv()` from re-injecting them.
+
+**Failing tests**:
+- `test_mid_think_llm_none_by_default` (line ~40-46) — expects `None`, gets `qwen/qwq-32b`
+- `test_defaults_unchanged_when_no_env_set` (line ~96-108) — expects `gpt-5.2`, gets `deepseek/deepseek-r1-0528`
+
+**Fix**: Build a clean env dict (strip `TRADINGAGENTS_*` vars) AND patch `dotenv.load_dotenv` to prevent `.env` re-reads during module reload.
+
+```python
+# Pattern for proper isolation
+env_clean = {k: v for k, v in os.environ.items() if not k.startswith("TRADINGAGENTS_")}
+with patch.dict(os.environ, env_clean, clear=True):
+    with patch("dotenv.load_dotenv"):
+        cfg = self._reload_config()
+        assert cfg["mid_think_llm"] is None
+```
+
+- [x] Fix `test_mid_think_llm_none_by_default` — clean env + mock load_dotenv
+- [x] Fix `test_defaults_unchanged_when_no_env_set` — add mock load_dotenv (already had clean env)
+- [x] Audit other tests in the file — remaining tests use explicit env overrides, not affected
+
+---
+
+### 3. `tests/test_scanner_comprehensive.py` — 1 test
+
+**Root cause**: Two bugs in `test_scan_command_creates_output_files`:
+1. Wrong filenames (`market_movers.txt` etc.) — scanner saves `{key}.md` (e.g. `market_movers_report.md`)
+2. Wrong path format — `str(test_date_dir / filename)` produces absolute paths, but `written_files` keys are relative (matching what `Path("results/macro_scan") / date / key` produces)
+
+**Failing test**: `test_scan_command_creates_output_files` (line ~119)
+
+**Fix**: Update filenames and use relative path keys:
+
+```python
+# AFTER (correct)
+expected_files = [
+    "geopolitical_report.md",
+    "market_movers_report.md",
+    "sector_performance_report.md",
+    "industry_deep_dive_report.md",
+    "macro_scan_summary.md",
+]
+for filename in expected_files:
+    filepath = f"results/macro_scan/2026-03-15/{filename}"
+    assert filepath in written_files, ...
+```
+
+- [x] Update expected filenames to match actual scanner output
+- [x] Fix filepath key to use relative format matching run_scan() output
+- [x] Remove format-specific content assertions (content is LLM-generated, not tool output)
+
+---
+
+### 4. `tests/test_scanner_fallback.py` — 2 tests
+
+**Root cause**: Tests call `get_sector_performance_alpha_vantage()` and `get_industry_performance_alpha_vantage()` WITHOUT mocking, making real API calls. They expect ALL calls to fail and raise `AlphaVantageError`, but real API calls intermittently succeed.
+
+**Failing tests**:
+- `test_sector_perf_raises_on_total_failure` (line ~85)
+- `test_industry_perf_raises_on_total_failure` (line ~90)
+
+**Fix**: Mock `_fetch_global_quote` to simulate total failure:
+
+```python
+with patch(
+    "tradingagents.dataflows.alpha_vantage_scanner._fetch_global_quote",
+    side_effect=AlphaVantageError("Rate limit exceeded — mocked for test isolation"),
+):
+    with pytest.raises(AlphaVantageError, match="All .* sector queries failed"):
+        get_sector_performance_alpha_vantage()
+```
+
+- [x] Mock `_fetch_global_quote` in `test_sector_perf_raises_on_total_failure`
+- [x] Mock `_fetch_global_quote` in `test_industry_perf_raises_on_total_failure`
+- [x] No `@pytest.mark.integration` to remove — class had no marker
+
+---
+
+### 5. `tests/test_scanner_graph.py` — 3 tests
+
+**Root cause**: Tests import `MacroScannerGraph` but class was renamed to `ScannerGraph`. Third test uses `ScannerGraphSetup` but with wrong constructor args (no `agents` provided).
+
+**Failing tests**:
+- `test_scanner_graph_import` — `ImportError: cannot import name 'MacroScannerGraph'`
+- `test_scanner_graph_instantiates` — same import error
+- `test_scanner_setup_compiles_graph` — `TypeError: ScannerGraphSetup.__init__() missing 1 required positional argument: 'agents'`
+
+**Fix**: Rename import, mock `_create_llm` for instantiation test, provide `mock_agents` dict:
+
+```python
+from unittest.mock import MagicMock, patch
+from tradingagents.graph.scanner_graph import ScannerGraph
+
+with patch.object(ScannerGraph, "_create_llm", return_value=MagicMock()):
+    scanner = ScannerGraph()  # compiles real graph with mock LLMs
+```
+
+- [x] Fix import: `MacroScannerGraph` → `ScannerGraph`
+- [x] Mock `_create_llm` to avoid real LLM init in instantiation test
+- [x] Provide `mock_agents` dict to `ScannerGraphSetup` — compiles real wiring logic
+
+---
+
+## Verification
+
+- [x] Run `pytest tests/test_config_wiring.py -v` — all 4 previously failing tests pass
+- [x] Run `pytest tests/test_env_override.py -v` — all 2 previously failing tests pass
+- [x] Run `pytest tests/test_scanner_fallback.py -v` — all 2 previously failing tests pass
+- [x] Run `pytest tests/test_scanner_graph.py -v` — all 3 previously failing tests pass
+- [x] Run `python -m pytest tests/test_scanner_comprehensive.py -v` — 1 previously failing test passes (482s, real LLM)
+- [x] Run full offline suite: `python -m pytest tests/ -v -m "not integration"` — 388 passed, 70 deselected, 0 failures (512s)
+- [x] API-dependent tests that fail due to rate limits include clear WHY in mock side_effect message
+
+**Note**: Must use `python -m pytest` (not bare `pytest`) in this worktree. The editable install in `site-packages` maps `tradingagents` to the main repo. `python -m pytest` adds CWD to `sys.path`, making the worktree's `tradingagents` visible first.
+
+## Files Changed
+
+| File | Change |
+|---|---|
+| `tests/test_config_wiring.py` | `callable()` → `hasattr(x, "invoke")` |
+| `tests/test_env_override.py` | Clean env + `patch("dotenv.load_dotenv")` to block .env re-reads |
+| `tests/test_scanner_comprehensive.py` | Fix filenames + path format; remove format-specific content assertions |
+| `tests/test_scanner_fallback.py` | Mock `_fetch_global_quote` instead of making real API calls |
+| `tests/test_scanner_graph.py` | `MacroScannerGraph` → `ScannerGraph`; mock `_create_llm`; provide `mock_agents` |