TradingAgents

Commit Graph

Author	SHA1	Message	Date
ahmet guzererler	728ae69eab	feat: PM brain upgrade — macro/micro agents & memory split (#123) * feat: introduce flow_id with timestamp-based report versioning Replace run_id with flow_id as the primary grouping concept (one flow = one user analysis intent spanning scan + pipeline + portfolio). Reports are now written as {timestamp}_{name}.json so load methods always return the latest version by lexicographic sort, eliminating the latest.json pointer pattern for new flows. Key changes: - report_paths.py: add generate_flow_id(), ts_now() (ms precision), flow_id kwarg on all path helpers; keep run_id / pointer helpers for backward compatibility - ReportStore: dual-mode save/load — flow_id uses timestamped layout, run_id uses legacy runs/{id}/ layout with latest.json - MongoReportStore: add flow_id field and index; run_id stays for compat - DualReportStore: expose flow_id property - store_factory: accept flow_id as primary param, run_id as alias - runs.py / langgraph_engine.py: generate and thread flow_id through all trigger endpoints and run methods - Tests: add flow_id coverage for all layers; 905 tests pass Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: PM brain upgrade — macro/micro summary agents, memory split, forensic dashboard Replaces the PM's raw-JSON context (~6,800 tokens on deep_think) with a MAP-REDUCE compression layer using two parallel mid_think summary agents, achieving ~70% cost reduction at the PM tier. Architecture: - MacroMemory: new regime-level memory class (MongoDB/JSON, separate from per-ticker reflexion memory) with record_macro_state/build_macro_context - ReflexionMemory: extended with collection_name param to isolate micro_reflexion from the pipeline reflexion collection (with distinct local JSON fallback path to prevent file collision) - Macro_Summary_Agent (mid_think): compresses scan_summary into a 1-page regime brief with memory injection; sentinel guard prevents LLM call on empty/error scan data ("NO DATA AVAILABLE - ABORT MACRO") - Micro_Summary_Agent (mid_think): compresses holding_reviews + candidates into a markdown table brief with per-ticker memory injection - Portfolio graph: parallel fan-out (prioritize_candidates → macro_summary ‖ micro_summary → make_pm_decision) using _last_value reducers for safe concurrent state writes (ADR-005 pattern) - PM refactor: Pydantic PMDecisionSchema enforces Forensic Execution Dashboard output (macro_regime, forensic_report, per-trade macro_alignment/memory_note/position_sizing_logic); with_structured_output as primary path, extract_json fallback for non-conforming providers - PM sentinel handling: "NO DATA AVAILABLE" in macro_brief substituted with actionable conservative guidance before LLM sees it 62 new unit tests across 4 test files covering all new components. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address code review — relaxed error guard, ticker_analyses, PM memory wiring 1. macro_summary_agent: relaxed error guard to only abort when scan_summary's sole key is "error" (partial failures with real data are now processed) 2. micro_summary_agent: now reads ticker_analyses from state and enriches the per-ticker table with trading graph analysis data 3. portfolio_graph: wires macro_memory and micro_memory to PM factory call 4. test_empty_state: updated test for new partial-failure behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 12:55:24 +01:00
ahmet guzererler	6b3dd4172a	feat: finalise storage layout, run history loading & phase-level re-run (#121) * feat: introduce flow_id with timestamp-based report versioning Replace run_id with flow_id as the primary grouping concept (one flow = one user analysis intent spanning scan + pipeline + portfolio). Reports are now written as {timestamp}_{name}.json so load methods always return the latest version by lexicographic sort, eliminating the latest.json pointer pattern for new flows. Key changes: - report_paths.py: add generate_flow_id(), ts_now() (ms precision), flow_id kwarg on all path helpers; keep run_id / pointer helpers for backward compatibility - ReportStore: dual-mode save/load — flow_id uses timestamped layout, run_id uses legacy runs/{id}/ layout with latest.json - MongoReportStore: add flow_id field and index; run_id stays for compat - DualReportStore: expose flow_id property - store_factory: accept flow_id as primary param, run_id as alias - runs.py / langgraph_engine.py: generate and thread flow_id through all trigger endpoints and run methods - Tests: add flow_id coverage for all layers; 905 tests pass Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: finalise storage layout, run history loading & phase-level re-run Storage / persistence - flow_id (8-char hex) replaces run_id as the disk storage key; all sub-phases of one auto run share the same flow_id directory - Startup hydration: hydrate_runs_from_disk() rebuilds in-memory run store from run_meta.json on server restart (events lazy-loaded) WebSocket / run history fixes - Lazy-load events from run_events.jsonl on first WS connect; fixes blank terminal when clicking a historical run after restart - Orphaned "running" runs (server restarted mid-run) auto-detected and marked "failed" with partial events replayed correctly Phase re-run fixes - Analysts checkpoint: use any() instead of all() — Social Analyst is optional; all() silently blocked checkpoint saves in typical runs - Checkpoint lookup: pass original flow_id through rerun_params so _date_root() resolves to the correct flow_id subdirectory - Selective event filtering on re-run: preserves scan nodes and other tickers; only removes stale events for the re-run phase+ticker - Frontend graph now shows full auto-flow context during phase re-runs Documentation - ADR 018: canonical reference for storage layout, event schema, WebSocket streaming flows, checkpoint structure, MongoDB vs local - ADR 013 updated: reflects background-task + lazy-loading evolution - ADR 015 marked superseded by ADR 018 - CLAUDE.md: AgentOS storage section + 4 new critical patterns - CURRENT_STATE.md updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 11:12:16 +01:00
ahmet guzererler	0efbbd9400	feat: load flow_id in FE to resume runs and fix max_tickers cap (#113) * feat: introduce flow_id with timestamp-based report versioning Replace run_id with flow_id as the primary grouping concept (one flow = one user analysis intent spanning scan + pipeline + portfolio). Reports are now written as {timestamp}_{name}.json so load methods always return the latest version by lexicographic sort, eliminating the latest.json pointer pattern for new flows. Key changes: - report_paths.py: add generate_flow_id(), ts_now() (ms precision), flow_id kwarg on all path helpers; keep run_id / pointer helpers for backward compatibility - ReportStore: dual-mode save/load — flow_id uses timestamped layout, run_id uses legacy runs/{id}/ layout with latest.json - MongoReportStore: add flow_id field and index; run_id stays for compat - DualReportStore: expose flow_id property - store_factory: accept flow_id as primary param, run_id as alias - runs.py / langgraph_engine.py: generate and thread flow_id through all trigger endpoints and run methods - Tests: add flow_id coverage for all layers; 905 tests pass Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: load flow_id in FE to resume runs and fix max_tickers cap on continuation - Add flow_id to RunParams interface and initial state - loadRun() now restores flow_id + max_auto_tickers from history so the next run continues in the same flow directory (Phase 1 scan skipped, already-done tickers skipped via skip-if-exists logic) - startRun() spreads flow_id into the request body when set, letting the backend reuse the existing flow directory instead of generating a fresh flow_id - After each run, params.flow_id is updated from the response so subsequent runs automatically continue from the same flow - max_auto_tickers restored from run.params.max_tickers ensures the ticker cap matches the original run; scan_tickers[:max_t] on the backend then limits the Phase 2 queue to the user's setting even when the existing scan has more Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(mongo): fast-fail timeout + lazy ensure_indexes to avoid 30s block on fallback MongoClient previously used pymongo's 30-second serverSelectionTimeoutMS default, causing store_factory to hang for 30s before falling back to the filesystem when Atlas is unreachable. Also, ensure_indexes() was called eagerly in __init__, making every store construction attempt block on a live network call. - Set serverSelectionTimeoutMS=5_000 so fallback is triggered in ≤5s - Move ensure_indexes() call out of __init__ — indexes are now created lazily on the first _save() call via a guarded self._indexes_ensured flag - ensure_indexes() is still idempotent and safe to call explicitly in tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(store): wrap all DualReportStore mongo calls in _try_mongo() for graceful degradation Any MongoDB exception (SSL error, ServerSelectionTimeout, auth failure) was propagating uncaught through DualReportStore and crashing the run. Reads would return an error instead of falling back to local, and writes would abort mid-run without saving anything. Introduce a single _try_mongo(fn, default) helper that: - Executes the Mongo callable - Catches any exception, logs it as WARNING with type + message - Returns the default value so the caller continues with local-only data Pattern per method: writes → try mongo (fire-and-forget); always return local result reads → try mongo first; fall back to local on None or exception lists → try mongo; fall back to local on empty/None Runs now complete successfully even when Atlas is unreachable or returns SSL errors. MongoDB sync resumes automatically once connectivity is restored. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(observability): non-blocking MongoDB inserts + 5s timeout in RunLogger Every LLM and tool callback called _append() which synchronously called insert_one() against MongoDB. When Atlas was unreachable this blocked the entire LangGraph run for pymongo's 30-second default timeout per event, effectively serializing all agent work behind MongoDB retries. Two fixes: 1. serverSelectionTimeoutMS=5_000 on the RunLogger's MongoClient — consistent with the same fix applied to MongoReportStore. 2. MongoDB inserts are now fire-and-forget via daemon threads — _append() spawns a Thread(target=_insert, daemon=True) and returns immediately. LLM callbacks and tool events are never delayed by MongoDB connectivity issues. Failures are still reported via WARNING log from the background thread. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * revert(observability): restore synchronous MongoDB inserts in RunLogger Root cause was an IP whitelist issue on Atlas causing SSL failures, not insert volume. The background-thread approach added unnecessary complexity. The 5s serverSelectionTimeoutMS is retained as a defensive safeguard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 07:10:42 +01:00
Ahmet Guzererler	c3762c0499	Reapply "feat: enhance data persistence with DualReportStore for local and MongoDB storage; update report store creation logic" This reverts commit `9358b7edc8`.	2026-03-25 19:54:34 +01:00
Ahmet Guzererler	9358b7edc8	Revert "feat: enhance data persistence with DualReportStore for local and MongoDB storage; update report store creation logic" This reverts commit `5f0a52f8e6`.	2026-03-25 19:44:01 +01:00
Ahmet Guzererler	5f0a52f8e6	feat: enhance data persistence with DualReportStore for local and MongoDB storage; update report store creation logic	2026-03-25 19:04:36 +01:00
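
The graceful-degradation pattern that commit 0efbbd9400 describes for DualReportStore (a single _try_mongo(fn, default) helper, writes that always return the local result, reads that fall back to local on None or exception) can be sketched roughly as follows. This is an illustration only, not the repository's code: try_mongo, MemoryStore, UnreachableStore and DualStoreSketch are hypothetical names, and the stores are in-memory stand-ins so the example runs without MongoDB.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("dual_store_sketch")


def try_mongo(fn: Callable[[], Any], default: Any = None) -> Any:
    """Run a Mongo callable; on any exception log a WARNING and return default."""
    try:
        return fn()
    except Exception as exc:  # deliberately broad: SSL, timeout, auth, ...
        logger.warning("MongoDB call failed (%s): %s", type(exc).__name__, exc)
        return default


class MemoryStore:
    """Hypothetical local store backed by a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def save(self, key: str, report: dict) -> str:
        self._data[key] = report
        return key

    def load(self, key: str) -> Optional[dict]:
        return self._data.get(key)


class UnreachableStore:
    """Hypothetical Mongo stand-in that fails on every call."""

    def save(self, key: str, report: dict) -> str:
        raise ConnectionError("server selection timed out")

    def load(self, key: str) -> Optional[dict]:
        raise ConnectionError("server selection timed out")


class DualStoreSketch:
    """Hypothetical dual store: local is authoritative, Mongo is best effort."""

    def __init__(self, local: Any, mongo: Any) -> None:
        self.local = local
        self.mongo = mongo

    def save(self, key: str, report: dict) -> str:
        # Writes: attempt Mongo, ignore its outcome, always return the local result.
        try_mongo(lambda: self.mongo.save(key, report))
        return self.local.save(key, report)

    def load(self, key: str) -> Optional[dict]:
        # Reads: Mongo first; fall back to local on None or on an exception.
        result = try_mongo(lambda: self.mongo.load(key))
        return result if result is not None else self.local.load(key)
```

With an UnreachableStore in the Mongo slot, save() still returns the local key and load() still returns the locally saved report; the only visible effect of the outage is a WARNING log line per failed call.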

6 Commits
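
The timestamp-based report versioning introduced in these commits (reports written as {timestamp}_{name}.json, latest version chosen by lexicographic sort, 8-char hex flow_id) can be illustrated with a minimal sketch. The names generate_flow_id() and ts_now() come from the commit message, but their bodies here are assumptions, as are the timestamp format, save_report(), load_latest() and demo(). The sketch relies on a fixed-width UTC timestamp, which is what makes lexicographic order match chronological order.

```python
import json
import secrets
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def generate_flow_id() -> str:
    """Return an 8-character hex id (assumed implementation)."""
    return secrets.token_hex(4)


def ts_now() -> str:
    """Fixed-width UTC timestamp with millisecond precision (assumed format)."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y%m%dT%H%M%S") + f"{now.microsecond // 1000:03d}"


def save_report(flow_dir: Path, name: str, payload: dict,
                ts: Optional[str] = None) -> Path:
    """Write {timestamp}_{name}.json; earlier versions are left in place."""
    flow_dir.mkdir(parents=True, exist_ok=True)
    path = flow_dir / f"{ts or ts_now()}_{name}.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


def load_latest(flow_dir: Path, name: str) -> Optional[dict]:
    """Return the newest version of a report, or None if there is none."""
    if not flow_dir.is_dir():
        return None
    # The timestamp contains no underscore, so everything after the first
    # underscore must equal "<name>.json" exactly; this keeps "summary"
    # from matching "scan_summary".
    matches = [
        p for p in flow_dir.glob(f"*_{name}.json")
        if p.name.split("_", 1)[1] == f"{name}.json"
    ]
    if not matches:
        return None
    latest = max(matches, key=lambda p: p.name)  # lexicographic == newest
    return json.loads(latest.read_text(encoding="utf-8"))


def demo() -> Optional[dict]:
    """Save two versions under one flow directory and load the latest."""
    with tempfile.TemporaryDirectory() as tmp:
        flow_dir = Path(tmp) / generate_flow_id()
        save_report(flow_dir, "scan_summary", {"version": 1}, ts="20260326T101500000")
        save_report(flow_dir, "scan_summary", {"version": 2}, ts="20260326T101500250")
        return load_latest(flow_dir, "scan_summary")
```

Because every save creates a new file, no latest.json pointer has to be rewritten, and a reader that sorts filenames always sees a complete earlier version even while a newer one is being written.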