Commit Graph

8 Commits

Author SHA1 Message Date
ahmet guzererler 8efcf2a58e
Remove unused import `field` from `observability.py` (#120)
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
2026-03-26 11:17:58 +01:00
ahmet guzererler 0efbbd9400
feat: load flow_id in FE to resume runs and fix max_tickers cap (#113)
* feat: introduce flow_id with timestamp-based report versioning

Replace run_id with flow_id as the primary grouping concept (one flow =
one user analysis intent spanning scan + pipeline + portfolio). Reports
are now written as {timestamp}_{name}.json so load methods always return
the latest version by lexicographic sort, eliminating the latest.json
pointer pattern for new flows.

Key changes:
- report_paths.py: add generate_flow_id(), ts_now() (ms precision),
  flow_id kwarg on all path helpers; keep run_id / pointer helpers for
  backward compatibility
- ReportStore: dual-mode save/load — flow_id uses timestamped layout,
  run_id uses legacy runs/{id}/ layout with latest.json
- MongoReportStore: add flow_id field and index; run_id stays for compat
- DualReportStore: expose flow_id property
- store_factory: accept flow_id as primary param, run_id as alias
- runs.py / langgraph_engine.py: generate and thread flow_id through all
  trigger endpoints and run methods
- Tests: add flow_id coverage for all layers; 905 tests pass
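
The timestamped layout described above can be sketched roughly as follows. This is only an illustration, not the code in report_paths.py or ReportStore: the timestamp format, the `save_report` and `load_latest` helpers, and their signatures are assumptions. Only the `{timestamp}_{name}.json` naming and the "latest by lexicographic sort" rule come from the message.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def ts_now() -> str:
    """UTC timestamp with millisecond precision (format is an assumption).

    Fixed width and no underscores, so string order matches time order.
    """
    now = datetime.now(timezone.utc)
    return f"{now:%Y%m%dT%H%M%S}{now.microsecond // 1000:03d}Z"


def save_report(flow_dir: Path, name: str, data: dict, ts: Optional[str] = None) -> Path:
    """Hypothetical helper: write {timestamp}_{name}.json, never overwriting older versions."""
    flow_dir.mkdir(parents=True, exist_ok=True)
    path = flow_dir / f"{ts or ts_now()}_{name}.json"
    path.write_text(json.dumps(data), encoding="utf-8")
    return path


def load_latest(flow_dir: Path, name: str) -> Optional[dict]:
    """Hypothetical helper: newest version wins by lexicographic sort, no latest.json pointer."""
    matches = sorted(
        (p for p in flow_dir.glob("*.json") if p.name.split("_", 1)[-1] == f"{name}.json"),
        key=lambda p: p.name,
    )
    if not matches:
        return None
    return json.loads(matches[-1].read_text(encoding="utf-8"))
```

Because every write creates a new file, a reader never sees a half-updated pointer; the cost is that old versions accumulate until something prunes them.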

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: load flow_id in FE to resume runs and fix max_tickers cap on continuation

- Add flow_id to RunParams interface and initial state
- loadRun() now restores flow_id + max_auto_tickers from history so the next
  run continues in the same flow directory (Phase 1 scan skipped, already-done
  tickers skipped via skip-if-exists logic)
- startRun() spreads flow_id into the request body when set, letting the backend
  reuse the existing flow directory instead of generating a fresh flow_id
- After each run, params.flow_id is updated from the response so subsequent
  runs automatically continue from the same flow
- max_auto_tickers restored from run.params.max_tickers ensures the ticker cap
  matches the original run; scan_tickers[:max_t] on the backend then limits
  the Phase 2 queue to the user's setting even when the existing scan has more

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mongo): fast-fail timeout + lazy ensure_indexes to avoid 30s block on fallback

MongoClient previously used pymongo's 30-second serverSelectionTimeoutMS default,
causing store_factory to hang for 30s before falling back to the filesystem when
Atlas is unreachable.  Also, ensure_indexes() was called eagerly in __init__,
making every store construction attempt block on a live network call.

- Set serverSelectionTimeoutMS=5_000 so fallback is triggered in ≤5s
- Move ensure_indexes() call out of __init__ — indexes are now created lazily
  on the first _save() call via a guarded self._indexes_ensured flag
- ensure_indexes() is still idempotent and safe to call explicitly in tests
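
A minimal sketch of the lazy-index pattern, assuming a store shaped like the one described. `FakeCollection` and `LazyIndexStore` are hypothetical stand-ins so the example runs without a MongoDB server; with real pymongo the client would be built as `MongoClient(uri, serverSelectionTimeoutMS=5_000)`.

```python
class FakeCollection:
    """Hypothetical stand-in for a pymongo collection; counts index calls."""

    def __init__(self):
        self.index_calls = 0
        self.docs = []

    def create_index(self, keys):
        self.index_calls += 1

    def insert_one(self, doc):
        self.docs.append(doc)


class LazyIndexStore:
    """Illustrative store: no network work in __init__, indexes created on first save."""

    def __init__(self, collection):
        # Construction stays cheap even when the server is unreachable.
        self._col = collection
        self._indexes_ensured = False

    def ensure_indexes(self):
        # Safe to call repeatedly; creating an existing index is a no-op in MongoDB.
        self._col.create_index("flow_id")
        self._indexes_ensured = True

    def _save(self, doc):
        # Guarded flag: only the first save pays for index creation.
        if not self._indexes_ensured:
            self.ensure_indexes()
        self._col.insert_one(doc)
```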

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(store): wrap all DualReportStore mongo calls in _try_mongo() for graceful degradation

Any MongoDB exception (SSL error, ServerSelectionTimeout, auth failure) was
propagating uncaught through DualReportStore and crashing the run.  Reads
would return an error instead of falling back to local, and writes would
abort mid-run without saving anything.

Introduce a single _try_mongo(fn, default) helper that:
- Executes the Mongo callable
- Catches *any* exception, logs it as WARNING with type + message
- Returns the default value so the caller continues with local-only data

Pattern per method:
  writes  → try mongo (fire-and-forget); always return local result
  reads   → try mongo first; fall back to local on None or exception
  lists   → try mongo; fall back to local on empty/None

Runs now complete successfully even when Atlas is unreachable or returns SSL
errors.  MongoDB sync resumes automatically once connectivity is restored.
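
The helper and the per-method pattern above might look like the sketch below. The message only names `_try_mongo(fn, default)`; the `DualStoreSketch` class, its constructor arguments, and the dict-backed local store are assumptions made to keep the example self-contained.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("dual_store_sketch")


def try_mongo(fn: Callable[[], T], default: Optional[T] = None) -> Optional[T]:
    """Run a Mongo callable; on any exception log a WARNING and return the default."""
    try:
        return fn()
    except Exception as exc:  # deliberately broad: SSL, timeout, auth, ...
        logger.warning("MongoDB call failed (%s): %s", type(exc).__name__, exc)
        return default


class DualStoreSketch:
    """Hypothetical dual store: the local dict is authoritative, Mongo is best-effort."""

    def __init__(self, mongo_save, mongo_load):
        self._local = {}
        self._mongo_save = mongo_save
        self._mongo_load = mongo_load

    def save(self, key, value):
        # Write: local first, then fire-and-forget Mongo; always return the local result.
        self._local[key] = value
        try_mongo(lambda: self._mongo_save(key, value))
        return value

    def load(self, key):
        # Read: try Mongo first; fall back to local on None or exception.
        remote = try_mongo(lambda: self._mongo_load(key))
        return remote if remote is not None else self._local.get(key)
```

Catching every exception is normally a smell; here it is the point, since the Mongo copy is treated as optional and the failure is still logged.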

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(observability): non-blocking MongoDB inserts + 5s timeout in RunLogger

Every LLM and tool callback called _append() which synchronously called
insert_one() against MongoDB.  When Atlas was unreachable this blocked the
entire LangGraph run for pymongo's 30-second default timeout per event,
effectively serializing all agent work behind MongoDB retries.

Two fixes:
1. serverSelectionTimeoutMS=5_000 on the RunLogger's MongoClient — consistent
   with the same fix applied to MongoReportStore.
2. MongoDB inserts are now fire-and-forget via daemon threads — _append() spawns
   a Thread(target=_insert, daemon=True) and returns immediately.  LLM callbacks
   and tool events are never delayed by MongoDB connectivity issues.
   Failures are still reported via WARNING log from the background thread.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* revert(observability): restore synchronous MongoDB inserts in RunLogger

Root cause was an IP whitelist issue on Atlas causing SSL failures, not
insert volume.  The background-thread approach added unnecessary complexity.
The 5s serverSelectionTimeoutMS is retained as a defensive safeguard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 07:10:42 +01:00
Ahmet Guzererler c3762c0499 Reapply "feat: enhance data persistence with DualReportStore for local and MongoDB storage; update report store creation logic"
This reverts commit 9358b7edc8.
2026-03-25 19:54:34 +01:00
Ahmet Guzererler 9358b7edc8 Revert "feat: enhance data persistence with DualReportStore for local and MongoDB storage; update report store creation logic"
This reverts commit 5f0a52f8e6.
2026-03-25 19:44:01 +01:00
Ahmet Guzererler 5f0a52f8e6 feat: enhance data persistence with DualReportStore for local and MongoDB storage; update report store creation logic 2026-03-25 19:04:36 +01:00
Copilot 9c9cc8c0b6
fix: address all PR#106 review findings (ADR 016) (#106)
* Initial plan

* feat: add observability logging - run event persistence and enriched tool events

- Integrate RunLogger into LangGraphEngine for JSONL event persistence
- Add _start_run_logger/_finish_run_logger lifecycle in all run methods
- Enrich tool events with service, status, and error fields
- Add _TOOL_SERVICE_MAP for tool-to-service name resolution
- Frontend: color error events in red, show service badges
- Frontend: display graceful_skip status with orange indicators
- Frontend: add error tab and service info to EventDetail/EventDetailModal
- Add 11 unit tests for new observability features

Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
Agent-Logs-Url: https://github.com/aguzererler/TradingAgents/sessions/477a0676-af7b-48ff-8a3d-567e943323cf

* refactor: address code review - extract graceful keywords constant, fix imports

- Move get_daily_dir import to top-level (remove inline aliases)
- Extract _GRACEFUL_SKIP_KEYWORDS as module-level constant
- Update test patches to match top-level import location

Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
Agent-Logs-Url: https://github.com/aguzererler/TradingAgents/sessions/477a0676-af7b-48ff-8a3d-567e943323cf

* feat: add run_id to report paths, MongoDB report store, store factory, and reflexion memory

- report_paths.py: All path helpers accept optional run_id for run-scoped dirs
- report_store.py: ReportStore supports run_id + latest.json pointer mechanism
- mongo_report_store.py: MongoDB-backed store with same interface (no overwrites)
- store_factory.py: Factory returns MongoDB or filesystem store based on config
- memory/reflexion.py: Reflexion memory for learning from past decisions
- langgraph_engine.py: Uses store factory + run_id for all run methods
- Fix method-name bug: call save_holding_review (was save_holding_reviews)
- default_config.py: Add mongo_uri and mongo_db config keys
- pyproject.toml: Add pymongo>=4.12.1 dependency
- .env.example: Document TRADINGAGENTS_MONGO_URI and TRADINGAGENTS_MONGO_DB

Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
Agent-Logs-Url: https://github.com/aguzererler/TradingAgents/sessions/16e673ea-40a1-40a0-8e77-f8cd08c1a716

* fix: clean up reflexion record_outcome (remove broken update_one with sort)

Also update runs.py reset endpoint to use store factory, fix tests,
add ADR 015, update CURRENT_STATE.md

Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
Agent-Logs-Url: https://github.com/aguzererler/TradingAgents/sessions/16e673ea-40a1-40a0-8e77-f8cd08c1a716

* fix: address all PR#106 review findings (ADR 016)

- Fix save_holding_review: iterate per-ticker instead of passing
  portfolio_id as ticker (Finding 13)
- Fix RunLogger context: replace threading.local with contextvars
  for asyncio task isolation (Finding 3)
- Fix list_pm_decisions: add _id:0 projection to exclude ObjectId (Finding 6)
- Fix ReflexionMemory: native datetime for MongoDB, ISO string for
  local JSON fallback (Finding 7)
- Fix latest pointer: write/read_latest_pointer accept base_dir
  parameter, ReportStore passes _base_dir (Finding 12)
- Wire RunLogger callback into all astream_events calls (Finding 1)
- Call ensure_indexes in MongoReportStore.__init__ (Finding 11)
- Create ADR 016 documenting all 13 findings and resolutions
- Add 14 targeted tests covering all 7 fixes
- All 886 tests pass (872 existing + 14 new)
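
Finding 3 is easier to see with a small example. The sketch below is not the RunLogger code; `set_run_logger`, `get_run_logger`, and the string "loggers" are hypothetical. It shows why contextvars isolate concurrent asyncio tasks that share one thread, where a threading.local value would be overwritten by whichever task ran last.

```python
import asyncio
import contextvars

# Hypothetical module-level slot for the active run logger.
_current_logger = contextvars.ContextVar("current_run_logger", default=None)


def set_run_logger(logger):
    _current_logger.set(logger)


def get_run_logger():
    return _current_logger.get()


async def run(name):
    set_run_logger(name)
    # Yield so the other task runs and sets its own value in between.
    await asyncio.sleep(0)
    return get_run_logger()


async def main():
    # gather() wraps each coroutine in a Task, and each Task gets its own
    # copy of the context, so the two set() calls do not interfere.
    return await asyncio.gather(run("run-a"), run("run-b"))
```

Calling `asyncio.run(main())` returns each task's own value, and the value set inside the tasks does not leak back to the caller's context.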

Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
Agent-Logs-Url: https://github.com/aguzererler/TradingAgents/sessions/e52cdd2f-efae-4d2a-a56f-903d909b3342

* chore: remove unused imports in tests, remove redundant ensure_indexes call in factory

Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
Agent-Logs-Url: https://github.com/aguzererler/TradingAgents/sessions/e52cdd2f-efae-4d2a-a56f-903d909b3342

* docs: update ADR 016 — mark Finding 2 resolved, update context docs for contextvars

Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
Agent-Logs-Url: https://github.com/aguzererler/TradingAgents/sessions/ce9e2400-a60d-4a6b-896b-1b34ec786bed

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
2026-03-25 11:14:23 +01:00
copilot-swe-agent[bot] 92ebc13ce4 Add API consumption estimation module and CLI command
- New tradingagents/api_usage.py: Pre-run estimation of API calls per vendor
  for analyze, scan, and pipeline commands. Includes Alpha Vantage tier
  assessment (free: 25/day vs premium: 75/min).
- New CLI command: `estimate-api [analyze|scan|pipeline|all]`
- Enhanced observability: RunLogger.summary() now includes vendor_methods
  breakdown (vendor → method → call count)
- Enhanced CLI output: All 3 command summaries (analyze, scan, pipeline)
  now show per-vendor breakdown and Alpha Vantage assessment after runs
- 32 new tests in tests/unit/test_api_usage.py
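
A rough sketch of the vendor → method → call count breakdown, assuming events are plain dicts with `type`, `vendor`, and `method` keys (that event shape and both function names are assumptions, not the actual api_usage.py or RunLogger.summary() code). The 25 calls/day free-tier figure is the one quoted above.

```python
from collections import defaultdict

ALPHA_VANTAGE_FREE_DAILY_LIMIT = 25  # free tier figure from the commit message


def vendor_methods(events):
    """Count vendor calls as {vendor: {method: count}}."""
    breakdown = defaultdict(lambda: defaultdict(int))
    for event in events:
        if event.get("type") != "vendor":
            continue  # ignore LLM and tool events
        breakdown[event["vendor"]][event["method"]] += 1
    return {vendor: dict(methods) for vendor, methods in breakdown.items()}


def fits_free_tier(breakdown, vendor="alpha_vantage"):
    """Hypothetical check: does the vendor's total stay within the free daily limit?"""
    total = sum(breakdown.get(vendor, {}).values())
    return total <= ALPHA_VANTAGE_FREE_DAILY_LIMIT
```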

Co-authored-by: aguzererler <6199053+aguzererler@users.noreply.github.com>
Agent-Logs-Url: https://github.com/aguzererler/TradingAgents/sessions/bb80e772-3e03-420e-bb0e-76cfdde14a04
2026-03-21 17:25:26 +00:00
ahmet guzererler a90f14c086
feat: unified report paths, structured observability logging, and memory system update (#22)
* gitignore

* feat: unify report paths under reports/daily/{date}/ hierarchy

All generated artifacts now land under a single reports/ tree:
- reports/daily/{date}/market/ for scan results (was results/macro_scan/)
- reports/daily/{date}/{TICKER}/ for per-ticker analysis (was reports/{TICKER}_{timestamp}/)
- reports/daily/{date}/{TICKER}/eval/ for eval logs (was eval_results/{TICKER}/...)

Adds tradingagents/report_paths.py with centralized path helpers used by
CLI commands, trading graph, and pipeline.
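
The hierarchy above could be expressed with helpers along these lines. `get_daily_dir` is a name that appears elsewhere in this log; the other function names, the `root` parameter, and the upper-casing of tickers are assumptions for illustration.

```python
from pathlib import Path

REPORTS_ROOT = Path("reports")  # assumed root; the real module may make this configurable


def get_daily_dir(date: str, root: Path = REPORTS_ROOT) -> Path:
    """reports/daily/{date}/"""
    return root / "daily" / date


def get_market_dir(date: str, root: Path = REPORTS_ROOT) -> Path:
    """reports/daily/{date}/market/ for scan results (hypothetical name)."""
    return get_daily_dir(date, root) / "market"


def get_ticker_dir(date: str, ticker: str, root: Path = REPORTS_ROOT) -> Path:
    """reports/daily/{date}/{TICKER}/ for per-ticker analysis (hypothetical name)."""
    return get_daily_dir(date, root) / ticker.upper()


def get_eval_dir(date: str, ticker: str, root: Path = REPORTS_ROOT) -> Path:
    """reports/daily/{date}/{TICKER}/eval/ for eval logs (hypothetical name)."""
    return get_ticker_dir(date, ticker, root) / "eval"
```

Building every path from `get_daily_dir` means a later layout change only has to touch one function.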

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: structured observability logging for LLM, tool, and vendor calls

Add RunLogger (tradingagents/observability.py) that emits JSON-lines events
for every LLM call (model, agent, tokens in/out, latency), tool invocation
(tool name, args, success, latency), data vendor call (method, vendor,
success/failure, latency), and report save.

Integration points:
- route_to_vendor: log_vendor_call() on every try/except
- run_tool_loop: log_tool_call() on every tool invoke
- ScannerGraph: new callbacks param, passes RunLogger.callback to all LLM tiers
- pipeline/macro_bridge: picks up RunLogger from thread-local, passes to TradingAgentsGraph
- cli/main.py: one RunLogger per command (analyze/scan/pipeline), write_log()
  at end, summary line printed to console

Log files co-located with reports:
  reports/daily/{date}/{TICKER}/run_log.jsonl   (analyze)
  reports/daily/{date}/market/run_log.jsonl     (scan)
  reports/daily/{date}/run_log.jsonl            (pipeline)
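
A minimal sketch of a JSON-lines run logger, assuming one JSON object per event and one event per line. The method names `log_vendor_call`, `log_tool_call`, and `write_log` come from the message; their signatures, the event fields, and the `RunLoggerSketch` class are assumptions rather than the code in tradingagents/observability.py.

```python
import json
import time
from pathlib import Path


class RunLoggerSketch:
    """Illustrative logger: buffers events in memory, writes them as JSON lines."""

    def __init__(self):
        self.events = []

    def _append(self, kind, **fields):
        self.events.append({"ts": time.time(), "type": kind, **fields})

    def log_vendor_call(self, method, vendor, success, latency_ms):
        self._append("vendor", method=method, vendor=vendor,
                     success=success, latency_ms=latency_ms)

    def log_tool_call(self, tool, args, success, latency_ms):
        self._append("tool", tool=tool, args=args,
                     success=success, latency_ms=latency_ms)

    def write_log(self, path: Path) -> Path:
        # One JSON object per line, so the file can be streamed or grepped.
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", encoding="utf-8") as fh:
            for event in self.events:
                fh.write(json.dumps(event) + "\n")
        return path
```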

Also fix test_long_response_no_nudge: update "A"*600 → "A"*2100 to match
MIN_REPORT_LENGTH=2000 threshold set in an earlier commit.

Update memory system context files (ARCHITECTURE, COMPONENTS, CONVENTIONS,
GLOSSARY, CURRENT_STATE) to document observability and report path systems.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 09:06:40 +01:00