TradingAgents/docs/agent/decisions/016-pr106-review-findings.md

6.5 KiB
Raw Blame History

ADR 016 — PR#106 Review Findings: Logging Strategy & MongoDB Models

Status: accepted Date: 2026-03-25 PR: copilot/increase-observability-logging (PR#106) Reviewer: Claude Code (PR#107 review)


Summary

This ADR documents the bugs and architectural gaps found during the review of PR#106 (observability logging + MongoDB report store + reflexion memory), along with the solutions applied.


Logging Strategy

Finding 1 — LangChain callback never wired into graph execution

Problem: RunLogger.callback was created but never passed to astream_events() or the LangGraph graph config. No LLM events would be captured in the JSONL log.

Solution: Wire rl.callback into all three astream_events() calls (run_scan, run_pipeline, run_portfolio) via the config={"callbacks": [rl.callback]} parameter.

Finding 2 — log_tool_call and log_vendor_call never called

Problem: These methods exist on RunLogger but nothing invokes them from run_tool_loop or route_to_vendor.

Solution: Both call-sites are now wired:

  • run_tool_loop (tradingagents/agents/utils/tool_runner.py) calls rl.log_tool_call() on every tool invocation (success, failure, unknown tool).
  • route_to_vendor (tradingagents/dataflows/interface.py) calls rl.log_vendor_call() on every vendor call (success and failure with fallback).

Finding 3 — threading.local incompatible with asyncio

Problem: set_run_logger / get_run_logger used threading.local(). Since asyncio runs all coroutines on one thread, concurrent pipelines (via asyncio.gather in run_auto) share the same thread-local slot.

Solution: Replace with contextvars.ContextVar which is correctly isolated per asyncio task.

Finding 4 — run_auto log lands in flat date directory

Problem: _finish_run_logger(run_id, get_daily_dir(date)) writes to the flat date directory instead of a run-namespaced path.

Status: Acceptable for V1. Each sub-phase already writes its own namespaced log. The top-level auto-run log is a summary.

Finding 5 — JSONL log not emitted in real time

Problem: Events are buffered in memory until _finish_run_logger.

Status: Acceptable for V1. Consider periodic flush for long-running auto runs in a future PR.


MongoDB Models

Finding 6 — list_pm_decisions returns raw ObjectId

Problem: The find() query returned full documents including _id: ObjectId, which is not JSON-serializable.

Solution: Add {"_id": 0} projection to the list_pm_decisions query.

Finding 7 — created_at type inconsistency

Problem: MongoReportStore stores created_at as native BSON datetime; ReflexionMemory stores it as an ISO 8601 string. Within separate collections this is consistent, but creates maintenance confusion.

Solution: ReflexionMemory.record_decision() now stores native datetime when writing to MongoDB, and only converts to ISO string for the local JSON fallback (which has no datetime type).

Finding 8 — No TTL index

Problem: Reports accumulate indefinitely in MongoDB.

Status: Deferred to a follow-up issue. Requires a retention policy decision before implementation.

Finding 9 — Synchronous pymongo in async FastAPI

Problem: All MongoDB calls block the asyncio event loop.

Status: Acceptable for V1. Plan motor migration before production deployment.

Finding 10 — MongoClient created per instance

Problem: Each MongoReportStore instantiation creates a new MongoClient with its own connection pool.

Status: Acceptable for V1. Plan singleton via FastAPI app lifespan.

Finding 11 — ensure_indexes() not called in __init__

Problem: Indexes were only created when going through the factory. Direct instantiation skips them.

Solution: Move ensure_indexes() call into MongoReportStore.__init__ so indexes are always created regardless of construction path.

Finding 12 — write_latest_pointer/read_latest_pointer use global REPORTS_ROOT

Problem: These functions use the global REPORTS_ROOT module constant, ignoring ReportStore._base_dir. When the base dir differs via env vars (PORTFOLIO_DATA_DIR, TRADINGAGENTS_REPORTS_DIR), the pointer file lands in the wrong directory tree.

Solution: Add an optional base_dir parameter to both functions, defaulting to REPORTS_ROOT. ReportStore now passes its _base_dir to both calls.


Holding Reviews Bug

Finding 13 — save_holding_review called with wrong arguments

Problem: In run_portfolio, store.save_holding_review(date, portfolio_id, reviews) passed portfolio_id as the ticker argument and the full reviews dict (keyed by ticker) as data. This created a single file named after the portfolio instead of one file per ticker.

Solution: Iterate over the reviews dict:

if isinstance(reviews, dict):
    for ticker, review_data in reviews.items():
        store.save_holding_review(date, ticker, review_data)

Consequences & Constraints

MUST

  • All astream_events() calls MUST pass rl.callback in the config to capture LLM metrics in the run log.
  • set_run_logger / get_run_logger MUST use contextvars.ContextVar, not threading.local.
  • write_latest_pointer / read_latest_pointer MUST accept a base_dir parameter and callers MUST pass it when their base differs from REPORTS_ROOT.
  • MongoReportStore.__init__ MUST call ensure_indexes().

SHOULD

  • Plan pymongomotor migration before production deployment.
  • Add TTL index strategy after retention policy is decided.

Corrected Status — Post-Verification

Re-verification of the actual copilot/increase-observability-logging branch code shows that recommended actions 13 are already implemented in PR#106:

  1. Fix list_pm_decisions — add {"_id": 0} projection Already done: {"_id": 0} is present with doc comment "Excludes _id (BSON ObjectId) which is not JSON-serializable."
  2. Standardise created_at to native datetime in ReflexionMemory Already done: uses datetime.now(timezone.utc); ISO string conversion only for local JSON fallback (which is correct — JSON has no native datetime type).
  3. Fix write_latest_pointer/read_latest_pointer to respect base_dir Already done: both functions accept base_dir: Path | None = None; _update_latest already passes self._base_dir.

Items 47 remain as future work (no code changes required before merge).