# ADR 013: AgentOS WebSocket Streaming Architecture

## Status

Accepted

## Context

TradingAgents needed a visual observability layer to monitor agent execution in real-time. The CLI (Rich-based) works well for terminal users but doesn't provide graph visualization or persistent portfolio views. Key requirements:

1. Stream LangGraph events to a web UI in real-time
2. Visualize the agent workflow as a live graph
3. Show portfolio holdings, trades, and metrics
4. Support all 4 run types (scan, pipeline, portfolio, auto)

## Decision

### REST + WebSocket Split

REST endpoints (`POST /api/run/{type}`) **only queue** runs to an in-memory store. The WebSocket endpoint (`WS /ws/stream/{run_id}`) is the **sole executor** — it picks up queued runs, calls the appropriate LangGraph engine method, and streams events back to the frontend.

This avoids the complexity of background task coordination. The frontend triggers a REST call, gets a `run_id`, then connects via WebSocket to that `run_id` to receive all events.

### Event Mapping

LangGraph v2's `astream_events()` produces raw events with varying structures per provider. `LangGraphEngine._map_langgraph_event()` normalizes these into 4 event types: `thought`, `tool`, `tool_result`, `result`. Each event includes:

- `node_id`, `parent_node_id` for graph construction
- `metrics` (model, tokens, latency)
- Optional `prompt` and `response` full-text fields

The mapper uses try/except per event type and a `_safe_dict()` helper to prevent crashes from non-dict metadata (e.g., some providers return strings or lists).

### Field Mapping (Backend → Frontend)

Portfolio models use different field names than the frontend expects. The `/latest` endpoint maps: `shares` → `quantity`, `portfolio_id` → `id`, `cash` → `cash_balance`, `trade_date` → `executed_at`. Computed runtime fields (`market_value`, `unrealized_pnl`) are included from enriched Holding properties.

### Pipeline Recursion Limit

`run_pipeline()` passes `config={"recursion_limit": propagator.max_recur_limit}` (default 100) to `astream_events()`. Without this, LangGraph defaults to 25, which is insufficient for the debate + risk cycles (up to ~10 iterations).

## Consequences

- **Pro**: Real-time visibility into agent execution with zero CLI changes
- **Pro**: Crash-proof event mapping — one bad event doesn't kill the stream
- **Pro**: Clean separation — frontend can reconnect to ongoing runs
- **Con**: In-memory run store is not persistent (acceptable for V1)
- **Con**: Single-tenant auth (hardcoded user) — needs JWT for production

## Source Files

- `agent_os/backend/services/langgraph_engine.py`
- `agent_os/backend/routes/websocket.py`
- `agent_os/backend/routes/runs.py`
- `agent_os/backend/routes/portfolios.py`
- `agent_os/frontend/src/hooks/useAgentStream.ts`
- `agent_os/frontend/src/Dashboard.tsx`