TradingAgents/docs/components/data-platform.md

# Data Platform & Tooling

TradingAgents separates data acquisition from agent logic so that research nodes focus on reasoning while the `Toolkit` and `dataflows` packages handle ingress, formatting, and caching.

## Toolkit Overview
`Toolkit` (`tradingagents/agents/utils/agent_utils.py`) exposes `@tool`-decorated functions consumable by LangGraph. Each tool encapsulates:
- Input annotations used by LangChain to validate structured tool calls.
- A thin adapter into the `dataflows.interface` module or a live API.
- Markdown-friendly string outputs to plug directly into prompts.

The toolkit is instantiated once per `TradingAgentsGraph` and shared across agents. Configuration merges `DEFAULT_CONFIG` with user overrides via `Toolkit.update_config()`.

## Offline vs Online Modes
`DEFAULT_CONFIG['online_tools']` determines which data plane to use.
- **Online**: Functions such as `get_YFin_data_online`, `get_global_news_openai`, or `get_fundamentals_openai` query remote APIs (OpenAI routes the heavier summarization workloads).
- **Offline**: Toolkit redirects to cached datasets stored under `tradingagents/dataflows/data_cache/` (written to on demand) or the curated archive referenced by `DEFAULT_CONFIG['data_dir']`.

```mermaid
flowchart LR
    AgentNode -->|tool call| Toolkit
    Toolkit -->|online| ExternalAPIs[(YFinance, Finnhub, OpenAI Functions, Google News)]
    Toolkit -->|offline| Dataflows
    Dataflows --> Cache[(CSV/JSON Cache)]
    ExternalAPIs --> Formatter[Markdown Formatter]
    Cache --> Formatter
    Formatter --> AgentNode
```

## Dataflows Package
Key modules under `tradingagents/dataflows/` include:
- `interface.py`: Public entry points that orchestrate date math, batching, and formatting for each provider. Functions like `get_finnhub_news` or `get_reddit_company_news` leverage helper utilities and guarantee consistent Markdown formatting.
- `yfin_utils.py`, `stockstats_utils.py`: Fetch and enrich market data (e.g., compute technical indicators before returning a report).
- `finnhub_utils.py`, `reddit_utils.py`, `googlenews_utils.py`: Read from exported datasets and support thread-safe, multi-day aggregation.
- `config.py`: Stores runtime config (`DATA_DIR`) and responds to updates from `TradingAgentsGraph` via `set_config()`.

## Memory Storage
`FinancialSituationMemory` (`tradingagents/agents/utils/memory.py`) functions as a lightweight experience replay buffer:
- Embeddings are computed using OpenAI's `text-embedding-3-small` or Ollama's `nomic-embed-text` depending on `backend_url`.
- Recommendations are stored in an in-memory ChromaDB collection per role.
- `get_memories()` returns the best matches for prompt injection and indicates similarity scores for future weighting.

Persisting memories across runs requires reconfiguring the Chroma client to use a file-backed storage provider.

## Reliability Concerns
- Service limits (API quotas, tool failures) bubble up as exceptions from Toolkit calls. Consider wrapping nodes with retry logic if productionizing the framework.
- Some offline datasets referenced in `DEFAULT_CONFIG['data_dir']` are not bundled; deployments must provision the directory or override the path.
- Tool sanitization (e.g., verifying tickers, clamping lookback windows) is minimal today; additional guards can be added in `dataflows/interface.py`.