Data Platform & Tooling

TradingAgents separates data acquisition from agent logic so that research nodes focus on reasoning while the Toolkit and dataflows packages handle ingress, formatting, and caching.

Toolkit Overview

Toolkit (tradingagents/agents/utils/agent_utils.py) exposes @tool-decorated functions consumable by LangGraph. Each tool encapsulates:

  • Input annotations used by LangChain to validate structured tool calls.
  • A thin adapter into the dataflows.interface module or a live API.
  • Markdown-friendly string outputs to plug directly into prompts.

The toolkit is instantiated once per TradingAgentsGraph and shared across agents. Configuration merges DEFAULT_CONFIG with user overrides via Toolkit.update_config().

Offline vs Online Modes

DEFAULT_CONFIG['online_tools'] selects which data path the Toolkit uses.

  • Online: Functions such as get_YFin_data_online, get_global_news_openai, or get_fundamentals_openai query remote APIs; the heavier summarization workloads are routed through OpenAI.
  • Offline: Toolkit reads from cached datasets under tradingagents/dataflows/data_cache/ (populated on demand) or from the curated archive referenced by DEFAULT_CONFIG['data_dir'].

flowchart LR
    AgentNode -->|tool call| Toolkit
    Toolkit -->|online| ExternalAPIs[(YFinance, Finnhub, OpenAI Functions, Google News)]
    Toolkit -->|offline| Dataflows
    Dataflows --> Cache[(CSV/JSON Cache)]
    ExternalAPIs --> Formatter[Markdown Formatter]
    Cache --> Formatter
    Formatter --> AgentNode
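
The branch in the diagram can be sketched as a single flag check. Everything below is illustrative: `fetch_online`, `fetch_offline`, and `get_price_report` are hypothetical names, and the fetchers return canned strings instead of calling a real API or reading a real cache.

```python
def fetch_online(ticker: str) -> str:
    # Stand-in for a remote API call (assumption: the real tool hits a provider).
    return f"## {ticker} prices (online)\n"


def fetch_offline(ticker: str) -> str:
    # Stand-in for a read from the local CSV/JSON cache.
    return f"## {ticker} prices (offline cache)\n"


def get_price_report(ticker: str, config: dict) -> str:
    """Pick the data path from the config flag and return Markdown text."""
    fetch = fetch_online if config.get("online_tools", False) else fetch_offline
    return fetch(ticker)


report = get_price_report("AAPL", {"online_tools": False})
```

Both paths return the same kind of Markdown string, so the calling agent does not need to know which one ran.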

Dataflows Package

Key modules under tradingagents/dataflows/ include:

  • interface.py: Public entry points that orchestrate date math, batching, and formatting for each provider. Functions like get_finnhub_news or get_reddit_company_news leverage helper utilities and guarantee consistent Markdown formatting.
  • yfin_utils.py, stockstats_utils.py: Fetch and enrich market data (e.g., compute technical indicators before returning a report).
  • finnhub_utils.py, reddit_utils.py, googlenews_utils.py: Read from exported datasets and support thread-safe, multi-day aggregation.
  • config.py: Stores runtime config (DATA_DIR) and responds to updates from TradingAgentsGraph via set_config().
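
As a rough illustration of the date math and Markdown formatting that interface.py is described as handling, the sketch below filters records to a look-back window and renders them as headed sections. The function names, the input shape, and the heading layout are assumptions made for this example, not the module's actual signatures or output format.

```python
from datetime import date, timedelta


def lookback_window(curr_date: str, look_back_days: int) -> tuple[date, date]:
    """Return (start, end) for a window ending on curr_date (ISO format)."""
    end = date.fromisoformat(curr_date)
    return end - timedelta(days=look_back_days), end


def format_news(ticker: str, items_by_day: dict, curr_date: str, look_back_days: int) -> str:
    """Render news inside the window as Markdown.

    items_by_day maps an ISO date string to a list of (headline, summary)
    tuples; this shape is hypothetical.
    """
    start, end = lookback_window(curr_date, look_back_days)
    sections = []
    for day in sorted(items_by_day):
        if not (start <= date.fromisoformat(day) <= end):
            continue  # outside the requested window
        for headline, summary in items_by_day[day]:
            sections.append(f"### {headline} ({day})\n{summary}")
    header = f"## {ticker} News, from {start} to {end}:"
    return header + "\n\n" + "\n\n".join(sections)
```

Keeping the window calculation separate from rendering makes it easy to reuse the same date logic across providers.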

Memory Storage

FinancialSituationMemory (tradingagents/agents/utils/memory.py) functions as a lightweight experience replay buffer:

  • Embeddings are computed using OpenAI's text-embedding-3-small or Ollama's nomic-embed-text depending on backend_url.
  • Recommendations are stored in an in-memory ChromaDB collection per role.
  • get_memories() returns the closest matches for prompt injection, along with their similarity scores, which could support weighting in the future.

Persisting memories across runs requires reconfiguring the Chroma client to use a file-backed storage provider.
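
The retrieval idea can be shown without any external service. The sketch below swaps in a toy bag-of-words embedding and a plain Python list where the real class uses an embedding model and a ChromaDB collection, so it only illustrates the "store, embed, rank by similarity" flow. `MemorySketch`, its method signatures, and the result keys are assumptions for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemorySketch:
    """Illustrative in-memory store; not the project's FinancialSituationMemory."""

    def __init__(self):
        self._records = []  # (embedding, situation, recommendation)

    def add_situations(self, pairs):
        for situation, recommendation in pairs:
            self._records.append((embed(situation), situation, recommendation))

    def get_memories(self, current_situation: str, n_matches: int = 1):
        query = embed(current_situation)
        scored = [
            {
                "matched_situation": situation,
                "recommendation": recommendation,
                "similarity_score": cosine(query, emb),
            }
            for emb, situation, recommendation in self._records
        ]
        # Highest similarity first, then keep the top n.
        scored.sort(key=lambda m: m["similarity_score"], reverse=True)
        return scored[:n_matches]
```

Because the records live in a Python list, they disappear when the process exits, which mirrors the persistence limitation noted above.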

Reliability Concerns

  • Service limits (API quotas, tool failures) surface as exceptions from Toolkit calls. Consider wrapping nodes with retry logic before running the framework in production.
  • Some offline datasets referenced in DEFAULT_CONFIG['data_dir'] are not bundled; deployments must provision the directory or override the path.
  • Tool sanitization (e.g., verifying tickers, clamping lookback windows) is minimal today; additional guards can be added in dataflows/interface.py.
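
The first and third points could be addressed with small wrappers like the ones below. This is a sketch under stated assumptions: `with_retry`, `clamp_lookback`, and `normalize_ticker` are hypothetical helpers, the 365-day cap and the ticker pattern are arbitrary choices, and none of this exists in the framework today.

```python
import re
import time
from functools import wraps


def with_retry(max_attempts: int = 3, base_delay: float = 0.5, exceptions=(Exception,)):
    """Retry a callable with exponential backoff; re-raise after the last attempt."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise
                    # Delay doubles each time: base, 2*base, 4*base, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


def clamp_lookback(days, lo: int = 1, hi: int = 365) -> int:
    """Force a look-back window into [lo, hi] days (bounds are assumptions)."""
    return max(lo, min(hi, int(days)))


def normalize_ticker(raw: str) -> str:
    """Uppercase and validate a ticker against a simple, assumed pattern."""
    ticker = raw.strip().upper()
    if not re.fullmatch(r"[A-Z][A-Z0-9.\-]{0,9}", ticker):
        raise ValueError(f"invalid ticker: {raw!r}")
    return ticker
```

In practice the `exceptions` tuple would be narrowed to transient errors such as timeouts or rate-limit responses, so that genuine bugs still fail fast.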