PR #281 Critical Security Fixes

Priority: CRITICAL
Impact: Prevents path traversal attacks, data loss, and unauthorized file access
Estimated Total Time: 15-20 minutes


Fix 1: ChromaDB Reset Flag - Production Hardening

File: /tradingagents/agents/utils/memory.py
Line: 13
Severity: HIGH - Allows complete database deletion
Time to Apply: 2 minutes

Why This Matters

Setting allow_reset=True enables the client's reset() method, so any code path that holds the ChromaDB client can wipe every collection in the database. In production this is a data loss risk; the flag should only be enabled in development and testing environments.

BEFORE

def __init__(self, name, config):
    if config["backend_url"] == "http://localhost:11434/v1":
        self.embedding = "nomic-embed-text"
    else:
        self.embedding = "text-embedding-3-small"
    self.client = OpenAI(base_url=config["backend_url"])
    self.chroma_client = chromadb.Client(Settings(allow_reset=True))  # ⚠️ DANGEROUS
    self.situation_collection = self.chroma_client.create_collection(name=name)

AFTER

def __init__(self, name, config):
    if config["backend_url"] == "http://localhost:11434/v1":
        self.embedding = "nomic-embed-text"
    else:
        self.embedding = "text-embedding-3-small"
    self.client = OpenAI(base_url=config["backend_url"])
    self.chroma_client = chromadb.Client(Settings(allow_reset=False))  # ✓ SECURE
    self.situation_collection = self.chroma_client.create_collection(name=name)
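If development and test runs still need reset(), one option is to derive the flag from the environment instead of hard-coding it. The following is a minimal sketch, not part of the PR: the variable name TRADINGAGENTS_ENV and the helper reset_allowed() are hypothetical, and the default is deliberately the safe one (reset disabled when the variable is unset).

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; nothing in the project defines it.
ENV_VAR = "TRADINGAGENTS_ENV"
# Environments in which wiping the database is considered acceptable.
RESET_ENVIRONMENTS = {"development", "test"}


def reset_allowed(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the environment is explicitly non-production."""
    environ = os.environ if environ is None else environ
    # An unset variable is treated as production, so reset stays disabled.
    value = environ.get(ENV_VAR, "production").strip().lower()
    return value in RESET_ENVIRONMENTS


# Intended use inside __init__ (needs chromadb, so shown as a comment only):
#   self.chroma_client = chromadb.Client(Settings(allow_reset=reset_allowed()))
```

Failing closed matters here: a misconfigured or missing variable leaves the database protected rather than exposed.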

Fix 2: Input Validation - Prevent Path Traversal

File: /tradingagents/dataflows/local.py
Lines: 11-50, 51-84, and similar patterns throughout
Severity: CRITICAL - Allows arbitrary file access
Time to Apply: 8-10 minutes

Why This Matters

Ticker symbols are directly interpolated into file paths without validation. An attacker could provide input like ../../etc/passwd or ../../../sensitive_data to access files outside the intended directory.
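To see how the interpolation escapes the price-data directory, the path construction can be reproduced in isolation. This is an illustrative sketch: the DATA_DIR value below is a made-up placeholder, and posixpath is used so the result does not depend on the host OS. Note that the fixed filename suffix is still appended, so the attacker controls the directory and the start of the filename rather than the full path.

```python
import posixpath

# Placeholder base directory, chosen only for this illustration.
DATA_DIR = "/srv/data"
PRICE_DIR = posixpath.join(DATA_DIR, "market_data/price_data")


def unvalidated_path(symbol: str) -> str:
    """Build the CSV path the same way the vulnerable code does, then normalize it."""
    joined = posixpath.join(
        DATA_DIR,
        f"market_data/price_data/{symbol}-YFin-data-2015-01-01-2025-03-25.csv",
    )
    return posixpath.normpath(joined)


def escapes_price_dir(symbol: str) -> bool:
    """True when the normalized path no longer sits under the price-data directory."""
    return not unvalidated_path(symbol).startswith(PRICE_DIR + "/")


print(unvalidated_path("AAPL"))
print(unvalidated_path("../../../sensitive_data"))
print(escapes_price_dir("../../../sensitive_data"))
```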

BEFORE

def get_YFin_data_window(
    symbol: Annotated[str, "ticker symbol of the company"],
    curr_date: Annotated[str, "Start date in yyyy-mm-dd format"],
    look_back_days: Annotated[int, "how many days to look back"],
) -> str:
    # calculate past days
    date_obj = datetime.strptime(curr_date, "%Y-%m-%d")
    before = date_obj - relativedelta(days=look_back_days)
    start_date = before.strftime("%Y-%m-%d")

    # read in data
    data = pd.read_csv(
        os.path.join(
            DATA_DIR,
            f"market_data/price_data/{symbol}-YFin-data-2015-01-01-2025-03-25.csv",  # ⚠️ VULNERABLE
        )
    )

AFTER

import re

def validate_ticker_symbol(symbol: str) -> str:
    """
    Validate and sanitize ticker symbol to prevent path traversal.

    Args:
        symbol: Ticker symbol to validate

    Returns:
        Sanitized ticker symbol

    Raises:
        ValueError: If ticker contains invalid characters
    """
    # Ticker symbols should only contain ASCII letters, digits, dots, and hyphens.
    # fullmatch is used because '$' with re.match also accepts a trailing newline.
    if not re.fullmatch(r'[A-Za-z0-9.\-]+', symbol):
        raise ValueError(f"Invalid ticker symbol: {symbol}")

    # Prevent path traversal patterns
    if '..' in symbol or '/' in symbol or '\\' in symbol:
        raise ValueError(f"Invalid ticker symbol: {symbol}")

    # Limit length (typical tickers are 1-5 characters, extended can be longer)
    if len(symbol) > 10:
        raise ValueError(f"Ticker symbol too long: {symbol}")

    return symbol.upper()  # Normalize to uppercase


def get_YFin_data_window(
    symbol: Annotated[str, "ticker symbol of the company"],
    curr_date: Annotated[str, "Start date in yyyy-mm-dd format"],
    look_back_days: Annotated[int, "how many days to look back"],
) -> str:
    # Validate ticker symbol
    symbol = validate_ticker_symbol(symbol)  # ✓ SECURE

    # calculate past days
    date_obj = datetime.strptime(curr_date, "%Y-%m-%d")
    before = date_obj - relativedelta(days=look_back_days)
    start_date = before.strftime("%Y-%m-%d")

    # read in data
    data = pd.read_csv(
        os.path.join(
            DATA_DIR,
            f"market_data/price_data/{symbol}-YFin-data-2015-01-01-2025-03-25.csv",  # ✓ SAFE NOW
        )
    )
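As defense in depth, the resolved path can also be checked against the base directory before it is opened, so a gap in the character validation does not immediately become a file read. The sketch below is an optional addition rather than part of the fix; resolve_inside() is a hypothetical helper name.

```python
import os


def resolve_inside(base_dir: str, relative_path: str) -> str:
    """Resolve relative_path under base_dir and refuse results outside it.

    Raises:
        ValueError: If the resolved path is not located inside base_dir.
    """
    base = os.path.realpath(base_dir)
    # realpath collapses '..' segments and follows symlinks.
    candidate = os.path.realpath(os.path.join(base, relative_path))
    # commonpath equals base only when candidate is base itself or below it.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"Path escapes data directory: {relative_path!r}")
    return candidate
```

A caller would pass DATA_DIR and the relative CSV path, then hand the returned string to pd.read_csv.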

Additional Changes Required

Apply the validate_ticker_symbol() call to ALL functions in local.py that accept a ticker parameter:

  • get_YFin_data() - line 51
  • get_finnhub_news() - line 85
  • get_finnhub_company_insider_sentiment() - line 120
  • get_finnhub_company_insider_transactions() - line 157
  • get_data_in_range() - line 194
  • get_simfin_balance_sheet() - line 227
  • get_simfin_cashflow() - line 274
  • get_simfin_income_statements() - line 321

Pattern to apply:

def function_name(ticker: str, ...):
    ticker = validate_ticker_symbol(ticker)  # Add this as first line
    # ... rest of function
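Because the same first line has to be added to many functions, a decorator is one way to avoid missing a call site. This is a sketch of an alternative, not what the PR prescribes: validated_ticker() and get_example_data() are hypothetical names, and the validator is repeated in compact form only so the block runs on its own.

```python
import functools
import inspect
import re


def validate_ticker_symbol(symbol: str) -> str:
    """Compact stand-in for the validator shown above."""
    if not re.fullmatch(r"[A-Za-z0-9.\-]{1,10}", symbol) or ".." in symbol:
        raise ValueError(f"Invalid ticker symbol: {symbol!r}")
    return symbol.upper()


def validated_ticker(param: str = "ticker"):
    """Decorator factory: validate and normalize the named argument on every call."""
    def decorator(func):
        sig = inspect.signature(func)
        if param not in sig.parameters:
            raise TypeError(f"{func.__name__} has no parameter {param!r}")

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind so the argument is found whether passed by position or keyword.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            bound.arguments[param] = validate_ticker_symbol(bound.arguments[param])
            return func(*bound.args, **bound.kwargs)

        return wrapper
    return decorator


@validated_ticker("ticker")
def get_example_data(ticker: str, curr_date: str) -> str:
    # Hypothetical function standing in for the local.py data loaders.
    return f"{ticker}:{curr_date}"
```

Functions whose parameter is named symbol instead of ticker would use @validated_ticker("symbol").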

Fix 3: CLI Input Validation

File: /cli/main.py
Lines: 499-501, 438
Severity: HIGH - Entry point for malicious input
Time to Apply: 3-5 minutes

Why This Matters

The CLI accepts ticker symbols without validation, which feeds directly into the vulnerable file path operations in local.py. This is the primary attack vector.

BEFORE

def get_ticker():
    """Get ticker symbol from user input."""
    return typer.prompt("", default="SPY")  # ⚠️ NO VALIDATION

AFTER

def get_ticker():
    """Get ticker symbol from user input with validation."""
    while True:
        ticker = typer.prompt("", default="SPY")

        # Enforce the length limit
        if not ticker or len(ticker) > 10:
            console.print("[red]Error: Ticker must be 1-10 characters[/red]")
            continue

        # Check for path traversal attempts
        if '..' in ticker or '/' in ticker or '\\' in ticker:
            console.print("[red]Error: Invalid characters in ticker symbol[/red]")
            continue

        # Validate characters (ASCII only: str.isalnum() alone also accepts non-ASCII letters and digits)
        if not all((c.isascii() and c.isalnum()) or c in '.-' for c in ticker):
            console.print("[red]Error: Ticker can only contain letters, numbers, dots, and hyphens[/red]")
            continue

        return ticker.upper()  # ✓ SECURE AND NORMALIZED
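The CLI check repeats the rules from local.py, so the two can drift apart over time. A possible refinement, sketched here under assumptions, is to have the prompt loop call one shared validator. prompt_for_ticker() and its ask/report parameters are hypothetical; in the real CLI, ask would wrap typer.prompt and report would wrap console.print.

```python
import re
from typing import Callable


def validate_ticker_symbol(symbol: str) -> str:
    """Compact stand-in for the shared validator in local.py."""
    if not re.fullmatch(r"[A-Za-z0-9.\-]{1,10}", symbol) or ".." in symbol:
        raise ValueError(f"Invalid ticker symbol: {symbol!r}")
    return symbol.upper()


def prompt_for_ticker(
    ask: Callable[[], str],
    report: Callable[[str], None] = print,
    max_attempts: int = 3,
) -> str:
    """Ask until a valid ticker is entered or the attempts run out."""
    for _ in range(max_attempts):
        try:
            return validate_ticker_symbol(ask())
        except ValueError as exc:
            report(f"Error: {exc}")
    raise ValueError("Too many invalid ticker attempts")
```

Passing the prompt and output functions in as arguments keeps the loop testable without a terminal.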

Testing Recommendations

After applying these fixes, test with these attack vectors to ensure they're blocked:

# Test CLI with malicious input
python -m cli.main analyze
# Try entering: ../../etc/passwd
# Try entering: ../../../sensitive_file
# Try entering: AAPL/../../../etc/hosts

# Test programmatically
python -c "
from tradingagents.dataflows.local import validate_ticker_symbol
try:
    validate_ticker_symbol('../../etc/passwd')
    print('FAIL: Attack not blocked')
except ValueError:
    print('PASS: Attack blocked')
"

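The manual checks above can also be captured as a small table-driven script so they are repeatable. This is a sketch: the validator is a stand-in that mirrors the rules described in Fix 2 (in the project it would be imported from tradingagents.dataflows.local), and the list of valid tickers is illustrative.

```python
import re


def validate_ticker_symbol(symbol: str) -> str:
    """Stand-in mirroring the Fix 2 rules: charset, traversal patterns, length."""
    if not re.fullmatch(r"[A-Za-z0-9.\-]+", symbol):
        raise ValueError(f"Invalid ticker symbol: {symbol!r}")
    if ".." in symbol:
        raise ValueError(f"Invalid ticker symbol: {symbol!r}")
    if len(symbol) > 10:
        raise ValueError(f"Ticker symbol too long: {symbol!r}")
    return symbol.upper()


# Inputs that must be rejected, including the vectors listed above.
ATTACKS = [
    "../../etc/passwd",
    "../../../sensitive_file",
    "AAPL/../../../etc/hosts",
    "..",
    "AAPL\n",
    "A" * 11,
    "",
]

# Inputs that must be accepted, mapped to their normalized form.
VALID = {"aapl": "AAPL", "SPY": "SPY", "BRK.B": "BRK.B", "BF-B": "BF-B"}


def run_checks():
    """Return a list of failure descriptions; an empty list means all checks passed."""
    failures = []
    for attack in ATTACKS:
        try:
            validate_ticker_symbol(attack)
        except ValueError:
            continue
        failures.append(f"not blocked: {attack!r}")
    for raw, expected in VALID.items():
        if validate_ticker_symbol(raw) != expected:
            failures.append(f"wrong normalization: {raw!r}")
    return failures


if __name__ == "__main__":
    problems = run_checks()
    print("PASS" if not problems else "FAIL: " + "; ".join(problems))
```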
Summary

| Fix | File | Lines Changed | Time | Risk Reduced |
|---|---|---|---|---|
| ChromaDB Reset | memory.py | 1 | 2 min | Data loss |
| Path Traversal | local.py | ~30 | 10 min | File access |
| CLI Validation | cli/main.py | ~20 | 5 min | Attack vector |

Total Estimated Time: 15-20 minutes
Security Impact: Prevents critical path traversal and data loss vulnerabilities


References

  • CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')
  • CWE-73: External Control of File Name or Path
  • OWASP Top 10: A01:2021 Broken Access Control