TradingAgents/MarketDataService_PRD.md

18 KiB

Product Requirements Document: MarketDataService Completion

Overview

Complete the MarketDataService to provide strongly-typed market data and technical indicators to trading agents using a local-first data strategy with gap detection and intelligent caching.

Current State Analysis

Issues to Fix

  • CRITICAL: Service uses BaseClient inheritance but YFinanceClient exists and needs refactoring to FinnhubClient standard
  • CRITICAL: Service calls client methods with string dates instead of date objects
  • CRITICAL: Need to integrate stockstats library for technical analysis calculations instead of legacy utils
  • CRITICAL: MarketDataRepository exists but missing service interface methods
  • Missing strongly-typed interface between YFinanceClient and service
  • YFinanceClient uses BaseClient inheritance and string dates (needs refactoring)
  • No concrete gap detection logic
  • Missing technical indicator data sufficiency validation

What Works

  • Local-first data strategy implementation (_get_price_data_local_first)
  • Force refresh logic (_fetch_and_cache_fresh_data)
  • MarketDataContext Pydantic model for agent consumption
  • Error handling and metadata creation patterns
  • YFinanceClient exists with yfinance SDK integration and comprehensive methods
  • MarketDataRepository exists with CSV storage and pandas DataFrame operations
  • Service structure ready for stockstats integration for technical analysis

Technical Requirements

1. Strongly-Typed Interfaces

Client → Service Interface

# YFinanceClient methods (to be refactored)
def get_historical_data(symbol: str, start_date: date, end_date: date) -> dict[str, Any]
def get_price_data(symbol: str, start_date: date, end_date: date) -> dict[str, Any]

# Technical analysis handled in service layer using stockstats
# No get_technical_indicator method needed in client - calculated from OHLCV data

Service → Repository Interface

# MarketDataRepository methods (to be implemented)
def has_data_for_period(symbol: str, start_date: str, end_date: str) -> bool
def get_data(symbol: str, start_date: str, end_date: str) -> dict[str, Any]
def store_data(symbol: str, cache_data: dict, overwrite: bool) -> bool
def clear_data(symbol: str, start_date: str, end_date: str) -> bool

Service → Agent Interface

# Service output (already defined)
def get_context(symbol: str, start_date: str, end_date: str, indicators: list[str], force_refresh: bool) -> MarketDataContext

2. Local-First Data Strategy

Flow

  1. Repository Lookup: Check MarketDataRepository.has_data_for_period()
  2. Gap Detection: Identify missing price data periods using detect_market_gaps()
  3. Data Sufficiency Check: Ensure enough historical data for requested indicators
  4. Selective Fetching: Fetch only missing data from YFinanceClient
  5. Cache Updates: Store new data via repository.store_data()
  6. Context Assembly: Return validated MarketDataContext

Gap Detection Implementation

def detect_market_gaps(self, cached_dates: list[str], requested_start: str, requested_end: str) -> list[tuple[str, str]]:
    """
    Returns list of (start, end) tuples for missing periods.

    Example: If requesting 2024-01-01 to 2024-01-31 and cache has:
    - 2024-01-01 to 2024-01-10
    - 2024-01-20 to 2024-01-25
    Returns: [("2024-01-11", "2024-01-19"), ("2024-01-26", "2024-01-31")]

    Accounts for:
    - Weekends (Saturday/Sunday)
    - Market holidays
    - Continuous date ranges to minimize API calls
    """
    # Implementation should use pandas business day logic

Force Refresh Support

  • force_refresh=True bypasses local data completely
  • Clears existing cache before fetching fresh data
  • Stores refreshed data with metadata indicating refresh

Cache Invalidation Strategy

  • Historical data is immutable: Data older than yesterday never changes
  • Today's data needs updates: During market hours, refresh every 15 minutes
  • After market close: Today's data becomes immutable
def is_data_stale(self, data_date: date, last_updated: datetime) -> bool:
    today = date.today()
    if data_date < today:
        return False  # Historical data never stale

    # For today's data, check if market is open and last update > 15 min
    if is_market_open() and (datetime.now() - last_updated).minutes > 15:
        return True
    return False

3. Date Object Conversion

Service Boundary Conversion

# Service receives string dates from agents
def get_context(self, symbol: str, start_date: str, end_date: str, ...) -> MarketDataContext:
    # Validate date strings
    try:
        start_dt = date.fromisoformat(start_date)
        end_dt = date.fromisoformat(end_date)
    except ValueError as e:
        raise ValueError(f"Invalid date format: {e}")

    # Check date order
    if end_dt < start_dt:
        raise ValueError(f"End date {end_date} is before start date {start_date}")

    # Expand date range for technical indicators
    expanded_start = self._calculate_lookback_start(start_dt, indicators)

    # Use date objects when calling YFinanceClient
    price_data = self.yfinance_client.get_historical_data(symbol, expanded_start, end_dt)

    # Calculate technical indicators using stockstats library
    technical_indicators = self._calculate_technical_indicators(price_data, indicators)

4. Technical Analysis with Stockstats

Data Sufficiency Validation

# Minimum data points required for each indicator
INDICATOR_REQUIREMENTS = {
    "sma_20": 20,
    "sma_200": 200,
    "ema_12": 24,      # 2x for exponential smoothing
    "ema_200": 400,
    "rsi_14": 28,      # 2x period for warm-up
    "macd": 34,        # 26 + 8 for signal line
    "bb_upper": 20,    # Based on 20-period SMA
    "atr_14": 28,      # 2x period for accuracy
    "stochrsi_14": 42, # 3x period for double smoothing
}

def _calculate_lookback_start(self, start_date: date, indicators: list[str]) -> date:
    """Calculate how far back we need data to compute indicators accurately."""
    max_lookback = 0
    for indicator in indicators:
        lookback = INDICATOR_REQUIREMENTS.get(indicator, 0)
        max_lookback = max(max_lookback, lookback)

    # Add buffer for weekends/holidays
    business_days_back = max_lookback * 1.5
    return start_date - timedelta(days=int(business_days_back))

def _validate_data_sufficiency(self, data_points: int, indicators: list[str]) -> dict[str, bool]:
    """Check if we have enough data for each indicator."""
    return {
        indicator: data_points >= INDICATOR_REQUIREMENTS.get(indicator, 0)
        for indicator in indicators
    }

Stockstats Integration

def _calculate_technical_indicators(self, price_data: list[dict], indicators: list[str]) -> dict[str, list[dict]]:
    """
    Calculate technical indicators using stockstats library.

    Args:
        price_data: OHLCV data from YFinanceClient
        indicators: List of requested indicators (e.g., ['rsi_14', 'macd', 'bb_upper', 'sma_20'])

    Returns:
        Dict mapping indicator names to time series data
    """
    import pandas as pd
    from stockstats import StockDataFrame

    # Convert price data to pandas DataFrame
    df = pd.DataFrame(price_data)
    df['date'] = pd.to_datetime(df['date'])
    df.set_index('date', inplace=True)

    # Check data sufficiency
    sufficiency = self._validate_data_sufficiency(len(df), indicators)

    # Create StockDataFrame for technical analysis
    sdf = StockDataFrame.retype(df)

    # Calculate requested indicators
    indicator_data = {}
    for indicator in indicators:
        if not sufficiency[indicator]:
            logger.warning(f"Insufficient data for {indicator}, need {INDICATOR_REQUIREMENTS[indicator]} points")
            indicator_data[indicator] = []
            continue

        try:
            if indicator in sdf.columns:
                values = sdf[indicator].dropna()
                indicator_data[indicator] = [
                    {"date": idx.strftime("%Y-%m-%d"), "value": float(val)}
                    for idx, val in values.items()
                ]
        except Exception as e:
            logger.warning(f"Failed to calculate {indicator}: {e}")
            indicator_data[indicator] = []

    return indicator_data

5. Error Recovery and Partial Data

def handle_partial_price_data(
    self,
    requested_start: str,
    requested_end: str,
    available_data: list[dict]
) -> MarketDataContext:
    """
    Handle cases where only partial date range is available.

    - If no data available: Raise exception
    - If partial data: Return what's available with metadata
    - Mark gaps in metadata
    """
    if not available_data:
        raise ValueError(f"No market data available for {symbol}")

    actual_start = min(d['date'] for d in available_data)
    actual_end = max(d['date'] for d in available_data)

    metadata = {
        "requested_period": {"start": requested_start, "end": requested_end},
        "actual_period": {"start": actual_start, "end": actual_end},
        "partial_data": actual_start > requested_start or actual_end < requested_end,
        "data_points": len(available_data)
    }

    # Return context with available data and metadata

6. Pydantic Validation

Context Structure

@dataclass
class MarketDataContext(BaseModel):
    symbol: str
    period: dict[str, str]  # {"start": "2024-01-01", "end": "2024-01-31"}
    price_data: list[dict[str, Any]]  # OHLCV records
    technical_indicators: dict[str, list[TechnicalIndicatorData]]
    metadata: dict[str, Any]

    @validator('price_data')
    def validate_price_data(cls, v):
        # Ensure OHLCV fields present and valid
        required_fields = {'date', 'open', 'high', 'low', 'close', 'volume'}
        for record in v:
            if not all(field in record for field in required_fields):
                raise ValueError(f"Missing required OHLCV fields")
        return v

Implementation Tasks

Phase 1: Refactor YFinanceClient

  1. YFinanceClient Refactoring

    • Refactor existing tradingagents/clients/yfinance_client.py
    • Remove BaseClient inheritance
    • Update all method signatures to accept date objects instead of strings
    • Keep all existing functionality intact
    • Example changes:
    # Current (wrong)
    def get_historical_data(self, symbol: str, start_date: str, end_date: str) -> dict[str, Any]:
    
    # Updated (correct)
    def get_historical_data(self, symbol: str, start_date: date, end_date: date) -> dict[str, Any]:
    
  2. Comprehensive Testing

    • Update tradingagents/clients/test_yfinance_client.py
    • Test with date objects
    • Use pytest-vcr for HTTP interaction recording
    • Test error handling and edge cases

Phase 2: Update MarketDataRepository

  1. Repository Interface Enhancement
    • Update existing tradingagents/repositories/market_data_repository.py
    • Add missing service interface methods: has_data_for_period(), get_data(), store_data(), clear_data()
    • Maintain existing CSV/pandas functionality while adding service compatibility
    • Support gap detection and partial data scenarios

Phase 3: Update MarketDataService

  1. Client Integration Fix

    • Replace BaseClient dependency with YFinanceClient
    • File: tradingagents/services/market_data_service.py:8, 26
    • Update constructor to accept yfinance_client: YFinanceClient
  2. Date Conversion and Validation

    • Add date.fromisoformat() conversion in service methods
    • Add date validation (format, order)
    • Update client calls to use date objects instead of strings
    • File: tradingagents/services/market_data_service.py:151, 227
  3. Technical Indicator Integration with Stockstats

    • Implement _calculate_technical_indicators() method using stockstats library
    • Add _calculate_lookback_start() for data sufficiency
    • Add _validate_data_sufficiency() to check if enough data
    • Replace legacy StockstatsUtils integration with direct stockstats usage
    • File: tradingagents/services/market_data_service.py:9, 43, 280-346

Phase 4: Type Safety & Validation

  1. Comprehensive Type Checking

    • Run mise run typecheck - must pass with 0 errors
    • Validate all date object conversions
    • Ensure MarketDataContext compliance
  2. Enhanced Testing

    • Update existing service tests for new YFinanceClient interface
    • Add gap detection test scenarios
    • Test technical indicator data sufficiency
    • Test partial data handling

Testing Scenarios

Integration Tests

  1. Gap Detection

    • Test with empty cache (should fetch all)
    • Test with partial cache (should fetch only missing periods)
    • Test weekend/holiday handling
  2. Technical Indicator Sufficiency

    • Test SMA_200 with only 100 days of data (should skip indicator)
    • Test RSI_14 with exactly 28 days (should calculate)
    • Test mixed indicators with varying data requirements
  3. Partial Data Recovery

    • Test when API returns less data than requested
    • Test when some dates are missing (holidays)
    • Test metadata accuracy for partial data
  4. Date Handling

    • Test invalid date formats
    • Test end_date < start_date
    • Test future dates
    • Test weekend date handling
  5. Cache Staleness

    • Test historical data (should never refresh)
    • Test today's data during market hours (should refresh if > 15 min)
    • Test today's data after market close (should not refresh)

Success Criteria

Functional Requirements

  • Service successfully calls refactored YFinanceClient with date objects
  • Gap detection correctly identifies missing trading days
  • Technical indicators validate data sufficiency before calculation
  • Partial data scenarios handled gracefully
  • Local-first strategy works: checks cache → identifies gaps → fetches missing → stores updates
  • Returns properly validated MarketDataContext to agents
  • Technical indicators calculated from OHLCV data using stockstats library
  • Force refresh bypasses cache and refreshes data

Technical Requirements

  • Zero type checking errors: mise run typecheck
  • Zero linting errors: mise run lint
  • All existing tests pass with updated architecture
  • No runtime errors with date conversions
  • Proper error messages for validation failures

Quality Requirements

  • Strongly-typed interfaces between all components
  • Official yfinance SDK and stockstats library usage
  • Comprehensive error handling and logging
  • Efficient caching with minimal API calls
  • Clear separation of concerns between service, client, and repository

Data Architecture

YFinanceClient Response Format

{
    "symbol": "AAPL",
    "period": {"start": "2024-01-01", "end": "2024-01-31"},
    "data": [
        {
            "date": "2024-01-02",  # Note: Jan 1 was a holiday
            "open": 150.0,
            "high": 155.0,
            "low": 149.0,
            "close": 154.0,
            "volume": 1000000,
            "adj_close": 154.0
        },
        ...
    ],
    "metadata": {
        "source": "yfinance",
        "retrieved_at": "2024-01-31T10:00:00Z",
        "data_quality": "HIGH",
        "missing_dates": ["2024-01-01", "2024-01-15"]  # Holidays
    }
}

Technical Indicator Data Format

# MarketDataContext.technical_indicators structure
{
    "rsi_14": [
        {"date": "2024-01-29", "value": 65.5},  # First valid after 28 days
        {"date": "2024-01-30", "value": 67.2},
        ...
    ],
    "sma_200": [],  # Empty if insufficient data
    "macd": [
        {"date": "2024-01-31", "value": {"macd": 2.1, "signal": 1.8, "histogram": 0.3}}
    ],
    "_metadata": {
        "indicators_calculated": ["rsi_14", "macd"],
        "indicators_skipped": {
            "sma_200": "Insufficient data: need 200 points, have 31"
        }
    }
}

Dependencies

Existing Components (Need Updates)

  • YFinanceClient exists but needs refactoring (remove BaseClient, use date objects)
  • MarketDataRepository exists with CSV storage but needs service interface methods
  • Tests exist but need updates for new interfaces

Required

  • Official yfinance library for market data fetching
  • stockstats library for technical analysis calculations
  • pandas for date/time handling and business day calculations
  • Working internet connection for live data fetching
  • Writable data directory for repository storage

Timeline

Immediate (Phase 1)

  • Refactor existing YFinanceClient to use date objects
  • Remove BaseClient inheritance
  • Update tests for new interface

Phase 2-3

  • Add service interface methods to MarketDataRepository
  • Update MarketDataService to use refactored YFinanceClient
  • Implement data sufficiency validation
  • Integrate stockstats library for technical indicators

Phase 4

  • Comprehensive type checking and validation
  • Integration testing with gap detection
  • Performance optimization and caching efficiency

Acceptance Criteria

Must Have

  1. Type Safety: Service passes mise run typecheck with zero errors
  2. Client Refactoring: YFinanceClient uses date objects, no BaseClient
  3. Gap Detection: Correctly identifies missing trading days
  4. Data Sufficiency: Validates enough data for technical indicators
  5. Partial Data: Service handles incomplete data gracefully
  6. Local-First: Service checks repository before API calls
  7. Context Validation: Returns valid MarketDataContext with Pydantic validation
  8. Technical Indicators: Calculated using stockstats with proper validation

Should Have

  1. Cache Efficiency: Minimal redundant API calls to Yahoo Finance
  2. Force Refresh: Complete cache bypass when requested
  3. Stale Data Handling: Refresh today's data during market hours
  4. Clear Error Messages: Informative errors for validation failures

Nice to Have

  1. Performance Metrics: Timing and cache hit rate logging
  2. Extended Indicators: Support for 50+ technical indicators
  3. Real-time Data: WebSocket integration for live prices
  4. Bulk Symbol Support: Fetch multiple symbols efficiently

This PRD focuses on completing the MarketDataService as a strongly-typed, local-first data service that integrates OHLCV price data from a refactored YFinanceClient and calculates comprehensive technical indicators using the stockstats library, with robust gap detection and data sufficiency validation.