TradingAgents/FundamentalDataService_PRD.md

10 KiB

Product Requirements Document: FundamentalDataService Completion

Overview

Complete the FundamentalDataService to provide strongly-typed fundamental financial data to trading agents using a local-first data strategy with gap detection and intelligent caching.

Current State Analysis

Issues to Fix

  • CRITICAL: Service calls FinnhubClient methods with string dates but client expects date objects
  • CRITICAL: References non-existent self.simfin_client instead of self.finnhub_client
  • Missing strongly-typed interfaces between components
  • Incomplete local-first strategy implementation
  • No concrete gap detection logic
  • Missing error recovery for partial data

What Works

  • FinnhubClient fully implemented with strict date object interface
  • FundamentalDataRepository with dataclass-based storage
  • FundamentalContext Pydantic model for agent consumption
  • Basic service structure and error handling

Technical Requirements

1. Strongly-Typed Interfaces

Client → Service Interface

# FinnhubClient methods (already implemented)
def get_balance_sheet(symbol: str, frequency: str, report_date: date) -> dict[str, Any]
def get_income_statement(symbol: str, frequency: str, report_date: date) -> dict[str, Any]
def get_cash_flow(symbol: str, frequency: str, report_date: date) -> dict[str, Any]

Service → Repository Interface

# Repository methods (already implemented)
def has_data_for_period(symbol: str, start_date: str, end_date: str, frequency: str) -> bool
def get_data(symbol: str, start_date: str, end_date: str, frequency: str) -> dict[str, Any]
def store_data(symbol: str, cache_data: dict, frequency: str, overwrite: bool) -> bool
def clear_data(symbol: str, start_date: str, end_date: str, frequency: str) -> bool

Service → Agent Interface

# Service output (already defined)
def get_context(symbol: str, start_date: str, end_date: str, frequency: str, force_refresh: bool) -> FundamentalContext

2. Local-First Data Strategy

Flow

  1. Repository Lookup: Check FundamentalDataRepository.has_data_for_period()
  2. Gap Detection: Identify missing data periods using detect_fundamental_gaps()
  3. Selective Fetching: Fetch only missing data from FinnhubClient
  4. Cache Updates: Store new data via repository.store_data()
  5. Context Assembly: Return validated FundamentalContext

Gap Detection Implementation

def detect_fundamental_gaps(self, symbol: str, start_date: str, end_date: str, frequency: str) -> list[str]:
    """
    Returns list of report dates that need fetching.

    Example: If requesting quarterly from 2024-01-01 to 2024-12-31
    and cache has Q1 and Q3, returns ["2024-06-30", "2024-09-30", "2024-12-31"]

    For quarterly: Check for Q1 (Mar 31), Q2 (Jun 30), Q3 (Sep 30), Q4 (Dec 31)
    For annual: Check for fiscal year ends
    """
    # Implementation should:
    # 1. Get existing report dates from repository
    # 2. Calculate expected report dates in requested period
    # 3. Return difference between expected and existing

Force Refresh Support

  • force_refresh=True bypasses local data completely
  • Clears existing cache before fetching fresh data
  • Stores refreshed data with metadata indicating refresh

Cache Invalidation Strategy

  • Fundamental data is immutable: Once a report is filed, it doesn't change
  • No staleness checks needed: Reports are valid indefinitely
  • Only fetch if missing: Never re-fetch existing reports

3. Date Object Conversion

Service Boundary Conversion

# Service receives string dates from agents
def get_context(self, symbol: str, start_date: str, end_date: str, ...) -> FundamentalContext:
    # Validate date strings
    try:
        start_dt = date.fromisoformat(start_date)
        end_dt = date.fromisoformat(end_date)
    except ValueError as e:
        raise ValueError(f"Invalid date format: {e}")

    # Check date order
    if end_dt < start_dt:
        raise ValueError(f"End date {end_date} is before start date {start_date}")

    # Use date objects when calling FinnhubClient
    data = self.finnhub_client.get_balance_sheet(symbol, frequency, end_dt)

4. Error Recovery and Partial Data

def handle_partial_statements(
    self,
    balance_sheet: dict | None,
    income_statement: dict | None,
    cash_flow: dict | None
) -> FundamentalContext:
    """
    Create context even if some statements are missing.

    - If all statements fail: Raise exception
    - If some statements succeed: Return partial context
    - Mark missing statements in metadata
    """
    metadata = {
        "has_balance_sheet": balance_sheet is not None,
        "has_income_statement": income_statement is not None,
        "has_cash_flow": cash_flow is not None,
        "partial_data": any(s is None for s in [balance_sheet, income_statement, cash_flow])
    }

    # Convert available statements to FinancialStatement objects
    # Return FundamentalContext with available data

5. Pydantic Validation

Context Structure

@dataclass
class FundamentalContext(BaseModel):
    symbol: str
    period: dict[str, str]  # {"start": "2024-01-01", "end": "2024-01-31"}
    balance_sheet: FinancialStatement | None
    income_statement: FinancialStatement | None
    cash_flow: FinancialStatement | None
    key_ratios: dict[str, float]
    metadata: dict[str, Any]

    @validator('period')
    def validate_period(cls, v):
        # Ensure start and end dates are present and valid
        return v

Implementation Tasks

Phase 1: Fix Critical Issues

  1. Date Conversion Fix

    • Add date.fromisoformat() conversion in service methods
    • Add date validation (format, order)
    • Update all FinnhubClient method calls to use date objects
    • File: tradingagents/services/fundamental_data_service.py:153, 164, 175
  2. Client Reference Fix

    • Replace self.simfin_client with self.finnhub_client
    • File: tradingagents/services/fundamental_data_service.py:375

Phase 2: Enhanced Local-First Strategy

  1. Gap Detection Logic

    • Implement detect_fundamental_gaps() method
    • Calculate expected report dates based on frequency
    • Compare with cached data to find gaps
    • Handle fiscal year variations
  2. Partial Data Handling

    • Implement handle_partial_statements() method
    • Continue processing if some statements succeed
    • Mark missing data in metadata
    • Only fail if all statements fail

Phase 3: Type Safety & Validation

  1. Comprehensive Type Checking

    • Run mise run typecheck - must pass with 0 errors
    • Validate all date object conversions
    • Ensure Pydantic model compliance
  2. Enhanced Testing

    • Update existing tests for new date handling
    • Add gap detection test scenarios
    • Test partial data scenarios
    • Test force refresh behavior
    • Test date validation edge cases

Testing Scenarios

Integration Tests

  1. Gap Detection

    • Test with empty cache (should fetch all)
    • Test with partial cache (should fetch only missing)
    • Test with complete cache (should fetch none)
  2. Partial Data Recovery

    • Test when balance sheet API fails but others succeed
    • Test when only one statement type is available
    • Test when all APIs fail (should raise exception)
  3. Date Handling

    • Test invalid date formats
    • Test end_date < start_date
    • Test boundary conditions (year start/end)
  4. Force Refresh

    • Test that force_refresh=True clears cache
    • Test that new data is fetched and stored

Success Criteria

Functional Requirements

  • Service successfully calls FinnhubClient with date objects
  • Gap detection correctly identifies missing reports
  • Partial data scenarios handled gracefully
  • Local-first strategy works: checks cache → identifies gaps → fetches missing → stores updates
  • Returns properly validated FundamentalContext to agents
  • Force refresh bypasses cache and refreshes data

Technical Requirements

  • Zero type checking errors: mise run typecheck
  • Zero linting errors: mise run lint
  • All existing tests pass
  • No runtime errors with date conversions
  • Proper error messages for validation failures

Quality Requirements

  • Strongly-typed interfaces between all components
  • Comprehensive error handling and logging
  • Efficient caching with minimal API calls
  • Clear separation of concerns between service, client, and repository

Dependencies

Completed

  • FinnhubClient with date object interface
  • FundamentalDataRepository with dataclass storage
  • FundamentalContext Pydantic model

Required

  • Working FinnhubClient instance with valid API key
  • Writable data directory for repository storage

Timeline

Immediate (Today)

  • Fix critical date conversion and reference issues
  • Implement basic gap detection
  • Add date validation

Next Steps

  • Implement partial data handling
  • Comprehensive testing
  • Integration with agent workflows

Acceptance Criteria

Must Have

  1. Type Safety: Service passes mise run typecheck with zero errors
  2. Client Integration: All FinnhubClient calls use date objects correctly
  3. Gap Detection: Correctly identifies missing report periods
  4. Partial Data: Service returns partial context when some statements fail
  5. Local-First: Service checks repository before API calls
  6. Context Validation: Returns valid FundamentalContext with Pydantic validation
  7. Error Handling: Graceful handling of API failures and missing data

Should Have

  1. Cache Efficiency: Minimal redundant API calls
  2. Force Refresh: Complete cache bypass when requested
  3. Data Quality: Metadata indicating data completeness
  4. Clear Error Messages: Informative errors for date validation failures

Nice to Have

  1. Performance Metrics: Timing and cache hit rate logging
  2. Fiscal Year Handling: Support for non-calendar fiscal years
  3. Bulk Operations: Fetch multiple symbols efficiently

This PRD focuses on completing the FundamentalDataService as a strongly-typed, local-first data service that seamlessly integrates with the existing FinnhubClient and FundamentalDataRepository components while providing robust gap detection and partial data handling.