# Product Requirements Document: FundamentalDataService Completion
## Overview
Complete the `FundamentalDataService` to provide strongly-typed fundamental financial data to trading agents using a local-first data strategy with gap detection and intelligent caching.
## Current State Analysis
### Issues to Fix
- **CRITICAL**: Service calls `FinnhubClient` methods with string dates but client expects `date` objects
- **CRITICAL**: References non-existent `self.simfin_client` instead of `self.finnhub_client`
- Missing strongly-typed interfaces between components
- Incomplete local-first strategy implementation
- No concrete gap detection logic
- Missing error recovery for partial data
### What Works
- ✅ `FinnhubClient` fully implemented with strict `date` object interface
- ✅ `FundamentalDataRepository` with dataclass-based storage
- ✅ `FundamentalContext` Pydantic model for agent consumption
- ✅ Basic service structure and error handling
## Technical Requirements
### 1. Strongly-Typed Interfaces
#### Client → Service Interface
```python
# FinnhubClient methods (already implemented)
def get_balance_sheet(symbol: str, frequency: str, report_date: date) -> dict[str, Any]
def get_income_statement(symbol: str, frequency: str, report_date: date) -> dict[str, Any]
def get_cash_flow(symbol: str, frequency: str, report_date: date) -> dict[str, Any]
```
#### Service → Repository Interface
```python
# Repository methods (already implemented)
def has_data_for_period(symbol: str, start_date: str, end_date: str, frequency: str) -> bool
def get_data(symbol: str, start_date: str, end_date: str, frequency: str) -> dict[str, Any]
def store_data(symbol: str, cache_data: dict, frequency: str, overwrite: bool) -> bool
def clear_data(symbol: str, start_date: str, end_date: str, frequency: str) -> bool
```
#### Service → Agent Interface
```python
# Service output (already defined)
def get_context(symbol: str, start_date: str, end_date: str, frequency: str, force_refresh: bool) -> FundamentalContext
```
### 2. Local-First Data Strategy
#### Flow
1. **Repository Lookup**: Check `FundamentalDataRepository.has_data_for_period()`
2. **Gap Detection**: Identify missing data periods using `detect_fundamental_gaps()`
3. **Selective Fetching**: Fetch only missing data from `FinnhubClient`
4. **Cache Updates**: Store new data via `repository.store_data()`
5. **Context Assembly**: Return validated `FundamentalContext`
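
The flow above can be sketched with an in-memory dictionary standing in for the repository and a plain callable standing in for the client. This is a minimal illustration, not the service implementation: `fetch_missing_reports`, `fake_fetch`, and the payload fields are hypothetical names introduced here.

```python
from datetime import date
from typing import Any, Callable


def fetch_missing_reports(
    cached: dict[str, dict[str, Any]],
    expected_dates: list[str],
    fetch: Callable[[date], dict[str, Any]],
) -> list[str]:
    """Fill `cached` in place with every expected report that is absent.

    Returns the report dates that were actually fetched.
    """
    fetched: list[str] = []
    for report_date in expected_dates:
        if report_date in cached:
            # Steps 1-2: already stored locally, so this date is not a gap.
            continue
        # Step 3: the client expects a date object, not a string.
        cached[report_date] = fetch(date.fromisoformat(report_date))
        # Step 4: the assignment above is the cache update.
        fetched.append(report_date)
    return fetched


# Hypothetical stand-in for a FinnhubClient call.
def fake_fetch(report_date: date) -> dict[str, Any]:
    return {"report_date": report_date.isoformat(), "total_assets": 100.0}


cache = {"2024-03-31": {"report_date": "2024-03-31", "total_assets": 90.0}}
fetched = fetch_missing_reports(cache, ["2024-03-31", "2024-06-30"], fake_fetch)
```

Only the missing quarter triggers a fetch; the cached Q1 entry is left untouched.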
#### Gap Detection Implementation
```python
def detect_fundamental_gaps(self, symbol: str, start_date: str, end_date: str, frequency: str) -> list[str]:
    """
    Returns list of report dates that need fetching.

    Example: If requesting quarterly from 2024-01-01 to 2024-12-31
    and cache has Q1 and Q3, returns ["2024-06-30", "2024-12-31"]

    For quarterly: Check for Q1 (Mar 31), Q2 (Jun 30), Q3 (Sep 30), Q4 (Dec 31)
    For annual: Check for fiscal year ends
    """
    # Implementation should:
    # 1. Get existing report dates from repository
    # 2. Calculate expected report dates in requested period
    # 3. Return difference between expected and existing
    raise NotImplementedError
```
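
One way the three implementation steps might look is sketched below. It assumes calendar-aligned periods (fiscal year ending Dec 31) and the frequency strings `"quarterly"` and `"annual"`; both are assumptions, as are the helper names `expected_report_dates` and `find_gaps`. The set of existing report dates is passed in directly rather than read from the repository.

```python
import calendar
from datetime import date


def expected_report_dates(start: date, end: date, frequency: str) -> list[date]:
    """Calendar period-end dates that fall inside [start, end]."""
    if frequency == "quarterly":
        months = (3, 6, 9, 12)
    elif frequency == "annual":
        months = (12,)  # assumes a calendar fiscal year
    else:
        raise ValueError(f"Unsupported frequency: {frequency}")

    dates: list[date] = []
    for year in range(start.year, end.year + 1):
        for month in months:
            last_day = calendar.monthrange(year, month)[1]
            candidate = date(year, month, last_day)
            if start <= candidate <= end:
                dates.append(candidate)
    return dates


def find_gaps(existing: set[str], start_date: str, end_date: str, frequency: str) -> list[str]:
    """Expected report dates (ISO strings) that are not in `existing`."""
    expected = expected_report_dates(
        date.fromisoformat(start_date), date.fromisoformat(end_date), frequency
    )
    return [d.isoformat() for d in expected if d.isoformat() not in existing]


gaps = find_gaps({"2024-03-31", "2024-09-30"}, "2024-01-01", "2024-12-31", "quarterly")
# gaps == ["2024-06-30", "2024-12-31"]
```

Companies with non-calendar fiscal years would need the period-end months supplied per symbol instead of hard-coded.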
#### Force Refresh Support
- `force_refresh=True` bypasses local data completely
- Clears existing cache before fetching fresh data
- Stores refreshed data with metadata indicating refresh
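
A minimal sketch of this behavior, using a dictionary in place of the repository, is shown below. The function name `force_refresh_reports` and the metadata keys `force_refreshed` and `refreshed_at` are hypothetical; the real repository may record refreshes differently.

```python
from datetime import date, datetime, timezone
from typing import Any, Callable


def force_refresh_reports(
    cache: dict[str, dict[str, Any]],
    report_dates: list[str],
    fetch: Callable[[date], dict[str, Any]],
) -> None:
    """Drop the requested reports from the cache, then refetch and tag them."""
    for report_date in report_dates:
        cache.pop(report_date, None)  # clear before fetching

    refreshed_at = datetime.now(timezone.utc).isoformat()
    for report_date in report_dates:
        # Copy the payload so the dict returned by the client is not mutated.
        payload = dict(fetch(date.fromisoformat(report_date)))
        payload["metadata"] = {"force_refreshed": True, "refreshed_at": refreshed_at}
        cache[report_date] = payload


cache = {
    "2024-03-31": {"revenue": 1.0},
    "2024-06-30": {"revenue": 2.0},
}
force_refresh_reports(cache, ["2024-06-30"], lambda d: {"revenue": 2.5})
```

Because the cache is cleared first, a failed fetch leaves the period empty. Fetching first and overwriting on success would avoid that, at the cost of deviating from the clear-then-fetch order listed above.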
#### Cache Invalidation Strategy
- **Fundamental data is treated as immutable**: Once a report is filed, it is assumed not to change
- **No staleness checks needed**: Cached reports are valid indefinitely
- **Only fetch if missing**: Never re-fetch existing reports unless `force_refresh=True`
### 3. Date Object Conversion
#### Service Boundary Conversion
```python
from datetime import date

# Service receives string dates from agents
def get_context(self, symbol: str, start_date: str, end_date: str, frequency: str, force_refresh: bool) -> FundamentalContext:
    # Validate date strings
    try:
        start_dt = date.fromisoformat(start_date)
        end_dt = date.fromisoformat(end_date)
    except ValueError as e:
        raise ValueError(f"Invalid date format: {e}") from e
    # Check date order
    if end_dt < start_dt:
        raise ValueError(f"End date {end_date} is before start date {start_date}")
    # Use date objects when calling FinnhubClient
    data = self.finnhub_client.get_balance_sheet(symbol, frequency, end_dt)
```
### 4. Error Recovery and Partial Data
```python
def handle_partial_statements(
    self,
    balance_sheet: dict | None,
    income_statement: dict | None,
    cash_flow: dict | None
) -> FundamentalContext:
    """
    Create context even if some statements are missing.

    - If all statements fail: Raise exception
    - If some statements succeed: Return partial context
    - Mark missing statements in metadata
    """
    statements = [balance_sheet, income_statement, cash_flow]
    # Exception type is a placeholder; use the service's own error class if one exists
    if all(s is None for s in statements):
        raise ValueError("All statements failed; cannot build FundamentalContext")
    metadata = {
        "has_balance_sheet": balance_sheet is not None,
        "has_income_statement": income_statement is not None,
        "has_cash_flow": cash_flow is not None,
        "partial_data": any(s is None for s in statements),
    }
    # Convert available statements to FinancialStatement objects
    # Return FundamentalContext with available data
```
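
To make the three rules in the docstring concrete, here is a self-contained sketch that can be run without the service. `PartialContext` is a hypothetical dataclass standing in for `FundamentalContext`, statements are kept as raw dictionaries rather than `FinancialStatement` objects, and `ValueError` is an assumed exception type.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PartialContext:
    """Hypothetical stand-in for FundamentalContext."""
    balance_sheet: Optional[dict[str, Any]]
    income_statement: Optional[dict[str, Any]]
    cash_flow: Optional[dict[str, Any]]
    metadata: dict[str, Any] = field(default_factory=dict)


def build_partial_context(
    balance_sheet: Optional[dict[str, Any]],
    income_statement: Optional[dict[str, Any]],
    cash_flow: Optional[dict[str, Any]],
) -> PartialContext:
    statements = {
        "balance_sheet": balance_sheet,
        "income_statement": income_statement,
        "cash_flow": cash_flow,
    }
    # Rule 1: nothing to return if every statement failed.
    if all(value is None for value in statements.values()):
        raise ValueError("All three statements are missing")

    # Rule 3: record which statements are present.
    metadata: dict[str, Any] = {
        f"has_{name}": value is not None for name, value in statements.items()
    }
    metadata["partial_data"] = any(value is None for value in statements.values())

    # Rule 2: return whatever is available.
    return PartialContext(**statements, metadata=metadata)


ctx = build_partial_context({"total_assets": 1.0}, None, None)
```

Here `ctx.metadata["partial_data"]` is `True` and only `has_balance_sheet` is set, so a consuming agent can tell which statements to skip.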
### 5. Pydantic Validation
#### Context Structure
```python
from typing import Any

from pydantic import BaseModel, validator

class FundamentalContext(BaseModel):
    symbol: str
    period: dict[str, str]  # {"start": "2024-01-01", "end": "2024-01-31"}
    balance_sheet: FinancialStatement | None
    income_statement: FinancialStatement | None
    cash_flow: FinancialStatement | None
    key_ratios: dict[str, float]
    metadata: dict[str, Any]

    @validator('period')
    def validate_period(cls, v):
        # Ensure start and end dates are present and valid
        return v
## Implementation Tasks
### Phase 1: Fix Critical Issues
1. **Date Conversion Fix**
- Add `date.fromisoformat()` conversion in service methods
- Add date validation (format, order)
- Update all `FinnhubClient` method calls to use `date` objects
- File: `tradingagents/services/fundamental_data_service.py:153, 164, 175`
2. **Client Reference Fix**
- Replace `self.simfin_client` with `self.finnhub_client`
- File: `tradingagents/services/fundamental_data_service.py:375`
### Phase 2: Enhanced Local-First Strategy
3. **Gap Detection Logic**
- Implement `detect_fundamental_gaps()` method
- Calculate expected report dates based on frequency
- Compare with cached data to find gaps
- Handle fiscal year variations
4. **Partial Data Handling**
- Implement `handle_partial_statements()` method
- Continue processing if some statements succeed
- Mark missing data in metadata
- Only fail if all statements fail
### Phase 3: Type Safety & Validation
5. **Comprehensive Type Checking**
- Run `mise run typecheck` - must pass with 0 errors
- Validate all `date` object conversions
- Ensure Pydantic model compliance
6. **Enhanced Testing**
- Update existing tests for new date handling
- Add gap detection test scenarios
- Test partial data scenarios
- Test force refresh behavior
- Test date validation edge cases
## Testing Scenarios
### Integration Tests
1. **Gap Detection**
- Test with empty cache (should fetch all)
- Test with partial cache (should fetch only missing)
- Test with complete cache (should fetch none)
2. **Partial Data Recovery**
- Test when balance sheet API fails but others succeed
- Test when only one statement type is available
- Test when all APIs fail (should raise exception)
3. **Date Handling**
- Test invalid date formats
- Test end_date < start_date
- Test boundary conditions (year start/end)
4. **Force Refresh**
- Test that force_refresh=True clears cache
- Test that new data is fetched and stored
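
The three gap-detection scenarios can be checked with a fake client that records which report dates it was asked for. The sketch below is illustrative only: `RecordingClient`, `fill_gaps`, and `calls_for` are hypothetical helpers, a dictionary stands in for the repository, and the `get_balance_sheet` signature follows the client interface listed earlier in this document.

```python
from datetime import date
from typing import Any


class RecordingClient:
    """Hypothetical fake client that records every requested report date."""

    def __init__(self) -> None:
        self.calls: list[date] = []

    def get_balance_sheet(self, symbol: str, frequency: str, report_date: date) -> dict[str, Any]:
        self.calls.append(report_date)
        return {"symbol": symbol, "report_date": report_date.isoformat()}


def fill_gaps(client: RecordingClient, cache: dict[str, dict], symbol: str, expected: list[str]) -> None:
    """Fetch only the expected report dates that the cache does not hold."""
    for iso_date in expected:
        if iso_date not in cache:
            cache[iso_date] = client.get_balance_sheet(
                symbol, "quarterly", date.fromisoformat(iso_date)
            )


EXPECTED = ["2024-03-31", "2024-06-30"]


def calls_for(cache: dict[str, dict]) -> list[str]:
    """Run fill_gaps against a fresh fake client and return the dates it was asked for."""
    client = RecordingClient()
    fill_gaps(client, cache, "AAPL", EXPECTED)
    return [d.isoformat() for d in client.calls]


empty_calls = calls_for({})                                            # fetches all
partial_calls = calls_for({"2024-03-31": {}})                          # fetches only missing
complete_calls = calls_for({"2024-03-31": {}, "2024-06-30": {}})       # fetches none
```

Asserting on the recorded calls, rather than on the returned data, verifies that the cache actually prevented the API request.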
## Success Criteria
### Functional Requirements
- Service successfully calls `FinnhubClient` with `date` objects
- Gap detection correctly identifies missing reports
- Partial data scenarios handled gracefully
- Local-first strategy works: checks cache → identifies gaps → fetches missing → stores updates
- Returns properly validated `FundamentalContext` to agents
- Force refresh bypasses cache and refreshes data
### Technical Requirements
- Zero type checking errors: `mise run typecheck`
- Zero linting errors: `mise run lint`
- All existing tests pass
- No runtime errors with date conversions
- Proper error messages for validation failures
### Quality Requirements
- Strongly-typed interfaces between all components
- Comprehensive error handling and logging
- Efficient caching with minimal API calls
- Clear separation of concerns between service, client, and repository
## Dependencies
### Completed
- `FinnhubClient` with `date` object interface
- `FundamentalDataRepository` with dataclass storage
- `FundamentalContext` Pydantic model
### Required
- Working `FinnhubClient` instance with valid API key
- Writable data directory for repository storage
## Timeline
### Immediate (Today)
- Fix critical date conversion and reference issues
- Implement basic gap detection
- Add date validation
### Next Steps
- Implement partial data handling
- Comprehensive testing
- Integration with agent workflows
## Acceptance Criteria
### Must Have
1. **Type Safety**: Service passes `mise run typecheck` with zero errors
2. **Client Integration**: All `FinnhubClient` calls use `date` objects correctly
3. **Gap Detection**: Correctly identifies missing report periods
4. **Partial Data**: Service returns partial context when some statements fail
5. **Local-First**: Service checks repository before API calls
6. **Context Validation**: Returns valid `FundamentalContext` with Pydantic validation
7. **Error Handling**: Graceful handling of API failures and missing data
### Should Have
1. **Cache Efficiency**: Minimal redundant API calls
2. **Force Refresh**: Complete cache bypass when requested
3. **Data Quality**: Metadata indicating data completeness
4. **Clear Error Messages**: Informative errors for date validation failures
### Nice to Have
1. **Performance Metrics**: Timing and cache hit rate logging
2. **Fiscal Year Handling**: Support for non-calendar fiscal years
3. **Bulk Operations**: Fetch multiple symbols efficiently
---
This PRD covers completing the `FundamentalDataService` as a strongly-typed, local-first data service that integrates with the existing `FinnhubClient` and `FundamentalDataRepository` components and provides gap detection and partial data handling.