# Product Requirements Document: MarketDataService Completion

## Overview

Complete the `MarketDataService` to provide strongly-typed market data and technical indicators to trading agents using a local-first data strategy with gap detection and intelligent caching.

## Current State Analysis

### Issues to Fix
- **CRITICAL**: Service depends on `BaseClient`; the existing `YFinanceClient` should be used instead, once it is refactored to the FinnhubClient standard
- **CRITICAL**: Service calls client methods with string dates instead of `date` objects
- **CRITICAL**: Technical analysis calculations must use the `stockstats` library instead of legacy utils
- **CRITICAL**: `MarketDataRepository` exists but is missing the service interface methods
- No strongly-typed interface between `YFinanceClient` and the service
- `YFinanceClient` uses `BaseClient` inheritance and string dates (needs refactoring)
- No concrete gap detection logic
- No data sufficiency validation for technical indicators

### What Works
- ✅ Local-first data strategy implementation (`_get_price_data_local_first`)
- ✅ Force refresh logic (`_fetch_and_cache_fresh_data`)
- ✅ `MarketDataContext` Pydantic model for agent consumption
- ✅ Error handling and metadata creation patterns
- ✅ `YFinanceClient` exists with yfinance SDK integration and comprehensive methods
- ✅ `MarketDataRepository` exists with CSV storage and pandas DataFrame operations
- ✅ Service structure ready for `stockstats` integration for technical analysis

## Technical Requirements

### 1. Strongly-Typed Interfaces

#### Client → Service Interface
```python
# YFinanceClient methods (to be refactored)
def get_historical_data(self, symbol: str, start_date: date, end_date: date) -> dict[str, Any]: ...
def get_price_data(self, symbol: str, start_date: date, end_date: date) -> dict[str, Any]: ...

# Technical analysis is handled in the service layer using stockstats.
# No get_technical_indicator method is needed in the client; indicators are calculated from OHLCV data.
```

#### Service → Repository Interface
```python
# MarketDataRepository methods (to be implemented)
def has_data_for_period(self, symbol: str, start_date: str, end_date: str) -> bool: ...
def get_data(self, symbol: str, start_date: str, end_date: str) -> dict[str, Any]: ...
def store_data(self, symbol: str, cache_data: dict, overwrite: bool) -> bool: ...
def clear_data(self, symbol: str, start_date: str, end_date: str) -> bool: ...
```

#### Service → Agent Interface
```python
# Service output (already defined)
def get_context(self, symbol: str, start_date: str, end_date: str, indicators: list[str], force_refresh: bool) -> MarketDataContext: ...
```

### 2. Local-First Data Strategy

#### Flow
1. **Repository Lookup**: Check `MarketDataRepository.has_data_for_period()`
2. **Gap Detection**: Identify missing price data periods using `detect_market_gaps()`
3. **Data Sufficiency Check**: Ensure enough historical data for requested indicators
4. **Selective Fetching**: Fetch only missing data from `YFinanceClient`
5. **Cache Updates**: Store new data via `repository.store_data()`
6. **Context Assembly**: Return validated `MarketDataContext`

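The flow above can be sketched in miniature as follows. This is an illustrative sketch only: `load_local_first`, the in-memory `cache` dict, and the `fetch` callable are hypothetical stand-ins for `MarketDataRepository` and `YFinanceClient`. It treats weekends as the only non-trading days and fetches a single span from the first to the last missing day rather than one request per gap.

```python
from datetime import date, timedelta
from typing import Callable

Row = dict[str, object]  # one OHLCV record, e.g. {"date": "2024-01-02", "close": 154.0}


def load_local_first(
    cache: dict[str, Row],
    start: date,
    end: date,
    fetch: Callable[[date, date], list[Row]],
) -> list[Row]:
    """Serve rows from the cache, fetching only the span that is missing."""
    # Weekdays in the requested range (holidays are ignored in this sketch).
    wanted = [
        start + timedelta(days=offset)
        for offset in range((end - start).days + 1)
        if (start + timedelta(days=offset)).weekday() < 5
    ]
    # Repository lookup and gap detection: which wanted days are not cached?
    missing = [day for day in wanted if day.isoformat() not in cache]
    if missing:
        # Selective fetch: one call covering the first to the last missing day.
        for row in fetch(missing[0], missing[-1]):
            cache[str(row["date"])] = row  # cache update
    # Context assembly: return whatever is now available, in date order.
    return [cache[day.isoformat()] for day in wanted if day.isoformat() in cache]
```

A second call for the same range should be served entirely from `cache` without invoking `fetch`.
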
#### Gap Detection Implementation
```python
def detect_market_gaps(self, cached_dates: list[str], requested_start: str, requested_end: str) -> list[tuple[str, str]]:
    """
    Returns list of (start, end) tuples for missing periods.

    Example: If requesting 2024-01-01 to 2024-01-31 and cache has:
    - 2024-01-01 to 2024-01-10
    - 2024-01-20 to 2024-01-25
    Returns: [("2024-01-11", "2024-01-19"), ("2024-01-26", "2024-01-31")]

    Accounts for:
    - Weekends (Saturday/Sunday)
    - Market holidays
    - Continuous date ranges to minimize API calls
    """
    # Implementation should use pandas business day logic
```

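One possible shape for this logic, shown as a standalone function rather than a method, is sketched below. It is not the required implementation. The `holidays` parameter is an assumption added for illustration, since the signature above does not say where the holiday calendar comes from. Trading days are generated with `pandas.bdate_range` using a custom business-day frequency, and consecutive missing trading days are merged into one range.

```python
from collections.abc import Sequence

import pandas as pd


def detect_market_gaps(
    cached_dates: list[str],
    requested_start: str,
    requested_end: str,
    holidays: Sequence[str] = (),  # hypothetical: ISO dates of market holidays
) -> list[tuple[str, str]]:
    """Return (start, end) ranges of trading days that are absent from the cache."""
    # Expected trading days: weekdays minus the supplied holidays.
    trading_days = pd.bdate_range(
        requested_start, requested_end, freq="C", holidays=list(holidays)
    )
    cached = set(cached_dates)
    gaps: list[tuple[str, str]] = []
    run: list[str] = []  # current run of consecutive missing trading days
    for day in trading_days.strftime("%Y-%m-%d"):
        if day in cached:
            if run:
                gaps.append((run[0], run[-1]))
                run = []
        else:
            run.append(day)
    if run:
        gaps.append((run[0], run[-1]))
    return gaps
```

Because runs are built over trading days only, a gap that spans a weekend or holiday stays a single range, which keeps the number of API calls down.
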
#### Force Refresh Support
- `force_refresh=True` bypasses local data completely
- Clears existing cache before fetching fresh data
- Stores refreshed data with metadata indicating refresh

#### Cache Invalidation Strategy
- **Historical data is immutable**: Data for any date before today never changes
- **Today's data needs updates**: During market hours, refresh when the last update is more than 15 minutes old
- **After market close**: Today's data becomes immutable
```python
from datetime import date, datetime, timedelta

def is_data_stale(self, data_date: date, last_updated: datetime) -> bool:
    today = date.today()
    if data_date < today:
        return False  # Historical data is never stale

    # For today's data, stale only while the market is open and the last update is > 15 min old.
    # Compare timedeltas directly (timedelta exposes no `.minutes` attribute).
    # is_market_open() is a helper that still has to be defined.
    if is_market_open() and datetime.now() - last_updated > timedelta(minutes=15):
        return True
    return False
```

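The staleness rules depend on an `is_market_open()` helper that this document does not define. A minimal sketch is given below under loudly stated assumptions: US equities, the regular 09:30 to 16:00 New York session, no holiday calendar, and a timezone-aware `now` injected by the caller so the rules can be tested deterministically. The constants and the extra `now` parameter are illustrative, not part of the existing service.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

MARKET_TZ = ZoneInfo("America/New_York")  # assumption: US equities
MARKET_OPEN = time(9, 30)                 # assumption: regular session only
MARKET_CLOSE = time(16, 0)
REFRESH_INTERVAL = timedelta(minutes=15)


def is_market_open(now: datetime) -> bool:
    """True on weekdays between 09:30 and 16:00 New York time (holidays ignored)."""
    local = now.astimezone(MARKET_TZ)
    return local.weekday() < 5 and MARKET_OPEN <= local.time() < MARKET_CLOSE


def is_data_stale(data_date: date, last_updated: datetime, now: datetime) -> bool:
    """Apply the three cache rules using an injected, timezone-aware clock."""
    today = now.astimezone(MARKET_TZ).date()
    if data_date < today:
        return False  # historical data is immutable
    if not is_market_open(now):
        return False  # outside market hours, today's data is not refreshed
    return now - last_updated > REFRESH_INTERVAL
```

Passing `now` explicitly avoids patching `datetime.now()` in the cache staleness tests.
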
### 3. Date Object Conversion

#### Service Boundary Conversion
```python
from datetime import date

# Service receives string dates from agents
def get_context(self, symbol: str, start_date: str, end_date: str, indicators: list[str], force_refresh: bool) -> MarketDataContext:
    # Validate date strings
    try:
        start_dt = date.fromisoformat(start_date)
        end_dt = date.fromisoformat(end_date)
    except ValueError as e:
        raise ValueError(f"Invalid date format: {e}") from e

    # Check date order
    if end_dt < start_dt:
        raise ValueError(f"End date {end_date} is before start date {start_date}")

    # Expand date range for technical indicators
    expanded_start = self._calculate_lookback_start(start_dt, indicators)

    # Use date objects when calling YFinanceClient
    response = self.yfinance_client.get_historical_data(symbol, expanded_start, end_dt)
    price_data = response["data"]  # OHLCV records (see YFinanceClient Response Format)

    # Calculate technical indicators using stockstats library
    technical_indicators = self._calculate_technical_indicators(price_data, indicators)

    # Context assembly and return are omitted from this excerpt
```

### 4. Technical Analysis with Stockstats

#### Data Sufficiency Validation
```python
from datetime import date, timedelta

# Minimum data points required for each indicator
INDICATOR_REQUIREMENTS = {
    "sma_20": 20,
    "sma_200": 200,
    "ema_12": 24,  # 2x for exponential smoothing
    "ema_200": 400,
    "rsi_14": 28,  # 2x period for warm-up
    "macd": 34,  # 26-period slow EMA + 9-period signal line - 1
    "bb_upper": 20,  # Based on 20-period SMA
    "atr_14": 28,  # 2x period for accuracy
    "stochrsi_14": 42,  # 3x period for double smoothing
}

def _calculate_lookback_start(self, start_date: date, indicators: list[str]) -> date:
    """Calculate how far back we need data to compute indicators accurately."""
    max_lookback = 0
    for indicator in indicators:
        lookback = INDICATOR_REQUIREMENTS.get(indicator, 0)
        max_lookback = max(max_lookback, lookback)

    # The lookback is counted in trading days; scale by 1.5 to get calendar days
    # with a buffer for weekends/holidays
    calendar_days_back = int(max_lookback * 1.5)
    return start_date - timedelta(days=calendar_days_back)

def _validate_data_sufficiency(self, data_points: int, indicators: list[str]) -> dict[str, bool]:
    """Check if we have enough data for each indicator."""
    return {
        indicator: data_points >= INDICATOR_REQUIREMENTS.get(indicator, 0)
        for indicator in indicators
    }
```

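The 1.5x multiplier is a heuristic. To show what it approximates, the sketch below steps back an exact number of trading days using only the standard library. `business_days_before` and its `holidays` parameter are hypothetical names introduced for this comparison; they are not part of the service.

```python
from datetime import date, timedelta


def business_days_before(
    start: date, count: int, holidays: frozenset[date] = frozenset()
) -> date:
    """Step back `count` trading days from `start`, skipping weekends and holidays."""
    current = start
    remaining = count
    while remaining > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# 200 trading days before Wednesday 2024-01-31, ignoring holidays,
# is exactly 40 weeks (280 calendar days) earlier.
exact_start = business_days_before(date(2024, 1, 31), 200)
heuristic_start = date(2024, 1, 31) - timedelta(days=int(200 * 1.5))
```

Here the heuristic reaches back 300 calendar days against an exact 280, leaving roughly 20 days of slack for holidays within the window.
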
#### Stockstats Integration
```python
import logging

import pandas as pd
from stockstats import StockDataFrame

logger = logging.getLogger(__name__)

def _calculate_technical_indicators(self, price_data: list[dict], indicators: list[str]) -> dict[str, list[dict]]:
    """
    Calculate technical indicators using stockstats library.

    Args:
        price_data: OHLCV data from YFinanceClient
        indicators: List of requested indicators (e.g., ['rsi_14', 'macd', 'bb_upper', 'sma_20'])

    Returns:
        Dict mapping indicator names to time series data
    """
    # Convert price data to pandas DataFrame
    df = pd.DataFrame(price_data)
    df['date'] = pd.to_datetime(df['date'])
    df.set_index('date', inplace=True)

    # Check data sufficiency
    sufficiency = self._validate_data_sufficiency(len(df), indicators)

    # Create StockDataFrame for technical analysis
    sdf = StockDataFrame.retype(df)

    # Calculate requested indicators
    indicator_data = {}
    for indicator in indicators:
        if not sufficiency[indicator]:
            logger.warning(f"Insufficient data for {indicator}, need {INDICATOR_REQUIREMENTS[indicator]} points")
            indicator_data[indicator] = []
            continue

        try:
            # stockstats derives indicator columns lazily on access, so index
            # directly instead of checking `indicator in sdf.columns` first.
            # The name must be one stockstats recognises (e.g. 'rsi_14', 'macd');
            # aliases such as 'sma_20' or 'bb_upper' may need mapping first.
            values = sdf[indicator].dropna()
            indicator_data[indicator] = [
                {"date": idx.strftime("%Y-%m-%d"), "value": float(val)}
                for idx, val in values.items()
            ]
        except Exception as e:
            logger.warning(f"Failed to calculate {indicator}: {e}")
            indicator_data[indicator] = []

    return indicator_data
```

### 5. Error Recovery and Partial Data

```python
def handle_partial_price_data(
    self,
    symbol: str,
    requested_start: str,
    requested_end: str,
    available_data: list[dict]
) -> MarketDataContext:
    """
    Handle cases where only partial date range is available.

    - If no data available: Raise exception
    - If partial data: Return what's available with metadata
    - Mark gaps in metadata
    """
    if not available_data:
        raise ValueError(f"No market data available for {symbol}")

    # ISO dates (YYYY-MM-DD) sort chronologically, so string min/max and comparisons are safe
    actual_start = min(d['date'] for d in available_data)
    actual_end = max(d['date'] for d in available_data)

    metadata = {
        "requested_period": {"start": requested_start, "end": requested_end},
        "actual_period": {"start": actual_start, "end": actual_end},
        "partial_data": actual_start > requested_start or actual_end < requested_end,
        "data_points": len(available_data)
    }

    # Return context with available data and metadata
```

### 6. Pydantic Validation

#### Context Structure
```python
from typing import Any

from pydantic import BaseModel, validator

# A Pydantic model must not also be decorated with @dataclass
class MarketDataContext(BaseModel):
    symbol: str
    period: dict[str, str]  # {"start": "2024-01-01", "end": "2024-01-31"}
    price_data: list[dict[str, Any]]  # OHLCV records
    technical_indicators: dict[str, list[TechnicalIndicatorData]]
    metadata: dict[str, Any]

    # Pydantic v1-style validator; under Pydantic v2 the equivalent is
    # @field_validator('price_data') combined with @classmethod
    @validator('price_data')
    def validate_price_data(cls, v):
        # Ensure OHLCV fields are present
        required_fields = {'date', 'open', 'high', 'low', 'close', 'volume'}
        for record in v:
            missing = required_fields - record.keys()
            if missing:
                raise ValueError(f"Missing required OHLCV fields: {sorted(missing)}")
        return v
```

## Implementation Tasks

### Phase 1: Refactor YFinanceClient

1. **YFinanceClient Refactoring**
   - **Refactor existing** `tradingagents/clients/yfinance_client.py`
   - Remove BaseClient inheritance
   - Update all method signatures to accept `date` objects instead of strings
   - Keep all existing functionality intact
   - Example changes:
   ```python
   # Current (wrong)
   def get_historical_data(self, symbol: str, start_date: str, end_date: str) -> dict[str, Any]:

   # Updated (correct)
   def get_historical_data(self, symbol: str, start_date: date, end_date: date) -> dict[str, Any]:
   ```

2. **Comprehensive Testing**
   - Update `tradingagents/clients/test_yfinance_client.py`
   - Test with date objects
   - Use pytest-vcr for HTTP interaction recording
   - Test error handling and edge cases

### Phase 2: Update MarketDataRepository

3. **Repository Interface Enhancement**
   - Update existing `tradingagents/repositories/market_data_repository.py`
   - Add missing service interface methods: `has_data_for_period()`, `get_data()`, `store_data()`, `clear_data()`
   - Maintain existing CSV/pandas functionality while adding service compatibility
   - Support gap detection and partial data scenarios

### Phase 3: Update MarketDataService

4. **Client Integration Fix**
   - Replace `BaseClient` dependency with `YFinanceClient`
   - File: `tradingagents/services/market_data_service.py:8, 26`
   - Update constructor to accept `yfinance_client: YFinanceClient`

5. **Date Conversion and Validation**
   - Add `date.fromisoformat()` conversion in service methods
   - Add date validation (format, order)
   - Update client calls to use date objects instead of strings
   - File: `tradingagents/services/market_data_service.py:151, 227`

6. **Technical Indicator Integration with Stockstats**
   - Implement `_calculate_technical_indicators()` method using `stockstats` library
   - Add `_calculate_lookback_start()` for data sufficiency
   - Add `_validate_data_sufficiency()` to check that enough data is available
   - Replace legacy `StockstatsUtils` integration with direct stockstats usage
   - File: `tradingagents/services/market_data_service.py:9, 43, 280-346`

### Phase 4: Type Safety & Validation

7. **Comprehensive Type Checking**
   - Run `mise run typecheck` - must pass with 0 errors
   - Validate all date object conversions
   - Ensure MarketDataContext compliance

8. **Enhanced Testing**
   - Update existing service tests for new YFinanceClient interface
   - Add gap detection test scenarios
   - Test technical indicator data sufficiency
   - Test partial data handling

## Testing Scenarios

### Integration Tests

1. **Gap Detection**
   - Test with empty cache (should fetch all)
   - Test with partial cache (should fetch only missing periods)
   - Test weekend/holiday handling

2. **Technical Indicator Sufficiency**
   - Test SMA_200 with only 100 days of data (should skip indicator)
   - Test RSI_14 with exactly 28 days (should calculate)
   - Test mixed indicators with varying data requirements

3. **Partial Data Recovery**
   - Test when API returns less data than requested
   - Test when some dates are missing (holidays)
   - Test metadata accuracy for partial data

4. **Date Handling**
   - Test invalid date formats
   - Test end_date < start_date
   - Test future dates
   - Test weekend date handling

5. **Cache Staleness**
   - Test historical data (should never refresh)
   - Test today's data during market hours (should refresh if > 15 min)
   - Test today's data after market close (should not refresh)

## Success Criteria

### Functional Requirements
- ✅ Service successfully calls refactored `YFinanceClient` with `date` objects
- ✅ Gap detection correctly identifies missing trading days
- ✅ Technical indicators validate data sufficiency before calculation
- ✅ Partial data scenarios handled gracefully
- ✅ Local-first strategy works: checks cache → identifies gaps → fetches missing → stores updates
- ✅ Returns properly validated `MarketDataContext` to agents
- ✅ Technical indicators calculated from OHLCV data using stockstats library
- ✅ Force refresh bypasses cache and refreshes data

### Technical Requirements
- ✅ Zero type checking errors: `mise run typecheck`
- ✅ Zero linting errors: `mise run lint`
- ✅ All existing tests pass with updated architecture
- ✅ No runtime errors with date conversions
- ✅ Proper error messages for validation failures

### Quality Requirements
- ✅ Strongly-typed interfaces between all components
- ✅ Official yfinance SDK and stockstats library usage
- ✅ Comprehensive error handling and logging
- ✅ Efficient caching with minimal API calls
- ✅ Clear separation of concerns between service, client, and repository

## Data Architecture

### YFinanceClient Response Format
```python
{
    "symbol": "AAPL",
    "period": {"start": "2024-01-01", "end": "2024-01-31"},
    "data": [
        {
            "date": "2024-01-02",  # Note: Jan 1 was a holiday
            "open": 150.0,
            "high": 155.0,
            "low": 149.0,
            "close": 154.0,
            "volume": 1000000,
            "adj_close": 154.0
        },
        ...
    ],
    "metadata": {
        "source": "yfinance",
        "retrieved_at": "2024-01-31T10:00:00Z",
        "data_quality": "HIGH",
        "missing_dates": ["2024-01-01", "2024-01-15"]  # Holidays
    }
}
```

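Since the client methods are declared as returning `dict[str, Any]`, one way to give this response shape a checked type is a set of `TypedDict` definitions. The sketch below is an assumption offered for illustration: the class names `PriceBar`, `Period`, `ResponseMetadata`, and `HistoricalDataResponse`, and the helper `closing_prices`, do not exist in the codebase. Only the field names are taken from the format above.

```python
from typing import TypedDict


class PriceBar(TypedDict):
    date: str  # ISO date, e.g. "2024-01-02"
    open: float
    high: float
    low: float
    close: float
    volume: int
    adj_close: float


class Period(TypedDict):
    start: str
    end: str


class ResponseMetadata(TypedDict):
    source: str
    retrieved_at: str
    data_quality: str
    missing_dates: list[str]


class HistoricalDataResponse(TypedDict):
    symbol: str
    period: Period
    data: list[PriceBar]
    metadata: ResponseMetadata


def closing_prices(response: HistoricalDataResponse) -> list[float]:
    """Example consumer: a type checker can verify each key access."""
    return [bar["close"] for bar in response["data"]]
```

At runtime a `TypedDict` is still a plain `dict`, so adopting it would not change how the repository serializes data. The benefit is limited to static checking.
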
### Technical Indicator Data Format
```python
# MarketDataContext.technical_indicators structure
{
    "rsi_14": [
        {"date": "2024-01-29", "value": 65.5},  # First valid after 28 days
        {"date": "2024-01-30", "value": 67.2},
        ...
    ],
    "sma_200": [],  # Empty if insufficient data
    "macd": [
        {"date": "2024-01-31", "value": {"macd": 2.1, "signal": 1.8, "histogram": 0.3}}
    ],
    "_metadata": {
        "indicators_calculated": ["rsi_14", "macd"],
        "indicators_skipped": {
            "sma_200": "Insufficient data: need 200 points, have 31"
        }
    }
}
```

## Dependencies

### Existing Components (Need Updates)
- ✅ `YFinanceClient` exists but needs refactoring (remove BaseClient, use date objects)
- ✅ `MarketDataRepository` exists with CSV storage but needs service interface methods
- ✅ Tests exist but need updates for new interfaces

### Required
- Official `yfinance` library for market data fetching
- `stockstats` library for technical analysis calculations
- `pandas` for date/time handling and business day calculations
- Working internet connection for live data fetching
- Writable data directory for repository storage

## Timeline

### Immediate (Phase 1)
- Refactor existing YFinanceClient to use date objects
- Remove BaseClient inheritance
- Update tests for new interface

### Phase 2-3
- Add service interface methods to MarketDataRepository
- Update MarketDataService to use refactored YFinanceClient
- Implement data sufficiency validation
- Integrate stockstats library for technical indicators

### Phase 4
- Comprehensive type checking and validation
- Integration testing with gap detection
- Performance optimization and caching efficiency

## Acceptance Criteria

### Must Have
1. **Type Safety**: Service passes `mise run typecheck` with zero errors
2. **Client Refactoring**: YFinanceClient uses date objects, no BaseClient
3. **Gap Detection**: Correctly identifies missing trading days
4. **Data Sufficiency**: Validates enough data for technical indicators
5. **Partial Data**: Service handles incomplete data gracefully
6. **Local-First**: Service checks repository before API calls
7. **Context Validation**: Returns valid `MarketDataContext` with Pydantic validation
8. **Technical Indicators**: Calculated using stockstats with proper validation

### Should Have
1. **Cache Efficiency**: Minimal redundant API calls to Yahoo Finance
2. **Force Refresh**: Complete cache bypass when requested
3. **Stale Data Handling**: Refresh today's data during market hours
4. **Clear Error Messages**: Informative errors for validation failures

### Nice to Have
1. **Performance Metrics**: Timing and cache hit rate logging
2. **Extended Indicators**: Support for 50+ technical indicators
3. **Real-time Data**: WebSocket integration for live prices
4. **Bulk Symbol Support**: Fetch multiple symbols efficiently

---

This PRD focuses on completing the `MarketDataService` as a strongly-typed, local-first data service that integrates OHLCV price data from a refactored `YFinanceClient` and calculates comprehensive technical indicators using the `stockstats` library, with robust gap detection and data sufficiency validation.