TradingAgents/docs/specs/MarketData/spec-lite.md

98 lines
4.0 KiB
Markdown

# MarketData Domain - PostgreSQL Migration (Lite Spec)
## Migration Overview
**Project**: 85% complete MarketData domain → PostgreSQL + TimescaleDB + pgvectorscale
**Objective**: 10x performance + RAG capabilities while preserving 100% API compatibility
**Pattern**: Follow news domain PostgreSQL implementation for architectural consistency
## Key Requirements
### Performance Targets
- Sub-100ms market data queries (10x improvement from CSV)
- Sub-200ms RAG vector similarity search
- Support 500+ tickers with concurrent agent access
### API Preservation (Critical)
- **MarketDataService**: All existing methods preserved
- **FundamentalDataService**: Complete compatibility maintained
- **InsiderDataService**: Zero breaking changes
- **20 TA-Lib indicators**: Full functionality preserved
### Data Sources & Collection
- **yfinance**: Daily OHLC data via Dagster pipelines
- **FinnHub**: Insider transactions + fundamental data
- **TimescaleDB hypertables**: market_data, fundamental_data, insider_data
- **Vector storage**: pgvectorscale for RAG pattern matching
## Technical Implementation
### Database Schema (TimescaleDB)
```sql
-- Hypertables for time-series optimization
market_data (symbol, date, ohlc, volume) - 10 year retention
fundamental_data (symbol, report_date, metrics) - 5 year retention
insider_data (symbol, transaction_date, person, shares) - 3 year retention
technical_indicators (symbol, date, values, pattern_embedding) - RAG support
```
### Entity Models
- **MarketDataEntity**: OHLC + validation + database conversion
- **FundamentalDataEntity**: Financial statement data
- **InsiderDataEntity**: SEC transaction records
- **TechnicalIndicatorEntity**: Calculated values + vector embeddings
### Repository Pattern (Async PostgreSQL)
```python
class MarketDataRepository:
async def get_ohlc_data(symbol, start, end) -> List[MarketDataEntity]
async def bulk_upsert_market_data(entities) -> int # Dagster ingestion
async def find_similar_patterns(embedding, limit) -> List[Dict] # RAG
```
### Service Layer (100% Compatible)
```python
class MarketDataService:
async def get_stock_data(symbol, period) -> pd.DataFrame # Preserved API
async def calculate_technical_indicators(symbol, indicators) -> Dict # 20 TA-Lib
async def get_trading_style_preset(style) -> Dict # Existing presets
```
## Migration Strategy
### Phase 1: Entities & Schema
1. Create SQLAlchemy entities following news domain patterns
2. Setup TimescaleDB hypertables with proper indexing
3. Configure pgvectorscale for vector embeddings
### Phase 2: Repository Migration
1. Implement async PostgreSQL repositories (mirror NewsRepository pattern)
2. Create data migration scripts (CSV → PostgreSQL)
3. Add vector embedding generation for RAG
### Phase 3: Service Preservation
1. Update services to use PostgreSQL repositories
2. Maintain exact API signatures and return types
3. Add RAG-enhanced pattern analysis capabilities
### Phase 4: Integration & Testing
1. Real PostgreSQL tests for repositories
2. Preserve pytest-vcr for YFinanceClient/FinnhubClient
3. Validate 100% API compatibility with existing agents
## Ready Dependencies
- YFinanceClient + FinnhubClient (fully implemented)
- PostgreSQL + TimescaleDB + pgvectorscale (established)
- DatabaseManager async operations (available)
- News domain patterns for consistency (reference implementation)
## Success Metrics
- **Performance**: 10x query improvement, sub-100ms operations
- **Compatibility**: Zero API breaking changes, seamless agent migration
- **Scalability**: 500+ concurrent tickers, efficient bulk ingestion
- **Quality**: 85%+ test coverage, comprehensive validation
## Implementation Approach
**Follow news domain patterns** → Create entities → Migrate repositories → Preserve service APIs → Enhance with vector RAG → Integrate Dagster pipelines
This migration provides the high-performance, RAG-enabled market data foundation essential for sophisticated multi-agent trading analysis while maintaining complete backward compatibility.