MarketData Domain - PostgreSQL Migration (Lite Spec)
Migration Overview
Project: Migrate the MarketData domain (85% complete) from CSV storage to PostgreSQL + TimescaleDB + pgvectorscale
Objective: 10x query performance and new RAG capabilities, with 100% API compatibility preserved
Pattern: Follow the news domain's PostgreSQL implementation for architectural consistency
Key Requirements
Performance Targets
- Sub-100ms market data queries (10x improvement over the current CSV storage)
- Sub-200ms RAG vector similarity search
- Support 500+ tickers with concurrent agent access
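The spec does not say how the latency targets are measured. One option, sketched under assumptions, is to issue one call per ticker concurrently (approximating many agents at once) and report the 95th-percentile latency. `concurrent_p95_ms` and the `query` callable are hypothetical names; `query` stands in for any async repository or service call that takes a symbol.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Sequence

# Hypothetical harness: `query` stands in for any async repository/service call taking a symbol.
Query = Callable[[str], Awaitable[object]]


async def _timed_ms(query: Query, symbol: str) -> float:
    """Wall-clock latency of one awaited call, in milliseconds."""
    start = time.perf_counter()
    await query(symbol)
    return (time.perf_counter() - start) * 1000.0


async def concurrent_p95_ms(query: Query, symbols: Sequence[str]) -> float:
    """Issue one call per symbol concurrently and return the 95th-percentile latency."""
    samples = await asyncio.gather(*(_timed_ms(query, s) for s in symbols))
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples, n=20)[-1]


async def _demo() -> None:
    async def fake_query(symbol: str) -> None:
        await asyncio.sleep(0.001)  # placeholder for a real database round trip

    symbols = [f"T{i:03d}" for i in range(500)]
    p95 = await concurrent_p95_ms(fake_query, symbols)
    print(f"p95 over {len(symbols)} tickers: {p95:.1f} ms (target < 100 ms)")


if __name__ == "__main__":
    asyncio.run(_demo())
```

Because all calls share one event loop, the measured latency includes scheduling contention, which is closer to what concurrent agents would experience than timing calls one at a time.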
API Preservation (Critical)
- MarketDataService: All existing methods preserved
- FundamentalDataService: Complete compatibility maintained
- InsiderDataService: Zero breaking changes
- 20 TA-Lib indicators: Full functionality preserved
Data Sources & Collection
- yfinance: Daily OHLCV data ingested via Dagster pipelines
- Finnhub: Insider transactions + fundamental data
- TimescaleDB hypertables: market_data, fundamental_data, insider_data
- Vector storage: pgvectorscale for RAG pattern matching
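The spec does not define what a "pattern embedding" is. As one hypothetical scheme, a fixed window of closing prices can be turned into z-normalised daily returns, so the same chart shape matches regardless of price level; similarity is then cosine distance, the quantity pgvector's `<=>` operator computes. `pattern_embedding` and `cosine_distance` are illustrative names, not part of the existing codebase.

```python
import math
from typing import List, Sequence


def pattern_embedding(closes: Sequence[float]) -> List[float]:
    """Hypothetical embedding: z-normalised daily returns over a fixed price window.

    Normalising makes the vector scale-free, so the same chart shape at $20 or $200
    maps to the same embedding.
    """
    if len(closes) < 3:
        raise ValueError("need at least 3 closing prices")
    returns = [cur / prev - 1.0 for prev, cur in zip(closes, closes[1:])]
    mean = sum(returns) / len(returns)
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / len(returns))
    if std == 0.0:
        # All returns identical (e.g. a flat window): there is no shape to encode.
        return [0.0] * len(returns)
    return [(r - mean) / std for r in returns]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's `<=>` operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as maximally dissimilar rather than dividing by zero.
    return 1.0 - dot / norm if norm else 1.0


if __name__ == "__main__":
    a = pattern_embedding([100.0, 101.0, 99.0, 102.0])
    b = pattern_embedding([200.0, 202.0, 198.0, 204.0])  # same shape, doubled price
    print(cosine_distance(a, b))
```

A real deployment would likely use a richer embedding (indicator values, longer windows, or a learned model); this sketch only shows the shape of the data that would land in a vector column.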
Technical Implementation
Database Schema (TimescaleDB)
-- Hypertables for time-series optimization
market_data (symbol, date, ohlc, volume) - 10-year retention
fundamental_data (symbol, report_date, metrics) - 5-year retention
insider_data (symbol, transaction_date, person, shares) - 3-year retention
technical_indicators (symbol, date, values, pattern_embedding) - pattern_embedding supports RAG
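The hypertable and retention settings above map onto TimescaleDB's `create_hypertable` and `add_retention_policy` functions, and the vector index onto pgvectorscale's `diskann` access method. The sketch below only generates the SQL; it assumes the plain tables already exist with the listed columns, that any unique keys include the time column (a TimescaleDB requirement), and that `pattern_embedding` is a pgvector `vector` column. `technical_indicators` is left as an ordinary table here because the spec gives it no retention period. The function and index names are hypothetical.

```python
from typing import List

# (time column, retention) per hypertable, taken from the schema listing above.
HYPERTABLES = {
    "market_data": ("date", "10 years"),
    "fundamental_data": ("report_date", "5 years"),
    "insider_data": ("transaction_date", "3 years"),
}


def timescale_setup_sql() -> List[str]:
    """Statements to run once, after the plain tables have been created."""
    stmts = [
        "CREATE EXTENSION IF NOT EXISTS timescaledb;",
        # CASCADE also installs pgvector, which pgvectorscale depends on.
        "CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;",
    ]
    for table, (time_col, keep) in HYPERTABLES.items():
        stmts.append(
            f"SELECT create_hypertable('{table}', '{time_col}', if_not_exists => TRUE);"
        )
        # Background job that drops whole chunks older than the retention window.
        stmts.append(
            f"SELECT add_retention_policy('{table}', INTERVAL '{keep}', if_not_exists => TRUE);"
        )
    # pgvectorscale DiskANN index for cosine-distance pattern search.
    stmts.append(
        "CREATE INDEX IF NOT EXISTS technical_indicators_embedding_idx "
        "ON technical_indicators USING diskann (pattern_embedding vector_cosine_ops);"
    )
    return stmts


if __name__ == "__main__":
    print("\n".join(timescale_setup_sql()))
```

Keeping the statements idempotent (`IF NOT EXISTS`, `if_not_exists => TRUE`) lets the same setup run safely from a migration script or test fixture.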
Entity Models
- MarketDataEntity: OHLC + validation + database conversion
- FundamentalDataEntity: Financial statement data
- InsiderDataEntity: SEC transaction records
- TechnicalIndicatorEntity: Calculated values + vector embeddings
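To make "OHLC + validation + database conversion" concrete, here is a minimal sketch of `MarketDataEntity` as a frozen dataclass. This deliberately swaps in a plain dataclass for the SQLAlchemy entity named in Phase 1 so it runs without a database; the validation rules and the `to_record`/`from_record` method names are assumptions.

```python
import datetime as dt
from dataclasses import astuple, dataclass
from typing import Tuple


@dataclass(frozen=True)
class MarketDataEntity:
    """One daily OHLCV bar; a plain-dataclass stand-in for the SQLAlchemy entity."""

    symbol: str
    date: dt.date
    open: float
    high: float
    low: float
    close: float
    volume: int

    def __post_init__(self) -> None:
        # Validation: reject bars that cannot be real market data.
        if not self.symbol:
            raise ValueError("symbol must be non-empty")
        if self.volume < 0:
            raise ValueError("volume must be non-negative")
        if self.low > min(self.open, self.close) or self.high < max(self.open, self.close):
            raise ValueError(f"inconsistent OHLC for {self.symbol} on {self.date}")

    def to_record(self) -> Tuple:
        """Database conversion: column order (symbol, date, open, high, low, close, volume)."""
        return astuple(self)

    @classmethod
    def from_record(cls, record: Tuple) -> "MarketDataEntity":
        """Inverse of to_record; validation runs again on the way in."""
        return cls(*record)
```

Running validation in `__post_init__` means a bad bar is rejected both when ingested and when read back, so downstream indicator code can assume consistent OHLC values.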
Repository Pattern (Async PostgreSQL)
from __future__ import annotations
from typing import Dict, List

class MarketDataRepository:
    async def get_ohlc_data(self, symbol, start, end) -> List[MarketDataEntity]: ...
    async def bulk_upsert_market_data(self, entities) -> int: ...  # Dagster ingestion
    async def find_similar_patterns(self, embedding, limit) -> List[Dict]: ...  # RAG
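One way the upsert and RAG methods could be fleshed out is sketched below. It assumes an asyncpg-style connection (`executemany()` and `fetch()` with `$n` placeholders), a unique key on `(symbol, date)`, and that embeddings are passed in pgvector's text format and cast in SQL; the real `DatabaseManager` API may differ, and the SQL constant names are hypothetical. `ON CONFLICT ... DO UPDATE` makes ingestion idempotent, and `ORDER BY ... <=> ... LIMIT` is the query shape a pgvector-compatible index can serve.

```python
import asyncio
from typing import Any, Dict, List, Sequence, Tuple

UPSERT_SQL = """
INSERT INTO market_data (symbol, date, open, high, low, close, volume)
VALUES ($1, $2, $3, $4, $5, $6, $7)
ON CONFLICT (symbol, date) DO UPDATE SET
    open = EXCLUDED.open, high = EXCLUDED.high, low = EXCLUDED.low,
    close = EXCLUDED.close, volume = EXCLUDED.volume
"""

# `<=>` is pgvector's cosine-distance operator; ORDER BY + LIMIT lets a vector index serve it.
SIMILAR_SQL = """
SELECT symbol, date, pattern_embedding <=> $1::text::vector AS distance
FROM technical_indicators
ORDER BY pattern_embedding <=> $1::text::vector
LIMIT $2
"""


class MarketDataRepository:
    def __init__(self, conn: Any) -> None:
        # Assumed asyncpg-style connection exposing executemany() and fetch().
        self._conn = conn

    async def bulk_upsert_market_data(self, records: Sequence[Tuple]) -> int:
        """Idempotent ingestion: re-running a Dagster partition overwrites, never duplicates."""
        latest: Dict[Tuple, Tuple] = {}
        for rec in records:
            latest[(rec[0], rec[1])] = rec  # last bar wins per (symbol, date)
        await self._conn.executemany(UPSERT_SQL, list(latest.values()))
        return len(latest)

    async def find_similar_patterns(self, embedding: Sequence[float], limit: int = 10) -> List[Dict]:
        literal = "[" + ",".join(repr(float(x)) for x in embedding) + "]"  # pgvector text format
        rows = await self._conn.fetch(SIMILAR_SQL, literal, limit)
        return [dict(row) for row in rows]


if __name__ == "__main__":
    class _FakeConn:  # stands in for a real database connection in this demo
        async def executemany(self, sql: str, args: list) -> None:
            print(f"upserting {len(args)} rows")

        async def fetch(self, sql: str, *args: Any) -> list:
            return [{"symbol": "DEMO", "distance": 0.0}]

    repo = MarketDataRepository(_FakeConn())
    asyncio.run(repo.bulk_upsert_market_data([("DEMO", "2024-01-02", 1.0, 2.0, 0.5, 1.5, 10)]))
    print(asyncio.run(repo.find_similar_patterns([0.1, -0.2], limit=5)))
```

The records are plain tuples in `(symbol, date, open, high, low, close, volume)` order, so an entity's database-conversion method can feed them in directly.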
Service Layer (100% Compatible)
from typing import Dict
import pandas as pd

class MarketDataService:
    async def get_stock_data(self, symbol, period) -> pd.DataFrame: ...  # preserved API
    async def calculate_technical_indicators(self, symbol, indicators) -> Dict: ...  # 20 TA-Lib indicators
    async def get_trading_style_preset(self, style) -> Dict: ...  # existing presets
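The preserved `get_stock_data(symbol, period)` signature takes a period while the repository takes `(start, end)`, so the service needs a translation step. The sketch below assumes `period` uses yfinance-style strings such as "5d", "6mo" or "1y" and approximates months as 30 days and years as 365; `period_to_range` is a hypothetical helper, and values like "max" or "ytd" are deliberately rejected rather than guessed.

```python
import datetime as dt
import re
from typing import Optional, Tuple

# Hypothetical approximation of yfinance-style period strings ("5d", "2wk", "6mo", "1y").
_DAYS_PER_UNIT = {"d": 1, "wk": 7, "mo": 30, "y": 365}


def period_to_range(period: str, today: Optional[dt.date] = None) -> Tuple[dt.date, dt.date]:
    """Translate the preserved `period` argument into the repository's (start, end)."""
    end = today or dt.date.today()
    match = re.fullmatch(r"(\d+)(d|wk|mo|y)", period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    count, unit = int(match.group(1)), match.group(2)
    return end - dt.timedelta(days=count * _DAYS_PER_UNIT[unit]), end


if __name__ == "__main__":
    print(period_to_range("6mo", dt.date(2024, 6, 30)))
```

Inside the service, `get_stock_data` would then call `period_to_range(period)`, pass the result to `get_ohlc_data(symbol, start, end)`, and build the DataFrame with the same columns the CSV-backed version returned, so callers see no difference.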
Migration Strategy
Phase 1: Entities & Schema
- Create SQLAlchemy entities following the news domain patterns
- Set up TimescaleDB hypertables with proper indexing
- Configure pgvectorscale for vector embeddings
Phase 2: Repository Migration
- Implement async PostgreSQL repositories (mirroring the NewsRepository pattern)
- Create data migration scripts (CSV → PostgreSQL)
- Add vector embedding generation for RAG
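A CSV → PostgreSQL migration script mostly needs two pieces: parsing legacy rows into typed records and feeding them to the bulk upsert in bounded batches. This sketch assumes each legacy file holds one symbol with `Date,Open,High,Low,Close,Volume` headers (yfinance-style); the actual CSV layout in the repository may differ, and `read_legacy_csv` and `batched` are hypothetical names.

```python
import csv
import datetime as dt
import io
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, dt.date, float, float, float, float, int]


def read_legacy_csv(fh: TextIO, symbol: str) -> Iterator[Record]:
    """Parse one per-symbol CSV into (symbol, date, open, high, low, close, volume) tuples."""
    for row in csv.DictReader(fh):
        yield (
            symbol.upper(),
            dt.date.fromisoformat(row["Date"][:10]),  # tolerate "2024-01-02 00:00:00-05:00"
            float(row["Open"]),
            float(row["High"]),
            float(row["Low"]),
            float(row["Close"]),
            int(float(row["Volume"])),  # some exports write volume as "1000.0"
        )


def batched(records: Iterable[Record], size: int = 1000) -> Iterator[List[Record]]:
    """Group records so each bulk upsert call stays a bounded size."""
    batch: List[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    # Placeholder values, not real market data.
    sample = io.StringIO("Date,Open,High,Low,Close,Volume\n2024-01-02,10,12,9,11,1000\n")
    for batch in batched(read_legacy_csv(sample, "demo"), size=500):
        print(len(batch), batch[0])
```

Because the parser is a generator, a large history file is streamed batch by batch instead of being loaded into memory, and re-running the script is safe when the target upsert is idempotent.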
Phase 3: Service Preservation
- Update services to use PostgreSQL repositories
- Maintain exact API signatures and return types
- Add RAG-enhanced pattern analysis capabilities
Phase 4: Integration & Testing
- Test repositories against a real PostgreSQL instance
- Keep the existing pytest-vcr tests for YFinanceClient/FinnhubClient
- Validate 100% API compatibility with existing agents
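API compatibility can be checked mechanically rather than by inspection. The sketch below compares every public method of a legacy class against its migrated counterpart using `inspect`, flagging missing methods, changed signatures, and sync/async changes (an agent that calls without `await` breaks if a method becomes a coroutine). `api_diff` is a hypothetical helper; in a pytest test one might assert `api_diff(LegacyMarketDataService, MarketDataService) == []`, where the legacy class name is likewise an assumption.

```python
import inspect
from typing import List


def api_diff(legacy: type, migrated: type) -> List[str]:
    """List public methods of `legacy` whose name, signature, or sync/async kind changed."""
    problems: List[str] = []
    for name, old in inspect.getmembers(legacy, inspect.isfunction):
        if name.startswith("_"):
            continue  # private helpers are free to change
        new = getattr(migrated, name, None)
        if new is None:
            problems.append(f"missing method: {name}")
            continue
        old_sig, new_sig = inspect.signature(old), inspect.signature(new)
        if old_sig != new_sig:
            problems.append(f"signature changed: {name}{old_sig} -> {name}{new_sig}")
        if inspect.iscoroutinefunction(old) != inspect.iscoroutinefunction(new):
            problems.append(f"sync/async changed: {name}")
    return problems


if __name__ == "__main__":
    class Old:
        async def get_stock_data(self, symbol, period): ...

    class New:
        async def get_stock_data(self, symbol, period): ...

    print(api_diff(Old, New))
```

Signature comparison includes parameter names, defaults, and annotations, so it also catches renamed keyword arguments that agents may pass by name.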
Ready Dependencies
- YFinanceClient + FinnhubClient (fully implemented)
- PostgreSQL + TimescaleDB + pgvectorscale (already set up)
- DatabaseManager async operations (available)
- News domain patterns for consistency (reference implementation)
Success Metrics
- Performance: 10x query improvement, sub-100ms operations
- Compatibility: Zero API breaking changes, seamless agent migration
- Scalability: 500+ concurrent tickers, efficient bulk ingestion
- Quality: 85%+ test coverage, comprehensive validation
Implementation Approach
Follow news domain patterns → Create entities → Migrate repositories → Preserve service APIs → Enhance with vector RAG → Integrate Dagster pipelines
This migration gives multi-agent trading analysis a high-performance, RAG-enabled market data foundation while keeping full backward compatibility.