MarketData Domain - PostgreSQL Migration Specification
Feature Overview
Feature: MarketData Domain PostgreSQL Migration
Status: In progress (domain ~85% complete; PostgreSQL integration remaining)
Priority: High (foundational infrastructure for AI agents)
This specification defines the migration of the MarketData domain from CSV-based storage to PostgreSQL + TimescaleDB + pgvectorscale integration, while preserving 100% API compatibility and delivering 10x performance improvements for AI agent operations.
User Stories
Primary User Story
As a Dagster pipeline and AI Agent, I want to collect daily OHLC data from yfinance, insider data from FinnHub, and fundamental data from FinnHub with PostgreSQL + TimescaleDB storage, so that agents have high-performance, RAG-enhanced market data access for comprehensive trading analysis.
Supporting User Stories
Agent Performance
- As an AI Agent, I want market data queries to complete in under 100ms, so that real-time trading analysis is efficient
- As a Technical Analyst Agent, I want vector similarity search for historical patterns, so that pattern-based trading decisions are context-aware
Data Pipeline Reliability
- As a Dagster pipeline, I want atomic data ingestion with PostgreSQL ACID transactions, so that data integrity is guaranteed during bulk operations
- As a Risk Management Agent, I want comprehensive audit trails for all market data access, so that trading decisions are fully traceable
Acceptance Criteria
Migration Compatibility
- AC1: GIVEN the MarketData domain migration WHEN PostgreSQL + TimescaleDB integration is complete THEN all existing MarketDataService APIs remain 100% compatible with 10x performance improvement
Data Collection Pipeline
- AC2: GIVEN daily market data collection WHEN Dagster pipelines execute THEN OHLC data from yfinance and insider/fundamental data from FinnHub are stored in TimescaleDB hypertables
Performance Requirements
- AC3: GIVEN historical market data queries WHEN AI agents request technical analysis THEN responses are delivered within 100ms using TimescaleDB time-series optimization
- AC4: GIVEN technical analysis requests WHEN agents query indicators THEN all 20 existing TA-Lib indicators are preserved with PostgreSQL-backed data access
RAG Integration
- AC5: GIVEN RAG-powered analysis WHEN agents search for historical patterns THEN vector similarity search using pgvectorscale returns relevant market conditions within 200ms
Scalability
- AC6: GIVEN concurrent agent operations WHEN multiple agents access market data THEN PostgreSQL async operations support concurrent reads without file system limitations
Data Quality
- AC7: GIVEN data quality requirements WHEN market data is collected THEN comprehensive validation, audit trails, and error handling maintain data integrity with PostgreSQL ACID transactions
Business Rules
API Preservation
- BR1: Preserve 100% API compatibility with existing MarketDataService for seamless migration
- BR2: Maintain all existing method signatures in FundamentalDataService and InsiderDataService
Data Collection Standards
- BR3: Daily automated collection from yfinance (OHLC) and FinnHub (insider + fundamentals) via Dagster pipelines
- BR4: FinnHub API rate limiting compliance with proper backoff strategies
- BR5: Graceful degradation when external APIs are unavailable
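The backoff behavior required by BR4 and BR5 can be sketched as an exponential schedule with full jitter. This is an illustrative helper, not part of the existing codebase; the function name and parameters are assumptions:

```python
import random


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0) -> list:
    """Exponential backoff schedule with full jitter.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    which spreads retries out when many pipeline runs hit the FinnHub
    rate limit at the same time.
    """
    return [random.uniform(0.0, min(cap, base * 2 ** attempt)) for attempt in range(max_retries)]
```

A Dagster op would sleep for each delay in turn between FinnHub retries, and fall back to gracefully skipping the ticker once the schedule is exhausted.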
Database Architecture
- BR6: TimescaleDB hypertables for market_data, fundamental_data, and insider_data tables
- BR7: Vector embeddings generation for technical analysis patterns using pgvectorscale
Performance Standards
- BR8: Sub-100ms query performance for common market data operations
- BR9: Data retention policy: 10 years for OHLC, 5 years for fundamentals, 3 years for insider data
Audit and Compliance
- BR10: Comprehensive audit logging for all data collection and agent queries
Technical Implementation Details
Architecture Pattern
Router → Service → Repository → Entity → Database
The migration preserves the existing service interfaces while upgrading the underlying data persistence layer.
Database Schema Design
TimescaleDB Hypertables
```sql
-- Market Data (OHLC)
-- Note: TimescaleDB requires the time column in every unique constraint
-- on a hypertable, so each primary key below is composite with it.
CREATE TABLE market_data (
    id SERIAL,
    symbol VARCHAR(10) NOT NULL,
    date TIMESTAMPTZ NOT NULL,
    open DECIMAL(12,4),
    high DECIMAL(12,4),
    low DECIMAL(12,4),
    close DECIMAL(12,4),
    adj_close DECIMAL(12,4),
    volume BIGINT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (id, date)
);

SELECT create_hypertable('market_data', 'date', chunk_time_interval => INTERVAL '1 month');

-- Fundamental Data
CREATE TABLE fundamental_data (
    id SERIAL,
    symbol VARCHAR(10) NOT NULL,
    report_date TIMESTAMPTZ NOT NULL,
    period_type VARCHAR(20), -- annual, quarterly
    metric_name VARCHAR(100),
    metric_value DECIMAL(20,4),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (id, report_date)
);

SELECT create_hypertable('fundamental_data', 'report_date', chunk_time_interval => INTERVAL '3 months');

-- Insider Data
CREATE TABLE insider_data (
    id SERIAL,
    symbol VARCHAR(10) NOT NULL,
    transaction_date TIMESTAMPTZ NOT NULL,
    person_name VARCHAR(200),
    position VARCHAR(100),
    transaction_type VARCHAR(50),
    shares BIGINT,
    price DECIMAL(12,4),
    value DECIMAL(20,4),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (id, transaction_date)
);

SELECT create_hypertable('insider_data', 'transaction_date', chunk_time_interval => INTERVAL '1 month');
```
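BR9's retention periods can be applied with TimescaleDB's `add_retention_policy`. The following sketch builds the policy statements from a retention map; the helper name and the dict constant are illustrative, but the table names match the hypertables above:

```python
# Retention periods per BR9 (table -> interval).
RETENTION = {
    "market_data": "10 years",
    "fundamental_data": "5 years",
    "insider_data": "3 years",
}


def retention_statements(retention: dict) -> list:
    """Build one add_retention_policy call per hypertable (TimescaleDB 2.x)."""
    return [
        f"SELECT add_retention_policy('{table}', INTERVAL '{interval}');"
        for table, interval in retention.items()
    ]
```

These statements would run once as part of the Phase 1 schema setup; TimescaleDB then drops expired chunks on a background schedule.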
Vector Storage for RAG
```sql
-- Technical Indicators with Vector Embeddings
CREATE TABLE technical_indicators (
    id SERIAL PRIMARY KEY,
    symbol VARCHAR(10) NOT NULL,
    date TIMESTAMPTZ NOT NULL,
    indicator_name VARCHAR(50),
    indicator_value DECIMAL(12,6),
    pattern_embedding vector(384), -- OpenRouter embeddings
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON technical_indicators USING hnsw (pattern_embedding vector_cosine_ops);
```
SQLAlchemy Entity Models
```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional

import pandas as pd


@dataclass
class MarketDataEntity:
    symbol: str
    date: datetime
    open: Optional[Decimal] = None
    high: Optional[Decimal] = None
    low: Optional[Decimal] = None
    close: Optional[Decimal] = None
    adj_close: Optional[Decimal] = None
    volume: Optional[int] = None
    id: Optional[int] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

    @classmethod
    def from_yfinance_data(cls, symbol: str, row: pd.Series) -> "MarketDataEntity":
        """Convert yfinance data to an entity."""
        ...

    @classmethod
    def from_database_record(cls, record: dict) -> "MarketDataEntity":
        """Convert a database record to an entity (used by the repository)."""
        ...

    def to_database_record(self) -> dict:
        """Convert the entity to a database record."""
        ...

    def validate(self) -> None:
        """Validate entity data integrity."""
        ...
```
Repository Migration
```python
from datetime import datetime
from typing import Dict, List


class MarketDataRepository:
    """PostgreSQL + TimescaleDB repository with async operations"""

    def __init__(self, database_manager: DatabaseManager):
        self.db = database_manager

    async def get_ohlc_data(
        self,
        symbol: str,
        start_date: datetime,
        end_date: datetime,
    ) -> List[MarketDataEntity]:
        """Retrieve OHLC data with TimescaleDB optimization"""
        query = """
            SELECT * FROM market_data
            WHERE symbol = $1 AND date BETWEEN $2 AND $3
            ORDER BY date DESC
        """
        rows = await self.db.fetch(query, symbol, start_date, end_date)
        return [MarketDataEntity.from_database_record(row) for row in rows]

    async def bulk_upsert_market_data(
        self,
        entities: List[MarketDataEntity],
    ) -> int:
        """Atomic bulk upsert for Dagster pipelines"""
        ...

    async def find_similar_patterns(
        self,
        pattern_embedding: List[float],
        limit: int = 10,
    ) -> List[Dict]:
        """RAG-powered pattern matching using pgvectorscale"""
        # <=> is cosine *distance*: lower values mean more similar patterns.
        query = """
            SELECT symbol, date, indicator_name, indicator_value,
                   pattern_embedding <=> $1 AS distance
            FROM technical_indicators
            ORDER BY pattern_embedding <=> $1
            LIMIT $2
        """
        return await self.db.fetch(query, pattern_embedding, limit)
```
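The `bulk_upsert_market_data` stub could be backed by an `INSERT ... ON CONFLICT` statement executed via `executemany`. This sketch assumes asyncpg-style `$n` placeholders and a `UNIQUE (symbol, date)` constraint, which is not in the schema above and would need to be added:

```python
from typing import List, Tuple

# Upsert keyed on (symbol, date); assumes a UNIQUE constraint on those columns.
UPSERT_SQL = """
    INSERT INTO market_data (symbol, date, open, high, low, close, adj_close, volume)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    ON CONFLICT (symbol, date) DO UPDATE SET
        open = EXCLUDED.open,
        high = EXCLUDED.high,
        low = EXCLUDED.low,
        close = EXCLUDED.close,
        adj_close = EXCLUDED.adj_close,
        volume = EXCLUDED.volume,
        updated_at = NOW()
"""


def upsert_params(entities: List) -> List[Tuple]:
    """Build executemany() parameter tuples from MarketDataEntity-like objects."""
    return [
        (e.symbol, e.date, e.open, e.high, e.low, e.close, e.adj_close, e.volume)
        for e in entities
    ]
```

Running the `executemany` inside a single transaction gives the atomicity AC7 requires: either the whole daily batch lands, or none of it does.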
Service Compatibility Layer
```python
import numpy as np
import pandas as pd


class MarketDataService:
    """Preserved API with PostgreSQL backend"""

    def __init__(self, repository: MarketDataRepository, yfinance_client: YFinanceClient):
        self.repository = repository
        self.yfinance_client = yfinance_client

    async def get_stock_data(self, symbol: str, period: str = "1y") -> pd.DataFrame:
        """100% compatible with the existing API signature"""
        ...  # implementation backed by the PostgreSQL repository

    async def calculate_technical_indicators(
        self,
        symbol: str,
        indicators: List[str],
    ) -> Dict[str, np.ndarray]:
        """Preserve all 20 TA-Lib indicators with PostgreSQL data"""
        ...

    async def get_trading_style_preset(self, style: str) -> Dict:
        """Preserved trading style presets with enhanced performance"""
        ...
```
Vector RAG Enhancement
```python
class MarketDataRAGService:
    """RAG-powered market analysis enhancement"""

    async def find_historical_patterns(
        self,
        current_indicators: Dict[str, float],
        lookback_days: int = 30,
    ) -> List[Dict]:
        """Vector similarity search for historical patterns"""
        ...

    async def generate_pattern_embedding(
        self,
        indicator_values: Dict[str, float],
    ) -> List[float]:
        """Generate embeddings using OpenRouter for pattern matching"""
        ...
```
Migration Components
Phase 1: Database Schema & Entities
- SQLAlchemy Entity Models
  - MarketDataEntity for OHLC data
  - FundamentalDataEntity for financial statements
  - InsiderDataEntity for SEC transactions
  - TechnicalIndicatorEntity for calculated values
- TimescaleDB Setup
  - Hypertable creation for time-series optimization
  - Proper indexing strategy
  - Vector extension configuration
Phase 2: Repository Migration
- Async PostgreSQL Operations
  - Follow news domain patterns for consistency
  - Connection pooling and transaction management
  - Error handling and retry logic
- Data Migration Scripts
  - CSV to PostgreSQL data transfer
  - Data validation and integrity checks
  - Performance optimization
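The CSV-to-PostgreSQL transfer is simplest to reason about as streaming plus batching: read legacy files row by row, then upsert in fixed-size batches so each batch is one transaction. A sketch with illustrative names:

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def batched(rows: Iterable, size: int) -> Iterator[List]:
    """Yield fixed-size batches so each bulk upsert runs in one transaction."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def read_ohlc_csv(path: Path) -> Iterator[dict]:
    """Stream rows from a legacy CSV file without loading it all into memory."""
    with path.open(newline="") as f:
        yield from csv.DictReader(f)
```

A migration script would then validate each row, convert it to an entity, and hand each batch to `bulk_upsert_market_data`, logging rejected rows for the integrity checks listed above.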
Phase 3: Service Preservation
- API Compatibility
  - Maintain all existing method signatures
  - Preserve return types and data formats
  - Performance optimization through PostgreSQL
- Vector RAG Integration
  - Pattern embedding generation
  - Similarity search capabilities
  - Historical context enhancement
Phase 4: Testing & Integration
- Comprehensive Testing
  - Real PostgreSQL database for repository tests
  - Preserved pytest-vcr for API clients
  - Service compatibility validation
- Agent Integration
  - AgentToolkit RAG capabilities
  - Performance benchmarking
  - Concurrent access testing
Dependencies
Ready Dependencies
- YFinanceClient and FinnhubClient: Fully implemented and tested
- PostgreSQL + TimescaleDB + pgvectorscale: Database infrastructure established
- News domain PostgreSQL patterns: Migration templates available
- DatabaseManager: Async operations and connection management ready
- OpenRouter configuration: Vector embeddings generation available
Planned Dependencies
- Dagster orchestration: Framework for daily data collection pipelines
Success Criteria
Performance Metrics
- 10x query performance improvement over CSV-based storage
- Sub-100ms market data operations for common agent queries
- Sub-200ms RAG queries for vector similarity search
- Support for 500+ tickers with concurrent agent access
Compatibility Standards
- 100% existing API preservation without breaking changes
- Seamless migration without agent disruption
- Efficient bulk data ingestion for Dagster pipelines
Quality Assurance
- 85%+ test coverage maintained across all components
- Comprehensive data validation and audit trails
- PostgreSQL ACID transactions for data integrity
Architecture Alignment
This migration aligns with the multi-agent trading framework vision by providing:
- High-performance market data foundation for sophisticated agent analysis
- RAG-powered historical context for pattern-based trading decisions
- Scalable concurrent access supporting multiple agents simultaneously
- Comprehensive audit trails for regulatory compliance and risk management
- Time-series optimization for efficient technical analysis operations
The migration follows established news domain patterns to ensure architectural consistency across the entire TradingAgents framework.