TradingAgents/docs/product/roadmap.md

9.2 KiB

TradingAgents Personal Fork Roadmap

Overview

This roadmap outlines the technical development path for the personal fork of TradingAgents, focusing on building a robust data infrastructure with PostgreSQL + TimescaleDB + pgvectorscale, implementing RAG-powered agents, and establishing automated data collection pipelines with Dagster.

Current Status: Phase 1 - News Domain (95% Complete)

The foundation has been established with core domain architecture, comprehensive testing framework, and the news domain nearly complete.

Completed Infrastructure

  • Domain Architecture: Clean separation of news, marketdata, and socialmedia domains
  • Testing Framework: Pragmatic TDD with 85%+ coverage, pytest-vcr for HTTP mocking
  • Repository Pattern: Efficient data caching and management system
  • News Domain: Article scraping, sentiment analysis, and storage (95% complete)
  • Basic Agent System: Multi-agent trading analysis framework with LangGraph

Development Phases

Phase 1: News Domain Completion (Current - 95% Complete)

Timeline: 2-3 weeks
Status: 🔄 In Progress

Remaining Work

  • News Processing Pipeline: Complete article content processing and deduplication
  • Sentiment Analysis Optimization: Fine-tune sentiment scoring algorithms
  • News Repository: Finalize PostgreSQL integration for news storage
  • Testing Coverage: Achieve 85%+ test coverage for news domain
  • Performance Optimization: Optimize news retrieval and search performance

Success Criteria

  • All news APIs integrated and tested
  • Sentiment analysis producing consistent scores
  • News data properly stored in PostgreSQL
  • Comprehensive test suite covering edge cases
  • News domain ready for RAG integration

Phase 2: Market Data Domain + PostgreSQL Migration (Next Priority)

Timeline: 4-6 weeks
Status: 📋 Planned

Core Objectives

  • TimescaleDB Integration: Implement hypertables for efficient time-series storage
  • Market Data Collection: Complete price, volume, and technical indicator collection
  • PostgreSQL Migration: Move all data persistence from file-based to PostgreSQL
  • Technical Analysis: Implement MACD, RSI, and other technical indicators
  • Database Schema: Design optimized schema for market data with proper indexing

Key Deliverables

  • Market data repository with TimescaleDB optimization
  • Real-time and historical price data collection
  • Technical analysis calculation engine
  • Migration scripts for moving existing data
  • Performance benchmarks for time-series queries

Success Criteria

  • Market data efficiently stored in TimescaleDB hypertables
  • Sub-100ms queries for common market data retrievals
  • All technical indicators calculating accurately
  • Complete migration from file-based storage
  • Market data domain ready for agent integration

Phase 3: Social Media Domain (Following Phase 2)

Timeline: 3-4 weeks
Status: 📋 Planned

Core Objectives

  • Reddit Integration: Implement Reddit API for financial subreddits
  • Twitter/X Integration: Add social sentiment from Twitter feeds
  • Social Sentiment Analysis: Aggregate sentiment scoring across platforms
  • Cross-Domain Relations: Link social sentiment to market data and news
  • pgvectorscale Preparation: Prepare social data for vector search

Key Deliverables

  • Reddit and Twitter data collection clients
  • Social sentiment aggregation algorithms
  • Social media data repository with PostgreSQL storage
  • Cross-domain correlation analysis tools
  • Foundation for RAG implementation

Success Criteria

  • Social media data collected from multiple sources
  • Sentiment scores integrated with market events
  • Cross-domain relationships established in database
  • Social media domain ready for RAG enhancement
  • Three-domain architecture complete

Phase 4: Dagster Data Collection Orchestration

Timeline: 3-4 weeks
Status: 📋 Planned

Core Objectives

  • Pipeline Architecture: Design daily/twice-daily data collection workflows
  • Data Quality Monitoring: Implement validation and gap detection
  • Automated Backfill: Handle missing data and API failures gracefully
  • Performance Monitoring: Track pipeline health and data freshness
  • Alerting System: Notify on pipeline failures or data quality issues

Key Deliverables

  • Dagster asset definitions for all data domains
  • Automated data quality checks and validation
  • Gap detection and backfill capabilities
  • Monitoring dashboard for pipeline health
  • Comprehensive logging and error handling

Success Criteria

  • Fully automated data collection running daily
  • Data quality monitoring with automated alerts
  • Zero-downtime pipeline updates and maintenance
  • Historical data gaps automatically detected and filled
  • Pipeline performance metrics tracked and optimized

Phase 5: RAG Implementation + OpenRouter Migration

Timeline: 4-5 weeks
Status: 📋 Planned

Core Objectives

  • pgvectorscale Integration: Implement vector storage for historical patterns
  • RAG Agent Enhancement: Agents use similarity search for context
  • OpenRouter Migration: Complete migration to unified LLM provider
  • Historical Context: Agents reference past decisions and market conditions
  • Pattern Recognition: Semantic similarity for comparable market scenarios

Key Deliverables

  • pgvectorscale extension configured and optimized
  • Vector embeddings for all historical data
  • RAG-enhanced agent decision making
  • OpenRouter integration replacing all LLM providers
  • Similarity search for historical pattern matching

Success Criteria

  • All agents using RAG for contextual decisions
  • Vector search performing sub-50ms similarity queries
  • OpenRouter as sole LLM provider across all agents
  • Agents demonstrating improved decision accuracy
  • Historical pattern matching enhancing trading analysis

Technical Milestones

Database Architecture

  • Month 1: Complete PostgreSQL foundation with news domain
  • Month 2: TimescaleDB hypertables optimized for market data
  • Month 3: pgvectorscale configured for RAG implementation
  • Month 4: Full database optimization and performance tuning

Agent Capabilities

  • Month 1: Basic multi-agent framework operational
  • Month 2: Agents using PostgreSQL for all data access
  • Month 3: Cross-domain agent collaboration established
  • Month 4: RAG-powered agents with historical context

Data Pipeline Maturity

  • Month 1: Manual data collection with basic automation
  • Month 2: Automated collection for market data
  • Month 3: Full three-domain automated collection
  • Month 4: Production-grade pipeline with monitoring and alerting

Success Metrics

Technical Excellence

  • Test Coverage: Maintain 85%+ across all domains
  • Query Performance: < 100ms for common database operations
  • Pipeline Reliability: 99%+ uptime for data collection
  • Data Quality: < 0.1% missing data points across all domains

Feature Completeness

  • Domain Coverage: 100% implementation across news, marketdata, socialmedia
  • Agent Capabilities: RAG-enhanced decision making operational
  • Data Infrastructure: Complete PostgreSQL + TimescaleDB + pgvectorscale stack
  • Automation: Fully automated data collection and processing

Development Velocity

  • Code Quality: Consistent formatting, type checking, and documentation
  • Testing Strategy: Comprehensive test suite with domain-specific approaches
  • Architecture Consistency: Clean domain separation and layered architecture
  • Performance Optimization: Regular profiling and optimization cycles

Risk Management

Technical Risks

  • Database Performance: Mitigate with proper indexing and query optimization
  • API Rate Limits: Implement intelligent backoff and caching strategies
  • Data Quality: Establish comprehensive validation and monitoring
  • Vector Search Performance: Optimize pgvectorscale configuration and queries

Development Risks

  • Scope Creep: Maintain focus on sequential domain completion
  • Technical Debt: Regular refactoring and code quality maintenance
  • Testing Coverage: Continuous integration with coverage enforcement
  • Documentation: Maintain comprehensive documentation throughout development

Long-Term Vision (6+ Months)

Advanced Capabilities

  • Strategy Backtesting: Historical strategy validation with complete data
  • Real-Time Analysis: Live market analysis with sub-second agent responses
  • Advanced RAG: Multi-modal RAG with charts, documents, and audio data
  • Performance Analytics: Comprehensive analysis of agent decision accuracy

Research Applications

  • Academic Research: Platform for publishing trading AI research
  • Strategy Development: Complete environment for developing proprietary strategies
  • Data Science: Advanced analytics and machine learning on financial data
  • Educational Use: Comprehensive learning platform for financial AI

This roadmap prioritizes building a solid data foundation before enhancing agent capabilities, ensuring each phase delivers measurable value while maintaining high code quality and comprehensive testing.