TradingAgents/docs/product/product.md

150 lines
8.1 KiB
Markdown

# TradingAgents Product Definition
## Product Overview
**TradingAgents** is a personal fork of the multi-agent LLM financial trading framework designed for individual trading research and data infrastructure development. This fork focuses on PostgreSQL + TimescaleDB + pgvectorscale architecture with RAG-powered agents for enhanced decision making through historical context and pattern recognition.
## Target User
### Primary User
- **Single Developer/Researcher**: Individual focused on personal trading research, strategy development, and building robust data infrastructure for financial analysis
### Use Cases
- **Personal Trading Research**: Developing and testing proprietary trading strategies with AI-powered analysis
- **Data Infrastructure Development**: Building scalable time-series and vector search capabilities for financial data
- **RAG Implementation**: Experimenting with retrieval-augmented generation for context-aware trading decisions
- **Academic Research**: Individual research projects exploring AI applications in financial markets
## Core Value Proposition
This personal fork transforms the original TradingAgents framework into a focused research and development platform that:
- **Enables Personal Research**: Provides a complete data infrastructure for individual trading research and strategy development
- **Implements Modern Architecture**: PostgreSQL + TimescaleDB + pgvectorscale stack for efficient time-series and vector operations
- **Supports RAG-Powered Decisions**: Agents leverage historical context through vector similarity search for informed decisions
- **Streamlines Data Collection**: Automated daily/twice-daily data pipelines with Dagster orchestration
- **Unifies LLM Access**: Single OpenRouter integration for consistent model access across all agents
## Key Features
### Enhanced Data Architecture
- **PostgreSQL Foundation**: Robust relational database for structured financial data
- **TimescaleDB Integration**: Optimized time-series storage and querying for market data
- **pgvectorscale Extension**: High-performance vector search for RAG and similarity matching
- **Automated Migrations**: Database schema versioning and management
### RAG-Powered Multi-Agent System
- **Context-Aware Analysis**: Agents use vector similarity search to find relevant historical patterns
- **Enhanced Decision Making**: Retrieval-augmented generation provides historical context for trading decisions
- **Pattern Recognition**: Semantic similarity matching for comparable market conditions
- **Learning from History**: Agents reference past decisions and outcomes for improved analysis
### Automated Data Collection
- **Dagster Orchestration**: Daily/twice-daily data collection pipelines with monitoring and alerting
- **Quality Assurance**: Automated data validation, gap detection, and backfill capabilities
- **Domain Coverage**: Comprehensive data collection for news (95% complete), market data, and social media domains
- **Scalable Processing**: Efficient batch processing with dependency management
### Unified LLM Provider
- **OpenRouter Integration**: Single provider for all model access, reducing API complexity
- **Cost Optimization**: Strategic model selection with clear separation between analysis and data processing models
- **Model Flexibility**: Easy switching between different models through OpenRouter's unified interface
## Business Context
### Research Focus Areas
- **Individual Strategy Development**: Personal trading algorithm research and backtesting
- **Data Infrastructure**: Building scalable financial data storage and retrieval systems
- **AI/ML in Finance**: Experimenting with RAG, vector search, and multi-agent systems
- **Time-Series Analysis**: Advanced market data analysis with TimescaleDB optimization
### Technical Advantages
- **Modern Data Stack**: PostgreSQL + TimescaleDB + pgvectorscale provides production-grade data infrastructure
- **RAG Implementation**: Real-world application of retrieval-augmented generation in financial decision making
- **Comprehensive Testing**: Maintains 85%+ test coverage with pragmatic TDD approach
- **Scalable Architecture**: Domain-driven design supports extensibility and maintainability
### Development Metrics
- **Code Quality**: 85%+ test coverage, comprehensive type checking, automated formatting
- **Data Pipeline Health**: Automated monitoring and alerting for data collection processes
- **Performance**: Optimized queries with TimescaleDB, fast vector search with pgvectorscale
- **Maintainability**: Clean architecture patterns, comprehensive documentation
## Technical Constraints
### Requirements
- **Database**: PostgreSQL with TimescaleDB and pgvectorscale extensions
- **Python Environment**: Python 3.13+ with comprehensive dependency management
- **API Access**: OpenRouter API key for LLM access, optional FinnHub for real-time data
- **Infrastructure**: Docker Compose for local development, Dagster for data orchestration
### Architectural Decisions
- **Single Developer Focus**: Optimized for individual use rather than multi-user collaboration
- **PostgreSQL-First**: All data persistence through PostgreSQL with appropriate extensions
- **OpenRouter Exclusive**: Unified LLM provider reduces complexity and improves consistency
- **Domain Completion**: Sequential domain development (news 95% → marketdata → socialmedia)
## Project Scope
### Current Implementation Status
- **News Domain**: 95% complete with comprehensive article scraping and sentiment analysis
- **Core Infrastructure**: PostgreSQL + TimescaleDB + pgvectorscale foundation established
- **Agent Framework**: RAG-powered agents with vector search capabilities
- **Data Pipelines**: Dagster orchestration for automated data collection
### Included Features
- Complete PostgreSQL-based data architecture with time-series and vector extensions
- RAG-enhanced multi-agent analysis framework with historical context
- Automated data collection pipelines with Dagster orchestration
- OpenRouter integration for unified LLM access
- Comprehensive test suite with domain-specific testing strategies
- CLI interface for interactive analysis and debugging
### Excluded Features
- Multi-user collaboration features
- Real money trading capabilities
- Production-grade risk management for live trading
- Multiple database backend support
- Legacy LLM provider integrations (focus on OpenRouter only)
## Development Phases
### Phase 1: News Domain Completion (Current - 95% Complete)
- Finalize news article scraping and processing
- Complete sentiment analysis pipeline
- Optimize news data storage and retrieval
- Implement comprehensive testing for news domain
### Phase 2: Market Data Domain + PostgreSQL Migration
- Complete market data collection and processing
- Implement TimescaleDB optimizations for price data
- Add technical analysis calculations
- Migrate all data persistence to PostgreSQL
### Phase 3: Social Media Domain
- Implement Reddit and Twitter data collection
- Add social sentiment analysis
- Complete the three-domain architecture
- Optimize cross-domain data relationships
### Phase 4: Dagster Pipeline Implementation
- Daily/twice-daily data collection automation
- Comprehensive monitoring and alerting
- Data quality validation and gap detection
- Performance optimization and scaling
### Phase 5: RAG Enhancement and OpenRouter Migration
- Complete RAG implementation for all agents
- Migrate to OpenRouter as sole LLM provider
- Optimize vector search performance
- Implement advanced pattern recognition
## Success Criteria
This personal fork is successful when it provides:
- **Robust Data Infrastructure**: PostgreSQL + TimescaleDB + pgvectorscale handling all financial data efficiently
- **Intelligent Decision Making**: RAG-powered agents making context-aware trading recommendations
- **Reliable Data Collection**: Automated pipelines collecting high-quality data consistently
- **Research Capability**: Complete platform for individual trading strategy research and development
- **Maintainable Codebase**: 85%+ test coverage with clear architecture and comprehensive documentation
The fork serves as both a practical trading research platform and a demonstration of modern data architecture patterns applied to financial AI systems.