TradingAgents/spec-lite.md at c20771bf2050533df22570afd6492721e8afa221


Summary

Complete implementation of social media data collection from Reddit with LLM sentiment analysis and vector embeddings for AI agent RAG integration.

Core Requirements

Data Collection

Daily Reddit collection from financial subreddits (wallstreetbets, investing, stocks, SecurityAnalysis)
OpenRouter LLM sentiment analysis with confidence scoring
Vector embeddings for semantic similarity search
PostgreSQL storage with TimescaleDB + pgvectorscale optimization

Agent Integration

AgentToolkit methods: get_reddit_news() and get_reddit_stock_info()
RAG-enhanced queries with < 2 second response time
Vector similarity search for contextual social media insights

Technical Implementation

Architecture Pattern

Router → Service → Repository → Entity → Database (matching news domain)

Database Schema

social_media_posts (
    post_id, ticker, subreddit, title, content, author,
    created_at, upvotes, comment_count,
    sentiment_score, sentiment_label, sentiment_confidence,
    embedding vector(1536), -- pgvector type, indexed via pgvectorscale
    data_quality_score, processing_status
)

Key Components

1. RedditClient

PRAW integration with rate limiting
Financial subreddit targeting
Ticker-specific post filtering
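Ticker-specific filtering could be sketched as below. This is a minimal illustration, not the project's implementation: `TICKER_PATTERN`, `mentions_ticker`, and `filter_posts_by_ticker` are hypothetical names, posts are assumed to be plain dicts with `title` and `content` keys, and only upper-case mentions (with or without a `$` prefix) are matched.

```python
import re

# Matches a cashtag ("$TSLA") or a standalone upper-case token ("TSLA"), 1-5 letters.
TICKER_PATTERN = re.compile(r"(?<![A-Za-z0-9$])\$?([A-Z]{1,5})(?![A-Za-z0-9])")


def mentions_ticker(text: str, ticker: str) -> bool:
    """Return True if `text` mentions `ticker` as a cashtag or standalone token."""
    wanted = ticker.upper()
    return any(match.group(1) == wanted for match in TICKER_PATTERN.finditer(text))


def filter_posts_by_ticker(posts: list[dict], ticker: str) -> list[dict]:
    """Keep posts whose title or body mentions the ticker."""
    return [
        post for post in posts
        if mentions_ticker(f"{post.get('title', '')} {post.get('content', '')}", ticker)
    ]
```

Matching whole tokens keeps `TSLAQ` from counting as a `TSLA` mention. Short tickers that are also English words (for example `A` or `IT`) would still need extra handling.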

2. SentimentAnalyzer

OpenRouter LLM integration
Structured sentiment scoring (-1.0 to +1.0)
Financial context awareness
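One way to enforce the -1.0 to +1.0 range is to validate the model's reply before it reaches storage. The sketch below assumes the LLM is prompted to answer with JSON containing `score` and `confidence` keys. That reply format, the `SentimentResult` type, and the ±0.2 neutral band are illustrative assumptions, not part of this spec.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentResult:
    score: float       # -1.0 (bearish) to +1.0 (bullish)
    label: str         # "negative", "neutral", or "positive"
    confidence: float  # 0.0 to 1.0


def label_for(score: float, neutral_band: float = 0.2) -> str:
    """Map a score to a label; the neutral band width is an assumed threshold."""
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    return "neutral"


def parse_sentiment(raw_reply: str) -> SentimentResult:
    """Parse a reply assumed to look like {"score": 0.6, "confidence": 0.9}."""
    payload = json.loads(raw_reply)
    # Clamp out-of-range values instead of trusting the model's arithmetic.
    score = max(-1.0, min(1.0, float(payload["score"])))
    confidence = max(0.0, min(1.0, float(payload["confidence"])))
    return SentimentResult(score=score, label=label_for(score), confidence=confidence)
```

Malformed replies raise `json.JSONDecodeError` or `KeyError`, which the caller can use to set `processing_status` accordingly.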

3. SocialRepository

PostgreSQL with deduplication by post_id
Vector similarity search using pgvectorscale
TimescaleDB time-series optimization
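The repository contract (deduplication by post_id plus similarity search) can be illustrated without a database. The in-memory class below is a hypothetical stand-in, useful mainly for showing the expected behavior. The real SocialRepository would delegate both operations to PostgreSQL, and the method names here are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemorySocialRepository:
    """Hypothetical stand-in for SocialRepository, keyed by post_id."""

    def __init__(self) -> None:
        self._posts: dict[str, dict] = {}

    def upsert(self, post: dict) -> bool:
        """Store a post; return False if the post_id was already present."""
        is_new = post["post_id"] not in self._posts
        self._posts[post["post_id"]] = post
        return is_new

    def search(self, ticker: str, query_embedding: list[float], limit: int = 5) -> list[dict]:
        """Return up to `limit` posts for `ticker`, most similar first."""
        candidates = [p for p in self._posts.values() if p["ticker"] == ticker]
        candidates.sort(
            key=lambda p: cosine_similarity(p["embedding"], query_embedding),
            reverse=True,
        )
        return candidates[:limit]
```

The linear scan is fine for a test double. At production volume the same ordering would come from an index-backed nearest-neighbor query.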

4. SocialMediaService

Orchestrates collection pipeline: Reddit → Sentiment → Embeddings → Storage
Provides ticker-specific social context
Calculates aggregate sentiment metrics
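The spec does not define which aggregate metrics are calculated. As one plausible reading, the sketch below computes a confidence-weighted mean score and per-label counts from rows shaped like the schema columns. The function name and the returned keys are assumptions.

```python
from collections import Counter


def aggregate_sentiment(posts: list[dict]) -> dict:
    """Summarise per-post sentiment into ticker-level metrics."""
    total_weight = sum(p["sentiment_confidence"] for p in posts)
    if not posts or total_weight == 0:
        # Nothing to weight by: report a neutral score rather than divide by zero.
        return {"post_count": len(posts), "weighted_score": 0.0, "label_counts": {}}
    weighted = sum(p["sentiment_score"] * p["sentiment_confidence"] for p in posts)
    return {
        "post_count": len(posts),
        "weighted_score": weighted / total_weight,
        "label_counts": dict(Counter(p["sentiment_label"] for p in posts)),
    }
```

Weighting by confidence lets low-confidence classifications pull the aggregate less than confident ones.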

5. AgentToolkit Integration

from typing import Optional

async def get_reddit_news(ticker: str, days: int = 7) -> str:
    """Return formatted social media context with sentiment analysis."""
    ...

async def get_reddit_stock_info(ticker: str, query: Optional[str] = None) -> str:
    """Return semantic search results with sentiment aggregation."""
    ...

Implementation Scope

Complete Implementation ✅

PostgreSQL migration from file storage
Reddit API client (currently empty stub)
SQLAlchemy entities with vector fields
LLM sentiment analysis pipeline
Vector embedding generation and search
Dagster pipeline for scheduled collection
Comprehensive test coverage (pytest-vcr for APIs)

Current Status

Basic stub implementation; all components require a complete rebuild

Dependencies

Reddit API credentials (PRAW)
OpenRouter API access
PostgreSQL with TimescaleDB + pgvectorscale
Existing TradingAgentsConfig
News domain patterns for consistency

Data Flow

Dagster pipeline triggers daily collection
RedditClient fetches posts from financial subreddits
SentimentAnalyzer processes posts via OpenRouter LLM
EmbeddingGenerator creates vector embeddings
SocialRepository stores in PostgreSQL with deduplication
AI Agents query via AgentToolkit with RAG-enhanced context
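The collection steps above could be wired together roughly as follows. This is a sketch under stated assumptions: each stage is passed in as a plain callable (hypothetical signatures), posts are dicts, and `store` returns False for a duplicate post_id. The Dagster scheduling layer is omitted.

```python
from typing import Callable, Iterable


def run_collection(
    fetch_posts: Callable[[], Iterable[dict]],
    analyze: Callable[[str], dict],
    embed: Callable[[str], list[float]],
    store: Callable[[dict], bool],
) -> dict:
    """Run fetch -> sentiment -> embedding -> storage and report counts."""
    stats = {"fetched": 0, "stored": 0, "duplicates": 0, "failed": 0}
    for post in fetch_posts():
        stats["fetched"] += 1
        text = f"{post['title']}\n{post.get('content', '')}"
        try:
            enriched = {**post, **analyze(text), "embedding": embed(text)}
        except Exception:
            # One bad post (e.g. an LLM timeout) should not abort the daily run.
            stats["failed"] += 1
            continue
        stats["stored" if store(enriched) else "duplicates"] += 1
    return stats
```

Injecting the stages as callables keeps the orchestration testable without Reddit, OpenRouter, or PostgreSQL access.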

Testing Strategy

pytest-vcr for Reddit API mocking
Real PostgreSQL for repository integration tests
Service mocks for business logic testing
85%+ coverage matching project standards
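A service-mock test might look like the sketch below, which uses only `unittest.mock.AsyncMock` from the standard library. The wrapper shown is a hypothetical stand-in for the toolkit method: the injected `service` argument, its `get_recent_posts` method, and the output format are assumptions made for illustration.

```python
import asyncio
from unittest.mock import AsyncMock


async def get_reddit_news(service, ticker: str, days: int = 7) -> str:
    """Hypothetical toolkit wrapper delegating to an injected service."""
    posts = await service.get_recent_posts(ticker, days)
    if not posts:
        return f"No Reddit posts found for {ticker} in the last {days} days."
    lines = [f"- [{p['sentiment_label']}] {p['title']}" for p in posts]
    return f"Reddit posts for {ticker} (last {days} days):\n" + "\n".join(lines)


def test_get_reddit_news_formats_posts() -> None:
    # The mocked service replaces Reddit, the LLM, and the database in one step.
    service = AsyncMock()
    service.get_recent_posts.return_value = [
        {"title": "TSLA beats estimates", "sentiment_label": "positive"}
    ]
    result = asyncio.run(get_reddit_news(service, "TSLA", days=3))
    service.get_recent_posts.assert_awaited_once_with("TSLA", 3)
    assert "[positive] TSLA beats estimates" in result
```

Under pytest the function is collected by its `test_` prefix. With an async test plugin installed, the `asyncio.run` call could be replaced by an async test function.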

Success Criteria

Daily automated Reddit collection with sentiment analysis
Sub-2-second agent queries with vector search
Seamless RAG integration matching news domain patterns
Production-ready reliability with comprehensive error handling


Social Media Domain - Specification Lite