TradingAgents/docs/specs/socialmedia/design.md

# Social Media Domain - Technical Design Document

## Executive Summary

This document specifies the greenfield implementation of the Social Media domain within TradingAgents, taking it from empty stubs to a production-ready system that collects posts from financial subreddits and analyzes their sentiment. The domain gives AI agents social sentiment context for trading decisions, built on a PostgreSQL + TimescaleDB + pgvectorscale architecture with RAG-powered retrieval.

**Implementation Scope**: Complete domain implementation (0% → 100% completion)
**Architecture**: PostgreSQL + TimescaleDB + pgvectorscale with PRAW Reddit integration and OpenRouter LLM processing
**Target**: 400+ posts daily across 4 financial subreddits with 85%+ test coverage

---

## 1. Architecture Overview

### 1.1 System Architecture

The Social Media domain follows the established layered architecture pattern while introducing new capabilities for social media data collection and semantic search:

```
┌─────────────────────────────────────────────────────────────┐
│                    Dagster Pipeline                         │
│                 (Scheduled Collection)                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                 RedditClient                                │
│           (PRAW + Rate Limiting)                           │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│              SocialMediaService                             │
│        (Business Logic + LLM Integration)                  │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│              SocialRepository                               │
│    (PostgreSQL + TimescaleDB + pgvectorscale)             │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│         PostgreSQL + TimescaleDB + pgvectorscale           │
│          (Time-series + Vector Storage)                    │
└─────────────────────────────────────────────────────────────┘
```

### 1.2 Data Flow Architecture

**Collection Flow:**
```
Reddit API → RedditClient → SocialMediaService → OpenRouter LLM →
SocialRepository → PostgreSQL + Vector Storage
```

**Agent Query Flow:**
```
AgentToolkit → SocialMediaService → SocialRepository →
Vector Similarity Search + Sentiment Aggregation → Structured Response
```

### 1.3 Key Architectural Principles

- **Consistent Patterns**: Follow news domain architecture for maintainability
- **Vector-Enhanced Search**: Semantic similarity using pgvectorscale for contextual social media analysis
- **Best-Effort Processing**: Continue operation even when LLM services are unavailable
- **Rate Limiting Compliance**: Respect Reddit API limits with exponential backoff
- **Event-Driven Design**: Publish domain events for system integration

---

## 2. Domain Model

### 2.1 Core Entities

#### SocialPost (Domain Entity)

The primary domain entity managing business rules and data transformations:

```python
@dataclass
class SocialPost:
    """Core domain entity for Reddit posts with sentiment and engagement data."""

    # Core Reddit Data
    post_id: str                    # Reddit fullname (e.g., 't3_abc123'; the bare ID is 'abc123')
    title: str                      # Post title
    content: Optional[str]          # Post content (selftext for text posts)
    author: str                     # Reddit username
    subreddit: str                  # Subreddit name
    created_utc: datetime           # Post creation time
    url: str                        # Reddit permalink or external URL

    # Engagement Metrics
    upvotes: int                    # Post score
    downvotes: int                  # Calculated from score + upvote_ratio
    comments_count: int             # Number of comments

    # Enhanced Data
    sentiment_score: Optional[SentimentScore] = None
    tickers: List[str] = field(default_factory=list)
    title_embedding: Optional[List[float]] = None
    content_embedding: Optional[List[float]] = None

    @classmethod
    def from_praw_submission(cls, submission: "praw.models.Submission") -> 'SocialPost':
        """Create SocialPost from PRAW Submission object."""

    def to_entity(self) -> SocialMediaPostEntity:
        """Transform to database entity for storage."""

    def validate(self) -> List[str]:
        """Validate business rules and return errors."""

    def extract_tickers(self) -> List[str]:
        """Extract stock ticker symbols from title and content."""

    def has_reliable_sentiment(self) -> bool:
        """Check if sentiment confidence >= 0.5."""

    def to_response(self) -> Dict[str, Any]:
        """Format for agent consumption."""
```
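The `downvotes` field is described as calculated from score and `upvote_ratio`. A minimal sketch of that calculation follows, assuming the score is net votes (`ups - downs`) and the ratio is `ups / (ups + downs)`, which gives `total = score / (2 * ratio - 1)`. The function name `estimate_votes` and its fallback behaviour are illustrative assumptions, not part of the specified API.

```python
from typing import Tuple


def estimate_votes(score: int, upvote_ratio: float) -> Tuple[int, int]:
    """Estimate (upvotes, downvotes) from a net score and an upvote ratio."""
    if not 0.0 <= upvote_ratio <= 1.0:
        raise ValueError("upvote_ratio must be within [0, 1]")

    denominator = 2.0 * upvote_ratio - 1.0
    if abs(denominator) < 1e-9:
        # ratio == 0.5 means ups == downs, so the total cannot be recovered.
        # Fall back to treating the score as the only known votes.
        return max(score, 0), max(-score, 0)

    total = score / denominator
    if total < 0:
        # Inconsistent inputs (e.g. a positive score with ratio < 0.5).
        return max(score, 0), max(-score, 0)

    upvotes = round(upvote_ratio * total)
    downvotes = max(upvotes - score, 0)
    return upvotes, downvotes
```

The result is an estimate only: the ratio is rounded by the API, so small posts can be off by a vote or two.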

**Validation Rules:**
- `post_id` must match Reddit format (starts with 't3_')
- `title` cannot be empty
- `created_utc` cannot be in the future
- `sentiment_score.confidence` must be 0.0-1.0
- `title_embedding` and `content_embedding` must each have 1536 dimensions if present
- `subreddit` must be in allowed financial subreddits list
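The rules above can be sketched as a standalone check that returns a list of error strings, matching the `validate() -> List[str]` signature. This is an illustration under assumptions: `validate_post_fields` is a hypothetical helper, the allowed list is assumed to mirror the `SocialJobConfig` defaults, and `created_utc` is assumed to be timezone-aware.

```python
from datetime import datetime, timezone
from typing import List, Optional, Sequence

# Assumed to mirror the default subreddits in SocialJobConfig.
ALLOWED_SUBREDDITS = {"wallstreetbets", "investing", "stocks", "SecurityAnalysis"}
EMBEDDING_DIM = 1536


def validate_post_fields(
    post_id: str,
    title: str,
    subreddit: str,
    created_utc: datetime,
    confidence: Optional[float] = None,
    embedding: Optional[Sequence[float]] = None,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return rule violations; an empty list means the fields are valid."""
    # `now` is injectable so the future-date rule can be tested deterministically.
    now = now or datetime.now(timezone.utc)
    errors: List[str] = []

    if not post_id.startswith("t3_"):
        errors.append("post_id must start with 't3_'")
    if not title.strip():
        errors.append("title cannot be empty")
    if subreddit not in ALLOWED_SUBREDDITS:
        errors.append(f"subreddit '{subreddit}' is not in the allowed list")
    if created_utc > now:
        errors.append("created_utc cannot be in the future")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        errors.append("sentiment confidence must be within 0.0-1.0")
    if embedding is not None and len(embedding) != EMBEDDING_DIM:
        errors.append(f"embedding must have {EMBEDDING_DIM} dimensions")

    return errors
```

Collecting every violation, rather than raising on the first, lets a collection run log all problems with a post in one pass.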

#### SentimentScore (Value Object)

Structured sentiment analysis result from OpenRouter LLM:

```python
@dataclass
class SentimentScore:
    """Structured sentiment analysis result with confidence and reasoning."""

    sentiment: Literal['positive', 'negative', 'neutral']
    confidence: float  # 0.0-1.0
    reasoning: str     # Brief explanation

    def is_reliable(self) -> bool:
        """Check if confidence >= 0.5 for reliable sentiment."""
        return self.confidence >= 0.5

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary for JSON storage."""
```

#### SocialJobConfig (Configuration)

Configuration for scheduled Reddit collection:

```python
@dataclass
class SocialJobConfig:
    """Configuration for scheduled Reddit data collection."""

    # Collection Settings
    subreddits: List[str] = field(default_factory=lambda: [
        'wallstreetbets', 'investing', 'stocks', 'SecurityAnalysis'
    ])
    max_posts_per_subreddit: int = 50
    lookback_hours: int = 12
    min_score: int = 10

    # Processing Settings
    sentiment_model: str = "anthropic/claude-3.5-haiku"
    embedding_model: str = "text-embedding-3-large"

    # Rate Limiting
    rate_limit_delay: float = 1.0  # seconds between API calls

    # Scheduling
    schedule_times: List[str] = field(default_factory=lambda: [
        '0 6 * * *',   # 6 AM UTC
        '0 18 * * *'   # 6 PM UTC
    ])
```
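As a sanity check on collection volume, the defaults above imply an upper bound that can be computed directly. The two-runs-per-day figure is read off `schedule_times`; the variable names are illustrative.

```python
# Defaults copied from SocialJobConfig above.
subreddits = ["wallstreetbets", "investing", "stocks", "SecurityAnalysis"]
max_posts_per_subreddit = 50
runs_per_day = 2  # one run per cron entry in schedule_times

# Upper bound on posts stored per day, before min_score filtering
# and before deduplication between the two runs.
daily_ceiling = len(subreddits) * max_posts_per_subreddit * runs_per_day
print(daily_ceiling)
```

With these defaults the ceiling is 400 posts per day, so a daily target above that would need a higher `max_posts_per_subreddit`, more subreddits, or more frequent runs.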

---

## 3. Database Design

### 3.1 Schema Definition

The `social_media_posts` table leverages PostgreSQL with TimescaleDB for time-series optimization and pgvectorscale for vector similarity search:

```sql
-- Core table definition
-- Assumes a uuid7() SQL function is provided by an extension or migration;
-- PostgreSQL does not ship one under this name.
CREATE TABLE social_media_posts (
    id UUID NOT NULL DEFAULT uuid7(),
    post_id VARCHAR(50) NOT NULL,
    title TEXT NOT NULL,
    content TEXT,
    author VARCHAR(100) NOT NULL,
    subreddit VARCHAR(50) NOT NULL,
    created_utc TIMESTAMPTZ NOT NULL,
    upvotes INTEGER NOT NULL DEFAULT 0,
    downvotes INTEGER NOT NULL DEFAULT 0,
    comments_count INTEGER NOT NULL DEFAULT 0,
    url TEXT NOT NULL,
    sentiment_score JSONB,
    sentiment_label VARCHAR(20),
    tickers TEXT[] DEFAULT '{}',
    title_embedding VECTOR(1536),
    content_embedding VECTOR(1536),
    inserted_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    -- TimescaleDB requires every unique constraint on a hypertable to
    -- include the partitioning column (created_utc).
    PRIMARY KEY (id, created_utc),
    CONSTRAINT uq_social_posts_post_id UNIQUE (post_id, created_utc)
);

-- TimescaleDB hypertable for time-series optimization
SELECT create_hypertable('social_media_posts', 'created_utc',
                         chunk_time_interval => INTERVAL '1 day');

-- Performance indexes (post_id lookups use the unique constraint's index)
CREATE INDEX idx_social_posts_subreddit_time ON social_media_posts (subreddit, created_utc DESC);
CREATE INDEX idx_social_posts_tickers_gin ON social_media_posts USING GIN (tickers);
-- pgvectorscale exposes its index through the "diskann" access method
CREATE INDEX idx_social_posts_title_embedding ON social_media_posts
    USING diskann (title_embedding vector_cosine_ops);
CREATE INDEX idx_social_posts_content_embedding ON social_media_posts
    USING diskann (content_embedding vector_cosine_ops);
CREATE INDEX idx_social_posts_sentiment ON social_media_posts
    ((sentiment_score->>'sentiment')) WHERE sentiment_score IS NOT NULL;

-- Data validation constraints
ALTER TABLE social_media_posts ADD CONSTRAINT chk_sentiment_score
    CHECK (sentiment_score IS NULL OR
           ((sentiment_score->>'confidence')::float BETWEEN 0 AND 1));
ALTER TABLE social_media_posts ADD CONSTRAINT chk_created_utc
    CHECK (created_utc <= NOW());
```

### 3.2 SQLAlchemy Entity

```python
class SocialMediaPostEntity(Base):
    """SQLAlchemy entity for PostgreSQL persistence with vector support."""

    __tablename__ = "social_media_posts"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid7)
    post_id = Column(String(50), unique=True, nullable=False, index=True)
    title = Column(Text, nullable=False)
    content = Column(Text)
    author = Column(String(100), nullable=False)
    subreddit = Column(String(50), nullable=False)
    created_utc = Column(DateTime(timezone=True), nullable=False)
    upvotes = Column(Integer, nullable=False, default=0)
    downvotes = Column(Integer, nullable=False, default=0)
    comments_count = Column(Integer, nullable=False, default=0)
    url = Column(Text, nullable=False)
    sentiment_score = Column(JSONB)
    sentiment_label = Column(String(20))
    tickers = Column(ARRAY(String), default=list)  # callable default avoids a shared mutable list
    title_embedding = Column(Vector(1536))
    content_embedding = Column(Vector(1536))
    inserted_at = Column(DateTime(timezone=True), default=func.now())
    updated_at = Column(DateTime(timezone=True), default=func.now(), onupdate=func.now())

    def to_domain(self) -> SocialPost:
        """Convert to domain entity."""

    @classmethod
    def from_domain(cls, post: SocialPost) -> 'SocialMediaPostEntity':
        """Create from domain entity."""
```

### 3.3 Access Patterns and Query Optimization

**Common Access Patterns:**
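- Ticker-based queries: `SELECT * FROM social_media_posts WHERE tickers @> ARRAY['AAPL']` (the `@>` form can use the GIN index; `'AAPL' = ANY(tickers)` cannot)
- Time-range filtering: `SELECT * FROM social_media_posts WHERE created_utc BETWEEN ? AND ?`
- Vector similarity: `SELECT * FROM social_media_posts ORDER BY title_embedding <=> ? LIMIT 10`
- Sentiment aggregations: `SELECT subreddit, AVG((sentiment_score->>'confidence')::float) FROM social_media_posts GROUP BY subreddit` (`sentiment_score` is JSONB, so the numeric field must be extracted before averaging)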
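
The `<=>` operator in the vector similarity pattern is pgvector's cosine distance, that is, one minus cosine similarity, so smaller values mean closer matches. The pure-Python sketch below shows the same ordering on small vectors; `cosine_distance` and `top_k` are illustrative helpers, not part of the repository API.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, the quantity computed by `<=>`."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], candidates: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k candidate IDs closest to the query, nearest first."""
    ranked = sorted(candidates, key=lambda pid: cosine_distance(query, candidates[pid]))
    return ranked[:k]
```

In the database the index answers this approximately instead of scanning every row, which is what makes the sub-second target plausible at scale.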

**Performance Targets:**
- Vector similarity queries: < 1s for top 10 results
- Batch upserts: < 5s for 1000 posts
- Ticker-based queries: < 100ms for 30-day ranges

---

## 4. API Integration

### 4.1 Reddit Client (PRAW Integration)

Complete implementation of Reddit data collection using PRAW (Python Reddit API Wrapper):

```python
class RedditClient:
    """PRAW wrapper with rate limiting and error handling."""

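    def __init__(self, config: RedditClientConfig):
        """Initialize Reddit client with OAuth2 credentials.

        Note: PRAW is synchronous, so the async methods below need to run its
        blocking calls off the event loop (for example with asyncio.to_thread).
        """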
        self.reddit = praw.Reddit(
            client_id=config.client_id,
            client_secret=config.client_secret,
            user_agent=config.user_agent
        )
        self.rate_limiter = AsyncLimiter(1, 1)  # 1 request per second

    async def fetch_subreddit_posts(
        self,
        subreddit: str,
        limit: int = 50,
        time_filter: str = 'day'
    ) -> List[Dict[str, Any]]:
        """Fetch top posts for `time_filter` with rate limiting (PRAW's hot listing takes no time filter)."""

    async def search_posts(
        self,
        query: str,
        subreddit: Optional[str] = None,
        limit: int = 25
    ) -> List[Dict[str, Any]]:
        """Search posts with ticker symbols or keywords."""

    async def get_post_details(self, post_id: str) -> Optional[Dict[str, Any]]:
        """Get detailed information for a specific post."""
```

**Configuration Requirements:**
- Reddit App Credentials: `client_id`, `client_secret`, `user_agent`
- Rate Limiting: 1 request per second (60 requests/minute limit)
- Error Handling: Exponential backoff for rate limits, graceful degradation for authentication errors
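
A minimal sketch of the exponential backoff mentioned above, under stated assumptions: `RateLimitedError` is a hypothetical stand-in for whatever exception the client raises on a rate-limit response, and `call_with_backoff` is an illustrative helper. The `sleep` parameter is injectable so the retry schedule can be tested without waiting.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical signal that the API asked the caller to slow down."""


async def call_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry `operation` on rate limiting, doubling the delay each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await operation()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; let the caller decide what to do
            delay = min(base_delay * 2 ** attempt, max_delay)
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            await sleep(delay + random.uniform(0.0, delay * 0.1))
    raise AssertionError("unreachable")
```

Only the rate-limit exception is retried here; authentication errors propagate immediately, which fits the graceful-degradation handling described for them.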

### 4.2 OpenRouter LLM Integration

Leverage existing OpenRouter infrastructure with social media-specific enhancements:

**Sentiment Analysis Prompt:**
```
Analyze this Reddit post about stocks/finance. Consider the informal language,
memes, and community context typical of financial subreddits.

Post: {title} - {content}

Respond with valid JSON:
{
  "sentiment": "positive|negative|neutral",
  "confidence": 0.0-1.0,
  "reasoning": "brief explanation considering context"
}
```
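
Because the model's reply is free text that is only expected to be JSON, the response needs defensive parsing. The sketch below shows one way to do that in line with best-effort processing: any malformed reply yields `None` instead of an exception. `parse_sentiment_response` is a hypothetical helper, not an existing function in the codebase.

```python
import json
from typing import Any, Dict, Optional

VALID_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_sentiment_response(raw: str) -> Optional[Dict[str, Any]]:
    """Parse the LLM reply; return None when it does not match the schema."""
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(payload, dict):
        return None

    sentiment = payload.get("sentiment")
    confidence = payload.get("confidence")
    if not isinstance(sentiment, str) or sentiment not in VALID_SENTIMENTS:
        return None
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:
        return None

    return {
        "sentiment": sentiment,
        "confidence": float(confidence),
        "reasoning": str(payload.get("reasoning", "")),
    }
```

A `None` result maps naturally onto a NULL `sentiment_score` column, so the post is still stored.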

**Embedding Configuration:**
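- Model: `text-embedding-3-large`, requested at 1536 dimensions to match the `VECTOR(1536)` columns (the model's default output is 3072 dimensions, so a reduced size must be requested explicitly)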
- Batch processing for efficiency
- Generate embeddings for both title and content when available
- Store NULL for failed embedding generation (best-effort processing)
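
The batching and best-effort rules above can be sketched as follows. The `embed_batch` callable is an assumption standing in for the real embedding client; texts that are missing, or whose batch fails, end up as `None`, which corresponds to a NULL embedding column.

```python
from typing import Callable, List, Optional, Sequence

Embedding = List[float]


def embed_in_batches(
    texts: Sequence[Optional[str]],
    embed_batch: Callable[[List[str]], List[Embedding]],
    batch_size: int = 64,
) -> List[Optional[Embedding]]:
    """Embed texts in batches; failed or empty inputs map to None."""
    results: List[Optional[Embedding]] = [None] * len(texts)
    # Keep original positions so results line up with the input order.
    indexed = [(i, t) for i, t in enumerate(texts) if t and t.strip()]

    for start in range(0, len(indexed), batch_size):
        chunk = indexed[start:start + batch_size]
        try:
            vectors = embed_batch([text for _, text in chunk])
        except Exception:
            continue  # best-effort: leave this batch as None and move on
        if len(vectors) != len(chunk):
            continue  # unexpected response shape; treat as a failed batch
        for (index, _), vector in zip(chunk, vectors):
            results[index] = vector

    return results
```

One failing batch costs at most `batch_size` embeddings, so smaller batches trade throughput for less data lost per error.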

---

## 5. Component Architecture

### 5.1 Repository Layer (Data Access)

```python
class SocialRepository:
    """Data access layer for social media posts with vector capabilities."""

    def __init__(self, session: AsyncSession):
        self.session = session

    async def find_by_ticker(
        self,
        ticker: str,
        days: int = 30,
        limit: int = 50
    ) -> List[SocialPost]:
        """Find posts mentioning specific ticker within time range."""

    async def find_similar_posts(
        self,
        query_embedding: List[float],
        ticker: Optional[str] = None,
        limit: int = 10
    ) -> List[SocialPost]:
        """Find semantically similar posts using vector similarity."""

    async def get_sentiment_summary(
        self,
        ticker: str,
        subreddit: Optional[str] = None,
        hours: int = 24
    ) -> Dict[str, Any]:
        """Generate sentiment aggregation for ticker."""

    async def upsert_batch(self, posts: List[SocialPost]) -> List[SocialPost]:
        """Batch upsert posts with conflict resolution."""

    async def cleanup_old_posts(self, days: int = 90) -> int:
        """Remove posts older than retention period."""
```

### 5.2 Service Layer (Business Logic)

```python
class SocialMediaService:
    """Business logic orchestration with LLM integration."""

    def __init__(
        self,
        repository: SocialRepository,
        reddit_client: RedditClient,
        openrouter_client: OpenRouterClient
    ):
        self.repository = repository
        self.reddit_client = reddit_client
        self.openrouter_client = openrouter_client

    async def collect_subreddit_posts(self, config: SocialJobConfig) -> int:
        """Orchestrate complete collection process for configured subreddits."""

    async def update_post_sentiment(
        self,
        posts: List[SocialPost]
    ) -> List[SocialPost]:
        """Add sentiment analysis to posts using OpenRouter LLM."""

    async def generate_embeddings(
        self,
        posts: List[SocialPost]
    ) -> List[SocialPost]:
        """Generate vector embeddings for semantic search."""

    async def find_trending_tickers(
        self,
        hours: int = 24
    ) -> List[Dict[str, Any]]:
        """Identify trending ticker mentions across subreddits."""
```
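
One possible shape for the ticker extraction behind `find_trending_tickers` is sketched below. It assumes a caller-supplied set of known symbols, since bare uppercase words such as "CEO" or "DD" would otherwise be counted as tickers; `extract_tickers`, `find_trending`, and `known_tickers` are illustrative names.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# One to five uppercase letters as a whole word, with an optional cashtag prefix.
CANDIDATE = re.compile(r"\$?\b([A-Z]{1,5})\b")


def extract_tickers(text: str, known_tickers: Set[str]) -> List[str]:
    """Return known tickers mentioned in text, in first-seen order, no repeats."""
    found: List[str] = []
    for match in CANDIDATE.finditer(text):
        symbol = match.group(1)
        if symbol in known_tickers and symbol not in found:
            found.append(symbol)
    return found


def find_trending(
    posts: Iterable[Tuple[str, str]],
    known_tickers: Set[str],
    top_n: int = 5,
) -> List[Tuple[str, int]]:
    """Count posts mentioning each ticker; each post counts once per ticker."""
    counts: Counter = Counter()
    for title, content in posts:
        counts.update(extract_tickers(f"{title} {content}", known_tickers))
    return counts.most_common(top_n)
```

Counting each ticker once per post keeps a single post that repeats a symbol many times from dominating the ranking.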

### 5.3 Agent Integration Layer

```python
class SocialMediaAgentToolkit:
    """RAG methods for AI agent integration."""

    def __init__(self, service: SocialMediaService):
        self.service = service

    async def get_reddit_sentiment(
        self,
        ticker: str,
        days: int = 7
    ) -> Dict[str, Any]:
        """Get sentiment summary for ticker from Reddit discussions."""

    async def search_social_posts(
        self,
        query: str,
        ticker: Optional[str] = None
    ) -> List[Dict[str, Any]]:
        """Semantic search for relevant social media posts."""

    async def get_trending_discussions(
        self,
        ticker: str
    ) -> List[Dict[str, Any]]:
        """Get trending discussions and sentiment for specific ticker."""

    async def get_subreddit_analysis(
        self,
        subreddit: str,
        ticker: str
    ) -> Dict[str, Any]:
        """Analyze sentiment and engagement for ticker in specific subreddit."""
```

**Agent Response Format:**
```json
{
  "posts": [
    {
      "post_id": "t3_abc123",
      "title": "AAPL earnings beat expectations",
      "subreddit": "stocks",
      "created_utc": "2024-01-15T14:30:00Z",
      "sentiment": {
        "sentiment": "positive",
        "confidence": 0.85,
        "reasoning": "Strong positive language about earnings"
      },
      "engagement": {
        "upvotes": 245,
        "comments_count": 67
      },
      "tickers": ["AAPL"],
      "url": "https://reddit.com/r/stocks/comments/abc123"
    }
  ],
  "summary": {
    "total_posts": 15,
    "sentiment_breakdown": {
      "positive": 0.6,
      "negative": 0.2,
      "neutral": 0.2
    },
    "avg_confidence": 0.78,
    "data_quality": "high"
  }
}
```
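
The `summary` block in the response above can be derived from per-post sentiment as sketched here. Posts without sentiment (LLM failures) are assumed to count toward `total_posts` but not toward the breakdown. The `data_quality` field is left out because its definition is not given in this document, and `summarize_sentiment` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

LABELS = ("positive", "negative", "neutral")


def summarize_sentiment(sentiments: List[Optional[Dict[str, Any]]]) -> Dict[str, Any]:
    """Aggregate per-post sentiment dicts into the response summary fields."""
    scored = [s for s in sentiments if s is not None]
    counts = Counter(s["sentiment"] for s in scored)
    n = len(scored)

    # Shares are fractions of the posts that actually have sentiment.
    breakdown = {label: (round(counts[label] / n, 2) if n else 0.0) for label in LABELS}
    avg_confidence = round(sum(s["confidence"] for s in scored) / n, 2) if n else None

    return {
        "total_posts": len(sentiments),
        "sentiment_breakdown": breakdown,
        "avg_confidence": avg_confidence,
    }
```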

---

## 6. Dagster Pipeline Architecture

### 6.1 Scheduled Collection Pipeline

```python
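# NOTE: as written, SocialJobConfig is a plain dataclass with no .schema()
# method, so it cannot be passed as config_schema directly; run config would
# need a Dagster config class mirroring its fields.
@asset(
    partitions_def=DailyPartitionsDefinition(start_date="2024-01-01"),
)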
def reddit_posts_collection(context: AssetExecutionContext) -> MaterializeResult:
    """Collect Reddit posts from financial subreddits."""

@asset(deps=[reddit_posts_collection])
def reddit_sentiment_analysis(context: AssetExecutionContext) -> MaterializeResult:
    """Add sentiment analysis to collected posts."""

@asset(deps=[reddit_sentiment_analysis])
def reddit_embeddings_generation(context: AssetExecutionContext) -> MaterializeResult:
    """Generate vector embeddings for semantic search."""

# Schedule: Twice daily collection
reddit_collection_schedule = ScheduleDefinition(
    name="reddit_collection_schedule",
    job=define_asset_job("reddit_collection", selection=[
        reddit_posts_collection,
        reddit_sentiment_analysis,
        reddit_embeddings_generation
    ]),
    cron_schedule="0 6,18 * * *"  # 6 AM and 6 PM UTC
)
```

### 6.2 Data Quality and Monitoring

**Collection Metrics:**
- Posts collected per subreddit per run
- Sentiment analysis success rate
- Embedding generation success rate
- API error rates and retry attempts

**Data Quality Checks:**
- Post deduplication verification
- Sentiment confidence distribution
- Embedding vector validation
- Reddit API rate limit utilization

**Failure Handling:**
- Best-effort processing: Continue with remaining subreddits if one fails
- Exponential backoff for Reddit API rate limits
- Graceful degradation: Store posts without sentiment/embeddings if LLM fails
- Dead letter queue for failed posts with retry mechanism
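
The first failure-handling rule, continuing with the remaining subreddits when one fails, can be sketched as a loop that isolates each subreddit's errors. The `fetch` callable is an assumption standing in for the real client method, and `collect_all` is an illustrative name.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def collect_all(
    subreddits: List[str],
    fetch: Callable[[str], Awaitable[List[dict]]],
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Fetch each subreddit in turn; return (collected, failures)."""
    collected: Dict[str, List[dict]] = {}
    failed: Dict[str, str] = {}
    # Sequential on purpose: it keeps the request rate easy to reason about.
    for name in subreddits:
        try:
            collected[name] = await fetch(name)
        except Exception as exc:
            # Record the failure and keep going with the other subreddits.
            failed[name] = repr(exc)
    return collected, failed
```

Returning the failures alongside the results gives the pipeline something concrete to report in run metadata or to feed a retry queue.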

---

## 7. Testing Strategy

### 7.1 Test Structure

Following the project's pragmatic outside-in TDD approach:

```
tests/domains/socialmedia/
├── __init__.py
├── test_social_post.py                 # Domain entity validation
├── test_social_repository.py           # PostgreSQL + vector operations
├── test_reddit_client.py               # PRAW integration with VCR
├── test_social_media_service.py        # Business logic with mocked deps
├── test_social_agent_toolkit.py        # Agent integration methods
└── fixtures/
    ├── reddit_responses.json           # Sample PRAW responses
    └── vcr_cassettes/                   # HTTP cassettes for external APIs
```

### 7.2 Testing Approach

**Unit Tests (Mock I/O boundaries):**
- `SocialPost` entity validation and transformations
- `SocialRepository` with test PostgreSQL database
- `RedditClient` with mocked PRAW responses
- `SocialMediaService` with mocked dependencies

**Integration Tests (Real components):**
- End-to-end collection pipeline with test Reddit data
- Vector similarity search with actual pgvectorscale
- LLM integration with pytest-vcr cassettes
- Dagster pipeline execution

**Performance Tests:**
- Vector similarity query performance (< 1s target)
- Batch upsert performance (< 5s for 1000 posts)
- Memory usage during large collection runs

### 7.3 Test Fixtures and Mocking

**Reddit API Mocking:**
```python
@pytest.fixture
def mock_reddit_response():
    """Sample Reddit API response for testing."""
    return {
        "id": "abc123",
        "title": "AAPL earnings discussion",
        "selftext": "Strong quarter, bullish outlook",
        "author": "test_user",
        "subreddit_display_name": "stocks",
        "created_utc": 1705315200,
        "score": 150,
        "upvote_ratio": 0.85,
        "num_comments": 45,
        "permalink": "/r/stocks/comments/abc123/aapl_earnings/"
    }
```

**Vector Similarity Testing:**
```python
@pytest.mark.asyncio
async def test_vector_similarity_search(social_repository, sample_posts):
    """Test semantic similarity search using pgvectorscale."""
    # Insert test posts with embeddings
    await social_repository.upsert_batch(sample_posts)

    # Test similarity search
    query_embedding = [0.1] * 1536  # Sample embedding
    similar_posts = await social_repository.find_similar_posts(
        query_embedding, limit=5
    )

    assert len(similar_posts) <= 5
    assert all(post.title_embedding for post in similar_posts)
```

---

## 8. Implementation Roadmap

### 8.1 Phase 1: Database Foundation (Week 1)

**Priority 1: Database Schema**
1. Create PostgreSQL migration for `social_media_posts` table
2. Add TimescaleDB hypertable configuration
3. Set up pgvectorscale indexes for vector similarity
4. Implement data validation constraints

**Priority 2: Core Entities**
1. `SocialMediaPostEntity` (SQLAlchemy entity)
2. `SocialPost` (domain entity with validation)
3. `SentimentScore` (value object)
4. Entity transformation methods (`to_domain`, `from_domain`)

### 8.2 Phase 2: Data Collection (Week 2)

**Priority 1: Reddit Integration**
1. `RedditClient` with PRAW implementation
2. Rate limiting and error handling
3. Subreddit post collection methods
4. Reddit API authentication setup

**Priority 2: Repository Layer**
1. `SocialRepository` with PostgreSQL operations
2. Vector similarity search methods
3. Batch upsert operations
4. Sentiment aggregation queries

### 8.3 Phase 3: Processing & Intelligence (Week 3)

**Priority 1: Service Layer**
1. `SocialMediaService` business logic
2. OpenRouter LLM integration for sentiment
3. Vector embedding generation
4. Batch processing workflows

**Priority 2: Agent Integration**
1. `SocialMediaAgentToolkit` RAG methods
2. Structured response formatting
3. Context-aware social media analysis
4. Integration with existing agent workflows

### 8.4 Phase 4: Automation & Monitoring (Week 4)

**Priority 1: Dagster Pipeline**
1. Scheduled Reddit collection assets
2. Processing pipeline orchestration
3. Data quality monitoring
4. Error handling and retry logic

**Priority 2: Testing & Documentation**
1. Comprehensive test suite (>85% coverage)
2. Performance testing and optimization
3. API documentation updates
4. Integration with existing test infrastructure

---

## 9. Monitoring and Observability

### 9.1 Key Metrics

**Collection Metrics:**
- Posts collected per subreddit per day
- Collection job success/failure rates
- Reddit API rate limit utilization
- Data deduplication effectiveness

**Processing Metrics:**
- Sentiment analysis success rate and latency
- Embedding generation success rate and latency
- LLM token usage and costs
- Vector similarity query performance

**Business Metrics:**
- Active tickers with social sentiment data
- Sentiment distribution across subreddits
- Trending ticker detection accuracy
- Agent query response times

### 9.2 Alerting Strategy

**Critical Alerts:**
- Collection job failures (> 2 consecutive failures)
- Reddit API authentication errors
- Database connection failures
- High LLM processing error rates (> 20%)

**Warning Alerts:**
- Low collection volumes (< 50% of expected)
- High sentiment analysis latency (> 30s per batch)
- Vector similarity performance degradation
- Approaching Reddit API rate limits

### 9.3 Logging and Debugging

**Structured Logging Format:**
```json
{
  "timestamp": "2024-01-15T14:30:00Z",
  "level": "INFO",
  "component": "SocialMediaService",
  "operation": "collect_subreddit_posts",
  "subreddit": "stocks",
  "posts_collected": 45,
  "sentiment_analyzed": 43,
  "embeddings_generated": 41,
  "duration_ms": 12500,
  "metadata": {
    "reddit_api_calls": 3,
    "llm_tokens_used": 15420
  }
}
```
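
A log line in this shape can be produced with the standard `logging` module and a small JSON formatter. This is a sketch: passing `operation` and `fields` through `extra` is an assumed convention for this example, not an existing project standard.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "component": record.name,
            "operation": getattr(record, "operation", None),
        }
        # Values passed via `extra` become attributes on the record.
        payload.update(getattr(record, "fields", None) or {})
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `logger.info("run finished", extra={"operation": "collect_subreddit_posts", "fields": {"subreddit": "stocks", "posts_collected": 45}})` then emits the counters as top-level keys.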

---

## 10. Security and Compliance

### 10.1 Data Privacy

**Reddit Data Handling:**
- Store only publicly available Reddit posts
- Respect user privacy: hash usernames for analytics
- Implement data retention policies (90-day maximum)
- No collection of private or deleted content
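
For the username-hashing rule above, a keyed hash is one option, since an unkeyed hash of a public username can be reversed by hashing candidate names. The sketch below uses HMAC-SHA256 from the standard library; `pseudonymize_author` and the secret key handling are assumptions, and names are lowercased on the assumption that Reddit treats usernames case-insensitively.

```python
import hashlib
import hmac


def pseudonymize_author(username: str, secret_key: bytes) -> str:
    """Return a stable, keyed pseudonym for a Reddit username."""
    normalized = username.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

The same key must be used across runs for per-author analytics to line up, so it belongs with the other secrets in environment configuration.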

**API Key Management:**
- Environment variable storage for Reddit credentials
- OpenRouter API key rotation support
- No credential logging or persistence in plain text

### 10.2 Rate Limiting Compliance

**Reddit API Compliance:**
- Respect 60 requests per minute OAuth limit
- Implement exponential backoff for rate limit violations
- User-Agent string identification as required
- Monitor and log API usage statistics

**OpenRouter Usage:**
- Monitor token usage and costs
- Implement request batching for efficiency
- Handle API rate limits gracefully
- Cost optimization through model selection

---

## 11. Future Enhancements

### 11.1 Extended Social Media Sources

**Twitter/X Integration:**
- Similar architecture pattern for Twitter API v2
- Real-time streaming for high-frequency updates
- Hashtag and mention tracking

**News Comment Sections:**
- Integration with financial news comment sections
- Cross-platform sentiment correlation
- Enhanced context for news articles

### 11.2 Advanced Analytics

**Sentiment Trend Analysis:**
- Time-series sentiment tracking
- Volatility correlation with social sentiment
- Predictive sentiment modeling

**Influence Network Analysis:**
- User influence scoring based on engagement
- Community detection within financial subreddits
- Viral content identification and tracking

### 11.3 Real-time Processing

**Streaming Architecture:**
- Real-time Reddit post collection
- Event-driven sentiment processing
- Live sentiment dashboards for agents

**Market Hours Integration:**
- Increased collection frequency during market hours
- After-hours sentiment tracking
- Weekend vs. weekday sentiment patterns

---

This design is a blueprint for taking the Social Media domain from empty stubs to a production-ready system. It reuses proven patterns from the news domain and adds capabilities specific to social media: Reddit data collection, semantic search, and AI agent integration.