TradingAgents/.claude/skills/github-workflow/examples/issue-template.md

14 KiB
Raw Blame History

Example Issue Description

Problem

Report generation times out for datasets containing more than 10,000 rows, causing 45% of export attempts to fail and generating 20+ support tickets per week.

Current Behavior

  1. User clicks "Generate Report" button for dataset with >10K rows
  2. No progress indication displayed to user
  3. After 60 seconds, browser timeout error appears: "Request timeout"
  4. No way to resume, cancel, or save partial results
  5. User forced to manually split dataset and export in smaller batches

Impact

Users:

  • 45% of report generation attempts fail (from analytics)
  • Average 3-5 retry attempts before giving up or contacting support
  • Lost productivity: ~15 minutes per failed export

Business:

  • 20-25 support tickets per week (5 hours support time)
  • User frustration score: 3.2/10 (below acceptable threshold of 7/10)
  • Enterprise customers threatening to churn due to export limitations

Technical:

  • Server memory spikes to 2GB+ during large exports
  • CPU usage reaches 100% during processing
  • Occasional OOM crashes affecting other users

Root Cause Analysis

Current implementation loads entire dataset into memory before processing:

# Current approach (problematic)
def generate_report(dataset_id):
    # Load ALL data into memory at once
    data = db.query(f"SELECT * FROM {dataset_id}").fetchall()  # 10K+ rows

    # Process all data before returning
    results = process_all_data(data)  # Blocks for 60+ seconds

    return results  # Times out before reaching this point

Problems:

  1. No streaming - all data loaded at once
  2. No progress tracking - user sees nothing for 60s
  3. No cancellation - process continues even if user navigates away
  4. No memory limits - can spike to 2GB+

Solution

Implement streaming report generation with progressive rendering and chunked processing.

Proposed Architecture

┌──────────┐  1. Request   ┌────────────────┐  2. Query   ┌──────────┐
│  Client  │ ────────────> │  API Server    │ ──────────> │ Database │
└──────────┘               └────────────────┘             └──────────┘
     │                            │                             │
     │                            │ 3. Stream results           │
     │                            │ <───────────────────────────┘
     │                            │
     │ 4. Server-Sent Events      │ 5. Process chunks (1K rows)
     │    (progress updates)      │    Send to client as ready
     │ <──────────────────────────│
     │                            │
     │ 6. Progressive rendering   │
     │    Display results as      │
     │    they arrive             │

Implementation Approach

Backend (Python/FastAPI):

async def generate_report_streaming(dataset_id):
    """Stream report generation with chunked processing."""
    async def event_generator():
        # Query with cursor (no full load)
        cursor = db.cursor()
        cursor.execute(f"SELECT * FROM {dataset_id}")

        total_rows = cursor.rowcount
        processed = 0

        # Process in 1,000-row chunks
        while True:
            chunk = cursor.fetchmany(size=1000)
            if not chunk:
                break

            # Process chunk
            results = process_chunk(chunk)

            # Send progress update
            processed += len(chunk)
            yield {
                "progress": (processed / total_rows) * 100,
                "data": results
            }

    return StreamingResponse(event_generator(), media_type="text/event-stream")

Frontend (JavaScript):

// Connect to streaming endpoint
const eventSource = new EventSource('/api/reports/stream/' + datasetId);

// Update progress bar
eventSource.addEventListener('message', (event) => {
    const { progress, data } = JSON.parse(event.data);

    // Update UI
    progressBar.value = progress;
    resultsTable.append(data);

    if (progress >= 100) {
        eventSource.close();
        showCompleteMessage();
    }
});

// Allow cancellation
cancelButton.onclick = () => {
    eventSource.close();
    fetch('/api/reports/cancel/' + jobId, { method: 'POST' });
};

Key Features

  1. Chunked processing: Process 1,000 rows at a time
  2. Progressive rendering: Display results as they arrive
  3. Progress tracking: Real-time percentage indicator
  4. Cancellation support: User can cancel at any time
  5. Memory limits: Max 500MB regardless of dataset size
  6. Fault tolerance: Resume on network interruption

Motivation

User Impact

  • Current: 45% failure rate → 2-3 hour productivity loss per week
  • After fix: <1% failure rate → 30 minutes saved per week per user
  • Scale: 500 active users × 30 min/week = 250 hours/week saved

Business Impact

  • Reduce support tickets from 20/week to <5/week (15 hours/week saved)
  • Improve user satisfaction score from 3.2/10 to >7/10
  • Prevent enterprise customer churn ($50K ARR at risk)
  • Enable larger dataset support (competitive advantage)

Technical Impact

  • Reduce server memory usage by 75% (2GB → 500MB)
  • Enable horizontal scaling (stateless processing)
  • Improve overall system stability (fewer OOM crashes)
  • Better resource utilization (CPU distributed over time)

Acceptance Criteria

Functional Requirements

  • Reports with 10K+ rows complete successfully without timeout
  • First results visible within 2 seconds of clicking "Generate"
  • Complete report generated in <10 seconds for 10K rows
  • Progress indicator shows accurate % complete during generation
  • User can cancel report generation at any time
  • Partial results saved if user cancels
  • Report generation works for datasets up to 100K rows

Non-Functional Requirements

  • Memory usage stays below 500MB regardless of dataset size
  • No memory leaks (tested with 100 consecutive report generations)
  • Works on Chrome 119+, Firefox 120+, Safari 17+
  • Responsive on mobile devices (tablet and desktop)
  • Handles slow network connections (3G, throttled)

Performance Targets

Metric Current Target Improvement
Success rate 55% >99% +80%
Time to first result N/A (timeout) <2s
Complete export (10K rows) Timeout (60s) <10s 6x faster
Memory usage (10K rows) 2GB+ <500MB 75% reduction
Support tickets/week 20-25 <5 80% reduction

Edge Cases

  • Empty datasets display "No data" message
  • Datasets with 100K+ rows generate successfully (may take 30-60s)
  • Special characters render correctly (unicode, emojis, HTML entities)
  • Network interruption shows error and allows retry
  • Concurrent report generation by same user works correctly
  • Server restart during generation shows clear error message

Error Handling

  • Database connection errors display user-friendly message
  • Permission denied shows appropriate error (403 Forbidden)
  • Invalid dataset ID returns 404 Not Found
  • Rate limiting (>5 concurrent reports) shows clear message
  • Timeout after 5 minutes shows clear error and suggests smaller dataset

Technical Approach

Architecture Changes

Current (Synchronous):

Client ──> API Server ──> Database
                ↓ (load all)
            Process all
                ↓
              Return
                ↓
          (timeout!)

Proposed (Streaming):

Client ──> API Server ──> Database
   ↑          ↓ (cursor)      ↓
   │      Stream chunks    Stream rows
   │          ↓               ↑
   └──── Progressive ─────────┘
         rendering

Implementation Steps

Phase 1: Backend Streaming (Week 1)

  1. Add FastAPI StreamingResponse support
  2. Implement chunked database queries (1K rows/chunk)
  3. Add Server-Sent Events (SSE) endpoint
  4. Implement job cancellation endpoint
  5. Add memory usage monitoring

Phase 2: Frontend Progressive Rendering (Week 1)

  1. Add EventSource for SSE connection
  2. Implement progress bar component
  3. Add cancel button with confirmation
  4. Implement progressive table rendering
  5. Add error handling and retry logic

Phase 3: Testing & Optimization (Week 2)

  1. Load testing with 100K row datasets
  2. Memory profiling during generation
  3. Concurrent user testing (10 simultaneous exports)
  4. Edge case testing (network interruption, cancellation)
  5. Performance tuning (chunk size optimization)

Phase 4: Deployment (Week 2)

  1. Deploy to staging environment
  2. Internal beta testing (dev team)
  3. Gradual rollout (10% → 50% → 100%)
  4. Monitor error rates and performance
  5. Full production deployment

Database Optimization

  • Add index on frequently filtered columns
  • Use read replicas for report queries (reduce load on primary)
  • Implement query result caching for identical requests

Monitoring

  • Track report generation success rate
  • Monitor memory usage per report
  • Alert on failure rate >5%
  • Track average generation time

Alternatives Considered

Alternative 1: Asynchronous Job Queue

Approach: Submit report to background job queue, email user when complete

Pros:

  • Simple implementation (Celery + Redis)
  • No frontend changes needed
  • Works for very large datasets

Cons:

  • Poor UX (user must wait for email)
  • No real-time progress updates
  • Increased infrastructure complexity
  • Doesn't solve immediate feedback problem

Decision: Rejected - UX too poor for interactive reports

Alternative 2: Client-Side Processing

Approach: Download raw data, process in browser with Web Workers

Pros:

  • Offloads processing to client
  • No server load

Cons:

  • Slow download for large datasets
  • High bandwidth usage
  • Limited by browser memory
  • Requires significant client-side code

Decision: Rejected - Not viable for 10K+ row datasets

Alternative 3: Paginated Results

Approach: Show first 100 rows, user clicks "Load More"

Pros:

  • Fast initial load
  • Simple implementation

Cons:

  • User must click multiple times for full report
  • Not a true "export" solution
  • Poor UX for users needing complete data

Decision: Rejected - Doesn't meet user requirements

Open Questions

  • Should we cache generated reports? → No, data changes frequently
  • What's the ideal chunk size? → 1,000 rows (tested)
  • Should we limit concurrent reports per user? → Yes, max 5
  • Should we support export to CSV/Excel during streaming?
  • Should we add email notification when generation completes?

Testing Strategy

Unit Tests

  • test_streaming_report_generator.py: Chunked processing logic
  • test_progress_tracking.py: Accurate progress calculation
  • test_cancellation.py: Job cancellation and cleanup
  • test_error_handling.py: Database errors, network issues

Integration Tests

  • test_report_api.py: End-to-end streaming report generation
  • test_concurrent_reports.py: Multiple simultaneous reports
  • test_large_datasets.py: 100K row datasets

Load Tests

# Test with 50 concurrent users generating 10K row reports
locust -f tests/load/test_report_streaming.py --users 50 --spawn-rate 5

# Performance targets:
# - 99th percentile response time: <15s
# - Error rate: <1%
# - Memory usage per worker: <500MB

Edge Case Tests

  • Empty dataset
  • Single row dataset
  • 100K row dataset
  • Network interruption mid-generation
  • Database connection loss
  • Server restart during generation
  • Concurrent cancellations

Rollout Plan

Week 1: Development

  • Implement backend streaming
  • Implement frontend progressive rendering
  • Unit tests and integration tests

Week 2: Testing & Staging

  • Load testing
  • Deploy to staging
  • Internal testing (dev team)
  • Fix any issues found

Week 3: Gradual Production Rollout

  • Deploy to production with feature flag
  • Enable for 10% of users
  • Monitor error rates, performance metrics
  • If successful, increase to 50%
  • If successful, increase to 100%

Week 4: Full Deployment

  • 100% of users on streaming reports
  • Remove old synchronous implementation
  • Update documentation
  • Related to #234 (API performance improvements)
  • Related to #235 (Memory optimization)
  • Blocks #236 (Enterprise tier launch - requires large dataset support)
  • Depends on #237 (Database read replica setup)
  • See design doc: Streaming Reports Architecture

Priority

P1-High

Justification:

  • Affects 45% of report generation attempts (critical failure rate)
  • Generating 20+ support tickets per week (significant support burden)
  • Enterprise customer churn risk ($50K ARR)
  • Competitive disadvantage (competitors support larger datasets)

Timeline: Target completion in 3 weeks (includes testing and gradual rollout)

Complexity Estimate

  • Effort: 2-3 weeks (including testing and gradual rollout)
  • Risk: Medium (requires careful testing of streaming implementation)
  • Dependencies: Database read replica setup (Issue #237)
  • Skills needed: Backend (Python/FastAPI), Frontend (JavaScript/SSE), Database optimization

Labels

bug, performance, P1-high, backend, frontend, user-experience

Assignees

  • Backend: @backend-dev
  • Frontend: @frontend-dev
  • QA: @qa-engineer

Issue created by: Product Manager (@pm-user) Date: 2025-11-12 Milestone: Q4 2025