Example Issue Description

Problem

Report generation times out for datasets containing more than 10,000 rows, causing 45% of export attempts to fail and generating 20+ support tickets per week.

Current Behavior

User clicks "Generate Report" button for dataset with >10K rows
No progress indication displayed to user
After 60 seconds, browser timeout error appears: "Request timeout"
No way to resume, cancel, or save partial results
User forced to manually split dataset and export in smaller batches

Impact

Users:

45% of report generation attempts fail (from analytics)
Average 3-5 retry attempts before giving up or contacting support
Lost productivity: ~15 minutes per failed export

Business:

20-25 support tickets per week (5 hours support time)
User satisfaction score: 3.2/10 (below acceptable threshold of 7/10)
Enterprise customers threatening to churn due to export limitations

Technical:

Server memory spikes to 2GB+ during large exports
CPU usage reaches 100% during processing
Occasional OOM crashes affecting other users

Root Cause Analysis

Current implementation loads entire dataset into memory before processing:

# Current approach (problematic)
def generate_report(dataset_id):
    # Load ALL data into memory at once
    data = db.query(f"SELECT * FROM {dataset_id}").fetchall()  # 10K+ rows

    # Process all data before returning
    results = process_all_data(data)  # Blocks for 60+ seconds

    return results  # Times out before reaching this point

Problems:

No streaming - all data loaded at once
No progress tracking - user sees nothing for 60s
No cancellation - process continues even if user navigates away
No memory limits - can spike to 2GB+

Solution

Implement streaming report generation with progressive rendering and chunked processing.

Proposed Architecture

┌──────────┐  1. Request   ┌────────────────┐  2. Query   ┌──────────┐
│  Client  │ ────────────> │  API Server    │ ──────────> │ Database │
└──────────┘               └────────────────┘             └──────────┘
     │                            │                             │
     │                            │ 3. Stream results           │
     │                            │ <───────────────────────────┘
     │                            │
     │ 4. Server-Sent Events      │ 5. Process chunks (1K rows)
     │    (progress updates)      │    Send to client as ready
     │ <──────────────────────────│
     │                            │
     │ 6. Progressive rendering   │
     │    Display results as      │
     │    they arrive             │

Implementation Approach

Backend (Python/FastAPI):

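import json

from fastapi.responses import StreamingResponse

CHUNK_SIZE = 1000  # rows per chunk


async def generate_report_streaming(dataset_id):
    """Stream report generation with chunked processing."""
    # Plain (sync) generator: the DB cursor blocks, and StreamingResponse
    # iterates sync generators in a worker thread instead of the event loop.
    def event_generator():
        cursor = db.cursor()
        try:
            # Count rows up front: cursor.rowcount is not reliable for SELECT
            # in many DB-API drivers. dataset_id must be validated against the
            # known dataset tables before it is interpolated into SQL.
            cursor.execute(f"SELECT COUNT(*) FROM {dataset_id}")
            total_rows = cursor.fetchone()[0]

            # Query with cursor (no full load)
            cursor.execute(f"SELECT * FROM {dataset_id}")
            processed = 0

            # Process in 1,000-row chunks
            while True:
                chunk = cursor.fetchmany(CHUNK_SIZE)
                if not chunk:
                    break

                # Process chunk
                results = process_chunk(chunk)

                # Send progress update as an SSE "data:" frame
                processed += len(chunk)
                payload = {
                    "progress": (processed / total_rows) * 100,
                    "data": results,
                }
                yield f"data: {json.dumps(payload)}\n\n"
        finally:
            cursor.close()

    return StreamingResponse(event_generator(), media_type="text/event-stream")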
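
The chunked loop can be tried end to end without a web server. The following is a minimal sketch, assuming an in-memory SQLite table named `dataset_rows` and a placeholder `process_chunk`; both are illustrative stand-ins, not the real service. It counts rows up front, reads with `fetchmany`, and formats each update as a Server-Sent Events `data:` frame.

```python
import json
import sqlite3

CHUNK_SIZE = 1000  # rows per chunk, matching the proposal


def process_chunk(rows):
    """Placeholder processing (assumption): count rows and sum the value column."""
    return {"rows": len(rows), "sum": sum(value for _, value in rows)}


def stream_report(conn, chunk_size=CHUNK_SIZE):
    """Yield one SSE-formatted frame per processed chunk."""
    total_rows = conn.execute("SELECT COUNT(*) FROM dataset_rows").fetchone()[0]
    cursor = conn.execute("SELECT id, value FROM dataset_rows ORDER BY id")
    processed = 0
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        processed += len(chunk)
        payload = {
            "progress": round(processed / total_rows * 100, 1),
            "data": process_chunk(chunk),
        }
        # An SSE frame is "data: <text>" followed by a blank line.
        yield f"data: {json.dumps(payload)}\n\n"


def make_demo_db(row_count):
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset_rows (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO dataset_rows (value) VALUES (?)",
        ((i,) for i in range(row_count)),
    )
    return conn


if __name__ == "__main__":
    frames = list(stream_report(make_demo_db(2500)))
    print(len(frames), "frames; last:", frames[-1].strip())
```

An empty table yields no frames at all, so the division by `total_rows` is never reached; the client would need a separate "No data" path for that case.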

Frontend (JavaScript):

// Connect to streaming endpoint
const eventSource = new EventSource('/api/reports/stream/' + datasetId);

// Update progress bar
eventSource.addEventListener('message', (event) => {
    const { progress, data } = JSON.parse(event.data);

    // Update UI
    progressBar.value = progress;
    resultsTable.append(data);

    if (progress >= 100) {
        eventSource.close();
        showCompleteMessage();
    }
});

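// Allow cancellation
// NOTE: jobId is not defined in this snippet; it is assumed to be the
// identifier the server assigns to the running report job.
cancelButton.onclick = () => {
    eventSource.close();
    fetch('/api/reports/cancel/' + jobId, { method: 'POST' });
};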

Key Features

Chunked processing: Process 1,000 rows at a time
Progressive rendering: Display results as they arrive
Progress tracking: Real-time percentage indicator
Cancellation support: User can cancel at any time
Memory limits: Max 500MB regardless of dataset size
Fault tolerance: Resume on network interruption
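
Cancellation fits naturally with chunked processing because the worker can check a flag between chunks. The sketch below is one possible shape, assuming a `threading.Event` that a cancel endpoint would set; the function name and the in-memory row list are hypothetical.

```python
import threading


def process_in_chunks(rows, cancel_event, chunk_size=1000):
    """Yield (processed_count, chunk) pairs until done or cancelled."""
    processed = 0
    for start in range(0, len(rows), chunk_size):
        # Check between chunks so a cancel takes effect within one chunk.
        if cancel_event.is_set():
            return
        chunk = rows[start:start + chunk_size]
        processed += len(chunk)
        yield processed, chunk


if __name__ == "__main__":
    cancel = threading.Event()
    rows = list(range(5000))
    for processed, _chunk in process_in_chunks(rows, cancel):
        print("processed", processed)
        if processed >= 2000:
            cancel.set()  # simulates the user pressing Cancel
```

With this design the worst-case delay before a cancel is honored is the time to process one chunk, which is one reason to keep chunks small.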

Motivation

User Impact

Current: 45% failure rate → 2-3 hour productivity loss per week
After fix: <1% failure rate → 30 minutes saved per week per user
Scale: 500 active users × 30 min/week = 250 hours/week saved

Business Impact

Reduce support tickets from 20-25/week to <5/week (about 4 of the current 5 support hours per week saved)
Improve user satisfaction score from 3.2/10 to >7/10
Prevent enterprise customer churn ($50K ARR at risk)
Enable larger dataset support (competitive advantage)

Technical Impact

Reduce server memory usage by 75% (2GB → 500MB)
Enable horizontal scaling (stateless processing)
Improve overall system stability (fewer OOM crashes)
Better resource utilization (CPU distributed over time)

Acceptance Criteria

Functional Requirements

Reports with 10K+ rows complete successfully without timeout
First results visible within 2 seconds of clicking "Generate"
Complete report generated in <10 seconds for 10K rows
Progress indicator shows accurate % complete during generation
User can cancel report generation at any time
Partial results saved if user cancels
Report generation works for datasets up to 100K rows

Non-Functional Requirements

Memory usage stays below 500MB regardless of dataset size
No memory leaks (tested with 100 consecutive report generations)
Works on Chrome 119+, Firefox 120+, Safari 17+
Responsive across device sizes (mobile, tablet, and desktop)
Handles slow network connections (3G, throttled)

Performance Targets

| Metric | Current | Target | Improvement |
| --- | --- | --- | --- |
| Success rate | 55% | >99% | +44 points (+80% relative) |
| Time to first result | N/A (timeout) | <2s | N/A (no baseline) |
| Complete export (10K rows) | Timeout (60s) | <10s | >6x faster |
| Memory usage (10K rows) | 2GB+ | <500MB | 75% reduction |
| Support tickets/week | 20-25 | <5 | 80% reduction |
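
The improvement figures are relative changes, which is easy to misread for the success rate (a 44-point gain is an 80% relative gain). A small sketch of the arithmetic, using the boundary values from the table as inputs:

```python
def relative_change(current, target):
    """Relative change from current to target, as a percentage."""
    return (target - current) / current * 100


if __name__ == "__main__":
    # Success rate: 55% -> 99%
    print(f"success rate: +{99 - 55} points, {relative_change(55, 99):+.0f}% relative")
    # Memory: 2GB (taken as 2000MB) -> 500MB
    print(f"memory: {relative_change(2000, 500):+.0f}%")
    # Support tickets: 25/week -> 5/week
    print(f"tickets: {relative_change(25, 5):+.0f}%")
    # Export time: 60s timeout -> 10s target
    print(f"export speedup: {60 / 10:.0f}x")
```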

Edge Cases

Empty datasets display "No data" message
Datasets with 100K rows generate successfully (may take 30-60s)
Special characters render correctly (unicode, emojis, HTML entities)
Network interruption shows error and allows retry
Concurrent report generation by same user works correctly
Server restart during generation shows clear error message

Error Handling

Database connection errors display user-friendly message
Permission denied shows appropriate error (403 Forbidden)
Invalid dataset ID returns 404 Not Found
Rate limiting (>5 concurrent reports) shows clear message
Timeout after 5 minutes shows clear error and suggests smaller dataset
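
The rate-limiting rule above (no more than 5 concurrent reports) could be enforced with a small in-process counter. This is a sketch under stated assumptions: the class and method names are hypothetical, the limit is applied per user, and a multi-worker deployment would need shared state (for example in a database or cache) rather than a local dictionary.

```python
import threading
from collections import defaultdict

MAX_CONCURRENT_REPORTS = 5  # limit taken from the issue


class ReportLimiter:
    """Track running reports per user and refuse work above the limit."""

    def __init__(self, limit=MAX_CONCURRENT_REPORTS):
        self._limit = limit
        self._active = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, user_id):
        """Reserve a slot and return True, or return False at the limit."""
        with self._lock:
            if self._active[user_id] >= self._limit:
                return False
            self._active[user_id] += 1
            return True

    def release(self, user_id):
        """Free a slot when a report finishes, fails, or is cancelled."""
        with self._lock:
            if self._active[user_id] > 0:
                self._active[user_id] -= 1
```

A request handler would call `try_acquire` before starting the stream, return the "too many reports" message when it gets `False`, and call `release` in a `finally` block so cancelled or failed reports do not leak slots.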

Technical Approach

Architecture Changes

Current (Synchronous):

Client ──> API Server ──> Database
                ↓ (load all)
            Process all
                ↓
              Return
                ↓
          (timeout!)

Proposed (Streaming):

Client ──> API Server ──> Database
   ↑          ↓ (cursor)      ↓
   │      Stream chunks    Stream rows
   │          ↓               ↑
   └──── Progressive ─────────┘
         rendering

Implementation Steps

Phase 1: Backend Streaming (Week 1)

Add FastAPI StreamingResponse support
Implement chunked database queries (1K rows/chunk)
Add Server-Sent Events (SSE) endpoint
Implement job cancellation endpoint
Add memory usage monitoring
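
For the memory monitoring step, the standard library's `tracemalloc` gives a quick first measurement. The sketch below assumes a hypothetical `run_with_memory_tracking` wrapper; note that `tracemalloc` only sees allocations made through Python's allocator, so it will under-report memory held by native database drivers, and process-level RSS should be tracked separately.

```python
import tracemalloc

MEMORY_LIMIT_BYTES = 500 * 1024 * 1024  # 500MB target from the issue


def run_with_memory_tracking(chunks, process_chunk):
    """Process chunks and return (results, peak_bytes) for Python allocations."""
    tracemalloc.start()
    try:
        results = [process_chunk(chunk) for chunk in chunks]
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return results, peak


if __name__ == "__main__":
    # Ten lazily built 1,000-row chunks; only one is alive at a time.
    chunks = (list(range(i, i + 1000)) for i in range(0, 10_000, 1000))
    results, peak = run_with_memory_tracking(chunks, sum)
    print(f"peak traced memory: {peak / 1024:.1f} KiB")
    print("within limit:", peak < MEMORY_LIMIT_BYTES)
```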

Phase 2: Frontend Progressive Rendering (Week 1)

Add EventSource for SSE connection
Implement progress bar component
Add cancel button with confirmation
Implement progressive table rendering
Add error handling and retry logic

Phase 3: Testing & Optimization (Week 2)

Load testing with 100K row datasets
Memory profiling during generation
Concurrent user testing (10 simultaneous exports)
Edge case testing (network interruption, cancellation)
Performance tuning (chunk size optimization)

Phase 4: Deployment (Weeks 2-4)

Deploy to staging environment
Internal beta testing (dev team)
Gradual rollout (10% → 50% → 100%)
Monitor error rates and performance
Full production deployment

Database Optimization

Add index on frequently filtered columns
Use read replicas for report queries (reduce load on primary)
Implement query result caching for identical requests

Monitoring

Track report generation success rate
Monitor memory usage per report
Alert on failure rate >5%
Track average generation time
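
The ">5% failure rate" alert can be sketched as a sliding window over recent report outcomes. The class name and the window size here are assumptions for illustration; in practice this check would more likely live in the metrics or alerting system than in application code.

```python
from collections import deque

ALERT_THRESHOLD = 0.05  # alert when more than 5% of recent reports fail


class FailureRateMonitor:
    """Track the outcome of the most recent reports in a fixed-size window."""

    def __init__(self, window_size=200):
        self._outcomes = deque(maxlen=window_size)  # True means success

    def record(self, success):
        """Record one finished report; the oldest entry drops off when full."""
        self._outcomes.append(bool(success))

    def failure_rate(self):
        """Fraction of failed reports in the window (0.0 when empty)."""
        if not self._outcomes:
            return 0.0
        failures = sum(1 for ok in self._outcomes if not ok)
        return failures / len(self._outcomes)

    def should_alert(self):
        return self.failure_rate() > ALERT_THRESHOLD
```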

Alternatives Considered

Alternative 1: Asynchronous Job Queue

Approach: Submit report to background job queue, email user when complete

Pros:

Simple implementation (Celery + Redis)
No frontend changes needed
Works for very large datasets

Cons:

Poor UX (user must wait for email)
No real-time progress updates
Increased infrastructure complexity
Doesn't solve immediate feedback problem

Decision: Rejected - UX too poor for interactive reports

Alternative 2: Client-Side Processing

Approach: Download raw data, process in browser with Web Workers

Pros:

Offloads processing to client
No server load

Cons:

Slow download for large datasets
High bandwidth usage
Limited by browser memory
Requires significant client-side code

Decision: Rejected - Not viable for 10K+ row datasets

Alternative 3: Paginated Results

Approach: Show first 100 rows, user clicks "Load More"

Pros:

Fast initial load
Simple implementation

Cons:

User must click multiple times for full report
Not a true "export" solution
Poor UX for users needing complete data

Decision: Rejected - Doesn't meet user requirements

Open Questions

Resolved:

Should we cache generated reports? → No, data changes frequently
What's the ideal chunk size? → 1,000 rows (tested)
Should we limit concurrent reports per user? → Yes, max 5

Still open:

Should we support export to CSV/Excel during streaming?
Should we add email notification when generation completes?

Testing Strategy

Unit Tests

test_streaming_report_generator.py: Chunked processing logic
test_progress_tracking.py: Accurate progress calculation
test_cancellation.py: Job cancellation and cleanup
test_error_handling.py: Database errors, network issues
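
As an illustration of what the progress-tracking tests might cover, here is a minimal sketch using `unittest`. The `progress_percent` helper is hypothetical (the issue does not name one); the cases target the boundaries most likely to break: an empty dataset and a final chunk that overshoots the counted total.

```python
import unittest


def progress_percent(processed, total_rows):
    """Hypothetical helper: percentage complete, clamped to at most 100."""
    if total_rows <= 0:
        return 100.0  # an empty dataset has nothing left to process
    return min(100.0, processed / total_rows * 100)


class ProgressTrackingTest(unittest.TestCase):
    def test_partial_progress(self):
        self.assertAlmostEqual(progress_percent(2500, 10_000), 25.0)

    def test_final_chunk_reaches_100(self):
        self.assertEqual(progress_percent(10_000, 10_000), 100.0)

    def test_empty_dataset_does_not_divide_by_zero(self):
        self.assertEqual(progress_percent(0, 0), 100.0)

    def test_never_exceeds_100(self):
        # Rows inserted after the count was taken must not push past 100%.
        self.assertEqual(progress_percent(10_500, 10_000), 100.0)
```

The tests can be run with `python -m unittest` from the directory containing the file.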

Integration Tests

test_report_api.py: End-to-end streaming report generation
test_concurrent_reports.py: Multiple simultaneous reports
test_large_datasets.py: 100K row datasets

Load Tests

# Test with 50 concurrent users generating 10K row reports
locust -f tests/load/test_report_streaming.py --users 50 --spawn-rate 5

# Performance targets:
# - 99th percentile response time: <15s
# - Error rate: <1%
# - Memory usage per worker: <500MB

Edge Case Tests

Empty dataset
Single row dataset
100K row dataset
Network interruption mid-generation
Database connection loss
Server restart during generation
Concurrent cancellations

Rollout Plan

Week 1: Development

Implement backend streaming
Implement frontend progressive rendering
Unit tests and integration tests

Week 2: Testing & Staging

Load testing
Deploy to staging
Internal testing (dev team)
Fix any issues found

Week 3: Gradual Production Rollout

Deploy to production with feature flag
Enable for 10% of users
Monitor error rates, performance metrics
If successful, increase to 50%
If successful, increase to 100%
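
One common way to implement the 10% → 50% → 100% steps behind a feature flag is to hash each user into a stable bucket. This is a sketch of that technique, not a description of the team's flag system; `in_rollout` and the flag name `streaming_reports` are assumptions.

```python
import hashlib


def in_rollout(user_id, percent, flag_name="streaming_reports"):
    """Return True if this user falls inside the first `percent` buckets."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket 0-99
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for percent in (10, 50, 100):
        enabled = sum(in_rollout(u, percent) for u in users)
        print(f"{percent}% rollout: {enabled} of {len(users)} users enabled")
```

Because the bucket depends only on the flag name and user ID, anyone enabled at 10% stays enabled at 50% and 100%, so users do not flip between the old and new implementations as the rollout widens.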

Week 4: Full Deployment

100% of users on streaming reports
Remove old synchronous implementation
Update documentation

Related

Related to #234 (API performance improvements)
Related to #235 (Memory optimization)
Blocks #236 (Enterprise tier launch - requires large dataset support)
Depends on #237 (Database read replica setup)
See design doc: Streaming Reports Architecture

Priority

P1-High

Justification:

Affects 45% of report generation attempts (critical failure rate)
Generating 20+ support tickets per week (significant support burden)
Enterprise customer churn risk ($50K ARR)
Competitive disadvantage (competitors support larger datasets)

Timeline: Target completion in 3 weeks (includes testing and gradual rollout)

Complexity Estimate

Effort: 2-3 weeks (including testing and gradual rollout)
Risk: Medium (requires careful testing of streaming implementation)
Dependencies: Database read replica setup (Issue #237)
Skills needed: Backend (Python/FastAPI), Frontend (JavaScript/SSE), Database optimization

Labels

bug, performance, P1-high, backend, frontend, user-experience

Assignees

Backend: @backend-dev
Frontend: @frontend-dev
QA: @qa-engineer

Issue created by: Product Manager (@pm-user)
Date: 2025-11-12
Milestone: Q4 2025
