# Example Issue Description
## Problem
Report generation times out for datasets containing more than 10,000 rows, causing 45% of export attempts to fail and generating 20+ support tickets per week.
### Current Behavior
1. User clicks "Generate Report" button for dataset with >10K rows
2. No progress indication displayed to user
3. After 60 seconds, browser timeout error appears: "Request timeout"
4. No way to resume, cancel, or save partial results
5. User forced to manually split dataset and export in smaller batches
### Impact
**Users:**
- 45% of report generation attempts fail (from analytics)
- Average 3-5 retry attempts before giving up or contacting support
- Lost productivity: ~15 minutes per failed export
**Business:**
- 20-25 support tickets per week (5 hours support time)
- User satisfaction score: 3.2/10 (below acceptable threshold of 7/10)
- Enterprise customers threatening to churn due to export limitations
**Technical:**
- Server memory spikes to 2GB+ during large exports
- CPU usage reaches 100% during processing
- Occasional OOM crashes affecting other users
### Root Cause Analysis
Current implementation loads entire dataset into memory before processing:
```python
# Current approach (problematic)
def generate_report(dataset_id):
    # Load ALL data into memory at once
    data = db.query(f"SELECT * FROM {dataset_id}").fetchall()  # 10K+ rows
    # Process all data before returning
    results = process_all_data(data)  # Blocks for 60+ seconds
    return results  # Times out before reaching this point
```
Problems:
1. No streaming - all data loaded at once
2. No progress tracking - user sees nothing for 60s
3. No cancellation - process continues even if user navigates away
4. No memory limits - can spike to 2GB+
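Problems 1 and 4 can be made concrete with a minimal sketch. It uses Python's built-in `sqlite3` and `tracemalloc` modules to compare the peak Python-heap usage of `fetchall()` against `fetchmany()` on a synthetic 10,000-row table. The `dataset` table and its two columns are illustrative assumptions. `tracemalloc` only sees Python allocations, so the numbers are indicative and do not predict production memory use.

```python
import sqlite3
import tracemalloc
from typing import Tuple

def build_dataset(rows: int) -> sqlite3.Connection:
    """Create an in-memory table with `rows` synthetic rows (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id INTEGER, label TEXT)")
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?)",
        ((i, f"row-{i:06d}") for i in range(rows)),
    )
    return conn

def peak_fetchall(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Current approach: materialize every row before processing."""
    tracemalloc.start()
    rows = conn.execute("SELECT * FROM dataset").fetchall()
    count = len(rows)
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    tracemalloc.stop()
    return count, peak

def peak_chunked(conn: sqlite3.Connection, size: int = 1000) -> Tuple[int, int]:
    """Streaming approach: hold roughly one chunk of rows at a time."""
    tracemalloc.start()
    cursor = conn.execute("SELECT * FROM dataset")
    count = 0
    while chunk := cursor.fetchmany(size):
        count += len(chunk)  # stand-in for process_chunk(chunk)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return count, peak

conn = build_dataset(10_000)
full_rows, full_peak = peak_fetchall(conn)
chunk_rows, chunk_peak = peak_chunked(conn)
print(f"fetchall:  {full_rows} rows, peak {full_peak / 1024:.0f} KiB")
print(f"fetchmany: {chunk_rows} rows, peak {chunk_peak / 1024:.0f} KiB")
```

The chunked peak should be a small fraction of the full-load peak. At most about two 1,000-row chunks are alive at once, because the previous chunk is released only when the next one is assigned.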
## Solution
Implement streaming report generation with progressive rendering and chunked processing.
### Proposed Architecture
```
┌──────────┐  1. Request    ┌────────────────┐  2. Query   ┌──────────┐
│  Client  │ ─────────────> │   API Server   │ ──────────> │ Database │
└──────────┘                └────────────────┘             └──────────┘
     │                              │                           │
     │                              │  3. Stream results        │
     │                              │ <─────────────────────────┘
     │                              │
     │  4. Server-Sent Events       │  5. Process chunks (1K rows),
     │     (progress updates)       │     send to client as ready
     │ <────────────────────────────│
     │
     │  6. Progressive rendering:
     │     display results as
     │     they arrive
```
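Step 4 relies on the Server-Sent Events wire format. Each event is one or more `data:` lines, optionally preceded by an `event:` line, and a blank line terminates it. The sketch below encodes events and decodes them the way a client would. It assumes JSON payloads shaped like the `{"progress": ..., "data": ...}` objects in the backend example in this issue. The parser is simplified: it handles LF line endings only and ignores `id:` and `retry:`.

```python
import json
from typing import List, Optional, Tuple

def encode_sse(payload: dict, event: Optional[str] = None) -> str:
    """Serialize one event in text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so one data: line is enough
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"  # a blank line ends the event

def parse_sse(body: str) -> List[Tuple[str, dict]]:
    """Decode a text/event-stream body into (event type, JSON payload) pairs."""
    events = []
    for block in body.split("\n\n"):
        event_type, data_lines = "message", []
        for line in block.split("\n"):
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
        if data_lines:  # blocks without data are not dispatched
            events.append((event_type, json.loads("\n".join(data_lines))))
    return events

body = encode_sse({"progress": 50.0, "data": [1, 2]}) + encode_sse({"progress": 100.0, "data": []})
print(parse_sse(body))
```

Events without an `event:` line arrive in the browser as `message` events. That is why the frontend code listens for `'message'`.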
### Implementation Approach
**Backend (Python/FastAPI):**
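```python
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
CHUNK_SIZE = 1000  # rows per chunk

def sse(payload: dict) -> str:
    """Encode one Server-Sent Event: a `data:` line followed by a blank line."""
    return f"data: {json.dumps(payload)}\n\n"

@app.get("/api/reports/stream/{dataset_id}")
def generate_report_streaming(dataset_id: str):
    """Stream report generation with chunked processing."""
    # `db`, `process_chunk` and `resolve_table_name` are application helpers (assumed).
    # Table names cannot be bound as SQL parameters, so resolve_table_name should map
    # dataset_id through an allowlist and raise HTTPException(404) if it is unknown.
    table = resolve_table_name(dataset_id)

    # A sync generator: Starlette iterates it in a threadpool, so blocking
    # DB-API calls do not stall the event loop.
    def event_generator():
        cursor = db.cursor()
        try:
            # PEP 249 lets rowcount be -1 for SELECT, so count explicitly
            cursor.execute(f"SELECT COUNT(*) FROM {table}")
            total_rows = cursor.fetchone()[0]
            cursor.execute(f"SELECT * FROM {table}")  # query with cursor (no full load)
            processed = 0
            # Process in 1,000-row chunks; only one chunk is held at a time
            while chunk := cursor.fetchmany(CHUNK_SIZE):
                results = process_chunk(chunk)
                processed += len(chunk)
                # Cap below 100: COUNT(*) can be stale and 100 is reserved for the final event
                progress = min(processed / total_rows * 100, 99.9) if total_rows else 0.0
                yield sse({"progress": progress, "data": results})
            # Final event marks completion, including for empty datasets
            yield sse({"progress": 100.0, "data": []})
        finally:
            cursor.close()

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```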
**Frontend (JavaScript):**
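```javascript
// Connect to the streaming endpoint (EventSource always sends a GET request)
const eventSource = new EventSource(`/api/reports/stream/${encodeURIComponent(datasetId)}`);

// Update the progress bar and append rows as each chunk arrives
eventSource.addEventListener('message', (event) => {
  const { progress, data } = JSON.parse(event.data);
  progressBar.value = progress;
  resultsTable.append(data);
  if (progress >= 100) {
    // Close explicitly: when the server ends the stream, EventSource
    // would otherwise reconnect and start the report over
    eventSource.close();
    showCompleteMessage();
  }
});

// On a network error EventSource silently reconnects (restarting the report),
// so stop and show an error instead; retrying is left to the user
eventSource.addEventListener('error', () => {
  eventSource.close();
  showErrorMessage(); // UI helper, analogous to showCompleteMessage
});

// Allow cancellation (jobId must identify the running job on the server)
cancelButton.onclick = () => {
  eventSource.close();
  fetch(`/api/reports/cancel/${encodeURIComponent(jobId)}`, { method: 'POST' });
};
```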
### Key Features
1. **Chunked processing**: Process 1,000 rows at a time
2. **Progressive rendering**: Display results as they arrive
3. **Progress tracking**: Real-time percentage indicator
4. **Cancellation support**: User can cancel at any time
5. **Memory limits**: Max 500MB regardless of dataset size
6. **Fault tolerance**: Resume on network interruption
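Features 1 and 4 work together. Because rows are processed one chunk at a time, the server can check for cancellation between chunks and keep whatever has been produced so far. The sketch below uses a `threading.Event` per job as the cancellation signal. `CancellationRegistry`, `run_job`, and the job-ID strings are hypothetical names. An in-process registry only works with a single worker; a multi-worker deployment would need a shared store.

```python
import threading
from typing import Callable, Dict, Iterable, Iterator, List

class CancellationRegistry:
    """Hypothetical in-process registry of cancel flags, keyed by job ID."""

    def __init__(self) -> None:
        self._flags: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def register(self, job_id: str) -> threading.Event:
        with self._lock:
            return self._flags.setdefault(job_id, threading.Event())

    def cancel(self, job_id: str) -> bool:
        """What a cancel endpoint would call; False means the job is unknown."""
        with self._lock:
            flag = self._flags.get(job_id)
        if flag is None:
            return False
        flag.set()
        return True

    def discard(self, job_id: str) -> None:
        """Forget a finished job so flags do not accumulate."""
        with self._lock:
            self._flags.pop(job_id, None)

def run_job(chunks: Iterable[List[int]], process: Callable[[List[int]], List[int]],
            cancelled: threading.Event) -> Iterator[List[int]]:
    """Yield processed chunks; stop at the next chunk boundary once cancelled."""
    for chunk in chunks:
        if cancelled.is_set():
            return  # chunks already yielded stay with the caller as a partial report
        yield process(chunk)

registry = CancellationRegistry()
flag = registry.register("job-1")
partial = []
for result in run_job(([i] * 3 for i in range(10)), lambda c: [x * 2 for x in c], flag):
    partial.append(result)
    if len(partial) == 4:
        registry.cancel("job-1")  # simulate the user clicking Cancel
registry.discard("job-1")
print(len(partial), partial[-1])
```

The check happens only at chunk boundaries. The worst-case delay between clicking Cancel and the work stopping is therefore the time to process one 1,000-row chunk.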
## Motivation
### User Impact
- **Current**: 45% failure rate → 2-3 hour productivity loss per week
- **After fix**: <1% failure rate → 30 minutes saved per user per week
- **Scale**: 500 active users × 30 min/week = 250 hours/week saved
### Business Impact
- Reduce support tickets from 20/week to <5/week (15 hours/week saved)
- Improve user satisfaction score from 3.2/10 to >7/10
- Prevent enterprise customer churn ($50K ARR at risk)
- Enable larger dataset support (competitive advantage)
### Technical Impact
- Reduce server memory usage by 75% (2GB → 500MB)
- Enable horizontal scaling (stateless processing)
- Improve overall system stability (fewer OOM crashes)
- Better resource utilization (CPU distributed over time)
## Acceptance Criteria
### Functional Requirements
- [ ] Reports with 10K+ rows complete successfully without timeout
- [ ] First results visible within 2 seconds of clicking "Generate"
- [ ] Complete report generated in <10 seconds for 10K rows
- [ ] Progress indicator shows accurate % complete during generation
- [ ] User can cancel report generation at any time
- [ ] Partial results saved if user cancels
- [ ] Report generation works for datasets up to 100K rows
### Non-Functional Requirements
- [ ] Memory usage stays below 500MB regardless of dataset size
- [ ] No memory leaks (tested with 100 consecutive report generations)
- [ ] Works on Chrome 119+, Firefox 120+, Safari 17+
- [ ] Responsive layout on mobile, tablet, and desktop
- [ ] Handles slow network connections (3G, throttled)
### Performance Targets
| Metric | Current | Target | Improvement |
|--------|---------|--------|-------------|
| Success rate | 55% | >99% | +44 pts (+80% relative) |
| Time to first result | N/A (timeout) | <2s | |
| Complete export (10K rows) | Timeout (60s) | <10s | 6x faster |
| Memory usage (10K rows) | 2GB+ | <500MB | 75% reduction |
| Support tickets/week | 20-25 | <5 | 80% reduction |
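The Improvement column mixes relative changes, percentage reductions, and a speed-up factor. A short worked check shows where each number comes from. It uses only figures already in the table, taking 2GB as 2,048 MB.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (positive = increase)."""
    return (after - before) / before * 100

success = relative_change(55, 99)        # +80% relative
success_points = 99 - 55                 # +44 percentage points
speedup = 60 / 10                        # 60s timeout vs 10s target
memory = -relative_change(2048, 500)     # reduction in percent
tickets_low = -relative_change(20, 5)    # from a 20/week baseline
tickets_high = -relative_change(25, 5)   # from a 25/week baseline
print(f"success {success:.0f}% ({success_points} pts), {speedup:.0f}x, "
      f"memory -{memory:.1f}%, tickets -{tickets_low:.0f}% to -{tickets_high:.0f}%")
```

The 6x figure is a lower bound, because the current path times out at 60s rather than finishing. The ticket reduction is 75% or 80% depending on which end of the 20-25/week range is the baseline.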
### Edge Cases
- [ ] Empty datasets display "No data" message
- [ ] Datasets with 100K+ rows generate successfully (may take 30-60s)
- [ ] Special characters render correctly (unicode, emojis, HTML entities)
- [ ] Network interruption shows error and allows retry
- [ ] Concurrent report generation by same user works correctly
- [ ] Server restart during generation shows clear error message
### Error Handling
- [ ] Database connection errors display user-friendly message
- [ ] Permission denied shows appropriate error (403 Forbidden)
- [ ] Invalid dataset ID returns 404 Not Found
- [ ] Rate limiting (>5 concurrent reports) shows clear message
- [ ] Timeout after 5 minutes shows clear error and suggests smaller dataset
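The rate-limiting rule caps each user at 5 concurrent reports, the limit settled under Open Questions. It can be enforced with a per-user counter that is acquired when a stream starts and released when it ends, whether by completion, cancellation, or error. This is a minimal in-process sketch. `ConcurrencyLimiter`, `TooManyReports`, and mapping the error to HTTP 429 are assumptions. A multi-worker deployment would need a shared counter instead, for example in Redis.

```python
import threading
from collections import defaultdict
from contextlib import ExitStack, contextmanager
from typing import Iterator

class TooManyReports(Exception):
    """Hypothetical error; an API layer could translate it into HTTP 429."""

class ConcurrencyLimiter:
    """Caps concurrent report streams per user (in-process only)."""

    def __init__(self, max_per_user: int = 5) -> None:
        self.max_per_user = max_per_user
        self._active = defaultdict(int)
        self._lock = threading.Lock()

    def active(self, user_id: str) -> int:
        with self._lock:
            return self._active.get(user_id, 0)

    @contextmanager
    def slot(self, user_id: str) -> Iterator[None]:
        with self._lock:
            if self._active[user_id] >= self.max_per_user:
                raise TooManyReports(
                    f"{user_id} already has {self.max_per_user} reports running")
            self._active[user_id] += 1
        try:
            yield
        finally:
            # Released on completion, cancellation, or error alike
            with self._lock:
                self._active[user_id] -= 1

limiter = ConcurrencyLimiter(max_per_user=5)
with ExitStack() as stack:
    for _ in range(5):
        stack.enter_context(limiter.slot("alice"))  # five reports already running
    try:
        with limiter.slot("alice"):  # a sixth request is rejected
            pass
        sixth_rejected = False
    except TooManyReports:
        sixth_rejected = True
    with limiter.slot("bob"):  # other users are unaffected
        bob_ok = True
print(sixth_rejected, bob_ok, limiter.active("alice"))
```

In a streaming endpoint the slot would be entered inside the event generator. That way it is held for the whole life of the stream, not just while the request handler runs.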
## Technical Approach
### Architecture Changes
**Current (Synchronous):**
```
Client ──> API Server ──> Database
               ↓ (load all rows)
          Process all
               ↓
             Return
           (timeout!)
```
**Proposed (Streaming):**
```
Client ──> API Server ──> Database
  ↑            │              │
  │            │<─────────────┘  stream rows (cursor)
  │            ↓
  │      process chunk
  │            │
  └────────────┘  stream chunks (SSE) → progressive rendering
```
### Implementation Steps
#### Phase 1: Backend Streaming (Week 1)
1. Add FastAPI StreamingResponse support
2. Implement chunked database queries (1K rows/chunk)
3. Add Server-Sent Events (SSE) endpoint
4. Implement job cancellation endpoint
5. Add memory usage monitoring
#### Phase 2: Frontend Progressive Rendering (Week 1)
1. Add EventSource for SSE connection
2. Implement progress bar component
3. Add cancel button with confirmation
4. Implement progressive table rendering
5. Add error handling and retry logic
#### Phase 3: Testing & Optimization (Week 2)
1. Load testing with 100K row datasets
2. Memory profiling during generation
3. Concurrent user testing (10 simultaneous exports)
4. Edge case testing (network interruption, cancellation)
5. Performance tuning (chunk size optimization)
#### Phase 4: Deployment (Week 2)
1. Deploy to staging environment
2. Internal beta testing (dev team)
3. Gradual rollout (10% → 50% → 100%)
4. Monitor error rates and performance
5. Full production deployment
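The gradual rollout (10% → 50% → 100%) can hash each user ID into a stable bucket from 0 to 99. A user enabled at 10% then stays enabled at 50%, and nobody flips back and forth between implementations. This is a minimal sketch. The flag name `streaming_reports` and the SHA-256 bucketing scheme are assumptions, not an existing feature-flag service.

```python
import hashlib

def rollout_bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100

def is_enabled(flag: str, user_id: str, percent: int) -> bool:
    """True if the user's bucket falls below the rollout percentage."""
    return rollout_bucket(flag, user_id) < percent

users = [f"user-{i}" for i in range(1000)]
at_10 = {u for u in users if is_enabled("streaming_reports", u, 10)}
at_50 = {u for u in users if is_enabled("streaming_reports", u, 50)}
print(len(at_10), len(at_50), at_10 <= at_50)
```

Salting the hash with the flag name gives each flag its own independent 10% of users.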
### Database Optimization
- Add index on frequently filtered columns
- Use read replicas for report queries (reduce load on primary)
- Implement query result caching for identical requests
### Monitoring
- Track report generation success rate
- Monitor memory usage per report
- Alert on failure rate >5%
- Track average generation time
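The ">5% failure rate" alert needs a window to be measured over. This minimal sketch tracks the outcomes of the most recent report generations in a bounded `collections.deque`. The window of 200 generations and the 20-sample minimum before alerting are illustrative assumptions, not values from this issue.

```python
from collections import deque

class FailureRateMonitor:
    """Rolling failure rate over the most recent `window` report generations."""

    def __init__(self, window: int = 200, threshold: float = 0.05, min_samples: int = 20) -> None:
        self.outcomes = deque(maxlen=window)  # True = failed; oldest entries drop off
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    @property
    def failure_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def should_alert(self) -> bool:
        # Require a minimum sample so a single early failure does not page anyone
        return len(self.outcomes) >= self.min_samples and self.failure_rate > self.threshold

monitor = FailureRateMonitor()
for i in range(100):
    monitor.record(failed=(i % 25 == 0))  # 4 failures out of 100 (4%)
quiet = monitor.should_alert()
for _ in range(3):
    monitor.record(failed=True)           # 7 failures out of 103 (about 6.8%)
alerting = monitor.should_alert()
print(quiet, alerting)
```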
## Alternatives Considered
### Alternative 1: Asynchronous Job Queue
**Approach:** Submit report to background job queue, email user when complete
**Pros:**
- Simple implementation (Celery + Redis)
- No frontend changes needed
- Works for very large datasets
**Cons:**
- Poor UX (user must wait for email)
- No real-time progress updates
- Increased infrastructure complexity
- Doesn't solve immediate feedback problem
**Decision:** Rejected - UX too poor for interactive reports
### Alternative 2: Client-Side Processing
**Approach:** Download raw data, process in browser with Web Workers
**Pros:**
- Offloads processing to client
- No server load
**Cons:**
- Slow download for large datasets
- High bandwidth usage
- Limited by browser memory
- Requires significant client-side code
**Decision:** Rejected - Not viable for 10K+ row datasets
### Alternative 3: Paginated Results
**Approach:** Show first 100 rows, user clicks "Load More"
**Pros:**
- Fast initial load
- Simple implementation
**Cons:**
- User must click multiple times for full report
- Not a true "export" solution
- Poor UX for users needing complete data
**Decision:** Rejected - Doesn't meet user requirements
## Open Questions
- [x] Should we cache generated reports? → No, data changes frequently
- [x] What's the ideal chunk size? → 1,000 rows (tested)
- [x] Should we limit concurrent reports per user? → Yes, max 5
- [ ] Should we support export to CSV/Excel during streaming?
- [ ] Should we add email notification when generation completes?
## Testing Strategy
### Unit Tests
- `test_streaming_report_generator.py`: Chunked processing logic
- `test_progress_tracking.py`: Accurate progress calculation
- `test_cancellation.py`: Job cancellation and cleanup
- `test_error_handling.py`: Database errors, network issues
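`test_progress_tracking.py` needs a precise definition of "accurate progress." This sketch assumes progress comes from a row count taken before streaming, as in the backend example's `processed / total_rows`. It guards the two cases most likely to break that formula: an empty dataset (division by zero) and rows added between the count and the scan (progress above 100%). `compute_progress` is a hypothetical helper name.

```python
import unittest

def compute_progress(processed: int, total_rows: int) -> float:
    """Percent complete, clamped to [0, 100]; an empty dataset counts as done."""
    if total_rows <= 0:
        return 100.0
    return min(100.0, processed / total_rows * 100)

class TestProgressTracking(unittest.TestCase):
    def test_chunk_boundaries(self):
        self.assertAlmostEqual(compute_progress(1000, 10_000), 10.0)
        self.assertAlmostEqual(compute_progress(10_000, 10_000), 100.0)

    def test_empty_dataset_is_complete(self):
        self.assertEqual(compute_progress(0, 0), 100.0)

    def test_rows_added_after_count_do_not_exceed_100(self):
        self.assertEqual(compute_progress(10_500, 10_000), 100.0)
```

Run it with `python -m unittest test_progress_tracking`.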
### Integration Tests
- `test_report_api.py`: End-to-end streaming report generation
- `test_concurrent_reports.py`: Multiple simultaneous reports
- `test_large_datasets.py`: 100K row datasets
### Load Tests
```bash
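# Test with 50 concurrent users generating 10K row reports
# (--headless runs without the web UI; the 10m run time is an example value)
locust -f tests/load/test_report_streaming.py --headless --users 50 --spawn-rate 5 --run-time 10m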
# Performance targets:
# - 99th percentile response time: <15s
# - Error rate: <1%
# - Memory usage per worker: <500MB
```
### Edge Case Tests
- Empty dataset
- Single row dataset
- 100K row dataset
- Network interruption mid-generation
- Database connection loss
- Server restart during generation
- Concurrent cancellations
## Rollout Plan
### Week 1: Development
- [x] Implement backend streaming
- [x] Implement frontend progressive rendering
- [x] Unit tests and integration tests
### Week 2: Testing & Staging
- [x] Load testing
- [x] Deploy to staging
- [x] Internal testing (dev team)
- [x] Fix any issues found
### Week 3: Gradual Production Rollout
- [ ] Deploy to production with feature flag
- [ ] Enable for 10% of users
- [ ] Monitor error rates, performance metrics
- [ ] If successful, increase to 50%
- [ ] If successful, increase to 100%
### Week 4: Full Deployment
- [ ] 100% of users on streaming reports
- [ ] Remove old synchronous implementation
- [ ] Update documentation
## Related
- Related to #234 (API performance improvements)
- Related to #235 (Memory optimization)
- Blocks #236 (Enterprise tier launch - requires large dataset support)
- Depends on #237 (Database read replica setup)
- See design doc: [Streaming Reports Architecture](link)
## Priority
**P1-High**
**Justification:**
- Affects 45% of report generation attempts (critical failure rate)
- Generating 20+ support tickets per week (significant support burden)
- Enterprise customer churn risk ($50K ARR)
- Competitive disadvantage (competitors support larger datasets)
**Timeline:** Target completion in 3 weeks (includes testing and gradual rollout)
## Complexity Estimate
- **Effort**: 2-3 weeks (including testing and gradual rollout)
- **Risk**: Medium (requires careful testing of streaming implementation)
- **Dependencies**: Database read replica setup (Issue #237)
- **Skills needed**: Backend (Python/FastAPI), Frontend (JavaScript/SSE), Database optimization
## Labels
`bug`, `performance`, `P1-high`, `backend`, `frontend`, `user-experience`
## Assignees
- Backend: @backend-dev
- Frontend: @frontend-dev
- QA: @qa-engineer
---
**Issue created by:** Product Manager (@pm-user)
**Date:** 2025-11-12
**Milestone:** Q4 2025