
Concurrent Scanner Execution

Overview

Implemented concurrent scanner execution using Python's ThreadPoolExecutor to improve discovery pipeline performance by 25-30%.

Performance Results

Concurrent (8 workers):  42-43 seconds
Sequential (1 worker):   54-56 seconds
Improvement:             25-30% faster ⚡

Configuration

Add to your config or use defaults in default_config.py:

"scanner_execution": {
    "concurrent": True,        # Enable parallel execution
    "max_workers": 8,          # Max concurrent scanner threads
    "timeout_seconds": 30,     # Per-scanner timeout
}
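The execution code can read these settings with fallbacks to the defaults. A minimal sketch (the helper name and the exact lookup path into the config are assumptions for illustration; the real logic lives in discovery_graph.py):

```python
# Defaults mirroring the scanner_execution block above
DEFAULTS = {"concurrent": True, "max_workers": 8, "timeout_seconds": 30}

def get_scanner_execution_config(config):
    """Merge user-provided scanner_execution settings over the defaults."""
    user = config.get("discovery", {}).get("scanner_execution", {})
    return {**DEFAULTS, **user}
```

Any key the user omits falls back to the default, so partial overrides like `{"max_workers": 4}` are safe.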

How It Works

Thread Pool Execution

  1. Scanner Preparation: All enabled scanners are instantiated and validated
  2. Concurrent Dispatch: Scanners submitted to ThreadPoolExecutor
  3. State Isolation: Each scanner gets a copy of state (thread-safe)
  4. Result Collection: Candidates collected as scanners complete
  5. Log Merging: Tool logs merged back into main state
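The five steps above can be sketched as a single dispatch function. This is a simplified model, not the actual implementation: scanners are modeled as plain callables that take a state copy and return (candidates, logs):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_scanners_concurrently(scanners, state, max_workers=8):
    """Dispatch scanners to a thread pool and collect their candidates."""
    all_candidates = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Steps 2-3: submit each scanner with its own state copy
        # (fresh tool_logs list, since dict copies are shallow)
        future_to_name = {
            executor.submit(scanner, dict(state, tool_logs=[])): name
            for name, scanner in scanners.items()
        }
        # Step 4: collect candidates as scanners complete
        for future in as_completed(future_to_name):
            candidates, logs = future.result()
            all_candidates.extend(candidates)
            # Step 5: merge tool logs back into the main state
            state["tool_logs"].extend(logs)
    return all_candidates
```

The merge in step 5 happens in the main thread, so no locking is needed on `state`.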

Timeout Handling

from concurrent.futures import TimeoutError as FutureTimeoutError

# Per-scanner timeout (not a single global timeout).
# Note: we wait on each future directly; as_completed() only yields
# futures that are already done, so result(timeout=...) inside an
# as_completed loop would never actually time out.
for future, name in future_to_scanner.items():
    try:
        result = future.result(timeout=timeout_seconds)
        # Process result
    except FutureTimeoutError:
        # Scanner timed out; continue with the others
        logger.warning(f"Scanner {name} timed out")

Key insight: Using per-scanner timeout instead of global timeout means slow scanners don't block the entire batch.

Error Isolation

def run_scanner(scanner_info):
    # scanner_info bundles the scanner with its isolated state copy
    name, pipeline, scanner, state_copy = scanner_info
    try:
        candidates = scanner.scan_with_validation(state_copy)
        return (name, pipeline, candidates, None)
    except Exception as e:
        # Return the error instead of raising it, so one failing
        # scanner never aborts the batch
        return (name, pipeline, [], str(e))

Key insight: Each scanner runs in isolation. One failure doesn't stop others.

Why ThreadPoolExecutor?

I/O-Bound Operations

Scanners spend most time waiting for:

  • API responses (Reddit, Finnhub, Alpha Vantage)
  • Network requests (news, fundamentals)
  • Database queries

CPU time is minimal compared to I/O waits.

GIL Not a Problem

Python's Global Interpreter Lock (GIL) doesn't affect I/O-bound code because:

  1. Threads release GIL during I/O operations
  2. Multiple threads can wait on I/O concurrently
  3. Only one thread executes Python bytecode at a time, but the Python-level work here is a tiny fraction of each scanner's runtime
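A quick demonstration of why threads help here: `time.sleep` releases the GIL just like a blocking socket read, so four simulated "API calls" overlap instead of serializing (the fake call is a stand-in, not a real scanner):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_api_call(delay):
    # time.sleep releases the GIL, just like a blocking network read
    time.sleep(delay)
    return delay

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_api_call, [0.2] * 4))
elapsed = time.perf_counter() - start
# Four 0.2s waits overlap: total is close to 0.2s, not 0.8s
```

The same four calls run sequentially would take roughly the sum of the delays.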

State Management

# Thread-safe pattern: shallow-copy the state, then give each
# thread its own log list (dict.copy() shares nested objects)
scanner_state = state.copy()
scanner_state["tool_logs"] = []
scanner.scan(scanner_state)  # no race conditions on shared state

# Merge results back in the main thread after completion
state["tool_logs"].extend(scanner_state["tool_logs"])

Key insight: Copying state dict is cheap (<1ms) compared to API latency (5-10s).
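One subtlety worth calling out: dict.copy() is shallow, so nested objects like the tool_logs list are shared between copies unless each thread is given a fresh one. A small illustration of the copy-then-merge pattern:

```python
state = {"tool_logs": ["startup"]}

shared = state.copy()          # shallow copy: the nested list is shared
assert shared["tool_logs"] is state["tool_logs"]

isolated = state.copy()
isolated["tool_logs"] = []     # fresh list, safe to mutate in a thread
isolated["tool_logs"].append("scanner ran")

# Merge back in the main thread after the scanner completes
state["tool_logs"].extend(isolated["tool_logs"])
```

Without the fresh list, concurrent appends would race on the same object and the merge step would duplicate entries.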

Testing

Run comprehensive tests:

# Full test suite
python tests/test_concurrent_scanners.py

# Quick verification
python verify_concurrent_execution.py

Test coverage:

  • Concurrent execution works
  • Sequential fallback when disabled
  • Timeout handling (graceful degradation)
  • Error isolation (one failure doesn't stop others)
  • Same candidates found in both modes

Disabling Concurrent Execution

Set concurrent: False to revert to sequential execution:

config["discovery"]["scanner_execution"]["concurrent"] = False

Useful for:

  • Debugging individual scanners
  • Environments with limited resources
  • Rate limit testing

Performance Tips

  1. Optimal Worker Count: 8 workers balance parallelism with resource usage
    • Too few: underutilized pool (scanners wait in the queue)
    • Too many: thread overhead and potential API rate limiting
  2. Timeout Configuration: 30s per scanner is a reasonable default
    • Too short: legitimately slow scanners time out
    • Too long: hung scanners delay result collection unnecessarily
  3. Enable for Production: always use concurrent mode unless debugging

Monitoring

Concurrent execution logs scanner completion:

Running 8 scanners concurrently (max 8 workers)...
✓ market_movers: 10 candidates
✓ insider_buying: 20 candidates
⏱️  slow_scanner: timeout after 30s
⚠️  broken_scanner: HTTP 500 error
✓ volume_accumulation: 2 candidates

Next Steps

Remaining performance optimizations:

  1. Rate Limiting: Add exponential backoff for API calls
  2. TTL Caching: Time-based cache for expensive operations
  3. Circuit Breaker: Auto-disable consistently failing scanners
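As a rough illustration of the first item, an exponential-backoff retry wrapper might look like the sketch below. Nothing here exists in the codebase yet; with_backoff is a hypothetical helper:

```python
import random
import time

def with_backoff(fn, max_retries=4, base_delay=1.0, exc=(Exception,)):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except exc:
            if attempt == max_retries - 1:
                raise
            # Delays of base, 2*base, 4*base, ... with jitter so
            # concurrent scanners don't retry in lockstep
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Callers would wrap individual API calls, e.g. `with_backoff(lambda: client.get_quote("AAPL"))`, typically narrowing `exc` to rate-limit or transient-network exceptions.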

Implementation Files

  • tradingagents/default_config.py - Configuration
  • tradingagents/graph/discovery_graph.py - Execution logic
  • tests/test_concurrent_scanners.py - Test suite
  • verify_concurrent_execution.py - Quick verification