TradingAgents/SECURITY_AUDIT.md

532 lines
13 KiB
Markdown

# TradingAgents Security Audit Report
**Date:** 2025-11-14
**Auditor:** Claude (AI Security Analysis)
**Project:** TradingAgents - Multi-Agents LLM Financial Trading Framework
**Version:** Current main branch
---
## Executive Summary
This security audit identifies critical and moderate security vulnerabilities in the TradingAgents codebase, along with recommendations for remediation. The project handles sensitive financial data and API keys, making security a top priority.
### Risk Summary
- **Critical Issues:** 3
- **High Issues:** 5
- **Medium Issues:** 7
- **Low Issues:** 4
---
## Critical Security Issues
### 1. Path Traversal Vulnerability (CRITICAL)
**File:** `cli/main.py:757`
**Risk Level:** Critical
**CVSS Score:** 8.6
**Issue:**
```python
results_dir = Path(config["results_dir"]) / selections["ticker"] / selections["analysis_date"]
```
User-controlled input (`ticker` and `analysis_date`) is used directly in file path construction without sanitization.
**Attack Scenario:**
```python
ticker = "../../../etc/passwd"
analysis_date = "../../secrets"
# Results in: ./results/../../../etc/passwd/../../secrets
```
**Remediation:**
```python
import re
from pathlib import Path
def sanitize_path_component(value):
"""Sanitize user input for safe file path usage."""
# Remove any path traversal attempts
value = value.replace('..', '').replace('/', '').replace('\\', '')
# Allow only alphanumeric, dash, underscore
value = re.sub(r'[^a-zA-Z0-9_-]', '_', value)
return value
# Usage
results_dir = Path(config["results_dir"]) / sanitize_path_component(selections["ticker"]) / sanitize_path_component(selections["analysis_date"])
```
### 2. Hardcoded Developer Path Exposure (CRITICAL)
**File:** `tradingagents/default_config.py:6`
**Risk Level:** Critical
**CVSS Score:** 7.5
**Issue:**
```python
"data_dir": "/Users/yluo/Documents/Code/ScAI/FR1-data",
```
Exposes developer's local file system structure and potentially identifies system users.
**Remediation:**
```python
"data_dir": os.getenv("TRADINGAGENTS_DATA_DIR", "./data"),
```
### 3. No Input Validation on External API Calls (CRITICAL)
**File:** `tradingagents/dataflows/googlenews_utils.py:60-64`
**Risk Level:** Critical
**CVSS Score:** 8.1
**Issue:**
```python
url = (
f"https://www.google.com/search?q={query}"
f"&tbs=cdr:1,cd_min:{start_date},cd_max:{end_date}"
f"&tbm=nws&start={offset}"
)
```
User input is directly interpolated into URLs without encoding or validation.
**Remediation:**
```python
from urllib.parse import quote_plus
url = (
f"https://www.google.com/search?q={quote_plus(query)}"
f"&tbs=cdr:1,cd_min:{quote_plus(start_date)},cd_max:{quote_plus(end_date)}"
f"&tbm=nws&start={int(offset)}"
)
```
---
## High Security Issues
### 4. Missing API Key Validation (HIGH)
**File:** `tradingagents/dataflows/openai.py:7`
**Risk Level:** High
**Issue:**
The OpenAI client is initialized without checking if API key is set, leading to unclear error messages.
**Remediation:**
```python
import os
def get_openai_client():
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
raise ValueError(
"OPENAI_API_KEY environment variable is not set. "
"Please set it in your .env file or environment."
)
config = get_config()
return OpenAI(base_url=config["backend_url"], api_key=api_key)
```
### 5. No Rate Limiting Protection (HIGH)
**File:** Multiple data vendor files
**Risk Level:** High
**Issue:**
No centralized rate limiting for API calls could lead to:
- API quota exhaustion
- Service denial
- Unexpected costs
**Remediation:**
Implement a rate limiter using `ratelimit` library or custom implementation:
```python
from functools import wraps
import time
from collections import deque
class RateLimiter:
def __init__(self, max_calls, period):
self.max_calls = max_calls
self.period = period
self.calls = deque()
def __call__(self, func):
@wraps(func)
def wrapper(*args, **kwargs):
now = time.time()
# Remove old calls
while self.calls and self.calls[0] < now - self.period:
self.calls.popleft()
if len(self.calls) >= self.max_calls:
sleep_time = self.period - (now - self.calls[0])
if sleep_time > 0:
time.sleep(sleep_time)
self.calls.popleft()
self.calls.append(time.time())
return func(*args, **kwargs)
return wrapper
# Usage
@RateLimiter(max_calls=60, period=60) # 60 calls per minute
def make_api_call():
pass
```
### 6. Insufficient Error Handling in API Calls (HIGH)
**File:** `tradingagents/dataflows/alpha_vantage_common.py:66`
**Risk Level:** High
**Issue:**
```python
response = requests.get(API_BASE_URL, params=api_params)
response.raise_for_status()
```
No timeout specified, could lead to hung connections.
**Remediation:**
```python
response = requests.get(
API_BASE_URL,
params=api_params,
timeout=30, # 30 second timeout
verify=True # Ensure SSL verification
)
```
### 7. Debug Mode Enabled in Production Examples (HIGH)
**File:** Multiple files
**Risk Level:** High
**Issue:**
Documentation examples show `debug=True`:
```python
ta = TradingAgentsGraph(debug=True, config=config)
```
**Remediation:**
Update all examples to:
```python
# For production
ta = TradingAgentsGraph(debug=False, config=config)
# For development only
# ta = TradingAgentsGraph(debug=True, config=config)
```
### 8. No Test Coverage (HIGH)
**File:** Project-wide
**Risk Level:** High
**Issue:**
No unit tests or integration tests found. This makes it difficult to:
- Verify security fixes
- Prevent regressions
- Ensure code quality
**Remediation:**
Create comprehensive test suite. Example structure:
```
tests/
├── __init__.py
├── unit/
│ ├── test_security.py
│ ├── test_config.py
│ ├── test_dataflows.py
│ └── test_agents.py
├── integration/
│ ├── test_trading_graph.py
│ └── test_api_vendors.py
└── security/
├── test_input_validation.py
├── test_path_traversal.py
└── test_api_key_handling.py
```
---
## Medium Security Issues
### 9. Exposed Global State (MEDIUM)
**File:** `tradingagents/dataflows/alpha_vantage_common.py:57`
**Risk Level:** Medium
**Issue:**
```python
current_entitlement = globals().get('_current_entitlement')
```
Using globals for configuration is error-prone and not thread-safe.
**Remediation:**
Use configuration objects or dependency injection instead.
### 10. Web Scraping User-Agent Spoofing (MEDIUM)
**File:** `tradingagents/dataflows/googlenews_utils.py:48-54`
**Risk Level:** Medium
**Issue:**
User-Agent spoofing may violate Google's Terms of Service.
**Remediation:**
- Use official Google News API if available
- Clearly document scraping behavior
- Implement respectful rate limiting
- Consider alternative news sources with official APIs
### 11. No Secret Scanning in CI/CD (MEDIUM)
**Risk Level:** Medium
**Issue:**
No automated secret scanning in version control.
**Remediation:**
Add `.gitguard.yml` or use GitHub secret scanning:
```yaml
# .github/workflows/security.yml
name: Security Scan
on: [push, pull_request]
jobs:
secret-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: TruffleHog Secret Scan
uses: trufflesecurity/trufflehog@main
with:
path: ./
base: ${{ github.event.repository.default_branch }}
```
### 12. Insufficient Logging of Security Events (MEDIUM)
**Risk Level:** Medium
**Issue:**
No logging of:
- Failed authentication attempts
- API rate limit hits
- Unusual activity patterns
**Remediation:**
Implement security event logging:
```python
import logging
security_logger = logging.getLogger('security')
security_logger.setLevel(logging.INFO)
# Log security events
security_logger.warning(f"API rate limit exceeded for vendor: {vendor}")
security_logger.info(f"API key rotation detected")
```
### 13. No Dependency Pinning (MEDIUM)
**File:** `requirements.txt`
**Risk Level:** Medium
**Issue:**
Dependencies specified without version pins:
```
typing-extensions
langchain-openai
pandas
```
**Remediation:**
Pin all dependencies with specific versions:
```
typing-extensions==4.15.0
langchain-openai==1.0.2
pandas==2.3.3
```
Or use `requirements-lock.txt`:
```bash
pip freeze > requirements-lock.txt
```
### 14. Missing Security Headers (MEDIUM)
**Risk Level:** Medium (if web interface is added)
**Remediation:**
If adding web interface, implement security headers:
```python
security_headers = {
'X-Content-Type-Options': 'nosniff',
'X-Frame-Options': 'DENY',
'X-XSS-Protection': '1; mode=block',
'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
'Content-Security-Policy': "default-src 'self'"
}
```
### 15. No Data Validation Schema (MEDIUM)
**Risk Level:** Medium
**Issue:**
No validation of API responses before processing.
**Remediation:**
Use Pydantic for data validation:
```python
from pydantic import BaseModel, validator
class StockData(BaseModel):
ticker: str
date: str
price: float
@validator('ticker')
def validate_ticker(cls, v):
if not v.isalnum() or len(v) > 10:
raise ValueError('Invalid ticker symbol')
return v.upper()
```
---
## Low Security Issues
### 16. Verbose Error Messages (LOW)
**Risk Level:** Low
**Issue:**
Error messages may leak sensitive information in production.
**Remediation:**
Implement error handling that logs detailed errors but shows generic messages to users.
### 17. No Security.md File (LOW)
**Risk Level:** Low
**Remediation:**
Create SECURITY.md for responsible disclosure.
### 18. No Code Signing (LOW)
**Risk Level:** Low
**Remediation:**
Consider signing releases with GPG.
### 19. No Dependency Vulnerability Scanning (LOW)
**Risk Level:** Low
**Remediation:**
Add `pip-audit` or `safety` to CI/CD:
```bash
pip install pip-audit
pip-audit
```
---
## Compliance Considerations
### Financial Data Handling
1. **GDPR**: If handling EU user data
2. **SOC 2**: For service providers
3. **PCI DSS**: If handling payment data (future)
### Recommendations:
- Document data retention policies
- Implement data encryption at rest
- Add audit logging
- Create data access controls
---
## Security Best Practices for Contributors
### 1. Environment Setup
- Never commit `.env` files
- Use `.env.example` as template
- Rotate API keys regularly
### 2. Code Review Checklist
- [ ] No hardcoded secrets
- [ ] Input validation on all user inputs
- [ ] Error handling doesn't leak sensitive info
- [ ] Dependencies are pinned and scanned
- [ ] Tests added for security-critical code
### 3. Secure Development Guidelines
- Always validate and sanitize user input
- Use parameterized queries (if SQL is added)
- Implement least privilege principle
- Log security events
- Keep dependencies updated
---
## Automated Security Tools Recommendations
1. **Static Analysis:**
- `bandit` - Python security linter
- `semgrep` - Lightweight static analysis
2. **Dependency Scanning:**
- `pip-audit` - Check for known vulnerabilities
- `safety` - Check Python dependencies
3. **Secret Scanning:**
- `trufflehog` - Find secrets in git history
- `gitleaks` - Detect hardcoded secrets
4. **Dynamic Analysis:**
- `pytest-security` - Security testing framework
---
## Immediate Action Items
### Priority 1 (Critical - Fix Immediately)
1. Fix path traversal vulnerability in cli/main.py
2. Remove hardcoded path from default_config.py
3. Add input validation to URL construction
### Priority 2 (High - Fix This Week)
1. Add API key validation
2. Implement rate limiting
3. Add timeouts to all network requests
4. Pin all dependencies
5. Add basic test coverage
### Priority 3 (Medium - Fix This Month)
1. Implement comprehensive logging
2. Add secret scanning to CI/CD
3. Create SECURITY.md
4. Add data validation schemas
### Priority 4 (Low - Fix When Possible)
1. Improve error messages
2. Add code signing
3. Document security practices
---
## Conclusion
The TradingAgents framework has several security issues that should be addressed before production use. The critical issues around path traversal and input validation pose immediate risks and should be fixed as the highest priority.
The project would benefit from:
1. Comprehensive test coverage
2. Automated security scanning
3. Clear security documentation
4. Regular security audits
---
## References
- OWASP Top 10: https://owasp.org/www-project-top-ten/
- Python Security Best Practices: https://python.readthedocs.io/en/stable/library/security_warnings.html
- CWE-22 Path Traversal: https://cwe.mitre.org/data/definitions/22.html
- CWE-89 SQL Injection: https://cwe.mitre.org/data/definitions/89.html