TradingAgents/CONTRIBUTING_SECURITY.md

596 lines
14 KiB
Markdown

# Security Best Practices for Contributors
Thank you for contributing to TradingAgents! Security is a top priority for this project. This guide outlines security best practices all contributors should follow.
## Table of Contents
1. [General Security Principles](#general-security-principles)
2. [Code Security Guidelines](#code-security-guidelines)
3. [Secure Coding Checklist](#secure-coding-checklist)
4. [Common Vulnerabilities to Avoid](#common-vulnerabilities-to-avoid)
5. [Security Testing](#security-testing)
6. [Code Review Security Focus](#code-review-security-focus)
7. [Tools and Resources](#tools-and-resources)
---
## General Security Principles
### Defense in Depth
Always implement multiple layers of security:
- Input validation
- Output encoding
- Least privilege
- Fail securely
### Secure by Default
- Default configurations should be secure
- Require explicit opt-in for insecure features
- Never expose sensitive data by default
### Zero Trust
- Validate all inputs, even from trusted sources
- Never assume data is safe
- Always sanitize before use
---
## Code Security Guidelines
### 1. Input Validation
**ALWAYS validate all user inputs:**
```python
# ✗ BAD - No validation
def get_stock_data(ticker: str):
return fetch_data(ticker)
# ✓ GOOD - Proper validation
from tradingagents.security import validate_ticker
def get_stock_data(ticker: str):
ticker = validate_ticker(ticker) # Raises ValueError if invalid
return fetch_data(ticker)
```
**Use the provided validators:**
```python
from tradingagents.security import (
validate_ticker,
validate_date,
sanitize_path_component,
validate_api_key
)
# Validate ticker symbols
ticker = validate_ticker(user_input)
# Validate dates
date = validate_date(user_input_date)
# Sanitize file paths
safe_filename = sanitize_path_component(user_filename)
# Validate API keys
api_key = validate_api_key(os.getenv("API_KEY"), "API_KEY")
```
### 2. Path Traversal Prevention
**NEVER use user input directly in file paths:**
```python
# ✗ BAD - Path traversal vulnerability
from pathlib import Path
results_dir = Path("./results") / user_ticker / user_date
# ✓ GOOD - Sanitized path
from tradingagents.security import sanitize_path_component
results_dir = Path("./results") / sanitize_path_component(user_ticker) / sanitize_path_component(user_date)
```
### 3. Secret Management
**NEVER hardcode secrets:**
```python
# ✗ BAD - Hardcoded secret
API_KEY = "sk-1234567890abcdef"
# ✗ BAD - Hardcoded path with personal info
DATA_DIR = "/Users/john/Documents/private-data"
# ✓ GOOD - Environment variable
import os
API_KEY = os.getenv("API_KEY")
if not API_KEY:
raise ValueError("API_KEY environment variable not set")
# ✓ GOOD - Configurable path
DATA_DIR = os.getenv("TRADINGAGENTS_DATA_DIR", "./data")
```
### 4. API Security
**Always implement proper error handling and timeouts:**
```python
# ✗ BAD - No timeout, poor error handling
response = requests.get(url)
data = response.json()
# ✓ GOOD - Timeout and proper error handling
try:
response = requests.get(
url,
timeout=30, # 30 second timeout
verify=True # Verify SSL certificates
)
response.raise_for_status()
data = response.json()
except requests.exceptions.Timeout:
logger.error("Request timed out")
raise
except requests.exceptions.RequestException as e:
logger.error(f"Request failed: {e}")
raise
```
**Use rate limiting:**
```python
from tradingagents.security import RateLimiter
# Apply rate limiting to API calls
@RateLimiter(max_calls=60, period=60) # 60 calls per minute
def fetch_stock_data(ticker: str):
return api.get_stock_data(ticker)
```
### 5. URL Construction
**Always encode user input in URLs:**
```python
# ✗ BAD - Direct string interpolation
url = f"https://api.example.com/search?q={user_query}"
# ✓ GOOD - Proper URL encoding
from urllib.parse import quote_plus
url = f"https://api.example.com/search?q={quote_plus(user_query)}"
# ✓ BETTER - Use params argument
import requests
response = requests.get(
"https://api.example.com/search",
params={"q": user_query} # Automatically encoded
)
```
### 6. SQL Injection Prevention
**If you add database functionality:**
```python
# ✗ BAD - SQL injection vulnerability
query = f"SELECT * FROM stocks WHERE ticker = '{user_ticker}'"
# ✓ GOOD - Parameterized query
query = "SELECT * FROM stocks WHERE ticker = ?"
cursor.execute(query, (user_ticker,))
```
### 7. Safe File Operations
**Always validate file paths and use safe methods:**
```python
# ✗ BAD - Unsafe file read
with open(user_provided_path, 'r') as f:
data = f.read()
# ✓ GOOD - Validated and safe
from pathlib import Path
from tradingagents.security import sanitize_path_component
safe_path = Path("./data") / sanitize_path_component(user_filename)
# Ensure path is within allowed directory
if not safe_path.resolve().is_relative_to(Path("./data").resolve()):
raise ValueError("Invalid file path")
with open(safe_path, 'r') as f:
data = f.read()
```
### 8. Logging Security
**Never log sensitive information:**
```python
# ✗ BAD - Logging sensitive data
logger.info(f"API request with key: {api_key}")
logger.debug(f"User password: {password}")
# ✓ GOOD - Safe logging
logger.info(f"API request initiated")
logger.debug(f"User ID: {user_id}") # OK to log non-sensitive IDs
# ✓ GOOD - Redact sensitive data
def redact_api_key(key: str) -> str:
if len(key) > 8:
return f"{key[:4]}...{key[-4:]}"
return "***"
logger.info(f"Using API key: {redact_api_key(api_key)}")
```
### 9. Error Messages
**Don't leak sensitive information in error messages:**
```python
# ✗ BAD - Leaking internal paths
except Exception as e:
return f"Error reading file: {str(e)}" # Might expose paths
# ✓ GOOD - Generic error message
except FileNotFoundError:
logger.error(f"File not found: {safe_path}")
return "The requested file was not found"
except Exception as e:
logger.error(f"Unexpected error: {e}")
return "An error occurred while processing your request"
```
### 10. Type Hints and Validation
**Use type hints for better security:**
```python
from typing import Dict, List, Optional
from datetime import datetime
# ✓ GOOD - Clear types make validation easier
def analyze_stock(
ticker: str,
start_date: str,
end_date: str,
config: Optional[Dict] = None
) -> Dict[str, float]:
"""
Analyze stock performance.
Args:
ticker: Stock ticker symbol (validated)
start_date: Start date in YYYY-MM-DD format
end_date: End date in YYYY-MM-DD format
config: Optional configuration dictionary
Returns:
Dictionary with analysis results
Raises:
ValueError: If inputs are invalid
"""
ticker = validate_ticker(ticker)
start_date = validate_date(start_date)
end_date = validate_date(end_date)
# Implementation...
return {"return": 0.15}
```
---
## Secure Coding Checklist
Before submitting a pull request, verify:
- [ ] **Input Validation**
- [ ] All user inputs are validated
- [ ] Used appropriate validators from `tradingagents.security`
- [ ] Edge cases are handled
- [ ] **Path Security**
- [ ] No user input in file paths without sanitization
- [ ] Used `sanitize_path_component` for file operations
- [ ] Paths are restricted to allowed directories
- [ ] **Secret Management**
- [ ] No hardcoded API keys, passwords, or secrets
- [ ] Environment variables used for configuration
- [ ] No personal file paths or usernames in code
- [ ] **API Security**
- [ ] All HTTP requests have timeouts
- [ ] SSL verification is enabled
- [ ] Error handling is comprehensive
- [ ] Rate limiting is applied where appropriate
- [ ] **Error Handling**
- [ ] Errors are logged appropriately
- [ ] Sensitive data is not in error messages
- [ ] Failures are handled gracefully
- [ ] **Logging**
- [ ] No sensitive data in logs
- [ ] Appropriate log levels used
- [ ] Security events are logged
- [ ] **Code Quality**
- [ ] Type hints added
- [ ] Docstrings include security notes
- [ ] Code is readable and maintainable
- [ ] **Testing**
- [ ] Security tests added
- [ ] Edge cases tested
- [ ] Error paths tested
---
## Common Vulnerabilities to Avoid
### 1. Path Traversal (CWE-22)
```python
# ✗ VULNERABLE
user_file = request.get("file")
with open(f"./data/{user_file}", 'r') as f:
# Attacker could use: ../../../../etc/passwd
data = f.read()
# ✓ SECURE
from tradingagents.security import sanitize_path_component
user_file = sanitize_path_component(request.get("file"))
safe_path = Path("./data") / user_file
if not safe_path.resolve().is_relative_to(Path("./data").resolve()):
raise ValueError("Invalid path")
```
### 2. SQL Injection (CWE-89)
```python
# ✗ VULNERABLE
ticker = request.get("ticker")
query = f"SELECT * FROM stocks WHERE ticker = '{ticker}'"
# Attacker could use: AAPL'; DROP TABLE stocks; --
# ✓ SECURE
query = "SELECT * FROM stocks WHERE ticker = ?"
cursor.execute(query, (ticker,))
```
### 3. Command Injection (CWE-78)
```python
# ✗ VULNERABLE
import subprocess
user_input = request.get("command")
subprocess.run(f"process_data {user_input}", shell=True)
# ✓ SECURE - Don't use shell=True, validate inputs
allowed_commands = ['analyze', 'report', 'export']
if user_input not in allowed_commands:
raise ValueError("Invalid command")
subprocess.run(["process_data", user_input], shell=False)
```
### 4. SSRF (Server-Side Request Forgery) (CWE-918)
```python
# ✗ VULNERABLE
user_url = request.get("data_source")
response = requests.get(user_url)
# ✓ SECURE
from tradingagents.security.validators import validate_url
user_url = validate_url(user_url, allowed_schemes=['https'])
# URL validator blocks private IPs, localhost, etc.
response = requests.get(user_url, timeout=10)
```
### 5. Insecure Deserialization (CWE-502)
```python
# ✗ VULNERABLE
import pickle
data = pickle.loads(user_provided_data)
# ✓ SECURE - Use safe serialization
import json
try:
data = json.loads(user_provided_data)
except json.JSONDecodeError:
raise ValueError("Invalid JSON data")
```
### 6. Information Disclosure (CWE-200)
```python
# ✗ VULNERABLE
@app.route("/debug")
def debug():
return {"config": app.config, "env": os.environ}
# ✓ SECURE
@app.route("/health")
def health():
return {"status": "healthy", "version": VERSION}
```
### 7. Insufficient Logging & Monitoring (CWE-778)
```python
# ✗ INSUFFICIENT
def transfer_funds(amount):
# No logging of financial transaction
execute_transfer(amount)
# ✓ SECURE
import logging
security_logger = logging.getLogger('security')
def transfer_funds(amount, user_id):
security_logger.info(
f"Transfer initiated",
extra={
"user_id": user_id,
"amount": amount,
"timestamp": datetime.now().isoformat()
}
)
try:
execute_transfer(amount)
security_logger.info(f"Transfer completed for user {user_id}")
except Exception as e:
security_logger.error(f"Transfer failed for user {user_id}: {e}")
raise
```
---
## Security Testing
### Write Security Tests
```python
# tests/security/test_input_validation.py
import pytest
from tradingagents.security import validate_ticker, sanitize_path_component
def test_validate_ticker_prevents_path_traversal():
"""Test that ticker validation prevents path traversal."""
malicious_inputs = [
"../../../etc/passwd",
"..\\..\\..\\windows\\system32",
"ticker/../../../secrets"
]
for malicious in malicious_inputs:
with pytest.raises(ValueError, match="Invalid ticker"):
validate_ticker(malicious)
def test_sanitize_path_component():
"""Test path sanitization."""
assert sanitize_path_component("../etc/passwd") == "etcpasswd"
assert sanitize_path_component("normal_file.txt") == "normal_file.txt"
assert ".." not in sanitize_path_component("../../data")
def test_api_key_validation():
"""Test API key validation."""
from tradingagents.security import validate_api_key
# Should pass
validate_api_key("sk-1234567890abcdef", "TEST_KEY")
# Should fail
with pytest.raises(ValueError):
validate_api_key(None, "TEST_KEY")
with pytest.raises(ValueError):
validate_api_key("", "TEST_KEY")
```
### Run Security Scans
```bash
# Static security analysis
bandit -r tradingagents/ -ll
# Dependency vulnerability scan
safety check
pip-audit
# Secret scanning
gitleaks detect --source=. --verbose
```
---
## Code Review Security Focus
When reviewing code, check for:
### Input Validation
- [ ] All user inputs are validated
- [ ] Validation happens on server side
- [ ] Whitelist approach used where possible
### Authentication & Authorization
- [ ] Proper authentication checks
- [ ] Authorization before sensitive operations
- [ ] Session management is secure
### Data Protection
- [ ] Sensitive data is encrypted
- [ ] No secrets in code or logs
- [ ] Proper error handling doesn't leak info
### API Security
- [ ] Rate limiting implemented
- [ ] Timeouts configured
- [ ] SSL/TLS used for all connections
### Dependencies
- [ ] Dependencies are up to date
- [ ] No known vulnerabilities
- [ ] Minimal dependencies used
---
## Tools and Resources
### Security Tools
1. **Bandit** - Python security linter
```bash
pip install bandit
bandit -r tradingagents/
```
2. **Safety** - Dependency vulnerability scanner
```bash
pip install safety
safety check
```
3. **pip-audit** - Another dependency scanner
```bash
pip install pip-audit
pip-audit
```
4. **Gitleaks** - Secret scanning
```bash
docker run -v $(pwd):/path zricethezav/gitleaks:latest detect --source="/path"
```
5. **Pre-commit hooks** - Automated checks
```bash
pip install pre-commit
pre-commit install
```
### Resources
- [OWASP Top 10](https://owasp.org/www-project-top-ten/)
- [CWE Top 25](https://cwe.mitre.org/top25/)
- [Python Security](https://python.readthedocs.io/en/stable/library/security_warnings.html)
- [NIST Guidelines](https://www.nist.gov/cybersecurity)
---
## Questions?
If you have security questions or concerns:
- Email: yijia.xiao@cs.ucla.edu
- Review: [SECURITY.md](SECURITY.md)
- Check: [SECURITY_AUDIT.md](SECURITY_AUDIT.md)
**Remember: When in doubt, ask before committing!**
Thank you for keeping TradingAgents secure!