refactor: migrate to newspaper4k and improve news service repository integration
- Upgrade from newspaper3k to newspaper4k for better article scraping
- Add repository integration for cached news data retrieval
- Implement proper date handling and data conversion in news service
- Move PRD files to dedicated prd/ directory
- Add type stubs and improve type checking configuration
- Fix linting issues (unused variables and loop control variables)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
parent 07606f6bf4
commit d773ed4cfa
@ -0,0 +1,23 @@
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|MultiEdit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "mise run format"
          },
          {
            "type": "command",
            "command": "mise run lint --fix"
          },
          {
            "type": "command",
            "command": "mise run typecheck"
          }
        ]
      }
    ]
  }
}
```
@ -1,289 +0,0 @@
# Product Requirements Document: FundamentalDataService Completion

## Overview

Complete the `FundamentalDataService` to provide strongly-typed fundamental financial data to trading agents using a local-first data strategy with gap detection and intelligent caching.

## Current State Analysis

### Issues to Fix

- **CRITICAL**: Service calls `FinnhubClient` methods with string dates, but the client expects `date` objects
- **CRITICAL**: References non-existent `self.simfin_client` instead of `self.finnhub_client`
- Missing strongly-typed interfaces between components
- Incomplete local-first strategy implementation
- No concrete gap detection logic
- Missing error recovery for partial data

### What Works

- ✅ `FinnhubClient` fully implemented with strict `date` object interface
- ✅ `FundamentalDataRepository` with dataclass-based storage
- ✅ `FundamentalContext` Pydantic model for agent consumption
- ✅ Basic service structure and error handling

## Technical Requirements

### 1. Strongly-Typed Interfaces

#### Client → Service Interface

```python
# FinnhubClient methods (already implemented)
def get_balance_sheet(symbol: str, frequency: str, report_date: date) -> dict[str, Any]
def get_income_statement(symbol: str, frequency: str, report_date: date) -> dict[str, Any]
def get_cash_flow(symbol: str, frequency: str, report_date: date) -> dict[str, Any]
```

#### Service → Repository Interface

```python
# Repository methods (already implemented)
def has_data_for_period(symbol: str, start_date: str, end_date: str, frequency: str) -> bool
def get_data(symbol: str, start_date: str, end_date: str, frequency: str) -> dict[str, Any]
def store_data(symbol: str, cache_data: dict, frequency: str, overwrite: bool) -> bool
def clear_data(symbol: str, start_date: str, end_date: str, frequency: str) -> bool
```

#### Service → Agent Interface

```python
# Service output (already defined)
def get_context(symbol: str, start_date: str, end_date: str, frequency: str, force_refresh: bool) -> FundamentalContext
```

### 2. Local-First Data Strategy

#### Flow

1. **Repository Lookup**: Check `FundamentalDataRepository.has_data_for_period()`
2. **Gap Detection**: Identify missing data periods using `detect_fundamental_gaps()`
3. **Selective Fetching**: Fetch only missing data from `FinnhubClient`
4. **Cache Updates**: Store new data via `repository.store_data()`
5. **Context Assembly**: Return validated `FundamentalContext`
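The five-step flow above can be sketched end to end. This is an illustrative stand-in, not project code: the dict-backed `repo` and the `fetch` callable play the roles of `FundamentalDataRepository` and `FinnhubClient`, and `get_context_flow` is a hypothetical name.

```python
from datetime import date

def get_context_flow(repo: dict, fetch, symbol: str, expected_dates: list[str]) -> dict:
    """Local-first sketch: look up the cache, detect gaps, fetch only what's
    missing, store the updates, and assemble a context dict."""
    cached = dict(repo.get(symbol, {}))                    # 1. repository lookup
    gaps = [d for d in expected_dates if d not in cached]  # 2. gap detection
    for report_date in gaps:                               # 3. selective fetching
        cached[report_date] = fetch(symbol, date.fromisoformat(report_date))
    repo[symbol] = cached                                  # 4. cache update
    return {"symbol": symbol, "reports": cached}           # 5. context assembly
```

Only the gap dates hit the network; cached reports are returned untouched.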

#### Gap Detection Implementation

```python
def detect_fundamental_gaps(self, symbol: str, start_date: str, end_date: str, frequency: str) -> list[str]:
    """
    Returns a list of report dates that need fetching.

    Example: If requesting quarterly data from 2024-01-01 to 2024-12-31
    and the cache has Q1 and Q3, returns ["2024-06-30", "2024-12-31"].

    For quarterly: check for Q1 (Mar 31), Q2 (Jun 30), Q3 (Sep 30), Q4 (Dec 31)
    For annual: check for fiscal year ends
    """
    # Implementation should:
    # 1. Get existing report dates from the repository
    # 2. Calculate expected report dates in the requested period
    # 3. Return the difference between expected and existing
```
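A minimal implementation of the three steps in the stub might look like this. It is a sketch that assumes calendar quarters and calendar fiscal years, and takes the cached report dates as a plain set rather than querying the repository:

```python
from datetime import date

QUARTER_ENDS = [(3, 31), (6, 30), (9, 30), (12, 31)]

def detect_fundamental_gaps(existing: set[str], start_date: str, end_date: str,
                            frequency: str = "quarterly") -> list[str]:
    """Expected report dates in [start, end] minus what the cache already has."""
    start, end = date.fromisoformat(start_date), date.fromisoformat(end_date)
    month_days = QUARTER_ENDS if frequency == "quarterly" else [(12, 31)]
    expected = [
        date(year, month, day).isoformat()
        for year in range(start.year, end.year + 1)
        for month, day in month_days
        if start <= date(year, month, day) <= end
    ]
    return [d for d in expected if d not in existing]
```

Non-calendar fiscal years (the "Handle fiscal year variations" task below) would need a per-symbol fiscal calendar instead of the fixed `QUARTER_ENDS` table.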

#### Force Refresh Support

- `force_refresh=True` bypasses local data completely
- Clears existing cache before fetching fresh data
- Stores refreshed data with metadata indicating refresh
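The force-refresh behavior reduces to a small branch; this sketch uses a dict-backed cache and a hypothetical `fetch` callable in place of the real repository and client:

```python
def get_with_force_refresh(cache: dict, fetch, symbol: str, force_refresh: bool) -> dict:
    """Clear-then-refetch branch of the local-first flow (illustrative only)."""
    if force_refresh:
        cache.pop(symbol, None)  # clear existing cached data first
    if symbol not in cache:
        cache[symbol] = {"data": fetch(symbol),
                         "metadata": {"refreshed": force_refresh}}
    return cache[symbol]
```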

#### Cache Invalidation Strategy

- **Fundamental data is immutable**: Once a report is filed, it doesn't change
- **No staleness checks needed**: Reports are valid indefinitely
- **Only fetch if missing**: Never re-fetch existing reports

### 3. Date Object Conversion

#### Service Boundary Conversion

```python
# Service receives string dates from agents
def get_context(self, symbol: str, start_date: str, end_date: str, ...) -> FundamentalContext:
    # Validate date strings
    try:
        start_dt = date.fromisoformat(start_date)
        end_dt = date.fromisoformat(end_date)
    except ValueError as e:
        raise ValueError(f"Invalid date format: {e}") from e

    # Check date order
    if end_dt < start_dt:
        raise ValueError(f"End date {end_date} is before start date {start_date}")

    # Use date objects when calling FinnhubClient
    data = self.finnhub_client.get_balance_sheet(symbol, frequency, end_dt)
```

### 4. Error Recovery and Partial Data

```python
def handle_partial_statements(
    self,
    balance_sheet: dict | None,
    income_statement: dict | None,
    cash_flow: dict | None
) -> FundamentalContext:
    """
    Create context even if some statements are missing.

    - If all statements fail: raise an exception
    - If some statements succeed: return a partial context
    - Mark missing statements in metadata
    """
    metadata = {
        "has_balance_sheet": balance_sheet is not None,
        "has_income_statement": income_statement is not None,
        "has_cash_flow": cash_flow is not None,
        "partial_data": any(s is None for s in [balance_sheet, income_statement, cash_flow])
    }

    # Convert available statements to FinancialStatement objects
    # Return FundamentalContext with available data
```

### 5. Pydantic Validation

#### Context Structure

```python
class FundamentalContext(BaseModel):
    symbol: str
    period: dict[str, str]  # {"start": "2024-01-01", "end": "2024-01-31"}
    balance_sheet: FinancialStatement | None
    income_statement: FinancialStatement | None
    cash_flow: FinancialStatement | None
    key_ratios: dict[str, float]
    metadata: dict[str, Any]

    @validator('period')
    def validate_period(cls, v):
        # Ensure start and end dates are present and valid ISO dates
        for key in ("start", "end"):
            if key not in v:
                raise ValueError(f"period is missing '{key}'")
            date.fromisoformat(v[key])  # raises ValueError on bad format
        return v
```

## Implementation Tasks

### Phase 1: Fix Critical Issues

1. **Date Conversion Fix**
   - Add `date.fromisoformat()` conversion in service methods
   - Add date validation (format, order)
   - Update all `FinnhubClient` method calls to use `date` objects
   - File: `tradingagents/services/fundamental_data_service.py:153, 164, 175`

2. **Client Reference Fix**
   - Replace `self.simfin_client` with `self.finnhub_client`
   - File: `tradingagents/services/fundamental_data_service.py:375`

### Phase 2: Enhanced Local-First Strategy

3. **Gap Detection Logic**
   - Implement `detect_fundamental_gaps()` method
   - Calculate expected report dates based on frequency
   - Compare with cached data to find gaps
   - Handle fiscal year variations

4. **Partial Data Handling**
   - Implement `handle_partial_statements()` method
   - Continue processing if some statements succeed
   - Mark missing data in metadata
   - Only fail if all statements fail

### Phase 3: Type Safety & Validation

5. **Comprehensive Type Checking**
   - Run `mise run typecheck` - must pass with 0 errors
   - Validate all `date` object conversions
   - Ensure Pydantic model compliance

6. **Enhanced Testing**
   - Update existing tests for new date handling
   - Add gap detection test scenarios
   - Test partial data scenarios
   - Test force refresh behavior
   - Test date validation edge cases

## Testing Scenarios

### Integration Tests

1. **Gap Detection**
   - Test with empty cache (should fetch all)
   - Test with partial cache (should fetch only missing)
   - Test with complete cache (should fetch none)

2. **Partial Data Recovery**
   - Test when the balance sheet API fails but others succeed
   - Test when only one statement type is available
   - Test when all APIs fail (should raise exception)

3. **Date Handling**
   - Test invalid date formats
   - Test end_date < start_date
   - Test boundary conditions (year start/end)

4. **Force Refresh**
   - Test that force_refresh=True clears the cache
   - Test that new data is fetched and stored

## Success Criteria

### Functional Requirements

- ✅ Service successfully calls `FinnhubClient` with `date` objects
- ✅ Gap detection correctly identifies missing reports
- ✅ Partial data scenarios handled gracefully
- ✅ Local-first strategy works: checks cache → identifies gaps → fetches missing → stores updates
- ✅ Returns properly validated `FundamentalContext` to agents
- ✅ Force refresh bypasses cache and refreshes data

### Technical Requirements

- ✅ Zero type checking errors: `mise run typecheck`
- ✅ Zero linting errors: `mise run lint`
- ✅ All existing tests pass
- ✅ No runtime errors with date conversions
- ✅ Proper error messages for validation failures

### Quality Requirements

- ✅ Strongly-typed interfaces between all components
- ✅ Comprehensive error handling and logging
- ✅ Efficient caching with minimal API calls
- ✅ Clear separation of concerns between service, client, and repository

## Dependencies

### Completed

- ✅ `FinnhubClient` with `date` object interface
- ✅ `FundamentalDataRepository` with dataclass storage
- ✅ `FundamentalContext` Pydantic model

### Required

- Working `FinnhubClient` instance with valid API key
- Writable data directory for repository storage

## Timeline

### Immediate (Today)

- Fix critical date conversion and reference issues
- Implement basic gap detection
- Add date validation

### Next Steps

- Implement partial data handling
- Comprehensive testing
- Integration with agent workflows

## Acceptance Criteria

### Must Have

1. **Type Safety**: Service passes `mise run typecheck` with zero errors
2. **Client Integration**: All `FinnhubClient` calls use `date` objects correctly
3. **Gap Detection**: Correctly identifies missing report periods
4. **Partial Data**: Service returns partial context when some statements fail
5. **Local-First**: Service checks repository before API calls
6. **Context Validation**: Returns valid `FundamentalContext` with Pydantic validation
7. **Error Handling**: Graceful handling of API failures and missing data

### Should Have

1. **Cache Efficiency**: Minimal redundant API calls
2. **Force Refresh**: Complete cache bypass when requested
3. **Data Quality**: Metadata indicating data completeness
4. **Clear Error Messages**: Informative errors for date validation failures

### Nice to Have

1. **Performance Metrics**: Timing and cache hit rate logging
2. **Fiscal Year Handling**: Support for non-calendar fiscal years
3. **Bulk Operations**: Fetch multiple symbols efficiently

---

This PRD focuses on completing the `FundamentalDataService` as a strongly-typed, local-first data service that seamlessly integrates with the existing `FinnhubClient` and `FundamentalDataRepository` components while providing robust gap detection and partial data handling.
@ -1,502 +0,0 @@

# Product Requirements Document: MarketDataService Completion

## Overview

Complete the `MarketDataService` to provide strongly-typed market data and technical indicators to trading agents using a local-first data strategy with gap detection and intelligent caching.

## Current State Analysis

### Issues to Fix

- **CRITICAL**: Service uses `BaseClient` inheritance, but `YFinanceClient` exists and needs refactoring to the `FinnhubClient` standard
- **CRITICAL**: Service calls client methods with string dates instead of `date` objects
- **CRITICAL**: Need to integrate the `stockstats` library for technical analysis calculations instead of legacy utils
- **CRITICAL**: `MarketDataRepository` exists but is missing service interface methods
- Missing strongly-typed interface between `YFinanceClient` and the service
- `YFinanceClient` uses `BaseClient` inheritance and string dates (needs refactoring)
- No concrete gap detection logic
- Missing technical indicator data sufficiency validation

### What Works

- ✅ Local-first data strategy implementation (`_get_price_data_local_first`)
- ✅ Force refresh logic (`_fetch_and_cache_fresh_data`)
- ✅ `MarketDataContext` Pydantic model for agent consumption
- ✅ Error handling and metadata creation patterns
- ✅ `YFinanceClient` exists with yfinance SDK integration and comprehensive methods
- ✅ `MarketDataRepository` exists with CSV storage and pandas DataFrame operations
- ✅ Service structure ready for `stockstats` integration for technical analysis

## Technical Requirements

### 1. Strongly-Typed Interfaces

#### Client → Service Interface

```python
# YFinanceClient methods (to be refactored)
def get_historical_data(symbol: str, start_date: date, end_date: date) -> dict[str, Any]
def get_price_data(symbol: str, start_date: date, end_date: date) -> dict[str, Any]

# Technical analysis is handled in the service layer using stockstats;
# no get_technical_indicator method needed in the client - calculated from OHLCV data
```

#### Service → Repository Interface

```python
# MarketDataRepository methods (to be implemented)
def has_data_for_period(symbol: str, start_date: str, end_date: str) -> bool
def get_data(symbol: str, start_date: str, end_date: str) -> dict[str, Any]
def store_data(symbol: str, cache_data: dict, overwrite: bool) -> bool
def clear_data(symbol: str, start_date: str, end_date: str) -> bool
```

#### Service → Agent Interface

```python
# Service output (already defined)
def get_context(symbol: str, start_date: str, end_date: str, indicators: list[str], force_refresh: bool) -> MarketDataContext
```

### 2. Local-First Data Strategy

#### Flow

1. **Repository Lookup**: Check `MarketDataRepository.has_data_for_period()`
2. **Gap Detection**: Identify missing price data periods using `detect_market_gaps()`
3. **Data Sufficiency Check**: Ensure enough historical data for requested indicators
4. **Selective Fetching**: Fetch only missing data from `YFinanceClient`
5. **Cache Updates**: Store new data via `repository.store_data()`
6. **Context Assembly**: Return validated `MarketDataContext`

#### Gap Detection Implementation

```python
def detect_market_gaps(self, cached_dates: list[str], requested_start: str, requested_end: str) -> list[tuple[str, str]]:
    """
    Returns a list of (start, end) tuples for missing periods.

    Example: If requesting 2024-01-01 to 2024-01-31 and the cache has:
    - 2024-01-01 to 2024-01-10
    - 2024-01-20 to 2024-01-25
    Returns: [("2024-01-11", "2024-01-19"), ("2024-01-26", "2024-01-31")]

    Accounts for:
    - Weekends (Saturday/Sunday)
    - Market holidays
    - Continuous date ranges to minimize API calls
    """
    # Implementation should use pandas business day logic
```
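As a starting point, here is a sketch that groups missing business days into contiguous (start, end) ranges. It uses naive Mon-Fri weekday logic only; a production version should use pandas business-day ranges plus an exchange holiday calendar, as the comment in the stub suggests:

```python
from datetime import date, timedelta

def detect_market_gaps(cached_dates: list[str], requested_start: str,
                       requested_end: str) -> list[tuple[str, str]]:
    """Group missing business days into contiguous (start, end) gaps.
    Naive Mon-Fri logic; ignores market holidays."""
    cached = set(cached_dates)
    gaps: list[tuple[str, str]] = []
    run: list[str] = []
    day = date.fromisoformat(requested_start)
    end = date.fromisoformat(requested_end)
    while day <= end:
        if day.weekday() < 5:  # Monday-Friday only
            iso = day.isoformat()
            if iso not in cached:
                run.append(iso)  # extend the current gap
            elif run:
                gaps.append((run[0], run[-1]))  # a cached day closes the gap
                run = []
        day += timedelta(days=1)
    if run:
        gaps.append((run[0], run[-1]))
    return gaps
```

Weekends are skipped rather than treated as cached, so a gap that spans a weekend stays one continuous range, matching the docstring example.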

#### Force Refresh Support

- `force_refresh=True` bypasses local data completely
- Clears existing cache before fetching fresh data
- Stores refreshed data with metadata indicating refresh

#### Cache Invalidation Strategy

- **Historical data is immutable**: Data older than yesterday never changes
- **Today's data needs updates**: During market hours, refresh every 15 minutes
- **After market close**: Today's data becomes immutable

```python
def is_data_stale(self, data_date: date, last_updated: datetime) -> bool:
    today = date.today()
    if data_date < today:
        return False  # Historical data is never stale

    # For today's data, refresh if the market is open and the last update
    # is more than 15 minutes old (note: timedelta has no .minutes attribute)
    age = datetime.now() - last_updated
    return is_market_open() and age.total_seconds() > 15 * 60
```

### 3. Date Object Conversion

#### Service Boundary Conversion

```python
# Service receives string dates from agents
def get_context(self, symbol: str, start_date: str, end_date: str, ...) -> MarketDataContext:
    # Validate date strings
    try:
        start_dt = date.fromisoformat(start_date)
        end_dt = date.fromisoformat(end_date)
    except ValueError as e:
        raise ValueError(f"Invalid date format: {e}") from e

    # Check date order
    if end_dt < start_dt:
        raise ValueError(f"End date {end_date} is before start date {start_date}")

    # Expand date range for technical indicators
    expanded_start = self._calculate_lookback_start(start_dt, indicators)

    # Use date objects when calling YFinanceClient
    price_data = self.yfinance_client.get_historical_data(symbol, expanded_start, end_dt)

    # Calculate technical indicators using the stockstats library
    technical_indicators = self._calculate_technical_indicators(price_data, indicators)
```

### 4. Technical Analysis with Stockstats

#### Data Sufficiency Validation

```python
# Minimum data points required for each indicator
INDICATOR_REQUIREMENTS = {
    "sma_20": 20,
    "sma_200": 200,
    "ema_12": 24,  # 2x for exponential smoothing
    "ema_200": 400,
    "rsi_14": 28,  # 2x period for warm-up
    "macd": 34,  # 26 + 8 for signal line
    "bb_upper": 20,  # Based on 20-period SMA
    "atr_14": 28,  # 2x period for accuracy
    "stochrsi_14": 42,  # 3x period for double smoothing
}

def _calculate_lookback_start(self, start_date: date, indicators: list[str]) -> date:
    """Calculate how far back we need data to compute indicators accurately."""
    max_lookback = 0
    for indicator in indicators:
        lookback = INDICATOR_REQUIREMENTS.get(indicator, 0)
        max_lookback = max(max_lookback, lookback)

    # Add buffer for weekends/holidays when converting data points to calendar days
    calendar_days_back = max_lookback * 1.5
    return start_date - timedelta(days=int(calendar_days_back))

def _validate_data_sufficiency(self, data_points: int, indicators: list[str]) -> dict[str, bool]:
    """Check if we have enough data for each indicator."""
    return {
        indicator: data_points >= INDICATOR_REQUIREMENTS.get(indicator, 0)
        for indicator in indicators
    }
```
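To show the arithmetic, here is the lookback rule restated as a standalone sketch (no `self`, and a deliberately truncated requirements table): requesting `sma_20` and `rsi_14` needs max(20, 28) = 28 points, padded by 1.5x to 42 calendar days.

```python
from datetime import date, timedelta

# Illustrative subset of the full INDICATOR_REQUIREMENTS table above
INDICATOR_REQUIREMENTS = {"sma_20": 20, "rsi_14": 28, "sma_200": 200}

def calculate_lookback_start(start_date: date, indicators: list[str]) -> date:
    max_lookback = max((INDICATOR_REQUIREMENTS.get(i, 0) for i in indicators), default=0)
    # 1.5x buffer converts required data points into calendar days,
    # absorbing weekends and (most) holidays
    return start_date - timedelta(days=int(max_lookback * 1.5))
```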

#### Stockstats Integration

```python
def _calculate_technical_indicators(self, price_data: list[dict], indicators: list[str]) -> dict[str, list[dict]]:
    """
    Calculate technical indicators using the stockstats library.

    Args:
        price_data: OHLCV data from YFinanceClient
        indicators: List of requested indicators (e.g., ['rsi_14', 'macd', 'bb_upper', 'sma_20'])

    Returns:
        Dict mapping indicator names to time series data
    """
    import pandas as pd
    from stockstats import StockDataFrame

    # Convert price data to a pandas DataFrame
    df = pd.DataFrame(price_data)
    df['date'] = pd.to_datetime(df['date'])
    df.set_index('date', inplace=True)

    # Check data sufficiency
    sufficiency = self._validate_data_sufficiency(len(df), indicators)

    # Create StockDataFrame for technical analysis
    sdf = StockDataFrame.retype(df)

    # Calculate requested indicators
    indicator_data = {}
    for indicator in indicators:
        if not sufficiency[indicator]:
            logger.warning(f"Insufficient data for {indicator}, need {INDICATOR_REQUIREMENTS[indicator]} points")
            indicator_data[indicator] = []
            continue

        try:
            # stockstats computes indicator columns lazily on first access,
            # so index into sdf directly rather than checking sdf.columns first
            values = sdf[indicator].dropna()
            indicator_data[indicator] = [
                {"date": idx.strftime("%Y-%m-%d"), "value": float(val)}
                for idx, val in values.items()
            ]
        except Exception as e:
            logger.warning(f"Failed to calculate {indicator}: {e}")
            indicator_data[indicator] = []

    return indicator_data
```

### 5. Error Recovery and Partial Data

```python
def handle_partial_price_data(
    self,
    symbol: str,
    requested_start: str,
    requested_end: str,
    available_data: list[dict]
) -> MarketDataContext:
    """
    Handle cases where only a partial date range is available.

    - If no data available: raise an exception
    - If partial data: return what's available with metadata
    - Mark gaps in metadata
    """
    if not available_data:
        raise ValueError(f"No market data available for {symbol}")

    actual_start = min(d['date'] for d in available_data)
    actual_end = max(d['date'] for d in available_data)

    metadata = {
        "requested_period": {"start": requested_start, "end": requested_end},
        "actual_period": {"start": actual_start, "end": actual_end},
        "partial_data": actual_start > requested_start or actual_end < requested_end,
        "data_points": len(available_data)
    }

    # Return context with available data and metadata
```

### 6. Pydantic Validation

#### Context Structure

```python
class MarketDataContext(BaseModel):
    symbol: str
    period: dict[str, str]  # {"start": "2024-01-01", "end": "2024-01-31"}
    price_data: list[dict[str, Any]]  # OHLCV records
    technical_indicators: dict[str, list[TechnicalIndicatorData]]
    metadata: dict[str, Any]

    @validator('price_data')
    def validate_price_data(cls, v):
        # Ensure OHLCV fields are present and valid
        required_fields = {'date', 'open', 'high', 'low', 'close', 'volume'}
        for record in v:
            missing = required_fields - record.keys()
            if missing:
                raise ValueError(f"Missing required OHLCV fields: {sorted(missing)}")
        return v
```

## Implementation Tasks

### Phase 1: Refactor YFinanceClient

1. **YFinanceClient Refactoring**
   - **Refactor existing** `tradingagents/clients/yfinance_client.py`
   - Remove BaseClient inheritance
   - Update all method signatures to accept `date` objects instead of strings
   - Keep all existing functionality intact
   - Example changes:

   ```python
   # Current (wrong)
   def get_historical_data(self, symbol: str, start_date: str, end_date: str) -> dict[str, Any]:

   # Updated (correct)
   def get_historical_data(self, symbol: str, start_date: date, end_date: date) -> dict[str, Any]:
   ```

2. **Comprehensive Testing**
   - Update `tradingagents/clients/test_yfinance_client.py`
   - Test with date objects
   - Use pytest-vcr for HTTP interaction recording
   - Test error handling and edge cases

### Phase 2: Update MarketDataRepository

3. **Repository Interface Enhancement**
   - Update existing `tradingagents/repositories/market_data_repository.py`
   - Add missing service interface methods: `has_data_for_period()`, `get_data()`, `store_data()`, `clear_data()`
   - Maintain existing CSV/pandas functionality while adding service compatibility
   - Support gap detection and partial data scenarios

### Phase 3: Update MarketDataService

4. **Client Integration Fix**
   - Replace `BaseClient` dependency with `YFinanceClient`
   - File: `tradingagents/services/market_data_service.py:8, 26`
   - Update constructor to accept `yfinance_client: YFinanceClient`

5. **Date Conversion and Validation**
   - Add `date.fromisoformat()` conversion in service methods
   - Add date validation (format, order)
   - Update client calls to use date objects instead of strings
   - File: `tradingagents/services/market_data_service.py:151, 227`

6. **Technical Indicator Integration with Stockstats**
   - Implement `_calculate_technical_indicators()` method using the `stockstats` library
   - Add `_calculate_lookback_start()` for data sufficiency
   - Add `_validate_data_sufficiency()` to check if there is enough data
   - Replace legacy `StockstatsUtils` integration with direct stockstats usage
   - File: `tradingagents/services/market_data_service.py:9, 43, 280-346`

### Phase 4: Type Safety & Validation

7. **Comprehensive Type Checking**
   - Run `mise run typecheck` - must pass with 0 errors
   - Validate all date object conversions
   - Ensure MarketDataContext compliance

8. **Enhanced Testing**
   - Update existing service tests for the new YFinanceClient interface
   - Add gap detection test scenarios
   - Test technical indicator data sufficiency
   - Test partial data handling

## Testing Scenarios

### Integration Tests

1. **Gap Detection**
   - Test with empty cache (should fetch all)
   - Test with partial cache (should fetch only missing periods)
   - Test weekend/holiday handling

2. **Technical Indicator Sufficiency**
   - Test SMA_200 with only 100 days of data (should skip indicator)
   - Test RSI_14 with exactly 28 days (should calculate)
   - Test mixed indicators with varying data requirements

3. **Partial Data Recovery**
   - Test when the API returns less data than requested
   - Test when some dates are missing (holidays)
   - Test metadata accuracy for partial data

4. **Date Handling**
   - Test invalid date formats
   - Test end_date < start_date
   - Test future dates
   - Test weekend date handling

5. **Cache Staleness**
   - Test historical data (should never refresh)
   - Test today's data during market hours (should refresh if > 15 min)
   - Test today's data after market close (should not refresh)

## Success Criteria

### Functional Requirements

- ✅ Service successfully calls refactored `YFinanceClient` with `date` objects
- ✅ Gap detection correctly identifies missing trading days
- ✅ Technical indicators validate data sufficiency before calculation
- ✅ Partial data scenarios handled gracefully
- ✅ Local-first strategy works: checks cache → identifies gaps → fetches missing → stores updates
- ✅ Returns properly validated `MarketDataContext` to agents
- ✅ Technical indicators calculated from OHLCV data using the stockstats library
- ✅ Force refresh bypasses cache and refreshes data

### Technical Requirements

- ✅ Zero type checking errors: `mise run typecheck`
- ✅ Zero linting errors: `mise run lint`
- ✅ All existing tests pass with updated architecture
- ✅ No runtime errors with date conversions
- ✅ Proper error messages for validation failures

### Quality Requirements

- ✅ Strongly-typed interfaces between all components
- ✅ Official yfinance SDK and stockstats library usage
- ✅ Comprehensive error handling and logging
- ✅ Efficient caching with minimal API calls
- ✅ Clear separation of concerns between service, client, and repository

## Data Architecture

### YFinanceClient Response Format

```python
{
    "symbol": "AAPL",
    "period": {"start": "2024-01-01", "end": "2024-01-31"},
    "data": [
        {
            "date": "2024-01-02",  # Note: Jan 1 was a holiday
            "open": 150.0,
            "high": 155.0,
            "low": 149.0,
            "close": 154.0,
            "volume": 1000000,
            "adj_close": 154.0
        },
        ...
    ],
    "metadata": {
        "source": "yfinance",
        "retrieved_at": "2024-01-31T10:00:00Z",
        "data_quality": "HIGH",
        "missing_dates": ["2024-01-01", "2024-01-15"]  # Holidays
    }
}
```

### Technical Indicator Data Format

```python
# MarketDataContext.technical_indicators structure
{
    "rsi_14": [
        {"date": "2024-01-29", "value": 65.5},  # First valid value after 28 days
        {"date": "2024-01-30", "value": 67.2},
        ...
    ],
    "sma_200": [],  # Empty if insufficient data
    "macd": [
        {"date": "2024-01-31", "value": {"macd": 2.1, "signal": 1.8, "histogram": 0.3}}
    ],
    "_metadata": {
        "indicators_calculated": ["rsi_14", "macd"],
        "indicators_skipped": {
            "sma_200": "Insufficient data: need 200 points, have 31"
        }
    }
}
```

## Dependencies
|
||||
|
||||
### Existing Components (Need Updates)
|
||||
- ✅ `YFinanceClient` exists but needs refactoring (remove BaseClient, use date objects)
|
||||
- ✅ `MarketDataRepository` exists with CSV storage but needs service interface methods
|
||||
- ✅ Tests exist but need updates for new interfaces
|
||||
|
||||
### Required
|
||||
- Official `yfinance` library for market data fetching
|
||||
- `stockstats` library for technical analysis calculations
|
||||
- `pandas` for date/time handling and business day calculations
|
||||
- Working internet connection for live data fetching
|
||||
- Writable data directory for repository storage
|
||||
|
||||
## Timeline

### Immediate (Phase 1)
- Refactor existing YFinanceClient to use date objects
- Remove BaseClient inheritance
- Update tests for new interface

### Phase 2-3
- Add service interface methods to MarketDataRepository
- Update MarketDataService to use refactored YFinanceClient
- Implement data sufficiency validation
- Integrate stockstats library for technical indicators

### Phase 4
- Comprehensive type checking and validation
- Integration testing with gap detection
- Performance optimization and caching efficiency
## Acceptance Criteria

### Must Have
1. **Type Safety**: Service passes `mise run typecheck` with zero errors
2. **Client Refactoring**: YFinanceClient uses date objects, no BaseClient
3. **Gap Detection**: Correctly identifies missing trading days
4. **Data Sufficiency**: Validates enough data for technical indicators
5. **Partial Data**: Service handles incomplete data gracefully
6. **Local-First**: Service checks repository before API calls
7. **Context Validation**: Returns valid `MarketDataContext` with Pydantic validation
8. **Technical Indicators**: Calculated using stockstats with proper validation

### Should Have
1. **Cache Efficiency**: Minimal redundant API calls to Yahoo Finance
2. **Force Refresh**: Complete cache bypass when requested
3. **Stale Data Handling**: Refresh today's data during market hours
4. **Clear Error Messages**: Informative errors for validation failures

### Nice to Have
1. **Performance Metrics**: Timing and cache hit rate logging
2. **Extended Indicators**: Support for 50+ technical indicators
3. **Real-time Data**: WebSocket integration for live prices
4. **Bulk Symbol Support**: Fetch multiple symbols efficiently

---

This PRD focuses on completing the `MarketDataService` as a strongly-typed, local-first data service that integrates OHLCV price data from a refactored `YFinanceClient` and calculates comprehensive technical indicators using the `stockstats` library, with robust gap detection and data sufficiency validation.
@ -1,779 +0,0 @@
# Product Requirements Document: NewsService Completion

## Overview

Complete the `NewsService` to provide strongly-typed news data and sentiment analysis to trading agents using a local-first data strategy with RSS feed integration, article content extraction, and LLM-powered sentiment analysis.

## Current State Analysis

### Issues to Fix
- **CRITICAL**: Service is currently an empty placeholder with only method stubs
- **CRITICAL**: Need to implement GoogleNewsClient to read RSS feeds
- **CRITICAL**: Need RSS article fetching with fallback to the Internet Archive
- **CRITICAL**: Need LLM-powered sentiment analysis integration
- **CRITICAL**: Service uses `BaseClient` inheritance instead of typed clients
- **CRITICAL**: `NewsRepository` has a different interface than the service expects
- Missing strongly-typed interfaces between components
- No concrete approach for article content extraction

### What Works
- ✅ `NewsContext` and `ArticleData` Pydantic models for agent consumption
- ✅ `SentimentScore` model for structured sentiment data
- ✅ `FinnhubClient` with `get_company_news()` method using date objects
- ✅ `NewsRepository` with dataclass-based storage and deduplication
- ✅ Service structure placeholder ready for implementation
## Technical Requirements

### 1. Strongly-Typed Interfaces

#### Client → Service Interface
```python
# FinnhubClient methods (already implemented)
def get_company_news(symbol: str, start_date: date, end_date: date) -> dict[str, Any]

# GoogleNewsClient methods (to be implemented)
def fetch_rss_feed(query: str, start_date: date, end_date: date) -> dict[str, Any]
def fetch_article_content(url: str, use_archive_fallback: bool = True) -> dict[str, Any]
def get_company_news(symbol: str, start_date: date, end_date: date) -> dict[str, Any]
def get_global_news(start_date: date, end_date: date, categories: list[str]) -> dict[str, Any]
```

#### Service → Repository Interface
```python
# NewsRepository methods (to be implemented/bridged)
def has_data_for_period(query: str, start_date: str, end_date: str, symbol: str | None) -> bool
def get_data(query: str, start_date: str, end_date: str, symbol: str | None) -> dict[str, Any]
def store_data(query: str, cache_data: dict, symbol: str | None, overwrite: bool) -> bool
def clear_data(query: str, start_date: str, end_date: str, symbol: str | None) -> bool
```

#### Service → Agent Interface
```python
# Service output (already defined)
def get_context(query: str, start_date: str, end_date: str, symbol: str | None, sources: list[str], force_refresh: bool) -> NewsContext
```
### 2. Local-First Data Strategy

#### Flow
1. **Repository Lookup**: Check `NewsRepository.has_data_for_period()`
2. **Freshness Check**: Determine if the cache needs updating (news is append-only)
3. **RSS Feed Fetching**: Fetch RSS feeds from Google News
4. **Content Extraction**: Extract full article content with Internet Archive fallback
5. **LLM Analysis**: Perform sentiment analysis using an LLM
6. **Cache Updates**: Store enriched articles via `repository.store_data()`
7. **Context Assembly**: Return a validated `NewsContext`

#### News-Specific Gap Detection
```python
def should_fetch_new_articles(self, last_fetch_time: datetime | None, current_time: datetime) -> bool:
    """
    News doesn't have "gaps" - it's append-only. Check whether enough time has passed for new articles.

    Returns True if:
    - Last fetch was more than 6 hours ago
    - User requested force_refresh
    - No data exists for the query/period
    """
    if not last_fetch_time:
        return True

    hours_since_fetch = (current_time - last_fetch_time).total_seconds() / 3600
    return hours_since_fetch >= 6  # Fetch new articles every 6 hours
```
#### Force Refresh Support
- `force_refresh=True` fetches all articles fresh from sources
- Does NOT clear existing cache (news is immutable)
- Deduplicates against existing articles before storing

#### Cache Invalidation Strategy
- **Articles are immutable**: Once published, articles don't change
- **Cache grows append-only**: New articles are added, old ones retained
- **Freshness check**: Re-fetch every 6 hours for new articles
- **No deletion**: Articles are never removed from cache
### 3. RSS Feed Processing & Article Fetching

#### GoogleNewsClient RSS Implementation
```python
import feedparser
import requests
from newspaper import Article
from datetime import date, datetime
from typing import Any, Optional
from urllib.parse import quote


class GoogleNewsClient:
    """Google News RSS client following the FinnhubClient standard."""

    def __init__(self):
        self.base_rss_url = "https://news.google.com/rss"
        self.archive_base_url = "https://archive.org/wayback/available"

    def fetch_rss_feed(self, query: str, start_date: date, end_date: date) -> dict[str, Any]:
        """
        Fetch RSS feed data for news articles.

        Args:
            query: Search query or company symbol
            start_date: Start date for filtering articles
            end_date: End date for filtering articles

        Returns:
            Dict containing RSS feed articles with metadata
        """
        # Construct RSS feed URL (URL-encode the query so spaces are safe)
        rss_url = f"{self.base_rss_url}/search?q={quote(query)}&hl=en-US&gl=US&ceid=US:en"

        # Parse RSS feed
        feed = feedparser.parse(rss_url)

        # Filter and structure articles
        articles = []
        for entry in feed.entries:
            # Parse publication date
            pub_date = datetime(*entry.published_parsed[:6]).date()

            # Filter by date range
            if start_date <= pub_date <= end_date:
                articles.append({
                    "headline": entry.title,
                    "url": entry.link,
                    "source": entry.source.get("title", "Google News"),
                    "date": pub_date.isoformat(),
                    "summary": entry.get("summary", ""),
                })

        return {
            "query": query,
            "period": {"start": start_date.isoformat(), "end": end_date.isoformat()},
            "articles": articles,
            "metadata": {
                "source": "google_news_rss",
                "rss_feed_url": rss_url,
                "article_count": len(articles),
            },
        }

    def fetch_article_content(self, url: str, use_archive_fallback: bool = True) -> dict[str, Any]:
        """
        Fetch full article content from a URL with Internet Archive fallback.

        Args:
            url: Article URL to fetch
            use_archive_fallback: Whether to try the Internet Archive if the direct fetch fails

        Returns:
            Dict containing article content, title, publication date
        """
        try:
            # Try direct fetch
            article = Article(url)
            article.download()
            article.parse()

            return {
                "content": article.text,
                "title": article.title,
                "authors": article.authors,
                "publish_date": article.publish_date.isoformat() if article.publish_date else None,
                "extracted_via": "direct_fetch",
                "extraction_success": True,
            }

        except Exception as e:
            if use_archive_fallback:
                # Try Internet Archive
                archive_url = self._get_archive_url(url)
                if archive_url:
                    try:
                        article = Article(archive_url)
                        article.download()
                        article.parse()

                        return {
                            "content": article.text,
                            "title": article.title,
                            "authors": article.authors,
                            "publish_date": article.publish_date.isoformat() if article.publish_date else None,
                            "extracted_via": "internet_archive",
                            "extraction_success": True,
                        }
                    except Exception:
                        pass

            # Return failure
            return {
                "content": "",
                "title": "",
                "extracted_via": "failed",
                "extraction_success": False,
                "error": str(e),
            }

    def _get_archive_url(self, url: str) -> Optional[str]:
        """Get the Internet Archive URL for a given URL."""
        try:
            response = requests.get(f"{self.archive_base_url}?url={url}", timeout=10)
            data = response.json()
            if data.get("archived_snapshots", {}).get("closest", {}).get("available"):
                return data["archived_snapshots"]["closest"]["url"]
        except Exception:
            pass
        return None
```
### 4. LLM-Powered Sentiment Analysis

#### Sentiment Analysis Integration
```python
import json
import time


class LLMSentimentAnalyzer:
    """LLM-based sentiment analyzer for financial news."""

    def __init__(self, llm_client):
        self.llm_client = llm_client
        self.sentiment_prompt = """
        Analyze the sentiment of this financial news article for trading purposes.

        Article:
        Title: {headline}
        Content: {content}

        Provide your analysis in the following JSON format:
        {{
            "score": <float between -1.0 (very negative) and 1.0 (very positive)>,
            "confidence": <float between 0.0 and 1.0>,
            "label": <"positive", "negative", or "neutral">,
            "reasoning": <brief explanation>,
            "key_themes": <list of key financial themes>,
            "financial_entities": <list of mentioned companies/tickers>
        }}

        Focus on the financial and market implications of the news.
        """

    def analyze_sentiment(self, article: ArticleData) -> SentimentScore:
        """
        Analyze article sentiment using an LLM.

        Args:
            article: Article data with headline and content

        Returns:
            SentimentScore with score, confidence, and label
        """
        # Prepare prompt
        prompt = self.sentiment_prompt.format(
            headline=article.headline,
            content=article.content[:2000]  # Limit content length
        )

        # Get LLM response
        response = self.llm_client.complete(prompt)

        # Parse response
        try:
            result = json.loads(response)

            # Convert the single [-1, 1] score into a SentimentScore distribution
            score = result.get("score", 0.0)
            return SentimentScore(
                positive=max(0, score),
                negative=abs(min(0, score)),
                neutral=1.0 - abs(score),
                metadata={
                    "confidence": result.get("confidence", 0.5),
                    "label": result.get("label", "neutral"),
                    "reasoning": result.get("reasoning", ""),
                    "key_themes": result.get("key_themes", []),
                    "financial_entities": result.get("financial_entities", []),
                },
            )
        except Exception as e:
            # Return neutral sentiment on error
            return SentimentScore(
                positive=0.0,
                negative=0.0,
                neutral=1.0,
                metadata={"error": str(e)},
            )

    def batch_analyze(self, articles: list[ArticleData], batch_size: int = 5) -> list[SentimentScore]:
        """
        Batch process sentiment analysis for multiple articles.

        Args:
            articles: List of articles to analyze
            batch_size: Number of articles to process per batch

        Returns:
            List of sentiment scores corresponding to the input articles
        """
        results = []

        for i in range(0, len(articles), batch_size):
            batch = articles[i:i + batch_size]

            # Process batch (could be parallelized)
            for article in batch:
                sentiment = self.analyze_sentiment(article)
                results.append(sentiment)

            # Add a small delay to respect rate limits
            time.sleep(0.1)

        return results
```
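The score-to-distribution conversion inside `analyze_sentiment` can be isolated as a tiny pure function. By construction the three components always sum to 1, which is the property the aggregate summary later relies on:

```python
def score_to_distribution(score: float) -> dict[str, float]:
    """Map a single [-1, 1] sentiment score to (positive, negative, neutral)
    components that sum to 1. Sketch of the mapping used above."""
    return {
        "positive": max(0.0, score),
        "negative": abs(min(0.0, score)),
        "neutral": 1.0 - abs(score),
    }
```

A positive score contributes only to `positive`, a negative score only to `negative`, and the remaining mass goes to `neutral`.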
### 5. Date Object Conversion

#### Service Boundary Conversion
```python
# Service receives string dates from agents
def get_context(self, query: str, start_date: str, end_date: str, ...) -> NewsContext:
    # Validate date strings
    try:
        start_dt = date.fromisoformat(start_date)
        end_dt = date.fromisoformat(end_date)
    except ValueError as e:
        raise ValueError(f"Invalid date format: {e}")

    # Check date order
    if end_dt < start_dt:
        raise ValueError(f"End date {end_date} is before start date {start_date}")

    # Fetch from multiple sources
    finnhub_data = self.finnhub_client.get_company_news(symbol, start_dt, end_dt) if symbol else None
    google_rss = self.google_client.fetch_rss_feed(query, start_dt, end_dt)

    # Fetch full article content for RSS articles
    for article in google_rss.get('articles', []):
        content_data = self.google_client.fetch_article_content(article['url'])
        article.update(content_data)

    # Combine all articles
    all_articles = self._combine_and_deduplicate(finnhub_data, google_rss)

    # Perform LLM sentiment analysis
    enriched_articles = []
    for article in all_articles:
        article_data = ArticleData(**article)
        article_data.sentiment = self.sentiment_analyzer.analyze_sentiment(article_data)
        enriched_articles.append(article_data)

    # Create and return context
    return self._create_news_context(enriched_articles, start_date, end_date)
```
### 6. Error Recovery and Partial Data

```python
def handle_source_failure(
    self,
    finnhub_data: dict | None,
    google_data: dict | None,
    errors: dict[str, Exception]
) -> NewsContext:
    """
    Handle cases where one or more news sources fail.

    - If all sources fail: Raise an exception
    - If some sources succeed: Return partial data with metadata
    - Track content extraction failures separately
    """
    if not finnhub_data and not google_data:
        raise ValueError("All news sources failed to return data")

    # Track extraction statistics
    extraction_stats = {
        "total_articles": 0,
        "successful_extractions": 0,
        "archive_fallbacks": 0,
        "failed_extractions": 0,
    }

    # Process available articles
    all_articles = []
    successful_sources = []

    if finnhub_data:
        all_articles.extend(finnhub_data.get('articles', []))
        successful_sources.append('finnhub')

    if google_data:
        articles = google_data.get('articles', [])
        for article in articles:
            extraction_stats["total_articles"] += 1
            if article.get("extraction_success"):
                extraction_stats["successful_extractions"] += 1
                if article.get("extracted_via") == "internet_archive":
                    extraction_stats["archive_fallbacks"] += 1
            else:
                extraction_stats["failed_extractions"] += 1

        all_articles.extend(articles)
        successful_sources.append('google_news')

    metadata = {
        "sources_requested": ["finnhub", "google_news"],
        "sources_successful": successful_sources,
        "sources_failed": {source: str(error) for source, error in errors.items()},
        "extraction_stats": extraction_stats,
        "partial_data": len(successful_sources) < 2,
    }

    # Deduplicate and return context
    return self._create_context(all_articles, metadata)
```
### 7. Repository Method Bridging

```python
# Add these bridge methods to NewsRepository
def has_data_for_period(self, query: str, start_date: str, end_date: str, symbol: str | None = None) -> bool:
    """Bridge to the existing get_news_data method."""
    existing_data = self.get_news_data(
        symbol=symbol or query,
        start_date=start_date,
        end_date=end_date
    )
    return len(existing_data.get('articles', [])) > 0


def get_data(self, query: str, start_date: str, end_date: str, symbol: str | None = None) -> dict[str, Any]:
    """Bridge to the existing get_news_data method."""
    return self.get_news_data(
        symbol=symbol or query,
        start_date=start_date,
        end_date=end_date
    )


def store_data(self, query: str, cache_data: dict, symbol: str | None = None, overwrite: bool = False) -> bool:
    """Bridge to the existing store_news_articles method."""
    articles = cache_data.get('articles', [])
    if not articles:
        return False

    # Convert to the expected format
    news_articles = [
        NewsArticle(
            symbol=symbol or query,
            headline=a['headline'],
            summary=a.get('summary', ''),
            content=a.get('content', ''),
            url=a['url'],
            source=a['source'],
            date=a['date'],
            entities=a.get('entities', []),
            sentiment_score=a.get('sentiment', {}).get('score', 0.0),
            sentiment_metadata=a.get('sentiment', {})
        )
        for a in articles
    ]

    return self.store_news_articles(news_articles)


def clear_data(self, query: str, start_date: str, end_date: str, symbol: str | None = None) -> bool:
    """News is append-only, so this just marks data as stale for re-fetch."""
    # Implementation depends on repository design
    # Could update metadata to trigger a re-fetch
    return True
```
### 8. Pydantic Validation

#### Context Structure
```python
class NewsContext(BaseModel):
    symbol: str | None
    period: dict[str, str]  # {"start": "2024-01-01", "end": "2024-01-31"}
    articles: list[ArticleData]
    sentiment_summary: SentimentScore
    article_count: int
    sources: list[str]
    metadata: dict[str, Any]

    @validator('period')
    def validate_period(cls, v):
        # Ensure start and end dates are present and valid
        if 'start' not in v or 'end' not in v:
            raise ValueError("Period must have 'start' and 'end' dates")
        return v

    @validator('articles')
    def validate_articles(cls, v):
        # Ensure no duplicate URLs
        urls = [a.url for a in v]
        if len(urls) != len(set(urls)):
            raise ValueError("Duplicate articles detected")
        return v
```
## Implementation Tasks

### Phase 1: Create GoogleNewsClient

1. **GoogleNewsClient Implementation**
   - Create `tradingagents/clients/google_news_client.py` following the FinnhubClient standard
   - Implement RSS feed parsing using the `feedparser` library
   - Add `fetch_rss_feed()` method with Google News RSS integration
   - Add `fetch_article_content()` method with `newspaper3k` and Internet Archive fallback
   - Use `date` objects for all date parameters
   - No BaseClient inheritance

2. **Article Content Extraction**
   - Implement robust article content extraction using `newspaper3k`
   - Add fallback to the Internet Archive Wayback Machine for failed fetches
   - Handle paywall detection and alternative content sources
   - Extract clean text, title, publication date, and metadata

3. **Comprehensive Testing**
   - Create test suite for GoogleNewsClient
   - Test RSS parsing with various queries
   - Test content extraction with real and archived URLs
   - Use pytest-vcr for HTTP interaction recording

### Phase 2: Bridge NewsRepository Interface

4. **Repository Interface Standardization**
   - Add standard service interface methods to `NewsRepository`
   - Bridge existing methods without changing underlying storage
   - File: `tradingagents/repositories/news_repository.py`
   - Maintain backward compatibility

### Phase 3: Implement NewsService

5. **Service Core Implementation**
   - Replace method stubs with full implementation
   - Implement `get_context()`, `get_company_news_context()`, `get_global_news_context()`
   - Add local-first data strategy with freshness checking
   - Replace `BaseClient` dependencies with typed clients
   - File: `tradingagents/services/news_service.py`

6. **LLM Sentiment Analysis Integration**
   - Implement `LLMSentimentAnalyzer` class
   - Create financial news sentiment prompts
   - Add batch processing for efficiency
   - Handle LLM rate limiting and errors

7. **Date Conversion and Article Processing**
   - Add date validation and conversion
   - Implement RSS article fetching pipeline
   - Add content extraction with fallback
   - Combine articles from multiple sources
   - Implement deduplication by URL

### Phase 4: Type Safety & Validation

8. **Comprehensive Type Checking**
   - Run `mise run typecheck` - must pass with 0 errors
   - Validate all date object conversions
   - Ensure NewsContext compliance

9. **Enhanced Testing**
   - Test RSS feed parsing edge cases
   - Test content extraction failures and fallbacks
   - Test LLM sentiment analysis with various article types
   - Test multi-source aggregation and deduplication
## Testing Scenarios

### Integration Tests

1. **RSS Feed Processing**
   - Test with various search queries
   - Test date filtering in RSS results
   - Test handling of malformed RSS feeds

2. **Content Extraction**
   - Test direct fetch success
   - Test Internet Archive fallback
   - Test paywall detection
   - Test extraction failure handling

3. **LLM Sentiment Analysis**
   - Test positive news sentiment
   - Test negative earnings reports
   - Test neutral market updates
   - Test batch processing
   - Test LLM error handling

4. **Multi-Source Aggregation**
   - Test both sources succeed
   - Test Finnhub fails, Google succeeds
   - Test Google fails, Finnhub succeeds
   - Test both sources fail

5. **Date Handling**
   - Test invalid date formats
   - Test end_date < start_date
   - Test date filtering in RSS feeds
## Success Criteria

### Functional Requirements
- ✅ Service successfully implements all placeholder methods
- ✅ GoogleNewsClient reads and parses RSS feeds correctly
- ✅ Article content extraction works with Internet Archive fallback
- ✅ LLM sentiment analysis provides structured financial sentiment
- ✅ Local-first strategy with proper freshness checking
- ✅ Multi-source aggregation with deduplication
- ✅ Returns properly validated `NewsContext` to agents
- ✅ Force refresh fetches fresh articles without clearing cache

### Technical Requirements
- ✅ Zero type checking errors: `mise run typecheck`
- ✅ Zero linting errors: `mise run lint`
- ✅ All tests pass with new implementation
- ✅ No runtime errors with date conversions
- ✅ Proper error messages for validation failures

### Quality Requirements
- ✅ Strongly-typed interfaces between all components
- ✅ RSS feed parsing with robust error handling
- ✅ Article content extraction with fallback strategy
- ✅ LLM integration with proper prompt engineering
- ✅ Efficient caching with minimal external calls
- ✅ Clear separation of concerns
## Data Architecture

### GoogleNewsClient RSS Response Format
```python
{
    "query": "Apple stock",
    "period": {"start": "2024-01-01", "end": "2024-01-31"},
    "articles": [
        {
            "headline": "Apple Stock Soars on New Product Launch",
            "summary": "Brief summary from RSS feed...",
            "content": "Full article text extracted from source...",
            "url": "https://www.cnbc.com/2024/01/20/apple-stock.html",
            "source": "CNBC",
            "date": "2024-01-20",
            "authors": ["Tech Reporter"],
            "publish_date": "2024-01-20T14:30:00Z",
            "extracted_via": "direct_fetch",  # or "internet_archive"
            "extraction_success": True
        }
    ],
    "metadata": {
        "source": "google_news_rss",
        "article_count": 25,
        "rss_feed_url": "https://news.google.com/rss/search?q=Apple+stock",
        "extraction_stats": {
            "successful": 22,
            "archive_fallback": 2,
            "failed": 3
        }
    }
}
```

### LLM Sentiment Analysis Response Format
```python
{
    "article_url": "https://www.cnbc.com/2024/01/20/apple-stock.html",
    "sentiment": {
        "positive": 0.7,
        "negative": 0.1,
        "neutral": 0.2,
        "metadata": {
            "score": 0.7,
            "confidence": 0.85,
            "label": "positive",
            "reasoning": "Article discusses positive earnings and growth outlook",
            "key_themes": ["earnings_beat", "product_launch", "revenue_growth"],
            "financial_entities": ["AAPL", "Apple Inc.", "iPhone 15"]
        }
    }
}
```

### Aggregate Sentiment Summary
```python
{
    "sentiment_summary": {
        "positive": 0.65,  # Average across all articles
        "negative": 0.20,
        "neutral": 0.15,
        "metadata": {
            "dominant_sentiment": "positive",
            "confidence": 0.82,
            "article_count": 25,
            "themes": {
                "earnings": 8,
                "product_launch": 5,
                "market_analysis": 12
            }
        }
    }
}
```
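The aggregate summary is a per-component average over article-level sentiment. A minimal sketch, with field names assumed from the formats in this PRD:

```python
def summarize_sentiment(articles: list[dict]) -> dict:
    """Average per-article sentiment components into an aggregate summary.
    Assumes each article dict carries a 'sentiment' mapping with
    'positive'/'negative'/'neutral' floats."""
    if not articles:
        return {
            "positive": 0.0, "negative": 0.0, "neutral": 1.0,
            "metadata": {"dominant_sentiment": "neutral", "article_count": 0},
        }
    n = len(articles)
    totals = {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    for a in articles:
        for key in totals:
            totals[key] += a["sentiment"].get(key, 0.0)
    averages = {key: value / n for key, value in totals.items()}
    dominant = max(averages, key=averages.get)
    return {**averages, "metadata": {"dominant_sentiment": dominant, "article_count": n}}
```

Because each per-article distribution sums to 1, the averaged components do too, so the aggregate remains a valid distribution.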
## Dependencies

### Components to Create
- ⏳ `GoogleNewsClient` - Full implementation with RSS and content extraction
- ⏳ `LLMSentimentAnalyzer` - LLM integration for sentiment analysis
- ⏳ `NewsService` - Replace stubs with full implementation

### Existing Components
- ✅ `FinnhubClient` with company news using date objects
- ✅ `NewsRepository` with dataclass storage
- ✅ `NewsContext` and related Pydantic models

### Required Libraries
- `feedparser` - RSS feed parsing
- `newspaper3k` - Article content extraction
- `requests` - HTTP requests and Internet Archive API
- `beautifulsoup4` - HTML parsing fallback
- LLM client library (OpenAI, Anthropic, etc.)
## Timeline

### Immediate (Phase 1)
- Create GoogleNewsClient with RSS and content extraction
- Implement feedparser integration
- Add Internet Archive fallback
- Create comprehensive test suite

### Phase 2-3
- Add repository bridge methods
- Implement full NewsService
- Integrate LLM sentiment analysis
- Handle multi-source aggregation

### Phase 4
- Type checking and validation
- Integration testing
- Performance optimization
- Documentation
## Acceptance Criteria

### Must Have
1. **Type Safety**: Service passes `mise run typecheck` with zero errors
2. **RSS Integration**: Successfully parse Google News RSS feeds
3. **Content Extraction**: Extract full articles with fallback
4. **LLM Sentiment**: Financial sentiment analysis for all articles
5. **Service Implementation**: All stubs replaced with working code
6. **Local-First**: Check cache before fetching new data
7. **Multi-Source**: Aggregate Finnhub and Google News

### Should Have
1. **Extraction Stats**: Track success/failure rates
2. **Batch Processing**: Efficient LLM sentiment analysis
3. **Force Refresh**: Fetch new articles on demand
4. **Error Recovery**: Handle partial failures gracefully

### Nice to Have
1. **Additional Sources**: Support more news providers
2. **Real-time Monitoring**: WebSocket for breaking news
3. **Advanced Extraction**: Handle PDFs, videos
4. **Sentiment Trends**: Track sentiment over time

---

This PRD focuses on completing the currently empty `NewsService` with a full implementation including RSS feed integration, article content extraction with Internet Archive fallback, and LLM-powered sentiment analysis for financial news.
30
README.md
@ -293,6 +293,33 @@ This project uses [mise](https://mise.jdx.dev/) for tool and task management. Al
- **Install tools**: `mise install` - Install Python, uv, ruff, pyright
|
||||
- **Install dependencies**: `mise run install` - Install project dependencies with uv
|
||||
|
||||
### Testing Principles
|
||||
|
||||
**Pragmatic outside-in TDD** - Mock I/O boundaries, test real logic, fast feedback.
|
||||
|
||||
#### Test Structure (Mirror Source)
|
||||
```
|
||||
tests/
|
||||
├── conftest.py # Shared fixtures
|
||||
├── domains/
|
||||
│ ├── __init__.py
|
||||
│ └── news/
|
||||
│ ├── __init__.py
|
||||
│ ├── test_news_service.py # Mock repo + clients
|
||||
│ ├── test_news_repository.py # Docker test DB
|
||||
│ └── test_google_news_client.py # pytest-vcr
|
||||
```
|
||||
|
||||
#### Mocking Strategy by Layer
|
||||
- **Services**: Mock Repository + Clients, test real transformations
|
||||
- **Repositories**: Real persistence (temp files/Docker), no mocks
|
||||
- **Clients**: Real HTTP with pytest-vcr cassettes
|
||||
#### Quality Standards

- **85% coverage** minimum
- **< 100ms** per unit test
- **Mock boundaries, test behavior**

### Configuration

The TradingAgents framework uses a centralized `TradingAgentsConfig` class for all configuration management.
@@ -428,4 +455,5 @@ ALWAYS prefer editing an existing file to creating a new one.

NEVER proactively create documentation files (*.md) or README files. Only create documentation files if explicitly requested by the User.
IMPORTANT: this context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant to your task.

- remember what we learnt about testing?

@@ -1,424 +0,0 @@
# Product Requirements Document: SocialMediaService Completion

## Overview

Complete the `SocialMediaService` to provide strongly-typed social media data and sentiment analysis to trading agents using a local-first data strategy with gap detection and intelligent caching.

## Current State Analysis

### Issues to Fix

- **CRITICAL**: Missing `RedditClient` implementation - service calls non-existent client methods
- **CRITICAL**: Service uses `BaseClient` inheritance but needs a typed `RedditClient`
- **CRITICAL**: `SocialRepository` has a different interface than the standard service pattern
- **CRITICAL**: Repository uses `date` objects internally but the service expects a string date interface
- Missing strongly-typed interfaces between components
- Service calls `reddit_client.search_posts()`, `get_top_posts()`, `filter_posts_by_date()` methods that don't exist

### What Works

- ✅ Local-first data strategy implementation (`_get_social_data_local_first`)
- ✅ Force refresh logic (`_fetch_and_cache_fresh_social_data`)
- ✅ `SocialContext` Pydantic model for agent consumption
- ✅ Comprehensive sentiment analysis with keyword-based scoring
- ✅ Engagement metrics calculation and post ranking
- ✅ Error handling and metadata creation patterns
- ✅ `SocialRepository` with JSON storage and post deduplication
- ✅ `PostData` and `SentimentScore` models for structured data
- ✅ Real-time sentiment analysis with weighted scoring

## Technical Requirements
### 1. Strongly-Typed Interfaces

#### Client → Service Interface

```python
# RedditClient methods (to be implemented)
def search_posts(query: str, subreddit_names: list[str], start_date: date, end_date: date, limit: int, time_filter: str) -> dict[str, Any]
def get_top_posts(subreddit_names: list[str], start_date: date, end_date: date, limit: int, time_filter: str) -> dict[str, Any]
def get_company_posts(symbol: str, subreddit_names: list[str], start_date: date, end_date: date, limit: int) -> dict[str, Any]
```

#### Service → Repository Interface

```python
# SocialRepository methods (to be implemented/bridged)
def has_data_for_period(query: str, start_date: str, end_date: str, symbol: str | None) -> bool
def get_data(query: str, start_date: str, end_date: str, symbol: str | None) -> dict[str, Any]
def store_data(query: str, cache_data: dict, symbol: str | None, overwrite: bool) -> bool
def clear_data(query: str, start_date: str, end_date: str, symbol: str | None) -> bool
```

#### Service → Agent Interface

```python
# Service output (already defined)
def get_context(query: str, start_date: str, end_date: str, symbol: str | None, subreddits: list[str], force_refresh: bool) -> SocialContext
def get_company_social_context(symbol: str, start_date: str, end_date: str, subreddits: list[str]) -> SocialContext
def get_global_trends(start_date: str, end_date: str, subreddits: list[str]) -> SocialContext
```
### 2. Local-First Data Strategy

#### Flow

1. **Repository Lookup**: Check `SocialRepository.has_data_for_period()`
2. **Gap Detection**: Identify missing social media data periods
3. **Selective Fetching**: Fetch only missing data from `RedditClient`
4. **Cache Updates**: Store new data via `repository.store_data()`
5. **Context Assembly**: Return validated `SocialContext`

#### Force Refresh Support

- `force_refresh=True` bypasses local data completely
- Clears existing cache before fetching fresh data
- Stores refreshed data with metadata indicating the refresh
### 3. Date Object Conversion

#### Service Boundary Conversion

```python
# Service receives string dates from agents
def get_context(self, query: str, start_date: str, end_date: str, ...) -> SocialContext:
    # Convert to date objects for client calls
    start_dt = date.fromisoformat(start_date)
    end_dt = date.fromisoformat(end_date)

    # Use date objects when calling RedditClient
    posts_data = self.reddit_client.search_posts(query, subreddits, start_dt, end_dt, limit, time_filter)

    # Repository bridge handles string to date conversion internally
    cached_data = self.repository.get_data(query, start_date, end_date, symbol)
```
### 4. Reddit API Integration

#### RedditClient Implementation Strategy

```python
# RedditClient following the FinnhubClient standard
class RedditClient:
    """Client for Reddit API access with PRAW library integration."""

    def __init__(self, client_id: str, client_secret: str, user_agent: str):
        """Initialize Reddit client with PRAW."""
        import praw
        self.reddit = praw.Reddit(
            client_id=client_id,
            client_secret=client_secret,
            user_agent=user_agent,
        )

    def search_posts(self, query: str, subreddit_names: list[str],
                     start_date: date, end_date: date, limit: int = 50,
                     time_filter: str = "week") -> dict[str, Any]:
        """Search for posts across subreddits within date range."""

    def get_top_posts(self, subreddit_names: list[str],
                      start_date: date, end_date: date, limit: int = 50,
                      time_filter: str = "week") -> dict[str, Any]:
        """Get top posts from subreddits within date range."""

    def get_company_posts(self, symbol: str, subreddit_names: list[str],
                          start_date: date, end_date: date,
                          limit: int = 50) -> dict[str, Any]:
        """Get company-specific posts from subreddits."""
```
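Reddit exposes post timestamps as Unix epochs (`created_utc`), so the date-range filtering this client must implement reduces to a small conversion step. A minimal sketch — the helper signature is illustrative and operates on plain dicts rather than PRAW submission objects:

```python
from datetime import date, datetime, timezone
from typing import Any


def filter_posts_by_date(
    posts: list[dict[str, Any]], start_date: date, end_date: date
) -> list[dict[str, Any]]:
    """Keep only posts whose created_utc falls within [start_date, end_date]."""
    kept = []
    for post in posts:
        # created_utc is a Unix epoch; interpret it in UTC before comparing dates
        posted = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc).date()
        if start_date <= posted <= end_date:
            kept.append(post)
    return kept
```

Filtering client-side like this matters because Reddit's `time_filter` only supports coarse buckets (`day`, `week`, `month`, ...), not arbitrary date ranges.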
#### Reddit Response Format

```python
{
    "query": "AAPL",
    "period": {"start": "2024-01-01", "end": "2024-01-31"},
    "posts": [
        {
            "title": "Apple earnings discussion",
            "content": "What do you think about...",
            "author": "redditor123",
            "subreddit": "investing",
            "created_utc": 1704067200,
            "score": 125,
            "num_comments": 45,
            "upvote_ratio": 0.87,
            "url": "https://reddit.com/r/investing/comments/abc123",
            "id": "abc123"
        }
    ],
    "metadata": {
        "source": "reddit",
        "retrieved_at": "2024-01-31T10:00:00Z",
        "data_quality": "HIGH",
        "subreddits": ["investing", "stocks"],
        "total_posts": 25
    }
}
```
### 5. Sentiment Analysis Enhancement

#### Advanced Sentiment Features

- **Weighted Scoring**: High-engagement posts have more influence on overall sentiment
- **Keyword Analysis**: Comprehensive positive/negative keyword detection
- **Score Adjustment**: Reddit score (upvotes) influences sentiment confidence
- **Confidence Metrics**: Based on post count and engagement levels
- **Multi-level Analysis**: Individual post sentiment + overall summary sentiment

#### Sentiment Calculation Strategy

```python
def _calculate_advanced_sentiment(self, posts: list[PostData]) -> SentimentScore:
    """Enhanced sentiment analysis with multiple factors."""
    # Weight by engagement score (upvotes + comments)
    # Adjust for subreddit context (WSB vs investing)
    # Consider temporal patterns (recent posts weighted higher)
    # Apply confidence scoring based on data volume
```
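The engagement-weighted aggregation described above can be sketched concretely. This is a hedged illustration: it assumes each post carries a per-post sentiment value and an `engagement_score` (as in the `PostData` model), and the `+1` smoothing, confidence curve, and label thresholds are arbitrary example choices, not the service's actual parameters:

```python
from typing import Any


def aggregate_sentiment(posts: list[dict[str, Any]]) -> dict[str, Any]:
    """Engagement-weighted mean sentiment with a volume-based confidence score."""
    if not posts:
        return {"score": 0.0, "confidence": 0.0, "label": "neutral"}

    # High-engagement posts pull the average harder; +1 keeps zero-engagement
    # posts from vanishing entirely
    total_weight = sum(p["engagement_score"] + 1 for p in posts)
    weighted = sum(p["sentiment"] * (p["engagement_score"] + 1) for p in posts)
    score = weighted / total_weight

    # Confidence grows with data volume, capped at 1.0 (20 posts = full confidence)
    confidence = min(1.0, len(posts) / 20)

    label = "positive" if score > 0.1 else "negative" if score < -0.1 else "neutral"
    return {"score": score, "confidence": confidence, "label": label}
```

Subreddit context and temporal decay would enter as additional multiplicative factors on each post's weight.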
### 6. Pydantic Validation

#### Context Structure

```python
class SocialContext(BaseModel):
    symbol: str | None
    period: dict[str, str]  # {"start": "2024-01-01", "end": "2024-01-31"}
    posts: list[PostData]
    engagement_metrics: dict[str, float]
    sentiment_summary: SentimentScore
    post_count: int
    platforms: list[str]  # ["reddit"]
    metadata: dict[str, Any]
```

#### PostData Format

```python
class PostData(BaseModel):
    title: str
    content: str
    author: str
    source: str  # subreddit name
    date: str
    url: str
    score: int
    comments: int
    engagement_score: int
    subreddit: str | None
    sentiment: SentimentScore | None
    metadata: dict[str, Any]
```
## Implementation Tasks

### Phase 1: Create RedditClient

1. **RedditClient Implementation**
   - Create `tradingagents/clients/reddit_client.py`
   - Follow the FinnhubClient standard: no BaseClient inheritance, date objects, proper error handling
   - Use the PRAW (Python Reddit API Wrapper) library for Reddit API access
   - Methods: `search_posts()`, `get_top_posts()`, `get_company_posts()`
   - Implement date filtering for posts within specified ranges
   - Handle Reddit API rate limits and authentication

2. **Comprehensive Testing**
   - Create `tradingagents/clients/test_reddit_client.py`
   - Use pytest-vcr for Reddit API interaction recording
   - Test all client methods with multiple queries and subreddits
   - Test error handling and API rate limit scenarios
   - Mock Reddit API responses for consistent testing

### Phase 2: Bridge SocialRepository Interface

3. **Repository Interface Standardization**
   - Add standard service interface methods to `SocialRepository`
   - Bridge existing `get_social_data()` with `get_data()`
   - Bridge existing `store_social_posts()` with `store_data()`
   - Add missing `has_data_for_period()` and `clear_data()` methods
   - File: `tradingagents/repositories/social_repository.py`
   - Maintain existing dataclass functionality while adding service compatibility

4. **Repository Method Implementation**

```python
# Add these methods to SocialRepository
def has_data_for_period(self, query: str, start_date: str, end_date: str, symbol: str | None = None) -> bool
def get_data(self, query: str, start_date: str, end_date: str, symbol: str | None = None) -> dict[str, Any]
def store_data(self, query: str, cache_data: dict, symbol: str | None = None, overwrite: bool = False) -> bool
def clear_data(self, query: str, start_date: str, end_date: str, symbol: str | None = None) -> bool
```
### Phase 3: Update SocialMediaService

5. **Client Integration Fix**
   - Replace the `BaseClient` dependency with `RedditClient`
   - File: `tradingagents/services/social_media_service.py:27`
   - Update constructor: `reddit_client: RedditClient`

6. **Date Conversion Fix**
   - Add `date.fromisoformat()` conversion in service methods
   - Update all client calls to use date objects instead of strings
   - File: `tradingagents/services/social_media_service.py:182-190, 418-429`

7. **Repository Interface Integration**
   - Update repository method calls to use the new standard interface
   - Ensure proper error handling for repository operations
   - File: `tradingagents/services/social_media_service.py:302-311, 325-337`

### Phase 4: Type Safety & Validation

8. **Comprehensive Type Checking**
   - Run `mise run typecheck` - must pass with 0 errors
   - Validate all date object conversions
   - Ensure SocialContext compliance

9. **Enhanced Testing**
   - Update existing service tests for the new RedditClient interface
   - Add gap detection test scenarios
   - Test sentiment analysis accuracy with known datasets
   - Test multi-subreddit aggregation and deduplication
## Success Criteria

### Functional Requirements

- ✅ Service successfully calls `RedditClient` with `date` objects
- ✅ Local-first strategy works: checks cache → identifies gaps → fetches missing → stores updates
- ✅ Returns properly validated `SocialContext` to agents
- ✅ Sentiment analysis provides accurate scores with confidence metrics
- ✅ Multi-subreddit support with post deduplication
- ✅ Force refresh bypasses cache and refreshes data

### Technical Requirements

- ✅ Zero type checking errors: `mise run typecheck`
- ✅ Zero linting errors: `mise run lint`
- ✅ All existing tests pass with updated architecture
- ✅ No runtime errors with date conversions

### Quality Requirements

- ✅ Strongly-typed interfaces between all components
- ✅ PRAW library integration for reliable Reddit API access
- ✅ Comprehensive error handling and logging
- ✅ Efficient caching with minimal API calls
- ✅ Clear separation of concerns between service, client, and repository
- ✅ Accurate sentiment analysis with engagement weighting
## Data Architecture

### RedditClient Response Format

```python
{
    "query": "Tesla",
    "period": {"start": "2024-01-01", "end": "2024-01-31"},
    "posts": [
        {
            "title": "Tesla Q4 earnings beat expectations",
            "content": "Tesla reported strong Q4 results...",
            "author": "teslaInvestor",
            "subreddit": "TeslaInvestors",
            "created_utc": 1704067200,
            "score": 245,
            "num_comments": 67,
            "upvote_ratio": 0.92,
            "url": "https://reddit.com/r/TeslaInvestors/comments/xyz789",
            "id": "xyz789"
        }
    ],
    "metadata": {
        "source": "reddit",
        "retrieved_at": "2024-01-31T10:00:00Z",
        "data_quality": "HIGH",
        "subreddits": ["TeslaInvestors", "stocks"],
        "post_count": 25,
        "api_calls": 3
    }
}
```

### SocialRepository Data Bridge Format

```python
# Repository stores data in the existing SocialPost format but provides the service interface
{
    "query": "Tesla",
    "symbol": "TSLA",
    "posts": [
        {
            "title": "Tesla Q4 earnings beat expectations",
            "content": "Tesla reported strong Q4 results...",
            "author": "teslaInvestor",
            "source": "TeslaInvestors",
            "date": "2024-01-15",
            "url": "https://reddit.com/r/TeslaInvestors/comments/xyz789",
            "score": 245,
            "comments": 67,
            "engagement_score": 312,
            "subreddit": "TeslaInvestors",
            "sentiment": {
                "score": 0.7,
                "confidence": 0.8,
                "label": "positive"
            },
            "metadata": {
                "platform_id": "xyz789",
                "upvote_ratio": 0.92
            }
        }
    ],
    "metadata": {
        "cached_at": "2024-01-31T10:00:00Z",
        "post_count": 25,
        "sources": ["reddit"]
    }
}
```
## Dependencies

### Missing Components (Need Creation)

- ⏳ `RedditClient` needs full implementation from scratch
- ⏳ Service interface bridge methods for `SocialRepository`
- ⏳ Comprehensive pytest-vcr test suites for the Reddit API

### Existing Components (Ready)

- ✅ `SocialRepository` with JSON storage and deduplication
- ✅ `SocialContext` and `PostData` Pydantic models
- ✅ Sentiment analysis and engagement metrics logic

### Required

- PRAW (Python Reddit API Wrapper) library for Reddit integration
- Valid Reddit API credentials (client_id, client_secret, user_agent)
- Working internet connection for live data fetching
- Writable data directory for repository storage
## Timeline

### Immediate (Phase 1)

- Create RedditClient following the FinnhubClient standard with PRAW integration
- Implement comprehensive testing with pytest-vcr for the Reddit API
- Validate client functionality with multiple subreddits and queries

### Phase 2-3

- Add standard service interface methods to SocialRepository
- Update SocialMediaService to use RedditClient with date objects
- Bridge repository interfaces while maintaining existing functionality

### Phase 4

- Comprehensive type checking and validation
- Integration testing with sentiment analysis workflows
- Performance optimization and caching efficiency
## Acceptance Criteria

### Must Have

1. **Type Safety**: Service passes `mise run typecheck` with zero errors
2. **Client Integration**: All `RedditClient` calls use `date` objects correctly
3. **Local-First**: Service checks the repository before Reddit API calls
4. **Context Validation**: Returns a valid `SocialContext` with Pydantic validation
5. **Sentiment Analysis**: Provides accurate sentiment scores with confidence metrics
6. **Multi-Platform**: Seamlessly aggregates social data from Reddit with extensibility

### Should Have

1. **Gap Detection**: Intelligent identification of missing data periods
2. **Cache Efficiency**: Minimal redundant API calls to Reddit
3. **Force Refresh**: Complete cache bypass when requested
4. **Data Quality**: Metadata indicating data source and quality metrics
5. **Deduplication**: Automatic removal of duplicate posts by platform_id
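The deduplication requirement can be sketched in a few lines. This assumes posts carry their Reddit id under `metadata["platform_id"]`, as shown in the repository bridge format; order-preserving first-seen-wins is an illustrative policy choice:

```python
from typing import Any


def dedupe_posts(posts: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop posts whose platform_id was already seen, keeping first occurrences."""
    seen: set[str] = set()
    unique = []
    for post in posts:
        pid = post.get("metadata", {}).get("platform_id")
        if pid is None or pid not in seen:
            unique.append(post)  # posts without an id are kept as-is
            if pid is not None:
                seen.add(pid)
    return unique
```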
### Nice to Have

1. **Performance Metrics**: Timing and cache hit rate logging
2. **Data Staleness**: Automatic refresh of old cached social data
3. **Enhanced Sentiment**: Integration with advanced NLP libraries (TextBlob, VADER)
4. **Real-time Social**: Support for live social media feeds and alerts
5. **Platform Expansion**: Easy addition of Twitter, Discord, other social platforms

---

This PRD focuses on completing the `SocialMediaService` as a strongly-typed, local-first data service that integrates Reddit social media data through a new `RedditClient` following the established FinnhubClient standard patterns, while providing comprehensive sentiment analysis and engagement metrics to trading agents.
File diff suppressed because it is too large
@@ -33,7 +33,7 @@ dependencies = [
    "typing-extensions>=4.14.0",
    "yfinance>=0.2.63",
    "TA-Lib>=0.4.28",
    "newspaper3k>=0.2.8",
    "newspaper4k>=0.9.3",
]

[project.optional-dependencies]
@@ -7,5 +7,6 @@
    "reportMissingTypeStubs": false,
    "useLibraryCodeForTypes": true,
    "autoSearchPaths": true,
    "extraPaths": []
    "extraPaths": [],
    "stubPath": "typings"
}
@@ -0,0 +1,4 @@
#!/bin/bash
echo "Running type check..."
cd /Users/martinrichards/code/TradingAgents
mise run typecheck
|
|||
"""Test package for TradingAgents following pragmatic outside-in TDD."""
|
||||
|
|
@@ -0,0 +1,127 @@
"""
Test configuration and shared fixtures following pragmatic TDD principles.

Provides shared fixtures for mocking I/O boundaries while using real objects
for business logic and data transformations.
"""

import shutil
import tempfile
from datetime import date, datetime
from unittest.mock import Mock

import pytest

from tradingagents.domains.news.article_scraper_client import (
    ArticleScraperClient,
    ScrapeResult,
)
from tradingagents.domains.news.google_news_client import (
    GoogleNewsArticle,
    GoogleNewsClient,
)
from tradingagents.domains.news.news_repository import (
    NewsArticle,
    NewsRepository,
)


@pytest.fixture
def mock_google_client():
    """Mock GoogleNewsClient for testing I/O boundary."""
    return Mock(spec=GoogleNewsClient)


@pytest.fixture
def mock_article_scraper():
    """Mock ArticleScraperClient for testing I/O boundary."""
    return Mock(spec=ArticleScraperClient)


@pytest.fixture
def mock_repository():
    """Mock NewsRepository for testing I/O boundary."""
    return Mock(spec=NewsRepository)


@pytest.fixture
def temp_data_dir():
    """Temporary directory for testing real repository persistence."""
    temp_dir = tempfile.mkdtemp()
    yield temp_dir
    shutil.rmtree(temp_dir)


@pytest.fixture
def real_repository(temp_data_dir):
    """Real NewsRepository instance for testing persistence logic."""
    return NewsRepository(temp_data_dir)


@pytest.fixture
def sample_news_articles():
    """Sample NewsArticle objects for testing data transformations."""
    return [
        NewsArticle(
            headline="Apple Stock Rises 5% on Strong Earnings",
            url="https://example.com/apple-earnings",
            source="CNBC",
            published_date=date(2024, 1, 15),
            summary="Apple reports strong quarterly earnings beating expectations",
            sentiment_score=0.7,
            author="John Reporter",
        ),
        NewsArticle(
            headline="Apple Faces Supply Chain Challenges",
            url="https://example.com/apple-supply-chain",
            source="Reuters",
            published_date=date(2024, 1, 16),
            summary="Apple struggles with component shortages affecting production",
            sentiment_score=-0.3,
            author="Jane Analyst",
        ),
    ]


@pytest.fixture
def sample_google_articles():
    """Sample GoogleNewsArticle objects for testing data transformations."""
    return [
        GoogleNewsArticle(
            title="Apple Stock Soars on Positive Outlook",
            link="https://example.com/apple-soars",
            published=datetime(2024, 1, 15, 10, 30),
            summary="Investors are optimistic about Apple's future",
            source="MarketWatch",
            guid="article1",
        ),
        GoogleNewsArticle(
            title="Apple Announces New Product Line",
            link="https://example.com/apple-products",
            published=datetime(2024, 1, 16, 14, 20),
            summary="Apple unveils exciting new product lineup",
            source="TechCrunch",
            guid="article2",
        ),
    ]


@pytest.fixture
def sample_scrape_results():
    """Sample ScrapeResult objects for testing data transformations."""
    return {
        "https://example.com/apple-soars": ScrapeResult(
            status="SUCCESS",
            content="Full article content about Apple's stock performance...",
            author="Market Reporter",
            title="Apple Stock Soars on Positive Outlook",
            publish_date="2024-01-15",
        ),
        "https://example.com/apple-products": ScrapeResult(
            status="SUCCESS",
            content="Detailed content about Apple's new product announcements...",
            author="Tech Writer",
            title="Apple Announces New Product Line",
            publish_date="2024-01-16",
        ),
    }
@@ -0,0 +1 @@
"""Domain tests package."""
@@ -0,0 +1 @@
"""News domain tests package."""
@ -0,0 +1,532 @@
|
|||
"""
|
||||
Test ArticleScraperClient with pytest-vcr for HTTP recording/replay.
|
||||
|
||||
Following pragmatic TDD principles:
|
||||
- Mock HTTP boundaries with VCR cassettes
|
||||
- Test real business logic and data transformations
|
||||
- Fast, deterministic tests
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
from unittest.mock import Mock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from tradingagents.domains.news.article_scraper_client import (
|
||||
ArticleScraperClient,
|
||||
ScrapeResult,
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def cassette_dir():
|
||||
"""Directory for VCR cassettes."""
|
||||
return (
|
||||
Path(__file__).parent.parent.parent
|
||||
/ "fixtures"
|
||||
/ "vcr_cassettes"
|
||||
/ "article_scraper"
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def scraper():
|
||||
"""ArticleScraperClient instance for testing."""
|
||||
return ArticleScraperClient(
|
||||
user_agent="Test-Agent/1.0",
|
||||
delay=0.1, # Faster tests
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def valid_urls():
|
||||
"""Valid test URLs."""
|
||||
return [
|
||||
"https://www.reuters.com/business/finance/",
|
||||
"https://www.bloomberg.com/markets/stocks",
|
||||
"https://techcrunch.com/2024/01/15/tech-news/",
|
||||
]
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def invalid_urls():
|
||||
"""Invalid test URLs."""
|
||||
return [
|
||||
"",
|
||||
"not-a-url",
|
||||
"http://",
|
||||
"https://",
|
||||
"ftp://example.com/file.txt",
|
||||
"https://non-existent-domain-123456.com/article",
|
||||
]
|
||||
|
||||
|
||||
class TestArticleScraperClient:
|
||||
"""Test ArticleScraperClient functionality."""
|
||||
|
||||
def test_initialization(self):
|
||||
"""Test scraper initializes with correct configuration."""
|
||||
# Test with custom user agent
|
||||
scraper = ArticleScraperClient("Custom-Agent/1.0", delay=2.0)
|
||||
assert scraper.user_agent == "Custom-Agent/1.0"
|
||||
assert scraper.delay == 2.0
|
||||
|
||||
# Test with default user agent (None/empty)
|
||||
scraper_default = ArticleScraperClient(None)
|
||||
assert "Chrome" in scraper_default.user_agent
|
||||
assert scraper_default.delay == 1.0
|
||||
|
||||
def test_is_valid_url(self, scraper):
|
||||
"""Test URL validation logic."""
|
||||
# Valid URLs
|
||||
assert scraper._is_valid_url("https://example.com/article") is True
|
||||
assert scraper._is_valid_url("http://example.com/article") is True
|
||||
assert scraper._is_valid_url("https://sub.domain.com/path?query=value") is True
|
||||
|
||||
# Invalid URLs
|
||||
assert scraper._is_valid_url("") is False
|
||||
assert scraper._is_valid_url("not-a-url") is False
|
||||
assert scraper._is_valid_url("ftp://example.com") is False
|
||||
assert scraper._is_valid_url("http://") is False
|
||||
assert scraper._is_valid_url("https://") is False
|
||||
|
||||
def test_scrape_article_invalid_url(self, scraper, invalid_urls):
|
||||
"""Test scraping with invalid URLs returns NOT_FOUND."""
|
||||
for url in invalid_urls:
|
||||
result = scraper.scrape_article(url)
|
||||
assert result.status == "NOT_FOUND"
|
||||
assert result.content == ""
|
||||
assert result.final_url == url
|
||||
|
||||
|
||||
class TestArticleScrapingSuccess:
|
||||
"""Test successful article scraping scenarios."""
|
||||
|
||||
@patch("tradingagents.domains.news.article_scraper_client.time.sleep")
|
||||
@patch("tradingagents.domains.news.article_scraper_client.Article")
|
||||
def test_scrape_article_success(self, mock_article_class, mock_sleep, scraper):
|
||||
"""Test successful article scraping with mocked newspaper4k."""
|
||||
# Setup mock article
|
||||
mock_article = Mock()
|
||||
mock_article.text = "This is a long article content that is definitely over 100 characters in length and should pass the validation check."
|
||||
mock_article.title = "Test Article Title"
|
||||
mock_article.authors = ["John Doe", "Jane Smith"]
|
||||
mock_article.publish_date = "2024-01-15"
|
||||
mock_article.download.return_value = None
|
||||
mock_article.parse.return_value = None
|
||||
|
||||
mock_article_class.return_value = mock_article
|
||||
|
||||
# Test scraping
|
||||
result = scraper.scrape_article("https://example.com/article")
|
||||
|
||||
# Verify results
|
||||
assert result.status == "SUCCESS"
|
||||
assert result.content == mock_article.text
|
||||
assert result.title == "Test Article Title"
|
||||
assert result.author == "John Doe, Jane Smith"
|
||||
assert result.publish_date == "2024-01-15"
|
||||
assert result.final_url == "https://example.com/article"
|
||||
|
||||
# Verify newspaper4k was configured correctly
|
||||
mock_article_class.assert_called_once()
|
||||
args, kwargs = mock_article_class.call_args
|
||||
assert args[0] == "https://example.com/article"
|
||||
config = (
|
||||
kwargs["config"]
|
||||
if "config" in kwargs
|
||||
else args[1]
|
||||
if len(args) > 1
|
||||
else None
|
||||
)
|
||||
assert config is not None
|
||||
assert config.browser_user_agent == "Test-Agent/1.0"
|
||||
assert config.request_timeout == 10
|
||||
|
||||
# Verify delay was applied
|
||||
mock_sleep.assert_called_once_with(0.1)
|
||||
|
||||
@patch("tradingagents.domains.news.article_scraper_client.time.sleep")
|
||||
@patch("tradingagents.domains.news.article_scraper_client.Article")
|
||||
def test_scrape_article_with_datetime_publish_date(
|
||||
self, mock_article_class, mock_sleep, scraper
|
||||
):
|
||||
"""Test successful scraping with datetime publish_date."""
|
||||
from datetime import datetime
|
||||
|
||||
mock_article = Mock()
|
||||
mock_article.text = "Long article content over 100 characters for testing publish date handling in the newspaper4k client."
|
||||
mock_article.title = "DateTime Test Article"
|
||||
mock_article.authors = []
|
||||
mock_article.publish_date = datetime(2024, 1, 15, 14, 30, 0)
|
||||
|
||||
mock_article_class.return_value = mock_article
|
||||
|
||||
result = scraper.scrape_article("https://example.com/datetime-article")
|
||||
|
||||
assert result.status == "SUCCESS"
|
||||
assert result.publish_date == "2024-01-15"
|
||||
assert result.author == "" # Empty authors list
|
||||
|
||||
@patch("tradingagents.domains.news.article_scraper_client.time.sleep")
|
||||
@patch("tradingagents.domains.news.article_scraper_client.Article")
|
||||
def test_scrape_article_short_content_fails(
|
||||
self, mock_article_class, mock_sleep, scraper
|
||||
):
|
||||
"""Test that articles with content under 100 chars are rejected."""
|
||||
mock_article = Mock()
|
||||
mock_article.text = "Short content" # Under 100 characters
|
||||
mock_article.title = "Short Article"
|
||||
mock_article.authors = []
|
||||
mock_article.publish_date = None
|
||||
|
||||
mock_article_class.return_value = mock_article
|
||||
|
||||
result = scraper.scrape_article("https://example.com/short-article")
|
||||
|
||||
assert result.status == "SCRAPE_FAILED"
|
||||
assert result.content == ""
|
||||
|
||||
@patch("tradingagents.domains.news.article_scraper_client.time.sleep")
|
||||
@patch("tradingagents.domains.news.article_scraper_client.Article")
|
||||
def test_scrape_article_empty_content_fails(
|
||||
self, mock_article_class, mock_sleep, scraper
|
||||
):
|
||||
"""Test that articles with empty content are rejected."""
|
||||
mock_article = Mock()
|
||||
mock_article.text = "" # Empty content
|
||||
mock_article.title = ""
|
||||
mock_article.authors = []
|
||||
mock_article.publish_date = None
|
||||
|
||||
mock_article_class.return_value = mock_article
|
||||
|
||||
result = scraper.scrape_article("https://example.com/empty-article")
|
||||
|
||||
assert result.status == "SCRAPE_FAILED"
|
||||
assert result.content == ""
|
||||
|
||||
|
||||
class TestArticleScrapingFailure:
    """Test article scraping failure scenarios."""

    @patch("tradingagents.domains.news.article_scraper_client.time.sleep")
    @patch("tradingagents.domains.news.article_scraper_client.Article")
    def test_scrape_article_download_exception(
        self, mock_article_class, mock_sleep, scraper
    ):
        """Test scraping when newspaper4k download fails."""
        mock_article = Mock()
        mock_article.download.side_effect = Exception("Download failed")

        mock_article_class.return_value = mock_article

        result = scraper.scrape_article("https://example.com/failing-article")

        assert result.status == "SCRAPE_FAILED"
        assert result.content == ""
        assert result.final_url == "https://example.com/failing-article"

    @patch("tradingagents.domains.news.article_scraper_client.time.sleep")
    @patch("tradingagents.domains.news.article_scraper_client.Article")
    def test_scrape_article_parse_exception(
        self, mock_article_class, mock_sleep, scraper
    ):
        """Test scraping when newspaper4k parse fails."""
        mock_article = Mock()
        mock_article.download.return_value = None
        mock_article.parse.side_effect = Exception("Parse failed")

        mock_article_class.return_value = mock_article

        result = scraper.scrape_article("https://example.com/parse-fail-article")

        assert result.status == "SCRAPE_FAILED"
        assert result.content == ""


class TestWaybackMachineFallback:
    """Test Internet Archive Wayback Machine fallback functionality."""

    @patch("tradingagents.domains.news.article_scraper_client.requests.get")
    def test_scrape_from_wayback_no_requests(self, mock_get, scraper):
        """Test Wayback fallback when requests is not available."""
        with patch(
            "builtins.__import__", side_effect=ImportError("No module named 'requests'")
        ):
            result = scraper._scrape_from_wayback("https://example.com/article")

        assert result.status == "NOT_FOUND"
        assert result.final_url == "https://example.com/article"

    @patch("tradingagents.domains.news.article_scraper_client.requests.get")
    def test_scrape_from_wayback_no_snapshots(self, mock_get, scraper):
        """Test Wayback fallback when no archived snapshots exist."""
        # Mock CDX API response with only headers (no snapshots)
        mock_response = Mock()
        mock_response.json.return_value = [["timestamp", "original"]]  # Only headers
        mock_response.raise_for_status.return_value = None
        mock_get.return_value = mock_response

        result = scraper._scrape_from_wayback("https://example.com/no-archive")

        assert result.status == "NOT_FOUND"
        assert result.final_url == "https://example.com/no-archive"

    @patch("tradingagents.domains.news.article_scraper_client.requests.get")
    @patch("tradingagents.domains.news.article_scraper_client.time.sleep")
    @patch("tradingagents.domains.news.article_scraper_client.Article")
    def test_scrape_from_wayback_success(
        self, mock_article_class, mock_sleep, mock_get, scraper
    ):
        """Test successful Wayback Machine scraping."""
        # Mock CDX API response
        mock_response = Mock()
        mock_response.json.return_value = [
            ["timestamp", "original"],  # Headers
            ["20240115120000", "https://example.com/article"],  # Snapshot data
        ]
        mock_response.raise_for_status.return_value = None
        mock_get.return_value = mock_response

        # Mock successful article scraping from archive
        mock_article = Mock()
        mock_article.text = "Archived article content that is long enough to pass validation checks and contains meaningful information."
        mock_article.title = "Archived Article"
        mock_article.authors = ["Archive Author"]
        mock_article.publish_date = "2024-01-15"
        mock_article_class.return_value = mock_article

        result = scraper._scrape_from_wayback("https://example.com/article")

        assert result.status == "ARCHIVE_SUCCESS"
        assert result.content == mock_article.text
        assert result.title == "Archived Article"
        assert (
            result.final_url
            == "https://web.archive.org/web/20240115120000/https://example.com/article"
        )

        # Verify CDX API was called correctly
        mock_get.assert_called_with(
            "http://web.archive.org/cdx/search/cdx",
            params={
                "url": "https://example.com/article",
                "output": "json",
                "fl": "timestamp,original",
                "filter": "statuscode:200",
                "limit": "1",
            },
            timeout=10,
        )

    @patch("tradingagents.domains.news.article_scraper_client.requests.get")
    def test_scrape_from_wayback_requests_exception(self, mock_get, scraper):
        """Test Wayback fallback when requests fails."""
        mock_get.side_effect = Exception("Request timeout")

        result = scraper._scrape_from_wayback("https://example.com/timeout")

        assert result.status == "NOT_FOUND"
        assert result.final_url == "https://example.com/timeout"

    @patch("tradingagents.domains.news.article_scraper_client.time.sleep")
    @patch("tradingagents.domains.news.article_scraper_client.Article")
    def test_scrape_article_fallback_to_wayback(
        self, mock_article_class, mock_sleep, scraper
    ):
        """Test full workflow: source fails, fallback to Wayback succeeds."""
        # First call (original source) fails
        # Second call (Wayback source) succeeds
        mock_article_fail = Mock()
        mock_article_fail.download.side_effect = Exception("Download failed")

        mock_article_success = Mock()
        mock_article_success.text = "Successfully scraped content from Wayback Machine with enough length to pass validation tests."
        mock_article_success.title = "Wayback Success"
        mock_article_success.authors = ["Wayback Author"]
        mock_article_success.publish_date = "2024-01-15"
        mock_article_success.download.return_value = None
        mock_article_success.parse.return_value = None

        mock_article_class.side_effect = [mock_article_fail, mock_article_success]

        with patch(
            "tradingagents.domains.news.article_scraper_client.requests.get"
        ) as mock_get:
            # Mock successful CDX API response
            mock_response = Mock()
            mock_response.json.return_value = [
                ["timestamp", "original"],
                ["20240115120000", "https://example.com/article"],
            ]
            mock_response.raise_for_status.return_value = None
            mock_get.return_value = mock_response

            result = scraper.scrape_article("https://example.com/article")

        assert result.status == "ARCHIVE_SUCCESS"
        assert (
            result.content
            == "Successfully scraped content from Wayback Machine with enough length to pass validation tests."
        )
        assert "web.archive.org" in result.final_url


class TestMultipleArticles:
    """Test scraping multiple articles functionality."""

    @patch("tradingagents.domains.news.article_scraper_client.time.sleep")
    def test_scrape_multiple_articles_empty_list(self, mock_sleep, scraper):
        """Test scraping empty list returns empty dict."""
        results = scraper.scrape_multiple_articles([])
        assert results == {}
        mock_sleep.assert_not_called()

    @patch("tradingagents.domains.news.article_scraper_client.time.sleep")
    def test_scrape_multiple_articles_single_url(self, mock_sleep, scraper):
        """Test scraping single URL in list."""
        urls = ["https://example.com/single"]

        with patch.object(scraper, "scrape_article") as mock_scrape:
            mock_scrape.return_value = ScrapeResult(
                status="SUCCESS", content="Single article content"
            )

            results = scraper.scrape_multiple_articles(urls)

            assert len(results) == 1
            assert results["https://example.com/single"].status == "SUCCESS"
            mock_scrape.assert_called_once_with("https://example.com/single")
            # No delay needed for single article
            mock_sleep.assert_not_called()

    @patch("tradingagents.domains.news.article_scraper_client.time.sleep")
    def test_scrape_multiple_articles_with_delays(self, mock_sleep, scraper):
        """Test scraping multiple URLs with delays between requests."""
        urls = [
            "https://example.com/article1",
            "https://example.com/article2",
            "https://example.com/article3",
        ]

        with patch.object(scraper, "scrape_article") as mock_scrape:
            mock_scrape.side_effect = [
                ScrapeResult(status="SUCCESS", content="Article 1"),
                ScrapeResult(status="SUCCESS", content="Article 2"),
                ScrapeResult(status="SCRAPE_FAILED", content=""),
            ]

            results = scraper.scrape_multiple_articles(urls)

            assert len(results) == 3
            assert results["https://example.com/article1"].status == "SUCCESS"
            assert results["https://example.com/article2"].status == "SUCCESS"
            assert results["https://example.com/article3"].status == "SCRAPE_FAILED"

            # Verify delay called between requests (n-1 times)
            assert mock_sleep.call_count == 2
            mock_sleep.assert_called_with(0.1)


class TestDataTransformation:
    """Test data transformation and edge cases."""

    @patch("tradingagents.domains.news.article_scraper_client.time.sleep")
    @patch("tradingagents.domains.news.article_scraper_client.Article")
    def test_publish_date_edge_cases(self, mock_article_class, mock_sleep, scraper):
        """Test various publish_date formats are handled correctly."""
        from datetime import datetime

        test_cases = [
            (None, ""),
            ("", ""),
            ("2024-01-15", "2024-01-15"),
            (datetime(2024, 1, 15), "2024-01-15"),
            (12345, "12345"),  # Numeric conversion
            ({"year": 2024}, "{'year': 2024}"),  # Dict conversion
        ]

        for pub_date, expected in test_cases:
            mock_article = Mock()
            mock_article.text = "Long enough content for validation testing with various publish date formats and edge cases."
            mock_article.title = "Date Test"
            mock_article.authors = []
            mock_article.publish_date = pub_date

            mock_article_class.return_value = mock_article

            result = scraper.scrape_article("https://example.com/date-test")
            assert result.status == "SUCCESS"
            assert result.publish_date == expected

    def test_scrape_result_dataclass_defaults(self):
        """Test ScrapeResult dataclass has correct defaults."""
        result = ScrapeResult(status="TEST")

        assert result.status == "TEST"
        assert result.content == ""
        assert result.author == ""
        assert result.final_url == ""
        assert result.title == ""
        assert result.publish_date == ""

    def test_scrape_result_all_fields(self):
        """Test ScrapeResult with all fields populated."""
        result = ScrapeResult(
            status="SUCCESS",
            content="Full article content",
            author="Test Author",
            final_url="https://final.com/url",
            title="Test Title",
            publish_date="2024-01-15",
        )

        assert result.status == "SUCCESS"
        assert result.content == "Full article content"
        assert result.author == "Test Author"
        assert result.final_url == "https://final.com/url"
        assert result.title == "Test Title"
        assert result.publish_date == "2024-01-15"


class TestErrorHandlingAndEdgeCases:
    """Test error handling and edge cases."""

    def test_user_agent_fallback(self):
        """Test user agent fallback when None or empty is provided."""
        scraper_none = ArticleScraperClient(None)
        scraper_empty = ArticleScraperClient("")

        # Both should use default Chrome user agent
        default_ua = (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
        )

        assert scraper_none.user_agent == default_ua
        assert scraper_empty.user_agent == default_ua

    @patch("tradingagents.domains.news.article_scraper_client.time.sleep")
    @patch("tradingagents.domains.news.article_scraper_client.Article")
    def test_config_applied_correctly(self, mock_article_class, mock_sleep):
        """Test that newspaper4k Config is applied with correct settings."""
        scraper = ArticleScraperClient("Custom-Agent/2.0", delay=0.5)

        mock_article = Mock()
        mock_article.text = "Test content that meets minimum length requirements for successful article scraping validation."
        mock_article_class.return_value = mock_article

        scraper.scrape_article("https://example.com/config-test")

        # Verify Article was created with correct config
        mock_article_class.assert_called_once()
        args, kwargs = mock_article_class.call_args

        assert args[0] == "https://example.com/config-test"
        config = kwargs.get("config") or (args[1] if len(args) > 1 else None)
        assert config is not None
        assert config.browser_user_agent == "Custom-Agent/2.0"
        assert config.request_timeout == 10
        assert config.keep_article_html is True
        assert config.fetch_images is False
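For orientation, the publish-date normalization exercised by `test_publish_date_edge_cases` above can be sketched standalone. This is a hypothetical `format_publish_date` helper, not part of this diff; the real logic lives inside `ArticleScraperClient`:

```python
from datetime import datetime


def format_publish_date(value: object) -> str:
    """Normalize a newspaper4k publish_date into a YYYY-MM-DD string.

    Mirrors the cases in test_publish_date_edge_cases: None and "" become "",
    datetime objects are formatted as ISO dates, everything else is str()'d.
    """
    if value is None or value == "":
        return ""
    if isinstance(value, datetime):
        return value.date().isoformat()
    return str(value)
```

Note the str() fallback is what makes the numeric and dict cases in the test table pass unchanged.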
@@ -0,0 +1,336 @@
"""
Test suite for NewsService following pragmatic outside-in TDD methodology.

This test suite follows the CLAUDE.md testing principles:
- Mock I/O boundaries (Repository calls, HTTP clients, external systems)
- Real objects for logic (Data transformations, validation, business logic)
- Outside-in but practical - Start with service tests, work inward
"""

from datetime import date
from unittest.mock import Mock

import pytest

# Import mock ScrapeResult from conftest to avoid newspaper3k import issues
from conftest import ScrapeResult

from tradingagents.domains.news.news_repository import (
    NewsData,
)
from tradingagents.domains.news.news_service import (
    ArticleData,
    NewsContext,
    NewsService,
    NewsUpdateResult,
    SentimentScore,
)


class TestNewsServiceCollaboratorInteractions:
    """Test NewsService interactions with its collaborators (I/O boundaries)."""

    def test_get_company_news_context_calls_repository_with_correct_params(
        self, mock_repository, mock_google_client, mock_article_scraper
    ):
        """Test that get_company_news_context calls repository with correct parameters."""
        # Arrange - Mock the I/O boundary
        mock_repository.get_news_data.return_value = {}

        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act - Call the service method
        result = service.get_company_news_context("AAPL", "2024-01-01", "2024-01-31")

        # Assert - Repository should be called with converted date objects
        mock_repository.get_news_data.assert_called_once_with(
            query="AAPL",
            start_date=date(2024, 1, 1),
            end_date=date(2024, 1, 31),
            sources=["finnhub", "google_news"],
        )

        # Assert - Result should have correct structure (real object logic)
        assert isinstance(result, NewsContext)
        assert result.query == "AAPL"
        assert result.symbol == "AAPL"
        assert result.period == {"start": "2024-01-01", "end": "2024-01-31"}

    def test_get_global_news_context_calls_repository_for_each_category(
        self, mock_repository, mock_google_client, mock_article_scraper
    ):
        """Test that get_global_news_context calls repository for each category."""
        # Arrange - Mock the I/O boundary
        mock_repository.get_news_data.return_value = {}

        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)
        categories = ["business", "politics", "technology"]

        # Act
        service.get_global_news_context(
            "2024-01-01", "2024-01-31", categories=categories
        )

        # Assert - Repository should be called once for each category
        assert mock_repository.get_news_data.call_count == 3

        for call_args in mock_repository.get_news_data.call_args_list:
            args, kwargs = call_args
            assert args[0] in categories  # query should be one of the categories
            assert args[1] == date(2024, 1, 1)  # start_date
            assert args[2] == date(2024, 1, 31)  # end_date
            assert kwargs["sources"] == ["google_news"]

    def test_update_company_news_calls_google_client(
        self, mock_repository, mock_google_client, mock_article_scraper
    ):
        """Test that update_company_news calls GoogleNewsClient correctly."""
        # Arrange - Mock the I/O boundary
        mock_google_client.get_company_news.return_value = []

        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act
        result = service.update_company_news("AAPL")

        # Assert - Google client should be called
        mock_google_client.get_company_news.assert_called_once_with("AAPL")
        assert isinstance(result, NewsUpdateResult)
        assert result.symbol == "AAPL"
        assert result.articles_found == 0

    def test_update_company_news_scrapes_each_article_url(
        self,
        mock_repository,
        mock_google_client,
        mock_article_scraper,
        sample_google_articles,
    ):
        """Test that update_company_news calls scraper for each article URL."""
        # Arrange - Mock I/O boundaries with real data objects
        mock_google_client.get_company_news.return_value = sample_google_articles
        mock_article_scraper.scrape_article.return_value = ScrapeResult(
            status="SUCCESS",
            content="Full article content",
            author="Test Author",
            title="Test Title",
            publish_date="2024-01-15",
        )

        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act
        result = service.update_company_news("AAPL")

        # Assert - Scraper should be called for each article
        assert mock_article_scraper.scrape_article.call_count == 2
        mock_article_scraper.scrape_article.assert_any_call(
            "https://example.com/apple-soars"
        )
        mock_article_scraper.scrape_article.assert_any_call(
            "https://example.com/apple-products"
        )

        # Assert - Real object logic for result
        assert result.articles_found == 2
        assert result.articles_scraped == 2
        assert result.articles_failed == 0

    def test_repository_failure_returns_empty_context_with_error_metadata(
        self, mock_repository, mock_google_client, mock_article_scraper
    ):
        """Test that repository failure is handled gracefully."""
        # Arrange - Mock repository failure (I/O boundary)
        mock_repository.get_news_data.side_effect = Exception(
            "Database connection failed"
        )

        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act
        result = service.get_company_news_context("AAPL", "2024-01-01", "2024-01-31")

        # Assert - Should return empty context with error metadata (real object logic)
        assert isinstance(result, NewsContext)
        assert result.articles == []
        assert result.article_count == 0
        assert "error" in result.metadata
        assert "Database connection failed" in result.metadata["error"]


class TestNewsServiceDataTransformations:
    """Test data transformations using real objects (no mocking)."""

    def test_converts_repository_articles_to_article_data(
        self, mock_google_client, mock_article_scraper, sample_news_articles
    ):
        """Test conversion of NewsRepository.NewsArticle to ArticleData."""
        # Arrange - Create real repository with sample data
        mock_repo = Mock()
        news_data = NewsData(
            query="AAPL",
            date=date(2024, 1, 15),
            source="finnhub",
            articles=sample_news_articles,
        )
        mock_repo.get_news_data.return_value = {date(2024, 1, 15): [news_data]}

        service = NewsService(mock_google_client, mock_repo, mock_article_scraper)

        # Act - Test real data transformation logic
        result = service.get_company_news_context("AAPL", "2024-01-01", "2024-01-31")

        # Assert - Real object data transformation
        assert len(result.articles) == 2
        assert result.articles[0].title == "Apple Stock Rises 5% on Strong Earnings"
        assert (
            result.articles[0].content
            == "Apple reports strong quarterly earnings beating expectations"
        )
        assert result.articles[0].date == "2024-01-15"
        assert result.articles[0].source == "CNBC"
        assert result.articles[0].url == "https://example.com/apple-earnings"

    def test_calculates_sentiment_summary_from_articles(
        self, mock_repository, mock_google_client, mock_article_scraper
    ):
        """Test sentiment summary calculation from article list."""
        # Arrange - Create articles with sentiment-bearing content (real objects)
        articles = [
            ArticleData(
                title="Great News for Apple",
                content="Apple stock is performing excellent with strong growth and positive outlook",
                author="Analyst",
                source="CNBC",
                date="2024-01-15",
                url="https://example.com/positive",
            ),
            ArticleData(
                title="Apple Faces Challenges",
                content="Apple stock is declining due to bad earnings and negative market sentiment",
                author="Reporter",
                source="Reuters",
                date="2024-01-16",
                url="https://example.com/negative",
            ),
        ]

        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act - Test real sentiment calculation logic (private method)
        sentiment = service._calculate_sentiment_summary(articles)

        # Assert - Real sentiment calculation
        assert isinstance(sentiment, SentimentScore)
        assert -1.0 <= sentiment.score <= 1.0
        assert 0.0 <= sentiment.confidence <= 1.0
        assert sentiment.label in ["positive", "negative", "neutral"]

    def test_extracts_trending_topics_from_articles(
        self, mock_repository, mock_google_client, mock_article_scraper
    ):
        """Test trending topic extraction."""
        # Arrange - Create articles with repeated keywords (real objects)
        articles = [
            ArticleData(
                title="Apple iPhone Sales Surge",
                content="Content about iPhone",
                author="Reporter",
                source="TechNews",
                date="2024-01-15",
                url="https://example.com/iphone1",
            ),
            ArticleData(
                title="iPhone Market Share Growth",
                content="More iPhone content",
                author="Analyst",
                source="MarketWatch",
                date="2024-01-16",
                url="https://example.com/iphone2",
            ),
            ArticleData(
                title="Apple Revenue from Services",
                content="Services revenue content",
                author="Finance Writer",
                source="Bloomberg",
                date="2024-01-17",
                url="https://example.com/services",
            ),
        ]

        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act - Test real trending topic extraction logic
        topics = service._extract_trending_topics(articles)

        # Assert - Should identify repeated keywords
        assert isinstance(topics, list)
        assert "iphone" in topics  # Should appear twice
        assert "apple" in topics  # Should appear multiple times


class TestNewsServiceErrorScenarios:
    """Test various error scenarios and edge cases."""

    def test_handles_google_client_failure(
        self, mock_repository, mock_google_client, mock_article_scraper
    ):
        """Test handling of GoogleNewsClient failure."""
        # Arrange - Mock client failure (I/O boundary)
        mock_google_client.get_company_news.side_effect = Exception(
            "API rate limit exceeded"
        )

        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act & Assert - Should raise the exception
        with pytest.raises(Exception, match="API rate limit exceeded"):
            service.update_company_news("AAPL")

    def test_handles_article_scraper_failure(
        self,
        mock_repository,
        mock_google_client,
        mock_article_scraper,
        sample_google_articles,
    ):
        """Test handling of article scraper failure."""
        # Arrange - Mock scraper returning failure status
        mock_google_client.get_company_news.return_value = sample_google_articles
        mock_article_scraper.scrape_article.return_value = ScrapeResult(
            status="SCRAPE_FAILED", content="", author="", title="", publish_date=""
        )

        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act
        result = service.update_company_news("AAPL")

        # Assert - Should handle scraper failures gracefully
        assert result.articles_found == 2
        assert result.articles_scraped == 0
        assert result.articles_failed == 2

    def test_handles_invalid_date_formats(
        self, mock_repository, mock_google_client, mock_article_scraper
    ):
        """Test validation of date formats."""
        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act & Assert - Should raise ValueError for invalid date format
        with pytest.raises(ValueError):
            service.get_company_news_context("AAPL", "invalid-date", "2024-01-31")

    def test_handles_empty_articles_gracefully(
        self, mock_repository, mock_google_client, mock_article_scraper
    ):
        """Test handling of empty article list."""
        service = NewsService(mock_google_client, mock_repository, mock_article_scraper)

        # Act - Test sentiment calculation with empty list
        sentiment = service._calculate_sentiment_summary([])

        # Assert - Should return neutral sentiment
        assert sentiment.score == 0.0
        assert sentiment.confidence == 0.0
        assert sentiment.label == "neutral"
@@ -8,7 +8,7 @@ from dataclasses import dataclass
 from datetime import datetime
 from urllib.parse import urlparse
 
-import newspaper
+from newspaper import Article, Config
 
 logger = logging.getLogger(__name__)
 
@@ -28,12 +28,12 @@ class ScrapeResult:
 class ArticleScraperClient:
     """Client for scraping article content with Internet Archive fallback."""
 
-    def __init__(self, user_agent: str, delay: float = 1.0):
+    def __init__(self, user_agent: str | None = None, delay: float = 1.0):
         """
         Initialize article scraper.
 
         Args:
-            user_agent: User agent string for requests
+            user_agent: User agent string for requests (None for default)
             delay: Delay between requests in seconds
         """
         self.user_agent = user_agent or (
@@ -65,17 +65,18 @@ class ArticleScraperClient:
         return self._scrape_from_wayback(url)
 
     def _scrape_from_source(self, url: str) -> ScrapeResult:
-        """Scrape article from original source using newspaper3k."""
+        """Scrape article from original source using newspaper4k."""
         try:
             # Add delay to be respectful
             time.sleep(self.delay)
 
-            # Configure newspaper article
-            article = newspaper.Article(url)
-            article.config.browser_user_agent = self.user_agent
-            article.config.request_timeout = 10
+            # Configure newspaper4k with optimizations
+            config = Config()
+            config.browser_user_agent = self.user_agent
+            config.request_timeout = 10
+            config.fetch_images = False
 
+            # Download and parse
+            article = Article(url, config=config)
             article.download()
             article.parse()
 
@@ -4,6 +4,7 @@ News service that provides structured news context.
 import logging
 from dataclasses import dataclass
+from datetime import date
 from enum import Enum
 from typing import Any
 
@@ -134,13 +135,39 @@
         try:
             logger.info(f"Getting company news context for {symbol} from repository")
 
-            # Get articles from repository
+            # Get articles from repository (READ PATH - no API calls)
             articles = []
             if self.repository:
                 try:
-                    # This would depend on the actual repository interface
-                    # For now, return empty list - repository integration needs to be completed
-                    articles = []
+                    # Convert date strings to date objects
+                    start_date_obj = date.fromisoformat(start_date)
+                    end_date_obj = date.fromisoformat(end_date)
+
+                    # Get cached news data from repository
+                    news_data_by_date = self.repository.get_news_data(
+                        query=symbol,
+                        start_date=start_date_obj,
+                        end_date=end_date_obj,
+                        sources=["finnhub", "google_news"],
+                    )
+
+                    # Convert repository data to ArticleData objects
+                    for _date_key, news_data_list in news_data_by_date.items():
+                        for news_data in news_data_list:
+                            for article in news_data.articles:
+                                articles.append(
+                                    ArticleData(
+                                        title=article.headline,
+                                        content=article.summary
+                                        or "",  # Use summary as fallback for content
+                                        author=article.author or "",
+                                        source=article.source,
+                                        date=article.published_date.isoformat(),
+                                        url=article.url,
+                                        sentiment=None,  # Will be calculated later
+                                    )
+                                )
+
+                    logger.debug(
+                        f"Retrieved {len(articles)} articles from repository for {symbol}"
+                    )
@@ -218,13 +245,39 @@ class NewsService:
             f"Getting global news context from repository for categories: {categories}"
         )

-        # Get articles from repository
+        # Get articles from repository (READ PATH - no API calls)
         articles = []
         if self.repository:
             try:
-                # This would depend on the actual repository interface
-                # For now, return empty list - repository integration needs to be completed
-                articles = []
+                # Convert date strings to date objects
+                start_date_obj = date.fromisoformat(start_date)
+                end_date_obj = date.fromisoformat(end_date)
+
+                # Get cached news data from repository for each category
+                for category in categories:
+                    news_data_by_date = self.repository.get_news_data(
+                        query=category,
+                        start_date=start_date_obj,
+                        end_date=end_date_obj,
+                        sources=["google_news"],  # Global news mainly from Google
+                    )
+
+                    # Convert repository data to ArticleData objects
+                    for _date_key, news_data_list in news_data_by_date.items():
+                        for news_data in news_data_list:
+                            for article in news_data.articles:
+                                articles.append(
+                                    ArticleData(
+                                        title=article.headline,
+                                        content=article.summary or "",
+                                        author=article.author or "",
+                                        source=article.source,
+                                        date=article.published_date.isoformat(),
+                                        url=article.url,
+                                        sentiment=None,
+                                    )
+                                )
+
+                logger.debug(
+                    f"Retrieved {len(articles)} global articles from repository"
+                )
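The read path above parses caller-supplied ISO date strings before querying the repository. `date.fromisoformat` is strict and raises `ValueError` on anything that is not `YYYY-MM-DD`, so a small validating helper (a sketch with an illustrative name, not code from this commit) makes the failure mode explicit:

```python
from datetime import date


def parse_range(start_date: str, end_date: str) -> tuple[date, date]:
    """Parse an ISO-8601 date range, validating its order."""
    start = date.fromisoformat(start_date)  # raises ValueError on malformed input
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return start, end


print(parse_range("2024-01-01", "2024-01-31"))
```

Rejecting a reversed or malformed range up front keeps the repository query from silently returning an empty result set.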
@@ -0,0 +1,31 @@
+"""Type stubs for newspaper (newspaper4k package)."""
+
+from datetime import datetime
+
+class Config:
+    """Configuration for newspaper Article."""
+
+    browser_user_agent: str
+    request_timeout: int
+    fetch_images: bool
+
+    def __init__(self) -> None: ...
+
+class Article:
+    """Article class for parsing web articles."""
+
+    text: str
+    title: str | None
+    authors: list[str]
+    publish_date: datetime | None
+    top_image: str | None
+    movies: list[str]
+    keywords: list[str]
+    summary: str
+
+    def __init__(self, url: str, config: Config | None = None) -> None: ...
+    def download(self) -> None: ...
+    def parse(self) -> None: ...
+    def nlp(self) -> None: ...
+
+def article(url: str) -> Article: ...
uv.lock
@@ -633,17 +633,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/32/b6/7517af5234378518f27ad35a7b24af9591bc500b8c1780929c1295999eb6/fastapi-0.115.9-py3-none-any.whl", hash = "sha256:4a439d7923e4de796bcc88b64e9754340fcd1574673cbd865ba8a99fe0d28c56", size = 94919, upload-time = "2025-02-27T16:43:40.537Z" },
 ]

-[[package]]
-name = "feedfinder2"
-version = "0.0.4"
-source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "beautifulsoup4" },
-    { name = "requests" },
-    { name = "six" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/35/82/1251fefec3bb4b03fd966c7e7f7a41c9fc2bb00d823a34c13f847fd61406/feedfinder2-0.0.4.tar.gz", hash = "sha256:3701ee01a6c85f8b865a049c30ba0b4608858c803fe8e30d1d289fdbe89d0efe", size = 3297, upload-time = "2016-01-25T15:09:17.492Z" }
-
 [[package]]
 name = "feedparser"
 version = "6.0.11"
@@ -1049,12 +1038,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/2c/e1/e6716421ea10d38022b952c159d5161ca1193197fb744506875fbb87ea7b/iniconfig-2.1.0-py3-none-any.whl", hash = "sha256:9deba5723312380e77435581c6bf4935c94cbfab9b1ed33ef8d238ea168eb760", size = 6050, upload-time = "2025-03-19T20:10:01.071Z" },
 ]

-[[package]]
-name = "jieba3k"
-version = "0.35.1"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/a9/cb/2c8332bcdc14d33b0bedd18ae0a4981a069c3513e445120da3c3f23a8aaa/jieba3k-0.35.1.zip", hash = "sha256:980a4f2636b778d312518066be90c7697d410dd5a472385f5afced71a2db1c10", size = 7423646, upload-time = "2014-11-15T05:47:47.978Z" }
-
 [[package]]
 name = "jinja2"
 version = "3.1.6"
@@ -1700,27 +1683,25 @@ wheels = [
 ]

 [[package]]
-name = "newspaper3k"
-version = "0.2.8"
+name = "newspaper4k"
+version = "0.9.3.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "beautifulsoup4" },
-    { name = "cssselect" },
-    { name = "feedfinder2" },
     { name = "feedparser" },
-    { name = "jieba3k" },
     { name = "lxml" },
     { name = "nltk" },
+    { name = "numpy" },
+    { name = "pandas" },
     { name = "pillow" },
     { name = "python-dateutil" },
     { name = "pyyaml" },
     { name = "requests" },
-    { name = "tinysegmenter" },
     { name = "tldextract" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/ce/fb/8f8525be0cafa48926e85b0c06a7cb3e2a892d340b8036f8c8b1b572df1c/newspaper3k-0.2.8.tar.gz", hash = "sha256:9f1bd3e1fb48f400c715abf875cc7b0a67b7ddcd87f50c9aeeb8fcbbbd9004fb", size = 205685, upload-time = "2018-09-28T04:58:23.53Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/af/a8/80a186f09ffa2a9366ed93391b03fdaf8057d75a67a21c2eafef36b654ba/newspaper4k-0.9.3.1.tar.gz", hash = "sha256:fc237ae6a7b65d5ac4df224f962b2d7368c991fdf63b5176e439a1b74a2992e0", size = 273009, upload-time = "2024-03-18T21:56:46.344Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/d7/b9/51afecb35bb61b188a4b44868001de348a0e8134b4dfa00ffc191567c4b9/newspaper3k-0.2.8-py3-none-any.whl", hash = "sha256:44a864222633d3081113d1030615991c3dbba87239f6bbf59d91240f71a22e3e", size = 211132, upload-time = "2018-09-28T04:58:18.847Z" },
+    { url = "https://files.pythonhosted.org/packages/ab/73/cc4e7a57373e6940fc081d4f36988e3faa54c59a51dea4e8f01d5c10ccb6/newspaper4k-0.9.3.1-py3-none-any.whl", hash = "sha256:42a03b7915d92941a9fe4cc8dab47240219560e0cb8ecb5a291dc5a913eb8aa4", size = 296617, upload-time = "2024-03-18T21:56:43.932Z" },
 ]

 [[package]]
@@ -3443,12 +3424,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/de/a8/8f499c179ec900783ffe133e9aab10044481679bb9aad78436d239eee716/tiktoken-0.9.0-cp313-cp313-win_amd64.whl", hash = "sha256:5ea0edb6f83dc56d794723286215918c1cde03712cbbafa0348b33448faf5b95", size = 894669, upload-time = "2025-02-14T06:02:47.341Z" },
 ]

-[[package]]
-name = "tinysegmenter"
-version = "0.3"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/17/82/86982e4b6d16e4febc79c2a1d68ee3b707e8a020c5d2bc4af8052d0f136a/tinysegmenter-0.3.tar.gz", hash = "sha256:ed1f6d2e806a4758a73be589754384cbadadc7e1a414c81a166fc9adf2d40c6d", size = 16893, upload-time = "2017-07-23T11:18:29.85Z" }
-
 [[package]]
 name = "tldextract"
 version = "5.3.0"
@@ -3591,7 +3566,7 @@ dependencies = [
     { name = "langchain-google-genai" },
     { name = "langchain-openai" },
     { name = "langgraph" },
-    { name = "newspaper3k" },
+    { name = "newspaper4k" },
     { name = "pandas" },
     { name = "parsel" },
     { name = "praw" },
@@ -3642,7 +3617,7 @@ requires-dist = [
     { name = "langchain-google-genai", specifier = ">=2.1.5" },
     { name = "langchain-openai", specifier = ">=0.3.23" },
     { name = "langgraph", specifier = ">=0.4.8" },
-    { name = "newspaper3k", specifier = ">=0.2.8" },
+    { name = "newspaper4k", specifier = ">=0.9.3" },
     { name = "pandas", specifier = ">=2.3.0" },
     { name = "parsel", specifier = ">=1.10.0" },
     { name = "praw", specifier = ">=7.8.1" },