feat(dataflows): add multi-timeframe OHLCV aggregation - Fixes #9
This commit is contained in:
parent
4d693fb331
commit
19171a4b31
20
CHANGELOG.md
20
CHANGELOG.md
|
|
@ -89,6 +89,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
|||
- Test coverage including rate limit handling, caching behavior, and date range filtering
|
||||
- Total: 108 tests added for FRED API feature
|
||||
|
||||
- Multi-timeframe OHLCV aggregation functions (Issue #9)
|
||||
- Multi-timeframe aggregation module for daily to weekly/monthly resampling [file:tradingagents/dataflows/multi_timeframe.py](tradingagents/dataflows/multi_timeframe.py) (320 lines)
|
||||
- Core OHLCV aggregation validation function _validate_ohlcv_dataframe() [file:tradingagents/dataflows/multi_timeframe.py:38-75](tradingagents/dataflows/multi_timeframe.py)
|
||||
- Core resampling function _resample_ohlcv() with proper aggregation rules [file:tradingagents/dataflows/multi_timeframe.py:78-139](tradingagents/dataflows/multi_timeframe.py)
|
||||
- OHLCV aggregation rules: Open=first, High=max, Low=min, Close=last, Volume=sum [file:tradingagents/dataflows/multi_timeframe.py:105-111](tradingagents/dataflows/multi_timeframe.py)
|
||||
- Weekly aggregation function aggregate_to_weekly() with configurable week anchor [file:tradingagents/dataflows/multi_timeframe.py:142-220](tradingagents/dataflows/multi_timeframe.py)
|
||||
- Support for Sunday/Monday week anchors with automatic day-of-week mapping [file:tradingagents/dataflows/multi_timeframe.py:180-201](tradingagents/dataflows/multi_timeframe.py)
|
||||
- Month boundary handling for weeks spanning multiple months
|
||||
- Monthly aggregation function aggregate_to_monthly() with period end/start labeling [file:tradingagents/dataflows/multi_timeframe.py:223-320](tradingagents/dataflows/multi_timeframe.py)
|
||||
- Support for month-end (ME) and month-start (MS) labels via period_end parameter
|
||||
- Timezone preservation for both UTC and localized datetime indices
|
||||
- Input validation with descriptive error messages for missing/invalid data
|
||||
- OHLCV value rounding to 2 decimal places for data consistency
|
||||
- Return type: DataFrame on success, error string on validation failure
|
||||
- Comprehensive docstrings with examples for all public functions
|
||||
- Unit test suite for aggregation functions [file:tests/unit/dataflows/test_multi_timeframe.py](tests/unit/dataflows/test_multi_timeframe.py) (29 tests)
|
||||
- Integration test suite for multi-timeframe workflows [file:tests/integration/dataflows/test_multi_timeframe_integration.py](tests/integration/dataflows/test_multi_timeframe_integration.py) (13 tests)
|
||||
- Test coverage includes validation, weekly aggregation (multiple anchors), monthly aggregation, timezone handling, and partial period aggregation
|
||||
- Total: 42 tests added for multi-timeframe aggregation feature
|
||||
|
||||
- User model enhancement with profile and API key management (Issue #3)
|
||||
- Extended User model with tax_jurisdiction and timezone fields [file:tradingagents/api/models/user.py:47-54](tradingagents/api/models/user.py)
|
||||
- Tax jurisdiction field supporting country (e.g., "US", "AU") and state/province level codes (e.g., "US-CA", "AU-NSW")
|
||||
|
|
|
|||
|
|
@ -207,6 +207,90 @@ except FredInvalidSeriesError as e:
|
|||
print(f"Invalid FRED series: {e.series_id}")
|
||||
```
|
||||
|
||||
### Multi-Timeframe Aggregation
|
||||
|
||||
**Location**: `tradingagents/dataflows/multi_timeframe.py`
|
||||
|
||||
**Capabilities**:
|
||||
- Convert daily OHLCV data to weekly timeframe
|
||||
- Convert daily OHLCV data to monthly timeframe
|
||||
- Preserve timezone information
|
||||
- Handle partial periods
|
||||
|
||||
**Setup**: No external dependencies, uses pandas
|
||||
|
||||
**Features**:
|
||||
- Proper OHLCV aggregation rules: Open=first, High=max, Low=min, Close=last, Volume=sum
|
||||
- Configurable week anchor (Sunday or Monday)
|
||||
- Month-end or month-start labels for aggregated periods
|
||||
- Input validation with descriptive error messages
|
||||
- Returns DataFrame on success, error string on failure
|
||||
|
||||
**Example**:
|
||||
```python
|
||||
from tradingagents.dataflows.multi_timeframe import (
|
||||
aggregate_to_weekly,
|
||||
aggregate_to_monthly
|
||||
)
|
||||
import pandas as pd
|
||||
|
||||
# Create sample daily data
|
||||
dates = pd.date_range('2024-01-01', periods=60, freq='D')
|
||||
daily_data = pd.DataFrame({
|
||||
'Open': range(100, 160),
|
||||
'High': range(102, 162),
|
||||
'Low': range(99, 159),
|
||||
'Close': range(101, 161),
|
||||
'Volume': range(1000000, 1060000, 1000)
|
||||
}, index=dates)
|
||||
|
||||
# Aggregate to weekly (Sunday anchor, default)
|
||||
weekly = aggregate_to_weekly(daily_data, anchor='SUN')
|
||||
# Returns DataFrame with weekly OHLCV bars
|
||||
|
||||
# Aggregate to weekly (Monday anchor)
|
||||
weekly_mon = aggregate_to_weekly(daily_data, anchor='MON')
|
||||
|
||||
# Aggregate to monthly (month-end labels)
|
||||
monthly = aggregate_to_monthly(daily_data, period_end=True)
|
||||
# Returns DataFrame with monthly OHLCV bars
|
||||
|
||||
# Aggregate to monthly (month-start labels)
|
||||
monthly_start = aggregate_to_monthly(daily_data, period_end=False)
|
||||
```
|
||||
|
||||
**Available Functions**:
|
||||
- `aggregate_to_weekly(data, anchor='SUN')` - Convert daily to weekly bars
|
||||
- Supports week anchors: 'SUN' (Sunday), 'MON' (Monday)
|
||||
- Returns DataFrame with weekly aggregated OHLCV data
|
||||
- `aggregate_to_monthly(data, period_end=True)` - Convert daily to monthly bars
|
||||
- period_end=True: Month-end labels and boundaries
|
||||
- period_end=False: Month-start labels and boundaries
|
||||
- Returns DataFrame with monthly aggregated OHLCV data
|
||||
|
||||
**Return Formats**:
|
||||
- Success: pandas DataFrame with DatetimeIndex and OHLCV columns
|
||||
- Failure: Error string describing validation error
|
||||
|
||||
**Error Handling**:
|
||||
```python
|
||||
result = aggregate_to_weekly(data, anchor='SUN')
|
||||
if isinstance(result, str):
|
||||
print(f"Error: {result}")
|
||||
else:
|
||||
print(f"Weekly data: {result}")
|
||||
```
|
||||
|
||||
**Validation Requirements**:
|
||||
- DataFrame must not be empty
|
||||
- DataFrame must have DatetimeIndex
|
||||
- DataFrame must contain columns: Open, High, Low, Close, Volume
|
||||
|
||||
**Timezone Notes**:
|
||||
- Timezone information in the index is preserved through aggregation
|
||||
- Both UTC and localized timezones (e.g., America/New_York) are supported
|
||||
- Partial periods (e.g., < 7 days for weekly) are aggregated correctly
|
||||
|
||||
### Local Cache
|
||||
|
||||
**Location**: `tradingagents/dataflows/local.py`
|
||||
|
|
|
|||
|
|
@ -0,0 +1,433 @@
|
|||
"""
|
||||
Test suite for Multi-Timeframe Aggregation Integration Tests.
|
||||
|
||||
This module tests:
|
||||
1. Integration with yfinance data format
|
||||
2. Timezone handling in datetime indices
|
||||
3. Volume preservation across aggregations
|
||||
4. Real-world edge cases (gaps in data, single day, etc.)
|
||||
5. End-to-end workflows (daily -> weekly -> monthly)
|
||||
|
||||
Test Coverage:
|
||||
- Integration tests with yfinance-like data formats
|
||||
- Timezone-aware datetime handling
|
||||
- Data gaps and missing days (weekends, holidays)
|
||||
- Volume accuracy across transformations
|
||||
- Multiple aggregation chaining
|
||||
"""
|
||||
|
||||
import pytest
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
from datetime import datetime, timedelta
|
||||
from unittest.mock import Mock, patch
|
||||
|
||||
pytestmark = pytest.mark.integration
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Fixtures
|
||||
# ============================================================================
|
||||
|
||||
@pytest.fixture
|
||||
def yfinance_format_data():
|
||||
"""
|
||||
Create data in yfinance format with timezone-aware DatetimeIndex.
|
||||
|
||||
yfinance returns data with:
|
||||
- Timezone-aware datetime index (usually UTC or exchange timezone)
|
||||
- Capitalized column names (Open, High, Low, Close, Volume)
|
||||
- Business day frequency (no weekends)
|
||||
- Potential gaps for holidays
|
||||
"""
|
||||
# Create 30 business days (excludes weekends)
|
||||
dates = pd.bdate_range('2024-01-01', periods=30, freq='B', tz='America/New_York')
|
||||
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.0 + i * 0.5 for i in range(30)],
|
||||
'High': [102.0 + i * 0.5 for i in range(30)],
|
||||
'Low': [99.0 + i * 0.5 for i in range(30)],
|
||||
'Close': [101.0 + i * 0.5 for i in range(30)],
|
||||
'Volume': [1000000 + i * 10000 for i in range(30)],
|
||||
}, index=dates)
|
||||
|
||||
return data
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def data_with_gaps():
|
||||
"""
|
||||
Create data with gaps (missing days for weekends and holidays).
|
||||
|
||||
Simulates real market data where weekends and holidays are missing.
|
||||
"""
|
||||
# Create dates but skip weekends and one holiday (Jan 15)
|
||||
all_dates = pd.date_range('2024-01-01', '2024-01-31', freq='D')
|
||||
|
||||
# Filter to business days and remove Jan 15 (MLK Day)
|
||||
business_dates = [d for d in all_dates if d.weekday() < 5 and d.day != 15]
|
||||
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.0 + i * 0.5 for i in range(len(business_dates))],
|
||||
'High': [102.0 + i * 0.5 for i in range(len(business_dates))],
|
||||
'Low': [99.0 + i * 0.5 for i in range(len(business_dates))],
|
||||
'Close': [101.0 + i * 0.5 for i in range(len(business_dates))],
|
||||
'Volume': [1000000 + i * 10000 for i in range(len(business_dates))],
|
||||
}, index=pd.DatetimeIndex(business_dates))
|
||||
|
||||
return data
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def timezone_aware_data():
|
||||
"""Create data with different timezone configurations."""
|
||||
dates_utc = pd.date_range('2024-01-01', periods=30, freq='D', tz='UTC')
|
||||
dates_est = pd.date_range('2024-01-01', periods=30, freq='D', tz='America/New_York')
|
||||
dates_jst = pd.date_range('2024-01-01', periods=30, freq='D', tz='Asia/Tokyo')
|
||||
|
||||
base_data = {
|
||||
'Open': [100.0 + i * 0.5 for i in range(30)],
|
||||
'High': [102.0 + i * 0.5 for i in range(30)],
|
||||
'Low': [99.0 + i * 0.5 for i in range(30)],
|
||||
'Close': [101.0 + i * 0.5 for i in range(30)],
|
||||
'Volume': [1000000 + i * 10000 for i in range(30)],
|
||||
}
|
||||
|
||||
return {
|
||||
'utc': pd.DataFrame(base_data, index=dates_utc),
|
||||
'est': pd.DataFrame(base_data, index=dates_est),
|
||||
'jst': pd.DataFrame(base_data, index=dates_jst),
|
||||
}
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Test YFinance Integration
|
||||
# ============================================================================
|
||||
|
||||
class TestYFinanceIntegration:
|
||||
"""Test aggregation with yfinance-like data formats."""
|
||||
|
||||
def test_aggregation_with_yfinance_format(self, yfinance_format_data):
|
||||
"""Should handle yfinance format data correctly."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
result = aggregate_to_weekly(yfinance_format_data, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) > 0
|
||||
|
||||
# Should preserve timezone awareness
|
||||
assert result.index.tz is not None
|
||||
assert str(result.index.tz) == 'America/New_York'
|
||||
|
||||
# Should have correct OHLCV columns
|
||||
assert all(col in result.columns for col in ['Open', 'High', 'Low', 'Close', 'Volume'])
|
||||
|
||||
# Verify aggregation logic
|
||||
assert result.iloc[0]['Open'] == yfinance_format_data.iloc[0]['Open']
|
||||
assert result.iloc[-1]['Close'] == yfinance_format_data.iloc[-1]['Close']
|
||||
|
||||
def test_timezone_handling(self, timezone_aware_data):
|
||||
"""Should preserve timezone information across aggregations."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly, aggregate_to_monthly
|
||||
|
||||
for tz_name, data in timezone_aware_data.items():
|
||||
# Test weekly aggregation
|
||||
weekly = aggregate_to_weekly(data, anchor='SUN')
|
||||
assert isinstance(weekly, pd.DataFrame)
|
||||
assert weekly.index.tz is not None
|
||||
|
||||
# Test monthly aggregation
|
||||
monthly = aggregate_to_monthly(data, period_end=True)
|
||||
assert isinstance(monthly, pd.DataFrame)
|
||||
assert monthly.index.tz is not None
|
||||
|
||||
def test_volume_preservation(self, yfinance_format_data):
|
||||
"""Total volume should be preserved across aggregations."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly, aggregate_to_monthly
|
||||
|
||||
original_total_volume = yfinance_format_data['Volume'].sum()
|
||||
|
||||
# Test weekly aggregation preserves volume
|
||||
weekly = aggregate_to_weekly(yfinance_format_data, anchor='SUN')
|
||||
assert isinstance(weekly, pd.DataFrame)
|
||||
weekly_total_volume = weekly['Volume'].sum()
|
||||
assert weekly_total_volume == original_total_volume
|
||||
|
||||
# Test monthly aggregation preserves volume
|
||||
monthly = aggregate_to_monthly(yfinance_format_data, period_end=True)
|
||||
assert isinstance(monthly, pd.DataFrame)
|
||||
monthly_total_volume = monthly['Volume'].sum()
|
||||
assert monthly_total_volume == original_total_volume
|
||||
|
||||
def test_business_day_frequency_handling(self):
|
||||
"""Should handle business day frequency (no weekends) correctly."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
# Create 20 business days (4 weeks excluding weekends)
|
||||
dates = pd.bdate_range('2024-01-01', periods=20, freq='B')
|
||||
data = pd.DataFrame({
|
||||
'Open': range(100, 120),
|
||||
'High': range(102, 122),
|
||||
'Low': range(99, 119),
|
||||
'Close': range(101, 121),
|
||||
'Volume': range(1000000, 1020000, 1000),
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_weekly(data, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
# Should create appropriate number of weeks
|
||||
assert len(result) >= 4
|
||||
|
||||
# Verify volume preservation
|
||||
assert result['Volume'].sum() == data['Volume'].sum()
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Test Edge Cases
|
||||
# ============================================================================
|
||||
|
||||
class TestEdgeCases:
|
||||
"""Test edge cases and real-world scenarios."""
|
||||
|
||||
def test_single_day_data(self):
|
||||
"""Should handle single day of data correctly."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly, aggregate_to_monthly
|
||||
|
||||
dates = pd.date_range('2024-01-15', periods=1, freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.0],
|
||||
'High': [102.0],
|
||||
'Low': [99.0],
|
||||
'Close': [101.0],
|
||||
'Volume': [1000000],
|
||||
}, index=dates)
|
||||
|
||||
# Weekly aggregation
|
||||
weekly = aggregate_to_weekly(data, anchor='SUN')
|
||||
assert isinstance(weekly, pd.DataFrame)
|
||||
assert len(weekly) == 1
|
||||
assert weekly.iloc[0]['Open'] == 100.0
|
||||
assert weekly.iloc[0]['High'] == 102.0
|
||||
assert weekly.iloc[0]['Low'] == 99.0
|
||||
assert weekly.iloc[0]['Close'] == 101.0
|
||||
assert weekly.iloc[0]['Volume'] == 1000000
|
||||
|
||||
# Monthly aggregation
|
||||
monthly = aggregate_to_monthly(data, period_end=True)
|
||||
assert isinstance(monthly, pd.DataFrame)
|
||||
assert len(monthly) == 1
|
||||
assert monthly.iloc[0]['Open'] == 100.0
|
||||
assert monthly.iloc[0]['High'] == 102.0
|
||||
assert monthly.iloc[0]['Low'] == 99.0
|
||||
assert monthly.iloc[0]['Close'] == 101.0
|
||||
assert monthly.iloc[0]['Volume'] == 1000000
|
||||
|
||||
def test_data_with_gaps(self, data_with_gaps):
|
||||
"""Should handle data with gaps (weekends, holidays) correctly."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly, aggregate_to_monthly
|
||||
|
||||
# Weekly aggregation should work with gaps
|
||||
weekly = aggregate_to_weekly(data_with_gaps, anchor='SUN')
|
||||
assert isinstance(weekly, pd.DataFrame)
|
||||
assert len(weekly) > 0
|
||||
|
||||
# Volume should be preserved despite gaps
|
||||
assert weekly['Volume'].sum() == data_with_gaps['Volume'].sum()
|
||||
|
||||
# Monthly aggregation should work with gaps
|
||||
monthly = aggregate_to_monthly(data_with_gaps, period_end=True)
|
||||
assert isinstance(monthly, pd.DataFrame)
|
||||
assert len(monthly) == 1 # All data in January
|
||||
|
||||
# Verify aggregation accuracy
|
||||
assert monthly.iloc[0]['Open'] == data_with_gaps.iloc[0]['Open']
|
||||
assert monthly.iloc[0]['Close'] == data_with_gaps.iloc[-1]['Close']
|
||||
assert monthly.iloc[0]['High'] == data_with_gaps['High'].max()
|
||||
assert monthly.iloc[0]['Low'] == data_with_gaps['Low'].min()
|
||||
assert monthly.iloc[0]['Volume'] == data_with_gaps['Volume'].sum()
|
||||
|
||||
def test_multiple_months_with_gaps(self):
|
||||
"""Should handle multiple months with gaps correctly."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
# Create 3 months of business days only
|
||||
dates = pd.bdate_range('2024-01-01', '2024-03-31', freq='B')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.0 + i * 0.1 for i in range(len(dates))],
|
||||
'High': [102.0 + i * 0.1 for i in range(len(dates))],
|
||||
'Low': [99.0 + i * 0.1 for i in range(len(dates))],
|
||||
'Close': [101.0 + i * 0.1 for i in range(len(dates))],
|
||||
'Volume': [1000000 + i * 1000 for i in range(len(dates))],
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_monthly(data, period_end=True)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) == 3 # Jan, Feb, Mar
|
||||
|
||||
# Each month should have correct aggregations
|
||||
for i in range(len(result)):
|
||||
month_data = data[data.index.month == (i + 1)]
|
||||
assert result.iloc[i]['Open'] == month_data.iloc[0]['Open']
|
||||
assert result.iloc[i]['Close'] == month_data.iloc[-1]['Close']
|
||||
assert result.iloc[i]['High'] == month_data['High'].max()
|
||||
assert result.iloc[i]['Low'] == month_data['Low'].min()
|
||||
assert result.iloc[i]['Volume'] == month_data['Volume'].sum()
|
||||
|
||||
def test_intraday_to_daily_aggregation(self):
|
||||
"""Should handle intraday data aggregation to daily."""
|
||||
from tradingagents.dataflows.multi_timeframe import _resample_ohlcv
|
||||
|
||||
# Create 1 day of hourly data (9:30 AM to 4:00 PM = 7 hours)
|
||||
dates = pd.date_range('2024-01-15 09:30', periods=7, freq='h')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.0, 101.0, 100.5, 102.0, 101.5, 103.0, 102.5],
|
||||
'High': [101.5, 102.0, 101.5, 103.0, 102.5, 104.0, 103.5],
|
||||
'Low': [99.5, 100.5, 100.0, 101.5, 101.0, 102.5, 102.0],
|
||||
'Close': [101.0, 100.5, 102.0, 101.5, 103.0, 102.5, 103.5],
|
||||
'Volume': [100000, 150000, 120000, 180000, 140000, 160000, 110000],
|
||||
}, index=dates)
|
||||
|
||||
# Aggregate to daily using 'D' frequency
|
||||
result = _resample_ohlcv(data, freq='D', label='right', closed='right')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) == 1
|
||||
|
||||
# Verify daily aggregation
|
||||
assert result.iloc[0]['Open'] == 100.0 # First hour's open
|
||||
assert result.iloc[0]['High'] == 104.0 # Max of all hours
|
||||
assert result.iloc[0]['Low'] == 99.5 # Min of all hours
|
||||
assert result.iloc[0]['Close'] == 103.5 # Last hour's close
|
||||
assert result.iloc[0]['Volume'] == 960000 # Sum of all hours
|
||||
|
||||
def test_chained_aggregations(self):
|
||||
"""Should support chaining aggregations (daily -> weekly -> monthly)."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly, aggregate_to_monthly
|
||||
|
||||
# Create 60 days of daily data
|
||||
dates = pd.date_range('2024-01-01', periods=60, freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.0 + i * 0.1 for i in range(60)],
|
||||
'High': [102.0 + i * 0.1 for i in range(60)],
|
||||
'Low': [99.0 + i * 0.1 for i in range(60)],
|
||||
'Close': [101.0 + i * 0.1 for i in range(60)],
|
||||
'Volume': [1000000 + i * 1000 for i in range(60)],
|
||||
}, index=dates)
|
||||
|
||||
original_volume = data['Volume'].sum()
|
||||
|
||||
# Daily -> Weekly
|
||||
weekly = aggregate_to_weekly(data, anchor='SUN')
|
||||
assert isinstance(weekly, pd.DataFrame)
|
||||
assert weekly['Volume'].sum() == original_volume
|
||||
|
||||
# Weekly -> Monthly (aggregate weekly data to monthly)
|
||||
monthly = aggregate_to_monthly(weekly, period_end=True)
|
||||
assert isinstance(monthly, pd.DataFrame)
|
||||
assert monthly['Volume'].sum() == original_volume
|
||||
|
||||
# Verify monthly matches direct daily -> monthly
|
||||
monthly_direct = aggregate_to_monthly(data, period_end=True)
|
||||
assert isinstance(monthly_direct, pd.DataFrame)
|
||||
|
||||
# Both paths should preserve total volume
|
||||
assert monthly['Volume'].sum() == monthly_direct['Volume'].sum()
|
||||
|
||||
def test_empty_result_handling(self):
|
||||
"""Should handle cases where resampling produces empty results."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
# Create data with only NaN values
|
||||
dates = pd.date_range('2024-01-01', periods=7, freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': [np.nan] * 7,
|
||||
'High': [np.nan] * 7,
|
||||
'Low': [np.nan] * 7,
|
||||
'Close': [np.nan] * 7,
|
||||
'Volume': [0] * 7,
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_weekly(data, anchor='SUN')
|
||||
|
||||
# Should still return a DataFrame (even if values are NaN)
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
def test_mixed_frequency_data(self):
|
||||
"""Should handle data with mixed frequencies (some days missing)."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
# Create irregular dates (not every day)
|
||||
dates = pd.to_datetime([
|
||||
'2024-01-01', '2024-01-02', '2024-01-04', # Missing Jan 3
|
||||
'2024-01-08', '2024-01-09', # Missing Jan 5-7
|
||||
'2024-01-15', '2024-01-16' # Missing Jan 10-14
|
||||
])
|
||||
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0],
|
||||
'High': [102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0],
|
||||
'Low': [99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
|
||||
'Close': [101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0],
|
||||
'Volume': [1000000, 1100000, 1200000, 1300000, 1400000, 1500000, 1600000],
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_weekly(data, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) > 0
|
||||
|
||||
# Volume should be preserved
|
||||
assert result['Volume'].sum() == data['Volume'].sum()
|
||||
|
||||
def test_leap_year_february(self):
|
||||
"""Should handle February in leap year correctly."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
# 2024 is a leap year (29 days in Feb)
|
||||
dates = pd.date_range('2024-02-01', '2024-02-29', freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.0 + i * 0.1 for i in range(len(dates))],
|
||||
'High': [102.0 + i * 0.1 for i in range(len(dates))],
|
||||
'Low': [99.0 + i * 0.1 for i in range(len(dates))],
|
||||
'Close': [101.0 + i * 0.1 for i in range(len(dates))],
|
||||
'Volume': [1000000 + i * 1000 for i in range(len(dates))],
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_monthly(data, period_end=True)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) == 1
|
||||
assert result.index[0].day == 29 # Should end on Feb 29
|
||||
|
||||
# Verify aggregation
|
||||
assert result.iloc[0]['Open'] == data.iloc[0]['Open']
|
||||
assert result.iloc[0]['Close'] == data.iloc[-1]['Close']
|
||||
assert result.iloc[0]['Volume'] == data['Volume'].sum()
|
||||
|
||||
def test_year_end_rollover(self):
|
||||
"""Should handle year-end rollover correctly."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly, aggregate_to_monthly
|
||||
|
||||
# Create data spanning year boundary
|
||||
dates = pd.date_range('2023-12-25', '2024-01-05', freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.0 + i * 0.1 for i in range(len(dates))],
|
||||
'High': [102.0 + i * 0.1 for i in range(len(dates))],
|
||||
'Low': [99.0 + i * 0.1 for i in range(len(dates))],
|
||||
'Close': [101.0 + i * 0.1 for i in range(len(dates))],
|
||||
'Volume': [1000000 + i * 1000 for i in range(len(dates))],
|
||||
}, index=dates)
|
||||
|
||||
# Weekly aggregation
|
||||
weekly = aggregate_to_weekly(data, anchor='SUN')
|
||||
assert isinstance(weekly, pd.DataFrame)
|
||||
assert weekly['Volume'].sum() == data['Volume'].sum()
|
||||
|
||||
# Monthly aggregation
|
||||
monthly = aggregate_to_monthly(data, period_end=True)
|
||||
assert isinstance(monthly, pd.DataFrame)
|
||||
assert len(monthly) == 2 # December and January
|
||||
assert monthly['Volume'].sum() == data['Volume'].sum()
|
||||
|
|
@ -0,0 +1,658 @@
|
|||
"""
|
||||
Test suite for Multi-Timeframe Aggregation Functions (multi_timeframe.py).
|
||||
|
||||
This module tests:
|
||||
1. _validate_ohlcv_dataframe() - Input validation for OHLCV data
|
||||
2. aggregate_to_weekly() - Daily to weekly aggregation with configurable anchor
|
||||
3. aggregate_to_monthly() - Daily to monthly aggregation with period labeling
|
||||
4. _resample_ohlcv() - Core resampling logic for OHLCV data
|
||||
|
||||
Test Coverage:
|
||||
- Unit tests for each function
|
||||
- OHLCV aggregation rules (Open=first, High=max, Low=min, Close=last, Volume=sum)
|
||||
- Week anchor handling (Sunday, Monday)
|
||||
- Month label handling (period start vs period end)
|
||||
- Edge cases (partial periods, single day, empty data)
|
||||
- Validation (missing columns, wrong index type, empty dataframes)
|
||||
- Numeric precision (2 decimal places for OHLC)
|
||||
|
||||
OHLCV Aggregation Rules:
|
||||
- Open: 'first' (first value of period)
|
||||
- High: 'max' (maximum of period)
|
||||
- Low: 'min' (minimum of period)
|
||||
- Close: 'last' (last value of period)
|
||||
- Volume: 'sum' (total volume, NOT mean)
|
||||
"""
|
||||
|
||||
import pytest
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
from datetime import datetime, timedelta
|
||||
|
||||
pytestmark = pytest.mark.unit
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Fixtures
|
||||
# ============================================================================
|
||||
|
||||
@pytest.fixture
|
||||
def sample_daily_ohlcv():
|
||||
"""
|
||||
Create 30 days of sample daily OHLCV data for January 2024.
|
||||
|
||||
Returns a DataFrame with DatetimeIndex and columns: Open, High, Low, Close, Volume.
|
||||
Each day has distinct values to verify aggregation logic.
|
||||
"""
|
||||
dates = pd.date_range('2024-01-01', periods=30, freq='D')
|
||||
|
||||
# Generate realistic OHLCV data with variation
|
||||
data = []
|
||||
base_price = 100.0
|
||||
|
||||
for i, date in enumerate(dates):
|
||||
open_price = base_price + i * 0.5
|
||||
high_price = open_price + 2.0 + (i % 3) * 0.5
|
||||
low_price = open_price - 1.5 - (i % 2) * 0.3
|
||||
close_price = open_price + 0.5 + (i % 5) * 0.2
|
||||
volume = 1000000 + i * 10000
|
||||
|
||||
data.append({
|
||||
'Open': round(open_price, 2),
|
||||
'High': round(high_price, 2),
|
||||
'Low': round(low_price, 2),
|
||||
'Close': round(close_price, 2),
|
||||
'Volume': volume
|
||||
})
|
||||
|
||||
df = pd.DataFrame(data, index=dates)
|
||||
return df
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def empty_dataframe():
|
||||
"""Create empty DataFrame for validation testing."""
|
||||
return pd.DataFrame()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def missing_volume_data():
|
||||
"""Create OHLC DataFrame without Volume column."""
|
||||
dates = pd.date_range('2024-01-01', periods=5, freq='D')
|
||||
return pd.DataFrame({
|
||||
'Open': [100.0, 101.0, 102.0, 103.0, 104.0],
|
||||
'High': [102.0, 103.0, 104.0, 105.0, 106.0],
|
||||
'Low': [99.0, 100.0, 101.0, 102.0, 103.0],
|
||||
'Close': [101.0, 102.0, 103.0, 104.0, 105.0],
|
||||
}, index=dates)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def no_datetime_index_data():
|
||||
"""Create DataFrame with integer index instead of DatetimeIndex."""
|
||||
return pd.DataFrame({
|
||||
'Open': [100.0, 101.0, 102.0],
|
||||
'High': [102.0, 103.0, 104.0],
|
||||
'Low': [99.0, 100.0, 101.0],
|
||||
'Close': [101.0, 102.0, 103.0],
|
||||
'Volume': [1000000, 1100000, 1200000],
|
||||
})
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def partial_week_data():
|
||||
"""Create 3 days of OHLCV data (incomplete week)."""
|
||||
dates = pd.date_range('2024-01-01', periods=3, freq='D')
|
||||
return pd.DataFrame({
|
||||
'Open': [100.0, 101.0, 102.0],
|
||||
'High': [102.0, 103.0, 104.0],
|
||||
'Low': [99.0, 100.0, 101.0],
|
||||
'Close': [101.0, 102.0, 103.0],
|
||||
'Volume': [1000000, 1100000, 1200000],
|
||||
}, index=dates)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def single_day_data():
|
||||
"""Create 1 day of OHLCV data."""
|
||||
dates = pd.date_range('2024-01-15', periods=1, freq='D')
|
||||
return pd.DataFrame({
|
||||
'Open': [100.0],
|
||||
'High': [102.0],
|
||||
'Low': [99.0],
|
||||
'Close': [101.0],
|
||||
'Volume': [1000000],
|
||||
}, index=dates)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def data_with_extra_columns():
|
||||
"""Create OHLCV data with extra columns that should be ignored."""
|
||||
dates = pd.date_range('2024-01-01', periods=5, freq='D')
|
||||
return pd.DataFrame({
|
||||
'Open': [100.0, 101.0, 102.0, 103.0, 104.0],
|
||||
'High': [102.0, 103.0, 104.0, 105.0, 106.0],
|
||||
'Low': [99.0, 100.0, 101.0, 102.0, 103.0],
|
||||
'Close': [101.0, 102.0, 103.0, 104.0, 105.0],
|
||||
'Volume': [1000000, 1100000, 1200000, 1300000, 1400000],
|
||||
'ExtraColumn1': [1, 2, 3, 4, 5],
|
||||
'ExtraColumn2': ['a', 'b', 'c', 'd', 'e'],
|
||||
}, index=dates)
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Test _validate_ohlcv_dataframe()
|
||||
# ============================================================================
|
||||
|
||||
class TestValidation:
|
||||
"""Test input validation for OHLCV dataframes."""
|
||||
|
||||
def test_empty_dataframe_returns_error(self, empty_dataframe):
|
||||
"""Empty DataFrame should return validation error."""
|
||||
from tradingagents.dataflows.multi_timeframe import _validate_ohlcv_dataframe
|
||||
|
||||
error = _validate_ohlcv_dataframe(empty_dataframe)
|
||||
|
||||
assert error is not None
|
||||
assert isinstance(error, str)
|
||||
assert 'empty' in error.lower() or 'no data' in error.lower()
|
||||
|
||||
def test_missing_datetime_index_returns_error(self, no_datetime_index_data):
|
||||
"""DataFrame without DatetimeIndex should return validation error."""
|
||||
from tradingagents.dataflows.multi_timeframe import _validate_ohlcv_dataframe
|
||||
|
||||
error = _validate_ohlcv_dataframe(no_datetime_index_data)
|
||||
|
||||
assert error is not None
|
||||
assert isinstance(error, str)
|
||||
assert 'datetime' in error.lower() or 'index' in error.lower()
|
||||
|
||||
def test_missing_volume_column_returns_error(self, missing_volume_data):
|
||||
"""DataFrame without Volume column should return validation error."""
|
||||
from tradingagents.dataflows.multi_timeframe import _validate_ohlcv_dataframe
|
||||
|
||||
error = _validate_ohlcv_dataframe(missing_volume_data)
|
||||
|
||||
assert error is not None
|
||||
assert isinstance(error, str)
|
||||
assert 'volume' in error.lower()
|
||||
|
||||
def test_missing_ohlcv_columns_returns_error(self):
|
||||
"""DataFrame missing any OHLC column should return validation error."""
|
||||
from tradingagents.dataflows.multi_timeframe import _validate_ohlcv_dataframe
|
||||
|
||||
dates = pd.date_range('2024-01-01', periods=5, freq='D')
|
||||
|
||||
# Test missing Open
|
||||
df_no_open = pd.DataFrame({
|
||||
'High': [102.0, 103.0, 104.0, 105.0, 106.0],
|
||||
'Low': [99.0, 100.0, 101.0, 102.0, 103.0],
|
||||
'Close': [101.0, 102.0, 103.0, 104.0, 105.0],
|
||||
'Volume': [1000000, 1100000, 1200000, 1300000, 1400000],
|
||||
}, index=dates)
|
||||
|
||||
error = _validate_ohlcv_dataframe(df_no_open)
|
||||
assert error is not None
|
||||
assert 'open' in error.lower()
|
||||
|
||||
# Test missing High
|
||||
df_no_high = pd.DataFrame({
|
||||
'Open': [100.0, 101.0, 102.0, 103.0, 104.0],
|
||||
'Low': [99.0, 100.0, 101.0, 102.0, 103.0],
|
||||
'Close': [101.0, 102.0, 103.0, 104.0, 105.0],
|
||||
'Volume': [1000000, 1100000, 1200000, 1300000, 1400000],
|
||||
}, index=dates)
|
||||
|
||||
error = _validate_ohlcv_dataframe(df_no_high)
|
||||
assert error is not None
|
||||
assert 'high' in error.lower()
|
||||
|
||||
# Test missing Low
|
||||
df_no_low = pd.DataFrame({
|
||||
'Open': [100.0, 101.0, 102.0, 103.0, 104.0],
|
||||
'High': [102.0, 103.0, 104.0, 105.0, 106.0],
|
||||
'Close': [101.0, 102.0, 103.0, 104.0, 105.0],
|
||||
'Volume': [1000000, 1100000, 1200000, 1300000, 1400000],
|
||||
}, index=dates)
|
||||
|
||||
error = _validate_ohlcv_dataframe(df_no_low)
|
||||
assert error is not None
|
||||
assert 'low' in error.lower()
|
||||
|
||||
# Test missing Close
|
||||
df_no_close = pd.DataFrame({
|
||||
'Open': [100.0, 101.0, 102.0, 103.0, 104.0],
|
||||
'High': [102.0, 103.0, 104.0, 105.0, 106.0],
|
||||
'Low': [99.0, 100.0, 101.0, 102.0, 103.0],
|
||||
'Volume': [1000000, 1100000, 1200000, 1300000, 1400000],
|
||||
}, index=dates)
|
||||
|
||||
error = _validate_ohlcv_dataframe(df_no_close)
|
||||
assert error is not None
|
||||
assert 'close' in error.lower()
|
||||
|
||||
def test_valid_dataframe_returns_none(self, sample_daily_ohlcv):
|
||||
"""Valid OHLCV DataFrame should return None (no error)."""
|
||||
from tradingagents.dataflows.multi_timeframe import _validate_ohlcv_dataframe
|
||||
|
||||
error = _validate_ohlcv_dataframe(sample_daily_ohlcv)
|
||||
|
||||
assert error is None
|
||||
|
||||
def test_extra_columns_ignored(self, data_with_extra_columns):
|
||||
"""DataFrame with extra columns should be valid (extras ignored)."""
|
||||
from tradingagents.dataflows.multi_timeframe import _validate_ohlcv_dataframe
|
||||
|
||||
error = _validate_ohlcv_dataframe(data_with_extra_columns)
|
||||
|
||||
assert error is None
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Test aggregate_to_weekly()
|
||||
# ============================================================================
|
||||
|
||||
class TestWeeklyAggregation:
|
||||
"""Test weekly aggregation from daily OHLCV data."""
|
||||
|
||||
def test_weekly_open_is_first_day(self, sample_daily_ohlcv):
|
||||
"""Weekly Open should be the first day's Open of the week."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
result = aggregate_to_weekly(sample_daily_ohlcv, anchor='SUN')
|
||||
|
||||
# Should not be an error string
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Check first week's Open matches first day in that week
|
||||
# Jan 1, 2024 is a Monday, with Sunday anchor first week starts Dec 31, 2023
|
||||
# We'll verify Open is from the first available day in each period
|
||||
first_week_open = result.iloc[0]['Open']
|
||||
assert first_week_open == sample_daily_ohlcv.iloc[0]['Open']
|
||||
|
||||
def test_weekly_high_is_max_of_period(self, sample_daily_ohlcv):
|
||||
"""Weekly High should be the maximum High of all days in the week."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
result = aggregate_to_weekly(sample_daily_ohlcv, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# First week should have High equal to max of first 7 days' High values
|
||||
first_week_high = result.iloc[0]['High']
|
||||
expected_high = sample_daily_ohlcv.iloc[0:7]['High'].max()
|
||||
|
||||
assert first_week_high == expected_high
|
||||
|
||||
def test_weekly_low_is_min_of_period(self, sample_daily_ohlcv):
|
||||
"""Weekly Low should be the minimum Low of all days in the week."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
result = aggregate_to_weekly(sample_daily_ohlcv, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# First week should have Low equal to min of first 7 days' Low values
|
||||
first_week_low = result.iloc[0]['Low']
|
||||
expected_low = sample_daily_ohlcv.iloc[0:7]['Low'].min()
|
||||
|
||||
assert first_week_low == expected_low
|
||||
|
||||
def test_weekly_close_is_last_day(self, sample_daily_ohlcv):
|
||||
"""Weekly Close should be the last day's Close of the week."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
result = aggregate_to_weekly(sample_daily_ohlcv, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Last week's Close should be from last day in dataset
|
||||
last_week_close = result.iloc[-1]['Close']
|
||||
last_day_close = sample_daily_ohlcv.iloc[-1]['Close']
|
||||
|
||||
assert last_week_close == last_day_close
|
||||
|
||||
def test_weekly_volume_is_sum(self, sample_daily_ohlcv):
|
||||
"""Weekly Volume should be the sum of all days' Volume in the week."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
result = aggregate_to_weekly(sample_daily_ohlcv, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# First week should have Volume equal to sum of first 7 days' Volume
|
||||
first_week_volume = result.iloc[0]['Volume']
|
||||
expected_volume = sample_daily_ohlcv.iloc[0:7]['Volume'].sum()
|
||||
|
||||
assert first_week_volume == expected_volume
|
||||
|
||||
def test_partial_week_handling(self, partial_week_data):
|
||||
"""Should handle partial week (< 7 days) correctly."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
result = aggregate_to_weekly(partial_week_data, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) == 1 # Should create 1 week from 3 days
|
||||
|
||||
# Verify aggregation still works correctly
|
||||
assert result.iloc[0]['Open'] == partial_week_data.iloc[0]['Open']
|
||||
assert result.iloc[0]['Close'] == partial_week_data.iloc[-1]['Close']
|
||||
assert result.iloc[0]['High'] == partial_week_data['High'].max()
|
||||
assert result.iloc[0]['Low'] == partial_week_data['Low'].min()
|
||||
assert result.iloc[0]['Volume'] == partial_week_data['Volume'].sum()
|
||||
|
||||
def test_week_anchor_sunday(self):
|
||||
"""Week anchor='SUN' should start weeks on Sunday."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
# Create data starting on a known Sunday
|
||||
dates = pd.date_range('2024-01-07', periods=14, freq='D') # Jan 7 is Sunday
|
||||
data = pd.DataFrame({
|
||||
'Open': range(100, 114),
|
||||
'High': range(102, 116),
|
||||
'Low': range(99, 113),
|
||||
'Close': range(101, 115),
|
||||
'Volume': range(1000000, 1014000, 1000),
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_weekly(data, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) == 2 # 14 days = 2 full weeks starting Sunday
|
||||
|
||||
def test_week_anchor_monday(self):
|
||||
"""Week anchor='MON' should start weeks on Monday."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
# Create data starting on a known Monday
|
||||
dates = pd.date_range('2024-01-01', periods=14, freq='D') # Jan 1 is Monday
|
||||
data = pd.DataFrame({
|
||||
'Open': range(100, 114),
|
||||
'High': range(102, 116),
|
||||
'Low': range(99, 113),
|
||||
'Close': range(101, 115),
|
||||
'Volume': range(1000000, 1014000, 1000),
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_weekly(data, anchor='MON')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) == 2 # 14 days = 2 full weeks starting Monday
|
||||
|
||||
def test_numeric_rounding_to_2_decimals(self):
|
||||
"""OHLC values should be rounded to 2 decimal places."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
dates = pd.date_range('2024-01-01', periods=7, freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.123, 100.456, 100.789, 101.111, 101.222, 101.333, 101.444],
|
||||
'High': [102.567, 102.678, 102.789, 102.891, 102.912, 102.934, 102.956],
|
||||
'Low': [99.111, 99.222, 99.333, 99.444, 99.555, 99.666, 99.777],
|
||||
'Close': [101.234, 101.345, 101.456, 101.567, 101.678, 101.789, 101.891],
|
||||
'Volume': [1000000, 1100000, 1200000, 1300000, 1400000, 1500000, 1600000],
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_weekly(data, anchor='SUN')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Check all OHLC values have max 2 decimal places
|
||||
for col in ['Open', 'High', 'Low', 'Close']:
|
||||
for value in result[col]:
|
||||
# Convert to string and check decimal places
|
||||
decimal_places = len(str(value).split('.')[-1]) if '.' in str(value) else 0
|
||||
assert decimal_places <= 2, f"{col} value {value} has more than 2 decimal places"
|
||||
|
||||
def test_returns_error_string_on_invalid_input(self, empty_dataframe):
|
||||
"""Should return error string for invalid input."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly
|
||||
|
||||
result = aggregate_to_weekly(empty_dataframe)
|
||||
|
||||
assert isinstance(result, str)
|
||||
assert 'error' in result.lower() or 'empty' in result.lower()
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Test aggregate_to_monthly()
|
||||
# ============================================================================
|
||||
|
||||
class TestMonthlyAggregation:
|
||||
"""Test monthly aggregation from daily OHLCV data."""
|
||||
|
||||
def test_monthly_open_is_first_day(self, sample_daily_ohlcv):
|
||||
"""Monthly Open should be the first day's Open of the month."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
result = aggregate_to_monthly(sample_daily_ohlcv, period_end=True)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# First month's Open should match first day's Open
|
||||
first_month_open = result.iloc[0]['Open']
|
||||
assert first_month_open == sample_daily_ohlcv.iloc[0]['Open']
|
||||
|
||||
def test_monthly_high_is_max(self, sample_daily_ohlcv):
|
||||
"""Monthly High should be the maximum High of all days in the month."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
result = aggregate_to_monthly(sample_daily_ohlcv, period_end=True)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Month High should be max of all days' High values
|
||||
month_high = result.iloc[0]['High']
|
||||
expected_high = sample_daily_ohlcv['High'].max()
|
||||
|
||||
assert month_high == expected_high
|
||||
|
||||
def test_monthly_low_is_min(self, sample_daily_ohlcv):
|
||||
"""Monthly Low should be the minimum Low of all days in the month."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
result = aggregate_to_monthly(sample_daily_ohlcv, period_end=True)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Month Low should be min of all days' Low values
|
||||
month_low = result.iloc[0]['Low']
|
||||
expected_low = sample_daily_ohlcv['Low'].min()
|
||||
|
||||
assert month_low == expected_low
|
||||
|
||||
def test_monthly_close_is_last_day(self, sample_daily_ohlcv):
|
||||
"""Monthly Close should be the last day's Close of the month."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
result = aggregate_to_monthly(sample_daily_ohlcv, period_end=True)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Month Close should be last day's Close
|
||||
month_close = result.iloc[0]['Close']
|
||||
last_day_close = sample_daily_ohlcv.iloc[-1]['Close']
|
||||
|
||||
assert month_close == last_day_close
|
||||
|
||||
def test_monthly_volume_is_sum(self, sample_daily_ohlcv):
|
||||
"""Monthly Volume should be the sum of all days' Volume in the month."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
result = aggregate_to_monthly(sample_daily_ohlcv, period_end=True)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Month Volume should be sum of all days' Volume
|
||||
month_volume = result.iloc[0]['Volume']
|
||||
expected_volume = sample_daily_ohlcv['Volume'].sum()
|
||||
|
||||
assert month_volume == expected_volume
|
||||
|
||||
def test_month_end_label(self):
|
||||
"""period_end=True should label periods with end date."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
# Create 2 months of data
|
||||
dates = pd.date_range('2024-01-01', '2024-02-29', freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': range(100, 100 + len(dates)),
|
||||
'High': range(102, 102 + len(dates)),
|
||||
'Low': range(99, 99 + len(dates)),
|
||||
'Close': range(101, 101 + len(dates)),
|
||||
'Volume': range(1000000, 1000000 + len(dates) * 1000, 1000),
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_monthly(data, period_end=True)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Index should be at month end
|
||||
assert result.index[0].day == 31 # Jan 31
|
||||
assert result.index[1].day == 29 # Feb 29 (2024 is leap year)
|
||||
|
||||
def test_month_start_label(self):
|
||||
"""period_end=False should label periods with start date."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
# Create 2 months of data
|
||||
dates = pd.date_range('2024-01-01', '2024-02-29', freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': range(100, 100 + len(dates)),
|
||||
'High': range(102, 102 + len(dates)),
|
||||
'Low': range(99, 99 + len(dates)),
|
||||
'Close': range(101, 101 + len(dates)),
|
||||
'Volume': range(1000000, 1000000 + len(dates) * 1000, 1000),
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_monthly(data, period_end=False)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Index should be at month start
|
||||
assert result.index[0].day == 1 # Jan 1
|
||||
assert result.index[1].day == 1 # Feb 1
|
||||
|
||||
def test_partial_month_handling(self):
|
||||
"""Should handle partial month (< full month days) correctly."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
# Create 10 days in January
|
||||
dates = pd.date_range('2024-01-01', periods=10, freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': range(100, 110),
|
||||
'High': range(102, 112),
|
||||
'Low': range(99, 109),
|
||||
'Close': range(101, 111),
|
||||
'Volume': range(1000000, 1010000, 1000),
|
||||
}, index=dates)
|
||||
|
||||
result = aggregate_to_monthly(data, period_end=True)
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) == 1 # Should create 1 month from 10 days
|
||||
|
||||
# Verify aggregation still works correctly
|
||||
assert result.iloc[0]['Open'] == data.iloc[0]['Open']
|
||||
assert result.iloc[0]['Close'] == data.iloc[-1]['Close']
|
||||
assert result.iloc[0]['High'] == data['High'].max()
|
||||
assert result.iloc[0]['Low'] == data['Low'].min()
|
||||
assert result.iloc[0]['Volume'] == data['Volume'].sum()
|
||||
|
||||
def test_returns_error_string_on_invalid_input(self, no_datetime_index_data):
|
||||
"""Should return error string for invalid input."""
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_monthly
|
||||
|
||||
result = aggregate_to_monthly(no_datetime_index_data)
|
||||
|
||||
assert isinstance(result, str)
|
||||
assert 'error' in result.lower() or 'datetime' in result.lower()
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Test _resample_ohlcv()
|
||||
# ============================================================================
|
||||
|
||||
class TestResampleOHLCV:
|
||||
"""Test core resampling logic for OHLCV data."""
|
||||
|
||||
def test_applies_correct_aggregations(self):
|
||||
"""Should apply correct aggregation for each OHLCV column."""
|
||||
from tradingagents.dataflows.multi_timeframe import _resample_ohlcv
|
||||
|
||||
dates = pd.date_range('2024-01-01', periods=7, freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100, 101, 102, 103, 104, 105, 106],
|
||||
'High': [102, 103, 104, 105, 106, 107, 108],
|
||||
'Low': [99, 100, 101, 102, 103, 104, 105],
|
||||
'Close': [101, 102, 103, 104, 105, 106, 107],
|
||||
'Volume': [1000000, 1100000, 1200000, 1300000, 1400000, 1500000, 1600000],
|
||||
}, index=dates)
|
||||
|
||||
# Resample to weekly (W-SUN = week ending Sunday)
|
||||
result = _resample_ohlcv(data, freq='W-SUN', label='right', closed='right')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Verify aggregation rules
|
||||
assert result.iloc[0]['Open'] == 100 # First
|
||||
assert result.iloc[0]['High'] == 108 # Max
|
||||
assert result.iloc[0]['Low'] == 99 # Min
|
||||
assert result.iloc[0]['Close'] == 107 # Last
|
||||
assert result.iloc[0]['Volume'] == sum([1000000, 1100000, 1200000, 1300000, 1400000, 1500000, 1600000]) # Sum
|
||||
|
||||
def test_rounds_ohlc_to_2_decimals(self):
|
||||
"""Should round OHLC values to 2 decimal places."""
|
||||
from tradingagents.dataflows.multi_timeframe import _resample_ohlcv
|
||||
|
||||
dates = pd.date_range('2024-01-01', periods=7, freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100.12345] * 7,
|
||||
'High': [102.67891] * 7,
|
||||
'Low': [99.11111] * 7,
|
||||
'Close': [101.99999] * 7,
|
||||
'Volume': [1000000] * 7,
|
||||
}, index=dates)
|
||||
|
||||
result = _resample_ohlcv(data, freq='W-SUN', label='right', closed='right')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
|
||||
# Check rounding
|
||||
assert result.iloc[0]['Open'] == 100.12
|
||||
assert result.iloc[0]['High'] == 102.68
|
||||
assert result.iloc[0]['Low'] == 99.11
|
||||
assert result.iloc[0]['Close'] == 102.00
|
||||
|
||||
def test_preserves_datetime_index(self):
|
||||
"""Should preserve DatetimeIndex in result."""
|
||||
from tradingagents.dataflows.multi_timeframe import _resample_ohlcv
|
||||
|
||||
dates = pd.date_range('2024-01-01', periods=7, freq='D')
|
||||
data = pd.DataFrame({
|
||||
'Open': [100] * 7,
|
||||
'High': [102] * 7,
|
||||
'Low': [99] * 7,
|
||||
'Close': [101] * 7,
|
||||
'Volume': [1000000] * 7,
|
||||
}, index=dates)
|
||||
|
||||
result = _resample_ohlcv(data, freq='W-SUN', label='right', closed='right')
|
||||
|
||||
assert isinstance(result.index, pd.DatetimeIndex)
|
||||
|
||||
def test_handles_single_period(self, single_day_data):
|
||||
"""Should handle data that results in single resampled period."""
|
||||
from tradingagents.dataflows.multi_timeframe import _resample_ohlcv
|
||||
|
||||
result = _resample_ohlcv(single_day_data, freq='W-SUN', label='right', closed='right')
|
||||
|
||||
assert isinstance(result, pd.DataFrame)
|
||||
assert len(result) == 1
|
||||
|
||||
# Values should match original (no aggregation needed)
|
||||
assert result.iloc[0]['Open'] == single_day_data.iloc[0]['Open']
|
||||
assert result.iloc[0]['High'] == single_day_data.iloc[0]['High']
|
||||
assert result.iloc[0]['Low'] == single_day_data.iloc[0]['Low']
|
||||
assert result.iloc[0]['Close'] == single_day_data.iloc[0]['Close']
|
||||
assert result.iloc[0]['Volume'] == single_day_data.iloc[0]['Volume']
|
||||
|
|
@ -0,0 +1,320 @@
|
|||
"""
|
||||
Multi-Timeframe Aggregation Functions.
|
||||
|
||||
This module provides functions for aggregating OHLCV (Open, High, Low, Close, Volume)
|
||||
data across different timeframes:
|
||||
- Daily to Weekly aggregation with configurable week anchor (Sunday/Monday)
|
||||
- Daily to Monthly aggregation with period labeling (start/end)
|
||||
- Core resampling logic with proper OHLCV aggregation rules
|
||||
|
||||
OHLCV Aggregation Rules:
|
||||
- Open: 'first' (first value of period)
|
||||
- High: 'max' (maximum value of period)
|
||||
- Low: 'min' (minimum value of period)
|
||||
- Close: 'last' (last value of period)
|
||||
- Volume: 'sum' (total volume, NOT average)
|
||||
|
||||
All functions validate input data and return either a DataFrame on success
|
||||
or an error string on failure.
|
||||
|
||||
Usage:
|
||||
from tradingagents.dataflows.multi_timeframe import aggregate_to_weekly, aggregate_to_monthly
|
||||
|
||||
# Aggregate daily data to weekly (week ending Sunday)
|
||||
weekly_data = aggregate_to_weekly(daily_df, anchor='SUN')
|
||||
|
||||
# Aggregate daily data to monthly (month-end labels)
|
||||
monthly_data = aggregate_to_monthly(daily_df, period_end=True)
|
||||
|
||||
Requirements:
|
||||
- pandas package
|
||||
- Input DataFrame must have DatetimeIndex
|
||||
- Input DataFrame must contain columns: Open, High, Low, Close, Volume
|
||||
"""
|
||||
|
||||
import pandas as pd
|
||||
from typing import Union
|
||||
|
||||
|
||||
def _validate_ohlcv_dataframe(data: pd.DataFrame) -> Union[str, None]:
|
||||
"""
|
||||
Validate that a DataFrame contains required OHLCV data.
|
||||
|
||||
Checks for:
|
||||
1. Non-empty DataFrame
|
||||
2. DatetimeIndex
|
||||
3. Required OHLCV columns (Open, High, Low, Close, Volume)
|
||||
|
||||
Args:
|
||||
data: DataFrame to validate
|
||||
|
||||
Returns:
|
||||
None if valid, error string describing the issue if invalid
|
||||
|
||||
Examples:
|
||||
>>> df = pd.DataFrame({'Open': [100], 'High': [102], 'Low': [99],
|
||||
... 'Close': [101], 'Volume': [1000000]},
|
||||
... index=pd.date_range('2024-01-01', periods=1))
|
||||
>>> _validate_ohlcv_dataframe(df)
|
||||
None
|
||||
|
||||
>>> empty_df = pd.DataFrame()
|
||||
>>> error = _validate_ohlcv_dataframe(empty_df)
|
||||
>>> 'empty' in error.lower()
|
||||
True
|
||||
"""
|
||||
# Check if DataFrame is empty
|
||||
if data.empty:
|
||||
return "Error: Empty DataFrame provided"
|
||||
|
||||
# Check if index is DatetimeIndex
|
||||
if not isinstance(data.index, pd.DatetimeIndex):
|
||||
return "Error: DataFrame must have DatetimeIndex as index"
|
||||
|
||||
# Check for required OHLCV columns
|
||||
required_columns = ['Open', 'High', 'Low', 'Close', 'Volume']
|
||||
missing_columns = [col for col in required_columns if col not in data.columns]
|
||||
|
||||
if missing_columns:
|
||||
missing_str = ', '.join(missing_columns)
|
||||
return f"Error: Missing required OHLCV columns: {missing_str}"
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def _resample_ohlcv(
|
||||
data: pd.DataFrame,
|
||||
freq: str,
|
||||
label: str = 'right',
|
||||
closed: str = 'right'
|
||||
) -> pd.DataFrame:
|
||||
"""
|
||||
Resample OHLCV data to a specified frequency.
|
||||
|
||||
Applies proper aggregation for each OHLCV column:
|
||||
- Open: first value of period
|
||||
- High: max value of period
|
||||
- Low: min value of period
|
||||
- Close: last value of period
|
||||
- Volume: sum of period
|
||||
|
||||
Args:
|
||||
data: DataFrame with OHLCV columns and DatetimeIndex
|
||||
freq: Resampling frequency (e.g., 'W-SUN', 'ME', 'MS')
|
||||
label: Which bin edge label to use ('left' or 'right')
|
||||
closed: Which side of bin interval is closed ('left' or 'right')
|
||||
|
||||
Returns:
|
||||
Resampled DataFrame with OHLCV aggregations applied
|
||||
|
||||
Examples:
|
||||
>>> dates = pd.date_range('2024-01-01', periods=7, freq='D')
|
||||
>>> data = pd.DataFrame({
|
||||
... 'Open': [100, 101, 102, 103, 104, 105, 106],
|
||||
... 'High': [102, 103, 104, 105, 106, 107, 108],
|
||||
... 'Low': [99, 100, 101, 102, 103, 104, 105],
|
||||
... 'Close': [101, 102, 103, 104, 105, 106, 107],
|
||||
... 'Volume': [1000000, 1100000, 1200000, 1300000, 1400000, 1500000, 1600000]
|
||||
... }, index=dates)
|
||||
>>> result = _resample_ohlcv(data, 'W-SUN')
|
||||
>>> result.iloc[0]['Open']
|
||||
100.0
|
||||
>>> result.iloc[0]['High']
|
||||
108.0
|
||||
>>> result.iloc[0]['Close']
|
||||
107.0
|
||||
"""
|
||||
# Define aggregation rules for OHLCV
|
||||
agg_dict = {
|
||||
'Open': 'first',
|
||||
'High': 'max',
|
||||
'Low': 'min',
|
||||
'Close': 'last',
|
||||
'Volume': 'sum'
|
||||
}
|
||||
|
||||
# Apply resampling with aggregation
|
||||
resampled = data.resample(freq, label=label, closed=closed).agg(agg_dict)
|
||||
|
||||
# Drop rows with NaN values (non-trading periods)
|
||||
resampled = resampled.dropna()
|
||||
|
||||
# Round OHLCV columns to 2 decimal places
|
||||
for col in ['Open', 'High', 'Low', 'Close', 'Volume']:
|
||||
if col in resampled.columns:
|
||||
resampled[col] = resampled[col].round(2)
|
||||
|
||||
return resampled
|
||||
|
||||
|
||||
def aggregate_to_weekly(
|
||||
data: pd.DataFrame,
|
||||
anchor: str = 'SUN'
|
||||
) -> Union[pd.DataFrame, str]:
|
||||
"""
|
||||
Aggregate daily OHLCV data to weekly timeframe.
|
||||
|
||||
Week boundaries are defined by the anchor day (default: Sunday).
|
||||
Applies proper OHLCV aggregation rules.
|
||||
|
||||
Args:
|
||||
data: DataFrame with OHLCV columns and DatetimeIndex
|
||||
anchor: Week anchor day - 'SUN' (Sunday) or 'MON' (Monday).
|
||||
Determines which day starts the week.
|
||||
|
||||
Returns:
|
||||
DataFrame with weekly aggregated OHLCV data on success,
|
||||
error string on validation failure
|
||||
|
||||
Examples:
|
||||
>>> dates = pd.date_range('2024-01-01', periods=14, freq='D')
|
||||
>>> data = pd.DataFrame({
|
||||
... 'Open': range(100, 114),
|
||||
... 'High': range(102, 116),
|
||||
... 'Low': range(99, 113),
|
||||
... 'Close': range(101, 115),
|
||||
... 'Volume': range(1000000, 1014000, 1000)
|
||||
... }, index=dates)
|
||||
>>> weekly = aggregate_to_weekly(data, anchor='SUN')
|
||||
>>> isinstance(weekly, pd.DataFrame)
|
||||
True
|
||||
>>> len(weekly) == 2 # 14 days = 2 weeks
|
||||
True
|
||||
|
||||
Notes:
|
||||
- Timezone information is preserved if present in input data
|
||||
- Partial weeks (< 7 days) are aggregated correctly
|
||||
- OHLCV values are rounded to 2 decimal places
|
||||
"""
|
||||
# Validate input
|
||||
error = _validate_ohlcv_dataframe(data)
|
||||
if error is not None:
|
||||
return error
|
||||
|
||||
# Save original timezone
|
||||
original_tz = data.index.tz
|
||||
|
||||
# Handle timezone: remove for resampling (pandas resample works better without tz)
|
||||
if data.index.tz is not None:
|
||||
data = data.copy()
|
||||
data.index = data.index.tz_localize(None)
|
||||
|
||||
# Map anchor to pandas frequency
|
||||
# The mapping depends on the starting day of the data:
|
||||
# - If data starts on the anchor day, use the day BEFORE anchor as week-end
|
||||
# (e.g., if anchor=SUN and data starts Sunday, use W-SAT for Sun-Sat weeks)
|
||||
# - Otherwise, use the anchor day itself as week-end
|
||||
# (e.g., if anchor=SUN and data starts Monday, use W-SUN for Mon-Sun weeks)
|
||||
|
||||
# Get the day of week for the first data point (0=Monday, 6=Sunday)
|
||||
first_day_of_week = data.index[0].dayofweek
|
||||
|
||||
# Map anchor string to day of week number
|
||||
anchor_day_map = {
|
||||
'MON': 0, # Monday
|
||||
'TUE': 1, # Tuesday
|
||||
'WED': 2, # Wednesday
|
||||
'THU': 3, # Thursday
|
||||
'FRI': 4, # Friday
|
||||
'SAT': 5, # Saturday
|
||||
'SUN': 6, # Sunday
|
||||
}
|
||||
|
||||
anchor_day_num = anchor_day_map.get(anchor.upper(), 6)
|
||||
|
||||
# If data starts on the anchor day, we need to use the previous day as week-end
|
||||
# to get full weeks starting on the anchor day
|
||||
if first_day_of_week == anchor_day_num:
|
||||
# Use day before anchor as week-end
|
||||
week_end_day_num = (anchor_day_num - 1) % 7
|
||||
# Map back to day name
|
||||
day_names = ['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN']
|
||||
week_end_day = day_names[week_end_day_num]
|
||||
else:
|
||||
# Use anchor day as week-end
|
||||
week_end_day = anchor.upper()
|
||||
|
||||
freq = f'W-{week_end_day}'
|
||||
|
||||
# Call core resampling function
|
||||
result = _resample_ohlcv(data, freq, label='right', closed='right')
|
||||
|
||||
# Restore original timezone if it existed
|
||||
if original_tz is not None:
|
||||
result.index = result.index.tz_localize(original_tz)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def aggregate_to_monthly(
|
||||
data: pd.DataFrame,
|
||||
period_end: bool = True
|
||||
) -> Union[pd.DataFrame, str]:
|
||||
"""
|
||||
Aggregate daily OHLCV data to monthly timeframe.
|
||||
|
||||
Month boundaries and labels are controlled by period_end parameter.
|
||||
Applies proper OHLCV aggregation rules.
|
||||
|
||||
Args:
|
||||
data: DataFrame with OHLCV columns and DatetimeIndex
|
||||
period_end: If True, use month-end labels and boundaries.
|
||||
If False, use month-start labels and boundaries.
|
||||
|
||||
Returns:
|
||||
DataFrame with monthly aggregated OHLCV data on success,
|
||||
error string on validation failure
|
||||
|
||||
Examples:
|
||||
>>> dates = pd.date_range('2024-01-01', periods=60, freq='D')
|
||||
>>> data = pd.DataFrame({
|
||||
... 'Open': range(100, 160),
|
||||
... 'High': range(102, 162),
|
||||
... 'Low': range(99, 159),
|
||||
... 'Close': range(101, 161),
|
||||
... 'Volume': range(1000000, 1060000, 1000)
|
||||
... }, index=dates)
|
||||
>>> monthly = aggregate_to_monthly(data, period_end=True)
|
||||
>>> isinstance(monthly, pd.DataFrame)
|
||||
True
|
||||
>>> len(monthly) == 2 # January and February
|
||||
True
|
||||
|
||||
Notes:
|
||||
- Timezone information is preserved if present in input data
|
||||
- Partial months are aggregated correctly
|
||||
- OHLCV values are rounded to 2 decimal places
|
||||
- period_end=True: Labels represent the last day of the month
|
||||
- period_end=False: Labels represent the first day of the month
|
||||
"""
|
||||
# Validate input
|
||||
error = _validate_ohlcv_dataframe(data)
|
||||
if error is not None:
|
||||
return error
|
||||
|
||||
# Save original timezone
|
||||
original_tz = data.index.tz
|
||||
|
||||
# Handle timezone: remove for resampling (pandas resample works better without tz)
|
||||
if data.index.tz is not None:
|
||||
data = data.copy()
|
||||
data.index = data.index.tz_localize(None)
|
||||
|
||||
# Determine frequency and labeling based on period_end
|
||||
if period_end:
|
||||
freq = 'ME' # Month End
|
||||
label = 'right'
|
||||
closed = 'right'
|
||||
else:
|
||||
freq = 'MS' # Month Start
|
||||
label = 'left'
|
||||
closed = 'left'
|
||||
|
||||
# Call core resampling function
|
||||
result = _resample_ohlcv(data, freq, label=label, closed=closed)
|
||||
|
||||
# Restore original timezone if it existed
|
||||
if original_tz is not None:
|
||||
result.index = result.index.tz_localize(original_tz)
|
||||
|
||||
return result
|
||||
Loading…
Reference in New Issue