---
name: timescaledb
description: Manage time-series data in PostgreSQL with TimescaleDB. Use this skill to install, configure, optimize, and interact with TimescaleDB for high-performance time-series data storage and analysis. This includes creating hypertables, continuous aggregates, handling data retention, and querying time-series data efficiently.
---

# TimescaleDB Skill

Manage time-series data in PostgreSQL using TimescaleDB, an extension that adds high-performance time-series storage and analytics to standard PostgreSQL.

## When to Use This Skill

Use this skill when you need to:

- Work with time-series data in PostgreSQL
- Install and configure TimescaleDB
- Create and manage hypertables
- Optimize performance for time-series data
- Implement continuous aggregates for rollup data
- Manage data retention and compression
- Query and analyze time-series data
- Migrate existing PostgreSQL tables to hypertables
- Integrate with other PostgreSQL tools and extensions

## Not For / Boundaries

This skill is NOT for:

- General PostgreSQL administration (use a dedicated PostgreSQL skill for that)
- Deep database tuning unrelated to time-series performance
- Replacing purpose-built time-series databases when TimescaleDB's PostgreSQL foundation is not a requirement
- Data visualization beyond basic SQL queries (use a BI tool or a separate visualization library)

## Quick Reference

### Installation & Configuration

**Install TimescaleDB Extension (Debian/Ubuntu):**
```bash
sudo apt install -y postgresql-{{pg_version}}-timescaledb
sudo pg_createcluster {{pg_version}} main --start
sudo -u postgres psql -c "CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;"
```
*(Replace `{{pg_version}}` with your PostgreSQL version, e.g., 16. Skip `pg_createcluster` if a cluster already exists. `CREATE EXTENSION` succeeds only after `timescaledb` has been added to `shared_preload_libraries` and PostgreSQL has been restarted; see the configuration below.)*

**Configuration (postgresql.conf):**
```ini
# Add to postgresql.conf
shared_preload_libraries = 'timescaledb'
timescaledb.max_background_workers = 8   # Adjust based on CPU cores
max_connections = 100                    # Adjust based on workload
```
*(After changes, restart PostgreSQL: `sudo systemctl restart postgresql`)*

### Hypertables

**Create Hypertables:**
```sql
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ NOT NULL,
    device_id   INT,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION
);

SELECT create_hypertable('sensor_data', 'time');
```

**Convert Existing Table to Hypertable:**
```sql
SELECT create_hypertable('your_existing_table', 'time_column', migrate_data => true);
```

**Show Hypertables:**
```sql
-- In psql, \d+ lists all tables with details; the TimescaleDB view below
-- works from any SQL client and shows only hypertables:
SELECT * FROM timescaledb_information.hypertables;
```

### Continuous Aggregates

**Create Continuous Aggregate:**
```sql
CREATE MATERIALIZED VIEW device_hourly_summary
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    device_id,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp
FROM sensor_data
GROUP BY time_bucket('1 hour', time), device_id
WITH NO DATA; -- Initially create without data

-- Refresh the continuous aggregate
CALL refresh_continuous_aggregate('device_hourly_summary', NULL, NULL);
```

**Get Continuous Aggregates Info:**
```sql
SELECT * FROM timescaledb_information.continuous_aggregates;
```

### Data Retention & Compression

**Set Data Retention Policy (Drop data older than 3 months):**
```sql
SELECT add_retention_policy('sensor_data', INTERVAL '3 months');
```

**Enable Compression (Compress data older than 7 days):**
```sql
ALTER TABLE sensor_data SET (timescaledb.compress = TRUE);
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');
```

**Show Compression Status:**
```sql
SELECT * FROM timescaledb_information.compression_settings;
```

### Querying Time-Series Data

**Basic Time-Range Query:**
```sql
SELECT * FROM sensor_data
WHERE time >= NOW() - INTERVAL '1 day'
  AND time < NOW()
ORDER BY time DESC;
```

**Gapfilling and Interpolation:**
```sql
-- time_bucket_gapfill() emits a row for every bucket in the queried range;
-- locf() carries the last observed value forward into empty buckets.
SELECT
    time_bucket_gapfill('1 hour', time) AS bucket,
    AVG(temperature) AS avg_temp,
    locf(AVG(temperature)) AS avg_temp_locf
FROM sensor_data
WHERE time >= NOW() - INTERVAL '1 day'
  AND time < NOW()
GROUP BY bucket
ORDER BY bucket;
```

### High-Performance Queries

**Approximate Count:**
```sql
-- approximate_row_count() estimates the row count from catalog statistics
SELECT approximate_row_count('sensor_data');

-- Or sample 1% of rows and scale the result
SELECT COUNT(*) * 100 FROM sensor_data TABLESAMPLE BERNOULLI (1);
```

**Top-N Queries:**
```sql
SELECT time, device_id, temperature
FROM sensor_data
WHERE time >= NOW() - INTERVAL '1 day'
ORDER BY temperature DESC
LIMIT 10;
```

## Examples

### Example 1: IoT Sensor Data Pipeline

- Input: Stream of sensor readings (time, device_id, value)
- Steps:
  1. Create a hypertable for `iot_readings`.
  2. Ingest data into the hypertable.
  3. Create a continuous aggregate to compute hourly average readings.
  4. Query the continuous aggregate for a specific device's hourly trend.
  5. Set a retention policy to keep only 1 year of raw data.
- Expected output / acceptance: Efficient storage, automatic hourly rollups, and proper data pruning (see the SQL sketch below).

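A minimal SQL sketch of this pipeline, assuming an illustrative `iot_readings` schema (the table and column names, device id, and policy offsets are hypothetical, not prescribed by the skill):

```sql
-- 1. Hypertable for raw readings (illustrative schema)
CREATE TABLE iot_readings (
    time      TIMESTAMPTZ NOT NULL,
    device_id INT NOT NULL,
    value     DOUBLE PRECISION
);
SELECT create_hypertable('iot_readings', 'time');

-- 2. Ingest with ordinary INSERTs (use batches or COPY for real workloads)
INSERT INTO iot_readings VALUES (NOW(), 42, 21.5);

-- 3. Hourly rollup as a continuous aggregate, refreshed on a schedule
CREATE MATERIALIZED VIEW iot_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       AVG(value) AS avg_value
FROM iot_readings
GROUP BY bucket, device_id
WITH NO DATA;

SELECT add_continuous_aggregate_policy('iot_hourly',
    start_offset      => INTERVAL '3 hours',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

-- 4. Hourly trend for one device
SELECT bucket, avg_value FROM iot_hourly WHERE device_id = 42 ORDER BY bucket;

-- 5. Keep only 1 year of raw data
SELECT add_retention_policy('iot_readings', INTERVAL '1 year');
```
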
### Example 2: Financial Tick Data Analysis

- Input: High-frequency financial tick data (timestamp, symbol, price, volume)
- Steps:
  1. Create a hypertable `tick_data` with chunk sizing suited to a high ingest rate.
  2. Enable compression for older `tick_data`.
  3. Query `tick_data` to calculate the 5-minute VWAP (Volume-Weighted Average Price) for a specific symbol.
  4. Visualize the VWAP over the last trading day.
- Expected output / acceptance: Ability to ingest and analyze millions of rows per second, with optimized storage and fast analytical queries (see the SQL sketch below).

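A minimal SQL sketch under these assumptions (the `tick_data` schema, the one-day chunk interval, and the `'AAPL'` symbol are illustrative):

```sql
-- Hypertable with a smaller chunk interval to keep recent chunks hot under heavy ingest (illustrative)
CREATE TABLE tick_data (
    time   TIMESTAMPTZ NOT NULL,
    symbol TEXT NOT NULL,
    price  DOUBLE PRECISION,
    volume BIGINT
);
SELECT create_hypertable('tick_data', 'time', chunk_time_interval => INTERVAL '1 day');

-- Compress chunks older than 7 days, segmenting compressed data by symbol
ALTER TABLE tick_data SET (timescaledb.compress, timescaledb.compress_segmentby = 'symbol');
SELECT add_compression_policy('tick_data', INTERVAL '7 days');

-- 5-minute VWAP for one symbol over the last trading day
SELECT time_bucket('5 minutes', time) AS bucket,
       SUM(price * volume) / NULLIF(SUM(volume), 0) AS vwap
FROM tick_data
WHERE symbol = 'AAPL'
  AND time >= NOW() - INTERVAL '1 day'
GROUP BY bucket
ORDER BY bucket;
```
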
### Example 3: Monitoring System Metrics

- Input: Server metrics (timestamp, host_id, cpu_usage, memory_usage, network_io)
- Steps:
  1. Create a hypertable `system_metrics` partitioned by `time` and `host_id`.
  2. Use a `time_bucket_gapfill` query to find CPU usage for all hosts over the last 24 hours, filling in missing data points.
  3. Create an alert based on `MAX(cpu_usage)` exceeding a threshold using a continuous aggregate.
- Expected output / acceptance: Comprehensive monitoring with gap-filled data for visualization and real-time alerting (see the SQL sketch below).

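A minimal SQL sketch of steps 1 and 2, assuming an illustrative `system_metrics` schema (the column names, the four-partition split, and the 5-minute bucket are hypothetical):

```sql
-- Hypertable partitioned by time, with host_id as an additional space dimension (illustrative)
CREATE TABLE system_metrics (
    time         TIMESTAMPTZ NOT NULL,
    host_id      INT NOT NULL,
    cpu_usage    DOUBLE PRECISION,
    memory_usage DOUBLE PRECISION,
    network_io   DOUBLE PRECISION
);
SELECT create_hypertable('system_metrics', 'time',
                         partitioning_column => 'host_id',
                         number_partitions   => 4);

-- Gap-filled CPU usage per host over the last 24 hours
SELECT time_bucket_gapfill('5 minutes', time) AS bucket,
       host_id,
       AVG(cpu_usage) AS avg_cpu,
       locf(AVG(cpu_usage)) AS cpu_carried_forward
FROM system_metrics
WHERE time >= NOW() - INTERVAL '24 hours'
  AND time < NOW()
GROUP BY bucket, host_id
ORDER BY bucket, host_id;
```
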
## References

- `references/installation.md`: Detailed installation and setup
- `references/hypertables.md`: Deep dive into hypertable management
- `references/continuous_aggregates.md`: Advanced continuous aggregate techniques
- `references/compression.md`: Comprehensive guide to data compression
- `references/api.md`: TimescaleDB SQL functions and commands reference
- `references/performance.md`: Performance tuning and best practices
- `references/getting_started.md`: Official TimescaleDB Getting Started Guide
- `references/llms.md`: Using TimescaleDB with LLMs (e.g., storing embeddings, RAG)
- `references/llms-full.md`: Full LLM integration scenarios
- `references/tutorials.md`: Official TimescaleDB Tutorials and Use Cases
- `references/time_buckets.md`: Guide to `time_bucket` and gapfilling functions
- `references/hyperfunctions.md`: Advanced analytical functions for time-series

## Maintenance

- Sources: Official TimescaleDB documentation, GitHub repository, and blog posts.
- Last updated: 2025-12-17
- Known limits: This skill focuses on core TimescaleDB features. Advanced PostgreSQL features (e.g., PostGIS, JSONB) are covered by other specialized skills.

## Troubleshooting

### Slow Queries

- Ensure indexes exist on `time` and other frequently queried columns.
- Verify chunk sizing is appropriate for your data ingestion rate.
- Use `EXPLAIN ANALYZE` to identify bottlenecks.
- Consider creating continuous aggregates for frequently accessed aggregated data (see the sketch below).

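A hedged sketch of these checks, reusing the `sensor_data` table from the Quick Reference (the index definition and chunk interval are illustrative, not required values):

```sql
-- Inspect the plan of a slow aggregation
EXPLAIN ANALYZE
SELECT device_id, AVG(temperature)
FROM sensor_data
WHERE time >= NOW() - INTERVAL '1 day'
GROUP BY device_id;

-- Add an index for common filters (hypertables index the time column by default)
CREATE INDEX IF NOT EXISTS idx_sensor_data_device_time
    ON sensor_data (device_id, time DESC);

-- Review chunk sizes, then adjust the interval used for future chunks
SELECT * FROM chunks_detailed_size('sensor_data');
SELECT set_chunk_time_interval('sensor_data', INTERVAL '1 day');
```
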
### High Disk Usage

- Implement data retention policies for older, less critical data.
- Enable compression for older chunks.
- Regularly run `VACUUM ANALYZE` on your tables (see the sketch below).

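A hedged sketch of the related commands, again using the `sensor_data` example table (the 7-day cutoff is illustrative):

```sql
-- How much space is the hypertable using?
SELECT hypertable_size('sensor_data');
SELECT * FROM hypertable_detailed_size('sensor_data');

-- Compress chunks older than 7 days right away (a compression policy does this on a schedule)
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('sensor_data', older_than => INTERVAL '7 days') AS c;

-- Reclaim dead tuples and refresh planner statistics
VACUUM ANALYZE sensor_data;
```
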
### Failed to Create Hypertable

- Ensure the `time` column is `TIMESTAMPTZ` or another supported type (e.g., `TIMESTAMP`, `DATE`, or an integer type).
- The table must be empty, or you must pass `migrate_data => true`.
- Check for existing triggers or foreign keys that might conflict (see the sketch below).

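A brief sketch of the first two checks; `your_existing_table` and `time_column` are placeholders carried over from the Quick Reference, not real objects:

```sql
-- Confirm the partitioning column's type
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'your_existing_table'
  AND column_name = 'time_column';

-- A non-empty table requires migrate_data => true (this can take a while on large tables)
SELECT create_hypertable('your_existing_table', 'time_column', migrate_data => true);
```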
---

**This skill provides a robust foundation for managing time-series data with TimescaleDB!**