# DuckDB
This feature is in alpha. Expect rapid changes, limited features, and possible breaking updates. Share feedback as we refine the experience and expand access.
DuckDB is a high-performance SQL database system optimized for analytical workloads. It can directly query Iceberg tables stored in your analytics buckets, making it ideal for data exploration and complex analytical queries.
## Installation
Install DuckDB and the Iceberg extension:
```bash
pip install duckdb duckdb-iceberg
```

## Connecting to Analytics buckets
Here's a complete example of connecting to your Supabase analytics bucket and querying Iceberg tables:
```python
import duckdb

# Configuration
PROJECT_REF = "your-project-ref"
WAREHOUSE = "your-analytics-bucket-name"
SERVICE_KEY = "your-service-key"

# S3 credentials
S3_ACCESS_KEY = "your-access-key"
S3_SECRET_KEY = "your-secret-key"
S3_REGION = "us-east-1"

# Construct endpoints
S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Initialize DuckDB connection
conn = duckdb.connect(":memory:")

# Install and load the Iceberg extension
conn.install_extension("iceberg")
conn.load_extension("iceberg")

# Configure Iceberg catalog with Supabase credentials
conn.execute(f"""
    CREATE SECRET (
        TYPE S3,
        KEY_ID '{S3_ACCESS_KEY}',
        SECRET '{S3_SECRET_KEY}',
        REGION '{S3_REGION}',
        ENDPOINT '{S3_ENDPOINT}',
        URL_STYLE 'virtual'
    );
""")

# Configure the REST catalog
conn.execute(f"""
    ATTACH 'iceberg://{CATALOG_URI}' AS iceberg_catalog
    (
        TYPE ICEBERG_REST,
        WAREHOUSE '{WAREHOUSE}',
        TOKEN '{SERVICE_KEY}'
    );
""")

# Query your Iceberg tables
result = conn.execute("""
    SELECT *
    FROM iceberg_catalog.default.events
    LIMIT 10
""").fetchall()

for row in result:
    print(row)

# Complex aggregation example
analytics = conn.execute("""
    SELECT
        event_name,
        COUNT(*) AS event_count,
        COUNT(DISTINCT user_id) AS unique_users
    FROM iceberg_catalog.default.events
    GROUP BY event_name
    ORDER BY event_count DESC
""").fetchdf()

print(analytics)
```

## Key features with DuckDB
### Efficient data exploration
DuckDB pushes column projections and filters down to the scan, so it only reads the data your query needs:
```python
# This only reads the columns you select
events = conn.execute("""
    SELECT event_id, event_name, event_timestamp
    FROM iceberg_catalog.default.events
    WHERE event_timestamp > NOW() - INTERVAL '7 days'
""").fetchdf()
```

### Converting to Pandas
Convert results to Pandas DataFrames for further analysis:
```python
df = conn.execute("""
    SELECT *
    FROM iceberg_catalog.default.events
""").fetchdf()

# Use pandas for visualization or further processing
print(df.describe())
```

### Exporting results
Save your analytical results to various formats:
```python
# Export to Parquet
conn.execute("""
    COPY (
        SELECT * FROM iceberg_catalog.default.events
    ) TO 'results.parquet'
""")

# Export to CSV
conn.execute("""
    COPY (
        SELECT event_name, COUNT(*) AS count
        FROM iceberg_catalog.default.events
        GROUP BY event_name
    ) TO 'summary.csv' (FORMAT CSV, HEADER true)
""")
```

## Best practices
- **Connection reuse** - Reuse connections for multiple queries
- **Partition pruning** - Filter on partition columns to improve query performance
- **Column selection** - Select only the columns you need to reduce I/O
- **Limit results** - Use `LIMIT` during exploration to avoid processing large datasets