ClickHouse Partnership, improved Postgres Replication, and Disk Management

30 Oct 2024

6 minute read

We're seeing an emerging trend for AI customers: Postgres and ClickHouse is becoming the “default data stack”.

This makes sense - AI companies typically generate a lot of logs and analytical data, which is better suited for an OLAP database like ClickHouse.

The combination of Supabase and ClickHouse together are perfect for Helicone.ai, providing the flexibility of Postgres with the analytical power of ClickHouse — an open-source stack we can trust and customize.

Justin Torre, CTO @ Helicone.ai avatar

Justin Torre, CTO @ Helicone.ai

Supabase + ClickHouse Partnership

The partnership between Supabase and ClickHouse aims to create a seamless experience, building on the already solid Postgres + ClickHouse foundation. Today, we're releasing new features to enhance this integration.

ClickHouse is very excited to partner with Supabase to make it easy for customers to use both technologies together. Through this partnership, we aim to make it even simpler for Postgres developers to use ClickHouse in conjunction and build real-time, data-driven applications at scale.

Aaron Katz, CEO @ ClickHouse Inc. avatar

Aaron Katz, CEO @ ClickHouse Inc.

Using Postgres and ClickHouse together

Before diving into those changes, some context on how most customers use Supabase and ClickHouse together. While both are databases, they serve different use-cases:

  • Postgres: Ideal for storing and querying application data, powering critical transactional and web app use cases.
  • ClickHouse: Optimized for analytics and reporting, supporting both customer-facing and internal applications

Postgres is a row-oriented database, ClickHouse is column-oriented. The ClickHouse team have a great write up about the difference between the two formats.

To provide an interface between these, Supabase customers generally use:

  1. clickhouse_fdw to query their ClickHouse data from their Postgres database.
  2. PeerDB to replicate their data from Postgres to ClickHouse.

Improving the ClickHouse & Supabase experience

We're making a number of changes to our platform based on the feedback we've had from customers.

Updated ClickHouse Foreign Data Wrapper

Using the ClickHouse FDW, you can directly query your ClickHouse database from Postgres:


_11
-- Connect Postgres to your ClickHouse database:
_11
create foreign table user_analytics (
_11
id bigint,
_11
user_id bigint,
_11
event text
_11
)
_11
server clickhouse_server
_11
options ( table 'UserAnalytics' );
_11
_11
-- Query your ClickHouse instance from Postgres:
_11
select * from user_analytics where user_id = 1;

This means you can query your ClickHouse data using the Postgres tooling that you're familiar with.

The Wrapper now has support for ClickHouse Parameterized Views. With this update, you can pass query parameters directly to ClickHouse, taking full advantage of its analytical engine::


_10
create foreign table user_analytics (
_10
id bigint,
_10
user_id bigint,
_10
_event text,
_10
)
_10
server clickhouse_server
_10
options ( table '(select * from UserAnalytics(event=${_event}))' );
_10
_10
select * from user_analytics where _event='button_click';

More granular replication control

Many of our customers use PeerDB to replicate data from Postgres to ClickHouse. This has occasionally presented challenges, particularly with Postgres's default 1GB WAL size, which, for large data volumes, can result in data loss if the WAL exceeds this size.

To resolve this, we've added 13 configurable Postgres parameters, enabling you to adjust replication settings through the CLI. For example, you can increase the default WAL size to 2GB:


_10
supabase --experimental --project-ref xxxx-yyy \
_10
postgres-config update --config max_slot_wal_keep_size=2GB

The new CLI config includes the following Postgres parameters:

  1. logical_decoding_work_mem: Controls memory used during logical decoding.
  2. max_connections: Limits total connections to the Postgres server.
  3. max_locks_per_transaction: Sets the maximum locks allowed in a single transaction.
  4. max_replication_slots: Defines the number of replication slots for data streaming.
  5. max_slot_wal_keep_size: Limits disk space for WAL in replication slots.
  6. max_standby_archive_delay: Sets how long standby servers can wait for archive recovery.
  7. max_standby_streaming_delay: Controls delay on standby servers for streaming replication.
  8. max_wal_size: Specifies the maximum size of the Write Ahead Log.
  9. max_wal_senders: Sets the maximum number of processes sending WAL data.
  10. max_worker_processes: Defines the number of background worker processes.
  11. shared_buffers: Determines the amount of memory for shared buffers.
  12. wal_keep_size: Sets minimum WAL size to keep for standby servers.
  13. wal_sender_timeout: Specifies the timeout for inactive WAL sender processes.

Improved Disk Management

Supabase now provides granular control over disk usage for your Postgres database:

This is driven directly by customers using tools like PeerDB. With adjustable WAL configuration, it's important that developers can manage the disk as well. For example, on the Pro Plan's 8GB disk, you can configure your project with options like:

  • Default: 7GB database space, 1GB Write Ahead Log
  • Custom example: 6GB database space, 2 GB Write Ahead Log

Additionally, we're introducing High-performance Disks. We'll release more details about this later.

ClickHouse platform updates

The ClickHouse team have also been busy. They've released a number of updates to their platform, including:

  1. A native Supabase OAuth integration in PeerDB for Postgres CDC to ClickHouse.
  2. Support for IPV6 in PeerDB Cloud.

You can learn more about these features in the Supabase Partnership post they released today.

What's next?

Improving the experience between Postgres and ClickHouse is the first phase of this partnership. We're already working on native platform integrations. If you're using (or plan to use) Supabase and ClickHouse together please reach out - we'd love more design partners to help shape the future of this integration.

If you simply want to try out the tools and updates we've described above, you can get started with all of them, free of charge:

Share this article

Build in a weekend, scale to millions