Open Data Standards: Postgres, OTel, and Iceberg

There are three emerging standards in the world of data: Postgres, Open Telemetry, and Iceberg.

Postgres is arguably a standard already. OTel and Iceberg are nascent but they have the same ingredients helped Postgres become the world's most popular database. I've been asked many times "why did Postgres win?". The typical response is "extensibility", which is correct, but incomplete.

Besides being a great product, Postgres has some important open source dynamics: the nuance lies in the open source model.

The Three Tenets of Open Source

I've realized there are three tenets of open source. Developers will assess the "open sourceness" of any project by:

License: is the license OSI-approved.
Self-hosting: how feasible is it to self-host the entire product.
Commercialization: is the project commercially unencumbered and vendor neutral or (even better) is it backed by many commercial entities.

The third tenet took me a while to grasp. Yes, Postgres won because it's a great product, but even better: it cannot be owned by any single entity. The governance and cultural resistance makes it impossible. Postgres is like the International Space Station where several large commercial entities collaborated because no single entity can own it.

Postgres checks the "open source" checkbox, but it's not a magic-bullet for all data-related tasks.

The Three Data Personas

In the data space there are 3 primary "data personas" and their respective tools:

OLTP Databases: used by developers to build apps.
Telemetry: used by SREs to manage infra and optimize applications.
OLAP/Warehousing: used by data engineers/scientists to draw insights.

The data lifecycle typically flows from 1 → 2 → 3: first a developer builds an application, then they add basic telemetry (sometimes just click events in their OLTP database), and eventually their OLTP database becomes large enough that they need a data warehouse.

The three personas are distinct in their way of operating and preferred tools. That said, the industry continues to shift left: as data tooling improves, observability and warehousing is increasingly the domain of the developer. This shift isn't an intentional action by SREs or data engineers. It's because databases are becoming more scalable and developers/startups can cope for longer before hiring specialized help.

The Three Open Data Standards

Around these three data use-cases, I've started to see three open data standards emerging that exhibit the same open source tenets:

OLTP: Postgres
Telemetry: Open Telemetry
OLAP: Iceberg

The second two are interesting because they are "standards" rather than "tools". The dynamic is similar to HTML (standard) and browsers (tools): data is stored/transferred in an agreed format and the tools need to adopt the standard or get left behind.

These standards start as a grass-roots movement and commercial entities are compelled to adopt them through a disruptive technology dilemma:

if they don't adopt the standard, they will miss out on the growing trend.
if they do adopt the standard, they lower the lock-in for their own tool/platform.

This is a fantastic dynamic for the developer community, and one that we strongly believe in: portability forces companies to compete on experience.

Diving deeper into each of these tools:

Postgres is the open OLTP standard

While Postgres is a tool, it has also become a standard. Nearly every new database offering aims for "Postgres wire compatibility". Because Postgres isn't owned by any individual commercial entity, every major cloud can offer it - or even is forced to offer it because it’s a standard. Hell, even Oracle cloud offers it. If you have a bad experience with any particular vendor, you can simply pg_dump your data and take it to another provider. Postgres uses the PostgreSQL license, which is functionally equivalent to MIT.

OTel is the open telemetry standard

It's even in the name "open telemetry". OTel is still nascent and incredibly complicated, but it fits the open source tenets above: the license is Apache 2.0 and it’s vendor neutral. Just as the major cloud providers embraced Postgres, the leading telemetry platforms are now adopting OTel including Datadog, Honeycomb, Grafana Labs, and Elastic. For self-hosting, developers are able to choose from a various open source options like SigNoz, OpenObserve, and the default OTel tools.

Iceberg is the open OLAP standard

Open Table Formats are a relatively new development. They are simply an "agreed format" for organizing large amounts of data. Because the format is agreed, any tool can query it. There are a few competing Open Table Formats including DeltaLake and Hudi, but Iceberg has emerged as the leader.

All the major data warehouses are adopting Iceberg, including Databricks, Snowflake, and ClickHouse. The most important commercial entity however, driving the Third Tenet of Open Source, is AWS. At the end of 2024 AWS announced S3 Tables, which makes it trivial to store data in S3 using Iceberg format in S3.

S3 is the ultimate data infrastructure

Object storage is so cheap that it's becoming the fundamental substrate of all three open data standards. All data tooling built today is adopting some form of interoperability with S3, or an S3-compatible service.

The AWS S3 team are releasing updates that will accelerate the concept of "S3 as a database" - including Conditional Writes and S3 Express which is 10x faster than "standard S3” and recently became 85% cheaper.

The tooling for S3 interoperability differs slightly depending on the use-case:

For OLTP databases, where performance is paramount, there will always be a "disk" layer between S3. There is simply no way that a network disk can operate at the speed of NVMe SSDs. The key interoperability with S3 will be ZeroETL and Tiered Storage: the ability to move "cold" data easily between your operational database and S3. Postgres offers several methods to read the data out of Iceberg, including pg_mooncake, pg_duckdb, and Iceberg Foreign Data Wrappers.
For Telemetry and Warehousing, cardinality is the key factor. As S3 becomes cheaper, businesses are storing exponentially more data in object storage, driving an architecture where storage and compute are decoupled. This has sparked demand for database offerings that serve as compute layers, connecting directly to S3. Most of these are embedded databases, like DuckDB (OLAP), SQLite’s cloud-backed storage, turbopuffer (vectors/embeddings), SlateDB (KV), and Tonbo (Arrow). These embedded databases can run within your application or be used as a standalone tool.

Data at Supabase

Supabase is now relatively well-known as a Postgres provider. We've spent 5 years building a delightful database platform for developers and that will continue to be our focus.

What's unique about Supabase is that we're not just a Postgres provider (despite the memes). We also offer Supabase Storage, an S3-compatible object store. Where we're going is less about a database, and more about data. This includes:

Adding OTel to all of the open source tools we maintain.
Adding Iceberg support in Supabase Storage.
Adding Iceberg support to Supabase ETL for zero-ETL between Postgres and Iceberg.
Adding Iceberg read/write capabilities to Postgres via extensions and FDWs.

Our focus from here is the three Open Data Standards: Postgres, OTel, and Iceberg.

Open Data Standards: Postgres, OTel, and Iceberg

The Three Tenets of Open Source

The Three Data Personas

The Three Open Data Standards

Postgres is the open OLTP standard

OTel is the open telemetry standard

Iceberg is the open OLAP standard

S3 is the ultimate data infrastructure

Data at Supabase

Building on open table formats

Simplifying back-end complexity with Supabase Data APIs

Build in a weekend, scale to millions

Open Data Standards: Postgres, OTel, and Iceberg

The Three Tenets of Open Source#

The Three Data Personas#

The Three Open Data Standards#

Postgres is the open OLTP standard#

OTel is the open telemetry standard#

Iceberg is the open OLAP standard#

S3 is the ultimate data infrastructure#

Data at Supabase#

Building on open table formats

Simplifying back-end complexity with Supabase Data APIs

Build in a weekend, scale to millions

The Three Tenets of Open Source

The Three Data Personas

The Three Open Data Standards

Postgres is the open OLTP standard

OTel is the open telemetry standard

Iceberg is the open OLAP standard

S3 is the ultimate data infrastructure

Data at Supabase