Partitioning tables

Table partitioning is a technique that allows you to divide a large table into smaller, more manageable parts called “partitions”.

Each partition contains a subset of the data based on a specified criterion, such as a range of values or a list of values. Partitioning can significantly improve query performance and simplify data management for large datasets.

Benefits of table partitioning

Improved query performance: queries can target specific partitions, which reduces the amount of data scanned and shortens execution time.
Scalability: you can add or remove partitions as your data grows or changes.
Efficient data management: tasks such as data loading, archiving, and deletion operate on individual partitions instead of the entire table.
Enhanced maintenance operations: vacuuming and indexing run per partition, which can make maintenance tasks faster.

Partitioning methods

Postgres supports various partitioning methods based on how you want to partition your data. The commonly used methods are:

Range Partitioning: Data is divided into partitions based on a specified range of values. For example, you can partition a sales table by date, where each partition represents a specific time range (e.g., one partition for each month).
List Partitioning: Data is divided into partitions based on a specified list of values. For instance, you can partition a customer table by region, where each partition contains customers from a specific region (e.g., one partition for customers in the US, another for customers in Europe).
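Hash Partitioning: Data is distributed across partitions using a hash function. This method provides a way to evenly distribute data among partitions, which can be useful for load balancing. However, the partitions have no meaningful grouping: an equality condition on the partition key can still be narrowed to a single partition, but a range condition on the key has to scan every partition.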

Creating partitioned tables

Let's consider an example of range partitioning for a sales table based on the order date. We'll create monthly partitions to store data for each month:


create table sales (
    id bigint generated by default as identity,
    order_date date not null,
    customer_id bigint,
    amount bigint,

    -- We need to include all the
    -- partitioning columns in constraints:
    primary key (order_date, id)
)
partition by range (order_date);

create table sales_2000_01
    partition of sales
    for values from ('2000-01-01') to ('2000-02-01');

create table sales_2000_02
    partition of sales
    for values from ('2000-02-01') to ('2000-03-01');

To create a partitioned table, append partition by range (<column_name>) to the create table statement. The partitioning column must be part of every unique index on the table, including the primary key, which is why we specify a composite primary key here (primary key (order_date, id)).
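To see how rows end up in partitions, here is a minimal sketch that assumes the sales table and the two monthly partitions defined above. The inserted values are made up for illustration. Rows are inserted through the parent table, and the tableoid system column reports which partition stores each row:

```sql
-- Insert through the parent table; Postgres places each row
-- in the partition whose range contains its order_date.
insert into sales (order_date, customer_id, amount)
values
    ('2000-01-15', 1, 100),
    ('2000-02-03', 2, 250);

-- tableoid identifies the table that physically holds each row.
select tableoid::regclass as partition_name, id, order_date, amount
from sales
order by order_date;
```

An insert with an order_date outside every partition's range (for example '2000-03-10' with only these two partitions) is rejected with an error, because no partition can accept the row.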

Querying partitioned tables

To query a partitioned table, you have two options:

Querying the parent table
Querying specific partitions

Querying the parent table

When you query the parent table, Postgres automatically scans only the partitions that can contain rows matching the conditions in the query, a step known as partition pruning. A single query against the parent can therefore read from one partition or from several at once.

Example:


select *
from sales
where order_date >= '2000-01-01' and order_date < '2000-03-01';

This query will retrieve data from both the sales_2000_01 and sales_2000_02 partitions.
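One way to check which partitions a query touches is to look at its plan. The following is a sketch that assumes the sales table and partitions from the earlier example. The exact plan text depends on your Postgres version and data, so no output is shown:

```sql
-- Restrict the query to February 2000 only.
-- With partition pruning, the plan should reference
-- sales_2000_02 and leave out sales_2000_01.
explain
select *
from sales
where order_date >= '2000-02-01' and order_date < '2000-03-01';
```

If the plan lists every partition, check that the where clause filters directly on the partitioning column (order_date here), since pruning relies on conditions that Postgres can compare against the partition bounds.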

Querying specific partitions

If you only need to retrieve data from a specific partition, you can directly query that partition instead of the parent table. This approach is useful when you want to target a specific range or condition within a partition.


select *
from sales_2000_02;

This query will retrieve data only from the sales_2000_02 partition.

When to partition your tables

There is no real threshold to determine when you should use partitions. Partitions introduce complexity, and complexity should be avoided until it's needed. A few guidelines:

If you are considering performance, avoid partitions until you see performance degradation on non-partitioned tables.
If you are using partitions as a management tool, it's fine to create the partitions any time.
If you don't know how you should partition your data, then it's probably too early.
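As a sketch of the management use case, assuming the sales table and monthly partitions created earlier, an old month can be retired by detaching its partition and dropping it, rather than deleting rows from one large table:

```sql
-- Detach the January 2000 partition. It becomes a standalone table,
-- and its rows no longer appear in queries against sales.
alter table sales detach partition sales_2000_01;

-- Archive the standalone table first if you need the data,
-- then drop it to reclaim the space.
drop table sales_2000_01;
```

Dropping a table removes all of its rows in one operation, which is typically much cheaper than a large delete followed by vacuuming.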

Examples

Here are simple examples for each of the partitioning types in Postgres.

Range partitioning

Let's consider a range partitioning example for a table that stores sales data based on the order date. We'll create monthly partitions to store data for each month.

In this example, the sales table is partitioned into two partitions: sales_2000_01 and sales_2000_02. Each partition holds the rows whose order date falls within its range. The lower bound is inclusive and the upper bound is exclusive:


create table sales (
    id bigint generated by default as identity,
    order_date date not null,
    customer_id bigint,
    amount bigint,

    -- We need to include all the
    -- partitioning columns in constraints:
    primary key (order_date, id)
)
partition by range (order_date);

create table sales_2000_01
    partition of sales
    for values from ('2000-01-01') to ('2000-02-01');

create table sales_2000_02
    partition of sales
    for values from ('2000-02-01') to ('2000-03-01');
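As data for new months arrives, the table needs matching partitions. The sketch below assumes the sales table above. The names sales_2000_03 and sales_default are illustrative and not part of the original example, and the default partition requires Postgres 11 or later:

```sql
-- Add a partition for March 2000 (hypothetical name).
create table sales_2000_03
    partition of sales
    for values from ('2000-03-01') to ('2000-04-01');

-- Optional catch-all for rows that match no other partition
-- (hypothetical name; default partitions need Postgres 11+).
create table sales_default
    partition of sales
    default;
```

A default partition prevents insert errors for unexpected dates, but rows that pile up there lose the benefits of pruning, so it works best as a safety net rather than a substitute for creating partitions ahead of time.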

List partitioning

Let's consider a list partitioning example for a table that stores customer data by country. We'll create partitions that group customers from countries in the same region.

In this example, the customers table is partitioned into two partitions: customers_americas and customers_asia. Each partition holds the rows whose country appears in its list of values:


-- Create the partitioned table
create table customers (
    id bigint generated by default as identity,
    name text,
    country text,

    -- We need to include all the
    -- partitioning columns in constraints:
    primary key (country, id)
)
partition by list (country);

create table customers_americas
    partition of customers
    for values in ('US', 'CANADA');

create table customers_asia
    partition of customers
    for values in ('INDIA', 'CHINA', 'JAPAN');
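To confirm how list values map to partitions, here is a short sketch that assumes the customers table above. The customer names are made up for illustration:

```sql
-- Each row goes to the partition whose list contains its country.
insert into customers (name, country)
values
    ('Ada', 'US'),
    ('Ravi', 'INDIA'),
    ('Yuki', 'JAPAN');

-- Count rows per partition.
select tableoid::regclass as partition_name, count(*) as customer_count
from customers
group by tableoid;
```

List values are matched exactly, so a country that is not in any list (for example 'FRANCE', or 'us' in lowercase) is rejected unless a partition is added for it.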

Hash partitioning

You can use hash partitioning to evenly distribute data.

In this example, the products table is partitioned into two partitions: products_one and products_two. The data is distributed across these partitions using a hash function:


create table products (
    id bigint generated by default as identity,
    name text,
    category text,
    price bigint
)
partition by hash (id);

create table products_one
    partition of products
    for values with (modulus 2, remainder 1);

create table products_two
    partition of products
    for values with (modulus 2, remainder 0);
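To get a feel for how evenly the hash function spreads rows, the sketch below assumes the products table above and loads 1,000 made-up rows. The split should be roughly even, but the exact counts depend on the hash values, so none are shown:

```sql
-- Load sample rows; the id column is filled in by the identity default.
insert into products (name, category, price)
select 'product ' || n, 'demo', n * 100
from generate_series(1, 1000) as n;

-- Count how many rows landed in each partition.
select tableoid::regclass as partition_name, count(*) as row_count
from products
group by tableoid;
```

Because the modulus is 2, every hash value has a remainder of 0 or 1, so the two partitions together accept every row.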

Other tools

There are several other tools available for Postgres partitioning, most notably pg_partman. Native (declarative) partitioning was introduced in Postgres 10 and is generally thought to perform better than the older approach built on table inheritance and triggers.
