How Postgres chooses which index to use
Last edited: 1/17/2025
For the curious: here is a list of all built-in indexes in Postgres
Postgres internals
How an index is chosen
PostgreSQL, internally, contains a few components that manage query execution:
Module | Description |
---|---|
Parser | Converts SQL into an easily traversable query tree |
Planner/Optimizer | Takes the query tree and uses rules and database statistics to find the optimal strategy for getting the data |
Executor | Executes the plan created by the planner |
The planner will consider using an index when an indexed column is present in a filter statement, such as:
WHERE
LIKE
ILIKE
DISTINCT
SIMILAR TO
JOIN
ORDER BY
Otherwise, it will likely perform a full table scan (sequential scan).
In the majority of cases, the indexed column must not only be present but also must be filtered by a comparison operator (=
, >
, <>
) that is compatible with the index.
As an example, one can create the following table:
Column Name | Data Type |
---|---|
id | INT |
data | JSONB |
On the data column, a GIN index can be applied, which is excellent for filtering JSONB datatypes:
_10CREATE INDEX some_arbitary_index_name ON some_table USING gin (data);
Here's a link to the list operators supported by the GIN index; notably, it does not support greater than >
:
_10-- GIN index will never be used_10select *_10from some_table_10where data -> val > 5;
GIN does support the @>
operator:
_10--GIN will be considered_10SELECT id FROM some_table_10WHERE data @> '[ { "itemId": "p11" } ]';
In most cases, developers work with the default BTREE index. It is the most practical and performant in the majority of cases and is compatible with the following filter operators:
Comparison Operator |
---|
< |
<= |
= |
>= |
> |
An operator's functional equivalents, such as IN
, BETWEEN
, and ANY
, are also valid.
However, just because the base requirements (relevant column, filter, and operators) are present, doesn't mean that an index will be used.
Indexes have a startup cost, so for small tables, Postgres might use a sequential scan if it believes that it will take less time. The database keeps statistics about each table that it uses to inform these choices.
In very rare cases, these statistics can become stale, and Postgres may opt to use a slower index or sequential scan when a better option is available.
You can see the query plan with the EXPLAIN
keyword:
_10EXPLAIN <your query>
To understand how to interpret its output, you can check out this explainer.
To reset statistics within the database, you can use the following query:
_10-- use judiciously_10select pg_stat_reset();
Complex or composite indexes
For a more complete rundown, check the Postgres Official Docs
Multi-column indexes
If you make independent indexes on multiple columns, Postgres will likely use each of them independently to find the relevant rows and then combine the results together.
It is possible to make multi-column indexes. If you are regularly filtering against multiple columns, there can be performance benefits using them instead of several independent indexes.
_10-- multi-column index_10create index test2_mm_idx on test2 (major, minor);_10_10-- multi-column comparison:_10select name_10from test2_10where major = constant and minor = constant;
Ordered indexes
If you're using an ORDER BY clause, indexes can also be pre-sorted by DESC/ASC for better performance.
_10-- organizes the index in a DESC order, places NULL values at the end_10CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);
Functional indexes
Although not as common, indexes can also be leveraged against modified values, such as when using a LOWER function:
_10-- Index on modified column through function_10create index test1_lower_col1_idx on test1 (lower(col1));_10_10-- Index will be considered for the following query:_10select * from test1 where lower(col1) = 'value';
Covering indexes
Indexes contain pointers to a specific row, but you could instruct an index to actually hold a copy of a column's value for even faster retrieval. These are known as covering
indexes. Because maintaining a copy is storage intensive, you should avoid using it for values with large data footprints. FULL VIDEO ON TOPIC
_10CREATE INDEX a_b_idx ON x (a,b) INCLUDE (c);
Indexes on JSONB
Although a GIN/GIST index can be used to index entire JSONB bodies, you can also target just specific Key-values with standard BTREE indexes:
_10-- Example table_10create table person (_10 id serial primary key,_10 data jsonb_10);_10_10create index index_name on person ((data ->> 'name'));