Managing Indexes in PostgreSQL
An index makes your Postgres queries faster. The index is like a "table of contents" for your data - a reference list which allows queries to quickly locate a row in a given table without needing to scan the entire table (which in large tables can take a long time).
Indexes can be structured in a few different ways. The type of index chosen depends on the values you are indexing. By far the most common index type, and the default in Postgres, is the B-Tree. A B-Tree is the generalized form of a binary search tree, where nodes can have more than two children.
Even though indexes improve query performance, the Postgres query planner may not always make use of a given index when choosing which optimizations to make. Additionally indexes come with some overhead - additional writes and increased storage - so it's useful to understand how and when to use indexes, if at all.
Create an index#
Let's take an example table:
_10create table persons (_10id bigint generated by default as identity primary key,_10age int,_10height int,_10weight int,_10name text,_10deceased boolean_10);
We might want to frequently query users based on their age:
_10select name from persons where age = 32;
Without an index, Postgres will scan every row in the table to find equality matches on age.
You can verify this by doing an explain on the query:
_10explain select name from persons where age = 32;
_10Seq Scan on persons (cost=0.00..22.75 rows=x width=y)_10Filter: (age = 32)
To add a simple B-Tree index you can run:
_10create index idx_persons_age on persons (age);
It can take a long time to build indexes on large datasets and the default behaviour of
create index is to lock the table from writes.
Luckily Postgres provides us with
create index concurrently which prevents blocking writes on the table, but does take a bit longer to build.
Here is a simplified diagram of the index we just created (note that in practice, nodes actually have more than two children).
You can see that in any large data set, traversing the index to locate a given value can be done in much less operations (O(log n)) than compared to scanning the table one value at a time from top to bottom (O(n)).
If you are frequently querying a subset of rows then it may be more efficient to build a partial index. In our example, perhaps we only want to match on
deceased is false. We could build a partial index:
_10create index idx_living_persons_age on persons (age)_10where deceased is false;
By default B-Tree indexes are sorted in ascending order, but sometimes you may want to provide a different ordering. Perhaps our application has a page featuring the top 10 oldest people. Here we would want to sort in descending order, and include
NULL values last. For this we can use:
_10create index idx_persons_age_desc on persons (age desc nulls last);
After a while indexes can become stale and may need rebuilding. Postgres provides a
reindex command for this, but due to Postgres locks being placed on the index during this process, you may want to make use of the
_10reindex index concurrently idx_persons_age;
Alternatively you can reindex all indexes on a particular table:
_10reindex table concurrently persons;
Take note that
reindex can be used inside a transaction, but
reindex [index/table] concurrently cannot.