Increase vector lookup speeds by applying an HSNW index

Last edited: 1/17/2025

Although this guide is specifically for HSNW indexes, it can be generalized to work for any index type

Building an index without the CONCURRENTLY modifier will lock the table, but it will also increase build times. For general advice about indexes, check out this guide.

To speed up queries, it is ideal to create an HSNW index on your embedded column

The general structure for creating an hsnw index follows this pattern:


_10
CREATE INDEX <custom name of index> ON <table name> USING hnsw (<vectorized column> <search type>);

Search can be one of three types:

operatordescriptionsearch type
<->Euclidean distancevector_l2_ops
<#>negative inner productvector_ip_ops
<=>cosine distancevector_cosine_ops

Queries can only utilize the index if it matches the search type used. If you are unsure which search type to prioritize, vector_cosine_ops is the most commonly used. You can checkout our guide for more info. The folks at Crunchy Data also wrote an explainer that you may find useful.

Applying an index can be slow and computationally expensive, so there are a few preparations that should be made beforehand:

1. Make sure your pgvector is the latest available version on Supabase.

Versions 0.6 and later have accelerated HNSW build speeds. You can observe your current version in the Dashboard's Extensions Page. You can perform a software upgrade in the Infrastructure Settings if necessary.

2. Setting up an external connection

The Dashboard has an internal time limit of ~2 minutes for queries. Indexing a large table will almost always take more time, so it is necessary to execute your code through an external interface, such as PSQL.

You can install PSQL in macOS and Windows by following these links and instructions. For Linux (Debian) you can run the following:


_10
sudo apt-get update
_10
sudo apt-get install postgresql-client

Once installed, you can find your PSQL string from the Database Settings, which can be executed in the terminal to create a psql session.

If your network can use IPv6, consider using the direct connection string instead of Supavisor. It's not mandatory, but for tasks that run a long time, it's best to reduce network complexity. To check if your network is compatible, use this cURL command to request your IPv6 address:


_10
curl -6 https://ifconfig.co/ip

If an address is returned, you should be able to use your direct connection string found in the Database Settings:

3. Increase memory for index creation (optional)

The maintance_work_mem variable limits the maximum amount of memory that can be used by maintenance operations, such as vacuuming, altering, and indexing tables. In your session you should try to set it to a reasonably high value:


_10
set maintenance_work_mem to <several Gb's>; -- '#GB'

Inspect value to make sure it has been set:


_10
show maintenance_work_mem;

4. Increase cores for index creation (optional)

The max_parallel_maintenance_workers variable limits the amount of cores that can be used by maintenance operations, including indexing tables. In your session, you should try to set it to a value roughly 1/2 to 2/3s of your compute core amount:


_10
set max_parallel_maintenance_workers to <integer>;

Inspect value to make sure it has been set:


_10
show max_parallel_maintenance_workers;

5. Setting a custom timeout

Disable query timeout for your connection:


_10
set statement_timeout = '0';

Inspect value to make sure it has been set:


_10
show statement_timeout;

6. Consider temporarily upgrading your compute size (optional)

If your task is particularly long, you can speed it up by boosting your computing power temporarily. Compute size is charged by the hour, so you can increase it for an hour or two to finish your task faster, then scale it back afterward. Here is a list of compute add-ons. If you want to temporarily upgrade, you can find the add-ons for your project in your Dashboard's Add-Ons Settings.

7. Consider increasing disk size (optional)

HSNW indexes can produce temporary files during their construction that may consume a few GBs worth of disk. Consider increasing the disk size in the Database Settings to accommodate for short-term disk stress.

Screenshot 2024-06-10 at 8 00 28 PM