Interpreting Supabase Grafana IO charts
Last edited: 1/17/2025
There are two primary values that matter for IO:
- Disk Throughput: how much data can be moved to and from disk per second
- IOPS (Input/Output Operations per Second): how many read/write requests can be performed against your disk per second
Each compute size has its own IO settings. The settings at the time of writing are listed below:
Plan | Baseline Disk Throughput | Baseline IOPS |
---|---|---|
Nano (free) | 43 Mbps | 250 IOPS |
Micro | 87 Mbps | 500 IOPS |
Small | 174 Mbps | 1,000 IOPS |
Medium | 347 Mbps | 2,000 IOPS |
Large | 630 Mbps | 3,000 IOPS |
XL | 1,048 Mbps | 3,000 IOPS |
2XL | 1,048 Mbps | 3,000 IOPS |
4XL | 1,048 Mbps | 3,000 IOPS |
8XL | 1,048 Mbps | 3,000 IOPS |
12XL | 1,048 Mbps | 3,000 IOPS |
16XL | 1,048 Mbps | 3,000 IOPS |
To accommodate their lower baselines, compute sizes below XL have burst budgets: these instances are allowed to use 1,048 Mbps of disk throughput and 3,000 IOPS for up to 30 minutes before returning to their baseline behavior.
There are other metrics that indicate IO strain.
This example shows a 16XL database exhibiting severe IO strain:
Its Disk IOPS is constantly near peak capacity:
Its throughput is also high:
As a side effect, its CPU is burdened by heavy Busy IOWait activity:
Excessive IO usage is problematic because it indicates that your database is performing more IO than it is provisioned to handle. This can be caused by:
- Excessive and needless sequential scans: poorly indexed tables force requests to scan disk (guide to resolve); see the sequential-scan query after this list
- Too little cache: there is not enough memory, so instead of being read from the memory cache, data is fetched from disk (guide to inspect); see the cache-hit-ratio query after this list
- Poorly optimized RLS policies: policies that rely heavily on joins are more likely to hit disk. If possible, they should be optimized (RLS best practice guide); see the policy sketch after this list
- Excessive bloat: this is the least likely to cause major issues, but bloat takes up space and prevents related data on disk from being stored together, which can force the database to scan more pages than necessary (explainer guide)
- Uploading high amounts of data: temporarily increase the compute add-on size for the duration of the uploads
- Insufficient memory: an inadequate amount of memory forces queries to hit disk instead of the memory cache. Addressing memory issues (guide) can reduce disk strain
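For the sequential-scan cause, a quick way to spot poorly indexed tables is to compare sequential and index scan counts in PostgreSQL's `pg_stat_user_tables` view. This is a generic diagnostic sketch, not an official Supabase query, and the 50,000-row cutoff is only illustrative:

```sql
-- Tables that are read mostly via sequential scans despite holding many rows;
-- these are often candidates for an index. The row threshold is arbitrary.
select
  relname,
  seq_scan,
  idx_scan,
  n_live_tup
from pg_stat_user_tables
where seq_scan > coalesce(idx_scan, 0)
  and n_live_tup > 50000
order by seq_scan desc;
```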
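To gauge whether too little cache is forcing reads to disk, you can compute a cache hit ratio from `pg_statio_user_tables`. This is a common PostgreSQL heuristic rather than a Supabase-specific metric; healthy workloads typically stay above roughly 99%:

```sql
-- Approximate heap cache hit ratio; values well below ~0.99 suggest
-- data is frequently read from disk instead of shared buffers.
select
  sum(heap_blks_hit)::numeric
    / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as heap_cache_hit_ratio
from pg_statio_user_tables;
```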
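For the RLS cause, one widely used optimization is to wrap `auth.uid()` in a sub-select so it is evaluated once per statement rather than once per row. The `todos` table, its `user_id` column, and the policy name below are hypothetical:

```sql
-- Hypothetical policy: the sub-select lets the planner treat auth.uid()
-- as a constant, avoiding a function call for every row scanned.
create policy "Users can read their own todos"
on todos
for select
to authenticated
using ((select auth.uid()) = user_id);
```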
If a database exhibits some of these metrics for prolonged periods, there are a few primary approaches:
- Scale the database to get more IO if possible
- Optimize queries/tables or refactor database/app to reduce IO
- Spin-up a read-replica
- Modify IO configs in the Database Settings
- Partitions: generally should be used on very large tables to minimize the data pulled from disk; see the sketch after this list
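As a sketch of the partitioning approach above, PostgreSQL's declarative range partitioning keeps scans confined to the relevant slice of a very large table. The `events` table and date ranges here are hypothetical:

```sql
-- Hypothetical range-partitioned table: queries filtered by created_at
-- only touch the matching partitions, reducing the pages read from disk.
create table events (
  id bigint generated always as identity,
  created_at timestamptz not null,
  payload jsonb
) partition by range (created_at);

create table events_2025_01 partition of events
  for values from ('2025-01-01') to ('2025-02-01');

create table events_2025_02 partition of events
  for values from ('2025-02-01') to ('2025-03-01');
```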
Esoteric factors
Webhooks:
Supabase webhooks use the pg_net extension to handle requests. The net.http_request_queue
table isn't indexed to keep write costs low. However, if you upload millions of rows to a webhook-enabled table too quickly, it can significantly increase the read costs for the extension.
To check if reads are becoming expensive, run:
```sql
select count(*) as exact_count from net.http_request_queue;
-- the number should be relatively low (fewer than ~20,000)
```
If you encounter this issue, you can either:
- Increase your compute size to help handle the large volume of requests.
- Truncate the table to clear the queue:

```sql
TRUNCATE net.http_request_queue;
```