Interpreting Supabase Grafana IO charts

Last edited: 1/17/2025

See the Supabase Grafana Installation Guide for setup instructions.

There are two primary values that matter for IO:

  • Disk Throughput: how much data can be moved to and from disk per second
  • IOPS (input/output operations per second): how many read/write operations can be performed against your disk per second

Each compute instance has unique IO settings. The settings at the time this was written are listed below.

Plan          Baseline Disk Throughput   Baseline IOPS
Nano (free)   43 Mbps                    250 IOPS
Micro         87 Mbps                    500 IOPS
Small         174 Mbps                   1,000 IOPS
Medium        347 Mbps                   2,000 IOPS
Large         630 Mbps                   3,000 IOPS
XL            1,048 Mbps                 3,000 IOPS
2XL           1,048 Mbps                 3,000 IOPS
4XL           1,048 Mbps                 3,000 IOPS
8XL           1,048 Mbps                 3,000 IOPS
12XL          1,048 Mbps                 3,000 IOPS
16XL          1,048 Mbps                 3,000 IOPS

Compute sizes below XL have burst budgets to accommodate their lower baselines: these instances can use up to 1,048 Mbps of disk throughput and 3,000 IOPS for 30 minutes before returning to their baseline limits.

Several Grafana metrics indicate IO strain.

This example shows a 16XL database exhibiting severe IO strain:

[image: Grafana dashboard of a 16XL database under severe IO strain]

Its Disk IOPS is constantly near peak capacity:

[image: Disk IOPS chart near peak capacity]

Its throughput is also high:

[image: Disk throughput chart]

As a side effect, its CPU is burdened by heavy Busy Iowait activity:

[image: CPU chart with heavy Busy Iowait]

Excessive IO usage is problematic: it indicates that your database is performing more IO than it is provisioned to handle. Common causes include:

  • Excessive and needless sequential scans: poorly indexed tables force queries to scan the disk (guide to resolve); see the diagnostic queries after this list
  • Too little cache: there is not enough memory, so instead of being read from the memory cache, data is fetched from disk (guide to inspect)
  • Poorly optimized RLS policies: RLS policies that rely heavily on joins are more likely to hit disk. If possible, they should be optimized (RLS best practice guide)
  • Excessive bloat: this is the least likely to cause major issues, but bloat takes up space, preventing data on disk from being stored in the same locality. This can force the database to scan more pages than necessary. (explainer guide)
  • Uploading high amounts of data: temporarily increase the compute add-on size for the duration of the uploads
  • Insufficient memory: an inadequate amount of memory forces queries to hit disk instead of the memory cache. Addressing memory issues (guide) can reduce disk strain.
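
To help narrow down the first two causes, the queries below are a minimal sketch using standard Postgres statistics views; the 0.99 threshold is only a rough rule of thumb and should be tuned to your workload.

-- Tables with the heaviest sequential-scan activity (candidates for missing indexes)
select schemaname, relname, seq_scan, seq_tup_read, idx_scan
from pg_stat_user_tables
order by seq_tup_read desc
limit 10;

-- Approximate cache hit ratio; a value well below 0.99 suggests the working
-- set no longer fits in memory and reads are spilling to disk
select
  sum(heap_blks_hit)::numeric
    / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as cache_hit_ratio
from pg_statio_user_tables;

Tables with a large seq_tup_read and few index scans are the usual candidates for new indexes.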

If a database exhibits these symptoms for prolonged periods, there are a few primary approaches:

  • Scale the database to get more IO if possible
  • Optimize queries/tables or refactor database/app to reduce IO
  • Spin up a read replica
  • Modify IO configs in the Database Settings
  • Partition very large tables to minimize the data pulled from disk (see the sketch after this list)
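
For the partitioning approach, here is a minimal sketch of declarative range partitioning; the events table, the created_at key, and the monthly ranges are hypothetical and should be adapted to whatever column your large queries filter on.

-- Hypothetical example: partition a large table by month so that queries
-- filtering on created_at only read the partitions they need
create table events (
  id         bigint generated always as identity,
  created_at timestamptz not null,
  payload    jsonb
) partition by range (created_at);

-- One partition per month; new partitions must be created as time advances
create table events_2025_01 partition of events
  for values from ('2025-01-01') to ('2025-02-01');
create table events_2025_02 partition of events
  for values from ('2025-02-01') to ('2025-03-01');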


Esoteric factors

Webhooks: Supabase webhooks use the pg_net extension to handle requests. The net.http_request_queue table isn't indexed to keep write costs low. However, if you upload millions of rows to a webhook-enabled table too quickly, it can significantly increase the read costs for the extension.

To check if reads are becoming expensive, run:


select count(*) as exact_count from net.http_request_queue;
-- the number should be relatively low (< 20,000)

If you encounter this issue, you can either:

  1. Increase your compute size to help handle the large volume of requests.

  2. Truncate the table to clear the queue:


TRUNCATE net.http_request_queue;