Importing into Supabase

Importing large datasets into Supabase.

Importing large datasets requires careful planning and execution to ensure the process is efficient, reliable, and doesn't disrupt your application. In this guide, we cover various best practices and techniques for importing datasets, and highlight common pitfalls to avoid.

How to import data into Supabase#

You have multiple options for importing your data into Supabase:

  1. CSV import via the Supabase dashboard
  2. Bulk import using pgloader
  3. Using the Postgres COPY command
  4. Using the Supabase API

Option 1: CSV Import via Supabase Dashboard#

Supabase dashboard provides a user-friendly way to import data. However, for very large datasets, this method may not be the most efficient choice, given the size limit is 100MB. It's generally better suited for smaller datasets and quick data imports. Consider using alternative methods like pgloader for large-scale data imports.

  1. Navigate to the relevant table in the Table Editor.
  2. Click on “Insert” then choose "Import Data from CSV" and follow the on-screen instructions to upload your CSV file.

Option 2: Bulk Import using pgloader#

pgloader is a powerful tool for efficiently importing data into a PostgreSQL database that supports a wide range of source database engines, including MySQL and MS SQL.

You can use it in conjunction with Supabase by following these steps:

  1. Install pgloader on your local machine or a server. For more info, you can refer to the official pgloader installation page.

    $ apt-get install pgloader

  2. Create a configuration file that specifies the source data and the target Supabase database (e.g., config.load). Here's an example configuration file:

    INTO postgres://postgres:password@db.xxxx.supabase.co:6543/postgres
    ALTER SCHEMA 'public' OWNER TO 'postgres';
    set wal_buffers = '64MB', max_wal_senders = 0, statement_timeout = 0, work_mem to '2GB';

    Customize the source and Supabase database URL and options to fit your specific use case:

    • wal_buffers: This parameter is set to '64MB' to allocate 64 megabytes of memory for write-ahead logging buffers. A larger value can help improve write performance by caching more data in memory before writing it to disk. This can be useful during data import operations to speed up the writing of transaction logs.
    • max_wal_senders: It is set to 0, to disable replication connections. This is done during the data import process to prevent replication-related conflicts and issues.
    • statement_timeout: The value is set to 0, which means it's disabled, allowing SQL statements to run without a time limit.
    • work_mem: It is set to '2GB', allocating 2 GB of memory for query operations. This enhances the performance of complex queries by allowing larger in-memory datasets.
  3. Run pgloader with the configuration file.

    pgloader config.load

For databases using the Postgres engine, we recommend using the pg_dump and psql command line tools.

Option 3: Using Postgres Copy command#

Read more about Bulk data loading.

Option 4: Using the Supabase API#

The Supabase API allows you to programmatically import data into your tables. You can use various client libraries to interact with the API and perform data import operations. This approach is useful when you need to automate data imports, and it gives you fine-grained control over the process. Refer to our API guide for more details.

Preparing to import data#

Before importing data, you should have a plan for timing, disk space, and more. Prepare your database in advance.

1. Statement Timeouts#

By default, Supabase enforces query statement timeouts to ensure fair resource allocation and prevent long-running queries from affecting the overall system. When importing large datasets, you may encounter timeouts. To address this:

  • Increase the Statement Timeout: You can adjust the statement timeout for your session or connection to accommodate longer-running queries. Be cautious when doing this, as excessively long queries can negatively impact system performance. Read more about Statement Timeouts.

2. Disk Size Considerations#

Large datasets consume disk space. Ensure your Supabase project has sufficient disk capacity to accommodate the imported data. If you know how big your database is going to be, you can manually increase the size in your projects database settings.

Read more about disk management.

3. Disabling Triggers#

When importing large datasets, it's often beneficial to disable triggers temporarily. Triggers can significantly slow down the import process, especially if they involve complex logic or referential integrity checks. After the import, you can re-enable the triggers.

To disable triggers, use the following SQL commands:

-- Disable triggers on a specific table
-- To re-enable triggers

4. Building Indices at the End#

Indexing is crucial for query performance, but building indices while importing a large dataset can be time-consuming. Consider building or rebuilding indices after the data import is complete. This approach can significantly speed up the import process and reduce the overall time required.

To build an index after the data import:

-- Create an index on a table
create index index_name on table_name (column_name);

Read more about Managing Indexes in PostgreSQL.

Common Pitfalls to Avoid#

  • Not Planning Ahead: Importing large datasets should be well-planned. Ensure you have enough disk space, account for statement timeouts, and know which tools to use.
  • Not Monitoring Progress: Keep an eye on the import progress, especially when using custom scripts or tools. Monitor for any errors or issues.
  • Not Backing Up Data: Before any major import, create a backup of your existing data to ensure you can recover if something goes wrong. Read more about Database Backups.
  • Neglecting Indexing: Don't forget to build necessary indexes after importing data for optimal query performance.

By following these best practices and avoiding common pitfalls, you can successfully import large datasets into your Supabase project while maintaining data integrity and system performance.