Storing OpenAI embeddings in Postgres with pgvector

A new PostgreSQL extension is now available in Supabase: pgvector, an open-source vector similarity search.

The exponential progress of AI functionality over the past year has inspired many new real world applications. One specific challenge has been the ability to store and query embeddings at scale. In this post we'll explain what embeddings are, why we might want to use them, and how we can store and query them in PostgreSQL using pgvector.

🆕 Supabase has now released an open source toolkit for developing AI applications using Postgres and pgvector. Learn more in the AI & Vectors docs.

What are embeddings?

Embeddings capture the “relatedness” of text, images, video, or other types of information. This relatedness is most commonly used for:

Search: how similar is a search term to a body of text?
Recommendations: how similar are two products?
Classifications: how do we categorize a body of text?
Clustering: how do we identify trends?

Let's explore an example of text embeddings. Say we have three phrases:

“The cat chases the mouse”
“The kitten hunts rodents”
“I like ham sandwiches”

Your job is to group phrases with similar meaning. If you are a human, this should be obvious. Phrases 1 and 2 are almost identical, while phrase 3 has a completely different meaning.

Although phrases 1 and 2 are similar, they share no common vocabulary (besides “the”). Yet their meanings are nearly identical. How can we teach a computer that these are the same?

Human language

Humans use words and symbols to communicate language. But words in isolation are mostly meaningless - we need to draw from shared knowledge & experience in order to make sense of them. The phrase “You should Google it” only makes sense if you know that Google is a search engine and that people have been using it as a verb.

In the same way, we need to train a neural network model to understand human language. An effective model should be trained on millions of different examples to understand what each word, phrase, sentence, or paragraph could mean in different contexts.

So how does this relate to embeddings?

How do embeddings work?

Embeddings compress discrete information (words & symbols) into distributed continuous-valued data (vectors). If we took our phrases from before and plot them on a chart, it might look something like this:

Phrases 1 and 2 would be plotted close to each other, since their meanings are similar. We would expect phrase 3 to live somewhere far away since it isn't related. If we had a fourth phrase, “Sally ate Swiss cheese”, this might exist somewhere between phrase 3 (cheese can go on sandwiches) and phrase 1 (mice like Swiss cheese).

In this example we only have 2 dimensions: the X and Y axis. In reality, we would need many more dimensions to effectively capture the complexities of human language.

OpenAI embeddings

OpenAI offers an API to generate embeddings for a string of text using its language model. You feed it any text information (blog articles, documentation, your company's knowledge base), and it will output a vector of floating point numbers that represents the “meaning” of that text.

Compared to our 2-dimensional example above, their latest embedding model text-embedding-ada-002 will output 1536 dimensions.

Why is this useful? Once we have generated embeddings on multiple texts, it is trivial to calculate how similar they are using vector math operations like cosine distance. A perfect use case for this is search. Your process might look something like this:

Pre-process your knowledge base and generate embeddings for each page
Store your embeddings to be referenced later (more on this)
Build a search page that prompts your user for input
Take user's input, generate a one-time embedding, then perform a similarity search against your pre-processed embeddings.
Return the most similar pages to the user

Embeddings in practice

At a small scale, you could store your embeddings in a CSV file, load them into Python, and use a library like numPy to calculate similarity between them using something like cosine distance or dot product. OpenAI has a cookbook example that does just that. Unfortunately this likely won't scale well:

What if I need to store and search over a large number of documents and embeddings (more than can fit in memory)?
What if I want to create/update/delete embeddings dynamically?
What if I'm not using Python?

Using PostgreSQL

Enter pgvector, an extension for PostgreSQL that allows you to both store and query vector embeddings within your database. Let's try it out.

First we'll enable the Vector extension. In Supabase, this can be done from the web portal through Database → Extensions. You can also do this in SQL by running:


_10create extension vector;

Next let's create a table to store our documents and their embeddings:


_10create table documents (
_10  id bigserial primary key,
_10  content text,
_10  embedding vector(1536)
_10);

pgvector introduces a new data type called vector. In the code above, we create a column named embedding with the vector data type. The size of the vector defines how many dimensions the vector holds. OpenAI's text-embedding-ada-002 model outputs 1536 dimensions, so we will use that for our vector size.

We also create a text column named content to store the original document text that produced this embedding. Depending on your use case, you might just store a reference (URL or foreign key) to a document here instead.

Soon we're going to need to perform a similarity search over these embeddings. Let's create a function to do that:


_21create or replace function match_documents (
_21  query_embedding vector(1536),
_21  match_threshold float,
_21  match_count int
_21)
_21returns table (
_21  id bigint,
_21  content text,
_21  similarity float
_21)
_21language sql stable
_21as $$
_21  select
_21    documents.id,
_21    documents.content,
_21    1 - (documents.embedding <=> query_embedding) as similarity
_21  from documents
_21  where documents.embedding <=> query_embedding < 1 - match_threshold
_21  order by documents.embedding <=> query_embedding
_21  limit match_count;
_21$$;

pgvector introduces 3 new operators that can be used to calculate similarity:

Operator	Description
`<->`	Euclidean distance
`<#>`	negative inner product
`<=>`	cosine distance

OpenAI recommends cosine similarity on their embeddings, so we will use that here.

Now we can call match_documents(), pass in our embedding, similarity threshold, and match count, and we'll get a list of all documents that match. And since this is all managed by Postgres, our application code becomes very simple.

Indexing

Once your table starts to grow with embeddings, you will likely want to add an index to speed up queries. Vector indexes are particularly important when you're ordering results because vectors are not grouped by similarity, so finding the closest by sequential scan is a resource-intensive operation.

Each distance operator requires a different type of index. We expect to order by cosine distance, so we need vector_cosine_ops index. A good starting number of lists is 4 * sqrt(table_rows):


_10create index on documents using ivfflat (embedding vector_cosine_ops)
_10with
_10  (lists = 100);

You can read more about indexing on pgvector's GitHub page here.

Generating embeddings

Let's use JavaScript to generate embeddings and store them in Postgres:


_29import { createClient } from '@supabase/supabase-js'
_29import { Configuration, OpenAIApi } from 'openai'
_29import { supabaseClient } from './lib/supabase'
_29
_29async function generateEmbeddings() {
_29  const configuration = new Configuration({ apiKey: '<YOUR_OPENAI_API_KEY>' })
_29  const openAi = new OpenAIApi(configuration)
_29
_29  const documents = await getDocuments() // Your custom function to load docs
_29
_29  // Assuming each document is a string
_29  for (const document of documents) {
_29    // OpenAI recommends replacing newlines with spaces for best results
_29    const input = document.replace(/\n/g, ' ')
_29
_29    const embeddingResponse = await openai.createEmbedding({
_29      model: 'text-embedding-ada-002',
_29      input,
_29    })
_29
_29    const [{ embedding }] = embeddingResponse.data.data
_29
_29    // In production we should handle possible errors
_29    await supabaseClient.from('documents').insert({
_29      content: document,
_29      embedding,
_29    })
_29  }
_29}

Building a simple search function

Finally, let's create an Edge Function to perform our similarity search:


_45import { serve } from 'https://deno.land/std@0.170.0/http/server.ts'
_45import 'https://deno.land/x/xhr@0.2.1/mod.ts'
_45import { createClient } from 'jsr:@supabase/supabase-js@2'
_45import { Configuration, OpenAIApi } from 'https://esm.sh/openai@3.1.0'
_45import { supabaseClient } from './lib/supabase'
_45
_45export const corsHeaders = {
_45  'Access-Control-Allow-Origin': '*',
_45  'Access-Control-Allow-Headers': 'authorization, x-client-info, apikey, content-type',
_45}
_45
_45serve(async (req) => {
_45  // Handle CORS
_45  if (req.method === 'OPTIONS') {
_45    return new Response('ok', { headers: corsHeaders })
_45  }
_45
_45  // Search query is passed in request payload
_45  const { query } = await req.json()
_45
_45  // OpenAI recommends replacing newlines with spaces for best results
_45  const input = query.replace(/\n/g, ' ')
_45
_45  const configuration = new Configuration({ apiKey: '<YOUR_OPENAI_API_KEY>' })
_45  const openai = new OpenAIApi(configuration)
_45
_45  // Generate a one-time embedding for the query itself
_45  const embeddingResponse = await openai.createEmbedding({
_45    model: 'text-embedding-ada-002',
_45    input,
_45  })
_45
_45  const [{ embedding }] = embeddingResponse.data.data
_45
_45  // In production we should handle possible errors
_45  const { data: documents } = await supabaseClient.rpc('match_documents', {
_45    query_embedding: embedding,
_45    match_threshold: 0.78, // Choose an appropriate threshold for your data
_45    match_count: 10, // Choose the number of matches
_45  })
_45
_45  return new Response(JSON.stringify(documents), {
_45    headers: { ...corsHeaders, 'Content-Type': 'application/json' },
_45  })
_45})

Building a smarter search function

ChatGPT doesn't just return existing documents. It's able to assimilate a variety of information into a single, cohesive answer. To do this, we need to provide GPT with some relevant documents, and a prompt that it can use to formulate this answer.

One of the biggest challenges of OpenAI's text-davinci-003 completion model is the 4000 token limit. You must fit both your prompt and the resulting completion within the 4000 tokens. This makes it challenging if you wanted to prompt GPT-3 to answer questions about your own custom knowledge base that would never fit in a single prompt.

Embeddings can help solve this by splitting your prompts into a two-phased process:

Query your embedding database for the most relevant documents related to the question
Inject these documents as context for GPT-3 to reference in its answer

Here's another Edge Function that expands upon the simple example above:


_100import { serve } from 'https://deno.land/std@0.170.0/http/server.ts'
_100import 'https://deno.land/x/xhr@0.2.1/mod.ts'
_100import { createClient } from 'jsr:@supabase/supabase-js@2'
_100import GPT3Tokenizer from 'https://esm.sh/gpt3-tokenizer@1.1.5'
_100import { Configuration, OpenAIApi } from 'https://esm.sh/openai@3.1.0'
_100import { oneLine, stripIndent } from 'https://esm.sh/common-tags@1.8.2'
_100import { supabaseClient } from './lib/supabase'
_100
_100export const corsHeaders = {
_100  'Access-Control-Allow-Origin': '*',
_100  'Access-Control-Allow-Headers': 'authorization, x-client-info, apikey, content-type',
_100}
_100
_100serve(async (req) => {
_100  // Handle CORS
_100  if (req.method === 'OPTIONS') {
_100    return new Response('ok', { headers: corsHeaders })
_100  }
_100
_100  // Search query is passed in request payload
_100  const { query } = await req.json()
_100
_100  // OpenAI recommends replacing newlines with spaces for best results
_100  const input = query.replace(/\n/g, ' ')
_100
_100  const configuration = new Configuration({ apiKey: '<YOUR_OPENAI_API_KEY>' })
_100  const openai = new OpenAIApi(configuration)
_100
_100  // Generate a one-time embedding for the query itself
_100  const embeddingResponse = await openai.createEmbedding({
_100    model: 'text-embedding-ada-002',
_100    input,
_100  })
_100
_100  const [{ embedding }] = embeddingResponse.data.data
_100
_100  // Fetching whole documents for this simple example.
_100  //
_100  // Ideally for context injection, documents are chunked into
_100  // smaller sections at earlier pre-processing/embedding step.
_100  const { data: documents } = await supabaseClient.rpc('match_documents', {
_100    query_embedding: embedding,
_100    match_threshold: 0.78, // Choose an appropriate threshold for your data
_100    match_count: 10, // Choose the number of matches
_100  })
_100
_100  const tokenizer = new GPT3Tokenizer({ type: 'gpt3' })
_100  let tokenCount = 0
_100  let contextText = ''
_100
_100  // Concat matched documents
_100  for (let i = 0; i < documents.length; i++) {
_100    const document = documents[i]
_100    const content = document.content
_100    const encoded = tokenizer.encode(content)
_100    tokenCount += encoded.text.length
_100
_100    // Limit context to max 1500 tokens (configurable)
_100    if (tokenCount > 1500) {
_100      break
_100    }
_100
_100    contextText += `${content.trim()}\n---\n`
_100  }
_100
_100  const prompt = stripIndent`${oneLine`
_100    You are a very enthusiastic Supabase representative who loves
_100    to help people! Given the following sections from the Supabase
_100    documentation, answer the question using only that information,
_100    outputted in markdown format. If you are unsure and the answer
_100    is not explicitly written in the documentation, say
_100    "Sorry, I don't know how to help with that."`}
_100
_100    Context sections:
_100    ${contextText}
_100
_100    Question: """
_100    ${query}
_100    """
_100
_100    Answer as markdown (including related code snippets if available):
_100  `
_100
_100  // In production we should handle possible errors
_100  const completionResponse = await openai.createCompletion({
_100    model: 'text-davinci-003',
_100    prompt,
_100    max_tokens: 512, // Choose the max allowed tokens in completion
_100    temperature: 0, // Set to 0 for deterministic results
_100  })
_100
_100  const {
_100    id,
_100    choices: [{ text }],
_100  } = completionResponse.data
_100
_100  return new Response(JSON.stringify({ id, text }), {
_100    headers: { ...corsHeaders, 'Content-Type': 'application/json' },
_100  })
_100})

Streaming results

OpenAI API responses take longer to depending on the length of the “answer”. ChatGPT has a nice UX for this by streaming the response to the user immediately. You can see a similar effect for the Supabase docs:

The OpenAI API supports completion streaming with Server Side Events. Supabase Edge Functions are run Deno, which also supports Server Side Events. Check out this commit to see how we modified the Function above to build a streaming interface.

Wrap up

Storing embeddings in Postgres opens a world of possibilities. You can combine your search function with telemetry functions, add an user-provided feedback (thumbs up/down), and make your search feel more integrated with your products.

The pgvector extension is available on all new Supabase projects today. To try it out, launch a new Postgres database: database.new

Storing OpenAI embeddings in Postgres with pgvector

What are embeddings?

Human language

How do embeddings work?

OpenAI embeddings

Embeddings in practice

Using PostgreSQL

Indexing

Generating embeddings

Building a simple search function

Building a smarter search function

Streaming results

Wrap up

More pgvector and AI resources

Supabase Clippy: ChatGPT for Supabase Docs

Supabase Beta December 2022

Build in a weekend, scale to millions

Storing OpenAI embeddings in Postgres with pgvector

What are embeddings?#

Human language#

How do embeddings work?#

OpenAI embeddings#

Embeddings in practice#

Using PostgreSQL#

Indexing#

Generating embeddings#

Building a simple search function#

Building a smarter search function#

Streaming results#

Wrap up#

More pgvector and AI resources#

Supabase Clippy: ChatGPT for Supabase Docs

Supabase Beta December 2022

Build in a weekend, scale to millions

What are embeddings?

Human language

How do embeddings work?

OpenAI embeddings

Embeddings in practice

Using PostgreSQL

Indexing

Generating embeddings

Building a simple search function

Building a smarter search function

Streaming results

Wrap up

More pgvector and AI resources