AI & Vectors

Hybrid search

Combine keyword search with semantic search.

Hybrid search combines full text search (searching by keyword) with semantic search (searching by meaning) to identify results that are both directly and contextually relevant to the user's query.

Sometimes a single search method doesn't quite capture what a user is really looking for. For example, if a user searches for "Italian recipes with tomato sauce" on a cooking app, a keyword search would pull up recipes that specifically mention "Italian," "recipes," and "tomato sauce" in the text. However, it might miss out on dishes that are quintessentially Italian and use tomato sauce but don't explicitly label themselves with these words, or use variations like "pasta sauce" or "marinara." On the other hand, a semantic search might understand the culinary context and find recipes that match the intent, such as a traditional "Spaghetti Marinara," even if they don't match the exact keyword phrase. However, it could also suggest recipes that are contextually related but not what the user is looking for, like a "Mexican salsa" recipe, because it understands the context to be broadly about tomato-based sauces.

Hybrid search combines the strengths of both these methods. It would ensure that recipes explicitly mentioning the keywords are prioritized, thus capturing direct hits that satisfy the keyword criteria. At the same time, it would include recipes identified through semantic understanding as being related in meaning or context, like different Italian dishes that traditionally use tomato sauce but might not have been tagged explicitly with the user's search terms. It identifies results that are both directly and contextually relevant to the user's query while ideally minimizing misses and irrelevant suggestions.

The decision to use hybrid search depends on what your users are looking for in your app. For a code repository where developers need to find exact lines of code or error messages, keyword search is likely ideal because it matches specific terms. In a mental health forum where users search for advice or experiences related to their feelings, semantic search may be better because it finds results based on the meaning of a query, not just specific words. For a shopping app where customers might search for specific product names yet also be open to related suggestions, hybrid search combines the best of both worlds - finding exact matches while also uncovering similar products based on the shopping context.

How to combine search methods

Hybrid search merges keyword search and semantic search, but how does this process work?

First, each search method is executed separately. Keyword search, which involves searching by specific words or phrases present in the content, will yield its own set of results. Similarly, semantic search, which involves understanding the context or meaning behind the search query rather than the specific words used, will generate its own unique results.

Now with these separate result lists available, the next step is to combine them into a single, unified list. This is achieved through a process known as “fusion”. Fusion takes the results from both search methods and merges them together based on a certain ranking or scoring system. This system may prioritize certain results based on factors like their relevance to the search query, their ranking in the individual lists, or other criteria. The result is a final list that integrates the strengths of both keyword and semantic search methods.

Reciprocal Ranked Fusion (RRF)

One of the most common fusion methods is Reciprocal Ranked Fusion (RRF). The key idea behind RRF is to give more weight to the top-ranked items in each individual result list when building the final combined list.

In RRF, we iterate over each record and assign a score (noting that each record could exist in one or both lists). The score is calculated as 1 divided by that record's rank in each list, summed together between both lists. For example, if a record with an ID of 123 was ranked third in the keyword search and ninth in semantic search, it would receive a score of 13+19=0.444\dfrac{1}{3} + \dfrac{1}{9} = 0.444. If the record was found in only one list and not the other, it would receive a score of 0 for the other list. The records are then sorted by this score to create the final list. The items with the highest scores are ranked first, and lowest scores ranked last.

This method ensures that items that are ranked high in multiple lists are given a high rank in the final list. It also ensures that items that are ranked high in only a few lists but low in others are not given a high rank in the final list. Placing the rank in the denominator when calculating score helps penalize the low ranking records.

Smoothing constant k

To prevent extremely high scores for items that are ranked first (since we're dividing by the rank), a k constant is often added to the denominator to smooth the score:

1k+rank\dfrac{1}{k+rank}

This constant can be any positive number, but is typically small. A constant of 1 would mean that a record ranked first would have a score of 11+1=0.5\dfrac{1}{1+1} = 0.5 instead of 11. This adjustment can help balance the influence of items that are ranked very high in individual lists when creating the final combined list.

Hybrid search in Postgres

Let's implement hybrid search in Postgres using tsvector (keyword search) and pgvector (semantic search).

First we'll create a documents table to store the documents that we will search over. This is just an example - adjust this to match the structure of your application.


_10
create table documents (
_10
id bigint primary key generated always as identity,
_10
content text,
_10
fts tsvector generated always as (to_tsvector('english', content)) stored,
_10
embedding vector(512)
_10
);

The table contains 4 columns:

  • id is an auto-generated unique ID for the record. We'll use this later to match records when performing RRF.
  • content contains the actual text we will be searching over.
  • fts is an auto-generated tsvector column that is generated using the text in content. We will use this for full text search (search by keyword).
  • embedding is a vector column that stores the vector generated from our embedding model. We will use this for semantic search (search by meaning). We chose 512 dimensions for this example, but adjust this to match the size of the embedding vectors generated from your preferred model.

Next we'll create indexes on the fts and embedding columns so that their individual queries will remain fast at scale:


_10
-- Create an index for the full-text search
_10
create index on documents using gin(fts);
_10
_10
-- Create an index for the semantic vector search
_10
create index on documents using hnsw (embedding vector_ip_ops);

For full text search we use a generalized inverted (GIN) index which is designed for handling composite values like those stored in a tsvector.

For semantic vector search we use an HNSW index, which is a high performing approximate nearest neighbor (ANN) search algorithm. Note that we are using the vector_ip_ops (inner product) operator with this index because we plan on using the inner product (<#>) operator later in our query. If you plan to use a different operator like cosine distance (<=>), be sure to update the index accordingly. For more information, see distance operators.

Finally we'll create our hybrid_search function:


_48
create or replace function hybrid_search(
_48
query_text text,
_48
query_embedding vector(512),
_48
match_count int,
_48
full_text_weight float = 1,
_48
semantic_weight float = 1,
_48
rrf_k int = 50
_48
)
_48
returns setof documents
_48
language sql
_48
as $$
_48
with full_text as (
_48
select
_48
id,
_48
-- Note: ts_rank_cd is not indexable but will only rank matches of the where clause
_48
-- which shouldn't be too big
_48
row_number() over(order by ts_rank_cd(fts, websearch_to_tsquery(query_text)) desc) as rank_ix
_48
from
_48
documents
_48
where
_48
fts @@ websearch_to_tsquery(query_text)
_48
order by rank_ix
_48
limit least(match_count, 30) * 2
_48
),
_48
semantic as (
_48
select
_48
id,
_48
row_number() over (order by doc_vector <#> query_embedding) as rank_ix
_48
from
_48
documents
_48
order by rank_ix
_48
limit least(match_count, 30) * 2
_48
)
_48
select
_48
documents.*
_48
from
_48
full_text
_48
full outer join semantic
_48
on full_text.id = semantic.id
_48
join documents
_48
on coalesce(full_text.id, semantic.id) = documents.id
_48
order by
_48
coalesce(1.0 / (rrf_k + full_text.rank_ix), 0.0) * full_text_weight +
_48
coalesce(1.0 / (rrf_k + semantic.rank_ix), 0.0) * semantic_weight
_48
desc
_48
limit
_48
least(match_count, 30)
_48
$$;

Let's break this down:

  • Parameters: The function accepts quite a few parameters, but the main (required) ones are query_text, query_embedding, and match_count.

    • query_text is the user's query text (more on this shortly)
    • query_embedding is the vector representation of the user's query produced by the embedding model. We chose 512 dimensions for this example, but adjust this to match the size of the embedding vectors generated from your preferred model. This must match the size of the embedding vector on the documents table (and use the same model).
    • match_count is the number of records returned in the limit clause.

    The other parameters are optional, but give more control over the fusion process.

    • full_text_weight and semantic_weight decide how much weight each search method gets in the final score. These are both 1 by default which means they both equally contribute towards the final rank. A full_text_weight of 2 and semantic_weight of 1 would give full-text search twice as much weight as semantic search.
    • rrf_k is the k smoothing constant added to the reciprocal rank. The default is 50.
  • Return type: The function returns a set of records from our documents table.

  • CTE: We create two common table expressions (CTE), one for full-text search and one for semantic search. These perform each query individually prior to joining them.

  • RRF: The final query combines the results from the two CTEs using reciprocal rank fusion (RRF).

To use this function in SQL, we can run:


_10
select
_10
*
_10
from
_10
hybrid_search(
_10
'Italian recipes with tomato sauce', -- user query
_10
'[...]'::vector(512), -- embedding generated from user query
_10
10
_10
);

In practice, you will likely be calling this from the Supabase client or through a custom backend layer. Here is a quick example of how you might call this from an Edge Function using JavaScript:


_38
import { createClient } from 'npm:@supabase/supabase-js'
_38
import OpenAI from 'npm:openai'
_38
_38
const supabaseUrl = Deno.env.get('SUPABASE_URL')!
_38
const supabaseServiceRoleKey = Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
_38
const openaiApiKey = Deno.env.get('OPENAI_API_KEY')!
_38
_38
Deno.serve(async (req) => {
_38
// Grab the user's query from the JSON payload
_38
const { query } = await req.json()
_38
_38
// Instantiate OpenAI client
_38
const openai = new OpenAI({ apiKey: openaiApiKey })
_38
_38
// Generate a one-time embedding for the user's query
_38
const embeddingResponse = await openai.embeddings.create({
_38
model: 'text-embedding-3-large',
_38
input: query,
_38
dimensions: 512,
_38
})
_38
_38
const [{ embedding }] = embeddingResponse.data
_38
_38
// Instantiate the Supabase client
_38
// (replace service role key with user's JWT if using Supabase auth and RLS)
_38
const supabase = createClient(supabaseUrl, supabaseServiceRoleKey)
_38
_38
// Call hybrid_search Postgres function via RPC
_38
const { data: documents } = await supabase.rpc('hybrid_search', {
_38
query_text: query,
_38
query_embedding: embedding,
_38
match_count: 10,
_38
})
_38
_38
return new Response(JSON.stringify(documents), {
_38
headers: { 'Content-Type': 'application/json' },
_38
})
_38
})

This uses OpenAI's text-embedding-3-large model to generate embeddings (shortened to 512 dimensions for faster retrieval). Swap in your preferred embedding model (and dimension size) accordingly.

To test this, make a POST request to the function's endpoint while passing in a JSON payload containing the user's query. Here is an example POST request using cURL:


_10
curl -i --location --request POST \
_10
'http://127.0.0.1:54321/functions/v1/hybrid-search' \
_10
--header 'Authorization: Bearer <anonymous key>' \
_10
--header 'Content-Type: application/json' \
_10
--data '{"query":"Italian recipes with tomato sauce"}'

For more information on how to create, test, and deploy edge functions, see Getting started.

See also