data:image/s3,"s3://crabby-images/5eb03/5eb03e8558741454ed8bfe37775c9d34b7459239" alt="Postgres as a Graph Database: (Ab)using pgRouting"
pgRouting is a Postgres extension. It's often used for finding the “shortest path” between two locations, however it's a hidden gem in Postgres and can be used for basic graph functionality.
pgRouting is typically combined with PostGIS for working with geospatial data, but it can also be useful beyond that as a lightweight alternative to Graph extensions like Apache AGE, or specialized graph databases like Neo4j.
Let's explore some useful applications of pgRouting and graphs.
What is pgRouting?
pgRouting is an extension of PostGIS that provides geospatial routing functionality. You can use it to calculate the shortest path, perform network analysis, and solve complex routing problems on a graph-based structure. Most commonly, this is used in Geographic Information Systems (GIS) for tasks like determining the fastest route between two locations.
Working with Graphs
The power of pgRouting lies in its ability to work with any data structured as a graph. A graph is essentially a network of interconnected points, where:
- Nodes represent entities.
- Edges represent relationships or paths between those nodes.
In maps / GIS, nodes and edges represent intersections and roads respectively. However, this structure can also be applied to abstract systems like a social networks, where users are nodes and friendships are edges.
Non-GIS Use Cases for pgRouting
Let's explore how pgRouting can be applied to a few non-GIS problems.
Task scheduling
In any project, tasks have dependencies. For example, task B can only start after task A is completed. This creates a directed acyclic graph (DAG), where:
- nodes represent tasks
- edges represent dependencies
One of the most challenging aspects of managing projects is determining the “critical path” — the project's overall duration, determined by the longest sequence of dependencies.
Using pgRouting, you can model your task's dependencies, using graph algorithms to find the critical path. Suppose we have a table tasks with task dependencies modeled as a graph:
_35-- Create the tasks table with dependencies_35create table tasks (_35 id serial primary key,_35 name text not null_35);_35_35-- insert tasks into the table_35insert into tasks (name)_35values_35 ('Start Project'),_35 ('Task A'),_35 ('Task B'),_35 ('Task C'),_35 ('Task D'),_35 ('End Project');_35_35-- create the dependencies table_35create table dependencies (_35 id serial primary key,_35 source integer not null, -- task id where the dependency starts_35 target integer not null, -- task id where the dependency ends_35 duration integer not null, -- duration of the task in days_35 constraint fk_source foreign key (source) references tasks (id),_35 constraint fk_target foreign key (target) references tasks (id)_35);_35_35-- insert dependencies with durations (directed edges)_35insert into dependencies (source, target, duration)_35values_35 (1, 2, 3), -- start project -> task a (3 days)_35 (2, 3, 4), -- task a -> task b (4 days)_35 (3, 4, 5), -- task b -> task c (5 days)_35 (4, 5, 2), -- task c -> task d (2 days)_35 (5, 6, 6);_35-- task d -> end project (6 days)
You can then use the pgr_dijkstra()
function to find the shortest (or longest) path through the tasks, allowing you to map out the project schedule effectively:
_10create schema if not exists extensions;_10create extension pgrouting schema extensions cascade;_10_10-- find the longest path using pgr_dijkstra()_10-- (as it calculates shortest path, use negative weights)_10select * FROM extensions.pgr_dijkstra(_10 'select id, source, target, duration as cost from dependencies',_10 1, -- Start Project (Task ID 1)_10 6 -- End Project (Task ID 6)_10);
Which returns a table showing that this project will take 20 days from start to finish:
seq | path_seq | node | edge | cost | agg_cost |
---|---|---|---|---|---|
1 | 1 | 1 | 1 | 3 | 0 |
2 | 2 | 2 | 2 | 4 | 3 |
3 | 3 | 3 | 3 | 5 | 7 |
4 | 4 | 4 | 4 | 2 | 12 |
5 | 5 | 5 | 5 | 6 | 14 |
6 | 6 | 6 | -1 | 0 | 20 |
Tangent: the Dijkstra algorithm
The pgr_dijkstra()
function implements Dijkstra's
algorithm, which is used to find the
shortest path between nodes in a graph. This algorithm guarantees the shortest path from a
source node to a target node (or all other nodes), based on the cost of edges connecting the
nodes.
Fun fact: Dijkstra's algorithm was published in 1959 by Dutch computer scientist Edsger Dijkstra. It's a “greedy” algorithm, meaning it always picks the closest, cheapest node to explore next.
Reverse proxy routing based on resource allocation
Distributed systems usually involve allocating resources efficiently across a network of nodes. Each node might represent a physical location or a computing process, and the edges represent the available pathways to move resources between them. For example, in a cloud infrastructure, pgRouting could help determine how to allocate compute tasks across a set of distributed servers by finding the shortest or least-congested path to route data.
Suppose you have a network of servers represented by nodes and their data connections as edges in a table servers.
_38-- create the servers table representing the nodes_38create table servers (_38 id serial primary key,_38 name text,_38 x double precision, -- x coordinate for spatial data (latitude)_38 y double precision -- y coordinate for spatial data (longitude)_38);_38_38-- insert some sample servers_38insert into servers (name, x, y)_38values_38 ('server a', 0, 0),_38 ('server b', 2, 1),_38 ('server c', 4, 3),_38 ('server d', 3, 5);_38_38-- create the server_connections table representing the edges_38create table server_latency (_38 id serial primary key,_38 source integer,_38 target integer,_38 cost double precision, -- cost could represent latency or bandwidth_38 x1 double precision, -- x coordinate of source_38 y1 double precision, -- y coordinate of source_38 x2 double precision, -- x coordinate of target_38 y2 double precision, -- y coordinate of target,_38 constraint fk_source foreign key (source) references servers (id),_38 constraint fk_target foreign key (target) references servers (id)_38);_38_38-- insert connections between servers_38insert into server_latency (source, target, cost, x1, y1, x2, y2)_38values_38 (1, 2, 1.5, 0, 0, 2, 1), -- server a -> server b with a cost of 1.5 (could be latency)_38 (2, 3, 2.0, 2, 1, 4, 3), -- server b -> server c with a cost of 2.0_38 (2, 4, 1.8, 2, 1, 3, 5), -- server b -> server d with a cost of 1.8_38 (4, 3, 1.0, 3, 5, 4, 3);_38-- server d -> server c with a cost of 1.0
You can then use pgr_astar
() to find the most efficient path for data or compute tasks to travel through this network, optimizing for speed or load:
_10-- Query to find the most efficient path (using pgr_astar)_10select *_10from_10 extensions.pgr_astar(_10 'select id, source, target, cost, x1, y1, x2, y2 from server_latency',_10 1,_10 3 -- Start from Server A (id=1) to Server C (id=3)_10 );
Tangent: the A* algorithm
The pgr_astar()
function is an implementation of the A* (A-star)
algorithm. It's used to find the most
efficient (shortest) path between two points in a graph. A* is commonly used in navigation and
routing because it is more efficient than Dijkstra's algorithm in many scenarios, especially
when you have spatial data with coordinates (e.g., X, Y positions).
Fun fact: A* was originally designed in the 1960s for artificial intelligence applications and pathfinding in games. Today, it's one of the most widely used algorithms in video game development to help characters navigate complex environments efficiently.
Recommendation engines like YouTube
In recommendation engines or search algorithms that use knowledge graphs, pgRouting can be used to build relationships between entities and events. Take YouTube's recommendation algorithm, we can structure this data as a graph where:
- Nodes represent entities like users, videos, or categories.
- Edges represent relationships or interactions between those entities, such as a user liking a video or videos being part of the same category.
Let's create a list of “nodes”:
_24create table categories (_24 id serial primary key,_24 name text_24);_24_24insert into categories (name)_24values_24 ('Graph Theory'),_24 ('AI & Machine Learning'),_24 ('Python Programming');_24_24create table videos (_24 id serial primary key,_24 title text,_24 category_id int references categories (id)_24);_24_24insert into videos (title, category_id)_24values_24 ('Intro to Graph Theory', 1),_24 ('Advanced Graph Algorithms', 1),_24 ('Graph Neural Networks', 2),_24 ('Beginner Python Tutorial', 3),_24 ('Advanced Python Techniques', 3);
And some “edges”:
_28create table video_relationships (_28 source_video_id int references videos (id),_28 target_video_id int references videos (id),_28 relationship_type text, -- 'same_category', 'watched_by_same_users', etc._28 weight int default 1 -- strength of the relationship_28);_28_28insert into video_relationships (source_video_id, target_video_id, relationship_type, weight)_28values_28 (1, 2, 'same_category', 5), -- "Intro to Graph Theory" and "Advanced Graph Algorithms" are in the same category_28 (2, 3, 'watched_by_same_users', 3), -- "Advanced Graph Algorithms" and "Graph Neural Networks" are often watched together_28 (4, 5, 'same_category', 5); -- "Beginner Python Tutorial_28_28create table interactions (_28 user_id int references auth.users (id),_28 video_id int references videos (id),_28 interaction_type text, -- 'liked', 'viewed', etc._28 weight int default 1 -- strength of the interaction_28);_28_28insert into interactions (user_id, video_id, interaction_type, weight)_28values_28 ('user_01', 1, 'viewed', 5), -- "User 01" watched "Intro to Graph Theory" to the end (weight = 5)_28 ('user_01', 2, 'liked', 5), -- "User 01" liked "Advanced Graph Algorithms"_28 ('user_02', 3, 'viewed', 2), -- "User 02" watched "Graph Neural Networks" and bounced halfway through (weight = 2)_28 ('user_03', 4, 'liked', 5), -- "User 03" liked "Beginner Python Tutorial"_28 ('user_03', 5, 'viewed', 2);_28-- "User 03" watched "Advanced Python Techniques" and bounced halfway through (weight = 2)
Now we can use the pgr_dijkstra()
function to find the shortest or most relevant path between a user and new videos. For example, let's find videos that are most relevant to user_01
considering their past interactions:
Tangent: ranking recommendations
Since it's “just postgres” it's simple enough to rank the results using an order by
clause. For example, if we stored the pgr_dijkstra()
results above in a table called “recommendations”, we use this a query like this to sort the paths by the highest ranking:
_10select videos.title, sum(weight) as recommendation_score_10from_10 recommendations_10 join videos on recommendations.target = videos.id_10group by videos.title_10order by recommendation_score desc;
Get Started
pgRouting is a powerful extension for Postgres that can be used to solve a wide range of graph-based problems. Check out the pgRouting docs for more information on how to use it. You can also use it on Supabase:
- Docs: pgrouting: Geospatial Routing
- Launch a new Postgres database: database.new