‘SELECT … WITH RECURSIVE’: Advanced SQL Tree Traversals Explained
Handling hierarchical data like category trees, folder structures, and organizational charts is a common yet complex challenge in SQL. PostgreSQL offers a powerful tool to solve this challenge: the WITH RECURSIVE
Common Table Expression (CTE). In this article, we’ll demystify how recursive queries work in SQL, specifically in PostgreSQL, using real-world examples.
1. Understanding Hierarchical Data in Relational Databases
Relational databases are inherently flat, which makes representing trees and hierarchies tricky. A common design pattern for such data is a self-referencing table, in which each row has a foreign key to its parent.
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
parent_id INTEGER REFERENCES categories(id)
);
This structure allows nesting, where a category can have a parent category, which enables tree-like relationships.
Example data:
INSERT INTO categories (id, name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Desktops', 2),
(5, 'Smartphones', 1);
Now we want to query all descendants of a given category. This is where WITH RECURSIVE
shines.
2. Basic Recursive CTE Syntax
The idea of a recursive CTE is to combine a base query (anchor member) with a recursive part that refers back to the CTE itself.
WITH RECURSIVE category_tree AS (
SELECT id, name, parent_id
FROM categories
WHERE id = 1 -- starting node ('Electronics')
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM categories c
INNER JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree;
This query finds all categories that are descendants of ‘Electronics’.
How it works:
- The base query selects the root node with
id = 1
. - The recursive part finds all categories whose
parent_id
matches the CTE’s current rows. UNION ALL
accumulates results until no new rows are found.
3. Practical Use Case: Organizational Chart Traversal
Let’s model an employee directory where employees can report to other employees:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
manager_id INTEGER REFERENCES employees(id)
);
INSERT INTO employees (id, name, manager_id) VALUES
(1, 'CEO', NULL),
(2, 'CTO', 1),
(3, 'CFO', 1),
(4, 'Dev Manager', 2),
(5, 'Developer', 4);
Now we want to find all subordinates of the CTO.
WITH RECURSIVE employee_chain AS (
SELECT id, name, manager_id
FROM employees
WHERE id = 2 -- CTO
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM employees e
INNER JOIN employee_chain ec ON e.manager_id = ec.id
)
SELECT * FROM employee_chain;
This recursively fetches everyone reporting directly or indirectly to the CTO.
Tip: Adding a depth
column is useful for indentation or tree format display.
WITH RECURSIVE employee_chain AS (
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE id = 2
UNION ALL
SELECT e.id, e.name, e.manager_id, ec.depth + 1
FROM employees e
INNER JOIN employee_chain ec ON e.manager_id = ec.id
)
SELECT * FROM employee_chain;
4. Limiting Depth and Preventing Infinite Loops
Recursive queries can loop forever if there are cycles. To protect against this, PostgreSQL limits recursion to 100 iterations by default, but it’s wise to also include custom logic to prevent cycles and limit depth:
WITH RECURSIVE category_tree AS (
SELECT id, name, parent_id, ARRAY[id] AS path
FROM categories
WHERE id = 1
UNION ALL
SELECT c.id, c.name, c.parent_id, path || c.id
FROM categories c
INNER JOIN category_tree ct ON c.parent_id = ct.id
WHERE NOT c.id = ANY(ct.path)
)
SELECT * FROM category_tree;
This ensures no node is revisited by tracking visited nodes in an array.
5. Optimizations and Best Practices
While recursive CTEs are powerful, performance can degrade with deep trees or large datasets. Here are some tips to optimize:
- Add indexes: Make sure
parent_id
ormanager_id
columns are indexed. - Limit recursion depth: Add a
WHERE depth <= N
clause when appropriate. - Use
UNION ALL
instead ofUNION
: Avoids costly sorting and duplicate elimination unless necessary. - Denormalize cautiously: For heavily-read systems, consider maintaining flattened trees using triggers or materialized views.
Conclusion
WITH RECURSIVE
in PostgreSQL empowers developers to elegantly handle hierarchical data directly in SQL. Whether building navigation menus, processing file systems, or modeling org charts, recursive CTEs offer a clean, efficient way to walk through trees. Mastering this technique makes your SQL queries more expressive, performant, and maintainable.
Want to experiment? Try tools like DB Fiddle or your favorite Postgres client!
Useful links: