‘SELECT … WITH RECURSIVE’: Advanced SQL Tree Traversals Explained

‘SELECT … WITH RECURSIVE’: Advanced SQL Tree Traversals Explained

‘SELECT … WITH RECURSIVE’: Advanced SQL Tree Traversals Explained

 

Handling hierarchical data like category trees, folder structures, and organizational charts is a common yet complex challenge in SQL. PostgreSQL offers a powerful tool to solve this challenge: the WITH RECURSIVE Common Table Expression (CTE). In this article, we’ll demystify how recursive queries work in SQL, specifically in PostgreSQL, using real-world examples.

1. Understanding Hierarchical Data in Relational Databases

Relational databases are inherently flat, which makes representing trees and hierarchies tricky. A common design pattern for such data is a self-referencing table, in which each row has a foreign key to its parent.

CREATE TABLE categories (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  parent_id INTEGER REFERENCES categories(id)
);

This structure allows nesting, where a category can have a parent category, which enables tree-like relationships.

Example data:

INSERT INTO categories (id, name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Desktops', 2),
(5, 'Smartphones', 1);

Now we want to query all descendants of a given category. This is where WITH RECURSIVE shines.

2. Basic Recursive CTE Syntax

The idea of a recursive CTE is to combine a base query (anchor member) with a recursive part that refers back to the CTE itself.

WITH RECURSIVE category_tree AS (
  SELECT id, name, parent_id
  FROM categories
  WHERE id = 1 -- starting node ('Electronics')

  UNION ALL

  SELECT c.id, c.name, c.parent_id
  FROM categories c
  INNER JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree;

This query finds all categories that are descendants of ‘Electronics’.

How it works:

  • The base query selects the root node with id = 1.
  • The recursive part finds all categories whose parent_id matches the CTE’s current rows.
  • UNION ALL accumulates results until no new rows are found.

3. Practical Use Case: Organizational Chart Traversal

Let’s model an employee directory where employees can report to other employees:

CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  manager_id INTEGER REFERENCES employees(id)
);

INSERT INTO employees (id, name, manager_id) VALUES
(1, 'CEO', NULL),
(2, 'CTO', 1),
(3, 'CFO', 1),
(4, 'Dev Manager', 2),
(5, 'Developer', 4);

Now we want to find all subordinates of the CTO.

WITH RECURSIVE employee_chain AS (
  SELECT id, name, manager_id
  FROM employees
  WHERE id = 2 -- CTO

  UNION ALL

  SELECT e.id, e.name, e.manager_id
  FROM employees e
  INNER JOIN employee_chain ec ON e.manager_id = ec.id
)
SELECT * FROM employee_chain;

This recursively fetches everyone reporting directly or indirectly to the CTO.

Tip: Adding a depth column is useful for indentation or tree format display.

WITH RECURSIVE employee_chain AS (
  SELECT id, name, manager_id, 0 AS depth
  FROM employees
  WHERE id = 2

  UNION ALL

  SELECT e.id, e.name, e.manager_id, ec.depth + 1
  FROM employees e
  INNER JOIN employee_chain ec ON e.manager_id = ec.id
)
SELECT * FROM employee_chain;

4. Limiting Depth and Preventing Infinite Loops

Recursive queries can loop forever if there are cycles. To protect against this, PostgreSQL limits recursion to 100 iterations by default, but it’s wise to also include custom logic to prevent cycles and limit depth:

WITH RECURSIVE category_tree AS (
  SELECT id, name, parent_id, ARRAY[id] AS path
  FROM categories
  WHERE id = 1

  UNION ALL

  SELECT c.id, c.name, c.parent_id, path || c.id
  FROM categories c
  INNER JOIN category_tree ct ON c.parent_id = ct.id
  WHERE NOT c.id = ANY(ct.path)
)
SELECT * FROM category_tree;

This ensures no node is revisited by tracking visited nodes in an array.

5. Optimizations and Best Practices

While recursive CTEs are powerful, performance can degrade with deep trees or large datasets. Here are some tips to optimize:

  • Add indexes: Make sure parent_id or manager_id columns are indexed.
  • Limit recursion depth: Add a WHERE depth <= N clause when appropriate.
  • Use UNION ALL instead of UNION: Avoids costly sorting and duplicate elimination unless necessary.
  • Denormalize cautiously: For heavily-read systems, consider maintaining flattened trees using triggers or materialized views.

Conclusion

WITH RECURSIVE in PostgreSQL empowers developers to elegantly handle hierarchical data directly in SQL. Whether building navigation menus, processing file systems, or modeling org charts, recursive CTEs offer a clean, efficient way to walk through trees. Mastering this technique makes your SQL queries more expressive, performant, and maintainable.

Want to experiment? Try tools like DB Fiddle or your favorite Postgres client!

 

Useful links: