What are the options for storing hierarchical data in a relational database?

general discussion forum gives members an opportunity to discuss proposed policy ideas prior to submitting as a formal
Post Reply
admin
Site Admin
Posts: 36
Joined: Sun Aug 08, 2021 7:49 am

What are the options for storing hierarchical data in a relational database?

Post by admin »

Good Overviews

Generally speaking, you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:

One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set, and Nested Interval I've found.
Models for hierarchical data: slides with good explanations of tradeoffs and example usage
Representing hierarchies in MySQL: very good overview of Nested Set in particular
Hierarchical data in RDBMSs: a most comprehensive and well-organized set of links I've seen, but not much in the way of explanation
Options

Ones I am aware of and general features:

Adjacency List:
Columns: ID, ParentID
Easy to implement.
Cheap node moves, inserts, and deletes.
Expensive to find the level, ancestry & descendants, path
Avoid N+1 via Common Table Expressions in databases that support them
Nested Set (a.k.a Modified Preorder Tree Traversal)
Columns: Left, Right
Cheap ancestry, descendants
Very expensive O(n/2) moves, inserts, deletes due to volatile encoding
Bridge Table (a.k.a. Closure Table /w triggers)
Uses separate join table with ancestor, descendant, depth (optional)
Cheap ancestry and descendants
Writes costs O(log n) (size of the subtree) for insert, updates, deletes
Normalized encoding: good for RDBMS statistics & query planner in joins
Requires multiple rows per node
Lineage Column (a.k.a. Materialized Path, Path Enumeration)
Column: lineage (e.g. /parent/child/grandchild/etc...)
Cheap descendants via prefix query (e.g. LEFT(lineage, #) = '/enumerated/path')
Writes costs O(log n) (size of the subtree) for insert, updates, deletes
Non-relational: relies on Array datatype or serialized string format
Nested Intervals
Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
Has real/float/decimal representation/precision issues
Matrix encoding variant adds ancestor encoding (materialized path) for "free", but with the added trickiness of linear algebra.
Flat Table
A modified Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
Cheap to iterate/paginate over
Expensive move and delete
Good Use: threaded discussion - forums / blog comments
Multiple lineage columns
Columns: one for each lineage level, refers to all the parents up to the root, levels down from the item's level are set to NULL
Cheap ancestors, descendants, level
Cheap insert, delete, move of the leaves
Expensive insert, delete, move of the internal nodes
Hard limit to how deep the hierarchy can be
Database Specific Notes

MySQL

Use session variables for Adjacency List
Oracle

Use CONNECT BY to traverse Adjacency Lists
PostgreSQL

ltree datatype for Materialized Path
SQL Server

General summary
2008 offers HierarchyId data type that appears to help with the Lineage Column approach and expand the depth that can be represented.
Post Reply