Hierarchical queries in MySQL
This is a series of articles on hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
See also:
There is no need in explaining how convenient hierarchical queries are.
A quick reminder: hierarchical data is a parent-child relationship contained in one table.
A typical task is to return values from the table in the following way:
- Resultset should be sorted like a tree, that is lexical sort by ancestry chains
- Depth level should be returned along with each row
It may sound confusing, but it’s very simple in fact, like shown on this Oracle query:
1.
SELECT
LPAD(
' '
,
level
* 4,
' '
) || id, parent,
level
2.
FROM
t_hierarchy
3.
START
WITH
4.
parent = 0
5.
CONNECT
BY
6.
parent =
PRIOR
id
We have a nice tree sorted as a tree, with rows indented according to the depth level.
In the query above, START WITH
defines the root of the tree, and CONNECT BY
defines join condition between parent and child rows. Parent columns are defined by adding PRIOR
keyword to them.
In MySQL there is no such construct, but it can be emulated.
We may try construct a sorting function somehow, but it will be too complex to implement and probably inefficient.
Instead, we will develop our function in the following way:
- The function will be called for each row in a rowset
- We will not use the actual values of the rowset, but instead keep all intermediary data in the session variables. Rowset will serve just as a dummy source of rows for subsequent joins
- We will not use recursion, as it’s impossible to keep the recursion state between function calls. Instead, we will just calculate next
id
given only currentid
How do we find the next id
given a current one?
- Initially,
id
is set to theid
of the root we start with. - If there is a row in the table with the same
parent
but greaterid
, we return that greaterid
, i. e. we return the next sibling of our row. - If there current row is last of its parent, the parent becomes the current
id
, and we try repeat the step above. - If the our row is the last child of the root, we return
NULL
for this call and all subsequent calls.
This is implemented as following:
01.
CREATE
FUNCTION
hierarchy_connect_by_parent_eq_prior_id(value
INT
)
RETURNS
INT
02.
NOT
DETERMINISTIC
03.
READS SQL DATA
04.
BEGIN
05.
DECLARE
_id
INT
;
06.
DECLARE
_parent
INT
;
07.
DECLARE
_next
INT
;
08.
DECLARE
CONTINUE
HANDLER
FOR
NOT
FOUND
SET
@id =
NULL
;
09.
10.
SET
_parent = @id;
11.
SET
_id = -1;
12.
13.
IF @id
IS
NULL
THEN
14.
RETURN
NULL
;
15.
END
IF;
16.
17.
LOOP
18.
SELECT
MIN
(id)
19.
INTO
@id
20.
FROM
t_hierarchy
21.
WHERE
parent = _parent
22.
AND
id > _id;
23.
IF @id
IS
NOT
NULL
OR
_parent = @start_with
THEN
24.
SET
@
level
= @
level
+ 1;
25.
RETURN
@id;
26.
END
IF;
27.
SET
@
level
:= @
level
- 1;
28.
SELECT
id, parent
29.
INTO
_id, _parent
30.
FROM
t_hierarchy
31.
WHERE
id = _parent;
32.
END
LOOP;
33.
END
As you can see, this function takes a value as an input agrument, but doesn’t actually use it. This is becauseMySQL ignores NON DETERMINISTIC
clause in the function definition, caches it’s return value and doesn’t actually call it if it’s called with the same arguments.
To avoid this, we need some non-repeating value, that is id
.
We use this function in a query as following:
01.
SELECT
CONCAT(REPEAT(
' '
,
level
- 1),
CAST
(hi.id
AS
CHAR
))
AS
treeitem, parent,
level
02.
FROM
(
03.
SELECT
hierarchy_connect_by_parent_eq_prior_id(id)
AS
id, @
level
AS
level
04.
FROM
(
05.
SELECT
@start_with := 0,
06.
@id := @start_with,
07.
@
level
:= 0
08.
) vars, t_hierarchy
09.
WHERE
@id
IS
NOT
NULL
10.
) ho
11.
JOIN
t_hierarchy hi
12.
ON
hi.id = ho.id
In the first subquery, we initialize the variables and set our root item.
Note that it’s unadvisable to use NULL
as a root, because it’s not comparable with an equality condition and therefore cannot be easily searched for without syntax change.
We use a 0 instead, because real id
‘s traditionally start with 1.
We check for @id IS NOT NULL
in the WHERE
clause to optimize the query. Our function is written so that@id
, being once set to NULL
, will always remain NULL
, and we can skip remaining rows.
For each row of our rowset (which is used only as an iterator loop), the function returns next id
in a tree. We then join another instance of t_hierarchy
on this id
.
As a result of the query, we get:
To be continued.