Improving Oracle SQL Query with Connect By Performance
Hierarchical / recursive queries are amazing…but can get slow with highly nested data sets.

I recently had a hierarchical query that was slow. I could run the same query, un-nested, quickly, but when I added in the “connect by” things slowed down. I had the same issue using a recursive common table expression (CTE or “with clause”). Either way, the query took 104 or more seconds to run. Below is an example of the query:
-- this is slow
select id,
parent_id,
sys_connect_by_path(id,'.') cb_path,
connect_by_isleaf is_leaf,
c1
...
c200,
(select my_function(c1) from dual) my_f,
...
from my_table
start with parent_id is null -- ** set parent id column
connect by nocycle prior id = parent_id
order siblings by seq_no
Looking at the explain plan and a trace, I found that the database was reading the entire table many times. I guessed that I might be able to get the database to do a single read of the table if I could limit the “connect by” portion to use only an index, and then, once the hierarchy is established, join in the rest of the data in a single pass.
-- this is much faster
-- assumes you have a single compound index on id,parent_id
-- and another unique key index on id
with cb_data as
(select id,
parent_id,
sys_connect_by_path(id,'.') cb_path,
connect_by_isleaf is_leaf
from my_table
start with parent_id is null -- ** set parent id column
connect by nocycle prior id = parent_id
order siblings by seq_no
)
select id,
parent_id,
cb_path,
is_leaf,
c1
...
c200,
(select my_function(c1) from dual) my_f,
...
from cb_data
join my_table mt on mt.id = cb_data.id
In the new query, the hierarchical portion, cb_data, contains only 3 columns: id, parent_id, seq_no. I created an index with just these three columns. If I run just the hierarchical portion of the query the database does not read the table at all. It only reads the index. When I run the entire query, the database reads only the index to establish the results of cb_data and then does a single pass of my_table to return the final results. The new query returns the same results but, in my case, runs in 4 seconds. That’s a solid improvement over the original 104 seconds.
The key is to have a single index that contains all of the columns used in the CTE (cb_data above). If you require any additional columns in the CTE to accomplish the connect by, ensure they are in the index. I have not done extensive testing related to the order of the columns. My initial impression is that id, parent_id, seq_no provides good performance.
I hope this helps others as well. Comment to let me know if you try it and have similar results.
