PostgreSQL Operations Case Study: Resolving an Infinite Loop in a Recursive Query

1. Background

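One day, a developer reported a SQL performance problem: a query seemed never to finish. It had been running for hours without returning a result. Strangely, the same SQL, against the same data volume and the same table sizes, on a server with identical hardware, ran very fast (in milliseconds) in another environment.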

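The first step was to inspect the stack of the query's backend process in the problematic environment and capture one minute of strace output. Both showed that the process was executing normally rather than being stuck, so this looked like a plain slow-query problem.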

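The root cause turned out to be an infinite loop in a recursive query. Everything below is a simulated reproduction on a personal computer.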

The SQL statement is as follows:

with s as (select * from emp_info where empno='200' and emp_type>'5' and emp_status='Y')

select
    s.empno as "staffNo",
    s.emp_type as "empType",
    s.emp_tel_info as "empNum",
    a.cust_name as "Name",
    a.cust_position as "Postion",
    a.cust_addr as "Addr",
    a.cust_tel_info as "Mobile",
    (
        with recursive r as (
            select f.ctid, f.region_code, f.parent_region_code, f.region_type, f.region_status
            from region_tbl f where f.region_code = s.region_code
            union all
            select f.ctid, f.region_code, f.parent_region_code, f.region_type, f.region_status
            from region_tbl f, r where f.region_code = r.parent_region_code)

        select r.region_code as "FirstRegCode"
        from r where r.region_type = '5'
        and r.region_status = 'Y'
    ),
    (
        with recursive r as (
            select f.ctid, f.region_code, f.parent_region_code, f.region_type, f.region_status
            from region_tbl f where f.region_code = s.region_code
            union all
            select f.ctid, f.region_code, f.parent_region_code, f.region_type, f.region_status
            from region_tbl f, r where f.region_code = r.parent_region_code)

        select r.region_code as "SecondRegCode"
        from r where r.region_type = '4'
        and r.region_status = 'Y'
    ),
    (
        with recursive r as (
            select f.ctid, f.region_code, f.parent_region_code, f.region_type, f.region_status
            from region_tbl f where f.region_code = s.region_code
            union all
            select f.ctid, f.region_code, f.parent_region_code, f.region_type, f.region_status
            from region_tbl f, r where f.region_code = r.parent_region_code)

        select r.region_code as "ThirdRegCode"
        from r where r.region_type = '3'
        and r.region_status = 'Y'
    ),
    (
        with recursive r as (
            select f.ctid, f.region_code, f.parent_region_code, f.region_type, f.region_status
            from region_tbl f where f.region_code = s.region_code
            union all
            select f.ctid, f.region_code, f.parent_region_code, f.region_type, f.region_status
            from region_tbl f, r where f.region_code = r.parent_region_code)

        select r.region_code as "FurthRegCode"
        from r where r.region_type = '2'
        and r.region_status = 'Y'
    )

from s left join cust_info a on s.empno = a.cust_id;

2. Analysis

We compared the execution plans from the two environments: the cost estimates, the scan nodes, and the join nodes all looked identical.

The execution plan is as follows:

                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8.58..1944.99 rows=1 width=866)
   CTE s
     ->  Index Scan using emp_info_pkey on emp_info  (cost=0.28..8.30 rows=1 width=57)
           Index Cond: ((empno)::text = '200'::text)
           Filter: (((emp_type)::text > '5'::text) AND ((emp_status)::text = 'Y'::text))
   ->  CTE Scan on s  (cost=0.00..0.02 rows=1 width=256)
   ->  Index Scan using cust_info_pkey on cust_info a  (cost=0.28..8.29 rows=1 width=200)
         Index Cond: ((s.empno)::text = (cust_id)::text)
   SubPlan 3
     ->  CTE Scan on r r_1  (cost=479.57..482.09 rows=1 width=118)
           Filter: (((region_type)::text = '5'::text) AND ((region_status)::text = 'Y'::text))
           CTE r
             ->  Recursive Union  (cost=0.28..479.57 rows=101 width=19)
                   ->  Index Scan using region_tbl_pkey on region_tbl f  (cost=0.28..8.29 rows=1 width=19)
                         Index Cond: ((region_code)::text = (s.region_code)::text)
                   ->  Hash Join  (cost=0.33..46.93 rows=10 width=19)
                         Hash Cond: ((f_1.region_code)::text = (r.parent_region_code)::text)
                         ->  Seq Scan on region_tbl f_1  (cost=0.00..39.00 rows=2000 width=19)
                         ->  Hash  (cost=0.20..0.20 rows=10 width=118)
                               ->  WorkTable Scan on r  (cost=0.00..0.20 rows=10 width=118)
   SubPlan 5
     ->  CTE Scan on r r_3  (cost=479.57..482.09 rows=1 width=118)
           Filter: (((region_type)::text = '4'::text) AND ((region_status)::text = 'Y'::text))
           CTE r
             ->  Recursive Union  (cost=0.28..479.57 rows=101 width=19)
                   ->  Index Scan using region_tbl_pkey on region_tbl f_2  (cost=0.28..8.29 rows=1 width=19)
                         Index Cond: ((region_code)::text = (s.region_code)::text)
                   ->  Hash Join  (cost=0.33..46.93 rows=10 width=19)
                         Hash Cond: ((f_3.region_code)::text = (r_2.parent_region_code)::text)
                         ->  Seq Scan on region_tbl f_3  (cost=0.00..39.00 rows=2000 width=19)
                         ->  Hash  (cost=0.20..0.20 rows=10 width=118)
                               ->  WorkTable Scan on r r_2  (cost=0.00..0.20 rows=10 width=118)
   SubPlan 7
     ->  CTE Scan on r r_5  (cost=479.57..482.09 rows=1 width=118)
           Filter: (((region_type)::text = '3'::text) AND ((region_status)::text = 'Y'::text))
           CTE r
             ->  Recursive Union  (cost=0.28..479.57 rows=101 width=19)
                   ->  Index Scan using region_tbl_pkey on region_tbl f_4  (cost=0.28..8.29 rows=1 width=19)
                         Index Cond: ((region_code)::text = (s.region_code)::text)
                   ->  Hash Join  (cost=0.33..46.93 rows=10 width=19)
                         Hash Cond: ((f_5.region_code)::text = (r_4.parent_region_code)::text)
                         ->  Seq Scan on region_tbl f_5  (cost=0.00..39.00 rows=2000 width=19)
                         ->  Hash  (cost=0.20..0.20 rows=10 width=118)
                               ->  WorkTable Scan on r r_4  (cost=0.00..0.20 rows=10 width=118)
   SubPlan 9
     ->  CTE Scan on r r_7  (cost=479.57..482.09 rows=1 width=118)
           Filter: (((region_type)::text = '2'::text) AND ((region_status)::text = 'Y'::text))
           CTE r
             ->  Recursive Union  (cost=0.28..479.57 rows=101 width=19)
                   ->  Index Scan using region_tbl_pkey on region_tbl f_6  (cost=0.28..8.29 rows=1 width=19)
                         Index Cond: ((region_code)::text = (s.region_code)::text)
                   ->  Hash Join  (cost=0.33..46.93 rows=10 width=19)
                         Hash Cond: ((f_7.region_code)::text = (r_6.parent_region_code)::text)
                         ->  Seq Scan on region_tbl f_7  (cost=0.00..39.00 rows=2000 width=19)
                         ->  Hash  (cost=0.20..0.20 rows=10 width=118)
                               ->  WorkTable Scan on r r_6  (cost=0.00..0.20 rows=10 width=118)
(56 rows)

postgres=#

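The plan shows no step with an unusually high estimated cost. In the healthy environment, EXPLAIN ANALYZE reported an actual execution time of about 300 ms and a single returned row, broadly in line with the estimates. The estimates could not have exposed this problem anyway: for a recursive CTE the planner assumes a fixed number of iterations (about 10), which is why each Recursive Union is estimated at rows=101 (1 non-recursive row plus 10 iterations of 10 rows) regardless of what the data actually looks like. Narrowing things down step by step, we eventually focused on the recursive part of the query.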

The recursive part of the SQL:

with recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status
from region_tbl f where f.region_code=s.region_code union all select f.ctid, f.region_code,f.parent_region_code,
f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code)
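To see why this recursion can run forever, it helps to model how the PostgreSQL documentation describes evaluating WITH RECURSIVE with UNION ALL: the non-recursive term seeds a working table, and the recursive term is re-run against the previous iteration's rows until an iteration returns nothing. The Python sketch below is a simplified model, not PostgreSQL code. REGION_TBL is a hypothetical in-memory copy of the region_tbl rows that appear in the LIMIT output later in this article, and itertools.islice stands in for LIMIT, because the generator, much like PostgreSQL, only produces rows as the caller fetches them.

```python
from itertools import islice
from typing import Iterator, NamedTuple


class Region(NamedTuple):
    ctid: str
    region_code: str
    parent_region_code: str
    region_type: str
    region_status: str


# Hypothetical copy of the region_tbl rows seen in the LIMIT output.
REGION_TBL = [
    Region("(18,75)", "1200", "1020", "5", "Y"),
    Region("(18,76)", "1020", "1002", "4", "Y"),
    Region("(9,108)", "1002", "120", "3", "Y"),
    Region("(18,79)", "120", "12", "2", "N"),
    Region("(18,81)", "12", "1", "1", "N"),
    Region("(0,110)", "1", "3", "3", "N"),
    Region("(0,108)", "3", "4", "6", "N"),
    Region("(0,109)", "4", "3", "3", "N"),
]


def recursive_union_all(start_code: str) -> Iterator[Region]:
    """Simplified model of WITH RECURSIVE ... UNION ALL using a working table."""
    # Non-recursive term: where f.region_code = start_code
    working = [f for f in REGION_TBL if f.region_code == start_code]
    # The loop only stops when an iteration produces no rows at all.
    while working:
        yield from working
        # Recursive term: region_tbl f joined to the previous iteration r
        # on f.region_code = r.parent_region_code; UNION ALL keeps duplicates.
        working = [f for r in working for f in REGION_TBL
                   if f.region_code == r.parent_region_code]


# islice acts like LIMIT 10: only ten rows are ever generated.
first_10 = [row.region_code for row in islice(recursive_union_all("1200"), 10)]
print(first_10)  # ['1200', '1020', '1002', '120', '12', '1', '3', '4', '3', '4']
```

Without the islice call, iterating over recursive_union_all("1200") would never finish, because rows 3 and 4 keep producing each other.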

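Looking at the logic, the recursion condition is f.region_code = r.parent_region_code, and the recursion starts from the row whose f.region_code equals s.region_code, which is '1200' here (the value obtained by querying s).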

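We rewrote the recursive part as a standalone query with the starting value hard-coded, ran it, and printed each tuple's ctid. Here is the result with LIMIT 10: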

postgres=# with recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status from region_tbl f where f.region_code='1200' union all select f.ctid, f.region_code,f.parent_region_code,f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code) select * from r limit 10;
  ctid   | region_code | parent_region_code | region_type | region_status
---------+-------------+--------------------+-------------+---------------
 (18,75) | 1200        | 1020               | 5           | Y
 (18,76) | 1020        | 1002               | 4           | Y
 (9,108) | 1002        | 120                | 3           | Y
 (18,79) | 120         | 12                 | 2           | N
 (18,81) | 12          | 1                  | 1           | N
 (0,110) | 1           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
(10 rows)

And the result with LIMIT 15:

postgres=# with recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status from region_tbl f where f.region_code='1200' union all select f.ctid, f.region_code,f.parent_region_code,f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code) select * from r limit 15;
  ctid   | region_code | parent_region_code | region_type | region_status
---------+-------------+--------------------+-------------+---------------
 (18,75) | 1200        | 1020               | 5           | Y
 (18,76) | 1020        | 1002               | 4           | Y
 (9,108) | 1002        | 120                | 3           | Y
 (18,79) | 120         | 12                 | 2           | N
 (18,81) | 12          | 1                  | 1           | N
 (0,110) | 1           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
(15 rows)

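We also printed the results for LIMIT 1000 and LIMIT 10000 and noticed a pattern: the two rows with ctid (0,108) and (0,109) keep alternating. The SQL is slow because it scans these two rows back and forth endlessly; in this environment it will never finish. The LIMIT versions return only because PostgreSQL evaluates a WITH query only as far as the outer query actually fetches rows; without a LIMIT, the recursion keeps iterating.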
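The alternating pattern can also be detected mechanically: follow the parent links from the starting code and stop at the first code that is visited a second time. This is a minimal Python sketch, assuming a hypothetical dict PARENT_OF that maps region_code to parent_region_code for the rows printed above; find_cycle is an illustrative helper, not something PostgreSQL provides.

```python
from typing import Dict, List, Optional

# Hypothetical region_code -> parent_region_code mapping from the LIMIT output.
PARENT_OF: Dict[str, str] = {
    "1200": "1020", "1020": "1002", "1002": "120", "120": "12",
    "12": "1", "1": "3", "3": "4", "4": "3",
}


def find_cycle(parent_of: Dict[str, str], start: str) -> Optional[List[str]]:
    """Walk parent links from start; return the repeating segment, or None."""
    seen_at: Dict[str, int] = {}  # region_code -> position in the walk
    path: List[str] = []
    code = start
    # A code with no row in the table ends the walk, just as the recursive
    # term returns no rows once the join finds no matching region_code.
    while code in parent_of:
        if code in seen_at:
            return path[seen_at[code]:]  # everything from the first visit on
        seen_at[code] = len(path)
        path.append(code)
        code = parent_of[code]
    return None


print(find_cycle(PARENT_OF, "1200"))  # ['3', '4']
```

Tracking the position of each visited code, instead of only a set, makes it possible to report exactly which codes form the loop rather than just noting that one exists.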

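These two rows are the key. Under the recursion condition f.region_code = r.parent_region_code, region 3's parent is 4 and region 4's parent is 3, so the values form a closed loop. Because UNION ALL never discards repeated rows, the recursion falls into an infinite loop.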

postgres=# select ctid,region_code,parent_region_code from region_tbl where region_code in ('3','4');
  ctid   | region_code | parent_region_code
---------+-------------+--------------------
 (0,108) | 3           | 4
 (0,109) | 4           | 3
(2 rows)
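The query above only finds the loop once you already suspect codes 3 and 4. To check a whole table for such loops, one common approach is to start a walk from every row, carry the visited codes along, flag a row whose code already appears in that path, and stop extending a walk once it is flagged. In PostgreSQL this is usually done with an array column, or from PostgreSQL 14 on with the CYCLE clause of WITH RECURSIVE; this is not necessarily the fix applied in this case. The sketch below is only an illustration under stated assumptions: it runs the idea on Python's bundled SQLite so no server is needed, uses a '/'-delimited string instead of an array, and loads a hypothetical copy of the rows shown in this article. find_loop_codes and FIND_LOOPS are illustrative names.

```python
import sqlite3
from typing import List

# Hypothetical copy of region_tbl rows seen in this article:
# (region_code, parent_region_code, region_type, region_status)
ROWS = [
    ("1200", "1020", "5", "Y"), ("1020", "1002", "4", "Y"),
    ("1002", "120", "3", "Y"), ("120", "12", "2", "N"),
    ("12", "1", "1", "N"), ("1", "3", "3", "N"),
    ("3", "4", "6", "N"), ("4", "3", "3", "N"),
]

# Start a walk from every row, append each visited code to a '/'-delimited
# path, and mark a row as is_cycle when its code is already in the path.
FIND_LOOPS = """
WITH RECURSIVE r(region_code, parent_region_code, path, is_cycle) AS (
    SELECT region_code, parent_region_code, '/' || region_code || '/', 0
    FROM region_tbl
    UNION ALL
    SELECT f.region_code, f.parent_region_code,
           r.path || f.region_code || '/',
           instr(r.path, '/' || f.region_code || '/') > 0
    FROM region_tbl AS f
    JOIN r ON f.region_code = r.parent_region_code
    WHERE r.is_cycle = 0  -- guard: a flagged row is never extended
)
SELECT DISTINCT region_code FROM r WHERE is_cycle = 1 ORDER BY region_code
"""


def find_loop_codes(rows) -> List[str]:
    """Load rows into an in-memory table and return codes that close a loop."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE region_tbl (region_code TEXT PRIMARY KEY, "
        "parent_region_code TEXT, region_type TEXT, region_status TEXT)"
    )
    conn.executemany("INSERT INTO region_tbl VALUES (?, ?, ?, ?)", rows)
    codes = [code for (code,) in conn.execute(FIND_LOOPS)]
    conn.close()
    return codes


print(find_loop_codes(ROWS))  # ['3', '4']
```

Because every row starts its own walk, every member of a loop eventually shows up as a loop-closing code, so the result lists the loop members without knowing them in advance. The same "stop once flagged" condition in a recursive term is what keeps such a query from running forever.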

In the other, healthy environment, these two rows do not form a loop, as shown below: