我有两台运行 PostgreSQL 10.11 的服务器,它们的配置相同,在类似的硬件上运行,包含相同的数据库和(大部分)相同的表。
各种 Nagios 使用check_postgres
定期检查这些服务器进行检查,包括检查“膨胀”的服务器。
在一台服务器上,这种膨胀检查需要 3-4 秒。
在另一台服务器上,膨胀检查需要 24-30分钟。编辑:截至 2020 年 1 月 14 日,需要 55-60 分钟。
此外,这种情况刚刚开始,就像 3 天前一样。在此之前,膨胀检查在现在“慢”的服务器上运行了一段合理的时间(不到 10 秒)。
在对 使用的“check_bloat”查询进行额外调查后check_postgres
,我发现我认为这可能是为什么它在一台服务器上比另一台服务器慢得多的原因。该pg_stats
表在慢速服务器上包含约 815,000 个条目。在“快速”服务器上,它仅包含约 45,000 个条目。编辑:四十多天后,截至 2020 年 1 月 14 日,该pg_stats
表包含约 838,000 个条目。
我能做些什么呢?有没有办法缩小pg_stats
( pg_statistic
) 表?
我尝试重新启动“慢”服务器并vacuum analyze;
对其进行操作。没有效果。
我尝试select pg_stat_reset()
在“慢”服务器上的每个数据库中执行,但它似乎没有做任何事情。(中的行数pg_stats
没有改变。)
编辑:这是 SQL:
SELECT ns.nspname, tbl.relname, hdr, ma, bs,
SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
MAX(coalesce(null_frac,0)) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname
) AS nullhdr
FROM pg_attribute att
JOIN pg_class tbl ON att.attrelid = tbl.oid
JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname
AND s.inherited=false
AND s.attname=att.attname,
(
SELECT
(SELECT current_setting('block_size')::numeric) AS bs,
CASE WHEN SUBSTRING(SPLIT_PART(v, ' ', 2) FROM '#\"[0-9]+.[0-9]+#\"%' for '#')
IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' OR v ~ '64-bit' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
WHERE att.attnum > 0 AND tbl.relkind='r'
GROUP BY 1,2,3,4,5;
编辑:这是EXPLAIN (ANALYZE,BUFFERS)
“慢”服务器上的输出:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=1935395.99..4529587255.06 rows=4097995 width=188) (actual time=106609.972..396360.540 rows=96069 loops=1)
Group Key: ns.nspname, tbl.relname, CASE WHEN ("substring"(split_part(version(), ' '::text, 2), '#\"[0-9]+.[0-9]+#\"%'::text, '#'::text) = ANY ('{8.0,8.1,8.2}'::text[])) THEN 27 ELSE 23 END, CASE WHEN ((version() ~ 'mingw32'::text) OR (version() ~ '64-bit'::text)) THEN 8 ELSE 4 END, $0
Buffers: shared hit=81434823 read=621336, temp read=340059 written=340054
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.02 rows=1 width=32) (actual time=0.022..0.023 rows=1 loops=1)
-> Merge Left Join (cost=1935395.97..2063714.73 rows=4097995 width=176) (actual time=106594.140..263322.505 rows=13304172 loops=1)
Merge Cond: ((ns.nspname = s.schemaname) AND (tbl.relname = s.tablename) AND (att.attname = s.attname))
Buffers: shared hit=6832941 read=331003, temp read=340059 written=340054
-> Sort (cost=1427752.62..1437997.60 rows=4097995 width=192) (actual time=100214.124..113229.530 rows=13304172 loops=1)
Sort Key: ns.nspname, tbl.relname, att.attname
Sort Method: external merge Disk: 2629600kB
Buffers: shared hit=52022 read=282240, temp read=328700 written=328701
-> Hash Join (cost=15335.36..599474.35 rows=4097995 width=192) (actual time=242.121..56135.695 rows=13304172 loops=1)
Hash Cond: (tbl.relnamespace = ns.oid)
Buffers: shared hit=52022 read=282240
-> Hash Join (cost=15246.41..588609.50 rows=4097995 width=132) (actual time=237.991..35301.264 rows=13304172 loops=1)
Hash Cond: (att.attrelid = tbl.oid)
Buffers: shared hit=52006 read=282219
-> Seq Scan on pg_attribute att (cost=0.00..532962.28 rows=15390604 width=68) (actual time=0.004..13225.400 rows=13870158 loops=1)
Filter: (attnum > 0)
Rows Removed by Filter: 1147617
Buffers: shared hit=42449 read=282219
-> Hash (cost=14050.06..14050.06 rows=95708 width=72) (actual time=237.495..237.496 rows=96069 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 10782kB
Buffers: shared hit=9557
-> Seq Scan on pg_class tbl (cost=0.00..14050.06 rows=95708 width=72) (actual time=0.005..149.710 rows=96069 loops=1)
Filter: (relkind = 'r'::"char")
Rows Removed by Filter: 270863
Buffers: shared hit=9557
-> Hash (cost=58.42..58.42 rows=2442 width=68) (actual time=4.095..4.096 rows=2393 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 266kB
Buffers: shared hit=13 read=21
-> Seq Scan on pg_namespace ns (cost=0.00..58.42 rows=2442 width=68) (actual time=0.007..2.020 rows=2393 loops=1)
Buffers: shared hit=13 read=21
-> Sort (cost=507643.36..507707.46 rows=25640 width=200) (actual time=6379.663..6409.276 rows=37000 loops=1)
Sort Key: s.schemaname, s.tablename, s.attname
Sort Method: quicksort Memory: 11365kB
Buffers: shared hit=6780919 read=48763, temp read=11359 written=11353
-> Subquery Scan on s (cost=23845.50..505765.73 rows=25640 width=200) (actual time=1299.167..6301.807 rows=37000 loops=1)
Buffers: shared hit=6780919 read=48763, temp read=11359 written=11353
-> Gather (cost=23845.50..505509.33 rows=25640 width=401) (actual time=1299.164..6213.093 rows=37000 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6780919 read=48763, temp read=11359 written=11353
-> Hash Left Join (cost=22845.50..501945.33 rows=10683 width=401) (actual time=1127.147..5861.412 rows=12333 loops=3)
Hash Cond: (c.relnamespace = n.oid)
Buffers: shared hit=6780919 read=48763, temp read=11359 written=11353
-> Hash Join (cost=22756.56..501828.29 rows=10683 width=140) (actual time=1121.277..5826.907 rows=12333 loops=3)
Hash Cond: (s_1.starelid = c.oid)
Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)
Rows Removed by Join Filter: 266947
Buffers: shared hit=6780817 read=48763, temp read=11359 written=11353
-> Nested Loop (cost=0.43..462367.09 rows=416791 width=82) (actual time=0.061..2480.113 rows=279280 loops=3)
Buffers: shared hit=3354232 read=48763
-> Parallel Seq Scan on pg_statistic s_1 (cost=0.00..53955.73 rows=236437 width=14) (actual time=0.030..405.942 rows=279280 loops=3)
Filter: (NOT stainherit)
Buffers: shared hit=464 read=48763
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (cost=0.43..3.02 rows=2 width=70) (actual time=0.004..0.004 rows=1 loops=837840)
Index Cond: ((attrelid = s_1.starelid) AND (attnum = s_1.staattnum))
Filter: (NOT attisdropped)
Buffers: shared hit=3353768
-> Hash (cost=14050.06..14050.06 rows=359445 width=72) (actual time=1118.810..1118.811 rows=366932 loops=3)
Buckets: 262144 Batches: 2 Memory Usage: 20667kB
Buffers: shared hit=28671, temp written=6186
-> Seq Scan on pg_class c (cost=0.00..14050.06 rows=359445 width=72) (actual time=0.007..551.732 rows=366932 loops=3)
Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
Buffers: shared hit=28671
-> Hash (cost=58.42..58.42 rows=2442 width=68) (actual time=5.839..5.840 rows=2393 loops=3)
Buckets: 4096 Batches: 1 Memory Usage: 266kB
Buffers: shared hit=102
-> Seq Scan on pg_namespace n (cost=0.00..58.42 rows=2442 width=68) (actual time=0.017..2.882 rows=2393 loops=3)
Buffers: shared hit=102
SubPlan 2
-> Aggregate (cost=1104.73..1104.74 rows=1 width=8) (actual time=1.149..1.150 rows=1 loops=96069)
Buffers: shared hit=74601882 read=290333
-> Nested Loop (cost=1.56..1104.72 rows=1 width=401) (actual time=1.144..1.147 rows=0 loops=96069)
Join Filter: (c_1.oid = s_2.starelid)
Buffers: shared hit=74601882 read=290333
-> Nested Loop (cost=1.14..1103.30 rows=1 width=10) (actual time=1.141..1.143 rows=0 loops=96069)
Buffers: shared hit=74440049 read=284290
-> Nested Loop (cost=0.70..16.75 rows=1 width=4) (actual time=0.181..0.356 rows=1 loops=96069)
Join Filter: (c_1.relnamespace = n_1.oid)
Rows Removed by Join Filter: 57
Buffers: shared hit=19268934 read=1187
-> Index Scan using pg_class_relname_nsp_index on pg_class c_1 (cost=0.42..8.44 rows=1 width=8) (actual time=0.006..0.069 rows=58 loops=96069)
Index Cond: (relname = tbl.relname)
Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
Buffers: shared hit=2617035 read=1168
-> Index Scan using pg_namespace_nspname_index on pg_namespace n_1 (cost=0.28..8.30 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=5531986)
Index Cond: (nspname = ns.nspname)
Buffers: shared hit=16651899 read=19
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a_1 (cost=0.43..1084.53 rows=201 width=6) (actual time=0.782..0.782 rows=0 loops=96069)
Index Cond: (attrelid = c_1.oid)
Filter: ((NOT attisdropped) AND has_column_privilege(c_1.oid, attnum, 'select'::text))
Rows Removed by Filter: 144
Buffers: shared hit=55171115 read=283103
-> Index Scan using pg_statistic_relid_att_inh_index on pg_statistic s_2 (cost=0.43..1.07 rows=28 width=6) (actual time=0.003..0.004 rows=0 loops=43518)
Index Cond: ((starelid = a_1.attrelid) AND (staattnum = a_1.attnum))
Filter: (stanullfrac <> '0'::double precision)
Rows Removed by Filter: 0
Buffers: shared hit=161833 read=6043
Planning time: 3.341 ms
Execution time: 397056.086 ms
(103 rows)
这是EXPLAIN (ANALYZE,BUFFERS)
“快速”服务器上的输出:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=18339.88..841921.77 rows=13555 width=188) (actual time=1379.248..8600.924 rows=2694 loops=1)
Group Key: ns.nspname, tbl.relname, CASE WHEN ("substring"(split_part(version(), ' '::text, 2), '#\"[0-9]+.[0-9]+#\"%'::text, '#'::text) = ANY ('{8.0,8.1,8.2}'::text[])) THEN 27 ELSE 23 END, CASE WHEN ((version() ~ 'mingw32'::text) OR (version() ~ '64-bit'::text)) THEN 8 ELSE 4 END, $0
Buffers: shared hit=3648911
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.02 rows=1 width=32) (actual time=0.026..0.027 rows=1 loops=1)
-> Merge Left Join (cost=18339.86..18764.78 rows=13555 width=176) (actual time=1372.967..2097.150 rows=55107 loops=1)
Merge Cond: ((ns.nspname = s.schemaname) AND (tbl.relname = s.tablename) AND (att.attname = s.attname))
Buffers: shared hit=262164
-> Sort (cost=5887.15..5921.04 rows=13555 width=192) (actual time=371.212..415.656 rows=55107 loops=1)
Sort Key: ns.nspname, tbl.relname, att.attname
Sort Method: quicksort Memory: 16174kB
Buffers: shared hit=3056
-> Hash Join (cost=708.49..4956.84 rows=13555 width=192) (actual time=9.314..282.912 rows=55107 loops=1)
Hash Cond: (tbl.relnamespace = ns.oid)
Buffers: shared hit=3056
-> Hash Join (cost=670.68..4883.30 rows=13555 width=132) (actual time=7.529..194.704 rows=55107 loops=1)
Hash Cond: (att.attrelid = tbl.oid)
Buffers: shared hit=3038
-> Seq Scan on pg_attribute att (cost=0.00..4007.49 rows=78125 width=68) (actual time=0.004..77.499 rows=82849 loops=1)
Filter: (attnum > 0)
Rows Removed by Filter: 37659
Buffers: shared hit=2595
-> Hash (cost=637.01..637.01 rows=2693 width=72) (actual time=7.497..7.497 rows=2694 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 306kB
Buffers: shared hit=443
-> Seq Scan on pg_class tbl (cost=0.00..637.01 rows=2693 width=72) (actual time=0.007..5.110 rows=2694 loops=1)
Filter: (relkind = 'r'::"char")
Rows Removed by Filter: 12831
Buffers: shared hit=443
-> Hash (cost=25.14..25.14 rows=1014 width=68) (actual time=1.757..1.758 rows=1036 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 118kB
Buffers: shared hit=15
-> Seq Scan on pg_namespace ns (cost=0.00..25.14 rows=1014 width=68) (actual time=0.008..0.841 rows=1036 loops=1)
Buffers: shared hit=15
-> Sort (cost=12452.71..12453.04 rows=131 width=200) (actual time=1001.404..1031.723 rows=36997 loops=1)
Sort Key: s.schemaname, s.tablename, s.attname
Sort Method: quicksort Memory: 11364kB
Buffers: shared hit=259108
-> Subquery Scan on s (cost=5455.47..12448.10 rows=131 width=200) (actual time=709.076..941.593 rows=36997 loops=1)
Buffers: shared hit=259108
-> Hash Left Join (cost=5455.47..12446.79 rows=131 width=401) (actual time=709.073..886.682 rows=36997 loops=1)
Hash Cond: (c.relnamespace = n.oid)
Buffers: shared hit=259108
-> Hash Join (cost=5417.65..12408.63 rows=131 width=140) (actual time=707.296..823.908 rows=36997 loops=1)
Hash Cond: ((s_1.starelid = c.oid) AND (s_1.staattnum = a.attnum))
Buffers: shared hit=259093
-> Seq Scan on pg_statistic s_1 (cost=0.00..6793.45 rows=19622 width=14) (actual time=0.014..45.484 rows=45420 loops=1)
Filter: (NOT stainherit)
Buffers: shared hit=6401
-> Hash (cost=4852.66..4852.66 rows=37666 width=142) (actual time=706.898..706.899 rows=69779 loops=1)
Buckets: 131072 (originally 65536) Batches: 1 (originally 1) Memory Usage: 12881kB
Buffers: shared hit=252689
-> Hash Join (cost=831.03..4852.66 rows=37666 width=142) (actual time=29.082..629.863 rows=69779 loops=1)
Hash Cond: (a.attrelid = c.oid)
Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)
Rows Removed by Join Filter: 50729
Buffers: shared hit=252689
-> Seq Scan on pg_attribute a (cost=0.00..3724.99 rows=112999 width=70) (actual time=0.005..117.968 rows=120508 loops=1)
Filter: (NOT attisdropped)
Buffers: shared hit=2595
-> Hash (cost=637.01..637.01 rows=15521 width=72) (actual time=28.956..28.957 rows=15525 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 1705kB
Buffers: shared hit=443
-> Seq Scan on pg_class c (cost=0.00..637.01 rows=15521 width=72) (actual time=0.007..15.192 rows=15525 loops=1)
Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
Buffers: shared hit=443
-> Hash (cost=25.14..25.14 rows=1014 width=68) (actual time=1.760..1.761 rows=1036 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 118kB
Buffers: shared hit=15
-> Seq Scan on pg_namespace n (cost=0.00..25.14 rows=1014 width=68) (actual time=0.012..0.858 rows=1036 loops=1)
Buffers: shared hit=15
SubPlan 2
-> Aggregate (cost=60.64..60.66 rows=1 width=8) (actual time=2.376..2.377 rows=1 loops=2694)
Buffers: shared hit=3386747
-> Nested Loop (cost=1.39..60.63 rows=1 width=401) (actual time=2.280..2.369 rows=6 loops=2694)
Join Filter: (c_1.oid = s_2.starelid)
Buffers: shared hit=3386747
-> Nested Loop (cost=1.10..60.06 rows=1 width=10) (actual time=2.228..2.283 rows=16 loops=2694)
Buffers: shared hit=3262332
-> Nested Loop (cost=0.69..16.73 rows=1 width=4) (actual time=1.108..2.211 rows=1 loops=2694)
Join Filter: (c_1.relnamespace = n_1.oid)
Rows Removed by Join Filter: 364
Buffers: shared hit=3247620
-> Index Scan using pg_class_relname_nsp_index on pg_class c_1 (cost=0.41..8.43 rows=1 width=8) (actual time=0.005..0.398 rows=365 loops=2694)
Index Cond: (relname = tbl.relname)
Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
Buffers: shared hit=290274
-> Index Scan using pg_namespace_nspname_index on pg_namespace n_1 (cost=0.28..8.29 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=983801)
Index Cond: (nspname = ns.nspname)
Buffers: shared hit=2957346
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a_1 (cost=0.42..43.29 rows=4 width=6) (actual time=0.019..0.045 rows=16 loops=2694)
Index Cond: (attrelid = c_1.oid)
Filter: ((NOT attisdropped) AND has_column_privilege(c_1.oid, attnum, 'select'::text))
Rows Removed by Filter: 10
Buffers: shared hit=14712
-> Index Scan using pg_statistic_relid_att_inh_index on pg_statistic s_2 (cost=0.29..0.55 rows=1 width=6) (actual time=0.003..0.003 rows=0 loops=43518)
Index Cond: ((starelid = a_1.attrelid) AND (staattnum = a_1.attnum))
Filter: (stanullfrac <> '0'::double precision)
Rows Removed by Filter: 0
Buffers: shared hit=124415
Planning time: 4.337 ms
Execution time: 8605.391 ms
(102 rows)
嗯,谜团解开了。感谢上面的评论,我看到 EXPLAIN 输出表明 pg_attribute 也比我预期的要大得多。看完之后,我就知道问题出在哪里了。我们的一位开发人员有一个失控的进程,该进程创建了超过十万个表,这些表应该是临时的,但不是,然后“忘记”清理它们。它们处于非公开模式中,所以直到现在我才注意到它们。一旦我删除了这十万张左右的表并执行
vacuum full pg_attribute
andvacuum full pg_class
andvacuum full pg_statistic
,我在 Nagios 膨胀查询中遇到的问题终于得到了解决。