I have a table with 2,395,015 rows in which a TEXT column holds one of three values and is never NULL. I am having intermittent query performance problems when counting the rows whose value matches the large majority (>99%) of the rows, and I want to fix this. These queries must return exact counts, so I cannot use approximate counts.
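For clarity, the count I need is the plain exact count below; the second query is shown only for contrast, since an estimate taken from the planner statistics would be fast but is only approximate and therefore not acceptable here:

-- The exact count I need:
SELECT COUNT(*) FROM metadata WHERE status = 'PROCESSED';

-- A fast but approximate alternative I cannot use (estimate from planner stats):
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'public.metadata'::regclass;

The table definition is: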
corpus=# \d metadata
Table "public.metadata"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+----------------
id | text | | not null |
priority | integer | | not null | 10
media_type | text | | not null |
modified | timestamp without time zone | | not null | now()
processed | timestamp without time zone | | |
status | text | | not null | 'QUEUED'::text
note | text | | |
content | text | | |
resolved | text | | |
response_time | integer | | |
luid | integer | | not null |
jamo_date | timestamp without time zone | | |
audit_path | text | | |
Indexes:
"metadata_pkey" PRIMARY KEY, btree (id)
"metadata_id_idx" btree (id)
"metadata_luid_idx" btree (luid)
"metadata_modified_idx" btree (modified DESC)
"metadata_processed_idx" btree (processed DESC)
"metadata_status_idx" btree (status)
Check constraints:
"media_type_ck" CHECK (media_type = ANY (ARRAY['text/json'::text, 'text/yaml'::text]))
"status_ck" CHECK (status = ANY (ARRAY['QUEUED'::text, 'PROCESSED'::text, 'ERROR'::text]))
Foreign-key constraints:
"metadata_luid_fkey" FOREIGN KEY (luid) REFERENCES concept(luid) ON DELETE CASCADE
corpus=#
I have some simple queries that count the rows matching one of the three status codes (QUEUED, PROCESSED, ERROR). QUEUED matches 0 rows, ERROR matches 9,794 rows, and PROCESSED matches 2,385,221 rows. When I run the same query for each status code, I usually get results back right away:
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='QUEUED';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1947.17..1947.18 rows=1 width=8) (actual time=2.935..2.936 rows=1 loops=1)
Output: count(*)
-> Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..1915.97 rows=12480 width=0) (actual time=2.932..2.933 rows=0 loops=1)
Output: status
Index Cond: (metadata.status = 'QUEUED'::text)
Heap Fetches: 0
Planning Time: 0.734 ms
Execution Time: 2.988 ms
(8 rows)
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='ERROR';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1184.19..1184.20 rows=1 width=8) (actual time=1484.763..1484.764 rows=1 loops=1)
Output: count(*)
-> Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..1165.26 rows=7569 width=0) (actual time=4.235..1484.029 rows=9794 loops=1)
Output: status
Index Cond: (metadata.status = 'ERROR'::text)
Heap Fetches: 9584
Planning Time: 0.072 ms
Execution Time: 1484.786 ms
(8 rows)
corpus=#
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='PROCESSED';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=261398.83..261398.84 rows=1 width=8) (actual time=741.319..749.026 rows=1 loops=1)
Output: count(*)
-> Gather (cost=261398.62..261398.83 rows=2 width=8) (actual time=741.309..749.020 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=260398.62..260398.63 rows=1 width=8) (actual time=735.099..735.100 rows=1 loops=3)
Output: PARTIAL count(*)
Worker 0: actual time=730.871..730.872 rows=1 loops=1
Worker 1: actual time=733.435..733.436 rows=1 loops=1
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..257903.37 rows=998100 width=0) (actual time=0.065..700.529 rows=795074 loops=3)
Output: status
Index Cond: (metadata.status = 'PROCESSED'::text)
Heap Fetches: 747048
Worker 0: actual time=0.060..702.980 rows=670975 loops=1
Worker 1: actual time=0.076..686.946 rows=1010099 loops=1
Planning Time: 0.085 ms
Execution Time: 749.068 ms
(18 rows)
corpus=#
But sometimes counting the PROCESSED rows takes far too long (occasionally several minutes):
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='PROCESSED';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=261398.83..261398.84 rows=1 width=8) (actual time=30019.273..30019.336 rows=1 loops=1)
Output: count(*)
-> Gather (cost=261398.62..261398.83 rows=2 width=8) (actual time=30019.261..30019.326 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=260398.62..260398.63 rows=1 width=8) (actual time=29967.734..29967.735 rows=1 loops=3)
Output: PARTIAL count(*)
Worker 0: actual time=29939.915..29939.916 rows=1 loops=1
Worker 1: actual time=29944.395..29944.395 rows=1 loops=1
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..257903.37 rows=998100 width=0) (actual time=75.385..29931.795 rows=795074 loops=3)
Output: status
Index Cond: (metadata.status = 'PROCESSED'::text)
Heap Fetches: 747151
Worker 0: actual time=128.857..29899.156 rows=916461 loops=1
Worker 1: actual time=28.609..29905.708 rows=854439 loops=1
Planning Time: 421.203 ms
Execution Time: 30019.440 ms
(18 rows)
corpus=#
While the above query is running slowly, I can query the same table for either of the other two codes and those queries return in under 1 second. I looked for table locks (there were none). This happens even when no other queries or inserts against the table are running.
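For completeness, this is roughly how I checked for locks and concurrent activity (a sketch; the exact filters are my own choice):

-- Any locks held on the table?
SELECT l.locktype, l.mode, l.granted, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'public.metadata'::regclass;

-- Any other sessions doing anything at the time?
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE datname = current_database()
  AND pid <> pg_backend_pid();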
- What are the possible causes of these intermittent slow queries?
- What additional debugging can I try to get more information about these slow queries?
- Are there any relevant server settings?
- Is there a more efficient way to index/encode this column (for example, should I use CHAR(1), or even SMALLINT)? If so, what index should I use for that column? (A sketch of the kind of re-encoding I mean follows this list.)
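To make that last question concrete, this is the kind of re-encoding I am considering. It is only a sketch; the status_code column, the numeric values, and the index name are hypothetical and I have not tried this:

-- Hypothetical: store the status as a smallint code instead of TEXT
ALTER TABLE metadata ADD COLUMN status_code smallint;
UPDATE metadata
SET status_code = CASE status
                      WHEN 'QUEUED'    THEN 0
                      WHEN 'PROCESSED' THEN 1
                      WHEN 'ERROR'     THEN 2
                  END;
ALTER TABLE metadata
    ALTER COLUMN status_code SET NOT NULL,
    ADD CONSTRAINT status_code_value_ck CHECK (status_code IN (0, 1, 2));
CREATE INDEX metadata_status_code_idx ON metadata (status_code);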
If I use CHAR(1), is there any difference between the following two constraints?
ALTER TABLE jgi_metadata ADD CONSTRAINT status_code_ck CHECK (status_code = ANY (ARRAY['Q'::char(1), 'P'::char(1), 'E'::char(1)]));
ALTER TABLE jgi_metadata ADD CONSTRAINT status_code_ck CHECK (status_code IN ('Q', 'P', 'E'));
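My understanding (which I would like confirmed) is that PostgreSQL rewrites an IN list of constants into the = ANY(ARRAY[...]) form, so the two constraints should end up stored identically. A sketch of how I would check the stored definition:

-- Compare the stored definition of the constraint after creating it either way:
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'jgi_metadata'::regclass
  AND conname = 'status_code_ck';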
Could a partial index be used on this column, even though it is never NULL? Should I split PROCESSED out into a boolean column, use the status column only for the other codes, make it nullable, and put a partial index on it? (A rough sketch of both ideas follows.)
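Roughly what I mean by both ideas, as a sketch only (the new column and index names are placeholders):

-- Partial index on the existing column, covering only the rare codes:
CREATE INDEX metadata_status_rare_idx ON metadata (status)
    WHERE status <> 'PROCESSED';

-- Or: a boolean flag for the common case plus a nullable column for the rest:
ALTER TABLE metadata ADD COLUMN is_processed boolean NOT NULL DEFAULT false;
ALTER TABLE metadata ADD COLUMN rare_status text
    CHECK (rare_status IN ('QUEUED', 'ERROR'));
CREATE INDEX metadata_rare_status_idx ON metadata (rare_status)
    WHERE rare_status IS NOT NULL;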
This is PostgreSQL 11 with default settings, running on Linux.
Other things I have already tried:
- Increasing work_mem to 100MB via postgresql.conf (roughly as sketched below). No change in performance.
- Creating a partial index on the status column.
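For reference, the work_mem change was applied roughly like this (ALTER SYSTEM shown as an equivalent to editing postgresql.conf by hand):

-- In postgresql.conf:  work_mem = '100MB'
-- or equivalently, from a superuser session followed by a reload:
ALTER SYSTEM SET work_mem = '100MB';
SELECT pg_reload_conf();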
Update: I have discovered that this performance problem has nothing to do with the status column, but rather with the size of the table itself, as this 2-minute query shows:
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=196398.52..196398.53 rows=1 width=8) (actual time=118527.897..118554.762 rows=1 loops=1)
Output: count(*)
-> Gather (cost=196398.30..196398.51 rows=2 width=8) (actual time=118522.165..118554.756 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=195398.30..195398.31 rows=1 width=8) (actual time=118491.043..118491.044 rows=1 loops=3)
Output: PARTIAL count(*)
Worker 0: actual time=118475.143..118475.144 rows=1 loops=1
Worker 1: actual time=118476.110..118476.111 rows=1 loops=1
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..192876.13 rows=1008870 width=0) (actual time=71.797..118449.265 rows=809820 loops=3)
Output: status
Heap Fetches: 552630
Worker 0: actual time=75.877..118434.476 rows=761049 loops=1
Worker 1: actual time=104.872..118436.647 rows=745770 loops=1
Planning Time: 592.040 ms
Execution Time: 118554.839 ms
(17 rows)
corpus=#
This now looks very similar to other existing questions, so I am trying the mitigation strategy from this answer:
VACUUM ANALYZE metadata;
The first COUNT(*) after that took 5 seconds, and subsequent counts took 190 ms.
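To understand why the manual VACUUM helped, I have also been looking at the table's vacuum and dead-tuple statistics, along these lines:

-- How stale was the table before/after the manual VACUUM?
SELECT n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'metadata';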
Other ideas:
- Would it help to split the status column out into its own table, referenced by a foreign key from the metadata table? (Roughly what I mean is sketched below.)
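Roughly what I have in mind, as a sketch only (the status_codes table and column names are made up):

-- Hypothetical lookup table for the status codes
CREATE TABLE status_codes (
    status_id smallint PRIMARY KEY,
    status    text NOT NULL UNIQUE
);
INSERT INTO status_codes VALUES (0, 'QUEUED'), (1, 'PROCESSED'), (2, 'ERROR');

ALTER TABLE metadata ADD COLUMN status_id smallint REFERENCES status_codes (status_id);
CREATE INDEX metadata_status_id_idx ON metadata (status_id);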
Note: I am increasingly convinced that this question is a duplicate of several others here:
- Extremely slow count in PostgreSQL
- count(*) query is too slow even with an index-only scan
- Why are some count queries so slow?
- Optimizing the result of a select count in Postgresql
- https://stackoverflow.com/questions/58449716/postgres-why-does-select-count-take-so-long
- https://stackoverflow.com/questions/16916633/if-postgresql-count-is-always-slow-how-to-paginate-complex-queries
- https://stackoverflow.com/questions/7943233/fast-way-to-discover-the-row-count-of-a-table-in-postgresql/7945274#7945274
This answer may be the best solution to the problem:
As requested, here is the query plan analysis with buffers:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT COUNT(*) FROM metadata;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=80771.95..80771.96 rows=1 width=8) (actual time=26711.481..26716.494 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
-> Gather (cost=80771.73..80771.94 rows=2 width=8) (actual time=26711.203..26716.488 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
-> Partial Aggregate (cost=79771.73..79771.74 rows=1 width=8) (actual time=26565.622..26565.623 rows=1 loops=3)
Output: PARTIAL count(*)
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
Worker 0: actual time=26530.890..26530.891 rows=1 loops=1
Buffers: shared hit=105264 read=6760 dirtied=145 written=5
Worker 1: actual time=26530.942..26530.942 rows=1 loops=1
Buffers: shared hit=84675 read=7529 dirtied=46 written=2
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..77241.05 rows=1012275 width=0) (actual time=42.254..26529.232 rows=809820 loops=3)
Output: status
Heap Fetches: 17185
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
Worker 0: actual time=59.291..26494.376 rows=815113 loops=1
Buffers: shared hit=105264 read=6760 dirtied=145 written=5
Worker 1: actual time=31.165..26484.729 rows=1036972 loops=1
Buffers: shared hit=84675 read=7529 dirtied=46 written=2
Planning Time: 98.400 ms
Execution Time: 26716.529 ms
(25 rows)