I have a table with 2,395,015 rows in which a TEXT column holds one of three values and is never NULL. I am seeing intermittent query performance problems when counting the rows whose value matches the large majority (>99%) of rows, and I want to fix them. These queries must return exact counts, so I cannot use approximate counts.
corpus=# \d metadata
Table "public.metadata"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+----------------
id | text | | not null |
priority | integer | | not null | 10
media_type | text | | not null |
modified | timestamp without time zone | | not null | now()
processed | timestamp without time zone | | |
status | text | | not null | 'QUEUED'::text
note | text | | |
content | text | | |
resolved | text | | |
response_time | integer | | |
luid | integer | | not null |
jamo_date | timestamp without time zone | | |
audit_path | text | | |
Indexes:
"metadata_pkey" PRIMARY KEY, btree (id)
"metadata_id_idx" btree (id)
"metadata_luid_idx" btree (luid)
"metadata_modified_idx" btree (modified DESC)
"metadata_processed_idx" btree (processed DESC)
"metadata_status_idx" btree (status)
Check constraints:
"media_type_ck" CHECK (media_type = ANY (ARRAY['text/json'::text, 'text/yaml'::text]))
"status_ck" CHECK (status = ANY (ARRAY['QUEUED'::text, 'PROCESSED'::text, 'ERROR'::text]))
Foreign-key constraints:
"metadata_luid_fkey" FOREIGN KEY (luid) REFERENCES concept(luid) ON DELETE CASCADE
corpus=#
I have some simple queries that count the rows matching one of the three status codes (QUEUED, PROCESSED, ERROR). 0 rows match QUEUED, 9,794 rows match ERROR, and 2,385,221 rows match PROCESSED. When I run the same query for each status code, I usually get results back immediately:
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='QUEUED';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1947.17..1947.18 rows=1 width=8) (actual time=2.935..2.936 rows=1 loops=1)
Output: count(*)
-> Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..1915.97 rows=12480 width=0) (actual time=2.932..2.933 rows=0 loops=1)
Output: status
Index Cond: (metadata.status = 'QUEUED'::text)
Heap Fetches: 0
Planning Time: 0.734 ms
Execution Time: 2.988 ms
(8 rows)
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='ERROR';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1184.19..1184.20 rows=1 width=8) (actual time=1484.763..1484.764 rows=1 loops=1)
Output: count(*)
-> Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..1165.26 rows=7569 width=0) (actual time=4.235..1484.029 rows=9794 loops=1)
Output: status
Index Cond: (metadata.status = 'ERROR'::text)
Heap Fetches: 9584
Planning Time: 0.072 ms
Execution Time: 1484.786 ms
(8 rows)
corpus=#
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='PROCESSED';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=261398.83..261398.84 rows=1 width=8) (actual time=741.319..749.026 rows=1 loops=1)
Output: count(*)
-> Gather (cost=261398.62..261398.83 rows=2 width=8) (actual time=741.309..749.020 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=260398.62..260398.63 rows=1 width=8) (actual time=735.099..735.100 rows=1 loops=3)
Output: PARTIAL count(*)
Worker 0: actual time=730.871..730.872 rows=1 loops=1
Worker 1: actual time=733.435..733.436 rows=1 loops=1
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..257903.37 rows=998100 width=0) (actual time=0.065..700.529 rows=795074 loops=3)
Output: status
Index Cond: (metadata.status = 'PROCESSED'::text)
Heap Fetches: 747048
Worker 0: actual time=0.060..702.980 rows=670975 loops=1
Worker 1: actual time=0.076..686.946 rows=1010099 loops=1
Planning Time: 0.085 ms
Execution Time: 749.068 ms
(18 rows)
corpus=#
But sometimes counting the PROCESSED rows takes an excessive amount of time (occasionally several minutes):
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='PROCESSED';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=261398.83..261398.84 rows=1 width=8) (actual time=30019.273..30019.336 rows=1 loops=1)
Output: count(*)
-> Gather (cost=261398.62..261398.83 rows=2 width=8) (actual time=30019.261..30019.326 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=260398.62..260398.63 rows=1 width=8) (actual time=29967.734..29967.735 rows=1 loops=3)
Output: PARTIAL count(*)
Worker 0: actual time=29939.915..29939.916 rows=1 loops=1
Worker 1: actual time=29944.395..29944.395 rows=1 loops=1
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..257903.37 rows=998100 width=0) (actual time=75.385..29931.795 rows=795074 loops=3)
Output: status
Index Cond: (metadata.status = 'PROCESSED'::text)
Heap Fetches: 747151
Worker 0: actual time=128.857..29899.156 rows=916461 loops=1
Worker 1: actual time=28.609..29905.708 rows=854439 loops=1
Planning Time: 421.203 ms
Execution Time: 30019.440 ms
(18 rows)
corpus=#
While the query above was running slowly, I was able to query the same table for either of the other two codes, and those queries returned in under 1 second. I checked for table locks (there were none). This happens even when no other queries or inserts against the table are running.
- What are the possible causes of these intermittent slow queries?
- What additional debugging can I try to get more information about these slow queries?
- Are there any relevant server settings?
- Is there a more efficient way to index/encode this column (for example, should I use CHAR(1), or even SMALLINT)? If so, what index should be used on that column?
If I use CHAR(1), is there any difference between the following constraints?
ALTER TABLE jgi_metadata ADD CONSTRAINT status_code_ck CHECK (status_code = ANY (ARRAY['Q'::char(1), 'P'::char(1), 'E'::char(1)]));
ALTER TABLE jgi_metadata ADD CONSTRAINT status_code_ck CHECK (status_code IN ('Q', 'P', 'E'));
Can a partial index be used on this column even though it is never NULL? Should I instead split PROCESSED out into a boolean column, use the status column only for the other codes, make it nullable, and put a partial index on it?
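For illustration, a minimal sketch of the split being asked about (the is_processed column and index names are hypothetical, not part of the current schema):

-- Hypothetical boolean-split sketch: a flag for the dominant value, and a nullable
-- status column that holds only the rare codes, covered by a partial index.
ALTER TABLE metadata ADD COLUMN is_processed boolean NOT NULL DEFAULT false;
ALTER TABLE metadata ALTER COLUMN status DROP NOT NULL;   -- rare codes only, NULL otherwise
CREATE INDEX metadata_rare_status_idx ON metadata (status) WHERE status IS NOT NULL;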
This is PostgreSQL 11 running on Linux with default settings.
Other things I have tried:
- Increased work_mem to 100MB (via postgresql.conf). No change in performance.
- Tried creating a partial index on the status column.
Update: I have found that this performance problem is not related to the status column but to the size of the table itself, as shown by the following 2-minute query:
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=196398.52..196398.53 rows=1 width=8) (actual time=118527.897..118554.762 rows=1 loops=1)
Output: count(*)
-> Gather (cost=196398.30..196398.51 rows=2 width=8) (actual time=118522.165..118554.756 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=195398.30..195398.31 rows=1 width=8) (actual time=118491.043..118491.044 rows=1 loops=3)
Output: PARTIAL count(*)
Worker 0: actual time=118475.143..118475.144 rows=1 loops=1
Worker 1: actual time=118476.110..118476.111 rows=1 loops=1
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..192876.13 rows=1008870 width=0) (actual time=71.797..118449.265 rows=809820 loops=3)
Output: status
Heap Fetches: 552630
Worker 0: actual time=75.877..118434.476 rows=761049 loops=1
Worker 1: actual time=104.872..118436.647 rows=745770 loops=1
Planning Time: 592.040 ms
Execution Time: 118554.839 ms
(17 rows)
corpus=#
This now looks very similar to other questions here, so I am trying the mitigation strategy from this answer:
VACUUM ANALYZE metadata;
The first COUNT(*) after that took 5 seconds; subsequent counts took 190 ms.
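For debugging along these lines, here is a query sketch (my suggestion, not from the original post) to check whether dead tuples and an out-of-date visibility map are forcing the heap fetches seen in the plans above:

-- Dead tuples and last (auto)vacuum/analyze times for the table:
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, last_analyze
FROM pg_stat_user_tables
WHERE relname = 'metadata';
-- If relallvisible is much lower than relpages, index-only scans must visit the heap:
SELECT relpages, relallvisible, reltuples
FROM pg_class
WHERE relname = 'metadata';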
Other ideas:
- Would it help to split the status column out into its own table, with a foreign key in the metadata table?
Note: I am increasingly convinced that this question duplicates several others here:
- Extremely slow count in PostgreSQL
- count(*) query is too slow even with an index-only scan
- Why are some count queries so slow?
- Optimizing select count result in Postgresql
- https://stackoverflow.com/questions/58449716/postgres-why-does-select-count-take-so-long
- https://stackoverflow.com/questions/16916633/if-postgresql-count-is-always-slow-how-to-paginate-complex-queries
- https://stackoverflow.com/questions/7943233/fast-way-to-discover-the-row-count-of-a-table-in-postgresql/7945274#7945274
This answer may be the best solution to the problem:

As requested, here is the query plan analysis with buffers:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT COUNT(*) FROM metadata;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=80771.95..80771.96 rows=1 width=8) (actual time=26711.481..26716.494 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
-> Gather (cost=80771.73..80771.94 rows=2 width=8) (actual time=26711.203..26716.488 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
-> Partial Aggregate (cost=79771.73..79771.74 rows=1 width=8) (actual time=26565.622..26565.623 rows=1 loops=3)
Output: PARTIAL count(*)
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
Worker 0: actual time=26530.890..26530.891 rows=1 loops=1
Buffers: shared hit=105264 read=6760 dirtied=145 written=5
Worker 1: actual time=26530.942..26530.942 rows=1 loops=1
Buffers: shared hit=84675 read=7529 dirtied=46 written=2
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..77241.05 rows=1012275 width=0) (actual time=42.254..26529.232 rows=809820 loops=3)
Output: status
Heap Fetches: 17185
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
Worker 0: actual time=59.291..26494.376 rows=815113 loops=1
Buffers: shared hit=105264 read=6760 dirtied=145 written=5
Worker 1: actual time=31.165..26484.729 rows=1036972 loops=1
Buffers: shared hit=84675 read=7529 dirtied=46 written=2
Planning Time: 98.400 ms
Execution Time: 26716.529 ms
(25 rows)
My guess is that sometimes many of the heap pages are cached and the query runs quickly. At other times the buffers hold other pages, so you end up waiting on reads from disk.
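One way to test that guess (my own sketch; note that track_io_timing is a superuser setting in PostgreSQL 11) is to enable per-query I/O timing and compare the shared hit/read buffer counts and the reported I/O time between a fast run and a slow run:

SET track_io_timing = on;  -- superuser-only; adds "I/O Timings" to EXPLAIN (ANALYZE, BUFFERS) output
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT COUNT(*) FROM metadata WHERE status = 'PROCESSED';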
There are two problems with this kind of query:
Postgres keeps statistics about columns, including a histogram of how many rows to expect for each value. If approximate statistics are good enough for you, you can query pg_stats directly. Another option might be sampling, which is also an approximation. But if you need exact numbers, you should consider doing something more sophisticated and keeping the statistics in a separate table.
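A sketch of both approximate approaches, assuming recent statistics (these return estimates, not exact counts):

-- Planner statistics: frequency of each status value times the estimated row count.
SELECT s.most_common_vals, s.most_common_freqs, c.reltuples
FROM pg_stats s
JOIN pg_class c ON c.relname = s.tablename
WHERE s.tablename = 'metadata' AND s.attname = 'status';
-- Sampling roughly 1% of the table's pages and scaling up:
SELECT count(*) * 100 AS approx_processed
FROM metadata TABLESAMPLE SYSTEM (1)
WHERE status = 'PROCESSED';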
Redesign the schema to store pre-computed statistics

There is a limit to how far this schema will scale, and it is not a big one. Every time you run this query you have to do a sequential scan of the table (the index is useless). If this is a critical query, you can redesign your schema to keep summary statistics in a separate table: add a column counted default false, and have a background job periodically fold the not-yet-counted rows into the summary table and mark them as counted. Then, when you need a count, all that is left is to add the stored statistics to a count of the new rows the background job has not yet picked up. counted needs an index (a partial index is perfect), and this will return results immediately.
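A minimal sketch of that scheme, with table, column, and index names of my own choosing (the original answer's exact DDL was not preserved above):

-- Summary table plus a flag marking rows already folded into it.
CREATE TABLE metadata_counts (
    status text PRIMARY KEY,
    cnt    bigint NOT NULL DEFAULT 0
);
ALTER TABLE metadata ADD COLUMN counted boolean NOT NULL DEFAULT false;
CREATE INDEX metadata_uncounted_idx ON metadata (status) WHERE NOT counted;

-- Background job, run periodically: fold new rows into the summary and mark them.
WITH marked AS (
    UPDATE metadata SET counted = true
    WHERE NOT counted
    RETURNING status
)
INSERT INTO metadata_counts (status, cnt)
SELECT status, count(*) FROM marked GROUP BY status
ON CONFLICT (status) DO UPDATE SET cnt = metadata_counts.cnt + EXCLUDED.cnt;

-- Exact count on demand: stored total plus the rows the job has not seen yet.
SELECT coalesce((SELECT cnt FROM metadata_counts WHERE status = 'PROCESSED'), 0)
     + (SELECT count(*) FROM metadata WHERE status = 'PROCESSED' AND NOT counted)
  AS processed_count;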
If you also need to support deletes, counted can take 3 values instead of 2: JUST_INSERTED, JUST_DELETED, COUNTED. Then, instead of deleting a record right away, mark it JUST_DELETED so the same background job can update the statistics, this time by subtracting from the count.

Based on the tests described in my revised question and the recommendations in answers to similar questions (see here and here), I implemented the following changes:
CREATE INDEX status_not_processed_idx ON metadata (status) WHERE status<>'PROCESSED';
VACUUM ANALYZE metadata;
ALTER TABLE metadata SET (autovacuum_vacuum_scale_factor = 0, autovacuum_analyze_scale_factor = 0, autovacuum_vacuum_threshold = 10000, autovacuum_analyze_threshold = 10000);
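For reference, a count that leans on this partial index might look roughly like the following (my sketch of the approach, not the exact query or view definition I used):

-- Derive the PROCESSED count by subtracting the (small) non-PROCESSED count
-- from the total, instead of scanning the ~2.4M matching index entries.
SELECT (SELECT count(*) FROM metadata)
     - (SELECT count(*) FROM metadata WHERE status <> 'PROCESSED')
  AS processed_count;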
I also changed the counting so that it no longer does a COUNT(*) of, or otherwise relies on, the PROCESSED value. Initial performance tests looked promising (422 ms), but subsequent tests of this approach showed the same performance problem:

I am not going to mark this as the answer to my question, since the performance problem remains. I am leaving it here as an example of a failed solution.
Also tried:
- Increasing shared_buffers in postgresql.conf from the default of 128MB to 2GB (no significant change in query performance).

This problem eventually reached the point where I could no longer ignore it.
I finally solved this performance problem by moving the status codes into their own table, keyed by the luid integer (an auto-incrementing locally unique identifier) from the main table. The status code itself is now just a single character. The database is now up to 4,736,786 records, and the view (also modified, pasted below) completes in under a second using the new table.

Table "public.record_status"
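The actual definition was not reproduced above; here is a sketch of roughly what the new table looks like (constraint and index details are my assumption):

CREATE TABLE record_status (
    luid        integer NOT NULL PRIMARY KEY,   -- same luid as the main table
    status_code char(1) NOT NULL CHECK (status_code IN ('Q', 'P', 'E'))
);
CREATE INDEX record_status_code_idx ON record_status (status_code);
-- Counting a status now scans a much narrower table:
SELECT count(*) FROM record_status WHERE status_code = 'P';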
I also simplified the statistics view by removing a fast_count_rows() function (which used bigint), which shaved another 500 ms off the query:

View "public.metadata_queue_statistics"
Performance before removing the fast_count_rows function from the view:

Performance after removing the fast_count_rows function from the view:

So now this view completes in under half a second.
For comparison with the buffer counts from my earlier attempts at this problem, here is a more detailed execution plan:
I am not sure exactly what is going on here; there are too many variables, so I will boil it down to two problems:
(1) As @Stanislav said above, the cardinality of the status index is too low for it to benefit from a heap-based index.
(2) The table contains too many TEXT fields, and it probably has to go through the rows on every page just to give you an exact row count; with the TEXT columns sitting in the middle of each row, that performs badly. This would explain why your ANALYZE makes things better at first, and then everything gets pushed out of shared_buffers again.
--> For the second issue, here is what I did: create two tables, one to store the critical values such as media_type, status, and so on, which technically has no TEXT fields in it, and another that has a UNIQUE FOREIGN KEY referencing the PRIMARY KEY of the first table.
--> For INSERT, UPDATE, and DELETE, have your app wrap the two INSERTs (or UPDATEs/DELETEs) in an explicit transaction. If both succeed, COMMIT; ROLLBACK if you don't like the result. Then, when you want to pull out the content, use a LEFT JOIN (not an INNER JOIN). The main (first) table is now much smaller, so you can load it and get the COUNT over the same number of rows while scanning far fewer pages.
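A sketch of that two-table layout under the assumptions above (table and column names are illustrative):

-- Narrow table with no TEXT payload, used for filtering and counting.
CREATE TABLE metadata_core (
    id         text PRIMARY KEY,
    media_type text NOT NULL,
    status     text NOT NULL,
    luid       integer NOT NULL
);
-- Wide table holding the TEXT columns, linked one-to-one via a unique foreign key.
CREATE TABLE metadata_payload (
    id         text NOT NULL UNIQUE REFERENCES metadata_core(id) ON DELETE CASCADE,
    content    text,
    note       text,
    audit_path text
);
-- Counts touch far fewer pages:
SELECT count(*) FROM metadata_core WHERE status = 'PROCESSED';
-- Full rows come back through a LEFT JOIN, as suggested above:
SELECT c.*, p.content
FROM metadata_core c
LEFT JOIN metadata_payload p ON p.id = c.id
WHERE c.status = 'ERROR';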