
Question

Asked: 2024-04-24 00:36:48 +0800 CST

Full-text search on a Postgres jsonb column


I have an alerts table like this:

my_db=> \d alerts_alert
                                     Table "public.alerts_alert"
     Column     |           Type           | Collation | Nullable |             Default
----------------+--------------------------+-----------+----------+----------------------------------
 received_at    | timestamp with time zone |           | not null |
 id             | bigint                   |           | not null | generated by default as identity
 data           | jsonb                    |           | not null |
 updated_at     | timestamp with time zone |           | not null |
 status         | character varying(8)     |           | not null |
 owner_id       | uuid                     |           |          |
 resolved_by_id | uuid                     |           |          |
Indexes:
    "alerts_alert_pkey" PRIMARY KEY, btree (id)
    "alerts_aler_data_eae7f5_gin" gin (data)
    "alerts_alert_data_gin" gin (to_tsvector('english'::regconfig, COALESCE(data::text, ''::text)))
    "alerts_alert_owner_id_0c00548a" btree (owner_id)
    "alerts_alert_resolved_by_id_b59cbeaf" btree (resolved_by_id)
Foreign-key constraints:
    "alerts_alert_owner_id_0c00548a_fk_accounts_user_id" FOREIGN KEY (owner_id) REFERENCES accounts_user(id) DEFERRABLE INITIALLY DEFERRED
    "alerts_alert_resolved_by_id_b59cbeaf_fk_accounts_user_id" FOREIGN KEY (resolved_by_id) REFERENCES accounts_user(id) DEFERRABLE INITIALLY DEFERRED

I want to run a full-text search on the data column.

I came up with this query, but it performs poorly:

my_db=> explain analyze WITH cte AS (
  SELECT id, received_at, data, updated_at, owner_id, resolved_by_id, status,
         to_tsvector('english'::regconfig, COALESCE(data::text, '')) AS search_vector
  FROM alerts_alert
)
SELECT id, received_at, data, updated_at, owner_id, resolved_by_id, status, search_vector,
       ts_rank(search_vector, websearch_to_tsquery('english'::regconfig, 'haykd')) AS rank
FROM cte
WHERE search_vector @@ websearch_to_tsquery('english'::regconfig, 'haykd')
ORDER BY rank DESC LIMIT 21;

This is what it gives me:

                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1518.59..1529.61 rows=21 width=194) (actual time=3891.969..3952.546 rows=21 loops=1)
   ->  Result  (cost=1518.59..2195.31 rows=1289 width=194) (actual time=3891.962..3952.522 rows=21 loops=1)
         ->  Sort  (cost=1518.59..1521.81 rows=1289 width=162) (actual time=3868.134..3868.145 rows=21 loops=1)
               Sort Key: (ts_rank(to_tsvector('english'::regconfig, COALESCE((alerts_alert.data)::text, ''::text)), '''haykd'''::tsquery)) DESC
               Sort Method: top-N heapsort  Memory: 28kB
               ->  Bitmap Heap Scan on alerts_alert  (cost=538.11..1483.83 rows=1289 width=162) (actual time=19.143..3862.893 rows=1327 loops=1)
                     Recheck Cond: (to_tsvector('english'::regconfig, COALESCE((data)::text, ''::text)) @@ '''haykd'''::tsquery)
                     Heap Blocks: exact=202
                     ->  Bitmap Index Scan on alerts_alert_data_gin  (cost=0.00..537.79 rows=1289 width=0) (actual time=12.832..12.832 rows=1432 loops=1)
                           Index Cond: (to_tsvector('english'::regconfig, COALESCE((data)::text, ''::text)) @@ '''haykd'''::tsquery)
 Planning Time: 35.525 ms
 Execution Time: 3953.748 ms

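The table is not large, only 12k rows. Any suggestions to make this faster? I am using PostgreSQL 16.

Update:

As suggested in the comments, I ran explain (analyze, buffers) after setting track_io_timing = on. This is what I got: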

my_db=> explain (analyze, buffers) WITH cte AS (
  SELECT id, received_at, data, updated_at, owner_id, resolved_by_id, status,
         to_tsvector('english'::regconfig, COALESCE(data::text, '')) AS search_vector
  FROM alerts_alert
)
SELECT id, received_at, data, updated_at, owner_id, resolved_by_id, status, search_vector,
       ts_rank(search_vector, websearch_to_tsquery('english'::regconfig, 'haykd')) AS rank
FROM cte
WHERE search_vector @@ websearch_to_tsquery('english'::regconfig, 'haykd')
ORDER BY rank DESC LIMIT 21;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3064.58..3075.61 rows=21 width=194) (actual time=3029.762..3089.928 rows=21 loops=1)
   Buffers: shared hit=5518
   ->  Result  (cost=3064.58..3766.51 rows=1337 width=194) (actual time=3029.761..3089.919 rows=21 loops=1)
         Buffers: shared hit=5518
         ->  Sort  (cost=3064.58..3067.93 rows=1337 width=162) (actual time=3007.831..3007.840 rows=21 loops=1)
               Sort Key: (ts_rank(to_tsvector('english'::regconfig, COALESCE((alerts_alert.data)::text, ''::text)), '''haykd'''::tsquery)) DESC
               Sort Method: top-N heapsort  Memory: 28kB
               Buffers: shared hit=5417
               ->  Bitmap Heap Scan on alerts_alert  (cost=2055.61..3028.54 rows=1337 width=162) (actual time=4.503..3005.258 rows=1337 loops=1)
                     Recheck Cond: (to_tsvector('english'::regconfig, COALESCE((data)::text, ''::text)) @@ '''haykd'''::tsquery)
                     Heap Blocks: exact=204
                     Buffers: shared hit=5414
                     ->  Bitmap Index Scan on alerts_alert_data_gin  (cost=0.00..2055.28 rows=1337 width=0) (actual time=2.097..2.097 rows=1465 loops=1)
                           Index Cond: (to_tsvector('english'::regconfig, COALESCE((data)::text, ''::text)) @@ '''haykd'''::tsquery)
                           Buffers: shared hit=482
 Planning:
   Buffers: shared hit=276 read=2
   I/O Timings: shared/local read=1.171
 Planning Time: 35.782 ms
 Execution Time: 3090.356 ms

I am running on AWS RDS, on a db.t4g.micro instance.

1 Answer

Daniel Vérité · Answer 1 · 2024-04-27T20:21:43+08:00

The full-text index:

"alerts_alert_data_gin" gin (to_tsvector('english'::regconfig, COALESCE(data::text, ''::text)))

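is used for the bitmap index scan, but whenever the engine needs the tsvector for anything else in the query, data::text goes through the full-text parser again. That parsing is CPU-intensive and seriously degrades search performance unless the text content is small.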

In particular, the ts_rank() call causes to_tsvector('english'::regconfig, COALESCE(data::text, '')) to be executed for all 1337 matching rows in the example.
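One way to check this on your own data, as a rough sketch: run the same search without ts_rank() and without search_vector in the select list, so that nothing outside the index condition needs the tsvector. If this version is much faster than the original, the time is most likely going into re-parsing data::text for each matching row. No timings are claimed here; the numbers depend on your data.

```sql
-- Diagnostic sketch: same index condition as the original query,
-- but no ts_rank() and no tsvector in the output columns.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, received_at, updated_at, owner_id, resolved_by_id, status
FROM alerts_alert
WHERE to_tsvector('english'::regconfig, COALESCE(data::text, ''))
      @@ websearch_to_tsquery('english'::regconfig, 'haykd');
```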

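This repeated computation can be avoided by materializing the expression as a column in the table (and, of course, creating a GIN index on that column).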
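A minimal sketch of that approach, assuming a stored generated column is acceptable for this table. The column name search_vector and the index name alerts_alert_search_vector_gin are made up for illustration; a trigger-maintained column would be another option. Adding a stored generated column rewrites the table, which should be quick at 12k rows.

```sql
-- Store the tsvector once per row instead of recomputing it per query.
-- "search_vector" is a hypothetical column name.
ALTER TABLE alerts_alert
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english'::regconfig, COALESCE(data::text, ''))
  ) STORED;

-- Index the stored column (hypothetical index name).
CREATE INDEX alerts_alert_search_vector_gin
  ON alerts_alert USING gin (search_vector);

-- Search and rank against the stored column; no CTE is needed,
-- and ts_rank() reads the precomputed tsvector.
SELECT id, received_at, data, updated_at, owner_id, resolved_by_id, status,
       ts_rank(search_vector, q) AS rank
FROM alerts_alert,
     websearch_to_tsquery('english'::regconfig, 'haykd') AS q
WHERE search_vector @@ q
ORDER BY rank DESC
LIMIT 21;
```

Once queries use the new column, the expression index alerts_alert_data_gin is probably no longer needed and could be dropped, after checking that nothing else relies on it.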
