I'm running into performance problems querying an events table that contains more than a million rows. Our application logic dictates that a user can only access the events reachable through their "groups". Groups have a many-to-many relationship with assets, and assets have a many-to-many relationship with events. So to find the events a user can access, we currently join events -> events_assets -> assets -> groups_assets.
I have indexes on events.id, events_assets.event_id, assets.id, and groups_assets.asset_id.
By the nature of the application, the number of events (and with it the number of events_assets rows) grows continuously, while assets and groups_assets stay relatively small and stable.
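Conceptually, that access chain boils down to a single run of inner joins across the two link tables (this is just a simplified sketch of the relationship, not the actual query we run, which follows below; 'default' is an example group id):

```sql
-- Events reachable by one group, via the two many-to-many link tables
SELECT DISTINCT e.*
FROM events e
JOIN events_assets ea ON ea.event_id = e.id
JOIN groups_assets ga ON ga.asset_id = ea.asset_id
WHERE ga.group_id = 'default';
```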
The schema is as follows:
create table events (id text, last_updated TIMESTAMP);
create table events_assets (event_id text, asset_id text);
create table assets (id text);
create table groups_assets (group_id text, asset_id text);
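For reference, the indexes mentioned above look roughly like this (the names events_id and assets_id appear in the query plan below; the other names are illustrative):

```sql
create index events_id on events (id);
create index events_assets_event_id on events_assets (event_id);
create index assets_id on assets (id);
create index groups_assets_asset_id on groups_assets (asset_id);
```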
The query:
EXPLAIN ANALYZE
SELECT
    events.*
FROM
    events
WHERE
    events.id IN (
        SELECT
            events.id
        FROM
            events
            INNER JOIN events_assets ON (events.id = events_assets.event_id)
            INNER JOIN assets ON (assets.id = events_assets.asset_id)
        WHERE
            assets.id IN (
                SELECT
                    id
                FROM
                    assets
                    LEFT JOIN groups_assets ON (groups_assets.asset_id = assets.id)
                WHERE
                    groups_assets.group_id IN ('default')
            )
    )
ORDER BY
    last_updated DESC
LIMIT 25 OFFSET 0
The query plan is as follows:
Limit (cost=77402.59..77402.66 rows=25 width=1970) (actual time=2147.720..2148.033 rows=25 loops=1)
  ->  Sort (cost=77402.59..77462.96 rows=24147 width=1970) (actual time=2147.714..2148.024 rows=25 loops=1)
        Sort Key: events.last_updated DESC
        Sort Method: top-N heapsort Memory: 109kB
        ->  Nested Loop (cost=15979.68..76721.18 rows=24147 width=1970) (actual time=1074.494..2097.334 rows=144882 loops=1)
              ->  HashAggregate (cost=15979.26..16220.73 rows=24147 width=74) (actual time=1074.361..1225.749 rows=144882 loops=1)
                    Group Key: events_1.id
                    Batches: 5 Memory Usage: 8241kB Disk Usage: 11304kB
                    ->  Gather (cost=1013.12..15918.90 rows=24147 width=74) (actual time=0.872..99.711 rows=144882 loops=1)
                          Workers Planned: 1
                          Workers Launched: 1
                          ->  Nested Loop (cost=13.12..12504.20 rows=14204 width=74) (actual time=1.567..278.523 rows=72441 loops=2)
                                ->  Hash Join (cost=12.70..3543.54 rows=14254 width=37) (actual time=1.019..34.109 rows=72441 loops=2)
                                      Hash Cond: (events_assets.asset_id = assets.id)
                                      ->  Parallel Seq Scan on events_assets (cost=0.00..3103.23 rows=85523 width=74) (actual time=0.183..18.023 rows=72441 loops=2)
                                      ->  Hash (cost=12.68..12.68 rows=1 width=100) (actual time=0.808..0.813 rows=8 loops=2)
                                            Buckets: 1024 Batches: 1 Memory Usage: 10kB
                                            ->  Nested Loop (cost=9.37..12.68 rows=1 width=100) (actual time=0.777..0.793 rows=8 loops=2)
                                                  Join Filter: (assets.id = groups_assets.asset_id)
                                                  ->  HashAggregate (cost=9.24..9.25 rows=1 width=66) (actual time=0.755..0.759 rows=8 loops=2)
                                                        Group Key: assets_1.id
                                                        Batches: 1 Memory Usage: 24kB
                                                        Worker 0: Batches: 1 Memory Usage: 24kB
                                                        ->  Nested Loop (cost=0.13..9.23 rows=1 width=66) (actual time=0.702..0.730 rows=9 loops=2)
                                                              ->  Seq Scan on groups_assets (cost=0.00..1.07 rows=1 width=32) (actual time=0.220..0.222 rows=9 loops=2)
                                                                    Filter: (group_id = 'default'::text)
                                                              ->  Index Only Scan using assets_id on assets assets_1 (cost=0.13..8.15 rows=1 width=34) (actual time=0.055..0.056 rows=1 loops=18)
                                                                    Index Cond: (id = groups_assets.asset_id)
                                                                    Heap Fetches: 18
                                                  ->  Index Only Scan using assets_id on assets (cost=0.13..3.42 rows=1 width=34) (actual time=0.003..0.003 rows=1 loops=16)
                                                        Index Cond: (id = assets_1.id)
                                                        Heap Fetches: 16
                                ->  Index Only Scan using events_id on events events_1 (cost=0.42..0.62 rows=1 width=37) (actual time=0.003..0.003 rows=1 loops=144882)
                                      Index Cond: (id = events_assets.event_id)
                                      Heap Fetches: 18621
              ->  Index Scan using events_id on events (cost=0.42..2.50 rows=1 width=1970) (actual time=0.006..0.006 rows=1 loops=144882)
                    Index Cond: (id = events_1.id)
Planning Time: 1.994 ms
Execution Time: 2159.146 ms