Questions asked by Joe - dba

Joe

Asked: 2024-02-12 20:05:34 +0800 CST

Aurora PostgreSQL TEMP TABLE creation uses more CPU than expected?

5

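I am running Aurora PostgreSQL on a db.x2g.2xlarge instance. Simply creating a temp table shows up as the third-highest wait, which surprises me, because I expected it to be one of the cheapest queries.

The query is called within a transaction: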

CREATE TEMP TABLE identifiers_to_resolve (  
  scheme_pk BIGINT,
  host_pk BIGINT,
  port INT NOT NULL,
  scheme_specific_part VARCHAR NOT NULL,
  index INTEGER NOT NULL
)  ON COMMIT DROP;

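I want to use the temp table as scratch storage within the transaction during a more complex query.

Statistics from Performance Insights:

Load by waits (AAS): 0.42
Calls/sec: 123.64
Block hits/sec: 48499.58
Block writes/sec: 16.32
Avg latency (ms) per call: 3.24
Block hits/call: 329.27
Block writes/call: 0.13

All other statistics are 0.

Performance Insights records the wait as almost entirely "CPU". Screenshot attached.

I consider 123 calls per second a fairly low rate, and I am surprised that it puts such a load on the database and touches so many blocks. From what I have read about temp table use cases, I expected them to have less impact.

Is it expected for temp table creation to cause this much CPU wait?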
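One way to investigate where the block hits come from, as a sketch only: every CREATE TEMP TABLE inserts rows into system catalogs such as pg_class, pg_attribute, pg_type and pg_depend, and ON COMMIT DROP deletes them again, so a steady stream of creations can leave dead rows behind in those catalogs. The query below uses the standard pg_stat_sys_tables view. Whether catalog churn is actually the cause here is an assumption to verify, not a conclusion.

```sql
-- Dead-row counts and vacuum history for catalogs touched by CREATE TEMP TABLE.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_sys_tables
WHERE schemaname = 'pg_catalog'
  AND relname IN ('pg_class', 'pg_attribute', 'pg_type', 'pg_depend')
ORDER BY n_dead_tup DESC;
```

Running it a few minutes apart while the workload is active would show whether n_dead_tup in these catalogs keeps growing between autovacuum runs.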

Joe

Asked: 2024-01-20 01:57:21 +0800 CST

Is seeing a data type cast in a query plan a red flag?

5

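In a query plan, I see the type of the column being cast. There is an index that should be used, but it isn't. Is the type cast a red flag?

The table has been ANALYZEd and has 110 million rows. Of those, 10% match the query.

Here is the full explain: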
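For context, when a character varying column is compared with a string literal, EXPLAIN displays the condition as (column)::text = 'literal'::text. varchar is binary-coercible to text and uses the text comparison operators, so this cast is a relabeling and a btree index on the column remains usable. To see what the planner thinks of the index path, one sketch is to discourage sequential scans for a single transaction and compare the two plans. enable_seqscan is a standard planner setting; note that EXPLAIN ANALYZE really executes the query.

```sql
BEGIN;
-- Affects only this transaction; the setting reverts at COMMIT or ROLLBACK.
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM item_info WHERE subtype = 'Dataset';
ROLLBACK;

-- Baseline with default settings, for comparison.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM item_info WHERE subtype = 'Dataset';
```

The EXPLAIN output in this question estimates about 1.3 million matching rows, while 10% of 110 million would be about 11 million, so comparing estimated with actual row counts in the ANALYZE output may also be informative.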

explain select * from item_info where subtype = 'Dataset';

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1592021.24 rows=1320650 width=27)
   Workers Planned: 2
   ->  Parallel Seq Scan on item_info  (cost=0.00..1458956.24 rows=550271 width=27)
         Filter: ((subtype)::text = 'Dataset'::text)
(4 rows)

Time: 0.652 ms

=> \d item_info
                     Table "public.item_info"
    Column    |       Type        | Collation | Nullable | Default
--------------+-------------------+-----------+----------+---------
 root_item_pk | bigint            |           | not null |
 type         | character varying |           |          |
 subtype      | character varying |           |          |
Indexes:
    "item_info_pkey" PRIMARY KEY, btree (root_item_pk)
    "item_info_subtype_idx" btree (subtype)
    "item_info_type_idx" btree (type)

Joe

Asked: 2023-08-09 00:06:10 +0800 CST

Does Postgres reclaim space when a temp table is dropped?

6

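I am using a lot of temp tables and want to make sure I am not leaking storage space.

I create and use the table within a transaction, using CREATE TEMP TABLE mytable ... ON COMMIT DROP.

I can't make sense of the CREATE TEMPORARY TABLE documentation. It states:

The autovacuum daemon cannot access and therefore cannot vacuum or analyze temporary tables. For this reason, appropriate vacuum and analyze operations should be performed via session SQL commands.

Will ON COMMIT DROP reclaim the space, or do I need to actually run VACUUM mytable?

If I do need to run VACUUM, I am not sure how to, because the temp table is no longer available once the transaction has ended.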
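A sketch of how the on-disk footprint could be observed from inside the transaction. pg_relation_filepath, pg_total_relation_size and to_regclass are standard functions; the column definitions and the row count are made up for illustration. Dropping a table removes its data files, which is a different mechanism from VACUUM reclaiming dead rows inside a table that continues to exist.

```sql
BEGIN;

-- Hypothetical columns; the real table definition is elided in the question.
CREATE TEMP TABLE mytable (id bigint, payload text) ON COMMIT DROP;

INSERT INTO mytable
SELECT g, repeat('x', 100)
FROM generate_series(1, 100000) AS g;

-- File backing the temp table and its current size (heap plus TOAST).
SELECT pg_relation_filepath('mytable')                       AS file_path,
       pg_size_pretty(pg_total_relation_size('mytable'))     AS total_size;

COMMIT;

-- Yields NULL when no relation named mytable is visible any more.
SELECT to_regclass('mytable');
```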

Joe

Asked: 2022-06-16 07:26:42 +0800 CST

UPDATE FROM with a large table is slow and uses Seq Scans

2

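I have a large table (possibly a billion rows eventually, but currently about 26 million) and, as a one-off, I want to set a flag on the row with the highest PK in each grouping.

I chose to create a temp table that stores the PKs that should be set to current=true; all the others should be set to current=false. I made a temp table rather than a materialized view, but I don't think that makes a real difference.

The process of finding the max ID for each group isn't too painful: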

CREATE TABLE assertion (
    pk integer NOT NULL,
    a bigint NOT NULL,
    b bigint NOT NULL,
    c bigint NOT NULL,
    d integer NOT NULL,
    current boolean DEFAULT false NOT NULL
);

CREATE INDEX assertion_current_idx ON assertion USING btree (current) WHERE (current = true);
CREATE INDEX assertion_current_idx1 ON assertion USING btree (current);
CREATE UNIQUE INDEX assertion_a_b_c_d_idx ON assertion USING btree (a, b, c, d) WHERE (current = true);

SELECT COUNT(pk) FROM assertion;

-- 26916858
-- Time: 2912.403 ms (00:02.912)

CREATE TEMPORARY TABLE assertion_current AS
    (SELECT MAX(pk) as pk, a, b, c, d
      FROM assertion
      GROUP BY a, b, c, d);

-- Time: 72218.755 ms (01:12.219)

ANALYZE assertion_current;

CREATE INDEX ON assertion_current(pk);

-- Time: 22107.698 ms (00:22.108)

SELECT COUNT(pk) FROM assertion_current;

-- 26455092
-- Time: 15650.078 ms (00:15.650)

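According to the count of assertion_current, we need to set the "current" flag to true for 98% of the rows.

The tricky part is how to update the assertion table in a reasonable amount of time based on the current values. There is a uniqueness constraint over a, b, c, d, current that must be maintained, so the update to the current column needs to be atomic to avoid violating the constraint.

I have a few options:

Option 1

Update only those rows whose current value changes. This has the benefit of updating the fewest rows necessary, based on indexed fields: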


BEGIN;
UPDATE assertion
   SET current = false
   WHERE assertion.current = true AND PK NOT IN (SELECT pk FROM assertion_current);
UPDATE assertion
   SET current = true
   WHERE assertion.current = false AND PK IN (SELECT pk FROM assertion_current);
COMMIT;

But both queries involve a sequential scan of assertion_current, which (I think) has to be multiplied by the large number of rows.

Update on assertion  (cost=0.12..431141.55 rows=0 width=0)
   ->  Index Scan using assertion_current_idx on assertion  (cost=0.12..431141.55 rows=1 width=7)
         Index Cond: (current = true)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..787318.40 rows=29982560 width=4)
                 ->  Seq Scan on assertion_current  (cost=0.00..520285.60 rows=29982560 width=4)

and

 Update on assertion  (cost=595242.56..596693.92 rows=0 width=0)
   ->  Nested Loop  (cost=595242.56..596693.92 rows=17974196 width=13)
         ->  HashAggregate  (cost=595242.00..595244.00 rows=200 width=10)
               Group Key: assertion_current.pk
               ->  Seq Scan on assertion_current  (cost=0.00..520285.60 rows=29982560 width=10)
         ->  Index Scan using assertion_pkey on assertion  (cost=0.56..8.58 rows=1 width=10)
               Index Cond: (pk = assertion_current.pk)
               Filter: (NOT current)

This means that one of these queries (either many rows currently true or many currently false) will always take a long time.
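As an aside on the first plan: PostgreSQL does not turn NOT IN (subquery) into an anti-join, which is why the subquery appears as a SubPlan evaluated against a materialized copy of assertion_current. Below is a sketch of the same two statements written with NOT EXISTS and EXISTS, which the planner is able to plan as anti- and semi-joins. Whether this is actually faster on this data is untested. The two forms are equivalent here only because assertion_current.pk, being MAX(pk) over a NOT NULL column, contains no NULLs.

```sql
BEGIN;

-- Clear the flag on rows that are not the newest in their group.
UPDATE assertion a
   SET current = false
 WHERE a.current = true
   AND NOT EXISTS (SELECT 1 FROM assertion_current c WHERE c.pk = a.pk);

-- Set the flag on rows that are the newest in their group.
UPDATE assertion a
   SET current = true
 WHERE a.current = false
   AND EXISTS (SELECT 1 FROM assertion_current c WHERE c.pk = a.pk);

COMMIT;
```

Prefixing each UPDATE with EXPLAIN would show whether the planner picks a hash anti-join or hash semi-join instead of the per-row SubPlan.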

Option 2

A single pass, but it has to touch every row unnecessarily.

UPDATE assertion
   SET current =
     (CASE WHEN assertion.pk IN (select PK from assertion_current)
     THEN TRUE ELSE FALSE END)

But this again results in a sequential scan of assertion_current:

 Update on assertion  (cost=0.00..15498697380303.70 rows=0 width=0)
   ->  Seq Scan on assertion  (cost=0.00..15498697380303.70 rows=35948392 width=7)
         SubPlan 1
           ->  Materialize  (cost=0.00..787318.40 rows=29982560 width=4)
                 ->  Seq Scan on assertion_current  (cost=0.00..520285.60 rows=29982560 width=4)

Option 3

Similar to Option 1, but using WHERE in the update:

BEGIN;
UPDATE assertion SET current = false WHERE current = true;
UPDATE assertion SET current = true FROM assertion_current
  WHERE assertion.pk = assertion_current.pk;
COMMIT;

But the second query involves two seq scans:

 Update on assertion  (cost=1654256.82..2721576.65 rows=0 width=0)
   ->  Hash Join  (cost=1654256.82..2721576.65 rows=29982560 width=13)
         Hash Cond: (assertion_current.pk = assertion.pk)
         ->  Seq Scan on assertion_current  (cost=0.00..520285.60 rows=29982560 width=10)
         ->  Hash  (cost=1029371.92..1029371.92 rows=35948392 width=10)
               ->  Seq Scan on assertion  (cost=0.00..1029371.92 rows=35948392 width=10)

Option 4

Thanks @jjanes. This took more than 6 hours, so I cancelled it.

UPDATE assertion
   SET current = not current
   WHERE current <>
     (CASE WHEN assertion.pk IN (select PK from assertion_current)
     THEN TRUE ELSE FALSE END)

produces

 Update on assertion  (cost=0.00..11832617068493.14 rows=0 width=0)
   ->  Seq Scan on assertion  (cost=0.00..11832617068493.14 rows=27307890 width=7)
         Filter: (current <> CASE WHEN (SubPlan 1) THEN true ELSE false END)
         SubPlan 1
           ->  Materialize  (cost=0.00..787318.40 rows=29982560 width=4)
                 ->  Seq Scan on assertion_current  (cost=0.00..520285.60 rows=29982560 width=4)

Option 5

Thanks @a_horse_with_no_name. This takes 24 minutes on my machine.

UPDATE assertion tg SET current = EXISTS (SELECT pk FROM assertion_current cr WHERE cr.pk = tg.pk);

gives

 Update on assertion tg  (cost=0.00..233024784.94 rows=0 width=0)
   ->  Seq Scan on assertion tg  (cost=0.00..233024784.94 rows=27445116 width=7)
         SubPlan 1
           ->  Index Only Scan using assertion_current_pk_idx on assertion_current cr  (cost=0.44..8.46 rows=1 width=0)
                 Index Cond: (pk = tg.pk)

Is there a better way to achieve this in a timely manner?

Joe

Asked: 2015-01-25 18:47:27 +0800 CST

ALTER not showing up in the process list

0

I am performing an ALTER. It is currently running on a large table (300,000,000 rows).

MariaDB [my_database]> ALTER TABLE my_table
    -> add INDEX a (x, y, z),
    -> add INDEX d (x);
Stage: 1 of 2 'copy to tmp table'   60.1% of stage done

But the processlist makes no mention of it, however many times I query it.

MariaDB [my_database]> show full processlist;
+----+------+-----------------+-------------+---------+------+-------+-----------------------+----------+
| Id | User | Host            | db          | Command | Time | State | Info                  | Progress |
+----+------+-----------------+-------------+---------+------+-------+-----------------------+----------+
|  6 | apps | localhost:52235 | my_database | Sleep   |  304 |       | NULL                  |    0.000 |
| 33 | apps | localhost       | my_database | Query   |    0 | NULL  | show full processlist |    0.000 |
+----+------+-----------------+-------------+---------+------+-------+-----------------------+----------+
2 rows in set (0.01 sec)

I would expect it to appear. Any ideas why it doesn't?
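A few checks that might narrow this down, as a sketch. Without the PROCESS privilege, SHOW PROCESSLIST lists only the threads that belong to the current account, so it may be worth confirming which account and connection each client is using. information_schema.PROCESSLIST is the table form of the same data; the STAGE, MAX_STAGE and PROGRESS columns are MariaDB-specific. None of this is confirmed to be the cause of the behavior described here.

```sql
-- Which account and connection is this session?
SELECT CURRENT_USER(), CONNECTION_ID();

-- Does this account hold the PROCESS privilege?
SHOW GRANTS FOR CURRENT_USER();

-- Table form of the process list, including MariaDB's progress columns.
SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE,
       STAGE, MAX_STAGE, PROGRESS, INFO
FROM information_schema.PROCESSLIST
ORDER BY TIME DESC;
```

Running the first statement in the session that issued the ALTER (before starting it) and in the monitoring session would show whether both are connected to the same server under the same account.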

Joe

Asked: 2012-09-08 01:39:22 +0800 CST

Measure the size of a PostgreSQL table row

127

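I have a PostgreSQL table. select * is very slow, whereas select id is nice and quick. I think it may be that the rows are very large and take a while to transfer, or it may be some other factor.

I need all of the fields (or nearly all of them), so selecting just a subset is not a quick fix. Selecting only the fields that I want is still slow.

Here is my table schema minus the names: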

integer                  | not null default nextval('core_page_id_seq'::regclass)
character varying(255)   | not null
character varying(64)    | not null
text                     | default '{}'::text
character varying(255)   | 
integer                  | not null default 0
text                     | default '{}'::text
text                     | 
timestamp with time zone | 
integer                  | 
timestamp with time zone | 
integer                  |

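The text fields may be of any size. Still, in the worst case they are no more than a few kilobytes.

Questions

Is there anything about this that screams "crazy inefficient"?
Is there a way to measure page size at the Postgres command line to help me debug this?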
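A sketch of the kind of measurements psql allows, using the standard functions pg_relation_size, pg_total_relation_size and pg_column_size. The table name core_page is an assumption inferred from the sequence name core_page_id_seq in the schema; substitute the real name.

```sql
-- Page (block) size the server was built with, in bytes.
SHOW block_size;

-- On-disk size: heap alone versus heap plus TOAST and indexes.
SELECT pg_size_pretty(pg_relation_size('core_page'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('core_page')) AS total_size;

-- Row sizes as reported by pg_column_size for the whole-row value.
SELECT count(*)                      AS row_count,
       round(avg(pg_column_size(t))) AS avg_row_bytes,
       max(pg_column_size(t))        AS max_row_bytes
FROM core_page AS t;
```

Calling pg_column_size on individual columns in the same way would show which of the text fields contributes most to the row size.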
