
Question

bmargulies

Asked: 2012-06-05 15:47:30 +0800 CST

Speeding up delete in PostgreSQL


I came up with:

-- howmany is the number of documents to keep (supplied by the caller)
drop table if exists idtemp;
create temp table idtemp as 
      select documentid from taskflag where taskid='coref' and state = 2 
         order by statechanged asc limit howmany;
create unique index on idtemp(documentid);

-- trim taskflag to the first N docs ordered by coref.
delete from taskflag where documentid not in (select documentid from idtemp) ;

This is very slow when taskflag has 120k records and I keep 10k of them.

taskflag looks like this:

\d taskflag
                Table "public.taskflag"
    Column    |            Type             | Modifiers 
--------------+-----------------------------+-----------
 documentid   | character varying(64)       | not null
 taskid       | character varying(64)       | not null
 state        | smallint                    | 
 statechanged | timestamp without time zone | 
Indexes:
    "taskflag_pkey" PRIMARY KEY, btree (documentid, taskid)
    "task_index2" btree (documentid)
    "task_index4" btree (taskid, state, statechanged)
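To reproduce the problem in a scratch database, the \d output above corresponds roughly to the DDL below. The data generator is hypothetical: 120,000 rows all with taskid = 'coref', every third row with state = 2, and evenly spaced statechanged values are assumptions chosen to mirror the question's numbers, not the asker's real data. Run it only in a throwaway database.

```sql
-- DDL matching the \d taskflag output (column types, NOT NULLs, index names)
CREATE TABLE taskflag (
    documentid   varchar(64) NOT NULL,
    taskid       varchar(64) NOT NULL,
    state        smallint,
    statechanged timestamp without time zone,
    CONSTRAINT taskflag_pkey PRIMARY KEY (documentid, taskid)
);

CREATE INDEX task_index2 ON taskflag (documentid);
CREATE INDEX task_index4 ON taskflag (taskid, state, statechanged);

-- HYPOTHETICAL test data: 120,000 'coref' rows; g % 3 = 2 for every
-- third value of g, so 40,000 rows get state = 2
INSERT INTO taskflag (documentid, taskid, state, statechanged)
SELECT 'doc' || g,
       'coref',
       (g % 3)::smallint,
       timestamp '2012-06-01 00:00' + g * interval '1 second'
FROM generate_series(1, 120000) AS g;

-- Give the planner statistics for the freshly loaded table
ANALYZE taskflag;
```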

EXPLAIN says:

                                QUERY PLAN                                    
----------------------------------------------------------------------------------
 Delete on taskflag  (cost=0.00..105811822.25 rows=223210 width=6)
   ->  Seq Scan on taskflag  (cost=0.00..105811822.25 rows=223210 width=6)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..449.00 rows=10000 width=146)
                 ->  Seq Scan on idtemp  (cost=0.00..184.00 rows=10000 width=146)
(6 rows)

Should I just arrange for the temp table to contain only the rows I'm keeping?

2 Answers


Daniel Vérité · Answer 1 · 2012-06-05T17:45:07+08:00

Best Answer


The simplest optimization is probably to let the planner use a hash anti-join, by rewriting the query as:

 delete from taskflag where not exists
  (select 1 from idtemp where documentid=taskflag.documentid);

You may also need to ANALYZE the temp table right after populating it.
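A minimal sketch putting both suggestions together, assuming PostgreSQL and the question's taskflag table, with 10000 standing in for howmany. The explicit ANALYZE matters because autovacuum never processes temporary tables; the width=146 estimate for idtemp in the original plan appears to be a default guess made without statistics. Running EXPLAIN before the real DELETE lets you check that the NOT (SubPlan 1) filter has been replaced by an anti-join node; which join method the planner picks may vary.

```sql
DROP TABLE IF EXISTS idtemp;

-- Rows to keep; 10000 is an assumed stand-in for howmany
CREATE TEMP TABLE idtemp AS
    SELECT documentid
    FROM taskflag
    WHERE taskid = 'coref' AND state = 2
    ORDER BY statechanged ASC
    LIMIT 10000;

-- Autovacuum never touches temp tables, so collect statistics by hand
ANALYZE idtemp;

-- Inspect the plan first: look for an anti-join (e.g. Hash Anti Join)
EXPLAIN
DELETE FROM taskflag
WHERE NOT EXISTS (
    SELECT 1 FROM idtemp WHERE idtemp.documentid = taskflag.documentid
);

-- Same statement, actually executed
DELETE FROM taskflag
WHERE NOT EXISTS (
    SELECT 1 FROM idtemp WHERE idtemp.documentid = taskflag.documentid
);
```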


dezso · Answer 2 · 2012-06-06T00:18:19+08:00

I've set up a test. Doing the delete the way you did gives almost the same query plan. If you run ANALYZE idtemp before the delete, the plan changes to the following:

                                QUERY PLAN
---------------------------------------------------------------------------
 Delete on taskflag  (cost=198.00..3041.00 rows=60000 width=6)
   ->  Seq Scan on taskflag  (cost=198.00..3041.00 rows=60000 width=6)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
           ->  Seq Scan on idtemp  (cost=0.00..173.00 rows=10000 width=24)

The way a_horse_with_no_name suggested:

EXPLAIN delete from taskflag where not (taskid='coref' and state = 2);

                             QUERY PLAN
---------------------------------------------------------------------
 Delete on taskflag  (cost=0.00..3509.13 rows=1 width=6)
   ->  Seq Scan on taskflag  (cost=0.00..3509.13 rows=1 width=6)
         Filter: (((taskid)::text <> 'coref'::text) OR (state <> 2))

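On my test box I didn't have the patience to wait for your version to finish :) The first version finished in a little over 1 second (but the INSERT, the CREATE INDEX and the ANALYZE took an extra half second, so about 2 seconds in total); the rewritten DELETE took less than 1 second.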
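Note that a_horse_with_no_name's filter keeps every row with taskid = 'coref' and state = 2, not only the first N by statechanged. A hypothetical single-statement variant that keeps the ORDER BY ... LIMIT and the NOT EXISTS anti-join, but computes the kept rows in a CTE instead of a temp table, is sketched below. It is not taken from either answer, needs a PostgreSQL version that accepts WITH on DELETE (9.1 or later), and its speed relative to the temp-table version on real data is untested here.

```sql
-- HYPOTHETICAL variant: no temp table, no separate ANALYZE step
WITH keep AS (
    SELECT documentid
    FROM taskflag
    WHERE taskid = 'coref' AND state = 2
    ORDER BY statechanged ASC
    LIMIT 10000          -- assumed stand-in for howmany
)
DELETE FROM taskflag
WHERE NOT EXISTS (
    SELECT 1 FROM keep WHERE keep.documentid = taskflag.documentid
);
```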

@a_horse_with_no_name's second suggestion (see the comments below) uses the following code:

DROP TABLE IF EXISTS idtemp;

CREATE TABLE idtemp AS 
    SELECT documentid, taskid, state, statechanged  -- all columns, so the PK and indexes below can be built
    FROM taskflag 
    WHERE taskid = 'coref' AND state = 2 
    ORDER BY statechanged ASC LIMIT 10000;

ALTER TABLE idtemp
ADD CONSTRAINT pk_taskflag PRIMARY KEY (documentid, taskid);

CREATE INDEX idx_11 ON idtemp (documentid);

CREATE INDEX idx_12 ON idtemp (taskid, state, statechanged);

ANALYZE idtemp;

DROP TABLE taskflag;

ALTER TABLE idtemp RENAME TO taskflag;

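This also takes about 900 ms, which means that in this particular case (with my particular test data) the approach is quite competitive. It won't work, though, if your taskflag table has dependent objects.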
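Without CASCADE, DROP TABLE taskflag fails if, for example, a view reads from the table or a foreign key in another table references it. A sketch of a pre-flight check against the PostgreSQL system catalogs, covering only those two common cases (it is not an exhaustive dependency scan):

```sql
-- Views whose definition reads taskflag: each view's _RETURN rule is
-- recorded in pg_depend as depending on the tables it references
SELECT DISTINCT r.ev_class::regclass AS dependent_view
FROM pg_depend d
JOIN pg_rewrite r ON r.oid = d.objid
WHERE d.classid = 'pg_rewrite'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND d.refobjid = 'taskflag'::regclass
  AND r.ev_class <> 'taskflag'::regclass;

-- Foreign keys in other tables that point at taskflag
SELECT conname, conrelid::regclass AS referencing_table
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'taskflag'::regclass;
```

If both queries return no rows, the drop-and-rename swap should not be blocked by these kinds of dependencies.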
