AskOverflow.Dev

Dr.YSG
Asked: 2014-03-18 07:00:31 +0800 CST

Compound index on a PostgreSQL table with 43 million rows


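This question is related to one I asked earlier: Order of columns in a compound index in PostgreSQL (and query order)

I figured I could sharpen and narrow my question here rather than overload that one. Given the following query (and its EXPLAIN ANALYZE output), does the compound index I am creating help?

The first query was run with only the simple indexes in place: a GiST index on outline and a B-tree index on pid.

The query is: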

EXPLAIN ANALYZE SELECT DISTINCT ON (path) oid, pid, product_name, type, path, size 
FROM portal.inventory AS inv 
WHERE ST_Intersects(st_geogfromtext('SRID=4326;POLYGON((21.51947021484375 51.55059814453125, 18.9129638671875 51.55059814453125, 18.9129638671875 48.8287353515625, 21.51947021484375 48.8287353515625, 21.51947021484375 51.55059814453125))'), inv.outline) 
AND (inv.pid in (20010,20046)) 

--

Here are the results (this run is faster, but possibly only because the database was warm).

"Unique  (cost=581.76..581.76 rows=1 width=89) (actual time=110.436..110.655 rows=249 loops=1)"
"  ->  Sort  (cost=581.76..581.76 rows=1 width=89) (actual time=110.434..110.477 rows=1377 loops=1)"
"        Sort Key: path"
"        Sort Method: quicksort  Memory: 242kB"
"        ->  Bitmap Heap Scan on inventory inv  (cost=577.48..581.75 rows=1 width=89) (actual time=39.257..105.878 rows=1377 loops=1)"
"              Recheck Cond: ((pid = ANY ('{20010,20046}'::integer[])) AND ('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography && outline))"
"              Rows Removed by Index Recheck: 3731"
"              Filter: (_st_distance('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography, outline, 0::double precision, false) < 1e-005::double precision)"
"              Rows Removed by Filter: 533"
"              ->  BitmapAnd  (cost=577.48..577.48 rows=1 width=0) (actual time=38.972..38.972 rows=0 loops=1)"
"                    ->  Bitmap Index Scan on inventory_pid_idx  (cost=0.00..123.82 rows=6204 width=0) (actual time=1.116..1.116 rows=7836 loops=1)"
"                          Index Cond: (pid = ANY ('{20010,20046}'::integer[]))"
"                    ->  Bitmap Index Scan on inventory_outline_idx  (cost=0.00..453.41 rows=8212 width=0) (actual time=37.765..37.765 rows=63112 loops=1)"
"                          Index Cond: ('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography && outline)"
"Total runtime: 110.731 ms"

Now here are the results with the compound index added (note that the absolute time is slower):

"Unique  (cost=37.81..37.82 rows=1 width=89) (actual time=2464.353..2464.561 rows=249 loops=1)"
"  ->  Sort  (cost=37.81..37.82 rows=1 width=89) (actual time=2464.349..2464.389 rows=1377 loops=1)"
"        Sort Key: path"
"        Sort Method: quicksort  Memory: 242kB"
"        ->  Bitmap Heap Scan on inventory inv  (cost=33.54..37.80 rows=1 width=89) (actual time=2361.018..2459.653 rows=1377 loops=1)"
"              Recheck Cond: (('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography && outline) AND (pid = ANY ('{20010,20046}'::integer[])))"
"              Filter: (_st_distance('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography, outline, 0::double precision, false) < 1e-005::double precision)"
"              Rows Removed by Filter: 533"
"              ->  Bitmap Index Scan on inventory_compound_idx  (cost=0.00..33.53 rows=1 width=0) (actual time=2321.684..2321.684 rows=1910 loops=1)"
"                    Index Cond: (('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography && outline) AND (pid = ANY ('{20010,20046}'::integer[])))"
"Total runtime: 2558.022 ms"

Finally, here is the table definition:

CREATE TABLE portal.inventory
(
  oid bigint,
  product_name character varying(100),
  type character varying(25),
  pid integer,
  size bigint,
  date timestamp without time zone,
  path character varying(200),
  outline geography(Polygon,4326)
)
WITH (
  OIDS=FALSE
);


CREATE INDEX inventory_compound_idx
  ON portal.inventory
  USING gist
  (outline, pid);


CREATE INDEX inventory_outline_idx
  ON portal.inventory
  USING gist
  (outline);


CREATE INDEX inventory_pid_idx
  ON portal.inventory
  USING btree
  (pid);

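Update: answers to the questions raised below:

I can adjust the table, but I am trying to keep the rows thin. Your suggestions about the column types and so on are things I would like to change.

Basically, each row holds metadata about one geospatial image file. We are managing 50M of them, and that may well grow to hundreds of millions or more. In the database, each file is referenced by a unique OID (sorry for the overloaded term). Files are grouped into "products", where PID is the product ID. Each product can have roughly 1,000 OIDs. Each image file has a geospatial bounding box (the outline). That is all I need for searching. The remaining columns will not be null (type is a text string, size is the file size, date is the file creation date, and path is the UNC path of the file).

This is why I order the query by outline first and PID second. Products are grouped geospatially, so all the OID rows for Krakow, Poland will sit physically in the same area. I therefore assumed that if I narrow the bucket down to a small area, the second index will be very small (say, about 100 products for a city area), and the IN( ..) clause would drop out.

The actual PID values come from another question I posted here. But that table covers only products, so it holds about 30K rows, which means a fast search with no need for a compound query.

I wonder whether the PostgreSQL planner is smart enough to decide, when both indexes exist, whether the compound index on (outline, pid) is faster than the one on (pid, outline). Well, I suppose I can test that.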

postgresql index

2 Answers

  1. Best Answer
    Erwin Brandstetter
    2014-03-18T10:47:11+08:00

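    In a GiST index, the order of columns has a different significance than in a B-tree index. Per the documentation:

    A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns. Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned. A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.

    In short: put the most selective column first.

    Your EXPLAIN output shows that the condition on pid (rows=7836) is more selective than the one on outline (rows=63112). If that can be generalized (a single example may be misleading), I suggest this alternative: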

    CREATE INDEX inventory_compound_idx ON portal.inventory USING gist (pid, outline);

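    If most of your (important) queries include conditions on both columns, a multicolumn index may serve you well. Otherwise, single-column indexes may be better overall.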
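One way to check this on your own data, as a rough sketch rather than a definitive procedure: build the (pid, outline) variant under a separate name, then hide the competing index inside a transaction and compare the plans. The index name inventory_pid_outline_idx is made up for illustration. Two facts to keep in mind: an integer column in a GiST index needs the btree_gist extension, and DROP INDEX holds an exclusive lock on the table until the ROLLBACK, so this belongs on a test system. Run each variant more than once, since the first run may only be warming the cache.

```sql
-- GiST has no built-in operator class for integer; btree_gist supplies one.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical name, chosen so it can coexist with
-- inventory_compound_idx on (outline, pid).
CREATE INDEX inventory_pid_outline_idx
  ON portal.inventory
  USING gist
  (pid, outline);

-- Refresh planner statistics after building the index.
ANALYZE portal.inventory;

BEGIN;

-- Hide the (outline, pid) index for this transaction only.
DROP INDEX portal.inventory_compound_idx;

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (path) oid, pid, product_name, type, path, size
FROM portal.inventory AS inv
WHERE ST_Intersects(st_geogfromtext('SRID=4326;POLYGON((21.51947021484375 51.55059814453125, 18.9129638671875 51.55059814453125, 18.9129638671875 48.8287353515625, 21.51947021484375 48.8287353515625, 21.51947021484375 51.55059814453125))'), inv.outline)
AND (inv.pid in (20010,20046));

-- Undo the DROP INDEX; the index is back as if nothing happened.
ROLLBACK;
```

Repeating the same block with the other index dropped gives the second half of the comparison.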

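    Table layout

    This is an educated guess, since I don't have complete information.

    • Don't use oid as a column name. It is easily confused with OID.

    • Don't use the name date for a timestamp column. Or rather: don't use date as the name of any column, and don't use basic type names as identifiers at all. That can lead to confusing errors and error messages.

    • Create a lookup table for type and put only a small integer type_id in the big table. Pack fixed-length types tightly so that no space is wasted on padding.

    • I prefer the type text (or varchar without a length limit) over varchar(n).

    For example: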

    CREATE TABLE portal.inventory (
       inventory_id bigint PRIMARY KEY
      ,type_id      integer NOT NULL REFERENCES inv_type(type_id)
      ,pid          integer NOT NULL
      ,size         bigint NOT NULL
      ,ts           timestamp NOT NULL
      ,outline      geography(Polygon,4326)
      ,product_name text
      ,path         text
    );
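The example references inv_type(type_id), which is not shown. A minimal sketch of such a lookup table might look like the following. The column name type and the sample values are assumptions for illustration only, and the table has to exist before portal.inventory is created.

```sql
-- Hypothetical lookup table for the REFERENCES clause in the example.
-- It is created in schema portal here; the unqualified inv_type in the
-- example resolves through search_path, so adjust the schema as needed.
CREATE TABLE portal.inv_type (
   type_id integer PRIMARY KEY
  ,type    text NOT NULL UNIQUE
);

-- Placeholder values, not taken from the real data.
INSERT INTO portal.inv_type (type_id, type) VALUES
   (1, 'example_type_a')
  ,(2, 'example_type_b');

-- The readable type name is resolved with a join at query time.
SELECT i.inventory_id, t.type, i.path
FROM   portal.inventory i
JOIN   portal.inv_type  t USING (type_id)
WHERE  i.pid IN (20010, 20046);
```

The big table then carries a 4-byte integer per row instead of a repeated text string of up to 25 characters.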
    
  2. Dr.YSG
    2014-03-21T10:38:43+08:00

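    Erwin, I have some of the data you asked for. This time I tried a more ambitious query. I expect most of the work to run against smaller sets of pids and smaller geospatial areas, but some people will stress the system, and this may not even turn out to be the largest query.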

    SELECT DISTINCT ON (path)  pid, type, path, size 
    FROM portal.inventory AS inv 
    WHERE ST_Intersects(st_geogfromtext('SRID=4326;POLYGON((21.2310791015625 51.416015625, 18.643798828125 51.416015625, 18.643798828125 48.69415283203125, 21.2310791015625 48.69415283203125, 21.2310791015625 51.416015625))'), inv.outline) 
    AND (inv.pid in (23869,23869,23599,23869,14153,14156,110,19131,19131,19164,91,23501,36,23501,23586,23586,23586,23586,23586,23599,23599,20047,113,120,3,120,23,118,82,120,113,113,120,129,129,210,339,339,341,23345,23506,23559,23553,23546,23546,23765,20010,19939,19939,20043,20046,20046,20046,20046,20047,23345,23345,23507,23507,129,23589,23612,23612,23539,23539,23539,23553,23553,23553,23559,23596,23589,23594,23589,23589,23596,23596,23596,23506,23506,23511,23511,23742,23742,23846,23846,23846,23742,23765,23765,341,19939,20047,23612,62,150,150,150,150,150,150,268,268,268,268,23598,120,23501))
    

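    EXPLAIN ANALYZE with the index on (outline, pid)

    [The odd thing here is that the WHERE clause of the query lists outline first and pid second, so it should use the index inventory_compound_idx, yet it is using the reversed index icompound_idx.]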

    "Unique  (cost=10788.37..10792.36 rows=1 width=76) (actual time=3042.605..3120.677 rows=1682 loops=1)"
    "  ->  Sort  (cost=10788.37..10790.37 rows=799 width=76) (actual time=3042.600..3114.823 rows=48341 loops=1)"
    "        Sort Key: path"
    "        Sort Method: external merge  Disk: 4384kB"
    "        ->  Bitmap Heap Scan on inventory inv  (cost=503.88..10749.85 rows=799 width=76) (actual time=119.501..2586.973 rows=48341 loops=1)"
    "              Recheck Cond: ((pid = ANY ('{23869,23869,23599,23869,14153,14156,110,19131,19131,19164,91,23501,36,23501,23586,23586,23586,23586,23586,23599,23599,20047,113,120,3,120,23,118,82,120,113,113,120,129,129,210,339,339,341,23345,23506,23559,23553,23546,23546,23765,20010,19939,19939,20043,20046,20046,20046,20046,20047,23345,23345,23507,23507,129,23589,23612,23612,23539,23539,23539,23553,23553,23553,23559,23596,23589,23594,23589,23589,23596,23596,23596,23506,23506,23511,23511,23742,23742,23846,23846,23846,23742,23765,23765,341,19939,20047,23612,62,150,150,150,150,150,150,268,268,268,268,23598,120,23501}'::integer[])) AND ('0103000020E6100000010000000500000000000000283B35400000000040B5494000000000D0A432400000000040B5494000000000D0A4324000000000DA58484000000000283B354000000000DA58484000000000283B35400000000040B54940'::geography && outline))"
    "              Rows Removed by Index Recheck: 370361"
    "              Filter: (_st_distance('0103000020E6100000010000000500000000000000283B35400000000040B5494000000000D0A432400000000040B5494000000000D0A4324000000000DA58484000000000283B354000000000DA58484000000000283B35400000000040B54940'::geography, outline, 0::double precision, false) < 1e-005::double precision)"
    "              Rows Removed by Filter: 15439"
    "              ->  Bitmap Index Scan on inventory_icompound_idx  (cost=0.00..503.68 rows=2398 width=0) (actual time=117.783..117.783 rows=219595 loops=1)"
    "                    Index Cond: ((pid = ANY ('{23869,23869,23599,23869,14153,14156,110,19131,19131,19164,91,23501,36,23501,23586,23586,23586,23586,23586,23599,23599,20047,113,120,3,120,23,118,82,120,113,113,120,129,129,210,339,339,341,23345,23506,23559,23553,23546,23546,23765,20010,19939,19939,20043,20046,20046,20046,20046,20047,23345,23345,23507,23507,129,23589,23612,23612,23539,23539,23539,23553,23553,23553,23559,23596,23589,23594,23589,23589,23596,23596,23596,23506,23506,23511,23511,23742,23742,23846,23846,23846,23742,23765,23765,341,19939,20047,23612,62,150,150,150,150,150,150,268,268,268,268,23598,120,23501}'::integer[])) AND ('0103000020E6100000010000000500000000000000283B35400000000040B5494000000000D0A432400000000040B5494000000000D0A4324000000000DA58484000000000283B354000000000DA58484000000000283B35400000000040B54940'::geography && outline))"
    "Total runtime: 3125.598 ms"
    

    Now I switched the order in the query so that PID comes first in the WHERE clause, so it should use the icompound_idx index on (pid, outline).

    (Do you see any real advantage? I don't.)

    "Unique  (cost=10788.37..10792.36 rows=1 width=76) (actual time=3030.431..3108.313 rows=1682 loops=1)"
    "  ->  Sort  (cost=10788.37..10790.37 rows=799 width=76) (actual time=3030.429..3102.474 rows=48341 loops=1)"
    "        Sort Key: path"
    "        Sort Method: external merge  Disk: 4384kB"
    "        ->  Bitmap Heap Scan on inventory inv  (cost=503.88..10749.85 rows=799 width=76) (actual time=110.656..2575.282 rows=48341 loops=1)"
    "              Recheck Cond: ((pid = ANY ('{23869,23869,23599,23869,14153,14156,110,19131,19131,19164,91,23501,36,23501,23586,23586,23586,23586,23586,23599,23599,20047,113,120,3,120,23,118,82,120,113,113,120,129,129,210,339,339,341,23345,23506,23559,23553,23546,23546,23765,20010,19939,19939,20043,20046,20046,20046,20046,20047,23345,23345,23507,23507,129,23589,23612,23612,23539,23539,23539,23553,23553,23553,23559,23596,23589,23594,23589,23589,23596,23596,23596,23506,23506,23511,23511,23742,23742,23846,23846,23846,23742,23765,23765,341,19939,20047,23612,62,150,150,150,150,150,150,268,268,268,268,23598,120,23501}'::integer[])) AND ('0103000020E6100000010000000500000000000000283B35400000000040B5494000000000D0A432400000000040B5494000000000D0A4324000000000DA58484000000000283B354000000000DA58484000000000283B35400000000040B54940'::geography && outline))"
    "              Rows Removed by Index Recheck: 370361"
    "              Filter: (_st_distance('0103000020E6100000010000000500000000000000283B35400000000040B5494000000000D0A432400000000040B5494000000000D0A4324000000000DA58484000000000283B354000000000DA58484000000000283B35400000000040B54940'::geography, outline, 0::double precision, false) < 1e-005::double precision)"
    "              Rows Removed by Filter: 15439"
    "              ->  Bitmap Index Scan on inventory_icompound_idx  (cost=0.00..503.68 rows=2398 width=0) (actual time=109.132..109.132 rows=219595 loops=1)"
    "                    Index Cond: ((pid = ANY ('{23869,23869,23599,23869,14153,14156,110,19131,19131,19164,91,23501,36,23501,23586,23586,23586,23586,23586,23599,23599,20047,113,120,3,120,23,118,82,120,113,113,120,129,129,210,339,339,341,23345,23506,23559,23553,23546,23546,23765,20010,19939,19939,20043,20046,20046,20046,20046,20047,23345,23345,23507,23507,129,23589,23612,23612,23539,23539,23539,23553,23553,23553,23559,23596,23589,23594,23589,23589,23596,23596,23596,23506,23506,23511,23511,23742,23742,23846,23846,23846,23742,23765,23765,341,19939,20047,23612,62,150,150,150,150,150,150,268,268,268,268,23598,120,23501}'::integer[])) AND ('0103000020E6100000010000000500000000000000283B35400000000040B5494000000000D0A432400000000040B5494000000000D0A4324000000000DA58484000000000283B354000000000DA58484000000000283B35400000000040B54940'::geography && outline))"
    "Total runtime: 3113.334 ms"
    
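A few details in these plans may be worth checking before comparing the indexes further. Both plans are identical because PostgreSQL picks an index from cost estimates, not from the textual order of the conditions in the WHERE clause. The sort reports "external merge  Disk: 4384kB", which means it spilled to disk, and the IN list repeats many pids, which adds nothing to the result. The sketch below raises work_mem for the current session only and passes each pid once. The value 32MB is an assumption for illustration, not a tuned recommendation, and the pid list was de-duplicated by hand from the query above.

```sql
-- Session-local setting; 32MB is an illustrative value only.
SET work_mem = '32MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (path)  pid, type, path, size
FROM portal.inventory AS inv
WHERE ST_Intersects(st_geogfromtext('SRID=4326;POLYGON((21.2310791015625 51.416015625, 18.643798828125 51.416015625, 18.643798828125 48.69415283203125, 21.2310791015625 48.69415283203125, 21.2310791015625 51.416015625))'), inv.outline)
-- Same pids as in the query above, each listed once.
AND (inv.pid in (23869,23599,14153,14156,110,19131,19164,91,23501,36,
                 23586,20047,113,120,3,23,118,82,129,210,339,341,
                 23345,23506,23559,23553,23546,23765,20010,19939,20043,
                 20046,23507,23589,23612,23539,23596,23594,23511,23742,
                 23846,62,150,268,23598));

-- Return to the server default for the rest of the session.
RESET work_mem;
```

If the Sort Method line in the new plan changes from external merge to quicksort, the sort fit in memory. Whether the large "Rows Removed by Index Recheck" count also drops depends on the cause of the recheck, which these plans alone do not settle.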