
Question

daniel9x

Asked: 2019-09-12 06:06:56 +0800 CST

How to make a Postgres multi-column, multi-table search more efficient

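I have a Shipment table that holds some basic data about a shipment, and a ShipmentItem table that holds additional attributes about that shipment, with a foreign key on the primary key of the Shipment table. Shipment to ShipmentItem is a OneToMany relationship.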

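We need to include a text search option that takes a given input string and searches 2 columns of Shipment as well as the name column of ShipmentItem for three specific types. This is my current query: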

select *
from Shipment shipment
where shipment.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
  and shipment.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
  and (
        shipment.identifierkeyvalues = '12345'
        or shipment.carrierReferenceNumber = '12345'
        or shipment.uuid in (
            select shipmentItem.resultId
            from ShipmentItem shipmentItem
            where (
                shipmentItem.type in (
                                      'poNumber', 'deliveryNoteNumber', 'salesOrderNumber'
                )
            )
            and shipmentItem.name = '12345'
            and shipmentItem.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
            and shipmentItem.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
       )
    )
limit 25

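The problem I'm finding is that having the subquery as one of the or conditions causes a major performance problem, even though the subquery by itself returns quickly by using the type_name_deliveryRequestedDate index on that table. Although we have multiple indexes on the main table (identifierKeyValues, carrierReferenceNumber, and even an index on all three Shipment columns used in the query), the planner will only use the deliveryRequestedDate index, which is extremely inefficient because the date range of this query is so large.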

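Converting this to a JOIN seems to result in the same behavior. We have a Java Persistence API layer on top of this query, so we would like to avoid any major changes to the data model if possible, but I'm not sure what the best approach is at this point. Any ideas would be appreciated!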

Explain plan:

Limit  (cost=110.61..209.98 rows=25 width=1370) (actual time=119503.030..124034.809 rows=1 loops=1)
      ->  Index Scan using shipment_deliveryrequesteddate_idx on shipment shipment  (cost=110.61..890840.18 rows=224084 width=1370) (actual time=119503.027..124034.805 rows=1 loops=1)
            Index Cond: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
            Filter: ((identifierkeyvalues = '12345'::text) OR (carrierreferencenumber = '12345'::text) OR (hashed SubPlan 1))
            Rows Removed by Filter: 496784
            SubPlan 1
              ->  Index Scan using "type_name_deliveryRequestedDate" on resultitem shipmentitem  (cost=0.56..110.11 rows=24 width=16) (actual time=10.706..16.416 rows=1 loops=1)
                    Index Cond: ((type = ANY ('{poNumber,deliveryNoteNumber,salesOrderNumber}'::text[])) AND (name = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
    Planning time: 3.175 ms
    Execution time: 124035.006 ms

Explain plan with the subquery removed. Why does it use a completely different index?

Limit  (cost=9.51..273.71 rows=6 width=1370) (actual time=0.052..0.053 rows=0 loops=1)
  ->  Bitmap Heap Scan on shipment shipment  (cost=9.51..273.71 rows=6 width=1370) (actual time=0.051..0.051 rows=0 loops=1)
        Recheck Cond: (((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone)) OR (carrierreferencenumber = '12345'::text))
        Filter: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
        Rows Removed by Filter: 2
        Heap Blocks: exact=2
        ->  BitmapOr  (cost=9.51..9.51 rows=66 width=0) (actual time=0.041..0.041 rows=0 loops=1)
              ->  Bitmap Index Scan on shipment_identifierkeyvalues_idx  (cost=0.00..4.61 rows=4 width=0) (actual time=0.023..0.024 rows=0 loops=1)
                    Index Cond: ((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
              ->  Bitmap Index Scan on shipment_carrierreferencenumber_idx  (cost=0.00..4.90 rows=62 width=0) (actual time=0.016..0.016 rows=2 loops=1)
                    Index Cond: (carrierreferencenumber = '12345'::text)
Planning time: 1.668 ms
Execution time: 0.116 ms

1 Answer

jjanes · Answer 1 · 2019-09-12T07:12:07+08:00

Best Answer

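The planner can't use a BitmapOr over scans of different tables, so the BitmapOr plan is not available for this query. (Or at least, it hasn't been coded to be able to do that. It probably could be if someone put in the work: it would have to look up the UUIDs in the other table, then convert them to TIDs on the table being scanned and stuff them into the bitmap.)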

Your best bet is probably to write it as a UNION ALL of two different queries, one which hits only the single table and one which hits both tables.
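
As a rough sketch of that rewrite, reusing the table, column, and literal values from the question (not tested against the asker's schema), the query could be split like this. The first branch touches only Shipment, so the planner is free to pick the BitmapOr plan seen in the second explain output; the second branch handles the ShipmentItem lookup on its own.

```sql
-- Branch 1: matches on Shipment's own columns only.
select shipment.*
from Shipment shipment
where shipment.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
  and shipment.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
  and (
        shipment.identifierkeyvalues = '12345'
        or shipment.carrierReferenceNumber = '12345'
      )

union all

-- Branch 2: matches found through ShipmentItem.
-- Caveat: a shipment that satisfies both branches is returned twice.
select shipment.*
from Shipment shipment
where shipment.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
  and shipment.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
  and shipment.uuid in (
        select shipmentItem.resultId
        from ShipmentItem shipmentItem
        where shipmentItem.type in (
                  'poNumber', 'deliveryNoteNumber', 'salesOrderNumber'
              )
          and shipmentItem.name = '12345'
          and shipmentItem.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
          and shipmentItem.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
      )

-- Without parentheses, the trailing limit applies to the combined result.
limit 25
```

If duplicate rows are a concern, replacing union all with union removes them at the cost of an extra de-duplication step. Whether either form actually gets the better plan should be confirmed with explain analyze on the real data.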
