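I have a Shipment table that contains some basic data about a shipment, and a ShipmentItem table that contains additional attributes about that shipment and has a foreignKey to the Shipment table's primary key. The Shipment to ShipmentItem relationship is OneToMany.
We have a requirement to include a text search option that takes a given input text string and searches 2 columns of Shipment as well as the name column of three specific types of ShipmentItem. Here is my current query: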
select *
from Shipment shipment
where shipment.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
  and shipment.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
  and (
    shipment.identifierkeyvalues = '12345'
    or shipment.carrierReferenceNumber = '12345'
    or shipment.uuid in (
      select shipmentItem.resultId
      from ShipmentItem shipmentItem
      where (
        shipmentItem.type in (
          'poNumber', 'deliveryNoteNumber', 'salesOrderNumber'
        )
      )
      and shipmentItem.name = '12345'
      and shipmentItem.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
      and shipmentItem.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
    )
  )
limit 25
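The problem I'm seeing is that having the subquery as one of the conditions in the or combination causes major performance problems (even though the subquery itself returns quickly by using the type_name_deliveryRequestedDate index on that table). Although we have multiple indexes on the main table (identifierKeyValues, carrierReferenceNumber, deliveryRequestedDate, and even an index on all three Shipment columns used in the query), it will only use the deliveryRequestedDate index, which is extremely inefficient because the date range of this query is so large.
Converting it to a JOIN seems to result in the same behavior. We have a Java Persistence API layer on top of this query, so we would like to avoid any major changes to the data model if possible, but I'm not sure what the best approach is at this point. Any thoughts would be greatly appreciated!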
Explain plan:
Limit (cost=110.61..209.98 rows=25 width=1370) (actual time=119503.030..124034.809 rows=1 loops=1)
-> Index Scan using shipment_deliveryrequesteddate_idx on shipment shipment (cost=110.61..890840.18 rows=224084 width=1370) (actual time=119503.027..124034.805 rows=1 loops=1)
Index Cond: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
Filter: ((identifierkeyvalues = '12345'::text) OR (carrierreferencenumber = '12345'::text) OR (hashed SubPlan 1))
Rows Removed by Filter: 496784
SubPlan 1
-> Index Scan using "type_name_deliveryRequestedDate" on resultitem shipmentitem (cost=0.56..110.11 rows=24 width=16) (actual time=10.706..16.416 rows=1 loops=1)
Index Cond: ((type = ANY ('{poNumber,deliveryNoteNumber,salesOrderNumber}'::text[])) AND (name = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
Planning time: 3.175 ms
Execution time: 124035.006 ms
Explain plan with the subquery removed. Why does it use a completely different index?
Limit (cost=9.51..273.71 rows=6 width=1370) (actual time=0.052..0.053 rows=0 loops=1)
-> Bitmap Heap Scan on shipment shipment (cost=9.51..273.71 rows=6 width=1370) (actual time=0.051..0.051 rows=0 loops=1)
Recheck Cond: (((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone)) OR (carrierreferencenumber = '12345'::text))
Filter: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
Rows Removed by Filter: 2
Heap Blocks: exact=2
-> BitmapOr (cost=9.51..9.51 rows=66 width=0) (actual time=0.041..0.041 rows=0 loops=1)
-> Bitmap Index Scan on shipment_identifierkeyvalues_idx (cost=0.00..4.61 rows=4 width=0) (actual time=0.023..0.024 rows=0 loops=1)
Index Cond: ((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
-> Bitmap Index Scan on shipment_carrierreferencenumber_idx (cost=0.00..4.90 rows=62 width=0) (actual time=0.016..0.016 rows=2 loops=1)
Index Cond: (carrierreferencenumber = '12345'::text)
Planning time: 1.668 ms
Execution time: 0.116 ms
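It cannot use a BitmapOr across scans on different tables, so it can't use the BitmapOr plan. (Or at least, it is not coded to be able to do so. It probably could if someone put in the work: it would have to look up the UUIDs in the other table, then convert them to tids on the Shipment table and stuff them into the bitmap.)
Your best bet is probably to write it as a UNION ALL of two different queries, one that hits only the single table and one that hits both tables.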
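As a rough sketch of that rewrite, the script below splits the query from the question into two branches and checks that they return the same shipments as the original or form. It makes several assumptions: it runs against an in-memory SQLite database as a stand-in for PostgreSQL, the schema is cut down to the columns the query touches, and the sample rows are made up. It only illustrates that the results match; it says nothing about which plan PostgreSQL would pick.

```python
import sqlite3

# Toy stand-in schema: table and column names follow the question,
# the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Shipment (
    uuid text primary key,
    identifierkeyvalues text,
    carrierReferenceNumber text,
    deliveryRequestedDate text
);
create table ShipmentItem (
    resultId text,
    type text,
    name text,
    deliveryRequestedDate text
);
insert into Shipment values
    ('s1', '12345', null,    '2019-07-01T00:00:00Z'),  -- matches identifierkeyvalues
    ('s2', null,    '12345', '2019-08-01T00:00:00Z'),  -- matches carrierReferenceNumber
    ('s3', null,    null,    '2019-09-01T00:00:00Z'),  -- matches only through an item
    ('s4', '12345', null,    '2019-10-01T00:00:00Z'),  -- matches both ways
    ('s5', '12345', null,    '2018-01-01T00:00:00Z'),  -- outside the date range
    ('s6', null,    null,    '2019-11-01T00:00:00Z');  -- item has the wrong type
insert into ShipmentItem values
    ('s3', 'poNumber',         '12345', '2019-09-01T00:00:00Z'),
    ('s4', 'salesOrderNumber', '12345', '2019-10-01T00:00:00Z'),
    ('s6', 'otherType',        '12345', '2019-11-01T00:00:00Z');
""")

PARAMS = {
    "term": "12345",
    "start": "2019-06-09T00:00:00Z",
    "end": "2019-12-06T23:59:59Z",
}

ITEM_SUBQUERY = """
    select shipmentItem.resultId
    from ShipmentItem shipmentItem
    where shipmentItem.type in ('poNumber', 'deliveryNoteNumber', 'salesOrderNumber')
      and shipmentItem.name = :term
      and shipmentItem.deliveryRequestedDate >= :start
      and shipmentItem.deliveryRequestedDate <= :end
"""

DATE_RANGE = """
    shipment.deliveryRequestedDate >= :start
    and shipment.deliveryRequestedDate <= :end
"""

# Original shape: the subquery is one arm of the or.
ORIGINAL_SQL = f"""
select shipment.uuid from Shipment shipment
where {DATE_RANGE}
  and (shipment.identifierkeyvalues = :term
       or shipment.carrierReferenceNumber = :term
       or shipment.uuid in ({ITEM_SUBQUERY}))
"""

# Branch 1 touches only Shipment; branch 2 goes through ShipmentItem.
SINGLE_TABLE_SQL = f"""
select shipment.uuid from Shipment shipment
where {DATE_RANGE}
  and (shipment.identifierkeyvalues = :term
       or shipment.carrierReferenceNumber = :term)
"""
TWO_TABLE_SQL = f"""
select shipment.uuid from Shipment shipment
where {DATE_RANGE}
  and shipment.uuid in ({ITEM_SUBQUERY})
"""


def matching_uuids(sql):
    """Run a query with the shared parameters and return the uuids, sorted."""
    return sorted(row[0] for row in conn.execute(sql, PARAMS).fetchall())


original_rows = matching_uuids(ORIGINAL_SQL)
# union removes the duplicate for a shipment that matches both branches.
union_rows = matching_uuids(SINGLE_TABLE_SQL + " union " + TWO_TABLE_SQL)
# union all keeps it, so such a shipment comes back twice.
union_all_rows = matching_uuids(SINGLE_TABLE_SQL + " union all " + TWO_TABLE_SQL)

print(original_rows)
print(union_rows)
print(union_all_rows)
```

Note that with UNION ALL a shipment that matches both on its own columns and through an item is returned twice, so either use UNION or exclude the first branch's matches from the second branch if duplicates matter. In PostgreSQL, a limit 25 written after the second branch applies to the combined result. The point of the split is that each branch can be planned separately, so the single-table branch is free to use the BitmapOr plan shown in the second explain output.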