tl;dr:何时使用array_agg
数组子查询的横向连接?
语境:
create table t_zipcode ( zipcode text primary key, location geography );
create table t_user ( id text primary key, zipcode text references t_zipcode);
我想设计一个查询,获取每个用户的所有“附近”邮政编码,例如 20 公里范围内的邮政编码。我的第一次尝试是这样的:
select u.id, z.zipcode, z.location, array_agg(zz.zipcode order by zz.zipcode) as nearby_zipcodes
from t_user u
join t_zipcode z on u.zipcode = z.zipcode
left join t_zipcode zz on st_dwithin(z.location, zz.location, 20000)
group by u.id, z.zipcode, z.location
order by u.id;
返回的结果如下:
+------------------------------------+-------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|id |zipcode|location |nearby_zipcodes |
+------------------------------------+-------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|0233684d-d2d8-4fdc-863f-08c6aac79d92|81001 |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025} |
|0e59fd58-00a5-442e-a21e-16ab1f4d2f65|85226 |0101000020E6100000506EDBF7A8005CC016A243E048A04040|{85034,85040,85041,85042,85044,85045,85048,85121,85202,85224,85225,85226,85248,85282,85283,85284,85286,85339} |
|13f26eca-bda0-4b8f-bcff-c95cceee3421|81001 |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025} |
|2d69e6bd-e0f0-40aa-ba3a-0e157fff46ff|81001 |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025} |
|4bd8e913-7a75-4028-8254-faebf043f629|81001 |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025} |
|627897e0-595c-4d21-b1b1-8dd43f21dee7|85033 |0101000020E6100000DAC69FA86C0D5CC08E78B29B19BF4040|{85003,85004,85006,85007,85009,85012,85013,85014,85015,85016,85017,85019,85020,85021,85023,85029,85031,85033,85034,85035,85037,85040,85041,85043,85051,85053,85064,85301,85302,85303,85304,85305,85306,85307,85308,85309,85323,85335,85340,85345,85351,85353,85363,85378,85379,85381,85382,85392,85395} |
|6353efe7-b0e0-42a2-b5fd-f38b7c8e27ad|81001 |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025} |
|6efad24b-10a7-4ac7-b91a-8842bca70c06|81001 |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025} |
|7f1fe1d1-e2cd-44b9-85f0-b3b888823a20|85305 |0101000020E6100000BF7D1D3867105CC0D2C6116BF1C34040|{85003,85004,85007,85009,85012,85013,85014,85015,85017,85019,85020,85021,85023,85029,85031,85033,85035,85037,85043,85051,85053,85080,85301,85302,85303,85304,85305,85306,85307,85308,85309,85323,85335,85340,85345,85351,85353,85355,85363,85373,85374,85375,85378,85379,85381,85382,85388,85392,85395} |
|9c238c4a-712b-4b10-a91d-6c3548ae59bc|86001 |0101000020E6100000054F2157EAE85BC0D6AA5D13D2A44140|{86001,86011,86015} |
|ace88100-9ecd-4931-a617-b8d8b091470e|85007 |0101000020E6100000CB30EE06D1055CC0AA9D616A4BB94040|{85003,85004,85006,85007,85008,85009,85012,85013,85014,85015,85016,85017,85018,85019,85020,85021,85028,85029,85031,85033,85034,85035,85037,85040,85041,85042,85043,85044,85045,85048,85051,85064,85251,85253,85257,85281,85282,85283,85284,85301,85302,85303,85304,85305,85353} |
|bf96321b-be2e-4ef9-801e-9ee82ce693f4|81001 |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025} |
|d787537e-a3fa-4267-b313-71b2c07027fe|81001 |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025} |
|ea8d074d-adb1-4c45-bd02-b6b57f5a6604|41091 |0101000020E61000005B0A48FB1F2F55C063D009A183744340|{41005,41018,41030,41042,41051,41080,41091,41092,41094,41095,47020,47038,47040} |
|f925949e-6a5e-4e2c-9591-60f05f42ccb1|81001 |0101000020E6100000C4245CC823225AC025E7C41EDA254340|{81001,81003,81006,81008,81025} |
|fc47120d-d3da-4847-8134-0eca87708467|98102 |0101000020E61000001C78B5DC99945EC09A7CB3CD8DD14740|{98004,98005,98006,98007,98008,98011,98020,98021,98028,98033,98034,98036,98039,98040,98043,98052,98056,98057,98072,98101,98102,98103,98104,98105,98106,98107,98108,98109,98110,98112,98115,98116,98117,98118,98119,98121,98122,98125,98126,98133,98134,98136,98144,98146,98154,98155,98164,98168,98174,98177,98178,98195,98199,98342,98353}|
+------------------------------------+-------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
这是仅包含 16 个用户并在邮政编码位置列上使用索引的分析计划:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
|GroupAggregate (cost=85365.87..86252.54 rows=35467 width=112) (actual time=1.932..2.474 rows=16 loops=1) |
| Output: u.id, z.zipcode, z.location, array_agg(zz.zipcode ORDER BY zz.zipcode) |
| Group Key: u.id, z.zipcode, z.location |
| -> Sort (cost=85365.87..85454.53 rows=35467 width=112) (actual time=1.887..1.913 rows=277 loops=1) |
| Output: u.id, z.zipcode, z.location, zz.zipcode |
| Sort Key: u.id, z.zipcode, z.location |
| Sort Method: quicksort Memory: 53kB |
| -> Nested Loop Left Join (cost=0.69..81507.59 rows=35467 width=112) (actual time=0.197..1.641 rows=277 loops=1) |
| Output: u.id, z.zipcode, z.location, zz.zipcode |
| -> Nested Loop (cost=0.29..719.73 rows=1070 width=80) (actual time=0.043..0.136 rows=16 loops=1) |
| Output: u.id, z.zipcode, z.location |
| Inner Unique: true |
| -> Seq Scan on pg_temp.t_user u (cost=0.00..20.70 rows=1070 width=48) (actual time=0.017..0.021 rows=16 loops=1) |
| Output: u.id, u.zipcode |
| -> Index Scan using test_idx_user_zip on pg_temp.t_zipcode z (cost=0.29..0.65 rows=1 width=64) (actual time=0.006..0.006 rows=1 loops=16)|
| Output: z.zipcode, z.location |
| Index Cond: (z.zipcode = u.zipcode) |
| -> Index Scan using test_idx_zip_geo on pg_temp.t_zipcode zz (cost=0.40..75.47 rows=3 width=64) (actual time=0.040..0.089 rows=17 loops=16) |
| Output: zz.zipcode, zz.location |
| Index Cond: (zz.location && _st_expand(z.location, '20000'::double precision)) |
| Filter: st_dwithin(z.location, zz.location, '20000'::double precision, true) |
| Rows Removed by Filter: 8 |
|Query Identifier: -2358650375080684395 |
|Planning Time: 0.421 ms |
|Execution Time: 2.649 ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
然后,我使用横向连接尝试了类似的解决方案,这产生了完全相同的结果:
select u.id, z.zipcode, z.location, t.nearby_zipcodes
from t_user u
join t_zipcode z on u.zipcode = z.zipcode
cross join lateral (select array(select zz.zipcode
from t_zipcode zz
where st_dwithin(z.location, zz.location, 20000)
order by zz.zipcode) as nearby_zipcodes
) t
order by u.id;
令我惊讶的是,它的性能稍好一些:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Sort (cost=85823.88..85826.55 rows=1070 width=112) (actual time=1.828..1.830 rows=16 loops=1) |
| Output: u.id, z.zipcode, z.location, ((SubPlan 1)) |
| Sort Key: u.id |
| Sort Method: quicksort Memory: 30kB |
| -> Nested Loop (cost=0.29..85770.04 rows=1070 width=112) (actual time=0.319..1.809 rows=16 loops=1) |
| Output: u.id, z.zipcode, z.location, (SubPlan 1) |
| Inner Unique: true |
| -> Seq Scan on pg_temp.t_user u (cost=0.00..20.70 rows=1070 width=48) (actual time=0.012..0.016 rows=16 loops=1) |
| Output: u.id, u.zipcode |
| -> Index Scan using test_idx_user_zip on pg_temp.t_zipcode z (cost=0.29..0.65 rows=1 width=64) (actual time=0.006..0.006 rows=1 loops=16) |
| Output: z.zipcode, z.location |
| Index Cond: (z.zipcode = u.zipcode) |
| SubPlan 1 |
| -> Sort (cost=79.48..79.49 rows=3 width=32) (actual time=0.101..0.102 rows=17 loops=16) |
| Output: zz.zipcode |
| Sort Key: zz.zipcode |
| Sort Method: quicksort Memory: 25kB |
| -> Index Scan using test_idx_zip_geo on pg_temp.t_zipcode zz (cost=0.40..79.45 rows=3 width=32) (actual time=0.032..0.080 rows=17 loops=16)|
| Output: zz.zipcode |
| Index Cond: (zz.location && _st_expand(z.location, '20000'::double precision)) |
| Filter: st_dwithin(z.location, zz.location, '20000'::double precision, true) |
| Rows Removed by Filter: 8 |
|Query Identifier: 8776498431209157273 |
|Planning Time: 0.257 ms |
|Execution Time: 1.892 ms |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
我的印象是,如果可能的话,一般应该避免横向连接,是否有一个我没有注意到的缺点?
这有点笼统了。
LATERAL
join 是一个与其他工具一样的工具:在适当的时候使用。这一切都取决于。ORDER BY
聚合函数中的计算通常相对昂贵。这是尽可能避免的一件事。无论哪种方式,这里都是可能的:查询少数用户
16个用户是很少的。
子
LATERAL
查询可能是最简单且最快的。在这种情况下,数组构造函数通常是最便宜的。看:您的第二个查询已经正确。数组构造函数始终返回一行,因此
CROSS JOIN
不会消除没有附近位置的用户。由于横向子查询总是返回一行,并且为了挤出最后一点性能,您可以使用相关子查询而不是子
LATERAL
查询。通常稍快一些:美丽:
GROUP BY
一点也没有。有关的:
多个用户的查询
上述内容由于两个原因不能很好地扩展:
这更像是您的第一个查询,但在子查询中订购了一次。增益与用户数量和重复 zib 代码的比例成正比。
MATERIALIZED VIEW
邮政编码和地理位置通常不会改变。如果搜索半径也是恒定的,或者对于少量的允许半径,a
MATERIALIZED VIEW
就会自动出现。物化nearby_zipcodes
(每个搜索半径一个)并重用结果。还快得多。