I have the following function, which I want to use to update the status and distance columns.
CREATE OR REPLACE FUNCTION public.check_deviation_table()
  RETURNS void
  LANGUAGE plpgsql
  COST 100
  VOLATILE
AS $BODY$
DECLARE
  status integer;
  geoms integer;
  distance float;
  selected_rotue_name text;
BEGIN
  -- Step 1: fill in the distance column with the minimum distance
  -- (in SRID 3857 units) from each ping point to the route line of its order.
  UPDATE ping_data
  SET    distance = sub.distance
  FROM  (
     SELECT min(ST_Distance(
                  ST_SetSRID(ST_Transform(rl.geom, 3857), 3857)
                , ST_SetSRID(ST_Transform(ST_SetSRID(ST_MakePoint(pd.longitude, pd.latitude), 4326), 3857), 3857)
                )) AS distance
          , pd.id
     FROM   ping_data  pd
     JOIN   order_data od ON od."orderNumber" = pd.ordernumber
     JOIN   route_line rl ON od."pickupLocName" || '-' || od."deliveryLocName" = rl."route name"
     WHERE  pd.distance IS NULL
     GROUP  BY pd.id
     ) sub
  WHERE ping_data.id = sub.id;

  -- Steps 2 and 3: classify pings as within / beyond the 3000 threshold.
  UPDATE ping_data SET status = 'No deviation'
  WHERE  ping_data.status IS NULL AND ping_data.distance <= '3000'::int::float;

  UPDATE ping_data SET status = 'Deviated'
  WHERE  ping_data.status IS NULL AND ping_data.distance > '3000'::int::float;
END
$BODY$;
It runs fine on a few thousand records, but when I run it on the whole ping_data table, which has 642,810 records, it just keeps running. I have also created the following indexes on the tables involved, but to no avail.
create index btree_ping on ping_data using btree(id);
create index gist_route on route_line using gist(geom);
Please refer to the following details:
Postgres version: PostgreSQL 12.4, compiled by Visual C++ build 1914, 64-bit
PostGIS version: 3.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
Machine details: Intel i5, 16 GB RAM, Windows 10
Shared below are the schemas of the tables involved:
CREATE TABLE public.route_line
(
"route name" character varying COLLATE pg_catalog."default",
geom geometry,
routeid integer NOT NULL DEFAULT nextval('route_line_routeid_seq'::regclass),
"pickup location" text COLLATE pg_catalog."default",
"destination location" text COLLATE pg_catalog."default",
CONSTRAINT route_line_pkey PRIMARY KEY (routeid)
);
CREATE TABLE public.ping_data
(
latitude double precision,
longitude double precision,
pingdt character varying COLLATE pg_catalog."default",
shipmentid integer,
addedat timestamp with time zone DEFAULT now(),
ordernumber bigint,
id integer NOT NULL DEFAULT nextval('ping_data_id_seq'::regclass),
ping_dt timestamp with time zone,
status text COLLATE pg_catalog."default",
distance double precision
);
CREATE TABLE public.order_data
(
"assetNumber" text COLLATE pg_catalog."default",
"createDt" timestamp with time zone,
"deliveryLocName" text COLLATE pg_catalog."default",
"orderNumber" bigint,
"pickupLocName" text COLLATE pg_catalog."default",
"releaseNumber" bigint,
assigned character varying(1) COLLATE pg_catalog."default" DEFAULT '0'::character varying,
city text COLLATE pg_catalog."default",
transportername text COLLATE pg_catalog."default"
);
@ErwinBrandstetter please refer to the EXPLAIN ANALYZE result below for the updated query you shared.
Update on ping_data p (cost=2199987.43..6027475.29 rows=97076 width=162) (actual time=6014063.492..6014063.861 rows=0 loops=1)
  ->  Hash Join (cost=2199987.43..6027475.29 rows=97076 width=162) (actual time=168649.223..5998253.555 rows=545483 loops=1)
        Hash Cond: (sub.id = p.id)
        ->  Subquery Scan on sub (cost=2144323.35..5958823.04 rows=97076 width=48) (actual time=167552.663..5987759.840 rows=545483 loops=1)
              ->  GroupAggregate (cost=2144323.35..5957852.28 rows=97076 width=12) (actual time=167552.647..5984534.784 rows=545483 loops=1)
                    Group Key: pd.id
                    ->  Sort (cost=2144323.35..2144630.74 rows=122956 width=13212) (actual time=165047.817..177245.569 rows=891149 loops=1)
                          Sort Key: pd.id
                          Sort Method: external merge Disk: 2274960kB
                          ->  Hash Join (cost=3815.85..47337.28 rows=122956 width=13212) (actual time=1134.751..2381.401 rows=891149 loops=1)
                                Hash Cond: (pd.ordernumber = od."orderNumber")
                                ->  Seq Scan on ping_data pd (cost=0.00..38132.37 rows=97076 width=28) (actual time=0.122..381.017 rows=642810 loops=1)
                                      Filter: (distance IS NULL)
                                ->  Hash (cost=1451.69..1451.69 rows=1453 width=13200) (actual time=1134.316..1134.399 rows=1632 loops=1)
                                      Buckets: 512 Batches: 8 Memory Usage: 576kB
                                      ->  Hash Join (cost=46.93..1451.69 rows=1453 width=13200) (actual time=159.161..1119.843 rows=1632 loops=1)
                                            Hash Cond: ((rl."route name")::text = ((od."pickupLocName" || '-'::text) || od."deliveryLocName"))
                                            ->  Seq Scan on route_line rl (cost=0.00..958.54 rows=954 width=13210) (actual time=25.129..121.613 rows=954 loops=1)
                                            ->  Hash (cost=31.97..31.97 rows=1197 width=51) (actual time=11.256..11.259 rows=1197 loops=1)
                                                  Buckets: 2048 Batches: 1 Memory Usage: 114kB
                                                  ->  Seq Scan on order_data od (cost=0.00..31.97 rows=1197 width=51) (actual time=9.379..10.686 rows=1197 loops=1)
        ->  Hash (cost=38132.37..38132.37 rows=645737 width=89) (actual time=1042.800..1042.801 rows=642810 loops=1)
              Buckets: 32768 Batches: 32 Memory Usage: 2704kB
              ->  Seq Scan on ping_data p (cost=0.00..38132.37 rows=645737 width=89) (actual time=0.135..674.748 rows=642810 loops=1)
Planning Time: 155.156 ms
Execution Time: 6016315.903 ms
Can someone help me optimize this function?
This does the same as your original:
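A minimal sketch of such a rewrite (not the answer's literal code; it keeps the function name, the 3000 threshold, and the join logic from the question, and assumes a single UPDATE inside a plain SQL function):

CREATE OR REPLACE FUNCTION public.check_deviation_table()
  RETURNS void
  LANGUAGE sql AS
$func$
   -- One UPDATE sets both columns, so each qualifying row is written only once.
   UPDATE ping_data p
   SET    distance = sub.distance
        , status   = CASE WHEN sub.distance > 3000 THEN 'Deviated' ELSE 'No deviation' END
   FROM  (
      -- Minimum ping-to-route distance in SRID 3857 units.
      SELECT pd.id
           , min(ST_Distance(
                   ST_Transform(rl.geom, 3857)
                 , ST_Transform(ST_SetSRID(ST_MakePoint(pd.longitude, pd.latitude), 4326), 3857)
                 )) AS distance
      FROM   ping_data  pd
      JOIN   order_data od ON od."orderNumber" = pd.ordernumber
      JOIN   route_line rl ON rl."route name" = od."pickupLocName" || '-' || od."deliveryLocName"
      WHERE  pd.distance IS NULL
      GROUP  BY pd.id
      ) sub
   WHERE  p.id = sub.id;
   -- Note: unlike the original three-step version, this also (re)sets status for
   -- any row whose distance was still NULL, regardless of its current status.
$func$;

Computing distance and status together means each affected row is rewritten once instead of twice.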
It should already be considerably faster.
Remove all the useless declared variables. Then there is nothing left that requires PL/pgSQL. Use a simpler SQL function, or just the plain query.
Most importantly, use a single UPDATE instead of three. Your original writes two new row versions per row instead of just one, and runs several sequential scans instead of one. (Indexes are probably useless for this, or even make it more expensive, but that depends ...) There is probably more potential, but I'll stop here without more details.
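One more hedged note, taken from the plan above rather than from the answer: the big sort spills roughly 2.2 GB to disk (Sort Method: external merge Disk: 2274960kB), so giving the session that runs the update more sort memory may help. A sketch; the value is only an illustrative assumption for a 16 GB machine:

SET work_mem = '256MB';                 -- illustrative value, applies to this session only
SELECT public.check_deviation_table();  -- run the rewritten single-UPDATE function
RESET work_mem;                         -- return to the server default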