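In Neo4j, when I use the following query:
MATCH (p:Person)-[r:ACTED_IN]->(m) WHERE 'Neo' IN r.roles RETURN p
it returns only one Person node.
But when I change the query to:
MATCH (p:Person)-[r:ACTED_IN]->(m) WHERE 'Neo' IN r.roles RETURN p.name
it returns 3 rows.
This seems strange to me, because I expected only one row to be returned.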
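One hypothesis worth checking, stated as an assumption rather than a confirmed answer: both queries may return the same number of rows (one per matched path), and the difference may only be in how the client displays them, since a graph view typically draws each distinct node once. The Python sketch below models that with made-up data; the node and movie identifiers are hypothetical and are not taken from the asker's database. Comparing the row count in a table view, or running the same MATCH with RETURN count(*), would confirm or rule this out.

```python
from collections import Counter

# Hypothetical match results: one row per path matched by the MATCH pattern.
# The same person node appears once for every movie in which it matched.
match_rows = [
    {"p": "person-1", "m": "movie-A"},
    {"p": "person-1", "m": "movie-B"},
    {"p": "person-1", "m": "movie-C"},
]

def returned_rows(rows, key):
    """Project each matched row onto one column, like RETURN p or RETURN p.name."""
    return [row[key] for row in rows]

def distinct_nodes(rows, key):
    """What a viewer that draws each distinct node only once would display."""
    return list(dict.fromkeys(row[key] for row in rows))

rows = returned_rows(match_rows, "p")
print(len(rows))                              # 3 rows, one per match
print(len(distinct_nodes(match_rows, "p")))   # 1 distinct node
print(Counter(rows))                          # Counter({'person-1': 3})
```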
Is it possible to check the execution plan of a SQL statement executed inside a PL/SQL block?
DECLARE
l_count PLS_INTEGER;
BEGIN
SELECT COUNT(1) INTO l_count
FROM foo;
END;
/
For regular SQL, I usually run the following to check the execution plan:
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
However, this only reports:
NOTE: cannot fetch plan for SQL_ID: 3q0sujncq54wy, CHILD_NUMBER: 0
Please verify value of SQL_ID and CHILD_NUMBER;
It could also be that the plan is no longer in cursor cache (check v$sql_plan)
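While asking a recent question, some mysterious startup time components showed up in my EXPLAIN ANALYZE output. I experimented further and found that if I remove the regex WHERE clause, the startup time drops to nearly 0.
I ran the following bash script as a test: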
for i in $(seq 1 10)
do
if (( $RANDOM % 2 == 0 ))
then
echo "Doing plain count"
psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2"
else
echo "Doing regex count"
psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 WHERE page ~ 'foo'"
fi
done
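The first query returns a count of about 30 million rows; the second counts only 7 rows. They run on a PG 12.3 read replica in RDS with minimal other activity. As I expected, both versions take roughly the same amount of time. Here is some of the output, filtered with grep: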
Doing plain count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.086..38622.215 rows=10114306 loops=3)
Doing regex count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3091955.34 rows=897 width=0) (actual time=16856.679..41398.062 rows=2 loops=3)
Doing plain count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.162..39454.499 rows=10114306 loops=3)
Doing plain count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.036..39213.171 rows=10114306 loops=3)
Doing regex count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3091955.34 rows=897 width=0) (actual time=12711.308..40015.734 rows=2 loops=3)
Doing plain count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.244..39277.683 rows=10114306 loops=3)
Doing regex count
^CCancel request sent
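The "startup" component discussed below is the first number in actual time=START..TOTAL, which PostgreSQL documents as the time before the node returns its first row (averaged per loop). As a small aid for comparing the runs, the Python sketch below pulls that number out of a few of the grep'd lines quoted above and groups it by query type; it only parses the text and makes no claim about the cause. The function names are illustrative.

```python
import re

# Matches the "(actual time=START..TOTAL rows=R loops=L)" part of a plan line.
ACTUAL_RE = re.compile(
    r"actual time=(?P<start>\d+\.\d+)\.\.(?P<total>\d+\.\d+)"
    r" rows=(?P<rows>\d+) loops=(?P<loops>\d+)"
)

def parse_actual(line):
    """Return the actual-time figures of one plan line, or None if absent."""
    m = ACTUAL_RE.search(line)
    if m is None:
        return None
    return {
        "start_ms": float(m.group("start")),  # time before the first row (per loop)
        "total_ms": float(m.group("total")),  # time until the node finished (per loop)
        "rows": int(m.group("rows")),
        "loops": int(m.group("loops")),
    }

def startup_by_query(text):
    """Group startup times under the preceding 'Doing ...' label."""
    startups = {}
    label = None
    for line in text.splitlines():
        if line.startswith("Doing "):
            label = line[len("Doing "):]
            continue
        parsed = parse_actual(line)
        if parsed is not None and label is not None:
            startups.setdefault(label, []).append(parsed["start_ms"])
    return startups

# A few of the grep'd lines quoted above.
grep_output = """\
Doing plain count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.086..38622.215 rows=10114306 loops=3)
Doing regex count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3091955.34 rows=897 width=0) (actual time=16856.679..41398.062 rows=2 loops=3)
Doing plain count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.162..39454.499 rows=10114306 loops=3)
Doing regex count
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3091955.34 rows=897 width=0) (actual time=12711.308..40015.734 rows=2 loops=3)
"""

print(startup_by_query(grep_output))
# {'plain count': [0.086, 0.162], 'regex count': [16856.679, 12711.308]}
```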
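So, a few questions:
What is the startup component of "actual time" in the regex scan, and why is it so much larger? (10-20 seconds vs. 0-1 seconds)
Although "cost" and "time" are not comparable units, the planner seems to think the startup cost should be 0 in all cases; is it being fooled?
Why do the strategies look different? Both plans mention Partial Aggregate, but the regex query says actual rows is 2, while the plain version says actual rows is ~10 million (I guess this is some kind of average between 2 workers and 1 leader, summing to ~30 million). If I had to implement this myself, I would probably add up the results of several count(*) operations rather than merge the rows and count them. Does the plan indicate how it does this?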
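On the averaging guess: the PostgreSQL documentation describes actual time and rows on a node with loops > 1 as per-loop averages, so an approximate total is rows × loops. The Python arithmetic below applies that to figures from this question's plans; the helper name is illustrative. It also shows that the Partial Aggregate node (rows=1, loops=3) yields three partial counts, which matches the Gather node's rows=3 in the full plans.

```python
def total_over_loops(per_loop_value, loops):
    """EXPLAIN ANALYZE shows per-loop averages; multiply by loops for an approximate total."""
    return per_loop_value * loops

plain_rows = total_over_loops(10114306, 3)  # Parallel Seq Scan, plain count
regex_rows = total_over_loops(2, 3)         # Parallel Seq Scan, regex count
partials = total_over_loops(1, 3)           # Partial Aggregate: rows=1 loops=3

print(plain_rows)  # 30342918, in line with the ~30 million count
print(regex_rows)  # 6: the average is rounded, so it need not equal the 7 matches exactly
print(partials)    # 3, the same as the Gather node's rows=3
```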
So as not to hide anything, here is the full version of each query plan:
EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=3093171.59..3093171.60 rows=1 width=8) (actual time=39156.499..39156.499 rows=1 loops=1)
-> Gather (cost=3093171.37..3093171.58 rows=2 width=8) (actual time=39156.356..39157.850 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=3092171.37..3092171.38 rows=1 width=8) (actual time=39154.405..39154.406 rows=1 loops=3)
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3060587.90 rows=12633390 width=0) (actual time=0.033..38413.690 rows=10115030 loops=3)
Planning Time: 7.968 ms
Execution Time: 39157.942 ms
(8 rows)
EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 WHERE page ~ 'foo'
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=3093173.83..3093173.84 rows=1 width=8) (actual time=39908.495..39908.495 rows=1 loops=1)
-> Gather (cost=3093173.61..3093173.82 rows=2 width=8) (actual time=39908.408..39909.848 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=3092173.61..3092173.62 rows=1 width=8) (actual time=39906.317..39906.318 rows=1 loops=3)
-> Parallel Seq Scan on ui_events_v2 (cost=0.00..3092171.37 rows=897 width=0) (actual time=17250.058..39906.308 rows=2 loops=3)
Filter: (page ~ 'foo'::text)
Rows Removed by Filter: 10115028
Planning Time: 0.803 ms
Execution Time: 39909.921 ms
(10 rows)
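Because Rows Removed by Filter is also reported per loop, the two full plans can be cross-checked: rows kept plus rows removed, times loops, should equal the number of rows each scan visited. The Python check below uses figures copied from the plans; it suggests both scans visited the same rows, so the difference between them is not in how much of the table is read.

```python
# Per-loop figures copied from the two full plans (both nodes have loops=3).
LOOPS = 3
plain_visited = 10115030 * LOOPS          # plain count: rows=10115030
regex_visited = (2 + 10115028) * LOOPS    # regex count: rows=2 plus Rows Removed by Filter=10115028

print(plain_visited, regex_visited)  # 30345090 30345090
```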
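I can't make sense of this EXPLAIN on Postgres 12.3:
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT count(1) FROM mytable WHERE page ~ 'foo';
This is a 22 GB table with 30 million rows, on a server with 16 GB of memory. The query counts 7 matching rows.
I read the output as saying that 164 seconds were spent on I/O, yet the whole query took only 65 seconds. I thought it might be double-counting some parallel workers, but when I added VERBOSE, it doesn't seem to add up either.
It looks as though each of the 2 workers spent about 55 seconds reading. If that sums to 110 seconds, how do I get 164 seconds of I/O? (Since this query takes ~10 seconds when the pages are cached, I guess the actual read time is not far from the 50 seconds here, FWIW.)
I'm also confused that the Parallel Seq Scan seems to take 32 seconds, yet there are another 30+ seconds before the final result. Since it found only 7 rows, I thought there would be almost no work besides the scan. Am I misreading this part?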
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=3092377.14..3092377.15 rows=1 width=8) (actual time=65028.818..65028.818 rows=1 loops=1)
Output: count(1)
Buffers: shared hit=75086 read=2858433 dirtied=1
I/O Timings: read=164712.060
-> Gather (cost=3092376.92..3092377.13 rows=2 width=8) (actual time=65028.732..65030.093 rows=3 loops=1)
Output: (PARTIAL count(1))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=75086 read=2858433 dirtied=1
I/O Timings: read=164712.060
-> Partial Aggregate (cost=3091376.92..3091376.93 rows=1 width=8) (actual time=65026.990..65026.990 rows=1 loops=3)
Output: PARTIAL count(1)
Buffers: shared hit=75086 read=2858433 dirtied=1
I/O Timings: read=164712.060
Worker 0: actual time=65026.164..65026.164 rows=1 loops=1
Buffers: shared hit=25002 read=952587
I/O Timings: read=54906.994
Worker 1: actual time=65026.264..65026.264 rows=1 loops=1
Buffers: shared hit=25062 read=954370 dirtied=1
I/O Timings: read=54889.244
-> Parallel Seq Scan on public.ui_events_v2 (cost=0.00..3091374.68 rows=896 width=0) (actual time=31764.552..65026.980 rows=2 loops=3)
Filter: (ui_events_v2.page ~ 'foo'::text)
Rows Removed by Filter: 10112272
Buffers: shared hit=75086 read=2858433 dirtied=1
I/O Timings: read=164712.060
Worker 0: actual time=16869.988..65026.156 rows=2 loops=1
Buffers: shared hit=25002 read=952587
I/O Timings: read=54906.994
Worker 1: actual time=64091.539..65026.258 rows=1 loops=1
Buffers: shared hit=25062 read=954370 dirtied=1
I/O Timings: read=54889.244
Planning Time: 0.333 ms
Execution Time: 65030.133 ms
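With VERBOSE, the node-level totals cover every participating process, while the Worker N lines list only the background workers. Assuming the leader also ran part of the scan (the PostgreSQL default, controlled by parallel_leader_participation), its share is the total minus the workers. The Python arithmetic below does that subtraction for the I/O timings and blocks read shown above, and reconstructs the leader's scan startup from the per-loop average (approximately, since the printed averages are rounded). It suggests the 164 seconds is roughly 55 seconds per process across three processes reading concurrently, which is how the sum can exceed the 65-second wall-clock time.

```python
# Figures copied from the Parallel Seq Scan node above.
total_io_ms = 164712.060
worker_io_ms = [54906.994, 54889.244]
leader_io_ms = total_io_ms - sum(worker_io_ms)

total_read = 2858433
worker_read = [952587, 954370]
leader_read = total_read - sum(worker_read)

# actual time is an average over loops=3, so the leader's startup is
# approximately 3 * average minus the two workers' startup times.
avg_startup_ms = 31764.552
worker_startup_ms = [16869.988, 64091.539]
leader_startup_ms = 3 * avg_startup_ms - sum(worker_startup_ms)

print(f"leader I/O: {leader_io_ms:.3f} ms")             # 54915.822 ms
print(f"leader blocks read: {leader_read}")             # 951476
print(f"leader startup: {leader_startup_ms:.3f} ms")    # 14332.129 ms
```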
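Is it possible to see the execution time of an alter table command?
I think using \timing on in psql shows more than just the execution time: it also includes the client round trip, and perhaps other things. What I want is the "Execution Time" that explain analyze shows. But I can't run alter table through explain analyze (right?).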
I have the following output from an EXPLAIN ANALYZE query on PostgreSQL 11.5:
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT actor.actor_id, actor.first_name, act.num
FROM actor
INNER JOIN (
SELECT actor_id, COUNT(DISTINCT film_category.category_id) as num
FROM film_actor
INNER JOIN film_category ON film_actor.film_id = film_category.film_id
GROUP BY actor_id
) act ON act.actor_id = actor.actor_id
ORDER BY actor.actor_id ASC;
Merge Join (cost=527.43..591.41 rows=200 width=18) (actual time=320.130..324.861 rows=200 loops=1)
Output: actor.actor_id, actor.first_name, (count(DISTINCT film_category.category_id))
Inner Unique: true
Merge Cond: (actor.actor_id = film_actor.actor_id)
Buffers: shared hit=9 read=35
-> Index Scan using actor_pkey on public.actor (cost=0.14..16.16 rows=200 width=10) (actual time=77.146..77.272 rows=200 loops=1)
Output: actor.actor_id, actor.first_name, actor.last_name, actor.last_update
Buffers: shared hit=2 read=3
-> GroupAggregate (cost=527.28..570.25 rows=200 width=10) (actual time=242.973..247.346 rows=200 loops=1)
Output: film_actor.actor_id, count(DISTINCT film_category.category_id)
Group Key: film_actor.actor_id
Buffers: shared hit=7 read=32
-> Sort (cost=527.28..540.94 rows=5462 width=4) (actual time=242.932..243.781 rows=5462 loops=1)
Output: film_actor.actor_id, film_category.category_id
Sort Key: film_actor.actor_id
Sort Method: quicksort Memory: 449kB
Buffers: shared hit=7 read=32
-> Hash Join (cost=28.50..188.22 rows=5462 width=4) (actual time=17.034..216.640 rows=5462 loops=1)
Output: film_actor.actor_id, film_category.category_id
Hash Cond: (film_actor.film_id = film_category.film_id)
Buffers: shared hit=4 read=32
-> Seq Scan on public.film_actor (cost=0.00..84.62 rows=5462 width=4) (actual time=0.019..195.884 rows=5462 loops=1)
Output: film_actor.actor_id, film_actor.film_id, film_actor.last_update
Buffers: shared hit=2 read=28
-> Hash (cost=16.00..16.00 rows=1000 width=4) (actual time=16.964..16.965 rows=1000 loops=1)
Output: film_category.category_id, film_category.film_id
Buckets: 1024 Batches: 1 Memory Usage: 44kB
Buffers: shared hit=2 read=4
-> Seq Scan on public.film_category (cost=0.00..16.00 rows=1000 width=4) (actual time=0.017..16.431 rows=1000 loops=1)
Output: film_category.category_id, film_category.film_id
Buffers: shared hit=2 read=4
Planning Time: 966.447 ms
Execution Time: 436.348 ms
On the first line, the actual total time is 324.861 ms, but on the last line, the execution time is 436.348 ms. What causes this difference?
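For reference, the PostgreSQL documentation states that the Execution Time reported by EXPLAIN ANALYZE includes executor start-up and shut-down time and the time spent in any triggers, but not planning time, whereas the top node's actual time covers only that node's run. The Python arithmetic below just sizes the gap from the numbers in the plan; it does not attribute the gap to a specific cause.

```python
# Figures copied from the plan above (milliseconds).
merge_join_total_ms = 324.861   # top node: actual time=320.130..324.861
execution_time_ms = 436.348     # "Execution Time" line
planning_time_ms = 966.447      # "Planning Time" line, reported separately

gap_ms = execution_time_ms - merge_join_total_ms
print(f"time outside the top node: {gap_ms:.3f} ms")              # 111.487 ms
print(f"planning (not in Execution Time): {planning_time_ms:.3f} ms")
```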
I'm running the following query with EXPLAIN ANALYZE, but it takes a long time to run. Based on the EXPLAIN ANALYZE output, what can I do to my table to speed up this query?
select count(amount), sum(amount)
from mytable
where color_code = 5 and shade_type = 'Light';
EXPLAIN ANALYZE output:
Finalize Aggregate (cost=946640.31..946640.32 rows=1 width=40) (actual time=4799.256..4799.257 rows=1 loops=1)
-> Gather (cost=946640.09..946640.30 rows=2 width=40) (actual time=4799.191..4800.566 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=945640.09..945640.10 rows=1 width=40) (actual time=4797.002..4797.003 rows=1 loops=3)
-> Parallel Seq Scan on mytable (cost=0.00..945573.39 rows=13338 width=6) (actual time=656.722..4791.404 rows=10103 loops=3)
Filter: ((color_code = 5) AND ((shade_type)::text = 'Light'::text))
Rows Removed by Filter: 4888180
Planning time: 0.257 ms
Execution time: 4800.661 ms
(10 rows)
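The filter keeps only a small fraction of the rows the sequential scan visits. The Python arithmetic below quantifies that from the plan, using the per-loop averages times loops=3. As an assumption to validate rather than a guaranteed fix, a selectivity this low is typically where an index covering the filtered columns (for example on (color_code, shade_type)) lets the planner avoid reading the whole table.

```python
# Per-loop averages from the Parallel Seq Scan node (loops=3).
rows_kept = 10103
rows_removed = 4888180
loops = 3

scanned = (rows_kept + rows_removed) * loops   # rows visited by the scan
matched = rows_kept * loops                    # rows passing the filter (approximate)
selectivity = matched / scanned

print(scanned)               # 14694849
print(matched)               # 30309
print(f"{selectivity:.3%}")  # 0.206%
```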
I've been optimizing my queries over about 2 million documents in MongoDB. I tried using explain on the aggregate function, but it shows
"winningPlan" : {
"stage" : "EOF"
},
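Before this, the function showed a winning plan with stages such as "Fetch", but after I tried several different syntaxes for writing the aggregate command, it now shows "EOF". I tried simplifying my command to a find().explain() call, but the result is still the same. Does anyone have any ideas?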
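Second, has anyone figured out how to run explain("executionStats") on an aggregate query? I see the feature has been implemented here, but when I run it, I get "EOF" along with the basic explain() results. Is it because my MongoDB is not updated to 3.5.5? Do versions below 3.5.5 support this feature? Thanks a lot in advance.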
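A hedged pointer rather than a diagnosis: in my understanding, MongoDB reports an EOF stage when the planner knows no documents can be returned, most commonly because the collection (or database) named in the query does not exist, for example after a typo or when connected to a different database. Comparing the name against db.list_collection_names() in pymongo is one way to check. The Python sketch below walks an explain document given as a plain dict and flags that case; the function names are hypothetical, and it assumes the find()-style layout with a top-level queryPlanner (aggregate explain output can nest it differently).

```python
def plan_stages(plan):
    """Return stage names from a winningPlan-style dict, outermost stage first."""
    stages = []
    stack = [plan]
    while stack:
        node = stack.pop()
        stages.append(node.get("stage"))
        if "inputStage" in node:            # single child (e.g. FETCH -> IXSCAN)
            stack.append(node["inputStage"])
        # multiple children (e.g. OR); reversed so they are visited in order
        stack.extend(reversed(node.get("inputStages", [])))
    return stages

def looks_like_missing_collection(explain_doc):
    """Heuristic (assumption): a lone EOF stage often means the namespace does not exist."""
    winning = explain_doc.get("queryPlanner", {}).get("winningPlan", {})
    return plan_stages(winning) == ["EOF"]

eof_explain = {"queryPlanner": {"winningPlan": {"stage": "EOF"}}}
fetch_explain = {
    "queryPlanner": {"winningPlan": {"stage": "FETCH", "inputStage": {"stage": "IXSCAN"}}}
}
print(looks_like_missing_collection(eof_explain))    # True
print(looks_like_missing_collection(fetch_explain))  # False
```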
Postgres 8.4.20
With these settings, the daemon starts:
postgresql.conf
shared_preload_libraries = 'auto_explain'
With these settings, the daemon does not start:
postgresql.conf
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 1000
It is a server running a legacy application, so we can't upgrade PG. I can't find any documentation for this behavior/bug...
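If I remember the 8.4 documentation correctly, releases before 9.2 only accept a module-prefixed parameter such as auto_explain.log_min_duration in postgresql.conf when its prefix is declared in custom_variable_classes, and the 8.4 auto_explain page shows custom_variable_classes = 'auto_explain' in its example configuration. Treat that as the assumption to verify against the 8.4 docs and the server log. The hypothetical Python helper below naively scans postgresql.conf text (only name = value lines; quoted '#' characters and the optional-'=' syntax are not handled) and lists dotted parameters whose prefix is not declared.

```python
def undeclared_custom_settings(conf_text):
    """Return dotted parameter names whose prefix is missing from custom_variable_classes.

    Assumption: mirrors the pre-9.2 rule that a placeholder parameter such as
    auto_explain.log_min_duration may only appear in postgresql.conf when its
    prefix is declared in custom_variable_classes.
    """
    declared = set()
    dotted = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # naive: drop trailing comments
        if "=" not in line:
            continue
        name, value = [part.strip() for part in line.split("=", 1)]
        if name == "custom_variable_classes":
            classes = value.strip("'\"").split(",")
            declared.update(c.strip() for c in classes if c.strip())
        elif "." in name:
            dotted.append(name)
    return [n for n in dotted if n.split(".", 1)[0] not in declared]

failing_conf = """\
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 1000
"""
print(undeclared_custom_settings(failing_conf))
# ['auto_explain.log_min_duration']
print(undeclared_custom_settings(failing_conf + "custom_variable_classes = 'auto_explain'\n"))
# []
```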