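I'm using Postgres 13.9 on Amazon Aurora. In our production environment, we run a query that performs poorly when it uses a small LIMIT. For example, when we run the query with LIMIT 1, we see the following: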
Limit (cost=1.54..2608.50 rows=1 width=1100) (actual time=17945.422..17945.424 rows=1 loops=1)
Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
-> Merge Semi Join (cost=1.54..60650907.42 rows=23265 width=1100) (actual time=17945.420..17945.422 rows=1 loops=1)
Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
Merge Cond: (tasks.id = t0.id)
-> Index Scan using tasks_pkey on public.tasks (cost=0.56..14315808.88 rows=11401481 width=1100) (actual time=0.054..4908.126 rows=2722000 loops=1)
Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
Filter: ((tasks.deleted_at IS NULL) AND (tasks.milestone_id IS NOT NULL))
Rows Removed by Filter: 650237
-> Nested Loop (cost=0.98..46306291.52 rows=28266 width=8) (actual time=12863.972..12863.973 rows=1 loops=1)
Output: t0.id
Inner Unique: true
-> Index Scan using tasks_pkey on public.tasks t0 (cost=0.56..14350439.73 rows=13852340 width=16) (actual time=0.010..4179.561 rows=3372237 loops=1)
Output: t0.project_id, t0.id
Index Cond: (t0.id IS NOT NULL)
-> Index Scan using projects_pkey on public.projects j0 (cost=0.42..2.31 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=3372237)
Output: j0.id
Index Cond: (j0.id = t0.project_id)
Filter: (j0.organization_id = 79403)
Rows Removed by Filter: 1
Planning Time: 0.914 ms
Execution Time: 17945.475 ms
Running the same query with LIMIT 500 gives the following plan:
Limit (cost=322268.59..322269.84 rows=500 width=1100) (actual time=1329.805..1330.032 rows=500 loops=1)
Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
-> Sort (cost=322268.59..322326.76 rows=23266 width=1100) (actual time=1329.803..1329.989 rows=500 loops=1)
Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
Sort Key: tasks.id
Sort Method: top-N heapsort Memory: 444kB
-> Nested Loop (cost=218419.30..321109.27 rows=23266 width=1100) (actual time=563.649..1313.910 rows=20876 loops=1)
Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
Inner Unique: true
-> HashAggregate (cost=218418.74..218701.41 rows=28267 width=8) (actual time=563.618..570.523 rows=21926 loops=1)
Output: t0.id
Group Key: t0.id
Batches: 1 Memory Usage: 2065kB
-> Gather (cost=1000.56..218348.08 rows=28267 width=8) (actual time=1.032..553.590 rows=21926 loops=1)
Output: t0.id
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=0.56..214521.38 rows=11778 width=8) (actual time=1.020..522.937 rows=7309 loops=3)
Output: t0.id
Worker 0: actual time=2.356..510.679 rows=7819 loops=1
Worker 1: actual time=0.063..537.921 rows=7849 loops=1
-> Parallel Seq Scan on public.projects j0 (cost=0.00..53613.72 rows=417 width=8) (actual time=0.515..68.601 rows=220 loops=3)
Output: j0.id
Filter: (j0.organization_id = 79403)
Rows Removed by Filter: 90977
Worker 0: actual time=0.885..109.155 rows=225 loops=1
Worker 1: actual time=0.034..37.727 rows=212 loops=1
-> Index Scan using index_tasks_on_project_id on public.tasks t0 (cost=0.56..384.30 rows=157 width=16) (actual time=0.886..2.059 rows=33 loops=660)
Output: t0.project_id, t0.id
Index Cond: (t0.project_id = j0.id)
Filter: (t0.id IS NOT NULL)
Worker 0: actual time=0.698..1.778 rows=35 loops=225
Worker 1: actual time=0.960..2.353 rows=37 loops=212
-> Index Scan using tasks_pkey on public.tasks (cost=0.56..3.63 rows=1 width=1100) (actual time=0.033..0.033 rows=1 loops=21926)
Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
Index Cond: (tasks.id = t0.id)
Filter: ((tasks.deleted_at IS NULL) AND (tasks.milestone_id IS NOT NULL))
Rows Removed by Filter: 0
Planning Time: 0.872 ms
Execution Time: 1330.691 ms
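Correct me if I'm wrong, but the culprit seems to be the query planner choosing such an inefficient plan when the LIMIT is small. Since the query planner uses pg_statistic to build its execution plans, we suspected that our statistics were out of date. After some investigation, we decided that running VACUUM(FULL, ANALYZE, VERBOSE) was the best fix for us. Be careful if you use this command: it can take a long time, and it holds an exclusive lock on each table while that table is being rewritten. It did update the statistics, but the query planner still chose the wrong plan.

I'd like to understand why the query planner picks such an inefficient plan and how to fix it.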
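The planner thinks the rows that satisfy the conditions are randomly distributed along the tasks.id order. So by reading them in that order, it expects to stop right after the first of the 23265 rows it estimates will match, which means doing only about 1/23265 of the total work. In reality, it had to scan about 3372237/13852340 (roughly 1/4) of the tasks table before it found the first match. So the estimate was off by a factor of nearly 6000. This "random ordering" is only an assumption. It isn't driven by the planner statistics, so refreshing the statistics can't fix it.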
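The planner's LIMIT arithmetic can be reproduced from the numbers in the two plans above. This is a minimal sketch, assuming the simplified rule for a LIMIT without OFFSET: the Limit node costs the child's startup cost plus the fraction limit/estimated_rows of the child's remaining run cost, which is exactly the "matches are spread evenly" assumption. The function names are illustrative, not PostgreSQL's own.

```python
def limit_total_cost(startup: float, total: float, est_rows: float, limit: int) -> float:
    """Approximate total cost of LIMIT `limit` (no OFFSET) over a child plan node.

    Charges the child's startup cost plus the fraction limit/est_rows of the
    child's run cost, i.e. assumes qualifying rows are spread evenly through
    the child's output.
    """
    fraction = min(limit / est_rows, 1.0)
    return startup + (total - startup) * fraction


def misestimate_factor() -> float:
    """How far the 'evenly spread' assumption was off for the LIMIT 1 plan."""
    planned_fraction = 1 / 23265            # share of the work the planner expected to do
    actual_fraction = 3372237 / 13852340    # share of tasks actually scanned (about 1/4)
    return actual_fraction / planned_fraction


# LIMIT 1 plan: Merge Semi Join cost=1.54..60650907.42 rows=23265
print(round(limit_total_cost(1.54, 60650907.42, 23265, 1), 2))       # matches 2608.50 in the plan
# LIMIT 500 plan: Sort cost=322268.59..322326.76 rows=23266
print(round(limit_total_cost(322268.59, 322326.76, 23266, 500), 2))  # matches 322269.84 in the plan
print(round(misestimate_factor()))
```

The cheap-looking 2608.50 is therefore not a bug in the statistics: it is 1/23265 of a 60-million-unit plan, and the real work was closer to 1/4 of it.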
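Why does none of the first quarter of all tasks belong to a project in organization 79403? Maybe those tasks could be archived or something.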
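You can usually "fix" this kind of problem by changing the ORDER BY to something an index can't provide, for example ORDER BY tasks.id+0. The planner can then no longer walk tasks_pkey in order and stop early, so it has to find all the matching rows first and sort them, most likely with a plan similar to the LIMIT 500 one.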
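A toy model (not PostgreSQL's executor) of the two strategies may make the trade-off concrete: the LIMIT 1 plan walks ids in index order and stops at the first qualifying row, while the ORDER BY tasks.id+0 workaround makes the planner collect every qualifying row and keep the smallest. The table size N, the match count K and the helper name are hypothetical; the clustered case mimics the observed situation where nothing in the first quarter of ids qualifies.

```python
import random


def rows_read_until_first_match(ids_in_order, matches):
    """Rows an in-order scan (like walking tasks_pkey) reads before its first match."""
    for scanned, task_id in enumerate(ids_in_order, start=1):
        if task_id in matches:
            return scanned
    return len(ids_in_order)  # no match at all: the whole table was read


N = 100_000              # hypothetical table size
K = 200                  # hypothetical number of qualifying rows
ids = list(range(N))     # ids in index (tasks_pkey) order
rng = random.Random(42)

# What the planner assumes: matches scattered evenly over the id range.
uniform = set(rng.sample(ids, K))
# What actually happened: no match anywhere in the first quarter of ids.
clustered = set(rng.sample(ids[N // 4:], K))

print("uniform:  ", rows_read_until_first_match(ids, uniform))    # typically on the order of N / (K + 1)
print("clustered:", rows_read_until_first_match(ids, clustered))  # always more than N // 4

# Forcing a sort (ORDER BY id+0): visit only the K matches, then keep the smallest id.
print("sorted:   ", K, "rows visited, first id =", min(clustered))
```

The in-order scan is a great bet when matches really are scattered and a terrible one when they are clustered at the end; the sorted plan costs roughly the same either way, which is why it wins here.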