Question

Asked: 2023-08-12 05:28:23 +0800 CST

Why does the PostgreSQL query planner choose such an inefficient plan?

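I am using postgres 13.9 on Amazon Aurora. In our production environment, we have a query that runs slowly when it is given a small LIMIT. For example, when running the query with LIMIT 1, we see the following result: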

Limit  (cost=1.54..2608.50 rows=1 width=1100) (actual time=17945.422..17945.424 rows=1 loops=1)
  Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
  ->  Merge Semi Join  (cost=1.54..60650907.42 rows=23265 width=1100) (actual time=17945.420..17945.422 rows=1 loops=1)
        Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
        Merge Cond: (tasks.id = t0.id)
        ->  Index Scan using tasks_pkey on public.tasks  (cost=0.56..14315808.88 rows=11401481 width=1100) (actual time=0.054..4908.126 rows=2722000 loops=1)
              Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
              Filter: ((tasks.deleted_at IS NULL) AND (tasks.milestone_id IS NOT NULL))
              Rows Removed by Filter: 650237
        ->  Nested Loop  (cost=0.98..46306291.52 rows=28266 width=8) (actual time=12863.972..12863.973 rows=1 loops=1)
              Output: t0.id
              Inner Unique: true
              ->  Index Scan using tasks_pkey on public.tasks t0  (cost=0.56..14350439.73 rows=13852340 width=16) (actual time=0.010..4179.561 rows=3372237 loops=1)
                    Output: t0.project_id, t0.id
                    Index Cond: (t0.id IS NOT NULL)
              ->  Index Scan using projects_pkey on public.projects j0  (cost=0.42..2.31 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=3372237)
                    Output: j0.id
                    Index Cond: (j0.id = t0.project_id)
                    Filter: (j0.organization_id = 79403)
                    Rows Removed by Filter: 1
Planning Time: 0.914 ms
Execution Time: 17945.475 ms

Running the same query with LIMIT 500 gives the following explain output:

Limit  (cost=322268.59..322269.84 rows=500 width=1100) (actual time=1329.805..1330.032 rows=500 loops=1)
  Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
  ->  Sort  (cost=322268.59..322326.76 rows=23266 width=1100) (actual time=1329.803..1329.989 rows=500 loops=1)
        Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
        Sort Key: tasks.id
        Sort Method: top-N heapsort  Memory: 444kB
        ->  Nested Loop  (cost=218419.30..321109.27 rows=23266 width=1100) (actual time=563.649..1313.910 rows=20876 loops=1)
              Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
              Inner Unique: true
              ->  HashAggregate  (cost=218418.74..218701.41 rows=28267 width=8) (actual time=563.618..570.523 rows=21926 loops=1)
                    Output: t0.id
                    Group Key: t0.id
                    Batches: 1  Memory Usage: 2065kB
                    ->  Gather  (cost=1000.56..218348.08 rows=28267 width=8) (actual time=1.032..553.590 rows=21926 loops=1)
                          Output: t0.id
                          Workers Planned: 2
                          Workers Launched: 2
                          ->  Nested Loop  (cost=0.56..214521.38 rows=11778 width=8) (actual time=1.020..522.937 rows=7309 loops=3)
                                Output: t0.id
                                Worker 0:  actual time=2.356..510.679 rows=7819 loops=1
                                Worker 1:  actual time=0.063..537.921 rows=7849 loops=1
                                ->  Parallel Seq Scan on public.projects j0  (cost=0.00..53613.72 rows=417 width=8) (actual time=0.515..68.601 rows=220 loops=3)
                                      Output: j0.id
                                      Filter: (j0.organization_id = 79403)
                                      Rows Removed by Filter: 90977
                                      Worker 0:  actual time=0.885..109.155 rows=225 loops=1
                                      Worker 1:  actual time=0.034..37.727 rows=212 loops=1
                                ->  Index Scan using index_tasks_on_project_id on public.tasks t0  (cost=0.56..384.30 rows=157 width=16) (actual time=0.886..2.059 rows=33 loops=660)
                                      Output: t0.project_id, t0.id
                                      Index Cond: (t0.project_id = j0.id)
                                      Filter: (t0.id IS NOT NULL)
                                      Worker 0:  actual time=0.698..1.778 rows=35 loops=225
                                      Worker 1:  actual time=0.960..2.353 rows=37 loops=212
              ->  Index Scan using tasks_pkey on public.tasks  (cost=0.56..3.63 rows=1 width=1100) (actual time=0.033..0.033 rows=1 loops=21926)
                    Output: tasks.id, tasks.name, tasks.description, tasks.priority, tasks.estimated_hours, tasks.sort_order, tasks.estimated_points, tasks.responsibility, tasks.sign_off_required, tasks.created_at, tasks.updated_at, tasks.milestone_id, tasks.status, tasks.sign_off_user_id, tasks.assignee_id, tasks.creator_id, tasks.start_on, tasks.due_on, tasks.project_id, tasks.template_id, tasks.actual_hours, tasks.deleted_at, tasks.assignment_email_sent_at, tasks.stuck_message, tasks.overdue_pm_reminder_sent_at, tasks.duration, tasks.dependency_type, tasks.dependency_id, tasks.last_activity_at, tasks.completed_at, tasks.overdue_watched_tasks_email_sent_at, tasks.task_type, tasks.must_start_on, tasks.must_start_on_required, tasks.must_start_on_email_sent_at, tasks.visibility, tasks.type, tasks.related_task_id, tasks.action_items_count, tasks.open_action_items_count, tasks.billable_hours, tasks.non_billable_hours, tasks.jira_sync, tasks.public_id, tasks.event_details, tasks.blueprint_task_id, tasks.task_group_id
                    Index Cond: (tasks.id = t0.id)
                    Filter: ((tasks.deleted_at IS NULL) AND (tasks.milestone_id IS NOT NULL))
                    Rows Removed by Filter: 0
Planning Time: 0.872 ms
Execution Time: 1330.691 ms

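Correct me if I'm wrong, but the culprit seems to be the query planner choosing such an inefficient plan when the LIMIT is small. Since the query planner uses pg_statistic to formulate its execution plans, we suspected that our statistics were stale. After looking into it, we decided that running VACUUM(FULL, ANALYZE, VERBOSE) was the best option for us. If you use this command, be careful: it can take a while, and it temporarily locks the tables it processes. This did update the statistics, but the query planner still chooses the wrong execution plan.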

I would love to understand why the query planner chooses such an inefficient plan and how to fix this.

jjanes · Answer 1 · 2023-08-12T09:43:06+08:00

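The planner assumes that the rows satisfying the conditions are randomly distributed along the tasks.id order. So by reading them in that order, it thinks it can stop after the first of 23265 rows, which means it would only have to do about 1/23265 of the total work. In reality, it had to scan about 3372237/13852340 (roughly 1/4) of the tasks table before finding the first match. So the estimate is off by a factor of nearly 6000. This "random ordering" is just an assumption. It is not driven by the planner's statistics, so better statistics cannot fix it.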
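The figures above can be checked against the LIMIT 1 plan. The following is a sketch of the arithmetic, assuming the planner prices a LIMIT by scaling the child node's run cost linearly by the fraction of rows it expects to fetch; the symbol names are introduced here for illustration only.

```latex
% Planner's estimate for LIMIT 1 over the Merge Semi Join
% (startup cost 1.54, total cost 60650907.42, 23265 estimated rows):
\[
C_{\text{limit}}
  \approx C_{\text{startup}}
  + \frac{1}{N_{\text{est}}}\bigl(C_{\text{total}} - C_{\text{startup}}\bigr)
  = 1.54 + \frac{60650907.42 - 1.54}{23265}
  \approx 2608.50
\]

% Fraction of the join input actually consumed before the first match,
% and the resulting misestimate:
\[
f_{\text{actual}} = \frac{3372237}{13852340} \approx 0.243,
\qquad
\frac{f_{\text{actual}}}{1 / N_{\text{est}}}
  \approx 0.243 \times 23265
  \approx 5.7 \times 10^{3}
\]
```

The first result matches the `cost=1.54..2608.50` shown on the Limit node, and the second is the "nearly 6000" factor quoted above.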

Why do the first 1/4 of all tasks have no project? Maybe they could be archived or something.

Usually, you can "fix" this kind of problem by changing the ORDER BY to something that cannot be satisfied by an index, for example ORDER BY tasks.id+0
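As a rough sketch of that workaround: the original statement is not shown in the question, so the query below is a hypothetical reconstruction inferred from the two plans (the aliases, the IN subquery shape, and the SELECT list are assumptions). Only the ORDER BY line is the change being illustrated.

```sql
-- Hypothetical reconstruction of the query, inferred from the EXPLAIN
-- output in the question; the real statement may differ.
SELECT t.*
FROM tasks AS t
WHERE t.deleted_at IS NULL
  AND t.milestone_id IS NOT NULL
  AND t.id IN (
        SELECT t0.id
        FROM tasks AS t0
        JOIN projects AS j0 ON j0.id = t0.project_id
        WHERE j0.organization_id = 79403
      )
-- "id + 0" sorts the same way as "id" for an integer key, but it is an
-- expression that tasks_pkey cannot deliver in order, so the planner can
-- no longer walk that index in the hope of an early match.
ORDER BY t.id + 0
LIMIT 1;
```

Running both variants under EXPLAIN (ANALYZE, BUFFERS) should show whether the planner now falls back to a plan shaped like the LIMIT 500 one, which filters by organization first and sorts afterwards.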
