
Question

vyegorov

Asked: 2014-09-25 23:23:51 +0800 CST

BitmapIndexScan vs. IndexScan tuning


I have the following test case:

CREATE SCHEMA IF NOT EXISTS long;
SET search_path TO long,pg_temp,pg_catalog;
CREATE TABLE long WITH (autovacuum_enabled=off) AS
SELECT id, (random()*1000)::int AS key, random() AS value1, random() AS value2,
       repeat((random()*9)::int::text, 312) AS long
  FROM generate_series(1,10000000) id;
CREATE INDEX i_long_value1 ON long (value1);
CREATE INDEX i_long_wo ON long(key,value2 DESC);
VACUUM ANALYZE long;

Now, if I EXPLAIN the following query:

SELECT * FROM long WHERE key=5 AND value1 BETWEEN 0.5 AND 0.6 ORDER BY value2 DESC;

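I always get the following plan. This is odd, because I have a matching index, i_long_wo. So I use SET enable_bitmapscan TO off; which leads to a much better plan. This is reproducible on every PG instance I have tried (win64, HP-UX, MacOS), not only on 9.4beta2. Also, before looking at the plans I ran these queries several times to make sure everything was cached; you can check the Buffers: shared hit entries in the provided plans.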
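A minimal sketch of how the two plans can be compared in one session, assuming the test table and indexes above already exist. It only uses the query and the enable_bitmapscan setting mentioned in the question; the actual plan output depends on the instance and is not reproduced here.

```sql
SET search_path TO long, pg_temp, pg_catalog;

-- Plan the planner picks on its own (bitmap scans allowed).
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM long
 WHERE key = 5 AND value1 BETWEEN 0.5 AND 0.6
 ORDER BY value2 DESC;

-- Same query with bitmap scans discouraged, for this session only.
SET enable_bitmapscan TO off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM long
 WHERE key = 5 AND value1 BETWEEN 0.5 AND 0.6
 ORDER BY value2 DESC;

-- Restore the default so later statements are not affected.
RESET enable_bitmapscan;
```

Running each EXPLAIN a few times before comparing, as described above, keeps the "Buffers: shared hit" figures comparable between the two plans.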

I am running (the same effect is also observed on 9.1 through 9.4b2 on MacOS):

postgres=# SELECT version();
                            version
----------------------------------------------------------------
 PostgreSQL 9.4beta2, compiled by Visual C++ build 1800, 64-bit

postgres=# SELECT name,setting,unit,source FROM pg_settings
            WHERE NOT source IN ('default','override');
             name             |      setting       | unit |        source
------------------------------+--------------------+------+----------------------
 application_name             | psql               |      | client
 bgwriter_delay               | 50                 | ms   | configuration file
 bgwriter_lru_maxpages        | 350                |      | configuration file
 checkpoint_completion_target | 0.9                |      | configuration file
 checkpoint_segments          | 100                |      | configuration file
 client_encoding              | WIN1252            |      | client
 DateStyle                    | ISO, MDY           |      | configuration file
 default_text_search_config   | pg_catalog.english |      | configuration file
 dynamic_shared_memory_type   | windows            |      | configuration file
 effective_cache_size         | 524288             | 8kB  | configuration file
 lc_messages                  | C                  |      | configuration file
 lc_monetary                  | C                  |      | configuration file
 lc_numeric                   | C                  |      | configuration file
 lc_time                      | C                  |      | configuration file
 listen_addresses             | *                  |      | configuration file
 log_destination              | stderr             |      | configuration file
 log_line_prefix              | %t                 |      | configuration file
 log_timezone                 | Europe/Helsinki    |      | configuration file
 logging_collector            | on                 |      | configuration file
 max_connections              | 100                |      | configuration file
 max_stack_depth              | 2048               | kB   | environment variable
 port                         | 5432               |      | configuration file
 shared_buffers               | 131072             | 8kB  | configuration file
 TimeZone                     | Europe/Helsinki    |      | configuration file
 wal_buffers                  | 2048               | 8kB  | configuration file
 work_mem                     | 16384              | kB   | configuration file
(26 rows)

The Windows instance uses the 9.4beta2 installer from EDB; I will update with the compile details as soon as I can. Configuration used to build the MacOS instance:

CONFIGURE = '--prefix=/usr/local/Cellar/postgresql-9.4/9.4beta2' '--enable-dtrace' \
  '--with-bonjour' '--with-gssapi' '--with-ldap' '--with-libxml' \
  '--with-libxslt' '--with-openssl' '--with-uuid=e2fs' '--with-pam' \
  '--with-perl' '--with-python' '--with-tcl' \
'CC=/usr/bin/clang' \
'CFLAGS=-Os -w -pipe -march=native -mmacosx-version-min=10.9' \
'LDFLAGS=-L/usr/local/opt/readline/lib -L/usr/local/opt/e2fsprogs/lib \
  -L/usr/local/opt/gettext/lib -L/usr/local/lib \
  -Wl,-headerpad_max_install_names' \
'CPPFLAGS=-I/usr/local/opt/readline/include -I/usr/local/opt/e2fsprogs/include \
  -I/usr/local/opt/gettext/include'
LIBS = -lpgcommon -lpgport -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 \
  -lz -lreadline -lm

So I wonder: why does BitmapIndexScan get the better estimate?

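I have seen this post from Tom Lane (quite old), and as far as I understand, the recommended approach is to adjust the *_cost parameters. The problem is that the cost is driven mostly by the number of pages, which can be seen from the number of buffers PostgreSQL touches during execution: they are comparable for both plans and higher than the number of records returned. So adjusting random_page_cost lowers the cost of both plans, and IndexScan never wins.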

I tried increasing cpu_index_tuple_cost, but I had to raise it to 0.02 to make the planner stop using BitmapAnd. With this setting the BitmapIndexScan variant looks much better, but still not as good as IndexScan.
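The parameter experiments described above can be tried without touching postgresql.conf. The sketch below uses SET LOCAL inside a transaction so each setting disappears at ROLLBACK. The random_page_cost value of 1.0 is an arbitrary illustration and not a value from the post; 0.02 is the cpu_index_tuple_cost value mentioned above (the stock default is 0.005).

```sql
SET search_path TO long, pg_temp, pg_catalog;

-- Experiment 1: cheaper random page reads (1.0 is an assumed, illustrative value).
BEGIN;
SET LOCAL random_page_cost = 1.0;
EXPLAIN
SELECT * FROM long
 WHERE key = 5 AND value1 BETWEEN 0.5 AND 0.6
 ORDER BY value2 DESC;
ROLLBACK;  -- discards the SET LOCAL

-- Experiment 2: more expensive index tuple processing.
BEGIN;
SET LOCAL cpu_index_tuple_cost = 0.02;
EXPLAIN
SELECT * FROM long
 WHERE key = 5 AND value1 BETWEEN 0.5 AND 0.6
 ORDER BY value2 DESC;
ROLLBACK;
```

Plain EXPLAIN is enough here because only the estimated costs change; adding ANALYZE would also execute the query.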

Is it possible to make the planner produce a better estimate for IndexScan here?

1 Answer


jjanes · Answer 1 · 2014-09-29T11:42:16+08:00

Best Answer


If I run your example with a cold cache for both queries, the bitmap index scan really does win. So in that sense the planner is correct.

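Since you are testing with a fully hot cache, the right thing to do is to lower both random_page_cost and seq_page_cost to zero. If I do that, it chooses the index scan you prefer (at that point I cannot even get it to use BitmapAnd, no matter what I do with the enable_* parameters).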
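A sketch of that hot-cache experiment, assuming the test table from the question. Both page costs are set to zero only inside one transaction, which models "every page is already in memory"; it is meant for this kind of benchmark, not as a recommended production setting.

```sql
SET search_path TO long, pg_temp, pg_catalog;

BEGIN;
-- Treat all page fetches as free, as if everything were cached.
SET LOCAL seq_page_cost = 0;
SET LOCAL random_page_cost = 0;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM long
 WHERE key = 5 AND value1 BETWEEN 0.5 AND 0.6
 ORDER BY value2 DESC;
ROLLBACK;  -- both settings revert here
```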

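This is not to say that the planner is doing a good job here. It greatly underestimates the amount of work needed to insert tuples into the bitmap. It assumes that inserting a tuple takes 0.1 of a cpu_operator_cost, which seems absurd to me.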
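To get a feel for how small that charge is, here is a rough back-of-the-envelope calculation. The row counts are assumptions derived from the data generator in the question, not measured values: about 1,000,000 rows match value1 BETWEEN 0.5 AND 0.6 (10% of 10,000,000) and about 10,000 rows match key = 5, and both index scans are assumed to feed a BitmapAnd.

```sql
-- Bitmap-insertion charge under the assumed row counts above.
SELECT current_setting('cpu_operator_cost')::numeric        AS cpu_operator_cost,
       0.1 * current_setting('cpu_operator_cost')::numeric  AS charge_per_tuple,
       0.1 * current_setting('cpu_operator_cost')::numeric
           * (1000000 + 10000)                              AS approx_total_charge;
```

With the stock cpu_operator_cost of 0.0025, this works out to 0.00025 per tuple, or roughly 250 cost units for about a million bitmap insertions.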

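If you measure the buffer hit/miss rates for this particular query under realistic conditions rather than benchmark conditions, what do they look like? It would be good to test under realistic conditions, but those are hard to achieve, so I can only test fully hot or fully cold and hope that it interpolates correctly.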
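One possible way to approximate this on a live system is the cumulative per-table I/O statistics. This is only a sketch: the counters in pg_statio_user_tables cover all queries against the table since the last statistics reset, not just this one, and a block counted as "read" may still have come from the operating system cache rather than from disk.

```sql
-- Shared-buffer hit percentage for the test table and its indexes.
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       coalesce(idx_blks_hit, 0)  AS idx_blks_hit,
       coalesce(idx_blks_read, 0) AS idx_blks_read,
       round(100.0 * (heap_blks_hit + coalesce(idx_blks_hit, 0))
             / nullif(heap_blks_hit + heap_blks_read
                      + coalesce(idx_blks_hit, 0)
                      + coalesce(idx_blks_read, 0), 0), 2) AS hit_pct
  FROM pg_statio_user_tables
 WHERE schemaname = 'long'
   AND relname = 'long';
```

For a single query, the "shared hit" and "read" figures in EXPLAIN (ANALYZE, BUFFERS) output give the same breakdown for just that execution.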

