I created the following table and inserted some values into it:
CREATE TABLE query_all_lexeme (
payload text,
normalized tsvector GENERATED ALWAYS AS (to_tsvector('english', payload)) STORED
);
INSERT INTO query_all_lexeme (payload)
VALUES ('fat cats ate rats');
INSERT INTO query_all_lexeme (payload)
VALUES ('summarize the functions and operators that are provided for full text searching');
INSERT INTO query_all_lexeme (payload)
VALUES ('Constructs a phrase query');
INSERT INTO query_all_lexeme (payload)
SELECT
'Constructs a phrase query this is a test'
FROM
generate_series(1, 10000);
INSERT INTO query_all_lexeme (payload)
SELECT
'Constructs a phrase query this is a test'
FROM
generate_series(1, 100);
Then I created a GIN index:
CREATE INDEX query_all_lexeme_vector ON query_all_lexeme USING gin (normalized);
Then I ran the query below to get the GIN index metapage information:
SELECT * FROM
gin_metapage_info (get_raw_page ('query_all_lexeme_vector', 0)) \gx
Result:
+-[ RECORD 1 ]-----+------------+
| pending_head | 4294967295 |
| pending_tail | 4294967295 |
| tail_free_size | 0 |
| n_pending_pages | 0 |
| n_pending_tuples | 0 |
| n_total_pages | 14 |
| n_entry_pages | 1 |
| n_data_pages | 12 |
| n_entries | 15 |
| version | 2 |
+------------------+------------+
WITH cte AS (
SELECT
flags,
p
FROM
generate_series(1, 13) AS p,
gin_page_opaque_info (get_raw_page ('query_all_lexeme_vector', p)))
SELECT
array_agg(p)
FROM
cte
WHERE
flags::text = '{data,leaf,compressed}';
which returns:
+----------------------+
| array_agg |
+----------------------+
| {3,4,6,7,9,10,12,13} |
+----------------------+
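In the following query, I would expect at least one row to have the column gin_tid_vs_table_tid equal to true. However, every value in that column is false.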
WITH cte AS (
SELECT
(unnest(normalized)).lexeme AS elements
, array_agg(ctid) AS ctids
FROM
query_all_lexeme
GROUP BY
1
)
SELECT
elements
, pg_typeof(ctids)
, ctids = (
SELECT
tids
FROM
gin_leafpage_items (get_raw_page ('query_all_lexeme_vector' , 3))
ORDER BY
1
LIMIT 1) AS gin_tid_vs_table_tid
FROM
cte;
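I have already run VACUUM ANALYZE. The data is now stable (only SELECT statements are run), so the tid values are stable as well. So why does the column gin_tid_vs_table_tid in the last query contain only false values? My reasoning is that the GIN index stores the indexed lexemes together with each lexeme's corresponding tids (physical tuple locations). So the tids in the GIN index should be the same as array_agg(ctid) for the same lexeme.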
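Your logic would hold only if a single index entry were guaranteed to contain all of the ctids for a given lexeme. There is no such guarantee, and there cannot be one, because index tuples have a strictly limited size, far too small to hold every possible ctid.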
Perhaps you could switch from array equality to overlap or containment (&&, @>).
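As a rough sketch of that suggestion, the comparison could be rewritten as below. It assumes the pageinspect extension is installed and reuses page 3 from your own output; block numbers will differ on another installation. The names lexeme_ctids, segment, table_contains_segment and table_overlaps_segment are only illustrative. The idea is that one item returned by gin_leafpage_items holds only part of a lexeme's tids, so the table-side array is checked for containing it rather than for being equal to it.

```sql
-- Collect every heap ctid per lexeme from the table itself.
WITH lexeme_ctids AS (
    SELECT
        u.lexeme,
        array_agg(t.ctid) AS ctids
    FROM
        query_all_lexeme AS t,
        unnest(t.normalized) AS u
    GROUP BY
        u.lexeme
),
-- Take one item from a single GIN leaf data page
-- (page 3 is an assumption taken from the earlier output).
segment AS (
    SELECT
        tids
    FROM
        gin_leafpage_items(get_raw_page('query_all_lexeme_vector', 3))
    ORDER BY
        first_tid
    LIMIT 1
)
SELECT
    l.lexeme,
    l.ctids @> s.tids AS table_contains_segment,  -- all index tids appear in the table's list
    l.ctids && s.tids AS table_overlaps_segment   -- at least one tid is shared
FROM
    lexeme_ctids AS l
    CROSS JOIN segment AS s;
```

With this form, a lexeme whose tids are stored on that page should report true even though its complete tid list is spread over several items and pages. Lexemes with only a few rows may not appear on any data page at all, so they would still report false here.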