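We are running into a problem querying a table in our production database. A text column compares equal to the string we filter on in the WHERE clause, yet Postgres does not select the row. (We are on Postgres 11.11.) Our table is set up as follows: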
(PROD)=> \d names;
Table "public.names"
Column | Type | Collation | Nullable | Default
----------------------+-----------------------------+-----------+----------+---------
name | text | | not null |
processed_name | text | | not null |
name_index | integer | | not null |
when_created | timestamp without time zone | | not null |
Indexes:
"names_pkey" PRIMARY KEY, btree (name, processed_name)
"names_name_index_key" UNIQUE CONSTRAINT, btree (name_index)
"ix_names_name" btree (name)
"ix_names_processed_name" btree (processed_name)
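When we process a list of names, we check whether each one is already in the table, to avoid adding duplicates and violating the primary key constraint.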
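A minimal sketch of what such an existence check might look like. The exact statement our application issues is not shown in this post, so the query shape and the `already_present` alias are hypothetical:

```sql
-- Hypothetical existence check run before an insert (illustrative only;
-- the alias "already_present" is not from the real application).
SELECT EXISTS (
    SELECT 1
    FROM names
    WHERE name = 'Сергей Иванович МЕНЯЙЛО'
) AS already_present;
```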
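However, for one name, “Сергей Иванович МЕНЯЙЛО”, the query that checks whether the name already exists returns an empty set, where I would expect it to return the row with that name. Yet when we try to insert the row, we hit a primary key violation.
Here are some queries that illustrate the problem. Looking the row up by name_index shows that the stored name compares equal to the literal: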
(PROD)=> SELECT name_index,
name,
name = 'Сергей Иванович МЕНЯЙЛО' names_compare_equal
FROM names where name_index = 75128;
name_index | name | names_compare_equal
----------------------+-------------------------+---------------------
75128 | Сергей Иванович МЕНЯЙЛО | t
(1 row)
However, filtering on the name column selects no rows.
2021-05-24 20:37:41 UTC
(PROD)=> SELECT name_index,
name,
name = 'Сергей Иванович МЕНЯЙЛО'
names_compare_equal
FROM names
WHERE name = 'Сергей Иванович МЕНЯЙЛО';
name_index | name | names_compare_equal
----------------------+------+---------------------
(0 rows)
So if we try to insert the row, we get a primary key violation:
(PROD)=> INSERT INTO names (name_index, name, processed_name, when_created)
VALUES (89266, 'Сергей Иванович МЕНЯЙЛО', lower('Сергей Иванович МЕНЯЙЛО'), now());
ERROR: duplicate key value violates unique constraint "names_pkey"
DETAIL: Key (name, processed_name)=(Сергей Иванович МЕНЯЙЛО, сергей иванович меняйло) already exists.
What's more, if I filter on a hash of the name instead, I get the correct result:
(PROD)=> SELECT name_index,
name,
name = 'Сергей Иванович МЕНЯЙЛО' names_compare_equal
FROM names
WHERE md5(name) = md5('Сергей Иванович МЕНЯЙЛО');
name_index | name | names_compare_equal
----------------------+-------------------------+---------------------
75128 | Сергей Иванович МЕНЯЙЛО | t
(1 row)
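One difference between the two queries is that no index on md5(name) exists, so the md5 filter cannot be answered by an index lookup on name, while the plain equality filter can. A minimal sketch for comparing the access paths, assuming the planner picks an index scan for the equality filter (an assumption I have not verified here):

```sql
-- Plan for the query that returns no rows. Assumption: this uses an
-- index scan on ix_names_name or names_pkey; check the actual output.
EXPLAIN
SELECT name_index
FROM names
WHERE name = 'Сергей Иванович МЕНЯЙЛО';

-- Plan for the query that does find the row. With no expression index
-- on md5(name), the filter has to be evaluated against each row read.
EXPLAIN
SELECT name_index
FROM names
WHERE md5(name) = md5('Сергей Иванович МЕНЯЙЛО');
```

If the plans differ as assumed, the mismatch would be tied to the index path rather than to the comparison itself.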
This only happens on our production database, which has the following encoding settings:
Name | Owner | Encoding | Collate | Ctype |
----------------+----------------+----------+-------------+-------------+
PROD DB | PROD DB OWNER | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
This is quite baffling to me, so any ideas on what to check next would be very helpful.