AskOverflow.Dev

MySQL: check for duplicates using wildcards and grouping?

Asked by cww on 2012-11-14 22:20:38 +0800 CST

    +----+--------------+-----+-----------+----------+
    | ID | NAME         | AGE | ADDRESS   | SALARY   |
    +----+--------------+-----+-----------+----------+
    |  1 | Ramesh Olive |  32 | Ahmedabad |  2000.00 |
    |  2 | Tan Kau      |  25 | Delhi     |  1500.00 |
    |  3 | Jason Tan Kau|  25 | Delhi     |  2000.00 |
    |  4 | Chaitali     |  25 | Mumbai    |  6500.00 |
    |  5 | Hardik       |  27 | Bhopal    |  8500.00 |
    |  6 | Hardik Jass  |  27 | Bhopal    |  4500.00 |
    |  7 | Muffy John   |  24 | Indore    | 10000.00 |
    |  8 | Muffy Lee    |  24 | Indore    | 10000.00 |
    +----+--------------+-----+-----------+----------+

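In the example above, assume the table is named "table_a". Here (1) "Tan Kau" is a duplicate of "Jason Tan Kau", and (2) "Hardik" is a duplicate of "Hardik Jass".

How can I write SQL that produces the output shown in the second table below?

I think the following would work, but it is likely to be slow. Any ideas for improving it?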

    SELECT A.*, IF(B.ID IS NULL, '', 'DUP') AS DUP
    FROM table_a A
    LEFT JOIN table_a B
      ON A.NAME LIKE CONCAT('%', B.NAME, '%') AND A.ID != B.ID;


    +----+--------------+-----+-----------+----------+-----+
    | ID | NAME         | AGE | ADDRESS   | SALARY   | DUP |
    +----+--------------+-----+-----------+----------+-----+
    |  1 | Ramesh Olive |  32 | Ahmedabad |  2000.00 |     |
    |  2 | Tan Kau      |  25 | Delhi     |  1500.00 | Dup |
    |  3 | Jason Tan Kau|  25 | Delhi     |  2000.00 | Dup |
    |  4 | Chaitali     |  25 | Mumbai    |  6500.00 |     |
    |  5 | Hardik       |  27 | Bhopal    |  8500.00 | Dup |
    |  6 | Hardik Jass  |  27 | Bhopal    |  4500.00 | Dup | 
    |  7 | Muffy John   |  24 | Indore    | 10000.00 |     |
    |  8 | Muffy Lee    |  24 | Indore    | 10000.00 |     |
    +----+--------------+-----+-----------+----------+-----+
Tags: mysql, duplication

2 Answers

  1. Best Answer
    Leigh Riffel
    2012-11-15T11:51:40+08:00

    Your query returns the expected result once you add the reverse condition:

    -- Flag a row when its name contains, or is contained in, another row's name.
    SELECT A.*, IF(B.ID IS NULL, '', 'DUP') AS DUP
    FROM persons A
    LEFT JOIN persons B
      ON A.ID <> B.ID
     AND (A.NAME LIKE CONCAT('%', B.NAME, '%') OR B.NAME LIKE CONCAT('%', A.NAME, '%'))
    ORDER BY A.ID;
    

    I don't know whether it will be any faster, but another option is to use INSTR:

    -- Same check with INSTR, which returns the 1-based position of a substring (0 if absent).
    SELECT A.*, IF(B.ID IS NULL, '', 'DUP') AS DUP
    FROM persons A
    LEFT JOIN persons B
      ON A.ID <> B.ID
     AND (INSTR(A.NAME, B.NAME) > 0 OR INSTR(B.NAME, A.NAME) > 0)
    ORDER BY A.ID;
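
    One difference between the two forms is worth checking against your data: LIKE treats any `%` or `_` stored in the name used as the pattern as a wildcard, while INSTR compares the text literally. A minimal sketch with a made-up name containing an underscore (the literal values are illustrative and do not come from the question's data):

    ```sql
    -- 'Tan_Kau' is a hypothetical stored name; in LIKE, '_' matches any single character.
    SELECT 'Jason Tan Kau' LIKE CONCAT('%', 'Tan_Kau', '%') AS like_match,
           INSTR('Jason Tan Kau', 'Tan_Kau') > 0            AS instr_match;
    ```

    Here LIKE reports a match, because the underscore matches the space, and INSTR does not. If names may contain `%` or `_`, the INSTR form avoids that surprise.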
    

    SQL Fiddle
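
    On the speed concern raised in the question: either form compares every row with every other row, and a condition of this shape generally cannot use an ordinary index on NAME, because the pattern starts with a wildcard and comes from the other side of the join. One way to see what the server actually does is EXPLAIN. This is only a sketch against the question's `table_a`, and the plan will vary with MySQL version and table size:

    ```sql
    -- Ask the optimizer how it would execute the self-join; no rows are returned.
    EXPLAIN
    SELECT A.*, IF(B.ID IS NULL, '', 'DUP') AS DUP
    FROM table_a A
    LEFT JOIN table_a B
      ON A.ID <> B.ID
     AND (INSTR(A.NAME, B.NAME) > 0 OR INSTR(B.NAME, A.NAME) > 0);
    ```

    If the `type` column shows `ALL` for B, the inner table is being scanned in full for each outer row.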

  2. RolandoMySQLDBA
    2012-11-15T13:47:49+08:00

    I did something a little different:

    SELECT DISTINCT AA.* FROM
    (
        -- LENGTH picks the longer name, so only one INSTR runs per pair of rows
        SELECT A.*,IF(IFNULL(B.ID,'')='','','Dup') DUP
        FROM table_a A LEFT JOIN table_a B ON A.ID <> B.ID
        AND IF(LENGTH(A.NAME)>LENGTH(B.NAME),
        INSTR(A.NAME,B.NAME)>0,
        INSTR(B.NAME,A.NAME)>0)
    ) AA;
    

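    Note: I basically plagiarized Leigh's answer and augmented it slightly, so please do not mark my answer as accepted!!!

    My reason for posting this is to keep extra copies of a row from showing up in the result.

    Here is your sample data plus two additional rows: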

    mysql> DROP DATABASE IF EXISTS cww;
    Query OK, 1 row affected (0.03 sec)
    
    mysql> CREATE DATABASE cww;
    Query OK, 1 row affected (0.00 sec)
    
    mysql> USE cww
    Database changed
    mysql> CREATE TABLE table_a
        -> (
        ->     ID INT NOT NULL AUTO_INCREMENT,
        ->     NAME VARCHAR(25) NOT NULL,
        ->     AGE INT NOT NULL,
        ->     ADDRESS VARCHAR(25) NOT NULL,
        ->     SALARY DECIMAL(10,2) NOT NULL,
        ->     PRIMARY KEY (ID)
        -> );
    Query OK, 0 rows affected (0.10 sec)
    
    mysql> INSERT INTO table_a (NAME,AGE,ADDRESS,SALARY) VALUES
        -> ('Ramesh Olive'   ,32,'Ahmedabad', 2000.00),
        -> ('Tan Kau'        ,25,'Delhi'    , 1500.00),
        -> ('Jason Tan Kau'  ,25,'Delhi'    , 2000.00),
        -> ('Jackson Tan Kau',25,'Delhi'    , 2000.00),
        -> ('Chaitali'       ,25,'Mumbai'   , 6500.00),
        -> ('Hardik'         ,27,'Bhopal'   , 8500.00),
        -> ('Hardik Jass'    ,27,'Bhopal'   , 4500.00),
        -> ('Hardik Jess'    ,27,'Bhopal'   , 4500.00),
        -> ('Muffy John'     ,24,'Indore'   , 10000.00),
        -> ('Muffy Lee'      ,24,'Indore'   , 10000.00);
    Query OK, 10 rows affected (0.05 sec)
    Records: 10  Duplicates: 0  Warnings: 0
    
    mysql> SELECT * FROM table_a;
    +----+-----------------+-----+-----------+----------+
    | ID | NAME            | AGE | ADDRESS   | SALARY   |
    +----+-----------------+-----+-----------+----------+
    |  1 | Ramesh Olive    |  32 | Ahmedabad |  2000.00 |
    |  2 | Tan Kau         |  25 | Delhi     |  1500.00 |
    |  3 | Jason Tan Kau   |  25 | Delhi     |  2000.00 |
    |  4 | Jackson Tan Kau |  25 | Delhi     |  2000.00 |
    |  5 | Chaitali        |  25 | Mumbai    |  6500.00 |
    |  6 | Hardik          |  27 | Bhopal    |  8500.00 |
    |  7 | Hardik Jass     |  27 | Bhopal    |  4500.00 |
    |  8 | Hardik Jess     |  27 | Bhopal    |  4500.00 |
    |  9 | Muffy John      |  24 | Indore    | 10000.00 |
    | 10 | Muffy Lee       |  24 | Indore    | 10000.00 |
    +----+-----------------+-----+-----------+----------+
    10 rows in set (0.00 sec)
    
    mysql>
    

    Notice how my enhanced query handles the duplicates correctly:

    mysql> SELECT DISTINCT AA.* FROM
        -> (
        ->     SELECT A.*,IF(IFNULL(B.ID,'')='','','Dup') DUP
        ->     FROM table_a A LEFT JOIN table_a B ON a.ID <> b.ID
        ->     AND IF(LENGTH(A.name)>LENGTH(B.name),
        ->     INSTR(A.name,B.name)>0,
        ->     INSTR(B.name,A.name)>0)
        -> ) AA;
    +----+-----------------+-----+-----------+----------+-----+
    | ID | NAME            | AGE | ADDRESS   | SALARY   | DUP |
    +----+-----------------+-----+-----------+----------+-----+
    |  1 | Ramesh Olive    |  32 | Ahmedabad |  2000.00 |     |
    |  2 | Tan Kau         |  25 | Delhi     |  1500.00 | Dup |
    |  3 | Jason Tan Kau   |  25 | Delhi     |  2000.00 | Dup |
    |  4 | Jackson Tan Kau |  25 | Delhi     |  2000.00 | Dup |
    |  5 | Chaitali        |  25 | Mumbai    |  6500.00 |     |
    |  6 | Hardik          |  27 | Bhopal    |  8500.00 | Dup |
    |  7 | Hardik Jass     |  27 | Bhopal    |  4500.00 | Dup |
    |  8 | Hardik Jess     |  27 | Bhopal    |  4500.00 | Dup |
    |  9 | Muffy John      |  24 | Indore    | 10000.00 |     |
    | 10 | Muffy Lee       |  24 | Indore    | 10000.00 |     |
    +----+-----------------+-----+-----------+----------+-----+
    10 rows in set (0.00 sec)
    
    mysql>
    

    With more duplicates in the table, Leigh's query returns this:

    mysql> SELECT A.*, IF(B.ID IS NULL, "", "DUP") as DUP
        -> FROM table_a A
        -> LEFT JOIN table_a B
        -> ON a.ID <> b.ID
        -> AND (Instr(a.Name, b.Name) > 0 OR Instr(b.Name, a.Name) > 0)
        -> ORDER BY ID;
    +----+-----------------+-----+-----------+----------+-----+
    | ID | NAME            | AGE | ADDRESS   | SALARY   | DUP |
    +----+-----------------+-----+-----------+----------+-----+
    |  1 | Ramesh Olive    |  32 | Ahmedabad |  2000.00 |     |
    |  2 | Tan Kau         |  25 | Delhi     |  1500.00 | DUP |
    |  2 | Tan Kau         |  25 | Delhi     |  1500.00 | DUP |
    |  3 | Jason Tan Kau   |  25 | Delhi     |  2000.00 | DUP |
    |  4 | Jackson Tan Kau |  25 | Delhi     |  2000.00 | DUP |
    |  5 | Chaitali        |  25 | Mumbai    |  6500.00 |     |
    |  6 | Hardik          |  27 | Bhopal    |  8500.00 | DUP |
    |  6 | Hardik          |  27 | Bhopal    |  8500.00 | DUP |
    |  7 | Hardik Jass     |  27 | Bhopal    |  4500.00 | DUP |
    |  8 | Hardik Jess     |  27 | Bhopal    |  4500.00 | DUP |
    |  9 | Muffy John      |  24 | Indore    | 10000.00 |     |
    | 10 | Muffy Lee       |  24 | Indore    | 10000.00 |     |
    +----+-----------------+-----+-----------+----------+-----+
    12 rows in set (0.00 sec)
    
    mysql>
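
    The two extra rows come from the join itself: a LEFT JOIN returns one output row per matching pair, and "Tan Kau" and "Hardik" now each match two other names. A sketch that counts the matches per row, using the same join condition (the `matches` alias is only illustrative):

    ```sql
    -- COUNT(B.ID) ignores the NULLs produced for rows without any match.
    SELECT A.ID, A.NAME, COUNT(B.ID) AS matches
    FROM table_a A
    LEFT JOIN table_a B
      ON A.ID <> B.ID
     AND (INSTR(A.NAME, B.NAME) > 0 OR INSTR(B.NAME, A.NAME) > 0)
    GROUP BY A.ID, A.NAME
    ORDER BY A.ID;
    ```

    With the ten rows above, IDs 2 and 6 each have two matches, so each appears twice in the un-deduplicated result, which accounts for the 12 rows.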
    

    @LeighRiffel's answer just needs to be wrapped in a subquery with DISTINCT applied:

    mysql> SELECT DISTINCT * FROM (
        -> SELECT A.*, IF(B.ID IS NULL, "", "DUP") as DUP
        -> FROM table_a A
        -> LEFT JOIN table_a B
        -> ON a.ID <> b.ID
        -> AND (Instr(a.Name, b.Name) > 0 OR Instr(b.Name, a.Name) > 0)
        -> ORDER BY ID) AA;
    +----+-----------------+-----+-----------+----------+-----+
    | ID | NAME            | AGE | ADDRESS   | SALARY   | DUP |
    +----+-----------------+-----+-----------+----------+-----+
    |  1 | Ramesh Olive    |  32 | Ahmedabad |  2000.00 |     |
    |  2 | Tan Kau         |  25 | Delhi     |  1500.00 | DUP |
    |  3 | Jason Tan Kau   |  25 | Delhi     |  2000.00 | DUP |
    |  4 | Jackson Tan Kau |  25 | Delhi     |  2000.00 | DUP |
    |  5 | Chaitali        |  25 | Mumbai    |  6500.00 |     |
    |  6 | Hardik          |  27 | Bhopal    |  8500.00 | DUP |
    |  7 | Hardik Jass     |  27 | Bhopal    |  4500.00 | DUP |
    |  8 | Hardik Jess     |  27 | Bhopal    |  4500.00 | DUP |
    |  9 | Muffy John      |  24 | Indore    | 10000.00 |     |
    | 10 | Muffy Lee       |  24 | Indore    | 10000.00 |     |
    +----+-----------------+-----+-----------+----------+-----+
    10 rows in set (0.00 sec)
    
    mysql>
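
    Another way to avoid the extra rows, sketched here as an alternative rather than something either answer proposes, is to ask only whether a match exists instead of joining to every match. EXISTS yields a single true/false value per outer row, so no DISTINCT is needed:

    ```sql
    -- One output row per row of table_a; the subquery only tests for a matching name.
    SELECT A.*,
           IF(EXISTS (SELECT 1
                      FROM table_a B
                      WHERE B.ID <> A.ID
                        AND (INSTR(A.NAME, B.NAME) > 0 OR INSTR(B.NAME, A.NAME) > 0)),
              'Dup', '') AS DUP
    FROM table_a A
    ORDER BY A.ID;
    ```

    Whether this is faster than the DISTINCT version depends on the data and the MySQL version, so it is worth comparing both on a realistic table.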
    

    Nonetheless, Leigh's answer did supply the essential SQL idea first.

    So he gets a +1 from me!
