AskOverflow.Dev

Gary Lindahl
Asked: 2011-09-17 05:22:13 +0800 CST

Delete all duplicates

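I am trying to delete all duplicates while keeping a single record for each email (the one with the lowest id). The following query deletes duplicates, but it takes many iterations to remove every copy and leave only the originals.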

DELETE FROM emailTable WHERE id IN (
 SELECT * FROM (
    SELECT id FROM emailTable GROUP BY email HAVING ( COUNT(email) > 1 )
 ) AS q
)

This is MySQL.

DDL

CREATE TABLE `emailTable` (
 `id` mediumint(9) NOT NULL auto_increment,
 `email` varchar(200) NOT NULL default '',
 PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=298872 DEFAULT CHARSET=latin1
mysql delete
  • 3 answers
  • 6803 views

3 Answers

  1. Best Answer
    Derek Downey
    2011-09-17T05:46:22+08:00

    Try this:

    DELETE FROM emailTable WHERE NOT EXISTS (
     SELECT * FROM (
        SELECT MIN(id) minID FROM emailTable    
        GROUP BY email HAVING COUNT(*) > 0
      ) AS q
      WHERE minID=id
    )
    

    The above worked in my test with 50 emails (5 distinct emails, each repeated 10 times).

    You may want to add an index on the `email` column:

    ALTER TABLE emailTable ADD INDEX ind_email (email);
    

    It may be a bit slow with 250,000 rows. On a (properly indexed) table with 1.5 million rows it was slow for me, which is how I came up with this strategy:

    /* CREATE MEMORY TABLE TO HOUSE IDs of the MIN */
    CREATE TABLE email_min (minID INT, PRIMARY KEY(minID)) ENGINE=Memory;
    
    /* INSERT THE MINIMUM ID FOR EACH EMAIL */
    INSERT INTO email_min SELECT MIN(id) FROM email
        GROUP BY email;
    
    /* MAKE SURE YOU HAVE THE RIGHT INFO (THESE ROWS WILL BE DELETED) */
    SELECT * FROM email 
     WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);
    
    /* DELETE FROM EMAIL */
    DELETE FROM email 
     WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);
    
    /* IF ALL IS WELL, DROP MEMORY TABLE */
    DROP TABLE email_min;
    

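    The benefit of the memory table is that it has an index (the primary key on minID), which speeds up the process compared with a normal temporary table.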
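
For anyone who wants to try the helper-table strategy without touching a real server, here is a minimal sketch in Python. It rests on several assumptions: SQLite (through the standard `sqlite3` module) stands in for MySQL, a TEMP table stands in for the MEMORY engine, the table is named `emailTable` as in the question, and the `example.com` addresses and the function name `dedupe_with_helper_table` are made up for illustration.

```python
import sqlite3


def dedupe_with_helper_table(conn):
    """Keep the lowest id per email and return the number of rows deleted."""
    cur = conn.cursor()
    # Helper table of ids to keep; its primary key gives the lookup an index.
    cur.execute("CREATE TEMP TABLE email_min (minID INTEGER PRIMARY KEY)")
    cur.execute(
        "INSERT INTO email_min SELECT MIN(id) FROM emailTable GROUP BY email"
    )
    # Delete every row whose id is not one of the kept ids.
    cur.execute(
        "DELETE FROM emailTable "
        "WHERE NOT EXISTS (SELECT 1 FROM email_min WHERE minID = emailTable.id)"
    )
    deleted = cur.rowcount
    cur.execute("DROP TABLE email_min")
    conn.commit()
    return deleted


# Hypothetical sample data: ids 1-4, 5-7, 8-10, then two more copies of the second email.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emailTable ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT NOT NULL)"
)
sample = (
    ["a@example.com"] * 4
    + ["b@example.com"] * 3
    + ["c@example.com"] * 3
    + ["b@example.com"] * 2
)
conn.executemany("INSERT INTO emailTable (email) VALUES (?)", [(e,) for e in sample])

deleted = dedupe_with_helper_table(conn)
remaining = conn.execute("SELECT id, email FROM emailTable ORDER BY id").fetchall()
print(deleted, remaining)
```

The whole cleanup is a single DELETE pass: the helper table is built once, and each row of `emailTable` needs only a primary-key lookup to decide whether it stays.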

    • 8
  2. RolandoMySQLDBA
    2011-09-17T08:01:05+08:00

    Here is a more streamlined deletion process:

    CREATE TABLE emailUnique LIKE emailTable;
    ALTER TABLE emailUnique ADD UNIQUE INDEX (email);
    INSERT IGNORE INTO emailUnique SELECT * FROM emailTable;
    SELECT * FROM emailUnique;
    ALTER TABLE emailTable  RENAME emailTable_old;
    ALTER TABLE emailUnique RENAME emailTable;
    DROP TABLE emailTable_old;
    

    Here is some sample data:

    use test
    DROP TABLE IF EXISTS emailTable;
    CREATE TABLE `emailTable` (
     `id` mediumint(9) NOT NULL auto_increment,
     `email` varchar(200) NOT NULL default '',
     PRIMARY KEY  (`id`)
    ) ENGINE=MyISAM;
    INSERT INTO emailTable (email) VALUES
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]'),
    ('[email protected]');
    SELECT * FROM emailTable;
    

    I ran them. Here are the results:

    mysql> use test
    Database changed
    mysql> DROP TABLE IF EXISTS emailTable;
    Query OK, 0 rows affected (0.01 sec)
    
    mysql> CREATE TABLE `emailTable` (
        ->  `id` mediumint(9) NOT NULL auto_increment,
        ->  `email` varchar(200) NOT NULL default '',
        ->  PRIMARY KEY  (`id`)
        -> ) ENGINE=MyISAM;
    Query OK, 0 rows affected (0.05 sec)
    
    mysql> INSERT INTO emailTable (email) VALUES
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]'),
        -> ('[email protected]');
    Query OK, 15 rows affected (0.00 sec)
    Records: 15  Duplicates: 0  Warnings: 0
    
    mysql> SELECT * FROM emailTable;
    +----+----------------------------+
    | id | email                      |
    +----+----------------------------+
    |  1 | [email protected]         |
    |  2 | [email protected]         |
    |  3 | [email protected]         |
    |  4 | [email protected]         |
    |  5 | [email protected]   |
    |  6 | [email protected]   |
    |  7 | [email protected]   |
    |  8 | [email protected]              |
    |  9 | [email protected]              |
    | 10 | [email protected]              |
    | 11 | [email protected]   |
    | 12 | [email protected]   |
    | 13 | [email protected] |
    | 14 | [email protected] |
    | 15 | [email protected] |
    +----+----------------------------+
    15 rows in set (0.00 sec)
    
    mysql> CREATE TABLE emailUnique LIKE emailTable;
    Query OK, 0 rows affected (0.04 sec)
    
    mysql> ALTER TABLE emailUnique ADD UNIQUE INDEX (email);
    Query OK, 0 rows affected (0.06 sec)
    Records: 0  Duplicates: 0  Warnings: 0
    
    mysql> INSERT IGNORE INTO emailUnique SELECT * FROM emailTable;
    Query OK, 4 rows affected (0.01 sec)
    Records: 15  Duplicates: 11  Warnings: 0
    
    mysql> SELECT * FROM emailUnique;
    +----+----------------------------+
    | id | email                      |
    +----+----------------------------+
    |  1 | [email protected]         |
    |  5 | [email protected]   |
    |  8 | [email protected]              |
    | 13 | [email protected] |
    +----+----------------------------+
    4 rows in set (0.00 sec)
    
    mysql> ALTER TABLE emailTable  RENAME emailTable_old;
    Query OK, 0 rows affected (0.03 sec)
    
    mysql> ALTER TABLE emailUnique RENAME emailTable;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> DROP TABLE emailTable_old;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql>
    

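    As shown, emailTable will contain the first occurrence of each email address along with its original id. For this example:

    • IDs 1-4 have [email protected], but only 1 was kept.
    • IDs 5-7, 11, and 12 have [email protected], but only 5 was kept.
    • IDs 8-10 have [email protected], but only 8 was kept.
    • IDs 13-15 have [email protected], but only 13 was kept.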
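
One thing worth hedging: which duplicate survives depends on the order in which `INSERT IGNORE ... SELECT` reads the rows, and a SELECT without ORDER BY does not promise id order. The sketch below makes the ordering explicit. It is an illustration under assumptions, not the commands run above: SQLite (Python's standard `sqlite3` module) stands in for MySQL, so `INSERT IGNORE` is spelled `INSERT OR IGNORE` and `CREATE TABLE ... LIKE` is replaced by a second CREATE TABLE; the index name `uq_email` and the `example.com` addresses are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emailTable (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
rows = [
    (1, "a@example.com"),
    (2, "a@example.com"),
    (3, "b@example.com"),
    (4, "a@example.com"),
    (5, "b@example.com"),
]
conn.executemany("INSERT INTO emailTable (id, email) VALUES (?, ?)", rows)

# Copy target with a unique index on email, mirroring emailUnique above.
conn.execute("CREATE TABLE emailUnique (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX uq_email ON emailUnique (email)")

# ORDER BY id makes "lowest id wins" explicit instead of relying on scan order.
# Rows that would violate the unique index are skipped silently.
conn.execute(
    "INSERT OR IGNORE INTO emailUnique "
    "SELECT id, email FROM emailTable ORDER BY id"
)

# Swap the deduplicated copy into place and drop the original.
conn.execute("ALTER TABLE emailTable RENAME TO emailTable_old")
conn.execute("ALTER TABLE emailUnique RENAME TO emailTable")
conn.execute("DROP TABLE emailTable_old")
conn.commit()

kept = conn.execute("SELECT id, email FROM emailTable ORDER BY id").fetchall()
print(kept)
```

A side effect of this approach is that the new emailTable keeps the unique index on email, so later inserts of an existing address are rejected instead of creating fresh duplicates.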

    CAVEAT: I answered a similar question about deleting from a table using a temp table approach.

    Give it a try!!!

    • 4
  3. Delux
    2011-09-17T06:18:28+08:00

    Here is a really fast solution from Itzik. It works on SQL Server 2005 and later.

    WITH Dups AS
    (
      SELECT *,
        ROW_NUMBER()
          OVER(PARTITION BY email ORDER BY id) AS rn
      FROM dbo.emailTable
    )
    DELETE FROM Dups
    WHERE rn > 1;
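
Note that the statement above is T-SQL, while the question is about MySQL. The MySQL versions available in 2011 had no window functions; MySQL 8.0 added `ROW_NUMBER()`, but as far as I know it still does not let you delete through a CTE name this way. The same ranking idea can be kept by moving the numbered rows into a subquery and deleting from the base table by id. The sketch below is only an approximation of that idea: SQLite 3.25 or later (via Python's standard `sqlite3` module) stands in for the server, the schema name `dbo` is dropped, and the `example.com` rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emailTable (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO emailTable (id, email) VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "b@example.com"),
        (3, "a@example.com"),
        (4, "a@example.com"),
        (5, "b@example.com"),
    ],
)

# Number the rows within each email by ascending id, then delete every row
# numbered 2 or higher. The ranking lives in a subquery because the DELETE
# targets the base table rather than a CTE.
cur = conn.execute(
    """
    DELETE FROM emailTable
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
            FROM emailTable
        ) AS ranked
        WHERE rn > 1
    )
    """
)
conn.commit()

deleted = cur.rowcount
survivors = conn.execute("SELECT id, email FROM emailTable ORDER BY id").fetchall()
print(deleted, survivors)
```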
    
    • 1
