Questions asked by Chloe - dba

Chloe

Asked: 2020-01-21 18:43:17 +0800 CST

Why doesn't MySQL use an index for the subquery?

0

This query takes forever to run (30+ minutes, possibly never finishing).

select date, 
       sc, 
       ( select count(fingerprint_id) 
         from stats 
         where hit_date >= t.date 
           and hit_date < date_add('2020-01-20', interval 1 day) 
           and hit_type = 0 
           and fingerprint_id is not null ) as total_fingerprint
from ( select date(hit_date) as date, 
              sum(sc) as sc 
       from delayed_stats  
       where hit_date > date_sub(now(), interval 1 day) 
       group by date(hit_date) 
       order by hit_date) t;

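Run on their own, the two queries take 1 second and 8 seconds, but combined they never finish. I expected 8-9 seconds. If I replace t.date with the static '2020-01-20', it takes 8 seconds. Simply swapping that one static date for t.date makes the query "hang". The minimal query that reproduces the hang is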

select date, 
       (select count(fingerprint_id) from stats where hit_date >= t.date and hit_date < date_add(t.date, interval 1 day) and hit_type = 0 and fingerprint_id is not null) as total_fingerprint
from (select '2020-01-01' as date union select '2020-01-02' as date) t;

Here is the EXPLAIN output for the query:

+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
| id | select_type        | table         | partitions                                                                                                                                | type  | possible_keys | key          | key_len | ref  | rows      | filtered | Extra                                                  |
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
|  1 | PRIMARY            | <derived3>    | NULL                                                                                                                                       | ALL   | NULL          | NULL         | NULL    | NULL |      7496 |   100.00 | NULL                                                   |
|  3 | DERIVED            | delayed_stats | NULL                                                                                                                                       | range | hit_date_idx  | hit_date_idx | 5       | NULL |      7496 |   100.00 | Using index condition; Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | stats         | p20180101,p20180201,p20180301,p20180401,p20180501,p20180601,p20180701,p20180801,p20180901,p20181001,p20181101,p20181201,p20190101,p20190201,p20190301,p20190401,p20190501,p20190601,p20190701,p20190801,p20190901,p20191001,p20191101,p20191201,p20200101,p20200201 | ALL   | NULL          | NULL         | NULL    | NULL | 316867000 |     1.00 | Using where                                            |
+----+--------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+--------------+---------+------+-----------+----------+--------------------------------------------------------+
3 rows in set, 2 warnings (0.11 sec)

It doesn't seem to use the hit_date index (PRIMARY KEY (id, hit_date)) on the stats table for the subquery. My ultimate goal is to combine these two queries (with interval 30 day):

select date(hit_date), 
       sum(sc) 
from delayed_stats 
where hit_date > date_sub(now(), interval 30 day) 
group by date(hit_date) 
order by hit_date;

select date(hit_date), 
       count(fingerprint_id) 
from stats 
where hit_date > date_sub(now(), interval 30 day) 
  and hit_type = 0 
  and fingerprint_id is not null 
group by date(hit_date) 
order by hit_date; -- 2m21s

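When I look at the query plan for the second query, on the stats table, it shows possible_keys as PRIMARY,source_id,stats_bag_id_idx. I tried another way of combining them, a join, but it takes 15m to run when it should only take 2m.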

select t.date, 
       sc, 
       fingerprint_count 
from ( select date(hit_date) date, 
              sum(sc) as sc 
       from delayed_stats 
       where hit_date > date_sub(now(), interval 30 day) 
       group by date(hit_date) 
       order by hit_date ) t 
join ( select date(hit_date) date, 
              count(fingerprint_id) as fingerprint_count 
       from stats 
       where hit_date > date_sub(now(), interval 30 day) 
         and hit_type = 0 
         and fingerprint_id is not null 
       group by date(hit_date) 
       order by hit_date ) t2 on t.date = t2.date;

Chloe

Asked: 2019-09-24 20:46:53 +0800 CST

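Is it faster to split a large table into 12 rolling monthly tables and use them for reporting, or to keep the large table and delete rows older than 1 year?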

0

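My colleague wants to split a large 158M-row stats table into stats_jan, stats_feb, ... and use UNION to select from them for reports. Is this standard practice? Is it faster than just using the large table and deleting rows older than a year? The table has many small rows.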

mysql> describe stats;
+----------------+---------------------+------+-----+---------+----------------+
| Field          | Type                | Null | Key | Default | Extra          |
+----------------+---------------------+------+-----+---------+----------------+
| id             | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| badge_id       | bigint(20) unsigned | NO   | MUL | NULL    |                |
| hit_date       | datetime            | YES  | MUL | NULL    |                |
| hit_type       | tinyint(4)          | YES  |     | NULL    |                |
| source_id      | bigint(20) unsigned | YES  | MUL | NULL    |                |
| fingerprint_id | bigint(20) unsigned | YES  |     | NULL    |                |
+----------------+---------------------+------+-----+---------+----------------+

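I did split the table manually, copied the rows into the appropriate month tables, and created a huge UNION query. The large UNION query took 14s, while the single-table query took 4.5m. Why do many smaller tables take so much less time than one large table when the total number of rows is the same?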

create table stats_jan (...);
create table stats_feb (...);
...
create index stats_jan_hit_date_idx on stats_jan (hit_date);
...
insert into stats_jan select * from stats where hit_date >= '2019-01-01' and hit_date < '2019-02-01';
...
delete from stats where hit_date < '2018-09-01';
...

The month tables have between 1.7 million and 35 million rows.

select host as `key`, count(*) as value from stats join sources on source_id = sources.id where hit_date >= '2019-08-21 19:43:19' and sources.host != 'NONE' group by source_id order by value desc limit 10;
4 min 30.39 sec

flush tables;
reset query cache;

select host as `key`, count(*) as value from stats_jan join sources on source_id = sources.id where hit_date >= '2019-08-21 19:43:19' and sources.host != 'NONE' group by source_id
UNION
...
order by value desc limit 10;
14.16 sec

Chloe

Asked: 2018-12-13 11:40:19 +0800 CST

How can I speed up this 2m5s query that has indexes?

0

How can I speed up this 2m5s query that has indexes?

select urls.id as urlId, 
    count(case when s1.hit_type = 0 then 1 end) as aCount, 
    count(case when s1.hit_type = 1 then 1 end) as bCount, 
    count(case when s1.hit_type = 2 then 1 end) as cCount, 
    count(distinct s1.source_id) as sourcesCount 
from urls join stats s1 on urls.id = s1.url_id 
where s1.hit_date >= '2017-12-12' 
group by urls.id 
order by aCount desc 
limit 0,100;

mysql> show create table stats;

| stats | CREATE TABLE `stats` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `url_id` varchar(100) DEFAULT NULL,
  `hit_date` datetime DEFAULT NULL,
  `hit_type` tinyint(4) DEFAULT NULL,
  `source_id` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `url_id_idx` (`url_id`),
  KEY `source_id` (`source_id`),
  KEY `stats_hit_date_idx` (`hit_date`),
  CONSTRAINT `stats_ibfk_1` FOREIGN KEY (`url_id`) REFERENCES `urls` (`ID`),
  CONSTRAINT `stats_ibfk_2` FOREIGN KEY (`source_id`) REFERENCES `sources` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6027557 DEFAULT CHARSET=latin1 |

mysql> describe select...
| id | select_type | table   | type   | possible_keys                                                                                   | key     | key_len | ref                      | rows    | Extra                                        |
+----+-------------+---------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | s1      | ALL    | url_id_idx,stats_hit_date_idx                                                                   | NULL    | NULL    | NULL                     | 5869695 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | urls    | eq_ref | PRIMARY,urls_email_idx,urls_status_idx,deptId_idx,deptId_status_email_idx                       | PRIMARY | 102     | db.s1.url_id             |     1   | Using index                                  |

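It doesn't seem to use the hit_date index or the url_id index.

I tried using sub-selects, (select count(*) from stats where url_id = ... and hit_date >= ... and hit_type = 0) as aCount, which was faster at 24 seconds. Is there a way to get it under 5s? The limit for the entire request is 30 seconds.

MySQL server version: 5.6.35-log MySQL Community Server (GPL)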

Chloe

Asked: 2018-05-17 14:54:08 +0800 CST

Is it possible to combine these three join queries into one?

2

I want to combine the stat counts for the different hit_types into a single query. Is that possible?

MariaDB [db]> select allurls.id, count(s1.id) from allurls inner join stats s1 on allurls.id = s1.allurl_id and s1.hit_type = 0 where s1.hit_date >= '2018-01-15'  group by allurls.id;
+-----+--------------+
| id  | count(s1.id) |
+-----+--------------+
| aaa |            1 |
| cnn |           16 |
+-----+--------------+

MariaDB [db]> select allurls.id, count(s1.id) from allurls inner join stats s1 on allurls.id = s1.allurl_id and s1.hit_type = 1 where s1.hit_date >= '2018-01-15'  group by allurls.id;
+-----+--------------+
| id  | count(s1.id) |
+-----+--------------+
| cnn |            1 |
+-----+--------------+

MariaDB [db]> select allurls.id, count(s1.id) from allurls inner join stats s1 on allurls.id = s1.allurl_id and s1.hit_type = 2 where s1.hit_date >= '2018-01-15'  group by allurls.id;
+-----+--------------+
| id  | count(s1.id) |
+-----+--------------+
| cnn |            4 |
+-----+--------------+

I tried to combine the first two, but the numbers came out all wrong, and the first result, 'aaa', was eliminated.

MariaDB [db]> select allurls.id, count(s1.id), count(s2.id) from allurls inner join stats s1 on allurls.id = s1.allurl_id and s1.hit_type = 0 inner join stats s2 on allurls.id = s2.allurl_id and s2.hit_type = 1 where s1.hit_date >= '2018-01-15' and s2.hit_date >= '2018-01-15' group by allurls.id;
+-----+--------------+--------------+
| id  | count(s1.id) | count(s2.id) |
+-----+--------------+--------------+
| cnn |           16 |           16 |
+-----+--------------+--------------+

I expected to see

+-----+--------------+--------------+
| id  | count(s1.id) | count(s2.id) |
+-----+--------------+--------------+
| aaa |            1 |            0 |
| cnn |           16 |            1 |
+-----+--------------+--------------+

Eventually I also want to include count(distinct(s4.source_id)).

Here is a fiddle: https://www.db-fiddle.com/f/tGP5SbC2AdGgeEwWTAgobf/0
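
One way the three counts might be folded into a single pass is conditional aggregation: join stats once and let each count() look only at the rows of one hit_type. The sketch below is an illustration, not a tested answer for MariaDB. It uses Python's built-in sqlite3 module as a stand-in database, and the rows (including the source_id values) are made up to reproduce the counts shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table allurls (id text primary key);
    create table stats (
        id integer primary key,
        allurl_id text,
        hit_type integer,
        hit_date text,
        source_id integer
    );
    insert into allurls values ('aaa'), ('cnn');
""")

# Hypothetical rows: (allurl_id, hit_type, source_id), shaped to match
# the per-type counts in the question (aaa: 1/0/0, cnn: 16/1/4).
rows = [("aaa", 0, 1)]
rows += [("cnn", 0, n % 3) for n in range(16)]
rows += [("cnn", 1, 7)]
rows += [("cnn", 2, 8)] * 4
conn.executemany(
    "insert into stats (allurl_id, hit_type, hit_date, source_id) "
    "values (?, ?, '2018-02-01', ?)",
    rows,
)

# A single join; each count() ignores the NULLs produced by a CASE
# that does not match, so every hit_type gets its own column.
query = """
    select a.id,
           count(case when s.hit_type = 0 then 1 end) as type0,
           count(case when s.hit_type = 1 then 1 end) as type1,
           count(case when s.hit_type = 2 then 1 end) as type2,
           count(distinct s.source_id) as sources
    from allurls a
    join stats s on a.id = s.allurl_id
    where s.hit_date >= '2018-01-15'
    group by a.id
    order by a.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because stats is joined only once, 'aaa' is kept even though it has no rows of hit_type 1, and the counts are not multiplied by each other the way they are in the double self-join.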

Chloe

Asked: 2016-09-06 14:11:34 +0800 CST

How did this statement update 3 rows in Postgres?

4

I'm curious how this statement updated 3 rows in Postgres. Every other time I ran it, it updated 0 or 1. Is there a way to find out which rows?

bestsales=# update keyword set revenue = random()*10 where id = cast(random()*99999 as int);
UPDATE 3

id is the primary key.

 id               | integer                        | not null default nextval('keyword_id_seq'::regclass)
    "keyword_pkey" PRIMARY KEY, btree (id)

I tried running it as a SELECT:

bestsales=# select * from keyword where id = cast(random()*99999 as int);
  id   |       keyword       | seed_id | source | search_count | country | language | volume | cpc  | competition | modified_on | google_violation | revenue | bing_violation
-------+---------------------+---------+--------+--------------+---------+----------+--------+------+-------------+-------------+------------------+---------+----------------
  6833 | vizio m190mv        |         | GOOGLE |            0 |         |          |     70 | 0.38 |        0.90 |             |                  |         |
 65765 | shiatsu massage mat |         | SPYFU  |            0 |         |          |    110 | 0.69 |             |             |                  |         |
 87998 | granary flour       |         | SPYFU  |            0 |         |          |     40 | 0.04 |             |             |                  |         |
(3 rows)

Sometimes it returns more than one row. How is that possible?

PostgreSQL 9.5.3
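
A likely explanation is that random() is a volatile function, so the WHERE clause is evaluated again for every row and each row is compared against its own freshly drawn number. The following pure-Python sketch only simulates that behaviour. The function names are hypothetical and the ids are assumed to run from 1 to 99999.

```python
import random


def match_per_row(ids, rng):
    """Draw a new target for every row, as a volatile function in a
    WHERE clause would; any number of rows can match."""
    return [i for i in ids if i == round(rng.random() * 99999)]


def match_once(ids, rng):
    """Draw the target a single time and compare all rows against it;
    at most one row can match because ids are unique."""
    target = round(rng.random() * 99999)
    return [i for i in ids if i == target]


ids = range(1, 100000)  # assumed id range, for illustration only

per_row = match_per_row(ids, random.Random())
once = match_once(ids, random.Random())

print("per-row evaluation matched", len(per_row), "rows:", per_row)
print("single evaluation matched", len(once), "rows:", once)
```

With roughly a 1-in-100000 chance per row over about 100000 rows, the per-row version matches about one row on average, so runs with 0, 1, or occasionally 3 matches are all plausible. Drawing the value once, for example in a subquery that is evaluated a single time, should limit the match to at most one row.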

Chloe

Asked: 2016-08-24 11:07:33 +0800 CST

How does this unique index allow duplicate rows?

1

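Is there some way this unique index could allow duplicate rows? I thought there might be some extra whitespace characters, but I can't find them.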

=> select *, length(keyword), length(country), length(language) from keyword where id in (4588076, 4951423);
   id    |       keyword       | seed_id | source | search_count | country | language | volume | cpc  | competition | modified_on | violation | revenue | length | length | length
---------+---------------------+---------+--------+--------------+---------+----------+--------+------+-------------+-------------+-----------+---------+--------+--------+--------
 4588076 | power wallet review |         | SPYFU  |            0 |         |          |     70 | 0.11 |        0.31 |             |           |         |     19 |        |
 4951423 | power wallet review |         | SPYFU  |            2 |         |          |     70 | 0.11 |        0.31 |             |           |         |     19 |        |
(2 rows)

The index is

"keyword_keyword_country_language" UNIQUE, btree (keyword, country, language)

PostgreSQL 9.5.3

OK, I was going to remove the other two columns, but I thought I would test the keyword column first, and found:

=> select k1.id, k1.keyword, k2.id, k2.keyword, k1.keyword=k2.keyword from keyword k1, keyword k2 where k1.id=4588076 and k2.id=4951423;
   id    |       keyword       |   id    |       keyword       | ?column?
---------+---------------------+---------+---------------------+----------
 4588076 | power wallet review | 4951423 | power wallet review | f
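
Two things could be at play here, and the sketch below illustrates both under stated assumptions. First, the blank length(country) and length(language) values suggest those columns are NULL, and a unique index treats NULLs as distinct from one another, so it would not reject the second row anyway. Second, the `f` result means the two keywords differ in some character that renders the same. The code uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the non-breaking space is a made-up example of such a character, not something confirmed from the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table keyword ("
    "id integer primary key, keyword text, country text, language text)"
)
conn.execute(
    "create unique index keyword_keyword_country_language "
    "on keyword (keyword, country, language)"
)

# Same keyword with NULL country and language: both inserts succeed,
# because NULLs in a unique index are not considered equal.
conn.execute("insert into keyword (keyword) values ('power wallet review')")
conn.execute("insert into keyword (keyword) values ('power wallet review')")
null_dupes = conn.execute("select count(*) from keyword").fetchone()[0]

# With all three columns non-NULL, the duplicate is rejected.
insert_full = (
    "insert into keyword (keyword, country, language) "
    "values ('power wallet review', 'US', 'en')"
)
conn.execute(insert_full)
try:
    conn.execute(insert_full)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Hypothetical look-alike strings: same length, one invisible difference.
a = "power wallet review"
b = "power\u00a0wallet review"  # assumed non-breaking space after "power"
diff = [
    (i, hex(ord(x)), hex(ord(y)))
    for i, (x, y) in enumerate(zip(a, b))
    if x != y
]

print(null_dupes, rejected, len(a), len(b), diff)
```

Comparing the strings code point by code point, as in the last step, is one way to locate a difference that length() and a visual check both miss.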
