I have a table my_link_t with a column t.weight of type number and a column speedcat holding integer category values from 2 to 8. I want to split the data into buckets (min=0, max=0.6, step=0.001) and build a 3D plot to see the weight distribution for each category.
The initial data looks like:

weight   speedcat
0.0234   2
0.8643   6
0.1854   7

(plus about a hundred million more entries with weight between 0 and 0.6 and speedcat between 2 and 8)
These queries return correct results and complete in under a minute:
--repeat for each variable. Here we look for speedcat = 8
--It takes seconds to run this query
create table histogram_tbl_8 as (
select ttt."Start" as bucket_index, ttt.hist_row as bin8 --here
FROM ((
SELECT Bucket*1 "Start" , Bucket "End", Count(Bucket) hist_row
FROM (SELECT WIDTH_BUCKET (weight, 0, 0.6, 601) Bucket FROM my_link_t where speedcat=8)
GROUP BY Bucket ORDER BY Bucket ) ttt ) );
Repeat the above query seven times, once for each speedcat in range 2..8 (a loop sketch follows below).
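To avoid copy-pasting the statement seven times, the seven tables could also be generated with a small PL/SQL loop and dynamic SQL. This is only a sketch, assuming the naming pattern histogram_tbl_N / binN from above is kept and the tables do not yet exist:

-- Hypothetical helper: builds histogram_tbl_2 .. histogram_tbl_8 in one pass.
BEGIN
  FOR cat IN 2 .. 8 LOOP
    EXECUTE IMMEDIATE
      'CREATE TABLE histogram_tbl_' || cat || ' AS
       SELECT Bucket AS bucket_index, COUNT(Bucket) AS bin' || cat || '
       FROM (SELECT WIDTH_BUCKET(weight, 0, 0.6, 601) Bucket
             FROM my_link_t WHERE speedcat = ' || cat || ')
       GROUP BY Bucket';
  END LOOP;
END;
/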
--if a bin is empty populate it with zero, don't skip it.
create table histogram_output as (
select tr.bucket_index,
CASE
WHEN 1 > (select count(*) from histogram_tbl_2 htm where htm.bucket_index = tr.bucket_index) THEN 0
ELSE (select htm.bin2 from histogram_tbl_2 htm where htm.bucket_index = tr.bucket_index and rownum = 1)
END
as b2,
--same for b3-b7
CASE
WHEN 1 > (select count(*) from histogram_tbl_8 htm where htm.bucket_index = tr.bucket_index) THEN 0
ELSE (select htm.bin8 from histogram_tbl_8 htm where htm.bucket_index = tr.bucket_index and rownum = 1)
END
as b8
FROM (SELECT LEVEL as bucket_index, 0 as b2, /* 0 as b3, 0 as b4, 0 as b5, 0 as b6, 0 as b7, */ 0 as b8 FROM DUAL CONNECT BY LEVEL < 600) tr
)
Finally:
-- the per-category totals, used below as normalization denominators
select sum(b2), sum(b3), sum(b4), sum(b5), sum(b6), sum(b7), sum(b8) from histogram_output
select bucket_index,
round(b2 * 1000000 / 12921) as b2, --normalize so that total is 1000000 ppm
-- repeat for b3-b7
round(b8 * 1000000 / 6262) as b8 --normalize so that total is 1000000 ppm
from histogram_output
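As a side note, the hard-coded totals (12921, 6262, ...) could instead be derived in the same statement with analytic sums; a minimal sketch, assuming histogram_output has the columns shown above:

-- Sketch: compute each denominator from the data instead of hard-coding it.
select bucket_index,
       round(b2 * 1000000 / sum(b2) over ()) as b2,
       -- repeat for b3-b7
       round(b8 * 1000000 / sum(b8) over ()) as b8
from histogram_output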
I get a table like this:
bin_end  speedcat_2  speedcat_3  speedcat_4  ..  speedcat_8
0.001
0.002
 ..
0.599
0.600
showing the ppm of objects in that category and that bin. Now, when I combine the queries into one:
-- DON'T USE THE EXAMPLE BELOW - it is inefficient (runs 2+ hours instead of seconds for the method above)
SELECT Bucket_2*1 "Start" , Bucket_2 "End",
Count(Bucket_2) as b2,
--same for b3 .. b7
Count(Bucket_8) as b8
FROM
(
SELECT WIDTH_BUCKET (t2.weight, 0, 0.6, 601) Bucket_2,
--same for t3,.. t7
WIDTH_BUCKET (t8.weight, 0, 0.6, 601) Bucket_8
FROM (select weight from my_link_t where speedcat = 2) t2,
-- ..speedcat = 3) t3, .. speedcat = 4) t4, etc
(select weight from my_link_t where speedcat = 8 ) t8
)
GROUP BY Bucket_2 ORDER BY Bucket_2
------
the query runs for hours (roughly 500 times longer than a single query) until I kill it. Books recommend doing all data slicing in SQL; this example suggests that for complex queries it may be better to load the data into Java and slice it there.

What causes the difference?
Short answer:
Your 7-way Cartesian JOIN is going to have some serious performance problems: the seven inline views in the FROM clause have no join condition, so their row counts multiply before anything is bucketed or grouped.

Long answer:
Think in sets.
I assume the data set you need contains: speedcat, bucket_index, count(*). A single set-based query produces that data set directly (see the sketch below).
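A minimal sketch of what that query can look like, assuming the same WIDTH_BUCKET parameters as in the question:

-- One pass over the table: bucket and count per speedcat in a single GROUP BY.
SELECT speedcat,
       WIDTH_BUCKET(weight, 0, 0.6, 601) AS bucket_index,
       COUNT(*) AS cnt
FROM   my_link_t
GROUP BY speedcat, WIDTH_BUCKET(weight, 0, 0.6, 601)
ORDER BY speedcat, bucket_index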
That (x, y, z) format is what most graphing packages expect.
If you want the result in a "grid" format instead, PIVOT the result (a sketch follows below).
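A minimal sketch of such a pivot, assuming Oracle 11g or later (where the PIVOT clause is available):

-- Sketch: one row per bucket_index, one count column per speedcat category.
SELECT *
FROM (
  SELECT speedcat,
         WIDTH_BUCKET(weight, 0, 0.6, 601) AS bucket_index
  FROM   my_link_t
)
PIVOT (
  COUNT(*) FOR speedcat IN (2 AS b2, 3 AS b3, 4 AS b4, 5 AS b5, 6 AS b6, 7 AS b7, 8 AS b8)
)
ORDER BY bucket_index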