I have this data:
select * from (
select 'A' as JOB, 15 as errors from dual union all
select 'B' as JOB, 17 as errors from dual union all
select 'C' as JOB, 29 as errors from dual union all
select 'D' as JOB, 27 as errors from dual union all
select 'E' as JOB, 35 as errors from dual union all
select 'F' as JOB, 32 as errors from dual union all
select 'G' as JOB, 75 as errors from dual union all
select 'H' as JOB, 31 as errors from dual union all
select 'I' as JOB, 12 as errors from dual union all
select 'J' as JOB, 10 as errors from dual
)
In words, what I need is: the jobs constituting the (top) 60% of errors.
So in this case, the cut-off would be 113:
select sum(errors) * .4 as cut_off from ...
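The final result would be the following, because their sum (75 + 35 = 110) is below 113:
JOB | ERRORS |
---|---|
G | 75 |
E | 35 |
I basically need a filter that keeps some kind of running sum and discards everything once that value is reached.
I have this query, which doesn't quite work, and I would prefer not to use the with statement: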
with data as (
select 'A' as JOB, 15 as errors from dual union all
select 'B' as JOB, 17 as errors from dual union all
select 'C' as JOB, 29 as errors from dual union all
select 'D' as JOB, 27 as errors from dual union all
select 'E' as JOB, 35 as errors from dual union all
select 'F' as JOB, 32 as errors from dual union all
select 'G' as JOB, 75 as errors from dual union all
select 'H' as JOB, 31 as errors from dual union all
select 'I' as JOB, 12 as errors from dual union all
select 'J' as JOB, 10 as errors from dual
)
select k.*
from (
select t.*,
errors + LAG(errors, 1, 0) OVER (order by errors desc ) previous
from data t
) k where previous >= (select sum(errors) *.4 from data) order by errors desc
I have also tried a windowed sum:
select k.*
from (
select t.*,
SUM(errors) OVER (
partition by JOB
order by errors desc
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) as limit
from (
select 'A' as JOB, 15 as errors from dual union all
select 'B' as JOB, 17 as errors from dual union all
select 'C' as JOB, 29 as errors from dual union all
select 'D' as JOB, 27 as errors from dual union all
select 'E' as JOB, 35 as errors from dual union all
select 'F' as JOB, 32 as errors from dual union all
select 'G' as JOB, 75 as errors from dual union all
select 'H' as JOB, 31 as errors from dual union all
select 'I' as JOB, 12 as errors from dual union all
select 'J' as JOB, 10 as errors from dual
) t
) k order by errors desc
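SUM(errors) OVER (ORDER BY errors DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) computes the running total of errors, largest first.
WHERE cum_errors <= cut_off keeps only the jobs whose running total does not exceed 40% of the overall total.
Output:
Fiddle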
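A minimal sketch of how those two pieces might fit into one query, for reference. The aliases cum_errors and cut_off come from the description above; the extra tie-breaker on job, the SUM(errors) OVER () expression for the cut-off, and the inline sample data are assumptions added here for illustration, not necessarily what the original query used. It avoids the with clause and reads the data only once.

```sql
-- Sketch only (Oracle syntax). cum_errors and cut_off are computed in an
-- inline view so the outer query can filter on them.
select job, errors
from (
  select t.job,
         t.errors,
         -- running total, largest error counts first;
         -- job is added as a tie-breaker (assumption) so the order is deterministic
         sum(t.errors) over (order by t.errors desc, t.job
                             rows between unbounded preceding and current row) as cum_errors,
         -- 40% of the grand total, repeated on every row
         sum(t.errors) over () * 0.4 as cut_off
  from (
    select 'A' as job, 15 as errors from dual union all
    select 'B' as job, 17 as errors from dual union all
    select 'C' as job, 29 as errors from dual union all
    select 'D' as job, 27 as errors from dual union all
    select 'E' as job, 35 as errors from dual union all
    select 'F' as job, 32 as errors from dual union all
    select 'G' as job, 75 as errors from dual union all
    select 'H' as job, 31 as errors from dual union all
    select 'I' as job, 12 as errors from dual union all
    select 'J' as job, 10 as errors from dual
  ) t
)
where cum_errors <= cut_off
order by errors desc;
```

With the sample data the total is 283, so cut_off is 113.2; the running totals are 75 (G), 110 (E), 142 (F), and so on, which leaves G and E. Note that there is no partition by JOB here: partitioning by JOB, as in the question's attempt, restarts the sum for every job, so each "running" total is just that job's own error count.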
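Another approach, using a subquery and a self-join.
Total errors:
This CTE computes the total number of errors in the jobs_errors table. It simply adds up all the errors in the table.
Running sum:
This CTE computes the running total of errors in descending order. For each job, it sums the errors of every job whose error count is greater than or equal to the current job's (where x.errors >= t.errors). It also retrieves total_errors from total_errors_cte, so that the running total can be compared with 40% of the total errors.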
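The two CTEs described above could look roughly like the sketch below. The names jobs_errors, total_errors_cte, total_errors, x and t are taken from the description; the name running_sum_cte, the running_sum column, and the job and errors column names are assumptions, and jobs_errors is defined here as an inline CTE only so the example runs without creating a table.

```sql
-- Sketch only (Oracle syntax).
with jobs_errors as (
  -- stand-in for the real jobs_errors table (assumption)
  select 'A' as job, 15 as errors from dual union all
  select 'B' as job, 17 as errors from dual union all
  select 'C' as job, 29 as errors from dual union all
  select 'D' as job, 27 as errors from dual union all
  select 'E' as job, 35 as errors from dual union all
  select 'F' as job, 32 as errors from dual union all
  select 'G' as job, 75 as errors from dual union all
  select 'H' as job, 31 as errors from dual union all
  select 'I' as job, 12 as errors from dual union all
  select 'J' as job, 10 as errors from dual
),
total_errors_cte as (
  -- grand total of all errors
  select sum(errors) as total_errors
  from jobs_errors
),
running_sum_cte as (  -- hypothetical name
  select t.job,
         t.errors,
         -- sum over every job with at least as many errors as the current one
         (select sum(x.errors)
          from jobs_errors x
          where x.errors >= t.errors) as running_sum,
         c.total_errors
  from jobs_errors t
  cross join total_errors_cte c
)
select job, errors
from running_sum_cte
where running_sum <= total_errors * 0.4
order by errors desc;
```

One behavioural difference from a ROWS-based window is worth keeping in mind: because the comparison is x.errors >= t.errors, jobs with the same error count all get the same running sum, so tied jobs are kept or dropped together. The sample data has no ties, and this version also returns G and E.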
Output:
Fiddle