AskOverflow.Dev

Christian Bongiorno
Asked: 2024-11-23 06:01:51 +0800 CST

In Oracle: how to get only the records that sum to more than x% of the total

I have this data:

select * from (
    select 'A' as JOB, 15 as errors from dual union all
    select 'B' as JOB, 17 as errors from dual union all
    select 'C' as JOB, 29 as errors from dual union all
    select 'D' as JOB, 27 as errors from dual union all
    select 'E' as JOB, 35 as errors from dual union all
    select 'F' as JOB, 32 as errors from dual union all
    select 'G' as JOB, 75 as errors from dual union all
    select 'H' as JOB, 31 as errors from dual union all
    select 'I' as JOB, 12 as errors from dual union all
    select 'J' as JOB, 10 as errors from dual
)

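In words, I need: the jobs constituting the (top) 60% of errors.

So, in this case, the cut-off would be 113:

select sum(errors) * .4 as cut_off from ...

The final result would be the following, because these rows sum to less than 113:

JOB ERRORS
G   75
E   35

I basically need a filter that keeps some kind of running sum and discards everything once that value is reached.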
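
The filter described above can be sketched outside SQL to make the intended logic concrete. This is only an illustration in Python, not Oracle code; the function name `top_jobs` and the `fraction` parameter are hypothetical, and the 0.4 factor is taken from the cut-off query above.

```python
from itertools import accumulate

# Sample data from the question: (job, errors).
jobs = [("A", 15), ("B", 17), ("C", 29), ("D", 27), ("E", 35),
        ("F", 32), ("G", 75), ("H", 31), ("I", 12), ("J", 10)]


def top_jobs(rows, fraction=0.4):
    """Keep the largest jobs while their running total stays within the cut-off."""
    cut_off = sum(errors for _, errors in rows) * fraction
    # Largest error counts first, mirroring ORDER BY errors DESC.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    running = accumulate(errors for _, errors in ordered)
    kept = []
    for row, total in zip(ordered, running):
        if total > cut_off:
            break  # the running sum passed the cut-off; discard the rest
        kept.append(row)
    return kept


print(top_jobs(jobs))
```

With the sample data the total is 283, so the cut-off is about 113.2, and only G (75) and E (running total 110) are kept.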

I have this query, which doesn't quite work, and I would prefer not to use the with statement:

with data as (
    select 'A' as JOB, 15 as errors from dual union all
    select 'B' as JOB, 17 as errors from dual union all
    select 'C' as JOB, 29 as errors from dual union all
    select 'D' as JOB, 27 as errors from dual union all
    select 'E' as JOB, 35 as errors from dual union all
    select 'F' as JOB, 32 as errors from dual union all
    select 'G' as JOB, 75 as errors from dual union all
    select 'H' as JOB, 31 as errors from dual union all
    select 'I' as JOB, 12 as errors from dual union all
    select 'J' as JOB, 10 as errors from dual
)
select k.*
from (
    select t.*,
           errors + LAG(errors, 1, 0) OVER (order by errors desc ) previous
    from data t
) k where previous >= (select sum(errors) *.4 from data) order by errors desc

I have also tried a windowed sum:

select k.*
from (
    select t.*,
           SUM(errors) OVER (
               partition by JOB
               order by errors desc
               RANGE BETWEEN UNBOUNDED PRECEDING
                AND CURRENT ROW
          ) as limit
    from (
        select 'A' as JOB, 15 as errors from dual union all
        select 'B' as JOB, 17 as errors from dual union all
        select 'C' as JOB, 29 as errors from dual union all
        select 'D' as JOB, 27 as errors from dual union all
        select 'E' as JOB, 35 as errors from dual union all
        select 'F' as JOB, 32 as errors from dual union all
        select 'G' as JOB, 75 as errors from dual union all
        select 'H' as JOB, 31 as errors from dual union all
        select 'I' as JOB, 12 as errors from dual union all
        select 'J' as JOB, 10 as errors from dual
    ) t
) k order by errors desc

Tags: sql

2 Answers

  1. Best Answer
    keithwalsh
    2024-11-23T06:27:33+08:00
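    • SUM(errors) OVER (ORDER BY errors DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) computes the running total of errors, taking the rows in descending order of errors.
    • WHERE cum_errors <= cut_off keeps only the jobs whose running total is at most 40% of the overall total.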
    SELECT job, errors
    FROM (
        SELECT job, errors,
            SUM(errors) OVER (ORDER BY errors DESC
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_errors,
            SUM(errors) OVER () * 0.4 AS cut_off
        FROM (
            SELECT 'A' AS job, 15 AS errors FROM DUAL UNION ALL
            SELECT 'B' AS job, 17 AS errors FROM DUAL UNION ALL
            SELECT 'C' AS job, 29 AS errors FROM DUAL UNION ALL
            SELECT 'D' AS job, 27 AS errors FROM DUAL UNION ALL
            SELECT 'E' AS job, 35 AS errors FROM DUAL UNION ALL
            SELECT 'F' AS job, 32 AS errors FROM DUAL UNION ALL
            SELECT 'G' AS job, 75 AS errors FROM DUAL UNION ALL
            SELECT 'H' AS job, 31 AS errors FROM DUAL UNION ALL
            SELECT 'I' AS job, 12 AS errors FROM DUAL UNION ALL
            SELECT 'J' AS job, 10 AS errors FROM DUAL
        ) t
    )
    WHERE cum_errors <= cut_off
    ORDER BY errors DESC;
    

    Output:

    JOB ERRORS
    G   75
    E   35
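
To see why only G and E pass the filter, it can help to look at the two intermediate columns for every row. The sketch below runs the inner part of the query above through Python's sqlite3 module; SQLite (3.25 or later, for window functions) stands in for Oracle here, and the table name `jobs` and its column types are assumptions made only to keep the example self-contained.

```python
import sqlite3

rows = [("A", 15), ("B", 17), ("C", 29), ("D", 27), ("E", 35),
        ("F", 32), ("G", 75), ("H", 31), ("I", 12), ("J", 10)]

conn = sqlite3.connect(":memory:")
# Hypothetical table replacing the inline "FROM DUAL" rows of the answer.
conn.execute("CREATE TABLE jobs (job TEXT, errors INTEGER)")
conn.executemany("INSERT INTO jobs (job, errors) VALUES (?, ?)", rows)

# Same window expressions as the answer, without the outer WHERE filter.
trace = conn.execute("""
    SELECT job, errors,
           SUM(errors) OVER (ORDER BY errors DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_errors,
           SUM(errors) OVER () * 0.4 AS cut_off
    FROM jobs
    ORDER BY errors DESC
""").fetchall()

for job, errors, cum_errors, cut_off in trace:
    # The last column shows whether the row would survive the WHERE clause.
    print(job, errors, cum_errors, round(cut_off, 1), cum_errors <= cut_off)
```

The running total goes 75, 110, 142, and so on, against a cut-off of about 113.2, so the third row (F) is the first one to be rejected.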

  2. samhita
    2024-11-23T07:58:40+08:00

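    Another approach, using subqueries and a self-join.

    Total errors:

    This CTE computes the total number of errors in the jobs_errors table. It simply adds up all the errors in the table.

    Running sum:

    This CTE computes the cumulative total of errors in descending order. For each job, it sums the errors of every job whose error count is greater than or equal to the current job's (WHERE x.errors >= t.errors). It also retrieves total_errors from total_errors_cte so that the running sum can be compared against 40% of the total errors.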

    WITH total_errors_cte AS (
        -- Calculate the total errors for all jobs
        SELECT SUM(errors) AS total_errors
        FROM jobs_errors
    ),
    running_sum_cte AS (
        -- Calculate the running sum of errors, ordered by errors DESC
        SELECT JOB, errors, 
               (SELECT total_errors FROM total_errors_cte) AS total_errors,
               (
                   SELECT SUM(errors)
                   FROM jobs_errors x
                   WHERE x.errors >= t.errors
               ) AS running_sum
        FROM jobs_errors t
    )
    -- Filter jobs whose running sum is <= 40% of total errors
    SELECT JOB, errors
    FROM running_sum_cte
    WHERE running_sum <= total_errors * 0.4
    ORDER BY errors DESC;
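
The query above reads from a jobs_errors table whose definition is not shown here. As a quick sanity check, the sketch below loads the question's sample rows into an in-memory SQLite database and runs the same query through Python's sqlite3 module. SQLite stands in for Oracle, and the column types are assumptions; only the names JOB and errors come from the answer.

```python
import sqlite3

rows = [("A", 15), ("B", 17), ("C", 29), ("D", 27), ("E", 35),
        ("F", 32), ("G", 75), ("H", 31), ("I", 12), ("J", 10)]

conn = sqlite3.connect(":memory:")
# Assumed schema: the answer only shows the column names JOB and errors.
conn.execute("CREATE TABLE jobs_errors (JOB TEXT, errors INTEGER)")
conn.executemany("INSERT INTO jobs_errors (JOB, errors) VALUES (?, ?)", rows)

query = """
WITH total_errors_cte AS (
    SELECT SUM(errors) AS total_errors
    FROM jobs_errors
),
running_sum_cte AS (
    SELECT JOB, errors,
           (SELECT total_errors FROM total_errors_cte) AS total_errors,
           (
               SELECT SUM(errors)
               FROM jobs_errors x
               WHERE x.errors >= t.errors
           ) AS running_sum
    FROM jobs_errors t
)
SELECT JOB, errors
FROM running_sum_cte
WHERE running_sum <= total_errors * 0.4
ORDER BY errors DESC
"""

result = conn.execute(query).fetchall()
print(result)
```

Because the subquery compares with >=, jobs that share the same error count also share the same running sum, so tied jobs are kept or dropped together.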
    

