
Question

irimias

Asked: 2018-07-26 03:54:37 +0800 CST

Evaluating the most common column values in a single query


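I have a table describing my application's users, with details such as name, surname, date of birth, nationality, email, and so on.

For each attribute and each user category, I would like to know the most common value and the percentage of rows in which it occurs.

For example: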

create table test ( userId int identity(1,1), 
                    categoryId int, 
                    name varchar(50), 
                    surname varchar(50))

insert into test(categoryId, name, surname)
values  (1, 'John', 'Locke'),
        (1, 'John', 'Millar'),
        (1, 'James', 'Mill'),
        (1, 'John Stuart', 'Mill'),
        (2, 'Thomas', 'Bayes'),
        (2, 'Laurent', 'Schwartz'),
        (2, 'Herrmann Amandus', 'Schwartz'),
        (2, 'Thomas', 'Simpson'),
        (2, 'Leonhard', 'Euler')

The result should be:

+------------+-------+--------+---------+----------+------------+
| categoryId | total |  name  | namePct | surname  | surnamePct |
+------------+-------+--------+---------+----------+------------+
|          1 |     4 | John   |    0.50 | Mill     |       0.50 |
|          2 |     5 | Thomas |    0.40 | Schwartz |       0.40 |
+------------+-------+--------+---------+----------+------------+

For this simple example, I worked out how to get that result with the following query:

select  t.categoryId, 
        t.total, 
        n.name, 
        1. * n.total / t.total as namePct,
        sn.surname,
        1. * sn.total / t.total as surnamePct
from (
    select categoryId, count(*) as total
    from test
    group by categoryId
    ) t
join (
        select categoryId, name, total
        from (
            select categoryId, name, total, row_number() over(partition by categoryId order by total desc) as rn
            from (
                select categoryId, name, count(*) as total
                from test
                group by categoryId, name
                ) t
            ) t
        where rn = 1
        ) n on t.categoryId = n.categoryId
join (
        select categoryId, surname, total
        from (
            select categoryId, surname, total, row_number() over(partition by categoryId order by total desc) as rn
            from (
                select categoryId, surname, count(*) as total
                from test
                group by categoryId, surname
                ) t
            ) t
        where rn = 1
        ) sn on t.categoryId = sn.categoryId

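In my real use case, however, the table has millions of rows, hundreds of categories, and a dozen or so attributes.

Is there a way to make the query simpler and more efficient (that is, without a stack of subselects for every attribute)?

I am currently on SQL Server 2008, but answers that rely on newer versions are welcome.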

1 Answer


EzLo · Answer 1 · 2018-07-26T05:29:29+08:00

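You can use the windowed version of the COUNT() function, splitting by each category with PARTITION BY, to get the occurrences and the totals without subqueries (note the absence of GROUP BY). The queries below read from #test, which is assumed to be a temporary table holding the sample data from the question: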

SELECT
    T.categoryId,

    T.name,
    NameOccurencies = COUNT(T.name) OVER (PARTITION BY T.categoryId, T.name),
    NameTotals = COUNT(T.name) OVER (PARTITION BY T.categoryId),

    T.surname,
    SurnameOccurencies = COUNT(T.surname) OVER (PARTITION BY T.categoryId, T.surname),
    SurnameTotals = COUNT(T.surname) OVER (PARTITION BY T.categoryId)
FROM
    #test AS T

Results:

categoryId  name                NameOccurencies NameTotals  surname             SurnameOccurencies  SurnameTotals
1           John                2               4           Locke               1                   4
1           John Stuart         1               4           Mill                2                   4
1           James               1               4           Mill                2                   4
1           John                2               4           Millar              1                   4
2           Thomas              2               5           Bayes               1                   5
2           Leonhard            1               5           Euler               1                   5
2           Herrmann Amandus    1               5           Schwartz            2                   5
2           Laurent             1               5           Schwartz            2                   5
2           Thomas              2               5           Simpson             1                   5

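You can then use this result to get each percentage by dividing the occurrences by the corresponding total. In the same step, you can also use ROW_NUMBER() to rank the most frequent name and surname: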

;WITH Totals AS
(
    SELECT
        T.categoryId,

        T.name,
        NameOccurencies = COUNT(T.name) OVER (PARTITION BY T.categoryId, T.name),
        NameTotals = COUNT(T.name) OVER (PARTITION BY T.categoryId),

        T.surname,
        SurnameOccurencies = COUNT(T.surname) OVER (PARTITION BY T.categoryId, T.surname),
        SurnameTotals = COUNT(T.surname) OVER (PARTITION BY T.categoryId)
    FROM
        #test AS T
)
SELECT
    T.categoryId,

    T.name,
    NamePercentage = T.NameOccurencies * 1.0 / NULLIF(T.NameTotals, 0),
    NameMostFrequentRanking = ROW_NUMBER() OVER (
        PARTITION BY
            T.categoryId
        ORDER BY 
            T.NameOccurencies * 1.0 / NULLIF(T.NameTotals, 0) DESC), -- NamePercentage

    T.surname,
    SurnamePercentage = T.SurnameOccurencies * 1.0 / NULLIF(T.SurnameTotals, 0),
    SurnameMostFrequentRanking = ROW_NUMBER() OVER (
        PARTITION BY
            T.categoryId
        ORDER BY 
            T.SurnameOccurencies * 1.0 / NULLIF(T.SurnameTotals, 0) DESC) -- SurnamePercentage
FROM
    Totals AS T

Results:

categoryId  name                NamePercentage  NameMostFrequentRanking surname     SurnamePercentage   SurnameMostFrequentRanking
1           John Stuart         0.250000000000  3                       Mill        0.500000000000      1
1           James               0.250000000000  4                       Mill        0.500000000000      2
1           John                0.500000000000  1                       Millar      0.250000000000      3
1           John                0.500000000000  2                       Locke       0.250000000000      4
2           Herrmann Amandus    0.200000000000  4                       Schwartz    0.400000000000      1
2           Laurent             0.200000000000  5                       Schwartz    0.400000000000      2
2           Thomas              0.400000000000  1                       Simpson     0.200000000000      3
2           Thomas              0.400000000000  2                       Bayes       0.200000000000      4
2           Leonhard            0.200000000000  3                       Euler       0.200000000000      5
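
One thing to keep in mind: ROW_NUMBER() breaks ties arbitrarily, so when two values are equally frequent within a category, which of them gets rank 1 is not guaranteed. A minimal sketch, assuming you want a repeatable result, adds the value itself as a secondary sort key. The alphabetical tie-break is my own arbitrary choice for illustration. Because the total is constant within a category, ordering by the occurrences is equivalent to ordering by the percentage:

```sql
-- Sketch: deterministic tie-break for the name ranking.
-- Assumes the same #test table as above; resolving ties alphabetically
-- is an arbitrary choice made for this illustration.
;WITH Totals AS
(
    SELECT
        T.categoryId,
        T.name,
        NameOccurencies = COUNT(T.name) OVER (PARTITION BY T.categoryId, T.name)
    FROM
        #test AS T
)
SELECT
    T.categoryId,
    T.name,
    T.NameOccurencies,
    NameMostFrequentRanking = ROW_NUMBER() OVER (
        PARTITION BY
            T.categoryId
        ORDER BY
            T.NameOccurencies DESC, -- most frequent value first
            T.name ASC)             -- equally frequent values ordered alphabetically
FROM
    Totals AS T
```

The same secondary key can be added to the surname ranking.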

Finally, for each available category...

SELECT
    T.categoryId,
    TotalRecords = COUNT(1)
FROM
    #test AS T
GROUP BY
    T.categoryId

...we can get the most frequent name and surname, together with their percentages, using a couple of joins:

;WITH Totals AS
(
    SELECT
        T.categoryId,

        T.name,
        NameOccurencies = COUNT(T.name) OVER (PARTITION BY T.categoryId, T.name),
        NameTotals = COUNT(T.name) OVER (PARTITION BY T.categoryId),

        T.surname,
        SurnameOccurencies = COUNT(T.surname) OVER (PARTITION BY T.categoryId, T.surname),
        SurnameTotals = COUNT(T.surname) OVER (PARTITION BY T.categoryId)
    FROM
        #test AS T
),
MostFrequentRanking AS
(
    SELECT
        T.categoryId,

        T.name,
        NamePercentage = T.NameOccurencies * 1.0 / NULLIF(T.NameTotals, 0),
        NameMostFrequentRanking = ROW_NUMBER() OVER (
            PARTITION BY
                T.categoryId
            ORDER BY 
                T.NameOccurencies * 1.0 / NULLIF(T.NameTotals, 0) DESC),

        T.surname,
        SurnamePercentage = T.SurnameOccurencies * 1.0 / NULLIF(T.SurnameTotals, 0),
        SurnameMostFrequentRanking = ROW_NUMBER() OVER (
            PARTITION BY
                T.categoryId
            ORDER BY 
                T.SurnameOccurencies * 1.0 / NULLIF(T.SurnameTotals, 0) DESC)
    FROM
        Totals AS T
),
AvailableCategories AS
(
    SELECT
        T.categoryId,
        TotalRecords = COUNT(1)
    FROM
        #test AS T
    GROUP BY
        T.categoryId
)
SELECT
    A.categoryId,
    A.TotalRecords,
    MN.name,
    NamePercentage = CONVERT(DECIMAL(3, 2), MN.NamePercentage),
    MS.surname,
    SurnamePercentage = CONVERT(DECIMAL(3, 2), MS.SurnamePercentage)
FROM
    AvailableCategories AS A
    LEFT JOIN MostFrequentRanking AS MN ON 
        A.categoryId = MN.categoryId AND
        MN.NameMostFrequentRanking = 1
    LEFT JOIN MostFrequentRanking AS MS ON 
        A.categoryId = MS.categoryId AND
        MS.SurnameMostFrequentRanking = 1

Results:

categoryId  TotalRecords    name    NamePercentage  surname     SurnamePercentage
1           4               John    0.50            Mill        0.50
2           5               Thomas  0.40            Schwartz    0.40

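It may look a bit large, but you can extend this query to any number of new columns without adding a new SELECT: just repeat the same logic for each new column you want to display and add one more join at the end.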
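
As a rough illustration of that repetition, here is the same pattern applied to one more attribute. The table #testNationality, its rows, and the nationality column are hypothetical and exist only to keep the sketch self-contained; in the full query these expressions would be added to the existing CTEs, with one more LEFT JOIN on the new ranking column equal to 1:

```sql
-- Sketch: repeating the pattern for an extra attribute.
-- #testNationality and its sample rows are hypothetical, made up for this example.
CREATE TABLE #testNationality (categoryId int, nationality varchar(50));

INSERT INTO #testNationality (categoryId, nationality)
VALUES (1, 'GB'), (1, 'GB'), (1, 'FR'),
       (2, 'DE'), (2, 'CH'), (2, 'DE');

;WITH Totals AS
(
    SELECT
        T.categoryId,
        T.nationality,
        NationalityOccurencies = COUNT(T.nationality) OVER (PARTITION BY T.categoryId, T.nationality),
        NationalityTotals = COUNT(T.nationality) OVER (PARTITION BY T.categoryId)
    FROM
        #testNationality AS T
),
MostFrequentRanking AS
(
    SELECT
        T.categoryId,
        T.nationality,
        NationalityPercentage = T.NationalityOccurencies * 1.0 / NULLIF(T.NationalityTotals, 0),
        NationalityMostFrequentRanking = ROW_NUMBER() OVER (
            PARTITION BY
                T.categoryId
            ORDER BY
                T.NationalityOccurencies * 1.0 / NULLIF(T.NationalityTotals, 0) DESC)
    FROM
        Totals AS T
)
-- Keep only the top-ranked value per category.
SELECT
    R.categoryId,
    R.nationality,
    NationalityPercentage = CONVERT(DECIMAL(3, 2), R.NationalityPercentage)
FROM
    MostFrequentRanking AS R
WHERE
    R.NationalityMostFrequentRanking = 1;
```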

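If you have millions of records and the query takes too long, you might want to split each CTE into a temporary table using SELECT ... INTO + CREATE INDEX by categoryId to speed up processing (provided you are willing to spend some resources creating those tables).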
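
A minimal sketch of that idea for the first CTE follows. The names #Totals and IX_Totals_categoryId are hypothetical, and whether a clustered index on categoryId actually helps depends on your data, so treat this as a starting point to measure rather than a guaranteed improvement:

```sql
-- Sketch: materialize the per-row counts once, index them, then rank from the temp table.
-- #Totals and IX_Totals_categoryId are hypothetical names chosen for this example.
IF OBJECT_ID('tempdb..#Totals') IS NOT NULL
    DROP TABLE #Totals;

SELECT
    T.categoryId,
    T.name,
    NameOccurencies = COUNT(T.name) OVER (PARTITION BY T.categoryId, T.name),
    NameTotals = COUNT(T.name) OVER (PARTITION BY T.categoryId),
    T.surname,
    SurnameOccurencies = COUNT(T.surname) OVER (PARTITION BY T.categoryId, T.surname),
    SurnameTotals = COUNT(T.surname) OVER (PARTITION BY T.categoryId)
INTO
    #Totals
FROM
    #test AS T;

-- Index the materialized rows by the column used for partitioning and joining.
CREATE CLUSTERED INDEX IX_Totals_categoryId ON #Totals (categoryId);

-- Later steps then read from #Totals instead of the Totals CTE, for example:
SELECT
    T.categoryId,
    T.name,
    NameMostFrequentRanking = ROW_NUMBER() OVER (
        PARTITION BY
            T.categoryId
        ORDER BY
            T.NameOccurencies DESC)
FROM
    #Totals AS T;
```

The ranking and category-total steps can be materialized the same way if they turn out to be the slow part.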
