Rovanion

Asked: 2022-01-11 04:45:39 +0800 CST

Self-referencing condition in an aggregate function

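I have a real-world use case from fish farming, where the growth of the farm depends on the average size of the fish in the farm at feeding time. I have reduced the problem to what I believe is its core, which I cannot express in PostgreSQL: an aggregate function containing a condition that depends on the previously computed value of that same aggregate.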

The data being operated on is a series of transactions.

create table transactions (
    id           bigserial primary key,
    feed_g       bigint  
);

insert into transactions
    (feed_g)
values
    (50),
    (50),
    (50),
    (50);

Computing a running sum over these rows is simple.

select
    id,
    feed_g,
    sum(feed_g) over (order by id) as simple_sum
from transactions;

--  id | feed_g | simple_sum 
-- ----+--------+------------
--   1 |     50 |         50
--   2 |     50 |        100
--   3 |     50 |        150
--   4 |     50 |        200

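Computing a sum with a condition that depends on the value of the input row is also simple. In the query below the second branch is always taken, since every feed_g is 50 and therefore never greater than 75.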

select
    id,
    feed_g,
    sum(
        case when feed_g > 75 then feed_g
             else                  feed_g * 0.5
        end
    ) over (order by id) as row_weighted_sum
from transactions;

--  id | feed_g | row_weighted_sum 
-- ----+--------+------------------
--   1 |     50 |             25.0
--   2 |     50 |             50.0
--   3 |     50 |             75.0
--   4 |     50 |            100.0

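What I don't know how to do is write a query where the condition inside the aggregate function depends on the output of that same aggregate function as computed for the previous row.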

Below is some non-working pseudo-SQL.

select
    id,
    feed_g,
    sum(
        case when lag(recursive_sum) + feed_g  > 75 then feed_g
             else                                        feed_g * 0.5
        end
    ) over (order by id) as recursive_sum
from transactions;

-- The imagined output would be the following:
--  id | feed_g | recursive_sum 
-- ----+--------+---------------
--   1 |     50 |          25.0
--   2 |     50 |          50.0
--   3 |     50 |         100.0
--   4 |     50 |         150.0

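Using simple_sum as the input to recursive_sum does not seem like a viable solution, because the two diverge over time. In the small example dataset this drift already affects the second row: with simple_sum the threshold is crossed at row 2, when it should not be crossed until row 3.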

with estimate as (
    select
        id,
        feed_g,
        sum(feed_g) over (order by id) as simple_sum
    from transactions
)
select
    id,
    feed_g,
    simple_sum,
    sum(
        case when simple_sum > 75 then feed_g
             else                      feed_g * 0.5
        end
    ) over (order by id) as simple_sum_weighted_sum
from estimate;

--  id | feed_g | simple_sum | simple_sum_weighted_sum 
-- ----+--------+------------+-------------------------
--   1 |     50 |         50 |                    25.0
--   2 |     50 |        100 |                    75.0
--   3 |     50 |        150 |                   125.0
--   4 |     50 |        200 |                   175.0
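The drift can be made concrete with a small Python sketch that evaluates the threshold test per row in two ways: once against the unweighted running sum, as the query above does, and once against the weighted sum accumulated so far, which is the intended behaviour. The names THRESHOLD, true_flags and first_diff are illustrative only and do not come from the queries.

```python
from itertools import accumulate

THRESHOLD = 75           # same threshold as in the example queries
feed = [50, 50, 50, 50]  # feed_g values from the transactions table

# Condition as the CTE query evaluates it: against the unweighted running sum.
simple_sums = list(accumulate(feed))
estimate_flags = [s > THRESHOLD for s in simple_sums]

def true_flags(values, threshold=THRESHOLD):
    """Condition as intended: against the weighted sum accumulated so far."""
    flags, total = [], 0.0
    for v in values:
        crossed = total + v > threshold
        flags.append(crossed)
        total += v if crossed else v * 0.5
    return flags

actual_flags = true_flags(feed)

# 1-based number of the first row where the two conditions disagree.
first_diff = next(
    i + 1
    for i, (a, b) in enumerate(zip(estimate_flags, actual_flags))
    if a != b
)

print(estimate_flags)  # [False, True, True, True]
print(actual_flags)    # [False, False, True, True]
print(first_diff)      # 2
```

The two flag lists disagree from row 2 onwards, which is why every later value of simple_sum_weighted_sum is 25 too high.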

A third step that feeds simple_sum_weighted_sum into a call to lag does not work either, because it "forgets" the weighting of every row except the last one.

with estimate as (
    select
        id,
        feed_g,
        sum(feed_g) over (order by id) as simple_sum
    from transactions
),
est2 as (
    select
        id,
        feed_g,
        simple_sum,
        sum(
            case when simple_sum > 75 then feed_g
                 else                      feed_g * 0.5
            end
        ) over (order by id) as simple_sum_weighted_sum
    from estimate
)
select
    id,
    feed_g,
    simple_sum,
    simple_sum_weighted_sum,
    coalesce(lag(simple_sum_weighted_sum) over (order by id), 0)
        + case when simple_sum_weighted_sum > 75 then feed_g
               else                                   feed_g * 0.5
          end as row_weighted_sum
from est2;

--  id | feed_g | simple_sum | simple_sum_weighted_sum | row_weighted_sum 
-- ----+--------+------------+-------------------------+------------------
--   1 |     50 |         50 |                    25.0 |             25.0
--   2 |     50 |        100 |                    75.0 |             50.0
--   3 |     50 |        150 |                   125.0 |            125.0
--   4 |     50 |        200 |                   175.0 |            175.0
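To see where the "forgetting" comes from, the three steps of the query above can be mirrored in plain Python. This is only a sketch of the same arithmetic, and the helper name weight is made up for illustration. The point is that lag reads the previous simple_sum_weighted_sum rather than the previous row_weighted_sum, so the correction made for row 2 is not carried into row 3.

```python
THRESHOLD = 75
feed = [50, 50, 50, 50]  # feed_g values from the transactions table

def weight(condition_value, feed_g):
    """Mirror of the CASE expression: full weight above the threshold, else half."""
    return feed_g if condition_value > THRESHOLD else feed_g * 0.5

# Step 1 (estimate): unweighted running sum.
simple_sum, running = [], 0
for v in feed:
    running += v
    simple_sum.append(running)

# Step 2 (est2): running sum weighted by a condition on simple_sum.
simple_sum_weighted_sum, running = [], 0.0
for s, v in zip(simple_sum, feed):
    running += weight(s, v)
    simple_sum_weighted_sum.append(running)

# Step 3: lag(simple_sum_weighted_sum) plus the weight of the current row only.
lagged = [0.0] + simple_sum_weighted_sum[:-1]
row_weighted_sum = [
    prev + weight(cur, v)
    for prev, cur, v in zip(lagged, simple_sum_weighted_sum, feed)
]

print(simple_sum_weighted_sum)  # [25.0, 75.0, 125.0, 175.0]
print(row_weighted_sum)         # [25.0, 50.0, 125.0, 175.0]
```

Row 2 comes out as the wanted 50.0, but row 3 is built on the lagged 75.0 instead of on that corrected 50.0, so it jumps to 125.0.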

For reference, I have written two working implementations of the algorithm in Python. Here is the first, in imperative style.

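data = (50, 50, 50, 50)
running_sum = 0  # named running_sum to avoid shadowing the built-in sum()
for value in data:
    # Count the value in full only if it pushes the running sum past 75.
    if running_sum + value > 75:
        running_sum = running_sum + value
    else:
        running_sum = running_sum + value * 0.5
    print(value, running_sum)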

# 50 25.0
# 50 50.0
# 50 100.0
# 50 150.0

The second is in a somewhat stunted functional style.

data = (50, 50, 50, 50)

def data_dependant_recursive_sum(iterator, last_sum):
  try:
    value = next(iterator)
  except StopIteration:
    return
  recursively_weighted_value = value if last_sum + value > 75 else value * 0.5
  recursive_sum = recursively_weighted_value + last_sum
  print(value, recursive_sum)
  data_dependant_recursive_sum(iterator, recursive_sum)
  
data_dependant_recursive_sum(iter(data), 0)

# 50 25.0
# 50 50.0
# 50 100.0
# 50 150.0
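The same computation can also be written as a left fold, which makes the shape of the problem explicit: each step needs the accumulator produced by the previous step. The sketch below uses itertools.accumulate with an initial value (available from Python 3.8); the function name step and its threshold parameter are illustrative.

```python
from itertools import accumulate

def step(last_sum, value, threshold=75):
    """One fold step: the weight of value depends on the sum so far."""
    weighted = value if last_sum + value > threshold else value * 0.5
    return last_sum + weighted

data = (50, 50, 50, 50)

# accumulate yields the initial value first, so drop it with [1:].
running = list(accumulate(data, step, initial=0))[1:]

for value, recursive_sum in zip(data, running):
    print(value, recursive_sum)

# 50 25.0
# 50 50.0
# 50 100.0
# 50 150.0
```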

If this exercise feels contrived and absurd, a more complex but complete version of the problem can be found here: https://stackoverflow.com/questions/70158295

I am currently on Postgres 12, but upgrading to 14 would be easy if needed.
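One direction that might express this kind of self-reference is a recursive common table expression, which PostgreSQL supports through WITH RECURSIVE. What follows is only a minimal sketch under stated assumptions, not a verified solution. It runs against SQLite through Python's sqlite3 module so that it is self-contained, and it has not been checked against Postgres 12. It assumes id values are consecutive with no gaps (the join uses r.id + 1). PostgreSQL may also need explicit casts so that both branches of the union produce the same column type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transactions (id integer primary key, feed_g integer)"
)
conn.executemany(
    "insert into transactions (feed_g) values (?)",
    [(50,), (50,), (50,), (50,)],
)

query = """
with recursive r (id, feed_g, recursive_sum) as (
    -- Anchor: the first row starts from a sum of zero.
    select id, feed_g,
           case when feed_g > 75 then feed_g else feed_g * 0.5 end
    from transactions
    where id = (select min(id) from transactions)
    union all
    -- Step: the condition sees the sum computed for the previous row.
    select t.id, t.feed_g,
           r.recursive_sum
           + case when r.recursive_sum + t.feed_g > 75 then t.feed_g
                  else t.feed_g * 0.5
             end
    from r
    join transactions t on t.id = r.id + 1  -- assumes gap-free ids
)
select id, feed_g, recursive_sum from r order by id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# (1, 50, 25.0)
# (2, 50, 50.0)
# (3, 50, 100.0)
# (4, 50, 150.0)
```

A recursive CTE processes one row per iteration, so on a large transactions table it may be considerably slower than a window function.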
