Question

Steven

Asked: 2024-12-12 06:29:26 +0800 CST

Aggregating by array objects - getting the count of distinct customer IDs for each descriptor

I have data like this:

with data(custid, descriptors) as (
    select 1, ['Corporate', 'fun times', 'but not really']
    union all 
    select 2, ['lame times', 'Corporate', 'boring']
    union all 
    select 3, ['boring', 'Corporate', 'fun times', 'but not really']
)
    
select 
* 
from data

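with over 30k rows and an unknown number of unique descriptors across all the arrays. I want to count how many distinct custid values have a descriptors array that contains a given string. For any one specific string, I can use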

select 
count(distinct custid)
from data
where array_contains('Corporate'::variant, descriptors)

But I'd like to get the number of distinct custid values for every array value across all 30k+ rows, rather than one string at a time.

Ultimately, I'd like a table like this:

descriptor	n_custids
Corporate	3
fun times	2
but not really	2
lame times	1
boring	2

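for every possible unique string in every array, but I'm not sure how to programmatically get all the array members and then run count(distinct custid) ... where array_contains() for each one. I've been reading the documentation on RESULTSETs, cursors, and FOR loops, but I'm fairly new to Snowflake and haven't found that set of docs entirely helpful. I know I can use array_distinct(array_agg()) to combine all the arrays and keep only the unique values, but after that I'm at a loss. I suspect there's a simple way, but whatever it is, I'm missing it.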

Thanks for your help!

1 Answer

neeru0303 · Answer 1 · 2024-12-12T06:57:19+08:00

You're close. It helps to flatten the array before doing the group by. Documentation for the flatten function: https://docs.snowflake.com/en/sql-reference/functions/flatten

with data(custid, descriptors) as (
    select 1, ['Corporate', 'fun times', 'but not really']
    union all
    select 2, ['lame times', 'Corporate', 'boring']
    union all
    select 3, ['boring', 'Corporate', 'fun times', 'but not really']
)

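select
    value::string as descriptor,                -- one output row per distinct array element
    array_unique_agg(custid) distinct_values,   -- the distinct custids that have this descriptor
    count(distinct custid) count_distinct       -- how many such custids there are
from data,
lateral flatten( input => data.descriptors)     -- expands each array into one row per element
group by 1;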

+--------------+-----------------+--------------+
|DESCRIPTOR    |DISTINCT_VALUES  |COUNT_DISTINCT|
+--------------+-----------------+--------------+
|Corporate     |[                |3             |
|              |1,               |              |
|              |2,               |              |
|              |3                |              |
|              |]                |              |
|fun times     |[                |2             |
|              |1,               |              |
|              |3                |              |
|              |]                |              |
|but not really|[                |2             |
|              |1,               |              |
|              |3                |              |
|              |]                |              |
|lame times    |[                |1             |
|              |2                |              |
|              |]                |              |
|boring        |[                |2             |
|              |2,               |              |
|              |3                |              |
|              |]                |              |
+--------------+-----------------+--------------+
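
For intuition, here is a rough plain-Python model of what the flatten + group by query above computes, using the same three sample rows. It is only an illustrative sketch, not how Snowflake executes the query, and the function name count_custids_per_descriptor is made up for this example.

```python
from collections import defaultdict

# Same sample rows as in the question: (custid, descriptors)
rows = [
    (1, ["Corporate", "fun times", "but not really"]),
    (2, ["lame times", "Corporate", "boring"]),
    (3, ["boring", "Corporate", "fun times", "but not really"]),
]


def count_custids_per_descriptor(rows):
    """Hypothetical helper: count distinct custids for each descriptor."""
    custids_by_descriptor = defaultdict(set)
    for custid, descriptors in rows:
        # "Flatten" step: treat each array element as its own (descriptor, custid) row.
        for descriptor in descriptors:
            # "Group by" step: a set keeps only distinct custids per descriptor.
            custids_by_descriptor[descriptor].add(custid)
    return {d: len(ids) for d, ids in custids_by_descriptor.items()}


counts = count_custids_per_descriptor(rows)
for descriptor, n_custids in counts.items():
    print(descriptor, n_custids)
```

The point of the sketch is that once every array element is its own row, no loop over the unique descriptors is needed: a single grouping pass yields the count for all of them at once.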
