Getting structured output with jq

Question

Kusalananda

Asked: 2023-04-26 23:26:47 +0800 CST

How can I parameterize a `jq` expression to return a selection or its complement?


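Let's assume I have two fairly complex jq expressions that differ only in that one returns the complement of the other. That is, the only difference between them is that one does select(expression) and the other does select(expression|not).

Simplified example: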

$ jq -n '$ARGS.positional[] | select( . > 2 )' --jsonargs 1 2 3 4 5
3
4
5

$ jq -n '$ARGS.positional[] | select( . > 2 | not )' --jsonargs 1 2 3 4 5
1
2

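Rather than duplicating these two jq expressions in my code (in reality, each is several lines long), it would be neat to pass a value into a single expression to toggle between the two behaviours.

How can I do this?

The actual jq code (which filters RabbitMQ messages based on a file path encoded in the message payload):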

map(
        # Add an array of pathnames that would match this message.  This
        # includes the pathnames of each parent directory, leading up to
        # and including the pathname of the file itself.
        .tmp_paths = [
                # The full pathname is part of a base64-encoded JSON blob.
                foreach (
                        .payload |
                        @base64d |
                        fromjson.filepath |
                        split("/")[]
                ) as $elem (
                        null;
                        . += $elem + "/";
                        .
                )

        ] |
        # The last element is the full file path and should not have a
        # trailing slash.
        .tmp_paths[-1] |= rtrimstr("/")
) |
[
        # Match the pathnames given as positional command line arguments
        # against the computed pathnames in the "tmp_paths" array in
        # each message.  Extract the messages with a match.
        JOIN(
                INDEX($ARGS.positional[]; .);
                .[];
                .tmp_paths[];
                if (.[1:] | any) then
                        .[0]
                else
                        empty
                end
        )
] |
# Deduplicate the extracted messages on the full pathname of the file.
# Then remove the "tmp_paths" array from each message and base64 encode
# them.
unique_by(.tmp_paths[-1])[] |
del(.tmp_paths) |
@base64

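I assume I need to modify the if statement somehow so that it either extracts or discards the messages whose file paths match the pathnames given as positional arguments.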
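To make the tmp_paths step in the code above easier to follow, here is a minimal sketch of the same foreach construction applied to a single string. The helper name path_prefixes and the sample path are hypothetical and used only for illustration; the real code reads the path from the decoded payload instead.

```shell
# Hypothetical helper: print the cumulative directory prefixes of one path,
# built the same way as the "tmp_paths" array in the question.
path_prefixes() {
  jq -cn --arg filepath "$1" '
    # Accumulate "elem/" onto the running prefix, emitting each step.
    [ foreach ($filepath | split("/")[]) as $elem (null; . += $elem + "/"; .) ]
    # The last entry is the file itself, so drop its trailing slash.
    | .[-1] |= rtrimstr("/")
  '
}

# Sample path is an assumption, not taken from the question.
path_prefixes /data/in/file.txt
```

With that sample path, this should print ["/","/data/","/data/in/","/data/in/file.txt"], which is why a positional argument naming either the file or any of its parent directories (with a trailing slash) can match a message.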

2 Answers


Kusalananda · Answer 1 · 2023-04-26T23:26:47+08:00

Pass a boolean value into the jq expression and use an if statement to switch between returning the selected set and its complement:

$ jq -n --argjson yes true '$ARGS.positional[] | select( . > 2 | if $yes then . else not end )' --jsonargs 1 2 3 4 5
3
4
5

$ jq -n --argjson yes false '$ARGS.positional[] | select( . > 2 | if $yes then . else not end )' --jsonargs 1 2 3 4 5
1
2

In the more complex jq expression, modify the if statement:

# Match the pathnames given as positional command line arguments
# against the computed pathnames in the "tmp_paths" array in
# each message.  Depending on the $yes boolean variable, extract
# or discard matching messages.
JOIN(
        INDEX($ARGS.positional[]; .);
        .[];
        .tmp_paths[];
        if (.[1:] | any | if $yes then . else not end) then
                .[0]
        else
                empty
        end
)

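Note that if $yes then . else not end lets the $yes variable act as a "toggle" for whether we want a set or its complement. In both the simplified select() and the more complex JOIN(), this if statement acts on the result of the boolean test that determines whether an element should be part of the result set.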
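If the toggle is needed in more than one place, it could be wrapped in a small jq function so the if statement is written only once. This is a sketch under assumptions rather than part of the answer as posted: the names keep_if and jq_pick are made up for illustration.

```shell
# Hypothetical wrapper: jq_pick true|false NUM...
jq_pick() {
  keep=$1; shift
  jq -n --argjson yes "$keep" '
    # keep_if(cond; $keep): pass the input through when the truth value
    # of cond agrees with $keep, otherwise drop it.
    def keep_if(cond; $keep): select(cond | if $keep then . else not end);
    $ARGS.positional[] | keep_if(. > 2; $yes)
  ' --jsonargs "$@"
}

jq_pick true 1 2 3 4 5    # prints 3, 4 and 5, one per line
jq_pick false 1 2 3 4 5   # prints 1 and 2, one per line
```

The same keep_if could be used around the any test in the JOIN() call, since it only cares about the boolean result of the condition it is given.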

LL3 · Answer 2 · 2023-04-29T22:59:16+08:00

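The solution described by @Kusalananda is arguably the best way to go in all the usual cases, especially for occasional use, because it is simple, readable, compact, and reasonably fast.

If you use this toggling behaviour often enough to warrant a stable setup, or if you are willing to go the extra mile for some speed, you might consider a different approach.

The simple if ... then ... else ... end approach does have one drawback: it adds an extra comparison for every object in the stream. That comparison is somewhat wasteful, because its outcome is always known in advance. It is a static input from the command line that never changes during execution.

One possible way to remove that comparison is to use a function defined in a module, which you can then select on the command line.

Consider: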

# let's set the thing up
$ mkdir dot && echo 'def dot_or_not: .;' > dot/.jq
$ mkdir not && echo 'def dot_or_not: not;' > not/.jq
# now let's use it
$ seq 5 | jq 'include "./"; select ( . > 2 | dot_or_not )' -Ldot
3
4
5
$ seq 5 | jq 'include "./"; select ( . > 2 | dot_or_not )' -Lnot
1
2

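In some simple benchmarks on a single-processor VM, this approach turned out to be on average 5 times faster than the bare if ... then ... else ... end, although it will probably not make a difference to the overall "economy" of the larger computation you have shown.

I probably wouldn't go this far myself for such a simple operation, because handling each additional "switch" on top of other possible (more valuable) modules becomes increasingly cumbersome and awkward. In fact, I would rather use modules for truly different variants of a computation. But still.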
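A lighter-weight variation on the same idea, which is not from the answer itself, is to let the shell pick the definition of dot_or_not and prepend it to the program text, so no module directories are needed. The function name run_filter and its keep/drop arguments are hypothetical.

```shell
# Hypothetical helper: run_filter keep|drop  (reads numbers on stdin)
run_filter() {
  case $1 in
    keep) def='def dot_or_not: .;' ;;      # keep elements that pass the test
    drop) def='def dot_or_not: not;' ;;    # keep the complement instead
    *) echo "usage: run_filter keep|drop" >&2; return 2 ;;
  esac
  # The chosen definition is fixed before jq starts, so the filter
  # itself contains no per-element branch on a variable.
  jq "$def"' select( . > 2 | dot_or_not )'
}

seq 5 | run_filter keep   # prints 3, 4 and 5, one per line
seq 5 | run_filter drop   # prints 1 and 2, one per line
```

This trades the module files for string assembly in the shell, which is only reasonable while the inserted text is a fixed literal chosen by the script and never user input. Whether it performs like the module version has not been measured here.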

Just for completeness, at the other end of the spectrum, another approach could look like this:

$ seq 5 | jq 'select( . > 2 | [not,.][$yes] )' --argjson yes 1
3
4
5
$ seq 5 | jq 'select( . > 2 | [not,.][$yes] )' --argjson yes 0
1
2

Or its cousin:

$ seq 5 | jq 'select( . > 2 | {(tostring):1}[$yes] )' --arg yes true 
3
4
5
$ seq 5 | jq 'select( . > 2 | {(tostring):1}[$yes] )' --arg yes false 
1
2

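Although these approaches look compact, they are also considerably slower than the bare if ... then ... else ... end (2-4 times slower in my simple benchmarks), because each of them adds the construction of 2 short-lived objects plus a lookup.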
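The timings will depend on the jq version and the machine, so it may be worth measuring locally. The following is only a rough sketch of one way to do that, not the benchmark used for the figures in this answer. The helper count_kept, the input size, and the extra $idx variable are assumptions made for illustration.

```shell
# Hypothetical helper: count_kept N 'TOGGLE_EXPR'
# Counts how many of the integers 1..N survive select(. > 2 | TOGGLE_EXPR).
count_kept() {
  seq "$1" | jq -n --argjson yes true --argjson idx 1 \
    'reduce (inputs | select(. > 2 | '"$2"')) as $x (0; . + 1)'
}

# Both variants keep the same elements; only the toggle mechanism differs.
count_kept 200000 'if $yes then . else not end'
count_kept 200000 '[not,.][$idx]'
```

Both calls should print 199998. In a shell such as bash, where time is a keyword, prefixing each call with time gives a crude comparison. Repeating the runs several times and using a larger N will give steadier numbers.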
