
Question

Felipe Evaristo

Asked: 2023-12-03 08:27:50 +0800 CST

How do I reformat blocks of data until the end of the file is reached?


I have a file that looks like this:

# Time-averaged data for fix avetimeall
# TimeStep Number-of-rows
# Row c_gyrationchunkall
1000 3
1 2.09024e-14
2 4.88628
3 5.69321
2000 3
1 2.10518e-14
2 8.33702
3 8.83162
3000 3
1 1.96656e-14
2 12.1396
3 11.5835
...

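In my file, the first three lines are always headers. After the headers, the file lists data blocks of the same size, each beginning with a label subheader. I want to reorganize the data so that each block goes onto a single line that starts with the relevant part of the block's label, followed by the block's data values, all separated by spaces. As an example, I want to convert the sample above into: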

# Time-averaged data for fix avetimeall
# TimeStep c_gyrationchunkall
1000 2.09024e-14 4.88628 5.69321
2000 2.10518e-14 8.33702 8.83162
3000 1.96656e-14 12.1396 11.5835
...

How can I do this in Bash? I have some Bash experience, but I'm afraid not enough to handle this quickly...

5 Answers


Ed Morton · Answer 1 · 2023-12-03T19:06:19+08:00

Using any awk, and whether or not the number of rows in a block varies from 3:

$ awk '
    NR == 2 { $3=""; saved=$0; next }
    NR == 3 { $0=saved $3 }
    NR  < 4 { print; next }
    !numLines {
        numLines = $2
        printf "%s%s", $1, OFS
        next
    }
    { printf "%s%s", $2, (--numLines ? OFS : ORS) }
' file
# Time-averaged data for fix avetimeall
# TimeStep c_gyrationchunkall
1000 2.09024e-14 4.88628 5.69321
2000 2.10518e-14 8.33702 8.83162
3000 1.96656e-14 12.1396 11.5835

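Following the discussion under Xavier G.'s answer about readability and style preferences, here is an awk script written in the same style as the shell script (and wrapped in a shell script, so it behaves the same way from the outside). It will run orders of magnitude faster* than the shell script, and it is more robust and portable: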

$ cat ./script_filename
#!/usr/bin/env bash

awk '
    BEGIN {
        # Reformat comments:
        getline first_line
        print first_line
        getline; split($0,line2)
        getline; split($0,line3)
        printf "# %s %s\n", line2[2], line3[3]

        # Reformat data:
        while ( getline > 0 ) {
            timestep=$1; number_of_rows=$2
            printf "%s", timestep
            for ( i=1; i<=number_of_rows; i++ ) {
                getline; row_value=$NF
                printf " %s", row_value
            }
            print ""
        }
    }
'

$ ./script_filename < input
# Time-averaged data for fix avetimeall
# TimeStep c_gyrationchunkall
1000 2.09024e-14 4.88628 5.69321
2000 2.10518e-14 8.33702 8.83162
3000 1.96656e-14 12.1396 11.5835

* Here are the third-run timing results of running the bash script versus the awk script above on a file containing 90,000 of the OP's records:

$ time ./script_bash < file > /dev/null

real    0m9.425s
user    0m5.062s
sys     0m4.139s

$ time ./script_awk < file > /dev/null

real    0m0.265s
user    0m0.171s
sys     0m0.000s
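The answer does not show how the timing input was built. As a rough sketch only, a comparable test file could be generated as below; make_input is a hypothetical helper name, and the data values it writes are made up purely to give the scripts something to read.

```shell
# Hypothetical helper (not from the answer): write a synthetic input file
# in the question's format to stdout.
#   $1 = number of blocks, $2 = data rows per block
make_input() {
    awk -v blocks="$1" -v rows="$2" 'BEGIN {
        # The three header lines from the question.
        print "# Time-averaged data for fix avetimeall"
        print "# TimeStep Number-of-rows"
        print "# Row c_gyrationchunkall"
        for (b = 1; b <= blocks; b++) {
            print b * 1000, rows            # block label: timestep and row count
            for (r = 1; r <= rows; r++)
                print r, r + b / 1000       # arbitrary, made-up data value
        }
    }'
}
```

For example, make_input 30000 3 > file produces 30,000 blocks of 3 rows each, which can then be fed to either script with time ./script_awk < file > /dev/null.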

jubilatious1 · Answer 2 · 2023-12-03T13:49:10+08:00

Using Raku (formerly known as Perl_6)

Using skip to ignore the header lines for now:

~$ raku -e 'my @a = lines.skip(3).rotor(4, partial => True).map: *.words; .[0,3,5,7].put for @a;'  file

#OR

~$ raku -e 'my @a = lines.skip(3).batch(4).map: *.words; .[0,3,5,7].put for @a;'  file

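The answers above are written in Raku, a member of the Perl family of programming languages. Briefly, lines reads the input, skipping the first 3 header lines. Every 4 lines are then grouped together with rotor or batch, including a partial final group at the end of the file (which is what partial => True asks rotor to keep). While we are at it, each rotor/batch group is broken into whitespace-separated words.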

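These groups of 4 lines, each broken on whitespace, are stored in the @-sigiled array named @a. Finally (in the second statement), for iterates over each element of @a and outputs it with put, taking care to drop the unwanted elements by selecting only the indices [0,3,5,7].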
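For readers without Raku installed, the same fixed-size batching idea can be sketched with awk. This is not part of the answer: batch_blocks is a hypothetical name, and the sketch assumes, like the Raku one-liners, that every block occupies exactly the given number of lines (4 in the sample: one label line plus three rows).

```shell
# Hypothetical helper mirroring the skip/batch approach described above.
#   $1 = lines per block, label line included (defaults to 4)
batch_blocks() {
    awk -v size="${1:-4}" '
        NR <= 3 { next }                    # skip the three header lines
        {
            pos = (NR - 4) % size           # 0 on the label line of a block
            # Label line: print the timestep. Row line: print the last field.
            printf "%s%s", (pos ? (" " $NF) : $1), (pos == size - 1 ? "\n" : "")
        }
        # If the file ends in the middle of a block, still end the line.
        END { if (NR > 3 && pos != size - 1) print "" }
    '
}
```

Usage: batch_blocks 4 < file. Like the first two Raku commands, this prints only the data lines and leaves the headers out.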

Sample Input:

# Time-averaged data for fix avetimeall
# TimeStep Number-of-rows
# Row c_gyrationchunkall
1000 3
1 2.09024e-14
2 4.88628
3 5.69321
2000 3
1 2.10518e-14
2 8.33702
3 8.83162
3000 3
1 1.96656e-14
2 12.1396
3 11.5835

Sample Output:

1000 2.09024e-14 4.88628 5.69321
2000 2.10518e-14 8.33702 8.83162
3000 1.96656e-14 12.1396 11.5835

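As for the header lines, it might be easiest to start the Raku code with two put statements, e.g. put "Time-averaged data..."; and so on. In practice, though, the following produces the OP's desired output: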

~$ raku -e 'lines[0].put; .words[0..1, *-1].put for lines[0..1].rotor(2);  \
            my @a = lines.rotor(4, partial => True).map: *.words;          \
            .[0,3,5,7].put for @a;'  file
# Time-averaged data for fix avetimeall
# TimeStep c_gyrationchunkall
1000 2.09024e-14 4.88628 5.69321
2000 2.10518e-14 8.33702 8.83162
3000 1.96656e-14 12.1396 11.5835

https://raku.org

Prabhjot Singh · Answer 3 · 2023-12-03T19:31:31+08:00


Using AWK:

$ awk '
    NR==2{sub(/[[:space:]]+[^[:space:]]+$/,"");rec = $0; next}
    NR==3{$0 = rec OFS $NF};
    NR<4;                                
    NR>3{printf "%s", (NR%4==0) ? ((NR==4) ? "" : ORS) $1 : ($1="")$0 }
   END{if (NR)print ""}'

$ awk '
   NR==2{sub(/[[:space:]]+[^[:space:]]+$/,"");rec = $0; next}
   NR==3{$0 = rec OFS $NF};
   NR<4;
   $NF ~ /^[0-9]+$/{a=$NF;n=NR+a; sub(/[[:space:]]+[^[:space:]]+$/,""); printf "%s", $0; next}                    
   NR<=n{$1 =""; printf "%s", $0((NR==n) ? ORS : "") }'


Xavier G. · Answer 4 · 2023-12-03T12:22:38+08:00


A quick and dirty answer; feel free to run shellcheck on it:

#!/usr/bin/env bash

# Reformat comments:
read -r first_line
echo "${first_line}"
read -r sharp line2_word1 line2_word2
read -r sharp line3_word1 line3_word2
echo "# ${line2_word1} ${line3_word2}"

# Reformat data:
while read -r timestep number_of_rows; do
    echo -n "${timestep}"
    for (( i=1; i<=number_of_rows; i++ )); do
        read -r row value
        echo -n " ${value}"
    done
    echo
done

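Usage: ./script_filename < input

Limitations:

The script assumes the data rows are in order (i.e. 1, 2, 3, as in the example).
The script does not handle broken data (e.g. 3 rows of data announced but only 1 provided).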
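Both limitations can at least be detected, if not repaired. The following is a minimal sketch of one way to do that, not part of the original answer: reformat_checked is a hypothetical name, it uses only POSIX shell constructs so it also runs under Bash, and it assumes the announced row count is a plain integer. It stops with an error as soon as a row is missing or its index is not the expected one.

```shell
# Hypothetical variant of the script above that validates each block.
# The function body is a subshell, so its variables stay local and
# "exit" only leaves the function.
reformat_checked() (
    # Reformat comments: pass line 1 through, merge lines 2 and 3.
    IFS= read -r first_line || exit 1
    printf '%s\n' "$first_line"
    read -r sharp col_a rest || exit 1
    read -r sharp word col_b || exit 1
    printf '# %s %s\n' "$col_a" "$col_b"

    # Reformat data, checking every announced row.
    while read -r timestep number_of_rows; do
        [ -n "$timestep" ] || continue      # ignore blank lines
        line=$timestep
        i=1
        while [ "$i" -le "$number_of_rows" ]; do
            # Fail if the input ends early or the row index is not i.
            if ! read -r row value || [ "$row" != "$i" ]; then
                printf 'error: block %s: expected row %s of %s\n' \
                    "$timestep" "$i" "$number_of_rows" >&2
                exit 1
            fi
            line="$line $value"
            i=$((i + 1))
        done
        printf '%s\n' "$line"
    done
)
```

Usage is the same as before once the function is defined: reformat_checked < input. A block is only printed after all of its rows have been read, so a truncated block never produces a partial output line.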


elmo · Answer 5 · 2023-12-04T05:53:53+08:00


Given the caveats mentioned in the question, and using the sample input as file q762948, you can do this with a simple awk command:

$ head -2 q762948 >result.txt
# dump the comments as required
$ tail -n +4 q762948 | awk '{c=(NR-1)%4} c==0{p=$1;print ""} c>0{p=$2}{printf "%s  ", p}' >>result.txt
$ cat result.txt

# Time-averaged data for fix avetimeall
# TimeStep Number-of-rows

1000  2.09024e-14  4.88628  5.69321  
2000  2.10518e-14  8.33702  8.83162  
3000  1.96656e-14  12.1396  11.5835

