Question

Chris

Asked: 2020-02-06 07:19:39 +0800 CST

SSV/CSV manipulation: computing ratios

Note that I am aware of datamash and am an experienced awk user. I am looking for an alternative to awk. Suppose I have the following:

// data_file
foo bar biz
10  100 1000
11  150 990
10  95  1010
9   99  950

// usage goal, in pseudo code
cat data_file | <tool> --ratio foo,bar --ratio foo,biz --ratio bar,biz

// desired output
foo bar biz foo_bar foo_biz bar_biz
10  100 1000 0.1    0.01    0.1  
11  150 990  0.073  0.011   0.1515
10  95  1010 0.105  0.0099  0.094
9   99  950  0.09   0.0095  0.1042

To get this interface, I would build something trivial in C++.

Before I do that, is there an existing solution in Unix?

2 Answers

aborruso · Answer 1 · 2020-02-06T23:54:41+08:00

Use Miller (https://github.com/johnkerl/miller) and run

mlr --pprint put '$foo_bar=$foo/$bar;$foo_biz=$foo/$biz;$bar_biz=$bar/$biz' input >output

which gives you

foo bar biz  foo_bar  foo_biz  bar_biz
10  100 1000 0.100000 0.010000 0.100000
11  150 990  0.073333 0.011111 0.151515
10  95  1010 0.105263 0.009901 0.094059
9   99  950  0.090909 0.009474 0.104211
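
If you want something closer to the --ratio interface from the question, the put expression can be generated from the column pairs. A minimal sketch, assuming a POSIX shell; ratio_expr is a hypothetical helper name, not part of Miller, and it only builds the expression string:

```shell
# ratio_expr: hypothetical helper, not part of Miller.
# Turns arguments such as "foo,bar" into Miller put statements
# of the form $foo_bar=$foo/$bar, joined with ";".
ratio_expr() {
  out=''
  for pair in "$@"; do
    num=${pair%%,*}   # text before the first comma
    den=${pair#*,}    # text after the first comma
    # ${out:+;} adds a separator only when out is already non-empty
    out="${out}${out:+;}\$${num}_${den}=\$${num}/\$${den}"
  done
  printf '%s\n' "$out"
}
```

It could then be used as: mlr --pprint put "$(ratio_expr foo,bar foo,biz bar,biz)" input >output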

Thor · Answer 2 · 2020-02-06T08:50:21+08:00

If you have a file to work from, you can get there quite simply with a couple of bash functions plus paste, bc, and csvtool:

div() {
  printf "%1.4f\n" $(bc -l <<<"1.0 * $1 / $2")
}
export -f div

ratio() {
  echo "$1"_"$2"
  csvtool -t ' ' namedcol $1,$2 data.ssv |
  tail -n+2                              |
  csvtool call div -
}

paste -d ' ' <(cat data.ssv) <(ratio foo bar) <(ratio foo biz) <(ratio bar biz) |
csvtool -t ' ' readable -

Output:

foo bar biz  foo_bar foo_biz bar_biz 
10  100 1000 0.1000  0.0100  0.1000  
11  150 990  0.0733  0.0111  0.1515  
10  95  1010 0.1053  0.0099  0.0941  
9   99  950  0.0909  0.0095  0.1042

If you really want to do this in a streaming fashion, your best bet is probably awk, for example:

parse.awk

# Parse the requested column ratios into dividend[] and divisor[]
# by column name
BEGIN {
  # Iterate by index: "for (r in ratios)" visits keys in unspecified order
  n = split(ratios_str, ratios, / +/)
  for(r = 1; r <= n; r++) {
    split(ratios[r], cols, /,/)
    dividend[r] = cols[1]
    divisor[r]  = cols[2]
  }
}

# Sort out the header
NR == 1 { 
  # Create the ColumnName-to-ColumnNumber hash
  split($0, a); for(k in a) c2n[a[k]]=k

  # Print the header line
  printf "%s ", $0
  for(i=1; i<=length(dividend); i++)
    printf "%s_%s ", dividend[i], divisor[i]
  printf "\n"
}

NR > 1 {
  printf "%s ", $0
  for(i=1; i<=length(dividend); i++)
    printf "%1.4f ", $(c2n[dividend[i]]) / $(c2n[divisor[i]])
  printf "\n"
}

Run it like this:

<data.ssv awk -f parse.awk -v ratios_str='foo,bar foo,biz bar,biz' | column -t

Output:

foo  bar  biz   foo_bar  foo_biz  bar_biz
10   100  1000  0.1000   0.0100   0.1000
11   150  990   0.0733   0.0111   0.1515
10   95   1010  0.1053   0.0099   0.0941
9    99   950   0.0909   0.0095   0.1042
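
The same streaming idea can be wrapped in a shell function to approximate the --ratio interface from the question. This is only a sketch: ratios is a hypothetical name, it assumes whitespace-separated input with a header line on stdin, and it does not check for unknown column names or zero divisors.

```shell
# ratios: hypothetical wrapper, a sketch of the --ratio interface.
# Reads whitespace-separated data with a header line on stdin.
ratios() {
  spec=''
  while [ "$#" -gt 0 ]; do
    case $1 in
      --ratio)
        [ "$#" -ge 2 ] || { echo "ratios: --ratio needs A,B" >&2; return 2; }
        spec="${spec}${spec:+ }$2"   # collect pairs, space-separated
        shift 2 ;;
      *)
        echo "usage: ratios --ratio A,B [--ratio C,D ...]" >&2
        return 2 ;;
    esac
  done
  awk -v spec="$spec" '
    BEGIN {
      # Split "foo,bar foo,biz" into numerator/denominator column names
      n = split(spec, pairs, " ")
      for (i = 1; i <= n; i++) {
        split(pairs[i], c, ",")
        num[i] = c[1]; den[i] = c[2]
      }
    }
    NR == 1 {
      # Map column names to field numbers, then extend the header
      for (k = 1; k <= NF; k++) col[$k] = k
      line = $0
      for (i = 1; i <= n; i++) line = line " " num[i] "_" den[i]
      print line
      next
    }
    {
      # Append one ratio per requested pair to each data row
      line = $0
      for (i = 1; i <= n; i++)
        line = line " " sprintf("%1.4f", $(col[num[i]]) / $(col[den[i]]))
      print line
    }'
}
```

With the function defined, the call would look like: ratios --ratio foo,bar --ratio foo,biz --ratio bar,biz <data.ssv | column -t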
