Question

BDN

Asked: 2018-05-26 08:02:05 +0800 CST

Parsing text from columns

2018-05-24 23:57:30 1.1.1.1 8.8.4.4
2018-05-24 23:57:32 2.2.2.2 8.8.4.4
2018-05-24 23:58:12 8.8.8.8 8.8.4.4
2018-05-24 23:58:23 8.8.8.8 8.8.4.4
2018-05-24 23:59:40 8.8.8.8 8.8.4.4
2018-05-24 23:59:51 8.8.8.8 8.8.4.4

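I have a log file in the format above. I need to parse it so that the output looks like the following: when rows are duplicates, judged by comparing columns 3 and 4, only the first and the last of them should be displayed.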

2018-05-24 23:57:30 1.1.1.1 8.8.4.4
2018-05-24 23:57:32 2.2.2.2 8.8.4.4
2018-05-24 23:58:12 8.8.8.8 8.8.4.4
2018-05-24 23:59:51 8.8.8.8 8.8.4.4

6 Answers

choroba · Answer 1 · 2018-05-26T08:14:59+08:00

Perl to the rescue:

perl -ane '
    if ($F[2] ne $c3 || $F[3] ne $c4) {
        $printed or print $previous;
        $printed = print;
    } else {
        $printed = 0;
    }
    ($c3, $c4, $previous) = (@F[2, 3], $_);
    END { print $previous unless $printed }
' -- input.file

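-n reads the input line by line and runs the code for each line.
-a splits each input line on whitespace into the @F array.
$c3 and $c4 keep the previous values of columns 3 and 4; the current values are in $F[2] and $F[3] (arrays are indexed from 0).
$previous stores the previous line in case we need to print it.
$printed just prevents printing the last line twice (which would happen if its columns 3 and 4 differed from those of the previous line).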
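The same bookkeeping can be sketched in awk for readers who prefer it to Perl. This is only an illustration under assumptions: first_last_runs is a hypothetical helper name, and the variables prev_key, previous and printed mirror the roles of $c3/$c4, $previous and $printed described above.

```shell
#!/bin/sh
# Sketch only: first_last_runs is a hypothetical helper, not part of the answer.
# It prints the first and last line of every run of consecutive lines
# that share the same 3rd and 4th whitespace-separated fields.
first_last_runs() {
    awk '
    {
        key = $3 FS $4                      # string key built from columns 3 and 4
        if (NR == 1 || key != prev_key) {   # a new run starts on this line
            if (NR > 1 && !printed)
                print previous              # close the previous run with its last line
            print                           # first line of the new run
            printed = 1
        } else {
            printed = 0                     # inside a run: the line is only remembered
        }
        prev_key = key
        previous = $0
    }
    END { if (NR > 0 && !printed) print previous }
    ' "$@"
}

# Usage with the sample data from the question (reads stdin when no file is given).
first_last_runs <<'EOF'
2018-05-24 23:57:30 1.1.1.1 8.8.4.4
2018-05-24 23:57:32 2.2.2.2 8.8.4.4
2018-05-24 23:58:12 8.8.8.8 8.8.4.4
2018-05-24 23:58:23 8.8.8.8 8.8.4.4
2018-05-24 23:59:40 8.8.8.8 8.8.4.4
2018-05-24 23:59:51 8.8.8.8 8.8.4.4
EOF
```

Building the key by concatenation forces a string comparison, so values that merely look numeric are not compared as numbers.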

αғsнιη · Answer 2 · 2018-05-26T08:17:42+08:00

Best Answer

With awk:

awk '!first[$3, $4]{ first[$3, $4]= $0 } { last[$3, $4]= $0 }
    END{ for (x in last) print first[x] (last[x] != first[x]? ORS last[x]:"") }' infile

Output:

2018-05-24 23:58:12 8.8.8.8 8.8.4.4
2018-05-24 23:59:51 8.8.8.8 8.8.4.4
2018-05-24 23:57:30 1.1.1.1 8.8.4.4
2018-05-24 23:57:32 2.2.2.2 8.8.4.4

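The associative array first keeps the first line seen for each key, where the key is the combination of column 3 and column 4, while the array last is overwritten with the latest line for the same key every time.

After all lines have been read, the values in first are the lines seen first for each distinct combination of columns 3 and 4, and the values in last are the lines seen last.

Then, in the END block, the value saved in first is printed, followed by the one in last. The expression (last[x] != first[x]? ORS last[x]:"") prevents printing a line twice when it is the only line with its combination of columns 3 and 4.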
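Two properties of this approach are worth keeping in mind: awk does not define the iteration order of for (x in last), which is why the output above is not in input order, and lines are grouped by key across the whole file rather than by consecutive runs. A minimal sketch that keeps the order in which keys first appear, assuming a hypothetical helper name first_last_by_key and an extra order array that is not in the answer:

```shell
#!/bin/sh
# Sketch only: first_last_by_key and the order[] array are illustrative additions.
# Lines are still grouped by key over the whole input, as in the answer,
# but groups are printed in the order in which their keys first appear.
first_last_by_key() {
    awk '
    { key = $3 SUBSEP $4 }                                   # same key as first[$3, $4]
    !(key in first) { first[key] = $0; order[++n] = key }    # remember first line and position
    { last[key] = $0 }                                       # overwritten on every occurrence
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            print first[k]
            # Appending "" forces a string comparison of the two lines.
            if ((last[k] "") != (first[k] "")) print last[k]
        }
    }
    ' "$@"
}

# Usage with the sample data from the question.
first_last_by_key <<'EOF'
2018-05-24 23:57:30 1.1.1.1 8.8.4.4
2018-05-24 23:57:32 2.2.2.2 8.8.4.4
2018-05-24 23:58:12 8.8.8.8 8.8.4.4
2018-05-24 23:58:23 8.8.8.8 8.8.4.4
2018-05-24 23:59:40 8.8.8.8 8.8.4.4
2018-05-24 23:59:51 8.8.8.8 8.8.4.4
EOF
```

If the same pair of values in columns 3 and 4 comes back later in the file after other rows, this sketch (like the answer) treats all of those rows as one group.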

cheft · Answer 3 · 2018-05-26T12:31:52+08:00

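In this case, you could also just use uniq to keep unique lines, comparing only columns 3 and 4, and then append the last line. However, this may duplicate the last line if its columns 3 and 4 differ from those of the line before it (so that uniq has already printed it).

Just add another pipe to uniq to remove that duplicate when needed.

{ uniq -f2 your_file; tail -n 1 your_file; } | uniq

Here -f2 skips the first 2 blank-separated fields.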
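A short check suggests a limit of this pipeline: uniq -f2 keeps only the first line of each run, and tail adds only the last line of the file, so the closing line of a repeated run that is not at the end of the file is dropped. The sketch below assumes a hypothetical wrapper name uniq_plus_tail and made-up three-line sample data.

```shell
#!/bin/sh
# Sketch only: uniq_plus_tail is a hypothetical wrapper around the pipeline above.
uniq_plus_tail() {
    { uniq -f2 "$1"; tail -n 1 "$1"; } | uniq
}

# Made-up sample: the repeated run (A B) is not the last run in the file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
t1 x A B
t2 x A B
t3 x C D
EOF

uniq_plus_tail "$sample"    # the closing line of the A B run (t2) is not printed
rm -f "$sample"
```

For the sample in the question the result matches the requested output, because the only repeated run there is the last one.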

Rakesh Sharma · Answer 4 · 2018-05-27T02:47:59+08:00

 perl -lane '
   *x = sub { print for splice @A; } if $. == 1;
   x() if $. > 1 and $F[2] ne $c3 || $F[3] ne $c4;
   ($c3, $c4, $A[!!@A]) = (@F[2,3], $_);
   x() if eof;
 '    include.txt

§ How it works.

    °  Array @A holds only 2 elements max at any time. The beginning and end lines for the range.

   °  subroutine &x displays the array @A and after displaying empties it as well.

  °  display the previous range provided we are not at the first line and either of the previous columns don't match with the current.

  °   update the previous columns and array.
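The two-element buffer described in these bullets can be sketched in awk as well. The names first_last_buffer, emit, slot1 and slot2 are hypothetical stand-ins for the Perl array @A and the subroutine &x; this is an illustration of the idea, not a translation the answer provides.

```shell
#!/bin/sh
# Sketch only: first_last_buffer, emit, slot1 and slot2 are illustrative names.
# slot1 holds the first line of the current range, slot2 the latest later line.
first_last_buffer() {
    awk '
    # Print the buffered range (one or two lines) and empty the buffer.
    function emit() {
        if (n >= 1) print slot1
        if (n == 2) print slot2
        n = 0
    }
    { key = $3 FS $4 }                        # string key from columns 3 and 4
    NR > 1 && key != prev_key { emit() }      # columns changed: show the previous range
    {
        if (n == 0) { slot1 = $0; n = 1 }     # first line of a new range
        else        { slot2 = $0; n = 2 }     # overwrite the end of the range
        prev_key = key
    }
    END { emit() }                            # show the final range
    ' "$@"
}

# Usage with the sample data from the question.
first_last_buffer <<'EOF'
2018-05-24 23:57:30 1.1.1.1 8.8.4.4
2018-05-24 23:57:32 2.2.2.2 8.8.4.4
2018-05-24 23:58:12 8.8.8.8 8.8.4.4
2018-05-24 23:58:23 8.8.8.8 8.8.4.4
2018-05-24 23:59:40 8.8.8.8 8.8.4.4
2018-05-24 23:59:51 8.8.8.8 8.8.4.4
EOF
```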

MiniMax · Answer 5 · 2018-05-27T04:50:23+08:00

First variant

paste -d'\n' <(uniq -f2 input.txt) <(tac input.txt | uniq -f2 | tac) | uniq

Second variant

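awk '
{ key = $3 FS $4 }          # separator avoids ambiguous concatenation of the two columns
key == prev {
    buf = $0 ORS            # remember the latest line of the current run
}
key != prev {
    print buf $0            # last line of the previous run (if any), then this line
    prev = key
    buf = ""
}
END {
    printf("%s", buf)       # flush the last line of the final run
}' input.txt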

Test

Input (a more complex test)

2018-05-24 23:57:30 1.1.1.1 8.8.4.4
2018-05-24 23:57:32 2.2.2.2 8.8.4.4
2018-05-24 23:58:12 8.8.8.8 8.8.4.4
2018-05-24 23:58:23 8.8.8.8 8.8.4.4
2018-05-24 23:59:40 8.8.8.8 8.8.4.4
2018-05-24 23:59:51 8.8.8.8 8.8.4.4
2018-05-25 00:18:12 8.8.1.8 8.8.4.4
2018-05-25 00:18:23 8.8.1.8 8.8.4.4
2018-05-25 00:19:40 8.8.1.8 8.8.4.4
2018-05-25 00:19:51 8.8.1.8 8.8.4.4
2018-05-25 00:39:51 8.8.2.8 8.8.4.4
2018-05-25 00:49:52 8.8.2.8 8.8.4.4
2018-05-25 00:59:51 8.8.2.8 8.8.4.4

Output (both variants)

2018-05-24 23:57:30 1.1.1.1 8.8.4.4
2018-05-24 23:57:32 2.2.2.2 8.8.4.4
2018-05-24 23:58:12 8.8.8.8 8.8.4.4
2018-05-24 23:59:51 8.8.8.8 8.8.4.4
2018-05-25 00:18:12 8.8.1.8 8.8.4.4
2018-05-25 00:19:51 8.8.1.8 8.8.4.4
2018-05-25 00:39:51 8.8.2.8 8.8.4.4
2018-05-25 00:59:51 8.8.2.8 8.8.4.4
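The first variant uses process substitution, <(...), which is available in bash, ksh and zsh but not in plain sh. A sketch of the same idea with temporary files, assuming a hypothetical helper name first_last_paste and that tac (GNU coreutils) is installed:

```shell
#!/bin/sh
# Sketch only: first_last_paste is a hypothetical helper name.
# It interleaves the first line of each run with the last line of each run,
# then lets uniq drop the duplicate when a run has a single line.
first_last_paste() {
    firsts=$(mktemp)
    lasts=$(mktemp)
    uniq -f2 "$1" > "$firsts"                 # first line of every run
    tac "$1" | uniq -f2 | tac > "$lasts"      # last line of every run
    paste -d '\n' "$firsts" "$lasts" | uniq   # interleave, collapse one-line runs
    rm -f "$firsts" "$lasts"
}

# Usage with the sample data from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2018-05-24 23:57:30 1.1.1.1 8.8.4.4
2018-05-24 23:57:32 2.2.2.2 8.8.4.4
2018-05-24 23:58:12 8.8.8.8 8.8.4.4
2018-05-24 23:58:23 8.8.8.8 8.8.4.4
2018-05-24 23:59:40 8.8.8.8 8.8.4.4
2018-05-24 23:59:51 8.8.8.8 8.8.4.4
EOF
first_last_paste "$sample"
rm -f "$sample"
```

Both temporary files contain one line per run, so paste pairs each first line with the matching last line.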

Rakesh Sharma · Answer 6 · 2018-05-27T19:21:22+08:00

 perl -lane '
   *x = sub { print for splice @A; } if $. == 1;
   x() if $. > 1 and $F[2] ne $c3 || $F[3] ne $c4;
   ($c3, $c4, $A[!!@A]) = (@F[2,3], $_);
   x() if eof;
 '    include.txt

§ How it works.

    °  Array @A holds only 2 elements max at any time. The beginning and end lines for the range.

   °  subroutine &x displays the array @A and after displaying empties it as well.

  °  display the previous range provided we are not at the first line and either of the previous columns don't match with the current.

  °   update the previous columns and array.

¶ Another method, using the sed editor, is described below.

 #! /bin/sh
  # declare regex assist variables
   b='[:space:]'
   s="[$b]"         # \s
   S="[^$b]"       # \S

   #      \S+                \s+
   F="$S$S*"   sp="$s$s*"
   F_s="$F$sp"      #  \S+\s+

   # composition of a line 
   L="$F_s$F_s\($F\)$sp\($F\)"

   #  matching next line
   M=".*$s\1$sp\2"

   #    2 lines when they match with 3,4 fields
   L2="$L\(\\n$M\)\{1\}"

   # 3 lines when they match in fields 3,4
   L3="$L\(\\n$M\)\{2\}"

  #### code 
  sed -e '
       #  bring on board next line for interrogation 
        N

         #   2 lines: fields 3,4 do not match
          #  display the first line... redo code with remaining 
         '"/^$L2\$/"'!{
                  P;D
           }

            #  3 lines with first two match but third not match in fields 3,4
           :a;h;N
           '"/^$L3\$/"'!{
                 x;p;g
                 s/.*\(\n\)/\1/;D
              }

              s/\n.*\(\n\)/\1/;ba
      '   include.txt
