Question

Asked: 2024-01-04 14:47:01 +0800 CST

How to print the longest sequence of lines featuring numbers smaller than a threshold?

I am learning Perl, but I don't know how to solve this problem.

I have a .txt file of the following form:

1 16.3346384
2 11.43483
3 1.19819
4 1.1113829
5 1.0953443
6 1.9458343
7 1.345645
8 1.3847385794
9 1.3534344
10 2.1117454
11 1.17465
12 1.4587485

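The first column only contains line numbers, which are not of interest here, but it is present in the file; the values in the second column are the relevant part.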

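I want to output the longest consecutive sequence of lines whose number in the second column is smaller than 2.00. For the example above, this would be lines 3 to 9, and the output should be: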

1.19819
1.1113829
1.0953443
1.9458343
1.345645
1.3847385794
1.3534344

7 Answers

aviro · Answer 1 · 2024-01-04T20:14:24+08:00

Perl one-liner:

perl -ne '$n = (split)[1]; if ($n >= 2) {if ($i > $max) {$longest=$cur; $max=$i}; $cur=""; $i=0} else {$cur .= $n . "\n"; $i++} END {print $i > $max ? $cur : $longest}' < file.txt

Split over multiple lines for better readability:

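perl -ne '
  $n = (split)[1];
  if ($n >= 2) {
    # a value at or above the threshold ends the current run
    if ($i > $max) {
      $longest=$cur;
      $max=$i;
    }
    # reset the run buffer whether or not it was the longest so far
    $cur="";
    $i=0;
  } else {
    $cur.= $n . "\n";
    $i++
  }
  END {
    print $i > $max ? $cur : $longest
  }' < file.txt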

awk one-liner:

awk '$2 >= 2 {if (i > max) {res=cur; max=i} cur=""; i=0} $2 < 2 {cur = cur $2 "\n"; i++} END {if (i > max) res=cur; printf "%s", res}' file.txt

Multi-line:

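awk '
  $2 >= 2 {
    # a value at or above the threshold ends the current run
    if (i > max) {
      res=cur
      max=i
    }
    # reset the run buffer whether or not it was the longest so far
    cur=""
    i=0
  }
  $2 < 2 {
    cur = cur $2 "\n"
    i++
  }
  END {
    if (i > max) res=cur
    printf "%s", res
  }' file.txt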
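
A quick way to sanity-check a longest-run one-liner is to feed it input where a short run sits between two longer ones, because that is where a run buffer that is not cleared at every run boundary leaks lines into the next run. The sketch below wraps the same awk logic in a shell function and passes the threshold in with -v; the function name longest_run and the limit variable are illustrative assumptions, not part of the original answer.

```shell
# longest_run LIMIT: hypothetical wrapper around the awk logic above.
# Reads "lineno value" lines on stdin and prints the values of the longest
# consecutive run with value < LIMIT (the first such run wins on ties).
longest_run() {
    awk -v limit="$1" '
        BEGIN { limit += 0 }            # force a numeric comparison with $2
        $2 >= limit {                   # value at or above the limit ends the run
            if (i > max) { res = cur; max = i }
            cur = ""; i = 0             # always reset the run buffer
            next
        }
        { cur = cur $2 "\n"; i++ }      # value below the limit extends the run
        END {
            if (i > max) res = cur      # the last run may end at end of file
            printf "%s", res
        }'
}

# Runs of length 2, 1 and 3: only the 3-line run should be printed.
printf '1 1.1\n2 1.2\n3 5\n4 1.3\n5 5\n6 1.4\n7 1.5\n8 1.6\n9 5\n' | longest_run 2
```

With this input the function prints 1.4, 1.5 and 1.6; if the buffer were only cleared when a new maximum is found, the stray 1.3 from the middle run would be glued onto that result.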

AdminBee · Answer 2 · 2024-01-04T19:44:53+08:00

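This is not a trivial task. There is also some debate about whether providing a finished program helps others learn to solve problems in a programming language, but I believe it has its merits, so I propose the following program (let's call it findlongestsequence.pl):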

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

my $limit; my $infile;
GetOptions( 'limit=f' => \$limit, 'infile=s' => \$infile )
    or die "Usage: $0 --infile FILE --limit NUMBER\n";
die "Usage: $0 --infile FILE --limit NUMBER\n"
    unless defined $limit && defined $infile;

my $lineno=0; my $groupstart;
my $currlength=0; my $maxlength=0; my $ingroup=0;
my @columns; my @groupbuf; my @longestgroup;

open(my $fileinput, '<', $infile) or die "Cannot open $infile: $!\n";
while (<$fileinput>)
{
    $lineno++;
    @columns = split(/\s+/,$_);

    # First value below the limit: a new group starts here
    if ( $ingroup == 0 && $columns[1]<$limit )
    {
        $ingroup=1;
        $groupstart=$lineno;
        @groupbuf=();
    }

    if ( $ingroup == 1 )
    {
        # Value >= limit: the group ended on the previous line
        if ($columns[1]>=$limit )
        {
            $ingroup=0;
            $currlength=$lineno-$groupstart;

            if ( $currlength>$maxlength )
            {
                $maxlength=$currlength;
                @longestgroup=@groupbuf;
            }
        }
        else
        {
            push(@groupbuf,$columns[1]);
        }
    }
}
close($fileinput);

# A group that runs until end of file includes the last line read
if ( $ingroup == 1 )
{
    $currlength=$lineno-$groupstart+1;
    if ( $currlength>$maxlength )
    {
        $maxlength=$currlength;
        @longestgroup=@groupbuf;
    }
}

print join("\n",@longestgroup),"\n";
exit 0;

You can call the program as

./findlongestsequence.pl --infile input.txt --limit 2.0

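This will first interpret the command-line arguments using Getopt::Long.

It will then open the file and read it line by line, counting the lines in $lineno. Each line is split into columns at whitespace.

If the program is not inside a group of lines with values < $limit ($ingroup is zero) but encounters a suitable line, it records that it is now in such a group ($ingroup is set to 1), stores the group start in $groupstart, and starts buffering the column 2 values in the array @groupbuf.
If the program is inside such a group but the current value is greater than or equal to $limit, it recognizes the end of the group and calculates its length. If this is longer than the previously recorded longest group, the contents (@groupbuf) and length ($currlength) of the new longest group are copied to @longestgroup and $maxlength, respectively.

Since a group may be terminated by the end of the file rather than by a line with a value >= $limit, this check is also performed if $ingroup is still true at end of file.

Finally, the contents of @longestgroup are printed with \n as the separator.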
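
To illustrate that the state machine described above (an in-group flag, a start line number, a value buffer, and an extra check at end of file) does not depend on Perl, here is a rough awk translation of it. This is a sketch, not part of the original answer; the function name findlongest is hypothetical, and awk's NR stands in for $lineno. Note that in the end-of-file case the last line read still belongs to the group, so its length is NR - groupstart + 1.

```shell
# findlongest LIMIT FILE: hypothetical awk translation of the Perl program above.
findlongest() {
    awk -v limit="$1" '
        BEGIN { limit += 0 }                   # numeric limit
        {
            if (!ingroup && $2 < limit) {      # first value below the limit: a group starts
                ingroup = 1; groupstart = NR; n = 0
            }
            if (ingroup) {
                if ($2 >= limit) {             # value >= limit: the group ended on the previous line
                    ingroup = 0
                    currlength = NR - groupstart
                    if (currlength > maxlength) {
                        maxlength = currlength
                        for (k = 1; k <= n; k++) longest[k] = buf[k]
                    }
                } else {
                    buf[++n] = $2              # buffer the column-2 value
                }
            }
        }
        END {
            if (ingroup) {                     # group ended by end of file: NR is its last line
                currlength = NR - groupstart + 1
                if (currlength > maxlength) {
                    maxlength = currlength
                    for (k = 1; k <= n; k++) longest[k] = buf[k]
                }
            }
            for (k = 1; k <= maxlength; k++) print longest[k]
        }' "$2"    # "$2" here is the shell argument FILE, not an awk field
}
```

It would be called like the Perl program, e.g. findlongest 2.0 input.txt. Copying the buffer into a separate longest array mirrors @longestgroup=@groupbuf; only the first maxlength entries are printed, so leftovers from earlier, longer copies cannot leak into the output.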

Ed Morton · Answer 3 · 2024-01-05T22:53:17+08:00

Using any awk:

$ cat tst.awk
$2 >= 2 {
    max = getMax(cur,max)
    cur = ""
    next
}
{ cur = cur $2 ORS }
END {
    printf "%s", getMax(cur,max)
}
function getMax(a,b) {
    return ( gsub(ORS,"&",a) > gsub(ORS,"&",b) ? a : b )
}

$ awk -f tst.awk file
1.19819
1.1113829
1.0953443
1.9458343
1.345645
1.3847385794
1.3534344


Stéphane Chazelas · Answer 4 · 2024-01-04T17:42:59+08:00

Maybe something like:

<input perl -snle '
  if ($_ < $limit) {
    $n++;
  } else {
    $max = $n if $n > $max;
    $n = 0;
  }
  END {
    print ($n > $max ? $n : $max);
  }' -- -limit=2 -max=0

Or, if instead of the number of lines in the largest group you want to see the lines themselves, as per the later edit to the question:

<input perl -snle '
  if ($_ < $limit) {
    push @lines, $_;
  } else {
    @max = @lines if @lines > @max;
    @lines = ();
  }
  END {
    print for @lines > @max ? @lines : @max;
  }' -- -limit=2

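If, as someone edited into your question, the line numbers are part of the data, add the -a option (awk mode, where each record is split into the @F array) and replace $_ (the whole record) with $F[1] (the second field; $F[0] is the first field).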

Simon Branch · Answer 5 · 2024-01-05T01:20:49+08:00

An idiomatic solution using <> for reading input and the flip-flop operator.

#!/usr/bin/env perl
use strict;
use warnings;
# https://unix.stackexchange.com/questions/766081/how-to-print-the-longest-sequence-of-lines-featuring-numbers-smaller-than-a-thre
my $threshold = 2.00;
my ($section, $maxsection, $len, $maxlen);
my $flipflop;
while (<>) {
    # Remove leading line number
    s/^(\d+)\s+//;
    # Flip flop operator
    # https://www.effectiveperlprogramming.com/2010/11/make-exclusive-flip-flop-operators/
    if ($flipflop = $_ < $threshold .. $_ >= $threshold) {
        if ($flipflop =~ /E0$/) {
            # End of section
            if (!defined($maxlen) || $len > $maxlen) {
                $maxsection = $section;
                $maxlen = $len;
            }
            $len = 0;
            $section = "";
        } else {
            $len++;
            $section .= $_;
        }
    }
}
# One last possible end of section
if ($flipflop && (!defined($maxlen) || $len > $maxlen)) {
    $maxsection = $section;
}
print $maxsection if defined $maxsection;

jubilatious1 · Answer 6 · 2024-01-05T18:27:11+08:00

Using Raku (formerly known as Perl_6)

~$ raku -ne 'BEGIN my (@max,@tmp);  $_ .= words;  \
             if .[1]  < 2 { @tmp.push: .[1] };    \
             if .[1] !< 2 { @max = @tmp if @tmp.elems > @max.elems; @tmp = Empty };  \
             END @max.elems >= @tmp.elems ?? (.put for @max) !! (.put for @tmp);'  file

Or:

~$ raku -ne 'BEGIN my (@max,@tmp);  $_ .= words;  \
             when .[1]  < 2 { @tmp.push: .[1] };  \
             default { @max = @tmp if @tmp.elems > @max.elems; @tmp = Empty };  \
             END @max.elems >= @tmp.elems ?? (.put for @max) !! (.put for @tmp);'  file

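Above are answers written in Raku, a member of the Perl family of programming languages. Raku has built-in rational numbers, which is useful if you need to maintain precision when doing simple arithmetic (e.g. say 0.1 + 0.2 - 0.3;).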

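The first answer reads lines into $_ using the non-autoprinting linewise flag -ne. The arrays @max and @tmp are declared. Each line is broken on whitespace with words, and .= saves the result back into $_. If (if statement) the second column .[1] meets the condition, the value is pushed onto the @tmp array. If not, the @tmp array overwrites the @max array (if it has more elems, i.e. elements), and in either case @tmp is then emptied (Empty). In the END block, to account for whether the final consecutive sequence is or is not the longest, Raku's Test ?? True !! False ternary operator is used to output (put) the longest array.
The second answer is similar to the first, except that a when statement is used. In Raku, once a when condition is satisfied, its associated block is executed and control passes to the outer block, skipping any subsequent when or default statements. See the references below.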

Sample input:

1 16.3346384
2 11.43483
3 1.19819
4 1.1113829
5 1.0953443
6 1.9458343
7 1.345645
8 1.3847385794
9 1.3534344
10 2.1117454
11 1.17465
12 1.4587485

Sample output:

1.19819
1.1113829
1.0953443
1.9458343
1.345645
1.3847385794
1.3534344

Note: in case of a tie, the code above outputs the first longest consecutive sequence.

https://docs.raku.org/syntax/when
https://docs.raku.org/
https://raku.org

Jas · Answer 7 · 2024-01-04T18:05:54+08:00

If you don't want to over-engineer it, try the following command line:

awk '{print $2}' yourfile.txt | sort -g > youroutput.txt

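The first command selects the second column of the file.
The second command sorts the selected column using general numeric sort (sort -g) and writes the result to the output file. For more details and tinkering, see the man pages for awk and sort.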
