Question

guest

Asked: 2022-02-19 12:02:08 +0800 CST

grep with many patterns from a file and show which pattern matched which file, without re-reading the files

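I have n separate non-fixed greps over m files. I only need to know whether there is at least 1 match in each file, but I need this for every pattern. I currently run n separate greps so that I can merge the results later, but this is very slow and some of the files are large.

Is there a way to replace these that doesn't require me to read all the files n times? It doesn't need to produce separate files, as long as I can map the patterns (not the matches) to the files that contain matches. grep -f looks promising, but it shows the files that match any pattern, not which files match each pattern.

What I have now, which later gets merged into 1 big file: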

grep -liE pattern1 file_glob* > temp_pattern1.txt && sed 's/^/escapedpattern1 /' temp_pattern1.txt
grep -liE pattern2 file_glob* > temp_pattern2.txt && sed 's/^/escapedpattern2 /' temp_pattern2.txt
...
grep -liE patternN file_glob* > temp_patternN.txt && sed 's/^/escapedpatternN /' temp_patternN.txt

temp_pattern1.txt
pattern1 /path/to/file1
pattern1 /path/to/file2
pattern1 /path/to/file3

temp_pattern2.txt
pattern2 /path/to/file1
pattern2 /path/to/file3
...
temp_patternN.txt
patternN /path/to/fileM

1 Answer

cas · Answer 1 · 2022-02-19T20:31:51+08:00

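If you want to stick with grep, the best you can do is use the -m 1 option, which makes grep stop reading the current input file at the first match. You will still read each input file multiple times (once per pattern), but it should be faster (unless the match is on or near the last line of the file).

For example: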

#!/bin/bash

# Speed up each grep by exiting on 1st match with -m 1
#
# This still reads each file multiple times, but should run faster because it
# won't read the entire file each time unless the match is on the last line.
#
# Also reduce repetitive code by using an array and a for loop iterating over
# the indices of the array, rather than the values

patterns=(pattern1 pattern2 pattern3 patternN)

# iterate over the indices of the array (with `${!`), not the values.
for p in "${!patterns[@]}"; do
  # escape backslashes, forward slashes and ampersands in the pattern so that
  # it is taken literally in the replacement part of the sed command below
  esc=$(printf '%s\n' "${patterns[$p]}" | sed -e 's:[\\/&]:\\&:g')
  grep -liE -m 1 "${patterns[$p]}" file_glob* |
    sed -e "s/^/$esc\t/" > "temp_pattern$(($p+1)).txt"
done

Note: the $p+1 is there because bash arrays are zero-based. The +1 makes the temp_pattern files start from 1.
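As a variation on the loop above, the prefixing can be done with printf instead of sed, which sidesteps the escaping step altogether. The following is only a sketch under a few assumptions: list_matches is a hypothetical helper name, the patterns are read from a file (one extended regular expression per line) rather than from the bash array, and file names are assumed not to contain newlines. It still runs one grep per pattern, so every file is still read once per pattern.

```shell
# list_matches is a hypothetical helper, not part of the original answer.
# Usage: list_matches PATTERN_FILE FILE...
# Prints "pattern<TAB>file" for every file that matches each pattern.
list_matches() {
  local pattern_file=$1 pattern file
  shift
  # Read the patterns on fd 3 so that grep never competes for stdin
  while IFS= read -r pattern <&3; do
    # -l prints only the names of matching files;
    # -e protects patterns that start with "-"
    grep -liE -e "$pattern" -- "$@" |
      while IFS= read -r file; do
        # printf takes the pattern literally, so no escaping is needed
        printf '%s\t%s\n' "$pattern" "$file"
      done
  done 3< "$pattern_file"
}
```

Because everything goes to standard output, the result can be redirected straight into one combined file, for example list_matches patterns.txt file_glob* > all_matches.txt (patterns.txt and all_matches.txt are placeholder names).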

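You can do what you want if you use a scripting language like awk or perl. For example, the perl script below reads each input file only once and checks each line against every pattern that has not yet been seen in that file. It keeps track of which patterns have already been seen in a particular file (using the array @seen), and it also notices when all available patterns have been seen in a file (also using @seen) and closes the current file in that case.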

#!/usr/bin/perl
use strict;
use warnings;

# array to hold the patterns
my @patterns = qw(pattern1 pattern2 pattern3 patternN);

# Array-of-Arrays (AoA, see man pages for perllol and perldsc)
# to hold matches
my @matches;

# Array for keeping track of whether current pattern has
# been seen already in current file
my @seen;

# read each line of each file
while(<>) {
  # check each line against all patterns that haven't been seen yet
  for my $i (keys @patterns) {
    next if $seen[$i];
    if (m/$patterns[$i]/i) {
      # add the pattern and the filename to the @matches AoA
      push @{ $matches[$i] }, "$patterns[$i]\t$ARGV";
      $seen[$i] = 1;
    }
  }

  # handle end-of-file AND having seen all patterns in a file.
  # Count the true entries in @seen rather than comparing $#seen with
  # $#patterns: $#seen only reflects the highest index set so far, so it
  # would equal $#patterns as soon as the last pattern matched on its own.
  if (eof || scalar(grep { $_ } @seen) == @patterns) {
    #print "closing $ARGV on line $.\n" unless eof;
    # close the current input file.  This will have
    # the effect of skipping to the next file.
    close(ARGV);
    # reset seen array at the end of every input file
    @seen = ();
  }
}

# now create output files
for my $i (keys @patterns) {
  #next unless @{ $matches[$i] // [] }; # skip patterns with no matches
  my $outfile = "temp_pattern" . ($i+1) . ".txt";
  open(my $out,">",$outfile) || die "Couldn't open output file '$outfile' for write: $!\n";
  # "// []" substitutes an empty list when a pattern never matched
  print $out join("\n", @{ $matches[$i] // [] }), "\n";
  close($out);
}

The if (eof || ...) line tests for eof (end of file) on the current file, or whether we have already seen all available patterns in the current file (i.e. whether the number of true entries in @seen equals the number of elements in @patterns).

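In both cases, we want to reset the @seen array to empty so that it is ready for the next input file.

In the latter case, we also want to close the current input file early: we have already seen everything we want to see in it, so there is no need to keep reading and processing the rest of the file.

BTW, if you don't want empty files to be created (i.e. when a pattern matches nothing), uncomment the next unless ... line (the one commented "skip patterns with no matches") in the output for loop.

If you don't need or want the temporary files and just want all matches written to a single file, replace the final output for loop with: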

for my $i (keys @patterns) {
  #next unless @{ $matches[$i] // [] }; # skip patterns with no matches
  # "// []" substitutes an empty list when a pattern never matched
  print join("\n", @{ $matches[$i] // [] }), "\n";
}

and redirect the output to a file.

BTW, if you want to add the line number of the first occurrence of the pattern in the file, change:

push @{ $matches[$i] }, "$patterns[$i]\t$ARGV";

to

push @{ $matches[$i] }, "$patterns[$i]\t$.\t$ARGV";

$. is a built-in perl variable that holds the current line number of the <> input. It is reset to zero whenever the current file (ARGV) is closed.
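Since awk was mentioned as an alternative, the same single-pass idea can be sketched as a shell function wrapping awk. This is a rough equivalent and not the answer's own script. It rests on several assumptions: match_patterns is a hypothetical name, the patterns come from a non-empty file with one extended regular expression per line, the awk in use supports nextfile (GNU awk does), and matching is case-sensitive, unlike grep -i (GNU awk's IGNORECASE variable could approximate that).

```shell
# match_patterns is a hypothetical helper, not part of the original answer.
# Usage: match_patterns PATTERN_FILE FILE...
# Reads each FILE once and prints "pattern<TAB>file" the first time a
# pattern matches in that file.
match_patterns() {
  local pattern_file=$1
  shift
  awk '
    # First input (the pattern file): store one regex per line in pat[1..n]
    NR == FNR { pat[++n] = $0; next }

    # First line of each data file: forget what the previous file matched
    FNR == 1 { split("", seen); count = 0 }

    {
      for (i = 1; i <= n; i++) {
        if (!(i in seen) && $0 ~ pat[i]) {
          seen[i] = 1
          print pat[i] "\t" FILENAME
          # Every pattern has been seen: skip the rest of this file
          if (++count == n) nextfile
        }
      }
    }
  ' "$pattern_file" "$@"
}
```

The output is grouped by file rather than by pattern, so it may need to be piped through sort to get the per-pattern grouping used in the question. awk's FNR holds the per-file line number, so adding it to the print statement would play the same role as $. in the perl script.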
