AskOverflow.Dev

αғsнιη
Asked: 2015-01-05 09:13:10 +0800 CST

How to get the lines in which a specific word is repeated exactly N times?

  • 772

For this given input:

How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this

I want this output:

How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

That is, get the whole lines that contain the word "this" exactly three times (case-insensitive match).

text-processing
  • 7 answers
  • 3470 Views

7 Answers

  1. Best Answer
    muru
    2015-01-05T10:13:58+08:00

    In Perl, replace "this" with itself case-insensitively and count the number of substitutions:

    $ perl -ne 's/(this)/$1/ig == 3 && print' <<EOF
    How to get This line that this word repeated 3 times in THIS line?
    But not this line which is THIS word repeated 2 times.
    And I will get This line with this here and This one
    A test line with four this and This another THIS and last this
    EOF
    How to get This line that this word repeated 3 times in THIS line?
    And I will get This line with this here and This one
    

    To use a match count instead:

    perl -ne 'my $c = () = /this/ig; $c == 3 && print'
    

    If you have GNU awk, a very simple way:

    gawk -F'this' -v IGNORECASE=1 'NF == 4'
    

    The number of fields will be one more than the number of separators.
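For comparison, the same "substitute with itself and count" idea can be sketched in Python, where re.subn returns both the new string and the number of substitutions made. This is a minimal illustration, not part of the original answer; the helper name subst_count and the hard-coded sample list are assumptions for the example.

```python
import re

def subst_count(line: str, word: str = "this") -> int:
    # Like Perl's s/(this)/$1/ig: replace every case-insensitive match
    # with itself and report how many substitutions were made.
    _, count = re.subn("(" + re.escape(word) + ")", r"\1", line, flags=re.IGNORECASE)
    return count

sample = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]

# Print only the lines with exactly three substitutions.
for line in sample:
    if subst_count(line) == 3:
        print(line)
```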

    • 13
  2. Jacob Vlijm
    2015-01-05T09:53:02+08:00

    In Python, this will do the job:

    #!/usr/bin/env python3
    
    s = """How to get This line that this word repeated 3 times in THIS line?
    But not this line which is THIS word repeated 2 times.
    And I will get This line with this here and This one
    A test line with four this and This another THIS and last this"""
    
    for line in s.splitlines():
        if line.lower().count("this") == 3:
            print(line)
    

    Output:

    How to get This line that this word repeated 3 times in THIS line?
    And I will get This line with this here and This one
    

    Or read the lines from a file given as an argument:

    #!/usr/bin/env python3
    import sys
    
    file = sys.argv[1]
    
    with open(file) as src:
        lines = [line.strip() for line in src.readlines()]
    
    for line in lines:
        if line.lower().count("this") == 3:
            print(line)
    
    • Paste the script into an empty file, save it as find_3.py, and run it with the command:

      python3 /path/to/find_3.py <file_withlines>
      

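    Of course, the word "this" can be replaced by any other word (or any other string or part of a line), and the required number of occurrences per line can be set to any other value in this line: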

        if line.lower().count("this") == 3:
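A minimal sketch of that parameterization, assuming you want the word and the count as command-line arguments. The function matching_lines, the script name find_n.py and the whole_word option are hypothetical additions, not part of the answer; with whole_word=False it counts substrings case-insensitively, which matches line.lower().count() for a lowercase word.

```python
import re
import sys

def matching_lines(lines, word="this", n=3, whole_word=False):
    """Yield lines in which `word` occurs exactly n times, ignoring case.

    whole_word=True (hypothetical extra) adds word-boundary anchors, so
    that e.g. "thistle" is not counted as an occurrence of "this".
    """
    pattern = re.escape(word)
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        # findall returns non-overlapping matches, like str.count
        if len(regex.findall(line)) == n:
            yield line.rstrip("\n")

# Usage (hypothetical script name): python3 find_n.py FILE WORD N
if __name__ == "__main__" and len(sys.argv) == 4:
    path, word, n = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(path) as src:  # iterating the file reads it line by line
        for line in matching_lines(src, word, n):
            print(line)
```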
    

    Edit

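    If the file is large (hundreds of thousands or millions of lines), the code below is faster and uses less memory, since it reads the file line by line instead of loading it all at once: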

    #!/usr/bin/env python3
    import sys
    file = sys.argv[1]
    
    with open(file) as src:
        for line in src:
            if line.lower().count("this") == 3:
                print(line.strip())
    
    • 9
  3. Sri
    2015-01-05T10:54:08+08:00

    Assuming your source file is tmp.txt:

    grep -iv '.*this.*this.*this.*this' tmp.txt | grep -i '.*this.*this.*this.*'
    

    The left grep outputs all lines in tmp.txt that do not contain 4 or more case-insensitive occurrences of "this".

    Its result is piped to the right grep, which outputs every line of that result containing "this" 3 or more times.

    Update: thanks to @Muru, here is a better version of this solution:

    grep -Eiv '(.*this){4,}' tmp.txt | grep -Ei '(.*this){3}'
    

    To match exactly n occurrences, replace 4 with n+1 and 3 with n.
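The same two-filter logic, generalized to any n, can be sketched in Python: drop lines matching (.*WORD){n+1,}, then keep lines matching (.*WORD){n}. The helper exactly_n is a hypothetical name, and the patterns use Python's re syntax rather than grep's ERE, though the quantifiers mean the same thing here.

```python
import re

def exactly_n(lines, word, n):
    """Keep lines with at least n but fewer than n+1 matches of `word`.

    Mirrors: grep -Eiv '(.*WORD){n+1,}' | grep -Ei '(.*WORD){n}'
    """
    w = re.escape(word)
    too_many = re.compile(r"(?:.*%s){%d,}" % (w, n + 1), re.IGNORECASE)
    enough = re.compile(r"(?:.*%s){%d}" % (w, n), re.IGNORECASE)
    # Stage 1 (the grep -v): drop lines with n+1 or more matches.
    survivors = [l for l in lines if not too_many.search(l)]
    # Stage 2: of those, keep lines with at least n matches.
    return [l for l in survivors if enough.search(l)]
```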

    • 9
  4. fedorqui
    2015-01-06T06:15:48+08:00

    You can play with awk for this:

    awk -F"this" 'BEGIN{IGNORECASE=1} NF==4' file
    

    This returns:

    How to get This line that this word repeated 3 times in THIS line?
    And I will get This line with this here and This one
    

    Explanation

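    • What we do is define the field separator as "this" itself. This way, each line has one more field than the number of times the word "this" occurs in it.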

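    • To make it case-insensitive, we use IGNORECASE=1 (a GNU awk extension; other awks ignore it). See the gawk manual reference: Case Sensitivity in Matching.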

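    • Then, all we need is NF==4 to get all the lines in which "this" appears exactly 3 times. No more code is needed, since {print $0} (that is, print the current line) is awk's default action when the expression evaluates to True.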
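The "fields = separators + 1" reasoning can be checked with a short Python sketch that splits each line on a case-insensitive separator, roughly what gawk does with -F'this' and IGNORECASE=1. The names field_count and keep_lines are hypothetical. One difference: awk reports NF = 0 for an empty line, while re.split returns one empty field, which only matters when looking for lines with zero occurrences.

```python
import re

def field_count(line, sep="this"):
    # Split on every case-insensitive occurrence of sep; the number of
    # resulting fields is the number of separators plus one.
    return len(re.split(re.escape(sep), line, flags=re.IGNORECASE))

def keep_lines(lines, n, sep="this"):
    # Equivalent of the awk condition NF == n + 1 (NF == 4 for n = 3).
    return [l for l in lines if field_count(l, sep) == n + 1]
```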

    • 6
  5. xyz
    2015-01-05T10:03:38+08:00

    Assuming the lines are stored in a file named FILE (bash):

    while IFS= read -r line; do
        # grep -o prints each case-insensitive match on its own line
        if [ "$(grep -oi "this" <<< "$line" | wc -l)" -eq 3 ]; then
            printf '%s\n' "$line"
        fi
    done < FILE
    
    • 5
  6. Bohr
    2015-01-05T21:44:28+08:00

    If you are in Vim:

    g/./if len(split(getline('.'), 'this\c', 1)) == 4 | print | endif
    

    This prints only the matching lines.

    • 4
  7. Sergiy Kolodyazhnyy
    2017-01-08T02:37:04+08:00

    Ruby one-liner solution:

    $ ruby -ne 'print $_ if $_.chomp.downcase.scan(/this/).count == 3' < input.txt                                    
    How to get This line that this word repeated 3 times in THIS line?
    And I will get This line with this here and This one
    

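    It works in a very simple way: we redirect the file to ruby's standard input, ruby reads the lines from stdin, cleans each one up with chomp and downcase, and scan().count gives us the number of occurrences of the substring.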
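A rough Python counterpart of the chomp/downcase/scan chain, for readers who want the same idea outside Ruby. scan_count is a hypothetical helper, and rstrip("\r\n") only approximates chomp: it removes all trailing CR/LF characters, not just one line ending.

```python
import re

def scan_count(line: str, word: str = "this") -> int:
    # Roughly line.chomp.downcase.scan(/this/).count in Ruby:
    # drop the line ending, lowercase, count non-overlapping matches.
    return len(re.findall(re.escape(word.lower()), line.rstrip("\r\n").lower()))

# Demo on two lines from the question's sample input.
for line in ["But not this line which is THIS word repeated 2 times.\n",
             "And I will get This line with this here and This one\n"]:
    if scan_count(line) == 3:
        print(line, end="")
```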

    • 0
