AskOverflow.Dev

Accepted
Arronical
Asked: 2019-03-23 04:51:27 +0800 CST

Text processing aptly output file


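I have a text file generated from the output of the repository management tool aptly. It lists my published repositories, and I need to extract information from it.

The file format is as follows: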

Published repositories:
 * test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
 * test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
 * test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
...

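The last line of the output ends with a newline.

The "Published repositories:" line is not required.

For each line beginning with "*", I need to strip the extraneous information and keep only the snapshot names. aptly itself offers no way to do this. The desired output for the first of these lines is: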

test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]

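The square brackets are not required either, so a solution that keeps or removes them is fine. I would prefer a sed or awk solution, but anything that works would be much appreciated.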

sed text-processing awk

3 Answers

  1. WinEunuuchs2Unix
    2019-07-02T08:31:39+08:00

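    Two answers in one

    I am posting two answers here:

    • A bash script that is hopefully easy to follow
    • A one-liner using the common Linux utilities grep, sed and cut

    What the Bash script looks like in action

    I turned off line wrapping in gnome-terminal to make the input and output files easier to read.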

    ───────────────────────────────────────────────────────────────────────────────────────────
    rick@alien:~/askubuntu$ tput rmam # Turn off line wrap
    ───────────────────────────────────────────────────────────────────────────────────────────
    rick@alien:~/askubuntu$ cat aptfilein
    Published repositories:
     * test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirr}
     * test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: }
     * test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirr}
    ...
    ───────────────────────────────────────────────────────────────────────────────────────────
    rick@alien:~/askubuntu$ time aptfileparse.sh
    5 lines read from aptfilein
    3 lines written to aptfileout
    
    real    0m0.025s
    user    0m0.016s
    sys     0m0.004s
    ───────────────────────────────────────────────────────────────────────────────────────────
    rick@alien:~/askubuntu$ cat aptfileout
     test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_201]
     test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190]
     test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_201]
    ───────────────────────────────────────────────────────────────────────────────────────────
    rick@alien:~/askubuntu$ 
    

    The actual Bash script

    Remember to make the script executable with chmod a+x script.sh

    #!/bin/bash
    
    # NAME: aptfileparse.sh
    # PATH: ~/askubuntu
    # DESC: Parse Apt File giving new lines.
    # DATE: July 1, 2019.
    # NOTE: For: https://askubuntu.com/questions/1127821/text-processing-aptly-output-file
    #       Program would be ~10 lines shorter (but harder to read) with arrays.
    
    : <<'END'
    /* -----------------------------------------------------------------------------
    
    INPUT FILE LAYOUT
    =================
    
    Published repositories:
     * test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
     * test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
     * test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
    ...
    
    OUTPUT FILE LAYOUT
    ==================
    
     test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
    
    Five fields to extract: name, main, multiverse, restricted, universe
    
    ----------------------------------------------------------------------------- */
    END
    
     INPUT="aptfilein"
    OUTPUT="aptfileout"
    
    > "$OUTPUT" # Erase previous output file
    
    # Read all input lines
    while IFS= read -r line ; do
    
        let CountIn++
        [[ "$line" == " *"* ]] || continue      # skip lines not starting " *"
        # Get name
        line="${line#" * "}"                    # remove leading " * "
        lout="${line%%" "*}"                    # name is up to next " "
        line="${line#*" "}"                     # remove name from line
        # Get main
        line="${line#*"{main: "}"               # remove through "{main: "
        lout="$lout ${line%%":"*}"              # main is up to next ":"
        line="${line#*":"}"                     # remove main from line
        # Get multiverse
        line="${line#*"{multiverse: "}"         # remove through "{multiverse: "
        lout="$lout ${line%%":"*}"              # multiverse is up to next ":"
        line="${line#*":"}"                     # remove multiverse from line
        # Get restricted
        line="${line#*"{restricted: "}"         # remove through "{restricted: "
        lout="$lout ${line%%":"*}"              # restricted is up to next ":"
        line="${line#*":"}"                     # remove restricted from line
        # Get universe
        line="${line#*"{universe: "}"           # remove through "{universe: "
        lout="$lout ${line%%":"*}"              # universe is up to next ":"
        line="${line#*":"}"                     # remove universe from line
    
        # Append line to output file with leading space
        echo " $lout" >> "$OUTPUT"
        let CountOut++
    
    done < "$INPUT"
    
    echo  "$CountIn lines read from $INPUT"
    echo "$CountOut lines written to $OUTPUT"
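
    The script header mentions that arrays would make it shorter. A loop gets a similar effect. What follows is only a minimal sketch, not the author's code: it assumes every component appears as "{component: [snapshot]: ...}", and the function name parse_aptly_line and the shortened two-component sample line are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the repository name followed by every snapshot
# name found on one " * ..." line, using only bash parameter expansion.
parse_aptly_line() {
    local line=$1 out
    [[ $line == " * "* ]] || return 1     # not a repository line
    line=${line#" * "}                    # drop the leading " * "
    out=${line%%" "*}                     # repository name ends at the first space
    # Each snapshot follows a "{component: " marker; loop until none remain.
    while [[ $line == *"{"*": ["* ]]; do
        line=${line#*"{"*": "}            # cut through the next "{component: "
        out+=" ${line%%":"*}"             # snapshot runs up to the next ":"
    done
    printf '%s\n' "$out"
}

# Sample line shortened to two components for illustration.
sample=' * test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}'
parse_aptly_line "$sample"
```

    Because the loop looks for any "{component: " marker, it does not depend on the components being named main, multiverse, restricted and universe, or on there being exactly four of them.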
    

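    One-liner with common utilities

    One-liners are popular in the Linux community, and some excellent awk and perl answers have been posted in this Q&A. Here is an example using common utilities that most experienced command-line users will know: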

    $ time grep ^" \*" aptfilein | sed 's/ \* //;s/ /: /;s/^/ /' | cut -d':' -f1,3,6,9,12 --output-delimiter=''
     test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
     test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190311] [xenial-security-restricted_20190311] [xenial-security-universe_20190311]
     test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_20190312] [trusty-universe_20190312]
    
    real    0m0.011s
    user    0m0.003s
    sys     0m0.008s
    
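    • grep ^" \*" aptfilein - The grep command selects lines containing the search string. The caret (^) means the string must begin at the start of the line. The backslash (\) means the asterisk (*) is taken literally instead of acting as a wildcard. In short, this grep command selects every line in the file aptfilein that begins with a space followed by *.
    • sed is a "stream editor": it reads lines, changes them and passes them on. Three sed changes are used here: 's/ \* //;s/ /: /;s/^/ /'. The changes sit between single quotes (') and are separated by semicolons (;). The next three points break them down.
    • s/ \* // - Search for the first occurrence of " * " and replace it with nothing. This erases the " * " that starts each line.
    • s/ /: / - Search for the first space and replace it with a colon (:) followed by a space. This is needed to turn our first field into a key field. For example, test_repo_one/xenial becomes test_repo_one/xenial: .
    • s/^/ / - Tells sed to insert a space at the beginning of each line.
    • cut -d':' -f1,3,6,9,12 --output-delimiter='' - Uses the cut command to select key fields 1, 3, 6, 9 and 12. Key fields are separated by colons, as the -d':' parameter dictates. Normally the output field delimiter is the same as the input delimiter, but the --output-delimiter='' parameter overrides it with an empty string.

    Note: the one-liner is faster than the bash script, because bash is slow at string processing.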
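
    To see why fields 1, 3, 6, 9 and 12 are the right ones, it can help to number the colon-separated fields of a single line. The following is only an illustrative sketch: the helper name number_colon_fields is made up, and the sample is one repository line as it looks after the sed stage, shortened to two components.

```shell
#!/bin/bash
# Hypothetical helper: list the colon-separated fields of one input line,
# one field per output line, prefixed with the number cut -d':' gives it.
number_colon_fields() {
    tr ':' '\n' | awk '{ printf "%2d:%s\n", NR, $0 }'
}

# One repository line (two components only) as it looks after the sed stage.
printf '%s\n' ' test_repo_two/trusty: [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}' |
    number_colon_fields
```

    The snapshot names show up as fields 3 and 6, and each further component adds three more fields. Note that this counting relies on the mirror URLs containing no colon, as in the sample file ("http//..."); a URL written as "http://..." would add a field per component and shift the numbers.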

  2. Best Answer
    terdon
    2019-07-02T14:37:01+08:00

    A Perl approach:

    $ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*\[(.+?)\]/g); print "$n @k"' file 
    test_repo_one/xenial xenial-main_20190311 xenial-multiverse_20190311 xenial-restricted_20190311 xenial-universe_20190311
    test_repo_one/xenial-security xenial-security-main_20190311 xenial-security-multiverse_20190311 xenial-security-restricted_20190311 xenial-security-universe_20190311
    test_repo_two/trusty trusty-main_20190312 trusty-multiverse_20190312 trusty-restricted_20190312 trusty-universe_20190312
    

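    Explanation

    • perl -lne: read the input file line by line (-n), remove the trailing newline (-l) and run the script given by -e on each line. The -l also adds a newline to each print call.
    • next unless /^\s*\*\s*(\S+)/;: find the name of the repo, that is, the first stretch of non-whitespace characters (\S+) on a line that starts with 0 or more whitespace characters (^\s*), then a * (\*), then 0 or more whitespace characters. The longest stretch of non-whitespace after that is what we want. If the line does not match this regex, next moves us on to the next line.
    • $n=$1: save whatever the match above captured (the parentheses around \S+ capture into $1) as $n.
    • @k=(/\{.+?:\s*\[(.+?)\]/g): find every case of a {, any other characters, then a :, then whitespace and a [, and capture whatever lies between the [ and the ]. Save all the captured strings in the array @k.
    • print "$n @k": finally, print the name of the repo (the $n from above) and the array @k.

    If you prefer to keep the square brackets, you can use: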

    $ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*(\[.+?\])/g); print "$n @k"' file 
    test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
    test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190311] [xenial-security-restricted_20190311] [xenial-security-universe_20190311]
    test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_20190312] [trusty-universe_20190312]
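
    Since the question asks for sed or awk, the same capture idea can be sketched in sed. This is not part of the original answer, only a minimal sketch under the assumption that every component looks like "{component: [snapshot]: ... }"; the function name extract_snapshots is made up, and -E (extended regular expressions) is assumed to be available, as it is in GNU and BSD sed.

```shell
#!/bin/bash
# Hypothetical helper: sed-only counterpart of the Perl one-liner.
# Reads the aptly listing on stdin and prints "name [snap] [snap] ..." lines.
extract_snapshots() {
    sed -nE '
        # Only lines that start with optional spaces, "*", and a space.
        /^ *\* /{
            # Drop the leading " * ".
            s/^ *\* *//
            # Drop the architecture list and the word "publishes".
            s/ \[[^]]*\] publishes//
            # Replace each "{component: [snapshot]: ...}," by "[snapshot]".
            s/[{][^:]*: (\[[^]]*\])[^}]*[}],?/\1/g
            p
        }'
}

# Usage (file name taken from the first answer): extract_snapshots < aptfilein
```

    Like the Perl version, this keys on the braces and brackets instead of counting fields, so it keeps working if a mirror URL contains a colon.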
    
  3. Ronny Blomme
    2019-07-03T06:17:22+08:00

    My awk approach:

    $ cat 1.txt 
    Published repositories:
     * test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
     * test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
     * test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
    
    $ awk '$1=="*"{split ($0, a, /:/); print $2 a[2] a[5] a[8] a[11]}' 1.txt 
    test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
    test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190311] [xenial-security-restricted_20190311] [xenial-security-universe_20190311]
    test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_20190312] [trusty-universe_20190312]
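
    The command above splits each line on ":" and picks elements 2, 5, 8 and 11, so it depends on every component contributing exactly three colon-separated pieces. A variant that searches for each "{component: [snapshot]" pair instead might look like the sketch below. It is an assumption-laden illustration, not the answerer's code; the function name extract_snapshots_awk is made up, and only POSIX awk features (match, RSTART, RLENGTH, substr, index) are used.

```shell
#!/bin/bash
# Hypothetical helper: awk variant that finds snapshots by pattern, not position.
extract_snapshots_awk() {
    awk '
    $1 == "*" {
        out  = $2                       # repository name
        rest = $0
        # Find each "{component: [snapshot]" pair, left to right.
        while (match(rest, /[{][^:]*: \[[^]]*\]/)) {
            item = substr(rest, RSTART, RLENGTH)
            item = substr(item, index(item, "["))   # keep only "[snapshot]"
            out  = out " " item
            rest = substr(rest, RSTART + RLENGTH)   # continue after this match
        }
        print out
    }'
}

# Usage (file name taken from this answer): extract_snapshots_awk < 1.txt
```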
    
