Question

Insideup

Asked: 2020-10-08 07:46:54 +0800 CST

AWK with RS does not match the pattern (asking again because I accidentally marked it as solved. Better explanation this time.)

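I have an odt file with blank lines between lines of text. I want to search for a term and output the whole group of text lines that matches it. My approach is to treat the blank lines in the odt file as record separators. An odt file is a zip archive, and its text is in content.xml. After unzipping the odt file, I use xmllint --format content.xml to insert line breaks (as shown below); the "blank" lines are actually lines with no text between > and <. So I want to set RS to any such line with no text between > and <. If the formatted content.xml file looks like this: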

<long line of alphanumerics, slashes, single and double quotes><more or the same><and many more>
      <office:text>
      <text:sequence-decls>
        <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
      </text:sequence-decls>
      <text:p text:style-name="P1">This is the first line</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the third line</text:p>
      <text:p text:style-name="P1">and this is some more text that is to be included</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
      <text:p text:style-name="P1">with this line</text:p>
      <text:p text:style-name="P1">and this one</text:p>
    </office:text>

and the code is:

$ awk '/line/' RS='\n[ \t]*<[^>]*>\n' file.xml

The entire file is output. But I only want:

      <text:p text:style-name="P1">This is the first line</text:p>
      <text:p text:style-name="P1">This is the third line</text:p>
      <text:p text:style-name="P1">and this is some more text that is to be included</text:p>
      <text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
      <text:p text:style-name="P1">with this line</text:p>
      <text:p text:style-name="P1">and this one</text:p>

2 Answers

steeldriver · Answer 1 · 2020-10-08T12:50:41+08:00

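Your approach is fraught with problems. Most importantly, there's no obvious way to restrict the regex match to the body of the document: /line/, for example, will match things like <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/> (the attribute name display-outline-level contains "line").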

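(There's also a problem with your RS regex consuming two newlines, which will prevent it from handling adjacent separators correctly; RS='\n([ \t]*<[^>]*>\n)+' might fix that, but I wouldn't guarantee it.)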
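To see both RS problems on something concrete, here is a rough sketch that runs the question's RS and the suggested one over a trimmed copy of the formatted content.xml above (the placeholder first line and three of the sequence-decl lines are left out; `sample_xml` is a made-up helper name). It assumes an awk that treats a multi-character RS as a regular expression, such as gawk or mawk; POSIX leaves that case unspecified.

```shell
#!/bin/sh
# Hypothetical helper: trimmed copy of the formatted content.xml from the
# question (placeholder first line omitted).
sample_xml() {
  cat <<'EOF'
      <office:text>
      <text:sequence-decls>
        <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
      </text:sequence-decls>
      <text:p text:style-name="P1">This is the first line</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the third line</text:p>
      <text:p text:style-name="P1">and this is some more text that is to be included</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
      <text:p text:style-name="P1">with this line</text:p>
      <text:p text:style-name="P1">and this one</text:p>
    </office:text>
EOF
}

# The question's RS: a single tag-only line, eating the newline on both sides,
# so the next tag-only line has no leading "\n" left and ends up in a record.
RS_OLD='\n[ \t]*<[^>]*>\n'
# The suggested RS: a run of one or more consecutive tag-only lines.
RS_NEW='\n([ \t]*<[^>]*>\n)+'

echo '--- question RS ---'
sample_xml | awk -v RS="$RS_OLD" '/line/'
echo '--- suggested RS ---'
sample_xml | awk -v RS="$RS_NEW" '/line/'
```

On this trimmed sample, the question's RS lets the Illustration declaration (matched via "outline") and the closing </text:sequence-decls> tag into the output, while the suggested RS prints only the six text:p lines that contain text. That only shows the behaviour on this sample, not on a full content.xml.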

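Instead, I'd suggest extracting the body of the document first, and then applying awk in "traditional" paragraph mode (i.e. with an empty record separator):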

xmlstarlet sel -t -v "//office:body/office:text/text:p" -n content.xml | 
  awk -v RS= '/line/{print $0 ORS}'
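For a quick check of paragraph mode without an .odt at hand, the sketch below feeds awk hand-typed text of the shape the xmlstarlet command is expected to print for the sample above (one line per text:p, an empty line for each empty <text:p/>). That input is an assumption, not captured output, and `sample_text` / `grep_paragraphs` are made-up names.

```shell
#!/bin/sh
# Hypothetical helper: print every blank-line-separated paragraph on stdin
# that matches the extended regex in $1, followed by an empty line
# (the same "print $0 ORS" as in the command above).
grep_paragraphs() {
  # RS= (empty) selects paragraph mode: records are separated by one or
  # more empty lines, and leading/trailing newlines are ignored.
  awk -v pat="$1" -v RS= '$0 ~ pat { print $0 ORS }'
}

# Hand-typed stand-in for the xmlstarlet output on the sample content.xml
# (assumed shape: one line per <text:p>, an empty line per empty <text:p/>).
sample_text() {
  printf '%s\n' \
    'This is the first line' \
    '' \
    'This is the third line' \
    'and this is some more text that is to be included' \
    '' \
    'This is the sixth. I want it included,' \
    'with this line' \
    'and this one'
}

sample_text | grep_paragraphs 'sixth'
```

On this input it prints the three lines of the last paragraph followed by an empty line; with 'line' as the pattern, all three paragraphs match. Passing the pattern with -v rather than splicing it into the program text lets the search term come from a shell variable.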

Or, with GNU awk, preserve the record separator that was actually matched (RT):

xmlstarlet sel -t -v "//office:body/office:text/text:p" -n content.xml |
  gawk -v RS= '/line/{printf "%s", $0 RT}'

You can even omit the intermediate file entirely by piping the standard output of unzip -p:

unzip -p somefile.odt content.xml |
  xmlstarlet sel -t -v "//office:body/office:text/text:p" -n - |
  gawk -v RS= '/line/{printf "%s", $0 RT}'

Insideup · Answer 2 · 2020-10-13T12:35:50+08:00

Best Answer

Inspired by steeldriver, I answered my own question by modifying the file before running awk:

sed '/>.*</! s/.*/---/' test.txt > modfile.txt  # replaces every line that does NOT match >.*< (no text between a > and a <) with "---", which I use as the record separator

Then I was able to extract the whole record for each match of $searchterm:

awk "/$searchterm/" RS="---" modfile.txt > results.txt
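Here is a rough, self-contained sketch of the same idea run on the sample from the question. It is a variant, not the exact commands above: the sed step blanks the separator lines instead of writing "---", so awk's portable paragraph mode (empty RS) can split the records, because a multi-character RS such as "---" is only honoured by some awks (gawk, mawk). `search_blocks` and `sample_xml` are hypothetical names, and the placeholder first line of the sample is left out.

```shell
#!/bin/sh
# Hypothetical helper: print each run of text lines (formatted content.xml on
# stdin) that matches the extended regex given as $1.
search_blocks() {
  # sed blanks every line without a ">...<" pair, i.e. with no text between
  # tags: <text:p .../>, <office:text>, the <text:sequence-decl .../> lines.
  # awk with empty RS then works in paragraph mode: each run of consecutive
  # text lines is one record, and records matching $1 are printed.
  sed '/>.*</! s/.*//' | awk -v pat="$1" -v RS= '$0 ~ pat'
}

# Hypothetical helper: formatted content.xml from the question, minus its
# placeholder first line.
sample_xml() {
  cat <<'EOF'
      <office:text>
      <text:sequence-decls>
        <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
      </text:sequence-decls>
      <text:p text:style-name="P1">This is the first line</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the third line</text:p>
      <text:p text:style-name="P1">and this is some more text that is to be included</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
      <text:p text:style-name="P1">with this line</text:p>
      <text:p text:style-name="P1">and this one</text:p>
    </office:text>
EOF
}

sample_xml | search_blocks 'line'
```

On this sample it prints exactly the six lines the question asks for. Because the sed step also blanks the text:sequence-decl lines, the "outline" in their attributes can no longer produce a match.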
