
Question

Thierry Blanc

Asked: 2025-02-06 18:15:46 +0800 CST

Remove braces statement that contains nested braces


A typical LaTeX problem:

\SomeStyle{\otherstyle{this is the \textit{nested part} some more text...}}

Now I want to remove every \SomeStyle{...} wrapper but keep its content. The content contains nested braces. The line above should become:

\otherstyle{this is the \textit{nested part} some more text...}

Questions:

Is there any LaTeX editor that provides this?
What editor/script can do this?
How can it be done with sed? 🤓

My solution is a bash script using sed:

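Prepare the text: mark the strings to remove with an ASCII bell and add a newline after each brace
Loop: find { -> add an X to the hold space; find } -> remove an X from the hold space; hold space empty -> remove the closing }
Restore the newlines and ASCII bells to what they were before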

The script works but fails on \badstyle{w}\badstyle{o}\badstyle{r}\badstyle{d}, which becomes: wo}rd}

Branching to :f doesn't seem to work.

F=$(sed 's|\\|\\\\|g;s|{|\\{|g' <<< "$1"  )

# mark all removestrings with ascii bell and newline
# add newline after each { and }  
SEDpre='
s|'"$F"'|\a%\n|g

s|\{|\{\n|g
s|\}|\}\n|g
'


SEDpost='
:a;N;$!ba;
s|\a%\n||g

s|\{\n|\{|g
s|\}\n|\}|g
'

# count the brackets
SED='
/\a%/{
:a
        n
:f
        /\{/{x;s|$|X|;x;ba}
        /\}/{x;
                s|X||;
                /^$/{x;bb}
                x
                ba
            }
}
b
:b  
/\}/{   
    s|\}||;
    N;
    s|\n||;
    /\a%/bf
     }
'

sed -r -E  "$SEDpre"  "$2"  | sed -rE "$SED"  | sed -rE "$SEDpost"

5 Answers

Stéphane Chazelas · Answer 1 · 2025-02-07T00:07:43+08:00

The typical approach is to use perl's recursive regexp feature:

perl -0777 -pe 's/\\SomeStyle(\{((?:(?1)|[^{}])*)\})/$2/gs' file.tex

Or, if you need to take into account braces escaped as \{ (and \ escaped as \\)¹:

perl -0777 -pe 's/\\SomeStyle(\{((?:(?1)|\\.|[^\\{}])*+)\})/$2/gs' file.tex

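Here we replaced [^{}]* with (?:\\.|[^{}\\])* to match \ followed by any character (including \\, \{ and \}, the ones we care about here) as well as any character other than \, { and }. (?:...) is the non-capturing form of (...).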

(Add the -i option to edit the file in place.)

The (?1) above acts as if the regexp of the first (...) pair, that is (\{((?:(?1)|\\.|[^\\{}])*+)\}), were inserted at that point.

If the \SomeStyle{...}s can be nested, as in:

\SomeStyle{\otherstyle{this is
the \SomeStyle{\textit{nested part} some} more text...}}

to be changed to:

\otherstyle{this is
the \textit{nested part} some more text...}

then change it to:

perl -0777 -pe '
  while(s/\\SomeStyle(\{((?:(?1)|\\.|[^\\{}])*+)\})/$2/gs){}' file.tex

That repeats the substitution, replacing the outer ones first, until no more are found.

To do it for arbitrary styles and files:

#! /bin/sh -
[ "$#" -ge 2 ] || {
  printf>&2 '%s\n' "Usage: $0 <style> <file> [<file>...]"
  exit 1
}
style=$1; shift
exec perl -0777 -spi -e '
  while(s/\\\Q$style\E(\{((?:(?1)|\\.|[^\\{}])*+)\})/$2/gs) {}
  ' -- -style="$style" -- "$@"

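Assuming a sed implementation that can fit the whole input in its pattern space, one approach (which also handles nested ones, starting with the inner ones in that case) could be: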

sed '
  :a
  $!{
    # slurp the whole input into the pattern space
    N; ba
  }
  # using _ as an escape character to escape { as _l and
  # } as _r below. So escape itself as _u first:
  s/_/_u/g
  :b
  # process the \SomeStyle{...}s that contain no unescaped {}:
  s/\\SomeStyle{\([^{}]*\)}/\1/g; tb
  # replace inner {...} to _l..._r and loop:
  s/{\([^{}]*\)}/_l\1_r/g; tb
  # undo escaping:
  s/_l/{/g; s/_r/}/g; s/_u/_/g' file.tex

(Same as at "Remove (possibly nested) text quotes in command line", which also has some other approaches.)
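The _u/_l/_r encoding is what makes it safe to borrow _l and _r as stand-ins for braces. A sketch of just the encode/decode steps on a made-up string that already contains _l, with and without the initial s/_/_u/g:

```shell
s='a_l {b}'

# With the escaping step: every original _ becomes _u first
esc=$(printf '%s\n' "$s" | sed 's/_/_u/g')                      # a_ul {b}
hidden=$(printf '%s\n' "$esc" | sed 's/{\([^{}]*\)}/_l\1_r/g')  # a_ul _lb_r
back=$(printf '%s\n' "$hidden" | sed 's/_l/{/g; s/_r/}/g; s/_u/_/g')

# Without it: the original _l is mistaken for an encoded {
naive=$(printf '%s\n' "$s" | sed 's/{\([^{}]*\)}/_l\1_r/g; s/_l/{/g; s/_r/}/g')

printf 'escaped: %s\nnaive:   %s\n' "$back" "$naive"
```

The escaped round trip should give back `a_l {b}` unchanged, while the naive one turns it into `a{ {b}`. Since every original _ is followed by u after the first step, any remaining _l or _r can only be one of the markers.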

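Some sed implementations have copied perl's -i for in-place editing, but note that in some of them (FreeBSD and derivatives) you need -i '' to edit in place without keeping a backup of the original. -i.back will work in all implementations that have -i (and in perl) and saves the original as file.tex.back.

Your sed seems to be GNU sed, as you use quite a few GNUisms, and GNU sed does support -i à la perl and, AFAIK, has no limit on the size of the pattern space other than available memory.

To account for braces escaped as \{ (and \ escaped as \\)¹, you can switch to extended regexps, which have a | alternation operator, with the now-standard -E option (in preference to the GNU-specific -r). Note that { then also becomes a regexp operator and needs to be escaped outside [...], and grouping+capturing changes from \(...\) to (...):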

sed -E '
  :a
  $!{
    # slurp the whole input into the pattern space
    N; ba
  }
  # using _ as an escape character to escape { as _l and
  # } as _r below. So escape itself as _u first:
  s/_/_u/g
  :b
  # process the \SomeStyle{...} that contain no unescaped {}:
  s/\\SomeStyle\{((\\.|[^{}\\])*)\}/\1/g; tb
  # replace inner {...} to _l..._r and loop:
  s/\{((\\.|[^{}\\])*)\}/_l\1_r/g; tb
  # undo escaping:
  s/_l/{/g; s/_r/}/g; s/_u/_/g' file.tex
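A sketch comparing only the \SomeStyle substitution from each of the two scripts (no slurping, no _ escaping, no loop) on a made-up line containing an escaped brace:

```shell
line='\SomeStyle{a \{ b}'

# BRE version: [^{}] cannot get past the { of \{, so nothing is replaced
bre=$(printf '%s\n' "$line" | sed 's/\\SomeStyle{\([^{}]*\)}/\1/g')

# ERE version: \\. consumes \{ as a unit, so the real closing } is found
ere=$(printf '%s\n' "$line" | sed -E 's/\\SomeStyle\{((\\.|[^{}\\])*)\}/\1/g')

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```

With GNU sed, the BRE line should come out unchanged and the ERE line as `a \{ b`.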

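¹ This still ignores the possibility of \\SomeStyle{something}, and does not handle comments or \verb|...|... Covering those and doing full TeX tokenisation is possible but probably not worth the effort, depending on your actual input.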

jubilatious1 · Answer 2 · 2025-02-07T01:41:25+08:00

Using Raku (formerly known as Perl_6)

Match your desired targets using Raku's recursive regex notation <~~>:

~$ raku -ne 'put join "", m:g/ \{ ~ \}  [ <( <-[{}]>* )> || <( <-[{}]>* <~~> <-[{}]>* )> ] /;'  file.tex

Sample Input:

\SomeStyle{\otherstyle{this is the \textit{nested part} some more text...}}
\badstyle{w}\badstyle{o}\badstyle{r}\badstyle{d}

Sample Output:

\otherstyle{this is the \textit{nested part} some more text...}
word

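Raku provides a new Regex syntax that some find easier to read. The code is taken almost verbatim from Raku's Regexes documentation page. Here we simply use Raku's m/// match operator, made global with the :g named argument:

\{ ~ \} <expression> is the tilde syntax for nested structures,
<-[{}]>* denotes a custom negative character class: any character except the {} curly braces. ICYMI, <+[{}]>* or more simply <[{}]>* denotes a positive character class,
<~~> denotes a recursive regex,
<( … )> are the capture markers in Raku.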

To process a file so that problematic lines are corrected and unproblematic lines are output verbatim, use Raku's ternary operator: Test ?? True !! False.

~$ raku -ne 'm:g/ \{  [ <( <-[{}]>* )> || <( <-[{}]>* <~~> <-[{}]>* )> ] \} / 
             ?? $/.join.put 
             !! $_.put;'   file.tex

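Unfortunately, all of the code examples above currently just remove the top-level Style (and its associated braces) line by line, whatever that Style is. I'll work on correcting this lack of specificity.

The astute observer may notice that all of the answers above use Raku's m/// match operator. FYI, I'm sure there's a way to do this with Raku's s/// substitution operator (combined with Raku's <( … )> capture markers), but I wanted to post these m/// match answers first.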

meuh · Answer 3 · 2025-02-06T23:44:54+08:00


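Here is a possible sed mechanism. For simplicity, we assume there are no underscore characters, so we can use an underscore as a marker. This is like your ASCII bell. We insert the marker at the start of the line, then move it one character at a time until the end of the line. Each time we move it past a {, we add a + at the start of the line as a counter. Each time we move it past a }, we remove one + from the start. When there are no + left, the braces are balanced, and we can apply the desired substitution up to the marker.

In case the line already starts with +, we first add !! at the start, and remove it at the end.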

sed '
 s/^/!!_/
:a
 /_\(.\)/{
   s//\1_/
   /{_/{
     s/^/+/
   }
   /}_/{
     /^+/!{
       s/^/mismatch{}/
       b
     }
     s///
     /^!!/b b
   }
   b a
 }
 # flow through here if _ is at eol
:b
 # dummy t branch to clear so can detect if s done
 t c
:c
 s/\\SomeStyle{\(.*\)}_/\1_/
 s/\\badstyle{\(.*\)}_/\1_/
 # repeat to do globally on line
 t a
 s/^!!//
 s/_$//
'
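The same walk-and-count idea can be sketched in awk, where the counter is an ordinary variable. This is a hypothetical helper (the name unstyle is made up), not meuh's code: it walks each line one character at a time, records at which depth each \NAME{ was opened, and drops the } that closes that depth. Unlike the sed above, it keeps the depth across lines, so a group may span lines, and it skips over backslash pairs such as \{, \} and \\ so they are never counted. Comments and \verb are still not handled.

```shell
# unstyle NAME [FILE...]  (hypothetical helper)
# Removes \NAME{ and its matching }, keeping the content, by tracking brace depth.
unstyle() {
  local name=$1; shift
  awk -v name="$name" '
    BEGIN { prefix = "\\" name "{"; plen = length(prefix); depth = 0 }
    {
      out = ""; n = length($0); i = 1
      while (i <= n) {
        c = substr($0, i, 1)
        if (c == "\\") {
          if (substr($0, i, plen) == prefix) {
            depth++; drop[depth] = 1        # opening { of \NAME{: drop its } later
            i += plen; continue
          }
          out = out substr($0, i, 2)        # keep \x pairs (\{, \}, \\, \textit...) as-is
          i += 2; continue
        }
        if (c == "{") {
          depth++; drop[depth] = 0          # ordinary {: its } is kept
        } else if (c == "}" && depth > 0) {
          if (drop[depth--]) { i++; continue }   # matching } of \NAME{: drop it
        }
        out = out c; i++
      }
      print out
    }' "$@"
}

printf '%s\n' '\badstyle{w}\badstyle{o}\badstyle{r}\badstyle{d}' | unstyle badstyle
```

The last line should print word, the input that the question's script turned into wo}rd}.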


Ed Morton · Answer 4 · 2025-02-07T19:12:32+08:00

Using any awk:

$ cat tst.awk
{
    while ( match($0, /\\SomeStyle\{/) ) {
        head = substr($0,1,RSTART-1)
        tail = substr($0,RSTART+RLENGTH-1)

        gsub(/@/, "@A", tail)
        gsub(/</, "@B", tail)
        gsub(/>/, "@C", tail)

        while ( match(tail, /\{[^{}]*}/) ) {
            if ( RSTART == 1 ) {
                tail = substr(tail,2,RLENGTH-2) substr(tail,RLENGTH+1)
                gsub(/</, "{", tail)
                gsub(/>/, "}", tail)
                break
            }
            tail = substr(tail,1,RSTART-1) "<" substr(tail,RSTART+1,RLENGTH-2) ">" substr(tail,RSTART+RLENGTH)
        }

        gsub(/@C/, ">", tail)
        gsub(/@B/, "<", tail)
        gsub(/@A/, "@", tail)

        $0 = head tail
    }
}
{ print }

$ awk -f tst.awk file
\otherstyle{this is the \textit{nested part} some more text...}

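The above doesn't attempt to handle escaped { or } in the input, since that would require treating \{ (an escaped {) differently from \\{ (an escaped \ followed by {), which would take more thought than I'm willing to put in given that it doesn't appear in the sample input. So it may not actually be an issue for the OP, and if it is, they can always ask a follow-up question if they don't already have a way to handle it.

Update: after discussing it with @StéphaneChazelas in the comments under his answer, I believe you'd just need to replace [^{}] with (\\.|[^{}\\]) in the regexp used with match() to handle escaped { or } in the input.

It assumes that every \SomeStyle{ or other { does have a matching }.

Here's a commented version of the above, since it may not be clear at first glance what it's doing: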
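A sketch of what that regexp change does on its own, on a made-up string with an escaped closing brace. It only shows which text match() selects; it does not make the whole script escape-aware, since match() can still start at the { of a \{.

```shell
s='{a \} b} rest'
out=$(printf '%s\n' "$s" | awk '{
  # Regexp from the answer: [^{}] stops at the } of \}
  match($0, /\{[^{}]*}/);         print substr($0, RSTART, RLENGTH)
  # Suggested change: \\. takes a backslash and the next character as a unit
  match($0, /\{(\\.|[^{}\\])*}/); print substr($0, RSTART, RLENGTH)
}')
printf '%s\n' "$out"
```

The first line of output should be `{a \}` and the second `{a \} b}`.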

{
    # Work on lines that include \SomeStyle{, and find where that starts
    # We do this in a loop in case there are multiple such strings in a line.
    while ( match($0, /\\SomeStyle\{/) ) {
        # Save the part before \SomeStyle{ as-is in "head" so we can add it back later
        head = substr($0,1,RSTART-1)
        # Save the part starting from the { in \SomeStyle{ in "tail" for further processing
        tail = substr($0,RSTART+RLENGTH-1)

        # Convert every @ to @A so we can then convert every < to @B and > to @C
        # so we can later change every { in the nested string to < and } to >
        # so on every loop iteration we get rid of the innermost matched { and }
        gsub(/@/, "@A", tail)
        gsub(/</, "@B", tail)
        gsub(/>/, "@C", tail)

        # Loop finding every innermost {...} substring, replacing the { and } with
        # < and > as we go to catch longer and longer nested substrings on each
        # iteration, stopping when RSTART is 1 because thats the start of the
        # outermost {...}. Add "print tail" after "tail = ..." to see what it is
        # doing if its not obvious.
        while ( match(tail, /\{[^{}]*}/) ) {
            if ( RSTART == 1 ) {
                # Weve found the outermost {...} so stop looping and remove the
                # outermost { and }
                tail = substr(tail,2,RLENGTH-2) substr(tail,RLENGTH+1)
                # Change all < and > chars we inserted back to their original characters
                gsub(/</, "{", tail)
                gsub(/>/, "}", tail)
                break
            }
            # Change {...{foo}...} to {...<foo>...}
            tail = substr(tail,1,RSTART-1) "<" substr(tail,RSTART+1,RLENGTH-2) ">" substr(tail,RSTART+RLENGTH)
        }

        # Change all pre-loop replacement strings back to their original characters
        gsub(/@C/, ">", tail)
        gsub(/@B/, "<", tail)
        gsub(/@A/, "@", tail)

        # Reconstruct $0 from the part before \SomeStyle{ and the processed part
        $0 = head tail
    }
}
{ print }

Thierry Blanc · Answer 5 · 2025-02-07T15:44:24+08:00

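The script checks that the replace-string argument is valid (no braces), protects escaped braces (\{, \}) in the target file, and creates a backup.

sed method / perl method: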

#!/bin/bash

#
# Remove braces statement but keep content
#

TMP="$2.bk"
TMP1=$(mktemp) || { echo "Failed to create temp file" >&2; exit 1; }
TMP2=$(mktemp) || { echo "Failed to create temp file" >&2; exit 1; }

# Ensure cleanup of temp files on exit
trap 'rm -f "$TMP1" "$TMP2"' EXIT

if [[ $# -ne 2 ]]; then
    echo "usage: ${0##*/} '<replace>' <file>" >&2
    echo "Example: ${0##*/} '\\textemphasis' myfile.tex will replace \\textemphasis{<1>} with <1>" >&2
    exit 1
fi

# Check for braces in replacement string (-q: only test, print nothing)
if grep -q '[{}]' <<< "$1"; then
    echo "Error: '$1' contains { or }, which is not allowed." >&2
    exit 1
fi

# Escape characters in replace string
F=$(sed 's|\\|\\\\|g; s|{|\\{|g' <<< "$1")

# Preserve escaped braces
sed 's|\\{|\ao|g; s|\\}|\ac|g' "$2" > "$TMP1"

# Backup original file
cp -- "$2" "$TMP" || { echo "Error: Backup failed." >&2; exit 1; }

# Process file
sed '
  :a; $!{N; ba}
  s/_/_u/g
  :b
  s/'"$F"'{\([^{}]*\)}/\1/g; tb
  s/{\([^{}]*\)}/_l\1_r/g; tb
  s/_l/{/g; s/_r/}/g; s/_u/_/g' "$TMP1" > "$TMP2"

# or perl
# perl -0777 -pe 's/'"$F"'(\{((?:(?1)|[^{}])*)\})/$2/gs' "$TMP1" > "$TMP2"
#

# Restore escaped braces
sed 's|\ao|\\{|g; s|\ac|\\}|g' "$TMP2" > "$2"
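The two escaped-brace steps can be checked on their own. A sketch of just that round trip on a made-up line, assuming GNU sed (\a for the BEL character is a GNU extension); the brace-removal step that would sit in between is left out:

```shell
s='keep \{ and \} but process {this}'

# Hide \{ and \} behind BEL-prefixed placeholders (BEL+o and BEL+c)
hidden=$(printf '%s\n' "$s" | sed 's|\\{|\ao|g; s|\\}|\ac|g')

# (the brace-removing sed or perl step would run on "$hidden" here)

# Put the escaped braces back
restored=$(printf '%s\n' "$hidden" | sed 's|\ao|\\{|g; s|\ac|\\}|g')

printf '%s\n' "$restored"
```

Only unescaped braces such as {this} remain visible to the middle step, and the restored line should be identical to the input.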
