Question

JH Park

Asked: 2025-04-08 05:38:53 +0800 CST

Extracting strings from a file using sed and a regular expression

I would like to ask how to extract specific strings from a file using sed and a regular expression.

Here is an example of the input text file (testfile.txt):

# This file contains a short description of the columns in the
# meta-analysis summary file, named '/some/output/directory/result.txt'

# (Skipping some comment lines...)

# Input for this meta-analysis was stored in the files:
# --> Input File 1 : /some/input/directory/cohort1/dataset1_chrAll.regenie.txt
# --> Input File 2 : /some/input/directory/cohort2/subdir1/chrAll-out.txt
# --> Input File 3 : /some/input/directory/cohort2/subdir2/chrAll-out_ver2.txt
# --> Input File 4 : /some/input/directory/cohort3/resfile.txt
# --> Input File 5 : /some/input/directory/cohort4/regenie_res_chrAll.txt

From this file I want to extract the list of input file names, so the result should look like this:

/some/input/directory/cohort1/dataset1_chrAll.regenie.txt
/some/input/directory/cohort2/subdir1/chrAll-out.txt
/some/input/directory/cohort2/subdir2/chrAll-out_ver2.txt
/some/input/directory/cohort3/resfile.txt
/some/input/directory/cohort4/regenie_res_chrAll.txt

Here is what I have tried:

Attempt 1

This is the initial command I used.

cat testfile.txt | sed -e 's/\/some\/input\/directory\/([A-z0-9\/\.\-]*)\.txt/$1/g'

Result:

sed: -e expression #1, char 55: Invalid range end

Attempt 2

After some searching, I tried escaping the parentheses with backslashes.

cat testfile.txt | sed -e 's/\/some\/input\/directory\/\([A-z0-9\/\.\-]*\).txt/$1/g'

Result:

sed: -e expression #1, char 56: Invalid range end

So that did not solve the problem.

Attempt 3

I also tried escaping the square brackets and the asterisk.

cat testfile.txt | sed -e 's/\/some\/input\/directory\/\(\[A-z0-9\/\.\-\]\*\)\.txt/$1/g'

Result:

# This file contains a short description of the columns in the
# meta-analysis summary file, named '/some/output/directory/result.txt'

# (Skipping some comment lines...)

# Input for this meta-analysis was stored in the files:
# --> Input File 1 : /some/input/directory/cohort1/dataset1_chrAll.regenie.txt
# --> Input File 2 : /some/input/directory/cohort2/subdir1/chrAll-out.txt
# --> Input File 3 : /some/input/directory/cohort2/subdir2/chrAll-out_ver2.txt
# --> Input File 4 : /some/input/directory/cohort3/resfile.txt
# --> Input File 5 : /some/input/directory/cohort4/regenie_res_chrAll.txt

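This did not raise an error, but the output is the unchanged input, which is not what I want.

Attempt 4

Finally, I tried adding the -r option, without escaping the parentheses or the square brackets.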

cat testfile.txt | sed -re 's/\/some\/input\/directory\/([A-z0-9\/\.\-]*)\.txt/$1/g'

Result:

sed: -e expression #1, char 55: Invalid range end

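This shows the same error message as the first attempt.

I would like to ask what is wrong with my command line and whether there is a possible solution.

Thank you.

5 Answers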

potong · Answer 1 · 2025-04-08T13:17:17+08:00

This might work for you (GNU sed):

sed -n 's/^# --> Input File [[:digit:]]\+ : //p' file

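Turn off implicit printing with the command-line option -n.

Use the substitution command with pattern matching to find lines that start with `# --> Input File `, followed by one or more digits, followed by ` : `. Remove that part and print the remainder.

An alternative: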

sed -nE 's/^# --> Input File [0-9]+ : (.*)/\1/p' file
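
As a quick check of the two commands above, here is a minimal sketch that rebuilds a shortened copy of the sample input in a temporary file and runs both variants against it. The variable names `tmpfile`, `with_bre` and `with_ere` are only illustrative, and GNU sed is assumed because `\+` is a GNU extension in basic regular expressions.

```shell
# Recreate a shortened version of the sample input from the question.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
# This file contains a short description of the columns in the
# meta-analysis summary file, named '/some/output/directory/result.txt'

# Input for this meta-analysis was stored in the files:
# --> Input File 1 : /some/input/directory/cohort1/dataset1_chrAll.regenie.txt
# --> Input File 2 : /some/input/directory/cohort2/subdir1/chrAll-out.txt
EOF

# -n suppresses automatic printing; the trailing p flag prints only the
# lines on which the substitution succeeded.
with_bre=$(sed -n 's/^# --> Input File [[:digit:]]\+ : //p' "$tmpfile")

# Same idea with -E (extended regex) and a capture group referenced as \1.
with_ere=$(sed -nE 's/^# --> Input File [0-9]+ : (.*)/\1/p' "$tmpfile")

printf '%s\n' "$with_bre"
rm -f "$tmpfile"
```

Both variables should hold the same two paths, since the second command only replaces the whole line with what the first command leaves behind.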

2

Gilles Quénot · Answer 2 · 2025-04-08T06:01:56+08:00

Best Answer

What I would do:

$ grep -oP -- '--> .* \K(?:/[\w.-]+)+' file
/some/input/directory/cohort1/dataset1_chrAll.regenie.txt
/some/input/directory/cohort2/subdir1/chrAll-out.txt
/some/input/directory/cohort2/subdir2/chrAll-out_ver2.txt
/some/input/directory/cohort3/resfile.txt
/some/input/directory/cohort4/regenie_res_chrAll.txt

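The regular expression matches as follows:

Node	Explanation
`-->`	'-->'
`.*`	any character except \n (0 or more times (matching the most amount possible))
` `	a space
`\K`	resets the start of the match (what is `K`ept), a shorter alternative to a look-behind assertion (see lookarounds and support of \K in regex)
`(?:`	group, but do not capture (1 or more times (matching the most amount possible)):
`/`	'/'
`[\w.-]+`	any character of: word characters (a-z, A-Z, 0-9, _), '.', '-' (1 or more times (matching the most amount possible))
`)+`	end of grouping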
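
To make the effect of `\K` concrete, the following sketch runs the same pattern with and without it on one line of the sample input. It assumes a grep built with PCRE support (the `-P` option); the variable names `line`, `kept` and `whole` are only illustrative.

```shell
line='# --> Input File 1 : /some/input/directory/cohort1/dataset1_chrAll.regenie.txt'

# With \K: everything matched before \K is dropped from the reported match.
kept=$(printf '%s\n' "$line" | grep -oP -- '--> .* \K(?:/[\w.-]+)+')

# Without \K: -o prints the whole match, including the '--> ...' prefix.
whole=$(printf '%s\n' "$line" | grep -oP -- '--> .* (?:/[\w.-]+)+')

printf 'with \\K:    %s\n' "$kept"
printf 'without \\K: %s\n' "$whole"
```

The greedy `.*` backtracks to the last space that is directly followed by a `/`, which is why only the path remains once `\K` discards the prefix.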

1

Gilles Quénot · Answer 3 · 2025-04-08T06:28:23+08:00

With awk:

$ awk '/-->/{print $NF}' file
/some/input/directory/cohort1/dataset1_chrAll.regenie.txt
/some/input/directory/cohort2/subdir1/chrAll-out.txt
/some/input/directory/cohort2/subdir2/chrAll-out_ver2.txt
/some/input/directory/cohort3/resfile.txt
/some/input/directory/cohort4/regenie_res_chrAll.txt
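
Printing `$NF` relies on awk's default whitespace splitting, so it returns the full path only when the path contains no spaces. The sketch below uses a hypothetical line (not part of the question's input) whose path has a space in it, and compares `$NF` with splitting on the ` : ` separator instead. The variable names are illustrative.

```shell
# Hypothetical line whose path contains a space (not in the original input).
line='# --> Input File 6 : /some/input/directory/cohort 5/res.txt'

# Default whitespace splitting: $NF is only the last word of the path.
last_field=$(printf '%s\n' "$line" | awk '/-->/{print $NF}')

# Splitting on " : " instead keeps the whole path together as field 2.
whole_path=$(printf '%s\n' "$line" | awk -F ' : ' '/-->/{print $2}')

printf 'last field: %s\n' "$last_field"
printf 'whole path: %s\n' "$whole_path"
```

The `-F ' : '` variant would in turn be cut short if a path itself contained ` : `, so which form is safer depends on what the file names can look like.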

1

Mark Setchell · Answer 4 · 2025-04-08T12:24:17+08:00

With sed:

sed -n '/Input File/s/.*: //p' YOURFILE
/some/input/directory/cohort1/dataset1_chrAll.regenie.txt
/some/input/directory/cohort2/subdir1/chrAll-out.txt
/some/input/directory/cohort2/subdir2/chrAll-out_ver2.txt
/some/input/directory/cohort3/resfile.txt
/some/input/directory/cohort4/regenie_res_chrAll.txt

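This means: "run sed but do not print anything unless you see a line containing Input File. If you do, replace everything up to and including the colon and space with nothing, then print the result."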
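
For comparison with the attempts in the question, here is a hedged sketch of how Attempt 4 could be adjusted so that it prints the full path. Two points are certain: sed refers to a captured group as `\1`, not `$1`, and without `-n` plus the `p` flag every input line is printed whether or not it matched. The thread does not say which part of the bracket expression triggers `Invalid range end`; the `A-z` range is one plausible candidate in some locales, and that is an assumption here. The sketch sidesteps it by spelling out `A-Za-z`.

```shell
line='# --> Input File 2 : /some/input/directory/cohort2/subdir1/chrAll-out.txt'

# Changes relative to Attempt 4 in the question:
#   - '#' is used as the s delimiter, so slashes need no escaping
#   - explicit A-Za-z instead of A-z, with '-' placed last in the brackets
#   - the group wraps the whole path and is referenced as \1, not $1
#   - -n plus the p flag, so only substituted lines are printed
extracted=$(printf '%s\n' "$line" |
  sed -nE 's#^.*(/some/input/directory/[A-Za-z0-9/._-]*\.txt)$#\1#p')

printf '%s\n' "$extracted"
```

Inside a bracket expression a backslash is an ordinary character in POSIX regular expressions, so `.` and `/` are listed without escapes.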

0

Alexey Melezhik · Answer 5 · 2025-04-08T13:39:14+08:00

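With Raku/Sparrow you can take an incremental approach, splitting a complex regular expression into a series of simple steps (a technique called zooming in):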

note: look up lines with file path

within: ^^ \s* \S+ \s+  "-->" \s+ (.*)
regexp: ^^ "Input File" \s (.*)
regexp: ^^ \d+ \s ":" \s (.*)
end:

code: <<RAKU
!raku
for captures()<> -> $c {
   say $c[0]
}
RAKU

0
