Using csplit to split one file into multiple files by regular expression

Question

Smeterlink

Asked: 2023-12-12 14:47:31 +0800 CST

Split a file by pattern match into specific output file names

I have a file with the following contents:

# new file
text in file 1
# new file
text in file 2
# new file
text in file 3

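The pattern here is "# new file".

Instead of saving each piece to xx00, xx01 and xx02, I want to save them to specific files: another file, file new, last one.

These 3 files already exist in the current directory, so I would like to supply them as an array and overwrite them: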

csplit -z infile '/# new file/' "${array[*]}"

The array can be supplied directly

array=('another file' 'file new' 'last one')
echo ${array[*]}
another file file new last one

or built by listing the current directory

array=($(find . -type f))
echo ${array[*]}
./another file ./file new ./last one
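
A note on the expansions used above: a quoted "${array[*]}" joins all elements into one word, "${array[@]}" keeps one word per element, and the unquoted form is split again on whitespace. A minimal sketch in bash, assuming the default IFS; count_args is a hypothetical helper that only reports how many arguments it received:

```shell
#!/bin/bash
# count_args is an illustrative helper, not part of the question.
count_args() { echo "$#"; }

array=('another file' 'file new' 'last one')

count_args "${array[*]}"   # prints 1: all names joined into a single word
count_args "${array[@]}"   # prints 3: one word per file name
count_args ${array[*]}     # prints 6: unquoted, split at every space
```

Only the "${array[@]}" form hands each file name to a command intact.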

A modification of this script might be the solution:

awk -v file="1" -v occur="2" '
{
  print > (file".txt")
}
/^\$\$\$\$$/{
  count++
  if(count%occur==0){
    if(file){
      close(file".txt")
      ++file
    }
  }
}
'  Input_file

2 Answers

Chris Davies · Answer 1 · 2023-12-12T18:48:34+08:00

I would still consider using csplit, but then rename the resulting files.

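#!/bin/sh
# Split into a private temporary directory, then rename the pieces.
mkdir ".tmp.$$" || exit 2
# -z: skip empty pieces, -k: keep pieces on error, -n 4: four-digit suffixes
csplit -f ".tmp.$$/tmp_" -zk -n 4 "$1" '/# new file/' '{*}'

# The glob expands in sorted order, so pieces pair up with the names in sequence.
for file in ".tmp.$$"/tmp_*
do
    [ "$#" -gt 1 ] || break    # out of target names: leave the rest for the warning
    shift                      # first pass drops the source name, later passes the used target
    mv -f "$file" "$1"
done
# rmdir succeeds only if every piece was moved out.
if ! rmdir ".tmp.$$" 2>/dev/null
then
    echo "Warning: not all file parts were assigned" >&2
    rm -rf ".tmp.$$"
    exit 1
fi
exit 0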

Usage

mysplit <source_file> <target_names...>
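
To see what the csplit step leaves behind before the renaming, here is a small sketch run on the sample input from the question. It assumes GNU csplit (the '{*}' repeat count is a GNU extension); the piece_ prefix and the scratch directory are illustrative choices, and -s is added only to silence the byte counts:

```shell
#!/bin/sh
# Work in a throwaway directory so nothing in the current one is touched.
cd "$(mktemp -d)" || exit 1

# Sample input from the question.
printf '%s\n' '# new file' 'text in file 1' \
              '# new file' 'text in file 2' \
              '# new file' 'text in file 3' > infile

# Same splitting options as the script, with an illustrative prefix.
# The piece before the first header is empty, so -z discards it.
csplit -s -f piece_ -z -k -n 4 infile '/# new file/' '{*}'

# Each remaining piece starts with its own header line.
for piece in piece_*
do
    printf '%s: %s\n' "$piece" "$(head -n 1 "$piece")"
done
```

Because the pieces sort by their numeric suffix, pairing them with the target names in argument order is enough.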

Smeterlink · Answer 2 · 2023-12-13T17:21:23+08:00

This approach works even when the text file and the file names contain spaces and non-ASCII characters, and it needs no temporary files:

infile:

# new file
text in file1

blabla
# new file
text in file2
# new file
text in file3

$//*+\

s
# new file
4!
aaaaaaaaa
i^
# new file

#¬}}{][|\~@

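The file names must be passed to the awk command as separate arguments, in single quotes so that the shell does not expand anything inside them (as it would inside double quotes). This is the split.sh script: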

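awk -v file="0" '
BEGIN {
  print "AWK arguments:"
  for (i = 0; i < ARGC; i++) {
    ARRAY[i] = ARGV[i]            # remember every argument (ARGV[0] is "awk")
    print "\047" ARRAY[i] "\047"
    if (i > 1) {
      ARGV[i] = ""                # blank the output names so awk reads only infile
    }
  }
  print "Writing:"
}
# Ordinary line: append to the current output file.
!/^# new file$/ {
  print "writing to: " "\047" ARRAY[file+1] "\047"
  print $0 >> ARRAY[file+1]
}
# Header line: close the previous output file, then start (and truncate) the next one.
/^# new file$/ {
  close(ARRAY[file+1])
  ++file
  print "writing to: " "\047" ARRAY[file+1] "\047"
  print $0 > ARRAY[file+1]
}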
' 'infile' '1.txt' '2.txt' '3.txt' 'file $_%.txt' '&file  _.txt'

The console output looks like this:

AWK arguments:
'awk'
'infile'
'1.txt'
'2.txt'
'3.txt'
'file $_%.txt'
'&file  _.txt'
Writing:
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
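
For reference, the same idea without the diagnostic output can be sketched as follows. This is a condensed variant, not the script above: the names name, n and out and the two output file names are illustrative, and it relies on awk skipping empty ARGV entries when it looks for input files.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing in the current one is touched.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '# new file' 'text in file 1' \
              '# new file' 'text in file 2' > infile

awk '
  BEGIN {
    # ARGV[1] is the input file; ARGV[2..] are output names, not inputs.
    for (i = 2; i < ARGC; i++) {
      name[i - 1] = ARGV[i]
      ARGV[i] = ""               # an empty ARGV entry is skipped as input
    }
  }
  /^# new file$/ {
    if (out != "") close(out)    # finish the previous piece
    out = name[++n]              # empty once the names run out
  }
  # The first print to a name truncates it; later prints append.
  out != "" { print > out }
' infile 'first part.txt' 'second part.txt'

cat 'second part.txt'
```

Lines that appear before the first header, or after the names run out, are dropped rather than written anywhere.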

If the arguments are instead passed as the output of another command (the files must already exist in the file system):

' $(ls infile | tr '\n' ' ' ; ls *.txt)

the shell splits that output on whitespace, so names containing spaces are broken apart:

AWK arguments:
'awk'
'infile'
'&file'
'_.txt'
'1.txt'
'2.txt'
'3.txt'
'_.txt'
'file'
'$_%.txt'
Writing:
writing to: '&file'
writing to: '&file'
writing to: '&file'
writing to: '&file'
writing to: '_.txt'
writing to: '_.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'

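To solve this, collect the arguments in a shell array built from a glob, which is not word-split, and expand it as "${array[@]}" so that each file name reaches awk as a single argument. The split.sh script becomes: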

array=(infile *.txt)
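awk -v file="0" '
BEGIN {
  print "AWK arguments:"
  for (i = 0; i < ARGC; i++) {
    ARRAY[i] = ARGV[i]            # remember every argument (ARGV[0] is "awk")
    print "\047" ARRAY[i] "\047"
    if (i > 1) {
      ARGV[i] = ""                # blank the output names so awk reads only infile
    }
  }
  print "Writing:"
}
# Ordinary line: append to the current output file.
!/^# new file$/ {
  print "writing to: " "\047" ARRAY[file+1] "\047"
  print $0 >> ARRAY[file+1]
}
# Header line: close the previous output file, then start (and truncate) the next one.
/^# new file$/ {
  close(ARRAY[file+1])
  ++file
  print "writing to: " "\047" ARRAY[file+1] "\047"
  print $0 > ARRAY[file+1]
}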
' "${array[@]}"

The result is now:

AWK arguments:
'awk'
'infile'
'&file  _.txt'
'1.txt'
'2.txt'
'3.txt'
'file $_%.txt'
Writing:
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'

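The number of output file names must be at least the number of pieces the split produces. If there are more names than pieces, the extra ones are ignored.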
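
That condition can be checked before splitting by counting the header lines. A minimal sketch in bash; names, pieces and status are illustrative, and the sample input is made up for the check:

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in the current one is touched.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '# new file' 'a' '# new file' 'b' '# new file' 'c' > infile

names=('1.txt' '2.txt')

# Every header line starts one piece.
pieces=$(grep -c '^# new file$' infile)

if (( ${#names[@]} < pieces )); then
  status="too few names: $pieces pieces, ${#names[@]} names"
else
  status="ok"
fi
echo "$status"
```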
