AskOverflow.Dev

Smeterlink
Asked: 2023-12-12 14:47:31 +0800 CST

Split a file by pattern match into specific output filenames


I have a file with the following content:

# new file
text in file 1
# new file
text in file 2
# new file
text in file 3

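The pattern here is # new file.

Instead of saving each piece to xx00, xx01 and xx02, I want to save to specific files: another file, file new, last one.

These 3 files already exist in the current directory, so I would like to supply them as an array and overwrite them: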

csplit -z infile '/# new file/' "${array[*]}"

The array can be supplied directly:

array=('another file' 'file new' 'last one')
echo ${array[*]}
another file file new last one

or built by listing the current directory:

array=($(find . -type f))
echo ${array[*]}
./another file ./file new ./last one

A modification of this script might be the solution:

awk -v file="1" -v occur="2" '
{
  print > (file".txt")
}
/^\$\$\$\$$/{
  count++
  if(count%occur==0){
    if(file){
      close(file".txt")
      ++file
    }
  }
}
'  Input_file

2 Answers

  1. Chris Davies
    2023-12-12T18:48:34+08:00

    I would still consider using csplit, but then rename the resulting files.

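    #!/bin/sh
    # Split "$1" into a private temporary directory, then rename the parts.
    mkdir ".tmp.$$" || exit 2
    csplit -f ".tmp.$$/tmp_" -zk -n 4 "$1" '/# new file/' '{*}'
    
    for file in ".tmp.$$"/tmp_*
    do
        # Stop once the target names run out; leftover parts trigger the warning below.
        [ "$#" -gt 1 ] || break
        # Drop the previous argument so that $1 is the next target name.
        shift
        mv -f "$file" "$1"
    done
    # rmdir only succeeds if every part was moved out of the directory.
    if ! rmdir ".tmp.$$" 2>/dev/null
    then
        echo "Warning: not all file parts were assigned" >&2
        rm -rf ".tmp.$$"
        exit 1
    fi
    exit 0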
    

    Usage

    mysplit <source_file> <target_names...>
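
As a rough illustration of the split-then-rename approach with the question's own file names, here is a minimal sketch. It assumes bash and GNU csplit (`-z` and `'{*}'` are GNU extensions). The function wrapper, the use of `mktemp -d`, the `demo_dir` variable and the sample data are illustrative assumptions and not part of the answer's script.

```shell
#!/bin/bash
# Sketch only: the same split-then-rename idea wrapped in a function so the
# demo is self-contained. Assumes GNU csplit (-z and '{*}' are GNU extensions).
mysplit() (
    src=$1
    shift
    tmp=$(mktemp -d) || exit 2
    # -s: quiet, -z: drop empty pieces, -k: keep pieces on error, -n 4: 4-digit suffix
    csplit -s -z -k -n 4 -f "$tmp/tmp_" "$src" '/# new file/' '{*}'
    status=0
    for part in "$tmp"/tmp_*; do
        if [ "$#" -eq 0 ]; then
            echo "Warning: not all file parts were assigned" >&2
            status=1
            break
        fi
        mv -f -- "$part" "$1"   # the next remaining name receives this part
        shift
    done
    rm -rf -- "$tmp"
    exit "$status"
)

# Hypothetical demo data in a scratch directory.
demo_dir=$(mktemp -d) && cd "$demo_dir" || exit 1
printf '%s\n' '# new file' 'text in file 1' \
              '# new file' 'text in file 2' \
              '# new file' 'text in file 3' > infile

array=('another file' 'file new' 'last one')
mysplit infile "${array[@]}"    # quoted [@] keeps names with spaces intact
head -- "${array[@]}"
```

Passing the names as `"${array[@]}"` rather than `"${array[*]}"` matters here: the former gives one argument per file name, while the latter joins all names into a single argument.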
    
  2. Best Answer
    Smeterlink
    2023-12-13T17:21:23+08:00

    This method works even when the text file and the filenames contain spaces and non-ASCII characters, and it needs no temporary files:

    infile:

    # new file
    text in file1
    
    blabla
    # new file
    text in file2
    # new file
    text in file3
    
    $//*+\
    
    s
    # new file
    4!
    aaaaaaaaa
    i^
    # new file
    
    #¬}}{][|\~@
    

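    The filenames must be passed to the awk command as separate arguments, each in single quotes so that the shell does not expand them (as it would inside double quotes), as in this split.sh script: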

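    awk -v file="0" '
      BEGIN {
        print "AWK arguments:"
        for (i = 0; i < ARGC; i++) {
          ARRAY[i] = ARGV[i]            # remember every argument
          print "\047" ARRAY[i] "\047"
          if (i > 1) {
            ARGV[i] = ""                # leave only infile as awk input
          }
        }
        print "Writing:"
      }
      # Ordinary line: append to the current output file.
      # The file > 0 guard keeps lines before the first marker out of infile.
      file > 0 && !/^# new file$/ {
        print "writing to: " "\047" ARRAY[file+1] "\047"
        print $0 >> ARRAY[file+1]
      }
      # Marker line: close the previous output file and start the next one.
      /^# new file$/ {
        if (file > 0) close(ARRAY[file+1])
        ++file
        print "writing to: " "\047" ARRAY[file+1] "\047"
        print $0 > ARRAY[file+1]
      }
    ' 'infile' '1.txt' '2.txt' '3.txt' 'file $_%.txt' '&file  _.txt'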
    

    The console output looks like this:

    AWK arguments:
    'awk'
    'infile'
    '1.txt'
    '2.txt'
    '3.txt'
    'file $_%.txt'
    '&file  _.txt'
    Writing:
    writing to: '1.txt'
    writing to: '1.txt'
    writing to: '1.txt'
    writing to: '1.txt'
    writing to: '2.txt'
    writing to: '2.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    writing to: 'file $_%.txt'
    writing to: 'file $_%.txt'
    writing to: 'file $_%.txt'
    writing to: 'file $_%.txt'
    writing to: '&file  _.txt'
    writing to: '&file  _.txt'
    writing to: '&file  _.txt'
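
For comparison, the same ARGV technique can be written without the diagnostic prints. This is a minimal sketch under a few assumptions: the `out` array, the counter `n`, the `demo_dir` variable, the sample input and the two output names are made up for illustration, and lines before the first marker are deliberately skipped.

```shell
#!/bin/bash
# Hypothetical demo data in a scratch directory.
demo_dir=$(mktemp -d) && cd "$demo_dir" || exit 1
printf '%s\n' '# new file' 'text in file1' \
              '# new file' 'text in file2' > infile

awk '
  BEGIN {
    # Save the output names, then blank them so awk reads only infile.
    for (i = 2; i < ARGC; i++) {
      out[i - 1] = ARGV[i]
      ARGV[i] = ""
    }
  }
  /^# new file$/ {
    if (n) close(out[n])    # finished with the previous output file
    n++
  }
  n { print > (out[n]) }    # lines before the first marker are skipped
' infile 'first part.txt' 'second part.txt'

cat -- 'first part.txt' 'second part.txt'
```

As in the script above, the marker line itself is written as the first line of each output file, and every output name must be supplied; the sketch does not check for a missing name.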
    

    If the arguments are instead passed as the output of another command (the files must already exist in the filesystem):

    ' $(ls infile | tr '\n' ' ' ; ls *.txt)
    

    the shell splits the arguments on whitespace:

    AWK arguments:
    'awk'
    'infile'
    '&file'
    '_.txt'
    '1.txt'
    '2.txt'
    '3.txt'
    '_.txt'
    'file'
    '$_%.txt'
    Writing:
    writing to: '&file'
    writing to: '&file'
    writing to: '&file'
    writing to: '&file'
    writing to: '_.txt'
    writing to: '_.txt'
    writing to: '1.txt'
    writing to: '1.txt'
    writing to: '1.txt'
    writing to: '1.txt'
    writing to: '1.txt'
    writing to: '1.txt'
    writing to: '2.txt'
    writing to: '2.txt'
    writing to: '2.txt'
    writing to: '2.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    

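    To solve this, pass the arguments to awk as an array. A quoted "${array[@]}" expands to one argument per element, so the names are no longer split on spaces. Use the following split.sh script: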

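    array=(infile *.txt)
    awk -v file="0" '
      BEGIN {
        print "AWK arguments:"
        for (i = 0; i < ARGC; i++) {
          ARRAY[i] = ARGV[i]            # remember every argument
          print "\047" ARRAY[i] "\047"
          if (i > 1) {
            ARGV[i] = ""                # leave only infile as awk input
          }
        }
        print "Writing:"
      }
      # Ordinary line: append to the current output file.
      # The file > 0 guard keeps lines before the first marker out of infile.
      file > 0 && !/^# new file$/ {
        print "writing to: " "\047" ARRAY[file+1] "\047"
        print $0 >> ARRAY[file+1]
      }
      # Marker line: close the previous output file and start the next one.
      /^# new file$/ {
        if (file > 0) close(ARRAY[file+1])
        ++file
        print "writing to: " "\047" ARRAY[file+1] "\047"
        print $0 > ARRAY[file+1]
      }
    ' "${array[@]}"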
    

    Now the result is:

    AWK arguments:
    'awk'
    'infile'
    '&file  _.txt'
    '1.txt'
    '2.txt'
    '3.txt'
    'file $_%.txt'
    Writing:
    writing to: '&file  _.txt'
    writing to: '&file  _.txt'
    writing to: '&file  _.txt'
    writing to: '&file  _.txt'
    writing to: '1.txt'
    writing to: '1.txt'
    writing to: '2.txt'
    writing to: '2.txt'
    writing to: '2.txt'
    writing to: '2.txt'
    writing to: '2.txt'
    writing to: '2.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    writing to: '3.txt'
    writing to: 'file $_%.txt'
    writing to: 'file $_%.txt'
    writing to: 'file $_%.txt'
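
The difference between the two invocations can be reproduced in isolation by counting the arguments a command receives. This is a small sketch assuming bash; `count_args` and the `names` array are hypothetical helpers that reuse three of the file names from above.

```shell
#!/bin/bash
# Hypothetical helper: print how many arguments it was called with.
count_args() { echo "$#"; }

names=('1.txt' 'file $_%.txt' '&file  _.txt')

# Unquoted command substitution: the shell splits the output on whitespace,
# so names containing spaces fall apart into several arguments.
unquoted=$(count_args $(printf '%s\n' "${names[@]}"))

# Quoted array expansion: exactly one argument per element.
quoted=$(count_args "${names[@]}")

echo "unquoted: $unquoted, quoted: $quoted"
# prints: unquoted: 5, quoted: 3
```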
    

    There must be at least as many output filenames as there are sections to write. Any extra filenames are ignored.
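
One way to enforce that requirement before running the split is to count the marker lines first. The following is a sketch assuming bash and grep; `enough_names` is a hypothetical helper, not part of the script above, and the sample input is made up.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if at least as many output names were
# given as there are "# new file" marker lines in the input file.
enough_names() {
    local infile=$1
    shift
    local sections
    sections=$(grep -c '^# new file$' -- "$infile")
    [ "$#" -ge "$sections" ]
}

# Demo input with three sections.
demo=$(mktemp) || exit 1
printf '%s\n' '# new file' 'a' '# new file' 'b' '# new file' 'c' > "$demo"

if enough_names "$demo" '1.txt' '2.txt' '3.txt'; then
    echo "enough names"
fi
if ! enough_names "$demo" '1.txt' '2.txt'; then
    echo "too few names" >&2
fi
```

A wrapper script could call such a check and exit early, so that no output file is touched when a name is missing.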
