AskOverflow.Dev

HippoMan
Asked: 2022-09-18 16:42:28 +0800 CST

Merge two non-git-based text files with semantics similar to the way git depicts merge conflicts

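I want to merge two non-git-based text files using semantics similar to the way git depicts "merge conflicts".

For example, suppose I have two text files with similar but not identical content, named file.1 and file.2. I want to merge these two files into a third file, like this:

hypothetical-merge-utility file.1 file.2 file.merged

I would like it to produce file.merged, which lists the file contents and each difference in a manner similar to the following: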

common line 1 ...
common line 2 ...
common line 3 ...
<<<<<<< file.1
something unique from file.1
a second line of something unique from file.1
======= file.2
something unique from file.2
>>>>>>> end of diff
common line 4 ...
common line 5 ...
<<<<<<< file.1
something unique from file.1
======= file.2
something unique from file.2
a second line of something unique from file.2
>>>>>>> end of diff
common line 6 ...
common line 7 ...
... etc. ...

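In other words, I want each difference between file.1 and file.2 to look similar to git's representation of a "merge conflict".

I don't care whether delimiters other than <<<<<<<<, ========, and >>>>>>>> are used.

I know there are many utilities for merging text files under Linux. However, I'm only looking for something that specifically presents the merged data in a manner similar to the way git depicts "merge conflicts".

Does anyone know of such a utility?

Thank you in advance.

Update: per Ed Morton's question below, here are the contents of the two test files...

==== file.1 ====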

common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.1
a second line of something unique from file.1
common line 4 ...
common line 5 ...
something unique from file.1
common line 6 ...
common line 7 ...

==== file.2 ====

common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.2
common line 4 ...
common line 5 ...
something unique from file.2
a second line of something unique from file.2
common line 6 ...
common line 7 ...
text-processing diff
3 Answers

  1. HippoMan
    2022-09-18T17:08:17+08:00

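    Note: although I think this is a somewhat reasonable "answer", I have since come up with another "answer" that I think is better. So please see my other "answer" below.

    The original version of this "answer"...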

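    Oh! I posted here too soon. I didn't know about diff's -D command-line option, and I now realize that I can do this (note that -D takes a macro name as its argument, here file.1, followed by the two files to compare)...

    diff -D file.1 file.1 file.2 >file.merged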
    

    This will produce the following file.merged...

    common line 1 ...
    common line 2 ...
    common line 3 ...
    #ifndef file.1
    something unique from file.1
    a second line of something unique from file.1
    #else /* file.1 */
    something unique from file.2
    #endif /* file.1 */
    common line 4 ...
    common line 5 ...
    #ifndef file.1
    something unique from file.1
    #else /* file.1 */
    something unique from file.2
    a second line of something unique from file.2
    #endif /* file.1 */
    common line 6 ...
    common line 7 ...
    ... etc. ...
    

    I can treat the #ifndef (or #ifdef), #else, and #endif lines the same way I would treat git's <<<<<<<<, ========, and >>>>>>>> lines.
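    To illustrate treating the preprocessor lines as conflict markers, here is a minimal Python sketch. It assumes GNU diff's -D layout (#ifndef NAME opens lines from the first file, #ifdef NAME opens lines that exist only in the second file, #else switches to the second file, #endif closes the group). The function name ifdef_to_conflicts and the label parameters are hypothetical, and the sketch shares the ambiguity problem if a file already contains such lines.

```python
import re


def ifdef_to_conflicts(lines, name, label1="file.1", label2="file.2"):
    """Rewrite `diff -D NAME` output using git-like conflict markers.

    Assumption: the input follows GNU diff's if-then-else layout.
    """
    esc = re.escape(name)
    only_old = re.compile(rf"#ifndef {esc}")
    only_new = re.compile(rf"#ifdef {esc}")
    switch = re.compile(rf"#else /\* (! )?{esc} \*/")
    close = re.compile(rf"#endif /\* (! )?{esc} \*/")

    out = []
    in_group = False   # currently inside a difference group
    sep_seen = False   # separator already emitted for this group
    for line in lines:
        text = line.rstrip("\n")
        if not in_group and only_old.fullmatch(text):
            out.append(f"<<<<<<< {label1}")
            in_group, sep_seen = True, False
        elif not in_group and only_new.fullmatch(text):
            # Lines exist only in the second file: empty first half.
            out.append(f"<<<<<<< {label1}")
            out.append(f"======= {label2}")
            in_group, sep_seen = True, True
        elif in_group and switch.fullmatch(text):
            out.append(f"======= {label2}")
            sep_seen = True
        elif in_group and close.fullmatch(text):
            if not sep_seen:
                # Lines exist only in the first file: empty second half.
                out.append(f"======= {label2}")
            out.append(">>>>>>>")
            in_group = False
        else:
            out.append(text)
    return out
```

    For example, "\n".join(ifdef_to_conflicts(open("file.merged"), "file.1")) would give the marker-style text for a file.merged produced with the macro name file.1.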

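    Update: ...I just found this: https://stackoverflow.com/questions/16902001/manually-merge-two-files-using-diff

    It shows how I can also do something similar using the unified diff format. Give diff a -U option with a very large argument, one that is greater than the number of lines in the longer of file.1 and file.2. For example ...

    diff -U 99999999 file.1 file.2 | tail -n +4 >file.merged
    

    This will then produce the following: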

     common line 1 ...
     common line 2 ...
     common line 3 ...
    -something unique from file.1
    -a second line of something unique from file.1
    +something unique from file.2
     common line 4 ...
     common line 5 ...
    -something unique from file.1
    +something unique from file.2
    +a second line of something unique from file.2
     common line 6 ...
     common line 7 ...
     ... etc. ...
    

    The + lines represent data unique to file.2, and the - lines represent data unique to file.1.

    I can work with these + and - lines.
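    As one way of working with the + and - lines, the following Python sketch rebuilds both inputs from a full-context unified diff body. It assumes the header lines have already been removed (as tail -n +4 does above); the function name split_full_context_diff is hypothetical.

```python
def split_full_context_diff(lines):
    """Rebuild both inputs from a full-context unified diff body.

    Assumption: the '---', '+++', and '@@' header lines were stripped.
    """
    first, second = [], []
    for line in lines:
        text = line.rstrip("\n")
        tag, body = text[:1], text[1:]
        if tag == "-":          # only in the first file
            first.append(body)
        elif tag == "+":        # only in the second file
            second.append(body)
        elif tag == " ":        # common to both files
            first.append(body)
            second.append(body)
        # Anything else (e.g. "\ No newline at end of file") is skipped.
    return first, second
```

    Because every line keeps its one-character prefix, the common text and each file's unique text can be recovered without inspecting the line contents.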

    • 2
  2. Ed Morton
    2022-09-19T09:01:22+08:00

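    It sounds like you don't really care about the output format and just want to know how to identify which lines are unique to each file and which are common to both. Given that, how about: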

    $ diff --old-line-format=$'-%l\n' --new-line-format=$'+%l\n' --unchanged-line-format=$'=%l\n' file.1 file.2
    =common line 1 ...
    =common line 2 ...
    =common line 3 ...
    -something unique from file.1
    -a second line of something unique from file.1
    +something unique from file.2
    =common line 4 ...
    =common line 5 ...
    -something unique from file.1
    +something unique from file.2
    +a second line of something unique from file.2
    =common line 6 ...
    =common line 7 ...
    

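    Be wary of any solution that has to test the content of a line to get the source indicator for that line (e.g. if you're looking for <<<<<<< file.1 to tell you what's unique to file.1, what if a file contains a line that is exactly that string?) as opposed to an indicator that is always and only present at a fixed location in each line. A test for any string will fail if that string can appear in your input. In the output above, the first character is always the indicator of where the line came from, so it cannot clash with possible file contents.

    If you really want exactly the git merge conflict format for output (which I don't recommend), you could always pipe the above to a simple awk script that prints <<< file, or whatever you like, when the first character of the line changes, and then removes that character.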
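    The post-processing step described above can be sketched as follows. This is written in Python rather than awk, and it is only an illustration under the assumption that every input line starts with =, -, or + as in the diff output above; the function name prefixed_to_conflicts and the labels are hypothetical.

```python
def prefixed_to_conflicts(lines, label1="file.1", label2="file.2"):
    """Turn '=', '-', '+' prefixed lines into git-like conflict blocks."""
    out = []
    state = "="  # prefix of the previous line
    for line in lines:
        text = line.rstrip("\n")
        # Treat anything that is not '-' or '+' as a common line.
        tag = text[:1] if text[:1] in ("-", "+") else "="
        body = text[1:]
        if tag != state:
            if tag == "-":
                if state == "+":
                    out.append(">>>>>>>")  # close the previous block
                out.append(f"<<<<<<< {label1}")
            elif tag == "+":
                if state == "=":
                    # No '-' lines came first: empty first half.
                    out.append(f"<<<<<<< {label1}")
                out.append(f"======= {label2}")
            else:
                if state == "-":
                    # No '+' lines followed: empty second half.
                    out.append(f"======= {label2}")
                out.append(">>>>>>>")
            state = tag
        out.append(body)
    # Close a block that runs to the end of the input.
    if state == "-":
        out.append(f"======= {label2}")
    if state != "=":
        out.append(">>>>>>>")
    return out
```

    The prefix is removed only after it has been used to decide which marker to print, so the decision never depends on the text of the line itself.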

    • 0
  3. Best Answer
    HippoMan
    2022-09-19T12:31:39+08:00

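    Because of the limitations of the diff -D ... and diff -U ... solutions that I originally posted in my first "answer", I decided to write a solution in Python using Python's difflib module.

    I wrote it to produce output that looks similar to git's. It uses delimiters containing the strings <<<<<<<<, ========, and >>>>>>>>, and we know that this can lead to ambiguity if the original text contains such strings. However, the same ambiguity can exist in git's "merge conflict" output, and since I'm comfortable with that in git and willing to accept it, I'm also comfortable with these ambiguities in my own solution.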
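    For anyone who wants a warning about that ambiguity before merging, a small check along these lines could be run on each input first. This is only a sketch; the function name lines_colliding_with_markers is hypothetical, and it only looks for lines that begin with one of the three delimiter strings.

```python
MARKERS = ("<<<<<<<<", "========", ">>>>>>>>")


def lines_colliding_with_markers(lines, markers=MARKERS):
    """Return (line_number, text) pairs for lines that start with a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # str.startswith accepts a tuple of candidate prefixes.
        if text.startswith(markers):
            hits.append((number, text))
    return hits
```

    An empty result means none of the input lines can be mistaken for a delimiter line in the merged output.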

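    The output is not exactly the same as git's "merge conflict" output, but it's close enough for my purposes.

    First, here is the Python program (I cleaned up the Python code I originally posted here; this is the cleaned-up version). I call this program filemerge...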

    #!/usr/bin/python3
    
    ### Take the diff's between two files and output
    ### the common and different lines in a manner
    ### which is very similar to the way that `git`
    ### depicts merge conflicts.
    
    import sys
    sys.dont_write_bytecode = True
    
    import os
    
    from difflib import unified_diff
    
    prog       = None
    diff_start = '<<<<<<<<'
    diff_sep   = '========'
    diff_end   = '>>>>>>>>'
    
    def main():
        if len(sys.argv) < 3:
            print(f'\nusage: {prog} file1 file2\n')
            return 1
    
        file1, file2 = sys.argv[1:3]
        data1        = None
        data2        = None
        missing      = []
    
        try:
            with open(file1, 'r') as f:
                data1 = f.readlines()
        except Exception:
            missing.append(file1)
    
        try:
            with open(file2, 'r') as f:
                data2 = f.readlines()
        except Exception:
            missing.append(file2)
            
        if missing:
            print(f'\nnot found: {", ".join(missing)}\n')
            return 1
    
        n1 = len(data1)
        n2 = len(data2)
        max_lines = (n1 + 1) if n1 > n2 else (n2 + 1)
        count = 0
        state = ''
        sep_printed = False
        next_file = ''
    
        for line in unified_diff(data1, data2, n=max_lines):
            count += 1
            if count < 4:
                continue
    
            # Every line returned by unified_diff() starts with
            # ' ', '+', or '-'. Strip only the trailing newline:
            # it is absent when a file's last line has none, and
            # slicing off the last character would lose data then.
            line = line.rstrip('\n')
            ch0  = line[0]
    
            if ch0 == ' ':
                if state:
                    state = ''
                    if not sep_printed:
                        print(f'{diff_sep}{next_file}')
                    print(diff_end)
                sep_printed = False
                next_file = ''
            elif ch0 == '-':
                if state == ch0:
                    pass
                elif state == '+':
                    print(f'{diff_sep} file={file1}')
                    sep_printed = True
                    next_file = ''
                else:
                    print(f'{diff_start} file={file1}')
                    sep_printed = False
                    next_file = f' file={file2}'
                state = ch0
            elif ch0 == '+':
                if state == ch0:
                    pass
                elif state == '-':
                    print(f'{diff_sep} file={file2}')
                    sep_printed = True
                    next_file = ''
                else:
                    print(f'{diff_start} file={file2}')
                    sep_printed = False
                    next_file = f' file={file1}'
                state = ch0
            print(line[1:])
    
        if state:
            if not sep_printed:
                print(f'{diff_sep}{next_file}')
                next_file = ''
            print(diff_end)
    
        return 0
    
    if __name__ == '__main__':
        prog = os.path.basename(sys.argv[0])
        sys.exit(main())
    

    Here are the input files I tested it with. They are similar but not identical to the input files I originally posted in my question here...

    ==== file.1 ====

    common line 1 ...
    common line 2 ...
    common line 3 ...
    something unique from file.1
    a second line of something unique from file.1
    common line 4 ...
    common line 5 ...
    something unique from file.1
    common line 6 ...
    common line 7 ...
    penultimate file.1 line
    common line 8 ...
    

    ==== file.2 ====

    common line 1 ...
    second line from file.2
    common line 2 ...
    common line 3 ...
    something unique from file.2
    common line 4 ...
    common line 5 ...
    something unique from file.2
    a second line of something unique from file.2
    common line 6 ...
    common line 7 ...
    common line 8 ...
    

    I ran the command like this...

    filemerge file.1 file.2 >file.merged
    

    These are the resulting contents of file.merged...

    common line 1 ...
    <<<<<<<< file=file.2
    second line from file.2
    ======== file=file.1
    >>>>>>>>
    common line 2 ...
    common line 3 ...
    <<<<<<<< file=file.1
    something unique from file.1
    a second line of something unique from file.1
    ======== file=file.2
    something unique from file.2
    >>>>>>>>
    common line 4 ...
    common line 5 ...
    <<<<<<<< file=file.1
    something unique from file.1
    ======== file=file.2
    something unique from file.2
    a second line of something unique from file.2
    >>>>>>>>
    common line 6 ...
    common line 7 ...
    <<<<<<<< file=file.1
    penultimate file.1 line
    ======== file=file.2
    >>>>>>>>
    common line 8 ...
    

    As I mentioned, this isn't exactly the same format as git's "merge conflict" output, but it's very similar, and that's close enough for me.
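    A more compact variant of the same idea is sketched below. It swaps in difflib.SequenceMatcher.get_opcodes() instead of parsing unified_diff() output, so it is not the program above, just an alternative under that assumption. The function name merge_with_markers is hypothetical. Unlike filemerge, it always lists the file.1 half first, and it also returns the common lines when the two inputs are identical.

```python
from difflib import SequenceMatcher


def merge_with_markers(lines1, lines2, label1="file.1", label2="file.2"):
    """Merge two line lists, wrapping each difference in conflict markers."""
    out = []
    matcher = SequenceMatcher(None, lines1, lines2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(lines1[i1:i2])
        else:  # 'replace', 'delete', or 'insert'
            out.append(f"<<<<<<<< file={label1}")
            out.extend(lines1[i1:i2])   # empty for 'insert'
            out.append(f"======== file={label2}")
            out.extend(lines2[j1:j2])   # empty for 'delete'
            out.append(">>>>>>>>")
    return out
```

    It expects lines without trailing newlines, for example open("file.1").read().splitlines(), and the result can be written out with "\n".join(...).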

    • 0
