
Question

HippoMan

Asked: 2022-09-18 16:42:28 +0800 CST

Merge two non-git-based text files with semantics similar to how git depicts merge conflicts


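I want to merge two text files that are not under git, using semantics similar to the way git depicts a "merge conflict".

For example, suppose I have two text files with similar but not identical contents, named file.1 and file.2. I want to merge these two files into a third file, like this: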

hypothetical-merge-utility file.1 file.2 file.merged

I want it to produce file.merged, which would list the file contents and each difference in a manner similar to the following:

common line 1 ...
common line 2 ...
common line 3 ...
<<<<<<< file.1
something unique from file.1
a second line of something unique from file.1
======= file.2
something unique from file.2
>>>>>>> end of diff
common line 4 ...
common line 5 ...
<<<<<<< file.1
something unique from file.1
======= file.2
something unique from file.2
a second line of something unique from file.2
>>>>>>> end of diff
common line 6 ...
common line 7 ...
... etc. ...

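In other words, I want each difference between file.1 and file.2 to look similar to git's depiction of a "merge conflict".

I don't care if delimiters other than <<<<<<<<, ========, and >>>>>>>> are used.

I know there are many utilities for merging text files under linux. However, I'm only looking for something that presents the merged data specifically in a way similar to how git depicts a "merge conflict".

Does anyone know of such a utility?

Thank you in advance.

UPDATE: Per Ed Morton's question below, here are the contents of the two test files ...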

==== file.1 ====

common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.1
a second line of something unique from file.1
common line 4 ...
common line 5 ...
something unique from file.1
common line 6 ...
common line 7 ...

==== file.2 ====

common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.2
common line 4 ...
common line 5 ...
something unique from file.2
a second line of something unique from file.2
common line 6 ...
common line 7 ...

3 Answers


HippoMan · Answer 1 · 2022-09-18T17:08:17+08:00

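NOTE: Although I think this is a somewhat reasonable "answer", I have now come up with another "answer" that I think is better. So please see my other "answer" below.

The original version of this "answer" ...

Oh! I posted here too soon. I didn't know about diff's -D command-line option, and now I realize that I can do this ...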

diff -D file.1 file.1 file.2 >file.merged

It produces the following in file.merged ...

common line 1 ...
common line 2 ...
common line 3 ...
#ifdef file.1
something unique from file.1
a second line of something unique from file.1
#else /* file.1 */
something unique from file.2
#endif /* file.1 */
common line 4 ...
common line 5 ...
#ifdef file.1
something unique from file.1
#else /* file.1 */
something unique from file.2
a second line of something unique from file.2
#endif /* file.1 */
common line 6 ...
common line 7 ...
... etc. ...

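I can treat the #ifdef, #else, and #endif lines the same way I would treat git's <<<<<<<<, ========, and >>>>>>>> lines.

UPDATE: ... I just found this: https://stackoverflow.com/questions/16902001/manually-merge-two-files-using-diff

It shows how I can also do something similar with the unified diff format. Give diff a -U option with a very large argument, one that is greater than the largest number of lines in file.1 and file.2. For example ...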

diff -U 99999999 file.1 file.2 | tail -n +4 >file.merged

It then produces this:

 common line 1 ...
 common line 2 ...
 common line 3 ...
+something unique from file.2
-something unique from file.1
-a second line of something unique from file.1
 common line 4 ...
 common line 5 ...
+something unique from file.2
+a second line of something unique from file.2
-something unique from file.1
 common line 6 ...
 common line 7 ...
 ... etc. ...

The + lines represent data unique to file.2, and the - lines represent data unique to file.1.

I can work with these + and - lines.
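
As a minimal sketch of working with such prefixed lines: because the first character of each line records where it came from, both inputs can be rebuilt from the merged listing. The function name split_merged is hypothetical, and the sketch assumes every line starts with ' ', '-', or '+' (it does not handle the "\ No newline at end of file" marker that diff can emit).

```python
def split_merged(merged_lines):
    """Rebuild both inputs from lines prefixed with ' ', '-' or '+'."""
    old, new = [], []
    for line in merged_lines:
        tag, text = line[:1], line[1:]
        if tag == ' ':        # common to both files
            old.append(text)
            new.append(text)
        elif tag == '-':      # only in the first file
            old.append(text)
        elif tag == '+':      # only in the second file
            new.append(text)
        else:
            raise ValueError(f'unexpected prefix in line: {line!r}')
    return old, new


merged = [
    ' common line 1 ...',
    '-something unique from file.1',
    '+something unique from file.2',
    ' common line 2 ...',
]
old, new = split_merged(merged)
print('\n'.join(old))
print('\n'.join(new))
```

Choosing one side of every difference then amounts to keeping only old or only new.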

Ed Morton · Answer 2 · 2022-09-19T09:01:22+08:00

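It sounds like you don't really care about the output format and just want to know how to identify which lines come from each file and which are common. Given that, how about: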

$ diff --old-line-format=$'-%l\n' --new-line-format=$'+%l\n' --unchanged-line-format=$'=%l\n' file.1 file.2
=common line 1 ...
=common line 2 ...
=common line 3 ...
-something unique from file.1
-a second line of something unique from file.1
+something unique from file.2
=common line 4 ...
=common line 5 ...
-something unique from file.1
+something unique from file.2
+a second line of something unique from file.2
=common line 6 ...
=common line 7 ...

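Be wary of any solution that has to test the contents of a line to find out where that line came from (for example, if you look for <<<<<<< file.1 to tell you what is unique to file.1, what happens if the file contains a line that is exactly that string?) rather than relying on an indicator that always and only appears at one fixed position in each line. Any test for a string will fail if that string can occur in your input. Above, the first character is always the indicator of where the line came from, so it cannot clash with possible file contents.

If you really want exactly the git merge conflict format for the output (which I wouldn't recommend), you can always pipe the above to a simple awk script that prints <<< file, or whatever you like, whenever the first character of the line changes, and then removes that character.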
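
The conversion suggested above could be sketched as follows, here in Python rather than awk. This is only an illustration: the function name to_conflict_format and the marker text are assumptions borrowed from the layout in the question, and the input is assumed to be the '=', '-', '+' prefixed lines produced by the diff command above.

```python
from itertools import groupby


def to_conflict_format(tagged_lines, name1='file.1', name2='file.2'):
    """Turn '=', '-', '+' prefixed lines into conflict-marker style output."""
    out = []
    # Split the input into alternating runs of common and differing lines.
    for is_common, run in groupby(tagged_lines, key=lambda l: l[:1] == '='):
        run = list(run)
        if is_common:
            out.extend(line[1:] for line in run)
            continue
        old = [line[1:] for line in run if line[:1] == '-']
        new = [line[1:] for line in run if line[:1] == '+']
        if len(old) + len(new) != len(run):
            raise ValueError('line without a "=", "-" or "+" prefix')
        out.append(f'<<<<<<< {name1}')
        out.extend(old)
        out.append(f'======= {name2}')
        out.extend(new)
        out.append('>>>>>>> end of diff')
    return out


tagged = [
    '=common line 3 ...',
    '-something unique from file.1',
    '-a second line of something unique from file.1',
    '+something unique from file.2',
    '=common line 4 ...',
]
print('\n'.join(to_conflict_format(tagged)))
```

The prefix character is removed only after it has been used to decide which side a line belongs to, so the file contents are never inspected.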

HippoMan · Answer 3 · 2022-09-19T12:31:39+08:00

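Because of the limitations of the diff -D ... and diff -U ... solutions that I originally posted in my first "answer", I decided to write a solution in python using python's difflib module.

I wrote it to produce output that looks similar to git's. It uses delimiters containing the strings <<<<<<<<, ========, and >>>>>>>>, and we know this can lead to ambiguity if the original text contains such strings. However, the same ambiguity can exist in git's "merge conflict" output, and since I'm comfortable with that in git and willing to accept it, I'm also comfortable with these ambiguities in my own solution.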
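
If the ambiguity matters for a particular pair of files, one option is to scan the inputs for lines that already begin with a delimiter before merging. The following is a minimal sketch; the helper name find_marker_collisions is hypothetical and is not part of the program below.

```python
MARKERS = ('<<<<<<<<', '========', '>>>>>>>>')


def find_marker_collisions(lines, markers=MARKERS):
    """Return (line_number, text) for each line that starts with a marker."""
    return [
        (number, line.rstrip('\n'))
        for number, line in enumerate(lines, start=1)
        if line.startswith(markers)  # str.startswith accepts a tuple
    ]


sample = ['common line 1 ...\n', '======== a heading\n', 'common line 2 ...\n']
for number, text in find_marker_collisions(sample):
    print(f'line {number} looks like a delimiter: {text}')
```

An empty result means none of the input lines can be mistaken for a delimiter in the merged output.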

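The output is not exactly the same as git's "merge conflict" output, but it's close enough for my purposes.

First, here's the python program (I cleaned up the original python code that I posted here, and this is the cleaned-up version). I call this program filemerge ...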

#!/usr/bin/python3

### Take the diff's between two files and output
### the common and different lines in a manner
### which is very similar to the way that `git`
### depicts merge conflicts.

import sys
sys.dont_write_bytecode = True

import os

from difflib import unified_diff

prog       = None
diff_start = '<<<<<<<<'
diff_sep   = '========'
diff_end   = '>>>>>>>>'

def main():
    if len(sys.argv) < 3:
        print(f'\nusage: {prog} file1 file2\n')
        return 1

    file1, file2 = sys.argv[1:3]
    data1        = None
    data2        = None
    missing      = []

    try:
        with open(file1, 'r') as f:
            data1 = f.readlines()
    except Exception:
        missing.append(file1)

    try:
        with open(file2, 'r') as f:
            data2 = f.readlines()
    except Exception:
        missing.append(file2)
        
    if missing:
        print(f'\nnot found: {", ".join(missing)}\n')
        return 1

    n1 = len(data1)
    n2 = len(data2)
    max_lines = (n1 + 1) if n1 > n2 else (n2 + 1)
    count = 0
    state = ''
    sep_printed = False
    next_file = ''

    for line in unified_diff(data1, data2, n=max_lines):
        count += 1
        if count < 4:
            continue

        # Every line yielded by unified_diff() starts
        # with ' ', '+', or '-'. Strip only a trailing
        # newline, because the last line of an input
        # file may not end with one.
        line = line.rstrip('\n')
        ch0  = line[0]

        if ch0 == ' ':
            if state:
                state = ''
                if not sep_printed:
                    print(f'{diff_sep}{next_file}')
                print(diff_end)
            sep_printed = False
            next_file = ''
        elif ch0 == '-':
            if state == ch0:
                pass
            elif state == '+':
                print(f'{diff_sep} file={file1}')
                sep_printed = True
                next_file = ''
            else:
                print(f'{diff_start} file={file1}')
                sep_printed = False
                next_file = f' file={file2}'
            state = ch0
        elif ch0 == '+':
            if state == ch0:
                pass
            elif state == '-':
                print(f'{diff_sep} file={file2}')
                sep_printed = True
                next_file = ''
            else:
                print(f'{diff_start} file={file2}')
                sep_printed = False
                next_file = f' file={file1}'
            state = ch0
        print(line[1:])

    if state:
        if not sep_printed:
            print(f'{diff_sep}{next_file}')
            next_file = ''
        print(diff_end)

    return 0

if __name__ == '__main__':
    prog = os.path.basename(sys.argv[0])
    sys.exit(main())

Here are the input files I tested it with. They are similar but not identical to the input files I originally posted in my question here ...

==== file.1 ====

common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.1
a second line of something unique from file.1
common line 4 ...
common line 5 ...
something unique from file.1
common line 6 ...
common line 7 ...
penultimate file.1 line
common line 8 ...

==== file.2 ====

common line 1 ...
second line from file.2
common line 2 ...
common line 3 ...
something unique from file.2
common line 4 ...
common line 5 ...
something unique from file.2
a second line of something unique from file.2
common line 6 ...
common line 7 ...
common line 8 ...

I ran the command like this ...

filemerge file.1 file.2 >file.merged

These are the resulting contents of file.merged ...

common line 1 ...
<<<<<<<< file=file.2
second line from file.2
======== file=file.1
>>>>>>>>
common line 2 ...
common line 3 ...
<<<<<<<< file=file.1
something unique from file.1
a second line of something unique from file.1
======== file=file.2
something unique from file.2
>>>>>>>>
common line 4 ...
common line 5 ...
<<<<<<<< file=file.1
something unique from file.1
======== file=file.2
something unique from file.2
a second line of something unique from file.2
>>>>>>>>
common line 6 ...
common line 7 ...
<<<<<<<< file=file.1
penultimate file.1 line
======== file=file.2
>>>>>>>>
common line 8 ...

As I mentioned, this is not exactly the same format as git's "merge conflict" output, but it's very similar, and that is close enough for me.
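
For comparison, here is a shorter alternative sketch. It is not the program above: it uses difflib.SequenceMatcher.get_opcodes(), which reports equal and differing spans directly, so no unified-diff prefixes need to be parsed. The function name merge_lines is hypothetical. Unlike the program above, it always lists the first file's side first, and it still outputs every line when the two inputs are identical.

```python
from difflib import SequenceMatcher


def merge_lines(a, b, name1='file.1', name2='file.2'):
    """Merge two lists of lines, wrapping each difference in markers."""
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            out.extend(a[i1:i2])
            continue
        # 'replace', 'delete' and 'insert' all become one marked block;
        # one side is simply empty for 'delete' and 'insert'.
        out.append(f'<<<<<<<< file={name1}')
        out.extend(a[i1:i2])
        out.append(f'======== file={name2}')
        out.extend(b[j1:j2])
        out.append('>>>>>>>>')
    return out


first = ['common line 1 ...', 'something unique from file.1', 'common line 2 ...']
second = ['common line 1 ...', 'something unique from file.2', 'common line 2 ...']
print('\n'.join(merge_lines(first, second)))
```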
