

Luca Schulz

Asked: 2022-05-27 10:25:34 +0800 CST

How can I find keywords that frequently appear together across multiple files?


I want to find keywords that are frequently associated with each other.

Example

A directory contains Markdown files, each of which has some keywords on its last line:

$ tail -n 1 file1.md
#doctor #donkey #plants

$ tail -n 1 file2.md
#doctor #firework #university

$ tail -n 1 file3.md
#doctor #donkey #linux #plants

Pseudo-output

100% of the files containing the keyword "#donkey" also contain the keyword "#doctor".
50% of the files containing the keyword "#plants" also contain the keyword "#linux".
…

A shell script, an awk script, or just an explanation of how to achieve this would be enough!

Any help would be appreciated. Thanks a lot!

1 Answer


Ed Morton · Answer 1 · 2022-05-27T12:34:34+08:00

Using GNU awk for arrays of arrays.

If the keywords were on the first line of each file instead, you could also use GNU awk's nextfile for efficiency:

$ cat tst.awk
FNR == 1 {
    for ( i=1; i<=NF; i++ ) {
        words[$i]++
        for ( j=i+1; j<=NF; j++ ) {
            pairs[$i][$j]++
            pairs[$j][$i]++
        }
    }
    nextfile
}
END {
    for ( word1 in pairs ) {
        for ( word2 in pairs[word1] ) {
            pct = pairs[word1][word2] * 100 / words[word1]
            printf "%d%% of the files containing the keyword \"%s\" also contain the keyword \"%s\".\n", pct, word1, word2
        }
    }
}

$ awk -f tst.awk file*.md
100% of the files containing the keyword "#university" also contain the keyword "#doctor".
100% of the files containing the keyword "#university" also contain the keyword "#firework".
100% of the files containing the keyword "#plants" also contain the keyword "#donkey".
50% of the files containing the keyword "#plants" also contain the keyword "#linux".
100% of the files containing the keyword "#plants" also contain the keyword "#doctor".
100% of the files containing the keyword "#donkey" also contain the keyword "#plants".
50% of the files containing the keyword "#donkey" also contain the keyword "#linux".
100% of the files containing the keyword "#donkey" also contain the keyword "#doctor".
100% of the files containing the keyword "#linux" also contain the keyword "#plants".
100% of the files containing the keyword "#linux" also contain the keyword "#donkey".
100% of the files containing the keyword "#linux" also contain the keyword "#doctor".
33% of the files containing the keyword "#doctor" also contain the keyword "#university".
66% of the files containing the keyword "#doctor" also contain the keyword "#plants".
66% of the files containing the keyword "#doctor" also contain the keyword "#donkey".
33% of the files containing the keyword "#doctor" also contain the keyword "#linux".
33% of the files containing the keyword "#doctor" also contain the keyword "#firework".
100% of the files containing the keyword "#firework" also contain the keyword "#university".
100% of the files containing the keyword "#firework" also contain the keyword "#doctor".
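Each percentage is the number of files containing both keywords divided by the number of files containing the first keyword, which is what pairs[word1][word2] * 100 / words[word1] computes. Worked through for "#doctor" and "#plants" in the three sample files ("#doctor" appears in 3 files, and 2 of them also contain "#plants"):

$$
\text{pct}(w_1, w_2) = \frac{100 \cdot \text{pairs}[w_1][w_2]}{\text{words}[w_1]},
\qquad
\text{pct}(\text{\#doctor}, \text{\#plants}) = \frac{100 \cdot 2}{3} \approx 66.67
$$

Because printf's %d truncates rather than rounds, this prints as 66%; using %.0f instead would print 67%.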

Or, with the keywords on the last line (as in your example), again relying on gawk, this time for ENDFILE:

$ cat tst.awk
ENDFILE {
    for ( i=1; i<=NF; i++ ) {
        words[$i]++
        for ( j=i+1; j<=NF; j++ ) {
            pairs[$i][$j]++
            pairs[$j][$i]++
        }
    }
}
END {
    for ( word1 in pairs ) {
        for ( word2 in pairs[word1] ) {
            pct = pairs[word1][word2] * 100 / words[word1]
            printf "%d%% of the files containing the keyword \"%s\" also contain the keyword \"%s\".\n", pct, word1, word2
        }
    }
}

$ awk -f tst.awk file*.md

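Or, still with the keywords on the last line, but more efficiently by combining tail with gawk (tail only has to read the end of each file, while the ENDFILE version makes awk read every line):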

$ cat tst.awk
{
    for ( i=1; i<=NF; i++ ) {
        words[$i]++
        for ( j=i+1; j<=NF; j++ ) {
            pairs[$i][$j]++
            pairs[$j][$i]++
        }
    }
}
END {
    for ( word1 in pairs ) {
        for ( word2 in pairs[word1] ) {
            pct = pairs[word1][word2] * 100 / words[word1]
            printf "%d%% of the files containing the keyword \"%s\" also contain the keyword \"%s\".\n", pct, word1, word2
        }
    }
}

$ tail -qn1 file*.md | awk -f tst.awk
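All three versions above rely on gawk's arrays of arrays (pairs[$i][$j]). If only a POSIX awk such as mawk or BSD awk is available, the same counts can be kept under a two-part subscript (pairs[$i, $j]), which awk joins with SUBSEP, and split back apart in END. A minimal sketch of the tail-based variant, assuming one line of whitespace-separated keywords per file arrives on stdin; the function name kw_pairs is hypothetical.

```shell
# Hypothetical helper: reads one line of keywords per file on stdin and prints
# co-occurrence percentages. Uses only POSIX awk features (no gawk arrays of
# arrays): each pair is stored under the key word1 SUBSEP word2.
kw_pairs() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            words[$i]++                    # number of files containing $i
            for (j = i + 1; j <= NF; j++) {
                pairs[$i, $j]++            # files containing both words,
                pairs[$j, $i]++            # recorded in both directions
            }
        }
    }
    END {
        for (key in pairs) {
            split(key, w, SUBSEP)          # w[1] = word1, w[2] = word2
            pct = pairs[key] * 100 / words[w[1]]
            printf "%d%% of the files containing the keyword \"%s\" also contain the keyword \"%s\".\n", pct, w[1], w[2]
        }
    }'
}

# Usage, as in the question (last line of each Markdown file):
#   tail -qn1 file*.md | kw_pairs
```

As with the gawk versions, the order of the output lines is unspecified, because for (key in array) visits keys in no particular order; piping the result through sort gives a stable listing.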