我有这样的数据,但要大得多。所以我有一个 df1.txt 如下
sp|O15304|SIVA_HUMAN MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDEGCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRAVDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
tr|A0A1B1L9R9|A0A1B1L9R9_BACTU MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVGMESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIAITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFFFSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIWPVSIGLIAGALCIFIVAIVVFKKRDLPL
我有 df2.txt 如下
sp|O15304|SIVA_HUMAN IGPDGR
我正在尝试将它们结合在一起,所以我执行以下操作
join df1.txt df2.txt | awk '{gsub($3, tolower($3), $2) ; print $1 "\t" $2}' > out.txt
我希望有这个
sp|O15304|SIVA_HUMAN MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDEGCAVVHLPESPKPGPTGAPRAARGQMLigpdgrLIRSLGQASEADPSGVASIACSSCVRAVDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
tr|A0A1B1L9R9|A0A1B1L9R9_BACTU MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVGMESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIAITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFFFSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIWPVSIGLIAGALCIFIVAIVVFKKRDLPL
但相反,我有这个
sp|O15304|SIVA_HUMAN MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDEGCAVVHLPESPKPGPTGAPRAARGQMLigpdgrLIRSLGQASEADPSGVASIACSSCVRAVDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
,我该如何解决?
问题出在 join 命令中。你需要使用
-a 1
.从
man join
即最后的命令是
背景
故障排除时,应按顺序对管道的各个部分进行测试。
join df1.txt df2.txt
只输出两个文件中的行。要包含 indf1.txt
中不匹配的行df2.txt
,请-a 1
按上述方式使用。