I have a PDB file (the coordinates of the atoms in a protein) on a Linux machine:
ATOM 1 N GLY A 1 0.535 51.766 5.682 1.00 0.00
ATOM 2 CA GLY A 1 -0.712 50.962 5.596 1.00 0.00
ATOM 3 C GLY A 1 -1.243 50.872 4.179 1.00 0.00
ATOM 4 O GLY A 1 -1.313 51.888 3.492 1.00 0.00
ATOM 5 N GLN A 2 -1.600 49.664 3.737 1.00 0.00
ATOM 6 CA GLN A 2 -2.221 49.468 2.423 1.00 0.00
ATOM 7 C GLN A 2 -3.542 48.719 2.507 1.00 0.00
ATOM 8 O GLN A 2 -3.722 47.844 3.356 1.00 0.00
ATOM 9 CB GLN A 2 -1.280 48.738 1.468 1.00 0.00
ATOM 10 CG GLN A 2 -0.976 47.294 1.830 1.00 0.00
.... .. .. .. . . .... .... .... .... ....
TER SPLIT LINE FOR INTERNAL USE ONLY
ATOM 1 O5' G A 1 -44.412 97.503 31.177 1.00 0.00
ATOM 2 C5' G A 1 -45.447 96.803 31.882 1.00 0.00
ATOM 3 C4' G A 1 -45.225 95.295 31.894 1.00 0.00
ATOM 4 O4' G A 1 -46.441 94.578 31.654 1.00 0.00
ATOM 5 C3' G A 1 -44.328 94.850 30.748 1.00 0.00
ATOM 6 O3' G A 1 -42.943 94.877 31.129 1.00 0.00
ATOM 7 C2' G A 1 -44.804 93.425 30.542 1.00 0.00
ATOM 8 O2' G A 1 -44.163 92.592 31.466 1.00 0.00
ATOM 9 C1' G A 1 -46.304 93.444 30.772 1.00 0.00
ATOM 10 N9 G A 1 -46.965 93.699 29.495 1.00 0.00
.... .. .. . . . ....... ...... ..... .... ...
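The TER record explicitly marks the end of a particular amino acid chain. I want to use awk to change the chain ID in column 5 so that the chain following each TER is assigned the correct ID.
Expected output: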
ATOM 1 N GLY A 1 0.535 51.766 5.682 1.00 0.00
ATOM 2 CA GLY A 1 -0.712 50.962 5.596 1.00 0.00
ATOM 3 C GLY A 1 -1.243 50.872 4.179 1.00 0.00
ATOM 4 O GLY A 1 -1.313 51.888 3.492 1.00 0.00
ATOM 5 N GLN A 2 -1.600 49.664 3.737 1.00 0.00
ATOM 6 CA GLN A 2 -2.221 49.468 2.423 1.00 0.00
ATOM 7 C GLN A 2 -3.542 48.719 2.507 1.00 0.00
ATOM 8 O GLN A 2 -3.722 47.844 3.356 1.00 0.00
ATOM 9 CB GLN A 2 -1.280 48.738 1.468 1.00 0.00
ATOM 10 CG GLN A 2 -0.976 47.294 1.830 1.00 0.00
TER SPLIT LINE FOR INTERNAL USE ONLY
ATOM 1 O5' G B 1 -44.412 97.503 31.177 1.00 0.00
ATOM 2 C5' G B 1 -45.447 96.803 31.882 1.00 0.00
ATOM 3 C4' G B 1 -45.225 95.295 31.894 1.00 0.00
ATOM 4 O4' G B 1 -46.441 94.578 31.654 1.00 0.00
ATOM 5 C3' G B 1 -44.328 94.850 30.748 1.00 0.00
ATOM 6 O3' G B 1 -42.943 94.877 31.129 1.00 0.00
ATOM 7 C2' G B 1 -44.804 93.425 30.542 1.00 0.00
ATOM 8 O2' G B 1 -44.163 92.592 31.466 1.00 0.00
ATOM 9 C1' G B 1 -46.304 93.444 30.772 1.00 0.00
ATOM 10 N9 G B 1 -46.965 93.699 29.495 1.00 0.00
Everything must stay separated by exactly the same whitespace as in the input; the following arrangement is wrong:
ATOM 3674 CD1 PHE A 460 2.350 79.471 35.466 1.00 0.00
ATOM 3675 CD2 PHE A 460 1.037 81.443 35.196 1.00 0.00
ATOM 3676 CE1 PHE A 460 2.425 79.321 34.080 1.00 0.00
ATOM 3677 CE2 PHE A 460 1.108 81.298 33.805 1.00 0.00
ATOM 3678 CZ PHE A 460 1.805 80.232 33.250 1.00 0.00
TER SPLIT LINE FOR B USE ONLY
ATOM 1 O5' G B 1 -44.412 97.503 31.177 1.00 0.00
ATOM 2 C5' G B 1 -45.447 96.803 31.882 1.00 0.00
ATOM 3 C4' G B 1 -45.225 95.295 31.894 1.00 0.00
ATOM 4 O4' G B 1 -46.441 94.578 31.654 1.00 0.00
ATOM 5 C3' G B 1 -44.328 94.850 30.748 1.00 0.00
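One possible explanation for output like the above, offered as a guess since the command that produced it is not shown: assigning to a field in awk makes it rebuild the whole record, joining the fields with OFS (a single space by default), and an unguarded assignment also rewrites the fifth word of the TER line. The function name `collapse_demo` below is hypothetical and only illustrates that behaviour.

```shell
# Demonstration only: an unguarded assignment to $5.
# awk rebuilds $0 from its fields, so runs of spaces collapse to one space,
# and every line with five or more words is touched, including TER lines.
collapse_demo() {
    awk '{ $5 = "B"; print }' "$@"
}
```

Feeding it a line such as `TER SPLIT LINE FOR INTERNAL USE ONLY` replaces the word `INTERNAL` with `B`, which matches the damaged TER line shown above.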
In addition, the file ends with this:
TER
ENDMDL
There is a blank line at the end of the file, and it needs to be preserved as is.
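A minimal sketch of one way to meet these requirements, under stated assumptions: the chain ID really is the fifth whitespace-separated field of every ATOM line (as in the sample), there are at most 26 chains, and every line whose first field is TER ends a chain. The function name `relabel_chains` and the file names in the usage note are hypothetical. Instead of assigning to `$5`, the script finds the character offset of the fifth field and splices the new letter in, so the original spacing is left alone.

```shell
# Relabel the chain ID (5th field) of ATOM/HETATM lines after each TER record
# without rebuilding the record, so the original whitespace is kept.
relabel_chains() {
    awk '
    BEGIN { ids = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; chain = 1 }

    # A TER line closes the current chain; print it untouched.
    $1 == "TER" { print; chain++; next }

    ($1 == "ATOM" || $1 == "HETATM") && NF >= 5 {
        rest = $0
        pos = 0                      # number of characters before field 5
        for (i = 1; i < 5; i++) {    # skip fields 1 to 4 and their leading blanks
            match(rest, /^[ \t]*[^ \t]+/)
            pos += RLENGTH
            rest = substr(rest, RLENGTH + 1)
        }
        match(rest, /^[ \t]*/)       # blanks in front of field 5
        pos += RLENGTH
        # Splice the new ID in place of the old one.
        # Assumption: chain <= 26, otherwise substr(ids, chain, 1) is empty.
        print substr($0, 1, pos) substr(ids, chain, 1) substr($0, pos + length($5) + 1)
        next
    }

    # Everything else (ENDMDL, blank lines, ...) passes through unchanged.
    { print }
    ' "$@"
}
```

It could be used as `relabel_chains input.pdb > output.pdb`. The final TER and ENDMDL lines and the trailing blank line are printed unchanged, because only ATOM/HETATM lines are edited. If the real file follows the standard fixed-width PDB layout, where the chain ID sits in column 22, replacing that single character by position may be more robust than counting fields, since adjacent fixed-width fields can run together without a space.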