I have volume data from different brain regions and am trying to tidy it up for analysis. For context, here is part of the data I received:
LT_Putamen 5075 5075.000000
LT_Temporal 84593 84593.000000
LT_Thalamus 7720 7720.000000
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100
I want to modify it so that the output is:
LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100
I simply want every record to have this "overlaps" line.
I am still a beginner at programming, but I came up with something like this:
awk '{
    if (NR == 1) {
        # Initialize the first region (using the first word in a line)
        region = $1
        print $0
    } else {
        if ($1 != region) {
            # Finalize the old region - printing "overlaps" line with 0 0
            printf("%s overlaps 0 0\n", region)
            # Start the new region
            region = $1
        }
        # Print the current line (for the current region)
        print $0
    }
}
END {
    # For the last region
    if (region) {
        printf("%s 0 0\n", region)
    }
}'
The result is close to what I want:
LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 0 0
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 0 0
RT_Amygdala overlaps 2133.000000 94.7100
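But I get these extra "overlaps" lines for regions that already have one. Can you help me? What should I do to make it work? I would be very grateful for any help! Thanks,
Marcin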
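Assumptions/understandings:
One awk idea:
This generates:
To demonstrate proper processing where the last line is not an "overlaps" line:
Setup:
The same code generates: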
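When the first field ($1) differs from the variable col1 and that variable is non-empty, print col1 followed by " overlaps 0 0".
Set col1 to the value of the current first column.
When the line matches /overlaps/, reset the col1 variable.
The bare pattern 1 prints the current line.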
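The steps described above can be sketched as follows. This is a minimal sketch, not necessarily the code originally posted: the wrapper function name add_overlaps is hypothetical, and the END rule is an assumption added so that a final region without an "overlaps" line still gets one.

```shell
# Hypothetical wrapper; reads the named files, or stdin when none are given.
add_overlaps() {
    awk '
        # A new region starts while the previous one still lacks an overlaps line
        $1 != col1 && col1 != "" { print col1, "overlaps 0 0" }
        # Remember the region named on the current line
        { col1 = $1 }
        # This region already has an overlaps line, so nothing is owed for it
        /overlaps/ { col1 = "" }
        # Print the current line unchanged
        1
        # Assumption: cover a last region that never got an overlaps line
        END { if (col1 != "") print col1, "overlaps 0 0" }
    ' "$@"
}
```

It could be called as add_overlaps regions.txt, where regions.txt is a placeholder file name. Note that /overlaps/ matches anywhere on the line, so this assumes no region name contains that word.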
Using any awk that retains the value of $2 in the END section (most do):
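A minimal sketch along those lines, under the assumption that the idea is to compare the second field of consecutive lines and to inspect $2 of the last record inside END. The function name add_overlaps_prev and the variables prev1 and prev2 are hypothetical, introduced only for illustration.

```shell
# Hypothetical wrapper; reads the named files, or stdin when none are given.
add_overlaps_prev() {
    awk '
        # Previous line was a plain region line and this line is not its overlaps line
        NR > 1 && prev2 != "overlaps" && $2 != "overlaps" { print prev1, "overlaps 0 0" }
        # Print the current line and remember its first two fields
        { print; prev1 = $1; prev2 = $2 }
        # Relies on awk keeping the fields of the last record available in END
        END { if (NR > 0 && $2 != "overlaps") print $1, "overlaps 0 0" }
    ' "$@"
}
```

Comparing $2 against the literal string "overlaps" is stricter than a regular-expression match on the whole line, so a region name that happens to contain that word would not confuse it.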
This awk solution should also work for you:
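As one hedged possibility (not necessarily the solution originally posted here), a two-pass variant reads the file twice: the first pass records which regions already have an "overlaps" line, and the second pass prints every line and adds the zero line where none exists. The function name add_overlaps_two_pass and the array name has are hypothetical.

```shell
# Hypothetical wrapper; $1 is the input file path.
# The file is read twice, so it must be a regular file rather than a pipe.
add_overlaps_two_pass() {
    awk '
        # Pass 1: record every region that already has an overlaps line
        NR == FNR { if ($2 == "overlaps") has[$1] = 1; next }
        # Pass 2: print each line as it is
        { print }
        # Then add the zero line right after a region line that has none
        $2 != "overlaps" && !($1 in has) { print $1, "overlaps 0 0" }
    ' "$1" "$1"
}
```

Unlike the single-pass approaches, this does not depend on the "overlaps" line directly following its region line, at the cost of reading the input twice.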