I have a bunch of files with the .vcf extension. They are comma-separated and contain double quotes ("), as shown below:
"","CHROM","POS","ID","REF","ALT","QUAL","FILTER","INFO","FORMAT","NORMAL","TUMOR","Depth","DistanceToAlignmentEnd","DistanceToAlignmentEndMAD","DistanceToAlignmentEndMedian","HomopolymerLength","LowMapQual","MMQSDiff","MapQualDiff","MapQualDiffMedian","NT","QSS","QSS_NT","ReadCount","ReadCountControl","Repeat","SGT","SNVCluster10","SNVCluster100","SNVCluster20","SNVCluster5","SNVCluster50","SOMATIC","StrandBias","TQSS","TQSS_NT","VariantAlleleCount","VariantAlleleCountControl","VariantAlleleFrequency","VariantBaseQual","VariantBaseQualMedian","VariantMMQS","VariantMapQual","VariantMapQualMedian","VariantStrandBias","normal","tumour","N_DP","N_FDP","N_SDP","N_SUBDP","T_DP","T_FDP","T_SDP","T_SUBDP","T_REF_COUNT","T_ALT_COUNT","N_REF_COUNT","N_ALT_COUNT","T_VAF","N_VAF","N_AU_1","N_AU_2","N_CU_1","N_CU_2","N_GU_1","N_GU_2","N_TU_1","N_TU_2","T_AU_1","T_AU_2","T_CU_1","T_CU_2","T_GU_1","T_GU_2","T_TU_1","T_TU_2"
"1","chr1","11195689",".","C","G",".","PASS","Depth=83;DistanceToAlignmentEnd=38.75;DistanceToAlignmentEndMAD=9.00;DistanceToAlignmentEndMedian=37.50;HomopolymerLength=2;LowMapQual=0.00;MMQSDiff=136.94;MapQualDiff=-8.750e-01;MapQualDiffMedian=0.00;NT=ref;QSS=16;QSS_NT=16;ReadCount=83;ReadCountControl=49;Repeat=0;SGT=CC->CG;SNVCluster10=0;SNVCluster100=2;SNVCluster20=1;SNVCluster5=0;SNVCluster50=2;SOMATIC;StrandBias=0.482;TQSS=1;TQSS_NT=1;VariantAlleleCount=4;VariantAlleleCountControl=1;VariantAlleleFrequency=0.048;VariantBaseQual=40.00;VariantBaseQualMedian=41.00;VariantMMQS=157.25;VariantMapQual=59.00;VariantMapQualMedian=60.00;VariantStrandBias=0.250","DP:FDP:SDP:SUBDP:AU:CU:GU:TU","49:1:0:0:0,0:47,48:1,1:0,0","82:0:0:0:0,0:78,79:4,4:0,0","83","38.75","9","37.5","2","0","136.94","-0.875","0","ref","16","16","83","49","0","CC->CG","0","2","1","0","2","SOMATIC","0.482","1","1","4","1","0.048","40","41","157.25","59","60","0.25","list(DP = ""49"", FDP = ""1"", SDP = ""0"", SUBDP = ""0"", AU = ""0,0"", CU = ""47,48"", GU = ""1,1"", TU = ""0,0"")","list(DP = ""82"", FDP = ""0"", SDP = ""0"", SUBDP = ""0"", AU = ""0,0"", CU = ""78,79"", GU = ""4,4"", TU = ""0,0"")","49","1","0","0","82","0","0","0","78","4","47","1","0.0487804878048781","0.0208333333333333","0","0","47","48","1","1","0","0","0","0","78","79","4","4","0","0"
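How can I convert the comma separators to tabs, remove the double quotes, delete the first column, and finally save each cleaned file as .txt, using the basename of the matching original file?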
You can use csvcut and csvformat from the Python-based csvkit package.
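For comparison, here is a minimal sketch of the same clean-up using only Python's standard csv module rather than the csvkit commands. The function names (clean_row, convert_file, convert_all) are made up for illustration, and the sketch assumes the files are UTF-8 and that no field contains a tab or a line break. It parses the quoted CSV properly, so commas inside quoted fields such as the INFO column are not treated as separators.

```python
import csv
from pathlib import Path


def clean_row(row):
    """Drop the first (row-number) column and strip double quotes left inside fields."""
    return [field.replace('"', "") for field in row[1:]]


def convert_file(src):
    """Write a tab-separated copy of src next to it, same basename, .txt suffix."""
    src = Path(src)
    dst = src.with_suffix(".txt")
    with src.open(newline="", encoding="utf-8") as fin, \
         dst.open("w", newline="", encoding="utf-8") as fout:
        # csv.reader handles the quoting, including "" escapes inside fields.
        for row in csv.reader(fin):
            fout.write("\t".join(clean_row(row)) + "\n")
    return dst


def convert_all(directory="."):
    """Convert every *.vcf file in directory and return the output paths."""
    return [convert_file(path) for path in sorted(Path(directory).glob("*.vcf"))]
```

Calling convert_all() in the directory that holds the .vcf files should leave the originals untouched and write one .txt file per input. Note that the commas inside fields (for example "0,0" in the per-sample columns) are kept; only the separating commas become tabs.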