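I'm running into a regular-expression problem. I have written the code in MATLAB, but I'd like to keep the approach as general as possible.
Info:
LipidData is a 68×2 table with a name column (Lipid subclass name) and a Short column. The Short column holds abbreviation strings such as LPC, PC, AC4PIM2, SHexCer, SQDG and so on. This table does not change, but foundpattern may vary depending on the actual input data it is built from.
foundpattern is an N×4 table; in my example N is 7. The only relevant column here is the first one, ISDs, which contains the strings to be checked (for reproducibility you can simply copy that column as a cell array). Here are both MATLAB tables: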
Input:
>> LipidData
LipidData =
68×2 table
Lipid subclass name Short
___________________________________________________ ___________
{'Diacylated phosphatidylinositol monomannoside' } {'Ac2PIM1' }
{'Diacylated phosphatidylinositol dimannoside' } {'Ac2PIM2' }
{'Triacylated phosphatidylinositol dinomannoside' } {'Ac3PIM2' }
{'Tetraaacylated phosphatidylinositol dimannoside' } {'AC4PIM2' }
{'Anacardic Acid' } {'ACar' }
{'Acetylglucose andrographolide' } {'AcylGlcADG' }
{'Bis[monoacylglycero]phosphates' } {'BMP' }
{'Cholesteryl esters' } {'CE' }
{'Ceramide' } {'Cer' }
{'Ceramide alpha-hydroxy fatty acid-dihydrosphingosine' } {'CerADS' }
{'Ceramide alpha-hydroxy fatty acid-phytospingosine' } {'CerAP' }
{'Ceramide beta-hydroxy fatty acid-sphingosine' } {'CerAS' }
{'Ceramide beta-hydroxy fatty acid-dihydrosphingosine' } {'CerBDS' }
{'Ceramide beta-hydroxy fatty acid-sphingosine' } {'CerBS' }
{'Ceramide Esterified omega-hydroxy fatty acid-dihydrosphingosine'} {'CerEODS' }
{'Ceramide Esterified omega-hydroxy fatty acid-sphingosine' } {'CerEOS' }
{'Ceramide non-hydroxyfatty acid-dihydrosphingosine' } {'CerNDS' }
{'Ceramide non-hydroxyfatty acid-phytospingosine' } {'CerNP' }
{'Ceramide non-hydroxyfatty acid-sphingosine' } {'Cer_NS' }
{'Ceramide phosphate' } {'CerP' }
{'Cholesterol' } {'Cholesterol'}
{'Cardiolipins' } {'CL' }
{'Diacyl/alkylglycerides' } {'DG' }
{'Digalactosyldiacylglycerols' } {'DGDG' }
{'1,2-diacylglyceryl-3-O-4'-(N,N,N-trimethyl)-homoserine' } {'DGTS' }
{'Ether Oxygenated Phosphatidylcholines' } {'EtherOxPC' }
{'Ether Oxygenated Phosphatidylethanolamines' } {'EtherOxPE' }
{'Ether-linked Phosphatidylcoline' } {'EtherPC' }
{'Ether-linked Phosphatidylethanolamine' } {'EtherPE' }
{'Fatty Acids' } {'FA' }
{'Fatty acid ester of hydroxyl fatty acid' } {'FAHFA' }
{'Glucuronosyldiacylglycerol' } {'GlcADG' }
{'GM3 Ganglioside' } {'GM3' }
{'Hidroxy Bis[monoacylglycero]phosphates' } {'HBMP' }
{'Hexosylceramide alpha-hydroxy fatty acid-phytospingosine' } {'HexCerAP' }
{'Hexosylceramide non-hydroxyfatty acid-dihydrosphingosine' } {'HexCerNDS' }
{'Hexosylceramide non-hydroxyfatty acid-sphingosine' } {'HexCer_NS' }
{'Lyso 1,2-diacylglyceryl-3-O-4'-(N,N,N-trimethyl)-homoserine' } {'DGTS' }
{'Lyso Phosphatidic acids' } {'LPA' }
{'Lyso Phosphatidylcholines' } {'LPC' }
{'Lyso Phosphatidylethanolamines' } {'LPE' }
{'Lyso Phosphatidylglycerols' } {'LPG' }
{'Lyso Phosphatidylinositols' } {'LPI' }
{'Lyso Phosphatidylserines' } {'LPS' }
{'Monoacyl/alkylglycerides' } {'MG' }
{'Monogalactosyldiacylglycerols' } {'MGDG' }
{'Oxygenated Cardiolipins' } {'OxCL' }
{'Oxygenated Fatty Acids' } {'OxFA' }
{'Oxygenated Phosphatidic acids' } {'OxPA' }
{'Oxygenated Phosphatidylcholines' } {'OxPC' }
{'Oxygenated Phosphatidylethanolamines' } {'OxPE' }
{'Oxygenated Phosphatidylglycerols' } {'OxPG' }
{'Oxygenated Phosphatidylinositols' } {'OxPI' }
{'Oxygenated Phosphatidylserines' } {'OxPS' }
{'Oxygenated Triacyl/alkylglycerides' } {'OxTG' }
{'Phosphatidic acids' } {'PA' }
{'Phosphatidylbutyl alcohol' } {'PBtOH' }
{'Phosphatidylcholines' } {'PC' }
{'Phosphatidylethanolamines' } {'PE' }
{'Phosphatidyletanol' } {'PEtOH' }
{'Phosphatidylglycerols' } {'PG' }
{'Phosphatidylinositols' } {'PI' }
{'Phosphatidylmethanol' } {'PMeOH' }
{'Phosphatidylserines' } {'PS' }
{'Sulfatides hexosyl ceramide' } {'SHexCer' }
{'Sphingomyelines' } {'SM' }
{'Sulfoquinovosyl diacylglycerols' } {'SQDG' }
{'Triacyl/alkylglycerides' } {'TG' }
>> foundpattern
foundpattern =
7×4 table
ISDs tR Standard desv RSD
__________________________ ______ _____________ _______
{'18:1 (d7) MG' } 1.34 0.020418 1.5238
{'18:1(d7) LPC' } 1.5868 0.0056024 0.35305
{'18:1 (d9) SM' } 6.8999 0.08336 1.2081
{'15:0-18:1(d7) PC' } 7.989 0.072533 0.90791
{'15:0-18:1(d7) DG' } 12.085 0.097445 0.80631
{'15:0-18:1 (d7)-15:0 TG'} 17.487 0.029701 0.16984
{'Cholesterol (d7)' } 18.247 0.032275 0.17687
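To reproduce the problem without the full data, reduced stand-ins for both tables can be rebuilt directly. This is a sketch only: the row subset is a hypothetical selection taken from the listings above, and the first variable of LipidData is named Name here instead of "Lipid subclass name" for simplicity. Code that indexes the columns by position, such as LipidData{j,2} and foundpattern{i,1}, works on these tables the same way.

```matlab
% Hypothetical reduced copy of LipidData (only a few rows of the 68).
names  = {'Ceramide'; 'Ceramide non-hydroxyfatty acid-sphingosine'; 'Cholesterol'; ...
          'Diacyl/alkylglycerides'; 'Lyso Phosphatidylcholines'; ...
          'Monoacyl/alkylglycerides'; 'Phosphatidylcholines'; 'Sphingomyelines'; ...
          'Triacyl/alkylglycerides'};
shorts = {'Cer'; 'Cer_NS'; 'Cholesterol'; 'DG'; 'LPC'; 'MG'; 'PC'; 'SM'; 'TG'};
LipidData = table(names, shorts, 'VariableNames', {'Name', 'Short'});

% Only the ISDs column of foundpattern matters for the matching.
ISDs = {'18:1 (d7) MG'; '18:1(d7) LPC'; '18:1 (d9) SM'; '15:0-18:1(d7) PC'; ...
        '15:0-18:1(d7) DG'; '15:0-18:1 (d7)-15:0 TG'; 'Cholesterol (d7)'};
foundpattern = table(ISDs);   % the table variable takes the name ISDs
```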
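The problem is that comparing the regular expression PC from LipidData with the foundpattern value {'18:1(d7) LPC'} produces a "match", and I don't know how to avoid it. I only need to find exactly the same Short values in foundpattern.ISDs. Another example of the same problem: suppose foundpattern contains a Cer_NS entry; it would match not only its own LipidData value, Cer_NS, but also Cer.
I believe that grouping the values (using a parenthesised regular expression), as you can see in the code, is one solution, but of course some of these codes are only "slightly modified" versions of others (PC and LPC, Cer and Cer_NS), so they produce duplicate matches. I know I'm missing something, but I don't know what.
Is there any way to avoid the duplicate matches? As you can see in the OUTPUT, the Codes cell array should have only 7 entries, not 8.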
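The false match can be reproduced on a single string. regexp returns the start index of every match, or [] when there is none, and the parentheses in "(PC)" only create a capture group; they place no constraint on the characters around the match. One way to require that a code stands alone is a negative lookbehind (?<!\w) plus a negative lookahead (?!\w), both supported by MATLAB's regexp. This is a sketch of that idea, not the only possible fix. Because MATLAB's \w also includes the underscore, Cer no longer matches inside Cer_NS either.

```matlab
% A capture group alone: 'PC' is still found inside 'LPC'.
plain   = regexp('18:1(d7) LPC', '(PC)');                 % 11

% Lookarounds: no word character ([a-zA-Z_0-9]) directly before or after the code.
bounded = regexp('18:1(d7) LPC', '(?<!\w)PC(?!\w)');      % [] (preceded by 'L')
exact   = regexp('15:0-18:1(d7) PC', '(?<!\w)PC(?!\w)');  % 15 (preceded by a space)
cerOnly = regexp('Cer_NS', '(?<!\w)Cer(?!\w)');           % [] (followed by '_')
```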
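Code:
Codes = {};
for j = 1:size(LipidData,1)
    % Wrap the Short code in a capture group, e.g. "(PC)"
    expression = strcat("(", char(LipidData{j,2}), ")");
    for i = 1:size(foundpattern,1)
        % regexp returns the match start indices, or [] when nothing matches
        if ~isempty(regexp(char(foundpattern{i,1}), expression))
            disp(foundpattern{i,1})
            disp(LipidData{j,2})
            % LipidData{j,2} is a 1x1 cell, so every entry of Codes is a cell too
            Codes{end+1} = LipidData{j,2};
        end
    end
end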
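Applied to the whole loop, the lookaround idea could look like the sketch below. It assumes the Short codes and the ISDs strings are available as cell arrays of char vectors (with the real tables, LipidData.Short and foundpattern.ISDs); the two lists here are a hypothetical subset. regexptranslate('escape', ...) makes any regex metacharacter in a code match literally.

```matlab
% Hypothetical subset of LipidData.Short and the seven foundpattern.ISDs strings.
shortCodes = {'Cer'; 'Cer_NS'; 'Cholesterol'; 'DG'; 'LPC'; 'MG'; 'PC'; 'SM'; 'TG'};
isds = {'18:1 (d7) MG'; '18:1(d7) LPC'; '18:1 (d9) SM'; '15:0-18:1(d7) PC'; ...
        '15:0-18:1(d7) DG'; '15:0-18:1 (d7)-15:0 TG'; 'Cholesterol (d7)'};

Codes = {};
for j = 1:numel(shortCodes)
    % Match the code literally, and only when it is not directly preceded or
    % followed by a word character ([a-zA-Z_0-9]).
    expression = ['(?<!\w)', regexptranslate('escape', shortCodes{j}), '(?!\w)'];
    for i = 1:numel(isds)
        % 'once' returns the first start index, or [] if there is no match
        if ~isempty(regexp(isds{i}, expression, 'once'))
            Codes{end+1} = shortCodes{j}; %#ok<AGROW>
        end
    end
end
disp(Codes)
```

This stores plain char vectors in Codes instead of 1×1 cells. A code is still added once per matching ISDs string, so a class shared by two standards would appear twice.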
Output:
>> Codes
Codes =
1×8 cell array
Columns 1 through 6
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
Columns 7 through 8
{1×1 cell} {1×1 cell}
>> for i=1:size(Codes,2)
Codes{i}
end
ans =
1×1 cell array
{'Cholesterol'}
ans =
1×1 cell array
{'DG'}
ans =
1×1 cell array
{'LPC'}
ans =
1×1 cell array
{'MG'}
ans =
1×1 cell array
{'PC'}
ans =
1×1 cell array
{'PC'}
ans =
1×1 cell array
{'SM'}
ans =
1×1 cell array
{'TG'}
>>
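A different route, which avoids building one pattern per code, is to split each ISDs string into word tokens and compare whole tokens with ismember. This sketch assumes that a Short code always appears as its own token, separated from neighbouring text by spaces, parentheses, hyphens or colons (true for all seven example strings), and that the codes consist only of letters, digits and underscores, as in the LipidData listing.

```matlab
% Hypothetical subset of LipidData.Short and the seven foundpattern.ISDs strings.
shortCodes = {'Cer'; 'Cer_NS'; 'Cholesterol'; 'DG'; 'LPC'; 'MG'; 'PC'; 'SM'; 'TG'};
isds = {'18:1 (d7) MG'; '18:1(d7) LPC'; '18:1 (d9) SM'; '15:0-18:1(d7) PC'; ...
        '15:0-18:1(d7) DG'; '15:0-18:1 (d7)-15:0 TG'; 'Cholesterol (d7)'};

% Split every ISDs string into runs of word characters,
% e.g. '18:1(d7) LPC' -> {'18', '1', 'd7', 'LPC'}.
tokens = regexp(isds, '\w+', 'match');   % one cell of tokens per ISDs string
tokens = [tokens{:}];                    % flatten into a single cell row

% Keep a Short code only if it equals one of the tokens exactly.
Codes = shortCodes(ismember(shortCodes, tokens));
```

Because ismember works on sets, each matching code is returned once, in the order of the Short list, no matter how many ISDs strings contain it.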