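Hello, and sorry for the long title! I am working with data that contains long text strings (some observations are up to 2,000 characters). These strings may contain a term (AB/CD), which can appear anywhere in the string. I am trying to detect AB/CD in the text and create a binary variable (ABCD_present) that flags whether the term appears in the text.
Here is some sample data: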
data test;
length status $175;
infile datalines dsd dlm="|" truncover;
input ID Status$;
datalines;
1|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data AB/CD
2|This is example AB/CD text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
3|This is example text I am using instead of real data. I AB/CD am making the length of this text longer to mimic the long text strings of my data
4|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
5|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
6|This is example text I am using instead of real data. I am making the length of this text longer to AB/CD mimic the long text strings of my data
;
run;
Any guidance on this would be much appreciated! I don't have much experience working with long text strings.
Thanks in advance.
You can use the FIND function. Two other functions that detect whether a substring is present are INDEX and PRXMATCH.
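
A minimal sketch of how these three functions might be applied to the sample data set test from the question. The output data set name want and the extra variables ABCD_index and ABCD_prx are hypothetical, added only to show that the approaches agree. Each function returns the position of the first match, or 0 when there is none, so comparing the result with 0 gives a 1/0 flag.

```sas
data want;                      /* "want" is an illustrative name */
  set test;

  /* FIND: position of the first occurrence of the substring, 0 if absent */
  ABCD_present = (find(status, 'AB/CD') > 0);

  /* INDEX: same idea, without FIND's optional modifiers */
  ABCD_index = (index(status, 'AB/CD') > 0);

  /* PRXMATCH: Perl regular expression; the slash inside the
     pattern is escaped because / is also the pattern delimiter */
  ABCD_prx = (prxmatch('/AB\/CD/', status) > 0);
run;

proc print data=want;
  var ID ABCD_present ABCD_index ABCD_prx;
run;
```

With the sample data, IDs 1, 2, 3, and 6 should be flagged with 1 and IDs 4 and 5 with 0. If case should be ignored, FIND accepts an 'i' modifier as its third argument, for example find(status, 'AB/CD', 'i').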