SAS Code Fragments
Matching with a wildcard using Perl regular expressions
Example 1: Suppose we want to extract observations where a character variable, text, begins with "Inc" and ends with "b1", regardless of what is in the middle. We first create a test data set. The wildcard is simply ".+", since "." matches any single character and ".+" matches one or more of any character.
data test;
  length text $16;
  input a text $;
  cards;
1 Inc.F1b1
1 Inc.F2b1
2 Ltd.F4b2
2 Ltd.D5c1
;
run;

data test2;
  retain re;
  if _n_=1 then do;
    re = prxparse('/Inc.+b1/');
  end;
  set test;
  if prxmatch(re, text) then flag=1;
  else flag = 0;
run;

proc print data = test2;
run;
Obs    re    text        a    flag
 1      1    Inc.F1b1    1      1
 2      1    Inc.F2b1    1      1
 3      1    Ltd.F4b2    2      0
 4      1    Ltd.D5c1    2      0
proc print data = test;
  where prxmatch('/Inc.+b1/', text);
run;

Obs    text        a
 1     Inc.F1b1    1
 2     Inc.F2b1    1

proc means data= test;
  var a;
  where prxmatch('/Inc.+b1/', text);
run;
The MEANS Procedure

Analysis Variable : a

N            Mean         Std Dev         Minimum         Maximum
-----------------------------------------------------------------
2       1.0000000               0       1.0000000       1.0000000
-----------------------------------------------------------------
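Note that '/Inc.+b1/' matches the pattern anywhere in the value; by itself it does not require "Inc" to be at the very start or "b1" at the very end. A minimal sketch of a stricter version follows, assuming the test data set from Example 1 already exists. Here "^" anchors the match at the start of the string, and "\s*$" allows for the trailing blanks that SAS pads onto a fixed-length character variable before the end of the string. The data set name test3 is only illustrative.

```sas
/* Sketch: anchor the match at both ends of the value.       */
/* Assumes the data set test from Example 1 already exists.  */
data test3;
  set test;
  /* ^ is the start of the string; \s* then $ allows trailing blanks before the end */
  flag = (prxmatch('/^Inc.+b1\s*$/', text) > 0);
run;

proc print data = test3;
run;
```

On the four test rows above, this should flag the same two observations as the unanchored pattern; the difference would only show up for values such as "XInc.F1b1" or "Inc.F1b1x".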
Example 2: Dealing with a real period ".". Let's look at a slightly different situation. Our data look like this.
data test;
  length text $16;
  input a text $3-12;
  cards;
1 Inc. F1b1
1 Inc F2b1
2 Ltd. F4b2
2 Ltd D5c1
;
run;
We only want to extract the rows that start with "Inc." and end with "b1". Notice that we want "Inc" followed by the period ".", so the only row we want to extract is the first one. If we use the code from Example 1, we will extract both rows 1 and 2, since "." matches any character, not just a real period. In Perl regular expressions, "\." represents the real period ".". So here is how the syntax goes.
data test2;
  retain re;
  if _n_=1 then do;
    /* "\." matches a real period; the ".+" after it is still the wildcard */
    re = prxparse('/Inc\..+b1/');
  end;
  set test;
  if prxmatch(re, text) then flag=1;
  else flag = 0;
run;

proc print data = test2;
run;
Obs    re    text         a    flag
 1      1    Inc. F1b1    1      1
 2      1    Inc F2b1     1      0
 3      1    Ltd. F4b2    2      0
 4      1    Ltd D5c1     2      0
proc print data = test;
  where prxmatch('/Inc\..+b1/', text);
run;

Obs    text         a
 1     Inc. F1b1    1

proc means data= test;
  var a;
  where prxmatch('/Inc\..+b1/', text);
run;
Analysis Variable : a

N            Mean         Std Dev         Minimum         Maximum
-----------------------------------------------------------------
1       1.0000000               .       1.0000000       1.0000000
-----------------------------------------------------------------
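To see the effect of the escape side by side, the following sketch flags each row of the Example 2 data with the unescaped pattern, the escaped pattern, and a character-class form "[.]", which is another common way to write a real period. It assumes the data set test from Example 2 already exists; the data set name compare and the variable names any_char, literal and bracket are made up for this illustration.

```sas
/* Sketch: compare the unescaped and escaped period on the Example 2 data. */
/* Assumes the data set test from Example 2 already exists.                */
data compare;
  set test;
  any_char = (prxmatch('/Inc.+b1/',    text) > 0);  /* "." matches any character       */
  literal  = (prxmatch('/Inc\..+b1/',  text) > 0);  /* "\." matches only a real period */
  bracket  = (prxmatch('/Inc[.].+b1/', text) > 0);  /* "[.]" also means a real period  */
run;

proc print data = compare;
run;
```

With the four rows above, any_char should be 1 for both "Inc" rows, while literal and bracket should be 1 only for the row containing "Inc." with the period.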
For more information, see the SAS documentation on Perl regular expressions.