Proc expand is a very useful procedure for working with time series data in terms of creating time series variables and plotting trends. We are going to show some examples here using data set https://stats.idre.ucla.edu/wp-content/uploads/2016/02/sp500.sas7bdat. In particular, we are going to use variable date and open (price) only. Here are the first ten observations of these two variables
Obs DATE OPEN 1 01/04/2001 1347.56 2 01/05/2001 1333.34 3 01/08/2001 1298.35 4 01/09/2001 1295.86 5 01/10/2001 1300.80 6 01/11/2001 1313.27 7 01/12/2001 1326.82 8 01/16/2001 1318.32 9 01/17/2001 1326.65 10 01/18/2001 1329.89
Example 1. Creating a moving average variable
Let’s say that we need to create a variable of moving average of window size of 5 for the variable open. It can be done easily using the convert statement. Here we specify a new variable name open_ma for the new variable of moving average and in the option transformout we specify that the transformation is moving average (movave) with window size of 5. We also use "out = ma" for the name of the data set that includes the new variable open_ma. If option "out = " is missing, SAS will create the new data set anyway and will name it as data1 or such.
proc sort data = sp500; by date; run; *generating moving average variable; proc expand data = sp500 out = ma; convert open = open_ma / transformout=( movave 5); run; proc print data = ma (obs=10); var date open open_ma; run;
Obs DATE OPEN open_ma 1 01/04/2001 1347.56 1347.56 2 01/05/2001 1333.34 1340.45 3 01/08/2001 1298.35 1326.42 4 01/09/2001 1295.86 1318.78 5 01/10/2001 1300.80 1315.18 6 01/11/2001 1313.27 1308.32 7 01/12/2001 1326.82 1307.02 8 01/16/2001 1318.32 1311.01 9 01/17/2001 1326.65 1317.17 10 01/18/2001 1329.89 1322.99
Example 2. Creating lag and lead variables
There are many transformations that SAS offers and you can view the list and more details following the link. Here is an example of creating a lag variable and a lead variable.
proc expand data = sp500 out=ma1 method=none; convert open = open_lag1 /transformout = (lag 1); convert open = open_lead4 /transformout = (lead 4); run; proc print data =ma1 (obs=10); var date open open_lag1 open_lead4; run; open_ open_ Obs DATE OPEN lag1 lead4 1 01/04/2001 1347.56 . 1300.80 2 01/05/2001 1333.34 1347.56 1313.27 3 01/08/2001 1298.35 1333.34 1326.82 4 01/09/2001 1295.86 1298.35 1318.32 5 01/10/2001 1300.80 1295.86 1326.65 6 01/11/2001 1313.27 1300.80 1329.89 7 01/12/2001 1326.82 1313.27 1347.97 8 01/16/2001 1318.32 1326.82 1342.54 9 01/17/2001 1326.65 1318.32 1342.90 10 01/18/2001 1329.89 1326.65 1360.40 proc print data =ma1 (firstobs=236); var date open open_lag1 open_lead4; run; open_ open_ Obs DATE OPEN lag1 lead4 236 12/14/2001 1119.38 1137.07 1149.56 237 12/17/2001 1123.09 1119.38 1139.93 238 12/18/2001 1134.36 1123.09 1144.89 239 12/19/2001 1142.92 1134.36 1144.65 240 12/20/2001 1149.56 1142.92 1149.37 241 12/21/2001 1139.93 1149.56 1157.13 242 12/24/2001 1144.89 1139.93 1161.02 243 12/26/2001 1144.65 1144.89 . 244 12/27/2001 1149.37 1144.65 . 245 12/28/2001 1157.13 1149.37 . 246 12/31/2001 1161.02 1157.13 .
Example 3. Filling-in the gaps
Many times a time series has gaps between two time points. For example, the first ten observations of our example data set goes from 01/04/2001 to 01/18/2001. Proc expand offers many different methods for filling in the gaps. In this example, we will use the "method=step" option to fill the gaps with most the recent input value.
proc expand data = in.sp500 out=daily to=day method=step; convert open = daily_open; id date; run;
proc print data = daily (obs=20) noobs; run;
daily_ DATE open 04JAN2001 1347.56 05JAN2001 1333.34 06JAN2001 1333.34 07JAN2001 1333.34 08JAN2001 1298.35 09JAN2001 1295.86 10JAN2001 1300.80 11JAN2001 1313.27 12JAN2001 1326.82 13JAN2001 1326.82 14JAN2001 1326.82 15JAN2001 1326.82 16JAN2001 1318.32 17JAN2001 1326.65 18JAN2001 1329.89 19JAN2001 1347.97 20JAN2001 1347.97 21JAN2001 1347.97 22JAN2001 1342.54 23JAN2001 1342.90