When looking at data across consistent units of time (years, quarters, months), there is often interest in creating variables based on how data for a given time period compares to the periods before and after. If you have longitudinal data, you wish to look across units of time within a single subject. When your data is in long form (one observation per time point per subject), this can easily be handled in Stata with standard variable creation steps because of the way in which Stata processes datasets: it stores the entire dataset and can easily refer to any point in the dataset when generating variables. SAS works differently. SAS variables are typically created through a data step in which SAS moves through the dataset, observation by observation, carrying out the calculations for the given observation and accessing only one observation at a time. This system of data storage and access makes it possible for SAS to analyze large datasets but also very difficult to create time series variables in SAS using a data step. However, proc expand provides an easy-to-use alternative to the data step.
Let’s start with an example dataset containing only one subject. The dataset below contains US unemployment rates from September, 2006 to August, 2008.
data unemp; input year month rate @@; date = mdy( month, 1 , year ); format date yymm.; datalines; 2006 09 4.5 2006 10 4.4 2006 11 4.5 2006 12 4.4 2007 01 4.6 2007 02 4.5 2007 03 4.4 2007 04 4.5 2007 05 4.5 2007 06 4.6 2007 07 4.7 2007 08 4.7 2007 09 4.7 2007 10 4.8 2007 11 4.7 2007 12 5 2008 01 4.9 2008 02 4.8 2008 03 5.1 2008 04 5 2008 05 5.5 2008 06 5.5 2008 07 5.7 2008 08 6.1 ; proc print data = unemp (obs = 5); run; Obs year month rate date 1 2006 9 4.5 2006M09 2 2006 10 4.4 2006M10 3 2006 11 4.5 2006M11 4 2006 12 4.4 2006M12 5 2007 1 4.6 2007M01
For each month, we wish to know the difference between its rate and the rate of the previous month (r(i) – r(i-1)), its rate and the rate of the next month (r(i+1) – r(i)), and these two differences ((r(i+1)-r(i))-(r(i)-r(i-1)). To do this, we will use proc expand to generate a new dataset including these variables. In the proc expand line, we will name the new dataset unemp_laglead. We indicate that we do not wish to transform the values (using a spline, for example) but simply to grab the untransformed data from the specified record. We indicate that our time series is defined by date in the id line and in the three convert lines, we create the three values we wish to have for each time point in our data: the rate, the previous rate (rate_lag1), and the next rate (rate_lead1). In each line, we tell SAS the name of the variable in our new dataset, the type of transformation (lag, lead) and the number of time points to look back or ahead for the transformation (1 in this example).
proc expand data=unemp out=unemp_laglead method = none; id date; convert rate = rate_lag1 / transformout=(lag 1); convert rate; convert rate = rate_lead1 / transformout=(lead 1); run;
We can see the resulting dataset.
proc print data = unemp_laglead (obs = 5); run; rate_ rate_ Obs date lag1 rate lead1 year mont 1 2006M09 . 4.5 4.4 2006 9 2 2006M10 4.5 4.4 4.5 2006 10 3 2006M11 4.4 4.5 4.4 2006 11 4 2006M12 4.5 4.4 4.6 2006 12 5 2007M01 4.4 4.6 4.5 2007 1
Based on this dataset, we can now easily calculate the three time series variables we described earlier. But what if we had data for multiple countries? The dataset below contains unemployment data from 2000-2005 for three countries.
data unemp_international; input country $ year rate @@; datalines; US 2000 4 Canada 2000 6.1 UK 2000 5.5 US 2001 4.7 Canada 2001 6.5 UK 2001 5.1 US 2002 5.8 Canada 2002 7 UK 2002 5.2 US 2003 6 Canada 2003 6.9 UK 2003 5 US 2004 5.5 Canada 2004 6.4 UK 2004 4.8 US 2005 5.1 Canada 2005 6 UK 2005 4.9 ; proc print data = unemp_international (obs = 5); run; Obs country year rate 1 US 2000 4.0 2 Canada 2000 6.1 3 UK 2000 5.5 4 US 2001 4.7 5 Canada 2001 6.5
We wish to create lag and lead variables within each country. To do this, we can use proc expand with a by statement after sorting on country.
proc sort data = unemp_international; by country; run; proc expand data=unemp_international out=unemp_int2 method = none; by country; id year; convert rate = rate_lag1 / transformout=(lag 1); convert rate; convert rate = rate_lead1 / transformout=(lead 1); run; proc print data = unemp_int2; run; rate_ rate_ Obs country year lag1 rate lead1 1 Canada 2000 . 6.1 6.5 2 Canada 2001 6.1 6.5 7.0 3 Canada 2002 6.5 7.0 6.9 4 Canada 2003 7.0 6.9 6.4 5 Canada 2004 6.9 6.4 6.0 6 Canada 2005 6.4 6.0 . 7 UK 2000 . 5.5 5.1 8 UK 2001 5.5 5.1 5.2 9 UK 2002 5.1 5.2 5.0 10 UK 2003 5.2 5.0 4.8 11 UK 2004 5.0 4.8 4.9 12 UK 2005 4.8 4.9 . 13 US 2000 . 4.0 4.7 14 US 2001 4.0 4.7 5.8 15 US 2002 4.7 5.8 6.0 16 US 2003 5.8 6.0 5.5 17 US 2004 6.0 5.5 5.1 18 US 2005 5.5 5.1 .
With proc expand, you can also generate moving averages, splines, and
interpolated values. For more details, see the proc expand pages of
the SAS Online
Documentation.