The create command has many functions that are useful for making new variables. Below is a list of these functions.
Function name | Action |
CSUM | Cumulative sum |
DIFF | Difference |
FFT | Fast Fourier transform |
IFFT | Inverse fast Fourier transform |
LAG | Lag |
LEAD | Lead |
MA | Centered moving averages |
PMA | Prior moving averages |
RMED | Running medians |
SDIFF | Seasonal difference |
T4253H | Smoothing |
Let’s use the hsb2 data set and make new variables using some of these functions. We will start by deleting the variables that we will not be using. After making each new variable, we will use the list command to show the first few cases of the original and new variables.
delete variables female ses schtyp prog read write math science.
We will start with the function for cumulative sum.
create v1 = csum(socst).
list socst v1 /cases from 1 to 7.

   socst       v1
   57.00    57.00
   61.00   118.00
   31.00   149.00
   56.00   205.00
   61.00   266.00
   61.00   327.00
   61.00   388.00

Number of cases read:  7    Number of cases listed:  7
The diff function creates a variable containing the differences between successive values of the original variable. The order of differencing must be specified. In this example, we will make two new variables. The first, v2, will be differenced once and the second, v3, will be differenced twice. Each order of differencing leaves one more case at the start of the series missing.
create v2 = diff(socst, 1)
  /v3 = diff(socst, 2).
list socst v2 v3 /cases from 1 to 7.

   socst       v2       v3
   57.00      .        .
   61.00     4.00      .
   31.00   -30.00   -34.00
   56.00    25.00    55.00
   61.00     5.00   -20.00
   61.00      .00    -5.00
   61.00      .00      .00

Number of cases read:  7    Number of cases listed:  7
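To see what "differenced twice" means, one can difference the once-differenced series a second time. The sketch below assumes the hsb2 data are still open; d1 and d2 are hypothetical variable names chosen for illustration and are not part of the original example. If the second-order difference is the difference of the first differences, d2 should match v3 case by case.

```spss
* d1 and d2 are hypothetical names used only for this illustration.
* d1 is the first difference of socst (same definition as v2).
create d1 = diff(socst, 1).
* d2 differences d1 once more, which should reproduce v3.
create d2 = diff(d1, 1).
list socst v3 d1 d2 /cases from 1 to 7.
```

In the listing above, for example, the third value of v3 is -34.00, which is the difference between the first differences -30.00 and 4.00.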
The lag function creates variables with lags of various lengths. The order of the lag must be specified. If multiple variables covering a range of lags are desired, the end points of the range can be specified. In the first example, v4 contains the thrice lagged values of socst. In the second example, three new variables are made. The first, v5, contains the once lagged values of socst; v6 contains the twice lagged values of socst; v7 is the same as v4.
create v4 = lag(socst, 3).
create v5 to v7 = lag(socst, 1, 3).
list socst v4 to v7 /cases from 1 to 7.

   socst       v4       v5       v6       v7
   57.00      .        .        .        .
   61.00      .      57.00      .        .
   31.00      .      61.00    57.00      .
   56.00    57.00    31.00    61.00    57.00
   61.00    61.00    56.00    31.00    61.00
   61.00    31.00    61.00    56.00    31.00
   61.00    56.00    61.00    61.00    56.00

Number of cases read:  7    Number of cases listed:  7
The lead function works just like the lag function, except that it takes values from later cases rather than earlier ones. In this example, we use a lead of 2.
create v8 = lead(socst, 2).
list socst v8 /cases from 1 to 7.

   socst       v8
   57.00    31.00
   61.00    56.00
   31.00    61.00
   56.00    61.00
   61.00    61.00
   61.00    36.00
   61.00    51.00

Number of cases read:  7    Number of cases listed:  7
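The smoothing functions in the list at the top of the page (MA, PMA, RMED and T4253H) follow the same pattern as the functions shown so far. The following is a minimal sketch rather than part of the original example: the span of 3 is an arbitrary choice, and ma3, pma3, rmed3 and smooth are hypothetical variable names. Because socst is not a time series, the results are only meant to illustrate the syntax.

```spss
* All new variable names and the span of 3 are assumptions for illustration.
* Centered moving average over a span of 3 cases.
create ma3 = ma(socst, 3).
* Prior moving average: mean of the 3 preceding cases.
create pma3 = pma(socst, 3).
* Running median over a span of 3 cases.
create rmed3 = rmed(socst, 3).
* T4253H compound smoother; it takes no span argument.
create smooth = t4253h(socst).
list socst ma3 pma3 rmed3 smooth /cases from 1 to 7.
```

Cases near the start or end of the file can be expected to be missing for some of these variables, since a full span of neighboring cases is not available there.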
The create command can be combined with the split file command, so that the functions operate within groups of cases. The data must first be sorted by the grouping variable. In the example below, the lag function is used. As expected, the value of v9 for the first case within each level of the variable race is missing.
sort cases by race.
split file by race.
create v9 = lag(socst, 1).
split file off.
list race socst v9 /cases from 1 to 40.

    race    socst       v9
    1.00    36.00      .
    1.00    61.00    36.00
    1.00    46.00    61.00
    1.00    36.00    46.00
    1.00    51.00    36.00
    1.00    46.00    51.00
    1.00    42.00    46.00
    1.00    46.00    42.00
    1.00    51.00    46.00
    1.00    36.00    51.00
    1.00    31.00    36.00
    1.00    56.00    31.00
    1.00    56.00    56.00
    1.00    48.00    56.00
    1.00    41.00    48.00
    1.00    51.00    41.00
    1.00    66.00    51.00
    1.00    51.00    66.00
    1.00    41.00    51.00
    1.00    51.00    41.00
    1.00    41.00    51.00
    1.00    41.00    41.00
    1.00    61.00    41.00
    1.00    61.00    61.00
    2.00    41.00      .
    2.00    56.00    41.00
    2.00    46.00    56.00
    2.00    41.00    46.00
    2.00    56.00    41.00
    2.00    51.00    56.00
    2.00    56.00    51.00
    2.00    71.00    56.00
    2.00    36.00    71.00
    2.00    51.00    36.00
    2.00    56.00    51.00
    3.00    61.00      .
    3.00    51.00    61.00
    3.00    56.00    51.00
    3.00    56.00    56.00
    3.00    31.00    56.00

Number of cases read:  40    Number of cases listed:  40
When the create command makes a new variable, it also labels that variable. This is very useful if you are making many new variables.
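One way to inspect these automatically generated labels is the display command. This is a minimal sketch that assumes the variables created above are still in the active data set; it lists the label of every variable in the file.

```spss
* Show the variable labels, including those assigned by create.
display labels.
```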