Using SPSS functions for making and recoding variables

1. Introduction

SPSS has a wide variety of functions you can use for creating and recoding variables. We will explore four kinds of functions: mathematical functions, statistical functions, string functions, and random number functions. These functions have the same general syntax:

function_name(argument1, argument2, etc.)

We will illustrate some functions using the following data file that includes name, x, test1, test2, and test3.

DATA LIST FREE /
  name (A14)  x test1 test2 test3.
BEGIN DATA.
"John Smith"      4.2 86.5 84.55  81
"Samuel Adams"    9.0  -99 82.37 -99
"Ben Johnson"    -6.2 82.1 84.81  87
"Chris Adraktas"  9.5 94.2   -99  93
"John Brown"     -999 79.7 79.07  72
END DATA.

LIST.

The output of the LIST command is shown below.

NAME                  X    TEST1    TEST2    TEST3

John Smith         4.20    86.50    84.55    81.00
Samuel Adams       9.00   -99.00    82.37   -99.00
Ben Johnson       -6.20    82.10    84.81    87.00
Chris Adraktas     9.50    94.20   -99.00    93.00
John Brown      -999.00    79.70    79.07    72.00

The variable x uses -999 to indicate missing values, and test1, test2 and test3 use -99 to indicate missing values. Below we tell SPSS about these missing values and list out the data again.

MISSING VALUES x (-999)
  /test1 test2 test3 (-99).

LIST.

The output is shown below. Note that the data really does not look any different after we have defined the missing values. But, as we will see below, SPSS does know to treat these values as missing rather than treating them as though they were -99 and -999.

NAME                  X    TEST1    TEST2    TEST3

John Smith         4.20    86.50    84.55    81.00
Samuel Adams       9.00   -99.00    82.37   -99.00
Ben Johnson       -6.20    82.10    84.81    87.00
Chris Adraktas     9.50    94.20   -99.00    93.00
John Brown      -999.00    79.70    79.07    72.00
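
One quick way to confirm this, offered here as a small sketch rather than as part of the original example, is the MISSING function, which returns 1 when a value is user missing or system missing and 0 otherwise. The variable name xmiss is our own choice.

COMPUTE xmiss = MISSING(x).
LIST name x xmiss.

Only John Brown, whose x is -999, should get a 1 for xmiss.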

2. Math functions

Now let’s try some basic math functions. The trunc function (short for truncate) converts a number to a whole number (integer) by dropping everything after the decimal point; for example, both 6.99 and 6.49 become 6. By contrast, the rnd function (short for round) rounds to the nearest whole number using conventional rounding rules; for example, 6.99 becomes 7, but 6.49 becomes 6.

COMPUTE t1tr = TRUNC(test1).
COMPUTE t2tr = TRUNC(test2).

COMPUTE t1rnd = RND(test1).
COMPUTE t2rnd = RND(test2).

LIST name test1 t1tr t1rnd test2 t2tr t2rnd.

The results below are as we would expect.

NAME              TEST1     T1TR    T1RND    TEST2     T2TR    T2RND

John Smith        86.50    86.00    87.00    84.55    84.00    85.00
Samuel Adams     -99.00      .        .      82.37    82.00    82.00
Ben Johnson       82.10    82.00    82.00    84.81    84.00    85.00
Chris Adraktas    94.20    94.00    94.00   -99.00      .        .
John Brown        79.70    79.00    80.00    79.07    79.00    79.00
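
As used above, rnd rounds to a whole number. If you want to round to one decimal place instead, one common workaround is to multiply by 10, round, and then divide by 10. The sketch below applies this to test2; the variable name t2rnd1 is ours and is not part of the original example.

COMPUTE t2rnd1 = RND(test2 * 10) / 10.
LIST name test2 t2rnd1.

For Ben Johnson, 84.81 times 10 is 848.1, which rounds to 848, so t2rnd1 would be 84.8.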

SPSS has other mathematical functions. Below we illustrate functions for getting the square root (sqrt), natural log (ln), log to the base 10 (lg10) and exponential (exp). Note that the sqrt, ln and lg10 functions do not work with negative numbers (for example you cannot take the square root of a negative number). SPSS will generate missing values in such cases, as we will see below.

COMPUTE xsqrt = SQRT(x).
COMPUTE xln   = LN(x).
COMPUTE xlg10 = LG10(x).
COMPUTE xexp  = EXP(x).
EXECUTE.

LIST x xsqrt xln xlg10 xexp.

The results are shown below. We expected SPSS to generate missing values for xsqrt, xln and xlg10 when x was negative, and we see below that those values are displayed as a single decimal point. This is the way that SPSS shows a system missing value. Also, we see that xsqrt, xln, xlg10 and xexp were all assigned system missing values when x was -999, because we declared -999 to be a missing value for x.

       X    XSQRT      XLN    XLG10     XEXP

    4.20     2.05     1.44      .62    66.69
    9.00     3.00     2.20      .95  8103.08
   -6.20      .        .        .        .00
    9.50     3.08     2.25      .98 13359.73
 -999.00      .        .        .        .

The results also included warnings like the one shown below. The one below is telling us that you cannot take the square root of a negative number and that SPSS is going to set the result to the system missing value.

Warning # 603
The argument to the square root function is less than zero.  The result has
been set to the system-missing value.
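
If you would rather avoid these warnings, one option is to compute the result only for cases where the argument is valid. The sketch below uses the IF command so that the natural log is computed only when x is greater than 0. For all other cases, including those where x is missing, the new variable is left system missing. The name xln2 is made up for this illustration.

IF (x > 0) xln2 = LN(x).
LIST x xln2.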

3. Statistical functions

SPSS also has statistical functions that operate on one or more variables. For example, we might want to compute the average of the three test scores. SPSS has the MEAN function that can do that for you, as shown below.

COMPUTE avg = MEAN(test1, test2, test3).
LIST name test1 test2 test3 avg.

We see the results below. Note that SPSS computed the mean of the non-missing values. For Samuel Adams, that meant that his average was the same as his score on test2, since that was his only non-missing value. We could tell SPSS to give anyone a missing value if they have fewer than 2 valid test scores by using the mean.2 function. Likewise, we could tell SPSS that we want the mean to be missing if any of the three scores is missing by using the mean.3 function. These are illustrated below.

COMPUTE avg2 = MEAN.2(test1, test2, test3).
COMPUTE avg3 = MEAN.3(test1, test2, test3).
LIST name test1 test2 test3 avg avg2 avg3.

As you see below, avg2 is missing for Samuel Adams, and avg3 is also missing for Samuel Adams and Chris Adraktas because they both had some missing test scores.

NAME              TEST1    TEST2    TEST3      AVG     AVG2     AVG3
John Smith        86.50    84.55    81.00    84.02    84.02    84.02
Samuel Adams     -99.00    82.37   -99.00    82.37      .        .
Ben Johnson       82.10    84.81    87.00    84.64    84.64    84.64
Chris Adraktas    94.20   -99.00    93.00    93.60    93.60      .
John Brown        79.70    79.07    72.00    76.92    76.92    76.92

In addition to the mean function, SPSS also has sum, sd, variance, min and max functions.
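
As a brief sketch of how these are used (the variable names total, best and ntests are our own), the commands below compute the sum and the maximum of the three test scores. We also use the NVALID function, which counts how many of the listed variables have valid (non-missing) values.

COMPUTE total  = SUM(test1, test2, test3).
COMPUTE best   = MAX(test1, test2, test3).
COMPUTE ntests = NVALID(test1, test2, test3).
LIST name test1 test2 test3 total best ntests.

Like MEAN, these functions use only the non-missing values, so for Samuel Adams total and best would both be 82.37 and ntests would be 1. The same .n suffix should work here as well; for example, SUM.2 would return a missing value unless at least 2 scores are valid.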

4. String functions

Now let’s illustrate some of the SPSS string functions. Below we create up, which holds the name converted to upper case; lo, which holds the name converted to lower case; and sub, which holds the third through eighth characters of the person’s name. Note that we first had to use the string command to tell SPSS that up, lo and sub are string variables (up and lo with a length of 14 characters, sub with a length of 6). Had we omitted the string command, these would have been treated as numeric variables, and when SPSS tried to assign a character value to the numeric variables, it would have generated an error. We also create len, which is the length of the name variable, and len2, which is the length of the person’s name.

STRING up lo (A14)
  /sub (A6).

COMPUTE up  = UPCASE(name).
COMPUTE lo  = LOWER(name).
* The third argument of SUBSTR is a length, so this takes 6 characters starting at position 3.
COMPUTE sub = SUBSTR(name,3,6).
COMPUTE len = LENGTH(name).
COMPUTE len2 = LENGTH(RTRIM(name)).

LIST name up lo sub len len2.

The results are shown below. The results for up, lo and sub are all as we would expect. The result for len may be a bit confusing. The variable len does not refer to the length of the person’s name; it refers to the defined length of the variable name. When we read the data we entered name (A14), giving the variable a length of 14, and that is why len is always 14. By contrast, len2 uses the rtrim function to strip off any trailing blanks and then takes the length of the result. In the end, len2 returns the length of the person’s name; for example, John Smith has a length of 10.

NAME           UP             LO             SUB         LEN     LEN2
John Smith     JOHN SMITH     john smith     hn Smi    14.00    10.00
Samuel Adams   SAMUEL ADAMS   samuel adams   muel A    14.00    12.00
Ben Johnson    BEN JOHNSON    ben johnson    n John    14.00    11.00
Chris Adraktas CHRIS ADRAKTAS chris adraktas ris Ad    14.00    14.00
John Brown     JOHN BROWN     john brown     hn Bro    14.00    10.00

Let’s use SPSS string functions to get the first name and last name out of the name variable. We start by using the index function to determine the position of the first blank space in the name. We then use the substr function to extract the part of the name before the blank to be the first name, and the part after the blank to be the last name.

STRING fname lname (A10).

COMPUTE blank = INDEX(name,' ').
COMPUTE fname = SUBSTR(name,1,blank-1).
COMPUTE lname = SUBSTR(name,blank+1).

LIST name blank fname lname.

The results below show that this was successful. For example, for John Smith, the substr function extracted the first name by taking the substring from the 1st to 4th character of name, and the last name by taking the 6th character and onward.

NAME              BLANK FNAME      LNAME
John Smith         5.00 John       Smith
Samuel Adams       7.00 Samuel     Adams
Ben Johnson        4.00 Ben        Johnson
Chris Adraktas     6.00 Chris      Adraktas
John Brown         5.00 John       Brown
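
Going the other direction, we could combine the pieces again. The sketch below, which is not part of the original example, uses the CONCAT function to build a "Last, First" version of the name from the fname and lname variables created above. We apply RTRIM to lname so that its trailing blanks do not end up before the comma. The variable name lastfrst and its length of 20 are our own choices.

STRING lastfrst (A20).
COMPUTE lastfrst = CONCAT(RTRIM(lname), ', ', fname).
LIST name lastfrst.

For John Smith, this would give Smith, John.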

5. Random number functions

Random numbers are more useful than you might imagine. They are used extensively in Monte Carlo studies, but they are also frequently used in many other situations. We will look at two of SPSS’s random number functions.

uniform(n) – generates a random number that is 0 or greater, and less than n from a uniform distribution.

rv.binomial(n,p) – generates a value from the binomial distribution with n trials, and with a probability of success equal to p.

Below we generate a random number that is greater than or equal to 0, but less than 1.

COMPUTE rannum = UNIFORM(1).
LIST name rannum.

We see the results below.

NAME             RANNUM

John Smith          .14
Samuel Adams        .43
Ben Johnson         .61
Chris Adraktas      .29
John Brown          .16

Below we generate a random number that is greater than or equal to 0, but less than 10.

COMPUTE ran10 = UNIFORM(10).
LIST NAME ran10.

And the results are shown below.

NAME              RAN10

John Smith         7.00
Samuel Adams       3.46
Ben Johnson        4.46
Chris Adraktas      .52
John Brown         1.03

The example below generates a whole number (integer) from 1 to 100. The trunc function is used to convert the result into a whole number from 0 to 99, and then 1 is added to make it from 1 to 100.

COMPUTE ran100 = TRUNC(UNIFORM(100)) + 1.
LIST name ran100.

As we see below, these values are all whole numbers.

NAME             RAN100

John Smith        15.00
Samuel Adams       5.00
Ben Johnson       63.00
Chris Adraktas    16.00
John Brown        72.00
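
The same pattern works for any upper limit: TRUNC(UNIFORM(k)) + 1 gives a whole number from 1 to k. As a small illustration of our own, the commands below simulate the roll of a six-sided die for each person. The variable name die is made up, and the values you get will depend on the random number seed.

COMPUTE die = TRUNC(UNIFORM(6)) + 1.
LIST name die.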

Below we use the rv.binomial function to simulate a coin flip. It is like a coin flip since the number of trials is 1 and the probability of success is .5 (like flipping a coin once, where the probability of it coming up heads is .5). Let’s treat a 1 as coming up heads, and a 0 as coming up tails. As we see below, Ben Johnson and John Brown each got a head, and the others got tails.

COMPUTE flip = RV.BINOMIAL(1 , .5 ).
LIST name flip.

NAME               FLIP
John Smith          .00
Samuel Adams        .00
Ben Johnson        1.00
Chris Adraktas      .00
John Brown         1.00

Below, we change the number of flips to 10, and count the number of heads each person gets. John Brown got the most heads (7) and Ben Johnson got the fewest (4).

COMPUTE flip10 = RV.BINOMIAL(10 , .5 ).
LIST name flip10.

NAME             FLIP10
John Smith         6.00
Samuel Adams       6.00
Ben Johnson        4.00
Chris Adraktas     5.00
John Brown         7.00

The next example changes the flips to 100. It also sets the seed for the random number generator. The seed determines the sequence of random numbers that will be generated. John Smith got the fewest heads (49 out of 100) and Samuel Adams got the most (58 out of 100).

SET SEED = 149238.
COMPUTE flip100 = RV.BINOMIAL(100 , .5 ).
LIST name flip100 .

NAME            FLIP100
John Smith        49.00
Samuel Adams      58.00
Ben Johnson       52.00
Chris Adraktas    53.00
John Brown        52.00

If we repeat the example from above using the exact same seed, we will get the same results. This is very useful for being able to replicate results of a simulation study or Monte Carlo style study. Indeed, using the same seed did generate the same results (see below).

SET SEED = 149238.
COMPUTE flip100 = RV.BINOMIAL(100 , .5 ).
LIST name flip100 .

NAME            FLIP100
John Smith        49.00
Samuel Adams      58.00
Ben Johnson       52.00
Chris Adraktas    53.00
John Brown        52.00

6. Random number functions, advanced

In the examples above, we used the rv.binomial function to simulate coin flips but it gave us the end result of all of the flips. Perhaps you would like to do a simulation study where you generate each of the flips as a separate observation. SPSS can do this, as we illustrate below.

SET seed=943785.
INPUT PROGRAM.
+ LOOP id = 1 to 25.
+   COMPUTE cointoss = RV.BINOMIAL( 1 , .5 ).
+   END CASE.
+ END LOOP.
+ END FILE.
END INPUT PROGRAM.

LIST CASES.

The program above creates 25 observations, each having a variable called id, which is the trial number, and cointoss, which will be either 1 or 0. Even if this program does not make much sense to you, you could use it as a template to make your own simulation. You can change the number of trials by changing 25 to the number of trials you want. You can change the probability of success by changing the value of .5 to the value you would like. Or, you could choose an entirely different random number generating function; for example, instead of rv.binomial you might choose uniform. The results of the program above are shown below.

     ID COINTOSS

    1.00      .00
    2.00     1.00
    3.00     1.00
    4.00      .00
    5.00      .00
    6.00     1.00
    7.00      .00
    8.00      .00
    9.00      .00
   10.00      .00
   11.00      .00
   12.00     1.00
   13.00      .00
   14.00      .00
   15.00      .00
   16.00     1.00
   17.00      .00
   18.00     1.00
   19.00     1.00
   20.00     1.00
   21.00     1.00
   22.00     1.00
   23.00      .00
   24.00     1.00
   25.00      .00
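
As an illustration of using the program as a template, the sketch below makes two of the changes described above: it creates 10 observations instead of 25, and it draws each value from uniform instead of rv.binomial. The seed value and the variable name u are arbitrary choices of ours.

SET SEED = 12345.
INPUT PROGRAM.
+ LOOP id = 1 TO 10.
+   COMPUTE u = UNIFORM(1).
+   END CASE.
+ END LOOP.
+ END FILE.
END INPUT PROGRAM.

LIST.

Each of the 10 cases should have a value of u that is 0 or greater and less than 1.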

7. Problems to look out for

Watch out for math errors, such as division by zero, the square root of a negative number, and the log of zero or a negative number. As we saw above, SPSS sets the result to the system missing value in such cases.

8. For more information

For more information on functions in SPSS, consult the SPSS Command Syntax Reference Guide.