Statistical Computing Seminars
Beyond Point and Click: SPSS Syntax

This seminar was developed using SPSS version 15. However, the syntax should work with earlier versions of SPSS, although the output might look somewhat different. In places where we are aware of differences between recent versions of SPSS, we make note of them.

Here are links for downloading the data files associated with this seminar.

Introduction

The goal of this seminar is to help you learn about the use of SPSS syntax as an alternative to the point-and-click interface. In many instances, you may find that using syntax is simpler and more convenient than using the point-and-click interface. The use of syntax is also helpful in documenting your analysis. It is difficult to take adequate notes on modifications made to the data and the procedures used to do the analyses when using the point-and-click interface. However, documenting what you are doing in a syntax file is simple, and this makes reviewing and/or reconstructing the analysis much easier.

All SPSS procedures and functions are executed using syntax, whether you use the point-and-click interface or write your own syntax. Almost everything that you can do in SPSS via the point-and-click interface can be accomplished by writing syntax. (There are a few exceptions, most notably when using the graph editor.) Also, there are a handful of commands that are available via syntax that are not available via the point-and-click interface, such as temporary and manova. There are several ways in which you can get SPSS to show you the syntax that it is using to run your analyses, and they are explained below.

Perhaps the simplest way to ease yourself into writing SPSS syntax is to click on the Paste button instead of the OK button after you have set up your analysis using the point-and-click interface. This will paste the syntax that SPSS uses to run your analysis into a syntax file. A syntax file is nothing more than a text file; hence, you can type syntax and comments into it, and you can cut-and-paste in it as you would in any text editor. To run the syntax that you have pasted, you simply highlight it and click on the right-pointing arrow at the top of the syntax window. Your results will be displayed in the output window just the same as if you had used the point-and-click interface.

Recent versions of SPSS display the syntax used to run a command in the output window by default. If you are using an earlier version of SPSS, you can change the general SPSS options so that the syntax is shown. To make this change, from the Data Editor window, click on Edit, then on Options, and then on the Viewer tab. In the lower left-hand corner, check the option that says "Display commands in log", and you will see all of the commands issued from then on in your output window immediately above the corresponding output. The syntax seen in the output can be copied and pasted into the SPSS syntax editor to be saved and/or modified.

Finally, you can change the general SPSS options to save the SPSS journal to a convenient location. The journal is a log of all of the SPSS commands that have been issued, but with no output. To make this change, from the Data Editor window, click on Edit, then Options, and under the General tab, you will see where SPSS is saving the journal file. You can change that location, and you can indicate whether the journal should be overwritten every time you start SPSS, or if your next session should be appended to the bottom of the existing file. You can view the journal file using a text editor such as WordPad. Be aware that the file might be quite long, so NotePad may not be able to open the file. Also, the file can become very difficult to read if it has many months (or years!) worth of analyses in it.

Now that we have seen some easy ways to learn SPSS syntax, let’s open a syntax window and get started. To do this, from the Data Editor window, click on "File", then "New", and then "Syntax". If you want to open an existing syntax file, you would click on "File", then "Open", and then on "Syntax". To avoid possible confusion, please remember that there are three windows in SPSS that we will be using, and these three windows correspond to three file types. First, there is the data editor, which is the window in which you see your data. SPSS data files end with a .sav extension. Second, there is a syntax window, and the extension for a syntax file is .sps. The third window is the output window. If you are using SPSS version 15 or lower, the output file will have a .spo extension. If you are using SPSS version 16, the extension for the output file will be .spv.

As of version 14 of SPSS, you can have multiple data sets open at once. You can write specific commands in your syntax file to indicate which data set should be used for a given analysis. You can also click on an open data set to make it the "active" data set; the syntax that you run will operate on the "active" data set.
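
As a rough sketch of what such commands can look like (assuming version 14 or higher), the dataset name command assigns a name to the data set that is currently open, and the dataset activate command switches between named data sets. The names first and second below are made up for illustration, and the second file is assumed to exist at the path shown.

get file 'd:data.sav'.
dataset name first.
get file 'd:data1.sav'.
dataset name second.
* The names first and second are hypothetical, and any valid names will do.
dataset activate first.

Naming the first data set before opening another file is what keeps it open; an unnamed data set is closed when another file is opened.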

One of the most important things to remember when writing SPSS syntax is that all commands must end in a period (.). (In other words, the end-of-command marker is a period.) This includes comments (notes to yourself), which you can use pretty much anywhere in your syntax file. To start a comment, use either an asterisk (*) or the command comment. If you forget to end your comment with a period, SPSS will consider everything between comment or * and the next period to be part of the comment, and you may be left wondering why some of your commands did not run.
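
To illustrate the pitfall (a small sketch; the variable num1 comes from the data file used later in this seminar), the first comment below is properly terminated, while the second is not, so the list command that follows it is treated as part of the comment and never runs.

* This comment ends with a period, so the command after it runs.
list num1.
* This comment is missing its period
list num1.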

1. Creating numeric variables

Perhaps the first thing that you need to know when using SPSS syntax is how to open a data file. The SPSS command for this is get file followed by the path where the file is located. The path and file name must be enclosed in quotes, and you need to include the file extension, which is .sav for SPSS data files. Note that the period must be outside of the quotes.

get file 'd:data.sav'.

Now that we have our data file read into SPSS, let’s create some new variables. Two commands that you can use to create numeric variables are compute and if. Be aware that there is no "then" in SPSS. SPSS will not create the new variables unless we issue either the execute command or a procedural command (whether or not the procedure involves the newly created variable). In the syntax below, the execute is technically unnecessary because we issue the procedural command list immediately afterward. However, including the execute does not cause any problems, and it is handy to have in case you later change the program and remove the command that executes the compute commands.

The standard format for the compute command is

compute newvar = oldvar.

When using the if command, you can use parentheses around the "if-condition" if they help you understand the command, but parentheses are not necessary.

compute newvar1 = num1.
compute newvar = 0.
if (num1 = 20) newvar = 1.
if (num1 ge 50 or num2 le 15) newvar = 2.
if num1=96 and num2 = 96 newvar = 3.
if num1 ge 90 newvar2 = 1.
execute.
list num1 num2 newvar newvar1 newvar2.

    NUM1     NUM2   NEWVAR  NEWVAR1  NEWVAR2

   20.00    20.00     1.00    20.00      .
   20.00    30.00     1.00    20.00      .
   52.00    36.00     2.00    52.00      .
   63.00    86.00     2.00    63.00      .
   45.00    72.00      .00    45.00      .
   93.00    12.00     2.00    93.00     1.00
   28.00    15.00     2.00    28.00      .
   75.00    46.00     2.00    75.00      .
   96.00    96.00     3.00    96.00     1.00
   34.00    36.00      .00    34.00      .
   73.00    32.00     2.00    73.00      .
   20.00    30.00     1.00    20.00      .
   55.00    13.00     2.00    55.00      .
   91.00    29.00     2.00    91.00     1.00
   78.00    30.00     2.00    78.00      .

Number of cases read:  15    Number of cases listed:  15

You can use all sorts of math and functions when creating your variables. As shown in the following syntax, the execute command can be shortened to exe.

compute newvar3 = num1*num2.
compute newvar4 = num1/6.56.
compute newvar5 = sum(num1,num2).
exe.
list newvar3 newvar4 newvar5.

 NEWVAR3  NEWVAR4  NEWVAR5

  400.00     3.05    40.00
  600.00     3.05    50.00
 1872.00     7.93    88.00
 5418.00     9.60   149.00
 3240.00     6.86   117.00
 1116.00    14.18   105.00
  420.00     4.27    43.00
 3450.00    11.43   121.00
 9216.00    14.63   192.00
 1224.00     5.18    70.00
 2336.00    11.13   105.00
  600.00     3.05    50.00
  715.00     8.38    68.00
 2639.00    13.87   120.00
 2340.00    11.89   108.00

Number of cases read:  15    Number of cases listed:  15

Oftentimes there is more than one way to specify an inequality in SPSS. Below is a little chart with some of them.

Not equal	Less than	Less than or equal to	Greater than	Greater than or equal to
~=	<	<=	>	>=
ne	lt	le	gt	ge
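
For instance, the two if commands below are equivalent ways of writing the same condition, one with the symbol and one with the keyword. This is only a sketch, and the variable name flag is hypothetical.

* flag is a hypothetical new variable.
if (num1 >= 50) flag = 1.
if (num1 ge 50) flag = 1.
exe.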

2. Creating standardized variables

It is easy to create standardized variables in SPSS. You simply use the descriptives command (which can be shortened to desc) and use the /save subcommand. In the first example, we accept the default name that SPSS assigns to the new variable; in the second example, we provide a name for the new variable, num1z. Note that a label was automatically created for the new variables.

descriptives num1
 /save.
desc num1 (num1z) /save.

Many commands in SPSS have subcommands. These are usually options, and all subcommands start with a slash (/). The subcommands can be on the same line as the command, or on a new line, and they can be indented or not. You should write syntax in a way that is easiest for you to read and understand. One of the few things that SPSS does require is that each new command start on a new line. If you don’t start each command on a new line, you will get very "interesting" error messages, none of which indicate what the real problem is.
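
As an illustration of these layout choices, the sketch below writes the same frequencies command twice, once on a single line and once with each subcommand indented on its own line. The particular subcommands were chosen only for illustration.

frequencies variables = num1 /statistics = mean median /format = notable.
frequencies variables = num1
  /statistics = mean median
  /format = notable.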

3. Creating string variables

Up through version 15 of SPSS, there are two types of string variables in SPSS: short strings and long strings. Short string variables have a maximum length of eight characters. Long string variables have a maximum length of 255 characters. Long strings can be displayed by some procedures and the print command, and they can be used as "break" variables. However, long string variables cannot be used in tabulation procedures, and they cannot have user-defined missing values (see below). This means that long string variables cannot have missing values, as user-defined missing is the only kind of missing values a string variable can have. To create either type of string variable, you usually need to use the string command. You can then populate the new string variable using the compute command. This is unlike numeric variables, which can be both created and populated using the compute command.

When creating either type of string variable (short or long), you need to indicate how long the string variable should be. This is done by including the desired length in parentheses after the name of the new string variable. You also need to include the letter "A" before the length to indicate that you want to create an alphanumeric variable (as opposed to a hexadecimal variable, which would use AHEX).

As of version 16, the distinction between short and long strings has been eliminated, the maximum length of a string variable has been increased to 32,767 characters, and string variables of all lengths can have user-defined missing values. In version 16, if you wish to alter the length of a string variable, you can use the alter type command.

string string1 (A4).
string string2 string3 string4 (A10).
compute string1 = "a".
if newvar2 = 1 string2 = "b".
if newvar1 ge 50 and newvar ne 1 string3 = "No".
exe.
list newvar newvar1 string1 to string4.

  NEWVAR  NEWVAR1 STRING1 STRING2    STRING3    STRING4

    1.00    20.00 a
    1.00    20.00 a
    2.00    52.00 a                  No
    2.00    63.00 a                  No
     .00    45.00 a
    2.00    93.00 a       b          No
    2.00    28.00 a
    2.00    75.00 a                  No
    3.00    96.00 a       b          No
     .00    34.00 a
    2.00    73.00 a                  No
    1.00    20.00 a
    2.00    55.00 a                  No
    2.00    91.00 a       b          No
    2.00    78.00 a                  No

Number of cases read:  15    Number of cases listed:  15
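
If you are using version 16 or higher, a minimal sketch of the alter type command mentioned above might look like the following, which changes string1 from a width of 4 to a width of 8. The new width is an arbitrary choice for illustration.

* Widen string1 from A4 to A8, which requires version 16 or higher.
alter type string1 (A8).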

4. Recoding variables

There are several ways that you can recode variables in SPSS. For example, you can use the recode command, the if command or the autorecode command. Remember that when using the if command, there is no "then" in SPSS syntax. You can create complex rules regarding how variables get recoded. You have lots of functions from which to choose, and you can do all sorts of mathematical manipulations.

if num1 = 55 y = 30.
if num1 le 50 and gender = "f" y = 35.
list num1 gender y.

    NUM1 GENDER          Y

   20.00 f           35.00
   20.00 f           35.00
   52.00 f             .
   63.00 m             .
   45.00 m             .
   93.00 f             .
   28.00 m             .
   75.00 f             .
   96.00 m             .
   34.00 f           35.00
   73.00 f             .
   20.00 f           35.00
   55.00             30.00
   91.00 m             .
   78.00 m             .

Number of cases read:  15    Number of cases listed:  15

There are several SPSS keywords that you can use with the recode command, including lowest, lo, hi, highest, thru, sysmis, missing, else and copy. We strongly recommend that you recode your variables into new variables, just in case the recoding does not go as you planned. You can use the into option with the recode command to create the new variable into which you will recode the old variable.

recode num1 (lowest thru 60 = 1) (85 thru highest = sysmis) into y1.
list num1 y1.

    NUM1       Y1

   20.00     1.00
   20.00     1.00
   52.00     1.00
   63.00      .
   45.00     1.00
   93.00      .
   28.00     1.00
   75.00      .
   96.00      .
   34.00     1.00
   73.00      .
   20.00     1.00
   55.00     1.00
   91.00      .
   78.00      .

Number of cases read:  15    Number of cases listed:  15
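
As a further sketch of these keywords, the command below sets missing values of num1 to 0, collapses values of 50 or less to 1, and copies every other value unchanged. The name num1grp is hypothetical, and the cut point is arbitrary.

* num1grp is a hypothetical new variable.
recode num1 (missing = 0) (lo thru 50 = 1) (else = copy) into num1grp.
list num1 num1grp.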

Let’s look at recoding that involves string variables. SPSS is case-sensitive when recoding string variables. Hence, if you use upper-case letters in your recode command and have lower-case letters in your variable, nothing will happen. This includes NOT getting an error message in the output window telling you that no recoding was done.

recode str1 ("a", "b","c" = "D").
string str2a str2b str3a (A6).
recode str2 ("c" = "D") ("a" = ' ') into str2a.
recode str2 ("c" = "D") ("a" = ' ') (else=copy) into str2b.
recode str3 ("c" = "D") ("a" = ' ') (else = 'x') into str3a.
exe.
list str1 str2 str2a str2b str3 str3a.

<output shown in SPSS Output window>
recode str1 ("a", "b","c" = "D").
string str2a str2b str3a (A6).
recode str2 ("c" = "D") ("a" = ' ') into str2a.
recode str2 ("c" = "D") ("a" = ' ') (else=copy) into str2b.

>Warning # 4684 in column 54.  Text: STR2B
>On the RECODE command, the list of variables following the keyword INTO
>includes a string variable which is not of sufficient width to accept the
>longest string value generated by the value specifications.  Long values
>will be truncated to the length of the variables.

So, why did we get this warning, and what does it mean? If we look at str2 (perhaps by going to the Variable View in the Data Editor), we can see that it has a length of 8. However, str2b was created to have a length of only 6. The warning informs us that the last two characters of any value copied from str2 will be cut off. Because the last two characters are blanks in this case, we don’t really care. We don’t get this warning with regard to str2a because only strings of length 1 are assigned to it ("D" when str2 = "c" and a blank when str2 = "a"); all other cases are left blank in str2a.

recode str3 ("c" = "D") ("a" = ' ') (else = 'x') into str3a.
list str1 str2 str2a str2b str3 str3a.

STR1     STR2     STR2A  STR2B  STR3     STR3A

D        d               d      d        x
D        c        D      D      x        x
D        a                      x        x
D        b               b      d        x
f        d               d      d        x
d        d               d      b        x
D        f               f      b        x
D        b               b      c        D
D        a                      d        x
D        x               x               x
D        x               x      a
D                               a
D                               d        x
f        a                      c        D
         b               b      b        x

Number of cases read:  15    Number of cases listed:  15

5. Changing string variables into numeric variables

The main reason to convert a string variable into a numeric variable (often called "destringing") is for use in statistical analyses, as very few analysis procedures will allow a string variable. You can use the convert option of the recode command only if you have numbers and/or missing values in a string variable.

recode gender ("f" = 1) ("m" = 2) into sex.
recode str5 (convert) into str5a.
exe.
list gender sex str5 str5a.

GENDER        SEX STR5        STR5A

f            1.00 1            1.00
f            1.00 5            5.00
f            1.00 4            4.00
m            2.00 6            6.00
m            2.00 3            3.00
f            1.00 2            2.00
m            2.00 9            9.00
f            1.00 8            8.00
m            2.00               .
f            1.00 2            2.00
f            1.00 1            1.00
f            1.00 5            5.00
              .   8            8.00
m            2.00 3            3.00
m            2.00 5            5.00

Number of cases read:  15    Number of cases listed:  15
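
Another possibility, sketched here under the assumption that str5 contains only digits or blanks, is the number function within a compute command. The name str5b is hypothetical, and f8.0 tells SPSS to read the string as a whole number up to eight digits wide. A blank string should come out as system-missing.

* str5b is a hypothetical new numeric variable.
compute str5b = number(str5, f8.0).
exe.
list str5 str5b.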

The autorecode command converts string variables into numeric variables. By default, the lowest value in the string variable is given a value of 1 in the new numeric variable, the next lowest value is given a value of 2, and so on. A null string is considered to be the lowest value; hence, all cases with a value of a null string will receive a value of 1 in the new numeric variable. SPSS also creates value labels for the new numeric variable, associating the numeric values with the string values. Compare variables str5a and str5auto1. Although both of these new variables are the numeric version of the same string variable, str5, there are some important differences between them, such as how the missing value in str5 is handled. If you are using SPSS version 13 or higher, there are some additional subcommands that you can use. For example, the /blank subcommand indicates how missing values should be handled. In our example, we use /blank = missing, so the missing values will be given the highest value in the recoded variable. The /group subcommand indicates that all variables listed should share a single coding scheme, so that the same string value receives the same numeric code in every new variable.

autorecode gender /into sex1.
autorecode str5 /into str5auto1.
autorecode str2 /into str2auto1.
* the two commands below work for SPSS versions 13 and higher.
autorecode str2 str5 /into str2auto2 str5auto2 /group.
autorecode str2 str5 /into str2auto3 str5auto3 /blank = missing.
exe.
list str5 str5a str5auto1 str2 str2auto1 str5auto2 str2auto2 str5auto3 str2auto3.
display dictionary.

                  str5a          str2a
str5        str5a uto1  str2     uto1  str5auto2 str2auto2 str5auto3 str2auto3

1            1.00   2   d          5        2        13        1         4
5            5.00   6   c          4        6        12        5         3
4            4.00   5   a          2        5        10        4         1
6            6.00   7   b          3        7        11        6         2
3            3.00   4   d          5        4        13        3         4
2            2.00   3   d          5        3        13        2         4
9            9.00   9   f          6        9        14        8         5
8            8.00   8   b          3        8        11        7         2
              .     1   a          2        1        10        9         1
2            2.00   3   x          7        3        15        2         6
1            1.00   2   x          7        2        15        1         6
5            5.00   6              1        6         1        5         7
8            8.00   8              1        8         1        7         7
3            3.00   4   a          2        4        10        3         1
5            5.00   6   b          3        6        11        5         2

Number of cases read:  15    Number of cases listed:  15

6. Counting

The count command is useful if you have items from a questionnaire that are on a Likert scale (e.g., 1 to 5). It counts the number of occurrences of a value across a list of variables. In the example below, we create a new variable called total that contains the number of 3s contained in each case for the variables specified.

count total = q1 to q3 (3).
exe.
list q1 to q3 total.

      Q1       Q2       Q3    TOTAL

    3.00     3.00      .       2.00
    2.00     2.00    -9.00      .00
    3.00     1.00     2.00     1.00
    4.00     1.00     2.00      .00
   -8.00     1.00     3.00     1.00
   -8.00     2.00     1.00      .00
    3.00    -9.00     4.00     1.00
    4.00     4.00     2.00      .00
    1.00     1.00     1.00      .00
    2.00    -9.00     3.00     1.00
    3.00     3.00     2.00     2.00
    3.00     1.00     1.00     1.00
   -9.00     4.00     4.00      .00
     .       2.00     4.00      .00
    2.00     3.00     1.00     1.00

Number of cases read:  15    Number of cases listed:  15
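
The value list in parentheses can also hold several values, a range, or the keywords missing and sysmis. As a sketch (the variable names high and nmiss are hypothetical), the commands below count how many of the three items fall in the range 3 through 4, and how many are system-missing or user-defined missing.

* high and nmiss are hypothetical new variables.
count high = q1 to q3 (3 thru 4).
count nmiss = q1 to q3 (missing).
exe.
list q1 to q3 high nmiss.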

7. The keyword "to"

When creating variables, the SPSS keyword to will create variables with consecutive numbering. When using to in syntax to refer to variables that already exist in the data set, SPSS assumes that variables are positionally consecutive (all variables between the first variable listed and the last variable listed in the command will be included). There are some commands in SPSS that will use the keyword to in both a positionally and a numerically consecutive manner, depending on whether existing variables are being modified in some way or whether new variables are being created. Some of these commands include autorecode, recode, aggregate and rename variables.

autorecode v1 to v2 /into w1 to w3.
rename variables (v1 to v2 = b1 to b3).
compute z = mean(q1 to q5).
exe.
list q1 to q5 z.

      Q1       Q2       Q3       Q4       Q5        Z

    3.00     3.00      .        .       2.00     2.67
    2.00     2.00    -9.00      .       1.00    -1.00
    3.00     1.00     2.00      .       3.00     2.25
    4.00     1.00     2.00      .      -9.00     -.50
   -8.00     1.00     3.00      .       2.00     -.50
   -8.00     2.00     1.00      .      -9.00    -3.50
    3.00    -9.00     4.00      .       2.00      .00
    4.00     4.00     2.00      .       3.00     3.25
    1.00     1.00     1.00      .       1.00     1.00
    2.00    -9.00     3.00      .       2.00     -.50
    3.00     3.00     2.00      .       5.00     3.25
    3.00     1.00     1.00      .       3.00     2.00
   -9.00     4.00     4.00      .       2.00      .25
     .       2.00     4.00      .       1.00     2.33
    2.00     3.00     1.00      .       4.00     2.50

Number of cases read:  15    Number of cases listed:  15

8. Dates

Dates are stored as numbers (actually, as floating-point numbers) in SPSS; you can add and subtract them. Dates are stored as the number of seconds from midnight, October 14, 1582 (the beginning of the Gregorian calendar). Therefore, you usually need to do some math in order to calculate the number of days (or months or years) between two dates. (Sometimes it is handy to know that there are 86,400 seconds in a day.) If your date is displayed as stars or if only part of the year is showing in the SPSS Data Editor, you can make the column wider and the dates will display properly.

compute diff = edate - dob.
compute age = diff/(60*60*24*365.25).
compute age1 = xdate.year(diff) - 1582.
compute age2 = xdate.year(edate) - xdate.year(dob).
exe.
list edate dob diff age age1 age2.

     EDATE        DOB     DIFF      AGE     AGE1     AGE2

12/19/2002 01/10/1923 2.52E+09    79.94    80.00    79.00
12/14/2002 03/06/1919 2.64E+09    83.78    84.00    83.00
11/18/2001 09/08/1945 1.77E+09    56.19    56.00    56.00
09/15/2002          .      .        .        .        .
07/03/2003 10/18/1956 1.47E+09    46.70    47.00    47.00
04/05/2002 04/14/1965 1.17E+09    36.97    37.00    37.00
03/26/2003 03/09/1942 1.93E+09    61.05    61.00    61.00
01/19/2002 05/07/1936 2.07E+09    65.70    66.00    66.00
06/06/2002 08/16/1952 1.57E+09    49.80    50.00    50.00
02/06/2003 07/17/1954 1.53E+09    48.56    49.00    49.00
07/09/2002 04/16/1941 1.93E+09    61.23    62.00    61.00
10/18/2002 06/08/1936 2.09E+09    66.36    67.00    66.00
05/20/2002 12/12/1953 1.53E+09    48.44    49.00    49.00
09/08/2003 11/08/1939 2.01E+09    63.83    64.00    64.00
         . 10/23/1961      .        .        .        .

Number of cases read:  15    Number of cases listed:  15
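
For example, because the difference between two dates is in seconds, dividing by 86,400 gives the difference in days. The sketch below shows this alongside the ctime.days function, which performs the same conversion. The variable names days1 and days2 are hypothetical.

* days1 and days2 are hypothetical new variables.
compute days1 = (edate - dob)/86400.
compute days2 = ctime.days(edate - dob).
exe.
list edate dob days1 days2.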

Now suppose that you have a data set that has the date in three different variables (i.e., three columns) and you want to combine them into one variable. You can use the date.dmy or other similar date functions to do this.

compute date = date.dmy(day,month,year).
exe.
list day month year date.

     DAY    MONTH     YEAR     DATE

      23       12     1962 1.20E+10
      25       11     1969 1.22E+10
      12       10     2001 1.32E+10
      19        8     2003 1.33E+10
      10        3     1987 1.28E+10
       2        6     1945 1.14E+10
      16        4     1996 1.30E+10
      13        7     1978 1.25E+10
      11        5     1982 1.26E+10
       3        2     1973 1.23E+10
      31        1     1992 1.29E+10
      29        3     1986 1.27E+10
      25       10     1973 1.23E+10
      30       12     1945 1.15E+10
       7        6     1997 1.31E+10

Number of cases read:  15    Number of cases listed:  15

You can change the appearance of the variable date by expanding the column so that the number is not shown in scientific notation, and you can go to the Variable View window, change the type of variable for date from numeric to date, and select the display option that you like.
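
The display format can also be changed in syntax with the formats command. As a sketch, adate10 is one of several date formats, and it displays the date as mm/dd/yyyy.

formats date (adate10).
list day month year date.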

Now let’s extract the day from the variable date. We already have this information in the variable day, but that will provide a check that we have done this correctly.

compute exday = xdate.mday(date).
exe.
list day exday date.

     DAY    EXDAY     DATE

      23    23.00 1.20E+10
      25    25.00 1.22E+10
      12    12.00 1.32E+10
      19    19.00 1.33E+10
      10    10.00 1.28E+10
       2     2.00 1.14E+10
      16    16.00 1.30E+10
      13    13.00 1.25E+10
      11    11.00 1.26E+10
       3     3.00 1.23E+10
      31    31.00 1.29E+10
      29    29.00 1.27E+10
      25    25.00 1.23E+10
      30    30.00 1.15E+10
       7     7.00 1.31E+10

Number of cases read:  15    Number of cases listed:  15

9. Documenting data

There are many ways to document your data using SPSS. There are also several commands that you can use to view the documentation that you have created, including sysfile info and display. When using the sysfile info command, you must specify the file path.

Please note that up to version 13 of SPSS, the maximum length of a variable label is 255 characters and the maximum length of a value label is 60 characters. In version 14, those limits were greatly increased.

sysfile info 'd:data.sav'.
document I collected these data on January 16, 2003 and
blah blah blah.
display document.
* document drop.
file label SPSS Syntax Seminar data file.
save outfile 'd:data1.sav'.
sysfile info 'd:data1.sav'.
variable labels str1 'answer to question 7'
str2 'answer to question 8'.
display labels.
value labels q1 1 'strongly disagree' 2 'disagree' 3 'agree' 4 'strongly agree'.
value labels q2 to q3 q5 1 'strongly disagree' 2 'disagree' 3 'agree' 
4 'strongly agree'.
freq var = q1 to q5.
save outfile 'd:data2.sav'.
display dictionary.

10. Missing data

There are two different types of missing data in SPSS: system-missing and user-defined missing. System-missing is displayed as a dot (.) in the column of a numerical variable. String variables cannot have system-missing values; even a null string is considered a value. You can define your own missing values (called user-defined missing) for either numeric or short string variables. Missing values are considered the lowest possible value in SPSS. Although displayed differently, both system-missing and user-defined missing values are just missing values to SPSS; they are treated the same way (except in filter variables, see below). Both will be deleted from analyses that call for case-wise deletion. The only "difference" is that they will be displayed in separate categories in crosstabs, frequencies, etc.

missing values q1 to q5 (-9).
exe.
missing values q1 (-8).
exe.
missing values q1 (-9 -8).
missing values str1 ('x').
exe.
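
Each missing values command replaces any earlier definition for the variables it names, which is why the third command above lists both -9 and -8 for q1. You can also declare a range of values as missing. The following is only a sketch, and the cut point of -1 is an arbitrary choice that happens to cover both -9 and -8.

* Treat values of -1 or lower as missing.
missing values q1 to q5 (lo thru -1).
exe.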

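It is important to realize that you can create the same variable in different ways, and that the missing values may be handled differently. In the example below, the + operator returns a missing value if either variable is missing, while the sum function adds up whatever non-missing values are available.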

compute y = q1+q2.
compute y1 = sum(q1, q2).
exe.
list q1 q2 y y1.

      Q1       Q2        Y       Y1

    3.00     3.00     6.00     6.00
    2.00     2.00     4.00     4.00
    3.00     1.00     4.00     4.00
    4.00     1.00     5.00     5.00
   -8.00     1.00      .       1.00
   -8.00     2.00      .       2.00
    3.00    -9.00      .       3.00
    4.00     4.00     8.00     8.00
    1.00     1.00     2.00     2.00
    2.00    -9.00      .       2.00
    3.00     3.00     6.00     6.00
    3.00     1.00     4.00     4.00
   -9.00     4.00      .       4.00
     .       2.00      .       2.00
    2.00     3.00     5.00     5.00

Number of cases read:  15    Number of cases listed:  15
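
If you want the function to behave more like the arithmetic expression, one option is to add a suffix giving the minimum number of valid arguments required. In this sketch, sum.2 returns a result only when both q1 and q2 are non-missing. The name y2 is hypothetical.

* y2 is a hypothetical new variable.
compute y2 = sum.2(q1, q2).
exe.
list q1 q2 y y1 y2.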

11. Creating and using filters (subsetting data)

You can create variables to use as filter variables and keep them in your data set. In constructing a variable to use as a filter variable, we suggest that you create a 0/1 (dummy) variable, where the cases with the 0s will be filtered out. It is important to note that SPSS does not treat system-missing and user-defined missing values the same way when applying the filter: cases with system-missing values will be filtered out, but cases with user-defined missing values will not. In other words, SPSS only looks for two specific values to be filtered out of your data: 0 and system-missing. You can use the filter by command to begin filtering your data. You can use either the filter off command or the use all command to end the filtering of your data. The select if command, in contrast, permanently deletes cases from the data set in the Data Editor. Using select if is the same as using the filter in the point-and-click interface with the "delete" radio button selected.

filter by fltr.
desc num1 num2.

filter off.
* use all.
desc num1 num2.
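
If your data set does not already contain a suitable filter variable, one way to create it is to compute a logical expression, which evaluates to 1 when true and 0 when false. This is only a sketch, and the variable name fem is hypothetical.

* fem is a hypothetical filter variable coded 1 for females and 0 otherwise.
compute fem = (gender = "f").
filter by fem.
desc num1 num2.
filter off.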

One command that can be used only via syntax is temporary. In the syntax below, we will use the temporary command so that our observations are not permanently deleted from our data file when we use the select if command. The temporary command stays in effect only until the next executable command is executed. That is why the output for the first list command (which is the first executable command after temporary) has only seven observations (the seven that met the criteria listed on the select if command), while the second list command includes all of the observations from our data set. Although for this seminar we only use the temporary command while subsetting, it has many other uses.

temporary.
select if (gender = "f" and q1 ge 2).
list num1.
list num1.

    NUM1

   20.00
   20.00
   52.00
   75.00
   34.00
   73.00
   20.00

Number of cases read:  7    Number of cases listed:  7

    NUM1

   20.00
   20.00
   52.00
   63.00
   45.00
   93.00
   28.00
   75.00
   96.00
   34.00
   73.00
   20.00
   55.00
   91.00
   78.00

Number of cases read:  15    Number of cases listed:  15
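
As one example of another use, temporary can precede a transformation so that the change applies only to the next procedure. The sketch below assumes the same data set is still open; the variable num1k is hypothetical and is not part of the seminar data.

```spss
temporary.
* num1k exists only for the next procedure.
compute num1k = num1 * 1000.
desc num1 num1k.
* num1k has been discarded by this point.
desc num1.
```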

Another command that you can use to subset your data is split file. You will first need to sort your data by the variable that will be used in the split file command. The split file command will remain in effect until you use the split file off command to turn it off.

sort cases by gender.
split file by gender.
desc num1 num2.

In this data set there are actually three values of gender: missing (a null string), "f" and "m". Notice also that you do not get the total for all cases.

split file off.
desc num1 num2.

12. Collapsing across observations

The aggregate command creates a new data set that is aggregated (or collapsed) by a variable or variables. It also creates one or more new variables, each of which is computed by applying an aggregate function to an original variable within each group. There are about a dozen functions that can be used to create these new variables. Because the new data file either replaces the one in the Data Editor (in older versions of SPSS) or opens in a new Data Editor window, we strongly suggest that you save your current data file before running this command. The aggregate command ignores all split file commands.
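
Saving the current data file first can itself be done with syntax. This is a minimal sketch; the file name d:data_backup.sav is hypothetical and should be replaced with a location of your own.

```spss
* d:data_backup.sav is a hypothetical file name.
save outfile = 'd:data_backup.sav'.
```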

get file 'd:data.sav'.
aggregate outfile 'd:new.sav'
 /break gender
 /aveq1 = mean(q1).
get file 'd:new.sav'.
list.

GENDER      AVEQ1

            -9.00
f            1.50
m             .40

Number of cases read:  3    Number of cases listed:  3

get file 'd:data.sav'.
aggregate outfile 'd:new1.sav'
 /break gender
 /aveq1 = mean(q1)
 /sumq1 = sum(q1)
 /miss3 = numiss(q3)
 /pin5 = pin(q5, 2, 4).
get file 'd:new1.sav'.
list.

GENDER      AVEQ1    SUMQ1   MISS3  PIN5

            -9.00    -9.00       0 100.0
f            1.50    12.00       1  62.5
m             .40     2.00       0  50.0

Number of cases read:  3    Number of cases listed:  3

13. Reshaping data

The varstocases command can be used to reshape data from the wide to the long format. Note that reshaping data (either from long to wide or from wide to long) involves creating a new data set that will replace the data set currently open in the SPSS Data Editor. Therefore, it is VERY important that you save a copy of your original data set before reshaping it.

get file 'd:data.sav'.
list q1 to q3 
 /cases from 1 to 10.

      Q1       Q2       Q3

    3.00     3.00      .
    2.00     2.00    -9.00
    3.00     1.00     2.00
    4.00     1.00     2.00
   -8.00     1.00     3.00
   -8.00     2.00     1.00
    3.00    -9.00     4.00
    4.00     4.00     2.00
    1.00     1.00     1.00
    2.00    -9.00     3.00

Number of cases read:  10    Number of cases listed:  10

In the varstocases command below, the /index subcommand creates a variable that tells you which variable the data point came from (in this case, q1, q2 or q3). The /id subcommand creates a variable that tells you which row in the original data set the data point came from. The /drop subcommand is optional and is used only to get rid of unwanted variables in the new data set.

varstocases
 /make q from q1 to q3
 /index = number
 /id = new_id
 /drop num1 to year.
list.

      ID NUMBER        Q

       1     1      3.00
       1     2      3.00
       2     1      2.00
       2     2      2.00
       2     3     -9.00
       3     1      3.00
       3     2      1.00
       3     3      2.00
       4     1      4.00
       4     2      1.00
       4     3      2.00
       5     1     -8.00
       5     2      1.00
       5     3      3.00
       6     1     -8.00
       6     2      2.00
       6     3      1.00
       7     1      3.00
       7     2     -9.00
       7     3      4.00
       8     1      4.00
       8     2      4.00
       8     3      2.00
       9     1      1.00
       9     2      1.00
       9     3      1.00
      10     1      2.00
      10     2     -9.00
      10     3      3.00
      11     1      3.00
      11     2      3.00
      11     3      2.00
      12     1      3.00
      12     2      1.00
      12     3      1.00
      13     1     -9.00
      13     2      4.00
      13     3      4.00
      14     2      2.00
      14     3      4.00
      15     1      2.00
      15     2      3.00
      15     3      1.00

Number of cases read:  43    Number of cases listed:  43
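
Notice that the listing above has 43 rows rather than 45: by default, varstocases does not write a row when the value being restructured is system-missing (here, q3 for the first case and q1 for the fourteenth). If you want to keep those rows, the /null subcommand can be added, as in the sketch below. It assumes that the original wide data file is read in again first, because the reshaped data have replaced it in the Data Editor.

```spss
get file 'd:data.sav'.
varstocases
 /make q from q1 to q3
 /index = number
 /id = new_id
 /null = keep
 /drop num1 to year.
list.
```

With /null = keep, each original case should contribute one row per variable listed on /make, with system-missing values shown as periods.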

The casestovars command can be used to reshape data from the long to the wide format. Note that there is very useful information in the output and that there are labels for the variables.

* long-1.sav can be downloaded from https://stats.idre.ucla.edu/wp-content/uploads/2016/02/long-1.sav.
get file 'd:long-1.sav'.
list.

   TRIAL     OUT1     OUT2 IVAR

    1.00    26.00     1.00 a
    1.00    32.00     4.00 b
    1.00    31.00     5.00 c
    2.00    32.00     2.00 a
    2.00    36.00     9.00 b
    2.00    33.00     4.00 c
    3.00    35.00     3.00 a
    3.00    38.00     2.00 b
    3.00    35.00     5.00 c
    4.00     6.00     5.00 a
    4.00     2.00     3.00 b
    4.00     5.00     4.00 c
    5.00     5.00     6.00 a
    5.00     5.00     1.00 b
    5.00     3.00     7.00 c

Number of cases read:  15    Number of cases listed:  15

sort cases by trial.
casestovars
 /id = trial
 /index = ivar
 /drop out2.
list.

   TRIAL        A        B        C

    1.00    26.00    32.00    31.00
    2.00    32.00    36.00    33.00
    3.00    35.00    38.00    35.00
    4.00     6.00     2.00     5.00
    5.00     5.00     5.00     3.00

Number of cases read:  5    Number of cases listed:  5

14. The keywords "by" and "with"

In most of the analysis commands in SPSS, the keyword by indicates that a categorical variable or variables will follow, while the keyword with indicates that a continuous variable or variables will follow. If you have a dichotomous variable (i.e., a 0/1 variable), you can list it either after with or by. Note that SPSS might change the coding of the 0/1 variable if you put it after by, so be sure to check the output carefully.

get file 'd:data.sav'.

unianova num1 by gender.

unianova num1 by gender with q1.

logistic regression binary with num1 by gender
 /categorical gender.

regress 
 /dependent num1 
 /method = enter num2 binary.

15. Pasting syntax

GET FILE='D:data.sav'.

* using the point-and-click interface.
* analyze - descriptives - explore.

EXAMINE
  VARIABLES=num1 BY gender
  /PLOT BOXPLOT STEMLEAF
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

examine num1 by gender.

Notice that we get exactly the same output using both of the examine commands above. As you can see, when you paste the syntax, SPSS includes many of the default options, and these can clutter the syntax. It is a good idea to play around with syntax that you have pasted to see what subcommands can be eliminated without changing the output. In the example above, all of the subcommands can be eliminated.

Also, SPSS capitalizes its commands and keywords to help people distinguish between these and variable names, etc. However, SPSS syntax is not case-sensitive. This applies both to commands (and keywords) and to variable names. For example, if you have a variable in your data set called New_Var, you can type new_var, NEW_VAR, nEW_vAR, or any other combination of upper and lower case letters, so long as you have spelled the variable name correctly.

16. The SPSS syntax guide

You can access the SPSS syntax guide by clicking on "Help" and then "Command Syntax Reference" from any of the SPSS windows (the Data Editor, Syntax or Output windows).

17. System variables

SPSS sometimes uses internal variables that you never see in the Data Editor. You can call on some of these internal variables, which SPSS calls "system variables," to make certain tasks easier. All system variables begin with a $. For example, SPSS keeps information about case numbers (which are the numbers that you see along the left side of the Data Editor in the gray bar) in a system variable called $casenum. You can use this variable if you want to create an id variable that is part of your data set.

compute id = $casenum.
exe.

Another handy system variable is $sysmis, which can be used when you want to specify that a newly created variable (or some of its values) should be set to system missing.

compute miss = $sysmis.
compute miss1 = 1.
if missing(q1) or missing(q3) miss1 = $sysmis.
exe.
list miss q1 q3 miss1.

    MISS       Q1       Q3    MISS1

     .       3.00      .        .
     .       2.00    -9.00     1.00
     .       3.00     2.00     1.00
     .       4.00     2.00     1.00
     .      -8.00     3.00     1.00
     .      -8.00     1.00     1.00
     .       3.00     4.00     1.00
     .       4.00     2.00     1.00
     .       1.00     1.00     1.00
     .       2.00     3.00     1.00
     .       3.00     2.00     1.00
     .       3.00     1.00     1.00
     .      -9.00     4.00     1.00
     .        .       4.00      .
     .       2.00     1.00     1.00

Number of cases read:  15    Number of cases listed:  15

When working with dates, a potentially useful system variable is $jdate. This variable gives the current date as the number of days from October 14, 1582.

compute today = $jdate.
exe.
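
Because $jdate is stored as a count of days, it appears in the Data Editor as a plain number. If you want a value that displays as a date and time, one option is the related system variable $time, which holds the current date and time as the number of seconds from October 14, 1582, and can be given a date format. This is only a sketch; the variable name now is hypothetical.

```spss
* now is a hypothetical variable name.
compute now = $time.
formats now (datetime20).
exe.
list now
 /cases from 1 to 1.
```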

18. For more information

We have many Learning Modules and Frequently Asked Questions that will provide additional information:

SPSS Learning Modules

Creating and recoding variables

Using SPSS functions for making and recoding variables

Subsetting data

How can I analyze a subset of my data?

Labeling and documenting data

Using dates in SPSS

SORT and SPLIT BY

Reshaping data from wide format to long format in versions 11.0 and up

Reshaping data from long format to wide format in versions 11.0 and up

How can I see the number of missing values and patterns of missing values in my data file?

How do I count the number of missing values for a character variable?

How can I get a count of how many cases are missing in a string variable?

How can I easily convert a string variable into a categorical numeric variable?

How do I create and modify string (character) variables?

How can I change a string variable into a numeric variable?

What does the keyword "to" indicate in SPSS?

Updating SPSS for Windows

We also have some great books that UCLA researchers can check out from our Stat Books for Loan; please see https://stats.idre.ucla.edu/stat/books/#SPSS.