1.0 SPSS commands used in this unit
| Command | Description |
|---|---|
| sysfile info | displays information about the specified data file |
| codebook | displays information about the active data file |
| save outfile | saves the data file |
| display | displays attributes of the data set |
| variable labels | labels a variable |
| value labels | adds labels to values of a variable |
| autorecode | recodes variables and automatically adds value labels |
| rename variables | renames variables |
| recode | recodes variables |
| document | adds a document to the data set |
| compute | creates new numeric variables |
| summarize | calculates descriptive statistics |
| aggregate | creates new variables with aggregated data |
2.0 Demonstration and explanation
Let’s begin by opening the data file.
* open the data file.
get file "c:\spss_data\hs0.sav".
It is often useful to see information about the data file, such as the number of cases and variables and any variable or value labels. You can use either the sysfile info command or the codebook command, which was introduced in SPSS version 17.
* using sysfile info to view the properties of the data set.
sysfile info "c:\spss_data\hs0.sav".
* Because we have not listed any variables after the command, SPSS will show us the
* codebook for all of the variables.
codebook.
2.1 Reordering variables
Reordering variables in the data file is helpful both for organizational reasons and for minimizing the amount of scrolling you need to do to see the variables you are working with. We will reorder the variables by saving the file with the keep subcommand. The variables named on keep are written first, in the order listed, and the keyword all appends the remaining variables in their current order.
* ordering the variables in a way that
* makes sense.
save outfile = "c:\spss_data\hs01.sav"
  /keep id gender all.
get file "c:\spss_data\hs01.sav".
display variables.
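The ordering rule for /keep id gender all can be sketched in Python as a list operation. In this minimal sketch, the starting variable order is made up for illustration, and the function name keep_order is hypothetical.

```python
def keep_order(current, named):
    """Return the new variable order: `named` first, then the rest unchanged.

    This mirrors /keep <named...> all: the listed variables come first in the
    order given, and ALL appends every other variable in its current order.
    """
    unknown = [v for v in named if v not in current]
    if unknown:
        # Treat a name that is not in the file as an error.
        raise ValueError(f"unknown variables: {unknown}")
    rest = [v for v in current if v not in named]
    return list(named) + rest

# Hypothetical starting order, for illustration only.
current = ["race", "ses", "id", "schtyp", "gender", "read"]
print(keep_order(current, ["id", "gender"]))
```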
2.2 Adding variable and value labels
Adding variable and value labels is a very useful data management practice, and we encourage you to take the time to do this when you enter a data set or receive a data file.
* adding variable and value labels to schtyp.
variable labels schtyp "the type of school the student attended.".
value labels schtyp 1 "public" 2 "private".
display dictionary /var = schtyp.
list schtyp /cases from 1 to 10.
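Value labels work like a lookup table from the stored codes to display text, and the stored values themselves do not change. The minimal Python sketch below follows that reading. The codes are made up and include an unlabeled 3, which is shown as the raw value. The name display_value is hypothetical.

```python
# Value labels as a code -> text lookup; the data keep their numeric codes.
schtyp_labels = {1: "public", 2: "private"}

def display_value(code, labels):
    """Show the label for a code, falling back to the raw code if unlabeled."""
    return labels.get(code, str(code))

schtyp = [1, 2, 1, 1, 3]  # hypothetical codes; 3 has no label
shown = [display_value(c, schtyp_labels) for c in schtyp]
print(shown)
```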
2.3 Changing a string variable to a numeric variable
If we click on the “Variable View” tab, we can see that the variable prgtype is a string variable. This can cause difficulty in analyses because many SPSS procedures accept only numeric variables, so let’s create a numeric version of this variable.
* changing prgtype from a string
* to a numeric variable (called prog).
autorecode variables = prgtype
  /into prog
  /print.
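By default, autorecode assigns consecutive integers starting at 1 to the distinct values in ascending order, which is alphabetical for strings. It keeps the original values as value labels for the new variable. The Python sketch below illustrates that mapping under two assumptions: the sample strings are made up, and they are all lowercase. Python's sort order for mixed-case text may differ from SPSS's.

```python
def autorecode(values):
    """Map each distinct string to 1, 2, 3, ... in ascending (sorted) order.

    Returns the recoded list plus a {code: original string} dict that plays
    the role of the value labels autorecode attaches to the new variable.
    """
    distinct = sorted(set(values))
    code_of = {v: i for i, v in enumerate(distinct, start=1)}
    labels = {i: v for v, i in code_of.items()}
    return [code_of[v] for v in values], labels

# Hypothetical program types, for illustration only.
prgtype = ["general", "academic", "vocational", "academic"]
prog, labels = autorecode(prgtype)
print(prog)
print(labels)
```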
Next, let’s add a variable label to the variable that we just created.
* adding the variable label.
variable labels prog "The type of program in which the student was enrolled.".
2.4 Renaming variables
Renaming variables is easy. Here we rename the variable gender to female and then add a variable label and value labels.
* renaming the variable gender to female and adding
* a variable label and value labels.
rename variables (gender = female).
variable labels female "The gender of the student.".
value labels female 1 "female" 0 "male".
display dictionary /var = female.
list female /cases from 1 to 10.
2.5 Recoding values
Suppose that we would like to recode some values of a variable. For example, we might want to change the values of race that are coded 5 to missing. If you like, you can run the frequencies command before and after the recoding to see the change. It is also a good idea to record changes like this in the data file itself, and the document command adds such notes to the data set.
* recoding race = 5 to missing.
frequencies var = race.
recode race (5 = sysmis).
frequencies var = race.
* adding notes to the data set and viewing the notes.
document The variable gender was renamed to female.
document Values of race coded as 5 were recoded to be missing.
display document.
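The before-and-after frequency check can be sketched in Python, with None standing in for SPSS's system-missing value. The race codes here are made up, and the helper name recode_to_missing is hypothetical.

```python
from collections import Counter

def recode_to_missing(values, codes):
    """Replace any value in `codes` with None (standing in for SYSMIS)."""
    return [None if v in codes else v for v in values]

race = [4, 1, 5, 4, 2, 5, 3, 4]  # hypothetical codes

before = Counter(race)              # like frequencies before the recode
race = recode_to_missing(race, {5})
after = Counter(race)               # like frequencies after the recode
print(before[5], after[5], after[None])
```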
2.6 Creating a new variable
There are many ways that you can create a new variable. One way is to use a numeric expression. For example, let’s create a variable called total that will be the sum of the reading, writing and math scores.
* creating a variable that is a total
* of some of the test scores.
compute total = read + write + math.
summarize var = total.
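When compute adds variables with +, a missing value in any of them makes the result system-missing for that case. The Python sketch below shows that rule, using None for missing and made-up scores. The name add_scores is hypothetical.

```python
def add_scores(*scores):
    """Mimic `read + write + math`: any missing operand makes the sum missing."""
    if any(s is None for s in scores):
        return None
    return sum(scores)

print(add_scores(57, 52, 41))    # 150
print(add_scores(57, None, 41))  # None
```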
It might make more sense for the total to include the social studies score instead of the math score, so let’s change that.
* creating a variable that is a total
* of the reading, writing and social
* studies test scores.
compute total = read + write + socst.
variable labels total "the total of the reading, writing and social studies scores.".
Now let’s summarize the variable that we have just created.
* summarizing the new total variable and
* viewing its dictionary information.
summarize var = total.
display dictionary /var = total.
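Next, we recode total into a new variable, grade. Adjacent ranges share their endpoints (80, 110, 140 and 170), so it matters that recode applies only the first specification that matches a value. A total of exactly 80 therefore becomes 0, a total of exactly 110 becomes 1, and so on.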
* assigning some letter grades to these test scores.
recode total (0 thru 80=0) (80 thru 110=1) (110 thru 140=2)
  (140 thru 170=3) (170 thru 300=4) into grade.
execute.
value labels grade 0 "f" 1 "d" 2 "c" 3 "b" 4 "a".
variable labels grade "these are the combined grades of reading, writing and social studies scores.".
display dictionary /var = grade.
list read write socst grade /cases from 1 to 10.
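The first-match rule of recode can be sketched in Python as follows. With recode ... into, a value that matches no specification (including a missing total) leaves the new variable system-missing, shown here as None. The names RANGES and to_grade are hypothetical.

```python
# (low, high, grade) in the same order as the recode specifications.
RANGES = [(0, 80, 0), (80, 110, 1), (110, 140, 2), (140, 170, 3), (170, 300, 4)]

def to_grade(total):
    """Return the grade from the first range containing `total`, else None."""
    if total is None:
        return None
    for low, high, grade in RANGES:
        if low <= total <= high:
            return grade  # first match wins, so shared endpoints go lower
    return None  # matched no specification

print([to_grade(t) for t in (80, 110, 150, 300)])
```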
Let’s label the data set itself so that we will remember what the data are. We can also add some notes to the data set.
file label "High School and Beyond".
document The variable gender was renamed to female; The values of race coded as 5 were recoded to be missing.
display document.
Finally, let’s make z-scores of some of our variables. There are at least two ways that you could do this. If you remember the formula for a z-score, z = (x - mean) / standard deviation, and you know the mean and standard deviation of the variable, you can use the compute command (Transform -> Compute in the menus) as we did before. Another way to create the z-scores is shown below.
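The formula route from the previous paragraph can be sketched in Python. It standardizes with the sample mean and the sample (n - 1) standard deviation, which is the standard deviation descriptives reports. The scores are made up, and the name z_scores is hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize with the sample mean and sample (n - 1) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

read = [40, 50, 60]  # hypothetical scores: mean 50, sample sd 10
print(z_scores(read))  # [-1.0, 0.0, 1.0]
```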
* descriptives with the /save subcommand saves the
* standardized (z) scores of read as a new variable, zread.
descriptives var = read /save.
summarize var = zread.
list read zread /cases from 1 to 10.

2.7 Using functions
SPSS has many functions that you can use to create new variables. First, we will use the aggregate command with its mean function to create a new variable, rmean, that contains, for each case, the mean of read among all students at the same level of ses.
aggregate /break = ses /rmean = mean(read).
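Here aggregate computes one mean per level of ses and adds it to every case in that level. Its mean function skips missing values. The Python sketch below shows that logic, using made-up rows stored as dicts and None for missing. The name add_group_mean is hypothetical.

```python
from collections import defaultdict

def add_group_mean(rows, by, var, new):
    """Attach the mean of `var` within each level of `by` to every row.

    Missing values (None) are left out of the mean; a group with no valid
    values gets None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row[var] is not None:
            sums[row[by]] += row[var]
            counts[row[by]] += 1
    for row in rows:
        k = row[by]
        row[new] = sums[k] / counts[k] if counts[k] else None
    return rows

# Hypothetical cases, for illustration only.
rows = [
    {"ses": 1, "read": 40},
    {"ses": 1, "read": 50},
    {"ses": 2, "read": 60},
    {"ses": 2, "read": None},
]
add_group_mean(rows, "ses", "read", "rmean")
print([r["rmean"] for r in rows])
```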
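Next, we will use the mean function to create a new variable that contains the mean of several variables for each case. Note that observation 9 will have a value for row_mean even though it has a missing value for science, because the mean function averages whichever of its arguments are not missing.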
compute row_mean = mean(read, write, math, science).
execute.
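Unlike adding variables with +, the mean function averages only the nonmissing arguments. It returns missing only if all of them are missing. The Python sketch below illustrates that behavior with made-up scores. The name row_mean simply mirrors the new variable.

```python
def row_mean(*values):
    """Mean of the non-missing arguments; None if every argument is missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None

# A case like observation 9: science is missing, but a mean is still produced.
print(row_mean(50, 60, 40, None))  # 50.0
```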
Before we leave this unit, let’s save the data set.
save outfile "c:\spss_data\hs1.sav".
3.0 For more information
- SPSS Programming and Data Management, Fourth Edition
  - Chapter 6
- SPSS Learning Modules
- SPSS Frequently Asked Questions
  - How can SPSS help me document my data?
  - How can I change a string variable into a numeric variable?
  - What kinds of new variables can I make with the create command?
  - What are some of the differences between the compute, create and shift values commands?