1. Introduction
This module will examine the use of the sort cases and split file commands in SPSS. We will illustrate this with the data file shown below.
DATA LIST / make 1-7 (A) mpg 9-10 rep78 12 weight 14-17 foreign 19 .
BEGIN DATA.
AMC     22 3 2930 0
AMC     17 3 3350 0
AMC     22   2640 0
Audi    17 5 2830 1
Audi    23 3 2070 1
BMW     25 4 2650 1
Buick   20 3 3250 0
Buick   15 4 4080 0
Buick   18 3 3670 0
Buick   26   2230 0
Buick   20 3 3280 0
Buick   16 3 3880 0
Buick   19 3 3400 0
Cad.    14 3 4330 0
Cad.    14 2 3900 0
Cad.    21 3 4290 0
Chev.   29 3 2110 0
Chev.   16 4 3690 0
Chev.   22 3 3180 0
Chev.   22 2 3220 0
Chev.   24 2 2750 0
Chev.   19 3 3430 0
Datsun  23 4 2370 1
Datsun  35 5 2020 1
Datsun  24 4 2280 1
Datsun  21 4 2750 1
END DATA.
LIST.
The output from the list command is shown below.
MAKE     MPG  REP78  WEIGHT  FOREIGN

AMC       22      3    2930        0
AMC       17      3    3350        0
AMC       22      .    2640        0
Audi      17      5    2830        1
Audi      23      3    2070        1
BMW       25      4    2650        1
Buick     20      3    3250        0
Buick     15      4    4080        0
Buick     18      3    3670        0
Buick     26      .    2230        0
Buick     20      3    3280        0
Buick     16      3    3880        0
Buick     19      3    3400        0
Cad.      14      3    4330        0
Cad.      14      2    3900        0
Cad.      21      3    4290        0
Chev.     29      3    2110        0
Chev.     16      4    3690        0
Chev.     22      3    3180        0
Chev.     22      2    3220        0
Chev.     24      2    2750        0
Chev.     19      3    3430        0
Datsun    23      4    2370        1
Datsun    35      5    2020        1
Datsun    24      4    2280        1
Datsun    21      4    2750        1
2. Sorting data with the sort cases command
We can use the sort cases command to sort this data file. The program below sorts the file on the variable foreign (1=foreign car, 0=domestic car).
SORT CASES BY foreign.
LIST.
From the output of the list command shown below, you can see that the data are indeed sorted on foreign: the observations where foreign is 0 precede all of the observations where foreign is 1. Note that the order of the observations within each group is unchanged (e.g., the observations where foreign is 0 remain in the same relative order as before).
MAKE     MPG  REP78  WEIGHT  FOREIGN

AMC       22      3    2930        0
AMC       17      3    3350        0
AMC       22      .    2640        0
Buick     20      3    3250        0
Buick     15      4    4080        0
Buick     18      3    3670        0
Buick     26      .    2230        0
Buick     20      3    3280        0
Buick     16      3    3880        0
Buick     19      3    3400        0
Cad.      14      3    4330        0
Cad.      14      2    3900        0
Cad.      21      3    4290        0
Chev.     29      3    2110        0
Chev.     16      4    3690        0
Chev.     22      3    3180        0
Chev.     22      2    3220        0
Chev.     24      2    2750        0
Chev.     19      3    3430        0
Audi      17      5    2830        1
Audi      23      3    2070        1
BMW       25      4    2650        1
Datsun    23      4    2370        1
Datsun    35      5    2020        1
Datsun    24      4    2280        1
Datsun    21      4    2750        1
Suppose you wanted the data sorted with the foreign cars (foreign=1) first and the domestic cars (foreign=0) second. The (D) option tells SPSS to sort in descending order on the variable it follows. In the example below, the data are sorted on foreign with the values going from largest to smallest.
SORT CASES BY foreign (D).
LIST.
You can see from the output of the list command below that the data are now ordered by foreign, but highest to lowest.
MAKE     MPG  REP78  WEIGHT  FOREIGN

Audi      17      5    2830        1
Audi      23      3    2070        1
BMW       25      4    2650        1
Datsun    23      4    2370        1
Datsun    35      5    2020        1
Datsun    24      4    2280        1
Datsun    21      4    2750        1
AMC       22      3    2930        0
AMC       17      3    3350        0
AMC       22      .    2640        0
Buick     20      3    3250        0
Buick     15      4    4080        0
Buick     18      3    3670        0
Buick     26      .    2230        0
Buick     20      3    3280        0
Buick     16      3    3880        0
Buick     19      3    3400        0
Cad.      14      3    4330        0
Cad.      14      2    3900        0
Cad.      21      3    4290        0
Chev.     29      3    2110        0
Chev.     16      4    3690        0
Chev.     22      3    3180        0
Chev.     22      2    3220        0
Chev.     24      2    2750        0
Chev.     19      3    3430        0
It is also possible to sort on more than one variable at a time. Perhaps you would like the data sorted on foreign (this time we will go back to the normal sort order for foreign) and then sorted by rep78 within each level of foreign. The example below shows how this can be done.
SORT CASES BY foreign rep78.
LIST.
The output of the list command below shows that the data are now ordered by foreign, with domestic cars (foreign=0) followed by foreign cars (foreign=1). Within each of these two groups, the data are sorted by rep78.
MAKE     MPG  REP78  WEIGHT  FOREIGN

AMC       22      .    2640        0
Buick     26      .    2230        0
Cad.      14      2    3900        0
Chev.     22      2    3220        0
Chev.     24      2    2750        0
AMC       22      3    2930        0
AMC       17      3    3350        0
Buick     20      3    3250        0
Buick     18      3    3670        0
Buick     20      3    3280        0
Buick     16      3    3880        0
Buick     19      3    3400        0
Cad.      14      3    4330        0
Cad.      21      3    4290        0
Chev.     29      3    2110        0
Chev.     22      3    3180        0
Chev.     19      3    3430        0
Buick     15      4    4080        0
Chev.     16      4    3690        0
Audi      23      3    2070        1
BMW       25      4    2650        1
Datsun    23      4    2370        1
Datsun    24      4    2280        1
Datsun    21      4    2750        1
Audi      17      5    2830        1
Datsun    35      5    2020        1
In the output above, note how the missing values of rep78 were treated. When sorting the data, missing values are treated as the lowest possible value (in effect, negative infinity), so the missing values come before all other values of rep78.
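Sort directions can also be mixed across variables. The lines below are a minimal sketch, assuming the data file from the introduction is still the active file: (A) requests ascending order and (D) requests descending order for the variable each one follows. Because missing values sort lowest, the cases with missing rep78 should then appear last within each level of foreign rather than first.

```spss
* Domestic cars first, then rep78 from highest to lowest within each group.
SORT CASES BY foreign (A) rep78 (D).
LIST.
```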
3. Obtaining separate analyses with sorted data
Sometimes you would like to obtain results separately for different groups. For example, you might want the mean weight separately for foreign and domestic cars, as illustrated below.
MEANS weight BY foreign.
As you see below, it is possible to use means with the by option to get means separately for the foreign and domestic cars.
- - Description of Subpopulations - -

Summaries of     WEIGHT
By levels of     FOREIGN

Variable     Value  Label           Mean      Std Dev    Cases

For Entire Population          3099.2308     695.0794       26

FOREIGN          0             3347.8947     627.1769       19
FOREIGN          1             2424.2857     325.1593        7

Total Cases = 26
However, what if you wanted to obtain the correlation of weight and mpg separately for foreign and domestic cars? The correlations command does not support a by option like means does. In such cases, you can sort the data and then use split file to obtain separate analyses, as illustrated below.
SORT CASES BY foreign.
SPLIT FILE BY foreign.
CORRELATIONS weight mpg.
As you see in the output below, using split file by foreign produced one set of correlations output for the domestic cars and another for the foreign cars. In general, the sort cases and split file commands together request that subsequent commands be performed separately for every level of the by variable (in this case, for every level of foreign).
FOREIGN:  0

- - Correlation Coefficients - -

             WEIGHT        MPG

WEIGHT       1.0000     -.8624
             (  19)     (  19)
             P= .       P= .000

MPG          -.8624     1.0000
             (  19)     (  19)
             P= .000    P= .

(Coefficient / (Cases) / 2-tailed Significance)

" . " is printed if a coefficient cannot be computed

FOREIGN:  1

             WEIGHT        MPG

WEIGHT       1.0000     -.7101
             (   7)     (   7)
             P= .       P= .074

MPG          -.7101     1.0000
             (   7)     (   7)
             P= .074    P= .

(Coefficient / (Cases) / 2-tailed Significance)

" . " is printed if a coefficient cannot be computed
The split file command remains in effect until you turn it off (by typing split file off). To illustrate this, let’s run descriptives on weight and mpg.
DESCRIPTIVES weight mpg.
As expected, we are shown descriptives separately for domestic and foreign cars.
FOREIGN:  0

Number of valid observations (listwise) =        19.00

                                                    Valid
Variable       Mean   Std Dev   Minimum   Maximum       N  Label

WEIGHT      3347.89    627.18      2110      4330      19
MPG           19.79      4.04        14        29      19

FOREIGN:  1

Number of valid observations (listwise) =         7.00

                                                    Valid
Variable       Mean   Std Dev   Minimum   Maximum       N  Label

WEIGHT      2424.29    325.16      2020      2830       7
MPG           24.00      5.51        17        35       7
Now, let’s enter split file off and repeat the descriptives on weight and mpg to confirm that this ends the split file.
SPLIT FILE OFF.
DESCRIPTIVES weight mpg.
As we would expect, we are shown descriptives for the overall sample.
Number of valid observations (listwise) =        26.00

                                                    Valid
Variable       Mean   Std Dev   Minimum   Maximum       N  Label

WEIGHT      3099.23    695.08      2020      4330      26
MPG           20.92      4.76        14        35      26
4. Problems to look out for
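- If you use split file, be sure you have sorted the data first. For example, if you use split file by foreign, be sure that you have first issued a sort cases by foreign command. SPSS starts a new group each time the value of the split variable changes from one case to the next, so unsorted data can produce many small groups instead of one group per value.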
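One way to avoid this problem is to keep the sort, the split, the analysis, and the split file off together as one block. The lines below are a minimal sketch using the variables from this module; the choice of descriptives as the analysis is only an illustration, and any other procedure could take its place.

```spss
* Sort first so that all cases with the same value of foreign are adjacent.
SORT CASES BY foreign.
SPLIT FILE BY foreign.
* This command is run once for each level of foreign.
DESCRIPTIVES weight mpg.
* Turn split file off so that later commands use the whole file again.
SPLIT FILE OFF.
```

Ending the block with split file off keeps the grouping from silently carrying over into later analyses.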
5. For more information
- For more information about sort cases and split file, see the SPSS Syntax Reference Guide.