Sometimes you may want to analyze your data separately for each category of a grouping variable. One way to do this is to split the data file into separate files, one per category, and run the same analyses on each. However, that approach is cumbersome and error-prone. Several SPSS commands let you do separate analyses by category, and we will consider them below.
Let’s use the example data set below. You will notice that one of the independent variables, iv1, is a string variable. We will use this variable as our grouping variable to demonstrate how to use a string variable as the grouping variable. All of the techniques that will be shown can be used with a numeric categorical variable as well.
data list list / sub * iv1 (A) iv2 * dv1 dv2.
begin data
1 "1" 1 48 25
2 "1" 1 49 37
3 "1" 1 50 55
4 "2" 1 17 19
5 "2" 1 20 38
6 "2" 2 23 48
7 "2" 2 28 44
8 "3" 2 28 68
9 "3" 2 30 30
10 "3" 2 32 37
end data.
To begin with, suppose we want the mean and standard deviation of dv1 for each of the three groups of iv1 ("1", "2", and "3"). The means command produces these simple descriptive statistics.
means tables= dv1 by iv1.
Case Processing Summary
             Cases
             Included          Excluded          Total
             N     Percent     N     Percent     N     Percent
DV1 * IV1    10    100.0%      0     .0%         10    100.0%
Report
DV1
IV1      Mean       N    Std. Deviation
1        49.0000    3    1.00000
2        22.0000    4    4.69042
3        30.0000    3    2.00000
Total    32.5000    10   12.25878
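If you want to check these numbers outside SPSS, the grouping that means performs can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not SPSS syntax; the DATA list and the means_by helper are hypothetical names introduced here. The standard deviation is the sample (n - 1) version, which reproduces the Std. Deviation column of the Report table for these data.

```python
# Minimal cross-check of the MEANS report (illustration only, not SPSS).
from statistics import mean, stdev

# Hypothetical copy of the example data: (sub, iv1, iv2, dv1, dv2).
# iv1 is stored as a string, mirroring the (A) format in the data list.
DATA = [
    (1, "1", 1, 48, 25), (2, "1", 1, 49, 37), (3, "1", 1, 50, 55),
    (4, "2", 1, 17, 19), (5, "2", 1, 20, 38), (6, "2", 2, 23, 48),
    (7, "2", 2, 28, 44), (8, "3", 2, 28, 68), (9, "3", 2, 30, 30),
    (10, "3", 2, 32, 37),
]


def means_by(rows, key_index, value_index):
    """Return {group: (mean, n, sample SD)} in sorted group order, plus 'Total'."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row[value_index])
    report = {g: (mean(v), len(v), stdev(v)) for g, v in sorted(groups.items())}
    values = [row[value_index] for row in rows]
    report["Total"] = (mean(values), len(values), stdev(values))
    return report


if __name__ == "__main__":
    # Column 1 is iv1 and column 3 is dv1 in each DATA tuple.
    for group, (m, n, sd) in means_by(DATA, 1, 3).items():
        print(f"{group:<6}{m:>9.4f}{n:>4}{sd:>11.5f}")
```

Keeping iv1 as a string means the groups sort as "1", "2", "3", just as the string variable does in SPSS.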
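You could also use the examine command, as shown below. The plot = none subcommand suppresses the stem-and-leaf plots and boxplots that examine would otherwise produce. Examine reports descriptive statistics for all cases combined and then for each group of iv1.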
examine dv1 by iv1 /plot = none.
Case Processing Summary
       Cases
       Valid             Missing           Total
       N     Percent     N     Percent     N     Percent
DV1    10    100.0%      0     .0%         10    100.0%
Descriptives: DV1, all cases
                                        Statistic   Std. Error
Mean                                    32.5000     3.87657
95% Confidence Interval   Lower Bound   23.7306
for Mean                  Upper Bound   41.2694
5% Trimmed Mean                         32.3889
Median                                  29.0000
Variance                                150.278
Std. Deviation                          12.25878
Minimum                                 17.00
Maximum                                 50.00
Range                                   33.00
Interquartile Range                     26.0000
Skewness                                .516        .687
Kurtosis                                -1.278      1.334
Case Processing Summary
             Cases
       IV1   Valid             Missing           Total
             N     Percent     N     Percent     N     Percent
DV1    1     3     100.0%      0     .0%         3     100.0%
       2     4     100.0%      0     .0%         4     100.0%
       3     3     100.0%      0     .0%         3     100.0%
Descriptives: DV1 by IV1
IV1 = 1                                 Statistic   Std. Error
Mean                                    49.0000     .57735
95% Confidence Interval   Lower Bound   46.5159
for Mean                  Upper Bound   51.4841
5% Trimmed Mean                         .
Median                                  49.0000
Variance                                1.000
Std. Deviation                          1.00000
Minimum                                 48.00
Maximum                                 50.00
Range                                   2.00
Interquartile Range                     .
Skewness                                .000        1.225
Kurtosis                                .           .
IV1 = 2                                 Statistic   Std. Error
Mean                                    22.0000     2.34521
95% Confidence Interval   Lower Bound   14.5365
for Mean                  Upper Bound   29.4635
5% Trimmed Mean                         21.9444
Median                                  21.5000
Variance                                22.000
Std. Deviation                          4.69042
Minimum                                 17.00
Maximum                                 28.00
Range                                   11.00
Interquartile Range                     9.0000
Skewness                                .543        1.014
Kurtosis                                -.153       2.619
IV1 = 3                                 Statistic   Std. Error
Mean                                    30.0000     1.15470
95% Confidence Interval   Lower Bound   25.0317
for Mean                  Upper Bound   34.9683
5% Trimmed Mean                         .
Median                                  30.0000
Variance                                4.000
Std. Deviation                          2.00000
Minimum                                 28.00
Maximum                                 32.00
Range                                   4.00
Interquartile Range                     .
Skewness                                .000        1.225
Kurtosis                                .           .
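A few of the examine statistics can likewise be reproduced outside SPSS. The Python sketch below is illustrative only (standard library, not SPSS syntax); DATA, examine, and examine_by are hypothetical names. It computes the mean, the standard error of the mean (sample SD divided by the square root of n), the median, and the interquartile range; the confidence interval and 5% trimmed mean are left out. The IQR uses the default "exclusive" method of statistics.quantiles, which for these data gives the same values SPSS printed for all cases (26.0) and for group 2 (9.0). For the three-case groups SPSS prints "." for the IQR while quantiles still returns a number, so those values are not comparable.

```python
# Sketch of a few EXAMINE statistics for dv1 (illustration only, not SPSS).
import math
from statistics import mean, median, quantiles, stdev

# Hypothetical copy of the example data: (sub, iv1, iv2, dv1, dv2).
DATA = [
    (1, "1", 1, 48, 25), (2, "1", 1, 49, 37), (3, "1", 1, 50, 55),
    (4, "2", 1, 17, 19), (5, "2", 1, 20, 38), (6, "2", 2, 23, 48),
    (7, "2", 2, 28, 44), (8, "3", 2, 28, 68), (9, "3", 2, 30, 30),
    (10, "3", 2, 32, 37),
]


def examine(values):
    """Mean, standard error of the mean, median, and interquartile range."""
    q1, _, q3 = quantiles(values, n=4)  # default method="exclusive"
    return {
        "mean": mean(values),
        "se": stdev(values) / math.sqrt(len(values)),  # sample SD / sqrt(n)
        "median": median(values),
        "iqr": q3 - q1,
    }


def examine_by(rows, key_index, value_index):
    """Apply examine() to the values of each group, in sorted group order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row[value_index])
    return {g: examine(v) for g, v in sorted(groups.items())}


if __name__ == "__main__":
    # Column 3 is dv1; column 1 is the grouping variable iv1.
    print("All cases:", examine([row[3] for row in DATA]))
    for group, stats in examine_by(DATA, 1, 3).items():
        print(f"iv1 = {group}:", stats)
```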
Now let's look at a more general technique, one that can be used with any type of analysis. First, sort the data by the grouping variable, in this case iv1. Then split the file by the same variable. The split file command temporarily splits the file by the variable specified: all analyses will be grouped by this variable until the split file off command is issued, or until the data are resorted. Note that split file can be used with numeric, short string, and long string variables. (Many SPSS commands will not work with long string variables, but split file will.) Next, list the commands for the analyses that you would like. Finally, issue the split file off command.
sort cases by iv1.
split file by iv1.
correlations var = dv1 with dv2.
Correlations
IV1                               DV2
1     DV1   Pearson Correlation   .993
            Sig. (2-tailed)       .073
            N                     3
2     DV1   Pearson Correlation   .780
            Sig. (2-tailed)       .220
            N                     4
3     DV1   Pearson Correlation   -.766
            Sig. (2-tailed)       .444
            N                     3
split file off.
You can also use more than one variable to categorize your analysis. To do so, list all of the grouping variables, in the same order, on both the sort cases command and the split file command.
sort cases by iv1 iv2.
split file by iv1 iv2.
correlations var = dv1 with dv2.
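Correlations
IV1   IV2                               DV2
1     1.00  DV1   Pearson Correlation   .993
                  Sig. (2-tailed)       .073
                  N                     3
2     1.00  DV1   Pearson Correlation   1.000
                  Sig. (2-tailed)       .
                  N                     2
      2.00  DV1   Pearson Correlation   -1.000
                  Sig. (2-tailed)       .
                  N                     2
3     2.00  DV1   Pearson Correlation   -.766
                  Sig. (2-tailed)       .444
                  N                     3
With only two cases in a group, a correlation is always 1 or -1 (unless one of the variables is constant), so no significance level can be computed and SPSS prints a dot in its place.
split file off.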