Data Setup for Comparing Means in SPSS
David P. Nichols
Senior Support Statistician
SPSS, Inc.
April 1994
Testing hypotheses about equality of means is one of the
most commonly used applications of statistical software. SPSS
offers a variety of procedures capable of performing mean
comparisons. Several of these procedures are fairly simple,
designed to easily handle specific problems, while others are
more general, and necessarily more complex. In order to
successfully employ any of these options, users need to be
familiar with the data structure required by SPSS. Judging by the
number of statistical support calls that involve questions of
data setup for procedures ranging from T-TEST to MANOVA, many
users are not clear on the logic of this structure.
SPSS, like most other statistical software,
primarily works with a rectangular cases-by-variables format. That
is, rows of the rectangular data matrix represent cases, while
columns denote variables. (Even though on occasion data sets are
large enough to require multiple records or lines per case, the
logic remains as if we were still using one line and simply
wrapping it around as many times as necessary.) The decisive
question when we look to compare two or more means is whether
they represent means of independent or related samples.
The independent vs. related samples distinction is usually
equivalent to the question of whether we want to compare means of
two or more groups of cases or the means of the same group of
cases under two or more conditions. For this reason the terms
between subjects and within subjects are commonly used to denote
the type of comparison(s) desired. In the T-TEST procedure these
two kinds of analysis are referred to as independent vs. related
samples tests. The generalization of the related samples (within
subjects) situation to more than two time points or conditions is
handled most generically in the MANOVA procedure via the
WSFACTORS specification, though the RELIABILITY procedure's
STATISTICS=ANOVA option also provides some tests of means of
related samples.
Setup for Independent Samples (Between Subjects Analyses)
If the desired comparison(s) involve between subjects or
independent samples data, the appropriate data structure requires
one or more grouping variables to identify which group each line
of data belongs to, together with one or more separate variables
containing the values on which we wish to compare the groups.
Thus the proper data setup for a comparison of the means of two
groups of cases would be along the lines of:
DATA LIST FREE / GROUP Y.
BEGIN DATA
1 5.2
1 4.3
...
2 7.1
2 6.9
END DATA.
In other words, SPSS needs something to tell it which group a case
belongs to (this variable--called GROUP in our example--is often
referred to as a factor variable), as well as the value of the
measured variable(s) of interest (Y). Once the data are
successfully entered in this format, any of the following procedure
commands can be used to obtain a test of the null hypothesis of
equal population means for the two groups:
T-TEST GROUPS=GROUP /VAR=Y.
MEANS Y BY GROUP /STATISTICS=ANOVA.
ONEWAY Y BY GROUP(1,2).
ANOVA Y BY GROUP(1,2).
MANOVA Y BY GROUP(1,2).
For situations in which there are three or more groups the
same structure would prevail, except that there would be more
than two values for the GROUP variable; of course, we could then
no longer use the T-TEST procedure, which compares only two means
at a time. If the data groupings are defined by more than one
type of factor, such as gender and geographical region, then we
simply have more grouping variables (such as GENDER with two
categories and REGION with several) entered in our data set. In
this case we move to either ANOVA or MANOVA, since MEANS and
ONEWAY are designed specifically for use with one grouping
factor.
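As a sketch of such a multiple-factor setup (the coding of GENDER
as 1-2 and REGION as 1-4, and the data values, are purely
illustrative assumptions), the data and syntax might look like:
* Hypothetical two-factor example; factor codes are illustrative.
DATA LIST FREE / GENDER REGION Y.
BEGIN DATA
1 1 5.2
1 3 4.3
...
2 4 6.9
END DATA.
ANOVA Y BY GENDER(1,2) REGION(1,4).
MANOVA Y BY GENDER(1,2) REGION(1,4).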
Setup for Paired or Related Samples (Within Subjects Analyses)
Suppose instead of wanting to compare the means of two or
more groups of cases, we now want to make comparisons among
measurements taken on the same cases at different times or under
different conditions. Since the repeated measures or time example
is so common, we will call the factor of interest here TIME. The
difference between this situation and that involving between
subjects analyses is that here we are concerned with comparing
related measurements on the same cases. Thus the data setup is
different. Rather than having one variable distinguish among the
cases on the basis of group membership, we simply have two
measured variables for each case. If we call these TIME1 and
TIME2, the data setup might look like:
DATA LIST FREE / TIME1 TIME2.
BEGIN DATA
1.5 3.8
2.1 4.2
...
3.2 4.7
END DATA.
The MEANS, ONEWAY and ANOVA procedures are not useful here, as
they do not handle within subjects data. Instead, we could obtain
the same results, in varying forms of presentation, from any of
the following specifications:
T-TEST PAIRS=TIME1 TIME2.
RELIABILITY VARIABLES=TIME1 TIME2
/STATISTICS=ANOVA.
MANOVA TIME1 TIME2
/WSFACTORS=TIME(2).
Should we move to a comparison involving more than two
related means, we would no longer be able to use the T-TEST
procedure. The results produced by the RELIABILITY procedure,
though presented in a format more familiar to many people than
that of MANOVA, provide only part of the information given by
MANOVA, and this information is strictly valid only under some
fairly severe assumptions (notably sphericity of the covariance
matrix of the repeated measures). For this reason users are
generally much safer working with MANOVA for within subjects
analyses. Adding more time points would produce no structural
changes in the MANOVA specification, only a longer list of
dependent variables and a change in the number of levels of the
WSFACTOR TIME. Note that this name is arbitrary; we can call this
factor anything we want as long as it is eight characters or less
and does not match any reserved words in MANOVA.
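As a sketch of this extension (the variable names TIME1 through
TIME3 and the data values are illustrative), a three-occasion
analysis might look like:
* Hypothetical three-occasion example; names and values illustrative.
DATA LIST FREE / TIME1 TIME2 TIME3.
BEGIN DATA
1.5 3.8 4.1
2.1 4.2 4.4
...
3.2 4.7 5.0
END DATA.
MANOVA TIME1 TIME2 TIME3
/WSFACTORS=TIME(3).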
Note that if we are using data in which there are both
grouping or between subjects factors and related or repeated
variables forming within subjects factors, MANOVA is the only
procedure we can use. If we had two groups measured at two time
points and wished to perform a factorial analysis of variance on
these data, testing the difference between groups averaged over
time, the change over time averaged over groups, and the
interaction of the two, we would use syntax such as:
DATA LIST FREE / GROUP TIME1 TIME2.
BEGIN DATA
1 2.1 4.2
1 3.0 3.6
...
2 2.5 2.1
2 3.1 2.6
END DATA.
MANOVA TIME1 TIME2 BY GROUP(1,2)
/WSFACTORS=TIME(2).
