My Coefficient α is Negative!
By David P. Nichols
Principal Support Statistician and Manager of Statistical Support
SPSS Inc.
From SPSS Keywords, Number 68, 1999
In classical test theory, the total variance of observed scores on a test ($\sigma_X^2$) is defined (Lord & Novick, 1968) to be the sum of true score variance ($\sigma_T^2$) and error variance ($\sigma_E^2$):
$$\sigma_X^2 = \sigma_T^2 + \sigma_E^2.$$
The reliability of a test is defined in equivalent ways as
- The squared correlation between observed scores and true scores: $\rho_{XT}^2$,
- The proportion of total observed score variance that is due to variance in true scores: $\sigma_T^2/\sigma_X^2$,
- One minus the proportion of total variance that is due to error variance: $1 - (\sigma_E^2/\sigma_X^2)$.
In other words,
$$\rho_{XT}^2 = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}.$$
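As a small numeric illustration of why the three definitions agree, the sketch below uses made-up true scores and errors (which are unobservable in practice) chosen so that they are exactly uncorrelated. The helper names `var` and `cov` and all data values are assumptions for illustration only.

```python
from statistics import mean

def var(x):
    """Sample variance with an n - 1 denominator."""
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical true scores and errors, constructed to have zero covariance.
true_scores = [1, 2, 3, 4]
errors = [1, -1, -1, 1]
observed = [t + e for t, e in zip(true_scores, errors)]

# Definition 1: squared correlation between observed and true scores.
squared_corr = cov(observed, true_scores) ** 2 / (var(observed) * var(true_scores))
# Definition 2: share of observed variance due to true scores.
true_share = var(true_scores) / var(observed)
# Definition 3: one minus the share of observed variance due to error.
one_minus_error_share = 1 - var(errors) / var(observed)

print(squared_corr, true_share, one_minus_error_share)
```

With these values the observed variance is 3, the true score variance is 5/3 and the error variance is 4/3, so all three quantities come out to 5/9 (up to floating-point rounding).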
An essential feature of the definition of a reliability coefficient is that as a proportion of variance, it should in theory range between 0 and 1 in value. Unfortunately, the definitions given here include unobservable true and error scores, and when we turn from theory to practice, our attempts to estimate reliabilities can produce unexpected results. In practice, the possible values of estimates of reliability range from $-\infty$ to 1, rather than from 0 to 1.
To see that this is the case, let’s look at the most commonly cited formula for computation of Coefficient $\alpha$, the most popular reliability coefficient. That formula is
$$\alpha = \left[\frac{k}{k-1}\right]\left[1 - \frac{\sum \sigma_i^2}{\sigma_X^2}\right],$$
where $k$ is the number of items, $\sum \sigma_i^2$ is the sum of the individual item variances taken over all $k$ items, and $\sigma_X^2$ is the scale variance. Since the term in the first set of brackets is always positive (for $k > 1$), $\alpha$ will be negative if and only if
$$\frac{\sum \sigma_i^2}{\sigma_X^2} > 1,$$
or if and only if
$$\sum \sigma_i^2 > \sigma_X^2.$$
In other words, $\alpha$ will be negative whenever the sum of the individual item variances is greater than the scale variance.
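The formula above can be sketched directly in code. This is a minimal illustration, not how SPSS computes the coefficient internally; the function name `cronbach_alpha` and both data sets are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha from a list of k item columns (one list of scores per item)."""
    k = len(items)
    # Each respondent's scale score is the sum of that respondent's item scores.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(col) for col in items)
    scale_var = variance(totals)
    return (k / (k - 1)) * (1 - sum_item_var / scale_var)

# Hypothetical data: two items, five respondents each.
consistent = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]  # items move together
opposed = [[1, 2, 3, 4, 5], [5, 4, 3, 3, 1]]     # items move in opposite directions

print(cronbach_alpha(consistent))
print(cronbach_alpha(opposed))
```

For the `opposed` data the item variances sum to 4.7 while the scale variance is only 0.2, so the sum of item variances exceeds the scale variance and the coefficient is strongly negative (about -45). For the `consistent` data the coefficient is 1.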
Since the variance of the sum of a set of random variables is equal to the sum of the individual variances plus twice the sum of their covariances (see, e.g., Hays (1981), Appendix C), and since the scale score is the sum of the individual item scores, the scale variance can be expressed as
$$\sigma_X^2 = \sum \sigma_i^2 + \sum\sum \sigma_{ij},$$
where $\sigma_{ij}$ denotes the covariance between items $i$ and $j$, and the double summation is taken over all combinations of $i$ and $j$ where $i \neq j$. Thus, we can translate the necessary and sufficient condition for $\alpha$ to be negative as
$$\sum \sigma_i^2 > \sum \sigma_i^2 + \sum\sum \sigma_{ij},$$
or
$$\sum\sum \sigma_{ij} < 0.$$
In words, $\alpha$ will be negative whenever twice the sum of the item covariances is negative. This can be stated even more simply by saying that $\alpha$ will be negative whenever the average covariance among the items is negative.
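The decomposition of the scale variance, and the resulting sign rule, can be checked numerically. The sketch below uses three hypothetical items; the `covariance` helper and the data are assumptions for illustration.

```python
from statistics import mean, variance

def covariance(x, y):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical data: three items, five respondents each.
items = [[1, 2, 3, 4, 5], [5, 4, 3, 3, 1], [2, 1, 4, 3, 5]]
k = len(items)

totals = [sum(scores) for scores in zip(*items)]
scale_var = variance(totals)
sum_item_var = sum(variance(col) for col in items)

# Double sum over all ordered pairs with i != j, so each covariance is counted twice.
sum_cov = sum(
    covariance(items[i], items[j])
    for i in range(k)
    for j in range(k)
    if i != j
)

alpha = (k / (k - 1)) * (1 - sum_item_var / scale_var)

print(scale_var, sum_item_var + sum_cov)  # the two sides of the decomposition
print(sum_cov, alpha)                     # both negative for these data
```

Here the pairwise covariances are -2.25, 2 and -2, so the double sum is -4.5, the scale variance is 7.2 - 4.5 = 2.7, and the coefficient is -2.5.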
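To see that $\alpha$ can go to $-\infty$, consider a scale consisting of two items with equal variance and a perfect negative correlation of -1. Since the covariance $\sigma_{12}$ between items 1 and 2 is defined (see, e.g., Lord & Novick, 1968) as
$$\sigma_{12} = \rho_{12}\sigma_1\sigma_2,$$
if $\rho_{12} = -1$, and $\sigma_1 = \sigma_2 = \sigma$, then
$$\sigma_{12} = -\sigma^2.$$
Plugging these into the denominator of the ratio in the formula for $\alpha$, we get
$$\sigma_X^2 = \sigma_1^2 + \sigma_2^2 + 2\sigma_{12} = \sigma^2 + \sigma^2 - 2\sigma^2 = 0.$$
Thus $\alpha$ would be computed as
$$\alpha = \left[\frac{2}{2-1}\right]\left[1 - \frac{2\sigma^2}{0}\right] = -\infty.$$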
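Because the scale variance is exactly zero at a correlation of -1, the expression is best read as a limit. For two items with equal variance the formula simplifies to $2\rho_{12}/(1 + \rho_{12})$, which decreases without bound as the correlation approaches -1. The sketch below evaluates the original formula for correlations approaching -1; the function name `alpha_two_items` and the chosen correlations are illustrative assumptions.

```python
def alpha_two_items(rho, sigma=1.0):
    """Coefficient alpha for two items with common standard deviation sigma
    and correlation rho (requires rho > -1 so the scale variance is nonzero)."""
    item_var = sigma ** 2
    cov_12 = rho * sigma * sigma
    scale_var = item_var + item_var + 2 * cov_12
    k = 2
    return (k / (k - 1)) * (1 - (2 * item_var) / scale_var)

# Correlations moving toward -1 (hypothetical values).
rhos = [-0.5, -0.9, -0.99, -0.999]
alphas = [alpha_two_items(r) for r in rhos]

for r, a in zip(rhos, alphas):
    print(r, a)
```

At a correlation of -0.5 the coefficient is -2, at -0.9 it is about -18, and at -0.99 about -198; calling the function with a correlation of exactly -1 would divide by zero.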
Though this is the most extreme case, SPSS users occasionally present $\alpha$ values that are negative and have magnitudes greater than 1, and want to know how this can happen. It must be borne in mind that $\alpha$ is actually a lower bound on the true reliability of a test under general conditions, and that it will equal the true reliability only if the items satisfy a property known as essential $\tau$-equivalence (Lord & Novick, 1968), which requires that their true scores are either all the same, or that each item's true score can be converted to any other item's true score by adding a fixed constant. This implies that in order for $\alpha$ to be a measure of reliability instead of a lower bound, the items must be measuring the same thing. Note that even if the items do satisfy the essential $\tau$-equivalence assumption, if there is a good deal of error in measurement, sample values of $\alpha$ may be negative even though the population values are positive. This becomes less likely as the numbers of cases and items increase, because sampling variability is reduced.
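The effect of sample size on the chance of a negative sample coefficient can be explored with a small simulation. The following is a rough sketch under stated assumptions: items share a normally distributed true score (standard deviation 0.3) and carry independent normal errors (standard deviation 1), giving a positive but low population coefficient of roughly 0.2 for three items. The distributions, parameter values, seed, and function names are all hypothetical choices, not part of the original article.

```python
import random

def var(x):
    """Sample variance with an n - 1 denominator."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def cronbach_alpha(items):
    """Coefficient alpha from a list of k item columns."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum(var(col) for col in items) / var(totals))

def share_negative(n_cases, k_items, reps, rng, true_sd=0.3, error_sd=1.0):
    """Fraction of simulated samples in which the sample alpha is negative."""
    negatives = 0
    for _ in range(reps):
        # Every item measures the same true score plus its own independent error.
        true_scores = [rng.gauss(0, true_sd) for _ in range(n_cases)]
        items = [
            [t + rng.gauss(0, error_sd) for t in true_scores]
            for _ in range(k_items)
        ]
        if cronbach_alpha(items) < 0:
            negatives += 1
    return negatives / reps

rng = random.Random(1999)  # fixed seed so the run is repeatable
small_sample = share_negative(n_cases=10, k_items=3, reps=200, rng=rng)
large_sample = share_negative(n_cases=400, k_items=3, reps=200, rng=rng)
print(small_sample, large_sample)
```

Under these assumptions a noticeable share of the 10-case samples should yield a negative coefficient, while negative values should be rare with 400 cases. The exact fractions depend on the seed and settings.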
If one encounters a negative value for $\alpha$, implying a negative average covariance among items, the first thing to check is whether data or item coding errors are responsible. A common problem of this type is that the scale consists of some items that are worded in opposite directions to alleviate response biases, and the researcher has forgotten to appropriately recode the reverse-scored items, resulting in negative covariances where the actual covariances of interest are positive. Another possibility, most likely with small sample sizes and small numbers of items, is that while the true population covariances among items are positive, sampling error has produced a negative average covariance in a given sample of cases. Finally, it may simply be the case that the items do not truly have positive covariances, and therefore may not form a useful single scale because they are not measuring the same thing.
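The reverse-scoring problem is easy to reproduce. The sketch below assumes a hypothetical three-item scale on a 1-to-5 response format in which the third item is worded in the opposite direction. Recoding it as 6 minus the raw response turns the negative coefficient into a high positive one. All names and data are illustrative.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha from a list of k item columns."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Hypothetical responses on a 1-5 scale from five respondents.
item1 = [1, 2, 3, 4, 5]
item2 = [1, 2, 4, 4, 5]
item3_raw = [5, 4, 3, 3, 1]  # reverse-worded item, not yet recoded

# Recode the reverse-worded item: on a 1-5 scale, new = (1 + 5) - old.
item3_recoded = [6 - score for score in item3_raw]

alpha_raw = cronbach_alpha([item1, item2, item3_raw])
alpha_recoded = cronbach_alpha([item1, item2, item3_recoded])
print(alpha_raw, alpha_recoded)
```

Recoding leaves each item's variance unchanged but flips the sign of its covariances with the other items, so the scale variance rises from 3.3 to 21.5 and the coefficient moves from about -1.86 to about 0.98.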
References
Hays, W. L. (1981). Statistics (3rd Ed.). Holt, Rinehart and Winston.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.