Example 1. A company that manufactures light bulbs claims that a particular type of light bulb will last 850 hours on average with a standard deviation of 50 hours. A consumer protection group thinks that the manufacturer has overestimated the lifespan of their light bulbs by about 40 hours. How many light bulbs does the consumer protection group have to test in order to make their point with reasonable confidence?
Example 2. It has been estimated that the average height of American male adults is 70 inches. It has also been postulated that there is a positive correlation between height and intelligence. If this is true, then the average height of male graduate students on campus should be greater than the average height of American male adults in general. To test this theory, one would randomly sample a small group of male graduate students. However, one would need to know how many male graduate students need to be measured so that the hypothesis can be reasonably tested.
Prelude to the power analysis
For the power analysis below, we are going to focus on Example 1, testing the average lifespan of a light bulb. Our first goal is to figure out the number of light bulbs that need to be tested. That is, we will determine the sample size for a given significance level and power. Next, we will reverse the process and determine the power, given the sample size and the significance level.
We know so far that the manufacturer claims that the average lifespan of the light bulb is 850 hours with a standard deviation of 50 hours, and the consumer protection group believes that the manufacturer has overestimated by about 40 hours. So in terms of hypotheses, our null hypothesis is H0: μ = 850 and our alternative hypothesis is Ha: μ = 810, where μ is the true average lifespan.
The significance level is the probability of a Type I error, that is, the probability of rejecting H0 when it is actually true. We will set it at the .05 level. The power of the test against Ha is the probability that the test rejects H0 when Ha is actually true. We will set it at the .90 level.
We are almost ready for our power analysis. But let’s talk about the standard deviation a little bit. Intuitively, the number of light bulbs we need to test depends on the variability of the lifespan of these light bulbs. Take an extreme case where all the light bulbs have exactly the same lifespan. Then we just need to check a single light bulb to prove our point. Of course, this will never happen. On the other hand, suppose that some light bulbs last for 1000 hours and some only last 500 hours. We will have to select quite a few light bulbs to cover all the ground. Therefore, the standard deviation for the distribution of the lifespan of the light bulbs will play an important role in determining the sample size.
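To make these two probabilities concrete, here is a minimal simulation sketch in Python. It is not part of Sample Power. The sample size of 10, the number of trials, the seed and the function name `rejection_rate` are assumptions chosen only for illustration, and 2.262 is the tabulated two-sided 5% critical value of Student's t with 9 degrees of freedom. The sketch draws normally distributed lifespans with a standard deviation of 50, once with a true mean of 850 (H0 true) and once with a true mean of 810 (Ha true), and counts how often a two-sided one-sample t-test rejects H0.

```python
import random

def rejection_rate(true_mean, n, sd=50.0, null_mean=850.0,
                   t_crit=2.262, trials=20000, seed=1):
    """Fraction of simulated samples in which a two-sided one-sample
    t-test rejects H0: mu = null_mean."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance with the usual n - 1 denominator.
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t = (mean - null_mean) / (var / n) ** 0.5
        if abs(t) > t_crit:
            rejections += 1
    return rejections / trials

# n = 10 is an assumed sample size, used only for illustration.
type_1_rate = rejection_rate(true_mean=850.0, n=10)  # H0 is true
power = rejection_rate(true_mean=810.0, n=10)        # Ha is true
print(f"rejection rate when H0 is true: {type_1_rate:.3f}")
print(f"rejection rate when Ha is true: {power:.3f}")
```

The first rate estimates the significance level and should land close to .05. The second estimates the power against Ha for the assumed sample size.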
In Sample Power, it is fairly straightforward to perform a power analysis for comparing means. After opening the program and starting a new analysis, we simply select ‘One sample t-test that mean = specific value’ and click ‘Okay’. This opens up a table of inputs, allowing us to enter the expected mean (alternative hypothesis mean), the constant to test against (null hypothesis mean), and the standard deviation. We can click on the icon bearing a set of binoculars and an arrow to call up a table of power values. Then we can select a power and click on ‘Find N’ to produce the required sample size to achieve the given power.
The result tells us that we need a sample of at least 19 light bulbs for the test to reject H0 with a power of 0.9 when the alternative hypothesis Ha is true.
Next, suppose we have a sample of size 10. How much power do we have, keeping all of the other numbers the same? We can use the same program to calculate it.
Changing the N value to 10 and clicking ‘Compute’, the power is calculated and expressed not only numerically, but also visually in the form of a bar that fills in blue.
You can see that the power is about .62 for a sample size of 10. What then is the power for a sample size of 15?
So now the power is about .82. You could also do it again to find out the power for a sample size of 20. You would expect the power to be greater still.
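These figures can be cross-checked outside the program. The following Python sketch is not Sample Power's own algorithm. It assumes the textbook formulation in which the one-sample t statistic follows a noncentral t distribution with n - 1 degrees of freedom and noncentrality d * sqrt(n), where d is the difference in means divided by the standard deviation. The function names `nct_cdf`, `t_crit`, `power_one_sample_t` and `smallest_n` are hypothetical, and the distribution function is evaluated by plain Simpson integration so that only the standard library is needed.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def nct_cdf(t, df, ncp=0.0, steps=400):
    """P(T <= t) for a noncentral t variable, by Simpson's rule.

    Uses T = (Z + ncp) / S with S = sqrt(chi2_df / df), so that
    P(T <= t) is the integral of Phi(t*s - ncp) times the density of S.
    """
    log_c = math.log(2) + (df / 2) * math.log(df / 2) - math.lgamma(df / 2)
    upper = 1 + 8 / math.sqrt(2 * df)  # S has mean near 1, sd near 1/sqrt(2*df)
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        s = i * h
        dens = math.exp(log_c - df * s * s / 2) * s ** (df - 1)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * PHI(t * s - ncp) * dens
    return total * h / 3

def t_crit(df, alpha=0.05):
    """Two-sided critical value: solves P(T <= t) = 1 - alpha/2 by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if nct_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power_one_sample_t(n, effect_size, alpha=0.05):
    """Power of the two-sided one-sample t-test with n observations."""
    df = n - 1
    ncp = effect_size * math.sqrt(n)
    tc = t_crit(df, alpha)
    return (1 - nct_cdf(tc, df, ncp)) + nct_cdf(-tc, df, ncp)

def smallest_n(effect_size, target_power, alpha=0.05):
    """Smallest sample size whose power reaches target_power."""
    n = 2
    while power_one_sample_t(n, effect_size, alpha) < target_power:
        n += 1
    return n

d = (810 - 850) / 50  # difference in means over the standard deviation
for n in (10, 15, 20):
    print(n, round(power_one_sample_t(n, d), 3))
print("smallest n for power 0.90:", smallest_n(d, 0.90))
```

The loop prints the power for sample sizes of 10, 15 and 20, and the last line searches upward from n = 2 for the first sample size whose power reaches .90.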
We would also expect that if we specified a lower power or a smaller standard deviation, then the sample size should decrease. We can experiment with different values of power and standard deviation as shown below.
If the standard deviation is lower, then the sample size should also go down, as we discussed before.
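A quick way to see this pattern without the program is the normal-approximation formula n = ((z_{1-alpha/2} + z_{power}) * sd / difference)^2. The sketch below is only a back-of-the-envelope check, not what Sample Power computes. It treats the standard deviation as known, so it comes out a little below a t-based answer (for the base case it gives 17 rather than the 19 reported above). The function name `approx_sample_size` and the grid of standard deviations and powers are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def approx_sample_size(diff, sd, power, alpha=0.05):
    """Normal-approximation sample size for a two-sided one-sample test."""
    z = NormalDist().inv_cdf
    n = ((z(1 - alpha / 2) + z(power)) * sd / diff) ** 2
    return math.ceil(n)

# Difference in means fixed at 40 hours; vary the standard deviation and power.
for sd in (50, 40, 30):
    for power in (0.90, 0.80):
        n = approx_sample_size(40, sd, power)
        print(f"sd={sd}, power={power:.2f}: n is about {n}")
```

Reading down the output, the approximate sample size falls both when the standard deviation shrinks and when the requested power is lowered.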
The calculations performed to generate the power or sample size are based on an assumption that the variable is normally distributed. If the variable is not normally distributed, a small sample size usually will not truly have the power indicated in the results. It might not even be a good idea to perform a t-test on such a small sample to begin with if the normality assumption is in question.
Also, note that we really only need to know the difference between the two means, not both the individual values. In fact, what really matters is the difference in means over the standard deviation. We call this the effect size. For example, if we subtracted 800 from each mean, changing 850 to 50 and 810 to 10, our effect size would not change and we would get the same power.
If we standardize our variable, we can express the difference in means in units of standard deviations. Here the difference is (810 - 850) / 50 = -0.8, that is, a drop of 0.8 standard deviations.
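The arithmetic behind these two remarks is short enough to check directly. In this small Python sketch the function name `effect_size` is hypothetical. It computes the standardized difference for the original means, for the means shifted down by 800, and for the variable divided by its standard deviation.

```python
import math

def effect_size(mean_alt, mean_null, sd):
    """Difference in means expressed in standard deviation units."""
    return (mean_alt - mean_null) / sd

original = effect_size(810, 850, 50)
shifted = effect_size(10, 50, 50)                  # 800 subtracted from each mean
standardized = effect_size(810 / 50, 850 / 50, 1)  # variable divided by its sd

print(original, shifted)
print(math.isclose(original, standardized))
```

All three calls describe the same effect size, which is why the power analysis gives the same answer in each case.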
It is usually not an easy task to determine the "true" effect size. We make our best guess based upon the existing literature or a pilot study. A good estimate of the effect size is the key to a successful power analysis.
For more information on power analysis, please visit our Introduction to Power Analysis seminar.