Introduction
Power analysis is the process of determining the sample size needed for a research study. Technically, power is the probability of detecting a “true” effect when one exists. Many students think that there is a simple formula for determining sample size for every research situation. However, the reality is that many research situations are so complex that they almost defy rational power analysis. In most cases, power analysis involves making a number of simplifying assumptions to keep the problem tractable and then running the analyses numerous times with different variations to cover all of the contingencies.
In this unit we will illustrate the power analysis process using a simple four-group design.
Description of the Experiment
We wish to conduct a study in the area of mathematics education comparing different teaching methods for improving standardized math scores in local classrooms. The study will include four teaching methods and will use fourth-grade students who are randomly sampled from a large urban school district and then randomly assigned to the four methods.
Here are the four teaching methods that will be examined: 1) the traditional teaching method, in which the classroom teacher explains the concepts and assigns homework problems from the textbook; 2) the intensive practice method, in which students fill out additional worksheets both before and after school; 3) the computer-assisted method, in which students learn math concepts and skills from various computer-based math learning programs; and 4) the peer assistance learning method, which pairs each fourth grader with a fifth grader who helps them learn the concepts, after which the fourth grader teaches the same material to another student in their group.
Students will stay in their math learning groups for an entire academic year. At the end of the Spring semester all students will take the Multiple Math Proficiency Inventory (MMPI). This standardized test has a mean for fourth graders of 550 with a standard deviation of 80.
The experiment is designed so that each of the four groups will have the same sample size. One of the important questions we need to answer in designing the study is, how many students will be needed in each group?
The Power Analysis
In order to answer this question, we will need to make some assumptions and some educated guesses about the data. First, we will assume that the standard deviation in each of the four groups is the same and equal to the national value of 80. Further, we expect, because of prior research, that the traditional teaching group (Group 1) will have the lowest mean score and that the peer assistance group (Group 4) will have the highest mean score on the MMPI. In fact, we expect that Group 1 will have a mean of 550 and that Group 4 will have a mean that is greater by 1.2 standard deviations, i.e., a mean of at least 550 + 1.2*80 = 646. For the sake of simplicity, we will assume that the means of the other two groups are equal to the grand mean.
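The arithmetic behind these assumptions is small enough to check directly. The following Python sketch derives the Group 4 mean and the mean assigned to the two middle groups from the planning values stated above; the variable names are illustrative and are not part of the study or of any Stata program.

```python
# Planning values taken from the text above.
sd = 80.0          # assumed common within-group standard deviation
mean_low = 550.0   # Group 1 (traditional teaching)
delta = 1.2        # expected gap between highest and lowest mean, in SD units

# Group 4 sits delta standard deviations above Group 1.
mean_high = mean_low + delta * sd

# Groups 2 and 3 are placed at the grand mean. With equal group sizes this
# is simply the midpoint of the two extreme means.
mean_mid = (mean_low + mean_high) / 2

group_means = [mean_low, mean_mid, mean_mid, mean_high]
grand_mean = sum(group_means) / len(group_means)

print("group means:", group_means)
print("grand mean: ", grand_mean)
```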
We will use the Stata program fpower to do the power analysis (type search fpower to locate it; see How can I use the search command to search for programs and get additional help? for more information about using search). The fpower program needs the following information: 1) the number of levels (or groups), 2) the effect size (called delta), and 3) the alpha level. As stated above, there are four groups, so a = 4. We will set alpha = 0.05 and compute the effect size as delta = (largest_mean - smallest_mean)/standard_deviation. Hence, delta = (646 - 550)/80 = 1.2. The standard deviation used here is the pooled within-group standard deviation, i.e., the square root of the mean square error from the ANOVA table.
    fpower, a(4) delta(1.2) alpha(0.05)

    a = 4  b = 1  c = 1  r = 1  rho = 0
    delta = 1.2

      nobs      power
         2   .0906746
         3   .1438119
         4   .2013958
         5   .2614601
         6   .3224192
         7   .3829314
         8   .4419005
         9     .49847
        10   .5520059
        12   .6484047
        14   .7294912
        16    .795521
        18   .8478578
        20   .8884002
        25   .9512783
        30   .9800673
        35   .9922693
        40   .9971333
        45    .998977
        50   .9996469
       100          1
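The power values reported by fpower can be cross-checked outside Stata. The Python sketch below is not the fpower source. It assumes the noncentrality parameter is n*delta^2/2, which corresponds to two groups separated by delta standard deviations with the remaining groups at the grand mean, and it evaluates the noncentral F distribution with a hand-written incomplete beta function. All function names (betainc, f_cdf, f_critical, anova_power) are illustrative.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        even = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            c = 1.0 + coef / c
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, df1, df2, lam=0.0):
    """CDF of the F distribution; noncentral when lam > 0 (Poisson mixture)."""
    if f <= 0.0:
        return 0.0
    x = df1 * f / (df1 * f + df2)
    half = lam / 2.0
    weight = math.exp(-half)   # Poisson probability of j = 0
    total, j = 0.0, 0
    while True:
        total += weight * betainc(df1 / 2.0 + j, df2 / 2.0, x)
        j += 1
        weight *= half / j     # Poisson probability of the next j
        if j > half and weight < 1e-12:
            break
    return total

def f_critical(alpha, df1, df2):
    """Upper-alpha critical value of the central F distribution (bisection)."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def anova_power(n_per_group, delta, groups=4, alpha=0.05):
    """Power of the one-way ANOVA F test for a balanced design.

    Assumption: noncentrality = n * delta**2 / 2 (two extreme groups,
    all other groups at the grand mean).
    """
    df1 = groups - 1
    df2 = groups * (n_per_group - 1)
    lam = n_per_group * delta ** 2 / 2.0
    return 1.0 - f_cdf(f_critical(alpha, df1, df2), df1, df2, lam)

for n in (16, 17, 18):
    print(n, round(anova_power(n, 1.2), 4))
```

Under the stated assumption, the printed values for 16 and 18 students per group should land close to the corresponding fpower entries, with 17 falling in between.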
The table above shows that we can achieve a power of 0.8 with between 16 and 18 students per group. Let’s call it 17 students, just for the sake of argument. We can attempt to verify these numbers using the Monte Carlo simulation program simpower (type search simpower to locate it). The mean assigned to the other two groups is the grand mean, (550+646)/2 = 598.
    simpower, gr(4) n(17 17 17 17) mu(550 598 598 646) s(80 80 80 80)

    Sample Sizes, Means and Standard Deviations
    -------------------------------------------
    N1 = 17   MU1 = 550   S1 = 80
    N2 = 17   MU2 = 598   S2 = 80
    N3 = 17   MU3 = 598   S3 = 80
    N4 = 17   MU4 = 646   S4 = 80

    1000 simulated ANOVA F tests
    ------------------------------
    Alpha       Simulated
    Level       Power
    ------------------------------
    0.1000      0.8840
    0.0750      0.8510
    0.0500      0.8070
    0.0250      0.7300
    0.0100      0.5930
The Monte Carlo results from simpower are consistent with the results from the fpower program.
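For readers who want to see what such a simulation involves, here is a minimal Python sketch of the same idea. It is not the simpower source: it draws normal samples for each group, runs a one-way ANOVA F test on each replicate, and reports the share of replicates with a p-value below alpha. The function names and the seed are illustrative, and the estimate will vary slightly with the seed and the number of replicates.

```python
import math
import random

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            c = 1.0 + coef / c
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_pvalue(f, df1, df2):
    """Upper-tail p-value of the central F distribution."""
    x = df1 * f / (df1 * f + df2)
    return 1.0 - betainc(df1 / 2.0, df2 / 2.0, x)

def anova_f(samples):
    """One-way ANOVA F statistic and its degrees of freedom."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    ss_between = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    ss_within = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)
    df1, df2 = k - 1, n_total - k
    return (ss_between / df1) / (ss_within / df2), df1, df2

def simulate_power(ns, mus, sds, alpha=0.05, reps=1000, seed=12345):
    """Share of simulated data sets in which the ANOVA F test rejects."""
    rng = random.Random(seed)   # fixed seed so runs are repeatable
    rejections = 0
    for _ in range(reps):
        samples = [[rng.gauss(mu, sd) for _ in range(n)]
                   for n, mu, sd in zip(ns, mus, sds)]
        f, df1, df2 = anova_f(samples)
        if f_pvalue(f, df1, df2) < alpha:
            rejections += 1
    return rejections / reps

# Same scenario as the simpower run above: 17 per group, common SD of 80.
print(simulate_power([17] * 4, [550, 598, 598, 646], [80] * 4))
```

Because the standard deviations are passed per group, the same function can be reused for scenarios with unequal variances, where the closed-form calculation no longer applies directly.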
While 17 students per group sounds like a fine number of subjects if everything works out as planned, we should consider what would happen if things do not work out as planned. Let’s say that the treatment effect is not a large 1.2 but a more modest 0.75.
    fpower, a(4) delta(0.75) alpha(0.05)

    a = 4  b = 1  c = 1  r = 1  rho = 0
    delta = .75

      nobs      power
         2   .0654313
         3   .0840352
         4   .1035826
         5   .1239748
         6   .1451255
         7   .1669355
         8   .1893014
         9   .2121201
        10   .2352911
        12    .282309
        14   .3296447
        16   .3766765
        18    .422875
        20   .4678013
        25   .5724329
        30    .663641
        35   .7402725
        40   .8027472
        45   .8524114
        50   .8910493
       100   .9969381
Now it looks like we will need around 40 students per group to achieve a power of 0.8. Again, we will check these results against simpower. If delta = 0.75, then the higher mean is 0.75*80+550 = 610, and the grand mean is (550+610)/2 = 580.
    simpower, gr(4) n(40 40 40 40) mu(550 580 580 610) s(80 80 80 80)

    Sample Sizes, Means and Standard Deviations
    -------------------------------------------
    N1 = 40   MU1 = 550   S1 = 80
    N2 = 40   MU2 = 580   S2 = 80
    N3 = 40   MU3 = 580   S3 = 80
    N4 = 40   MU4 = 610   S4 = 80

    1000 simulated ANOVA F tests
    ------------------------------
    Alpha       Simulated
    Level       Power
    ------------------------------
    0.1000      0.8790
    0.0750      0.8540
    0.0500      0.8170
    0.0250      0.7290
    0.0100      0.6020
We will run an additional simpower in which we let the standard deviations increase along with the group means (not an uncommon occurrence).
    simpower, gr(4) n(40 40 40 40) mu(550 580 580 610) s(80 90 90 100)

    Sample Sizes, Means and Standard Deviations
    -------------------------------------------
    N1 = 40   MU1 = 550   S1 = 80
    N2 = 40   MU2 = 580   S2 = 90
    N3 = 40   MU3 = 580   S3 = 90
    N4 = 40   MU4 = 610   S4 = 100

    1000 simulated ANOVA F tests
    ------------------------------
    Alpha       Simulated
    Level       Power
    ------------------------------
    0.1000      0.7880
    0.0750      0.7550
    0.0500      0.6920
    0.0250      0.5740
    0.0100      0.4520
It now looks like 40 students per group is not quite enough. Let’s try it with 50 students.
    simpower, gr(4) n(50 50 50 50) mu(550 580 580 610) s(80 90 90 100)

    Sample Sizes, Means and Standard Deviations
    -------------------------------------------
    N1 = 50   MU1 = 550   S1 = 80
    N2 = 50   MU2 = 580   S2 = 90
    N3 = 50   MU3 = 580   S3 = 90
    N4 = 50   MU4 = 610   S4 = 100

    1000 simulated ANOVA F tests
    ------------------------------
    Alpha       Simulated
    Level       Power
    ------------------------------
    0.1000      0.8750
    0.0750      0.8450
    0.0500      0.7920
    0.0250      0.7160
    0.0100      0.5860
This is pretty close to a power of 0.8. The effect size of 0.75 is considered moderate. Finally, just to be safe, we should see what sample size would be needed if there were a small effect size of, say, 0.25.
    fpower, a(4) delta(0.25) alpha(0.05)

    a = 4  b = 1  c = 1  r = 1  rho = 0
    delta = .25

      nobs      power
         2   .0516819
         3     .05358
         4   .0554754
         5    .057375
         6   .0592837
         7   .0612038
         8   .0631365
         9   .0650824
        10   .0670419
        12   .0710018
        14   .0750164
        16    .079085
        18   .0832068
        20   .0873805
        25   .0980343
        30   .1089857
        35   .1202144
        40      .1317
        45   .1434223
        50   .1553612
       100   .2825522
A power of 0.8 is not even on the chart. Using simpower indicates that an N of about 380 per group is needed to obtain a power of 0.8 when the effect size is 0.25.
Here are the sample sizes per group that we have come up with in our power analysis: 17 (best case scenario), 40 (medium effect size), 50 (medium effect size with a fudge factor), and 380 (almost the worst case scenario). Even though we expect a large effect, we will shoot for a sample size of between 40 and 50 per group. For one thing, it is all that our research budget will allow, and the school district won’t allow us to use more than 200 students in total, which works out to 50 per group.
See Also
- Related Stata Commands
- sampsi — Sample size and power determination.
- References
- Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences, Second Edition. Mahwah, NJ: Lawrence Erlbaum Associates.