1.0 Stata commands in this unit
|anova||Analysis of variance|
|xi||Creates dummy variables during model estimation|
|predict||Predicts after model estimation|
|kdensity||Kernel density estimates and graphs|
|pnorm||Graphs a standardized normal plot|
|qnorm||Graphs a quantile plot|
|rvfplot||Graphs a residual versus fitted plot|
|test||Test linear hypotheses after model estimation|
|tabulate||Crosstabs with chi-square test|
|signtest||Tests the equality of matched pairs of data|
|signrank||Wilcoxon matched-pairs signed rank test|
|ranksum||Mann-Whitney two-sample test|
|kwallis||Nonparametric analog to the one-way anova|
2.0 Demonstration and explanation
use hs1, clear
2.1 chi-square test of frequencies
Here is the tabulate command for a crosstabulation with an option to compute chi-square test of independence and measures of association.
tabulate prgtype ses, all
Here is the command with an option to display expected frequencies so that one can check for cells with very small expected values.
tabulate prgtype ses, all expected
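The test statistic and p-value from tabulate remain available as stored results after the command runs. The following is a minimal sketch, assuming hs1.dta is in the working directory as in the demonstration above; it requests only the Pearson chi-square rather than all measures.

```stata
* Assumes hs1.dta is in the current working directory
use hs1, clear

* Request only the Pearson chi-square test of independence
tabulate prgtype ses, chi2

* tabulate leaves its results in r(); show the statistic and p-value
display "chi2 = " r(chi2) "   p = " r(p)
display "table is " r(r) " x " r(c) " with N = " r(N)
```

Typing return list after tabulate shows everything that was stored.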
2.2 t-tests
This is the one-sample t-test, testing whether the sample of writing scores was drawn from a population with a mean of 50.
ttest write = 50
This is the paired t-test, testing whether or not the mean of write equals the mean of read.
ttest write = read
This is the two-sample independent t-test with pooled (equal) variances.
ttest write, by(female)
This is the two-sample independent t-test with separate (unequal) variances.
ttest write, by(female) unequal
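One way to choose between the pooled and unequal-variance versions is to compare the group variances first. The sketch below uses sdtest, Stata's variance-ratio test, and assumes hs1.dta is in the working directory; it is an illustration rather than a required step.

```stata
* Assumes hs1.dta is in the current working directory
use hs1, clear

* Variance-ratio (F) test of equal variances of write across female
sdtest write, by(female)

* The two-sided p-value is stored in r(p)
display "two-sided p-value for equal variances = " r(p)
```

A small p-value here suggests that the unequal option is the safer choice.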
2.3 Analysis of Variance
The anova command, unsurprisingly, performs analysis of variance (ANOVA). Here is an example of a one-way analysis of variance.
anova write prog
In this example the anova command is used to perform a two-way factorial analysis of variance (ANOVA).
anova write prog female prog*female
Here is an example of an analysis of covariance (ANCOVA) using the anova command.
anova write prog female prog*female read, continuous(read)
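The anova commands above use the older syntax, with * for interactions and the continuous() option. In Stata 11 and later the same models are normally written with factor-variable notation. A sketch, assuming Stata 11 or newer and hs1.dta in the working directory:

```stata
* Assumes Stata 11 or later and hs1.dta in the current working directory
use hs1, clear

* One-way ANOVA (syntax unchanged)
anova write prog

* Two-way factorial: ## expands to both main effects plus the interaction
anova write prog##female

* ANCOVA: the c. prefix marks read as a continuous covariate
anova write prog##female c.read
```

On newer versions, the older syntax can still be run under version control, for example by prefixing a command with version 10:.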
2.4 Regression
Plain vanilla OLS linear regression.
regress write read female
In the example below, we run the regression with robust standard errors. This is very useful when there is heterogeneity of variance. This option does not affect the estimates of the regression coefficients.
regress write read female, robust
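To see that the robust option changes only the standard errors, the two fits can be stored and displayed side by side. This is a minimal sketch; the names ols and rob are arbitrary labels chosen for this example, and hs1.dta is assumed to be in the working directory.

```stata
* Assumes hs1.dta is in the current working directory
use hs1, clear

* Fit the ordinary model and store it under the (arbitrary) name ols
quietly regress write read female
estimates store ols

* Fit again with robust standard errors and store it as rob
quietly regress write read female, robust
estimates store rob

* Coefficients match across columns; only the standard errors differ
estimates table ols rob, b(%9.4f) se(%9.4f)
```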
The predict command calculates predictions, residuals, influence statistics, and the like after an estimation command. The default shown here is to calculate the predicted scores.
predict p
When using the resid option the predict command calculates the residual.
predict r, resid
The list command displays the values of the variables that we have generated. The in 1/20 qualifier restricts the display to the first 20 observations.
list math p r in 1/20
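The relationship between the two predict calls can be checked directly: a residual is the observed outcome minus the predicted score. The sketch below refits the model so that it runs on its own; p_chk and r_chk are hypothetical variable names chosen for this example.

```stata
* Assumes hs1.dta is in the current working directory
use hs1, clear
regress write read female

* Store predictions and residuals in double precision
* (p_chk and r_chk are hypothetical names used only in this sketch)
predict double p_chk
predict double r_chk, resid

* The residual equals the observed value minus the fitted value
assert abs(r_chk - (write - p_chk)) < 1e-8 if e(sample)
```

The assert command is silent when the condition holds for every observation in the estimation sample.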
The kdensity command with the normal option displays a density graph of the residuals with a normal distribution superimposed on the graph. This is particularly useful in checking whether the residuals are normally distributed, which is a very important assumption for regression.
kdensity r, normal
The pnorm command produces a normal probability plot. It is another method of checking whether the residuals from the regression are normally distributed.
pnorm r
The qnorm command produces a normal quantile plot. It is yet another method for checking whether the residuals are normally distributed. The qnorm plot is more sensitive to deviations from normality in the tails of the distribution, whereas the pnorm plot is more sensitive to deviations near the mean of the distribution.
qnorm r
rvfplot is a convenience command that generates a plot of the residuals versus the fitted values; it is used after regress or anova.
rvfplot
Creating dummy variables by using the xi command
The xi prefix is used to dummy code categorical variables such as prog. The predictor prog has three levels and requires two dummy-coded variables. The test command is used to test the collective effect of the two dummy-coded variables; in other words, it tests the main effect of prog.
xi: regress write read i.prog
describe _I*
test _Iprog_2 _Iprog_3
The xi prefix can also be used to create dummy variables for prog and for the interaction of prog and read. The first test command tests the overall interaction and the second test command tests the main effect of prog.
xi: regress write i.prog*read
describe _I*
test _IproXread_2 _IproXread_3
test _Iprog_2 _Iprog_3
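In Stata 11 and later, factor-variable notation can produce the same models without the xi prefix and without creating _I* variables. A sketch under that assumption, with testparm standing in for the explicit test commands:

```stata
* Assumes Stata 11 or later and hs1.dta in the current working directory
use hs1, clear

* Main-effects model: i.prog is expanded into indicators on the fly
regress write read i.prog
testparm i.prog

* Model with the prog-by-read interaction (c. marks read as continuous)
regress write i.prog##c.read

* Joint test of the interaction terms, then of the prog indicators
testparm i.prog#c.read
testparm i.prog
```

Each testparm call here is a joint test with two degrees of freedom, because prog has three levels.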
2.5 Logistic regression
In order to demonstrate the logistic regression commands, we will create a dichotomous variable called honcomp (honors composition) to use as our dependent variable. This is for illustrative purposes only!
gen honcomp = write >= 60
tab honcomp
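One caution when creating an indicator this way: Stata treats a missing value as larger than any number, so an observation with a missing write score would be coded 1. A more defensive sketch, assuming hs1.dta is in the working directory, leaves such observations missing instead.

```stata
* Assumes hs1.dta is in the current working directory
use hs1, clear

* Code honcomp only where write is observed; otherwise leave it missing
generate honcomp = (write >= 60) if !missing(write)

* The missing option shows any uncoded observations in the table
tabulate honcomp, missing
```

If write has no missing values, this gives the same variable as the shorter command.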
The logistic command defaults to producing the output in odds ratios but can display the coefficients if the coef option is used. The exact same results can be obtained by using the logit command, which produces coefficients as the default but will display the odds ratio if the or option is used.
logit honcomp read female
logit, or
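The link between the two displays is that an odds ratio is the exponentiated coefficient. The sketch below recreates honcomp so that it runs on its own and assumes hs1.dta is in the working directory.

```stata
* Assumes hs1.dta is in the current working directory
use hs1, clear
generate honcomp = (write >= 60) if !missing(write)

* Fit on the log-odds (coefficient) scale
logit honcomp read female

* Exponentiating a coefficient gives the corresponding odds ratio
display "odds ratio for read   = " exp(_b[read])
display "odds ratio for female = " exp(_b[female])

* logistic reports the same model on the odds-ratio scale
logistic honcomp read female
```

The values printed by display should match the odds ratios in the logistic output.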
2.6 Non-Parametric Tests
The signtest command is the nonparametric analog of the single-sample t-test.
signtest write = 50
The signrank command computes a Wilcoxon signed-rank test, the nonparametric analog of the paired t-test.
signrank write = read
The ranksum test is the nonparametric analog of the independent two-sample t-test and is known as the Mann-Whitney or Wilcoxon rank-sum test.
ranksum write, by(female)
The kwallis command computes a Kruskal-Wallis test, the non-parametric analog of the one-way ANOVA.
kwallis write, by(prog)
3.0 For more information
with Stata 10
- Chapters 5, 6, 7, 9, 10
- Stata Web Books
- Regression with Stata Webbook: Includes such topics as diagnostics, categorical predictors, testing interactions, and testing contrasts
- Regression Models For Categorical Dependent Variables, Second Edition by Long and Freese: Shows how to optimize Stata’s capabilities for analyzing logistic regression
- Frequently Asked Questions: Covers many topics, including ANOVA and linear regression