/* Suppose that a junior high school has a total of 4,000 students in grades
   7, 8, and 9. You want to know how household income and the number of
   children in a household affect students' average weekly spending on ice
   cream. To answer this question, you draw a simple random sample from the
   school's student population: you randomly select 40 students and ask them
   their average weekly expenditure on ice cream, their household income, and
   the number of children in their household. The answers from the 40
   students are saved as a SAS data set. */;
data IceCream;
input Grade Spending Income Kids @@;
datalines;
7 7 39 2 7 7 38 1 8 12 47 1
9 10 47 4 7 1 34 4 7 10 43 2
7 3 44 4 8 20 60 3 8 19 57 4
7 2 35 2 7 2 36 1 9 15 51 1
8 16 53 1 7 6 37 4 7 6 41 2
7 6 39 2 9 15 50 4 8 17 57 3
8 14 46 2 9 8 41 2 9 8 41 1
9 7 47 3 7 3 39 3 7 12 50 2
7 4 43 4 9 14 46 3 8 18 58 4
9 9 44 3 7 2 37 1 7 1 37 2
7 4 44 2 7 11 42 2 9 8 41 2
8 10 42 2 8 13 46 1 7 2 40 3
9 6 45 1 9 11 45 4 7 2 36 1
7 9 46 1
;
run;
/* In the data set IceCream, the variable Grade indicates a student's grade.
   The variable Spending contains the dollar amount of each student's average
   weekly spending on ice cream. The variable Income specifies the household
   income, in thousands of dollars. The variable Kids indicates how many
   children are in a student's family. */;
/* First, fit an ordinary least squares (OLS) regression, which does not
   account for the sampling scheme, using dummy coding for Kids. */;
title1 'Ice Cream Spending Analysis';
title2 'OLS Regression estimates';
data reg;
   set IceCream;
   /* Dummy coding with Kids=4 as the reference level */
   if kids=1 then do; k1=1; k2=0; k3=0; end;
   else if kids=2 then do; k1=0; k2=1; k3=0; end;
   else if kids=3 then do; k1=0; k2=0; k3=1; end;
   else if kids=4 then do; k1=0; k2=0; k3=0; end;
run;
proc reg data=reg;
   /* k1-k3 is a numbered range list, so it does not depend on variable order */
   model Spending = Income k1-k3;
run;
quit;
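/* For comparison, a minimal sketch that is not part of the original example:
   PROC GLM with a CLASS statement builds the indicator variables itself and,
   by default, treats the last level (Kids=4) as the reference level. Under
   that assumption it should reproduce the hand-coded estimates above. */
proc glm data=IceCream;
   class Kids;
   model Spending = Income Kids / solution;
run;
quit;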
/* Now analyze the same data, assuming a simple random sample (SRS) drawn
   from a population of 4,000 students. */;
title2 'Simple Random Sampling Design';
proc surveyreg data=IceCream total=4000;
class Kids;
model Spending = Income Kids / solution;
run;
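/* A hedged alternative sketch, not part of the original example: instead of
   the population total, PROC SURVEYREG can be given the sampling rate through
   the RATE= option. Here the rate is 40/4000 = 0.01, so this step is expected
   to give the same finite population correction as TOTAL=4000. */
proc surveyreg data=IceCream rate=0.01;
   class Kids;
   model Spending = Income Kids / solution;
run;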
/* Now suppose that the student sample was actually drawn by stratified
   sampling (STRS). The strata are the grades in the junior high school:
   the 7th grade, the 8th grade, and the 9th grade. Within each stratum, a
   simple random sample is selected. The StudentTotal data set provides the
   number of students in each grade. */;
data StudentTotal;
input Grade _TOTAL_;
datalines;
7 1824
8 1025
9 1151
;
run;
/* The variable Grade is the stratification variable, and the variable
   _TOTAL_ contains the total number of students in each stratum of the
   survey population. PROC SURVEYREG requires you to use the keyword _TOTAL_
   as the name of the variable that contains the population total
   information. The following statements demonstrate how you can fit the
   linear model while incorporating the sample design information
   (stratification). */;
title2 'Stratified Simple Random Sampling Design';
proc surveyreg data=IceCream total=StudentTotal;
strata Grade /list;
class Kids;
model Spending = Income Kids / solution;
run;
/* Compared with the statements used for the simple random sampling design
   above, the TOTAL=StudentTotal option replaces the previous TOTAL=4000
   option. When the population totals and sample sizes differ among strata,
   the population totals must be provided in a data set. The STRATA statement
   specifies the stratification variable Grade. The LIST option in the STRATA
   statement requests that the stratification information be included in the
   output. */;
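/* A minimal sketch, not part of the original example, of how the per-stratum
   sampling rates reported by the LIST option could be checked by hand. The
   data set names GradeCounts and GradeRates and the variable SamplingRate are
   hypothetical. Both inputs are assumed to be sorted by Grade, which holds
   for the PROC FREQ output and for StudentTotal as entered above. */
proc freq data=IceCream noprint;
   tables Grade / out=GradeCounts(keep=Grade Count);
run;
data GradeRates;
   merge GradeCounts StudentTotal;
   by Grade;
   SamplingRate = Count / _TOTAL_;   /* sample size / stratum population */
run;
proc print data=GradeRates noobs;
run;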
---------------------------------------------
Ice Cream Spending Analysis          11:19 Wednesday, July 19, 2000   1
OLS Regression estimates
The REG Procedure
Model: MODEL1
Dependent Variable: Spending
Analysis of Variance
                            Sum of         Mean
Source             DF      Squares       Square    F Value    Pr > F
Model               4    915.30965    228.82741      38.10    <.0001
Error              35    210.19035      6.00544
Corrected Total    39   1125.50000

Root MSE           2.45060    R-Square    0.8132
Dependent Mean     8.75000    Adj R-Sq    0.7919
Coeff Var         28.00685
Parameter Estimates
                   Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1    -26.08468     3.07591      -8.48      <.0001
Income        1      0.77533     0.06431      12.06      <.0001
k1            1      0.89765     1.11649       0.80      0.4268
k2            1      1.49403     1.10259       1.36      0.1841
k3            1     -0.51318     1.23855      -0.41      0.6812
Simple Random Sampling Design
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Spending
Data Summary
Number of Observations 40
Mean of Spending 8.75000
Sum of Spending 350.00000
Fit Statistics
R-square 0.8132
Root MSE 2.4506
Denominator DF 39
Class Level Information
Class
Variable    Levels    Values
Kids             4    1 2 3 4
ANOVA for Dependent Variable Spending
                            Sum of         Mean
Source             DF      Squares       Square    F Value    Pr > F
Model               4      915.310     228.8274      38.10    <.0001
Error              35      210.190       6.0054
Corrected Total    39     1125.500
Tests of Model Effects
Effect       Num DF    F Value    Pr > F
Model             4     119.15    <.0001
Intercept         1     153.32    <.0001
Income            1     324.45    <.0001
Kids              3       0.92    0.4385
NOTE: The denominator degrees of freedom for the F tests is 39.
Estimated Regression Coefficients
                             Standard
Parameter      Estimate         Error    t Value    Pr > |t|
Intercept    -26.084677    2.46720403     -10.57      <.0001
Income         0.775330    0.04304415      18.01      <.0001
Kids 1         0.897655    1.12352876       0.80      0.4292
Kids 2         1.494032    1.24705263       1.20      0.2381
Kids 3        -0.513181    1.33454891      -0.38      0.7027
Kids 4         0.000000    0.00000000          .           .
NOTE: The denominator degrees of freedom for the t tests is 39.
Matrix X'X is singular and a generalized inverse was used to solve the
normal equations. Estimates are not unique.
Stratified Simple Random Sampling Design
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Spending
Data Summary
Number of Observations 40
Mean of Spending 8.75000
Sum of Spending 350.00000
Design Summary
Number of Strata 3
Fit Statistics
R-square 0.8132
Root MSE 2.4506
Denominator DF 37
Stratum Information
Stratum                      Population    Sampling
  Index    Grade    N Obs         Total        Rate
      1        7       20          1824        0.01
      2        8        9          1025        0.01
      3        9       11          1151        0.01
Class Level Information
Class
Variable    Levels    Values
Kids             4    1 2 3 4
ANOVA for Dependent Variable Spending
                            Sum of         Mean
Source             DF      Squares       Square    F Value    Pr > F
Model               4      915.310     228.8274      38.10    <.0001
Error              35      210.190       6.0054
Corrected Total    39     1125.500
Tests of Model Effects
Effect       Num DF    F Value    Pr > F
Model             4     114.60    <.0001
Intercept         1     150.05    <.0001
Income            1     317.63    <.0001
Kids              3       0.93    0.4355
NOTE: The denominator degrees of freedom for the F tests is 37.
Estimated Regression Coefficients
                             Standard
Parameter      Estimate         Error    t Value    Pr > |t|
Intercept    -26.084677    2.48241893     -10.51      <.0001
Income         0.775330    0.04350401      17.82      <.0001
Kids 1         0.897655    1.11778377       0.80      0.4271
Kids 2         1.494032    1.25209199       1.19      0.2404
Kids 3        -0.513181    1.36853454      -0.37      0.7098
Kids 4         0.000000    0.00000000          .           .
NOTE: The denominator degrees of freedom for the t tests is 37.
Matrix X'X is singular and a generalized inverse was used to solve the
normal equations. Estimates are not unique.
