* Encoding: UTF-8.
get file='D:\data\Seminars\nhanes2f.sav'.

show license.

* looking at the sampling weight variable.
desc var = finalwgt.
graph
    /histogram = finalwgt.
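
* a possible extra check, not part of the original seminar code.
* the sum of the sampling weights is an estimate of the size of the target population.
descriptives variables = finalwgt
  /statistics = mean sum min max.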

csplan analysis
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /planvars analysisweight = finalwgt
  /srsestimator type = wor   
  /print plan
  /design strata = stratid cluster = psuid 
  /estimator type = wr.

* descriptives for continuous variables.
* csdescriptives requires at least one of the mean, sum or ratio subcommands.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = age
  /mean
  /statistics se count popsize cin(95).

* notice that copper has missing values.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = copper 
  /mean
  /statistics se count popsize cin(95).

* hct = hematocrit.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = hct
  /mean
  /statistics se count popsize cin(95).

* notice how missing data are handled.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = age copper hct
  /mean
  /statistics se count popsize cin(95).

* the default is scope = analysis.
* these results are the same as above.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = age copper hct
  /mean
  /statistics se count popsize cin(95)
  /missing scope = analysis.

* using listwise deletion.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = age copper hct
  /mean
  /statistics se count popsize cin(95)
  /missing scope = listwise.
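
* a possible extra check, not part of the original seminar code.
* the unweighted descriptives command shows the n for each variable and the valid n (listwise).
* comparing them shows how many cases listwise deletion drops.
descriptives variables = age copper hct.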

* getting the design effect.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = age copper hct
  /mean
  /statistics se count popsize cin(95) deff.
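
* a possible variation, not part of the original seminar code.
* cv requests the coefficient of variation and deffsqrt the square root of the design effect.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = age copper hct
  /mean
  /statistics se cv deff deffsqrt.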

* getting a sum.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = female
  /sum
  /statistics se count popsize cin(95).

* getting a ratio.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = copper hct
  /ratio numerator = copper denominator = hct
  /statistics se count popsize cin(95).

* question about ratio of females to males.
* create new variable called male.
recode female (0 = 1) (1 = 0) into male.
* check to see that the new variable is correct.
crosstabs
    /tables = female by male.

* ratio of females to males.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = female male
  /ratio numerator = female denominator = male
  /statistics se count popsize cin(95).

* t-test comparing mean to a user-specified value.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = hct
  /mean ttest = 40
  /statistics se count popsize cin(95).

* descriptives with binary variables.
* the mean of a binary variable is the proportion of 1s.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = female
  /mean
  /statistics se count popsize cin(95).

* descriptives for categorical variables.
cstabulate
    /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
    /tables variables = female.
* defaults:
    /cells popsize
    /statistics se.

* adding subcommands.
cstabulate
    /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
    /tables variables = female
    /cells popsize tablepct
    /statistics count.

* homogeneity available only for one-way tables.
cstabulate
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /tables variables = female
      /cells popsize tablepct
      /statistics se cin(95) count
      /test homogeneity.

* crosstab.
cstabulate
    /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
    /tables variables = female by rural.

* adding options.
* options added to a subcommand override that subcommand's defaults.
* notice that popsize is not in this table.
cstabulate
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /tables variables = female by rural
      /cells rowpct colpct tablepct
      /statistics se cin(95) count.

* another way to make two-way tables.
cstabulate
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /tables variables = female health by rural
      /cells popsize tablepct
      /statistics se cin(95) count.

* expected only works with a two-way table.
* chi-square.
cstabulate
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /tables variables = female by rural
      /cells popsize tablepct
      /statistics count expected
      /test independence.
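
* a possible variation, not part of the original seminar code.
* for a two-by-two table, the statistics subcommand also accepts oddsratio, relrisk and riskdiff.
* this assumes that female and rural each have exactly two categories.
cstabulate
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /tables variables = female by rural
      /cells popsize tablepct
      /statistics oddsratio relrisk riskdiff.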

* subpops.
* remember this analysis.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = age
  /mean.

* different ways to organize the output.
* layered.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = age
  /subpop table = female display = layered
  /mean.

* separate.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = age
  /subpop table = female display = separate
  /mean.

* two variables listed on the subpop subcommand.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = copper
  /subpop table = health by female display = layered
  /mean.

* three variables listed on subpop subcommand.
* a total of 17 variables can be listed.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = copper
  /subpop table = female by race by region display = layered
  /mean
  /statistics count popsize.

* there is a subpop subcommand for cstabulate.
cstabulate
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /tables variables = female by region
      /cells popsize tablepct
      /statistics se cin(95) count expected
      /subpop table = rural.

* notice the warning; this is very unfortunate.
* can't get a chi-square test with a subpop.
cstabulate
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /tables variables = female by region
      /cells popsize tablepct
      /statistics count expected
      /test independence
      /subpop table = rural.

* getting a sum with a subpop.
csdescriptives
  /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
  /summary variables = diabetes
  /subpop table = female display = layered
  /sum
  /statistics count popsize.

* remember that the highest numbered category is the reference category.
csglm copper by female
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female
      /statistics parameter cinterval ttest
      /test type = adjf.

* t-test with regression.
* using the emmeans subcommand.
csglm copper by female
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female
      /statistics parameter cinterval ttest
      /test type = adjf
      /emmeans tables = female compare = female.
* the squared t statistic equals the F statistic: -24.774*-24.774 = 613.753 with some rounding error.

* one of the most useful subcommands is the emmeans subcommand.
* using the contrast keyword with the deviation option (each level compared to the grand mean).
csglm copper by female
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female
      /statistics parameter cinterval ttest
      /test type = adjf
      /emmeans tables = female compare = female contrast = deviation.

* another way to specify emmeans.
csglm copper by female diabetes with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female diabetes age female*diabetes
      /statistics parameter cinterval ttest
      /test type = adjf
      /emmeans tables = female by diabetes compare = diabetes.

* multiple emmeans subcommands.
csglm copper by female diabetes
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female diabetes
      /statistics parameter cinterval ttest
      /test type = adjf
      /emmeans tables = female
      /emmeans tables = diabetes.

* getting emmeans at specific values of a covariate with the other keyword.
csglm bpsystol by female with height
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female female*height
      /statistics parameter cinterval ttest
      /test type = adjf
      /emmeans tables = female other = [height (150)]      
      /emmeans tables = female other = [height (160)].

* including an interaction with an asterisk.
csglm copper by female diabetes with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female diabetes age female*diabetes
      /statistics parameter cinterval ttest
      /test type = adjf
      /emmeans tables = female by diabetes.

* including an interaction with the keyword by.
csglm copper by female diabetes with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female diabetes age female by diabetes
      /statistics parameter cinterval ttest
      /test type = adjf
      /emmeans tables = female by diabetes.

* making a squared term - can only be done with continuous variables.
csglm copper by female diabetes with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female diabetes age age*age
      /statistics parameter ttest cinterval
      /test type = adjf.

* using a domain subcommand - the value in () is necessary.
csglm bpsystol by female with height
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model female female*height
      /statistics parameter cinterval ttest
      /test type = adjf
      /domain variable = region (1).

* using a custom subcommand.
* the first row on the lmatrix tests the difference between levels 1 and 3 for race.
* the second row on the lmatrix tests the difference between levels 2 and 3 for race.
csglm bpsystol by race smsa
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model race smsa race*smsa
      /statistics parameter cinterval ttest
      /test type = adjf
      /custom label = "race"
        lmatrix = race 1 0 -1
                  race*smsa  1/3  1/3  1/3
                             0    0    0
                            -1/3 -1/3 -1/3;
                  race 0 1 -1
                  race*smsa  0    0    0
                             1/3  1/3  1/3
                            -1/3 -1/3 -1/3.

* using a custom subcommand.
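* both rows of the lmatrix compare race 1 v 2: the first at level 1 of smsa, the second at level 2 of smsa.
* the kmatrix sets the hypothesized values of the two contrasts to 1 and -1 instead of the default of 0.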
csglm bpsystol by race smsa
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model race smsa race*smsa
      /statistics parameter cinterval ttest
      /test type = adjf
      /custom label = "race"
        lmatrix = race 1 -1 0
                  race*smsa  1  0  0
                            -1  0  0
                             0  0  0;
                  race 1 -1 0
                  race*smsa  0  1  0
                             0 -1  0
                             0  0  0
        kmatrix = 1; -1.

* logistic regression.
* looking at distribution of outcome variable.
cstabulate
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /tables variables = highbp
      /cells popsize tablepct
      /statistics se cin(95) count.

* looking at outcome variable crossed with categorical predictor.
cstabulate
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /tables variables = health by highbp
      /cells popsize tablepct
      /statistics se count.

* logistic with continuous and categorical predictors.
cslogistic highbp(low) by health with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model health age
      /statistics parameter exp cinterval ttest
      /test type = adjf.

* if no model subcommand is used, SPSS will run a model with only main effects.
cslogistic highbp(low) by health with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /statistics parameter exp cinterval ttest
      /test type = adjf.

* logistic with interaction with asterisk.
cslogistic highbp(low) by health with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model health age health*age
      /statistics parameter exp cinterval ttest
      /test type = adjf.

* logistic with interaction with keyword by.
cslogistic highbp(low) by health with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model health age health by age
      /statistics parameter exp cinterval ttest
      /test type = adjf.

* logistic with the oddsratios subcommand.
cslogistic highbp(low) by health female with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model health female health*female age
      /statistics parameter exp cinterval ttest
      /test type = adjf
      /oddsratios factor = [female] control=[health(3)]
      /oddsratios factor = [health] control=[female(1)]
      /oddsratios covariate = [age(30 50 70)].

* using a domain subcommand to get urban.
cslogistic highbp(low) by health with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model health age health by age
      /statistics parameter exp cinterval ttest
      /test type = adjf
      /domain variable = rural (0).

* cslogistic will work with a multi-level DV.
* this is a multinomial logistic regression.
cslogistic health(low) by highbp with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model highbp age
      /statistics parameter exp cinterval ttest
      /test type = adjf.

* ordered logistic.
* this is a logistic regression model using the csordinal command.
* getting threshold instead of intercept.
csordinal highbp(descending) by health with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model health age health*age
      /statistics parameter exp cinterval ttest
      /test type = adjf.

* default output for csordinal.
* not so helpful without the coefficients.
csordinal health(descending) by female region with age
       /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'.

* getting the parameters and the adjusted tests.
* notice that no model subcommand is used.
csordinal health(ascending) by highbp with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /statistics parameter exp cinterval ttest
      /test type = adjf.

* using a domain subcommand to get rural.
csordinal health(ascending) by highbp with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model highbp age highbp*age
      /statistics parameter exp cinterval ttest
      /test type = adjf
      /domain variable = rural (1).

* generalized ordinal logistic regression.
csordinal health(ascending) by female region with age
       /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
       /model female region age
       /statistics parameter exp cinterval ttest
       /nonparallel test parameter.   

* generalized ordinal logistic regression with a domain subcommand.
csordinal health(ascending) by diabetes with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model diabetes age
      /statistics parameter exp cinterval ttest
      /test type = adjf
      /domain variable = rural (1)
      /nonparallel test parameter.

* probit.
csordinal highbp(descending) with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model age
      /statistics parameter cinterval ttest
      /link function = probit
      /test type = adjf.
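
* a possible variation, not part of the original seminar code.
* the link subcommand also accepts other functions, such as cloglog; logit is the default.
csordinal highbp(descending) with age
      /plan file = 'D:\data\Seminars\nhanes2f_plan.csaplan'
      /model age
      /statistics parameter cinterval ttest
      /link function = cloglog
      /test type = adjf.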