*graphs of calendar time and study time.
clear
input subj tp censored str11 datestr
1 1 0 "1 jan 1990"
1 2 0 "1 mar 1991"
2 1 1 "1 feb 1990"
2 2 1 "1 feb 1991"
3 1 1 "1 jun 1990"
3 2 1 "31 dec 1991"
4 1 0 "1 sep 1990"
4 2 0 "1 apr 1991"
end
gen date = date(datestr, "dmy")
format date %dmy
hilite subj date, c(L) hilite(censored==0) s(o x) ylab(1 2 3 4)
*study time in months (30.5 days per month) since each subject's first time point
gen time =0 if tp==1
replace time= (date-date[_n-1])/30.5 if tp==2
hilite subj time, c(L) hilite(censored==0) s(o x) ylab(1 2 3 4) xlab(0 8 12 14 19 24)

*reading in the small version of the uis data
use c:\uis.dta, clear
gen id = ID
drop ID
stset time, failure(censor)

*graph of cumulative hazard curve
sts graph, na noborder

*looking at the first 10 observations of the data
list id time censor age ndrugtx treat site herco in 1/10, nodisplay

*logrank test for treat
sts test treat, logrank
*Kaplan-Meier curves for levels of treat
sts graph, by(treat) noborder

*logrank test for site
sts test site, logrank
*Kaplan-Meier curves for levels of site
sts graph, by(site) noborder

*logrank test for herco
sts test herco
*Kaplan-Meier curves for levels of herco
sts graph, by(herco) noborder

*Cox regression for univariate analysis of ndrugtx and age
stcox ndrugtx, nohr
stcox age, nohr

*first big Cox regression model
xi: stcox age ndrugtx treat site i.herco, nohr
test _Iherco_2 _Iherco_3
drop _Iherco_2 _Iherco_3

*eliminating herco
stcox age ndrugtx treat site, nohr

*testing interactions
*generate all the interaction variables
gen age_drug = age*ndrugtx
gen age_treat = age*treat
gen age_site = age*site
gen drug_treat = ndrugtx*treat
gen drug_site = ndrugtx*site
gen treat_site = treat*site
stcox age ndrugtx treat site age_drug, nohr
stcox age ndrugtx treat site drug_treat, nohr
stcox age ndrugtx treat site drug_site, nohr
stcox age ndrugtx treat site age_treat, nohr
stcox age ndrugtx treat site age_site, nohr
stcox age ndrugtx treat site treat_site, nohr

*final model and checking that including the interaction provides a better-fitting model
stcox age ndrugtx treat site age_site, nohr
lrtest, saving(0)
stcox age ndrugtx treat site, nohr
lrtest, using(0)
*lrtest is significant, so the bigger model does fit the data better
*and that will be the final model

*final model for interpretation of hazard ratios
stcox age ndrugtx treat site age_site

*proportionality assumption
*testing proportionality using time-dependent interactions of predictors.
stcox age ndrugtx treat site age_site, nohr tvc(age ndrugtx treat site) texp(ln(_t))

*testing the proportionality assumption using the Schoenfeld and
*scaled Schoenfeld residuals
*H0: proportionality holds
quietly stcox age ndrugtx treat site age_site, schoenfeld(sch*) scaledsch(sca*)
stphtest, detail
stphtest, plot(age)
stphtest, plot(ndrugtx)
stphtest, plot(treat)
stphtest, plot(site)
stphtest, plot(age_site)
drop sch1-sch5 sca1-sca5
*only treat appears to be a potential problem, but it is not enough to warrant changing the model

*testing proportionality assumptions using log-log plots
*parallel curves indicate proportionality is upheld
stphplot, by(treat)
stphplot, by(site)

*solution to non-proportionality: stratifying
sort treat
by treat: stcox age ndrugtx site age_site, nohr

*graphing survival curves from Cox regression models
stcox age ndrugtx treat site age_site, nohr basesurv(surv0)
*surv1: age=30, ndrugtx=5, treat=1, site=0
gen surv1 = surv0^exp( (-0.0336943*30+0.0364537*5 - 0.2674113*1))
graph surv1 _t, s(.) c(J) sort ylab(0 .1 to 1) xlab(0 200 to 1200)
*surv2: age=30, ndrugtx=5, treat=0, site=0
gen surv2 = surv0^exp( (-0.0336943*30+0.0364537*5))
label variable surv1 "long treatment"
label variable surv2 "short treatment"
graph surv1 surv2 _t, s(..) c(JJ) sort ylab(0 .1 to 1) xlab(0 200 to 1200)
drop surv0-surv2

*Model Fit
*assessing goodness of fit by looking at the Cox-Snell residuals
*ideal is for cumulative hazard of Cox-Snell residuals to follow 45 degree line
*i.e. have an exponential distribution with a hazard rate of 1
quietly stcox age ndrugtx treat site age_site, nohr mgale(mg)
predict cs, csnell
stset cs, failure(censor)
sts gen H = na
graph H cs cs, c(ll) s(o.) sort xlab(0 1 to 4) ylab(0 1 to 4)
drop mg cs H
*Model fits well except at large values of time, which is common with censored data.