Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence by Judith D. Singer and John B. Willett Chapter 6: Modeling Discontinuous and Nonlinear Change
Reading in the wages data set.
wages <- read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/wages_pp.txt", header=T, sep=","
Table 6.1, p. 192. Creating the ged*exper variable in the wages data set.
wages$ged.exper <- wages$ged*wages$exper
print(wages[wages$id %in% c(206,2365,4384),c(1:5, 16)])
id lnw exper ged postexp ged.exper
75 206 2.028 1.874 0 0.000 0.000
76 206 2.297 2.814 0 0.000 0.000
77 206 2.482 4.314 0 0.000 0.000
1264 2365 1.782 0.660 0 0.000 0.000
1265 2365 1.763 1.679 0 0.000 0.000
1266 2365 1.710 2.737 0 0.000 0.000
1267 2365 1.736 3.679 0 0.000 0.000
1268 2365 2.192 4.679 1 0.000 4.679
1269 2365 2.042 5.718 1 1.038 5.718
1270 2365 2.320 6.718 1 2.038 6.718
1271 2365 2.665 7.872 1 3.192 7.872
1272 2365 2.418 9.083 1 4.404 9.083
1273 2365 2.389 10.045 1 5.365 10.045
1274 2365 2.485 11.122 1 6.442 11.122
1275 2365 2.445 12.045 1 7.365 12.045
2206 4384 2.859 0.096 0 0.000 0.000
2207 4384 1.532 1.039 0 0.000 0.000
2208 4384 1.590 1.726 1 0.000 1.726
2209 4384 1.969 3.128 1 1.402 3.128
2210 4384 1.684 4.282 1 2.556 4.282
2211 4384 2.625 5.724 1 3.998 5.724
2212 4384 2.583 6.024 1 4.298 6.024
Table 6.2, p. 203.
model.a <- lme(lnw~exper+hgc.9+exper:black+ue.7, wages, random= ~exper | id, method="ML")
2*model.a$logLik
[1] -4830.519
model.b <- lme(lnw~exper+hgc.9+exper:black+ue.7+ged, wages, random= ~ exper+ged | id, method="ML")
2*model.b$logLik
[1] -4805.517
anova(model.a, model.b)
Model df AIC BIC logLik Test L.Ratio p-value
model.a 1 9 4848.519 4909.398 -2415.260
model.b 2 13 4831.517 4919.454 -2402.759 1 vs 2 25.00152 1e-04
model.c <- lme(lnw~exper+hgc.9+exper:black+ue.7+ged, wages, random= ~ exper | id, method="ML")
2*model.c$logLik
[1] -4818.324
anova(model.b, model.c)
Model df AIC BIC logLik Test L.Ratio p-value
model.b 1 13 4831.517 4919.454 -2402.759
model.c 2 10 4838.324 4905.968 -2409.162 1 vs 2 12.80671 0.0051
model.d <- lme(lnw~exper+hgc.9+exper:black+ue.7+postexp, wages, random= ~exper+postexp | id, method="ML")
2*model.d$logLik
[1] -4817.377
anova(model.a, model.d)
Model df AIC BIC logLik Test L.Ratio p-value
model.a 1 9 4848.519 4909.398 -2415.260
model.d 2 13 4843.377 4931.314 -2408.689 1 vs 2 13.1417 0.0106
model.e <- lme(lnw~exper+hgc.9+exper:black+ue.7+postexp, wages, random= ~exper | id, method="ML")
2*model.e$logLik
[1] -4820.706
anova(model.d, model.e)
Model df AIC BIC logLik Test L.Ratio p-value
model.d 1 13 4843.377 4931.314 -2408.689
model.e 2 10 4840.706 4908.350 -2410.353 1 vs 2 3.329158 0.3436
model.f <- lme(lnw~exper+hgc.9+exper:black+ue.7+postexp+ged, wages, random= ~exper+postexp+ged | id, method="ML")
2*model.f$logLik
[1] -4789.354
anova(model.b, model.f)
Model df AIC BIC logLik Test L.Ratio p-value
model.b 1 13 4831.517 4919.454 -2402.759
model.f 2 18 4825.354 4947.113 -2394.677 1 vs 2 16.16339 0.0064
anova(model.d, model.f)
Model df AIC BIC logLik Test L.Ratio p-value
model.d 1 13 4843.377 4931.314 -2408.689
model.f 2 18 4825.354 4947.113 -2394.677 1 vs 2 28.02321 <.0001
model.g <- lme(lnw~exper+hgc.9+exper:black+ue.7+postexp+ged, wages, random= ~exper+ged | id, method="ML")
2*model.g$logLik
[1] -4802.688
anova(model.f, model.g)
Model df AIC BIC logLik Test L.Ratio p-value
model.f 1 18 4825.354 4947.113 -2394.677
model.g 2 14 4830.688 4925.390 -2401.344 1 vs 2 13.33432 0.0098
model.h <- lme(lnw~exper+hgc.9+exper:black+ue.7+postexp+ged, wages, random= ~exper+postexp | id, method="ML")
2*model.h$logLik
[1] -4812.639
anova(model.f, model.h)
Model df AIC BIC logLik Test L.Ratio p-value
model.f 1 18 4825.354 4947.113 -2394.677
model.h 2 14 4840.639 4935.340 -2406.320 1 vs 2 23.28511 1e-04
***model.i <- lme(lnw~exper+hgc.9+exper:black+ue.7+ged+exper:ged, wages, random= ~exper+ged+exper:ged | id, method="ML")
2*model.i$logLik
[1] -4798.705
***anova(model.b, model.i)
Model df AIC BIC logLik Test L.Ratio p-value
model.b 1 13 4831.519 4919.455 -2402.759
model.i 2 18 4834.705 4956.464 -2399.353 1 vs 2 6.813514 0.2349
model.j <- lme(lnw~exper+hgc.9+exper:black+ue.7+ged+exper:ged, wages, random= ~exper+ged | id, method="ML")
2*model.j$logLik
[1] -4804.601
***anova(model.i, model.j)
Model df AIC BIC logLik Test L.Ratio p-value
model.i 1 18 4834.705 4956.464 -2399.353
model.j 2 14 4832.628 4927.329 -2402.314 1 vs 2 5.92252 0.205
Table 6.3, p. 205. Summary of model F.
summary(model.f)
Linear mixed-effects model fit by maximum likelihood
Data: wages
AIC BIC logLik
4825.354 4947.113 -2394.677
Random effects:
Formula: ~exper + postexp + ged | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 0.20327713 (Intr) exper postxp
exper 0.03688142 -0.227
postexp 0.05785090 -0.513 -0.425
ged 0.12828189 0.457 0.616 -0.528
Residual 0.30638530
Fixed effects: lnw ~ exper + hgc.9 + exper:black + ue.7 + postexp + ged
Value Std.Error DF t-value p-value
(Intercept) 1.7385724 0.011948806 5509 145.50177 0.0000
exper 0.0414714 0.002798481 5509 14.81925 0.0000
hgc.9 0.0390263 0.006246189 886 6.24801 0.0000
ue.7 -0.0117239 0.001783798 5509 -6.57244 0.0000
postexp 0.0094225 0.005548605 5509 1.69818 0.0895
ged 0.0408782 0.022006748 5509 1.85753 0.0633
exper:black -0.0196196 0.004472637 5509 -4.38657 0.0000
Correlation:
(Intr) exper hgc.9 ue.7 postxp ged
exper -0.531
hgc.9 0.093 -0.023
ue.7 -0.368 0.255 -0.045
postexp 0.159 -0.368 -0.021 -0.012
ged -0.323 0.122 -0.020 0.055 -0.558
exper:black -0.059 -0.297 -0.018 0.067 -0.056 0.007
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-4.34624959 -0.51331005 -0.03870199 0.44566634 6.97197858
Number of Observations: 6402
Number of Groups: 888
Fig. 6.4, p. 209. We must first read in the alcohol data set and then run model e from table 4.1, p. 94.
#reading in the alcohol data
alcohol <- read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", header=T, sep=",")
#table 4.1, model e
model.e <- lme(alcuse~coa+peer*age_14 , data=alcohol, random= ~ age_14 | id, method="ML")
#obtaining the fixed effects parameters
fixef.e <- fixef(model.e)
#obtaining the predicted values and squaring them
fit2.ec0p0 <- (fixef.e[[1]] + .655*fixef.e[[3]] +
alcohol$age_14[1:3]*fixef.e[[4]] +
.655*alcohol$age_14[1:3]*fixef.e[[5]])^2
fit2.ec0p1 <- (fixef.e[[1]] + 1.381*fixef.e[[3]] +
alcohol$age_14[1:3]*fixef.e[[4]] +
1.381*alcohol$age_14[1:3]*fixef.e[[5]] )^2
fit2.ec1p0 <- (fixef.e[[1]] + fixef.e[[2]] + .655*fixef.e[[3]] +
alcohol$age_14[1:3]*fixef.e[[4]] +
.655*alcohol$age_14[1:3]*fixef.e[[5]] )^2
fit2.ec1p1 <- (fixef.e[[1]] + fixef.e[[2]] + 1.381*fixef.e[[3]] +
alcohol$age_14[1:3]*fixef.e[[4]] +
1.381*alcohol$age_14[1:3]*fixef.e[[5]])^2
plot(alcohol$age[1:3], fit2.ec0p0, ylim=c(0, 3), type="n",
ylab="predicted alcuse squared", xlab="age")
lines(spline(alcohol$age[1:3], fit2.ec0p0), pch=2, type="b")
lines(spline(alcohol$age[1:3], fit2.ec0p1), type="b", pch=0)
lines(spline(alcohol$age[1:3], fit2.ec1p0), type="b", pch=17)
lines(spline(alcohol$age[1:3], fit2.ec1p1), type="b", pch=15)
title("Non-linear Change")
legend(14, 3, c("COA=0, low peer", "COA=0, high peer",
"COA=1, low peer", "COA=1, high peer"))
Reading in the berkeley data set and then creating the transformed variables for iq and age.
berkeley <- read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/berkeley_pp.txt", header=T, sep=",")
berkeley$age2.3 <- berkeley$age^(1/2.3)
berkeley$iq2.3 <- berkeley$iq^2.3
Fig. 6.6, p. 212.
plot(berkeley$age, berkeley$iq, type="p", ylim=c(0, 250), xlim=c(0, 60), ylab="IQ", xlab="TIME")
plot(berkeley$age, berkeley$iq2.3, type="p", ylim=c(0, 300000), xlim=c(0, 60), ylab="IQ^(2.3)", xlab="TIME")
plot(berkeley$age2.3, berkeley$iq, type="p", ylim=c(0, 250), xlim=c(0, 6), ylab="IQ", xlab="TIME^(1/2.3)")
Reading in the external data set.
external <- read.table("https://stats.idre.ucla.edu/wp-content/uploads/2020/01/external_pp.txt", header=T, sep=",")
Creating the higher order variables for grade.
external$grade2 <- external$grade^2 external$grade3 <- external$grade^3 external$grade4 <- external$grade^4
Fig. 6.7, p. 218.
fit1 <- fitted.values(lm(external~grade+grade2, external[external$id==1,]))
fit1.4 <- fitted.values(lm(external~grade+grade2+grade3+grade4, external[external$id==1,]))
plot(spline(external[external$id==1,]$grade, fit1), type="l", ylim=c(0, 60), ylab="external", xlab="grade")
lines(spline(external[external$id==1,]$grade, fit1.4), type="l", lty=3)
points(external[external$id==1,]$grade, external[external$id==1,]$external, pch=16)
title("Person A, id=1")
legend(1, 10, c("quadratic fit", "quartic fit"), lty=c(1, 3))
fit2 <- fitted.values(lm(external~grade+grade2, external[external$id==6,]))
fit2.4 <- fitted.values(lm(external~grade+grade2+grade3+grade4, external[external$id==6,]))
plot(spline(external[external$id==6,]$grade, fit2), type="l", ylim=c(0, 60), ylab="external", xlab="grade")
lines(spline(external[external$id==6,]$grade, fit2.4), type="l", lty=3)
points(external[external$id==6,]$grade, external[external$id==6,]$external, pch=16)
title("Person B, id=6")
legend(1, 60, c("quadratic fit", "quartic fit"), lty=c(1, 3))
fit3 <- fitted.values(lm(external~grade, external[external$id==11,]))
fit3.4 <- fitted.values(lm(external~grade+grade2+grade3+grade4, external[external$id==11,]))
plot(external[external$id==11,]$grade, fit3, type="l", ylim=c(0, 60), ylab="external", xlab="grade")
lines(spline(external[external$id==11,]$grade, fit3.4), type="l", lty=3)
points(external[external$id==11,]$grade, external[external$id==11,]$external, pch=16)
title("Person C, id=11")
legend(1, 60, c("linear fit", "quartic fit"), lty=c(1, 3))
fit4 <- fitted.values(lm(external~grade, external[external$id==25,]))
fit4.4 <- fitted.values(lm(external~grade+grade2+grade3+grade4, external[external$id==25,]))
plot(external[external$id==25,]$grade, fit4, type="l", ylim=c(0, 60), ylab="external", xlab="grade")
lines(spline(external[external$id==25,]$grade, fit4.4), type="l", lty=3)
points(external[external$id==25,]$grade, external[external$id==25,]$external, pch=16)
title("Person D, id=25")
legend(1, 60, c("linear fit", "quartic fit"), lty=c(1, 3))
fit5 <- fitted.values(lm(external~grade+grade2+grade3, external[external$id==34,]))
fit5.4 <- fitted.values(lm(external~grade+grade2+grade3+grade4, external[external$id==34,]))
plot(spline(external[external$id==34,]$grade, fit5), type="l", ylim=c(0, 60), ylab="external", xlab="grade")
lines(spline(external[external$id==34,]$grade, fit5.4), type="l", lty=3)
points(external[external$id==34,]$grade, external[external$id==34,]$external, pch=16)
title("Person E, id=34")
legend(1, 60, c("cubic fit", "quartic fit"), lty=c(1, 3))
fit6.4 <- fitted.values(lm(external~grade+grade2+grade3+grade4, external[external$id==36,]))
plot(spline(external[external$id==36,]$grade, fit6.4), type="l", ylim=c(0, 60), ylab="external", xlab="grade")
points(external[external$id==36,]$grade, external[external$id==36,]$external, pch=16)
title("Person F, id=36")
legend(1, 60, c("quartic fit"), lty=1)
fit7 <- fitted.values(lm(external~grade+grade2, external[external$id==40,]))
fit7.4 <- fitted.values(lm(external~grade+grade2+grade3+grade4, external[external$id==40,]))
plot(spline(external[external$id==40,]$grade, fit7), type="l", ylim=c(0, 60), ylab="external", xlab="grade")
lines(spline(external[external$id==40,]$grade, fit7.4), type="l", lty=3)
points(external[external$id==40,]$grade, external[external$id==40,]$external, pch=16)
title("Person G, id=40")
legend(1, 60, c("quadratic fit", "quartic fit"), lty=c(1, 3))
fit8.4 <- fitted.values(lm(external~grade+grade2+grade3+grade4, external[external$id==26,]))
plot(spline(external[external$id==26,]$grade, fit8.4), type="l", ylim=c(0, 60), ylab="external", xlab="grade")
points(external[external$id==26,]$grade, external[external$id==26,]$external, pch=16)
title("Person H, id=26")
legend(1, 60, c("quartic fit"), lty=1
Creating the higher order terms for time in the external data set.
external$time2 <- external$time^2 external$time3 <- external$time^3
Table 6.5, p. 221. Comparison of fitting alternative polynomial change trajectories to the external data set.
model.a <- lme(external ~ 1, random = ~ 1 | id, method = "ML", external)
summary(model.a)
Linear mixed-effects model fit by maximum likelihood
Data: external
AIC BIC logLik
2016.253 2027.048 -1005.127
Random effects:
Formula: ~1 | id
(Intercept) Residual
StdDev: 9.349753 8.378721
Fixed effects: external ~ 1
Value Std.Error DF t-value p-value
(Intercept) 12.96296 1.486882 225 8.718217 0
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.6629105 -0.5254597 -0.1648188 0.4803230 3.5392935
Number of Observations: 270
Number of Groups: 45
model.b <- lme(external ~ time, random = ~ time | id, method = "ML", external)
summary(model.b)
Linear mixed-effects model fit by maximum likelihood
Data: external
AIC BIC logLik
2003.744 2025.335 -995.8722
Random effects:
Formula: ~time | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 11.114141 (Intr)
time 2.166306 -0.521
Residual 7.329261
Fixed effects: external ~ time
Value Std.Error DF t-value p-value
(Intercept) 13.289947 1.8426669 224 7.212344 0.000
time -0.130794 0.4168777 224 -0.313746 0.754
Correlation:
(Intr)
time -0.589
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.9551932 -0.4927453 -0.1353043 0.4206586 2.8078552
Number of Observations: 270
Number of Groups: 45
model.c <- lme(external ~ time+time2, random = ~ time+time2 | id, method = "ML", external)
summary(model.c)
Linear mixed-effects model fit by maximum likelihood
Data: external
AIC BIC logLik
1995.836 2031.821 -987.9182
Random effects:
Formula: ~time + time2 | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 10.348277 (Intr) time
time 4.960841 -0.072
time2 1.102567 -0.119 -0.908
Residual 6.479472
Fixed effects: external ~ time + time2
Value Std.Error DF t-value p-value
(Intercept) 13.969841 1.783654 223 7.832146 0.0000
time -1.150635 1.112977 223 -1.033835 0.3023
time2 0.203968 0.229323 223 0.889436 0.3747
Correlation:
(Intr) time
time -0.322
time2 0.131 -0.932
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.2152282 -0.4833639 -0.1148441 0.3780750 2.6531221
Number of Observations: 270
Number of Groups: 45
***model.d <- lme(external ~ time+time2+time3, random = ~ time+time2+time3 | id, method = "ML", external)
update(model.a, formula = external ~ time + time2 + time3, random
= ~ time + time2 + time3 | id)
summary(model.d)
Linear mixed-effects model fit by maximum likelihood
Data: external
AIC BIC logLik
1997.36 2051.336 -983.6798
Random effects:
Formula: ~ time + time2 + time3 | id
Structure: General positive-definite
StdDev Corr
(Intercept) 11.3456161 (Intr) time time2
time 10.3276727 -0.472
time2 4.0747529 0.521 -0.975
time3 0.4190613 -0.672 0.940 -0.981
Residual 6.1485004
Fixed effects: external ~ time + time2 + time3
Value Std.Error DF t-value p-value
(Intercept) 13.79453 1.929355 222 7.149815 <.0001
time -0.35006 2.344248 222 -0.149327 0.8814
time2 -0.23430 1.066569 222 -0.219680 0.8263
time3 0.05844 0.130845 222 0.446605 0.6556
Reading in the fox and geese data.
fg <- read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/foxngeese_pp.txt", header=T, sep=",")
Fig. 6.8, p. 227. Empirical growth plots for 8 children in the fox and geese data.
xyplot(nmoves~game | id, data=fg[fg$id %in% c(1, 4, 6, 7, 8, 11, 12, 15), ], ylim=c(0, 25), as.table=T)
Table 6.6, p. 231. Fitting a logistic model to the fox and geese data.
Notice, this model does not correspond to equation (6.8) in the book. Instead, it corresponds to the following equation:
Yij = 1 + 19/(1 + π0*exp(- (π1+u1)*Time – u0)) + εij
model.a <- nlme(nmoves~ 1 + 19/ (1 + xmid*exp( -scal*game + u)),
fixed=scal+xmid~1, random= scal+u~1 |id,
start=c(scal=.2, xmid=12), data=fg)
summary(model.a)
Nonlinear mixed-effects model fit by maximum likelihood
Model: nmoves ~ 1 + 19/(1 + xmid * exp(-scal * game + u))
Data: fg
AIC BIC logLik
2493.691 2518.28 -1240.846
Random effects:
Formula: list(scal ~ 1, u ~ 1)
Level: id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
scal 0.07502426 scal
u 0.67804085 0.821
Residual 3.67947375
Fixed effects: scal + xmid ~ 1
Value Std.Error DF t-value p-value
scal 0.113036 0.0201006 427 5.623499 0
xmid 10.741633 2.3508149 427 4.569323 0
Correlation:
scal
xmid 0.817
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.6168059 -0.5637294 -0.1259907 0.5704615 3.5385187
Number of Observations: 445
Number of Groups: 17
model.b <- nlme(nmoves~ 1 + 19/(1+xmid*exp(-scal10*game -scal01*read -scal11*read*game + u)),
fixed=scal10+scal01+scal11+xmid~1, random= scal10+u~1 |id,
start=c(scal10=.12, scal01= -.4, scal11= .04, xmid=12), data=fg)
summary(model.b)
Nonlinear mixed-effects model fit by maximum likelihood
Model: nmoves ~ 1 + 19/(1 + xmid * exp(-scal10 * game - scal01 * read - scal11 * read * game + u))
Data: fg
AIC BIC logLik
2495.652 2528.436 -1239.826
Random effects:
Formula: list(scal10 ~ 1, u ~ 1)
Level: id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
scal10 0.06864467 scal10
u 0.61433603 0.783
Residual 3.68126652
Fixed effects: scal10 + scal01 + scal11 + xmid ~ 1
Value Std.Error DF t-value p-value
scal10 0.041173 0.051314 425 0.8023701 0.4228
scal01 -0.331583 0.272499 425 -1.2168222 0.2243
scal11 0.036792 0.024578 425 1.4969888 0.1351
xmid 5.636007 3.175306 425 1.7749494 0.0766
Correlation:
scal10 scal01 scal11
scal01 0.736
scal11 -0.931 -0.796
xmid 0.793 0.929 -0.743
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.5945179 -0.5869666 -0.1254558 0.5679836 3.5759761
Number of Observations: 445
Number of Groups: 17
Fig. 6.10, p. 232.
fixef.a <- fixef(model.a)
fit.a <- 1 + 19/(1 + fixef.a[[2]]*exp(-fixef.a[[1]]*fg$game[1:27]))
plot(fg$game[1:27], fit.a, ylim=c(0, 25), type="l", ylab="predicted nmoves", xlab="game")
title("Model A \n Unconditional logistic growth")
fixef.b <- fixef(model.b)
fit.b.high <- 1 + 19/(1+fixef.b[[4]]*exp(-fixef.b[[1]]*fg$game[1:27] - fixef.b[[2]]*1.58 - fixef.b[[3]]*1.58*fg$game[1:27]))
fit.b.low <- 1 + 19/(1+fixef.b[[4]]*exp(-fixef.b[[1]]*fg$game[1:27] - fixef.b[[2]]*(-1.58) - fixef.b[[3]]*(-1.58)*fg$game[1:27]))
plot(fg$game[1:27], fit.b.high, ylim=c(0, 25), type="l",
ylab="predicted nmoves", xlab="game")
lines(fg$game[1:27], fit.b.low, lty=3)
title("Model B \n Fitted logistic growth by reading level")
legend(1, 25, c("High Reading level","Low reading level"), lty=c(1, 3)















