Display system information and load tidyverse
and faraway
packages
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.6.2 magrittr_1.5 tools_3.6.2 htmltools_0.4.0
## [5] yaml_2.2.1 Rcpp_1.0.4.6 stringi_1.4.6 rmarkdown_2.1
## [9] knitr_1.28 stringr_1.4.0 xfun_0.13 digest_0.6.25
## [13] rlang_0.4.5 evaluate_0.14
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.4
## ✓ tibble 3.0.0 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(faraway)
So far we studied linear mixed models (LMM), where the responses are continuous and normally distributed. It is natural to extend LMM to handle nonnormally distributed responses.
Recall that, in GLM, the distribution of \(Y_i\) is from the exponential family of distributions of form \[ f(y_i \mid \theta_i, \phi) = \exp \left[ \frac{y \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right]. \] If we use the canonical link, then \[ \theta_i = g(\mu_i) = \eta_i, \] where \(\mu_i = \mathbb{E} Y_i\). Now let \[ \eta_i = \mathbf{x}_i^T \boldsymbol{\beta} + \mathbf{z}_i^T \boldsymbol{\gamma}, \] where \(\boldsymbol{\beta}\) is the fixed effects and \(\boldsymbol{\gamma}\) is the random effects with density \(h(\gamma \mid \boldsymbol{\Sigma})\). (Typically we assume multivariate normal with mean 0 and covariance \(\boldsymbol{\Sigma}\).) Then the likelihood is \[ L(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \prod_{i=1}^n \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma} \] and the log-likelihood is \[ \ell(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \sum_{i=1}^n \log \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma}. \] This model, combining GLM and mixed effects model, is called the generalized linear mixed effects model (GLMM).
Estimation and inference of GLMM have been active research. We give an overview of a few commonly used methods.
Maximum likelihood estimate (MLE) is obtained by maximizing the log-likelihood function. However, each iteration of the optimization algorithm needs to evaluate numerical integration, which quickly becomes infeasible when the dimension of random effects \(\boldsymbol{\gamma}\) is large.
Gauss-Hermite quadrature can be used for numerical integration. It approximates the integral by a weighted sum of integrands at different knots. The more knots, the more accurate the approximation, but at higher computational expense.
Laplace approximation is a special case of Gauss-Hermite quadrature with only one knot. It is less accurate than Guass-Hermite quadrature but computationally cheaper.
Bayesian method is another approach for estimation and inference of GLMM. It’s not covered in this course due to time constraint. Interested students can study ELMR Chapter 12.
Penalized quasi-likelihood (PQL). In the IRWLS (iteratively reweighted least squares) procedure for estimating the GLM, each iteration uses the pseudo-response or working response \[ \mathbf{z}^{(t)} = \boldsymbol{\eta}^{(t)} + (\mathbf{W}^{(t)})^{-1} (\mathbf{y} - \widehat{\boldsymbol{\mu}}^{(t)}). \] It can be adapted to the mixed effects model. PQL has computational advantage but generally considered less accurate than other approaches.
Generalized estimation equations (GEE). To be discussed later.
ctsib
data in EMLR studies the balance of 40 individuals.
norm
or foam
), vision (factor, open
or closed
or dome
).surface
and vision
. 12 measurements per subject.ctsib <- as_tibble(ctsib) %>%
mutate(stable = ifelse(CTSIB == 1, 1, 0)) %>%
print(n = 24)
## # A tibble: 480 x 9
## Subject Sex Age Height Weight Surface Vision CTSIB stable
## <int> <fct> <int> <dbl> <dbl> <fct> <fct> <int> <dbl>
## 1 1 male 22 176 68.2 norm open 1 1
## 2 1 male 22 176 68.2 norm open 1 1
## 3 1 male 22 176 68.2 norm closed 2 0
## 4 1 male 22 176 68.2 norm closed 2 0
## 5 1 male 22 176 68.2 norm dome 1 1
## 6 1 male 22 176 68.2 norm dome 2 0
## 7 1 male 22 176 68.2 foam open 2 0
## 8 1 male 22 176 68.2 foam open 2 0
## 9 1 male 22 176 68.2 foam closed 2 0
## 10 1 male 22 176 68.2 foam closed 2 0
## 11 1 male 22 176 68.2 foam dome 2 0
## 12 1 male 22 176 68.2 foam dome 2 0
## 13 2 male 22 181 67.6 norm open 1 1
## 14 2 male 22 181 67.6 norm open 1 1
## 15 2 male 22 181 67.6 norm closed 2 0
## 16 2 male 22 181 67.6 norm closed 2 0
## 17 2 male 22 181 67.6 norm dome 2 0
## 18 2 male 22 181 67.6 norm dome 2 0
## 19 2 male 22 181 67.6 foam open 2 0
## 20 2 male 22 181 67.6 foam open 2 0
## 21 2 male 22 181 67.6 foam closed 3 0
## 22 2 male 22 181 67.6 foam closed 3 0
## 23 2 male 22 181 67.6 foam dome 3 0
## 24 2 male 22 181 67.6 foam dome 3 0
## # … with 456 more rows
summary(ctsib)
## Subject Sex Age Height Weight
## Min. : 1.00 female:240 Min. :18.00 Min. :142.0 Min. : 44.20
## 1st Qu.:10.75 male :240 1st Qu.:21.75 1st Qu.:166.8 1st Qu.: 60.65
## Median :20.50 Median :25.50 Median :173.0 Median : 68.00
## Mean :20.50 Mean :26.80 Mean :172.1 Mean : 71.14
## 3rd Qu.:30.25 3rd Qu.:33.00 3rd Qu.:180.2 3rd Qu.: 83.50
## Max. :40.00 Max. :38.00 Max. :190.0 Max. :102.40
## Surface Vision CTSIB stable
## foam:240 closed:160 Min. :1.000 Min. :0.0000
## norm:240 dome :160 1st Qu.:2.000 1st Qu.:0.0000
## open :160 Median :2.000 Median :0.0000
## Mean :1.919 Mean :0.2375
## 3rd Qu.:2.000 3rd Qu.:0.0000
## Max. :4.000 Max. :1.0000
subsum <- ctsib %>%
group_by(Subject) %>%
summarise(Height = Height[1],
Weight = Weight[1],
stable = mean(stable),
Age = Age[1],
Sex = Sex[1])
subsum %>%
ggplot() +
geom_point(mapping = aes(x = Height, y = stable))
subsum %>%
ggplot() +
geom_point(mapping = aes(x = Weight, y = stable))
subsum %>%
ggplot() +
geom_point(mapping = aes(x = Age, y = stable))
subsum %>%
ggplot() +
geom_boxplot(mapping = aes(x = Sex, y = stable))
ctsib %>%
group_by(Subject, Surface) %>%
summarize(stable = mean(stable)) %>%
ggplot() +
geom_boxplot(mapping = aes(x = Surface, y = stable)) +
scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 39 rows containing non-finite values (stat_boxplot).
ctsib %>%
group_by(Subject, Vision) %>%
summarize(stable = mean(stable)) %>%
ggplot() +
geom_boxplot(mapping = aes(x = Vision, y = stable)) +
scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 63 rows containing non-finite values (stat_boxplot).
gf <- glm(stable ~ Sex + Age + Height + Weight + Surface + Vision,
family = binomial,
data = ctsib)
summary(gf)
##
## Call:
## glm(formula = stable ~ Sex + Age + Height + Weight + Surface +
## Vision, family = binomial, data = ctsib)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.28763 -0.54392 -0.13884 -0.05483 2.48130
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.277449 3.803987 1.913 0.055734 .
## Sexmale 1.401577 0.516231 2.715 0.006627 **
## Age 0.002521 0.024307 0.104 0.917390
## Height -0.096413 0.026837 -3.593 0.000327 ***
## Weight 0.043503 0.018002 2.417 0.015665 *
## Surfacenorm 3.967515 0.447179 8.872 < 2e-16 ***
## Visiondome 0.363753 0.383218 0.949 0.342516
## Visionopen 3.187501 0.416001 7.662 1.83e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 526.25 on 479 degrees of freedom
## Residual deviance: 295.20 on 472 degrees of freedom
## AIC: 311.2
##
## Number of Fisher Scoring iterations: 6
library(lme4)
modlap <-
glmer(stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 | Subject),
family = binomial,
data = ctsib)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.128219 (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: large eigenvalue ratio
## - Rescale variables?
summary(modlap)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 |
## Subject)
## Data: ctsib
##
## AIC BIC logLik deviance df.resid
## 247.6 285.1 -114.8 229.6 471
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.0149 -0.1266 -0.0176 -0.0005 4.8664
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 8.197 2.863
## Number of obs: 480, groups: Subject, 40
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.920873 13.082211 0.758 0.4482
## Sexmale 2.825280 1.746247 1.618 0.1057
## Age -0.003644 0.079832 -0.046 0.9636
## Height -0.151012 0.089770 -1.682 0.0925 .
## Weight 0.058926 0.061577 0.957 0.3386
## Surfacenorm 7.524393 1.164363 6.462 1.03e-10 ***
## Visiondome 0.683934 0.529835 1.291 0.1968
## Visionopen 6.321072 1.074726 5.882 4.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Sexmal Age Height Weight Srfcnr Visndm
## Sexmale 0.476
## Age -0.170 0.097
## Height -0.957 -0.391 0.050
## Weight 0.372 -0.366 -0.172 -0.560
## Surfacenorm 0.010 0.195 -0.010 -0.136 0.059
## Visiondome -0.012 0.029 0.001 -0.026 0.015 0.115
## Visionopen 0.013 0.202 -0.007 -0.135 0.051 0.857 0.362
## convergence code: 0
## Model failed to converge with max|grad| = 0.128219 (tol = 0.002, component 1)
## Model is nearly unidentifiable: large eigenvalue ratio
## - Rescale variables?
library(lme4)
modgh <-
glmer(stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 | Subject),
nAGQ = 25,
family = binomial,
data = ctsib)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.290295 (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: large eigenvalue ratio
## - Rescale variables?
summary(modgh)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 |
## Subject)
## Data: ctsib
##
## AIC BIC logLik deviance df.resid
## 247.9 285.5 -115.0 229.9 471
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.8842 -0.1386 -0.0197 -0.0007 4.9020
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 7.195 2.682
## Number of obs: 480, groups: Subject, 40
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 16.171589 12.710041 1.272 0.2032
## Sexmale 3.096765 1.695698 1.826 0.0678 .
## Age -0.006675 0.076456 -0.087 0.9304
## Height -0.192267 0.088878 -2.163 0.0305 *
## Weight 0.075164 0.059096 1.272 0.2034
## Surfacenorm 7.285433 1.055120 6.905 5.03e-12 ***
## Visiondome 0.675909 0.527365 1.282 0.2000
## Visionopen 6.088908 0.972366 6.262 3.80e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Sexmal Age Height Weight Srfcnr Visndm
## Sexmale 0.509
## Age -0.166 0.094
## Height -0.959 -0.428 0.047
## Weight 0.386 -0.324 -0.168 -0.570
## Surfacenorm 0.166 0.256 -0.013 -0.281 0.147
## Visiondome 0.007 0.034 0.000 -0.044 0.027 0.113
## Visionopen 0.169 0.265 -0.008 -0.280 0.138 0.829 0.382
## convergence code: 0
## Model failed to converge with max|grad| = 0.290295 (tol = 0.002, component 1)
## Model is nearly unidentifiable: large eigenvalue ratio
## - Rescale variables?
library(MASS)
modpql <- glmmPQL(stable ~ Sex + Age + Height + Weight + Surface + Vision,
random = ~ 1 | Subject,
family = binomial,
data = ctsib)
## iteration 1
## iteration 2
## iteration 3
## iteration 4
## iteration 5
## iteration 6
## iteration 7
summary(modpql)
## Linear mixed-effects model fit by maximum likelihood
## Data: ctsib
## AIC BIC logLik
## NA NA NA
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 3.060712 0.5906232
##
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: stable ~ Sex + Age + Height + Weight + Surface + Vision
## Value Std.Error DF t-value p-value
## (Intercept) 15.571494 13.498304 437 1.153589 0.2493
## Sexmale 3.355340 1.752614 35 1.914478 0.0638
## Age -0.006638 0.081959 35 -0.080992 0.9359
## Height -0.190819 0.092023 35 -2.073601 0.0455
## Weight 0.069467 0.062857 35 1.105155 0.2766
## Surfacenorm 7.724078 0.573578 437 13.466492 0.0000
## Visiondome 0.726464 0.325933 437 2.228873 0.0263
## Visionopen 6.485257 0.543980 437 11.921876 0.0000
## Correlation:
## (Intr) Sexmal Age Height Weight Srfcnr Visndm
## Sexmale 0.488
## Age -0.164 0.110
## Height -0.963 -0.388 0.041
## Weight 0.368 -0.374 -0.168 -0.555
## Surfacenorm 0.051 0.116 0.023 -0.114 0.055
## Visiondome -0.003 0.011 0.004 -0.017 0.011 0.087
## Visionopen 0.056 0.125 0.026 -0.116 0.049 0.788 0.377
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -7.3825387619 -0.2333403347 -0.0233564300 -0.0004216629 9.9310682723
##
## Number of Observations: 480
## Number of Groups: 40
epilepsy
data in ELMR.
seizures
(number of seizures of each epilepsy patient).treat
(binary, 1 = treatment or 0 = control group), expind
(binary, 0 = baseline or 1 = experiment phase), timeadj
(length of phase).epilepsy <- as_tibble(epilepsy) %>%
mutate(period = rep(0:4, 59),
drug = factor(c("placebo", "treatment"))[treat + 1],
phase = factor(c("baseline", "experiment"))[expind + 1]) %>%
print()
## # A tibble: 295 x 9
## seizures id treat expind timeadj age period drug phase
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <fct> <fct>
## 1 11 1 0 0 8 31 0 placebo baseline
## 2 5 1 0 1 2 31 1 placebo experiment
## 3 3 1 0 1 2 31 2 placebo experiment
## 4 3 1 0 1 2 31 3 placebo experiment
## 5 3 1 0 1 2 31 4 placebo experiment
## 6 11 2 0 0 8 30 0 placebo baseline
## 7 3 2 0 1 2 30 1 placebo experiment
## 8 5 2 0 1 2 30 2 placebo experiment
## 9 3 2 0 1 2 30 3 placebo experiment
## 10 3 2 0 1 2 30 4 placebo experiment
## # … with 285 more rows
summary(epilepsy)
## seizures id treat expind timeadj
## Min. : 0.00 Min. : 1 Min. :0.0000 Min. :0.0 Min. :2.0
## 1st Qu.: 3.00 1st Qu.:15 1st Qu.:0.0000 1st Qu.:1.0 1st Qu.:2.0
## Median : 6.00 Median :30 Median :1.0000 Median :1.0 Median :2.0
## Mean : 12.86 Mean :30 Mean :0.5254 Mean :0.8 Mean :3.2
## 3rd Qu.: 14.50 3rd Qu.:45 3rd Qu.:1.0000 3rd Qu.:1.0 3rd Qu.:2.0
## Max. :151.00 Max. :59 Max. :1.0000 Max. :1.0 Max. :8.0
## age period drug phase
## Min. :18.00 Min. :0 placebo :140 baseline : 59
## 1st Qu.:22.00 1st Qu.:1 treatment:155 experiment:236
## Median :28.00 Median :2
## Mean :28.85 Mean :2
## 3rd Qu.:35.00 3rd Qu.:3
## Max. :57.00 Max. :4
epilepsy %>%
group_by(drug, phase) %>%
summarize(rate = mean(seizures / timeadj)) %>%
xtabs(formula = rate ~ phase + drug)
## drug
## phase placebo treatment
## baseline 3.848214 3.955645
## experiment 4.303571 3.983871
epilepsy %>%
ggplot() +
geom_line(mapping = aes(x = period, y = seizures, linetype = drug, group = id)) +
xlim(1, 4) +
scale_y_sqrt(breaks = (0:10)^2) +
theme(legend.position = "top", legend.direction = "horizontal")
## Warning: Removed 59 row(s) containing missing values (geom_path).
ratesum <- epilepsy %>%
group_by(id, phase, drug) %>%
summarize(rate = mean(seizures / timeadj))
spread(ratesum, phase, rate) %>%
ggplot() +
geom_point(mapping = aes(x = baseline, y = experiment, shape = drug)) +
scale_x_sqrt() + scale_y_sqrt() +
geom_abline(intercept = 0, slope = 1) +
theme(legend.position = "top", legend.direction = "horizontal")
epilo <- filter(epilepsy, id != 49)
modglm <- glm(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat),
family = poisson,
data = epilo)
summary(modglm)
##
## Call:
## glm(formula = seizures ~ offset(log(timeadj)) + expind + treat +
## I(expind * treat), family = poisson, data = epilo)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.4725 -2.3605 -1.0290 0.9001 14.0104
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.34761 0.03406 39.566 < 2e-16 ***
## expind 0.11184 0.04688 2.386 0.017 *
## treat -0.10682 0.04863 -2.197 0.028 *
## I(expind * treat) -0.30238 0.06971 -4.338 1.44e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 2485.1 on 289 degrees of freedom
## Residual deviance: 2411.5 on 286 degrees of freedom
## AIC: 3449.7
##
## Number of Fisher Scoring iterations: 5
library(lme4)
modgh <- glmer(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat) + (1 | id),
nAGQ = 25,
family = poisson,
data = epilo)
summary(modgh)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: poisson ( log )
## Formula: seizures ~ offset(log(timeadj)) + expind + treat + I(expind *
## treat) + (1 | id)
## Data: epilo
##
## AIC BIC logLik deviance df.resid
## 877.7 896.1 -433.9 867.7 285
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8724 -0.8482 -0.1722 0.5697 9.8941
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.515 0.7176
## Number of obs: 290, groups: id, 58
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.035998 0.141256 7.334 2.23e-13 ***
## expind 0.111838 0.046877 2.386 0.017 *
## treat -0.008152 0.196524 -0.041 0.967
## I(expind * treat) -0.302387 0.069714 -4.338 1.44e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) expind treat
## expind -0.175
## treat -0.718 0.126
## I(xpnd*trt) 0.118 -0.672 -0.173
library(MASS)
modpql <- glmmPQL(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat),
random = ~ 1 | id,
family = poisson,
data = epilo)
## iteration 1
## iteration 2
## iteration 3
## iteration 4
## iteration 5
summary(modpql)
## Linear mixed-effects model fit by maximum likelihood
## Data: epilo
## AIC BIC logLik
## NA NA NA
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6819731 1.605408
##
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat)
## Value Std.Error DF t-value p-value
## (Intercept) 1.0807909 0.14370130 230 7.521094 0.0000
## expind 0.1118360 0.07576686 230 1.476055 0.1413
## treat -0.0089374 0.20024399 56 -0.044632 0.9646
## I(expind * treat) -0.3023841 0.11268890 230 -2.683353 0.0078
## Correlation:
## (Intr) expind treat
## expind -0.278
## treat -0.718 0.200
## I(expind * treat) 0.187 -0.672 -0.274
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.2956801 -0.5651709 -0.1501522 0.3243903 6.3134910
##
## Number of Observations: 290
## Number of Groups: 58
The quasi-likelihood (quasi-binomial, quasi-Poisson, etc) approach can be generalized to handle correlated, nonnormal responses.
Let \(\mathbf{Y}_i\) be the response vector of \(i\)-th individual. \[\begin{eqnarray*} \mathbb{E} \mathbf{Y}_i &=& \boldsymbol{\mu}_i \\ g(\boldsymbol{\mu}_i) &=& \mathbf{X}_i \boldsymbol{\beta} \\ \mathbf{V}_i &=& \text{Var}(\mathbf{Y}_i) = \phi \mathbf{A}_i^{1/2} \mathbf{R}_i(\boldsymbol{\alpha}) \mathbf{A}_i^{1/2}, \end{eqnarray*}\] where \(\mathbf{A}_i = \text{diag}(a(\boldsymbol{\mu}))\) captures the individual variances and \(\mathbf{R}_i(\boldsymbol{\alpha})\) is a working correlation matrix.
Commonly used working correlation are
compound symmetry, or equicorrelation, or exchangeable correlation: \[ \mathbf{R}(\rho) = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} \]
Autoregressive model: \[ \mathbf{R}(\rho) = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & & \rho^{n-2} \\ \rho^2 & \rho & 1 & & \vdots \\ \vdots & & & \ddots & \\ \rho^{n-1} & \cdots & & \rho & 1 \end{pmatrix} \]
Unstructured correlation matrix
Given estimates of \(\phi\) and \(\boldsymbol{\alpha}\), we solve the estimation equation \[ \sum_i [D_{\boldsymbol{\beta}} \boldsymbol{\mu}_i(\boldsymbol{\beta})]^T \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol{\mu}_i) = \mathbf{0}. \]
Let’s revisit the ctsib
(stability) data set. We use the exchangeable correlation, or equivalently, compound symmetry. Standard errors and Wald test are constructed using the sandwich estimators. \(\boldsymbol{\beta}\) estimates in GEE are about half the size of those from GLMM. The scale.fix = TRUE
instructs the function geeglm
not estimate the dispersion parameter \(\phi\), since for binary response the dispersion is always 1.
library(geepack)
modgeep <- geeglm(stable ~ Sex + Age + Height + Weight + Surface + Vision,
id = Subject,
corstr = "exchangeable",
scale.fix = TRUE,
data = ctsib,
family = binomial(link = "logit"))
summary(modgeep)
##
## Call:
## geeglm(formula = stable ~ Sex + Age + Height + Weight + Surface +
## Vision, family = binomial(link = "logit"), data = ctsib,
## id = Subject, corstr = "exchangeable", scale.fix = TRUE)
##
## Coefficients:
## Estimate Std.err Wald Pr(>|W|)
## (Intercept) 8.62332 5.91992 2.122 0.1452
## Sexmale 1.64488 0.90347 3.315 0.0687 .
## Age -0.01205 0.04802 0.063 0.8019
## Height -0.10211 0.04239 5.801 0.0160 *
## Weight 0.04365 0.03399 1.649 0.1991
## Surfacenorm 3.91632 0.56682 47.738 4.87e-12 ***
## Visiondome 0.35888 0.40403 0.789 0.3744
## Visionopen 3.17990 0.46063 47.657 5.08e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation structure = exchangeable
## Scale is fixed.
##
## Link = identity
##
## Estimated Correlation Parameters:
## Estimate Std.err
## alpha 0.2185 0.04467
## Number of clusters: 40 Maximum cluster size: 12
epilepsy
data set.modgeep <- geeglm(seizures ~ offset(log(timeadj)) + expind + treat + I(expind*treat),
id = id,
family = poisson,
corstr = "ar1",
data = epilepsy,
subset = (id != 49))
summary(modgeep)
##
## Call:
## geeglm(formula = seizures ~ offset(log(timeadj)) + expind + treat +
## I(expind * treat), family = poisson, data = epilepsy, subset = (id !=
## 49), id = id, corstr = "ar1")
##
## Coefficients:
## Estimate Std.err Wald Pr(>|W|)
## (Intercept) 1.31383 0.16159 66.103 4.44e-16 ***
## expind 0.15094 0.11077 1.857 0.1730
## treat -0.07973 0.19831 0.162 0.6877
## I(expind * treat) -0.39872 0.17454 5.218 0.0223 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation structure = ar1
## Estimated Scale Parameters:
##
## Estimate Std.err
## (Intercept) 10.61 2.35
## Link = identity
##
## Estimated Correlation Parameters:
## Estimate Std.err
## alpha 0.7831 0.05192
## Number of clusters: 58 Maximum cluster size: 5