Mixed Effects Models for Nonnormal Responses (ELMR Chapter 13)

Generalized linear mixed models (GLMM)
Overview of estimation and inference methods
Binary response example
Count response example
GEE (generalized estimation equation)

Display system information and load tidyverse and faraway packages

sessionInfo()

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.2  magrittr_1.5    tools_3.6.2     htmltools_0.4.0
##  [5] yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6   rmarkdown_2.1  
##  [9] knitr_1.28      stringr_1.4.0   xfun_0.13       digest_0.6.25  
## [13] rlang_0.4.5     evaluate_0.14

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.0     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0

## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(faraway)

Generalized linear mixed models (GLMM)

So far we studied linear mixed models (LMM), where the responses are continuous and normally distributed. It is natural to extend LMM to handle nonnormally distributed responses.
Recall that, in GLM, the distribution of \(Y_i\) is from the exponential family of distributions of form \[ f(y_i \mid \theta_i, \phi) = \exp \left[ \frac{y \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right]. \] If we use the canonical link, then \[ \theta_i = g(\mu_i) = \eta_i, \] where \(\mu_i = \mathbb{E} Y_i\). Now let \[ \eta_i = \mathbf{x}_i^T \boldsymbol{\beta} + \mathbf{z}_i^T \boldsymbol{\gamma}, \] where \(\boldsymbol{\beta}\) is the fixed effects and \(\boldsymbol{\gamma}\) is the random effects with density \(h(\gamma \mid \boldsymbol{\Sigma})\). (Typically we assume multivariate normal with mean 0 and covariance \(\boldsymbol{\Sigma}\).) Then the likelihood is \[ L(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \prod_{i=1}^n \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma} \] and the log-likelihood is \[ \ell(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \sum_{i=1}^n \log \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma}. \] This model, combining GLM and mixed effects model, is called the generalized linear mixed effects model (GLMM).

Overview of estimation and inference methods

Estimation and inference of GLMM have been active research. We give an overview of a few commonly used methods.
Maximum likelihood estimate (MLE) is obtained by maximizing the log-likelihood function. However, each iteration of the optimization algorithm needs to evaluate numerical integration, which quickly becomes infeasible when the dimension of random effects \(\boldsymbol{\gamma}\) is large.
- Gauss-Hermite quadrature can be used for numerical integration. It approximates the integral by a weighted sum of integrands at different knots. The more knots, the more accurate the approximation, but at higher computational expense.
- Laplace approximation is a special case of Gauss-Hermite quadrature with only one knot. It is less accurate than Guass-Hermite quadrature but computationally cheaper.
Bayesian method is another approach for estimation and inference of GLMM. It’s not covered in this course due to time constraint. Interested students can study ELMR Chapter 12.
Penalized quasi-likelihood (PQL). In the IRWLS (iteratively reweighted least squares) procedure for estimating the GLM, each iteration uses the pseudo-response or working response \[ \mathbf{z}^{(t)} = \boldsymbol{\eta}^{(t)} + (\mathbf{W}^{(t)})^{-1} (\mathbf{y} - \widehat{\boldsymbol{\mu}}^{(t)}). \] It can be adapted to the mixed effects model. PQL has computational advantage but generally considered less accurate than other approaches.
Generalized estimation equations (GEE). To be discussed later.

Binary response example

ctsib data in EMLR studies the balance of 40 individuals.
- Response: stability (1 = stable, 0 = instable).
- Predictors: sex (binary), age (continuous), height (continuous), surface (binary, norm or foam), vision (factor, open or closed or dome).
- Each subject is tested twice on a combination of surface and vision. 12 measurements per subject.

ctsib <- as_tibble(ctsib) %>%
  mutate(stable = ifelse(CTSIB == 1, 1, 0)) %>%
  print(n = 24)

## # A tibble: 480 x 9
##    Subject Sex     Age Height Weight Surface Vision CTSIB stable
##      <int> <fct> <int>  <dbl>  <dbl> <fct>   <fct>  <int>  <dbl>
##  1       1 male     22    176   68.2 norm    open       1      1
##  2       1 male     22    176   68.2 norm    open       1      1
##  3       1 male     22    176   68.2 norm    closed     2      0
##  4       1 male     22    176   68.2 norm    closed     2      0
##  5       1 male     22    176   68.2 norm    dome       1      1
##  6       1 male     22    176   68.2 norm    dome       2      0
##  7       1 male     22    176   68.2 foam    open       2      0
##  8       1 male     22    176   68.2 foam    open       2      0
##  9       1 male     22    176   68.2 foam    closed     2      0
## 10       1 male     22    176   68.2 foam    closed     2      0
## 11       1 male     22    176   68.2 foam    dome       2      0
## 12       1 male     22    176   68.2 foam    dome       2      0
## 13       2 male     22    181   67.6 norm    open       1      1
## 14       2 male     22    181   67.6 norm    open       1      1
## 15       2 male     22    181   67.6 norm    closed     2      0
## 16       2 male     22    181   67.6 norm    closed     2      0
## 17       2 male     22    181   67.6 norm    dome       2      0
## 18       2 male     22    181   67.6 norm    dome       2      0
## 19       2 male     22    181   67.6 foam    open       2      0
## 20       2 male     22    181   67.6 foam    open       2      0
## 21       2 male     22    181   67.6 foam    closed     3      0
## 22       2 male     22    181   67.6 foam    closed     3      0
## 23       2 male     22    181   67.6 foam    dome       3      0
## 24       2 male     22    181   67.6 foam    dome       3      0
## # … with 456 more rows

summary(ctsib)

##     Subject          Sex           Age            Height          Weight      
##  Min.   : 1.00   female:240   Min.   :18.00   Min.   :142.0   Min.   : 44.20  
##  1st Qu.:10.75   male  :240   1st Qu.:21.75   1st Qu.:166.8   1st Qu.: 60.65  
##  Median :20.50                Median :25.50   Median :173.0   Median : 68.00  
##  Mean   :20.50                Mean   :26.80   Mean   :172.1   Mean   : 71.14  
##  3rd Qu.:30.25                3rd Qu.:33.00   3rd Qu.:180.2   3rd Qu.: 83.50  
##  Max.   :40.00                Max.   :38.00   Max.   :190.0   Max.   :102.40  
##  Surface       Vision        CTSIB           stable      
##  foam:240   closed:160   Min.   :1.000   Min.   :0.0000  
##  norm:240   dome  :160   1st Qu.:2.000   1st Qu.:0.0000  
##             open  :160   Median :2.000   Median :0.0000  
##                          Mean   :1.919   Mean   :0.2375  
##                          3rd Qu.:2.000   3rd Qu.:0.0000  
##                          Max.   :4.000   Max.   :1.0000

Graphical summary.

subsum <- ctsib %>% 
  group_by(Subject) %>%
  summarise(Height = Height[1], 
            Weight = Weight[1], 
            stable = mean(stable), 
            Age    = Age[1], 
            Sex    = Sex[1])
subsum %>%  
  ggplot() + 
  geom_point(mapping = aes(x = Height, y = stable))

subsum %>%  
  ggplot() + 
  geom_point(mapping = aes(x = Weight, y = stable))

subsum %>%  
  ggplot() + 
  geom_point(mapping = aes(x = Age, y = stable))

subsum %>%  
  ggplot() + 
  geom_boxplot(mapping = aes(x = Sex, y = stable))

ctsib %>%
  group_by(Subject, Surface) %>%
  summarize(stable = mean(stable)) %>%
  ggplot() + 
  geom_boxplot(mapping = aes(x = Surface, y = stable)) + 
  scale_y_log10()

## Warning: Transformation introduced infinite values in continuous y-axis

## Warning: Removed 39 rows containing non-finite values (stat_boxplot).

ctsib %>%
  group_by(Subject, Vision) %>%
  summarize(stable = mean(stable)) %>%
  ggplot() + 
  geom_boxplot(mapping = aes(x = Vision, y = stable)) + 
  scale_y_log10()

## Warning: Transformation introduced infinite values in continuous y-axis

## Warning: Removed 63 rows containing non-finite values (stat_boxplot).

GLM analysis. What are the issues with the GLM here?

gf <- glm(stable ~ Sex + Age + Height + Weight + Surface + Vision, 
          family = binomial,
          data   = ctsib)
summary(gf)

## 
## Call:
## glm(formula = stable ~ Sex + Age + Height + Weight + Surface + 
##     Vision, family = binomial, data = ctsib)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.28763  -0.54392  -0.13884  -0.05483   2.48130  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  7.277449   3.803987   1.913 0.055734 .  
## Sexmale      1.401577   0.516231   2.715 0.006627 ** 
## Age          0.002521   0.024307   0.104 0.917390    
## Height      -0.096413   0.026837  -3.593 0.000327 ***
## Weight       0.043503   0.018002   2.417 0.015665 *  
## Surfacenorm  3.967515   0.447179   8.872  < 2e-16 ***
## Visiondome   0.363753   0.383218   0.949 0.342516    
## Visionopen   3.187501   0.416001   7.662 1.83e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 526.25  on 479  degrees of freedom
## Residual deviance: 295.20  on 472  degrees of freedom
## AIC: 311.2
## 
## Number of Fisher Scoring iterations: 6

GLMM analysis. We treat subjects as random. First, estimation by Laplace approximation.

library(lme4)

modlap <- 
  glmer(stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 | Subject),
        family = binomial,
        data   = ctsib)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.128219 (tol = 0.002, component 1)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?

summary(modlap)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 |  
##     Subject)
##    Data: ctsib
## 
##      AIC      BIC   logLik deviance df.resid 
##    247.6    285.1   -114.8    229.6      471 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.0149 -0.1266 -0.0176 -0.0005  4.8664 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Subject (Intercept) 8.197    2.863   
## Number of obs: 480, groups:  Subject, 40
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.920873  13.082211   0.758   0.4482    
## Sexmale      2.825280   1.746247   1.618   0.1057    
## Age         -0.003644   0.079832  -0.046   0.9636    
## Height      -0.151012   0.089770  -1.682   0.0925 .  
## Weight       0.058926   0.061577   0.957   0.3386    
## Surfacenorm  7.524393   1.164363   6.462 1.03e-10 ***
## Visiondome   0.683934   0.529835   1.291   0.1968    
## Visionopen   6.321072   1.074726   5.882 4.06e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Sexmal Age    Height Weight Srfcnr Visndm
## Sexmale      0.476                                          
## Age         -0.170  0.097                                   
## Height      -0.957 -0.391  0.050                            
## Weight       0.372 -0.366 -0.172 -0.560                     
## Surfacenorm  0.010  0.195 -0.010 -0.136  0.059              
## Visiondome  -0.012  0.029  0.001 -0.026  0.015  0.115       
## Visionopen   0.013  0.202 -0.007 -0.135  0.051  0.857  0.362
## convergence code: 0
## Model failed to converge with max|grad| = 0.128219 (tol = 0.002, component 1)
## Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?

GLMM estimation by Gauss-Hermite quadrature with 25 knots.

library(lme4)

modgh <- 
  glmer(stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 | Subject),
        nAGQ   = 25,
        family = binomial,
        data   = ctsib)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.290295 (tol = 0.002, component 1)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?

summary(modgh)

## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 |  
##     Subject)
##    Data: ctsib
## 
##      AIC      BIC   logLik deviance df.resid 
##    247.9    285.5   -115.0    229.9      471 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.8842 -0.1386 -0.0197 -0.0007  4.9020 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Subject (Intercept) 7.195    2.682   
## Number of obs: 480, groups:  Subject, 40
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 16.171589  12.710041   1.272   0.2032    
## Sexmale      3.096765   1.695698   1.826   0.0678 .  
## Age         -0.006675   0.076456  -0.087   0.9304    
## Height      -0.192267   0.088878  -2.163   0.0305 *  
## Weight       0.075164   0.059096   1.272   0.2034    
## Surfacenorm  7.285433   1.055120   6.905 5.03e-12 ***
## Visiondome   0.675909   0.527365   1.282   0.2000    
## Visionopen   6.088908   0.972366   6.262 3.80e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Sexmal Age    Height Weight Srfcnr Visndm
## Sexmale      0.509                                          
## Age         -0.166  0.094                                   
## Height      -0.959 -0.428  0.047                            
## Weight       0.386 -0.324 -0.168 -0.570                     
## Surfacenorm  0.166  0.256 -0.013 -0.281  0.147              
## Visiondome   0.007  0.034  0.000 -0.044  0.027  0.113       
## Visionopen   0.169  0.265 -0.008 -0.280  0.138  0.829  0.382
## convergence code: 0
## Model failed to converge with max|grad| = 0.290295 (tol = 0.002, component 1)
## Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?

GLMM estimation by PQL.

library(MASS)

modpql <- glmmPQL(stable ~ Sex + Age + Height + Weight + Surface + Vision,
                  random = ~ 1 | Subject,
                  family = binomial,
                  data   = ctsib)

## iteration 1

## iteration 2

## iteration 3

## iteration 4

## iteration 5

## iteration 6

## iteration 7

summary(modpql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: ctsib 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~1 | Subject
##         (Intercept)  Residual
## StdDev:    3.060712 0.5906232
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: stable ~ Sex + Age + Height + Weight + Surface + Vision 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 15.571494 13.498304 437  1.153589  0.2493
## Sexmale      3.355340  1.752614  35  1.914478  0.0638
## Age         -0.006638  0.081959  35 -0.080992  0.9359
## Height      -0.190819  0.092023  35 -2.073601  0.0455
## Weight       0.069467  0.062857  35  1.105155  0.2766
## Surfacenorm  7.724078  0.573578 437 13.466492  0.0000
## Visiondome   0.726464  0.325933 437  2.228873  0.0263
## Visionopen   6.485257  0.543980 437 11.921876  0.0000
##  Correlation: 
##             (Intr) Sexmal Age    Height Weight Srfcnr Visndm
## Sexmale      0.488                                          
## Age         -0.164  0.110                                   
## Height      -0.963 -0.388  0.041                            
## Weight       0.368 -0.374 -0.168 -0.555                     
## Surfacenorm  0.051  0.116  0.023 -0.114  0.055              
## Visiondome  -0.003  0.011  0.004 -0.017  0.011  0.087       
## Visionopen   0.056  0.125  0.026 -0.116  0.049  0.788  0.377
## 
## Standardized Within-Group Residuals:
##           Min            Q1           Med            Q3           Max 
## -7.3825387619 -0.2333403347 -0.0233564300 -0.0004216629  9.9310682723 
## 
## Number of Observations: 480
## Number of Groups: 40

Count response example

epilepsy data in ELMR.
- Response: seizures (number of seizures of each epilepsy patient).
- Variables: treat (binary, 1 = treatment or 0 = control group), expind (binary, 0 = baseline or 1 = experiment phase), timeadj (length of phase).

epilepsy <- as_tibble(epilepsy) %>% 
  mutate(period = rep(0:4, 59), 
         drug   = factor(c("placebo", "treatment"))[treat + 1],
         phase  = factor(c("baseline", "experiment"))[expind + 1]) %>%
  print()

## # A tibble: 295 x 9
##    seizures    id treat expind timeadj   age period drug    phase     
##       <dbl> <int> <dbl>  <dbl>   <dbl> <dbl>  <int> <fct>   <fct>     
##  1       11     1     0      0       8    31      0 placebo baseline  
##  2        5     1     0      1       2    31      1 placebo experiment
##  3        3     1     0      1       2    31      2 placebo experiment
##  4        3     1     0      1       2    31      3 placebo experiment
##  5        3     1     0      1       2    31      4 placebo experiment
##  6       11     2     0      0       8    30      0 placebo baseline  
##  7        3     2     0      1       2    30      1 placebo experiment
##  8        5     2     0      1       2    30      2 placebo experiment
##  9        3     2     0      1       2    30      3 placebo experiment
## 10        3     2     0      1       2    30      4 placebo experiment
## # … with 285 more rows

summary(epilepsy)

##     seizures            id         treat            expind       timeadj   
##  Min.   :  0.00   Min.   : 1   Min.   :0.0000   Min.   :0.0   Min.   :2.0  
##  1st Qu.:  3.00   1st Qu.:15   1st Qu.:0.0000   1st Qu.:1.0   1st Qu.:2.0  
##  Median :  6.00   Median :30   Median :1.0000   Median :1.0   Median :2.0  
##  Mean   : 12.86   Mean   :30   Mean   :0.5254   Mean   :0.8   Mean   :3.2  
##  3rd Qu.: 14.50   3rd Qu.:45   3rd Qu.:1.0000   3rd Qu.:1.0   3rd Qu.:2.0  
##  Max.   :151.00   Max.   :59   Max.   :1.0000   Max.   :1.0   Max.   :8.0  
##       age            period         drug            phase    
##  Min.   :18.00   Min.   :0   placebo  :140   baseline  : 59  
##  1st Qu.:22.00   1st Qu.:1   treatment:155   experiment:236  
##  Median :28.00   Median :2                                   
##  Mean   :28.85   Mean   :2                                   
##  3rd Qu.:35.00   3rd Qu.:3                                   
##  Max.   :57.00   Max.   :4

Numerical summary. Is there any difference in the number of seizures per week in the placebo and treatment groups?

epilepsy %>%
  group_by(drug, phase) %>%
  summarize(rate = mean(seizures / timeadj)) %>%
  xtabs(formula = rate ~ phase + drug)

##             drug
## phase         placebo treatment
##   baseline   3.848214  3.955645
##   experiment 4.303571  3.983871

Graphical summary.

epilepsy %>%
  ggplot() + 
  geom_line(mapping = aes(x = period, y = seizures, linetype = drug, group = id)) +
  xlim(1, 4) + 
  scale_y_sqrt(breaks = (0:10)^2) + 
  theme(legend.position = "top", legend.direction = "horizontal")

## Warning: Removed 59 row(s) containing missing values (geom_path).

ratesum <- epilepsy %>%
  group_by(id, phase, drug) %>%
  summarize(rate = mean(seizures / timeadj))

spread(ratesum, phase, rate) %>%
  ggplot() + 
  geom_point(mapping = aes(x = baseline, y = experiment, shape = drug)) + 
  scale_x_sqrt() + scale_y_sqrt() + 
  geom_abline(intercept = 0, slope = 1) + 
  theme(legend.position = "top", legend.direction = "horizontal")

In following analysis we remove patient 49, who seems to have unsually high rate of seizures.

epilo <- filter(epilepsy, id != 49)

We start with GLM analysis even though the model is not correct due to the grouping of observations.

modglm <- glm(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat), 
              family   = poisson,
              data     = epilo)
summary(modglm)

## 
## Call:
## glm(formula = seizures ~ offset(log(timeadj)) + expind + treat + 
##     I(expind * treat), family = poisson, data = epilo)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.4725  -2.3605  -1.0290   0.9001  14.0104  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.34761    0.03406  39.566  < 2e-16 ***
## expind             0.11184    0.04688   2.386    0.017 *  
## treat             -0.10682    0.04863  -2.197    0.028 *  
## I(expind * treat) -0.30238    0.06971  -4.338 1.44e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 2485.1  on 289  degrees of freedom
## Residual deviance: 2411.5  on 286  degrees of freedom
## AIC: 3449.7
## 
## Number of Fisher Scoring iterations: 5

GLMM analysis by Gauss-Hermite quadrature with 25 knots.

library(lme4)

modgh <- glmer(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat) + (1 | id), 
               nAGQ     = 25,
               family   = poisson,
               data     = epilo)
summary(modgh)

## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: poisson  ( log )
## Formula: seizures ~ offset(log(timeadj)) + expind + treat + I(expind *  
##     treat) + (1 | id)
##    Data: epilo
## 
##      AIC      BIC   logLik deviance df.resid 
##    877.7    896.1   -433.9    867.7      285 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8724 -0.8482 -0.1722  0.5697  9.8941 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.515    0.7176  
## Number of obs: 290, groups:  id, 58
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.035998   0.141256   7.334 2.23e-13 ***
## expind             0.111838   0.046877   2.386    0.017 *  
## treat             -0.008152   0.196524  -0.041    0.967    
## I(expind * treat) -0.302387   0.069714  -4.338 1.44e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) expind treat 
## expind      -0.175              
## treat       -0.718  0.126       
## I(xpnd*trt)  0.118 -0.672 -0.173

GLMM estimation by PQL.

library(MASS)

modpql <- glmmPQL(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat),
                  random = ~ 1 | id,
                  family = poisson,
                  data   = epilo)

## iteration 1

## iteration 2

## iteration 3

## iteration 4

## iteration 5

summary(modpql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: epilo 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept) Residual
## StdDev:   0.6819731 1.605408
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: seizures ~ offset(log(timeadj)) + expind + treat + I(expind *      treat) 
##                        Value  Std.Error  DF   t-value p-value
## (Intercept)        1.0807909 0.14370130 230  7.521094  0.0000
## expind             0.1118360 0.07576686 230  1.476055  0.1413
## treat             -0.0089374 0.20024399  56 -0.044632  0.9646
## I(expind * treat) -0.3023841 0.11268890 230 -2.683353  0.0078
##  Correlation: 
##                   (Intr) expind treat 
## expind            -0.278              
## treat             -0.718  0.200       
## I(expind * treat)  0.187 -0.672 -0.274
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.2956801 -0.5651709 -0.1501522  0.3243903  6.3134910 
## 
## Number of Observations: 290
## Number of Groups: 58

GEE (generalized estimation equation)

The quasi-likelihood (quasi-binomial, quasi-Poisson, etc) approach can be generalized to handle correlated, nonnormal responses.
Let \(\mathbf{Y}_i\) be the response vector of \(i\)-th individual. \[\begin{eqnarray*} \mathbb{E} \mathbf{Y}_i &=& \boldsymbol{\mu}_i \\ g(\boldsymbol{\mu}_i) &=& \mathbf{X}_i \boldsymbol{\beta} \\ \mathbf{V}_i &=& \text{Var}(\mathbf{Y}_i) = \phi \mathbf{A}_i^{1/2} \mathbf{R}_i(\boldsymbol{\alpha}) \mathbf{A}_i^{1/2}, \end{eqnarray*}\] where \(\mathbf{A}_i = \text{diag}(a(\boldsymbol{\mu}))\) captures the individual variances and \(\mathbf{R}_i(\boldsymbol{\alpha})\) is a working correlation matrix.
Commonly used working correlation are
- compound symmetry, or equicorrelation, or exchangeable correlation: \[ \mathbf{R}(\rho) = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} \]
- Autoregressive model: \[ \mathbf{R}(\rho) = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & & \rho^{n-2} \\ \rho^2 & \rho & 1 & & \vdots \\ \vdots & & & \ddots & \\ \rho^{n-1} & \cdots & & \rho & 1 \end{pmatrix} \]
- Unstructured correlation matrix
Given estimates of \(\phi\) and \(\boldsymbol{\alpha}\), we solve the estimation equation \[ \sum_i [D_{\boldsymbol{\beta}} \boldsymbol{\mu}_i(\boldsymbol{\beta})]^T \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol{\mu}_i) = \mathbf{0}. \]
Let’s revisit the ctsib (stability) data set. We use the exchangeable correlation, or equivalently, compound symmetry. Standard errors and Wald test are constructed using the sandwich estimators. \(\boldsymbol{\beta}\) estimates in GEE are about half the size of those from GLMM. The scale.fix = TRUE instructs the function geeglm not estimate the dispersion parameter \(\phi\), since for binary response the dispersion is always 1.

library(geepack)

modgeep <- geeglm(stable    ~ Sex + Age + Height + Weight + Surface + Vision,
                  id        = Subject, 
                  corstr    = "exchangeable",
                  scale.fix = TRUE,
                  data      = ctsib,
                  family    = binomial(link = "logit"))
summary(modgeep)

## 
## Call:
## geeglm(formula = stable ~ Sex + Age + Height + Weight + Surface + 
##     Vision, family = binomial(link = "logit"), data = ctsib, 
##     id = Subject, corstr = "exchangeable", scale.fix = TRUE)
## 
##  Coefficients:
##             Estimate  Std.err   Wald Pr(>|W|)    
## (Intercept)  8.62332  5.91992  2.122   0.1452    
## Sexmale      1.64488  0.90347  3.315   0.0687 .  
## Age         -0.01205  0.04802  0.063   0.8019    
## Height      -0.10211  0.04239  5.801   0.0160 *  
## Weight       0.04365  0.03399  1.649   0.1991    
## Surfacenorm  3.91632  0.56682 47.738 4.87e-12 ***
## Visiondome   0.35888  0.40403  0.789   0.3744    
## Visionopen   3.17990  0.46063 47.657 5.08e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation structure = exchangeable 
## Scale is fixed.
## 
##   Link = identity 
## 
## Estimated Correlation Parameters:
##       Estimate Std.err
## alpha   0.2185 0.04467
## Number of clusters:   40  Maximum cluster size: 12

Let’s revisit the epilepsy data set.

modgeep <- geeglm(seizures ~ offset(log(timeadj)) + expind + treat + I(expind*treat),
                  id       = id, 
                  family   = poisson,
                  corstr   = "ar1", 
                  data     = epilepsy,
                  subset  = (id != 49))
summary(modgeep)

## 
## Call:
## geeglm(formula = seizures ~ offset(log(timeadj)) + expind + treat + 
##     I(expind * treat), family = poisson, data = epilepsy, subset = (id != 
##     49), id = id, corstr = "ar1")
## 
##  Coefficients:
##                   Estimate  Std.err   Wald Pr(>|W|)    
## (Intercept)        1.31383  0.16159 66.103 4.44e-16 ***
## expind             0.15094  0.11077  1.857   0.1730    
## treat             -0.07973  0.19831  0.162   0.6877    
## I(expind * treat) -0.39872  0.17454  5.218   0.0223 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation structure = ar1 
## Estimated Scale Parameters:
## 
##             Estimate Std.err
## (Intercept)    10.61    2.35
##   Link = identity 
## 
## Estimated Correlation Parameters:
##       Estimate Std.err
## alpha   0.7831 0.05192
## Number of clusters:   58  Maximum cluster size: 5

Mixed Effects Models for Nonnormal Responses (ELMR Chapter 13)

Biostat 200C

Dr. Hua Zhou @ UCLA

May 21, 2020

Generalized linear mixed models (GLMM)

Overview of estimation and inference methods

Binary response example

Count response example

GEE (generalized estimation equation)