Display system information and load the tidyverse and faraway packages.

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.2  magrittr_1.5    tools_3.6.2     htmltools_0.4.0
##  [5] yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6   rmarkdown_2.1  
##  [9] knitr_1.28      stringr_1.4.0   xfun_0.13       digest_0.6.25  
## [13] rlang_0.4.5     evaluate_0.14
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.0     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(faraway)

Introduction

pulp <- as_tibble(pulp) %>%
  print(n = Inf)
## # A tibble: 20 x 2
##    bright operator
##     <dbl> <fct>   
##  1   59.8 a       
##  2   60   a       
##  3   60.8 a       
##  4   60.8 a       
##  5   59.8 a       
##  6   59.8 b       
##  7   60.2 b       
##  8   60.4 b       
##  9   59.9 b       
## 10   60   b       
## 11   60.7 c       
## 12   60.7 c       
## 13   60.5 c       
## 14   60.9 c       
## 15   60.3 c       
## 16   61   d       
## 17   60.8 d       
## 18   60.6 d       
## 19   60.5 d       
## 20   60.5 d

Graphical summary:

pulp %>%
  ggplot(mapping = aes(x = operator, y = bright)) + 
  geom_jitter(width=0.1, height=0.0) + 
  labs(x = "Operator", y = "Bright")

options(contrasts = c("contr.sum", "contr.poly"))
lmod <- lm(bright ~ operator, data = pulp)
summary(lmod)
## 
## Call:
## lm(formula = bright ~ operator, data = pulp)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.440 -0.195 -0.070  0.175  0.560 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 60.40000    0.07289 828.681   <2e-16 ***
## operator1   -0.16000    0.12624  -1.267    0.223    
## operator2   -0.34000    0.12624  -2.693    0.016 *  
## operator3    0.22000    0.12624   1.743    0.101    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.326 on 16 degrees of freedom
## Multiple R-squared:  0.4408, Adjusted R-squared:  0.3359 
## F-statistic: 4.204 on 3 and 16 DF,  p-value: 0.02261

The contrast contr.sum forces the coefficients of the 4 operators to sum to 0 (in terms of the dummy/one-hot coding). So the coefficient for the 4th operator is 0.28.
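We can verify this from the fitted coefficients:

# under contr.sum, the effect of the last level is minus the sum of the others
-sum(coef(lmod)[c("operator1", "operator2", "operator3")])  # 0.28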

model.matrix(lmod)
##    (Intercept) operator1 operator2 operator3
## 1            1         1         0         0
## 2            1         1         0         0
## 3            1         1         0         0
## 4            1         1         0         0
## 5            1         1         0         0
## 6            1         0         1         0
## 7            1         0         1         0
## 8            1         0         1         0
## 9            1         0         1         0
## 10           1         0         1         0
## 11           1         0         0         1
## 12           1         0         0         1
## 13           1         0         0         1
## 14           1         0         0         1
## 15           1         0         0         1
## 16           1        -1        -1        -1
## 17           1        -1        -1        -1
## 18           1        -1        -1        -1
## 19           1        -1        -1        -1
## 20           1        -1        -1        -1
## attr(,"assign")
## [1] 0 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$operator
## [1] "contr.sum"

The factor operator is significant with p-value 0.023.

anova(lmod, test = "F")
## Analysis of Variance Table
## 
## Response: bright
##           Df Sum Sq Mean Sq F value  Pr(>F)  
## operator   3   1.34 0.44667  4.2039 0.02261 *
## Residuals 16   1.70 0.10625                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mixed effects model

Example: one-way ANOVA

Estimation

ANOVA estimator

  • Consider the one-way ANOVA case. Assume we have a balanced design: \(n_1 = \cdots = n_a = n\). That is, each level has the same number of observations \(n\); for example, \(n=5\) in the pulp example.

  • Because \(\mathbb{E} Y_{ij} = \mu\), we can estimate \(\mu\) by the average of the \(y_{ij}\):

mean(pulp$bright)
## [1] 60.4
  • To estimate the variance component parameters \(\sigma_{\alpha}^2\) and \(\sigma_{\epsilon}^2\), the familiar ANOVA table gives the partition \[\begin{eqnarray*} \text{SST} &=& \text{SSE} + \text{SSA} \\ \sum_{i=1}^a \sum_{j=1}^n (y_{ij} - \bar{y}_{\cdot \cdot})^2 &=& \sum_{i=1}^a \sum_{j=1}^n (y_{ij} - \bar{y}_{i \cdot})^2 + \sum_{i=1}^a \sum_{j=1}^n (\bar{y}_{i \cdot} - \bar{y}_{\cdot \cdot})^2. \end{eqnarray*}\] Now we have (to be shown in HW4) \[\begin{eqnarray*} \mathbb{E} (\text{SSE}) &=& a(n-1) \sigma_{\epsilon}^2 \\ \mathbb{E} (\text{SSA}) &=& (a-1)(n \sigma_{\alpha}^2 + \sigma_{\epsilon}^2), \end{eqnarray*}\] which can be solved to obtain the estimators \[\begin{eqnarray*} \widehat{\sigma}_{\epsilon}^2 &=& \frac{\text{SSE}}{a(n-1)} = \text{MSE}, \\ \widehat{\sigma}_{\alpha}^2 &=& \frac{\text{SSA}/(a-1) - \widehat{\sigma}_{\epsilon}^2}{n} = \frac{\text{MSA} - \text{MSE}}{n}. \end{eqnarray*}\]

  • For the pulp data, the ANOVA table is

(aovmod <- aov(bright ~ operator, data = pulp) %>%
  summary())
##             Df Sum Sq Mean Sq F value Pr(>F)  
## operator     3   1.34  0.4467   4.204 0.0226 *
## Residuals   16   1.70  0.1062                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
str(aovmod)
## List of 1
##  $ :Classes 'anova' and 'data.frame':    2 obs. of  5 variables:
##   ..$ Df     : num [1:2] 3 16
##   ..$ Sum Sq : num [1:2] 1.34 1.7
##   ..$ Mean Sq: num [1:2] 0.447 0.106
##   ..$ F value: num [1:2] 4.2 NA
##   ..$ Pr(>F) : num [1:2] 0.0226 NA
##  - attr(*, "class")= chr [1:2] "summary.aov" "listof"

We estimate \(\sigma_{\alpha}^2\) by

(aovmod[[1]][["Mean Sq"]][1] - aovmod[[1]][["Mean Sq"]][2]) / 5  # (MSA - MSE) / n
## [1] 0.06808333

and \(\sigma_{\epsilon}^2\) by

aovmod[[1]][["Mean Sq"]][2]  # MSE
## [1] 0.10625
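Equivalently, we can compute these estimators from scratch (a small sketch; ave() returns each observation's group mean):

a <- 4  # number of operators
n <- 5  # observations per operator
gmean <- ave(pulp$bright, pulp$operator)   # per-row group means
sse <- sum((pulp$bright - gmean)^2)        # within-group sum of squares
ssa <- sum((gmean - mean(pulp$bright))^2)  # between-group sum of squares
c(sig2e = sse / (a * (n - 1)),                        # MSE, 0.10625
  sig2a = (ssa / (a - 1) - sse / (a * (n - 1))) / n)  # (MSA - MSE) / n, 0.06808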
  • Drawbacks of ANOVA estimators.
    1. When MSA < MSE, we obtain a negative estimate \(\widehat{\sigma}_{\alpha}^2 < 0\), which is inadmissible for a variance.
    2. Hard to generalize to unbalanced ANOVA or more complicated designs.

Maximum likelihood estimation (MLE)

  • For the mixed effects model \[\begin{eqnarray*} \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\gamma} + \boldsymbol{\epsilon}, \end{eqnarray*}\] where \(\boldsymbol{\epsilon} \sim N(\mathbf{0}_n, \sigma^2 \mathbf{I})\) and \(\boldsymbol{\gamma} \sim N(\mathbf{0}_q, \boldsymbol{\Sigma})\) are independent of each other, we have \[ \mathbf{Y} \sim N(\mathbf{X} \boldsymbol{\beta}, \mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I}). \] So the likelihood is \[ \frac{1}{(2\pi)^{n/2} \det(\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I})^{1/2}} e^{- \frac 12 (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})^T (\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I})^{-1} (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})} \] and the log-likelihood is \[ \ell(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2) = - \frac n2 \log(2\pi) - \frac 12 \log \det (\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I}) - \frac 12 (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})^T (\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I})^{-1} (\mathbf{y} - \mathbf{X} \boldsymbol{\beta}). \]
  • We can maximize the log-likelihood function to obtain the MLE.

  • Exercise (HW4): Derive the MLE for the balanced one-way ANOVA example.

  • One drawback of MLE is that it produces biased estimates of the variance component parameters \(\sigma^2\) and \(\boldsymbol{\Sigma}\). E.g., in the linear regression case (fixed effects model), the MLE for \(\sigma^2\) is \[ \widehat{\sigma}_{\text{MLE}}^2 = \frac{\text{RSS}}{n}, \] whereas the unbiased estimator is \[ \widehat{\sigma}^2 = \frac{\text{RSS}}{n-p}, \] where \(p\) is the number of parameters. (A quick numerical check follows the next bullet.)

  • Let’s find the MLE for the pulp data using the lme4 package. The syntax (1 | operator) instructs that the data are grouped by operator, and the 1 indicates that the random effect is constant within each group. The default method is REML, so we set REML = FALSE to compute the MLE instead.
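First, the quick bias check promised above, using the fixed-effects fit lmod from earlier (RSS = 1.70, n = 20, p = 4 from the ANOVA table):

rss <- sum(residuals(lmod)^2)
rss / 20        # MLE of sigma^2: 0.085, biased downward
rss / (20 - 4)  # unbiased estimate: 0.10625, the MSE in the ANOVA table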

library(lme4)

smod <- lmer(bright ~ 1 + (1 | operator), data = pulp, REML = FALSE)
summary(smod)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: bright ~ 1 + (1 | operator)
##    Data: pulp
## 
##      AIC      BIC   logLik deviance df.resid 
##     22.5     25.5     -8.3     16.5       17 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.50554 -0.78116 -0.06353  0.65850  1.56232 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  operator (Intercept) 0.04575  0.2139  
##  Residual             0.10625  0.3260  
## Number of obs: 20, groups:  operator, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  60.4000     0.1294   466.7
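As a sanity check (a minimal sketch, not how lme4 computes things internally), we can plug the MLE \(\widehat{\mu} = 60.4\), \(\widehat{\sigma}_{\alpha}^2 = 0.04575\), \(\widehat{\sigma}_{\epsilon}^2 = 0.10625\) into the log-likelihood formula above and compare with logLik(smod); loglik_lmm is a helper defined here, not an lme4 function:

y <- pulp$bright
Z <- model.matrix(~ operator - 1, data = pulp)   # 20 x 4 indicator matrix
loglik_lmm <- function(mu, sig2a, sig2e) {
  V <- sig2a * tcrossprod(Z) + sig2e * diag(20)  # Z Sigma Z' + sigma^2 I
  r <- y - mu
  -0.5 * (20 * log(2 * pi) + as.numeric(determinant(V)$modulus) +
            drop(t(r) %*% solve(V, r)))
}
loglik_lmm(60.4, 0.04575, 0.10625)  # should match logLik(smod) above, -8.3
logLik(smod)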

Restricted MLE (REML)

  • The restricted maximum likelihood (REML) method tries to reduce the bias in variance component estimates.

  • Assume \(\mathbf{X}\) has full column rank \(p\). Let \(\mathbf{K} \in \mathbb{R}^{n \times (n-p)}\) be a basis of the space \({\cal N}(\mathbf{X}^T)\), which is orthogonal to \({\cal C}(\mathbf{X})\). Then \[ \mathbf{K}^T \mathbf{Y} \sim N(\mathbf{0}_{n-p}, \mathbf{K}^T (\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I}) \mathbf{K}). \] We estimate the variance component parameters \((\sigma^2, \boldsymbol{\Sigma})\) by MLE using the transformed data \((\mathbf{K}^T \mathbf{y}, \mathbf{K}^T \mathbf{Z})\), then estimate the fixed effects \(\boldsymbol{\beta}\) by generalized least squares. It can be shown that the REML estimate does not depend on the choice of \(\mathbf{K}\) (a small sketch of these error contrasts appears at the end of this subsection).

  • Let’s find the REML for the pulp data using lme4.

library(lme4)

mmod <- lmer(bright ~ 1 + (1 | operator), data = pulp, REML = TRUE)
summary(mmod)
## Linear mixed model fit by REML ['lmerMod']
## Formula: bright ~ 1 + (1 | operator)
##    Data: pulp
## 
## REML criterion at convergence: 18.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4666 -0.7595 -0.1244  0.6281  1.6012 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  operator (Intercept) 0.06808  0.2609  
##  Residual             0.10625  0.3260  
## Number of obs: 20, groups:  operator, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  60.4000     0.1494   404.2

We find the REML estimates are exactly the same as the ANOVA estimates. Exercise (HW4): show this.
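To make the error contrasts concrete, here is a small sketch for the pulp data, where \(\mathbf{X}\) is a column of ones, so \(p = 1\):

X <- matrix(1, nrow = 20, ncol = 1)
# orthonormal basis of N(X^T): the last n - p columns of the full Q factor
K <- qr.Q(qr(X), complete = TRUE)[, -1, drop = FALSE]  # 20 x 19
max(abs(crossprod(K, X)))  # essentially 0: K'X = 0, so K'y is free of the fixed effects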

Inference

LRT and adjusted F test for fixed effects

  • If we compare two nested models that differ only in their fixed effects, we can use the standard likelihood ratio test (LRT). Remember that we have to use the MLE (not REML) for the LRT.

  • F tests for fixed effects need to use adjusted degrees of freedom (Kenward-Roger adjusted F-test for REML), as implemented in the pbkrtest package.

Parametric bootstrap

  • Testing the variance component parameters, e.g., \(H_0:\sigma_{\alpha}^2=0\), can be hard because of the boundary condition (the estimator has to be nonnegative). The conventional \(\chi^2\) null distribution for the LRT can be wrong. E.g., for the pulp data, the LRT using the conventional \(\chi_1^2\) null distribution gives a non-significant p-value.
nullmod <- lm(bright ~ 1, data = pulp)
(lrtstat <- as.numeric(2 * (logLik(smod, REML = FALSE) - logLik(nullmod))))
## [1] 2.568371
# fishy result using LRT
pchisq(lrtstat, 1, lower.tail = FALSE)
## [1] 0.1090199
  • The idea of the parametric bootstrap is to generate new \(y\) from the fitted null model many times. For each simulation replicate, we calculate the LRT statistic. The p-value is then estimated by the proportion of simulation replicates whose LRT statistic exceeds the observed one, \(2.5684\).
B <- 1000
lrstat <- numeric(B)
set.seed(123) # for reproducibility
for (i in 1:B) {
  by <- unlist(simulate(nullmod))  # simulate response from the fitted null model
  bnull <- lm(by ~ 1)              # null fit on the simulated data
  #balt  <- refitML(smod, by)
  balt  <- suppressMessages(lmer(by ~ 1 + (1 | operator), pulp, REML = FALSE))  # alternative fit
  lrstat[i] <- as.numeric(2 * (logLik(balt, REML = FALSE) - logLik(bnull)))     # LRT statistic
}

Then the bootstrap p-value is

# parametric bootstrap p-value
(pval <- mean(lrstat > 2.5684))
## [1] 0.019

with standard error

sqrt(pval * (1 - pval) / B)
## [1] 0.004317291
  • Advantages of the parametric bootstrap:
    • It is not restricted to MLE; it applies to many other estimators.
    • It does not rely on the dubious asymptotic \(\chi^2\) distribution.
    • It can be used to generate confidence intervals (see the example below).
    • It applies to inference for both \(\boldsymbol{\beta}\) and the variance component parameters.
  • Drawback of the parametric bootstrap: it is computationally intensive.
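For instance, lme4’s confint() method can produce parametric bootstrap confidence intervals directly (nsim is kept moderate here; results vary with the seed):

# parametric bootstrap CIs for the standard deviations and the intercept
confint(mmod, method = "boot", nsim = 500, oldNames = FALSE)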

Exact LRT and RLRT test

Another simulation-based method is provided by the RLRsim package.

library(RLRsim)

exactLRT(smod, nullmod)
## No restrictions on fixed effects. REML-based inference preferable.
## 
##  simulated finite sample distribution of LRT. (p-value based on 10000
##  simulated values)
## 
## data:  
## LRT = 2.5684, p-value = 0.0241
exactRLRT(mmod)
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 3.4701, p-value = 0.0224

Estimate random effects

ranef(mmod)$operator
##   (Intercept)
## a  -0.1219403
## b  -0.2591231
## c   0.1676679
## d   0.2133955

Compare these to the coefficients from the fixed effects model:

(cc <- model.tables(aov(bright ~ operator, data = pulp)))
## Tables of effects
## 
##  operator 
## operator
##     a     b     c     d 
## -0.16 -0.34  0.22  0.28

We find that the estimated random effects are uniformly smaller in magnitude than the corresponding fixed effect estimates:

cc[[1]]$operator / ranef(mmod)$operator
##   (Intercept)
## a    1.312117
## b    1.312117
## c    1.312117
## d    1.312117

This shrinkage toward zero is why such estimates (which can be viewed as posterior means in a Bayesian formulation) are often called shrinkage estimators. Exercise: prove this in HW4 for the balanced one-way ANOVA example.
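For the balanced one-way ANOVA, the shrinkage factor has the closed form \(\sigma_{\alpha}^2 / (\sigma_{\alpha}^2 + \sigma_{\epsilon}^2 / n)\) (the result to be proven in HW4); a quick numerical check with the REML estimates:

vc <- as.data.frame(VarCorr(mmod))
sig2a <- vc$vcov[1]          # operator variance component, 0.06808
sig2e <- vc$vcov[2]          # residual variance, 0.10625
sig2a / (sig2a + sig2e / 5)  # reciprocal of the ratio 1.312117 above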

Prediction

fixef(mmod) + ranef(mmod)$operator
##   (Intercept)
## a    60.27806
## b    60.14088
## c    60.56767
## d    60.61340

or

predict(mmod, newdata = data.frame(operator="a"))
##        1 
## 60.27806
predict(mmod, re.form = ~ 0)
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 60.4 60.4 60.4 60.4 60.4 60.4 60.4 60.4 60.4 60.4 60.4 60.4 60.4 60.4 60.4 60.4 
##   17   18   19   20 
## 60.4 60.4 60.4 60.4
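For an operator not seen in the data, predict() falls back to the population level (allow.new.levels is an lme4 argument; the result equals the fixed intercept 60.4):

# prediction for a new, unobserved operator uses the fixed effects only
predict(mmod, newdata = data.frame(operator = "e"), allow.new.levels = TRUE)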

Diagnostics

qqnorm(residuals(mmod), main="")

plot(fitted(mmod), residuals(mmod), xlab = "Fitted", ylab = "Residuals")
abline(h=0)

Blocks as random effects

penicillin
##    treat  blend yield
## 1      A Blend1    89
## 2      B Blend1    88
## 3      C Blend1    97
## 4      D Blend1    94
## 5      A Blend2    84
## 6      B Blend2    77
## 7      C Blend2    92
## 8      D Blend2    79
## 9      A Blend3    81
## 10     B Blend3    87
## 11     C Blend3    87
## 12     D Blend3    85
## 13     A Blend4    87
## 14     B Blend4    92
## 15     C Blend4    89
## 16     D Blend4    84
## 17     A Blend5    79
## 18     B Blend5    81
## 19     C Blend5    80
## 20     D Blend5    88
penicillin %>%
  ggplot() + 
  geom_point(mapping = aes(x = treat, y = yield, color = blend))

penicillin %>%
  ggplot() + 
  geom_point(mapping = aes(x = blend, y = yield, color = treat))

op <- options(contrasts = c("contr.sum", "contr.poly"))
lmod <- aov(yield ~ blend + treat, data = penicillin)
summary(lmod)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## blend        4    264   66.00   3.504 0.0407 *
## treat        3     70   23.33   1.239 0.3387  
## Residuals   12    226   18.83                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coef(lmod)
## (Intercept)      blend1      blend2      blend3      blend4      treat1 
##          86           6          -3          -1           2          -2 
##      treat2      treat3 
##          -1           3
mmod <- lmer(yield ~ treat + (1 | blend), data = penicillin)
summary(mmod)
## Linear mixed model fit by REML ['lmerMod']
## Formula: yield ~ treat + (1 | blend)
##    Data: penicillin
## 
## REML criterion at convergence: 106.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4152 -0.5017 -0.1644  0.6830  1.2836 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  blend    (Intercept) 11.79    3.434   
##  Residual             18.83    4.340   
## Number of obs: 20, groups:  blend, 5
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   86.000      1.817  47.341
## treat1        -2.000      1.681  -1.190
## treat2        -1.000      1.681  -0.595
## treat3         3.000      1.681   1.785
## 
## Correlation of Fixed Effects:
##        (Intr) treat1 treat2
## treat1  0.000              
## treat2  0.000 -0.333       
## treat3  0.000 -0.333 -0.333
options(op)

The random effect estimates are again shrunken versions of the fixed effect estimates (a check of the constant shrinkage factor follows the output below).

ranef(mmod)$blend
##        (Intercept)
## Blend1   4.2878788
## Blend2  -2.1439394
## Blend3  -0.7146465
## Blend4   1.4292929
## Blend5  -2.8585859
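As with the pulp data, a quick check shows the elementwise ratio of fixed to random effects is constant (under contr.sum, the fifth blend effect is minus the sum of the first four):

bfix <- coef(lmod)[c("blend1", "blend2", "blend3", "blend4")]
bfix <- c(bfix, blend5 = -sum(bfix))       # recover the 5th blend effect
bfix / ranef(mmod)$blend[["(Intercept)"]]  # a constant shrinkage factor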

Test fixed effects

  • Kenward-Roger adjusted F-tests for REML.
library(pbkrtest)

amod_reml <- lmer(yield ~ treat + (1 | blend), data = penicillin, REML = TRUE)
nmod_reml <- lmer(yield ~ 1 + (1 | blend), data = penicillin, REML = TRUE)
KRmodcomp(amod_reml, nmod_reml)
## F-test with Kenward-Roger approximation; time: 0.16 sec
## large : yield ~ treat + (1 | blend)
## small : yield ~ 1 + (1 | blend)
##          stat     ndf     ddf F.scaling p.value
## Ftest  1.2389  3.0000 12.0000         1  0.3387
  • LRT using \(\chi^2\) null distribution.
amod_mle <- lmer(yield ~ treat + (1 | blend), data = penicillin, REML = FALSE)
nmod_mle <- lmer(yield ~ 1 + (1 | blend), data = penicillin, REML = FALSE)
as.numeric(2 * (logLik(amod_mle, REML = FALSE) - logLik(nmod_mle, REML = FALSE)))
## [1] 4.047367

The \(\chi_3^2\) approximation gives p-value

pchisq(4.0474, 3, lower.tail = FALSE)
## [1] 0.2563911
  • Parametric bootstrap. Now we use the parametric bootstrap to obtain a p-value for the LRT.
set.seed(123)
B <- 1000
lrstat <- numeric(B)
for (i in 1:B) {
  ryield <- unlist(simulate(nmod_mle))
  nmodr <- suppressMessages(lmer(ryield ~ 1 + (1 | blend), data = penicillin, REML = FALSE))
  amodr <- suppressMessages(lmer(ryield ~ treat + (1 | blend), data = penicillin, REML = FALSE))
  lrstat[i] <- 2 * (logLik(amodr, REML = FALSE) - logLik(nmodr, REML = FALSE))
}

The bootstrap p-value is

# p-value estimated by parametric bootstrap
(pval <- mean(lrstat > 4.0474))
## [1] 0.352

with standard error

sqrt(pval * (1 - pval) / B)
## [1] 0.01510285
  • The pbkrtest package automates this parametric bootstrap procedure (PBtest), along with the LRT and other tests for fixed effects. The LRT and parametric bootstrap results are similar to what we obtained above. Note the F statistic here is the LRT statistic divided by its degrees of freedom, which is assumed to be F-distributed (a quick check follows the output).
library(pbkrtest)

set.seed(123)
PBmodcomp(amod_reml, nmod_reml) %>%
  summary()
## Bootstrap test; time: 17.41 sec;samples: 1000; extremes: 342;
## large : yield ~ treat + (1 | blend)
## small : yield ~ 1 + (1 | blend)
##            stat     df    ddf p.value
## LRT      4.1372 3.0000         0.2470
## PBtest   4.1372                0.3427
## Gamma    4.1372                0.3392
## Bartlett 3.3702 3.0000         0.3380
## F        1.3791 3.0000 2.7455  0.4087
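A quick check of that last remark, using the numbers in the table:

4.1372 / 3  # matches the F statistic 1.3791 reported by PBmodcomp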

Test random effects

  • Let’s test the blend random effects by the parametric bootstrap. First we calculate the LRT statistic.
rmod <- lmer(yield ~ treat + (1 | blend), data = penicillin, REML = FALSE)
nlmod <- lm(yield ~ treat, data = penicillin)
as.numeric(2 * (logLik(rmod, REML = FALSE) - logLik(nlmod)))
## [1] 3.453634

Parametric bootstrap:

B <- 1000
lrstatf <- numeric(B)
for (i in 1:B) {
  ryield <- unlist(simulate(nlmod))  
  nlmodr <- lm(ryield ~ treat, data = penicillin)
  rmodr <- suppressMessages(lmer(ryield ~ treat + (1 | blend), data = penicillin, REML = FALSE))
  lrstatf[i] <- 2 * (logLik(rmodr, REML = FALSE) - logLik(nlmodr))
}
mean(lrstatf > 3.453634)
## [1] 0.046
  • The exact LRT simulation method.
library(RLRsim)

exactLRT(rmod, nlmod)
## No restrictions on fixed effects. REML-based inference preferable.
## 
##  simulated finite sample distribution of LRT. (p-value based on 10000
##  simulated values)
## 
## data:  
## LRT = 3.4536, p-value = 0.0412

Other designs

More complicated designs discussed in ELMR (split plots, cross effects, multi-level model) are not covered in this course.