The fundamental analogy for cross-validation is to the collection of new data. That is, predicting the response in each fold from the model fit to data in the other folds is like using the model fit to all of the data to predict the response for new cases from the values of the predictors for those new cases. As we explained in the introductory vignette on cross-validating regression models, the application of this idea to independently sampled cases is straightforward—simply partition the data into random folds of equal size and leave each fold out in turn, or, in the case of LOO CV, simply omit each case in turn.

In contrast, mixed-effects models are fit to *dependent* data,
in which cases as clustered, such as hierarchical data, where the
clusters comprise higher-level units (e.g., students clustered in
schools), or longitudinal data, where the clusters are individuals and
the cases repeated observations on the individuals over time.^{1}

We can think of two approaches to applying cross-validation to
clustered data:^{2}

Treat CV as analogous to predicting the response for one or more cases in a

*newly observed cluster*. In this instance, the folds comprise one or more whole clusters; we refit the model with all of the cases in clusters in the current fold removed; and then we predict the response for the cases in clusters in the current fold. These predictions are based only on fixed effects because the random effects for the omitted clusters are presumably unknown, as they would be for data on cases in newly observed clusters.Treat CV as analogous to predicting the response for a newly observed case in an

*existing cluster*. In this instance, the folds comprise one or more individual cases, and the predictions can use both the fixed and random effects.

Following their use by Raudenbush & Bryk
(2002), data from the 1982 *High School and Beyond* (HSB)
survey have become a staple of the literature on mixed-effects models.
The HSB data are used by Fox & Weisberg
(2019, sec. 7.2.2) to illustrate the application of linear mixed
models to hierarchical data, and we’ll closely follow their example
here.

The HSB data are included in the `MathAchieve`

and
`MathAchSchool`

data sets in the **nlme**
package (Pinheiro & Bates, 2000).
`MathAchieve`

includes individual-level data on 7185 students
in 160 high schools, and `MathAchSchool`

includes
school-level data:

```
data("MathAchieve", package = "nlme")
dim(MathAchieve)
#> [1] 7185 6
head(MathAchieve, 3)
#> Grouped Data: MathAch ~ SES | School
#> School Minority Sex SES MathAch MEANSES
#> 1 1224 No Female -1.528 5.876 -0.428
#> 2 1224 No Female -0.588 19.708 -0.428
#> 3 1224 No Male -0.528 20.349 -0.428
tail(MathAchieve, 3)
#> Grouped Data: MathAch ~ SES | School
#> School Minority Sex SES MathAch MEANSES
#> 7183 9586 No Female 1.332 19.641 0.627
#> 7184 9586 No Female -0.008 16.241 0.627
#> 7185 9586 No Female 0.792 22.733 0.627
data("MathAchSchool", package = "nlme")
dim(MathAchSchool)
#> [1] 160 7
head(MathAchSchool, 2)
#> School Size Sector PRACAD DISCLIM HIMINTY MEANSES
#> 1224 1224 842 Public 0.35 1.597 0 -0.428
#> 1288 1288 1855 Public 0.27 0.174 0 0.128
tail(MathAchSchool, 2)
#> School Size Sector PRACAD DISCLIM HIMINTY MEANSES
#> 9550 9550 1532 Public 0.45 0.791 0 0.059
#> 9586 9586 262 Catholic 1.00 -2.416 0 0.627
```

The first few students are in school number 1224 and the last few in school 9586.

We’ll use only the `School`

, `SES`

(students’
socioeconomic status), and `MathAch`

(their score on a
standardized math-achievement test) variables in the
`MathAchieve`

data set, and `Sector`

(`"Catholic"`

or `"Public"`

) in the
`MathAchSchool`

data set.

Some data-management is required before fitting a mixed-effects model
to the HSB data, for which we use the **dplyr** package
(Wickham, François, Henry, Müller, & Vaughan,
2023):

```
library("dplyr")
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
MathAchieve %>% group_by(School) %>%
summarize(mean.ses = mean(SES)) -> Temp
Temp <- merge(MathAchSchool, Temp, by = "School")
HSB <- merge(Temp[, c("School", "Sector", "mean.ses")],
MathAchieve[, c("School", "SES", "MathAch")], by = "School")
names(HSB) <- tolower(names(HSB))
HSB$cses <- with(HSB, ses - mean.ses)
```

In the process, we created two new school-level variables:
`meanses`

, which is the average SES for students in each
school; and `cses`

, which is school-average SES centered at
its mean. For details, see Fox & Weisberg
(2019, sec. 7.2.2).

Still following Fox and Weisberg, we proceed to use the
`lmer()`

function in the **lme4** package (Bates, Mächler, Bolker, & Walker, 2015) to
fit a mixed model for math achievement to the HSB data:

```
library("lme4")
#> Loading required package: Matrix
hsb.lmer <- lmer(mathach ~ mean.ses * cses + sector * cses
+ (cses | school), data = HSB)
summary(hsb.lmer, correlation = FALSE)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: mathach ~ mean.ses * cses + sector * cses + (cses | school)
#> Data: HSB
#>
#> REML criterion at convergence: 46504
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.159 -0.723 0.017 0.754 2.958
#>
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> school (Intercept) 2.380 1.543
#> cses 0.101 0.318 0.39
#> Residual 36.721 6.060
#> Number of obs: 7185, groups: school, 160
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 12.128 0.199 60.86
#> mean.ses 5.333 0.369 14.45
#> cses 2.945 0.156 18.93
#> sectorCatholic 1.227 0.306 4.00
#> mean.ses:cses 1.039 0.299 3.48
#> cses:sectorCatholic -1.643 0.240 -6.85
```

We can then cross-validate at the cluster (i.e., school) level,

```
library("cv")
#> Loading required package: doParallel
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
cv(hsb.lmer,
k = 10,
clusterVariables = "school",
seed = 5240)
#> R RNG seed set to 5240
#> 10-Fold Cross Validation based on 160 {school} clusters
#> criterion: mse
#> cross-validation criterion = 39.157
#> bias-adjusted cross-validation criterion = 39.148
#> 95% CI for bias-adjusted CV criterion = (38.066, 40.231)
#> full-sample criterion = 39.006
```

or at the case (i.e., student) level,

```
cv(hsb.lmer, seed = 1575)
#> R RNG seed set to 1575
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.00587228 (tol = 0.002, component 1)
#> boundary (singular) fit: see help('isSingular')
#> 10-Fold Cross Validation
#> criterion: mse
#> cross-validation criterion = 37.445
#> bias-adjusted cross-validation criterion = 37.338
#> 95% CI for bias-adjusted CV criterion = (36.288, 38.388)
#> full-sample criterion = 36.068
```

For cluster-level CV, the `clusterVariables`

argument
tells `cv()`

how the clusters are defined. Were there more
than one clustering variable, say classes within schools, these would be
provided as a character vector of variable names:
`clusterVariables = c("school", "class")`

. For cluster-level
CV, the default is `k = "loo"`

, that is, leave one cluster
out at a time; we instead specify `k = 10`

folds of clusters,
each fold therefore comprising \(160/10 =
16\) schools.

If the `clusterVariables`

argument is omitted, then
case-level CV is employed, with `k = 10`

folds as the
default, here each with \(7185/10 \approx
719\) students. Notice that one of the 10 models refit with a
fold removed failed to converge. Convergence problems are common in
mixed-effects modeling. The apparent issue here is that an estimated
variance component is close to or equal to 0, which is at a boundary of
the parameter space. That shouldn’t disqualify the fitted model for the
kind of prediction required for cross-validation.

There is also a `cv()`

method for linear mixed models fit
by the `lme()`

function in the **nlme** package,
and the arguments for `cv()`

in this case are the same as for
a model fit by `lmer()`

or `glmer()`

. We
illustrate with the mixed model fit to the HSB data:

```
library("nlme")
#>
#> Attaching package: 'nlme'
#> The following object is masked from 'package:lme4':
#>
#> lmList
#> The following object is masked from 'package:dplyr':
#>
#> collapse
hsb.lme <- lme(
mathach ~ mean.ses * cses + sector * cses,
random = ~ cses | school,
data = HSB,
control = list(opt = "optim")
)
summary(hsb.lme)
#> Linear mixed-effects model fit by REML
#> Data: HSB
#> AIC BIC logLik
#> 46525 46594 -23252
#>
#> Random effects:
#> Formula: ~cses | school
#> Structure: General positive-definite, Log-Cholesky parametrization
#> StdDev Corr
#> (Intercept) 1.541177 (Intr)
#> cses 0.018174 0.006
#> Residual 6.063492
#>
#> Fixed effects: mathach ~ mean.ses * cses + sector * cses
#> Value Std.Error DF t-value p-value
#> (Intercept) 12.1282 0.19920 7022 60.886 0e+00
#> mean.ses 5.3367 0.36898 157 14.463 0e+00
#> cses 2.9421 0.15122 7022 19.456 0e+00
#> sectorCatholic 1.2245 0.30611 157 4.000 1e-04
#> mean.ses:cses 1.0444 0.29107 7022 3.588 3e-04
#> cses:sectorCatholic -1.6421 0.23312 7022 -7.044 0e+00
#> Correlation:
#> (Intr) men.ss cses sctrCt mn.ss:
#> mean.ses 0.256
#> cses 0.000 0.000
#> sectorCatholic -0.699 -0.356 0.000
#> mean.ses:cses 0.000 0.000 0.295 0.000
#> cses:sectorCatholic 0.000 0.000 -0.696 0.000 -0.351
#>
#> Standardized Within-Group Residuals:
#> Min Q1 Med Q3 Max
#> -3.170106 -0.724877 0.014892 0.754263 2.965498
#>
#> Number of Observations: 7185
#> Number of Groups: 160
cv(hsb.lme,
k = 10,
clusterVariables = "school",
seed = 5240)
#> R RNG seed set to 5240
#> 10-Fold Cross Validation based on 160 {school} clusters
#> criterion: mse
#> cross-validation criterion = 39.157
#> bias-adjusted cross-validation criterion = 39.149
#> 95% CI for bias-adjusted CV criterion = (38.066, 40.232)
#> full-sample criterion = 39.006
cv(hsb.lme, seed = 1575)
#> R RNG seed set to 1575
#> 10-Fold Cross Validation
#> criterion: mse
#> cross-validation criterion = 37.442
#> bias-adjusted cross-validation criterion = 37.402
#> 95% CI for bias-adjusted CV criterion = (36.351, 38.453)
#> full-sample criterion = 36.147
```

We used the same random-number generator seeds as in the previous
example cross-validating the model fit by `lmer()`

, and so
the same folds are employed in both cases.^{3} The estimated
covariance components and fixed effects in the summary output differ
slightly between the `lmer()`

and `lme()`

solutions, although both functions seek to maximize the REML criterion.
This is, of course, to be expected when different algorithms are used
for numerical optimization. To the precision reported, the cluster-level
CV results for the `lmer()`

and `lme()`

models are
identical, while the case-level CV results are very similar but not
identical.

We introduce an artificial data set that exemplifies aspects of cross-validation particular to hierarchical models. Using this data set, we show that model comparisons employing cluster-based and those employing case-based cross-validation may not agree on a “best” model. Furthermore, commonly used measures of fit, such as mean-squared error, do not necessarily become smaller as models become larger, even when the models are nested, and even when the measure of fit is computed for the whole data set.

Consider a researcher studying improvement in a skill, yodeling, for example, among students enrolled in a four-year yodeling program. The plan is to measure each student’s skill level at the beginning of the program and every year thereafter until the end of the program, resulting in five annual measurements for each student. It turns out that yodeling appeals to students of all ages, and students enrolling in the program range in age from 20 to 70. Moreover, participants’ untrained yodeling skill is similar at all ages, as is their rate of progress with training. All students complete the four-year program.

The researcher, who has more expertise in yodeling than in modeling, decides to model the response, \(y\), yodeling skill, as a function of age, \(x\), reasoning that students get older during their stay in the program, and (incorrectly) that age can serve as a proxy for elapsed time. The researcher knows that a mixed model should be used to account for clustering due to the expected similarity of measurements taken from each student.

We start by generating the data, using parameters consistent with the
description above and meant to highlight the issues that arise in
cross-validating mixed-effects models:^{4}

```
# Parameters:
set.seed(9693)
Nb <- 100 # number of groups
Nw <- 5 # number of individuals within groups
Bb <- 0 # between-group regression coefficient on group mean
SDre <-
2.0 # between-group SD of random level relative to group mean of x
SDwithin <- 0.5 # within group SD
Bw <- 1 # within group effect of x
Ay <- 10 # intercept for response
Ax <- 20 # starting level of x
Nx <- Nw * 10 # number of distinct x values
Data <- data.frame(group = factor(rep(1:Nb, each = Nw)),
x = Ax + rep(1:Nx, length.out = Nw * Nb)) |>
within({
xm <- ave(x, group, FUN = mean) # within-group mean
y <- Ay +
Bb * xm + # contextual effect
Bw * (x - xm) + # within-group effect
rnorm(Nb, sd = SDre)[group] + # random level by group
rnorm(Nb * Nw, sd = SDwithin) # random error within groups
})
```

Here is a scatterplot of the data for a representative group of 10
(without loss of generality, the first 10) of 100 students, showing the
95% concentration ellipse for each cluster:^{5}

```
library("lattice")
library("latticeExtra")
plot <- xyplot(
y ~ x,
data = Data[1:Nx,],
group = group,
ylim = c(4, 16),
par.settings = list(superpose.symbol = list(pch = 1, cex =
0.7))
) +
layer(panel.ellipse(..., center.cex = 0))
plot # display graph
```

The between-student effect of age is 0 but the within-student effect is 1. Due to the large variation in ages between students, the least-squares regression of yodeling skill on age (for the 500 observations among all 100 students) produces an estimated slope close to 0 (though with a small \(p\)-value), because the slope is heavily weighted toward the between-student effect:

```
summary(lm(y ~ x, data=Data))
#>
#> Call:
#> lm(formula = y ~ x, data = Data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.771 -1.658 -0.089 1.552 7.624
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 9.05043 0.34719 26.07 <2e-16 ***
#> x 0.02091 0.00727 2.87 0.0042 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.35 on 498 degrees of freedom
#> Multiple R-squared: 0.0163, Adjusted R-squared: 0.0143
#> F-statistic: 8.26 on 1 and 498 DF, p-value: 0.00422
```

The initial mixed-effects model that we fit to the data is a simple random-intercepts model:

```
# random intercept only:
mod.0 <- lmer(y ~ 1 + (1 | group), Data)
summary(mod.0)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ 1 + (1 | group)
#> Data: Data
#>
#> REML criterion at convergence: 2103.1
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -2.0351 -0.7264 -0.0117 0.7848 2.0438
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> group (Intercept) 2.90 1.70
#> Residual 2.71 1.65
#> Number of obs: 500, groups: group, 100
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 10.002 0.186 53.9
```

We will shortly consider three other, more complex, mixed models; because of data-management considerations, it is convenient to fit them now, but we defer discussion of these models:

```
# effect of x and random intercept:
mod.1 <- lmer(y ~ x + (1 | group), Data)
# effect of x, contextual (student) mean of x, and random intercept:
mod.2 <- lmer(y ~ x + xm + (1 | group), Data)
# equivalent to y ~ I(x - xm) + xm + (1 | group)
# model generating the data (where Bb = 0)
mod.3 <- lmer(y ~ I(x - xm) + (1 | group), Data)
```

We proceed to obtain predictions from the random-intercept model
(`mod.0`

) and the other models (`mod.1`

,
`mod.2`

, and `mod.3`

) based on fixed effects
alone, as would be used for cross-validation based on clusters (i.e.,
students), and for fixed and random effects—so-called best linear
unbiased predictions or BLUPs—as would be used for cross-validation
based on cases (i.e., occasions within students):

```
Data <- within(Data, {
fit_mod0.fe <- predict(mod.0, re.form = ~ 0) # fixed effects only
fit_mod0.re <- predict(mod.0) # fixed and random effects (BLUPs)
fit_mod1.fe <- predict(mod.1, re.form = ~ 0)
fit_mod1.re <- predict(mod.1)
fit_mod2.fe <- predict(mod.2, re.form = ~ 0)
fit_mod2.re <- predict(mod.2)
fit_mod3.fe <- predict(mod.3, re.form = ~ 0)
fit_mod3.re <- predict(mod.3)
})
```

We then prepare the data for plotting:

```
Data_long <- reshape(Data[1:Nx, ], direction = "long", sep = ".",
timevar = "effect", varying = grep("\\.", names(Data[1:Nx, ])))
Data_long$id <- 1:nrow(Data_long)
Data_long <- reshape(Data_long, direction = "long", sep = "_",
timevar = "modelcode", varying = grep("_", names(Data_long)))
Data_long$model <- factor(
c("~ 1", "~ 1 + x", "~ 1 + x + xm", "~ 1 + I(x - xm)")
[match(Data_long$modelcode, c("mod0", "mod1", "mod2", "mod3"))]
)
```

Predictions based on the random-intercept model `mod.0`

for the first 10 students are shown in the following graph:

```
(
plot +
xyplot(
fit ~ x,
subset(Data_long, modelcode == "mod0" & effect == "fe"),
groups = group,
type = "l",
lwd = 2
) +
xyplot(
fit ~ x,
subset(Data_long, modelcode == "mod0" & effect == "re"),
groups = group,
type = "l",
lwd = 2,
lty = 3
)
) |> update(
main="Model: y ~ 1 + (1 | group)",
key=list(
corner=c(0.05, 0.05),
text=list(c("fixed effects only","fixed and random")),
lines=list(lty=c(1, 3))))
```

The fixed-effect predictions for the various individuals are identical—the estimated fixed-effects intercept or estimated general mean of \(y\)—while the BLUPs are the sums of the fixed-effects intercept and the random intercepts, and are only slightly shrunken towards the general mean. Because in our artificial data there is no population relationship between age and skill, the fixed-effect-only predictions and the BLUPs are not very different.

Our next model, `mod.1`

, includes a fixed intercept and
fixed effect of `x`

along with a random intercept:

```
summary(mod.1)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ x + (1 | group)
#> Data: Data
#>
#> REML criterion at convergence: 1564.5
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -2.9016 -0.6350 0.0188 0.5541 2.8293
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> group (Intercept) 192.941 13.890
#> Residual 0.257 0.507
#> Number of obs: 500, groups: group, 100
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) -33.9189 1.5645 -21.7
#> x 0.9653 0.0158 61.0
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> x -0.460
```

Predictions from this model appear in the following graph:

```
(
plot +
xyplot(
fit ~ x,
subset(Data_long, modelcode == "mod1" & effect == "fe"),
groups = group,
type = "l",
lwd = 2
) +
xyplot(
fit ~ x,
subset(Data_long, modelcode == "mod1" & effect == "re"),
groups = group,
type = "l",
lwd = 2,
lty = 3
)
) |> update(
main="Model: y ~ 1 + x + (1 | group)",
ylim=c(-15, 35),
key=list(
corner=c(0.95, 0.05),
text=list(c("fixed effects only","fixed and random")),
lines=list(lty=c(1, 3))))
```

The BLUPs fit the observed data very closely, but predictions based
on the fixed effects alone, with a common intercept and slope for all
clusters, are very poor—indeed, much worse than the fixed-effects-only
predictions based on the simpler random-intercept model,
`mod.0`

. We therefore anticipate (and show later in this
section) that case-based cross-validation will prefer `mod1`

to `mod0`

, but that cluster-based cross-validation will
prefer `mod0`

to `mod1`

.

Our third model, `mod.2`

, includes the contextual effect
of \(x\)—that is, the cluster mean
`xm`

—along with \(x\) and
the intercept in the fixed-effect part of the model, and a random
intercept:

```
summary(mod.2)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ x + xm + (1 | group)
#> Data: Data
#>
#> REML criterion at convergence: 1169.2
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -2.9847 -0.6375 0.0019 0.5568 2.7325
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> group (Intercept) 3.399 1.844
#> Residual 0.255 0.505
#> Number of obs: 500, groups: group, 100
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 9.4787 0.6171 15.4
#> x 0.9915 0.0160 62.1
#> xm -0.9800 0.0206 -47.7
#>
#> Correlation of Fixed Effects:
#> (Intr) x
#> x 0.000
#> xm -0.600 -0.777
```

This model is equivalent to fitting
`y ~ I(x - xm) + xm + (1 | group)`

, which is the model that
generated the data once the coefficient of the contextual predictor
`xm`

is set to 0 (as it is in `mod.3`

, discussed
below).

Predictions from model `mod.2`

appear in the following
graph:

```
(
plot +
xyplot(
fit ~ x,
subset(Data_long, modelcode == "mod2" & effect == "fe"),
groups = group,
type = "l",
lwd = 2
) +
xyplot(
fit ~ x,
subset(Data_long, modelcode == "mod2" & effect == "re"),
groups = group,
type = "l",
lwd = 2,
lty = 3
)
) |> update(
main="Model: y ~ 1 + x + xm + (1 | group)",
ylim=c(4, 16),
key=list(
corner=c(0.05, 0.05),
text=list(c("fixed effects only","fixed and random")),
lines=list(lty=c(1, 3))))
```

Depending on the estimated variance parameters of the model, a mixed
model like `mod.2`

will apply varying degrees of shrinkage to
the random-intercept BLUPs that correspond to variation in the heights
of the parallel fitted lines for the individual students. In our
contrived data, the `mod.2`

applies little shrinkage,
allowing substantial variability in the heights of the fitted lines,
which closely approach the observed values for each student. The fit of
the mixed model `mod.2`

is consequently similar to that of a
fixed-effects model with age and a categorical predictor for individual
students (i.e., treating students as a factor, and not shown here).

The mixed model `mod.2`

therefore fits individual
observations well, and we anticipate a favorable assessment using
individual-based cross-validation. In contrast, the large variability in
the BLUPs results in larger residuals for predictions based on fixed
effects alone, and so we expect that cluster-based cross-validation
won’t show an advantage for model `mod.2`

compared to the
smaller model `mod.0`

, which includes only fixed and random
intercepts.

Had the mixed model applied considerable shrinkage, then neither cluster-based nor case-based cross-validation would show much improvement over the random-intercept-only model. In our experience, the degree of shrinkage does not vary smoothly as parameters are changed but tends to be “all or nothing,” and near the tipping point, the behavior of estimates can be affected considerably by the choice of algorithm used to fit the model.

Finally, `mod.3`

directly estimates the model used to
generate the data. As mentioned, it is a constrained version of
`mod.2`

, with the coefficient of `xm`

set to 0,
and with `x`

expressed as a deviation from the cluster mean
`xm`

:

```
summary(mod.3)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ I(x - xm) + (1 | group)
#> Data: Data
#>
#> REML criterion at convergence: 1163.2
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -2.9770 -0.6320 0.0063 0.5603 2.7249
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> group (Intercept) 3.391 1.842
#> Residual 0.255 0.505
#> Number of obs: 500, groups: group, 100
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 10.002 0.185 53.9
#> I(x - xm) 0.992 0.016 62.1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> I(x - xm) 0.000
```

The predictions from `mod.3`

are therefore similar to
those from `mod.2`

:

```
(
plot +
xyplot(
fit ~ x,
subset(Data_long, modelcode == "mod3" & effect == "fe"),
groups = group,
type = "l",
lwd = 2
) +
xyplot(
fit ~ x,
subset(Data_long, modelcode == "mod3" & effect == "re"),
groups = group,
type = "l",
lwd = 2,
lty = 3
)
) |> update(
main="Model: y ~ 1 + I(x - xm) + (1 | group)",
ylim=c(4, 16),
key=list(
corner=c(0.05, 0.05),
text=list(c("fixed effects only","fixed and random")),
lines=list(lty=c(1, 3))))
```

We next carry out case-based cross-validation, which, as we have
explained, is based on both fixed and predicted random effects (i.e.,
BLUPs), and cluster-based cross-validation, which is based on fixed
effects only. In order to reduce between-model random variability in
comparisons of models, we apply `cv()`

to the list of models
created by the `models()`

function (introduced previously),
performing cross-validation with the same folds for each model:

```
modlist <- models(
"~ 1" = mod.0,
"~ 1 + x" = mod.1,
"~ 1 + x + xm" = mod.2,
"~ 1 + I(x - xm)" = mod.3
)
cvs_clusters <-
cv(
modlist,
data = Data,
cluster = "group",
k = 10,
seed = 6449
)
plot(cvs_clusters, main = "Model Comparison, Cluster-Based CV")
```

```
cvs_cases <- cv(modlist, data = Data, seed = 9693)
plot(cvs_cases, main = "Model Comparison, Case-Based CV")
```

In summary, model `mod.1`

, with \(x\) alone and without the contextual mean
of \(x\), is assessed as fitting very
poorly by cluster-based CV, but relatively much better by case-based CV.
Model `mod.2`

, which includes both \(x\) and its contextual mean, produces
better results using both cluster-based and case-based CV. The
data-generating model, `mod.3`

, which includes the fixed
effect of `x - xm`

in place of separate terms in
`x`

and `xm`

, isn’t distinguishable from model
`mod.2`

, which includes `x`

and `xm`

separately, even though `mod.2`

has an unnecessary parameter
(recall that the population coefficient of `xm`

is 0 when
`x`

is expressed as deviations from the contextual mean).
These conclusions are consistent with our observations based on graphing
predictions from the various models, and they illustrate the
desirability of assessing mixed-effect models at different hierarchical
levels.

Crossed random effects arise when the structure of the data aren’t strictly hierarchical. Nevertheless, crossed and nested random effects can be handled in much the same manner, by refitting the mixed-effects model to the data with a fold of clusters or cases removed and using the refitted model to predict the response in the removed fold.

We’ll illustrate with data on pig growth, introduced by Diggle, Liang, & Zeger (1994, Table 3.1).
The data are in the `Pigs`

data frame in the
**cv** package:

```
head(Pigs, 9)
#> id week weight
#> 1 1 1 24.0
#> 2 1 2 32.0
#> 3 1 3 39.0
#> 4 1 4 42.5
#> 5 1 5 48.0
#> 6 1 6 54.5
#> 7 1 7 61.0
#> 8 1 8 65.0
#> 9 1 9 72.0
head(xtabs( ~ id + week, data = Pigs), 3)
#> week
#> id 1 2 3 4 5 6 7 8 9
#> 1 1 1 1 1 1 1 1 1 1
#> 2 1 1 1 1 1 1 1 1 1
#> 3 1 1 1 1 1 1 1 1 1
tail(xtabs( ~ id + week, data = Pigs), 3)
#> week
#> id 1 2 3 4 5 6 7 8 9
#> 46 1 1 1 1 1 1 1 1 1
#> 47 1 1 1 1 1 1 1 1 1
#> 48 1 1 1 1 1 1 1 1 1
```

Each of 48 pigs is observed weekly over a period of 9 weeks, with the
weight of the pig recorded in kg. The data are in “long” format, as is
appropriate for use with the `lmer()`

function in the
**lme4** package. The data are very regular, with no
missing cases.

The following graph, showing the growth trajectories of the pigs, is similar to Figure 3.1 in Diggle et al. (1994); we add an overall least-squares line and a loess smooth, which are nearly indistinguishable:

```
plot(weight ~ week, data = Pigs, type = "n")
for (i in unique(Pigs$id)) {
with(Pigs, lines(
x = 1:9,
y = Pigs[id == i, "weight"],
col = "gray"
))
}
abline(lm(weight ~ week, data = Pigs),
col = "blue",
lwd = 2)
lines(
with(Pigs, loess.smooth(week, weight, span = 0.5)),
col = "magenta",
lty = 2,
lwd = 2
)
```

The individual “growth curves” and the overall trend are generally linear, with some tendency for variability of pig weight to increase over weeks (a feature of the data that we ignore in the mixed model that we fit to the data below).

The **Stata** mixed-effects models manual proposes a
model with crossed random effects for the `Pigs`

data (StataCorp LLC, 2023, p. 37):

[S]uppose that we wish to fit \[ \mathrm{weight}_{ij} = \beta_0 + \beta_1 \mathrm{week}_{ij} + u_i + v_j + \varepsilon_{ij} \] for the \(i = 1, \ldots, 9\) weeks and \(j = 1, \dots, 48\) pigs and \[ u_i \sim N(0, \sigma^2_u); v_j \sim N(0, \sigma^2_v ); \varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon) \] all independently. That is, we assume an overall population-average growth curve \(\beta_0 + \beta_1 \mathrm{week}\) and a random pig-specific shift. In other words, the effect due to week, \(u_i\), is systematic to that week and common to all pigs. The rationale behind [this model] could be that, assuming that the pigs were measured contemporaneously, we might be concerned that week-specific random factors such as weather and feeding patterns had significant systematic effects on all pigs.

Although we might prefer an alternative model,^{6} we think that this is
a reasonable specification.

The **Stata** manual fits the mixed model by maximum
likelihood (rather than REML), and we duplicate the results reported
there using `lmer()`

:

```
m.p <- lmer(
weight ~ week + (1 | id) + (1 | week),
data = Pigs,
REML = FALSE, # i.e., ML
control = lmerControl(optimizer = "bobyqa")
)
summary(m.p)
#> Linear mixed model fit by maximum likelihood ['lmerMod']
#> Formula: weight ~ week + (1 | id) + (1 | week)
#> Data: Pigs
#> Control: lmerControl(optimizer = "bobyqa")
#>
#> AIC BIC logLik deviance df.resid
#> 2037.6 2058.0 -1013.8 2027.6 427
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.775 -0.542 0.005 0.476 3.982
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> id (Intercept) 14.836 3.852
#> week (Intercept) 0.085 0.292
#> Residual 4.297 2.073
#> Number of obs: 432, groups: id, 48; week, 9
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 19.3556 0.6334 30.6
#> week 6.2099 0.0539 115.1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> week -0.426
```

We opt for the non-default `"bobyqa"`

optimizer because it
provides more numerically stable results for subsequent cross-validation
in this example.

We can then cross-validate the model by omitting folds composed of
pigs, folds composed of weeks, or folds composed of pig-weeks (which in
the `Pigs`

data set correspond to individual cases, using
only the fixed effects):

```
cv(m.p, clusterVariables = "id")
#> n-Fold Cross Validation based on 48 {id} clusters
#> criterion: mse
#> cross-validation criterion = 19.973
#> bias-adjusted cross-validation criterion = 19.965
#> 95% CI for bias-adjusted CV criterion = (17.125, 22.805)
#> full-sample criterion = 19.201
cv(m.p, clusterVariables = "week")
#> boundary (singular) fit: see help('isSingular')
#> n-Fold Cross Validation based on 9 {week} clusters
#> criterion: mse
#> cross-validation criterion = 19.312
#> bias-adjusted cross-validation criterion = 19.305
#> 95% CI for bias-adjusted CV criterion = (16.566, 22.044)
#> full-sample criterion = 19.201
cv(
m.p,
clusterVariables = c("id", "week"),
k = 10,
seed = 8469
)
#> R RNG seed set to 8469
#> 10-Fold Cross Validation based on 432 {id, week} clusters
#> criterion: mse
#> cross-validation criterion = 19.235
#> bias-adjusted cross-validation criterion = 19.233
#> 95% CI for bias-adjusted CV criterion = (16.493, 21.973)
#> full-sample criterion = 19.201
```

We can also cross-validate the individual cases taking account of the random effects (employing the same 10 folds):

```
cv(m.p, k = 10, seed = 8469)
#> R RNG seed set to 8469
#> 10-Fold Cross Validation
#> criterion: mse
#> cross-validation criterion = 5.1583
#> bias-adjusted cross-validation criterion = 5.0729
#> 95% CI for bias-adjusted CV criterion = (4.123, 6.0229)
#> full-sample criterion = 3.796
```

Because these predictions are based on BLUPs, they are more accurate
than the predictions based only on fixed effects.^{7} As well, the
difference between the MSE computed for the model fit to the full data
and the CV estimates of the MSE is greater here than for cluster-based
predictions.

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting
linear mixed-effects models using lme4.
*Journal of Statistical Software*, *67*(1), 1–48.

Diggle, P. J., Liang, K.-Y., & Zeger, S. L. (1994). *Analysis of
longitudinal data*. Oxford: Oxford University Press.

Fox, J., & Weisberg, S. (2019). *An R companion to
applied regression* (Third edition). Thousand Oaks CA:
Sage.

Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-effects models in
S and S-PLUS*. New York: Springer.

Raudenbush, S. W., & Bryk, A. S. (2002). *Hierarchical linear
models: Applications and data analysis methods* (Second edition).
Thousand Oaks CA: Sage.

Sarkar, D. (2008). *Lattice: Multivariate data visualization with
R*. New York: Springer. Retrieved from http://lmdvr.r-forge.r-project.org

Sarkar, D., & Andrews, F. (2022). *latticeExtra: Extra graphical utilities based on
lattice*. Retrieved from https://CRAN.R-project.org/package=latticeExtra

StataCorp LLC. (2023). *Stata multilevel mixed-effects reference
manual, release 18*. College Station TX: Stata Press.
Retrieved from https://www.stata.com/manuals/me.pdf

Vehtari, A. (2023). Cross-validation FAQ. Retrieved October
15, 2023, from https://users.aalto.fi/~ave/CV-FAQ.html

Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D.
(2023). *Dplyr: A grammar of data manipulation*. Retrieved from
https://CRAN.R-project.org/package=dplyr

There are, however, more complex situations that give rise to so-called

*crossed*(rather than*nested*) random effects. For example, consider students within classes within schools. In primary schools, students typically are in a single class, and so classes are nested within schools. In secondary schools, however, students typically take several classes and students who are together in a particular class may not be together in other classes; consequently, random effects based on classes within schools are crossed. The`lmer()`

function in the**lme4**package is capable of modeling both nested and crossed random effects, and the`cv()`

methods for mixed models in the**cv**package pertain to both nested and crossed random effects. We present an example of the latter later in the vignette.↩︎We subsequently discovered that Vehtari (2023, sec. 8) makes similar points.↩︎

The observant reader will notice that we set the argument

`control=list(opt="optim")`

in the call to`lme()`

, changing the optimizer employed from the default`"nlminb"`

. We did this because with the default optimizer,`lme()`

encountered the same convergence issue as`lmer()`

, but rather than issuing a warning,`lme()`

failed, reporting an error. As it turns out, setting the optimizer to`"optim"`

avoids this problem.↩︎We invite the interested reader to experiment with varying the parameters of our example.↩︎

We find it convenient to use the

**lattice**(Sarkar, 2008) and**latticeExtra**(Sarkar & Andrews, 2022) packages for this and other graphs in this section.↩︎These are repeated-measures data, which would be more conventionally modeled with autocorrelated errors within pigs. The

`lme()`

function in the**nlme**package, for example, is capable of fitting a mixed-model of this form.↩︎Even though there is only one observation per combination of pigs and weeks, we can use the BLUP for the omitted case because of the crossed structure of the random effects; that is each pig-week has a pig random effect and a week random effect. Although it probably isn’t sensible, we can imagine a mixed model for the pig data that employs nested random effects, which would be specified by

`lmer(weight ~ week + (1 | id/week), data=Pigs)`

—that is, a random intercept that varies by combinations of`id`

(pig) and`week`

. This model can’t be fit, however: With only one case per combination of`id`

and`week`

, the nested random-effect variance is indistinguishable from the case-level variance.↩︎