Parallel optimization using glmmTMB

Nafis Sadat

2020-03-15

A new, experimental feature of glmmTMB is the ability to parallelize the optimization process. This vignette shows an example and timing of a simple model fit with and without parallelizing across threads.

If your OS supports OpenMP and R was built with OpenMP support, glmmTMB will automatically pick up the OpenMP flags from R’s Makevars and compile its C++ model with OpenMP support. If the flags are not available, the model will be compiled for serial execution only.
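One way to check whether your installation can actually run in parallel is to ask TMB (the package glmmTMB builds on) how many OpenMP threads it sees; this is a quick diagnostic sketch, assuming TMB is installed:

```r
## TMB::openmp() with no arguments reports the number of OpenMP
## threads currently available to TMB; a value of 1 typically
## indicates a serial-only (no-OpenMP) build
TMB::openmp()
```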

Load packages:

library(glmmTMB)
set.seed(1)
nt <- min(parallel::detectCores(),5)

Simulate a dataset with large N:

N <- 3e5
xdata <- rnorm(N, 1, 2)
ydata <- 0.3 + 0.4*xdata + rnorm(N, 0, 0.25)

First, we fit the model serially. We can specify the number of threads to use via the parallel argument of glmmTMBControl:

system.time(
  model1 <- glmmTMB(formula = ydata ~ 1 + xdata,
                    control = glmmTMBControl(parallel = 1))
  )
##    user  system elapsed 
##   2.192   0.256   2.512

Now we fit the same model using nt threads (at most five, or however many cores are available; 4 in this case):

system.time(
  model2 <- glmmTMB(formula = ydata ~ 1 + xdata,
                    control = glmmTMBControl(parallel = nt))
  )
##    user  system elapsed 
##   2.462   0.302   3.044

In this small example the parallel fit is actually slightly slower: with so few parameters, the threading overhead outweighs any gains. The speed-up becomes visible in models with a much larger number of observations, or in models with random effects.

Here’s an example with an IID Gaussian random effect. We first simulate the data with 200 groups (the levels of our random effect):

xdata <- rnorm(N, 1, 2)
groups <- 200
data_use <- data.frame(obs = 1:N)
data_use <- within(data_use,
{
  group_var <- rep(seq(groups), times = nrow(data_use) / groups)
  group_intercept <- rnorm(groups, 0, 0.1)[group_var]
  xdata <- xdata  ## copy the global xdata into the data frame
  ydata <- 0.3 + group_intercept + 0.5*xdata + rnorm(N, 0, 0.25)
})

We fit the random effect model, first with a single thread:

(t_serial <- system.time(
  model3 <- glmmTMB(formula = ydata ~ 1 + xdata + (1 | group_var),
                    data = data_use,
                    control = glmmTMBControl(parallel = 1))
 )
)
##    user  system elapsed 
##  19.899   2.122  25.367

Now we fit the same model using nt (here, 4) threads. Even for this model the elapsed-time gain in these runs is modest; the benefit grows with the size and complexity of the model.

(t_parallel <- system.time(
     update(model3,  control = glmmTMBControl(parallel = nt))
 )
)
##    user  system elapsed 
##  19.581   2.121  24.581
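A quick way to summarize the comparison is the ratio of elapsed times. This reuses the t_serial and t_parallel objects saved above:

```r
## elapsed-time speed-up of the parallel fit relative to the serial fit;
## values greater than 1 mean the parallel fit was faster
unname(t_serial["elapsed"] / t_parallel["elapsed"])
```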

Notes on OpenMP support

From Writing R Extensions:

Apple builds of clang on macOS currently have no OpenMP support, but CRAN binary packages are built with a clang-based toolchain which supports OpenMP. http://www.openmp.org/resources/openmp-compilers-tools gives some idea of what compilers support what versions.

The performance of OpenMP varies substantially between platforms. The Windows implementation has substantial overheads, so is only beneficial if quite substantial tasks are run in parallel. Also, on Windows new threads are started with the default FPU control word, so computations done on OpenMP threads will not make use of extended-precision arithmetic which is the default for the main process.

System information

This report was built using 4 parallel threads (on a machine with a total of 4 cores).

print(sessionInfo(), locale=FALSE)
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] png_0.1-7       ggplot2_3.3.0   lattice_0.20-38 reshape2_1.4.3 
## [5] coda_0.19-3     TMB_1.7.16      MASS_7.3-51.5   glmmTMB_1.0.1  
## [9] knitr_1.28     
## 
## loaded via a namespace (and not attached):
##  [1] zoo_1.8-7           tidyselect_0.2.5    xfun_0.12          
##  [4] purrr_0.3.3         splines_3.6.3       colorspace_1.4-1   
##  [7] stats4_3.6.3        htmltools_0.4.0     yaml_2.2.1         
## [10] survival_3.1-8      rlang_0.4.4         nloptr_1.2.1       
## [13] pillar_1.4.3        withr_2.1.2         glue_1.3.1         
## [16] emmeans_1.4.3.01    multcomp_1.4-10     lifecycle_0.1.0    
## [19] plyr_1.8.6          stringr_1.4.0       munsell_0.5.0      
## [22] gtable_0.3.0        bdsmatrix_1.3-4     mvtnorm_1.1-0      
## [25] codetools_0.2-16    evaluate_0.14       labeling_0.3       
## [28] parallel_3.6.3      TH.data_1.0-10      Rcpp_1.0.3         
## [31] xtable_1.8-4        scales_1.1.0        farver_2.0.3       
## [34] lme4_1.1-21         digest_0.6.25       stringi_1.4.6      
## [37] dplyr_0.8.3         numDeriv_2016.8-1.1 tools_3.6.3        
## [40] bbmle_1.0.23.1      sandwich_2.5-1      magrittr_1.5       
## [43] tibble_2.1.3        crayon_1.3.4        pkgconfig_2.0.3    
## [46] Matrix_1.2-18       estimability_1.3    assertthat_0.2.1   
## [49] minqa_1.2.4         rmarkdown_2.1       R6_2.4.1           
## [52] boot_1.3-24         nlme_3.1-144        compiler_3.6.3