In this vignette we briefly compare the mmrm::mmrm,
SAS’s PROC GLIMMIX, nlme::gls,
lme4::lmer, and glmmTMB::glmmTMB functions for
fitting mixed models for repeated measures (MMRMs). A primary difference
in these implementations lies in the covariance structures that are
supported “out of the box”. In particular, PROC GLIMMIX and
mmrm are the only procedures that support many
of the most common MMRM covariance structures. Most covariance
structures can be implemented in gls, though users are
required to define them manually. lmer and
glmmTMB are more limited in the structures they offer. We find that mmrm
converges more quickly than the other R implementations while also producing
estimates that are virtually identical to
PROC GLIMMIX’s.
Two datasets are used to illustrate model fitting with the
mmrm, lme4, nlme, and
glmmTMB R packages as well as PROC GLIMMIX.
These data are also used to compare these implementations’ operating
characteristics.
The FEV dataset contains measurements of FEV1 (forced expiratory volume in one second), a measure of how quickly the lungs can be emptied. Low levels of FEV1 may indicate chronic obstructive pulmonary disease (COPD). It is summarized below.
```
                                      Stratified by ARMCD
                               Overall       PBO           TRT
  n                              800           420           380
  USUBJID (%)
     PT[1-200]                   200           105 (52.5)     95 (47.5)
  AVISIT
     VIS1                        200           105            95
     VIS2                        200           105            95
     VIS3                        200           105            95
     VIS4                        200           105            95
  RACE (%)
     Asian                       280 (35.0)    152 (36.2)    128 (33.7)
     Black or African American   300 (37.5)    184 (43.8)    116 (30.5)
     White                       220 (27.5)     84 (20.0)    136 (35.8)
  SEX = Female (%)               424 (53.0)    220 (52.4)    204 (53.7)
  FEV1_BL (mean (SD))          40.19 (9.12)  40.46 (8.84)  39.90 (9.42)
  FEV1 (mean (SD))             42.30 (9.32)  40.24 (8.67)  44.45 (9.51)
  WEIGHT (mean (SD))            0.52 (0.23)   0.52 (0.23)   0.51 (0.23)
  VISITN (mean (SD))            2.50 (1.12)   2.50 (1.12)   2.50 (1.12)
  VISITN2 (mean (SD))          -0.02 (1.03)   0.01 (1.07)  -0.04 (0.98)
```

The BCVA dataset contains data from a randomized longitudinal ophthalmology trial evaluating the change in best-corrected visual acuity (BCVA) over the course of 10 visits. BCVA corresponds to the number of letters read from a visual acuity chart. A summary of the data is given below:
```
                                      Stratified by ARMCD
                               Overall         CTL            TRT
  n                             8605          4123           4482
  USUBJID (%)
     PT[1-1000]                 1000           494 (49.4)     506 (50.6)
  AVISIT
     VIS1                        983           482            501
     VIS2                        980           481            499
     VIS3                        960           471            489
     VIS4                        946           458            488
     VIS5                        925           454            471
     VIS6                        868           410            458
     VIS7                        816           388            428
     VIS8                        791           371            420
     VIS9                        719           327            392
     VIS10                       617           281            336
  RACE (%)
     Asian                       297 (29.7)    151 (30.6)     146 (28.9)
     Black or African American   317 (31.7)    149 (30.1)     168 (33.2)
     White                       386 (38.6)    194 (39.3)     192 (37.9)
  BCVA_BL (mean (SD))          75.12 (9.93)  74.90 (9.76)   75.40 (10.1)
  BCVA_CHG (mean (SD))
     VIS1                       5.59 (1.31)   5.32 (1.23)    5.86 (1.33)
     VIS10                      9.18 (2.91)   7.49 (2.58)   10.60 (2.36)
```

Listed below are some of the covariance structures most commonly
used when fitting MMRMs. We indicate which matrices are available “out
of the box” for each implementation considered in this vignette. Note
that this table is not exhaustive; PROC GLIMMIX and
glmmTMB support additional spatial covariance
structures.
| Covariance structures | mmrm | PROC GLIMMIX | gls | lmer | glmmTMB |
|---|---|---|---|---|---|
| Ante-dependence (heterogeneous) | X | X | | | |
| Ante-dependence (homogeneous) | X | | | | |
| Auto-regressive (heterogeneous) | X | X | X | | |
| Auto-regressive (homogeneous) | X | X | X | | X |
| Compound symmetry (heterogeneous) | X | X | X | | X |
| Compound symmetry (homogeneous) | X | X | X | | |
| Spatial exponential | X | X | X | | X |
| Toeplitz (heterogeneous) | X | X | | | X |
| Toeplitz (homogeneous) | X | X | | | |
| Unstructured | X | X | X | X | X |
Code for fitting MMRMs to the FEV data (available as fev_data in the mmrm package) using each of the considered functions and covariance structures is provided below. Fixed effects for the visit, the treatment assignment, and the interaction between the two are modeled.
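Ante-dependence (heterogeneous)

PROC GLIMMIX:

```sas
PROC GLIMMIX DATA = fev_data;
CLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;
RANDOM AVISIT / subject=USUBJID type=ANTE(1);
RUN;
```

mmrm:

```r
library(mmrm)
# the visit factor AVISIT indexes the time points of the covariance structure
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + adh(AVISIT | USUBJID),
  data = fev_data
)
```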
Ante-dependence (homogeneous)

mmrm:

```r
library(mmrm)
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + ad(AVISIT | USUBJID),
  data = fev_data
)
```
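To make the ante-dependence structures concrete, the base-R sketch below builds a first-order ante-dependence covariance matrix directly: the correlation between visits i < j is the product of the adjacent-visit correlations from visit i up to visit j, scaled by visit-specific standard deviations (heterogeneous case) or by one common standard deviation (homogeneous case). The helper `ante_dependence_cov()` and all parameter values are hypothetical and chosen for illustration; this is not a description of how mmrm or PROC GLIMMIX parameterize the structure internally.

```r
# Hypothetical helper: first-order ante-dependence covariance matrix.
# sd:  standard deviation at each visit (repeat one value for the homogeneous case)
# rho: correlation between each pair of adjacent visits (length(sd) - 1 values)
ante_dependence_cov <- function(sd, rho) {
  n <- length(sd)
  stopifnot(length(rho) == n - 1, all(abs(rho) < 1))
  corr <- diag(n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # correlation between visits i and j is the product of the adjacent correlations
      corr[i, j] <- corr[j, i] <- prod(rho[i:(j - 1)])
    }
  }
  outer(sd, sd) * corr
}

sigma_adh <- ante_dependence_cov(sd = c(1, 1.5, 2, 2.5), rho = c(0.8, 0.6, 0.4))
sigma_ad <- ante_dependence_cov(sd = rep(2, 4), rho = c(0.8, 0.6, 0.4))
round(sigma_adh, 3)
```

Because each adjacent correlation enters the product separately, the structure can represent correlations that weaken at different rates between different pairs of visits, which a single-parameter AR(1) cannot.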
Auto-regressive (heterogeneous)

PROC GLIMMIX:

```sas
PROC GLIMMIX DATA = fev_data;
CLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;
RANDOM AVISIT / subject=USUBJID type=ARH(1);
RUN;
```

mmrm:

```r
library(mmrm)
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + ar1h(AVISIT | USUBJID),
  data = fev_data
)
```

gls:

```r
library(nlme)
gls(
  FEV1 ~ ARMCD * AVISIT,
  data = fev_data,
  # AR(1)-type correlation in the numeric visit time, within each subject
  correlation = corCAR1(form = ~VISITN | USUBJID),
  # a separate residual variance for each visit
  weights = varIdent(form = ~1|AVISIT),
  na.action = na.omit
)
```
Auto-regressive (homogeneous)

PROC GLIMMIX:

```sas
PROC GLIMMIX DATA = fev_data;
CLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 =  ARMCD|AVISIT / ddfm=satterthwaite solution chisq;
RANDOM AVISIT / subject=USUBJID type=AR(1);
RUN;
```

mmrm:

```r
library(mmrm)
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + ar1(AVISIT | USUBJID),
  data = fev_data
)
```

gls:

```r
library(nlme)
gls(
  FEV1 ~ ARMCD * AVISIT,
  data = fev_data,
  correlation = corCAR1(form = ~VISITN | USUBJID),
  na.action = na.omit
)
```

glmmTMB:

```r
library(glmmTMB)
glmmTMB(
  FEV1 ~ ARMCD * AVISIT + ar1(0 + AVISIT | USUBJID),
  dispformula = ~ 0,  # residual variance fixed near zero; ar1() carries the within-subject variation
  data = fev_data
)
```
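The auto-regressive structures give visits i and j the correlation ρ^|i − j|, so correlation decays geometrically with the lag; the heterogeneous version again multiplies by visit-specific standard deviations. The sketch below uses a hypothetical helper `ar1_cov()` with illustrative parameter values to build both versions in base R and shows the decay along the first row.

```r
# Hypothetical helper: AR(1) covariance for equally spaced visits.
# sd:  per-visit standard deviations (all equal for the homogeneous structure)
# rho: correlation between adjacent visits
ar1_cov <- function(sd, rho) {
  n <- length(sd)
  stopifnot(abs(rho) < 1)
  lag <- abs(outer(seq_len(n), seq_len(n), "-"))  # |i - j| for every pair of visits
  outer(sd, sd) * rho^lag
}

sigma_ar1h <- ar1_cov(sd = c(1, 1.5, 2, 2.5), rho = 0.7)
sigma_ar1 <- ar1_cov(sd = rep(2, 4), rho = 0.7)
cov2cor(sigma_ar1)[1, ]  # 1, 0.7, 0.49, 0.343
```

Only one correlation parameter is estimated regardless of the number of visits, which is what makes AR(1) attractive for long follow-up but also restrictive when the true decay is not geometric.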
Compound symmetry (heterogeneous)

PROC GLIMMIX:

```sas
PROC GLIMMIX DATA = fev_data;
CLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;
RANDOM AVISIT / subject=USUBJID type=CSH;
RUN;
```

mmrm:

```r
library(mmrm)
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + csh(AVISIT | USUBJID),
  data = fev_data
)
```

gls:

```r
library(nlme)
gls(
  FEV1 ~ ARMCD * AVISIT,
  data = fev_data,
  correlation = corCompSymm(form = ~AVISIT | USUBJID),
  weights = varIdent(form = ~1|AVISIT),  # visit-specific variances
  na.action = na.omit
)
```

glmmTMB:

```r
library(glmmTMB)
glmmTMB(
  FEV1 ~ ARMCD * AVISIT + cs(0 + AVISIT | USUBJID),
  dispformula = ~ 0,
  data = fev_data
)
```
Compound symmetry (homogeneous)

PROC GLIMMIX:

```sas
PROC GLIMMIX DATA = fev_data;
CLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;
RANDOM AVISIT / subject=USUBJID type=CS;
RUN;
```

mmrm:

```r
library(mmrm)
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + cs(AVISIT | USUBJID),
  data = fev_data
)
```

gls:

```r
library(nlme)
gls(
  FEV1 ~ ARMCD * AVISIT,
  data = fev_data,
  correlation = corCompSymm(form = ~AVISIT | USUBJID),
  na.action = na.omit
)
```
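Under compound symmetry every pair of distinct visits shares one correlation ρ. In the homogeneous case with ρ ≥ 0 this is the same covariance as a random subject intercept with variance σ²ρ plus independent residual error with variance σ²(1 − ρ), which the base-R sketch below checks numerically; the heterogeneous version rescales the same correlation matrix by visit-specific standard deviations. The helper `cs_cov()` and the numbers are hypothetical.

```r
# Hypothetical helper: compound-symmetry covariance.
# sd:  per-visit standard deviations (all equal for the homogeneous structure)
# rho: common correlation between any two distinct visits
cs_cov <- function(sd, rho) {
  n <- length(sd)
  corr <- matrix(rho, n, n)
  diag(corr) <- 1
  outer(sd, sd) * corr
}

sigma_csh <- cs_cov(sd = c(1, 1.5, 2, 2.5), rho = 0.5)
sigma_cs <- cs_cov(sd = rep(2, 4), rho = 0.5)

# Homogeneous CS = random-intercept variance (sigma^2 * rho) in every entry
# plus residual variance (sigma^2 * (1 - rho)) on the diagonal.
sigma2 <- 4
rho <- 0.5
implied <- matrix(sigma2 * rho, 4, 4) + diag(sigma2 * (1 - rho), 4)
all.equal(sigma_cs, implied)  # TRUE
```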
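Spatial exponential

PROC GLIMMIX:

```sas
PROC GLIMMIX DATA = fev_data;
CLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;
RANDOM / subject=USUBJID type=sp(exp)(visitn) rcorr;
RUN;
```

mmrm:

```r
library(mmrm)
# spatial structures take the numeric visit time VISITN as the coordinate
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + sp_exp(VISITN | USUBJID),
  data = fev_data
)
```

gls:

```r
library(nlme)
gls(
  FEV1 ~ ARMCD * AVISIT,
  data = fev_data,
  correlation = corExp(form = ~VISITN | USUBJID),  # distances computed from numeric visit times
  weights = varIdent(form = ~1|AVISIT),
  na.action = na.omit
)
```

glmmTMB:

```r
library(glmmTMB)
# glmmTMB's spatial structures require coordinates encoded with numFactor()
fev_data$pos <- numFactor(fev_data$VISITN)
glmmTMB(
  FEV1 ~ ARMCD * AVISIT + exp(0 + pos | USUBJID),
  dispformula = ~ 0,
  data = fev_data
)
```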
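The spatial exponential structure works with a numeric time coordinate (VISITN above) rather than the visit factor, so it can also accommodate unequally spaced visit times. Its correlation decays with the distance d between time points; one common parameterization writes it as ρ^d, while SAS documents sp(exp) as exp(−d/θ), and the two coincide when ρ = exp(−1/θ). The sketch below uses hypothetical, irregularly spaced times and a hypothetical helper `sp_exp_cov()`, and checks that equivalence numerically.

```r
# Hypothetical helper: spatial exponential covariance from numeric coordinates.
# coords: a vector of time points (or a matrix with one row of coordinates per observation)
sp_exp_cov <- function(coords, sigma2, rho) {
  d <- unname(as.matrix(dist(coords)))  # Euclidean distances between observations
  sigma2 * rho^d
}

times <- c(1, 2, 4, 7)  # hypothetical, irregularly spaced visit times
sigma_exp <- sp_exp_cov(times, sigma2 = 4, rho = 0.6)

# Equivalent range parameterization: exp(-d / theta) with theta = -1 / log(rho)
theta <- -1 / log(0.6)
d <- abs(outer(times, times, "-"))
all.equal(sigma_exp, 4 * exp(-d / theta))  # TRUE
```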
Toeplitz (heterogeneous)

PROC GLIMMIX:

```sas
PROC GLIMMIX DATA = fev_data;
CLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;
RANDOM AVISIT / subject=USUBJID type=TOEPH;
RUN;
```

mmrm:

```r
library(mmrm)
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + toeph(AVISIT | USUBJID),
  data = fev_data
)
```

glmmTMB:

```r
library(glmmTMB)
glmmTMB(
  FEV1 ~ ARMCD * AVISIT + toep(0 + AVISIT | USUBJID),
  dispformula = ~ 0,
  data = fev_data
)
```
Toeplitz (homogeneous)

PROC GLIMMIX:

```sas
PROC GLIMMIX DATA = fev_data;
CLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;
RANDOM AVISIT / subject=USUBJID type=TOEP;
RUN;
```

mmrm:

```r
library(mmrm)
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + toep(AVISIT | USUBJID),
  data = fev_data
)
```
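A Toeplitz structure allows a separate correlation for each lag but keeps it constant along each band: all pairs of visits one apart share one correlation, all pairs two apart share another, and so on. Base R's `toeplitz()` builds exactly this banded pattern, so the hypothetical helper below only needs to scale it by the standard deviations (visit-specific for the heterogeneous structure, common for the homogeneous one). Parameter values are illustrative.

```r
# Hypothetical helper: Toeplitz covariance.
# sd:  per-visit standard deviations (all equal for the homogeneous structure)
# rho: correlations at lags 1, 2, ..., length(sd) - 1
toep_cov <- function(sd, rho) {
  stopifnot(length(rho) == length(sd) - 1)
  outer(sd, sd) * toeplitz(c(1, rho))  # toeplitz() fills each band with one value
}

sigma_toeph <- toep_cov(sd = c(1, 1.5, 2, 2.5), rho = c(0.7, 0.4, 0.2))
sigma_toep <- toep_cov(sd = rep(2, 4), rho = c(0.7, 0.4, 0.2))
sigma_toep
```

Compared with AR(1), the Toeplitz structure needs one correlation per lag rather than one in total, so it sits between AR(1) and the unstructured matrix in flexibility.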
Unstructured

PROC GLIMMIX:

```sas
PROC GLIMMIX DATA = fev_data;
CLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 = ARMCD|AVISIT / ddfm=satterthwaite solution chisq;
RANDOM AVISIT / subject=USUBJID type=un;
RUN;
```

mmrm:

```r
library(mmrm)
mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)
```

gls:

```r
library(nlme)
gls(
  FEV1 ~ ARMCD * AVISIT,
  data = fev_data,
  correlation = corSymm(form = ~AVISIT | USUBJID),
  weights = varIdent(form = ~1|AVISIT),
  na.action = na.omit
)
```

lmer:

```r
library(lme4)
lmer(
  FEV1 ~ ARMCD * AVISIT + (0 + AVISIT | USUBJID),
  data = fev_data,
  # one random effect per subject and visit gives at least as many random effects
  # as observations, which lmer rejects by default
  control = lmerControl(check.nobs.vs.nRE = "ignore"),
  na.action = na.omit
)
```

glmmTMB:

```r
library(glmmTMB)
glmmTMB(
  FEV1 ~ ARMCD * AVISIT + us(0 + AVISIT | USUBJID),
  dispformula = ~ 0,
  data = fev_data
)
```
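The unstructured covariance places no constraint beyond positive definiteness, so with n visits it has n(n + 1)/2 variance and covariance parameters: 10 for the four FEV visits and 55 for the ten BCVA visits. A common way to keep the estimate positive definite during optimization is to work with a lower-triangular Cholesky factor whose diagonal is stored on the log scale. The sketch below illustrates that idea; the helper `unstructured_cov()` and the ordering of the parameter vector are assumptions for illustration and are not claimed to match the internal parameterization of any of the packages above.

```r
# Hypothetical helper: map an unconstrained parameter vector to an unstructured covariance.
# theta: n log-diagonal entries of the Cholesky factor, then the n(n - 1)/2
#        below-diagonal entries filled column by column
unstructured_cov <- function(theta, n) {
  stopifnot(length(theta) == n * (n + 1) / 2)
  L <- matrix(0, n, n)
  diag(L) <- exp(theta[seq_len(n)])  # a positive diagonal guarantees positive definiteness
  L[lower.tri(L)] <- theta[-seq_len(n)]
  L %*% t(L)
}

n_params <- function(n) n * (n + 1) / 2
c(FEV = n_params(4), BCVA = n_params(10))  # 10 and 55

theta <- c(log(c(1, 1.5, 2, 2.5)), 0.5, 0.3, 0.1, 0.4, 0.2, 0.6)  # illustrative values
sigma_us <- unstructured_cov(theta, n = 4)
```

The quadratic growth in parameters is one reason fitting an unstructured model to the ten-visit BCVA data is considerably more demanding than to the four-visit FEV data.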
Next, the MMRM fitting procedures are compared using the FEV and BCVA datasets. FEV1 measurements are modeled as a function of race, treatment arm, visit number, and the interaction between the treatment arm and the visit number. Change in BCVA is assumed to be a function of race, baseline BCVA, treatment arm, visit number, and the treatment–visit interaction. In both datasets, repeated measures are modeled using an unstructured covariance matrix. The implementations’ convergence times are evaluated first, followed by a comparison of their estimates. Finally, we fit these procedures on simulated BCVA-like data to assess the impact of missingness on convergence rates.
The mmrm, PROC GLIMMIX, gls,
lmer, and glmmTMB functions are applied to the
FEV dataset 10 times. The convergence times are recorded for each
replicate and are reported in the table below.
| Implementation | Median | First Quartile | Third Quartile | 
|---|---|---|---|
| mmrm | 56.15 | 55.76 | 56.30 | 
| PROC GLIMMIX | 100.00 | 100.00 | 100.00 | 
| lmer | 247.02 | 245.25 | 257.46 | 
| gls | 687.63 | 683.50 | 692.45 | 
| glmmTMB | 715.90 | 708.70 | 721.57 | 
It is clear from these results that mmrm converges
substantially faster than the other R functions. Though not demonstrated
here, this is generally true regardless of the sample size and
covariance structure used. mmrm is also faster than
PROC GLIMMIX.
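The replicate-and-summarize procedure behind these timings can be sketched in base R: run a fit a fixed number of times, record the elapsed wall-clock time of each run with `system.time()`, and report the median and quartiles with `quantile()`. The function `fit_once()` is a hypothetical stand-in (an `lm()` fit on simulated data) for whichever MMRM call is being timed; the benchmark code actually used for this vignette is not shown here and may differ, for example by relying on a dedicated benchmarking package.

```r
set.seed(123)

# Hypothetical stand-in for the model fit being timed.
fit_once <- function() {
  x <- rnorm(1e4)
  y <- 2 * x + rnorm(1e4)
  lm(y ~ x)
}

n_reps <- 10
# Elapsed (wall-clock) seconds for each replicate
elapsed <- replicate(n_reps, system.time(fit_once())[["elapsed"]])

timing_summary <- quantile(elapsed, probs = c(0.5, 0.25, 0.75))
names(timing_summary) <- c("Median", "First Quartile", "Third Quartile")
timing_summary
```

Reporting the median and quartiles rather than the mean keeps the summary robust to occasional slow runs caused by other activity on the machine.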
The MMRM implementations are now applied to the BCVA dataset 10 times. The convergence times are presented below.
| Implementation | Median | First Quartile | Third Quartile | 
|---|---|---|---|
| mmrm | 3.36 | 3.32 | 3.46 | 
| glmmTMB | 18.65 | 18.14 | 18.87 | 
| PROC GLIMMIX | 36.25 | 36.17 | 36.29 | 
| gls | 164.36 | 158.61 | 165.93 | 
| lmer | 165.26 | 157.46 | 166.42 | 
We again find that mmrm produces the fastest convergence
times, with the lowest median and quartiles of all implementations.
We next estimate the marginal mean treatment effects for each visit
in the FEV and BCVA datasets using the MMRM fitting procedures. All R
implementations’ estimates are reported relative to
PROC GLIMMIX’s estimates. Convergence status is also
reported.
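With treatment contrasts and the ARMCD * AVISIT interaction used in these models, the treatment effect at the reference visit is the treatment coefficient, and at any other visit it is that coefficient plus the matching interaction coefficient. The sketch below shows this bookkeeping on simulated, complete, hypothetical data using `lm()` purely for illustration (its standard errors ignore within-subject correlation and should not be used for MMRM inference). With complete data and this saturated treatment-by-visit mean model, the per-visit effects equal differences in cell means, which the code checks. All variable names are hypothetical.

```r
set.seed(42)

# Hypothetical complete data: 20 subjects per arm, 4 visits each
visits <- paste0("VIS", 1:4)
dat <- expand.grid(visit = factor(visits, levels = visits), subject = 1:40)
dat$arm <- factor(ifelse(dat$subject <= 20, "PBO", "TRT"), levels = c("PBO", "TRT"))
visit_num <- as.integer(dat$visit)
# Hypothetical means: the treatment benefit grows by one unit per visit
dat$y <- 40 + visit_num + (dat$arm == "TRT") * visit_num + rnorm(nrow(dat), sd = 2)

fit <- lm(y ~ arm * visit, data = dat)  # illustration only: ignores within-subject correlation
b <- coef(fit)

# Treatment effect at each visit: main effect plus that visit's interaction term
effect_by_visit <- sapply(visits, function(v) {
  extra <- if (v == visits[1]) 0 else b[[paste0("armTRT:visit", v)]]
  b[["armTRT"]] + extra
})

# The same quantities computed from the raw cell means
cell_means <- tapply(dat$y, list(dat$arm, dat$visit), mean)
cell_diff <- cell_means["TRT", ] - cell_means["PBO", ]
all.equal(effect_by_visit, cell_diff)  # TRUE
```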
In the FEV data, the R procedures’ estimates are very similar to those output by
PROC GLIMMIX, though mmrm and gls
generate the estimates that are closest to those produced when using
SAS. All methods converge using their default optimization
arguments.
In the BCVA data, mmrm, gls and lmer produce
estimates that are virtually identical to PROC GLIMMIX’s,
while glmmTMB does not. This is likely explained by
glmmTMB’s failure to converge. lmer also
fails to converge, even though its estimates agree closely with PROC GLIMMIX’s.
The results of the previous benchmark suggest that the number of patients missing from later time points affects certain implementations’ capacity to converge. We investigate this further by simulating data using a data-generating process similar to that of the BCVA dataset, though with various rates of patient dropout.
Ten datasets of 200 patients are generated for each of the following levels of missingness: none, mild, moderate, and high. In all scenarios, observations are missing at random. The average number of patients observed at each visit across the ten replicated datasets at each level of missingness is presented in the table below.
| Visit | none | mild | moderate | high |
|---|---|---|---|---|
| VIS01 | 200 | 196.7 | 197.6 | 188.1 | 
| VIS02 | 200 | 195.4 | 194.4 | 182.4 | 
| VIS03 | 200 | 195.1 | 190.7 | 175.2 | 
| VIS04 | 200 | 194.1 | 188.4 | 162.8 | 
| VIS05 | 200 | 191.6 | 182.5 | 142.7 | 
| VIS06 | 200 | 188.2 | 177.3 | 125.4 | 
| VIS07 | 200 | 184.6 | 168.0 | 105.9 | 
| VIS08 | 200 | 178.5 | 155.4 | 82.6 | 
| VIS09 | 200 | 175.3 | 139.9 | 58.1 | 
| VIS10 | 200 | 164.1 | 124.0 | 39.5 | 
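Per-visit counts like those in the table can be produced by simulating monotone dropout and averaging the number of patients still observed at each visit over the replicates. The base-R sketch below uses a hypothetical, constant per-visit dropout probability for each missingness level; that dropout is completely at random (a special case of missing at random) and does not reproduce the vignette's actual data-generating process, so its counts will not match the table.

```r
set.seed(2024)
n_patients <- 200
n_visits <- 10

# Simulate monotone dropout: once a patient drops out, all later visits are missing.
# dropout_prob is a hypothetical probability of dropping out at each visit.
observed_counts <- function(dropout_prob) {
  observed <- matrix(TRUE, nrow = n_patients, ncol = n_visits)
  for (i in seq_len(n_patients)) {
    for (v in seq_len(n_visits)) {
      if (runif(1) < dropout_prob) {
        observed[i, v:n_visits] <- FALSE
        break
      }
    }
  }
  colSums(observed)  # number of patients observed at each visit
}

# Average the per-visit counts over ten replicated datasets
average_counts <- function(dropout_prob, n_reps = 10) {
  rowMeans(replicate(n_reps, observed_counts(dropout_prob)))
}

dropout_levels <- c(none = 0, mild = 0.02, moderate = 0.04, high = 0.1)  # hypothetical rates
counts <- sapply(dropout_levels, average_counts)
rownames(counts) <- sprintf("VIS%02d", seq_len(n_visits))
counts
```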
The convergence rates of all implementations, stratified by missingness level, are presented in the plot below.
mmrm, gls, and PROC GLIMMIX
are resilient to missingness, exhibiting some convergence problems only
in the scenarios with the most missingness. These implementations
converged in all replicates of the other scenarios. glmmTMB,
on the other hand, has convergence issues in the no-, mild-, and
high-missingness datasets, with the worst convergence rate occurring in
the datasets with the most dropout. Finally, lmer is
unreliable in all scenarios, suggesting that its convergence issues
stem from something other than the missing observations.
Note that the default optimization schemes are used for each method; these schemes can be modified to potentially improve convergence rates.
A more comprehensive simulation study using data-generating processes
similar to the one used here is outlined in the simulations/missing-data-benchmarks
subdirectory. In addition to assessing the effect of missing data on
software convergence rates, we also evaluate these methods’ fit times
and empirical bias, variance, 95% coverage rates, type I error rates and
type II error rates. mmrm is found to be the most
robust software for fitting MMRMs in scenarios where a large proportion
of patients are missing from the last time points. Additionally,
mmrm has the fastest average fit times regardless of the
amount of missingness. All implementations considered produce similar
empirical biases, variances, 95% coverage rates, type I error rates and
type II error rates.
#> R version 4.4.0 (2024-04-24)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Ventura 13.4
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Zurich
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] knitr_1.47              sasr_0.1.2              glmmTMB_1.1.9          
#>  [4] nlme_3.1-164            lme4_1.1-35.4           Matrix_1.7-0           
#>  [7] stringr_1.5.1           microbenchmark_1.4.10   clusterGeneration_1.3.8
#> [10] MASS_7.3-60.2           yardstick_1.3.1         workflowsets_1.1.0     
#> [13] workflows_1.1.4         tune_1.2.1              tidyr_1.3.1            
#> [16] tibble_3.2.1            rsample_1.2.1           recipes_1.0.10         
#> [19] purrr_1.0.2             parsnip_1.2.1           modeldata_1.4.0        
#> [22] infer_1.0.7             ggplot2_3.5.1           dplyr_1.1.4            
#> [25] dials_1.2.1             scales_1.3.0            broom_1.0.6            
#> [28] tidymodels_1.2.0        car_3.1-2               carData_3.0-5          
#> [31] emmeans_1.10.2          mmrm_0.3.12            
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rdpack_2.6          rlang_1.1.4         magrittr_2.0.3     
#>  [4] furrr_0.3.1         compiler_4.4.0      mgcv_1.9-1         
#>  [7] png_0.1-8           vctrs_0.6.5         lhs_1.1.6          
#> [10] pkgconfig_2.0.3     fastmap_1.2.0       backports_1.5.0    
#> [13] ellipsis_0.3.2      labeling_0.4.3      utf8_1.2.4         
#> [16] rmarkdown_2.27      prodlim_2024.06.25  nloptr_2.1.0       
#> [19] xfun_0.45           cachem_1.1.0        jsonlite_1.8.8     
#> [22] highr_0.11          parallel_4.4.0      R6_2.5.1           
#> [25] bslib_0.7.0         stringi_1.8.4       reticulate_1.38.0  
#> [28] parallelly_1.37.1   boot_1.3-30         rpart_4.1.23       
#> [31] numDeriv_2016.8-1.1 lubridate_1.9.3     jquerylib_0.1.4    
#> [34] estimability_1.5.1  Rcpp_1.0.12         iterators_1.0.14   
#> [37] future.apply_1.11.2 splines_4.4.0       nnet_7.3-19        
#> [40] timechange_0.3.0    tidyselect_1.2.1    rstudioapi_0.16.0  
#> [43] abind_1.4-5         yaml_2.3.8          timeDate_4032.109  
#> [46] TMB_1.9.12          codetools_0.2-20    listenv_0.9.1      
#> [49] lattice_0.22-6      withr_3.0.0         coda_0.19-4.1      
#> [52] evaluate_0.24.0     future_1.33.2       survival_3.5-8     
#> [55] pillar_1.9.0        checkmate_2.3.1     foreach_1.5.2      
#> [58] generics_0.1.3      munsell_0.5.1       minqa_1.2.7        
#> [61] globals_0.16.3      xtable_1.8-4        class_7.3-22       
#> [64] glue_1.7.0          tools_4.4.0         data.table_1.15.4  
#> [67] gower_1.0.1         mvtnorm_1.2-5       grid_4.4.0         
#> [70] rbibutils_2.2.16    ipred_0.9-14        colorspace_2.1-0   
#> [73] cli_3.6.3           DiceDesign_1.10     fansi_1.0.6        
#> [76] lava_1.8.0          gtable_0.3.5        GPfit_1.0-8        
#> [79] sass_0.4.9          digest_0.6.36       farver_2.1.2       
#> [82] htmltools_0.5.8.1   lifecycle_1.0.4     hardhat_1.4.0