In this vignette we illustrate how to correct p-values for multiple comparisons using the multcomp and emmeans packages.
We start by simulating some data for a binary longitudinal outcome:
set.seed(1234)
n <- 300 # number of subjects
K <- 4 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we constuct a data frame with the design:
# everyone has a baseline measurment, and then measurements at K time points
DF <- data.frame(id = rep(seq_len(n), each = K),
time = gl(K, 1, n*K, labels = paste0("Time", 1:K)),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ 1, data = DF)
betas <- c(-2.13, 1, rep(c(1.2, -1.2), K-1)) # fixed effects coefficients
D11 <- 1 # variance of random intercepts
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))We fit a mixed effects logistic regression for y assuming random intercepts for the random-effects part, and the main effects of sex and time for the fixed-effects part.
fm <- mixed_model(fixed = y ~ sex + time, random = ~ 1 | id, data = DF,
family = binomial())The uncorrected p-values for the 4 time points are give by the summary() method:
summary(fm)
#>
#> Call:
#> mixed_model(fixed = y ~ sex + time, random = ~1 | id, data = DF,
#> family = binomial())
#>
#> Data Descriptives:
#> Number of Observations: 1200
#> Number of Groups: 300
#>
#> Model:
#> family: binomial
#> link: logit
#>
#> Fit statistics:
#> log.Lik AIC BIC
#> -583.4797 1178.959 1201.182
#>
#> Random effects covariance matrix:
#> StdDev
#> (Intercept) 1.448146
#>
#> Fixed effects:
#> Estimate Std.Err z-value p-value
#> (Intercept) -2.2865 0.2496 -9.1597 < 1e-04
#> sexfemale 0.8853 0.2448 3.6171 0.00029798
#> timeTime2 0.3066 0.2266 1.3533 0.17594625
#> timeTime3 -0.5049 0.2463 -2.0501 0.04035455
#> timeTime4 0.5633 0.2237 2.5184 0.01178997
#>
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#>
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUETo perform the pairwise comparisons and obtain corrected p-values, we load the multcomp package and use the glht() function. Because no specific methods exist for MixMod object returned by mixed_model(), we need to specify the vcov. and coef. arguments of glht(), i.e.,
library("multcomp")
#> Loading required package: mvtnorm
#> Loading required package: survival
#> Loading required package: TH.data
#> Loading required package: MASS
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:GLMMadaptive':
#>
#> negative.binomial
#>
#> Attaching package: 'TH.data'
#> The following object is masked from 'package:MASS':
#>
#> geyser
fm_mc <- glht(fm, linfct = mcp(time = "Tukey"),
vcov. = vcov(fm, "fixed"), coef. = fixef)
summary(fm_mc)
#>
#> Simultaneous Tests for General Linear Hypotheses
#>
#> Multiple Comparisons of Means: Tukey Contrasts
#>
#>
#> Fit: mixed_model(fixed = y ~ sex + time, random = ~1 | id, data = DF,
#> family = binomial())
#>
#> Linear Hypotheses:
#> Estimate Std. Error z value Pr(>|z|)
#> Time2 - Time1 == 0 0.3066 0.2266 1.353 0.52811
#> Time3 - Time1 == 0 -0.5049 0.2463 -2.050 0.16945
#> Time4 - Time1 == 0 0.5633 0.2237 2.518 0.05691 .
#> Time3 - Time2 == 0 -0.8115 0.2423 -3.350 0.00465 **
#> Time4 - Time2 == 0 0.2567 0.2165 1.186 0.63520
#> Time4 - Time3 == 0 1.0682 0.2406 4.440 < 0.001 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Adjusted p values reported -- single-step method)We continue our illustration by including the interaction term between sex and time, and we focus on the difference between males and females for the various time points. We start by fitting the model:
gm <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
family = binomial())To compute the estimated log odds for males and females at the different time points we use the emmeans() functions from the emmeans package:
library("emmeans")
gm_mc <- emmeans(gm, ~ sex | time)
gm_mc
#> time = Time1:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -2.737017 0.3421930 Inf -3.407703 -2.0663308
#> female -1.209182 0.2550117 Inf -1.708995 -0.7093678
#>
#> time = Time2:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -1.675620 0.2755058 Inf -2.215602 -1.1356387
#> female -1.435439 0.2629832 Inf -1.950877 -0.9200015
#>
#> time = Time3:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -4.398350 0.5452336 Inf -5.466988 -3.3297118
#> female -1.389421 0.2612568 Inf -1.901475 -0.8773671
#>
#> time = Time4:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -1.336933 0.2610534 Inf -1.848589 -0.8252781
#> female -1.252104 0.2564246 Inf -1.754687 -0.7495210
#>
#> Confidence level used: 0.95The corresponding pairwise comparisons are performed by the pairs() function:
pairs(gm_mc)
#> time = Time1:
#> contrast estimate SE df z.ratio p.value
#> male - female -1.52783517 0.4097558 Inf -3.729 0.0002
#>
#> time = Time2:
#> contrast estimate SE df z.ratio p.value
#> male - female -0.24018095 0.3664609 Inf -0.655 0.5122
#>
#> time = Time3:
#> contrast estimate SE df z.ratio p.value
#> male - female -3.00892881 0.5851176 Inf -5.142 <.0001
#>
#> time = Time4:
#> contrast estimate SE df z.ratio p.value
#> male - female -0.08482945 0.3550390 Inf -0.239 0.8112