The R package mlma is created for linear and nonlinear mediation analysis with multilevel data using multilevel additive models (Yu and Li 2020). The vignette is composed of three parts. We first generate a simulated dataset. Based on the simulation, part I focuses on how to transform variables and prepare data for the mediation analysis. Part II walks through functions of the multilevel mediation analysis and Part III explains how to make inferences on multilevel mediation effects.
To use the R package mlma, we first install the package in R (install.packages("mlma")) and load it.
library(mlma)
#> Loading required package: lme4
#> Loading required package: Matrix
#> Loading required package: splines
#> Loading required package: car
#> Loading required package: carData
#> Loading required package: gplots
#> 
#> Attaching package: 'gplots'
#> The following object is masked from 'package:stats':
#> 
#>     lowess
#> Loading required package: abind
We generate a dataset with two levels. The simulation has one binary level 1 exposure and one continuous level 2 exposure. There are also two mediators, one at each level: the level 1 mediator is continuous and the level 2 mediator is binary. The variables are generated by the following code:
# a binary predictor
set.seed(1)
n=20       # the number of observations in each group
J<-600/n   # there are 30 groups
level=rep(1:J,each=n)
alpha_211<-0.8     #covariates coefficients
alpha_1111<-0.8
alpha_2111<-0.8
beta_1<-0.4
beta_2<-0.4
beta_3<-0.4
beta_4<-0.4
beta_5<-0.4
v1=5              #the level 1 standard deviation (used as sd in rnorm)
v2=v1/5           #the level 2 standard deviation (used as sd in rnorm)
#The exposure variables
x1<-rbinom(600,1,0.5) #binary level 1 exposure, xij
x2<-rep(rnorm(J),each=n) #continuous level 2 exposure
#The mediators
m2<-rep(rbinom(J,1,exp(alpha_211*unique(x2))/(1+exp(alpha_211*unique(x2)))),each=n)    #level 2 binary mediator
u1<-rep(rnorm(J,0,0.5),each=n) #level 2 random effect for mij
e1<-rnorm(n*J)  #level 1 error for mij
m1<-u1+alpha_1111*x1+alpha_2111*x2^2+e1 #level 1 continuous mediator
#The response variable
u0<-rep(rnorm(J,0,v2),each=n)
e0<-rnorm(n*J,0,v1)
y<-u0+beta_1*x1+beta_2*x2+beta_3*ifelse(x2<=0,0,log(1+x2))+beta_4*m1+beta_5*m2+e0
The function \(data.org\) performs the transformations before the mediation analysis. The exposure variable(s) (\(x\)) and the mediator(s) (\(m\)) are required inputs. The response variable (\(y\)) is required only when its level (\(levely\)) is not given. The argument \(levelx\) identifies the levels of the exposure variables. It does not need to be provided, since the function can decide the level of each exposure variable automatically. If any exposure variable is binary or categorical, \(xref\) identifies its reference group.
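One way such automatic level detection could work is to check whether a variable is constant within every group. The sketch below illustrates that idea in base R only. The helper `guess_level` and the toy vectors are hypothetical and are not taken from the package, whose internal rule may differ.

```r
# Hypothetical helper (not part of mlma): treat a variable as level 2
# when it takes a single value within every group, otherwise as level 1.
guess_level <- function(v, group) {
  n_unique <- tapply(v, group, function(z) length(unique(z)))
  if (all(n_unique == 1)) 2 else 1
}

set.seed(1)
group <- rep(1:5, each = 4)          # 5 groups of 4 observations
x_l1 <- rnorm(20)                    # varies within groups
x_l2 <- rep(rnorm(5), each = 4)      # constant within groups

guess_level(x_l1, group)
guess_level(x_l2, group)
```

The same check can be applied to each column of the mediator matrix to separate level 1 from level 2 mediators.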
The arguments \(l1\) and \(l2\) specify the column numbers in \(m\) of the continuous mediators at level 1 and level 2 respectively. \(c1\) and \(c2\) refer to the categorical mediators, with \(c1r\) and \(c2r\) identifying the corresponding reference groups. \(l1, l2, c1\), and \(c2\) do not have to be provided. If they are not, the function checks each column of \(m\) and decides whether the variable belongs to level 1 or 2 and whether it is continuous or categorical.
\(level\) is a vector that records the group number of each observation. \(weight\) is the weight of each observation.
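When the grouping variable is stored as labels rather than numbers, a group-number vector can be derived in base R. This is a minimal sketch with made-up labels. Whether the package also accepts non-numeric group identifiers directly is not stated here, so the conversion is shown only as a safe default.

```r
# Made-up site labels for six observations (not from the simulated data).
site <- c("B", "A", "B", "C", "A", "C")

# factor() sorts the labels alphabetically, so A -> 1, B -> 2, C -> 3.
level_demo <- as.integer(factor(site))
level_demo

# Group sizes, useful for spotting very small groups.
table(level_demo)
```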
\(f01y\) and \(f10y\) specify the desired transformations of exposures at level 2 and level 1 respectively in explaining the response variable. Each is a list whose first item gives the column numbers in \(x\) of the exposure variables to be transformed. The remaining items, in that order, list the transformation expressions for each exposure. For example, f10y=list(2,c("x^2","log(x)")) means that column 2 of \(x\) is a level 1 exposure that is transformed to its square and its natural log to predict the response variable. An exposure not listed in \(f01y\) or \(f10y\) keeps its original form without transformation. In our simulated data, the level 2 exposure enters as itself, \(x_{.j}\), and as \(I(x_{.j}>0)\times \log(x_{.j}+1)\). In the call to \(data.org\) this is specified as f01y=list(2,c("x","ifelse(x>0,log(x),0)")); note that this expression uses log(x) rather than log(x+1) on the positive part. Similarly, \(f02ky\) and \(f20ky\) define the transformations of level 2 and level 1 mediators respectively in explaining \(y\).
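To see what a list of transformation expressions produces, each string can be evaluated with the exposure bound to the name `x`. The sketch below does this in base R. The helper `apply_transforms` and the toy values are hypothetical and only illustrate the idea, not how the package evaluates the expressions internally.

```r
# Hypothetical helper (not part of mlma): evaluate each transformation
# string with the exposure vector bound to the name `x`, one column per string.
apply_transforms <- function(x, exprs) {
  out <- sapply(exprs, function(e) eval(parse(text = e), list(x = x)))
  matrix(out, nrow = length(x), dimnames = list(NULL, exprs))
}

x2_demo <- c(-0.5, 0.5, 2)   # toy values, not the simulated x2
tx <- apply_transforms(x2_demo, c("x", "ifelse(x>0,log(x+1),0)"))
tx
```

The first column reproduces the exposure and the second is zero wherever the exposure is not positive, which matches the two terms used to generate the response.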
\(f01km1\) and \(f01km2\) are arguments that define the transformations of level 2 exposures in explaining level 1 and level 2 mediator(s) respectively. Since only variables of a higher or equivalent level can be used as predictors, level 1 exposures can only be predictors of level 1 mediators. \(f10km\) defines the transformations of level 1 exposure(s) in explaining level 1 mediator(s). In addition, when there are level 2 mediators but no level 2 exposure variable, the level 1 exposure variable(s) are aggregated at level 2 to form the level 2 exposure variables. The first item of \(f01km1\), \(f01km2\) and \(f10km\) is a matrix of two columns: the first column gives the column number of the mediator in \(m\) and the second column gives the column number of the exposure in \(x\). Following the order of the rows of that matrix, each of the remaining items lists the transformation expressions for the exposure (identified by column 2) in explaining the mediator (identified by column 1). In our example, the level 1 mediator \(m1\) is explained by the level 2 exposure \(x2\) in the form of \(x2^2\). Therefore, we set the argument f01km1=list(matrix(c(1,2),1,2),"x^2").
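The layout of such a list can be inspected in base R before it is passed on. In this sketch the column names `mediator_col` and `exposure_col` are added only for readability; they are an assumption of the illustration, and the vignette's own call uses an unnamed matrix.

```r
# One row per (mediator, exposure) pair:
# column 1 indexes the mediator in m, column 2 indexes the exposure in x.
pairs <- matrix(c(1, 2), nrow = 1, ncol = 2,
                dimnames = list(NULL, c("mediator_col", "exposure_col")))

# Item 1 is the pair matrix; item k + 1 holds the expressions for row k.
f01km1_demo <- list(pairs, "x^2")

f01km1_demo[[1]]       # which mediator is explained by which exposure
f01km1_demo[[1 + 1]]   # transformation(s) of the exposure for row 1
```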
The following code prepares the data and performs the transformations. Note that the transformation functions can be specified in different ways. Besides those in the example, we can also use natural spline bases (e.g. “ns(x,df=5)”) and piecewise functions (e.g. “ifelse(x>0,sqrt(x),0)”).
example1<-data.org(x=cbind(x1=x1,x2=x2), m=cbind(m1=m1,m2=m2), level=level,
                   f01y=list(2,c("x","ifelse(x>0,log(x),0)")),
                   f01km1=list(matrix(c(1,2),1,2),"x^2"))
The function \(mlma\) can be executed based on the results from \(data.org\) or on the original arguments of \(data.org\). In addition, the response variable needs to be set up by \(y\). If the response variable is categorical, \(yref\) is used to specify the reference group. The \(random\) argument sets up the random effect part for the response variable, and \(random.m1\) does the same for the mediators.
The argument \(covariates\) is the data frame of all covariates for the response variable and/or mediators. For the response variable, covariates are defined as variables that are used to explain \(y\) but are not related to or caused by the exposure variable(s). Arguments \(cy1\) and \(cy2\) specify the column numbers in \(covariates\) of the level 1 and level 2 covariates respectively. \(cm\) specifies the covariates for mediators.
If the joint effect of a group of mediators is of interest, the group can be set up with the \(joint\) argument. Finally, if users are interested in the mediation effects on a new set of exposures and mediators, the new sets can also be supplied. Please refer to the package manual for details.
mlma.e1<-mlma(y=y,data1=example1,intercept=FALSE)
mlma.e1
#> Level 2 Third Variable Effects: 
#>      TE   DE m2.1   m1
#> x2 1.57 1.39 0.01 0.17
#> Level 1 Third Variable Effects: 
#>      TE   DE   m1
#> x1 1.11 0.63 0.48
The result of the mediation analysis shows the mediation effects at each level. The direct effect, indirect effects and total effect are shown for each exposure-response pair of variables. In the above example, the level 1 total effect of \(x1\) on \(y\) is \(1.11\), of which the direct effect is \(0.63\) and the indirect effect through \(m1\) is \(0.48\). The level 2 total effect of \(x2\) on \(y\) is \(1.57\), of which the direct effect is \(1.39\), the indirect effect through \(m2\) is \(0.01\), and the indirect effect through \(m1\) is \(0.17\).
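The reported effects are additive: at each level the total effect equals the direct effect plus the indirect effects. The short check below uses the rounded values copied from the printed output, so it holds only up to rounding. The share of the total effect explained by each mediator is an extra summary computed here for illustration and is not part of the printed output.

```r
# Rounded values copied from the printed mlma.e1 output.
level1 <- list(te = 1.11, de = 0.63, ie = c(m1 = 0.48))
level2 <- list(te = 1.57, de = 1.39, ie = c(m2.1 = 0.01, m1 = 0.17))

# Total effect = direct effect + sum of indirect effects.
check_additive <- function(e) isTRUE(all.equal(e$de + sum(e$ie), e$te))
check_additive(level1)
check_additive(level2)

# Share of the total effect attributed to each mediator.
level1$ie / level1$te
level2$ie / level2$te
```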
The \(summary\) function provides ANOVA tests of the exposure variables and mediators in the full model for the response variable. It also provides ANOVA tests of the exposure variable(s) in predicting each mediator. Using these results, users can decide which variables should be included as mediators and which should be used as covariates, and then rerun the multilevel mediation analysis.
summary(mlma.e1)
#> 1. Anova on the Full Model:
#> Analysis of Deviance Table (Type III Wald chisquare tests)
#> 
#> Response: y
#>       Chisq Df Pr(>Chisq)   
#> x1   2.2349  1   0.134926   
#> x2.1 4.2622  1   0.038969 * 
#> x2.2 1.2820  1   0.257525   
#> m2.2 0.0573  1   0.810798   
#> m1   9.0432  1   0.002637 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> 2. Anova on models for Level 1 mediators:
#> $m1
#> Analysis of Deviance Table (Type III Wald chisquare tests)
#> 
#> Response: y
#>        Chisq Df Pr(>Chisq)    
#> x2.1  93.581  1  < 2.2e-16 ***
#> x1   106.515  1  < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> 
#> 3. Anova on models for Level 2 mediators:
#> $m2
#> Analysis of Deviance Table (Type III tests)
#> 
#> Response: y
#>      LR Chisq Df Pr(>Chisq)  
#> x2.2   3.0116  1    0.08267 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To check the actual coefficients of each variable in the full model, or in the models that predict the level 1 or level 2 mediators, we can inspect the results from \(mlma\) directly.
mlma.e1$f1   #the full model
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ x1 + x2.1 + x2.2 + m2.2 + m1 - 1 + (1 | level)
#>    Data: data.frame(temp.data)
#> REML criterion at convergence: 3661.375
#> Random effects:
#>  Groups   Name        Std.Dev.
#>  level    (Intercept) 1.116   
#>  Residual             5.040   
#> Number of obs: 600, groups:  level, 30
#> Fixed Effects:
#>     x1    x2.1    x2.2    m2.2      m1  
#> 0.6349  0.6935  0.4352  0.1259  0.4791
mlma.e1$fm1  #models for level 1 mediators
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ x2.1 + x1 - 1 + (1 | level)
#>    Data: data.frame(temp.data)
#> REML criterion at convergence: 1864.068
#> Random effects:
#>  Groups   Name        Std.Dev.
#>  level    (Intercept) 0.5694  
#>  Residual             1.0879  
#> Number of obs: 600, groups:  level, 30
#> Fixed Effects:
#>   x2.1      x1  
#> 0.8086  0.8975
mlma.e1$fm2  #models for level 2 mediators
#> [[1]]
#> [1] 2
#> 
#> [[2]]
#> 
#> Call:  glm(formula = frml.m, family = binomial(link = "logit"), data = data.frame(temp.data))
#> 
#> Coefficients:
#>   x2.2  
#> 0.7045  
#> 
#> Degrees of Freedom: 30 Total (i.e. Null);  29 Residual
#> Null Deviance:       41.59 
#> Residual Deviance: 38.58     AIC: 40.58
The \(plot\) function helps depict the directions of mediation effects. When the mediator is not specified (by \(var\)), the function plots the overall mediation effects.
plot(mlma.e1)
#> Error in plot.new(): figure margins too large
By specifying the mediator, the function shows the indirect effect of the mediator, and its marginal relationship with the response variable and with the exposure variable at each level.
plot(mlma.e1,var="m1")
#> Error in plot.new(): figure margins too large
Finally, the function \(boot.mlma\) uses the bootstrap method to estimate the mediation effects together with their estimated variances and confidence intervals. Again, the analysis can be built on the results from \(data.org\). The default number of bootstrap samples is \(100\), which can be changed to any desired number. The \(summary\) function applied to the output of \(boot.mlma\) gives the inference results for all mediation effects. Two confidence intervals are built for the estimated mediation effects: (lwbd, upbd) is based on the normal approximation and (lwbd.quan, upbd.quan) is built from the quantiles of the bootstrap results.
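The two kinds of interval can be illustrated on a toy statistic. The sketch below bootstraps a sample mean in base R and forms a normal-approximation interval and a quantile interval. The toy data, the 95% level, and the centering on the bootstrap mean are assumptions of this illustration. The last lines check that the level 2 `te` bounds printed in this vignette are consistent with mean plus or minus 1.96 times sd.

```r
set.seed(1)
z <- rnorm(50, mean = 1)   # toy data, unrelated to the simulated dataset

# 100 bootstrap replicates of the sample mean.
boot_est <- replicate(100, mean(sample(z, replace = TRUE)))

# Normal approximation: bootstrap mean +/- z quantile * bootstrap sd.
ci_normal <- mean(boot_est) + c(-1, 1) * qnorm(0.975) * sd(boot_est)

# Quantile interval: empirical 2.5% and 97.5% quantiles of the replicates.
ci_quantile <- unname(quantile(boot_est, c(0.025, 0.975)))

ci_normal
ci_quantile

# Values copied from the level 2 te column of the summary(boot.e1) output.
te_mean <- 1.5782; te_sd <- 0.5019
te_bounds <- te_mean + c(-1, 1) * 1.96 * te_sd
round(te_bounds, 3)   # compare with lwbd = 0.5946 and upbd = 2.5618
```

The quantile interval does not assume symmetry, so the two intervals can differ noticeably when the bootstrap distribution is skewed or the number of replicates is small.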
boot.e1<-boot.mlma(y=y,data1=example1,echo=FALSE,intercept=FALSE)
summary(boot.e1)
#> MLMA Analysis: Estimated Effects at level 1:
#>                te      de      m1
#> est        1.1120  0.6349  0.4772
#> mean       0.6279  0.5774  0.0505
#> sd         0.3758  0.3759  0.0672
#> upbd       1.3646  1.3141  0.1823
#> lwbd      -0.1087 -0.1593 -0.0812
#> upbd.quan  1.3694  1.2500  0.1985
#> lwbd.quan -0.0954 -0.1391 -0.0770
#> MLMA Analysis: Estimated Effects at level 2:
#>               te     de    m2.1     m1
#> est       1.5682 1.3910  0.0085 0.1687
#> mean      1.5782 1.3975  0.0101 0.1706
#> sd        0.5019 0.5215  0.0238 0.0546
#> upbd      2.5618 2.4196  0.0567 0.2777
#> lwbd      0.5946 0.3755 -0.0365 0.0635
#> upbd.quan 2.4515 2.3865  0.0545 0.2620
#> lwbd.quan 0.6860 0.5254 -0.0336 0.0674
The \(plot\) functions for \(boot.mlma\) objects work similarly to those for \(mlma\) objects, but confidence intervals for the estimates are added.
plot(boot.e1)
#> Error in plot.new(): figure margins too large
plot(boot.e1,var="m1")
#> Error in plot.new(): figure margins too large
Yu, Qingzhao, and Bin Li. 2020. “Third-Variable Effect Analysis with Multilevel Additive Models.” Submitted Manuscript.