--- title: "Post-hoc MCMC with glmmTMB" author: "Ben Bolker and Mollie Brooks" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Post-hoc MCMC with glmmTMB} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ## Overview One commonly requested feature is to be able to run a *post hoc* Markov chain Monte Carlo analysis based on the results of a frequentist fit. This is often a reasonable shortcut for computing confidence intervals and p-values that allow for finite-sized samples rather than relying on asymptotic sampling distributions. This vignette shows examples of such an analysis. Some caveats: - when using such a "pseudo-Bayesian" approach, be aware that using a scaled likelihood (implicit, improper priors) can often cause problems, especially when the model is poorly constrained by the data - in particular, models with poorly constrained random effects (singular or nearly singular) are likely to give bad results - as shown below, even models that are well-behaved for frequentist fitting may need stronger priors to give well-behaved MCMC results - as with all MCMC analysis, it is the *user's responsibility to check for proper mixing and convergence of the chains* (e.g. with functions from the `coda` package) before drawing conclusions. `tmbstan` does some checking automatically, so it produces warnings where the DIY sampler below does not. The first MCMC sampler illustrated below is conceptually simple (Metropolis with a multivariate normal candidate distribution). Users who want to do MCMC sampling with less code or on a regular basis should consider the [tmbstan package](https://CRAN.R-project.org/package=tmbstan), which does more efficient hybrid/Hamiltonian Monte Carlo sampling, and can take full advantage of `glmmTMB`'s ability to provide gradients of the log-posterior with respect to the parameters. More info on `tmbstan` and the methods that are available to use on the samples can be found on [Github](https://github.com/kaskr/tmbstan) (e.g. `methods(class="stanfit")`). It is safe to skip the Metropolis-Hastings (DIY) section of the salamander example below if the user prefers to use `tmbstan`. In general, you can plug the objective function from `glmmTMB` (i.e., `fitted_model$obj$fn`) into any general-purpose MCMC package (e.g. [adaptMCMC](https://CRAN.R-project.org/package=adaptMCMC ), [MCMCpack](https://CRAN.R-project.org/package=MCMCpack) (using `MCMCmetrop1R()`), [ensemblemcmc](https://CRAN.R-project.org/package=mcmcensemble), ...). However, most of these algorithms do not take advantage of `glmmTMB`'s automatic computation of gradients; for that, we will demonstrate the use of [tmbstan](https://CRAN.R-project.org/package=tmbstan), which uses `glmmTMB`'s objective and gradient functions in conjunction with the sampling algorithms from [Stan](https://mc-stan.org/). ```{r knitr_setup, include=FALSE, message=FALSE} library(knitr) opts_chunk$set(echo = TRUE) ## OK to evaluate on CRAN since we have stored all the slow stuff ... 
## eval = identical(Sys.getenv("NOT_CRAN"), "true")) knitr::read_chunk(system.file("vignette_data", "mcmc.R", package="glmmTMB")) L <- load(system.file("vignette_data", "mcmc.rda", package="glmmTMB")) library(png) library(grid) ``` ## Setup Load packages: ```{r libs,message=FALSE} library(glmmTMB) library(coda) ## MCMC utilities library(reshape2) ## for melt() ## graphics library(lattice) library(ggplot2); theme_set(theme_bw()) ``` ## Salamander example Fit basic model: ```{r fit1} ``` ### Metropolis-Hastings (DIY) This is a basic block Metropolis sampler, based on Florian Hartig's code [here](https://theoreticalecology.wordpress.com/2010/09/17/metropolis-hastings-mcmc-in-r/). ```{r run_MCMC} ``` Set up for MCMC: define scaled log-posterior function (in this case the log-likelihood function); extract coefficients and variance-covariance matrices as starting points. ```{r setup} ``` Run the chain: ```{r do_run_MCMC,eval=FALSE} ``` (running this chain takes `r round(t1["elapsed"],1)` seconds) Add more informative names and transform correlation parameter (see vignette on covariance structures and parameters): ```{r add_names} colnames(m1) <- colnames(vcov(fm1, full = TRUE)) colnames(m1)[ncol(m1)] <- "sd_site" ``` We could look at a traceplot (redacted for space, but appears OK): ```{r traceplot,fig.width=10, fig.height = 7, eval = FALSE} lattice::xyplot(m1,layout=c(2,3),asp="fill") ``` ```{r effsize} print(effectiveSize(m1),digits=3) ``` These effective sizes are probably still too small. **In a real analysis we would stop and make sure we had addressed the mixing/convergence problems before proceeding**; for this simple sampler, some of our choices would be (1) simply run the chain for longer; (2) tune the candidate distribution (e.g. by using `tune` to scale some parameters, or perhaps by switching to a multivariate Student t distribution [see the `mvtnorm` package]); (3) add regularizing priors. Ignoring the problems and proceeding, we can compute column-wise quantiles or highest posterior density intervals (`coda::HPDinterval`) to get confidence intervals. Plotting posterior distributions, omitting the intercept because it's on a very different scale. ```{r violins,echo=FALSE, fig.width = 6, fig.height = 6} m_long <- reshape2::melt(as.matrix(m1[,-1])) ggplot(m_long, aes(x=Var2, y=value))+ geom_violin(fill="gray")+ coord_flip()+labs(x="") ``` ### tmbstan The `tmbstan` package allows direct, simple access to a hybrid/Hamiltonian Monte Carlo algorithm for sampling from a TMB object; the `$obj` component of a `glmmTMB` fit is such an object. (To run this example you'll need to install the `tmbstan` package and its dependencies.) ```{r do_tmbstan,eval=FALSE} ``` Running this command, which creates 4 chains, takes `r round(t2["elapsed"],1)` seconds, with no parallelization. If you're going to do this a lot, you can use argument `cores=n` for running chains in parallel. The argument is sent to `rstan::sampling` so you should look at the helpfile `?rstan::sampling`. 
Running `bayestestR::diagnostic_posterior()` on the fit gives the following results:

```{r diagnostic_tab, echo=FALSE}
knitr::kable(dp2, digits = c(0, 0, 3, 3))
```

A trace plot (`rstan::traceplot(m2, pars = c("beta", "betazi", "theta"))`, also redacted for space) also looks OK:

```{r show_traceplot, echo=FALSE, fig.width=10, fig.height=5, eval=FALSE}
img <- readPNG(system.file("vignette_data", "tmbstan_traceplot.png", package = "glmmTMB"))
grid.raster(img)
```

A pairs plot (`pairs(m2, pars = c("beta", "betazi"), gap = 0)`, also redacted for space) also looks OK:

```{r show_pairsplot, echo=FALSE, fig.width=8, fig.height=8, eval=FALSE}
img <- readPNG(system.file("vignette_data", "tmbstan_pairsplot.png", package = "glmmTMB"))
grid.raster(img)
```

## Sleep study example

Now using the now-classic `sleepstudy` data set included with the `lme4` package:

```{r sleepstudy_tmbstan, eval=FALSE}
```

Running this chain produces *many* warnings about divergent transitions, low effective sample size, etc.; the diagnostic table confirms this.

Diagnostics:

```{r sleepstudy_diag, eval=FALSE}
```

```{r sleepstudy_diag_tab, echo=FALSE}
knitr::kable(dp3, digits = c(0, 0, 3, 3))
```

Most of the trace plot (redacted) looks OK. However, one sub-panel shows that one of the parameters (`theta[3]`, controlling the intercept/slope correlation) is problematic for one of the chains:

```{r sleepstudy_trace, fig.width=10, fig.height=5, echo=FALSE, eval=FALSE}
img <- readPNG(system.file("vignette_data", "sleepstudy_traceplot.png", package = "glmmTMB"))
grid.raster(img)
```

```{r sleepstudy_trace_theta3, fig.width=5, fig.height=5, echo=FALSE}
img <- readPNG(system.file("vignette_data", "sleepstudy_traceplot_theta3.png", package = "glmmTMB"))
grid.raster(img)
```

One way to address these problems is to add bounds that prevent the chains from going outside a range of $\pm 5$ standard errors from the MLE. (We originally used $\pm 3 \sigma$, but it looked like those bounds were being hit too frequently; if possible, we want bounds that prevent crazy behaviour but do *not* otherwise constrain the chains ...)

```{r sleepstudy_tmbstan_bounds, eval=FALSE}
```

The diagnostics and the trace plot look much better now (only `theta[3]` shown) ...

```{r sleepstudy_bounds_diag, eval=FALSE}
```

```{r sleepstudy_bounds_diag_tab, echo=FALSE}
knitr::kable(dp4, digits = c(0, 0, 3, 3))
```

```{r sleepstudy_trace_bounds, fig.width=10, fig.height=5, echo=FALSE, eval=FALSE}
img <- readPNG(system.file("vignette_data", "sleepstudy_traceplot_bounds.png", package = "glmmTMB"))
grid.raster(img)
```

```{r sleepstudy_trace_bounds_theta3, fig.width=5, fig.height=5, echo=FALSE}
img <- readPNG(system.file("vignette_data", "sleepstudy_traceplot_bounds_theta3.png", package = "glmmTMB"))
grid.raster(img)
```

The last step (for now) is to extract the sampled values, give them interpretable names, and transform the correlation parameter back to a more interpretable scale (see the "Covariance structures" vignette):

```{r trans_param, eval=FALSE}
```

```{r sleepstudy_hist, fig.width=10, fig.height=5}
m4_long <- reshape2::melt(as.matrix(samples4))
ggplot(m4_long, aes(x = value)) +
    geom_histogram(bins = 50) +
    facet_wrap(~ Var2, scales = "free")
```

A more principled, properly Bayesian way to handle this problem would be to add sensible regularizing priors. (To be done/tested; as of this writing, priors can be set on fixed effects and on random effect standard deviations, but not on random effect correlations ...)
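As a rough, untested sketch of what that might look like (the `priors` argument and the class names below follow `?priors` in recent `glmmTMB` versions, and the specific prior choices are purely illustrative):

```{r priors_sketch, eval=FALSE}
## hypothetical sketch, not run: refit the sleep study model with
## regularizing priors on the fixed effects and on the random-effect
## standard deviations before passing it to tmbstan (check ?priors for
## the current interface and allowed class names)
data("sleepstudy", package = "lme4")
prior_spec <- data.frame(
    prior = c("normal(0, 5)", "gamma(10, 1)"),  ## illustrative values only
    class = c("fixef", "theta_sd")
)
fm_reg <- glmmTMB(Reaction ~ Days + (Days | Subject),
                  data = sleepstudy,
                  priors = prior_spec)
```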