--- title: "Post-hoc MCMC with glmmTMB" author: "Ben Bolker and Mollie Brooks" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Post-hoc MCMC with glmmTMB} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ## Overview One commonly requested feature is to be able to run a *post hoc* Markov chain Monte Carlo analysis based on the results of a frequentist fit. This is often a reasonable shortcut for computing confidence intervals and p-values that allow for finite-sized samples rather than relying on asymptotic sampling distributions. This vignette shows examples of such an analysis. Some caveats: - when using such a "pseudo-Bayesian" approach, be aware that using a scaled likelihood (implicit, improper priors) can often cause problems, especially when the model is poorly constrained by the data - in particular, models with poorly constrained random effects (singular or nearly singular) are likely to give bad results - as shown below, even models that are well-behaved for frequentist fitting may need stronger priors to give well-behaved MCMC results - as with all MCMC analysis, it is the *user's responsibility to check for proper mixing and convergence of the chains* (e.g. with functions from the `coda` package) before drawing conclusions. `tmbstan` does some checking automatically, so it produces warnings where the DIY sampler below does not. The first MCMC sampler illustrated below is conceptually simple (Metropolis with a multivariate normal candidate distribution). Users who want to do MCMC sampling with less code or on a regular basis should consider the [tmbstan package](https://CRAN.R-project.org/package=tmbstan), which does more efficient hybrid/Hamiltonian Monte Carlo sampling, and can take full advantage of `glmmTMB`'s ability to provide gradients of the log-posterior with respect to the parameters. More info on `tmbstan` and the methods that are available to use on the samples can be found on [Github](https://github.com/kaskr/tmbstan) (e.g. `methods(class="stanfit")`). It is safe to skip the Metropolis-Hastings (DIY) section of the salamander example below if the user prefers to use `tmbstan`. In general, you can plug the objective function from `glmmTMB` (i.e., `fitted_model$obj$fn`) into any general-purpose MCMC package (e.g. [adaptMCMC](https://CRAN.R-project.org/package=adaptMCMC ), [MCMCpack](https://CRAN.R-project.org/package=MCMCpack) (using `MCMCmetrop1R()`), [ensemblemcmc](https://CRAN.R-project.org/package=mcmcensemble), ...). However, most of these algorithms do not take advantage of `glmmTMB`'s automatic computation of gradients; for that, we will demonstrate the use of [tmbstan](https://CRAN.R-project.org/package=tmbstan), which uses `glmmTMB`'s objective and gradient functions in conjunction with the sampling algorithms from [Stan](https://mc-stan.org/). ```{r knitr_setup, include=FALSE, message=FALSE} library(knitr) opts_chunk$set(echo = TRUE) ## OK to evaluate on CRAN since we have stored all the slow stuff ... 
## eval = identical(Sys.getenv("NOT_CRAN"), "true")) knitr::read_chunk(system.file("vignette_data", "mcmc.R", package="glmmTMB")) L <- load(system.file("vignette_data", "mcmc.rda", package="glmmTMB")) library(png) library(grid) ``` ## Setup Load packages: ```{r libs,message=FALSE} library(glmmTMB) library(coda) ## MCMC utilities library(reshape2) ## for melt() ## graphics library(lattice) library(ggplot2); theme_set(theme_bw()) ``` ## Salamander example Fit basic model: ```{r fit1} ``` ### Metropolis-Hastings (DIY) This is a basic block Metropolis sampler, based on Florian Hartig's code [here](https://theoreticalecology.wordpress.com/2010/09/17/metropolis-hastings-mcmc-in-r/). ```{r run_MCMC} ``` Set up for MCMC: define scaled log-posterior function (in this case the log-likelihood function); extract coefficients and variance-covariance matrices as starting points. ```{r setup} ``` Run the chain: ```{r do_run_MCMC,eval=FALSE} ``` (running this chain takes `r round(t1["elapsed"],1)` seconds) Add more informative names and transform correlation parameter (see vignette on covariance structures and parameters): ```{r add_names} colnames(m1) <- colnames(vcov(fm1, full = TRUE)) colnames(m1)[ncol(m1)] <- "sd_site" ``` We could look at a traceplot (redacted for space, but appears OK): ```{r traceplot,fig.width=10, fig.height = 7, eval = FALSE} lattice::xyplot(m1,layout=c(2,3),asp="fill") ``` ```{r effsize} print(effectiveSize(m1),digits=3) ``` These effective sizes are probably still too small. **In a real analysis we would stop and make sure we had addressed the mixing/convergence problems before proceeding**; for this simple sampler, some of our choices would be (1) simply run the chain for longer; (2) tune the candidate distribution (e.g. by using `tune` to scale some parameters, or perhaps by switching to a multivariate Student t distribution [see the `mvtnorm` package]); (3) add regularizing priors. Ignoring the problems and proceeding, we can compute column-wise quantiles or highest posterior density intervals (`coda::HPDinterval`) to get confidence intervals. Plotting posterior distributions, omitting the intercept because it's on a very different scale. ```{r violins,echo=FALSE, fig.width = 6, fig.height = 6} m_long <- reshape2::melt(as.matrix(m1[,-1])) ggplot(m_long, aes(x=Var2, y=value))+ geom_violin(fill="gray")+ coord_flip()+labs(x="") ``` ### tmbstan The `tmbstan` package allows direct, simple access to a hybrid/Hamiltonian Monte Carlo algorithm for sampling from a TMB object; the `$obj` component of a `glmmTMB` fit is such an object. (To run this example you'll need to install the `tmbstan` package and its dependencies.) ```{r do_tmbstan,eval=FALSE} ``` Running this command, which creates 4 chains, takes `r round(t2["elapsed"],1)` seconds, with no parallelization. If you're going to do this a lot, you can use argument `cores=n` for running chains in parallel. The argument is sent to `rstan::sampling` so you should look at the helpfile `?rstan::sampling`. 
Running `bayestestR::diagnostic_posterior()` on the fit gives the following results:

```{r diagnostic_tab, echo=FALSE}
knitr::kable(dp2, digits = c(0, 0, 3, 3))
```

A trace plot (`rstan::traceplot(m2, pars = c("beta", "betazi", "theta"))`, also redacted for space) also looks OK:

```{r show_traceplot, echo=FALSE, fig.width=10, fig.height=5, eval=FALSE}
img <- readPNG(system.file("vignette_data", "tmbstan_traceplot.png", package = "glmmTMB"))
grid.raster(img)
```

A pairs plot (`pairs(m2, pars = c("beta", "betazi"), gap = 0)`, also redacted for space) also looks OK:

```{r show_pairsplot, echo=FALSE, fig.width=8, fig.height=8, eval=FALSE}
img <- readPNG(system.file("vignette_data", "tmbstan_pairsplot.png", package = "glmmTMB"))
grid.raster(img)
```

## Sleep study example

Now using the now-classic `sleepstudy` data set included with the `lme4` package:

```{r sleepstudy_tmbstan, eval=FALSE}
```

Running this chain produces *many* warnings about divergent transitions, low effective sample size, etc.; the diagnostic table confirms this.

Diagnostics:

```{r sleepstudy_diag, eval=FALSE}
```

```{r sleepstudy_diag_tab, echo=FALSE}
knitr::kable(dp3, digits = c(0, 0, 3, 3))
```

Most of the trace plot (redacted) looks OK. However, one sub-panel shows that one of the parameters (`theta[3]`, controlling the intercept/slope correlation) is problematic for one of the chains:

```{r sleepstudy_trace, fig.width=10, fig.height=5, echo=FALSE, eval=FALSE}
img <- readPNG(system.file("vignette_data", "sleepstudy_traceplot.png", package = "glmmTMB"))
grid.raster(img)
```

```{r sleepstudy_trace_theta3, fig.width=5, fig.height=5, echo=FALSE}
img <- readPNG(system.file("vignette_data", "sleepstudy_traceplot_theta3.png", package = "glmmTMB"))
grid.raster(img)
```

One way to address these problems is to add bounds that prevent the chains from going outside a range of $\pm 5$ standard errors from the MLE. (We originally used $\pm 3 \sigma$, but it looked like those bounds were being hit too frequently; if possible, we want bounds that prevent crazy behaviour but do *not* otherwise constrain the chains ...)

```{r sleepstudy_tmbstan_bounds, eval=FALSE}
```

The diagnostics and the trace plot look much better now (only `theta[3]` shown) ...

```{r sleepstudy_bounds_diag, eval=FALSE}
```

```{r sleepstudy_bounds_diag_tab, echo=FALSE}
knitr::kable(dp4, digits = c(0, 0, 3, 3))
```

```{r sleepstudy_trace_bounds, fig.width=10, fig.height=5, echo=FALSE, eval=FALSE}
img <- readPNG(system.file("vignette_data", "sleepstudy_traceplot_bounds.png", package = "glmmTMB"))
grid.raster(img)
```

```{r sleepstudy_trace_bounds_theta3, fig.width=5, fig.height=5, echo=FALSE}
img <- readPNG(system.file("vignette_data", "sleepstudy_traceplot_bounds_theta3.png", package = "glmmTMB"))
grid.raster(img)
```

The last step (for now) is to extract the sampled values, give them interpretable names, and transform the correlation parameter back to a more interpretable scale (see the "Covariance structures" vignette):

```{r trans_param, eval=FALSE}
```

```{r sleepstudy_hist, fig.width=10, fig.height=5}
m4_long <- reshape2::melt(as.matrix(samples4))
ggplot(m4_long, aes(x = value)) +
    geom_histogram(bins = 50) +
    facet_wrap(~ Var2, scales = "free")
```

A more principled, properly Bayesian way to handle this problem would be to add sensible regularizing priors. (To be done/tested; as of this writing, priors can be set on fixed effects and on random effect standard deviations, but not on random effect correlations ...)
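As a rough, untested sketch of what that might look like (the `priors` argument and the class names below follow `?priors` in recent `glmmTMB` versions, and the specific prior choices are purely illustrative):

```{r priors_sketch, eval=FALSE}
## hypothetical sketch, not run: refit the sleep study model with
## regularizing priors on the fixed effects and on the random-effect
## standard deviations before passing it to tmbstan (check ?priors for
## the current interface and allowed class names)
data("sleepstudy", package = "lme4")
prior_spec <- data.frame(
    prior = c("normal(0, 5)", "gamma(10, 1)"),  ## illustrative values only
    class = c("fixef", "theta_sd")
)
fm_reg <- glmmTMB(Reaction ~ Days + (Days | Subject),
                  data = sleepstudy,
                  priors = prior_spec)
```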