---
title: Guide to familial
author: Ryan Thompson
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Guide to familial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 5
)
```

## Introduction

`familal` is an R package for testing familial hypotheses. Briefly, familial hypotheses are statements of the form
$$
\mathrm{H}_0:\mu(\lambda)=\mu_0\text{ for some }\lambda\in\Lambda\quad\text{vs.}\quad\mathrm{H}_1:\mu(\lambda)\neq\mu_0\text{ for all }\lambda\in\Lambda,
$$
where $\{\mu(\lambda):\lambda\in\Lambda\}$ is a family of centers. Presently, `familial` supports tests of the Huber family of centers. The mean and median are members of this family.

To test familial hypotheses, `familial` uses the Bayesian bootstrap. The Bayesian bootstrap uses weights $w_1^{(b)},\ldots,w_n^{(b)}$ ($b=1,\ldots,B$) from a uniform Dirichlet distribution in the computation
$$
\mu^{(b)}(\lambda):=\underset{\mu\in\mathbb{R}}{\operatorname{arg~min}}\sum_{i=1}^nw_i^{(b)}\ell_\lambda\left(\frac{x_i-\mu}{\sigma}\right),
$$
where the Huber loss function $\ell_\lambda$ is defined as
$$
\ell_\lambda(z):=
\begin{cases}
\frac{1}{2}z^2, & \text{if } |z|<\lambda, \\
\lambda|z|-\frac{1}{2}\lambda^2, & \text{otherwise}.
\end{cases}
$$

The above minimization is solved for all values of $\lambda\in\Lambda=(0,\infty)$ for $b=1,\ldots,B$. The proportion of times that $\{\mu^{(b)}(\lambda):\lambda\in\Lambda\}_{b=1}^B$ contains the null value $\mu_0$ represents the estimated posterior probability of $\mathrm{H}_0$ being true. To map this probability to a decision (accept, reject, or indeterminate), we assign a loss to making an incorrect decision. The decision that minimizes the expected loss under the posterior distribution is the optimal one.

## One-sample tests

Let's demonstrate the functionality of ` familial`. To perform a test of centers with the Huber family, use the `center.test` function with argument `family='huber'` (the default setting). We'll test whether the velocity of galaxies in the `galaxies` dataset is different to 21,000 km/sec.
```{r}
library(familial)
set.seed(1)
x <- MASS::galaxies
test <- center.test(x, mu = 21000)
print(test)
```
The output above shows the estimated posterior probabilities for the events $\mathrm{H}_0$ and $\mathrm{H}_1$. The 54.2% probability assigned to $\mathrm{H}_0$ means that the Huber family contained a center equal to 21,000 km/sec in 54.2% of bootstraps. Because neither of the above probabilities is much larger than the other, the test results in an indeterminate outcome. By default, the function will return an indeterminate result when both $\mathrm{H}_0$ and $\mathrm{H}_1$ have probability less than 0.95. This choice of threshold is analogous to using a frequentist significance level of 0.05.

It's possible to visualize the posterior using a functional boxplot via the `plot` function.
```{r}
plot(test)
```

Rather than specify a null value that is a point, we can specify a null that is an interval. Let's now test whether the velocity is between 20,500 km/sec and 21,500 km/sec.
```{r}
center.test(x, mu = c(20500, 21500))
```
The test now accepts $\mathrm{H}_0$.

## Two-sample tests

`familial` also supports two-sample testing with paired or independent samples. We'll now test whether the weight of cabbages in the `cabbages` dataset is different between two different cultivars. These samples are independent, so we set `paired = FALSE`.
```{r message = FALSE}
x <- MASS::cabbages[MASS::cabbages$Cult == 'c39', 'HeadWt']
y <- MASS::cabbages[MASS::cabbages$Cult == 'c52', 'HeadWt']

test <- center.test(x, y, paired = FALSE)
print(test)
```
The test rejects $\mathrm{H}_0$ that the weight of cabbages is the same.

We can also visualize the posterior differences between the family of each cultivar via a functional boxplot.
```{r}
plot(test)
```

`familial` also supports one-sided tests. Let's test whether family treatment (FT) improves the weight of anorexia patients in the `anorexia` dataset. These samples are paired.
```{r}
x <- MASS::anorexia[MASS::anorexia$Treat == 'FT', 'Postwt']
y <- MASS::anorexia[MASS::anorexia$Treat == 'FT', 'Prewt']

center.test(x, y, alternative = 'greater', paired = TRUE)
```
We again reject $\mathrm{H}_0$ that FT does not improve the weight of patients.