---
title: "How To Simulate Traits"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How To Simulate Traits}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

```{r setup}
library(smer)
library(genio)
```

For a simple illustration, we generate a synthetic allele count matrix with
minor allele frequency `maf` greater than 5%. The function `simulate_traits()`
requires the input genotype data to be
stored in the PLINK format (`.bim`, `.bed`, and `.fam` files). Here we use the
package `genio` to write
the allele count matrix to a PLINK file.


```{r genotypes, eval=FALSE}
plink_file <- tempfile()
n_samples <- 3000
n_snps <- 5000
maf <- 0.05 + 0.45 * runif(n_snps)
random_minor_allele_counts   <- (runif(n_samples * n_snps) < maf) +
                                (runif(n_samples * n_snps) < maf)
allele_count_matrix <- matrix(random_minor_allele_counts,
  nrow = n_samples,
  ncol = n_snps,
  byrow = TRUE,
)

# Create .fam and .bim data frames
fam <- data.frame(
  fam = sprintf("F%d", 1:n_samples),       # Family ID
  id = sprintf("I%d", 1:n_samples),       # Individual ID
  pat = 0,                        # Paternal ID
  mat = 0,                        # Maternal ID
  sex = sample(1:2, n_samples, replace = TRUE),
  pheno = -9
)

bim <- data.frame(
  chr = 1,               # Chromosome number
  id = sprintf("rs%d", 1:n_snps),   # SNP name
  posg = 0,                         # Genetic distance (cM)
  pos = 1:n_snps,          # Base-pair position
  ref = sample(c("A", "C", "G", "T"), n_snps, replace = T), # Minor allele
  alt = sample(c("A", "C", "G", "T"), n_snps, replace = T)  # Major allele
)

# Write to .bed, .fam, and .bim files
write_plink(
  file = plink_file,
  X = t(allele_count_matrix),
  fam = fam,
  bim = bim
)
```


To simulate traits, we need to specify the genetic architecture. E.g., the
function requires us to specify

- the narrow sense heritability $h^2$
- the indices SNPs contributing to the narrow sense heritability
- the heritability due to epistasis
- the indices of SNPs contributing to the epistatic trait variance
- the path to the output file.

The simulated trait is written to file in the PLINK phenotype format.

```{r traits, eval=FALSE}
pheno_file <- tempfile()
additive_heritability <- 0.3
gxg_heritability <- 0.25
n_snps_gxg_group <- 5
n_snps_additive <- 100
additive_snps <- sort(sample(1:n_snps, n_snps_additive, replace = F))
gxg_group_1 <- sort(sample(additive_snps, n_snps_gxg_group, replace = F))
gxg_group_2 <- sort(sample(setdiff(additive_snps, gxg_group_1),
                           n_snps_gxg_group, replace = F))

simulate_traits(
  plink_file,
  pheno_file,
  additive_heritability,
  gxg_heritability,
  additive_snps,
  gxg_group_1,
  gxg_group_2
)
```

The variance components of the traits are constructed to match the input.
See Crawford et al. (2017) and
[mvMAPIT Documentation: Simulate Traits](https://lcrawlab.github.io/mvMAPIT/articles/tutorial-simulations.html)
for a full description of the simulation scheme.

## SessionInfo

```{r seesionInfo}
sessionInfo()
```