---
title: "Statistics and clustering"
output: rmarkdown::html_vignette
bibliography: "references.bib"
vignette: >
  %\VignetteIndexEntry{Statistics and clustering}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette is adapted from the official Armadillo
[documentation](https://arma.sourceforge.net/docs.html).

# Mean {#mean}

The `mean` function computes the mean of a vector, matrix, or cube. For a vector
argument, the mean is calculated using all the elements of the vector. For a
matrix argument, the mean is calculated for each column by default (`dim = 0`),
or for each row (`dim = 1`). For a cube argument, the mean is calculated along
the specified dimension: `dim = 0` and `dim = 1` as for matrices, plus
`dim = 2` for slices.

Usage:

```cpp
mean(V)

mean(M)
mean(M, dim)

mean(Q)
mean(Q, dim)
```

## Examples

```cpp
[[cpp11::register]] list mean1_(const doubles_matrix<>& X,
  const doubles_matrix<>& Y) {
  mat A = as_Mat(X);
  mat B = as_Mat(Y);
  
  // create a cube with 3 copies of B + random noise
  cube C(B.n_rows, B.n_cols, 3);
  C.slice(0) = B + 0.1 * randn<mat>(B.n_rows, B.n_cols);
  C.slice(1) = B + 0.2 * randn<mat>(B.n_rows, B.n_cols);
  C.slice(2) = B + 0.3 * randn<mat>(B.n_rows, B.n_cols);

  vec D = mean(A).t();      // mean of each column of A
  vec E = mean(A, 1);       // mean of each row of A

  cube Fcube = mean(C, 2);  // mean across the slices of C (a one-slice cube)
  mat F = Fcube.slice(0);

  writable::list res(3);
  res[0] = as_doubles(D);
  res[1] = as_doubles(E);
  res[2] = as_doubles_matrix(F);

  return res;
}
```

# Median {#median}

The `median` function computes the median of a vector or matrix. For a vector
argument, the median is calculated using all the elements of the vector. For a
matrix argument, the median is calculated for each column by default
(`dim = 0`), or for each row (`dim = 1`).

Usage:

```cpp
median(V)

median(M)
median(M, dim)
```

## Examples

```cpp
[[cpp11::register]] list median1_(const doubles_matrix<>& X,
  const doubles_matrix<>& Y) {
  mat A = as_Mat(X);
  mat B = as_Mat(Y);

  vec C = median(A).t();         // median of each column of A
  vec D = median(A, 1);          // median of each row of A
  double e = median(median(B));  // median of the column medians of B

  writable::list res(3);
  res[0] = as_doubles(C);
  res[1] = as_doubles(D);
  res[2] = as_doubles({e});

  return res;
}
```

# Standard deviation {#stddev}

The `stddev` function computes the standard deviation of a vector or matrix.
For a vector argument, the standard deviation is calculated using all the
elements of the vector. For a matrix argument, the standard deviation is
calculated for each column by default (`dim = 0`), or for each row (`dim = 1`).

The `norm_type` argument is optional; by default `norm_type = 0` is used. The
`norm_type` argument controls the type of normalization used, with `N` denoting
the number of observations:

- for `norm_type = 0`, normalization is done using `N-1`, providing the best
  unbiased estimation of the standard deviation (if the observations are from a
  normal distribution).
- for `norm_type = 1`, normalization is done using `N`, which provides the
  second moment of the observations about their mean.

Usage:

```cpp
stddev(V)
stddev(V, norm_type)

stddev(M)
stddev(M, norm_type)
stddev(M, norm_type, dim)
```

## Examples

```cpp
[[cpp11::register]] list stddev1_(const doubles_matrix<>& X) {
  mat A = as_Mat(X);

  vec C = stddev(A).t();     // stddev of each column, norm_type = 0
  vec D = stddev(A, 1).t();  // stddev of each column, norm_type = 1
  vec E = stddev(A, 1, 1);   // stddev of each row, norm_type = 1

  writable::list res(3);
  res[0] = as_doubles(C);
  res[1] = as_doubles(D);
  res[2] = as_doubles(E);

  return res;
}
```

# Variance {#var}

The `var` function computes the variance of a vector or matrix.
For a vector argument, the variance is calculated using all the
elements of the vector. For a matrix argument, the variance is
calculated for each column by default (`dim = 0`), or for each row (`dim = 1`).

The `norm_type` argument is optional; by default `norm_type = 0` is used. The
`norm_type` argument controls the type of normalization used, with `N` denoting
the number of observations:

- for `norm_type = 0`, normalization is done using `N-1`, providing the best
  unbiased estimation of the variance (if the observations are from a normal
  distribution).
- for `norm_type = 1`, normalization is done using `N`, which provides the
  second moment of the observations about their mean.

Usage:

```cpp
var(V)
var(V, norm_type)

var(M)
var(M, norm_type)
var(M, norm_type, dim)
```

## Examples

```cpp
[[cpp11::register]] list var1_(const doubles_matrix<>& X) {
  mat A = as_Mat(X);

  vec C = var(A).t();     // variance of each column, norm_type = 0
  vec D = var(A, 1).t();  // variance of each column, norm_type = 1
  vec E = var(A, 1, 1);   // variance of each row, norm_type = 1

  writable::list res(3);
  res[0] = as_doubles(C);
  res[1] = as_doubles(D);
  res[2] = as_doubles(E);

  return res;
}
```

# Range {#range}

The `range` function computes the range of a vector or matrix. For a vector
argument, the range is calculated using all the elements of the vector. For a
matrix argument, the range is calculated for each column by default (`dim = 0`),
or for each row (`dim = 1`).

Usage:

```cpp
range(V)

range(M)
range(M, dim)
```

## Examples

```cpp
[[cpp11::register]] list range1_(const doubles_matrix<>& X) {
  mat A = as_Mat(X);

  vec C = range(A).t();  // range of each column
  vec D = range(A, 1);   // range of each row

  writable::list res(2);
  res[0] = as_doubles(C);
  res[1] = as_doubles(D);

  return res;
}
```

# Covariance {#cov}

The `cov` function computes the covariance between two matrices or vectors. If
each row of `X` and `Y` is an observation and each column is a variable, the
`(i,j)`-th entry of `cov(X,Y)` is the covariance between the `i`-th variable in
`X` and the `j`-th variable in `Y`.

For two matrix arguments `X` and `Y`, the `cov(X,Y)` function computes the
covariance between the two matrices. For vector arguments, the orientation of
the vector (row or column) is ignored and each element in the vector is treated
as an observation. The `cov(X)` function is equivalent to `cov(X, X)`.

The `norm_type` argument is optional; by default `norm_type = 0` is used. The
`norm_type` argument controls the type of normalization used, with `N` denoting
the number of observations:

- for `norm_type = 0`, normalization is done using `N-1`, providing the best
  unbiased estimation of the covariance matrix (if the observations are from a
  normal distribution).
- for `norm_type = 1`, normalization is done using `N`, which provides the
  second moment matrix of the observations about their mean.

Usage:

```cpp
cov(X, Y)
cov(X, Y, norm_type)

cov(X)
cov(X, norm_type)
```

## Examples

```cpp
[[cpp11::register]] list cov1_(const doubles_matrix<>& X,
  const doubles_matrix<>& Y) {
  mat A = as_Mat(X);
  mat B = as_Mat(Y);

  mat C = cov(A, B);
  mat D = cov(A, B, 1);

  writable::list res(2);
  res[0] = as_doubles_matrix(C);
  res[1] = as_doubles_matrix(D);

  return res;
}
```

# Correlation {#cor}

The `cor` function computes the correlation coefficient between two matrices or
vectors. If each row of `X` and `Y` is an observation and each column is a
variable, the `(i,j)`-th entry of `cor(X,Y)` is the correlation coefficient
between the `i`-th variable in `X` and the `j`-th variable in `Y`.

For two matrix arguments `X` and `Y`, the `cor(X,Y)` function computes the
correlation coefficient between the two matrices. For vector arguments, the
orientation of the vector (row or column) is ignored and each element in the
vector is treated as an observation. The `cor(X)` function is equivalent to
`cor(X, X)`.

The `norm_type` argument is optional; by default `norm_type = 0` is used. The
`norm_type` argument controls the type of normalization used, with `N` denoting
the number of observations:

- for `norm_type = 0`, normalization is done using `N-1`.
- for `norm_type = 1`, normalization is done using `N`.

Usage:

```cpp
cor(X, Y)
cor(X, Y, norm_type)

cor(X)
cor(X, norm_type)
```

## Examples

```cpp
[[cpp11::register]] list cor1_(const doubles_matrix<>& X,
  const doubles_matrix<>& Y) {
  mat A = as_Mat(X);
  mat B = as_Mat(Y);

  mat C = cor(A, B);
  mat D = cor(A, B, 1);

  writable::list res(2);
  res[0] = as_doubles_matrix(C);
  res[1] = as_doubles_matrix(D);

  return res;
}
```

# Histogram {#hist}

The `hist` function computes a histogram of counts for a vector or matrix. For a
vector argument, the histogram is calculated using all the elements of the
vector. For a matrix argument, the histogram is calculated for each column by
default (`dim = 0`), or for each row (`dim = 1`).

The bin centers can be automatically determined from the data, with the number
of bins specified via `n_bins` (the default is 10). The range of the bins is
determined by the range of the data. The bin centers can be explicitly
specified in the `centers` vector, which must contain monotonically increasing
values.

Usage:

```cpp
hist(V)
hist(V, n_bins)
hist(V, centers)

hist(M, centers)
hist(M, centers, dim)
```

## Examples

```cpp
[[cpp11::register]] list hist1_(const int& n) {
  vec A = randu<vec>(n);

  uvec h1 = hist(A, 11);                        // 11 automatically determined bins
  uvec h2 = hist(A, linspace<vec>(-2, 2, 11));  // 11 explicitly set bin centers

  writable::list res(2);
  res[0] = as_integers(h1);
  res[1] = as_integers(h2);

  return res;
}
```

# Histogram of counts with user-specified edges {#histc}

The `histc` function computes a histogram of counts for a vector or matrix. For
a vector argument, the histogram is calculated using all the elements of the
vector. For a matrix argument, the histogram is calculated for each column by
default (`dim = 0`), or for each row (`dim = 1`).

The bin edges have to be explicitly specified, and they must contain
monotonically increasing values.

Usage:

```cpp
histc(V, edges)

histc(M, edges)
histc(M, edges, dim)
```

## Examples

```cpp
[[cpp11::register]] integers histc1_(const int& n) {
  vec A = randu<vec>(n);

  uvec h = histc(A, linspace<vec>(-2, 2, 11));  // 11 monotonically increasing bin edges

  return as_integers(h);
}
```

# Quantiles of a dataset {#quantile}

The `quantile` function computes the quantiles corresponding to the cumulative
probability values for a vector or matrix. For a vector argument, the quantiles
are calculated using all the elements of the vector. For a matrix argument, the
quantiles are calculated for each column by default (`dim = 0`), or for each
row (`dim = 1`).

The probabilities have to be specified as a second argument `P`.

The algorithm for calculating the quantiles is based on Definition 5 in:
*Rob J. Hyndman and Yanan Fan. Sample Quantiles in Statistical Packages. The American Statistician, 50(4), 361-365, 1996. DOI: 10.2307/2684934*
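
Under this definition (type 5 in R's `quantile()`), for a sorted sample
$x_{(1)} \leq \dots \leq x_{(n)}$ the quantile at probability $p$ is obtained
by linear interpolation at position $h = np + 1/2$ (with $h$ clamped to
$[1, n]$ at the boundaries):

$$
Q(p) = x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor) \left( x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)} \right)
$$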

Usage:

```cpp
quantile(V, P)

quantile(M, P)
quantile(M, P, dim)
```

## Examples

```cpp
[[cpp11::register]] doubles quantile1_(const int& n) {
  vec A = randu<vec>(n);

  vec P = {0.0, 0.25, 0.50, 0.75, 1.0};
  vec Q = quantile(A, P);

  return as_doubles(Q);
}
```

# Principal component analysis (PCA) {#princomp}

TODO: This needs a custom method.
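
In the meantime, Armadillo itself provides `princomp()`. Here is a minimal
sketch of a wrapper; the function name `pca1_` and the returned layout are
illustrative, not part of the package:

```cpp
[[cpp11::register]] list pca1_(const doubles_matrix<>& X) {
  mat A = as_Mat(X);

  mat coeff;     // principal component coefficients (loadings)
  mat score;     // data projected onto the principal components
  vec latent;    // eigenvalues of the covariance matrix of A
  vec tsquared;  // Hotelling's T^2 statistic for each sample

  princomp(coeff, score, latent, tsquared, A);

  writable::list res(4);
  res[0] = as_doubles_matrix(coeff);
  res[1] = as_doubles_matrix(score);
  res[2] = as_doubles(latent);
  res[3] = as_doubles(tsquared);

  return res;
}
```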

# Probability density function of normal distribution {#normpdf}

The `normpdf` function computes the probability density function of a normal
distribution for a scalar, vector, or matrix. For each scalar `x` in `X`, the
probability density function is computed according to a Gaussian (normal)
distribution using the corresponding mean value `m` in `M` and the corresponding
standard deviation value `s` in `S`.

$$
y = \frac{1}{s \sqrt{2\pi}} \exp\left[-\frac{(x - m)^2}{2s^2}\right]
$$

* `X` can be a scalar, vector, or matrix.
* `M` and `S` can jointly be either scalars, vectors, or matrices.
* If `M` and `S` are omitted, their values are assumed to be 0 and 1,
  respectively.

## Caveat

To reduce the incidence of numerical underflows, consider using `log_normpdf()`.

## Examples

```cpp
[[cpp11::register]] list normpdf1_(const int& n) {
  vec X = randu<vec>(n);
  vec M = randu<vec>(n);
  vec S = randu<vec>(n);

  vec P1 = normpdf(X);
  vec P2 = normpdf(X, M, S);
  vec P3 = normpdf(1.23, M, S);
  vec P4 = normpdf(X, 4.56, 7.89);
  double P5 = normpdf(1.23, 4.56, 7.89);

  writable::list res(5);

  res[0] = as_doubles(P1);
  res[1] = as_doubles(P2);
  res[2] = as_doubles(P3);
  res[3] = as_doubles(P4);
  res[4] = as_doubles({P5});

  return res;
}
```

# Probability density function of log-normal distribution {#log_normpdf}

The `log_normpdf` function computes the logarithm of the probability density
function of a log-normal distribution for a scalar, vector, or matrix. For each
scalar `x` in `X`, the logarithm of the probability density function is
computed according to a log-normal distribution using the corresponding mean
value `m` in `M` and the corresponding standard deviation value `s` in `S`.

$$
y = \log\left[\frac{1}{x s \sqrt{2\pi}} \exp\left[-\frac{(\log(x) - m)^2}{2s^2}\right]\right]
$$

* `X` can be a scalar, vector, or matrix.
* `M` and `S` can jointly be either scalars, vectors, or matrices.
* If `M` and `S` are omitted, their values are assumed to be 0 and 1,
  respectively.

## Examples

```cpp
[[cpp11::register]] list lognormpdf1_(const int& n) {
  vec X = randu<vec>(n);
  vec M = randu<vec>(n);
  vec S = randu<vec>(n);

  vec P1 = log_normpdf(X);
  vec P2 = log_normpdf(X, M, S);
  vec P3 = log_normpdf(1.23, M, S);
  vec P4 = log_normpdf(X, 4.56, 7.89);
  double P5 = log_normpdf(1.23, 4.56, 7.89);

  writable::list res(5);

  res[0] = as_doubles(P1);
  res[1] = as_doubles(P2);
  res[2] = as_doubles(P3);
  res[3] = as_doubles(P4);
  res[4] = as_doubles({P5});

  return res;
}
```

# Cumulative distribution function of normal distribution {#normcdf}

The `normcdf` function computes the cumulative distribution function of a
normal distribution for a scalar, vector, or matrix. For each scalar `x` in
`X`, the cumulative distribution function is computed according to a Gaussian
(normal) distribution using the corresponding mean value `m` in `M` and the
corresponding standard deviation value `s` in `S`.
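
For each scalar, this corresponds to

$$
p = \frac{1}{2} \left[ 1 + \operatorname{erf}\left( \frac{x - m}{s \sqrt{2}} \right) \right]
$$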

* `X` can be a scalar, vector, or matrix.
* `M` and `S` can jointly be either scalars, vectors, or matrices.
* If `M` and `S` are omitted, their values are assumed to be 0 and 1,
  respectively.

## Examples

```cpp
[[cpp11::register]] list normcdf1_(const int& n) {
  vec X = randu<vec>(n);
  vec M = randu<vec>(n);
  vec S = randu<vec>(n);

  vec P1 = normcdf(X);
  vec P2 = normcdf(X, M, S);
  vec P3 = normcdf(1.23, M, S);
  vec P4 = normcdf(X, 4.56, 7.89);
  double P5 = normcdf(1.23, 4.56, 7.89);

  writable::list res(5);

  res[0] = as_doubles(P1);
  res[1] = as_doubles(P2);
  res[2] = as_doubles(P3);
  res[3] = as_doubles(P4);
  res[4] = as_doubles({P5});

  return res;
}
```

# Random vectors from multivariate normal distribution {#mvnrnd}

Generate a matrix with random column vectors from a multivariate Gaussian
(normal) distribution with parameters `M` and `C`.

* `M` is the mean and it must be a column vector.
* `C` is the covariance matrix and it must be symmetric positive semi-definite
  (ideally positive definite).
* `N` is the number of column vectors to generate. If `N` is omitted, it is
  assumed to be 1.

Usage:

```cpp
X = mvnrnd(M, C)
X = mvnrnd(M, C, N)

mvnrnd(X, M, C)
mvnrnd(X, M, C, N)
```

The first two forms throw an error if the generation fails. The last two forms
reset `X` and return a boolean set to `false`, without throwing an error, if
the generation fails.
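
A minimal sketch of the non-throwing form (variable names are illustrative):

```cpp
vec M(3, fill::zeros);

mat B(3, 3, fill::randu);
mat C = B.t() * B;  // symmetric positive semi-definite covariance

mat X;
bool ok = mvnrnd(X, M, C, 10);  // X is reset if the generation fails

if (ok == false) {
  // handle the failure without an exception
}
```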

## Examples

```cpp
[[cpp11::register]] doubles_matrix<> mvnrnd1_(const int& n, const int& m) {
  vec M = randu<vec>(n);

  mat B = randu<mat>(n, n);
  mat C = B.t() * B;  // symmetric positive semi-definite covariance

  mat X = mvnrnd(M, C, m);

  return as_doubles_matrix(X);
}
```

# Random numbers from chi-squared distribution {#chi2rnd}

Generate a random scalar, vector, or matrix with elements sampled from a
chi-squared distribution with the degrees of freedom specified by parameter `DF`
or `DF_scalar`.

* `DF` is a vector or matrix, while `DF_scalar` is a scalar.
* For the `chi2rnd(DF)` form, the output vector or matrix has the same size and
  type as `DF`.
* Each value in `DF` and `DF_scalar` must be greater than zero.
* Each element in `DF` specifies a separate degree of freedom.

Usage:

```cpp
v = chi2rnd(DF)
X = chi2rnd(DF)

double s = chi2rnd<double>(DF_scalar) // float also works
vec v = chi2rnd<vec>(DF_scalar, n_elem)
mat X = chi2rnd<mat>(DF_scalar, n_rows, n_cols)
mat Y = chi2rnd<mat>(DF_scalar, size(X))
```

## Examples

```cpp
[[cpp11::register]] list chi2rnd1_(const int& n, const int& m) {
  mat X = chi2rnd<mat>(2.0, n, m);               // scalar degrees of freedom
  mat Y = randi<mat>(n, m, distr_param(1, 10));  // random integer degrees of freedom
  mat Z = chi2rnd(Y);                            // element-wise degrees of freedom

  writable::list res(2);
  res[0] = as_doubles_matrix(X);
  res[1] = as_doubles_matrix(Z);

  return res;
}
```

# Random matrix from Wishart distribution {#wishrnd}

Generate a random matrix sampled from the Wishart distribution with parameters
`S` and `df`.

* `S` is a symmetric positive definite matrix (e.g., a covariance matrix).
* `df` is a scalar specifying the degrees of freedom; it can be a non-integer
  value.
* `D` is an optional argument; it specifies the Cholesky decomposition of `S`.
  If `D` is provided, `S` is ignored. Using `D` is more efficient if
  `wishrnd()` needs to be used many times for the same `S` matrix.

Usage:

```cpp
W = wishrnd(S, df)
W = wishrnd(S, df, D)

wishrnd(W, S, df)
wishrnd(W, S, df, D)
```

The first two forms throw an error if the generation fails. The last two forms
reset `W` and return a boolean set to `false`, without throwing an error, if
the generation fails.
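
For repeated draws from the same `S`, a minimal sketch (names are illustrative)
that precomputes the Cholesky decomposition once:

```cpp
mat B(4, 4, fill::randu);
mat S = B.t() * B;  // symmetric positive definite (almost surely)

mat D = chol(S);    // precompute the Cholesky decomposition of S

mat W1 = wishrnd(S, 6.7, D);  // reuses D instead of decomposing S again
mat W2 = wishrnd(S, 6.7, D);
```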

## Examples

```cpp
[[cpp11::register]] doubles_matrix<> wishrnd1_(const int& n) {
  mat X = randu<mat>(n, n);
  mat S = X.t() * X;

  mat W = wishrnd(S, 6.7);

  return as_doubles_matrix(W);
}
```

# Random matrix from inverse Wishart distribution {#iwishrnd}

Generate a random matrix sampled from the inverse Wishart distribution with
parameters `T` and `df`.

* `T` is a symmetric positive definite matrix.
* `df` is a scalar specifying the degrees of freedom; it can be a non-integer
  value.
* `Dinv` is an optional argument; it specifies the Cholesky decomposition of the
  inverse of `T`. If `Dinv` is provided, `T` is ignored. Using `Dinv` is more
  efficient if `iwishrnd()` needs to be used many times for the same `T` matrix.

Usage:

```cpp
W = iwishrnd(T, df)
W = iwishrnd(T, df, Dinv)

iwishrnd(W, T, df)
iwishrnd(W, T, df, Dinv)
```

The first two forms throw an error if the generation fails. The last two forms
reset `W` and return a boolean set to `false`, without throwing an error, if
the generation fails.

## Examples

```cpp
[[cpp11::register]] doubles_matrix<> iwishrnd1_(const int& n, const double& d) {
  mat X = randu<mat>(n, n);
  mat T = X.t() * X;

  mat W = iwishrnd(T, d);

  return as_doubles_matrix(W);
}
```

# Cluster data into disjoint sets {#kmeans}

Cluster given data into `k` disjoint sets.

Usage:

```cpp
kmeans(means, data, k, seed_mode, n_iter, print_mode)
```

* The `means` parameter is the output matrix for storing the resulting centroids
  of the sets, with each centroid stored as a column vector. If the clustering
  fails, the `means` matrix is reset and the function returns `false`.
* The `data` parameter is the input data matrix, with each sample stored as a
  column vector. The `k` parameter indicates the number of centroids.
* The `seed_mode` parameter specifies how the initial centroids are seeded. It
  is one of:
  - `keep_existing`: use the centroids specified in the `means` matrix as the
    starting point.
  - `static_subset`: use a subset of the data vectors (repeatable).
  - `random_subset`: use a subset of the data vectors (random).
  - `static_spread`: use a maximally spread subset of data vectors (repeatable).
  - `random_spread`: use a maximally spread subset of data vectors (random
    start).
* The `n_iter` parameter specifies the number of clustering iterations. This is
  data dependent, but about 10 is typically sufficient.
* The `print_mode` parameter is either `true` or `false`, indicating whether
  progress is printed during clustering.

## Caveats

* The number of samples in the data matrix should be larger than `k`.
* Works much faster with OpenMP enabled in the compiler (e.g., `-fopenmp` in GCC
  and clang). Cpp11armadillo finds OpenMP and uses it by default.
* For probabilistic clustering, use the `gmm_diag` or `gmm_full` classes
  instead.

## Examples

```cpp
[[cpp11::register]] list kmeans1_(const int& n, const int& d) {
  mat data(d, n, fill::randu);

  mat means;

  bool status = kmeans(means, data, 2, random_subset, 10, true);

  if (status == false) {
    stop("clustering failed");
  }

  writable::list res(2);

  res[0] = writable::logicals({status});
  res[1] = as_doubles_matrix(means);

  return res;
}
```

# Probabilistic clustering and likelihood calculation via mixture of Gaussians {#gmm_diag-gmm_full}

`gmm_diag` and `gmm_full` are classes for multi-variate probabilistic
clustering and likelihood calculation via Gaussian Mixture Models (GMMs).
The implementation details are available in @Sanderson2017.

The distribution of data is modelled as:

$$
p(x) = \sum_{g=0}^{n_{\text{gaus}}-1} h_g N(x \mid m_g, C_g)
$$

where:

* $x$ is a column vector.
* $n_{\text{gaus}}$ is the number of Gaussians; $n_{\text{gaus}} \geq 1$.
* $N(x \mid m_g, C_g)$ represents a Gaussian (normal) distribution.
* Each Gaussian $g$ has the following parameters:
  - $h_g$ is the heft (weight), with constraints $h_g \geq 0$ and
    $\sum h_g = 1$.
  - $m_g$ is the mean vector (centroid) with dimensionality $n_{\text{dims}}$.
  - $C_g$ is the covariance matrix (either diagonal or full).

Both `gmm_diag` and `gmm_full` include tailored k-means and Expectation
Maximisation algorithms for learning model parameters from training data.

For an instance of `gmm_diag` or `gmm_full` named as `M`, the member functions
and variables are:

* `M.log_p(V)`: return a scalar representing the log-likelihood of vector `V`
  (of type `vec`).
* `M.log_p(V, g)`: return a scalar representing the log-likelihood of vector `V`
  (of type `vec`), according to Gaussian with index `g`.
* `M.log_p(X)`: return a row vector (of type `rowvec`) containing
  log-likelihoods of each column vector in matrix `X` (of type `mat`).
* `M.log_p(X, g)`: return a row vector (of type `rowvec`) containing
  log-likelihoods of each column vector in matrix `X` (of type `mat`), according
  to Gaussian with index `g`.
* `M.sum_log_p(X)`: return a scalar representing the sum of log-likelihoods for
  all column vectors in matrix `X` (of type `mat`).
* `M.sum_log_p(X, g)`: return a scalar representing the sum of log-likelihoods
  for all column vectors in matrix `X` (of type `mat`), according to Gaussian
  with index `g`.
* `M.avg_log_p(X)`: return a scalar representing the average log-likelihood of
  all column vectors in matrix `X` (of type `mat`).
* `M.avg_log_p(X, g)`: return a scalar representing the average log-likelihood
  of all column vectors in matrix `X` (of type `mat`), according to Gaussian
  with index `g`.
* `M.assign(V, dist_mode)`: return the index of the closest mean (or Gaussian)
  to vector `V` (of type `vec`); parameter `dist_mode` is one of:
  - `eucl_dist`: Euclidean distance (takes only means into account).
  - `prob_dist`: probabilistic "distance", defined as the inverse likelihood
    (takes into account means, covariances, and hefts).
* `M.assign(X, dist_mode)`: return a row vector (of type `urowvec`) containing
  the indices of the closest means (or Gaussians) to each column vector in
  matrix `X` (of type `mat`); parameter `dist_mode` is either `eucl_dist` or
  `prob_dist` (as per the `.assign()` function above).
* `M.raw_hist(X, dist_mode)`: return a row vector (of type `urowvec`)
  representing the raw histogram of counts; each entry is the number of counts
  corresponding to a Gaussian; each count is the number of times the
  corresponding Gaussian was the closest to each column vector in matrix `X`;
  parameter `dist_mode` is either `eucl_dist` or `prob_dist` (as per the
  `.assign()` function above).
* `M.norm_hist(X, dist_mode)`: similar to the `.raw_hist()` function above;
  return a row vector (of type `rowvec`) containing normalised counts; the
  vector sums to one; parameter `dist_mode` is either `eucl_dist` or `prob_dist`
  (as per the `.assign()` function above).
* `M.generate()`: return a column vector (of type `vec`) representing a random
  sample generated according to the model's parameters.
* `M.generate(N)`: return a matrix (of type `mat`) containing `N` column
  vectors, with each vector representing a random sample generated according to
  the model's parameters.
* `M.save(filename)`: save the model to a file and return a bool indicating
  either success (true) or failure (false).
* `M.load(filename)`: load the model from a file and return a bool indicating
  either success (true) or failure (false).
* `M.n_gaus()`: return the number of means/Gaussians in the model.
* `M.n_dims()`: return the dimensionality of the means/Gaussians in the model.
* `M.reset(n_dims, n_gaus)`: set the model to have dimensionality `n_dims`,
  with `n_gaus` number of Gaussians; all the means are set to zero, all
  covariance matrix representations are equivalent to the identity matrix, and
  all the hefts (weights) are set to be uniform.
* `M.hefts`: read-only row vector (of type `rowvec`) containing the hefts
  (weights).
* `M.means`: read-only matrix (of type `mat`) containing the means (centroids),
  stored as column vectors.
* `M.dcovs`: read-only matrix (of type `mat`) containing the representation of
  diagonal covariance matrices, with the set of diagonal covariances for each
  Gaussian stored as a column vector; applicable only to the `gmm_diag` class.
* `M.fcovs`: read-only cube containing the full covariance matrices, with each
  covariance matrix stored as a slice within the cube; applicable only to the
  `gmm_full` class.
* `M.set_hefts(V)`: set the hefts (weights) of the model to be as specified in
  row vector `V` (of type `rowvec`); the number of hefts must match the
  existing model.
* `M.set_means(X)`: set the means to be as specified in matrix `X` (of type
  `mat`); the number of means and their dimensionality must match the existing
  model.
* `M.set_dcovs(X)`: set the diagonal covariance matrices to be as specified in
  matrix `X` (of type `mat`), with the set of diagonal covariances for each
  Gaussian stored as a column vector; the number of covariance matrices and
  their dimensionality must match the existing model; applicable only to the
  `gmm_diag` class.
* `M.set_fcovs(X)`: set the full covariance matrices to be as specified in cube
  `X`, with each covariance matrix stored as a slice within the cube; the
  number of covariance matrices and their dimensionality must match the
  existing model; applicable only to the `gmm_full` class.
* `M.set_params(means, covs, hefts)`: set all the parameters at the same time;
  the type and layout of the parameters is as per the `.set_hefts()`,
  `.set_means()`, `.set_dcovs()`, and `.set_fcovs()` functions above; the
  number of Gaussians and dimensionality can be different from the existing
  model.
* `M.learn(data, n_gaus, dist_mode, seed_mode, km_iter, em_iter, var_floor,
  print_mode)`: learn the model parameters via multi-threaded k-means and/or EM
  algorithms; return a bool value, with true indicating success, and false
  indicating failure; the parameters have the following meanings:
  - `data`: matrix (of type `mat`) containing training samples; each sample is
    stored as a column vector.
  - `n_gaus`: set the number of Gaussians to `n_gaus`; to help convergence, it
    is recommended that the given data matrix (above) contains at least 10
    samples for each Gaussian.
  - `dist_mode`: specifies the distance used during the seeding of initial means
    and k-means clustering:
    - `eucl_dist`: Euclidean distance.
    - `maha_dist`: Mahalanobis distance, which uses a global diagonal covariance
      matrix estimated from the training samples; this is recommended for
      probabilistic applications.
  - `seed_mode`: specifies how the initial means are seeded prior to running
    k-means and/or EM algorithms:
    - `keep_existing`: keep the existing model (do not modify the means,
      covariances, and hefts).
    - `static_subset`: a subset of the training samples (repeatable).
    - `random_subset`: a subset of the training samples (random).
    - `static_spread`: a maximally spread subset of training samples (repeatable).
    - `random_spread`: a maximally spread subset of training samples (random
      start).
    - Caveat: seeding the initial means with `static_spread` and `random_spread`
      can be much more time consuming than with `static_subset` and
      `random_subset`.
  - `km_iter`: the number of iterations of the k-means algorithm; this is data
    dependent, but typically 10 iterations are sufficient.
  - `em_iter`: the number of iterations of the EM algorithm; this is data
    dependent, but typically 5 to 10 iterations are sufficient.
  - `var_floor`: the variance floor (smallest allowed value) for the diagonal
    covariances; setting this to a small non-zero value can help with
    convergence and/or better quality parameter estimates.
  - `print_mode`: either `true` or `false`; enable or disable printing of
    progress during the k-means and EM algorithms.

## Caveats

* `gmm_diag` is tailored for diagonal covariance matrices.
* `gmm_full` is tailored for full covariance matrices.
* `gmm_diag` is considerably faster than `gmm_full`, at the cost of some
  reduction in modelling accuracy.
* For faster execution on multi-core machines, enable OpenMP in your compiler
  (e.g., `-fopenmp` in GCC and clang). Cpp11armadillo finds OpenMP and uses it
  by default.
  
## Examples

```cpp
[[cpp11::register]] list gmm1_(const int& n, const int& d) {
  // create synthetic data with 2 Gaussians

  mat data(d, n, fill::zeros);

  vec mean0 = linspace<vec>(1, d, d);
  vec mean1 = mean0 + 2;

  int i = 0;

  // fill the columns in groups of three: two samples from the first Gaussian,
  // one from the second (hefts of roughly 2/3 and 1/3)
  while (i < n) {
    if (i < n) {
      data.col(i) = mean0 + randn<vec>(d);
      ++i;
    }
    if (i < n) {
      data.col(i) = mean0 + randn<vec>(d);
      ++i;
    }
    if (i < n) {
      data.col(i) = mean1 + randn<vec>(d);
      ++i;
    }
  }

  // model the data as a diagonal GMM with 2 Gaussians

  gmm_diag model;

  bool status = model.learn(data, 2, maha_dist, random_subset, 10, 5, 1e-5,
    true);

  if (status == false) {
    stop("learning failed");
  }

  model.means.print("means:");

  double scalar_likelihood = model.log_p(data.col(0));
  rowvec set_likelihood = model.log_p(data.cols(0, 9));  // assumes n >= 10

  double overall_likelihood = model.avg_log_p(data);

  uword gaus_id = model.assign(data.col(0), eucl_dist);
  urowvec gaus_ids = model.assign(data.cols(0, 9), prob_dist);

  rowvec hist1 = model.norm_hist(data, eucl_dist);
  urowvec hist2 = model.raw_hist(data, prob_dist);

  writable::list res(9);

  res[0] = writable::logicals({status});
  res[1] = as_doubles_matrix(model.means);
  res[2] = as_doubles({scalar_likelihood});
  res[3] = as_doubles(set_likelihood.t());
  res[4] = as_doubles({overall_likelihood});
  res[5] = as_integers(uvec{gaus_id});  // wrap the scalar index in a uvec
  res[6] = as_integers(gaus_ids.t());
  res[7] = as_doubles(hist1.t());
  res[8] = as_integers(hist2.t());

  return res;
}
```