This document introducess the usage of the package fmlogcondens. The package name stands for (f)ast (m)ultivariate (log)-(con)cave (dens)ity estimation. It estimates a non-pararametric density, whose logarithm is a concave function. The class of log-concave probability distributions includes many well known parametric densities such as normal distributions, exponential distributions, Wishart distributions, Gamma distributions with shape parameter larger than one and many more (Walther and others 2009).
The second part of this document introduces all relevant functions of the package for obtaining and plotting estimates of log-concave and mixtures of log-concave densities. The first part provides some theoretical background of the maximum likelihood estimator and how it is approached algorithmically. It also compares this package to two relevant R packages: logcondens (Dümbgen, Hüsler, and Rufibach 2007) that can be used for univariate data and LogConcDEAD (Cule, Samworth, and Stewart 2010) which adresses the multivariate case, though has significanlty longer runtimes. A detailed numerical comparison can be found in our publication (Rathke and Schnörr 2015).
The estimation of a log-concave density can be formulated as the maximum likelihood problem for any sample \(X\) consisting of \(n\) points \(X_i \in \mathbb{R}^d\).
\[ \hat{f}_n = \text{argmax}_{f} \; \sigma(f) \qquad \sigma(f) = \frac{1}{n} \sum_{i=1}^n \log f(X_i) + \int f(x) dx \] The integral term is a Lagrange term that guarantees that \(\hat{f}_n\) is a normalized density. Finding the MLE \(\hat{f}_n\) is significantly simplified by the fact, that solutons \(\log{\hat{f}_n}\) take the form of piecwise-linear functions. Furthermore, (Cule, Samworth, and Stewart 2010) went on to show that \(\hat{f}_n\) is uniquely determined by \(X\) and a vector \(y \in \mathbb{R}^n\), where \(y_i = \log f(X_i)\). Consequently, they formulate the MLE problem in terms of \(y\):
\[
\sigma(y) = \frac{1}{n} \sum_{i=1}^n y_i + \int f_y(x) dx, \qquad \log f_y(x) = \sup_{\lambda}\left\{\sum_{i=1}^n \lambda_iy_i \;\Big|\; x = \sum_{i=1}^n \lambda_i X_i, \sum_{i=1}^n \lambda_i = 1, \lambda_i \geq 0 \right\}.
\] Their package LogConcDEAD implements a solver for this objective function. The major drawback of their approach is a long running time for large samples (\(\geq\) 250 points). For example for 10000 samples in 2-D their approach runs for hours while our approach is done in a matter of seconds.
A piecewise concave linear function can be equivalently defined in terms of its slopes \(a\) and intercepts \(\beta\). We propose the reformulation of the MLE in terms of these hyperplane parameters:
\[ \sigma(a,\beta) = \frac{1}{n} \sum_{i=1}^n \log f_{a,\beta}(X_i) + \int f_{a,\beta} (x) dx, \qquad \log f_{a, \beta} (x) = \min_k (a_k^Tx - \beta_k) \] This has two significant advantages: We demonstrate in (Rathke and Schnörr 2015) how this leads to sparse representations (in terms of the number of hyperplanes), which speeds up calculations significantly. Also our objective is smooth, which allows us to use a quasi-Newton optimization method adapted to large scale optimization problems (BFGS-L) (Nocedal and Wright 2006). While our formulation is non-convex contrary to the one of Cule et al., we proove numerically that our approach yields solutions that are very close to the optimum in much less time.
For samples in one dimension, the R package logcondens analytically yields the optimal density with the minimal number of hyperplanes.
The most convenient way to install the package is:
install.packages("fmlogcondens")The drawback of this approach is that the installed binaries where compiled for generic processors, which exclude AVX optimizations. See the next section, on how to activate AVX (and speedup the code by a factor of ~5x).
AVX is the successor to SSE and MMX, and denotes the ability of modern CPUs to do certain simple computations (addition, multiplication, …) much faster due to specialized instructions. To activate AVX, we have to install the package from source. To do this, we first download the source code from CRAN: link (download the tar.gz file).
Then in your favorite text editor open the file ~/.R/Makevars (create it if it does not exist) and add the line
CFLAGS += -march=nativeThis activates processor specific optimizations for the compilation of C code. [NOTE: This flag will affect all future installations that are manually compiled from source. In general it is a good idea to keep this flag after installing this pacakge, since it may potentially lead to faster code for other packages too.]
Finally, from within R, we run the command:
install.packages("path-to-package/fmlogcondens_1.0.0.tar.gz", type = "source")Alternatively we can execute the following command from the shell
R CMD INSTALL path-to-package/fmlogcondens_1.0.0.tar.gzYou can check whether AVX was activated for the installation by calling the function compilationInfo() in R after loading the package. The desired output would be AVX vector extensions activated..
Load the package in R with:
library(fmlogcondens)We start with a small example of 100 points in 2-D. We sample from a normal density with zero mean and identity covariance matrix.
set.seed(222)
X = matrix(rnorm(200), 100, 2)
r <- fmlcd(X) # estimate log-concave densityThe logarithm of the density f(x) is a piecewise-linear concave function and is parametrized by a set of hyperplane parameters (see Background). r$a contains the normal vectors of all hyperplanes whereas r$b containts their intercepts. r$logMLE holds the log-likelihoods for all data points X_i.
Additionally, a sparse representation is returned in r$aSparse and r$bSparse. This corresponds to the final solution of the Newton optimization. In a post-processing step we analytically normalize this sparse representation, thereby losing the sparseness property. Thus while only numerically normalized on the interation grid, the sparse representation can be usefull for qualitative insights into the estimated density.
We can plot the resulting density using functionality of the package LogConcDEAD. In order to be able to use all functions in their package, we have to create an object of class LogConvDEAD using the function getinfolcd(). We can then use the function plot() to display the estimated density as well as the log-density.
library(LogConcDEAD)
r <- LogConcDEAD::getinfolcd(X, r$logMLE) # create a `LogConcDEAD` object
par(mfrow = c(1, 2)) #square plots
plot(r, addp = FALSE, asp = 1, main="density")
plot(r, uselog = TRUE, addp = FALSE, asp = 1, main="log density")Other useful functions are provided by the package LogConcDEAD: To evaluate the estimated density \(f(x)\) at a point \(x\), we use the function LogConcDEAD::dlcd. To draw a sample from \(f(x)\), use LogConcDEAD::rlcd.
# evaluate density for the point (0,0)
x <- c(0,0)
LogConcDEAD::dlcd(x,r)
#> [1] 0.1424553
# sample 10 points from the estimated density
LogConcDEAD::rlcd(10,r)
#>             [,1]       [,2]
#>  [1,]  1.2587844 -1.0687963
#>  [2,] -1.3269509  0.3811847
#>  [3,]  0.2185162 -0.3643766
#>  [4,]  0.9975350 -0.8844245
#>  [5,]  1.7758723  0.4154826
#>  [6,] -1.3773275  0.5447273
#>  [7,] -1.3056512 -0.3790283
#>  [8,]  0.1893697  0.7384262
#>  [9,] -1.2040102 -0.4912608
#> [10,]  0.7663179  0.6162015LogConcDEADWe now estimate the log-concave density \(\hat{f}_n\) with maximal log likelihood using the package LogConcDEAD. Remember that their approach yields the optimal solution for the MLE problem. Comparing this estimate with the one from our package reveals that our optimization approach yielded near optimal results.
rCule <- LogConcDEAD::mlelcd(X)
par(mfrow = c(1, 2)) #square plots
plot(rCule, addp=  FALSE, asp = 1, main = "density [LogConcDEAD]")
plot(rCule, uselog = TRUE, addp = FALSE, asp = 1, main = "log density [LogConcDEAD]")We will now increase the sample size, in order to demonstrate the advantages of our packages: the very fast computation of \(\hat{f}_n\). To this end, we estimate the log-concave density for 500 data points in two dimensions using using both approaches.
set.seed(222)
X = matrix(rnorm(1000), 500, 2)
# time estimate for our approach
system.time(r <- fmlcd(X))
#>    user  system elapsed 
#>   1.956   0.056   0.675
# time estimate for the approach of Cule et al.
system.time(rCule <- mlelcd(X))
#>    user  system elapsed 
#>  23.652   0.188  24.414We see how the required runtime differs in a magnitude of more than ten (less without AVX enabled, see AVX). Run this example for larger sample sizes to see the ratio grow beyond ten thousand.
We now plot the estimated densities and see that they are almost identical:
r <- LogConcDEAD::getinfolcd(X, r$logMLE) # create a `LogConcDEAD` object
# plot bost estimates for comparison
par(mfrow=c(1, 2)) #square plots
plot(r, addp = FALSE, asp = 1, main="density")
plot(rCule, addp = FALSE, asp = 1, main="density [LogConcDEAD]")Finally, we demonstrate the ability of to estimate mixtures of log-concave densities. To this end, we sample data points from two normal densities with different parameters.
library(MASS)
set.seed(222)
X1 <- mvrnorm(200, c(0, 0), matrix(c(2, 1.5, 1.5, 2), 2, 2))
X2 <- mvrnorm(200, c(-2, 2), matrix(c(1, 0, 0, 1), 2, 2))
plot(X1[ ,1], X1[ ,2], col="red", pch = 20, xlab = "X", ylab = "Y")
points(X2[ ,1], X2[ ,2], col="blue", pch = 20)Estimate a mixture density having K=2 classes.
X <- rbind(X1,X2) # stack both data matrices
r <- fmlcdEM(X, K = 2)Finally, we evaluate the mixture density for a grid of points and plot the results.
# Create a grid of points for evaluation
x <- seq(min(X[ ,1]), max(X[ ,1]), 0.1)
y <- seq(min(X[ ,2]), max(X[ ,2]), 0.1)
m <- length(x); n <- length(y)
XEval = cbind(matrix(rep(x, each = n), ncol = 1), matrix(rep(y, m), ncol = 1))
nX = dim(XEval)[1]
# evaluate log-concave density component-wise
YA = exp(apply(- r$params[[1]]$a %*% t(XEval) - matrix(rep(r$params[[1]]$b,nX),length(r$params[[1]]$b),nX), 2, min))
YB = exp(apply(- r$params[[2]]$a %*% t(XEval) - matrix(rep(r$params[[2]]$b,nX),length(r$params[[2]]$b),nX), 2, min))
Y = YA * r$tau[1] + YB * r$tau[2]
contour(x,y,t(matrix(Y,n,m)), xlab = "X", ylab = "Y")
points(X1[ ,1], X1[ ,2], col="red", pch = 20, cex=.5)
points(X2[ ,1], X2[ ,2], col="blue", pch = 20, cex=.5)Cule, M., R. Samworth, and M. Stewart. 2010. “Maximum Likelihood Estimation of a Multi-Dimensional Log-Concave Density.” J. R. Stat. Soc. Series B Stat. Methodol. 72 (5). Wiley Online Library: 545–607.
Dümbgen, L., A. Hüsler, and K. Rufibach. 2007. “Active Set and EM Algorithms for Log-Concave Densities Based on Complete and Censored Data.” CoRR abs/0707.4643.
Nocedal, Jorge, and Stephen Wright. 2006. Numerical Optimization. Springer Science & Business Media.
Rathke, Fabian, and Christoph Schnörr. 2015. “A Computational Approach to Log-Concave Density Estimation.” An. St. Univ. Ovidius Constanta 23 (3): 151–66.
Walther, Guenther, and others. 2009. “Inference and Modeling with Log-Concave Distributions.” Statistical Science 24 (3). Institute of Mathematical Statistics: 319–27.