--- title: "LAPACK Decompositions with bigalgebra" author: "Frédéric Bertrand" date: "`r format(Sys.Date())`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{LAPACK Decompositions with bigalgebra} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(bigalgebra) library(bigmemory) ``` ## Overview Beyond BLAS-level helpers, `bigalgebra` ships several wrappers around [LAPACK](https://www.netlib.org/lapack/) routines so that factorisations can run against in-memory `matrix` objects or file-backed [`bigmemory::big.matrix`] containers without changing workflows. This vignette highlights the four decompositions currently provided: * `dgeqrf()` — QR factorisation driven by Householder reflectors. * `dpotrf()` — Cholesky factorisation of symmetric positive definite matrices. * `dgeev()` — real eigenvalues and (optionally) eigenvectors of general matrices. * `dgesdd()` — divide-and-conquer singular value decomposition (SVD). Each example illustrates how to prepare workspace arguments, inspect the results, and clean up temporary `big.matrix` files when necessary. ## QR factorisation with `dgeqrf()` The `dgeqrf()` helper overwrites its input with the R factor in the upper triangle while storing Householder reflector coefficients in a user-supplied `TAU` vector. Supplying both `TAU` and `WORK` explicitly makes it easy to inspect the outputs afterwards. ```{r} hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") } H <- hilbert(4) H_big <- as.big.matrix(H) TAU <- matrix(0, nrow = min(nrow(H), ncol(H))) WORK <- matrix(0, nrow = max(1, ncol(H))) dgeqrf(A = H_big, TAU = TAU, WORK = WORK) # Extract the R factor from the overwritten big.matrix R_big <- H_big[,] R_big[lower.tri(R_big)] <- 0 R_big # Compare against base R's QR decomposition all.equal(R_big, qr.R(qr(H))) TAU ``` When the input is file-backed, the updates are written directly to disk and the factorisation scales to matrices that exceed available RAM. ```{r} tmp <- tempfile() H_fb <- filebacked.big.matrix(nrow(H), ncol(H), init = H, backingfile = basename(tmp), backingpath = dirname(tmp)) dgeqrf(A = H_fb) H_fb[,][upper.tri(H_fb[,], diag = TRUE)] rm(H_fb); gc() ``` ## Cholesky factorisation with `dpotrf()` `dpotrf()` computes an in-place Cholesky factorisation. The helper returns `0` when the matrix is positive definite and leaves the result in the selected triangle (upper by default). ```{r} set.seed(42) A <- matrix(rnorm(9), 3) SPD <- crossprod(A) # symmetric positive definite SPD_big <- as.big.matrix(SPD) info <- dpotrf(UPLO = "U", A = SPD_big) info U <- SPD_big[,] U[lower.tri(U)] <- 0 all.equal(U, chol(SPD)) ``` If the input matrix is not positive definite, the return value indicates the leading minor that failed so you can diagnose numerical issues before proceeding with downstream solves. ## Eigenvalues and eigenvectors via `dgeev()` `dgeev()` wraps LAPACK's general eigen solver and accepts optional storage for left and right eigenvectors. By default the helper queries LAPACK for an optimal workspace size, making small examples straightforward. ```{r} set.seed(123) M <- matrix(rnorm(16), 4) WR <- matrix(0, nrow = ncol(M)) WI <- matrix(0, nrow = ncol(M)) VL <- matrix(0, nrow = nrow(M), ncol = ncol(M)) dgeev(A = M, WR = WR, WI = WI, VL = VL) # Compare eigenvalues with base R complex_eigs <- WR[, 1] + 1i * WI[, 1] all.equal(sort(complex_eigs), sort(eigen(M)$values)) ``` For large matrices stored on disk you can pass `big.matrix` containers for `A` and (optionally) `VL`/`VR`. The wrapper automatically converts between big storage and native `matrix` arguments as required. ## Singular value decomposition with `dgesdd()` The divide-and-conquer SVD routine `dgesdd()` requires workspace for the singular values and (optionally) the left and right singular vectors. Providing `big.matrix` containers lets you persist decompositions to disk without copying through R's heap. ```{r} set.seed(101) X <- matrix(rnorm(12), 4) S <- matrix(0, nrow = min(dim(X))) U <- matrix(0, nrow = nrow(X), ncol = nrow(X)) VT <- matrix(0, nrow = ncol(X), ncol = ncol(X)) dgesdd(A = X, S = S, U = U, VT = VT) svd_base <- svd(X) all.equal(drop(S), svd_base$d) all.equal(U, svd_base$u) all.equal(VT, t(svd_base$v)) ``` Because `dgesdd()` accepts both in-memory and file-backed matrices, it enables scalable dimensionality reduction pipelines directly on datasets that are larger than available RAM.