When you already have a known set of basis functions—such as Fourier
components, wavelets, splines, or principal components from a reference
dataset—the goal isn’t to discover a latent space, but rather to express
new data in terms of that existing basis. The regress()
function facilitates this by wrapping several multi-output linear models
within the bi_projector API.
This integration means you automatically inherit standard
bi_projector methods for tasks like:

- project(): Map new data from the original space into the basis coefficients.
- inverse_projection(): Map basis coefficients back to the original data space.
- reconstruct() / reconstruct_new(): Reconstruct data using all or a subset of basis components.
- coef(): Retrieve the basis coefficients.
- truncate(): Keep only a subset of basis components.

This vignette demonstrates a typical workflow (sketched in code just below) using an orthonormal
Fourier basis, but the regress() function works equally
well with arbitrary, potentially non-orthogonal, basis dictionaries.
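Before diving in, here is a minimal sketch of the overall workflow. The
names basis and data are hypothetical placeholders; every step is
demonstrated concretely later in the vignette.

# basis : p x k matrix whose columns are known basis functions (placeholder)
# data  : p x n matrix whose columns are observations (placeholder)
fit  <- regress(X = basis, Y = data, method = "lm") # fit basis coefficients
C    <- coef(fit)                                   # retrieve coefficients
Yhat <- reconstruct(fit)                            # map back to data space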
First, let’s define our basis. We’ll use sines and cosines.
set.seed(42)
n <- 128 # Number of observations (e.g., signals)
p <- 32 # Original variables per observation (e.g., time points)
k <- 20  # Number of sine/cosine terms; with the intercept, B has k + 1 = 21 columns (must be <= p for lm)
## Create toy data: smooth signals + noise
t <- seq(0, 1, length.out = p)
Y <- replicate(n, 3*sin(2*pi*3*t) + 2*cos(2*pi*5*t) ) +
matrix(rnorm(n*p, sd = 0.3), p, n) # Note: Y is p x n here
## Orthonormal Fourier basis (columns = basis functions)
# We create k/2 sine and k/2 cosine terms, plus an intercept
freqs <- 1:(k %/% 2) # Integer division for number of frequencies
B <- cbind(rep(1, p), # Intercept column
do.call(cbind, lapply(freqs, function(f) sin(2*pi*f*t))),
do.call(cbind, lapply(freqs, function(f) cos(2*pi*f*t))))
colnames(B) <- c("Intercept", paste0("sin", freqs), paste0("cos", freqs))
# Normalize columns to unit length; on this sampling grid the Fourier
# columns are also (approximately) mutually orthogonal, so B is close to orthonormal
B <- scale(B, center = FALSE, scale = sqrt(colSums(B^2)))
cat(paste("Dimensions: Y is", nrow(Y), "x", ncol(Y),
", Basis B is", nrow(B), "x", ncol(B), "\n"))
#> Dimensions: Y is 32 x 128 , Basis B is 32 x 21
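As a quick sanity check (a sketch; the exact deviation depends on the
sampling grid), we can measure how far B is from exactly orthonormal:

# The Gram matrix t(B) %*% B should be close to the identity if B is orthonormal
gram_dev <- max(abs(crossprod(B) - diag(ncol(B))))
cat("Max deviation of t(B) %*% B from identity:", format(gram_dev, digits = 3), "\n")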
We want coefficients \(C\) (\(k \times n\)) such that \(Y \approx B\,C\). Now we use
regress() to find the \(C\) that best represents each signal in
\(Y\) in terms of the basis \(B\).
library(multivarious)
# Fit using standard linear models (lm)
# Y is p x n (32 x 128): 32 time points x 128 signals
# B is p x k (32 x 21): 32 time points x 21 basis functions
# regress will fit 128 separate regressions, each with 32 observations and 21 predictors
fit <- regress(X = B, # Predictors = basis functions (p x k)
Y = Y, # Response = signals (p x n)
method = "lm",
intercept = FALSE) # Basis B already includes an intercept column
# The result is a bi_projector object
print(fit)
#> Regression bi_projector object:
#> Method: lm
#> Input dimension: 128
#> Output dimension: 21
#> Coefficients dimension: 128 x 21
## Conceptual mapping to bi_projector slots:
# fit$v : Coefficients (n x k) - Basis coefficients for each signal.
# fit$s : Design Matrix (p x k) - The basis matrix B, stored for reconstruction.

The bi_projector structure provides a consistent way to
access the core components: the basis (fit$s) and the
coefficients (fit$v).
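We can confirm this mapping directly from the object (the printed summary
above reported a 128 x 21 coefficient matrix):

dim(fit$v) # 128 x 21: n signals x k basis functions
dim(fit$s) # 32 x 21: p time points x k basis functions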
With the fitted bi_projector, we can move freely between
the original data space and the basis coefficient space. This section
demonstrates the key operations.
The coefficient matrix fit$v has dimensions \(n \times k\) (signals \(\times\) basis functions). Each row
contains the weights that express one signal as a linear combination of
basis functions.
coef_matrix_first3 <- fit$v[1:3, ]
cat("Coefficient matrix shape (first 3 signals):",
nrow(coef_matrix_first3), "x", ncol(coef_matrix_first3), "\n\n")
#> Coefficient matrix shape (first 3 signals): 3 x 21
cat("Coefficients for signal 1:\n")
#> Coefficients for signal 1:
print(coef_matrix_first3[1, ])
#> Intercept sin1 sin2 sin3 sin4 sin5
#> 0.135054901 0.587137215 -0.183816033 12.150464524 -0.007984793 -0.162281276
#> sin6 sin7 sin8 sin9 sin10 cos1
#> -0.593723657 0.581637564 -0.370023776 -0.108499186 0.173654284 0.394261726
#> cos2 cos3 cos4 cos5 cos6 cos7
#> -0.590122835 0.288835048 0.448031764 8.196393043 0.080690010 -0.612238653
#> cos8 cos9 cos10
#> 0.678547189 -0.030446715 0.088104811

Notice that sin3 and cos5 have the largest
coefficients—this matches our generating function \(3\sin(2\pi \cdot 3t) + 2\cos(2\pi \cdot 5t)\).
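A compact way to confirm this programmatically is to rank the coefficient
magnitudes (a small sketch using the objects defined above):

# The two largest-magnitude coefficients should be sin3 and cos5
head(sort(abs(coef_matrix_first3[1, ]), decreasing = TRUE), 3)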
Reconstruction recovers the original data from the basis
representation. The bi_projector stores both the design
matrix \(B\) (in fit$s)
and the coefficients (in fit$v). Reconstruction is simply
their product (fit$v stores \(C^T\), so it is transposed back):
\[\hat{Y} = B\,C = \texttt{fit\$s} \times \texttt{t(fit\$v)}\]
Y_hat <- fit$s %*% t(fit$v)
max_reconstruction_error <- max(abs(Y_hat - Y))
cat("Reconstruction shape:", nrow(Y_hat), "x", ncol(Y_hat), "\n")
#> Reconstruction shape: 32 x 128
cat("Maximum reconstruction error:", format(max_reconstruction_error, digits=3), "\n")
#> Maximum reconstruction error: 0.7

The error is non-zero because our signals contain random noise that cannot be captured by the smooth Fourier basis. This is expected and acceptable.
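The residual RMSE gives a scale-aware view of the same fact. Since the
residual is essentially the injected noise, the RMSE should land below the
simulation's noise level (sd = 0.3), because the basis absorbs the noise
components that lie in its span:

resid_rmse <- sqrt(mean((Y - Y_hat)^2))
cat("Residual RMSE:", format(resid_rmse, digits = 3), "(noise sd was 0.3)\n")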
A key use case is expressing new observations in terms of the same basis. For an orthonormal basis \(B\), projection is straightforward:
\[\text{coefficients} = B^T \cdot Y_{\text{new}}\]
Let’s create a new signal with the same underlying pattern but different noise:
Y_new_signal <- 3*sin(2*pi*3*t) + 2*cos(2*pi*5*t) + rnorm(p, sd=0.3)
Y_new_matrix <- matrix(Y_new_signal, nrow = p, ncol = 1)
coef_new <- t(fit$s) %*% Y_new_matrix
cat("Basis coefficients for new signal:\n")
#> Basis coefficients for new signal:
print(coef_new[, 1])
#> Intercept sin1 sin2 sin3 sin4 sin5
#> -0.16459096 0.35921181 -0.09452109 11.90760497 0.37429968 -0.13899361
#> sin6 sin7 sin8 sin9 sin10 cos1
#> 0.44535459 -0.16769033 0.02891241 0.54613517 -0.41145969 0.53832581
#> cos2 cos3 cos4 cos5 cos6 cos7
#> 0.45371511 0.24009459 0.81766812 8.25333978 0.75410708 -0.42128770
#> cos8 cos9 cos10
#> 0.49377610 0.45618692 0.79379189

Again, the sin3 and cos5 components dominate, as expected.
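The simple \(B^T Y_{\text{new}}\) projection relies on orthonormality. For an
arbitrary (possibly non-orthogonal) dictionary, the least-squares
coefficients can instead be obtained with a QR solve; a sketch, using base
R's qr.solve():

# Least-squares coefficients; valid even when the basis is non-orthogonal
coef_new_ls <- qr.solve(fit$s, Y_new_matrix)
max(abs(coef_new_ls - coef_new)) # small only because B is (near-)orthonormal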
Finally, we can reconstruct the new signal from its basis coefficients to see how well the basis captures the underlying structure:
Y_new_recon <- fit$s %*% coef_new
reconstruction_error <- sqrt(mean((Y_new_matrix - Y_new_recon)^2))
cat("Reconstruction RMSE for new signal:", format(reconstruction_error, digits=3), "\n")
#> Reconstruction RMSE for new signal: 0.362

The reconstruction error reflects the noise that the basis cannot represent. Because \(B\) is orthonormal, projection and reconstruction are exact inverses for the signal components that lie within the basis span.
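One useful way to read this number is to compare the reconstruction to the
noiseless generating signal. Because the basis spans the clean pattern but
only part of the noise, the reconstruction is typically closer to the clean
signal than the noisy observation is (a sketch using the simulation's known
ground truth):

clean <- 3*sin(2*pi*3*t) + 2*cos(2*pi*5*t)
rmse_obs   <- sqrt(mean((Y_new_signal - clean)^2)) # noisy observation vs truth
rmse_recon <- sqrt(mean((Y_new_recon  - clean)^2)) # reconstruction vs truth
cat("RMSE vs clean signal - observed:", format(rmse_obs, digits = 3),
    " reconstructed:", format(rmse_recon, digits = 3), "\n")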
If the basis is ill-conditioned or you need feature
selection/shrinkage, simply change the method argument.
regress() wraps common regularized models.
# Ridge regression (requires glmnet)
fit_ridge <- regress(X = B, Y = Y, method = "mridge", lambda = 0.01, intercept = FALSE)
# Elastic Net (requires glmnet)
fit_enet <- regress(X = B, Y = Y, method = "enet", alpha = 0.5, lambda = 0.02, intercept = FALSE)
# Partial Least Squares (requires pls package) - useful if k > p or multicollinearity
fit_pls <- regress(X = B, Y = Y, method = "pls", ncomp = 15, intercept = FALSE)
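# A quick comparison (sketch): with a small lambda, the ridge fit should
# stay close to the OLS fit. This assumes fit_ridge stores the basis in
# $s and the coefficients in $v with the same layout as the lm fit above.
Y_hat_ridge <- fit_ridge$s %*% t(fit_ridge$v)
mean(abs(Y_hat - Y_hat_ridge)) # expected to be small for lambda = 0.01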
# All these return bi_projector objects, so downstream code using
# project(), reconstruct(), coef() etc. remains the same.

The bi_projector interface allows for flexible manipulation:
# Truncate: Keep only the first 5 basis functions (Intercept + 2 sine + 2 cosine)
fit5 <- truncate(fit, ncomp = 5)
cat("Dimensions after truncating to 5 components:",
"Basis (s):", paste(dim(fit5$s), collapse="x"),
", Coefs (v):", paste(dim(fit5$v), collapse="x"), "\n")
#> Dimensions after truncating to 5 components: Basis (s): 32x5 , Coefs (v): 128x5
# Reconstruction using only first 5 basis functions (manual)
# Equivalent to: scores(fit5) %*% t(coef(fit5)) for the selected components
Y_hat5 <- fit5$s %*% t(fit5$v)
# Partial inverse projection: map only a subset of coefficients back,
# e.g., reconstruct using only components 2 through 6 (skip intercept).
# Note: partial_inverse_projection is not a standard bi_projector method;
# this requires manual slicing of the basis matrix (fit$s) and the
# coefficients (fit$v), as shown below.
# Manual reconstruction example for components 2:6
coef_subset  <- fit$v[, 2:6, drop = FALSE]   # n x k_sub coefficients
basis_subset <- fit$s[, 2:6, drop = FALSE]   # p x k_sub basis columns
Y_lowHat <- basis_subset %*% t(coef_subset)  # p x n reconstruction
# Variable usage helpers (conceptual - actual functions might differ)
# `variables_used(fit)` could show which basis functions have non-zero
# coefficients (especially for 'enet').
# `vars_for_component(fit, k)` isn't directly applicable here, as the
# components are the basis functions themselves.

The core idea is to represent the \(p \times n\) data matrix \(Y\) as
the product of the \(p \times k\) basis matrix (stored in s) and the
transpose of the \(n \times k\) coefficient matrix (stored in v):
\[ \underbrace{Y}_{p\times n} \approx \underbrace{s}_{p\times k}\, \underbrace{v^{T}}_{k\times n} \]
regress() estimates \(v\) (coefficients) based on the chosen method:
| method   | Solver Used (Conceptual)  | Regularisation                    | Key Reference        |
|----------|---------------------------|-----------------------------------|----------------------|
| "lm"     | QR decomposition (lm.fit) | None                              | Classical OLS        |
| "mridge" | glmnet (alpha = 0)        | Ridge (\(\lambda \|\beta\|_2^2\)) | Hoerl & Kennard 1970 |
| "enet"   | glmnet                    | Elastic Net (\(\alpha\)-mix)      | Zou & Hastie 2005    |
| "pls"    | pls::plsr                 | Latent PLS factors                | Wold 1984            |
The resulting object stores:

- v: The estimated coefficient matrix (\(n \times k\)).
- s: The basis/design matrix \(B\) (\(p \times k\)), possibly centered or
  scaled by the underlying solver.
- sdev, center: Potentially store scaling/centering info related to \(s\).

All other bi_projector methods (project,
reconstruct, inverse_projection) are derived
from these core matrices.
This section contains internal consistency checks, primarily for package development and CI testing.
The code chunk below only runs if the environment variable
_MULTIVARIOUS_DEV_COVERAGE is set.
# This chunk only runs if _MULTIVARIOUS_DEV_COVERAGE is non-empty
message("Running internal consistency checks for regress()...")
#> Running internal consistency checks for regress()...
tryCatch({
  stopifnot(
    # Check that reconstruct() matches the manual reconstruction s %*% t(v).
    # (Exact recovery of Y itself is NOT expected: the noise in Y lies
    # outside the span of the k < p basis functions.)
    max(abs(reconstruct(fit) - Y_hat)) < 1e-8,
    # Check coefficient matrix dimensions: one row per signal (n),
    # one column per basis function (k)
    nrow(fit$v) == ncol(Y), # n rows
    ncol(fit$v) == ncol(B)  # k columns
    # Add checks for other methods if evaluated
  )
  message("Regress internal checks passed.")
}, error = function(e) {
  warning("Regress internal checks failed: ", e$message)
})
#> Regress internal checks passed.

To summarise:

- regress() provides a convenient way to fit multiple linear models
  simultaneously when expressing data \(Y\) in a known basis \(B\).
- The result is a bi_projector, giving immediate access to projection,
  reconstruction, truncation, and coefficient extraction.
- Supported methods include OLS (lm), Ridge (mridge), Elastic Net
  (enet), and PLS (pls) out-of-the-box.
- As a standard bi_projector, the fit composes with the package's other
  tooling, such as pipelines (%>>%) or cross-validation helpers.

Happy re-representing!