Multivariate data analysis often involves reducing dimensionality or transforming data using techniques like Principal Component Analysis (PCA), Partial Least Squares (PLS), Contrastive PCA (cPCA), Nyström approximation for Kernel PCA, or representing data in a specific basis (e.g., Fourier, splines). While each method has unique mathematical underpinnings, they share common operational needs: extracting scores and loadings, projecting new samples into the latent space, reconstructing data from that space, and applying consistent pre-processing.
Handling these tasks consistently across different algorithms can
lead to repetitive code and complex workflows. The
multivarious package aims to simplify this by
providing a unified interface centered around the concept of a
bi_projector.
bi_projector: A Two-Way Map

The bi_projector class is the cornerstone of multivarious. It represents a linear transformation (or an approximation thereof) that provides a two-way mapping: samples (rows) can be projected forward into the component (latent) space, and points in that space can be mapped back to an approximation of the original variables.
Think of it as encapsulating the core results of a dimensionality reduction technique (like the U, S, V components of an SVD, or the scores and loadings of PCA/PLS) along with any necessary pre-processing information.
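The two directions can be illustrated with base R's svd(), independent of multivarious; here scores and loadings are plain local variables, mirroring the U, S, V decomposition mentioned above:

```r
# A sketch of the two-way map using base R's svd().
data(iris)
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
sv <- svd(X)

scores   <- sv$u %*% diag(sv$d)  # forward map: samples -> component space
loadings <- sv$v                 # variables -> components

# backward map: component space -> (approximate) original data
X_hat <- scores %*% t(loadings)
max(abs(X - X_hat))  # essentially zero when all components are kept
```

A bi_projector packages exactly this pair of maps, plus the pre-processing needed to apply them to new data.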
Crucially, many functions within multivarious (e.g.,
pca(), pls(), cPCAplus(),
nystrom_approx(), regress()) return objects
that inherit from bi_projector.
Working with a bi_projector

Because different methods return a bi_projector, you can perform common tasks using a consistent set of verbs:
- scores(model): Get the scores (latent-space representation) of the training data.
- coef(model) or loadings(model): Get the loadings or coefficients mapping variables to components.
- project(model, newdata): Project new samples (rows of newdata) into the latent space defined by the model.
- reconstruct(model, ...): Reconstruct an approximation of the original data from the latent space (either from training scores or provided new scores/coefficients).
- truncate(model, ncomp): Reduce the number of components kept in the model.
- summary(model): Get a concise summary of the model dimensions.

This consistent API simplifies writing generic analysis code and makes it easier to swap between different dimensionality reduction methods.
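For instance, a small method-agnostic helper can be written against these verbs alone; this is a sketch that relies only on the generics listed above, so anything returning a bi_projector can be passed in:

```r
library(multivarious)

# Describe any bi_projector using only the generic verbs.
describe_projector <- function(model) {
  s <- scores(model)
  v <- loadings(model)
  cat("Samples:", nrow(s),
      "| Components:", ncol(s),
      "| Variables:", nrow(v), "\n")
}

fit <- pca(as.matrix(iris[, 1:4]), ncomp = 3)
describe_projector(fit)               # full model
describe_projector(truncate(fit, 2))  # same code after dropping a component
```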
Let’s demonstrate a typical workflow using PCA on the classic
iris dataset.
# Load iris dataset and select numeric columns
data(iris)
X <- as.matrix(iris[, 1:4])
# 1. Define a pre-processor (center the data)
preproc <- center()
# 2. Fit PCA using svd_wrapper, keeping 3 components
# The pre-processor is applied internally.
fit <- pca(X, ncomp = 3, preproc = preproc)
# The result 'fit' is a bi_projector
print(fit)
#> PCA object -- derived from SVD
#>
#> Data: 150 observations x 4 variables
#> Components retained: 3
#>
#> Variance explained (per component):
#>      1      2      3
#>  92.95   5.33   1.72 %
#> (cumulative: 92.95 98.28 100 %)
# 3. Access results
iris_scores <- scores(fit) # Scores of the centered training data (150 x 3)
iris_loadings <- loadings(fit) # Loadings (4 x 3)
cat("\nDimensions of Scores:", dim(iris_scores), "\n")
#>
#> Dimensions of Scores: 150 3
cat("Dimensions of Loadings:", dim(iris_loadings), "\n")
#> Dimensions of Loadings: 4 3
# 4. Project new data
# Create some new iris-like samples (5 samples, 4 variables)
set.seed(123)
new_iris_data <- matrix(rnorm(5 * 4, mean = colMeans(X), sd = apply(X, 2, sd)),
nrow = 5, byrow = TRUE)
# Project the new data into the PCA space defined by 'fit'
# Pre-processing (centering using training data means) is applied automatically.
projected_new_scores <- project(fit, new_iris_data)
cat("\nDimensions of Projected New Data Scores:", dim(projected_new_scores), "\n")
#>
#> Dimensions of Projected New Data Scores: 5 3
print(head(projected_new_scores))
#> [,1] [,2] [,3]
#> [1,] -2.2172144 0.8590909 -0.44924532
#> [2,] -0.3270495 -0.5478369 0.07965279
#> [3,] -1.7602954 0.9106117 -0.52932939
#> [4,] 0.2367242 -0.3204326 -0.50433574
#> [5,] -1.1529598 0.5426518 0.85478044
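For a centered PCA, project() amounts to subtracting the training column means and multiplying by the loadings. This can be verified by hand; the sketch below reuses fit, X, new_iris_data, and projected_new_scores from above and assumes loadings(fit) returns the V matrix, as in the SVD view of PCA:

```r
# Manual projection: center with the *training* means, then multiply by
# the loadings. This should agree with project(fit, new_iris_data).
manual_scores <- sweep(new_iris_data, 2, colMeans(X)) %*% loadings(fit)
all.equal(unname(manual_scores), unname(projected_new_scores))
```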
# 5. Reconstruct approximated original data from scores
# Reconstruct all samples (head() below shows the first few)
reconstructed_X_approx <- reconstruct(fit, comp=1:3) # uses scores(fit) by default
cat("\nReconstructed Approximation of Original Data (first 5 rows):\n")
#>
#> Reconstructed Approximation of Original Data (first 5 rows):
print(head(reconstructed_X_approx))
#> [,1] [,2] [,3] [,4]
#> [1,] 5.099286 3.500723 1.401086 0.1982949
#> [2,] 4.868758 3.031661 1.447517 0.1253679
#> [3,] 4.693700 3.206384 1.309582 0.1849507
#> [4,] 4.623843 3.075837 1.463736 0.2569583
#> [5,] 5.019326 3.580414 1.370606 0.2461680
#> [6,] 5.407635 3.892262 1.688387 0.4182392
print(head(X)) # Original data for comparison
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,]          5.1         3.5          1.4         0.2
#> [2,]          4.9         3.0          1.4         0.2
#> [3,]          4.7         3.2          1.3         0.2
#> [4,]          4.6         3.1          1.5         0.2
#> [5,]          5.0         3.6          1.4         0.2
#> [6,]          5.4         3.9          1.7         0.4

This example shows how fitting (pca), accessing results
(scores, loadings), and applying the model to
new data (project) follow a consistent pattern, regardless
of whether the underlying method was PCA, PLS, or another technique
returning a bi_projector.
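Because reconstruct() accepts a comp argument (as used in step 5 above), the trade-off between the number of retained components and reconstruction error is easy to inspect. A sketch reusing fit and X from the example:

```r
# Root-mean-square reconstruction error for increasing numbers of
# components; the error should shrink as components are added.
rmse <- function(a, b) sqrt(mean((a - b)^2))
errors <- sapply(1:3, function(k) rmse(X, reconstruct(fit, comp = 1:k)))
names(errors) <- paste0("ncomp=", 1:3)
print(errors)
```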
The multivarious Ecosystem

The unified bi_projector interface enables several powerful features within the package:
- Pre-processing: pre-processors such as center() are stored with the model and applied automatically to new data (see vignette("PreProcessing")).
- Composition: chain multiple bi_projector steps together (e.g., pre-processing → PCA → rotation) into a single composite projector (see vignette("Composing_Projectors")).
- Cross-validation: resampling tools that work with any bi_projector structure (see vignette("CrossValidation")).

Projecting New Variables (project_vars)

While project() operates on new samples (rows), the bi_projector also supports projecting new variables (columns) into the component space defined by the model's scores (U vectors in SVD terms). This is done using project_vars().
# Using the 'fit' object from the PCA example above
# Create a new variable (column) with the same number of samples as original data
set.seed(456)
new_variable <- rnorm(nrow(X))
# Project this new variable into the component space defined by the PCA scores (fit$s)
# Result shows how the new variable relates to the principal components.
projected_variable_loadings <- project_vars(fit, new_variable)
cat("\nProjection of new variable onto components:", projected_variable_loadings, "\n")
#>
#> Projection of new variable onto components: 0.0003082567 -0.0004245081 -0.0003111904

The multivarious package provides a consistent and
extensible framework for common dimensionality reduction and related
linear transformation tasks. By leveraging the bi_projector
class, it offers a unified API for fitting models, projecting new data,
reconstruction, and accessing key model components. This simplifies
workflows, promotes code reuse, and facilitates integration with
pre-processing, model composition, and cross-validation tools within the
package ecosystem.