--- title: SVD wrapper, PCA and the bi_projector output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{SVD wrapper, PCA and the bi_projector} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} params: family: red css: albers.css resource_files: - albers.css - albers.js includes: in_header: |- --- ```{r setup, include=FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4) library(multivarious) # Ensure pca(), svd_wrapper(), bi_projector, etc. are available library(ggplot2) # for plots below ``` # 1. Why wrap SVD at all? There are six popular SVD engines in R (base::svd, corpcor, RSpectra, irlba, rsvd, svd (PROPACK)) – each with its own argument list, naming conventions and edge-cases (some refuse to return the full rank, others crash on tall-skinny matrices). `svd_wrapper()` smooths that out: * identical call-signature no matter the backend, * automatic pre-processing (centre / standardise) via the same pipeline interface shown in the previous vignette, * returns a `bi_projector` – an S3 class that stores loadings `v`, scores `s`, singular values `sdev` plus the fitted pre-processor. That means immediate access to verbs such as `project()`, `reconstruct()`, `truncate()`, `partial_project()`. ```{r svd_wrapper_example} set.seed(1) X <- matrix(rnorm(35*10), 35, 10) # 35 obs × 10 vars sv_fast <- svd_wrapper(X, ncomp = 5, preproc = center(), method = "fast") # irlba backend (if installed) gives identical results sv_irlba <- if (requireNamespace("irlba", quietly = TRUE)) { suppressWarnings(svd_wrapper(X, ncomp = 5, preproc = center(), method = "irlba")) } # Same downstream code works for both objects: head(scores(sv_fast)) # 35 × 5 if (!is.null(sv_irlba)) { all.equal(scores(sv_fast), scores(sv_irlba)) } ``` # 2. A one-liner `pca()` Most people really want PCA, so `pca()` is a thin wrapper that 1. calls `svd_wrapper()` with sane defaults, 2. adds the S3 class "pca" (printing, screeplot, biplot, permutation test, …). ```{r pca_example} data(iris) X_iris <- as.matrix(iris[, 1:4]) pca_fit <- pca(X_iris, ncomp = 4) # defaults to method = "fast", preproc=center() print(pca_fit) ``` ## 2.1 Scree-plot and cumulative variance ```{r pca_screeplot} screeplot(pca_fit, type = "lines", main = "Iris PCA – scree plot") ``` ## 2.2 Quick biplot ```{r pca_biplot, warning=FALSE, message=FALSE} # Requires ggrepel for repulsion, but works without it biplot(pca_fit, repel_points = TRUE, repel_vars = TRUE) ``` (If you do not have ggrepel installed the text is placed without repulsion.) # 3. What is a `bi_projector`? Think bidirectional mapping: ``` data space (p variables) ↔ component space (d ≤ p) new samples: project() ← scores new variables: project_vars() ← loadings reconstruction ↔ (scores %*% t(loadings)) ``` A `bi_projector` therefore carries | slot | shape | description | |-----------|-------|----------------------------------------------------| | `v` | p × d | component loadings (columns) | | `s` | n × d | score matrix (rows = observations) | | `sdev` | d | singular values (or SDs related to components) | | `preproc` | – | fitted transformer so you never leak training stats | Because `pca()` returns a `bi_projector`, you get other methods for free: ```{r biprojector_methods} # rank-2 reconstruction of the iris data Xhat2 <- reconstruct(pca_fit, comp = 1:2) print(paste("MSE (rank 2):", round(mean((X_iris - Xhat2)^2), 4))) # MSE ~ 0.076 # drop to 2 PCs everywhere pca2 <- truncate(pca_fit, 2) shape(pca2) # 4 vars × 2 comps ``` # 4. Fast code-coverage cameo The next chunk quietly touches a few more branches used in the unit tests (`std_scores()`, `perm_test()`, `rotate()`), but keeps printing to a minimum: ```{r code_coverage} # std scores head(std_scores(svd_wrapper(X, ncomp = 3))) # Use the earlier X data # tiny permutation test (10 perms; obviously too few for science) # This requires perm_test.pca method # Make sure X_iris is centered if perm_test needs centered data perm_res <- perm_test(pca_fit, X_iris, nperm = 10, comps = 2) print(perm_res$component_results) # quick varimax rotation if (requireNamespace("GPArotation", quietly = TRUE)) { pca_rotated <- rotate(pca_fit, ncomp = 3, type = "varimax") print(pca_rotated) } else { cat("GPArotation not installed, skipping rotation example.\n") } ``` (Running these once in the vignette means they are also executed by `R CMD check`, bumping test-coverage without extra scaffolding.) # 5. Take-aways * `svd_wrapper()` gives you a unified front end to half-a-dozen SVD engines. * `pca()` piggy-backs on that, returning a fully featured `bi_projector`. * The `bi_projector` contract means the same verbs & plotting utilities work for any decomposition you wrap into the framework later. --- # Session info ```{r session_info_svd} sessionInfo() ```