
Branch-level Inference Framework for Recognizing Optimal Shifts in Traits
bifrost performs branch-level inference of multi-regime,
multivariate trait evolution on a phylogeny using penalized-likelihood
multivariate GLS fits. The current version searches for evolutionary
model shifts under a multi-rate Brownian Motion (BMM) model with
proportional regime VCV scaling, operating directly in trait space
(e.g., no PCA), and is designed for high-dimensional datasets (p > n)
and large trees (> 1000 tips). The method will work with fossil
tip-dated trees, and will accept most forms of multivariate comparative
data (e.g., GPA aligned morphometric coordinates, linear dimensions, and
others). The next major release will enable usage of the multivariate scalar
Ornstein–Uhlenbeck process.
mvMORPH::mvgls), supporting p ≳ n.future ecosystem; practical on thousands of taxa ×
traits.future /
future.apply.📄 Vignette 1: Getting Started with bifrost
# install.packages("remotes")
remotes::install_github("jakeberv/bifrost")Windows users:
Install Rtools for
your R version and ensure it is added to your system PATH.
macOS users:
You may need to install XQuartz
to build or run packages that depend on certain graphical or system
libraries.
library(bifrost)
library(ape)
library(phytools)
library(mvMORPH)
set.seed(1)
# Simulate a tree
tr <- pbtree(n = 50, scale = 1)
# Paint a single global baseline state "0" (single regime)
base <- phytools::paintBranches(
tr,
edge = unique(tr$edge[, 2]),
state = "0",
anc.state = "0"
)
# Simulate multivariate traits under a single-regime BM1 model (no shifts)
sigma <- diag(0.1, 2) # 2×2 variance–covariance matrix for two traits
theta <- c(0, 0) # ancestral means for the two traits
sim <- mvSIM(
tree = base,
nsim = 1,
model = "BM1",
param = list(
ntraits = 2,
sigma = sigma,
theta = theta
)
)
# mvSIM returns either a matrix or a list of matrices depending on mvMORPH version
X <- if (is.list(sim)) sim[[1]] else sim
rownames(X) <- base$tip.label
# Run bifrost's greedy search for shifts
res <- searchOptimalConfiguration(
baseline_tree = base,
trait_data = X,
formula = "trait_data ~ 1",
min_descendant_tips = 10,
num_cores = 1,
shift_acceptance_threshold = 20, # conservative GIC threshold
IC = "GIC",
plot = FALSE,
store_model_fit_history = FALSE,
verbose = FALSE # set TRUE for progress messages
)
# For this single-regime BM1 simulation, we typically expect no inferred shifts:
res$shift_nodes_no_uncertainty # typically integer(0)
res$optimal_ic - res$baseline_ic # typically close to 0
str(res$VCVs) # regime-specific VCVs (here just the baseline regime "0")rownames(trait_data) must match tree$tip.label
(same order and names).phytools class
simmap)mvgls (mvMORPH) options
for your data.shift_acceptance_threshold and
ic_uncertainty_threshold to limit false positives; explore
sensitivity.flowchart TD
subgraph S[Setup]
A([Start]) --> B[Init and validate]
B --> C[Paint baseline state 0]
C --> D[Generate one-shift candidates]
end
subgraph CS[Baseline and scoring]
D --> E[Fit baseline mvgls]
E --> F[Baseline IC GIC or BIC]
F --> G[Score candidates in parallel]
G --> H[Compute delta IC and sort]
end
subgraph GS[Greedy search]
H --> I[Init best tree and IC]
I --> J{More candidates?}
J -- Yes --> K[Add shift]
K --> L[Fit shifted model]
L --> M[Delta IC best minus new]
M --> N{Delta IC >= threshold?}
N -- Yes --> O[Accept update best]
N -- No --> P[Reject keep best]
O --> Q[Record status]
P --> Q
Q --> J
end
subgraph PP[Post processing]
J -- No --> U[Finalize best model]
U --> V{Compute IC weights?}
V -- No --> V0[Skip weights]
V -- Yes --> W{Any shifts?}
W -- No --> V0
W -- Yes --> X{Weights parallel?}
X -- Yes --> X1[Drop-one refits parallel]
X -- No --> X2[Drop-one refits serial]
X1 --> Y[Compute weights aicw and ER]
X2 --> Y
end
subgraph OUT[Output]
V0 --> Z[Assemble result]
Y --> Z
Z --> ZA[Extract regime VCVs]
ZA --> ZB([Return])
end
searchOptimalConfiguration(): The main function for
end-to-end greedy search: candidate generation → parallel fitting →
iterative acceptance → optional pruning/IC weights.plot_ic_acceptance_matrix(): Visualize shift acceptance
and information-criterion (IC) differences across search
iterations.generatePaintedTrees()fitMvglsAndExtractGIC(),
fitMvglsAndExtractBIC(), and formula variants.calculateAllDeltaGIC()paintSubTree_mod(), addShiftToModel(),
removeShiftFromTree(),
paintSubTree_removeShift(), whichShifts()extractRegimeVCVs()searchOptimalConfiguration() returns a comprehensive
list containing:
user_input — All arguments passed to
searchOptimalConfiguration(), including the tree, trait
data, IC choice, thresholds, and other parameters for
reproducibility.tree_no_uncertainty_transformed —
Optimal SIMMAP tree with accepted shifts, using transformed branch
lengths (if a branch-length transformation was applied).tree_no_uncertainty_untransformed —
Same optimal SIMMAP tree but with original, untransformed branch
lengths.model_no_uncertainty — Final fitted
mvgls model object (BM or multi-rate BMM), including
estimated parameters, log-likelihood, and variance–covariance
matrices.shift_nodes_no_uncertainty — Node
numbers corresponding to accepted evolutionary shifts.optimal_ic — Information criterion
(IC) value for the optimal model.baseline_ic — IC value for the null
(single-rate) baseline model.IC_used — The information criterion
applied ("GIC" or "BIC").num_candidates — Total number of
candidate models evaluated during the search.model_fit_history — Per-iteration log
of model fits, IC values, and acceptance decisions; useful for
visualizing or debugging search behavior.VCVs — Regime-specific
penalized-likelihood variance–covariance matrices.ic_weights — Data frame of per-shift
IC weights and evidence ratios (if
uncertaintyweights_par = TRUE), providing support values
for individual shifts.Enable parallel processing using the future package:
library(future)
plan(multisession) # or multicore on Linux/macOSplot = FALSE) for
large trees.p.min_descendant_tips
and stricter IC thresholds on very large problems.sessionInfo() and the
mvMORPH version.renv to lock package
versions.Though bifrost was initially developed as a framework
for inferring macroevolutionary regime shifts in multivariate trait
data, it can also be applied to perform multivariate phylogenetic
generalized least squares (pGLS) analyses with factors or continuous
predictors (e.g., cbind(trait1, trait2, ...) ~ predictor,
or "trait_data[, 1:5] ~ trait_data[, 6]" when working
directly with a matrix). In this context, bifrost
identifies branch-specific rate variation under a multi-rate Brownian
Motion model and fits the pGLS conditional on the resulting residual
(phylogenetic) covariance structure, so estimated effect sizes and
uncertainties account for “hidden” rate variation not explained by the
predictors. This is conceptually similar to hidden-state approaches
(e.g., Boyko et
al. 2023), except that here the regimes influence variance and
evolutionary rate rather than introducing regime-specific means. This
use case is an active area of ongoing methodological development.
If you use bifrost, please cite the references returned
by:
citation("bifrost")Bug reports, feature requests, and pull requests are welcome. Please open an issue at https://github.com/jakeberv/bifrost/issues.
This project is released under the GPL >= 2 License. See the
LICENSE file for details.
bifrost builds on substantial work from
mvMORPH, phytools, ape,
future, and future.apply. The greedy search
algorithm is adapted from Mitov et al
2019 and Smith
et al 2023. See the DESCRIPTION file for complete
dependency and version information.
The name of our R package is inspired by the Bifröst, the rainbow bridge of Norse mythology that connects Earth (Midgard) and Asgard within the cosmic structure of Yggdrasil, the Tree of Life, echoing how our framework links observable data to hidden evolutionary shifts across the history of life.
Development of the bifrost R package was supported by
the Oxford
Research Software Engineering Group, with support from Schmidt Sciences,
LLC. and the Michigan Institute
for Data Science and AI in Society.