When preparing a secsse analysis, it can be daunting to prepare the different required matrices and settings in order to be able to perform a meaningful analysis. Starting with secsse package version 2.6, there are now general helper functions available that can prepare all matrices for some general cases. Often, these general cases can already be applicable, alternatively, they can be modified later on to better reflect the intricacies of the specific studied system.
To perform a secsse analysis, we want to use maximum likelihood to find the most likely values for our parameters, given a phylogenetic tree and tip states. To do so, secsse requires the user to specify how speciation changes the state of the daughter species in relation to the parent species, and requires the user to specify the number of unique speciation rates to be fitted. Here, we will explore a basic example.
Now, we can use our settings to perform an analysis. Because we are lacking empirical data in this example, we will simulate a tree for this. To do so, we first need to specify our focal rates, and then fill them in.
speciation <- 0.5
extinction <- 0.0
sp_sn <- 0.2
sp_ns <- 0.2
q_ab <- 0.5
q_ba <- 0.5
params <- c(speciation,
extinction,
sp_sn, sp_ns,
q_ab, q_ba)
lambda_matrices_p <- secsse::fill_in(lambda_matrices,
params)
trans_matrix_p <- secsse::fill_in(trans_matrix,
params)
mus_p <- secsse::fill_in(mus,
params)With the values replaced, we can now simulate an “empirical” dataset:
simulated_tree <- secsse::secsse_sim(lambdas = lambda_matrices_p,
mus = mus_p,
qs = trans_matrix_p,
num_concealed_states = num_hidden_states,
crown_age = 5,
conditioning = "obs_states",
verbose = TRUE,
seed = 26)
sim_traits <- simulated_tree$obs_traits
focal_tree <- simulated_tree$phyGiven this data, we can now perform our maximum likelihood analysis. Here, we choose to initialize our parameters with random values in [0, 1], we use multithreading to speed up the analysis, and use the subplex optimization method, as this has shown to be more reliable.
param_posit <- list()
param_posit[[1]] <- lambda_matrices
param_posit[[2]] <- mus
param_posit[[3]] <- trans_matrix
initpars <- params
initpars <- initpars[-2]
answ <- secsse::cla_secsse_ml(phy = focal_tree,
traits = sim_traits,
num_concealed_states = num_hidden_states,
idparslist = param_posit,
idparsopt = c(1, 3, 4, 5, 6),
initparsopt = initpars,
idparsfix = c(0, 2),
parsfix = c(0.0, 0.0),
sampling_fraction = c(1, 1),
optimmethod = "subplex",
verbose = FALSE,
num_threads = 6,
atol = 0.1, # high values for demonstration
rtol = 0.1) # purposes, don't use at home!## Warning in secsse::cla_secsse_ml(phy = focal_tree, traits = sim_traits, : Note:
## you set some transitions as impossible to happen.
We can now extract our parameters to get them in the right place:
found_pars_vals <- secsse::extract_par_vals(param_posit, answ$MLpars)
found_pars_vals## [1] 0.6105537 0.0000000 0.1472296 0.1313448 0.2067287 0.7870417
We have done this now only for the CR model, but we can also use the CTD and ETD model. Let’s do that semi-automagically! We first define a generic function to optimize for a model:
fit_model <- function(focal_tree, traits, model) {
focal_list <- secsse::create_default_lambda_list(state_names = used_states,
model = model)
lambda_matrices <- secsse::create_lambda_matrices(state_names = used_states,
num_concealed_states = num_hidden_states,
transition_list =
focal_list,
model = model)
mus <- secsse::create_mus(state_names = used_states,
num_concealed_states = num_hidden_states,
model = model,
lambdas = lambda_matrices)
q_list <- secsse::create_default_q_list(state_names = used_states,
num_concealed_states = num_hidden_states,
mus = mus)
trans_matrix <- secsse::create_transition_matrix(state_names = used_states,
num_concealed_states = num_hidden_states,
transition_list = q_list,
diff.conceal = TRUE)
param_posit <- list()
param_posit[[1]] <- lambda_matrices
param_posit[[2]] <- mus
param_posit[[3]] <- trans_matrix
max_indicator <- max(trans_matrix, na.rm = TRUE)
# we cheat a bit by setting extinction to zero -
# in a real analysis this should be avoided.
extinct_rates <- unique(mus)
idparsopt <- 1:max_indicator
idparsopt <- idparsopt[-extinct_rates]
idparsfix <- c(0, extinct_rates)
parsfix <- rep(0.0, length(idparsfix))
initpars <- c(rep(params[1], min(extinct_rates) - 1),
params[-c(1, 2)])
answ <- secsse::cla_secsse_ml(phy = focal_tree,
traits = traits,
num_concealed_states = num_hidden_states,
idparslist = param_posit,
idparsopt = idparsopt,
initparsopt = initpars,
idparsfix = idparsfix,
parsfix = parsfix,
sampling_fraction = c(1, 1),
optimmethod = "subplex",
verbose = FALSE,
num_threads = 6,
atol = 0.1, # high values for demonstration
rtol = 0.1) # purposes, don't use at home!
found_pars_vals <- secsse::extract_par_vals(param_posit, answ$MLpars)
aic <- 2 * max_indicator - 2 * as.numeric(answ$ML)
return(list(pars = found_pars_vals,
ml = as.numeric(answ$ML),
aic = aic))
}And then we can loop over the different models:
found <- c()
for (focal_model in c("CR", "CTD", "ETD")) {
local_answ <- fit_model(focal_tree = focal_tree,
traits = sim_traits,
model = focal_model)
found <- rbind(found, c(focal_model, local_answ$ml, local_answ$aic))
}## Warning in secsse::cla_secsse_ml(phy = focal_tree, traits = traits,
## num_concealed_states = num_hidden_states, : Note: you set some transitions as
## impossible to happen.
## Warning in secsse::cla_secsse_ml(phy = focal_tree, traits = traits,
## num_concealed_states = num_hidden_states, : Note: you set some transitions as
## impossible to happen.
## Warning in secsse::cla_secsse_ml(phy = focal_tree, traits = traits,
## num_concealed_states = num_hidden_states, : Note: you set some transitions as
## impossible to happen.
colnames(found) <- c("model", "LL", "AIC")
found <- as.data.frame(found)
found$LL <- as.numeric(found$LL)
found$AIC <- as.numeric(found$AIC)
found## model LL AIC
## 1 CR -128.1962 268.3923
## 2 CTD -127.8295 271.6590
## 3 ETD -127.9006 271.8012
Because we have simulated the tree using the CR model, we expect the model with the lowest AIC to be the CR model again, and indeed we do find this!