In this tutorial, we fit boosted regression tree species distribution models (SDMs) to our dataset, whilst accounting for spatial and temporal autocorrelation.
The covariate data frame generated in the second tutorial can be imported from your project directory.
```r
library(dynamicSDM)

# Use a temporary project directory, or replace with your own path
project_directory <- file.path(tempdir(), "dynamicSDM_vignette")
# project_directory <- "your_path_here"
dir.create(project_directory, showWarnings = FALSE)  # no warning if it already exists
```
```r
# sample_explan_data <- read.csv(file.path(project_directory, "extracted_quelea_occ.csv"))
```

Alternatively, you can run the code below to load the pre-extracted data into your R environment from the dynamicSDM package.

```r
data("sample_explan_data")
```
Spatial and temporal autocorrelation occur when explanatory variable values for records sampled closer together in space or time are more similar to each other than to those of records sampled further apart. When species distribution modelling with spatiotemporally dynamic explanatory variables, such autocorrelation can impact model performance: if test records lie close in space or time to training records, they are not independent of them, and measures of model performance can be inflated.
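To make these two ideas concrete, here is a minimal base-R sketch on simulated data. The data frame `records` and the helper `morans_i()` are hypothetical and not part of dynamicSDM. It measures spatial autocorrelation with Moran's I using inverse-distance weights (an assumed weighting scheme) and temporal autocorrelation as the Pearson correlation between the mean value in one year and in the next. This is not necessarily how `spatiotemp_autocorr()` computes its tests.

```r
# Hypothetical sketch on simulated data, not dynamicSDM's implementation
set.seed(42)

# Hypothetical records: coordinates, year of record and one explanatory variable
n <- 100
records <- data.frame(x = runif(n, 0, 10),
                      y = runif(n, 0, 10),
                      year = sample(2000:2015, n, replace = TRUE))
# Smooth spatial trend plus noise, so nearby records have similar values
records$precip <- records$x + records$y + rnorm(n, sd = 1)

# Moran's I with inverse-distance weights (weighting scheme is an assumption)
morans_i <- function(values, coords) {
  w <- 1 / as.matrix(dist(coords))  # inverse distance between every pair of records
  diag(w) <- 0                       # a record is not its own neighbour
  z <- values - mean(values)
  n <- length(values)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

observed_I <- morans_i(records$precip, records[, c("x", "y")])
expected_I <- -1 / (n - 1)  # expected value under no spatial autocorrelation

# Temporal autocorrelation: correlate each year's mean with the next year's mean
# (years are sorted, and consecutive in this simulation)
yearly_mean <- as.vector(tapply(records$precip, records$year, mean))
lag1 <- cor.test(yearly_mean[-length(yearly_mean)], yearly_mean[-1])

observed_I
expected_I
lag1$estimate
```

An observed Moran's I above its expected value of -1/(n - 1) suggests positive spatial autocorrelation (nearby records are more similar); a value below it suggests negative autocorrelation. For a significance test, `Moran.I()` in the `ape` package returns the observed and expected values, a standard deviation and a p-value.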
Run the code below to test for spatial and temporal autocorrelation in the extracted explanatory variable data using spatiotemp_autocorr(). This function can also generate a temporal autocorrelation plot for each variable.
```r
# Explanatory variables to test for autocorrelation (and later to model)
variablenames <- c("eight_sum_prec", "year_sum_prec", "grass_crop_percentage")
autocorrelation <- spatiotemp_autocorr(sample_explan_data,
                                       varname = variablenames,
                                       temporal.level = c("year")) # can choose "month" or "day" too
autocorrelation
#> $eight_sum_prec
#> $eight_sum_prec$Temporal_autocorrelation
#> $eight_sum_prec$Temporal_autocorrelation$year
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  first_obs and second_obs
#> t = 1.9975, df = 14, p-value = 0.06558
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.03229165  0.78370005
#> sample estimates:
#>       cor 
#> 0.4709523 
#> 
#> 
#> 
#> $eight_sum_prec$Spatial_autocorrelation
#>      observed     expected          sd      p.value
#> 1 -0.01417411 -0.003039514 0.001844916 1.586784e-09
#> 
#> 
#> $year_sum_prec
#> $year_sum_prec$Temporal_autocorrelation
#> $year_sum_prec$Temporal_autocorrelation$year
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  first_obs and second_obs
#> t = 0.71217, df = 14, p-value = 0.4881
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.3402656  0.6247751
#> sample estimates:
#>       cor 
#> 0.1869775 
#> 
#> 
#> 
#> $year_sum_prec$Spatial_autocorrelation
#>      observed     expected          sd p.value
#> 1 -0.09107117 -0.003039514 0.001845888       0
#> 
#> 
#> $grass_crop_percentage
#> $grass_crop_percentage$Temporal_autocorrelation
#> $grass_crop_percentage$Temporal_autocorrelation$year
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  first_obs and second_obs
#> t = 1.2466, df = 14, p-value = 0.233
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.2129836  0.7018300
#> sample estimates:
#>      cor 
#> 0.316094 
#> 
#> 
#> 
#> $grass_crop_percentage$Spatial_autocorrelation
#>     observed     expected          sd p.value
#> 1 -0.1581672 -0.003039514 0.001848495       0
```

For these three variables, the year-to-year temporal correlations are not significant at the 5% level (p = 0.066, 0.488 and 0.233), but the spatial autocorrelation tests are all highly significant (p < 0.001).

One approach to accounting for autocorrelation in species distribution modelling is to split records into sampling units based on spatial and temporal factors, and then group these units into blocks so that the mean and standard deviation of the ecoclimatic variables are roughly equal across blocks. SDMs are then fitted using a jack-knife approach, leaving out each block in turn to use as the test dataset.
spatiotemp_block() splits occurrence records into sampling units by spatial category (e.g. ecoregion or biome) and temporal unit (e.g. month or year of record), before assigning these units to blocks. Because some spatial categories can be very large, the argument spatial.split.degrees can be used to split large contiguous regions into smaller units.
In our case study, we use a raster of biomes in southern Africa and split large biomes into 3-degree grid cells. The function returns the original data frame with an additional column, BLOCK.CATS, containing the block number that each record belongs to.
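As a rough illustration of how such sampling units might be formed, the sketch below combines a spatial category, a 3-degree grid cell and the year of record into a single unit ID. The records, biome labels and column names are hypothetical, and this is not the internal code of `spatiotemp_block()`.

```r
# Hypothetical sketch: sampling units = biome x 3-degree grid cell x year
records <- data.frame(x = c(25.1, 26.9, 28.4, 31.2, 25.5),
                      y = c(-20.3, -20.8, -20.1, -24.7, -20.9),
                      biome = c("savanna", "savanna", "savanna", "grassland", "savanna"),
                      year = c(2002, 2002, 2002, 2003, 2004))

cell_size <- 3  # degrees, analogous to splitting large biomes into 3-degree cells
cell_x <- floor(records$x / cell_size)  # grid column index
cell_y <- floor(records$y / cell_size)  # grid row index

# One factor level per unique biome/cell/year combination
records$unit <- interaction(records$biome, cell_x, cell_y, records$year,
                            drop = TRUE, sep = "_")
records$unit
```

Records 1 and 2 share a biome, grid cell and year, so they fall in the same unit; record 5 lies in the same cell as record 1 but was recorded in a different year, so it forms a unit of its own.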
There are many SDM approaches to modelling the relationships between species occurrence and explanatory variables. dynamicSDM includes the function brt_fit() to fit boosted regression tree models. The argument block.col specifies the column of block numbers to split the data by; the function then returns a list of models, one for each unique block. The argument weights.col specifies a column of record weights, for example to account for spatiotemporal sampling effort (see the Stage 1 tutorial).
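```r
# Down-weight records from more heavily sampled places and times
# (relative sampling effort, see the Stage 1 tutorial)
sample_explan_data$weights <- 1 - sample_explan_data$REL_SAMP_EFFORT

# Fit one boosted regression tree model per block, leaving each block out in turn
models <- brt_fit(sample_explan_data,
                  response.col = "presence.absence",
                  varnames = variablenames,
                  block.col = "BLOCK.CATS",
                  weights.col = "weights",
                  distribution = "bernoulli",  # binary presence/absence response
                  interaction.depth = 2)       # maximum interaction depth of each tree
```

If you want to use other modelling approaches, this data frame of response and explanatory variable data can easily be input into SDM functions in other packages.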
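For example, the sketch below reproduces the leave-one-block-out design with a weighted logistic regression from base R's `glm()` in place of a boosted regression tree. The simulated data reuse the column names from above (presence.absence, BLOCK.CATS, weights) but are hypothetical, as are the variables `precip` and `grass` and the helper `auc()`. The `quasibinomial` family is used so that non-integer weights do not trigger warnings.

```r
# Hypothetical sketch: leave-one-block-out fitting with a weighted glm
set.seed(7)
n <- 300
precip <- rnorm(n)
dat <- data.frame(presence.absence = rbinom(n, 1, plogis(1.5 * precip)),
                  precip = precip,
                  grass = runif(n),
                  BLOCK.CATS = sample(1:5, n, replace = TRUE),
                  weights = runif(n, 0.2, 1))

# Rank-based (Mann-Whitney) AUC: probability a presence outranks an absence
auc <- function(obs, pred) {
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

blocks <- sort(unique(dat$BLOCK.CATS))
cv <- lapply(blocks, function(b) {
  train <- dat[dat$BLOCK.CATS != b, ]  # all other blocks
  test <- dat[dat$BLOCK.CATS == b, ]   # the left-out block
  fit <- glm(presence.absence ~ precip + grass, family = quasibinomial,
             data = train, weights = weights)
  pred <- predict(fit, newdata = test, type = "response")
  list(model = fit, auc = auc(test$presence.absence, pred))
})

sapply(cv, function(res) res$auc)  # test AUC for each left-out block
```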
We have now fitted a species distribution model for each spatiotemporal block. To finish this vignette, let's save this list of models to our project directory for use in the next tutorial!
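One way to do this is with base R's `saveRDS()` and `readRDS()`. The sketch below uses a list of toy `lm()` models so that it runs on its own; the file name `fitted_models.rds` is a placeholder, and in the tutorial you would save `models` instead.

```r
# Hypothetical sketch: save a list of fitted models and read it back
project_directory <- file.path(tempdir(), "dynamicSDM_vignette")
dir.create(project_directory, showWarnings = FALSE)

# Toy stand-in for the list of block models
toy_models <- lapply(1:3, function(i) lm(mpg ~ wt, data = mtcars[-i, ]))

model_file <- file.path(project_directory, "fitted_models.rds")  # placeholder name
saveRDS(toy_models, file = model_file)

reloaded <- readRDS(model_file)
length(reloaded)
```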