% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/spatPredict.R
\name{spatPredict}
\alias{spatPredict}
\title{Predict spatial variables using machine learning}
\usage{
spatPredict(
  features,
  outcome,
  poly_sample = 1000,
  trainControl,
  methods,
  fastCompare = TRUE,
  thinFeatures = TRUE,
  predict = FALSE,
  n.cores = NULL,
  save_path = NULL
)
}
\arguments{
\item{features}{Independent variables. Must be either a NAMED list of terra SpatRasters or a multi-layer (stacked) SpatRaster (e.g. \code{c(rast1, rast2)}). All layers must have the same cell size, alignment, extent, and CRS. These rasters should cover the training extent (that covered by the SpatVector in \code{outcome}) as well as the desired extrapolation extent.}

\item{outcome}{Dependent variable, as a terra SpatVector of points or polygons with a single attribute table column (of class integer, numeric or factor). The class of this column dictates whether the problem is approached as a classification or regression problem; see the section 'Classification or regression'. If specifying polygons, stratified random sampling will be done with \code{poly_sample} number of points per unique polygon value.}

\item{poly_sample}{If passing a polygon SpatVector to \code{outcome}, the number of points to generate from the polygons for each unique polygon value.}

\item{trainControl}{Parameters used to control training of the machine learning model, created with \code{\link[caret:trainControl]{caret::trainControl()}}. Passed to the \code{trControl} parameter of \code{\link[caret:train]{caret::train()}}. If specifying multiple model types in \code{methods} you can use a single \code{trainControl} which will apply to all \code{methods}, or pass multiple variations to this argument as a list with names matching the names of \code{methods} (one element for each model specified in methods).}

\item{methods}{A character vector specifying one or more classification/regression model(s) to use. Passed to the \code{method} parameter of \code{\link[caret:train]{caret::train()}}. If specifying more than one method they will all be passed to \code{\link[caret:resamples]{caret::resamples()}} to compare model performance. Then, if \code{predict = TRUE}, the model with the highest accuracy will be selected to predict the raster surface across the extent of \code{features}. A different \code{trainControl} parameter can be used for each model in \code{methods}.}

\item{fastCompare}{Logical (TRUE/FALSE). If specifying multiple model types in \code{methods} or one model with multiple different \code{trainControl} objects, should the points in \code{outcome} be sub-sampled for the model comparison step? The selected model will be trained on the full \code{outcome} data set after selection. This only applies if \code{methods} has length > 3 and if \code{outcome} has more than 4000 rows.}

\item{thinFeatures}{Should random forest selection using \code{\link[VSURF:VSURF]{VSURF::VSURF()}} be used in an attempt to remove irrelevant variables?}

\item{predict}{TRUE will apply the selected model to the full extent of \code{features} and return a raster saved to \code{save_path}.}

\item{n.cores}{The maximum number of cores to use. Leave NULL to use all cores minus 1.}

\item{save_path}{The path (folder) to which you wish to save the predicted raster. Not used unless \code{predict = TRUE}.}
}
\value{
A list with three to five elements: the outcome of the VSURF variable selection process, details of the fitted model, model performance statistics, model performance comparison (if \code{methods} includes more than one model), and the final predicted raster (if \code{predict = TRUE}). If applicable, the predicted raster is written to disk.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}

Function to facilitate the prediction of spatial variables using machine learning, including the selection of a particular model and/or model parameters from several user-defined options. Both classification and regression are supported, though please ensure that the models passed to the parameter \code{methods} are suitable.

Note that you may be prompted to install supplementary packages, depending on the model types chosen and whether or not these have been run before; this function may therefore not be 'set and forget'.

It is possible to specify multiple model types (the \code{methods} argument) as well as model-specific parameters (the \code{trainControl} parameter) if you wish to test multiple options and select the best one. To facilitate model type selection, refer to function \code{\link[=modelMatch]{modelMatch()}}.
}
\details{
This function partly operates as a convenient means of passing various parameters to the \code{\link[caret:train]{caret::train()}} function, enabling the user to rapidly trial different model types and parameter sets. In addition, the features can optionally be pre-processed using \code{\link[VSURF:VSURF]{VSURF::VSURF()}} (parameter \code{thinFeatures}), which can decrease the time to run models by removing superfluous features.
}
\section{Balancing classes in outcome (dependent) variable}{
Models can be biased if they are given significantly more points in one outcome class than in others, and best practice is to even out the number of points in each class. If extracting point values from a vector or raster object, a simple way to do that is to use the \code{strata} parameter of \code{\link[terra:sample]{terra::spatSample()}}. If working directly from points, \code{\link[caret:downSample]{caret::downSample()}} and \code{\link[caret:downSample]{caret::upSample()}} can be used. See \href{https://topepo.github.io/caret/subsampling-for-class-imbalances.html}{this link} for more information.
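
As a minimal sketch of down-sampling with caret (the data frame \code{pts} and its columns are made up for illustration and are not part of this package):

\preformatted{
# Hypothetical imbalanced point data: 100 points of class "a", 20 of class "b"
set.seed(42)
pts <- data.frame(
  slope = runif(120),
  Type = factor(rep(c("a", "b"), times = c(100, 20)))
)

# Randomly drop rows so that every class is as small as the smallest class
balanced <- caret::downSample(x = pts["slope"], y = pts$Type, yname = "Type")
table(balanced$Type)
}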
}

\section{Classification or regression}{
Whether this function treats your inputs as a classification or regression problem depends on the class attached to the outcome variable. A class \code{factor} will be treated as a classification problem while all other classes will be treated as regression problems.
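
A short base-R illustration of that rule (the vector below is invented for the example):

\preformatted{
# Hypothetical attribute column holding integer class codes
type_codes <- c(1L, 2L, 2L, 1L)
is.factor(type_codes)   # FALSE: would be treated as regression

# Convert to factor so the same values are treated as classes
type_factor <- as.factor(type_codes)
is.factor(type_factor)  # TRUE: would be treated as classification
}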
}

\section{Method for selecting the best model}{
When specifying multiple model types in \code{methods}, each model type and \code{trainControl} pair (if \code{trainControl} is a list of length equal to \code{methods}) is run using \code{\link[caret:train]{caret::train()}}. To speed things up you can use \code{fastCompare = TRUE}. Models are then compared on their 'accuracy' metric as output by \code{\link[caret:resamples]{caret::resamples()}}, and the highest-performing model is selected. If \code{fastCompare} is TRUE, this model is then run on the complete data set provided in \code{outcome}. Model statistics are returned upon function completion, which allows the user to select their own 'best performing' model based on other criteria.
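
The comparison step can be sketched directly with caret. The example below uses the built-in \code{iris} data and two arbitrarily chosen methods; it only illustrates the general approach and is not this function's internal code:

\preformatted{
# Shared resampling scheme (5-fold cross-validation) for both models
ctrl <- caret::trainControl(method = "cv", number = 5)

# Same seed before each fit so both models see the same folds
set.seed(1)
fit_rpart <- caret::train(Species ~ ., data = iris,
                          method = "rpart", trControl = ctrl)
set.seed(1)
fit_knn <- caret::train(Species ~ ., data = iris,
                        method = "knn", trControl = ctrl)

# Collect the resampling results and pick the highest mean accuracy
res <- caret::resamples(list(rpart = fit_rpart, knn = fit_knn))
mean_acc <- summary(res)$statistics$Accuracy[, "Mean"]
names(which.max(mean_acc))
}

Other columns of the summary (for example Kappa) can be inspected in the same way if accuracy is not the preferred criterion.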
}

\examples{
\dontshow{if (interactive()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}

# These examples can take a while to run!

# Single model, single trainControl

trainControl <- caret::trainControl(
                method = "repeatedcv",
                number = 2, # 2-fold Cross-validation
                repeats = 2, # repeated 2 times
                verboseIter = FALSE,
                returnResamp = "final",
                savePredictions = "all",
                allowParallel = TRUE)

outcome <- permafrost_polygons
outcome$Type <- as.factor(outcome$Type)

result <- spatPredict(features = c(aspect, solrad, slope),
  outcome = outcome,
  poly_sample = 100,
  trainControl = trainControl,
  methods = "ranger",
  predict = TRUE, # needed for result$prediction to be returned
  n.cores = 2)

terra::plot(result$prediction)


# Multiple models, multiple trainControl

trainControl <- list("ranger" = caret::trainControl(
                                  method = "repeatedcv",
                                  number = 2,
                                  repeats = 2,
                                  verboseIter = FALSE,
                                  returnResamp = "final",
                                  savePredictions = "all",
                                  allowParallel = TRUE),
                     "Rborist" = caret::trainControl(
                                   method = "boot",
                                   number = 2, # 2 bootstrap resamples; `repeats` applies only to repeated CV
                                   verboseIter = FALSE,
                                   returnResamp = "final",
                                   savePredictions = "all",
                                   allowParallel = TRUE)
                                   )
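
# Optional sanity check (an illustrative addition, not required by spatPredict):
# the names of the trainControl list should match the model names that will be
# passed to `methods` below.
stopifnot(setequal(names(trainControl), c("ranger", "Rborist")))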

result <- spatPredict(features = c(aspect, solrad, slope),
  outcome = outcome,
  poly_sample = 100,
  trainControl = trainControl,
  methods = c("ranger", "Rborist"),
  predict = TRUE, # needed for result$prediction to be returned
  n.cores = 2)

terra::plot(result$prediction)
\dontshow{\}) # examplesIf}
}
\author{
Ghislain de Laplante (gdela069@uottawa.ca or ghislain.delaplante@yukon.ca)
}
