% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/workflow.R
\name{workflow}
\alias{workflow}
\title{Create a workflow}
\usage{
workflow(preprocessor = NULL, spec = NULL)
}
\arguments{
\item{preprocessor}{An optional preprocessor to add to the workflow. One of:
\itemize{
\item A formula, passed on to \code{\link[=add_formula]{add_formula()}}.
\item A recipe, passed on to \code{\link[=add_recipe]{add_recipe()}}.
\item A \code{\link[=workflow_variables]{workflow_variables()}} object, passed on to \code{\link[=add_variables]{add_variables()}}.
}}

\item{spec}{An optional parsnip model specification to add to the workflow.
Passed on to \code{\link[=add_model]{add_model()}}.}
}
\value{
A new \code{workflow} object.
}
\description{
A \code{workflow} is a container object that aggregates information required to
fit and predict from a model. This information might be a recipe used in
preprocessing, specified through \code{\link[=add_recipe]{add_recipe()}}, or the model specification
to fit, specified through \code{\link[=add_model]{add_model()}}.

The \code{preprocessor} and \code{spec} arguments allow you to add components to a
workflow quickly, without having to go through the \verb{add_*()} functions, such
as \code{\link[=add_recipe]{add_recipe()}} or \code{\link[=add_model]{add_model()}}. However, if you need to control any of
the optional arguments to those functions, such as the \code{blueprint} or the
model \code{formula}, then you should use the \verb{add_*()} functions directly
instead.
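
As a minimal sketch, the two workflows below hold the same recipe and model. This assumes the parsnip, recipes and modeldata packages are installed. \code{hardhat::default_recipe_blueprint()} is used only to show how an explicit \code{blueprint} is passed. It is not described on this page, and it does not change the defaults. Only the second, long-form workflow can set optional arguments of \code{add_recipe()}:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(parsnip)
library(recipes)
library(workflows)
library(modeldata)

data("Sacramento")

rec <- recipe(price ~ type + sqft + beds + baths, data = Sacramento)
lm_spec <- set_engine(linear_reg(), "lm")

# Shortcut: pass the preprocessor and the model spec directly
wf_short <- workflow(rec, lm_spec)

# Long form: call the add_*() functions to reach their optional arguments,
# here an explicit (default) recipe blueprint from hardhat
wf_long <- workflow()
wf_long <- add_recipe(wf_long, rec, blueprint = hardhat::default_recipe_blueprint())
wf_long <- add_model(wf_long, lm_spec)
}\if{html}{\out{</div>}}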
}
\section{Indicator Variable Details}{
Some modeling functions in R create indicator/dummy variables from
categorical data when you use a model formula, and some do not. When you
specify and fit a model with a \code{workflow()}, parsnip and workflows match
and reproduce the underlying behavior of the user-specified model’s
computational engine.
\subsection{Formula Preprocessor}{

In the \link[modeldata:Sacramento]{modeldata::Sacramento} data set of real
estate prices, the \code{type} variable has three levels: \code{"Residential"},
\code{"Condo"}, and \code{"Multi_Family"}. This base \code{workflow()} contains a
formula added via \code{\link[=add_formula]{add_formula()}} to predict property
price from property type, square footage, number of beds, and number of
baths:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{set.seed(123)

library(parsnip)
library(recipes)
library(workflows)
library(modeldata)

data("Sacramento")

base_wf <- workflow() \%>\%
  add_formula(price ~ type + sqft + beds + baths)
}\if{html}{\out{</div>}}

This first model does create dummy/indicator variables:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{lm_spec <- linear_reg() \%>\%
  set_engine("lm")

base_wf \%>\%
  add_model(lm_spec) \%>\%
  fit(Sacramento)
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## == Workflow [trained] ================================================
## Preprocessor: Formula
## Model: linear_reg()
## 
## -- Preprocessor ------------------------------------------------------
## price ~ type + sqft + beds + baths
## 
## -- Model -------------------------------------------------------------
## 
## Call:
## stats::lm(formula = ..y ~ ., data = data)
## 
## Coefficients:
##      (Intercept)  typeMulti_Family   typeResidential  
##          32919.4          -21995.8           33688.6  
##             sqft              beds             baths  
##            156.2          -29788.0            8730.0
}\if{html}{\out{</div>}}

There are \strong{five} independent variables in the fitted model for this
OLS linear regression. With this model type and engine, the factor
predictor \code{type} of the real estate properties was converted to two
binary predictors, \code{typeMulti_Family} and \code{typeResidential}. (The third
type, for condos, does not need its own column because it is the
baseline level.)
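
Base R's \code{model.matrix()} reproduces the same kind of expansion for this formula. It applies treatment contrasts to unordered factors, so the first level becomes the baseline. This sketch is for illustration only and does not claim to follow the exact internal code path that workflows uses:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(modeldata)

data("Sacramento")

# Design matrix for the same formula: the factor `type` is expanded with
# treatment contrasts, so its first level gets no column of its own
mm <- model.matrix(price ~ type + sqft + beds + baths, data = Sacramento)

levels(Sacramento$type)  # the first level is the baseline
colnames(mm)             # an intercept plus five predictor columns
}\if{html}{\out{</div>}}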

This second model does not create dummy/indicator variables:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{rf_spec <- rand_forest() \%>\%
  set_mode("regression") \%>\%
  set_engine("ranger")

base_wf \%>\%
  add_model(rf_spec) \%>\%
  fit(Sacramento)
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## == Workflow [trained] ================================================
## Preprocessor: Formula
## Model: rand_forest()
## 
## -- Preprocessor ------------------------------------------------------
## price ~ type + sqft + beds + baths
## 
## -- Model -------------------------------------------------------------
## Ranger result
## 
## Call:
##  ranger::ranger(x = maybe_data_frame(x), y = y, num.threads = 1,      verbose = FALSE, seed = sample.int(10^5, 1)) 
## 
## Type:                             Regression 
## Number of trees:                  500 
## Sample size:                      932 
## Number of independent variables:  4 
## Mtry:                             2 
## Target node size:                 5 
## Variable importance mode:         none 
## Splitrule:                        variance 
## OOB prediction error (MSE):       7058847504 
## R squared (OOB):                  0.5894647
}\if{html}{\out{</div>}}

Note that there are \strong{four} independent variables in the fitted model
for this ranger random forest. With this model type and engine,
indicator variables were not created for the \code{type} of real estate
property being sold. Some tree-based engines, such as ranger, can split on
factor predictors directly and don’t need any conversion to numeric
binary variables.
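
You can confirm that \code{type} reaches ranger as a single factor column by inspecting the fitted workflow. This is a sketch that assumes the ranger package is installed. It also assumes a workflows version that provides \code{extract_mold()} and \code{extract_fit_engine()}, neither of which is described on this page:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(parsnip)
library(workflows)
library(modeldata)

data("Sacramento")
set.seed(123)

rf_spec <- set_engine(set_mode(rand_forest(), "regression"), "ranger")
rf_fit <- fit(workflow(price ~ type + sqft + beds + baths, rf_spec), data = Sacramento)

# Predictors handed to the engine: `type` stays a single factor column
rf_predictors <- extract_mold(rf_fit)$predictors
is.factor(rf_predictors$type)

# The underlying ranger object reports how many predictors it received
extract_fit_engine(rf_fit)$num.independent.variables
}\if{html}{\out{</div>}}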
}

\subsection{Recipe Preprocessor}{

When you specify a model with a \code{workflow()} and a recipe preprocessor
via \code{\link[=add_recipe]{add_recipe()}}, the \emph{recipe} controls whether dummy
variables are created; it overrides any underlying behavior from the
model’s computational engine.
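
In this minimal sketch, \code{step_dummy()} in the recipe makes ranger receive indicator columns. With a formula preprocessor, the same engine keeps \code{type} as a factor. The sketch assumes the ranger package is installed and that your workflows version provides \code{extract_fit_engine()}:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(parsnip)
library(recipes)
library(workflows)
library(modeldata)

data("Sacramento")
set.seed(123)

rf_spec <- set_engine(set_mode(rand_forest(), "regression"), "ranger")

# The recipe, not the engine, decides how `type` is encoded
rec <- recipe(price ~ type + sqft + beds + baths, data = Sacramento)
rec <- step_dummy(rec, type)

rf_fit <- fit(workflow(rec, rf_spec), data = Sacramento)

# `type` now arrives as two indicator columns, so ranger sees five predictors
extract_fit_engine(rf_fit)$num.independent.variables
}\if{html}{\out{</div>}}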
}
}

\examples{
library(parsnip)
library(recipes)
library(magrittr)
library(modeldata)

data("attrition")

model <- logistic_reg() \%>\%
  set_engine("glm")

formula <- Attrition ~ BusinessTravel + YearsSinceLastPromotion + OverTime

wf_formula <- workflow(formula, model)

fit(wf_formula, attrition)

recipe <- recipe(Attrition ~ ., attrition) \%>\%
  step_dummy(all_nominal(), -Attrition) \%>\%
  step_corr(all_predictors(), threshold = 0.8)

wf_recipe <- workflow(recipe, model)

fit(wf_recipe, attrition)

variables <- workflow_variables(
  Attrition,
  c(BusinessTravel, YearsSinceLastPromotion, OverTime)
)

wf_variables <- workflow(variables, model)

fit(wf_variables, attrition)
}
