% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/io_repertoires_processing.R
\name{make_default_preprocessing}
\alias{make_default_preprocessing}
\alias{make_default_postprocessing}
\alias{make_exclude_columns}
\alias{make_productive_filter}
\alias{make_barcode_prefix}
\title{Preprocessing and postprocessing of input immune repertoire files}
\usage{
make_default_preprocessing(format = c("airr", "10x"))

make_default_postprocessing()

make_exclude_columns(cols = imd_drop_cols("airr"))

make_productive_filter(col_name = c("productive"), truthy = TRUE)

make_barcode_prefix(prefix_col = "Prefix")
}
\arguments{
\item{format}{For \code{make_default_preprocessing()}, a character string specifying
the input data format. Currently supports \code{"airr"} (default) or \code{"10x"}.
This determines the default set of columns to exclude and the values
considered "productive".}

\item{cols}{For \code{make_exclude_columns()}, a character vector of column names
to be removed from the dataset. Defaults to \code{imd_drop_cols("airr")}.
If empty, the returned function will not remove any columns.}

\item{col_name}{For \code{make_productive_filter()}, a character vector of potential
column names that indicate sequence productivity (e.g., \code{"productive"}).
The first matching column found in the dataset will be used.}

\item{truthy}{For \code{make_productive_filter()}, a value or vector of values
that signify a productive sequence in the \code{col_name} column.
Can be a logical \code{TRUE} (default for "airr" format) or a character vector
of strings (e.g., \code{c("true", "TRUE", "True", "t", "T", "1")} for "10x" format).}

\item{prefix_col}{For \code{make_barcode_prefix()}, the name of the column in the
dataset that contains the prefix string to be added to each cell barcode.
Defaults to \code{"Prefix"}. The barcode column itself is identified internally
via \code{imd_schema("barcode")}.}
}
\value{
\code{make_exclude_columns()}, \code{make_productive_filter()}, and
\code{make_barcode_prefix()} each return a \emph{new function}. The returned function
takes a \code{dataset} as its first argument, accepts \code{...} for any additional
arguments, and performs one specific processing step.
\code{make_default_preprocessing()} and \code{make_default_postprocessing()} return a
\emph{named list} of such functions.
}
\description{
Factory functions that build the preprocessing and postprocessing steps
applied to input immune repertoire files.
}
\details{
This collection of "maker" functions generates common preprocessing and
postprocessing steps tailored for immune repertoire data. Each step is an
ordinary function that can then be applied to a dataset.

These functions are designed to be flexible components in constructing
custom data processing workflows.
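
The factory pattern can be illustrated with a minimal base R sketch. The name
\code{sketch_exclude_columns()} is hypothetical and is not part of this package;
it works on a plain \code{data.frame} and only approximates what a step such as
the one built by \code{make_exclude_columns()} might do.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical factory: captures `cols` and returns a processing step.
sketch_exclude_columns <- function(cols = character(0)) {
  function(dataset, ...) {
    # Keep every column that was not requested for removal.
    keep <- setdiff(names(dataset), cols)
    dataset[, keep, drop = FALSE]
  }
}

step <- sketch_exclude_columns(c("junk"))
toy <- data.frame(cdr3 = c("CASSL", "CASSF"), junk = 1:2)
names(step(toy))  # returns "cdr3"
}\if{html}{\out{</div>}}

The value of \code{cols} is fixed when the step is created, so the returned
function needs only the dataset at call time.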

The functions generated by these factories typically expect a \code{dataset}
(e.g., a \code{duckplyr} data frame with annotations) as their first argument
and may accept additional arguments via \code{...}, which the predefined steps
often ignore.
\itemize{
\item \code{make_default_preprocessing()} and \code{make_default_postprocessing()} assemble
a list of such processing functions.
\item The individual \code{make_exclude_columns()}, \code{make_productive_filter()}, and
\code{make_barcode_prefix()} functions create specific transformation steps.
}

These steps are often applied when reading data (see \code{read_repertoires()})
to standardize formats, filter out unwanted records, or enrich fields such as
cell barcodes. When an operation is not applicable (e.g., a specified column is
not found), a step issues a warning and returns the dataset unmodified.
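
A list of steps could be applied in order roughly as sketched below. This is a
base R illustration under stated assumptions: \code{sketch_productive_filter()}
and \code{apply_steps()} are hypothetical helpers, the warning text is invented,
and the actual reading pipeline may apply its steps differently.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical step: keep productive rows, or warn and pass the data through.
sketch_productive_filter <- function(col_name = "productive", truthy = TRUE) {
  function(dataset, ...) {
    found <- intersect(col_name, names(dataset))
    if (length(found) == 0) {
      warning("No productivity column found; dataset returned unmodified.")
      return(dataset)
    }
    # Use the first candidate column that is present in the dataset.
    keep <- dataset[[found[1]]]
    dataset[is.element(keep, truthy), , drop = FALSE]
  }
}

# Hypothetical helper: run each step on the output of the previous one.
apply_steps <- function(dataset, steps) {
  Reduce(function(data, step) step(data), steps, dataset)
}

toy <- data.frame(cdr3 = c("CASSL", "CASSF"), productive = c(TRUE, FALSE))
steps <- list(filter_productive = sketch_productive_filter())
nrow(apply_steps(toy, steps))  # returns 1

# Without a matching column the step warns and leaves the data untouched.
no_col <- toy["cdr3"]
unchanged <- suppressWarnings(apply_steps(no_col, steps))
identical(unchanged, no_col)   # returns TRUE
}\if{html}{\out{</div>}}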
}
\section{Functions}{

\itemize{
\item \code{make_default_preprocessing()}: Creates a default list of preprocessing
functions for "airr"- or "10x"-formatted data. This typically includes steps
to exclude unnecessary columns and to filter for productive sequences.
\item \code{make_default_postprocessing()}: Creates a default list of postprocessing
functions, such as adding a prefix to cell barcodes.
\item \code{make_exclude_columns()}: Creates a function that, when applied to a
dataset, removes a specified set of columns.
\item \code{make_productive_filter()}: Creates a function that filters a dataset
to retain only rows where sequences are marked as productive, based on
a specified column and set of "truthy" values.
\item \code{make_barcode_prefix()}: Creates a function that prepends a prefix
(sourced from a specified column in the dataset) to the cell barcodes.
}
}

\seealso{
\code{\link[=read_repertoires]{read_repertoires()}}
}
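\examples{
# A hedged sketch: these calls only build and inspect the step functions;
# they do not read any files. The element names printed by names() depend
# on the package and are not assumed here.
pre_steps <- make_default_preprocessing("airr")
names(pre_steps)
all(vapply(pre_steps, is.function, logical(1)))

# Individual steps can be created with custom settings and collected in a
# named list. The list names ("exclude", "productive") and the excluded
# column "sequence_alignment" are arbitrary choices made for illustration.
custom_steps <- list(
  exclude = make_exclude_columns(cols = c("sequence_alignment")),
  productive = make_productive_filter(
    col_name = "productive",
    truthy = c("true", "TRUE")
  )
)
names(custom_steps)
}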
\concept{processing}
