Title: Fast Analysis of ROC Curves
Version: 0.1.0
Description: A toolkit for analyzing classifier performance by using receiver operating characteristic (ROC) curves. Performance may be assessed on a single classifier or multiple ones simultaneously, making it suitable for comparisons. In addition, different metrics allow the evaluation of local performance when working within restricted ranges of sensitivity and specificity. For details on the different implementations, see McClish D. K. (1989) <doi:10.1177/0272989X8900900307>, Vivo J.-M., Franco M. and Vicari D. (2018) <doi:10.1007/S11634-017-0295-9>, Jiang Y., et al (1996) <doi:10.1148/radiology.201.3.8939225>, Franco M. and Vivo J.-M. (2021) <doi:10.3390/math9212826> and Carrington, André M., et al (2020) <doi:10.1186/s12911-019-1014-6>.
License: GPL (≥ 3)
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: cli, dplyr, forcats, ggplot2, magrittr, purrr, rlang, stringr, SummarizedExperiment, tibble, tidyr
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://pablopnc.github.io/ROCnGO/, https://github.com/pabloPNC/ROCnGO
Depends: R (≥ 4.1.0)
BugReports: https://github.com/pabloPNC/ROCnGO/issues
NeedsCompilation: no
Packaged: 2025-07-14 16:59:55 UTC; infer
Author: Pablo Navarro [aut, cre, cph], Juana-María Vivo [aut], Manuel Franco [aut]
Maintainer: Pablo Navarro <pablo.navarrocarpio@gmail.com>
Repository: CRAN
Date/Publication: 2025-07-17 20:30:18 UTC

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).


Show chance line in a ROC plot

Description

Add the chance line (the diagonal where TPR equals FPR) to a ROC plot.

Usage

add_chance_line()

Value

A ggplot layer instance object.

Examples

plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
 add_chance_line()

Add FpAUC lower bound to a ROC plot

Description

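Calculate and plot the lower bound defined by the FpAUC sensitivity index.

add_fpauc_partially_proper_lower_bound() plots the lower bound when the curve shape is partially proper (presents some kind of hook). add_fpauc_concave_lower_bound() plots the lower bound when the curve is concave in the region of interest. add_fpauc_lower_bound() adds the lower bound based on the curve shape.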

Usage

add_fpauc_partially_proper_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  threshold,
  .condition = NULL,
  .label = NULL
)

add_fpauc_concave_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  threshold,
  .condition = NULL,
  .label = NULL
)

add_fpauc_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  threshold,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.
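
The collapsing of a multi-class response can be pictured with base R. This is a minimal sketch, not the package's internal code: the label "other" for the combined category and the variable names are hypothetical, chosen only for illustration.

```r
# Condition of interest; every other class is merged into one category.
condition <- "virginica"

# "other" is a hypothetical label for the combined "absence" category.
binary_response <- factor(
  ifelse(iris$Species == condition, condition, "other"),
  levels = c("other", condition)
)

# Two categories remain: the condition of interest and everything else.
table(binary_response)
```

In iris, the three species have 50 observations each, so the combined category holds the 100 observations that are not virginica.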

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

threshold

A number between 0 and 1, inclusive. It represents the lower TPR bound of the region in which the lower bound is calculated and plotted.

Because of how fp_auc() is defined, the upper bound of the region is fixed at 1.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot layer instance object.

Examples

# Add lower bound based on curve shape (Concave)
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_fpauc_lower_bound(
    data = iris,
    response = Species,
    predictor = Sepal.Width,
    threshold = 0.9
  )

Add a threshold line to a ROC plot

Description

Include a threshold line on a specified axis.

Usage

add_fpr_threshold_line(threshold)

add_tpr_threshold_line(threshold)

add_threshold_line(threshold, ratio = NULL)

Arguments

threshold

A number between 0 and 1, inclusive, which represents the bound of the region in which the partial area under the curve is calculated.

If ratio = "tpr", it is the lower bound of the TPR region, whose upper limit is 1.

If ratio = "fpr", it is the upper bound of the FPR region, whose lower limit is 0.
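
The way a single threshold maps to a region can be sketched as follows. region_bounds() is a hypothetical helper written for this illustration; it is not a function exported by the package.

```r
# Hypothetical helper: translate (threshold, ratio) into region bounds.
region_bounds <- function(threshold, ratio = c("tpr", "fpr")) {
  ratio <- match.arg(ratio)
  stopifnot(
    is.numeric(threshold), length(threshold) == 1,
    threshold >= 0, threshold <= 1
  )
  if (ratio == "tpr") {
    # High-sensitivity region: from the threshold up to 1.
    c(lower = threshold, upper = 1)
  } else {
    # High-specificity region: from 0 up to the threshold.
    c(lower = 0, upper = threshold)
  }
}

region_bounds(0.9, "tpr")
region_bounds(0.1, "fpr")
```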

ratio

Ratio (axis) on which to display the threshold.

  • If "tpr", the threshold will be displayed on the TPR (y) axis.

  • If "fpr", it will be displayed on the FPR (x) axis.

Value

A ggplot layer instance object.

Examples

# Add two threshold lines at TPR = 0.9 and FPR = 0.1
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_threshold_line(threshold = 0.9, ratio = "tpr") +
  add_threshold_line(threshold = 0.1, ratio = "fpr")
# Add a threshold line at TPR = 0.9
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_tpr_threshold_line(threshold = 0.9)
# Add a threshold line at FPR = 0.1
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_fpr_threshold_line(threshold = 0.1)

Add a section of a ROC curve to an existing one

Description

Add a specific region of a ROC curve to an existing ROC plot.

Usage

add_partial_roc_curve(
  data,
  response = NULL,
  predictor = NULL,
  ratio,
  threshold,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

threshold

A number between 0 and 1, inclusive, which represents the bound of the region in which the partial area under the curve is calculated.

If ratio = "tpr", it is the lower bound of the TPR region, whose upper limit is 1.

If ratio = "fpr", it is the upper bound of the FPR region, whose lower limit is 0.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot layer instance object.

Examples

plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_partial_roc_curve(
    iris,
    response = Species,
    predictor = Sepal.Length,
    ratio = "tpr",
    threshold = 0.9
  )

Add points in a section of a ROC curve to an existing plot

Description

Add points in a specific ROC region to an existing ROC plot.

Usage

add_partial_roc_points(
  data,
  response = NULL,
  predictor = NULL,
  ratio,
  threshold,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

threshold

A number between 0 and 1, inclusive, which represents the bound of the region in which the partial area under the curve is calculated.

If ratio = "tpr", it is the lower bound of the TPR region, whose upper limit is 1.

If ratio = "fpr", it is the upper bound of the FPR region, whose lower limit is 0.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot layer instance object.

Examples

plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_partial_roc_points(
    iris,
    response = Species,
    predictor = Sepal.Length,
    ratio = "tpr",
    threshold = 0.9
  )

Add a ROC curve plot to an existing one

Description

Add a ROC curve to an existing ROC plot.

Usage

add_roc_curve(
  data,
  response = NULL,
  predictor = NULL,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot layer instance object.

Examples

plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
 add_roc_curve(iris, response = Species, predictor = Sepal.Length)

Add ROC points plot to an existing one

Description

Add ROC points to an existing ROC plot.

Usage

add_roc_points(
  data,
  response = NULL,
  predictor = NULL,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot layer instance object.

Examples

plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
 add_roc_points(iris, response = Species, predictor = Sepal.Length)

Add TpAUC lower bound to a ROC plot

Description

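Calculate and plot the lower bound defined by the TpAUC specificity index.

Additionally, several lower-level functions are provided to plot specific lower bounds: add_tpauc_concave_lower_bound(), add_tpauc_partially_proper_lower_bound() and add_tpauc_under_chance_lower_bound().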

Usage

add_tpauc_concave_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  lower_threshold,
  upper_threshold,
  .condition = NULL,
  .label = NULL
)

add_tpauc_partially_proper_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  lower_threshold,
  upper_threshold,
  .condition = NULL,
  .label = NULL
)

add_tpauc_under_chance_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  lower_threshold,
  upper_threshold,
  .condition = NULL,
  .label = NULL
)

add_tpauc_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  lower_threshold,
  upper_threshold,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

lower_threshold, upper_threshold

Two numbers between 0 and 1, inclusive. They represent the lower and upper FPR bounds of the region in which the lower bound is calculated and plotted.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot layer instance object.

Examples

plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_tpauc_lower_bound(
    data = iris,
    response = Species,
    predictor = Sepal.Width,
    lower_threshold = 0,
    upper_threshold = 0.1
  )

Calculate area under ROC curve

Description

Calculates area under curve (AUC) of a predictor's ROC curve.

Usage

auc(data = NULL, response, predictor, .condition = NULL)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

Value

A numerical value representing the area under ROC curve.
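
For intuition, the area under an empirical ROC curve can be approximated in base R with the trapezoidal rule. The sketch below is not the package's implementation: roc_points() is a hypothetical helper, the data are a toy example, and higher predictor values are assumed to indicate the condition of interest.

```r
# Hypothetical helper: empirical ROC points, one per distinct cutoff.
roc_points <- function(response, predictor, condition) {
  is_pos <- response == condition
  # Cutoffs from highest to lowest; Inf makes the curve start at (0, 0).
  cutoffs <- c(Inf, sort(unique(predictor), decreasing = TRUE))
  tpr <- vapply(cutoffs, function(k) mean(predictor[is_pos] >= k), numeric(1))
  fpr <- vapply(cutoffs, function(k) mean(predictor[!is_pos] >= k), numeric(1))
  data.frame(fpr = fpr, tpr = tpr)
}

# Toy data: 1 marks the condition of interest.
response  <- c(0, 0, 0, 1, 1, 1)
predictor <- c(1, 2, 4, 3, 5, 6)
pts <- roc_points(response, predictor, condition = 1)

# Trapezoidal rule over consecutive ROC points.
auc_trapezoid <- sum(diff(pts$fpr) * (head(pts$tpr, -1) + tail(pts$tpr, -1)) / 2)

# Cross-check: proportion of (positive, negative) pairs ranked correctly.
auc_pairs <- mean(outer(predictor[response == 1], predictor[response == 0], ">"))
```

With no tied predictor values, both quantities agree (8/9 for this toy data).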

Examples

# Calc AUC of Sepal.Width as a classifier of setosa species
auc(iris, Species, Sepal.Width)
# Change class to predict to virginica
auc(iris, Species, Sepal.Width, .condition = "virginica")

Calculate curve shape over a specific region

Description

calc_curve_shape() calculates ROC curve shape over a specified region.
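
How the shape is classified is not detailed on this page. As an illustration only, one ingredient such a classification could rely on is a concavity check over the ROC points in the region: a piecewise-linear curve is concave when the slopes of consecutive segments never increase. is_concave() below is a hypothetical helper that assumes strictly increasing FPR values; it is not the package's algorithm.

```r
# Hypothetical helper: TRUE when segment slopes are non-increasing.
# Assumes fpr is strictly increasing (no vertical steps).
is_concave <- function(fpr, tpr, tol = 1e-12) {
  slopes <- diff(tpr) / diff(fpr)
  all(diff(slopes) <= tol)
}

# Slopes 5, 1, 1/3: always decreasing, so the curve is concave.
concave_case <- is_concave(c(0, 0.1, 0.4, 1), c(0, 0.5, 0.8, 1))

# Slopes 0.5, 2, 0.6: the slope increases once, so the curve has a hook.
hook_case <- is_concave(c(0, 0.2, 0.5, 1), c(0, 0.1, 0.7, 1))
```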

Usage

calc_curve_shape(
  data = NULL,
  response = NULL,
  predictor = NULL,
  lower_threshold,
  upper_threshold,
  ratio,
  .condition = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

lower_threshold, upper_threshold

Two numbers between 0 and 1, inclusive. They represent the lower and upper bounds of the region over which calculations are applied.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

Value

A string indicating the ROC curve shape in the specified region.

Examples

# Calc ROC curve shape of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
calc_curve_shape(iris, Species, Sepal.Width, 0.9, 1, "tpr")
# Change class to virginica
calc_curve_shape(iris, Species, Sepal.Width, 0.9, 1, "tpr", .condition = "virginica")

Calculate ROC curve partial points

Description

Calculates a series of (FPR, TPR) pairs which correspond to ROC curve points in a specified region.
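
Conceptually, this amounts to computing the ROC points and keeping those inside the region. The base R sketch below illustrates the idea on toy data; roc_points() is a hypothetical helper, higher predictor values are assumed to indicate the condition, and the package may treat region boundaries differently (for example, by interpolating points at the bounds).

```r
# Hypothetical helper: empirical ROC points, one per distinct cutoff.
roc_points <- function(response, predictor, condition) {
  is_pos <- response == condition
  cutoffs <- c(Inf, sort(unique(predictor), decreasing = TRUE))
  tpr <- vapply(cutoffs, function(k) mean(predictor[is_pos] >= k), numeric(1))
  fpr <- vapply(cutoffs, function(k) mean(predictor[!is_pos] >= k), numeric(1))
  data.frame(fpr = fpr, tpr = tpr)
}

# Toy data: 1 marks the condition of interest.
response  <- c(0, 0, 0, 1, 1, 1)
predictor <- c(1, 2, 4, 3, 5, 6)
pts <- roc_points(response, predictor, condition = 1)

# Keep only the points whose TPR lies in [0.9, 1].
partial_pts <- pts[pts$tpr >= 0.9 & pts$tpr <= 1, ]
```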

Usage

calc_partial_roc_points(
  data = NULL,
  response = NULL,
  predictor = NULL,
  lower_threshold,
  upper_threshold,
  ratio,
  .condition = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

lower_threshold, upper_threshold

Two numbers between 0 and 1, inclusive. They represent the lower and upper bounds of the region over which calculations are applied.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

Value

A tibble with two columns, containing the FPR and TPR values of the ROC curve points in the specified region.

Examples

# Calc ROC points of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
calc_partial_roc_points(
 iris,
 response = Species,
 predictor = Sepal.Width,
 lower_threshold = 0.9,
 upper_threshold = 1,
 ratio = "tpr"
)

# Change class to virginica
calc_partial_roc_points(
 iris,
 response = Species,
 predictor = Sepal.Width,
 lower_threshold = 0.9,
 upper_threshold = 1,
 ratio = "tpr",
 .condition = "virginica"
)

Concordance indexes

Description

Concordance-derived indexes allow the calculation and interpretation of the area under the ROC curve in a specific region. They take a dual perspective, since they consider both the TPR and FPR ranges which enclose the region of interest.

cp_auc() computes the concordant partial area under curve (CpAUC), while ncp_auc() computes its normalized version by dividing by the total area.
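
The dual perspective can be illustrated with a small base R sketch. Following the definition in Carrington et al. (2020), the concordant partial area averages a vertical partial area (TPR integrated over the FPR range) and a horizontal partial area (1 - FPR integrated over the matching TPR range). The code below is an illustration on a toy piecewise-linear curve whose region bounds coincide with curve points; the variable names are hypothetical and this is not the package's implementation.

```r
# Toy ROC curve (piecewise linear, strictly increasing in both axes).
fpr <- c(0, 0.1, 0.4, 1)
tpr <- c(0, 0.5, 0.8, 1)

# Trapezoidal rule over consecutive points.
trapezoid <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Region of interest: FPR in [0, 0.1]. The matching TPR range, read off
# the curve at those FPR values, is [0, 0.5].
x_rng <- c(0, 0.1)
y_rng <- c(0, 0.5)

# Vertical partial area: integrate TPR over the FPR range.
keep_x <- fpr >= x_rng[1] & fpr <= x_rng[2]
pauc_vertical <- trapezoid(fpr[keep_x], tpr[keep_x])

# Horizontal partial area: integrate (1 - FPR) over the TPR range.
keep_y <- tpr >= y_rng[1] & tpr <= y_rng[2]
pauc_horizontal <- trapezoid(tpr[keep_y], 1 - fpr[keep_y])

# Concordant partial area: average of both perspectives.
cp_auc_sketch <- 0.5 * pauc_vertical + 0.5 * pauc_horizontal
```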

Usage

cp_auc(
  data = NULL,
  response,
  predictor,
  lower_threshold,
  upper_threshold,
  ratio,
  .condition = NULL
)

ncp_auc(
  data = NULL,
  response,
  predictor,
  lower_threshold,
  upper_threshold,
  ratio,
  .condition = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

lower_threshold, upper_threshold

Two numbers between 0 and 1, inclusive. They represent the lower and upper bounds of the region over which calculations are applied.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

Value

A numeric value representing index score for the partial area under ROC curve.

References

Carrington, André M., et al. A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. BMC medical informatics and decision making 20 (2020): 1-12.

Examples

# Calculate cp_auc of Sepal.Width as a classifier of setosa species in
# FPR = (0, 0.1)
cp_auc(
  iris,
  response = Species,
  predictor = Sepal.Width,
  lower_threshold = 0,
  upper_threshold = 0.1,
  ratio = "fpr"
)
# Calculate ncp_auc of Sepal.Width as a classifier of setosa species in
# FPR = (0, 0.1)
ncp_auc(
  iris,
  response = Species,
  predictor = Sepal.Width,
  lower_threshold = 0,
  upper_threshold = 0.1,
  ratio = "fpr"
)

Hide legend in a ROC plot

Description

Hide the legend showing the names of plotted classifiers and bounds in a ROC curve plot.

Usage

hide_legend()

Value

A ggplot theme object.


Add NpAUC lower bound to a ROC plot

Description

Calculate and plot the lower bound defined by the NpAUC sensitivity index.

Usage

add_npauc_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  threshold,
  .condition = NULL,
  .label = NULL
)

add_npauc_normalized_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  threshold,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

threshold

A number between 0 and 1, inclusive. It represents the lower TPR bound of the region in which the lower bound is calculated and plotted.

Because of how np_auc() is defined, the upper bound of the region is fixed at 1.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot layer instance object.

Examples

plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_npauc_lower_bound(
    iris,
    response = Species,
    predictor = Sepal.Width,
    threshold = 0.9
  )

Calculate partial area under curve

Description

Calculates the area under the ROC curve in a specific TPR or FPR region.

Usage

pauc(
  data = NULL,
  response,
  predictor,
  ratio,
  lower_threshold,
  upper_threshold,
  .condition = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

lower_threshold, upper_threshold

Two numbers between 0 and 1, inclusive. They represent the lower and upper bounds of the region over which calculations are applied.

.condition

A value from response that represents the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the response type.

Once the class of interest is selected, the remaining classes will be collapsed into a common category, representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

Value

A numeric value representing the area under ROC curve in the specified region.
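
As an illustration of the quantity being computed, the partial area over an FPR range can be sketched in base R by interpolating the curve at the region bounds and applying the trapezoidal rule. partial_auc_fpr() is a hypothetical helper, not the package's implementation, and it assumes a toy curve with strictly increasing FPR values (no vertical steps).

```r
# Hypothetical helper: area under a piecewise-linear ROC curve for
# FPR in [lower, upper]. Assumes fpr is strictly increasing.
partial_auc_fpr <- function(fpr, tpr, lower, upper) {
  # TPR at the region bounds, by linear interpolation.
  bounds_tpr <- approx(fpr, tpr, xout = c(lower, upper))$y
  # Curve points strictly inside the region.
  inside <- fpr > lower & fpr < upper
  x <- c(lower, fpr[inside], upper)
  y <- c(bounds_tpr[1], tpr[inside], bounds_tpr[2])
  # Trapezoidal rule over the clipped curve.
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Toy ROC curve.
fpr <- c(0, 0.1, 0.4, 1)
tpr <- c(0, 0.5, 0.8, 1)

pauc_sketch <- partial_auc_fpr(fpr, tpr, lower = 0, upper = 0.2)
full_auc    <- partial_auc_fpr(fpr, tpr, lower = 0, upper = 1)
```

Setting the bounds to 0 and 1 recovers the full area under the toy curve, which is a quick sanity check for this kind of helper.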

Examples

# Calculate pauc of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
pauc(
  iris,
  response = Species,
  predictor = Sepal.Width,
  ratio = "tpr",
  lower_threshold = 0.9,
  upper_threshold = 1
)
# Calculate pauc of Sepal.Width as a classifier of setosa species
# in FPR = (0, 0.1)
pauc(
  iris,
  response = Species,
  predictor = Sepal.Width,
  ratio = "fpr",
  lower_threshold = 0,
  upper_threshold = 0.1
)

Plot a section of a classifier ROC curve

Description

Create a curve plot using points in a specific region of a ROC curve.

Usage

plot_partial_roc_curve(
  data,
  response = NULL,
  predictor = NULL,
  ratio,
  threshold,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

threshold

A number between 0 and 1, inclusive, which represents the bound of the region where the partial area under the curve is calculated.

If ratio = "tpr", it represents the lower bound of the TPR region, whose upper limit is 1.

If ratio = "fpr", it represents the upper bound of the FPR region, whose lower limit is 0.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot object.

Examples

plot_partial_roc_curve(
 iris,
 response = Species,
 predictor = Sepal.Width,
 ratio = "tpr",
 threshold = 0.9
)
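The way ratio and threshold jointly define the plotted region can be sketched in base R as follows. The helper region_bounds() is hypothetical and only mirrors the argument descriptions above; it is not a function of the package.

```r
# Hypothetical helper (not part of ROCnGO): translate ratio and threshold
# into the lower and upper bounds of the region they describe.
region_bounds <- function(ratio = c("tpr", "fpr"), threshold) {
  ratio <- match.arg(ratio)
  stopifnot(is.numeric(threshold), threshold >= 0, threshold <= 1)
  if (ratio == "tpr") {
    # threshold is the lower TPR bound; the upper bound is fixed at 1
    c(lower = threshold, upper = 1)
  } else {
    # threshold is the upper FPR bound; the lower bound is fixed at 0
    c(lower = 0, upper = threshold)
  }
}

region_bounds("tpr", 0.9)
region_bounds("fpr", 0.1)
```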

Plot points in a region of a ROC curve

Description

Create a scatter plot using points in a specific region of the ROC curve.

Usage

plot_partial_roc_points(
  data,
  response = NULL,
  predictor = NULL,
  ratio,
  threshold,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

threshold

A number between 0 and 1, inclusive, which represents the bound of the region where the partial area under the curve is calculated.

If ratio = "tpr", it represents the lower bound of the TPR region, whose upper limit is 1.

If ratio = "fpr", it represents the upper bound of the FPR region, whose lower limit is 0.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot object.

Examples

plot_partial_roc_points(
 iris,
 response = Species,
 predictor = Sepal.Width,
 ratio = "tpr",
 threshold = 0.9
)

Plot a classifier ROC curve

Description

Create a curve plot using ROC curve points.

Usage

plot_roc_curve(
  data,
  response = NULL,
  predictor = NULL,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot object.

Examples

plot_roc_curve(iris, response = Species, predictor = Sepal.Width)

Plot classifier points of a ROC curve

Description

Create a scatter plot using ROC curve points.

Usage

plot_roc_points(
  data,
  response = NULL,
  predictor = NULL,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Value

A ggplot object.

Examples

plot_roc_points(iris, response = Species, predictor = Sepal.Width)

Establish condition of interest as 1 and absence as 0.

Description

Transforms levels in a factor to 1 if they match the condition of interest (condition) or 0 otherwise (absent).

Usage

reorder_response_factor(response_fct, condition, absent)

Arguments

response_fct

A factor with different categories (levels).

condition

Name of category being the condition of interest.

absent

Character vector of categories not corresponding to the condition of interest.

Value

A factor with values (0, 1), where 1 matches the condition of interest.
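
The recoding described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's implementation; the variable names are chosen for the example.

```r
# Collapse a multi-level factor into 1 (condition of interest) and 0 (absence).
response_fct <- iris$Species
condition <- "virginica"

# TRUE/FALSE comparison becomes 1/0, then a factor with fixed levels 0 and 1
binary <- factor(as.integer(response_fct == condition), levels = c(0, 1))

table(binary)
```

Here "setosa" and "versicolor" both end up in the 0 category, while "virginica" is coded as 1.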


Calculate ROC curve points

Description

Calculates a series of (FPR, TPR) pairs, which correspond to the points displayed by a ROC curve. The "false positive ratio" (FPR) is represented on the x axis and the "true positive ratio" (TPR) on the y axis.

Usage

roc_points(data = NULL, response, predictor, .condition = NULL)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

Value

A tibble with two columns, holding the FPR and TPR values of each ROC curve point.

Examples

# Calc ROC points of Sepal.Width as a classifier of setosa species
roc_points(iris, Species, Sepal.Width)
# Change class to predict to virginica
roc_points(iris, Species, Sepal.Width, .condition = "virginica")
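
For readers who want to see how (FPR, TPR) pairs arise, here is a minimal base R sketch. It assumes that higher predictor values indicate the condition of interest and that every observed value is used as a cut-off; the helper roc_sketch() is hypothetical, and roc_points() may differ in ordering, tie handling and output format.

```r
# Hypothetical helper (not part of ROCnGO): one (FPR, TPR) pair per cut-off.
roc_sketch <- function(score, is_positive) {
  # Inf gives the (0, 0) point; the smallest score gives the (1, 1) point
  thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
  tpr <- vapply(
    thresholds,
    function(thr) mean(score[is_positive] >= thr),
    numeric(1)
  )
  fpr <- vapply(
    thresholds,
    function(thr) mean(score[!is_positive] >= thr),
    numeric(1)
  )
  data.frame(fpr = fpr, tpr = tpr)
}

pts <- roc_sketch(iris$Sepal.Width, iris$Species == "setosa")
head(pts)
```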

Sensitivity indexes

Description

Sensitivity indexes provide different ways of calculating the area under the ROC curve in a specific TPR region. Two different approaches to calculate this area are available: fp_auc() and np_auc().

Usage

fp_auc(data = NULL, response, predictor, lower_tpr, .condition = NULL)

np_auc(data, response, predictor, lower_tpr, .condition = NULL)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

lower_tpr

A numeric value between 0 and 1, inclusive, which represents lower value of TPR for the region where to calculate the partial area under curve.

Because of definition of sensitivity indexes, upper bound of the region will be established as 1.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

Value

A numeric value representing the index score for the partial area under ROC curve.

References

Franco M. and Vivo J.-M. Evaluating the Performances of Biomarkers over a Restricted Domain of High Sensitivity. Mathematics 9, 2826 (2021).

Jiang Y., Metz C. E. and Nishikawa R. M. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 201, 745-750 (1996).

Examples

# Calculate fp_auc of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
fp_auc(iris, response = Species, predictor = Sepal.Width, lower_tpr = 0.9)
# Calculate np_auc of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
np_auc(iris, response = Species, predictor = Sepal.Width, lower_tpr = 0.9)
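
To give an idea of what a normalized sensitivity index measures, the sketch below follows the partial area index of Jiang et al. (1996): the area under the ROC curve and above the line TPR = lower_tpr, divided by (1 - lower_tpr). The helper names and toy points are hypothetical, the curve is assumed piecewise linear, and np_auc() may be implemented differently.

```r
# Hypothetical helper (not part of ROCnGO): area between a piecewise-linear
# ROC curve and the horizontal line TPR = lower_tpr.
# Assumes fpr and tpr are both sorted in non-decreasing order.
area_above_tpr <- function(fpr, tpr, lower_tpr) {
  area <- 0
  for (i in seq_len(length(fpr) - 1)) {
    x0 <- fpr[i]
    x1 <- fpr[i + 1]
    y0 <- tpr[i]
    y1 <- tpr[i + 1]
    # Skip vertical steps and segments lying entirely on or below the line
    if (x1 == x0 || y1 <= lower_tpr) next
    if (y0 < lower_tpr) {
      # Segment crosses the line: start integrating at the crossing point
      x0 <- x0 + (lower_tpr - y0) * (x1 - x0) / (y1 - y0)
      y0 <- lower_tpr
    }
    area <- area + (x1 - x0) * ((y0 - lower_tpr) + (y1 - lower_tpr)) / 2
  }
  area
}

# Normalization assumed from Jiang et al. (1996)
np_auc_sketch <- function(fpr, tpr, lower_tpr) {
  area_above_tpr(fpr, tpr, lower_tpr) / (1 - lower_tpr)
}

# Toy ROC points, made up for illustration
fpr <- c(0, 0, 0.5, 1)
tpr <- c(0, 0.5, 1, 1)
np_auc_sketch(fpr, tpr, lower_tpr = 0.9)
```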

Specificity indexes

Description

Specificity indexes provide different ways of calculating the area under the ROC curve in a specific FPR region. Two different approaches to calculate this area are available: sp_auc() and tp_auc().

Usage

sp_auc(
  data = NULL,
  response,
  predictor,
  lower_fpr,
  upper_fpr,
  .condition = NULL,
  .invalid = FALSE
)

tp_auc(
  data = NULL,
  response,
  predictor,
  lower_fpr,
  upper_fpr,
  .condition = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

lower_fpr, upper_fpr

Two numbers between 0 and 1, inclusive. These numbers represent lower and upper values of FPR region where to calculate partial area under curve.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.invalid

If FALSE, the default, sp_auc() will return NA when ROC curve does not fit theoretical bounds and index cannot be applied. If TRUE, function will force the calculation and return a value despite probably being incorrect.

Value

A numeric value representing the index score for the partial area under ROC curve.

References

McClish D. K. Analyzing a Portion of the ROC Curve. Medical Decision Making 9, 190-195 (1989).

Vivo J.-M., Franco M. and Vicari D. Rethinking an ROC partial area index for evaluating the classification performance at a high specificity range. Advances in Data Analysis and Classification 12, 683-704 (2018).

Examples

# Calculate sp_auc of Sepal.Width as a classifier of setosa species
# in FPR = (0, 0.1)
sp_auc(
 iris,
 response = Species,
 predictor = Sepal.Width,
 lower_fpr = 0,
 upper_fpr = 0.1
)
# Calculate tp_auc of Sepal.Width as a classifier of setosa species
# in FPR = (0, 0.1)
tp_auc(
 iris,
 response = Species,
 predictor = Sepal.Width,
 lower_fpr = 0,
 upper_fpr = 0.1
)
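
The standardization proposed by McClish (1989) can be sketched in base R as follows. It rescales a partial area so that a curve on the chance line scores 0.5 and a perfect classifier scores 1. The helper standardize_pauc() is hypothetical and the formula is taken from the cited reference; sp_auc() may differ in its details.

```r
# Hypothetical helper (not part of ROCnGO): McClish-style standardization
# of a partial area computed over FPR in [lower_fpr, upper_fpr].
standardize_pauc <- function(pauc, lower_fpr, upper_fpr) {
  min_area <- (upper_fpr^2 - lower_fpr^2) / 2 # area under the chance line
  max_area <- upper_fpr - lower_fpr           # area under a perfect curve
  (1 + (pauc - min_area) / (max_area - min_area)) / 2
}

standardize_pauc(0.055, lower_fpr = 0, upper_fpr = 0.1) # above chance
standardize_pauc(0.005, lower_fpr = 0, upper_fpr = 0.1) # on the chance line
standardize_pauc(0.100, lower_fpr = 0, upper_fpr = 0.1) # perfect classifier
```

When the partial area falls below the area under the chance line, this formula yields a value under 0.5, which is the kind of situation the .invalid argument refers to.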

Add SpAUC lower bound to a ROC plot

Description

Calculate and plot lower bound defined by SpAUC specificity index.

Usage

add_spauc_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  lower_threshold,
  upper_threshold,
  .condition = NULL,
  .label = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

lower_threshold, upper_threshold

Two numbers between 0 and 1, inclusive. These numbers represent lower and upper bounds of the region where to apply calculations.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.label

A string representing the name used in labels.

If NULL, variable name from predictor will be used as label.

Details

SpAUC has a limitation regarding its lower bound: the lower bound defined by this index cannot be applied to sections where the ROC curve lies below the chance line.

add_spauc_lower_bound() does not check whether the index can be safely applied. Consequently, it can draw the lower bound even when SpAUC could not be calculated in the region.

Value

A ggplot layer instance object.

Examples

plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_spauc_lower_bound(
    iris,
    response = Species,
    predictor = Sepal.Width,
    lower_threshold = 0,
    upper_threshold = 0.1
  )

Transform data in a SummarizedExperiment to a data.frame

Description

Transforms a SummarizedExperiment into a data.frame which can be used as input for other functions.

Usage

sumexp_to_df(se, .n = NULL)

Arguments

se

A SummarizedExperiment object.

.n

An integer or string, representing the index or name of the assay to use. Same as i in SummarizedExperiment::assay() function.

By default, function combines every assay in se argument.

Value

A data.frame created from combining assays and colData in a SummarizedExperiment.


Summarize classifiers performance in a dataset

Description

Calculate a series of metrics describing global and local performance for selected classifiers in a dataset.

Usage

summarize_dataset(
  data,
  predictors = NULL,
  response,
  ratio,
  threshold,
  .condition = NULL,
  .progress = FALSE
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

predictors

A vector of numeric data variables which represents the different classifiers or predictors in data to be summarized.

If NULL, the default, predictors will match all numeric variables in data, excluding response if it is numeric.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

threshold

A number between 0 and 1, inclusive, which represents the bound of the region where the partial area under the curve is calculated.

If ratio = "tpr", it represents the lower bound of the TPR region, whose upper limit is 1.

If ratio = "fpr", it represents the upper bound of the FPR region, whose lower limit is 0.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

.progress

If TRUE, show progress of calculations.

Value

A list with different elements:

Examples

summarize_dataset(iris, response = Species, ratio = "tpr", threshold = 0.9)

Summarize classifier performance

Description

Calculates a series of metrics describing global and local classifier performance.

Usage

summarize_predictor(
  data = NULL,
  predictor,
  response,
  ratio,
  threshold,
  .condition = NULL
)

Arguments

data

A data.frame or extension (e.g. a tibble) containing values for predictors and response variables.

predictor

A data variable which must be numeric, representing values of a classifier or predictor for each observation.

response

A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard).

If the variable presents more than two possible outcomes, classes or categories:

  • The outcome of interest (the one to be predicted) will remain distinct.

  • All other categories will be combined into a single category.

New combined category represents the "absence" of the condition to predict. See .condition for more information.

ratio

Ratio or axis where to apply calculations.

  • If "tpr", only points within the specified region of TPR, y axis, will be considered for calculations.

  • If "fpr", only points within the specified region of FPR, x axis, will be considered for calculations.

threshold

A number between 0 and 1, inclusive, which represents the bound of the region where the partial area under the curve is calculated.

If ratio = "tpr", it represents the lower bound of the TPR region, whose upper limit is 1.

If ratio = "fpr", it represents the upper bound of the FPR region, whose lower limit is 0.

.condition

A value from response representing the class, category or condition of interest to be predicted.

If NULL, the condition of interest will be selected automatically depending on the type of response.

Once the class of interest is selected, the remaining classes will be collapsed into a single category representing the "absence" of the condition to be predicted.

See vignette("selecting-condition") for further information on how automatic selection is performed and details on selecting the condition of interest.

Value

A single-row tibble for the predictor, with its performance metrics as columns.

Examples

# Summarize Sepal.Width as a classifier of setosa species
# and local performance in TPR (0.9, 1)
summarize_predictor(
 data = iris,
 predictor = Sepal.Width,
 response = Species,
 ratio = "tpr",
 threshold = 0.9
)
# Summarize Sepal.Width as a classifier of setosa species
# and local performance in FPR (0, 0.1)
summarize_predictor(
 data = iris,
 predictor = Sepal.Width,
 response = Species,
 ratio = "fpr",
 threshold = 0.1
)

Transforms a response variable into a valid factor that can be processed downstream.

Description

transform_response transforms response so that it can be processed in further steps. Function transforms input into a factor of values 1 and 0 corresponding to the condition of interest and absence of it respectively.

Usage

transform_response(response, .condition = NULL)

Arguments

response

A factor, integer or character vector of categories.

Details

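By default, the function makes some assumptions about how to perform the transformation, depending on the class of response. See vignette("selecting-condition") for details on how the condition of interest is selected automatically.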

Value

A factor with levels (0, 1), where 1 represents the condition of interest and 0 the absence of it.