| Title: | ROC-Optimizing Binary Classifiers |
| Version: | 0.1.3 |
| Description: | Implements ROC (Receiver Operating Characteristic)–Optimizing Binary Classifiers, supporting both linear and kernel models. Both model types provide a variety of surrogate loss functions. In addition, linear models offer multiple regularization penalties, whereas kernel models support a range of kernel functions. Scalability for large datasets is achieved through approximation-based options, which accelerate training and make fitting feasible on large data. Utilities are provided for model training, prediction, and cross-validation. The implementation builds on the ROC-Optimizing Support Vector Machines. For more information, see Hernàndez-Orallo, José, et al. (2004) <doi:10.1145/1046456.1046489>, presented in the ROC Analysis in AI Workshop (ROCAI-2004). |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Imports: | stats, graphics, utils, ggplot2, fastDummies, kernlab, pracma, rsample, dplyr, caret |
| RoxygenNote: | 7.3.2 |
| Suggests: | mlbench, knitr, rmarkdown, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| URL: | https://github.com/gimunBae/roclab |
| BugReports: | https://github.com/gimunBae/roclab/issues |
| NeedsCompilation: | no |
| Packaged: | 2025-10-23 03:43:18 UTC; bgd55 |
| Author: | Gimun Bae [aut, cre], Seung Jun Shin [aut] |
| Maintainer: | Gimun Bae <gimunbae0201@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-10-28 08:20:02 UTC |
utils-internal.R - Internal utilities for ROC-SVM
Description
These helper functions are used only inside the package (not exported). They handle pairwise difference construction, Adamax-based optimization, proximal updates for penalties, intercept estimation, and lambda max.
Usage
X.func.complete(X.plus, X.minus)
Generic function for AUC
Description
Compute AUC (Area Under the ROC Curve) for a fitted model.
Dispatches to class-specific methods such as auc.roclearn.
Usage
auc(object, ...)
Arguments
object |
A fitted model object. |
... |
Additional arguments passed to methods. |
Value
Numeric scalar: estimated AUC.
Compute AUC for a fitted kernel model
Description
Estimate the AUC (Area Under the ROC Curve) for a fitted kernel model on new data.
Usage
## S3 method for class 'kroclearn'
auc(object, newdata, y, ...)
Arguments
object |
A fitted model object of class |
newdata |
A matrix or data.frame of test predictors. Must have the same structure as the training data (categorical variables are dummy-aligned automatically). |
y |
Response vector of test labels ({-1, 1} or convertible). |
... |
Not used. |
Value
A numeric scalar giving the estimated AUC.
Examples
set.seed(123)
n_train <- 100
r_train <- sqrt(runif(n_train, 0.05, 1))
theta_train <- runif(n_train, 0, 2*pi)
X_train <- cbind(r_train * cos(theta_train), r_train * sin(theta_train))
y_train <- ifelse(r_train < 0.5, 1, -1)
n_test <- 10
r_test <- sqrt(runif(n_test, 0.05, 1))
theta_test <- runif(n_test, 0, 2*pi)
X_test <- cbind(r_test * cos(theta_test), r_test * sin(theta_test))
y_test <- ifelse(r_test < 0.5, 1, -1)
fit <- kroclearn(X_train, y_train, lambda = 0.1,
kernel = "radial", approx=TRUE)
auc(fit, X_test, y_test)
Compute AUC for a fitted linear model
Description
Estimate the AUC (Area Under the ROC Curve) for a fitted linear model on new data.
Usage
## S3 method for class 'roclearn'
auc(object, newdata, y, ...)
Arguments
object |
A fitted model object of class |
newdata |
A matrix or data.frame of test predictors. Must have the same structure as the training data (categorical variables are dummy-aligned automatically). |
y |
Response vector of test labels ({-1, 1} or convertible). |
... |
Not used. |
Value
A numeric scalar giving the estimated AUC.
Examples
set.seed(123)
n_train <- 100
n_pos <- round(0.2 * n_train)
n_neg <- n_train - n_pos
X_train <- rbind(
matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y_train <- c(rep(-1, n_neg), rep(1, n_pos))
n_test <- 10
n_pos_test <- round(0.2 * n_test)
n_neg_test <- n_test - n_pos_test
X_test <- rbind(
matrix(rnorm(2 * n_neg_test, mean = -1), ncol = 2),
matrix(rnorm(2 * n_pos_test, mean = 1), ncol = 2)
)
y_test <- c(rep(-1, n_neg_test), rep(1, n_pos_test))
fit <- roclearn(X_train, y_train, lambda = 0.1, approx=TRUE)
auc(fit, X_test, y_test)
Cross-validation for kernel models
Description
Perform k-fold cross-validation over a sequence of \lambda values and
select the optimal model based on AUC.
Usage
cv.kroclearn(
X,
y,
lambda.vec = NULL,
lambda.length = 30,
kernel = "radial",
param.kernel = NULL,
loss = "hinge",
approx = NULL,
intercept = TRUE,
nfolds = 10,
target.perf = list(),
param.convergence = list()
)
Arguments
X |
Predictor matrix or data.frame (categorical variables are automatically one-hot encoded). |
y |
Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format. |
lambda.vec |
Optional numeric vector of regularization parameters (lambda values).
If |
lambda.length |
Number of |
kernel |
Kernel type: |
param.kernel |
Kernel-specific parameter:
|
loss |
Surrogate loss function type. One of:
|
approx |
Logical; enables a scalable approximation to accelerate training.
The default is |
intercept |
Logical; include an intercept in the model (default |
nfolds |
Number of cross-validation folds (default 10). |
target.perf |
List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each). |
param.convergence |
List of convergence controls (e.g., |
Value
An object of class "cv.kroclearn" with:
-
optimal.lambda— selected\lambda. -
optimal.fit— model trained atoptimal.lambda. -
lambda.vec— grid of penalty values considered. -
auc.mean,auc.sd— mean and sd of cross-validated AUC. -
auc.result— fold-by-lambda AUC matrix. -
time.mean,time.sd— mean and sd of training time. -
time.result— fold-by-lambda training time matrix. -
nfolds,loss,kernel— settings.
See Also
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
cvfit <- cv.kroclearn(
X, y,
lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
kernel = "radial",
approx=TRUE, nfolds = 2
)
cvfit$optimal.lambda
Cross-validation for linear models
Description
Perform k-fold cross-validation over a sequence of \lambda values and
select the optimal model based on AUC.
Usage
cv.roclearn(
X,
y,
lambda.vec = NULL,
lambda.length = 30,
penalty = "ridge",
param.penalty = NULL,
loss = "hinge",
approx = NULL,
intercept = TRUE,
nfolds = 10,
target.perf = list(),
param.convergence = list()
)
Arguments
X |
Predictor matrix or data.frame (categorical variables are automatically one-hot encoded). |
y |
Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format. |
lambda.vec |
Optional numeric vector of regularization parameters (lambda values).
If |
lambda.length |
Number of |
penalty |
Regularization penalty type: |
param.penalty |
Penalty-specific parameter:
|
loss |
Surrogate loss function type. One of:
|
approx |
Logical; enables a scalable approximation to accelerate training.
The default is |
intercept |
Logical; include an intercept in the model (default |
nfolds |
Number of cross-validation folds (default 10). |
target.perf |
List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each). |
param.convergence |
List of convergence controls (e.g., |
Value
An object of class "cv.roclearn" with:
-
optimal.lambda— selected\lambda. -
optimal.fit— model refit on the full data atoptimal.lambda. -
lambda.vec— grid of penalty values considered. -
auc.mean,auc.sd— mean and sd of cross-validated AUC. -
auc.result— fold-by-lambda AUC matrix. -
time.mean,time.sd— mean and sd of training time. -
time.result— fold-by-lambda training time matrix. -
nfolds,loss,penalty— settings.
See Also
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
cvfit <- cv.roclearn(
X, y,
lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
approx=TRUE, nfolds = 2
)
cvfit$optimal.lambda
Fit a kernel model
Description
Fit a kernel model
Usage
kroclearn(
X,
y,
lambda,
kernel = "radial",
param.kernel = NULL,
loss = "hinge",
approx = NULL,
intercept = TRUE,
target.perf = list(),
param.convergence = list()
)
Arguments
X |
Predictor matrix or data.frame (categorical variables are automatically one-hot encoded). |
y |
Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format. |
lambda |
Positive scalar regularization parameter. |
kernel |
Kernel type: |
param.kernel |
Kernel-specific parameter:
|
loss |
Surrogate loss function type. One of:
|
approx |
Logical; enables a scalable approximation to accelerate training.
The default is |
intercept |
Logical; include an intercept in the model (default |
target.perf |
List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each). |
param.convergence |
List of convergence controls (e.g., |
Details
For large-scale data, the model is computationally prohibitive because its
loss is a U-statistic involving a double summation. To reduce this burden,
the package adopts an efficient algorithm based on an incomplete U-statistic,
which approximates the loss with a single summation. In kernel models,
a Nyström low-rank approximation is further applied to efficiently compute
the kernel matrix. These approximations substantially reduce computational
cost and accelerate training, while maintaining accuracy, making the model
feasible for large-scale datasets. This option is available when
@param approx = TRUE.
Value
An object of class "kroclearn", a list containing:
-
theta.hat— estimated dual coefficient vector. -
intercept— fitted intercept (if applicable). -
lambda,kernel,param.kernel,loss. -
approx,B(number of sampled pairs if approximation used). -
time— training time (seconds). -
nobs,p— number of observations and predictors. -
converged,n.iter— convergence information. -
kfunc— kernel function object. -
nystrom— low rank kernel approximation details (if used). -
X— training data (post-preprocessing). -
preprocessing— details on categorical variables, removed columns, and column names. -
call— the function call.
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
fit <- kroclearn(X, y, lambda = 0.1, kernel = "radial", approx=TRUE)
Visualize Cross-Validation results for kernel models
Description
Produce a visualization of cross-validation results from a fitted
cv.kroclearn object. The plot shows the mean AUC across
regularization parameters \lambda, with error bars reflecting
the cross-validation standard deviation. Optionally, the selected
optimal \lambda is highlighted with a dashed line and marker.
Usage
## S3 method for class 'cv.kroclearn'
plot(x, highlight = TRUE, ...)
Arguments
x |
A cross-validation object of class |
highlight |
Logical; if |
... |
Additional arguments passed to underlying ggplot2 functions. |
Details
This function is a method for the generic plot() function,
designed specifically for cross-validation objects from
cv.kroclearn. The x-axis is displayed on a log scale for
\lambda, and the y-axis represents AUC values. Error bars show
variability across folds. This is the kernel counterpart of
plot.cv.roclearn.
Value
A ggplot2 object is returned and drawn to the current device.
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
cvfit <- cv.kroclearn(
X, y,
lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
kernel = "radial",
approx=TRUE, nfolds = 2
)
plot(cvfit)
Visualize Cross-Validation results for linear models
Description
Produce a visualization of cross-validation results from a fitted
cv.roclearn object. The plot shows the mean AUC across
regularization parameters \lambda, with error bars reflecting
the cross-validation standard deviation. Optionally, the selected
optimal \lambda is highlighted with a dashed line and marker.
Usage
## S3 method for class 'cv.roclearn'
plot(x, highlight = TRUE, ...)
Arguments
x |
A cross-validation object of class |
highlight |
Logical; if |
... |
Additional arguments passed to underlying ggplot2 functions. |
Details
This function is a method for the generic plot() function,
designed specifically for cross-validation objects from
cv.roclearn. The x-axis is displayed on a log scale for
\lambda, and the y-axis represents AUC values. Error bars
show variability across folds.
Value
A ggplot2 object is returned and drawn to the current device.
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
cvfit <- cv.roclearn(
X, y,lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
approx=TRUE, nfolds = 2)
plot(cvfit)
Predictions from a fitted kernel model
Description
Generate predictions from a fitted kernel model.
Usage
## S3 method for class 'kroclearn'
predict(object, newdata, type = c("class", "response"), ...)
Arguments
object |
A fitted model object of class |
newdata |
A data frame or matrix of predictors for which predictions are desired. Categorical variables are automatically dummy-encoded and aligned to the training structure. |
type |
Prediction type: |
... |
Not used. |
Value
A numeric vector of predictions ({-1, 1}) if type = "class",
or raw decision scores if type = "response".
See Also
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
fit <- kroclearn(X, y, lambda = 0.1, kernel = "radial", approx=TRUE)
# Predict classes {-1, 1}
predict(fit, X, type = "class")
# Predict decision scores
predict(fit, X, type = "response")
Predictions from a fitted linear model
Description
Generate predictions from a fitted linear model.
Usage
## S3 method for class 'roclearn'
predict(object, newdata, type = c("class", "response"), ...)
Arguments
object |
A fitted model object of class |
newdata |
A data frame or matrix of predictors for which predictions are desired. Categorical variables are automatically dummy-encoded and aligned to the training structure. |
type |
Prediction type: |
... |
Not used. |
Value
A numeric vector of predictions ({-1, 1}) if type = "class",
or raw decision scores if type = "response".
See Also
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
fit <- roclearn(X, y, lambda = 0.1, approx=TRUE)
# Predict classes {-1, 1}
predict(fit, X, type = "class")
# Predict decision scores
predict(fit, X, type = "response")
Fit a linear model
Description
Fit a linear model
Usage
roclearn(
X,
y,
lambda,
penalty = "ridge",
param.penalty = NULL,
loss = "hinge",
approx = NULL,
intercept = TRUE,
target.perf = list(),
param.convergence = list()
)
Arguments
X |
Predictor matrix or data.frame (categorical variables are automatically one-hot encoded). |
y |
Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format. |
lambda |
Positive scalar regularization parameter. |
penalty |
Regularization penalty type:
|
param.penalty |
Penalty-specific parameter:
|
loss |
Surrogate loss function type. One of:
|
approx |
Logical; enables a scalable approximation to accelerate training.
The default is |
intercept |
Logical; include an intercept in the model (default |
target.perf |
List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each). |
param.convergence |
List of convergence controls (e.g., |
Details
For large-scale data, the model is computationally prohibitive because its
loss is a U-statistic involving a double summation. To reduce this burden,
the package adopts an efficient algorithm based on an incomplete U-statistic,
which approximates the loss with a single summation. These approximations
substantially reduce computational cost and accelerate training, while
maintaining accuracy, making the model feasible for large-scale datasets.
This option is available when approx = TRUE.
Value
An object of class "roclearn", a list containing:
-
beta.hat— estimated coefficient vector. -
intercept— fitted intercept (if applicable). -
lambda,penalty,param.penalty,loss. -
approx,B(number of sampled pairs if approximation used). -
time— training time (seconds). -
nobs,p— number of observations and predictors. -
converged,n.iter— convergence information. -
preprocessing— details on categorical variables, removed columns, and column names. -
call— the function call.
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
fit <- roclearn(X, y, lambda = 0.1, penalty = "ridge", approx=TRUE)
Summarize Cross-Validation results for kernel models
Description
Print a concise summary of cross-validation results for a kernel model.
Usage
## S3 method for class 'cv.kroclearn'
summary(object, ...)
Arguments
object |
A fitted cross-validation object of class
|
... |
Not used. |
Details
This is a method for the generic summary() function, applied to
objects of class "cv.kroclearn". It prints training settings
(loss, kernel type, number of folds, the set of candidate \lambda),
the selected optimal \lambda, the corresponding mean and standard
deviation of cross-validated AUC, and a truncated table of AUC results
across candidate \lambda values.
Value
Invisibly returns the input object, after printing a summary
to the console.
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
cvfit <- cv.kroclearn(
X, y,
lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
kernel = "radial",
approx=TRUE, nfolds = 2
)
summary(cvfit)
Summarize Cross-Validation results for linear models
Description
Print a concise summary of cross-validation results for a linear model.
Usage
## S3 method for class 'cv.roclearn'
summary(object, ...)
Arguments
object |
A fitted cross-validation object of class
|
... |
Not used. |
Details
This is a method for the generic summary() function, applied to
objects of class "cv.roclearn". It prints training settings
(loss, penalty, number of folds, the set of candidate \lambda),
the selected optimal \lambda, the corresponding mean and standard
deviation of cross-validated AUC, and a truncated table of AUC results
across candidate \lambda values.
Value
Invisibly returns the input object, after printing a summary
to the console.
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
cvfit <- cv.roclearn(
X, y,
lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
approx=TRUE, nfolds = 2
)
summary(cvfit)
Summarize a fitted kernel model
Description
Display key information from a fitted "kroclearn" object, including:
data dimensions, kernel specification, convergence status, training time,
and leading coefficient estimates.
Usage
## S3 method for class 'kroclearn'
summary(object, ...)
Arguments
object |
A fitted model of class |
... |
Unused. |
Value
Invisibly returns object after printing a formatted summary.
See Also
kroclearn, summary.roclearn,
cv.kroclearn, cv.roclearn
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
fit <- kroclearn(X, y, lambda = 0.1, kernel = "radial", approx=TRUE)
summary(fit)
Summarize a fitted linear model
Description
Display key information from a fitted "roclearn" object, including:
data dimensions, model specification, convergence status, training time,
and leading coefficient estimates.
Usage
## S3 method for class 'roclearn'
summary(object, ...)
Arguments
object |
A fitted model of class |
... |
Unused. |
Value
Invisibly returns object after printing a formatted summary.
See Also
roclearn, summary.kroclearn,
cv.roclearn, cv.kroclearn
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
fit <- roclearn(X, y, lambda = 0.1, approx=TRUE)
summary(fit)