Help for package gtregression

Type:

Package

Title:

Tools for Creating Publication-Ready Regression Tables

Version:

1.0.0

Description:

Simplifies regression modeling in R by integrating multiple modeling and summarization tools into a cohesive, user-friendly interface. Designed to be accessible for researchers, particularly those in Low- and Middle-Income Countries (LMIC). Built upon widely accepted statistical methods, including logistic regression (Hosmer et al. 2013, ISBN:9781118548429), log-binomial regression (Spiegelman and Hertzmark 2005 <doi:10.1093/aje/kwi188>), Poisson and robust Poisson regression (Zou 2004 <doi:10.1093/aje/kwh090>), negative binomial regression (Hilbe 2011, ISBN:9780521179515), and linear regression (Kutner et al. 2005, ISBN:9780071122214). Leverages multiple dependencies to ensure high-quality output and generate reproducible, publication-ready tables in alignment with best practices in epidemiology and applied statistics.

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

Depends:

R (≥ 4.1.0)

Imports:

dplyr, gtsummary, risks, purrr, MASS, rlang, stats, lmtest, patchwork, ggtext, ggplot2, tidyr, utils, sandwich, tibble, broom, broom.helpers, gt, officer, flextable

VignetteBuilder:

knitr

Suggests:

testthat (≥ 3.0.0), knitr, rmarkdown, mlbench, car, forcats, pkgdown

RoxygenNote:

7.3.2

Config/testthat/edition:

URL:

https://thinkdenominator.github.io/gtregression/

BugReports:

https://github.com/ThinkDenominator/gtregression/issues

NeedsCompilation:

Packaged:

2025-07-30 18:38:55 UTC; salineliyas

Author:

Rubeshkumar Polani

[aut, cre], Salin K Eliyas

[aut], Manikandanesan Sakthivel

[aut], Yuvaraj Krishnamoorthy

[aut], Marie Gilbert Majella

[aut]

Maintainer:

Rubeshkumar Polani <rubesh@thinkdenominator.com>

Repository:

CRAN

Date/Publication:

2025-08-18 14:50:13 UTC

Fit multivariate Regression Model (Internal)

Description

Fits a regression model based on the selected approach.

Usage

.fit_multi_model(data, outcome, exposures, approach)

Arguments

data

A 'data.frame' with outcome and exposures.

outcome

A string. Name of the outcome variable.

exposures

A string or character vector of predictor(s).

approach

A string specifying the regression approach. One of '"logit"', '"log-binomial"', '"poisson"', '"linear"', '"robpoisson"', or '"negbin"'.

Value

A fitted model object ('glm', 'lm', 'riskratio', or 'negbin') or 'NULL' if fitting fails.

Fit Regression Model with One or More Predictors (Internal)

Description

Fits a regression model based on the selected approach. Can handle a single exposure or a vector of exposures.

Usage

.fit_uni_model(data, outcome, exposures, approach)

Arguments

data

A 'data.frame' with complete observations for outcome and exposures.

outcome

A string. Name of the outcome variable.

exposures

A string or character vector of predictor(s).

approach

A string specifying the regression approach. One of '"logit"', '"log-binomial"', '"poisson"', '"linear"', '"robpoisson"', or '"negbin"'.

Value

A fitted model object ('glm', 'lm', 'riskratio', or 'negbin') or 'NULL' if fitting fails.

Get Abbreviation Explanation

Description

Returns a plain-language abbreviation string for the regression approach.

Usage

.get_abbreviation(approach)

Arguments

approach

A character string for the regression approach.

Value

A character string explaining the abbreviation (e.g., '"OR = Odds Ratio"').

Get Unadjusted Effect Label

Description

Returns the appropriate label (e.g., OR, IRR, RR, Beta) for unadjusted regression estimates.

Usage

.get_effect_label(approach)

Arguments

approach

A character string for the regression approach. One of '"logit"', '"log-binomial"', '"poisson"', '"robpoisson"', '"linear"'.

Value

A character string for the effect label, formatted with markdown (e.g., '"**OR**"').

Get Adjusted Effect Label

Description

Returns a markdown-formatted label for adjusted estimates (e.g., Adjusted OR).

Usage

.get_effect_label_adjusted(approach)

Arguments

approach

A character string for the regression approach.

Value

A character string label (e.g., '"**Adjusted IRR**"').

Get Abbreviation to Remove

Description

Identifies which abbreviation string should be removed from the summary table, if applicable for the given approach.

Usage

.get_remove_abbreviation(approach)

Arguments

approach

A character string for the regression approach.

Value

A character string indicating the abbreviation to remove, or '""' if none.

Linear Regression Diagnostic Checks (Internal) similar to reg check in stata Performs diagnostic tests for linear regression models: - Breusch-Pagan test for heteroskedasticity - Shapiro-Wilk test for normality of residuals - RESET test for model specification - Cook's Distance for influential points

Description

Linear Regression Diagnostic Checks (Internal) similar to reg check in stata Performs diagnostic tests for linear regression models: - Breusch-Pagan test for heteroskedasticity - Shapiro-Wilk test for normality of residuals - RESET test for model specification - Cook's Distance for influential points

Usage

.reg_check_linear(model, exposure)

Arguments

model

A fitted linear model ('lm' object).

exposure

Character string giving the name of the exposure variable (for labeling).

Value

A data frame with one row per diagnostic test, including:

Exposure: Name of the exposure variable.
Test: Diagnostic test name.
Statistic: Test statistic or summary (e.g., p-values).
Interpretation: Plain-language result interpretation.

Validate Exposure Variable(s) for Regression

Description

Ensures that the exposure variable has at least two non-missing levels or sufficient numeric variation to support regression modelling.

Usage

.validate_exposures(data, exposures)

Arguments

data

A data frame containing the exposure variables.

exposures

Character vector of column names to validate.

Value

Returns TRUE if valid; otherwise throws an error.

Check Collinearity Using VIF for Fitted Models

Description

Computes Variance Inflation Factors (VIF) for fitted models returned by uni_reg(), multi_reg(), uni_reg_nbin(), or multi_reg_nbin(). Returns one VIF table per model. For multivariate models only

Usage

check_collinearity(model)

Arguments

model

A fitted model object with class "uni_reg", "multi_reg", "uni_reg_nbin", or "multi_reg_nbin".

Value

A tibble containing VIF values and interpretation. For multivariable models, returns one tibble. For univariate models, an error is raised indicating VIF is not applicable.

Examples

if (requireNamespace("gtregression", quietly = TRUE) &&
  requireNamespace("mlbench", quietly = TRUE) &&
  getRversion() >= "4.1.0") {
  data(PimaIndiansDiabetes2, package = "mlbench")
  pima <- PimaIndiansDiabetes2 |> dplyr::filter(!is.na(diabetes))
  pima$diabetes <- ifelse(pima$diabetes == "pos", 1, 0)
  fit <- multi_reg(pima,
    outcome = "diabetes",
    exposures = c("age", "mass", "glucose"),
    approach = "logit"
  )
  check_collinearity(fit)
}

Check Convergence for a Regression Model

Description

Assesses model convergence and provides diagnostics for each exposure (in univariate mode) or for the full model (in multivariable mode), depending on the regression approach used.

Usage

check_convergence(
  data,
  exposures,
  outcome,
  approach = "logit",
  multivariate = FALSE
)

Arguments

data

A data frame containing the dataset.

exposures

A character vector of predictor variable names. If multivariate = FALSE, each exposure is assessed separately. If multivariate = TRUE, exposures are included together.

outcome

A character string specifying the outcome variable.

approach

A character string specifying the regression approach. One of: "logit", "log-binomial", "poisson", "robpoisson", or "negbin".

multivariate

Logical. If TRUE, checks convergence for a multivariable model; otherwise, performs checks for each univariate model.

Details

For robpoisson, predicted probabilities (fitted values) may exceed 1, which is acceptable when estimating risk ratios but should not be interpreted as actual probabilities.

This function is useful for identifying convergence issues, especially for "log-binomial" models, which often fail to converge .

Value

A data frame summarizing convergence diagnostics, including:

Exposure: Name of the exposure variable.
Model: The regression approach used.
Converged: TRUE if the model converged successfully; FALSE otherwise.
Max.prob: Maximum predicted probability or fitted value in the dataset.

Examples

if (requireNamespace("gtregression", quietly = TRUE)) {
  data(data_PimaIndiansDiabetes, package = "gtregression")

  check_convergence(
    data = data_PimaIndiansDiabetes,
    exposures = c("age", "bmi"),
    outcome = "diabetes",
    approach = "logit"
  )

  check_convergence(
    data = data_PimaIndiansDiabetes,
    exposures = c("age", "bmi"),
    outcome = "diabetes",
    approach = "logit",
    multivariate = TRUE
  )
}

PimaIndians2 Diabetes Dataset

Description

A cleaned version of the original Pima Indians Diabetes dataset from the 'mlbench' package. Useful for demonstrating regression approaches for binary outcomes.

Usage

data_PimaIndiansDiabetes

Format

A data frame with 768 observations and 9 variables:

pregnant: Number of times pregnant
glucose: Plasma glucose concentration (glucose tolerance test)
pressure: Diastolic blood pressure (mm Hg)
triceps: Triceps skin fold thickness (mm)
insulin: 2-Hour serum insulin (mu U/ml)
mass: Body mass index (BMI)
pedigree: Diabetes pedigree function
age: Age in years
diabetes: Factor indicating diabetes status (pos/neg)

Source

https://www.openml.org/d/37

Birth Weight Data

Description

A dataset from the MASS package containing risk factors associated with low birth weight (LBW) in newborns. Originally collected at Baystate Medical Center, Springfield, Massachusetts, USA.

Usage

data_birthwt

Format

A data frame with 189 observations and 10 variables:

low: Indicator for birth weight < 2500g (binary): 0 = normal, 1 = low birth weight
age: Mother's age in years (numeric)
lwt: Mother's weight in pounds at last menstrual period (numeric)
race: Mother's race (factor): 1 = White, 2 = Black, 3 = Other
smoke: Smoking status during pregnancy (binary): 0 = No, 1 = Yes
ptl: Number of previous premature labors (integer)
ht: History of hypertension (binary): 0 = No, 1 = Yes
ui: Presence of uterine irritability (binary): 0 = No, 1 = Yes
ftv: no of physician visits during the 1st trimester (integer, 0–6)
bwt: Birth weight in grams (numeric)

Details

The outcome variable is binary ('low'): birth weight < 2500g (yes = 1) or not (no = 0).

Source

Hosmer, D.W., Lemeshow, S. (1989). *Applied Logistic Regression.* New York: Wiley. Also available in MASS and described in detail in its documentation.

Epilepsy Treatment and Seizure Counts

Description

RCT on the effect of a drug on the seizures in patients with epilepsy. Contains repeated measures data with treatment groups, baseline seizure counts, and follow-up counts.

Usage

data_epilepsy

Format

A data frame with 236 observations and 9 variables:

y: Number of seizures in a 2-week period (count)
trt: Treatment group (factor): placebo or progabide
base: Seizure count during baseline period (numeric)
age: Age of patient (numeric)
V4: Indicator for 4th visit (binary)
subject: Patient ID (factor)
period: Follow-up period number (integer)
lbase: Log of baseline seizures (numeric)
lage: Log of age (numeric)

Source

MASS package. Original data from Thall and Vail (1990)

Student Absenteeism in Rural Schools

Description

This dataset contains observations on the number of days absent from school for children in rural Australia, along with student characteristics. It's commonly used to demonstrate count models such as Poisson and Negative Binomial regression.

Usage

data_gt_quin

Format

A data frame with 146 observations and 5 variables:

Eth: Ethnicity ("A" = Aboriginal, "N" = Non-Aboriginal)
Sex: Sex ("F" or "M")
Age: Age group ("F0", "F1", "F2", "F3")
Lrn: Learner status ("AL" = average learner, "SL" = slow learner)
Days: Number of days absent from school (count outcome)

Source

MASS package. See also Venables and Ripley (2002), *Modern Applied Statistics with S*.

Infertility Matched Case-Control Study

Description

investigating the relationship between infertility and abortions.

Usage

data_infertility

Format

A data frame with 248 observations and 8 variables:

education: Education level (0 = 0–5 years, 1 = 6–11 years, 2 = 12+ years)
age: Age in years
parity: Number of prior pregnancies
induced: Number of induced abortions
case: Infertility case status (1 = case, 0 = control)
spontaneous: Number of spontaneous abortions
stratum: Matched set ID
pooled.stratum: Pooled stratum ID used for conditional regression

Source

https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/infert.html

Lung Cancer Trial Data

Description

Survival data from a clinical trial of lung cancer patients conducted by the Veteran's Administration.

Usage

data_lungcancer

Format

A data frame with 137 observations and 8 variables:

trt: Treatment group (1 = standard, 2 = test)
celltype: Cell type (squamous, smallcell, adeno, large)
time: Survival time (in days)
status: Censoring status (1 = died, 0 = censored)
karno: Karnofsky performance score (higher = better)
diagtime: Months from diagnosis to randomization
age: Age in years
prior: Prior therapy (0 = no, 10 = yes)

Source

https://CRAN.R-project.org/package=survival

References

Kalbfleisch JD and Prentice RL (1980). The Statistical Analysis of Failure Time Data.

Descriptive Summary Table for Study Characteristics (User-Friendly)

Description

Creates a clean, publication-ready summary table using 'gtsummary::tbl_summary()'. Designed for beginner analysts, this function applies sensible defaults and flexible options to display categorical and continuous variables with or without stratification. It supports one-line summaries of dichotomous variables, handles missing data gracefully, and includes an optional "Overall" column for comparison.

Usage

descriptive_table(
  data,
  exposures,
  by = NULL,
  percent = c("column", "row"),
  digits = 1,
  show_missing = c("ifany", "no"),
  show_dichotomous = c("all_levels", "single_row"),
  show_overall = c("no", "first", "last"),
  statistic = NULL,
  value = NULL
)

Arguments

data

A data frame containing your study dataset.

exposures

A character vector specifying the variable names (columns) in 'data' that should be included in the summary table. These can be categorical or continuous.

by

Optional. A single character string giving the name of a grouping variable (e.g., outcome). If supplied, the table will show stratified summaries by this variable.

percent

Character. Either '"column"' (default) or '"row"'. - '"column"' calculates percentages within each group defined by 'by' (i.e., denominator = column total). - '"row"' calculates percentages across 'by' groups (i.e., denominator = row total). If 'by' is not specified, '"column"' is used and '"row"' is ignored.

digits

Integer. Controls how many decimal places are shown for percentages and means. Defaults to 1.

show_missing

Character. One of '"ifany"' (default) or '"no"'. - '"ifany"' shows missing value counts only when missing values exist. - '"no"' hides missing counts entirely.

show_dichotomous

Character. One of '"all_levels"' (default) or '"single_row"'. - '"all_levels"' displays all levels of binary (dichotomous) variables. - '"single_row"' shows only one row (typically "Yes", "Present", or a user-defined level), making the table more compact.

show_overall

Character. One of '"no"' (default), '"first"', or '"last"'. If 'by' is supplied: - '"first"' includes a column for overall summaries before the stratified columns. - '"last"' includes the overall column at the end. - '"no"' disables the overall column.

statistic

Optional named vector of summary types for specific variables. For example, use 'statistic = c(age = "mean", bmi = "median")' to override default summaries. Accepted values: '"mean"', '"median"', '"mode"', '"count"'.

value

Optional. A list of formulas specifying which level of a binary variable to show when 'show_dichotomous = "single_row"'. For example, 'value = list(sex ~ "Female")' will report only the "Female" row.

Value

A 'gtsummary::tbl_summary' object with additional class '"descriptive_table"'. Can be printed, customized, merged, or exported.

Examples


if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes2", package = "mlbench")
  library(dplyr)
pima <- PimaIndiansDiabetes2 |>
  mutate(
    diabetes = ifelse(diabetes == "pos", 1, 0),

    bmi = cut(
      mass,
      breaks = c(-Inf, 18.5, 24.9, 29.9, Inf),
      labels = c("Underweight", "Normal", "Overweight", "Obese")
    )
  )
  descriptive_table(pima, exposures = c("age", "bmi"),
                    by = "diabetes")
}

Dissect a Dataset Before Regression

Description

Returns a tidy summary of each variable's structure, missingness, uniqueness, and suitability for use in regression models.

Usage

dissect(data)

Arguments

data

A data frame.

Value

A tibble with columns: Variable, Type, Missing ( and Regression Hint.

Examples

dissect(data_birthwt)

Identify Confounders Using the Change-in-Estimate Method

Description

Identifies whether one or more variables are confounders by comparing the crude and adjusted effect estimates of a primary exposure on an outcome. A variable is flagged as a confounder if its inclusion changes the estimate by more than a specified threshold (default = 10

Usage

identify_confounder(
  data,
  outcome,
  exposure,
  potential_confounder,
  approach = "logit",
  threshold = 10
)

Arguments

data

A data frame containing the variables.

outcome

The name of the outcome variable (character string).

exposure

The primary exposure variable (character string).

potential_confounder

One or more variables to test as potential confounders.

approach

The regression modeling approach. One of: "logit", "log-binomial", "poisson", "negbin", "robpoisson", or "linear".

threshold

Numeric. Percent change threshold to define confounding (default = 10). If the absolute percent change exceeds this, the variable is flagged as a confounder.

Details

Supports logistic, log-binomial, Poisson, robust Poisson, negative binomial, and linear regression approaches.

This method does not evaluate effect modification. Use causal diagrams (e.g., DAGs) and subject-matter knowledge to supplement decisions.

Value

If one confounder is provided, prints crude and adjusted estimates with a confounding flag. If multiple are given, returns a tibble with:

covariate: Name of potential confounder.
crude_est: Crude effect estimate.
adjusted_est: Adjusted estimate including the confounder.
pct_change: Percent change from crude to adjusted.
is_confounder: Logical: whether confounding threshold is exceeded.

Examples

data <- data_PimaIndiansDiabetes
identify_confounder(
  data = data,
  outcome = "glucose",
  exposure = "insulin",
  potential_confounder = "age_cat",
  approach = "linear"
)

Compare Models With and Without Interaction Term

Description

This function fits two models—one with and one without an interaction term between an exposure and a potential effect modifier— and compares them using either a likelihood ratio test (LRT) or Wald test. It is useful for assessing whether there is statistical evidence of interaction (effect modification).

Usage

interaction_models(
  data,
  outcome,
  exposure,
  covariates = NULL,
  effect_modifier,
  approach = "logit",
  test = c("LRT", "Wald"),
  verbose = TRUE
)

Arguments

data

A data frame containing all required variables.

outcome

The name of the outcome variable

exposure

The name of the main exposure variable.

covariates

character vector of additional covariates to adjust for

effect_modifier

The name of the variable to test for interaction

approach

The regression modeling approach to use. One of: "logit", "log-binomial", "poisson", "robpoisson", "negbin", or "linear".

test

Type of statistical test for model comparison. Either: "LRT" (likelihood ratio test, default) or "Wald".

verbose

Logical; if TRUE, prints a basic interpretation of whether interaction is likely present (default = FALSE).

Value

A list with the following elements:

model_no_interaction: The model without the interaction term.
model_with_interaction: The model with the interaction term.
p_value: The p-value for interaction (based on selected test).
interpretation: A brief text interpretation if verbose = TRUE.

Examples

data <- data_PimaIndiansDiabetes

Merge Multiple gtsummary Tables (Descriptive, Univariate, Multivariable)

Description

Flexibly merges any 2 or more 'gtsummary' tables (e.g., from 'descriptive_table()', 'uni_reg()', 'multi_reg()') into a single table using 'tbl_merge()'. Automatically applies column spanners based on the order of inputs.

Usage

merge_tables(..., spanners = NULL)

Arguments

...

Two or more 'gtsummary' table objects to merge.

spanners

Optional character vector of column header titles. If not supplied, defaults to '"Table 1"', '"Univariate"', '"Multivariable"' etc.

Value

A merged 'gtsummary::tbl_merge' object.

Examples


if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes2", package = "mlbench")
  library(dplyr)
  library(gtregression)

  # Prepare data
  pima <- PimaIndiansDiabetes2 |>
    mutate(
      diabetes = ifelse(diabetes == "pos", 1, 0),
      bmi_cat = cut(
        mass,
        breaks = c(-Inf, 18.5, 24.9, 29.9, Inf),
        labels = c("Underweight", "Normal", "Overweight", "Obese")
      )
    )

  # Descriptive table
  desc_tbl <- descriptive_table(pima,
                                exposures = c("age", "bmi_cat"),
                                by = "diabetes")

  # Univariate logistic regression
  uni_tbl <- uni_reg(
    data = pima,
    outcome = "diabetes",
    exposures = c("age", "bmi_cat"),
    approach = "logit"
  )

  # Multivariable logistic regression
  multi_tbl <- multi_reg(
    data = pima,
    outcome = "diabetes",
    exposures = c("age", "bmi_cat"),
    approach = "logit"
  )

  # Merge descriptive + univariate + multivariate
  merge_tables(desc_tbl, uni_tbl, multi_tbl)

  # Merge with custom spanners
  merge_tables(desc_tbl, uni_tbl, spanners = c("Summary", "Crude"))

  # Merge just uni and multi
  merge_tables(uni_tbl, multi_tbl)
}

Modify Regression Table Labels and Layout

Description

Allows customization of labels, headers, and layout of regression tables created using 'gtsummary'. Designed for tables from functions like 'uni_reg()', 'multi_reg()', etc.

Usage

modify_table(
  gt_table,
  variable_labels = NULL,
  level_labels = NULL,
  header_labels = NULL,
  caption = NULL,
  bold_labels = FALSE,
  bold_levels = FALSE,
  remove_N = FALSE,
  remove_N_obs = FALSE,
  remove_abbreviations = FALSE,
  caveat = NULL
)

Arguments

gt_table

A 'gtsummary' table object.

variable_labels

A named vector for relabeling variable names.

level_labels

A named list for relabeling levels of variables. Should be structured as 'list(var1 = c(old1 = new1, old2 = new2), ...)'.

header_labels

A named vector for relabeling column headers. Names should match internal column names (e.g., '"estimate"', '"p.value"').

caption

A character string used to set the table title.

bold_labels

Logical. If 'TRUE', bolds variable labels.

bold_levels

Logical. If 'TRUE', bolds factor level labels.

remove_N

Logical. If 'TRUE', hides the 'N' column in univariate regression tables ('uni_reg', 'uni_reg_nbin'). Ignored for other tables.

remove_N_obs

Logical. If 'TRUE', removes the source note showing the no of observations in multivariable models ('multi_reg', 'multi_reg_nbin').

remove_abbreviations

Logical. If 'TRUE', removes default footnotes for estimate abbreviations.

caveat

A character string to add as a footnote (source note) below the table, e.g., "N may vary due to missing data."

Value

A customized 'gtsummary' table object with modified labels, layout, and options.

Examples


if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes2", package = "mlbench")
  library(dplyr)
  library(gtregression)

  # Prepare data
  pima <- PimaIndiansDiabetes2 |>
    mutate(
      diabetes = ifelse(diabetes == "pos", 1, 0),
      bmi_cat = cut(
        mass,
        breaks = c(-Inf, 18.5, 24.9, 29.9, Inf),
        labels = c("Underweight", "Normal", "Overweight", "Obese")
      )
    )

  # Descriptive table
  desc_tbl <- descriptive_table(pima,
                                exposures = c("age", "bmi_cat"),
                                by = "diabetes")

  # Univariate logistic regression
  uni_rr <- uni_reg(
    data = pima,
    outcome = "diabetes",
    exposures = c("age", "bmi_cat"),
    approach = "logit"
  )
}

Multivariable Regression (Adjusted Odds, Risk, or Rate Ratios)

Description

Fits multivariable regression models for binary, count, or continuous outcomes and returns a publication-ready summary table using 'gtsummary'. Depending on the specified 'approach', the function estimates adjusted Odds Ratios (OR), Risk Ratios (RR), Incidence Rate Ratios (IRR), or Beta coefficients.

Usage

multi_reg(data, outcome, exposures, approach = "logit")

Arguments

data

A data frame containing the analysis variables.

outcome

The name of the outcome variable. Must be a character string.

exposures

A character vector of predictor variables to include.

approach

Modeling approach to use. One of: - '"logit"' for logistic regression (OR), - '"log-binomial"' for log-binomial regression (RR), - '"poisson"' for Poisson regression (IRR), - '"robpoisson"' for robust Poisson regression (RR), - '"linear"' for linear regression (Beta coefficients), - '"negbin"' for negative binomial regression (IRR).

Value

An object of class 'multi_reg', extending 'gtsummary::tbl_regression'. Additional components can be accessed using:

x$models: List of fitted model objects.
x$model_summaries: List of summary outputs.
x$reg_check: Regression diagnostics (only for linear models).
x$table: Returns the main regression table.

Accessors

$models: List of fitted model objects.
$model_summaries: A tibble of tidy regression summaries for each model.

Examples

if (requireNamespace("mlbench", quietly = TRUE)) {
  data(PimaIndiansDiabetes2, package = "mlbench")
  pima <- dplyr::mutate(PimaIndiansDiabetes2,
  diabetes = ifelse(diabetes == "pos", 1, 0))
  model <- multi_reg(
    data = pima,
    outcome = "diabetes",
    exposures = c("age", "mass"),
    approach = "logit"
  )
 print(model)
}

Visualize a Regression Model as a Forest Plot

Description

Creates a forest plot from a 'gtsummary' object. Supports both univariate and multivariable models with hierarchical labels for categorical variables. Designed to work seamlessly with outputs from functions like 'uni_reg()' and 'multi_reg()'.

Usage

plot_reg(
  tbl,
  title = NULL,
  ref_line = 1,
  order_y = NULL,
  log_x = FALSE,
  xlim = NULL,
  breaks = NULL,
  point_color = "#1F77B4",
  errorbar_color = "#4C4C4C",
  base_size = 14,
  show_ref = TRUE
)

Arguments

tbl

A 'gtsummary' object from regression functions

title

Optional plot title (character).

ref_line

Numeric value for the reference line (default = 1).

order_y

Optional character vector to the customise y-axis order

log_x

Logical. If 'TRUE', uses a logarithmic x-axis (default = FALSE).

xlim

Optional numeric vector specifying x-axis limits

breaks

Optional numeric vector for x-axis tick breaks.

point_color

Color of the points (default is automatic).

errorbar_color

Color of the error bars (default is automatic).

base_size

Base font size for text elements.

show_ref

Logical. If 'TRUE', includes reference in the plot.

Value

A 'ggplot2' object representing the forest plot.

Examples


if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes2", package = "mlbench")
  library(dplyr)
  library(gtregression)

  # Prepare data
  pima <- PimaIndiansDiabetes2 |>
    mutate(
      diabetes = ifelse(diabetes == "pos", 1, 0),
      bmi_cat = cut(
        mass,
        breaks = c(-Inf, 18.5, 24.9, 29.9, Inf),
        labels = c("Underweight", "Normal", "Overweight", "Obese"))
  )

  # Univariate logistic regression
  uni_rr <- uni_reg(
    data = pima,
    outcome = "diabetes",
    exposures = c("age", "bmi_cat"),
    approach = "logit"
  )
plot_reg(uni_rr)
}

Visualize Univariate and Multivariate Regression Side-by-Side

Description

Generates side-by-side plots to compare univariate & multivariable results

Usage

plot_reg_combine(
  tbl_uni,
  tbl_multi,
  title_uni = NULL,
  title_multi = NULL,
  ref_line = 1,
  order_y = NULL,
  log_x = FALSE,
  point_color = "#1F77B4",
  errorbar_color = "#4C4C4C",
  base_size = 14,
  show_ref = TRUE,
  xlim_uni = NULL,
  breaks_uni = NULL,
  xlim_multi = NULL,
  breaks_multi = NULL
)

Arguments

tbl_uni

A 'gtsummary' object from 'uni_reg()' etc.,

tbl_multi

A 'gtsummary' object from 'multi_reg()'.

title_uni

Optional plot title for the univariate model

title_multi

Optional plot title for the multivariable mode

ref_line

Numeric value for the reference line (default = 1).

order_y

Optional character vector to manually order the y-axis labels.

log_x

Logical. If 'TRUE', x-axis is log-transformed (default = FALSE).

point_color

Optional color for plot points.

errorbar_color

Optional color for error bars.

base_size

Numeric. Base font size for plot text elements.

show_ref

Logical. If 'TRUE', includes reference categories

xlim_uni

Optional numeric vector to set x-axis limits for uni plot.

breaks_uni

Optional numeric vector to set x-axis breaks for uni plot.

xlim_multi

Optional numeric vector to set x-axis limits for multi plot

breaks_multi

Optional numeric vector to set x-axis breaks- multi plot.

Value

A 'ggplot2' object with two forest plots displayed side-by-side.

Examples


if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes2", package = "mlbench")
  library(dplyr)
  library(gtregression)

  # Prepare data
  pima <- PimaIndiansDiabetes2 |>
    mutate(
      diabetes = ifelse(diabetes == "pos", 1, 0),
      bmi_cat = cut(
        mass,
        breaks = c(-Inf, 18.5, 24.9, 29.9, Inf),
        labels = c("Underweight", "Normal", "Overweight", "Obese")
      ),
      age_cat = cut(
        age,
        breaks = c(-Inf, 29, 49, Inf),
        labels = c("Young", "Middle-aged", "Older")
      )
    )

  # Univariate logistic regression
  uni_rr <- uni_reg(
    data = pima,
    outcome = "diabetes",
    exposures = c("age_cat", "bmi_cat"),
    approach = "logit"
  )

  # Multivariable logistic regression
  multi_rr <- multi_reg(
    data = pima,
    outcome = "diabetes",
    exposures = c("age_cat", "bmi_cat"),
    approach = "logit"
  )

  # Combined plot
  plot_reg_combine(uni_rr, multi_rr)
}

Save Multiple Tables and Plots to a Word Document

Description

Saves a collection of gtsummary tables and ggplot2 plots into a .docx file.

Usage

save_docx(tables = NULL, plots = NULL, filename = "report.docx", titles = NULL)

Arguments

tables

A list of gtsummary tables.

plots

A list of ggplot2 plot objects.

filename

File name for the output (with or without .docx extension).

titles

Optional. A character vector of titles.

Value

A Word document saved to a temporary directory (if no path is given). No object is returned.

Examples


library(gtsummary)
library(ggplot2)
tbl <- tbl_regression(glm(mpg ~ hp + wt, data = mtcars))
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
save_docx(
  tables = list(tbl),
  plots = list(p),
  filename = file.path(tempdir(), "report.docx"),
  titles = c("Table 1: Regression", "Figure 1: Scatterplot")
)

Save a Single Plot

Description

Saves a ggplot2 plot to a file in PNG, PDF, or JPG format.

Usage

save_plot(
  plot,
  filename = "plot",
  format = c("png", "pdf", "jpg"),
  width = 8,
  height = 6,
  dpi = 300
)

Arguments

plot

A ggplot2 object.

filename

Name of the file to save, with or without extension.

format

Output format. One of "png", "pdf", or "jpg".

width

Width of the saved plot in inches.

height

Height of the saved plot in inches.

dpi

Resolution of the plot in dots per inch (default is 300).

Value

Saves the file to a temporary directory (if no path is given).

Examples


library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
save_plot(p, filename = file.path(tempdir(), "scatterplot"), format = "png")

Save a Single Regression Table

Description

Saves a gtsummary table as a Word, PDF, or HTML file

Usage

save_table(tbl, filename = "table", format = c("docx", "pdf", "html"))

Arguments

tbl

A gtsummary object (e.g., tbl_regression(), tbl_summary()).

filename

File name to save the output. Extension is optional.

format

Output format. One of "docx", "pdf", or "html".

Value

Saves the file to a temporary directory (if no path is given). Does not return an object.

Examples


model <- glm(mpg ~ hp + wt, data = mtcars)
tbl <- gtsummary::tbl_regression(model)
save_table(tbl, filename = file.path(tempdir(), "regression_table"), format = "docx")

Stepwise Model Selection with Evaluation Metrics

Description

Performs stepwise model selection using forward, backward, or both directions across different regression approaches. Returns a summary table with evaluation metrics (AIC, BIC, log-likelihood, deviance) and the best model.

Usage

select_models(
  data,
  outcome,
  exposures,
  approach = "logit",
  direction = "forward"
)

Arguments

data

A data frame containing the outcome and predictor variables.

outcome

A character string indicating the outcome variable.

exposures

vector of predictor variables to consider in the model.

approach

Regression method. One of: "logit", "log-binomial", "poisson", "robpoisson", "negbin", or "linear".

direction

Stepwise selection direction. One of: "forward" (default), "backward", or "both".

Value

A list with the following components:

results_table: A tibble summarising each tested model's metric (AIC, BIC, deviance, log-likelihood, adjusted R² if applicable).
best_model: The best-fitting model object based on low AIC.
all_models: A named list of all fitted models.

Examples

data <- data_PimaIndiansDiabetes
stepwise <- select_models(
  data = data,
  outcome = "glucose",
  exposures = c("age", "pregnant", "mass"),
  approach = "linear",
  direction = "forward"
)
summary(stepwise)
stepwise$results_table
stepwise$best_model

Stratified Multivariable Regression (Adjusted OR, RR, IRR, or Beta)

Description

Performs multivariable regression with multiple exposures on a binary, count, or continuous outcome, stratified by a specified variable. NA values in the stratifier are excluded from analysis.

Usage

stratified_multi_reg(data, outcome, exposures, stratifier, approach = "logit")

Arguments

data

A data frame containing the variables.

outcome

name of the outcome variable.

exposures

vector specifying the predictor (exposure) variables.

stratifier

A character string specifying the stratifying variable.

approach

Modeling approach to use. One of: '"logit"' (Adjusted Odds Ratios), '"log-binomial"' (Adjusted Risk Ratios), '"poisson"' (Adjusted IRRs), '"robpoisson"' (Adjusted RRs), or '"linear"' (Beta coefficients), '"negbin"' (Adjusted IRRs).

Value

An object of class 'stratified_multi_reg', which includes: - 'table': A 'gtsummary::tbl_stack' object of regression tables by stratum, - 'models': A named list of model objects for each stratum, - 'model_summaries': A list of tidy model summaries, - 'reg_check': Diagnostics results (if available for the model type).

Accessors

$table: Stacked table of stratified regression outputs.
$models: Named list of fitted models per stratum.
$model_summaries: Tidy summaries for each model.
$reg_check: Regression diagnostic checks (when applicable).

Examples

if (requireNamespace("mlbench", quietly = TRUE) &&
  requireNamespace("dplyr", quietly = TRUE)) {
  data(PimaIndiansDiabetes2, package = "mlbench")
  pima <- dplyr::mutate(
    PimaIndiansDiabetes2,
    diabetes = ifelse(diabetes == "pos", 1, 0),
    glucose_cat = dplyr::case_when(
      glucose < 140 ~ "Normal",
      glucose >= 140 ~ "High"
    )
  )
  stratified_multi <- stratified_multi_reg(
    data = pima,
    outcome = "diabetes",
    exposures = c("age", "mass"),
    stratifier = "glucose_cat",
    approach = "logit"
  )
  stratified_multi$table
}

Performs univariate regression for each exposure on a binary, count, or continuous outcome, stratified by a specified variable. Produces a stacked 'gtsummary' table with one column per stratum, along with underlying models and diagnostics.

Description

Performs univariate regression for each exposure on a binary, count, or continuous outcome, stratified by a specified variable. Produces a stacked 'gtsummary' table with one column per stratum, along with underlying models and diagnostics.

Usage

stratified_uni_reg(data, outcome, exposures, stratifier, approach = "logit")

Arguments

data

A data frame containing the variables.

outcome

name of the outcome variable.

exposures

A vector specifying the predictor (exposure) variables.

stratifier

A character string specifying the stratifier

approach

Modeling approach to use. One of: '"logit"' (Odds Ratios), '"log-binomial"' (Risk Ratios), '"poisson"' (Incidence Rate Ratios), '"robpoisson"' (Robust RR), '"linear"' (Beta coefficients), '"negbin"' (Incidence Rate Ratios),.

Value

An object of class 'stratified_uni_reg', which includes: - 'table': A 'gtsummary::tbl_stack' object with stratified results, - 'models': A list of fitted models for each stratum, - 'model_summaries': A tidy list of model summaries, - 'reg_check': A tibble of regression diagnostics (when available).

Accessors

$table: Stacked stratified regression table.
$models: List of fitted model objects for each stratum.
$model_summaries: List of tidy model summaries.
$reg_check: Diagnostic check results (when applicable).

Examples

if (requireNamespace("mlbench", quietly = TRUE) &&
  requireNamespace("dplyr", quietly = TRUE)) {
  data(PimaIndiansDiabetes2, package = "mlbench")
  pima <- dplyr::mutate(
    PimaIndiansDiabetes2,
    diabetes = ifelse(diabetes == "pos", 1, 0),
    glucose_cat = dplyr::case_when(
      glucose < 140 ~ "Normal",
      glucose >= 140 ~ "High"
    )
  )
  stratified_uni <- stratified_uni_reg(
    data = pima,
    outcome = "diabetes",
    exposures = c("age", "mass"),
    stratifier = "glucose_cat",
    approach = "logit"
  )
  stratified_uni$table
}

Univariate regression (Odds, Risk, or Rate Ratios)

Description

Performs univariate regression for each exposure on a binary, continuous, or count outcome. Depending on 'approach', returns either Odds Ratios (OR), Risk Ratios (RR), or Incidence Rate Ratios (IRR).

Usage

uni_reg(data, outcome, exposures, approach = "logit")

Arguments

data

A data frame containing the variables.

outcome

outcome variable (binary, continuous, or count).

exposures

A vector of predictor variables.

approach

Modeling approach to use. One of: '"logit"' (OR), '"log-binomial"' (RR), '"poisson"' (IRR), '"robpoisson"' (RR), '"linear"' (Beta coefficients), '"negbin"' (IRR)

Details

This function requires the following packages: 'dplyr', 'purrr', 'gtsummary', 'risks'.

Value

A list of class 'uni_reg' and 'gtsummary::tbl_stack', including:

A publication-ready regression table ('tbl_stack')
Accessor elements:
- '$models': Fitted regression models for each exposure
- '$model_summaries': Tidy model summaries
- '$reg_check': Diagnostics (only for linear regression)

Examples

data(PimaIndiansDiabetes2, package = "mlbench")
library(dplyr)
pima <- PimaIndiansDiabetes2 |>
  dplyr::mutate(diabetes = ifelse(diabetes == "pos", 1, 0))
uni_reg(pima, outcome = "diabetes", exposures = "age", approach = "logit")

Fit multivariate Regression Model (Internal)

Description

Usage

Arguments

Value

Fit Regression Model with One or More Predictors (Internal)

Description

Usage

Arguments

Value

Get Abbreviation Explanation

Description

Usage

Arguments

Value

Get Unadjusted Effect Label

Description

Usage

Arguments

Value

Get Adjusted Effect Label

Description

Usage

Arguments

Value

Get Abbreviation to Remove

Description

Usage

Arguments

Value

Description

Usage

Arguments

Value

Validate Exposure Variable(s) for Regression

Description

Usage

Arguments

Value

Check Collinearity Using VIF for Fitted Models

Description

Usage

Arguments

Value

Examples

Check Convergence for a Regression Model

Description

Usage

Arguments

Details

Value

See Also

Examples

PimaIndians2 Diabetes Dataset

Description

Usage

Format

Source

Birth Weight Data

Description

Usage

Format

Details

Source

Epilepsy Treatment and Seizure Counts

Description

Usage

Format

Source

Student Absenteeism in Rural Schools

Description

Usage

Format

Source

Infertility Matched Case-Control Study

Description

Usage

Format

Source

Lung Cancer Trial Data