---
title: "tcplfit2: A Concentration-Response Modeling Utility"
author: "US EPA's Center for Computational Toxicology and Exposure ccte@epa.gov"
output:
  rmdformats::readthedown:
    fig_retina: false
    code_folding: hide
    toc_depth: 3
params:
  my_css: css/rmdformats.css
vignette: >
  %\VignetteIndexEntry{1. Introduction to tcplfit2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{css, code = readLines(params$my_css), hide=TRUE, echo = FALSE}
```

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = 'center'
)
```

# Introduction

The package `tcplfit2` is used to perform basic concentration-response curve fitting. The original `tcplFit()` functions in the [ToxCast Data Analysis Pipeline (tcpl)](https://cran.R-project.org/package=tcpl) package performed basic concentration-response curve fitting to 3 models: Hill, gain-loss [a modified Hill], and constant. With `tcplfit2`, the concentration-response functionality of the package `tcpl` has been expanded and is being used to process high-throughput screening (HTS) data generated at the US Environmental Protection Agency, including targeted assay data in ToxCast, high-throughput transcriptomics (HTTr), and high-throughput phenotypic profiling (HTPP) screening results.

The `tcpl` R package continues to be used to manage, curve fit, and plot data, and to populate its linked MySQL database, invitrodb. Processing with `tcpl` version 3.0 and beyond depends on the stand-alone `tcplfit2` package to allow a wider variety of concentration-response models (when using invitrodb in the 4.0 schema and beyond).

The main set of extensions includes additional concentration-response models like those contained in the program [BMDExpress2](https://github.com/auerbachs/BMDExpress-2). These include exponential, polynomial (1 & 2), and power functions in addition to the original Hill, gain-loss and constant models.
Similar to BMDExpress2, a defined benchmark response (BMR) level is used to estimate a benchmark dose (BMD), which is the concentration where the curve fit intersects with this BMR threshold. One final addition was to let the hitcall value be a number ranging from 0 to 1 (in contrast to binary hitcall values from `tcplFit()`). Continuous hitcall values in `tcplfit2` are defined as the product of three proportional weights testing the following: 1) the AIC of the winning model is better than the constant model (i.e. the winning model is not fit to background noise), 2) at least one concentration has a median response that exceeds cutoff (i.e. outside the cutoff band in bidirectional modeling cases), and 3) the top from the winning model exceeds the cutoff (i.e. outside the cutoff band in bidirectional modeling cases).

Although developed primarily for bioactivity data curve fitting in the Center for Computational Toxicology and Exposure, the `tcplfit2` package is written to be generally applicable for the broader chemical-screening community and their standalone model-fitting applications.

This vignette describes some functionality of the `tcplfit2` package with a few simple standalone examples.

## Suggested packages for use with this vignette

```{r setup,class.source="fold-show",warning = FALSE, message = FALSE}
# Primary Packages #
library(tcplfit2)
library(tcpl)
# Data Formatting Packages #
library(data.table)
library(DT)
library(htmlTable)
library(dplyr)
library(stringr)
# Plotting Packages #
library(ggplot2)
library(gridExtra)
```

# Concentration-Response Modeling

Multiple concentration experiments allow one to evaluate a chemical's impact on a biological response with increasing concentration. Concentration-response modeling is aimed at leveraging multiple concentration data to predict the underlying relationship between increasing chemical concentrations and its impact on a measured/observable biological response.
Predicting the underlying concentration-response relationship can allow one to assess not just a chemical's bioactivity for a particular response of interest/concern, but also its potency. Although bioactivity and potency may be estimated via other statistical analyses (e.g. one-way ANOVA), the advantage of concentration-response modeling is that it evaluates the shape of the underlying relationship and allows one to derive a point-of-departure (POD) that does not depend on the experimental concentrations.

In this section we provide three examples for concentration-response modeling:

- [Example 1](#ex1): Single series fit with `concRespCore`.
- [Example 2](#ex2): Multiple series fit using `tcplfit2_core` and `tcplhit2_core` as stand-alone functions, sequentially.
- [Example 3](#ex3): Curve fitting similar to what is executed in the ToxCast pipeline (`tcpl`).

This is followed by a section providing details about the continuous hitcall estimation with a brief overview of interpreting these values.

## Concentration-Response Modeling for a Single Series with `concRespCore` {#ex1}

`concRespCore` is the main wrapper function performing concentration-response modeling. Under the hood, `concRespCore` uses the `tcplfit2_core` and `tcplhit2_core` functions to perform curve fitting, hitcalling, and potency estimation. The example in this section shows how to use the `concRespCore` function; we refer readers to the [Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core`](#ex2) section later in the vignette to see how `tcplfit2_core` and `tcplhit2_core` may be used separately.

The first argument for `concRespCore` is a named list, called 'row', containing the following inputs:

- `conc` - a numeric vector of concentrations (not log concentrations).
- `resp` - a numeric vector of responses, of the same length as `conc`. Note replicates are allowed, i.e. there may be multiple response values (`resp`) for one concentration dose group.
- `cutoff` - a single numeric value indicating the response at which a relevant level of biological activity occurs. This value is typically used to determine if a curve is classified as a "hit". In ToxCast, this is usually 3 times the median absolute deviation around the baseline (BMAD) (i.e. $cutoff = 3*BMAD$). However, users are free to make other choices more appropriate for their given assay and data.
- `bmed` - a single numeric value giving the baseline median response. If set to zero then the data are already zero-centered. Otherwise, this value is used to zero-center the data by shifting the entire response series by the specified amount.
- `onesd` - a single numeric value giving one standard deviation of the baseline responses. This value is used to calculate the benchmark response (BMR), where $BMR = {\text{onesd}}\times{\text{bmr_scale}}$. The `bmr_scale` defaults to 1.349.

The `row` object may include other elements providing meta-data/annotations to be included as part of the `concRespCore` function output -- for example, chemical names (or other identifiers), assay name, name of the response being modeled, etc.

A user may also need to include other arguments in the `concRespCore` function, which internally control the execution of curve fitting, hitcalling, and potency estimation:

- `conthits` - Logical argument. If `TRUE` (the default, and recommended usage), the hitcall returned will be a value between 0 and 1.
- `errfun` - Allows a user to specify the assumed distribution of errors. The default is "dt4", indicating models are fit assuming the errors follow a Student's t-distribution with 4 degrees of freedom. This error distribution has wider tails that diminish the influence of outlier values to produce a more robust estimate. Alternatively, one may assume the errors are normally distributed by changing it to "dnorm".
- `poly2.biphasic` - Logical argument.
If `TRUE` (the default, and recommended usage), the polynomial 2 model will allow a biphasic curve to be fit to the response (i.e. increase then decrease or vice versa). However, one may force monotonic fitting with `FALSE` (i.e. a parabola where the vertex is not in the tested concentration range -- specifically the vertex will be somewhere less than 0).
- `do.plot` - Logical argument. If `TRUE` (the default is `FALSE`), a plot of all fitted curves will be generated. Note, an alternative to this plotting functionality is provided by another plotting function in this package, namely `plot_allcurves` (see [Plotting](#plotting) for further details).
- `fitmodels` - a character vector indicating which models to fit the concentration-response data with. If the `fitmodels` parameter is specified, the constant model (`cnst`) must be included because it is used for comparison in the hitcalling process. However, any other model may be omitted by the user; for example, the gain-loss (`gnls`) model is excluded in some applications.

For a full list of potential arguments, refer to the function documentation (`?concRespCore`).

The following code provides a simple example for using `concRespCore`, including input data set-up and executing the modeling with `concRespCore`.

```{r ex1_concRespCore,class.source="fold-show",warning=FALSE}
# tested concentrations (numeric vector, not log-transformed)
conc <- c(.03,.1,.3,1,3,10,30,100)
# observed responses at respective concentrations
resp <- c(0,.2,.1,.4,.7,.9,.6, 1.2)
# row object with relevant parameters
row <- list(conc = conc,resp = resp,bmed = 0,cutoff = 1,onesd = 0.5,name="some chemical")
# execute concentration-response modeling through potency estimation
res <- concRespCore(row,
                    fitmodels = c("cnst", "hill", "gnls",
                                  "poly1", "poly2", "pow",
                                  "exp2", "exp3", "exp4", "exp5"),
                    conthits = TRUE)
```

The output of this run will be a data frame, with one row, summarizing the winning model results.

```{r, echo=FALSE}
htmlTable::htmlTable(head(res),
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE ,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```

One can plot the winning curve by passing the output (`res`) to the function `concRespPlot2`. This function returns a basic `ggplot2` object, which is meant to leverage the flexibility and modularity of `ggplot2` objects allowing users the ability to customize the plot by adding layers of detail. For more information on customizing plots we refer users to the [Plotting](#plotting) section.

```{r ex1_concRespPlot2, fig.height = 4.55, fig.width = 8}
# plot the winning curve from example 1, add a title
concRespPlot2(res, log_conc = TRUE) + ggtitle("Example 1: Chemical A")
```

***Figure 1:** The winning model fit for a single concentration-response series. The concentrations (x-axis) are in $\mathbf{log_{10}}$ units.*

## Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core` {#ex2}

In this section, we provide an example of how to fit a set of concentration-response series from a single assay using the `tcplfit2_core` and `tcplhit2_core` functions sequentially. Using the functions sequentially allows users greater flexibility to examine the intermediate output. For example, the output from `tcplfit2_core` contains model parameters for all models fit to the provided concentration-response series. Furthermore, `tcplfit2_core` results may be passed to `plot_allcurves`, which generates a comparative plot of all curves fit to a concentration-response series (see [Plotting](#plotting) for further details).

Here, data from a Tox21 high-throughput screening (HTS) assay measuring estrogen receptor (ER) agonist activity are examined. The data were processed with the ToxCast pipeline (`tcpl`), stored, and retrieved from the Level 3 (mc3) table in the `invitrodb` database.
At Level 3, data have already undergone pre-processing steps (prior to `tcpl`), including transformation of response values (including zero centering) and concentration normalization. For this example, 6 out of the 100 available chemical samples (spids) from `mc3` are selected. [Concentration-Response Modeling for `tcpl`-like data without a database connection](#ex3) highlights how to process from the original source data.

The following code demonstrates how to set up the input data and execute curve fitting and hitcalling with the `tcplfit2_core` and `tcplhit2_core` functions, respectively.

```{r example2,class.source="fold-show",warning=FALSE}
# read in the data
# Loading in the level 3 example data set from invitrodb stored in tcplfit2
data("mc3")

# view the first 6 rows of the mc3 data
# dtxsid = unique chemical identifier from EPA's DSSTox Database
# casrn = unique chemical identifier from Chemical Abstracts Service
# name = chemical name
# spid = sample id
# logc = log_10 concentration value
# resp = response
# assay = assay name
head(mc3)

# estimate the background variability
# assume the two lowest concentrations (logc <= -2) for baseline in this example
# Note: The baseline may be assay/application specific
temp <- mc3[mc3$logc<= -2,"resp"] # obtain response in the two lowest concentrations
bmad <- mad(temp) # obtain the baseline median absolute deviation
onesd <- sd(temp) # obtain the baseline standard deviation
cutoff <- 3*bmad # estimate the cutoff, use the typical cutoff=3*BMAD

# select six chemical samples
# Note: there may be more than one sample processed for a given chemical
spid.list <- unique(mc3$spid)
spid.list <- spid.list[1:6]

# create empty objects to store fitting results and plots
model_fits <- NULL
result_table <- NULL
plt_lst <- NULL

# loop over the samples to perform concentration-response modeling & hitcalling
for(spid in spid.list) {
  # select the data for just this sample
  temp <- mc3[is.element(mc3$spid,spid),]

  # The data file stores concentrations in log10 units, so back-transform to "raw scale"
  conc <- 10^temp$logc
  # Save the response values
  resp <- temp$resp

  # pull out all of the chemical identifiers and the assay name
  dtxsid <- temp[1,"dtxsid"]
  casrn <- temp[1,"casrn"]
  name <- temp[1,"name"]
  assay <- temp[1,"assay"]

  # Execute curve fitting
  # Input concentrations, responses, cutoff, a list of models to fit, and other model fitting requirements
  # force.fit is set to true so that all models will be fit regardless of cutoff
  # bidirectional = FALSE indicates only fit models in the positive direction.
  # if using bidirectional = TRUE the cutoff only needs to be specified in the positive direction.
  model_fits[[spid]] <- tcplfit2_core(conc, resp, cutoff, force.fit = TRUE,
                                      fitmodels = c("cnst", "hill", "gnls", "poly1",
                                                    "poly2", "pow", "exp2","exp3",
                                                    "exp4", "exp5"),
                                      bidirectional = FALSE)
  # Get a plot of all curve fits
  plt_lst[[spid]] <- plot_allcurves(model_fits[[spid]],
                                    conc = conc, resp = resp, log_conc = TRUE)

  # Pass the output from 'tcplfit2_core' to 'tcplhit2_core' along with
  # cutoff, onesd, and any identifiers
  out <- tcplhit2_core(model_fits[[spid]], conc, resp, bmed = 0,
                       cutoff = cutoff, onesd = onesd,
                       identifiers = c(dtxsid = dtxsid, casrn = casrn,
                                       name = name, assay = assay))
  # store all results in one table
  result_table <- rbind(result_table,out)
}
```

The output from `tcplfit2_core` is a nested list containing the following elements:

- `modelnames` - a vector of the model names fit to the data.
- `errfun` - a character string specifying the assumed error distribution for model fitting.
- Nested list elements, named by model, each containing the estimated model parameters and other details when the corresponding model is fit to the provided data.


*The hidden code chunk below shows how to view the structure of model fit output.*

```{r example 2 fit results}
# shows the structure of the output object from tcplfit2_core (only top level)
str(model_fits[[1]],max.level = 1)
```

Taking the "Hill" model as an example, its output contains the following elements:

- `success` - a binary indicator, where 1 indicates the fit was successful.
- `aic` - the Akaike Information Criterion (AIC)
- `cov` - a binary indicator, where 1 indicates estimation of the inverted Hessian was successful
- `rme` - the root mean square error around the curve
- `modl` - a numeric vector of model predicted responses at the given concentrations
- `tp`, `ga`, `p` - estimated model parameters for the "Hill" model
- `tp_sd`, `ga_sd`, `p_sd` - standard deviations of the model parameters for the "Hill" model
- `er` - the numeric error term
- `er_sd` - the numeric value for the standard deviation of the error term
- `pars` - a character vector containing the names of the model parameters estimated for the "Hill" model
- `sds` - a character vector containing the names of the parameters storing the standard deviations of model parameters for the "Hill" model
- `top` - the maximal predicted change in response from baseline (i.e. $y = 0$), can be positive or negative
- `ac50` - the concentration inducing 50% of the maximal predicted response

All of these details are provided for other models, except for the constant model. The constant model only includes the `success`, `aic`, `rme`, and `er` elements.
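
Because every model entry carries an `aic` element, the AIC values can be collected into one named vector and compared directly. The chunk below is a minimal, self-contained sketch of that idea: `toy_fit` is a hand-made stand-in that mimics only the `modelnames`, `success`, and `aic` elements described above, and its numbers are invented for illustration. With real output one would substitute an element of `model_fits`. The constant model is set aside before taking the minimum because, as described in the [Hitcalling](#hitcalling) section, it is never selected as the winning model.

```{r aic_compare_sketch, class.source="fold-show"}
# hypothetical stand-in for tcplfit2_core output (all values are invented)
toy_fit <- list(
  modelnames = c("cnst", "hill", "poly1"),
  cnst  = list(success = 1, aic = 25.1),
  hill  = list(success = 1, aic = 3.4),
  poly1 = list(success = 1, aic = 10.8)
)
# collect the AIC of every fitted model into a named numeric vector
toy_aics <- sapply(toy_fit$modelnames, function(m) toy_fit[[m]]$aic)
# among the non-constant models, find the one with the lowest AIC
toy_candidates <- setdiff(names(toy_aics), "cnst")
toy_best <- toy_candidates[which.min(toy_aics[toy_candidates])]
toy_best
```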

*The hidden code chunk below shows how to view the structure of fit output for a particular model of interest; we use the Hill model here for demonstration purposes.*

```{r hill_model_fit_str}
# structure of the model fit list - hill model results
str(model_fits[[1]][["hill"]])
```

Here we display all model fits for each of the `spid`'s included in the analysis above; these plots are generated with `plot_allcurves`.

```{r example2 plot1, fig.height = 9, fig.width = 7}
grid.arrange(grobs=plt_lst,ncol=2)
```

***Figure 2:** Example plots generated from `plot_allcurves`. Each plot depicts all model fits for a given sample (i.e. concentration-response series). In the plots, observed values are represented by the open circles and each model fit to the data is represented with a different color and line type. Concentrations (x-axis) are displayed in $\mathbf{log_{10}}$ units.*

When running the fitting and hitcalling functions sequentially, one can save the resulting rows from `tcplhit2_core` in a data frame structure and export it for further analysis (e.g. in the above code, all results are saved to the `result_table` object). The `result_table` is shown below.

```{r echo=FALSE}
htmlTable::htmlTable(result_table,
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE ,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```

One can also pass output from `tcplhit2_core` directly to `concRespPlot2` to plot the best model fit, as shown in [Concentration-Response Modeling for a Single Series with `concRespCore`](#ex1).

*The hidden code below demonstrates modeling a single row/result and plotting the winning model with `concRespPlot2`, along with a minor customization using `ggplot2` layers.*

```{r example2 plot2}
# plot the first row
concRespPlot2(result_table[1,],log_conc = TRUE) +
  # add a descriptive title to the plot
  ggtitle(paste(result_table[1,"dtxsid"], result_table[1,"name"]))
```

***Figure 3:** Concentration-response data and the winning model fit for Bisphenol A using the `concRespPlot2` function. Concentrations (x-axis) are displayed in $\mathbf{log_{10}}$ units.*

Further details on hitcalling are provided in a later section, [Hitcalling](#hitcalling).

## Concentration-Response Modeling for `tcpl`-like data without a database connection {#ex3}

The `tcplLite` functionality was deprecated with the updates to `tcpl` and development of `tcplfit2`, because `tcplfit2` allows one to perform curve fitting and hitcalling independent of a database connection. The example in this section demonstrates how to perform an analysis analogous to `tcplLite` with `tcplfit2`. More information on the ToxCast program can be found at https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast. A detailed explanation of processing levels can be found within the Data Processing section of the [`tcpl` Vignette on CRAN](https://cran.R-project.org/package=tcpl).

In this example, the input data come from the ACEA_AR assay. For this curve fitting analysis, data from the assay component ACEA_AR_agonist_80hr are assumed to change in the positive direction relative to DMSO (neutral control & baseline activity). Using electrical impedance as a cell growth reporter, increased activity can be used to infer increased signaling at the pathway-level for the androgen receptor (as encoded by the AR gene). Given the heterogeneity in assay data reporting, source data often must go through pre-processing steps to transform into a uniform data format, namely Level 0 data.

## - Source Data Formatting

To run standalone `tcplfit2` fitting, without the need for a MySQL database connection like `invitrodb`, the user will need to step through/replicate multiple levels of processing (i.e. Level 0 through to Level 3). The below table is identical to the multi-concentration level 0 data (mc0) table one would see in `invitrodb` and is compatible with `tcpl`. Columns include:

- `m0id` - Level 0 id
- `spid` - Sample id
- `acid` - Unique assay component id; unique numeric id for each assay component
- `apid` - Assay plate id
- `coli` - Column index (location on assay plate)
- `rowi` - Row index (location on assay plate)
- `wllt` - Well type
- `wllq` - Well quality
- `conc` - Concentration
- `rval` - Raw response value
- `srcf` - Source file name
- `clowder_uid` - Clowder unique id for source files
- `git_hash` - Hash key for pre-processing scripts

*The hidden code below demonstrates obtaining the mc0 data file from `invitrodb`, which is saved as an example dataset in the `tcplfit2` R package.*

```{r example3_init, fig.height = 6, fig.width = 7, message=FALSE, warning = FALSE}
# Loading in the Level 0 example data set from invitrodb
data("mc0")
data.table::setDTthreads(2)
dat <- mc0
```

Here we show the top six rows of samples with a treatment well type identifier (i.e. `wllt == 't'`).

```{r, echo=FALSE}
# only show the top 6 rows for the treatment samples
htmlTable::htmlTable(head(dat[wllt=='t',]),
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE ,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```

The first step, which corresponds to Level 1 in `tcpl`, is to establish the concentration index. Concentration indices are integer values ranking $N$ distinct concentrations from 1 to $N$, which correspond to the lowest and highest concentration groups, respectively. This index can be used to calculate the baseline median absolute deviation (BMAD) for an assay.
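
To make the definition concrete, the ranking can be sketched in base R on a small made-up vector. This is only an illustration of the idea, assuming a single sample tested in duplicate; `toy_conc` and `toy_cndx` are hypothetical names and the values do not come from `mc0`. The `data.table` code used for the actual example data additionally accounts for replicates, plates, and source files.

```{r cndx_sketch, class.source="fold-show"}
# hypothetical concentrations for one sample, tested in duplicate
toy_conc <- c(0.1, 1, 10, 100, 0.1, 1, 10, 100)
# rank the distinct concentrations: the lowest gets index 1, the highest gets N
toy_cndx <- match(toy_conc, sort(unique(toy_conc)))
toy_cndx
```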

*The hidden code chunk below demonstrates how to obtain and assign the concentration indices using the `data.table` package.*

```{r example3_cndx, class.source="scroll-100", fig.height = 6, fig.width = 7, warning=FALSE}
# Order by the following columns
setkeyv(dat, c('acid', 'srcf', 'apid', 'coli', 'rowi', 'spid', 'conc'))

# Define a temporary replicate ID (rpid) column for test compound wells
# rpid consists of the sample ID, well type (wllt), source file, assay plate ID, and
# concentration.
# the := operator is a data.table function to add/update columns
nconc <- dat[wllt == "t" , ## denotes test well as the well type (wllt)
             list(n = lu(conc)), # total number of unique concentrations
             by = list(acid, apid, spid)][ , list(nconc = min(n)), by = acid]
dat[wllt == "t" & acid %in% nconc[nconc > 1, acid],
    rpid := paste(acid, spid, wllt, srcf, apid, "rep1", conc, sep = "_")]
dat[wllt == "t" & acid %in% nconc[nconc == 1, acid],
    rpid := paste(acid, spid, wllt, srcf, "rep1", conc, sep = "_")]

# Define rpid column for non-test compound wells
dat[wllt != "t",
    rpid := paste(acid, spid, wllt, srcf, apid, "rep1", conc, sep = "_")]

# set the replicate index (repi) based on rowid
# increment repi every time a replicate ID is duplicated
dat[, dat_rpid := rowid(rpid)]
dat[, rpid := sub("_rep[0-9]+.*", "",rpid, useBytes = TRUE)]
dat[, rpid := paste0(rpid,"_rep",dat_rpid)]

# For each replicate, define concentration index
# by ranking the unique concentrations
indexfunc <- function(x) as.integer(rank(unique(x))[match(x, unique(x))])
dat[ , cndx := indexfunc(conc), by = list(rpid)]
```

## - Adjustments

The second step, which corresponds to Level 2 in `tcpl`, is to perform any necessary data adjustments. Generally, if the raw response values (`rval`) need to undergo logarithmic transformation or some other transformation, then those adjustments occur in this step. Transformed response values are referred to as corrected values and are stored in the `cval` field/variable.
Here, the raw response values do not require transformation and are identical to the corrected values (`cval`). Samples with poor well quality (`wllq = 0`) and/or missing response values are removed from the dataset before the concentration-response series are considered.

*The hidden code chunk below demonstrates how to assign the `cval` and filter the data as necessary.*

```{r example3_mc2, fig.height = 6, fig.width = 7}
# If no adjustments are required for the data, the corrected value (cval) should be set as original rval
dat[,cval := rval]

# Poor well quality (wllq) wells should be removed
dat <- dat[!wllq == 0,]

# Fitting generally cannot occur if response values are NA, therefore those values need to be removed
dat <- dat[!is.na(cval),]
```

## - Normalization

The third step normalizes and zero-centers data before model fitting, and corresponds to Level 3 in `tcpl`. Our example dataset has both neutral and negative controls available. The equation below demonstrates how to normalize responses to a control in this scenario. However, given experimental designs vary from assay to assay, this process also varies across assays. Thus, the steps shown in this example may not apply to other assays and should only be considered applicable for this example data set. In other applications/scenarios, such as when neutral control or positive/negative controls are not available, the user should normalize responses in a way that best accounts for baseline sampling variability within their experimental design and data. Provided below is a list of normalizing methods used in `tcpl` for reference.

For this example, the normalized responses (`resp`) are calculated as a percent of control, i.e. the ratio of differences. The numerator is the difference between the corrected (`cval`) and baseline (`bval`) values and the denominator is the difference between the positive/negative control (`pval`) and baseline (`bval`) values.

$$ \% \space control = \frac{cval - bval}{pval - bval} \times 100 $$

The table below provides a few methods for calculating `bval` and `pval` in `tcpl`. For more on the data normalization step, refer to the Data Normalization sub-section in the [`tcpl` Vignette on CRAN](https://cran.R-project.org/package=tcpl).

```{r, echo=FALSE}
htmlTable::htmlTable(head(tcpl::tcplMthdList(3)),
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE ,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```

*The hidden code chunk below demonstrates how to perform the normalization described above and assign values as is done in `tcpl`.*

```{r example3_normalize}
# calculate bval as the median of all the wells that have a type of n
dat[, bval := median(cval[wllt == "n"]), by = list(apid)]
# calculate pval based on the wells that have type of m or o excluding any NA wells
dat[, pval := median(cval[wllt %in% c("m","o")], na.rm = TRUE), by = list(apid, wllt, conc)]
# take pval as the minimum per assay plate (apid)
dat[, pval := min(pval, na.rm = TRUE), by = list(apid)]
# Calculate normalized responses
dat[, resp := ((cval - bval)/(pval - bval) * 100)]
```

Before model fitting, we need to determine the median absolute deviation around baseline (`BMAD`) and baseline variability (`onesd`), which are later used for cutoff and benchmark response (`BMR`) calculations, respectively. This is part of Level 4 processing in `tcpl`. In this example, we consider test wells in the two lowest concentrations as our baseline to calculate `BMAD` and `onesd`. `BMAD` can also be calculated as the median absolute deviation of the data in control wells. Other methods of determining `BMAD` and `onesd` used in `tcpl` are listed below.

```{r, echo=FALSE}
htmlTable::htmlTable(head(tcpl::tcplMthdList(4)),
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE ,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```

If the user's dataset contains data from multiple assays (`aeid`), `BMAD` and `onesd` should be calculated per assay ID. The example data set only contains data from one assay, so we can calculate `BMAD` and `onesd` on the whole dataset.

*The hidden code chunk below demonstrates how to perform `BMAD` and `onesd` estimation from the two lowest experimental concentrations across all treatment wells for a given assay endpoint (as done in `tcpl`).*

```{r example3_get_bmad.and.onesd}
bmad <- mad(dat[cndx %in% c(1, 2) & wllt == "t", resp])
onesd <- sd(dat[cndx %in% c(1, 2) & wllt == "t", resp])
```

## - Dose-Response Curve Fitting

Once the data adjustments and normalization steps are complete, model fitting and hitcalling can be done, similar to what was shown in [Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core`](#ex2). Dose-Response Curve Fitting corresponds to Level 4 in `tcpl`. This is where `tcplfit2` is used to fit all available models within `tcpl`. Here we set up a function for running our default model fitting approach and necessary arguments for our analysis.

```{r example3_fitting,class.source="fold-show"}
#do tcplfit2 fitting
myfun <- function(y) {
  res <- tcplfit2::tcplfit2_core(y$conc,
                          y$resp,
                          cutoff = 3*bmad,
                          bidirectional = TRUE,
                          verbose = FALSE,
                          force.fit = TRUE,
                          fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2",
                                        "pow", "exp2", "exp3", "exp4", "exp5")
                          )
  list(list(res)) #use list twice because data.table uses list(.) to look for values to assign to columns
}
```

Once the fitting function is set up, one can perform dose-response modeling for all `spid`'s in the dataset.
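
The double `list()` returned by `myfun` is a general `data.table` idiom for storing one arbitrary object per group in a list column. The chunk below is a small sketch of that idiom on its own, assuming only that `data.table` is installed: `toy`, `toy_fun`, and the values are hypothetical, and a simple summary list stands in for the nested output of `tcplfit2_core`. The outer `list()` is read by `data.table` as "one column", and the inner `list()` wraps the result so it is kept as a single cell rather than spread across rows.

```{r list_column_sketch, class.source="fold-show"}
library(data.table)
# hypothetical two-sample data set (values invented for illustration)
toy <- data.table(spid = rep(c("A", "B"), each = 3),
                  conc = rep(c(1, 10, 100), times = 2),
                  resp = c(0, 1, 2, 0, 0, 0.1))
# stand-in for a fitting function: one result object per group,
# wrapped in list(list(.)) so it is stored as a list-column cell
toy_fun <- function(y) {
  res <- list(n = nrow(y), max_resp = max(y$resp))
  list(list(res))
}
toy[, params := toy_fun(.SD), by = .(spid)]
# every row of a sample carries that sample's result object
toy$params[[1]]$max_resp
```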

**Warning: The fitting step on the full data set, `dat`, can take 7-10 minutes on a single-core laptop.**

*The hidden code chunk below demonstrates how to curve fit the full example dataset, but is not executed.*

```{r example3_fitting_full, eval=FALSE}
# only want to run tcplfit2 for test wells in this case
# this chunk doesn't run, fit the curves on the subset below
dat[wllt == 't',params:= myfun(.SD), by = .(spid)]
```

However, to demonstrate what the results will look like we execute the curve fitting on an example subset of the data, which only contains records of six samples.

```{r example3_fitting_subset,class.source="fold-show"}
# create a subset that contains 6 samples and run curve fitting
subdat <- dat[spid %in% unique(spid)[10:15],]
subdat[wllt == 't',params:= myfun(.SD), by = .(spid)]
```

Similar to the earlier example, [Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core`](#ex2), one can combine the general hitcalling approach of using the `tcplhit2_core` function with the generalized function creation (shown above) to apply hitcalling to the example dataset. This will be further demonstrated in a later section, see [Consideration: Continuous Hitcalls to Activity Calls](#ex3_hitc).

# Hitcalling {#hitcalling}

After all models are fit to the data, `tcplhit2_core` is used to perform hitcalling, which corresponds to Level 5 in `tcpl`. The continuous hitcall value (`hitc`) is the product of three proportional weights, and the resulting continuous value is between 0 and 1. The definition of each proportional weight is provided in the following subsections. For further details on the proportional weights not provided here, we refer the reader to Sheffield et al., 2021 for more information on `tcplfit2` hitcalling.
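
A consequence of the product form is that `hitc` can never exceed the smallest of the three weights, so a single weak line of evidence pulls the overall hitcall down. The chunk below is a minimal arithmetic sketch with invented weights; `toy_p1`, `toy_p2`, and `toy_p3` are hypothetical names chosen to match the $p_1$, $p_2$, and $p_3$ subsections that follow, and the values do not come from a real fit.

```{r hitc_product_sketch, class.source="fold-show"}
# hypothetical proportional weights (not from a real fit)
toy_p1 <- 0.99  # AIC weight
toy_p2 <- 0.95  # responses outside cutoff
toy_p3 <- 0.40  # top likelihood ratio
# continuous hitcall as the product of the three weights
toy_hitc <- toy_p1 * toy_p2 * toy_p3
toy_hitc
```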

## - $p_1$: AIC Weight

:::{.center}
“the winning AIC value is less than that of the constant model”
:::

$p_1$ assesses whether the constant model (if it were allowed to win) is a better fit to the observed data than the winning model, i.e. whether the winning model is essentially flat. The constant model can never be selected as the winning model, but if the constant model has the lowest AIC compared to other models, the calculated continuous `hitc` will be zero.

When `aicc` is `FALSE` (the default), $p_1$ is calculated as:

$$ p_1 = 1 - \frac{\exp(-0.5 \cdot AIC_{constant})}{\exp(-0.5 \cdot AIC_{constant})+\exp(-0.5 \cdot AIC_{winning})} $$

so that $p_1$ approaches 1 when the AIC of the winning model is much lower than that of the constant model.

Otherwise, the corrected AICs (i.e. $AIC_c$) for the constant and winning model are used in place of the AIC values. The corrected AIC is calculated as:

$$ AIC_c = AIC + \frac{2 \cdot df \cdot (df+1)}{n-df-1} $$

where $df$ is the model's degrees of freedom and $n$ is the number of observed responses.

## - $p_2$: Responses Outside Cutoff

:::{.center}
“at least one median response is outside the cutoff band”
:::

$p_2$ assesses whether at least one dose group has a median response value (central tendency of observed responses within the dose group) “outside” the cutoff band (when considering bi-directional fitting). That is, responses are greater than the cutoff in the positive (“+”) direction or less than the negative cutoff in the negative (“-”) direction.

To estimate whether the median response values for the experimental concentration/dose groups are outside the cutoff band we first obtain a 'scaled' median response ($y_k^*$) value for each experimental dose/concentration group $k$:

$$ y_k^* = \frac{y_k-sign(top)*cutoff}{exp(err)} $$

where $y_k$ is the median of observed responses for experimental concentration/dose group $k$, $sign(top)$ is the sign (either positive or negative) of the maximal predicted response from baseline, $cutoff$ is the user defined response threshold indicating meaningful biological activity, and $err$ is the model error parameter.
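
The scaling step can be sketched numerically as follows. This is only an illustration of the $y_k^*$ formula above with invented inputs: all `toy_` names and values are hypothetical, and the error parameter is chosen so that $exp(err) = 0.2$. Groups whose median lies below the cutoff receive a negative scaled value, and groups above it receive a positive one.

```{r scaled_median_sketch, class.source="fold-show"}
# hypothetical inputs (invented for illustration)
toy_med    <- c(0.1, 0.4, 1.6)  # median response per concentration group
toy_cutoff <- 1                 # response threshold
toy_top    <- 1.8               # maximal predicted response (positive direction)
toy_err    <- log(0.2)          # model error parameter, so exp(toy_err) is 0.2
# scaled median response for each concentration group
toy_ystar <- (toy_med - sign(toy_top) * toy_cutoff) / exp(toy_err)
toy_ystar
```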
When assuming the responses follow a t-distribution (the default), $p_2$ is calculated as:

$$ p_2 = 1 - \prod_{k=1}^{D} P_t(y_k^*; df = 4)$$

Alternatively, when assuming the responses follow a normal distribution, $p_2$ is calculated as:

$$ p_2 = 1 - \prod_{k=1}^{D} P_N(y_k^*; 0, 1) $$

where $D$ is the total number of experimental concentration/dose groups, and $P_t$ and $P_N$ denote the tail probability of $y_k^*$ under a t-distribution with four degrees of freedom and under a standard normal distribution, respectively. The tail is taken in the direction of `top` (the upper tail when `top` is positive and the lower tail when `top` is negative), so a median far outside the cutoff band contributes a factor near zero and drives $p_2$ towards 1.

## - $p_3$: Top Likelihood Ratio

::: {.center}
“the top of the fitted curve is outside the cutoff band”
:::

Determine whether the predicted maximal response from baseline (`top`) exceeds the cutoff, i.e. the response corresponding to the effect size of interest is outside the cutoff band (less than cutoff in the negative direction and greater than cutoff in the positive direction). $p_3$ is estimated as:

$$ p_3 = \frac{1 \pm \chi^2(2*(MLL-LL),1)}{2} $$

where $\chi^2(x,1)$ denotes the chi-squared cumulative distribution function with one degree of freedom evaluated at $x$, $MLL$ is the maximum log-likelihood of the original predicted best fit model, $LL$ is the log-likelihood of the re-scaled predicted best fit model, and the $\pm$ is:

* "+" when $$ \mid top \mid \geq \mid cutoff \mid $$
* "-" when $$ \mid top \mid < \mid cutoff \mid $$

## Visual Representation of Proportional Weights

The following plots provide visual representations of the comparisons conducted in each of the proportional weights that make up the continuous hitcall value. Each figure has one item "highlighted" in blue and another "highlighted" in red. The blue represents the reference for the proportional weight of interest, whereas the red represents an indicator for a response with potential bioactivity (i.e. key comparator) for the proportional weight of interest. For example, $p_1$ (as mentioned previously) is meant to determine whether the winning model (red), which is the best fit curve to the observed data given it has the lowest AIC, differs substantially from the constant model (blue), which indicates no biological response.
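To make the $p_1$ comparison concrete, the following minimal sketch computes $p_1$ from two assumed AIC values using the standard Akaike-weight form with negative exponents. The helper `demo_p1_from_aic()` and every numeric value are hypothetical and are not part of `tcplfit2`. The weight is written in terms of the AIC difference, which is algebraically equivalent to the ratio of exponentials but avoids numerical underflow when AIC values are large. The values used for $p_2$ and $p_3$ are simply assumed in order to show how the three weights combine.

```{r demo_p1_weight, class.source="fold-show"}
# hypothetical helper, for illustration only (not a tcplfit2 function)
demo_p1_from_aic <- function(aic_constant, aic_winning) {
  # Akaike weight of the constant model relative to the winning model:
  # exp(-0.5*AIC_c) / (exp(-0.5*AIC_c) + exp(-0.5*AIC_w)),
  # rewritten using the AIC difference
  w_constant <- 1 / (1 + exp(0.5 * (aic_constant - aic_winning)))
  1 - w_constant
}

# winning model clearly better than the constant model: p1 close to 1
demo_p1 <- demo_p1_from_aic(aic_constant = 120, aic_winning = 100)

# both models equally supported: p1 equals 0.5
demo_p1_tie <- demo_p1_from_aic(aic_constant = 100, aic_winning = 100)

# combine with assumed values for the other two weights
demo_p2 <- 0.98
demo_p3 <- 0.95
demo_hitc <- demo_p1 * demo_p2 * demo_p3
c(p1 = demo_p1, p1_tie = demo_p1_tie, hitc = demo_hitc)
```

Because the hitcall is a product, it can never exceed the smallest of the three weights, so a weak result on any single criterion pulls the continuous hitcall down.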
```{r hitc_plots,fig.height = 6, fig.width = 7,warning=FALSE,message=FALSE} #### Data Set-Up #### # obtain the base example data DATA_CASE <- tcplfit2::signatures[1,] conc <- strsplit(DATA_CASE[,"conc"],split = "[|]") %>% unlist() %>% as.numeric() resp <- strsplit(DATA_CASE[,"resp"],split = "[|]") %>% unlist() %>% as.numeric() OG_data <- data.frame(xval = conc,yval = resp) %>% # obtain the concentrations that are outside the cutoff band dplyr::mutate(type = ifelse(abs(resp)>=abs(DATA_CASE[,"cutoff"]),"Extreme Responses",NA)) %>% mutate(.,df = "OG_data") # obtain the fit and best fitting/hitcalling information fit <- tcplfit2::tcplfit2_core(conc = conc,resp = resp, cutoff = DATA_CASE[,"cutoff"]) hit <- tcplfit2::tcplhit2_core(params = fit, conc = conc,resp = resp, cutoff = DATA_CASE[,"cutoff"], onesd = DATA_CASE[,"onesd"]) # obtain the continuous curve from fit information XC <- seq(from = min(conc),to = max(conc),length.out = 100) YC <- tcplfit2::exp4(x = XC,ps = unlist(fit$exp4[fit$exp4$pars])) # set up a continuous curve dataset cont_fit <- # best fit data.frame(xval = XC,yval = YC,type = "Best Fit") %>% # constant (flat) fit rbind.data.frame(data.frame(xval = XC,yval = rep(0,length(XC)),type = "Constant Fit")) ## prop weight 3 - continuous curve dataset addition ## # set up temporary data needed for re-scaling plot tmp_cutoff <- DATA_CASE[,"cutoff"] # cutoff value tmp_top <- fit$exp4$top # maximal predicted response from baseline tmp_ps <- unlist(fit$exp4[fit$exp4$pars]) # model parameters # code from toplikelihood.R lines 51-56 for the "exp4" model if (tmp_top == tmp_ps[1]) { # check if the top and tp are the same tmp_ps[1] = tmp_cutoff } else { x_top = acy(y = tmp_top, modpars = list(tp=tmp_ps[1],ga=tmp_ps[2],er=tmp_ps[3]),type="exp4") tmp_ps[1] = tmp_cutoff/( 1 - 2^(-x_top/tmp_ps[2])) } # obtain the rescaled predicted response YC_rescale <- tcplfit2::exp4(x = XC,ps = tmp_ps) # add the continuous rescaled curve to the continuous curve dataset cont_fit <- 
rbind.data.frame( cont_fit, data.frame(xval = XC,yval = YC_rescale,type = "Rescaled Best Fit") ) %>% mutate(.,df = "cont_fit") # dataset with reference lines (e.g. cutoff, bmr, top, etc.) ref_df <- data.frame( xval = rep(0,6), yval = c(hit$cutoff*c(-1,1), hit$bmr*c(-1,1), fit$exp4$top, hit$cutoff), type = c(rep("Cutoff",2),rep("BMR",2),"Top","Top at Cutoff") ) %>% mutate(.,df = "ref_df") ## plotting dataframe combined plot_highlight_df <- rbind.data.frame(OG_data,cont_fit,ref_df) #### Generate Plots #### ## Generate a Base Plot for the Concentration-Response ## base_plot <- ggplot2::ggplot()+ geom_point(data = dplyr::filter(plot_highlight_df,df == "OG_data"), aes(x = log10(xval),y = yval))+ geom_line(data = dplyr::filter(plot_highlight_df,df == "cont_fit" & type == "Best Fit"), aes(x = log10(xval),y = yval))+ geom_hline(data = dplyr::filter(plot_highlight_df,df == "ref_df" & type %in% c("Cutoff","BMR")), aes(yintercept = yval,linetype = type,colour = type))+ ggplot2::ylim(c(-1,1))+ scale_colour_manual(breaks = c("Cutoff","BMR"),values = rep("black",2))+ scale_linetype_manual(breaks = c("Cutoff","BMR"),values = c("dashed","dotted"))+ theme_bw()+ theme(axis.title.x = element_blank(),axis.title.y = element_blank()) ## Proportional Weight 1 Plot ## p1_plot <- base_plot+ # add a title for the subplot ggplot2::ggtitle("p1",subtitle = "AIC Weight")+ # add the constant (reference) and winning model (comparison) - highlighted geom_line(data = dplyr::filter(plot_highlight_df,df == "cont_fit" & type != "Rescaled Best Fit"), aes(x = log10(xval),y = yval,colour = type,linetype = type))+ scale_colour_manual(name = "", breaks = c("Constant Fit","Best Fit","Cutoff","BMR"), values = c("blue","red",rep("black",2)))+ scale_linetype_manual(name = "", breaks = c("Constant Fit","Best Fit","Cutoff","BMR"), values = c("solid","solid","dashed","dotted"))+ theme(legend.position = "inside", legend.position.inside = c(0.5,0.15), legend.key.size = unit(0.5,"cm"), legend.text = 
element_text(size = 7), legend.title = element_blank(), legend.background = element_rect(fill = alpha("lemonchiffon",0.5))) ## Proportional Weight 2 Plot ## p2_plot <- base_plot+ # add a title for the subplot ggplot2::ggtitle("p2",subtitle = "Responses Outside Cutoff")+ # add the concentrations with median responses outside the cutoff band - highlighted geom_point(data = dplyr::filter(plot_highlight_df,df == "OG_data" & type == "Extreme Responses"), aes(x = log10(xval),y = yval,shape = type),col = "red")+ # add the cutoff band - highlighted geom_hline(data = dplyr::filter(plot_highlight_df,df == "ref_df" & type %in% c("Cutoff","BMR")), aes(yintercept = yval,linetype = type,colour = type))+ scale_colour_manual(name = "", breaks = c("Cutoff","BMR"), values = c("blue","black"))+ scale_linetype_manual(name = "", breaks = c("Cutoff","BMR"), values = c("dashed","dotted"))+ scale_shape(name = "")+ theme(legend.position = "inside", legend.position.inside = c(0.5,0.15), legend.key.size = unit(0.5,"cm"), legend.spacing.y = unit(-4,"lines"), legend.text = element_text(size = 7), legend.title = element_blank(), legend.background = element_rect(fill = alpha("lemonchiffon",0.5))) ## Proportional Weight 3 Plot ## p3_plot <- base_plot+ # add a title for the subplot ggplot2::ggtitle("p3",subtitle = "Top Likelihood Ratio")+ # add the original predicted curve & the re-scaled predicted curve - highlighted ggplot2::geom_line(data = dplyr::filter(plot_highlight_df,df == "cont_fit" & type != "Constant Fit"), aes(x = log10(xval),y = yval,colour = type,linetype = type))+ # add the 'top' (maximal predicted change in response from baseline) & the cutoff band - highlighted ggplot2::geom_hline(data = dplyr::filter(plot_highlight_df,df == "ref_df"), aes(yintercept = yval,colour = type,linetype = type))+ scale_linetype_manual(name = "", breaks = c("Best Fit","Rescaled Best Fit","Cutoff","BMR","Top","Top at Cutoff"), values = c(rep("solid",2),"dashed","dotted",rep("dashed",2)))+ 
  scale_colour_manual(name = "",
                      breaks = c("Best Fit","Rescaled Best Fit","Cutoff","BMR","Top","Top at Cutoff"),
                      values = c("blue","red",rep("black",2),"skyblue","hotpink"))+
  theme(legend.position = "inside",
        legend.position.inside = c(0.5,0.175),
        legend.key.size = unit(0.5,"cm"),
        legend.text = element_text(size = 7),
        legend.title = element_blank(),
        legend.background = element_rect(fill = alpha("lemonchiffon",0.5)))
## All Plots ##
grid.arrange(p1_plot,p2_plot,p3_plot,
             ncol = 3,
             top = paste(DATA_CASE[,"signature"],DATA_CASE[,"dtxsid"],sep = "\n"),
             left = "response",
             bottom = paste("log10(conc)",
                            paste(paste("hitc:",signif(hit[,"hitcall"],3)),
                                  paste("log10(bmd):",signif(log10(hit[,"bmd"]),3)),sep = ", "),
                            sep = "\n")
)
```

***Figure 4:** Each sub-plot displays the winning curve for a given concentration-response series in the `signatures` dataset. The sub-plots highlight the key items compared as part of a proportional weight calculation to provide an indication of bioactivity.*

Note that hitcall values are not normally distributed; rather, they tend towards 0 or 1. Hitcall values close to 1 indicate concentration-response series with biological activity in the measured response (i.e. ‘active’ hit).

## Consideration: Continuous Hitcalls to Activity Calls {#ex3_hitc}

Users may consider binarizing the continuous hitcall values into active or inactive designations, setting the activity threshold based on the level of stringency they require. Currently, ToxCast requires a `hitc` value greater than or equal to 0.90 for the response to be labeled as active, and anything less is considered inactive. For further details on the activity threshold used in ToxCast we refer readers to the [`tcpl` Vignette on CRAN](https://cran.R-project.org/package=tcpl) and Nyffeler et al., 2023.

As previously mentioned, the output of `tcplfit2_core`, i.e. Level 4 data from invitrodb, may be fed directly to the `tcplhit2_core` function.
The results are then pivoted wide, and the resulting data table is displayed below. *The hidden code chunk below demonstrates performing hitcalling on the fitting results from [Concentration-Response Modeling for `tcpl`-like data without a database connection](#ex3) and setting a binary hitcall (`hitb`), where 0 indicates an inactive response and 1 indicates an active response.* ```{r example3_hitcalling} #do tcplfit2 hitcalling myfun2 <- function(y) { res <- tcplfit2::tcplhit2_core(params = y$params[[1]], conc = y$conc, resp = y$resp, cutoff = 3*bmad, onesd = onesd ) list(list(res)) } # continue with hitcalling res <- subdat[wllt == 't', myfun2(.SD), by = .(spid)] # pivot wider res_wide <- rbindlist(Map(cbind, spid = res$spid, res$V1)) # add a binary hitcall column to the data res_wide[,hitb := ifelse(hitcall >= 0.9,1,0)] ``` ```{r, echo=FALSE} htmlTable::htmlTable(head(res_wide), align = 'l', align.header = 'l', rnames = FALSE , css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ') ``` *Please note, hitcalling can also be done with the full data set, `dat`, but here we only demonstrate hitcalling with the example data subset model fitting was performed on in [Concentration-Response Modeling for `tcpl`-like data without a database connection](#ex3).* The resulting output from the previous code chunk is the same format as the `result_table` table in [Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core`](#ex2). Thus, one can use the `concRespPlot2` function, as done previously to plot the results. The next code chunk demonstrates how to visualize the [Concentration-Response Modeling for `tcpl`-like data without a database connection](#ex3) fit results. 
```{r example3_plot, fig.height = 8, fig.width = 7} # allocate a place-holder object plt_list <- NULL # plot results using `concRespPlot` for(i in 1:nrow(res_wide)){ plt_list[[i]] <- concRespPlot2(res_wide[i,]) } # compile and display winning model plots for concentration-response series grid.arrange(grobs=plt_list,ncol=2) ``` ***Figure 5:** Each sub-plot displays the winning curve for a given concentration-response series in the `subdat` dataset.* # Bounding the Benchmark Dose (BMD) Occasionally, the estimated benchmark dose (BMD) can occur outside the experimental concentration range, e.g. the BMD may be greater than the maximum tested concentration in the data. In these cases, `tcplhit2_core` and `concRespCore` provide options for users to "bound" the estimated BMD. This can be done using the `bmd_low_bnd` and `bmd_up_bnd` arguments. `bmd_low_bnd` and `bmd_up_bnd` are multipliers applied to the minimum or maximum tested concentrations (i.e. reference doses), respectively, to provide lower and upper boundaries for BMD estimates. This section demonstrates how to "bound" BMD estimates using the provided arguments in the `concRespCore` and `tcplhit2_core` functions, thereby preventing extreme BMD estimates far outside of the concentration range screened. ## Imposing Lower BMD Bounds {#boundinglowerbound} First, consider a situation when the estimated BMD is less than the lowest tested concentration. This occurs when the experimental concentrations do not go low enough to capture the transition between the baseline response and the minimum response considered adverse occurring around the benchmark response (BMR). Failure to capture the response behavior in the low-dose region of the experimental design may indicate the data is not suitable for estimating a reliable point-of-departure, and should be flagged. In the following code chunk, we use the `mc3` dataset with some minor modifications to demonstrate this case. 
Here, we take one of the concentration-response series and remove dose groups less than $0.41$. Removing the lower dose groups simulates the scenario where there is a lack of data in the low-dose region and causes the BMD estimate to be less than the lowest concentration remaining in the data. ```{r ex4_lower,warning=FALSE} # We'll use data from mc3 in this section data("mc3") # determine the background variation # background is defined per the assay. In this case we use logc <= -2 # However, background should be defined in a way that makes sense for your application temp <- mc3[mc3$logc<= -2,"resp"] bmad <- mad(temp) onesd <- sd(temp) cutoff <- 3*bmad # load example data spid <- unique(mc3$spid)[94] ex_df <- mc3[is.element(mc3$spid,spid),] # The data file has stored concentration in log10 form, fix it conc <- 10^ex_df$logc # back-transforming concentrations on log10 scale resp <- ex_df$resp # modify the data for demonstration purposes conc2 <- conc[conc>0.41] resp2 <- resp[which(conc>0.41)] # pull out all of the chemical identifiers and the name of the assay dtxsid <- ex_df[1,"dtxsid"] casrn <- ex_df[1,"casrn"] name <- ex_df[1,"name"] assay <- ex_df[1,"assay"] # create the row object row_low <- list(conc = conc2, resp = resp2, bmed = 0, cutoff = cutoff, onesd = onesd, assay=assay, dtxsid=dtxsid,casrn=casrn,name=name) # run the concentration-response modeling for a single sample res_low <- concRespCore(row_low,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", "pow", "exp2", "exp3", "exp4", "exp5"), bidirectional=F) # plotting the results min_conc <- min(conc2) concRespPlot2(res_low, log_conc = T) + geom_vline(aes(xintercept = log10(min_conc)),lty = "dashed")+ geom_rect(aes(xmin = log10(res_low[1, "bmdl"]), xmax = log10(res_low[1, "bmdu"]),ymin = 0,ymax = 30), alpha = 0.05,fill = "skyblue") + geom_segment(aes(x = log10(res_low[, "bmd"]), xend = log10(res_low[, "bmd"]), y = 0, yend = 30),col = "blue")+ ggtitle(label = paste(name,"-",assay),subtitle = dtxsid) 
``` ***Figure 6:** This plot shows the winning curve, the lowest experimental concentration (represented by the dashed line), BMD estimation (represented by the solid blue line), and the estimated BMD confidence interval (represented by the light blue bar).* ```{r ex4_lower-res} # function results res_low['Min. Conc.'] <- min(conc2) res_low['Name'] <- name res_low[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(res_low[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3) ``` ```{r ex4_table, echo=FALSE} DT::datatable(res_low[1, c("Name","Min. Conc.", "bmd", "bmdl", "bmdu")],rownames = FALSE) ``` The lowest tested concentration in the data is `r min(conc2)` but the estimated BMD from the hitcalling results is `r round(res_low$bmd, 3)`, which is lower. Users may allow the estimated BMD to be lower than the lowest concentration screened while restricting it to be no lower than a boundary set by using the argument `bmd_low_bnd`. Suppose the BMD should be no lower than 80% of the lowest tested concentration, then `bmd_low_bnd = 0.8` can be used to set this boundary. For this example, this results in a computed boundary of `r 0.8*min(conc2)`. The valid input range for `bmd_low_bnd` is between 0 and 1, excluding 0, ($0 < \text{bmd_low_bnd} \leq 1$). If `bmd_low_bnd` is set to 1, that makes the lowest experimental concentration the lower threshold value. ```{r ex4_lower-demo,class.source="fold-show"} # using the argument to set a lower bound for BMD res_low2 <- concRespCore(row_low,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", "pow", "exp2", "exp3", "exp4", "exp5"), bidirectional=F, bmd_low_bnd = 0.8) ``` If the estimated BMD is less than the computed boundary (like in this example), it will be "bounded" to the threshold set in `bmd_low_bnd`. Similarly, the confidence interval will also be shifted right by a distance equal to the difference between the estimated BMD and the computed boundary. 
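The bounding arithmetic described in this section can be summarized in a short sketch. The helper `demo_bound_bmd()` below is hypothetical and is not the internal `tcplfit2` implementation; it only mirrors the behavior described in the text, namely that a BMD outside the computed boundary is moved to that boundary and its confidence interval is shifted by the same distance. It accepts both multipliers introduced at the start of this section, and all input values are assumed for illustration. Handling of missing values is deliberately left out.

```{r demo_bound_bmd, class.source="fold-show"}
# hypothetical helper illustrating the bounding arithmetic (not a tcplfit2 function)
demo_bound_bmd <- function(bmd, bmdl, bmdu, conc,
                           bmd_low_bnd = NULL, bmd_up_bnd = NULL) {
  shift <- 0
  if (!is.null(bmd_low_bnd)) {
    stopifnot(bmd_low_bnd > 0, bmd_low_bnd <= 1)
    low_boundary <- bmd_low_bnd * min(conc)
    # BMD below the lower boundary: shift right onto the boundary
    if (bmd < low_boundary) shift <- low_boundary - bmd
  }
  if (!is.null(bmd_up_bnd)) {
    stopifnot(bmd_up_bnd >= 1)
    up_boundary <- bmd_up_bnd * max(conc)
    # BMD above the upper boundary: shift left onto the boundary
    if (bmd > up_boundary) shift <- up_boundary - bmd
  }
  # the confidence limits move by the same distance as the BMD
  c(bmd = bmd + shift, bmdl = bmdl + shift, bmdu = bmdu + shift)
}

demo_conc_range <- c(0.5, 1, 5, 10)  # assumed tested concentrations

# BMD below 0.8 * 0.5 = 0.4, so it is moved up to 0.4
demo_low <- demo_bound_bmd(bmd = 0.2, bmdl = 0.1, bmdu = 0.3,
                           conc = demo_conc_range, bmd_low_bnd = 0.8)

# BMD between the boundary (0.4) and the lowest dose (0.5): left unchanged
demo_same <- demo_bound_bmd(bmd = 0.45, bmdl = 0.35, bmdu = 0.55,
                            conc = demo_conc_range, bmd_low_bnd = 0.8)

# BMD above 2 * 10 = 20, so it is moved down to 20
demo_up <- demo_bound_bmd(bmd = 35, bmdl = 30, bmdu = 40,
                          conc = demo_conc_range, bmd_up_bnd = 2)

rbind(demo_low, demo_same, demo_up)
```

In this sketch the width of the confidence interval is preserved in every case; only its location changes when bounding is triggered.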
The following data table provides the numerical adjustments after bounding is applied based on the lower bound threshold.

```{r ex4_lower-res-bnd}
# print out the new results
# include previous results side by side for comparison
res_low2['Min. Conc.'] <- min(conc2)
res_low2['Name'] <- paste(name, "after `bounding`", sep = "-")
res_low['Name'] <- paste(name, "before `bounding`", sep = "-")
res_low2[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(res_low2[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3)
output_low <- rbind(res_low[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")],
                    res_low2[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")])
```

```{r example_4_lower_res_table, echo = FALSE}
DT::datatable(output_low,rownames = FALSE)
```

The plot below provides a visual representation of the BMD before and after applying the lower boundary.

```{r ex4_lower-plot-bnd, class.source="scroll-100"}
# generate some concentrations for the fitted curve
logc_plot <- seq(from=-3,to=2,by=0.05)
conc_plot <- 10^logc_plot
# initiate the plot
plot(conc2,resp2,xlab="conc (uM)",ylab="Response",xlim=c(0.001,100),ylim=c(-5,60),
     log="x",main=paste(name,"\n",assay),cex.main=0.9)
# add vertical lines to mark the minimum concentration in the data and the lower threshold set by bmd_low_bnd
abline(v=min(conc2), lty = 1, col = "brown", lwd = 2)
abline(v=res_low2$bmd, lty = 2, col = "darkviolet", lwd = 2)
# add markers for BMD and its boundaries before `bounding`
lines(c(res_low$bmd,res_low$bmd),c(0,50),col="green",lwd=2)
rect(xleft=res_low$bmdl,ybottom=0,xright=res_low$bmdu,ytop=50,col=rgb(0,1,0, alpha = .5), border = NA)
points(res_low$bmd, -0.5, pch = "x", col = "green")
# add markers for BMD and its boundaries after `bounding`
lines(c(res_low2$bmd,res_low2$bmd),c(0,50),col="blue",lwd=2)
rect(xleft=res_low2$bmdl,ybottom=0,xright=res_low2$bmdu,ytop=50,col=rgb(0,0,1, alpha = .5), border = NA)
points(res_low2$bmd, -0.5, pch = "x", col = "blue")
# add the fitted curve
lines(conc_plot, exp4(ps =
                        c(res_low$tp, res_low$ga), conc_plot))
legend(1e-3, 60, legend=c("Lowest Dose Tested", "Boundary", "BMD-before", "BMD-after"),
       col=c("brown", "darkviolet", "green", "blue"), lty=c(1,2,1,1))
```

***Figure 7:** This plot shows the estimated BMD and confidence interval before and after "bounding." The solid green line and "X" mark the estimated BMD before "bounding," and the green shaded region represents the estimated confidence interval. The solid blue line and "X" mark the BMD after "bounding," and the blue shaded region represents the "bounded" confidence interval. The solid brown line represents the minimum tested concentration, and the dashed dark violet line represents the boundary dose set by `bmd_low_bnd`. Here, the estimated BMD and the confidence interval were shifted right such that the BMD was "bounded" to the boundary value, as shown by the overlap between the blue "X" and the dashed dark violet line.*

## Imposing Upper BMD Bounds

Next, let us consider a situation where the estimated BMD is much larger than the maximum tested concentration. This occurs when the experimental concentrations are too low to capture the transition between the baseline response and the minimum response considered adverse, which occurs around the benchmark response (BMR). In these situations, the chemical is likely inert or is active only at very high doses, and should be flagged appropriately.

In the following code chunk, we use an example from the `mc3` dataset to demonstrate this case.
```{r ex5_upper,warning=FALSE} # load example data spid <- unique(mc3$spid)[26] ex_df <- mc3[is.element(mc3$spid,spid),] # The data file has stored concentration in log10 form, so fix that conc <- 10^ex_df$logc # back-transforming concentrations on log10 scale resp <- ex_df$resp # pull out all of the chemical identifiers and the name of the assay dtxsid <- ex_df[1,"dtxsid"] casrn <- ex_df[1,"casrn"] name <- ex_df[1,"name"] assay <- ex_df[1,"assay"] # create the row object row_up <- list(conc = conc, resp = resp, bmed = 0, cutoff = cutoff, onesd = onesd,assay=assay, dtxsid=dtxsid,casrn=casrn,name=name) # run the concentration-response modeling for a single sample res_up <- concRespCore(row_up,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", "pow", "exp2", "exp3", "exp4", "exp5"), bidirectional=F) # plotting the results max_conc <- max(conc) concRespPlot2(res_up, log_conc = T) + # geom_vline(aes(xintercept = max(log10(conc))),lty = "dashed")+ geom_vline(aes(xintercept = log10(max_conc)),lty = "dashed")+ geom_rect(aes(xmin = log10(res_up[1, "bmdl"]), xmax = log10(res_up[1, "bmdu"]),ymin = 0,ymax = 125), alpha = 0.05,fill = "skyblue") + geom_segment(aes(x = log10(res_up[, "bmd"]), xend = log10(res_up[, "bmd"]), y = 0, yend = 125),col = "blue")+ ggtitle(label = paste(name,"-",assay),subtitle = dtxsid) ``` ```{r ex5_upper-res} # max conc res_up['Max Conc.'] <- max(conc) res_up['Name'] <- name res_up[1, c("Max Conc.", "bmd", "bmdl", "bmdu")] <- round(res_up[1, c("Max Conc.", "bmd", "bmdl", "bmdu")], 3) # function results ``` ```{r example_5_table, echo = FALSE} DT::datatable(res_up[1, c('Name','Max Conc.', "bmd", "bmdl", "bmdu")],rownames = FALSE) ``` The estimated BMD, `r round(res_up$bmd, 3)`, is greater than the maximum tested concentration, which is `r max(conc)`. As with the `bmd_low_bnd`, users may allow the BMD to be greater than the maximum tested concentration but no greater than a boundary dose set using `bmd_up_bnd`. 
Suppose the estimated BMD should be no larger than 2 times the maximum tested concentration. Here, setting `bmd_up_bnd = 2` places the upper boundary dose at `r 2*max(conc)`. If the estimated BMD is greater than the upper boundary (like in this example), it will be "bounded" to this dose, and its confidence interval will be shifted left. The valid input range for `bmd_up_bnd` is any value greater than or equal to 1 ($\text{bmd_up_bnd} \geq 1$). If `bmd_up_bnd` is set to 1, the highest experimental concentration becomes the upper threshold value.

```{r ex5_upper-demo,class.source="fold-show"}
# using bmd_up_bnd = 2
res_up2 <- concRespCore(row_up,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2",
                                             "pow", "exp2", "exp3", "exp4", "exp5"),
                        bidirectional=FALSE, bmd_up_bnd = 2)
```

Similar to the `bmd_low_bnd` bounding approach, if the estimated BMD is greater than the computed boundary (like in this example), it will be "bounded" to the boundary computed from `bmd_up_bnd`. As before, the confidence interval will also be shifted to the left by a distance equal to the difference between the estimated BMD and the computed boundary.

The following data table provides the numerical adjustments after bounding is applied based on the upper bound threshold.

```{r ex5_upper-bnd}
# print out the new results
# include previous results side by side for comparison
res_up2['Max Conc.'] <- max(conc)
res_up2['Name'] <- paste(name, "after `bounding`", sep = "-")
res_up['Name'] <- paste(name, "before `bounding`", sep = "-")
res_up2[1, c("Max Conc.", "bmd", "bmdl", "bmdu")] <- round(res_up2[1, c("Max Conc.", "bmd", "bmdl", "bmdu")], 3)
output_up <- rbind(res_up[1, c('Name', "Max Conc.", "bmd", "bmdl", "bmdu")],
                   res_up2[1, c('Name', "Max Conc.", "bmd", "bmdl", "bmdu")])
```

```{r example_upper_2_table, echo = FALSE}
DT::datatable(output_up,rownames = FALSE)
```

The plot below provides a visual representation of the BMD before and after applying the upper boundary.
```{r ex5_upper-bnd-plot, class.source="scroll-100"} # generate some concentration for the fitting curve logc_plot <- seq(from=-3,to=2,by=0.05) conc_plot <- 10^logc_plot # initiate plot plot(conc,resp,xlab="conc (uM)",ylab="Response",xlim=c(0.001,500),ylim=c(-5,150), log="x",main=paste(name,"\n",assay),cex.main=0.9) # add vertical lines to mark the maximum concentration in the data and the upper boundary set by bmd_up_bnd abline(v=max(conc), lty = 1, col = "brown", lwd=2) abline(v=160, lty = 2, col = "darkviolet", lwd=2) # add marker for BMD and its boundaries before `bounding` lines(c(res_up$bmd,res_up$bmd),c(0,125),col="green",lwd=2) rect(xleft=res_up$bmdl,ybottom=0,xright=res_up$bmdu,ytop=125,col=rgb(0,1,0, alpha = .5), border = NA) points(res_up$bmd, -0.5, pch = "x", col = "green") # add marker for BMD and its boundaries after `bounding` lines(c(res_up2$bmd,res_up2$bmd),c(0,125),col="blue",lwd=2) rect(xleft=res_up2$bmdl,ybottom=0,xright=res_up2$bmdu,ytop=125,col=rgb(0,0,1, alpha = .5), border = NA) points(res_up2$bmd, -0.5, pch = "x", col = "blue") # add the fitting curve lines(conc_plot, poly1(ps = c(res_up$a), conc_plot)) legend(1e-3, 150, legend=c("Maximum Dose Tested", "Boundary", "BMD-before", "BMD-after"), col=c("brown", "darkviolet", "green", "blue"), lty=c(1,2,1,1)) ``` ***Figure 8:** This plot shows the estimated BMD and confidence interval before and after "bounding". The green line and "X" mark the estimated BMD before "bounding" and the green shaded region represents the estimated confidence interval. The solid blue line and "X" mark the "bounded" BMD, and the blue shaded region represents the "bounded" confidence interval. The solid brown line represents the maximum tested concentration, and the dashed dark violet line represents the boundary dose set by `bmd_up_bnd`. 
Here, the estimated BMD and the confidence interval were shifted left such that the BMD was "bounded" to the boundary value, as shown by the overlap between the blue "X" and the dashed dark violet line.*

## Bounding BMDs with `tcplhit2_core`

The previous two BMD bounding examples use the `concRespCore` function. However, the `bmd_low_bnd` and `bmd_up_bnd` arguments originate from the `tcplhit2_core` function, which is called within `concRespCore`. Thus, users who perform dose-response modeling and hitcalling with `tcplfit2_core` and `tcplhit2_core` separately can apply the same BMD "bounding." Regardless of whether the `bmd_low_bnd` and `bmd_up_bnd` arguments are supplied to `concRespCore` or to `tcplhit2_core`, the results should be identical.

The code provided below shows how to replicate the results from the [lower bound example](#boundinglowerbound) using `tcplhit2_core` as an alternative.

```{r ex6_hitcore,class.source="fold-show"}
# using the same data, fit curves
param <- tcplfit2_core(conc2, resp2, cutoff = cutoff)
hit_res <- tcplhit2_core(param, conc2, resp2, cutoff = cutoff, onesd = onesd,
                         bmd_low_bnd = 0.8)
```

The following data table provides the numerical adjustments after bounding is applied, here in the lower bound direction.

```{r ex6_hitcore-res}
# adding the result from tcplhit2_core to the output table for comparison
hit_res["Name"]<- paste("Chlorothalonil", "tcplhit2_core", sep = "-")
hit_res['Min. Conc.'] <- min(conc2)
hit_res[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(hit_res[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3)
output_low <- rbind(output_low,
                    hit_res[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")])
```

```{r ex6_res-hit_table, echo = FALSE}
DT::datatable(output_low,rownames = FALSE)
```

## Impacts if BMD is between the BMD Lower Bound and Lowest Dose Tested

If the estimated BMD falls between the lowest dose tested and the defined threshold for an acceptable BMD, i.e.
lowest tested dose and lower boundary dose, the estimated BMD will remain unchanged. For demonstration purposes, the lower bound example is used, but the same principle applies to the upper bound case.

The same data from the [lower bound example](#boundinglowerbound) is used along with a smaller `bmd_low_bnd` value to obtain a lower boundary dose. Here, the estimated BMD is acceptable as long as it is no less than 40% (two-fifths) of the minimum tested concentration. The estimated BMD is `r res_low$bmd`, which is between the lowest tested dose, `r min(conc2)`, and the new computed boundary, `r 0.4*min(conc2)`. Thus, the BMD estimate and its confidence interval will remain unchanged.

```{r ex7_lower-bnd,class.source="fold-show"}
res_low3 <- concRespCore(row_low,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2",
                                               "pow", "exp2", "exp3", "exp4", "exp5"),
                         conthits = TRUE, aicc = FALSE, bidirectional=FALSE, bmd_low_bnd = 0.4)
```

The following data table provides the results after applying bounding based on the lower bound threshold.

```{r ex7_lower-bnd-res}
# print out the new results
# add to previous results for comparison
res_low3['Min. Conc.'] <- min(conc2)
res_low3['Name'] <- paste("Chlorothalonil", "after `bounding` (two fifths)", sep = "-")
res_low3[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(res_low3[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3)
output_low <- rbind(output_low[-3, ],
                    res_low3[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")])
```

```{r ex7_lower-bnd-res-table, echo = FALSE}
DT::datatable(output_low,rownames = FALSE)
```

The plot below provides a visual representation of the BMD before and after applying the lower boundary.
```{r ex7_lower-bnd-plot, class.source="scroll-100"} # initiate the plot plot(conc2,resp2,xlab="conc (uM)",ylab="Response",xlim=c(0.001,100),ylim=c(-5,60), log="x",main=paste(name,"\n",assay),cex.main=0.9) # add vertical lines to mark the minimum concentration in the data and the lower boundary set by bmd_low_bnd abline(v=min(conc2), lty = 1, col = "brown", lwd = 2) abline(v=0.4*min(conc2), lty = 2, col = "darkviolet", lwd = 2) # add markers for BMD and its boundaries before `bounding` lines(c(res_low$bmd,res_low$bmd),c(0,50),col="green",lwd=2) rect(xleft=res_low$bmdl,ybottom=0,xright=res_low$bmdu,ytop=50,col=rgb(0,1,0, alpha = .5), border = NA) points(res_low$bmd, 0, pch = "x", col = "green") # add markers for BMD and its boundaries after `bounding` lines(c(res_low3$bmd,res_low3$bmd),c(0,50),col="blue",lwd=2) rect(xleft=res_low3$bmdl,ybottom=0,xright=res_low3$bmdu,ytop=50,col=rgb(0,0,1, alpha = .5), border = NA) points(res_low3$bmd, 0, pch = "x", col = "blue") # add the fitted curve lines(conc_plot, exp4(ps = c(res_low$tp, res_low$ga), conc_plot)) legend(1e-3, 60, legend=c("Lowest Dose Tested", "Boundary Dose", "BMD-before", "BMD-after"), col=c("brown", "darkviolet", "green", "blue"), lty=c(1,2,1,1)) ``` ***Figure 9:** This plot shows the estimated BMD and the confidence interval before and after "bounding". The dashed dark violet line represents the boundary dose and the solid brown line represents the minimum tested concentration, which are at `r 0.4*min(conc2)` and `r min(conc2)`, respectively. The estimated BMD of `r res_low3[, "bmd"]` falls between the boundary and lowest dose tested, which leaves the BMD and confidence intervals unchanged. Here, the estimated BMD and "bounded" BMD are the same. 
Thus, the green and blue lines and "X"s representing the estimated BMD before and after "bounding", respectively, as well as their confidence intervals indicated by the shaded regions, completely overlap.*

# Plotting {#plotting}

[Concentration-Response Modeling for a Single Series with `concRespCore`](#ex1) and [for Multiple Series with `tcplfit2_core` and `tcplhit2_core`](#ex2) illustrated two plotting functions available in `tcplfit2` that are based on the `ggplot2` plotting grammar. This section shows two other plotting options available in `tcplfit2`, both of which use base R plotting: the `do.plot` argument in `concRespCore` and the `concRespPlot` function.

For this section of the vignette, we use the `signatures` dataset from `tcplfit2` to demonstrate the utility of the plotting functions; see [High-Throughput Transcriptomics Platform for Screening Environmental Chemicals](https://doi.org/10.1093/toxsci/kfab009) for further details. The `signatures` dataset contains 6 transcriptional signatures for one chemical. Each row in the data is treated as a chemical-assay endpoint pair and provides the experimental concentration-response data, along with the cutoff and baseline standard deviation.

## Plotting All Models with `concRespCore` and `concRespPlot`

The `do.plot` argument in `concRespCore` and the `concRespPlot` function provide plots similar to Figures 1 and 2, respectively. The `do.plot` argument returns a plot of all curve fits for a chemical, and `concRespPlot` returns a plot of the winning curve with the hitcalling results.
```{r appendix_plt1, fig.height = 6, fig.width = 7, warning = FALSE} # read in the file data("signatures") # set up a 3 x 2 grid for the plots oldpar <- par(no.readonly = TRUE) on.exit(par(oldpar)) par(mfrow=c(3,2),mar=c(4,4,5,2)) # fit 6 observations in signatures for(i in 1:nrow(signatures)){ # set up input data row = list(conc=as.numeric(str_split(signatures[i,"conc"],"\\|")[[1]]), resp=as.numeric(str_split(signatures[i,"resp"],"\\|")[[1]]), bmed=0, cutoff=signatures[i,"cutoff"], onesd=signatures[i,"onesd"], name=signatures[i,"name"], assay=signatures[i,"signature"]) # run concentration-response modeling (1st plotting option) out = concRespCore(row,conthits=F,do.plot=T) if(i==1){ res <- out }else{ res <- rbind.data.frame(res,out) } } ``` ***Figure 10:** This figure provides several example plots generated using the argument `do.plot=TRUE` in the `concRespCore` function. Each plot displays data for a single row of data in the `signatures` dataset, and like Figure 1 provides all model fits for a given response. Note, the detail of smooth curves is not captured here as the curves only show the predicted responses at the provided experimental concentrations.* ```{r appendix plt2, fig.height = 8, fig.width = 7} # set up a 3 x 2 grid for the plots oldpar <- par(no.readonly = TRUE) on.exit(par(oldpar)) par(mfrow=c(3,2),mar=c(4,4,2,2)) # plot results using `concRespPlot` for(i in 1:nrow(res)){ concRespPlot(res[i,],ymin=-1,ymax=1) } ``` ***Figure 11:** Each figure shows curve fitting results for a set of responses in the `signatures` data. Each plot title contains the chemical name and assay ID. Additionally, summary statistics from the curve fitting results – including the winning model, AC50, top, BMD, ACC, and hitcall – are displayed at the top of the plot. The black dots represent the observed responses, and the winning model fit is displayed as a solid black curve. 
The estimated BMD is displayed with a solid green vertical line, and the confidence interval around the BMD is represented with solid green lines bounding the green shaded region (i.e., lower and upper BMD confidence limits - BMDL and BMDU, respectively). The black horizontal lines bounding the grey shaded region indicate the estimated baseline noise (per the user-defined cutoff band) and are centered around the x-axis (i.e. y = 0).*

## Plotting All Models with `tcplfit2_core` Output

While most users prefer to fit and hitcall all of their data in one step with `concRespCore`, some users (as mentioned in earlier sections) may prefer to perform curve fitting with `tcplfit2_core` and then hitcalling with `tcplhit2_core`. In this case, users may want to examine and compare each of the resulting concentration-response fits from all models included in the fitting step. The `plot_allcurves` function enables users to generate this visualization automatically from the output of the `tcplfit2_core` function.

*Note, to utilize `plot_allcurves`, `tcplfit2_core` must be run separately to obtain the necessary input.*

The resulting figure allows one to evaluate the general behavior and quality of the resulting curve fits. Furthermore, some curves may fail to fit the observed data. In these cases, failed models are excluded from the plot and a warning message is provided, so that the user knows which models reasonably describe the data. Lastly, if a user wants to visualize their data with the concentrations on the $\mathbf{log_{10}}$ scale, they can set the `log_conc` argument to `TRUE`.

*The hidden code chunk below shows how to load the data and obtain the curve fitting results with `tcplfit2_core`.
We also refer readers to the [Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core`](#ex2) section if they are interested in further details.* ```{r} # Load the example data set data("signatures") # using the first row of signature as an example conc <- as.numeric(str_split(signatures[1,"conc"],"\\|")[[1]]) resp <- as.numeric(str_split(signatures[1,"resp"],"\\|")[[1]]) cutoff <- signatures[1,"cutoff"] # run curve fitting output <- tcplfit2_core(conc, resp, cutoff) # show the structure of the output summary(output) ``` The following code demonstrates utilizing the curve fitting results from `tcplfit2_core` with the `plot_allcurves` function to generate the visualization containing all included model fits: ```{r class.source="fold-show",fig.height=8,fig.width=7} # get plots in the original and in log-10 concentration scale basic <- plot_allcurves(output, conc, resp) basic_log <- plot_allcurves(output, conc, resp, log_conc = T) # arrange the ggplot2 output into a grid grid.arrange(basic, basic_log) ``` ***Figure 12:** Example plots generated by `plot_allcurves`. Both plots display the experimental data (open circles) with all successful curve fits. Concentrations are in the original and $\mathbf{log_{10}}$ scale for the top and bottom plots, respectively.* ## Plotting the Winning Model with `concRespPlot2` Most users utilizing the `tcplfit2` package are only interested in generating a plot displaying the observed concentration-response data with the winning curve. This can be achieved with the `concRespPlot2` function, which generates a basic plot with minimal information. `concRespPlot2` gives a slightly more aesthetic plot compared to the basic plotting functionality in `concRespPlot` by using the `ggplot2` package. Minimalism in the resulting plot gives users the flexibility to include additional details they consider informative, while maintaining a clean visualization. 
More details are found in the [Customizing `concRespPlot2` Plots](#plot_custom) section. As with the `plot_allcurves` function, the `log_conc` argument is available to return a plot with concentrations on the $\mathbf{log_{10}}$ scale.

*The hidden code chunk below shows how to format data and perform curve fitting and hitcalling with `concRespCore`. We also refer readers to the [Concentration-Response Modeling for a Single Series with `concRespCore`](#ex1) section if they are interested in further details.*

```{r}
# prepare the 'row' object for concRespCore
row <- list(conc=conc,
            resp=resp,
            bmed=0,
            cutoff=cutoff,
            onesd=signatures[1,"onesd"],
            name=signatures[1,"name"],
            assay=signatures[1,"signature"])
# run concentration-response modeling
out <- concRespCore(row,conthits=FALSE)
# show the output
out
```

The following code demonstrates utilizing the curve fit and hitcalling results from `concRespCore` with the `concRespPlot2` function to visualize the winning model fit:

```{r class.source="fold-show"}
# pass the output to the plotting function
basic_plot <- concRespPlot2(out)
basic_log <- concRespPlot2(out, log_conc = TRUE)
# arrange the ggplot2 output into a grid
grid.arrange(basic_plot, basic_log)
```

***Figure 13:** Example plots generated by `concRespPlot2`. Both plots display the experimental data (open circles) and the best curve fit (red curve). Concentrations are in the original and $\mathbf{log_{10}}$ scale for the top and bottom plots, respectively.*

*Note, one may also use output from `tcplhit2_core` as input for `concRespPlot2`.*

## Customizing `concRespPlot2` Plots {#plot_custom}

Users may want to generate a polished figure to include in a report or publication. However, the basic plot from `concRespPlot2` may not include enough context or information to be included as part of a report or publication.
Thus, this section introduces a few simple modifications one can use to customize the basic plot returned by `concRespPlot2` to provide additional information. Because `concRespPlot2` returns a `ggplot2` object, additional details can be included with `ggplot2` layers. `ggplot2` layers can be added directly to the base plot with a `+` operator. Customizations one may want to include are: * Adding a title with compound and assay endpoint information * Visualizing the user-specified cutoff band to evaluate response efficacy * Adding points and lines to label potency estimates and relevant responses - e.g. the benchmark dose (BMD) and benchmark response (BMR) to evaluate the estimates relative to the experimental data * Adding comparable data and their winning curve fits to evaluate different experimental scenarios (e.g. multiple compounds, technologies, endpoints, etc.) It should be noted that this is just a small subset of the possible customizations and is not a comprehensive list of possible changes one could make. Each of the following sub-sections explores the aforementioned customizations, but again these are just a limited set of possible updates to the base plotting from `concRespPlot2`. *Note, the plotting output from `plot_allcurves` may also be customized similarly (if desired). However, this will not be shown in this vignette.* ### - Adding a Plot Title, Shade Cutoff Band, and Potency Estimates The first customization one may want to include on the basic plot from `concRespPlot2` is a title with necessary chemical and response (i.e. assay endpoint) information. Furthermore, because the estimated benchmark dose (BMD) (i.e. potency) is likely of interest for the applicable report/manuscript, then adding guidelines for the benchmark response (BMR) and BMD, as well as a shaded region representing the cutoff band (for reference) may be useful. 
*The hidden code chunk below adds a plot title, shades a region signifying the cutoff band, and highlights the specified adverse response level (BMR) with a horizontal blue line along with the potency estimate (BMD) represented by the vertical blue segment and red point.* ```{r} # Using the fitted result and plot from the example in the last section # get the cutoff from the output cutoff <- out[, "cutoff"] basic_plot + # Cutoff Band - a transparent rectangle geom_rect(aes(xmin = 0,xmax = 30,ymin = -cutoff,ymax = cutoff), alpha = 0.1,fill = "skyblue") + # Titles ggtitle( label = paste("Best Model Fit", out[, "name"], sep = "\n"), subtitle = paste("Assay Endpoint: ", out[, "assay"])) + ## Add BMD and BMR labels geom_hline( aes(yintercept = out[, "bmr"]), col = "blue") + geom_segment( aes(x = out[, "bmd"], xend = out[, "bmd"], y = -0.5, yend = out[, "bmr"]), col = "blue" ) + geom_point(aes(x = out[, "bmd"], y = out[, "bmr"], fill = "BMD"), shape = 21, cex = 2.5) ``` ***Figure 14:** Basic plot generated with `concRespPlot2` with updated titles to provide additional details about the observed data. Experimental data is shown with the open circles and the red curve represents the best fit model. The title and subtitle display the compound name and assay endpoint, respectively. The light blue band represents responses within the cutoff threshold(s) -- i.e. cutoff band. The red point represents the BMD estimated from the winning model, given the BMR. The horizontal and vertical blue lines display the BMR and the estimated BMD, respectively.* ### - Label All Potency Estimates The `concRespCore` and `tcplfit2_core` functions return several potency estimates in addition to the BMD (displayed in Figure 3), e.g. AC50, ACC, etc. Thus, it may be desirable to users to include and compare several of the resulting potency estimates on the same plot. 
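To make the relationship between these potency estimates concrete, the following base R sketch computes the concentrations at which a Hill-shaped curve reaches 5, 10, and 50 percent of its top. The curve form, the helper names `hill_curve` and `hill_conc_at`, and the parameter values are assumptions chosen for illustration; they are not output from `concRespCore`.

```r
# Hill-shaped curve with a baseline of 0 (illustrative form and parameters)
hill_curve <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)

# Invert the curve: concentration at which it reaches response y (0 < y < tp)
hill_conc_at <- function(y, tp, ga, p) ga * (y / (tp - y))^(1 / p)

# hypothetical parameter values, not fitted results
tp <- 1.2
ga <- 4
p  <- 1.5

# target responses at 5, 10, and 50 percent of the top
targets <- c(AC5 = 0.05, AC10 = 0.10, AC50 = 0.50) * tp
potency <- vapply(targets, hill_conc_at, numeric(1), tp = tp, ga = ga, p = p)
potency

# evaluating the curve at each concentration recovers the target response
all.equal(unname(hill_curve(potency, tp, ga, p)), unname(targets))
```

In this form the concentration at half of the top equals `ga`, and the lower-percentage estimates fall at progressively lower concentrations along the same curve.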
*The hidden code chunk below demonstrates how to add all available potency estimates to the base plot.* ```{r} # Get all potency estimates and the corresponding y value on the curve estimate_points <- out %>% select(bmd, acc, ac50, ac10, ac5) %>% tidyr::pivot_longer(everything(), names_to = "Potency Estimates") %>% mutate(`Potency Estimates` = toupper(`Potency Estimates`)) y <- c(out[, "bmr"], out[, "cutoff"], rep(out[, "top"], 3)) y <- y * c(1, 1, .5, .1, .05) estimate_points <- cbind(estimate_points, y = y) # add Potency Estimate Points and set colors basic_plot + geom_point( data = estimate_points, aes(x = value, y = y, fill = `Potency Estimates`), shape = 21, cex = 2.5 ) ``` ***Figure 15:** Basic plot generated by `concRespPlot2` with potency estimates highlighted. Experimental data is shown with the open circles and the red curve represents the best fit model. The five colored points represent the various potency estimates from `concRespCore`. These include the activity concentrations at 5, 10, and 50 percent of the maximal response from baseline (AC5 = gold, AC10 = red, and AC50 = green, respectively), as well as the activity concentration at the user-specified threshold (cutoff) and BMD (ACC = blue and BMD = purple, respectively).* It should be noted, when using the `log_conc = TRUE` in the basic plotting function, the potency estimates will also need to be log-transformed to be displayed in the correct positions. *The hidden code chunk below demonstrates how to add potency values when the base plot is using a $\mathbf{log_{10}}$ concentration scale.* ```{r} # add Potency Estimate Points and set colors - with plot in log-10 concentration basic_log + geom_point( data = estimate_points, aes(x = log10(value), y = y, fill = `Potency Estimates`), shape = 21, cex = 2.5 ) ``` ***Figure 16:** Basic plot generated by `concRespPlot2`, where `log_conc = TRUE`, with potency estimates highlighted. 
Experimental data is shown with the open circles and the red curve represents the best fit model. The five colored points represent the various potency estimates from `concRespCore`. These include the activity concentrations at 5, 10, and 50 percent of the maximal response from baseline (AC5 = gold, AC10 = red, and AC50 = green, respectively), as well as the activity concentration at the user-specified threshold (cutoff) and BMD (ACC = blue and BMD = purple, respectively).*

### - Add Additional Curves

Some users may want to compare two or more curve fits representing, for example, different compounds, experimental scenarios, or technologies. The flexibility of `ggplot2` accommodates these plotting needs. This sub-section provides example code that a user may modify to add another curve, and that may be generalized to add more than one curve.

The user must first know the models to be displayed on the plot and the corresponding parameter estimates (i.e. must have all the fitting and hitcalling output prior to plotting), so that smooth curves can be generated by predicting the responses for a series of points within the concentration range. The output for the applicable curves (i.e. the concentration points and predicted responses for each smooth curve) can then be added to the basic plot. Here, the smooth curves are generated using a series of one hundred points within the experimental concentration range, but the curve resolution may be changed by altering the number of points in the concentration series (i.e. more points result in higher resolution).

*The hidden code chunk below demonstrates how to predict the responses for another curve and generate a smooth curve fit to be added to the basic plot.
Additionally, we have included details for labeling the two curve fits plotted together.* ```{r} # maybe want to extract and use the same x's in the base plot # to calculate predicted responses conc_plot <- basic_plot[["layers"]][[2]][["data"]][["conc_plot"]] basic_plot + # fitted parameter values of another curve you want to add geom_line(data=data.frame(x=conc_plot, y=tcplfit2::exp5(c(0.5, 10, 1.2), conc_plot)), aes(x,y,color = "exp5"))+ # add different colors for comparisons scale_colour_manual(values=c("#CC6666", "#9999CC"), labels = c("Curve 1-exp4", "Curve 2-exp5")) + labs(title = "Curve 1 v.s. Curve 2") ``` ***Figure 17:** Basic plot generated by `concRespPlot2` with an additional curve for comparison. Experimental data is shown with the open circles, the red curve represents the best fit model for the baseline model, and the blue curve represents the additional curve of interest.* Plots like Figure 17 typically have similar concentrations and response ranges. If one is comparing curves that do not have similar concentration and/or response ranges, additional alterations may be necessary. # Area Under the Curve (AUC) **Please note, the AUC estimation in `tcplfit2` is a beta functionality still under development and review, and as such, feedback is welcome.** This section explores how to estimate the area under the curve (AUC) for concentration-response fits from `tcplfit2`. Generally, the AUC estimate may be interpreted as a measure of overall efficacy and potency, which users may want to include as part of their analyses, e.g. analyses aiming to prioritize chemicals by bio-activity. The AUC is estimated by integrating the best fitting (or another applicable) model with the optimized parameter values obtained during the curve fitting process. *Note: When applying the `get_AUC` function, which estimates the AUC, it is important to know whether the model bounds are on the log10- or arithmetic-scale. 
Using the log10-scale or the arithmetic-scale may result in different values, and the interpretation of the AUC value may change. In the `get_AUC` function, `use.log` is a logical option controlling which scale the AUC is calculated on, and it is `FALSE` by default.*

In `tcplfit2` we provide functionality such that a user may obtain the AUC directly from the `concRespCore` function and include it as part of the output table. Alternatively, one may use a more granular approach by utilizing the `get_AUC` and `post_hit_AUC` functions directly with the `tcplfit2_core` and `tcplhit2_core` output, respectively. The following two sections outline these approaches, and the latter section breaks down the AUC estimation for several different response cases.

## Area Under the Curve (AUC) with `concRespCore`

Performing the AUC estimation within `concRespCore` is a fairly simple modification. The `concRespCore` function has a logical argument, `AUC`, controlling whether the area under the curve (AUC) is calculated for the winning model and returned alongside the other modeling results (e.g. model parameters and hitcall details). When `AUC = TRUE`, the AUC is included in the output. The default is `FALSE`, so a user must explicitly request this output.

```{r ex1_AUC,class.source="fold-show"}
# some example data
conc <- list(.03, .1, .3, 1, 3, 10, 30, 100)
resp <- list(0, .2, .1, .4, .7, .9, .6, 1.2)
row <- list(conc = conc,
            resp = resp,
            bmed = 0,
            cutoff = 1,
            onesd = .5)

# AUC is included in the output
concRespCore(row, conthits = TRUE, AUC = TRUE)
```

## Area Under the Curve (AUC) with `tcplfit2_core` and `tcplhit2_core`

Let us consider the case where a user wants to run the `tcplfit2_core` and `tcplhit2_core` functions separately and now wants to obtain AUC estimates. Here, and in the following sub-sections, we demonstrate estimating the AUC for this type of scenario.
We will consider obtaining the AUC values for individual models from the fit results, and AUC values only for the best fit (i.e. winning) model. Furthermore, we will consider the following response cases in the following sub-sections: - [Positive Curve Fits - i.e. increasing models only](#positivecurve) - [Negative Curve Fits - i.e. decreasing models only](#negativecurve) - [Bi-phasic Curve Fits - i.e. models where the curve crosses the x-axis](#biphasiccurve) ### - Positive Responses {#positivecurve} First, let us consider a positive curve fit case, which is the typical baseline example -- (i.e. monotonic increasing response above the x-axis). *The hidden code chunk below shows the data set-up, curve fitting, and plotting code for our positive curve fit example.* ```{r ex2_AUC, fig.height = 4.55, fig.width = 8} # This is taken from the example under tcplfit2_core conc_ex2 <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100) resp_ex2 <- c(0, 0.1, 0, 0.2, 0.6, 0.9, 1.1, 1) # fit all available models in the package # show all fitted curves output_ex2 <- tcplfit2_core(conc_ex2, resp_ex2, 0.8) # arrange the ggplot2 output into a grid grid.arrange(plot_allcurves(output_ex2, conc_ex2, resp_ex2), plot_allcurves(output_ex2, conc_ex2, resp_ex2, log_conc = TRUE), ncol = 2) ``` ***Figure 18:** This figure depicts all fit concentration-response curves. The models are polynomial 1 and 2, power, Hill, gain-loss, and exponentials 2-5.* Let us first consider the case where only the AUC estimate for the winning model is desirable. For this scenario, we included the `post_hit_AUC` function, which is a wrapper function for `get_AUC`, within `tcplfit2`. This function takes the `tcplhit2_core` output, in the data frame format with a single row containing the concentration-response data, the winning model name, winning model's optimized parameter values, and hitcalling results. 
Internally, the wrapper function extracts information from the one-row data frame output and passes it to `get_AUC`, which calculates the AUC.

```{r ex2_AUC_posthit,class.source="fold-show"}
# hitcalling results
out <- tcplhit2_core(output_ex2, conc_ex2, resp_ex2, 0.8, onesd = 0.4)
out
# perform AUC estimation
post_hit_AUC(out)
```

Now, suppose a user wants an AUC estimate for a single model that is not necessarily the best fit model for the data. For this scenario, the user will want to use the most granular AUC estimation function (i.e. `get_AUC`). Unlike with the `post_hit_AUC` function, it is necessary to manually enter the model name, parameter values, etc. to obtain an AUC estimate. The full list of necessary inputs includes:

- the model name (single model of interest)
- lower and upper concentration bounds (usually the lowest and highest concentrations in the data, respectively)
- the estimated model parameters (for the specified model)

Here we demonstrate the AUC estimation for the Hill model with `get_AUC`, from extracting the relevant parameter values from the `tcplfit2_core` output to passing the relevant information to the AUC estimation function.
```{r ex2_AUC-getAUC,class.source="fold-show"}
fit_method <- "hill"
# extract the parameters
modpars <- output_ex2[[fit_method]][output_ex2[[fit_method]]$pars]
# plug into get_AUC function
estimated_auc1 <- get_AUC(fit_method, min(conc_ex2), max(conc_ex2), modpars)
estimated_auc1
# extract the predicted responses from the model
pred_resp <- output_ex2[[fit_method]][["modl"]]
```

```{r ex2_AUC-getAUC-plot,fig.height = 6, fig.width = 6}
# plot to see if the result makes sense
# the shaded area is what the function tries to find
plot(conc_ex2, pred_resp,ylim = c(0,1),
     xlab = "Concentration",ylab = "Response",main = "Positive Response AUC")
lines(conc_ex2, pred_resp)
polygon(c(conc_ex2, max(conc_ex2)), c(pred_resp, min(pred_resp)), col=rgb(1,0,0,0.5))
```

***Figure 19:** The red shaded region is the area under the Hill curve fit. The AUC estimated with `get_AUC` is `r round(estimated_auc1,5)`. This estimate seems to align with the area of the shaded region.*

Because the winning model in this example is the Hill model, the AUC values from the two previous approaches are identical -- i.e. `post_hit_AUC`: `r round(post_hit_AUC(out),5)`, `get_AUC`: `r round(estimated_auc1,5)`.

As mentioned earlier, because `get_AUC` is the most granular and most flexible of the AUC estimation functions, we can use it to estimate the AUC for all models, excluding the constant model, fit to a concentration-response series.
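As a rough illustration of what an AUC estimate on the arithmetic scale represents, the base R sketch below integrates a Hill-shaped curve between a lower and an upper concentration with `integrate()` and cross-checks the result with a trapezoid sum. The curve form, the helper name `hill_curve`, and the parameter values are assumptions for illustration; the sketch is not meant to reproduce the internals of `get_AUC`.

```r
# Hill-shaped curve with a baseline of 0 (illustrative form and parameters)
hill_curve <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)

# hypothetical parameter values and concentration bounds
tp <- 1
ga <- 3
p  <- 1.2
lower <- 0.03
upper <- 100

# numerical integration over the concentration range
auc <- integrate(hill_curve, lower, upper, tp = tp, ga = ga, p = p)$value

# trapezoid rule on a fine grid as an independent check
x <- seq(lower, upper, length.out = 1e5)
y <- hill_curve(x, tp, ga, p)
auc_trap <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

c(integrate = auc, trapezoid = auc_trap)
```

Because the curve never exceeds its top, the area is bounded above by `tp * (upper - lower)`, which gives a quick sanity check on any estimate obtained this way.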
*The hidden code chunk below demonstrates how to apply the `get_AUC` function across all models included in the `tcplfit2_core` output.*

```{r ex2_AUC-other-models}
# list of models
fitmodels <- c("gnls", "poly1", "poly2", "pow", "exp2", "exp3", "exp4", "exp5")
mylist <- list()
for (model in fitmodels){
  fit_method <- model
  # extract corresponding model parameters
  modpars <- output_ex2[[fit_method]][output_ex2[[fit_method]]$pars]
  # get AUC
  mylist[[fit_method]] <- get_AUC(fit_method, min(conc_ex2), max(conc_ex2), modpars)
}
# print AUC's for other models
data.frame(mylist,row.names = "AUC")
```

### - Negative Responses {#negativecurve}

Next, let us consider a negative curve fit case -- (i.e. monotonic decreasing response below the x-axis). Here, we use example data from the `signatures` dataset.

*The hidden code chunk below shows the data set-up, curve fitting, and plotting code for our negative curve fit example.*

```{r ex3_AUC, fig.height = 4.55, fig.width = 8}
# use row 5 in the data
conc <- as.numeric(str_split(signatures[5,"conc"],"\\|")[[1]])
resp <- as.numeric(str_split(signatures[5,"resp"],"\\|")[[1]])
cutoff <- signatures[5,"cutoff"]
# plot all models, this is an example of negative curves
output_negative <- tcplfit2_core(conc, resp, cutoff)
grid.arrange(plot_allcurves(output_negative, conc, resp),
             plot_allcurves(output_negative, conc, resp, log_conc = TRUE),
             ncol = 2)
```

***Figure 20:** This plot depicts all concentration-response curves fit to the observed data. All curves show decreasing responses starting from 0 and below the x-axis.*

Here, we will only demonstrate using the `get_AUC` function with the exponential 3 model.
*Note: This is not the best fit model based on the AIC.* ```{r ex3_AUC-getAUC,class.source="fold-show"} # choose fit method fit_method <- "exp3" # extract corresponding model parameters and predicted response modpars <- output_negative[[fit_method]][output_negative[[fit_method]]$pars] pred_resp <- output_negative[[fit_method]][["modl"]] estimated_auc2 <- get_AUC(fit_method, min(conc), max(conc), modpars) estimated_auc2 ``` ```{r ex3_AUC-plot,fig.height = 6, fig.width = 6} # plot this curve pred_resp <- pred_resp[order(conc)] plot(conc[order(conc)], pred_resp,ylim = c(-1,0), xlab = "Concentration",ylab = "Response",main = "Negative Response AUC") lines(conc[order(conc)], pred_resp) polygon(c(conc[order(conc)], max(conc)), c(pred_resp, max(pred_resp)), col=rgb(1,0,0,0.5)) ``` ***Figure 21:** Notice the function returns a negative AUC value, `r round(estimated_auc2, 5)`. The absolute value, `r abs(round(estimated_auc2,5))`, seems to align with the area between the curve and the x-axis. Note: The x-axis in this plot is in the original (un-logged) units.* As demonstrated, when integrating over a curve in the negative direction, the function will return a negative AUC value. However, some users may want to consider all "areas" (i.e. AUC estimates) as positive values. For this reason, the `return.abs = TRUE` argument in `get_AUC` converts negative AUC values to positive values when returned. However, this argument is `FALSE` by default. ```{r ex3_AUC-abs-neg,class.source="fold-show"} get_AUC(fit_method, min(conc), max(conc), modpars, return.abs = TRUE) ``` ### - Bi-phasic Responses{#biphasiccurve} Finally, let us consider a bi-phasic curve fit case -- (i.e. response increases then decreases, or vice versa, and typically crosses the x-axis somewhere in the experimental concentration range). Currently, only the polynomial 2 model in `tcplfit2` is capable of fitting a bi-phasic response. 
Because curve fits (as implemented in the `tcplfit2` package) are bounded such that the baseline response is always assumed to be 0, there is typically some response above the x-axis and some below. This section demonstrates the AUC estimation for a simulated bi-phasic curve, with area under the curve both below and above the x-axis, for such events.

The polynomial 2 model in `tcplfit2` is implemented as $a(\frac{x}{b} + \frac{x^2}{b^2})$. Here, we simulate a bi-phasic curve, where $a = 2.41$ and $b = (-1.86)$, which can also be represented in the typical form as $0.7x^2 - 1.3x$.

*The hidden code chunk below shows the data simulation and plotting the simulated curve.*

```{r ex4_AUC, fig.height = 6, fig.width = 6}
# simulate a poly2 curve
conc_sim <- seq(0,3, length.out = 100)
## biphasic poly2 parameters
b1 <- -1.3
b2 <- 0.7
## converted to tcplfit2's poly2 parameters
a <- b1^2/b2
b <- b1/b2
c(a,b)
## plot the curve
resp_sim <- poly2(c(a, b, 0.1), conc_sim)
plot(conc_sim, resp_sim, type = "l",
     xlab = "Concentration",ylab = "Response",main = "Biphasic Response")
abline(h = 0)
```

***Figure 22:** This plot illustrates the simulated bi-phasic polynomial 2 curve. The curve initially decreases, then increases and crosses the x-axis.*

Because the simulated parameters are known for this example, we can utilize this information directly in the `get_AUC` function. *However, one could also add noise to the simulated curve and go through the typical curve fitting process outlined in earlier sections -- we will leave it as an exercise to the users if they desire.*

```{r ex4_AUC-cont,class.source="fold-show"}
# get AUC for the simulated Polynomial 2 curve
get_AUC("poly2", min(conc_sim), max(conc_sim), ps = c(a, b))
```

Currently, when integrating over a bi-phasic curve fit, the `get_AUC` function returns the difference between the total area above the x-axis and the total area below the x-axis (i.e. the red region minus the blue region in Figure 23).
In this example, the area above the x-axis is larger than the area below the x-axis, resulting in a positive AUC value.

```{r fig.height = 6, fig.width = 6}
## plot the curve for the AUC
plot(conc_sim, resp_sim, type = "l",
     xlab = "Concentration",ylab = "Response",main = "Biphasic Response AUC")
abline(h = 0)
polygon(c(conc_sim[which(resp_sim <= 0)], max(conc_sim[which(resp_sim <= 0)])),
        c(resp_sim[which(resp_sim <= 0)], max(resp_sim[which(resp_sim <= 0)])),
        col="skyblue")
polygon(c(conc_sim[c(max(which(resp_sim <= 0)),which(resp_sim > 0))], max(conc_sim[which(resp_sim > 0)])),
        c(0,resp_sim[which(resp_sim > 0)], 0),
        col="indianred")
```

***Figure 23:** This plot illustrates the simulated bi-phasic polynomial 2 curve, with the regions included in the AUC estimation (blue: area below the x-axis; red: area above the x-axis).*

# Model Details {#model_details}

This section contains details for the various models available in `tcplfit2`, with parameter explanations and illustrative plots. Users should note that the implementation of all models in `tcplfit2` assumes the baseline response is always zero ($y = 0$).

*The hidden code chunk below sets up two concentration ranges used in the following visualizations demonstrating the effect of changing various parameters in the models on the shape of the concentration-response curve.*

```{r setup-2, warning=FALSE}
# prepare concentration data for demonstration
ex_conc <- seq(0, 100, length.out = 500)
ex2_conc <- seq(0, 3, length.out = 100)
```

## Polynomial 1 (poly1) {#poly1}

The polynomial 1 (poly1) model is a simple linear model with the intercept assumed to be at zero.

Model: $y = ax$

Parameters include:

- `a` : slope of the line (i.e. rate of change for the response across the concentration/dose range). If bi-directional fitting is allowed, then $-\infty < a <\infty$. Otherwise, $a \ge 0$ (i.e. non-negative).
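As a short worked example under assumed values (the slope and response level below are illustrative, not fitted results): because the poly1 curve is a straight line through the origin, both the concentration $x_0$ at which it reaches a response level $y_0$ and the area under the line from $0$ to an upper concentration $x_u$ have closed forms.

$$
x_0 = \frac{y_0}{a}, \qquad \int_0^{x_u} a\,x \, dx = \frac{a\,x_u^2}{2}
$$

With $a = 10$, a response level of $y_0 = 5$ is reached at $x_0 = 0.5$, and the area under the line from $0$ to $x_u = 3$ is $10 \cdot 9 / 2 = 45$.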
```{r poly-1,fig.width=5,fig.height=5,warning=FALSE} poly1_plot <- ggplot(mapping=aes(ex_conc)) + geom_line(aes(y = 55*ex_conc, color = "a=55")) + geom_line(aes(y = 10*ex_conc, color = "a=10")) + geom_line(aes(y = 0.05*ex_conc, color = "a=0.05")) + geom_line(aes(y = -5*ex_conc, color = "a=(-5)")) + labs(x = "Concentration", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='a values', breaks=c('a=(-5)', 'a=0.05', 'a=10', 'a=55'), values=c('a=(-5)'='black', 'a=0.05' = 'red', 'a=10'='blue', 'a=55'='darkviolet')) poly1_plot ``` ***Figure 24:** This plot illustrates how changing the parameter `a` (slope) affects the shape of the resulting curves.* ## Polynomial 2 (poly2){#poly2} The polynomial 2 (poly2) model is a quadratic model with the baseline response assumed to be zero. The quadratic model implemented in `tcplfit2` is parameterized such that the `a` and `b` parameters are interpreted in terms of their impact on the y- and x-scales, respectively. The `poly2` model is defined by the following equation: Model: $f(x) = a(\frac{x}{b} + \frac{x^2}{b^2})$. Note, this parameterization differs from the typical representation of a quadratic function. * Typical quadratic function: $f(x) = (b_1)x^2+(b_2)x+c$. Parameters include: - `a` : The y-scalar. If `a` increases, the curve is stretched vertically. If bi-directional fitting is allowed, then $-\infty < a <\infty$. Otherwise, $a \ge 0$ (i.e. non-negative). - `b` : The x-scalar. If `b` increases, the curve is stretched horizontally (a larger concentration is needed to reach the same response). Optimization of the poly2 model in `tcplfit2` restricts `b` such that $b > 0$. 
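The relationship between the scalar parameterization and an ordinary quadratic can be checked numerically. The sketch below is base R only and does not call `tcplfit2`; the helper names (`quad_to_ab`, `ab_to_quad`, `poly2_ab`) and the symbols `c1` (linear coefficient) and `c2` (quadratic coefficient) are hypothetical and chosen purely for illustration. It relies on expanding $a(\frac{x}{b} + \frac{x^2}{b^2}) = \frac{a}{b}x + \frac{a}{b^2}x^2$.

```r
# Hypothetical helpers (not part of tcplfit2): convert between an ordinary
# quadratic c1*x + c2*x^2 and the scalar form a*(x/b + x^2/b^2).
quad_to_ab <- function(c1, c2) {
  stopifnot(c1 != 0, c2 != 0)
  c(a = c1^2 / c2, b = c1 / c2)
}

ab_to_quad <- function(a, b) {
  stopifnot(b != 0)
  c(c1 = a / b, c2 = a / b^2)
}

# Scalar-form quadratic, written out directly from the model equation.
poly2_ab <- function(x, a, b) a * (x / b + x^2 / b^2)

# Illustrative coefficients (assumed values, not fitted to any data).
ab <- quad_to_ab(c1 = 0.5, c2 = 0.25)
ab  # a = 1, b = 2

# Both forms should describe the same curve.
x <- seq(0, 3, length.out = 7)
all.equal(poly2_ab(x, ab[["a"]], ab[["b"]]), 0.5 * x + 0.25 * x^2)

# Round trip back to the ordinary coefficients.
ab_to_quad(ab[["a"]], ab[["b"]])
```

Because `b = c1 / c2`, a quadratic whose linear and quadratic coefficients have opposite signs maps to a negative `b`.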
```{r poly-2, fig.width=8, fig.height=5, warning=FALSE} fits_poly <- data.frame( # change a y1 = poly2(ps = c(a = 40, b = 2),x = ex_conc), y2 = poly2(ps = c(a = 6, b = 2),x = ex_conc), y3 = poly2(ps = c(a = 0.1, b = 2),x = ex_conc), y4 = poly2(ps = c(a = -2, b = 2),x = ex_conc), y5 = poly2(ps = c(a = -20, b = 2),x = ex_conc), # change b y6 = poly2(ps = c(a = 4,b = 1.8),x = ex_conc), y7 = poly2(ps = c(a = 4,b = 7),x = ex_conc), y8 = poly2(ps = c(a = 4,b = 16),x = ex_conc) ) # shows how changes in parameter 'a' affect the shape of the curve poly2_plot1 <- ggplot(fits_poly, aes(ex_conc)) + geom_line(aes(y = y1, color = "a=40")) + geom_line(aes(y = y2, color = "a=6")) + geom_line(aes(y = y3, color = "a=0.1")) + geom_line(aes(y = y4, color = "a=(-2)")) + geom_line(aes(y = y5, color = "a=(-20)")) + labs(x = "Concentration", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='a values', breaks=c('a=(-20)', 'a=(-2)', 'a=0.1', 'a=6', 'a=40'), values=c('a=(-20)'='black', 'a=(-2)'='red', 'a=0.1'='blue', 'a=6'='darkviolet', 'a=40'='darkgoldenrod1')) # shows how changes in parameter 'b' affect the shape of the curve poly2_plot2 <- ggplot(fits_poly, aes(ex_conc)) + geom_line(aes(y = y6, color = "b=1.8")) + geom_line(aes(y = y7, color = "b=7")) + geom_line(aes(y = y8, color = "b=16")) + labs(x = "Concentration", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='b values', breaks=c('b=1.8', 'b=7', 'b=16'), values=c('b=1.8'='black', 'b=7'='red', 'b=16'='blue')) grid.arrange(poly2_plot1, poly2_plot2, ncol = 2) ``` ***Figure 25:** The left plot illustrates how changing `a` (y-scalar) affects the shape of the resulting polynomial 2 curves while holding `b` constant ($b = 2$). 
The right plot illustrates how changing `b` (x-scalar) affects the shape of the resulting polynomial 2 curves while holding `a` constant ($a = 4$).* Note that the quadratic model may be optimized either allowing for the possibility of bi-phasic responses in the concentration/dose range (`poly2.biphasic=TRUE` argument in `tcplfit2_core`, default) or assuming the response is monotonic (`poly2.biphasic=FALSE`). When bi-phasic modeling is enabled, the polynomial 2 model is optimized using the typical quadratic function, and the parameters are then converted to the x- and y-scalar parameterization. ## Power (pow){#pow} Model: $f(x) = a*x^p$ Parameters include: - `a` : Scaling factor. If `a` increases, the curve is stretched vertically. If bi-directional fitting is allowed, then $-\infty < a <\infty$. Otherwise, $a \gt 0$. - `p` : Power, or the rate of growth. A measure of how steep the curve is. The larger `p` is, the steeper the curve is. Optimization of the power model restricts `p` such that $0.3 \le p \le 20$. 
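As a small worked example of how `a` and `p` act together, the power model can be inverted in closed form to find the concentration that produces a given response. This is a base R sketch under stated assumptions: `pow_fn` and `pow_inverse` are hypothetical names, the parameter values are illustrative, and the inversion assumes the target response has the same sign as `a`.

```r
# Power model written out from the equation f(x) = a * x^p.
pow_fn <- function(x, a, p) a * x^p

# Hypothetical helper: concentration at which the curve reaches response y.
# Solving y = a * x^p for x gives x = (y / a)^(1 / p).
pow_inverse <- function(y, a, p) {
  stopifnot(y / a > 0)
  (y / a)^(1 / p)
}

# Illustrative parameter values (assumed, not fitted to data).
a <- 1.2
p <- 1.6
target <- 2

x_at_target <- pow_inverse(target, a, p)

# Plugging the concentration back in should recover the target response.
all.equal(pow_fn(x_at_target, a, p), target)
```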
```{r pow, fig.width=8, fig.height=5, warning=FALSE} fits_pow <- data.frame( # change a y1 = pow(ps = c(a = 0.48,p = 1.45),x = ex2_conc), y2 = pow(ps = c(a = 7.2,p = 1.45),x = ex2_conc), y3 = pow(ps = c(a = -3.2,p = 1.45),x = ex2_conc), # change p y4 = pow(ps = c(a = 1.2,p = 0.3),x = ex2_conc), y5 = pow(ps = c(a = 1.2,p = 1.6),x = ex2_conc), y6 = pow(ps = c(a = 1.2,p = 3.2),x = ex2_conc) ) # shows how changes in parameter 'a' affect the shape of the curve pow_plot1 <- ggplot(fits_pow, aes(ex2_conc)) + geom_line(aes(y = y1, color = "a=0.48")) + geom_line(aes(y = y2, color = "a=7.2")) + geom_line(aes(y = y3, color = "a=(-3.2)")) + labs(x = "Concentration", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='a values', breaks=c('a=(-3.2)', 'a=0.48', 'a=7.2'), values=c('a=(-3.2)'='black', 'a=0.48'='red', 'a=7.2'='blue')) # shows how changes in parameter 'p' affect the shape of the curve pow_plot2 <- ggplot(fits_pow, aes(ex2_conc)) + geom_line(aes(y = y4, color = "p=0.3")) + geom_line(aes(y = y5, color = "p=1.6")) + geom_line(aes(y = y6, color = "p=3.2")) + labs(x = "Concentration", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='p values', breaks=c('p=0.3', 'p=1.6', 'p=3.2'), values=c('p=0.3'='black', 'p=1.6'='red', 'p=3.2'='blue')) grid.arrange(pow_plot1, pow_plot2, ncol = 2) ``` ***Figure 26:** The left plot illustrates how changing `a` (scaling factor) affects the shape of the resulting power curves while holding `p` constant ($p = 1.45$). The right plot illustrates how changing `p` (power) affects the shape of the resulting power curves while holding `a` constant ($a = 1.2$). 
Note: These plots use a concentration range from 0 to 3 to better show the impact of `p` on the resulting curves.* ## Hill {#hill} Model: $f(x) = \frac{tp}{(1 + (ga/x)^p )}$ Parameters include: - `tp` : Top parameter, the maximum theoretical response (highest or lowest - for an increasing or decreasing curve, respectively) achieved at saturation, that is the horizontal asymptote. If bi-directional fitting is allowed, then $-\infty < tp <\infty$. Otherwise $0 \le tp < \infty$. - `ga` : AC50, concentration at 50% of the maximal activity. It provides useful information about the "apparent affinity" of the protein under study (enzyme, transporter, etc.) for the substrate. The model restricts `ga` such that $0 \le ga < \infty$. - `p` : Power, also called the Hill coefficient. Mathematically, it is a measure of how steep the response curve is. In context, it is a measure of the co-operativity of substrate binding to the enzyme, transporter, etc. Optimization of the Hill model restricts `p` such that $0.3 \le p \le 8$. 
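Two properties of the Hill equation are easy to verify numerically: the response at $x = ga$ is exactly half of `tp`, and the equation can be rewritten on the $log_{10}$ concentration scale because $(ga/x)^p = 10^{p(\log_{10} ga - \log_{10} x)}$. The base R sketch below uses hypothetical function names (`hill_regular`, `hill_log10`) and illustrative parameter values; it does not call `tcplfit2`.

```r
# Hill model on the regular concentration scale.
hill_regular <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)

# The same model with concentration and AC50 both expressed in log10 units.
hill_log10 <- function(logx, tp, logga, p) tp / (1 + 10^(p * (logga - logx)))

# Illustrative parameter values (assumed, not fitted to data).
tp <- 120
ga <- 5
p <- 1.76
x <- c(0.1, 1, 5, 25, 100)

y_regular <- hill_regular(x, tp, ga, p)
y_log <- hill_log10(log10(x), tp, log10(ga), p)

# The two parameterizations give the same responses.
all.equal(y_regular, y_log)

# At x = ga the response is half of tp.
hill_regular(ga, tp, ga, p)  # 60
```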
```{r Hill, fig.height=5, fig.width=8, warning=FALSE} fits_hill <- data.frame( # change tp y1 = hillfn(ps = c(tp = -200,ga = 5,p = 1.76), x = ex_conc), y2 = hillfn(ps = c(tp = 200,ga = 5,p = 1.76), x = ex_conc), y3 = hillfn(ps = c(tp = 850,ga = 5,p = 1.76), x = ex_conc), # change ga y4 = hillfn(ps = c(tp = 120,ga = 4,p = 1.76), x = ex_conc), y5 = hillfn(ps = c(tp = 120,ga = 12,p = 1.76), x = ex_conc), y6 = hillfn(ps = c(tp = 120,ga = 20,p = 1.76), x = ex_conc), # change p y7 = hillfn(ps = c(tp = 120,ga = 5,p = 0.5), x = ex_conc), y8 = hillfn(ps = c(tp = 120,ga = 5,p = 2), x = ex_conc), y9 = hillfn(ps = c(tp = 120,ga = 5,p = 5), x = ex_conc) ) # shows how changes in parameter 'tp' affect the shape of the curve hill_plot1 <- ggplot(fits_hill, aes(log10(ex_conc))) + geom_line(aes(y = y1, color = "tp=(-200)")) + geom_line(aes(y = y2, color = "tp=200")) + geom_line(aes(y = y3, color = "tp=850")) + labs(x = "Concentration in Log-10 Scale", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.7), legend.key.size = unit(0.5, 'cm')) + scale_color_manual(name='tp values', breaks=c('tp=(-200)', 'tp=200', 'tp=850'), values=c('tp=(-200)'='black', 'tp=200'='red', 'tp=850'='blue')) # shows how changes in parameter 'ga' affect the shape of the curve hill_plot2 <- ggplot(fits_hill, aes(log10(ex_conc))) + geom_line(aes(y = y4, color = "ga=4")) + geom_line(aes(y = y5, color = "ga=12")) + geom_line(aes(y = y6, color = "ga=20")) + labs(x = "Concentration in Log-10 Scale", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.7), legend.key.size = unit(0.4, 'cm')) + scale_color_manual(name='ga values', breaks=c('ga=4', 'ga=12', 'ga=20'), values=c('ga=4'='black', 'ga=12'='red', 'ga=20'='blue')) # shows how changes in parameter 'p' affect the shape of the curve hill_plot3 <- ggplot(fits_hill, aes(log10(ex_conc))) + geom_line(aes(y = y7, color = "p=0.5")) + geom_line(aes(y = y8, color = "p=2")) + geom_line(aes(y = y9, color = "p=5")) + labs(x = "Concentration in Log-10 
Scale", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.7), legend.key.size = unit(0.4, 'cm')) + scale_color_manual(name='p values', breaks=c('p=0.5', 'p=2', 'p=5'), values=c('p=0.5'='black', 'p=2'='red', 'p=5'='blue')) grid.arrange(hill_plot1, hill_plot2, hill_plot3, ncol = 2, nrow = 2) ``` ***Figure 27:** The top left plot illustrates how changing `tp` (maximal theoretical change in response) affects the shape of the resulting Hill curves while holding all other parameters constant ($ga = 5, p = 1.76$). The top right plot illustrates how changing `ga` (AC50) affects the shape of the resulting Hill curves while holding all other parameters constant ($tp = 120, p = 1.76$). The bottom left plot illustrates how changing `p` (power) affects the shape of the resulting Hill curves while holding all other parameters constant ($tp = 120, ga = 5$). Note: The x-axes are in the $\mathbf{log_{10}}$ scale to reflect the scale the model is optimized in, i.e. log Hill model $f(x) = \frac{tp}{1 + 10^{(p*(ga-x))}}$.* ## Gain-Loss (gnls){#gnls} The gain-loss (gnls) model is the product of two Hill models. One Hill model fits the response going up (gain) and one fits the response going down (loss). A gain-loss curve can occur either as a gain in response first then changing to a loss, or vice-versa. Model: $f(x) = \frac{tp}{[(1 + (ga/x)^p )(1 + (x/la)^q )]}$ Parameters include: - `tp`, `ga`, and `p` are the same as in the [Hill model](#hill), and the `la` and `q` parameters are counterparts to the `ga` and `p` parameters, respectively, but in the loss direction of the curve. - `la` : Loss AC50, concentration at 50% of the maximal activity in the loss direction. The model optimization restricts `la` such that $0 \le la < \infty$ and $la-ga\ge 1.5$. - `q` : Loss power or the rate of loss. The larger it is, the faster the curve decreases (if it increases first). The model restricts `q` such that $0.3 \le q \le 8$. 
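To make the "product of two Hill models" description concrete, the base R sketch below writes the gain-loss equation directly and compares it with a gain Hill term multiplied by a loss term. Function names (`gnls_fn`, `gain_term`, `loss_term`) are hypothetical and the parameter values are illustrative assumptions; nothing here calls `tcplfit2`.

```r
# Gain-loss model written out from the model equation.
gnls_fn <- function(x, tp, ga, p, la, q) {
  tp / ((1 + (ga / x)^p) * (1 + (x / la)^q))
}

# Gain component: an ordinary Hill curve rising toward tp.
gain_term <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)

# Loss component: a fraction falling from 1 toward 0 as x passes la.
loss_term <- function(x, la, q) 1 / (1 + (x / la)^q)

# Illustrative parameter values (assumed, not fitted to data).
tp <- 750; ga <- 15; p <- 1.45; la <- 50; q <- 1.34
x <- seq(0.1, 100, length.out = 500)

y <- gnls_fn(x, tp, ga, p, la, q)

# The full model equals the product of the two components.
all.equal(y, gain_term(x, tp, ga, p) * loss_term(x, la, q))

# The curve rises and then falls, so its peak lies inside the range,
# and the observed maximum stays below the theoretical top.
x[which.max(y)]
max(y) < tp
```

Because the loss term is already below 1 where the gain term approaches `tp`, the observed peak of a gain-loss curve is generally lower than `tp` itself.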
```{r gnls, fig.width=8, fig.height=5, warning=FALSE} fits_gnls <- data.frame( # change la y1 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 17,q = 1.34), x = ex_conc), y2 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 50,q = 1.34), x = ex_conc), y3 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 100,q = 1.34), x = ex_conc), # change q y4 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 20,q = 0.3), x = ex_conc), y5 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 20,q = 1.2), x = ex_conc), y6 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 20,q = 8), x = ex_conc) ) # shows how changes in parameter 'la' affect the shape of the curve gnls_plot1 <- ggplot(fits_gnls, aes(log10(ex_conc))) + geom_line(aes(y = y1, color = "la=17")) + geom_line(aes(y = y2, color = "la=50")) + geom_line(aes(y = y3, color = "la=100")) + labs(x = "Concentration in Log-10 Scale", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='la values', breaks=c('la=17', 'la=50', 'la=100'), values=c('la=17'='black', 'la=50'='red', 'la=100'='blue')) # shows how changes in parameter 'q' affect the shape of the curve gnls_plot2 <- ggplot(fits_gnls, aes(log10(ex_conc))) + geom_line(aes(y = y4, color = "q=0.3")) + geom_line(aes(y = y5, color = "q=1.2")) + geom_line(aes(y = y6, color = "q=8")) + labs(x = "Concentration in Log-10 Scale", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='q values', breaks=c('q=0.3', 'q=1.2', 'q=8'), values=c('q=0.3'='black', 'q=1.2'='red', 'q=8'='blue')) grid.arrange(gnls_plot1, gnls_plot2, ncol = 2) ``` ***Figure 28:** The left plot illustrates how changing `la` (loss AC50) affects the shape of the resulting gain-loss curves while holding all other parameters constant ($tp = 750,ga = 15,p = 1.45,q = 1.34$). The right plot illustrates how changing `q` (loss power) affects the shape of the resulting gain-loss curves while holding all other parameters constant ($tp = 750,ga = 15,p = 1.45,la = 20$). 
Note: The x-axes are in the $\mathbf{log_{10}}$ scale to reflect the scale the model is optimized in, i.e. the log gain-loss model $f(x) = \frac{tp}{[(1 + 10^{(p*(ga-x))} )(1 + 10^{(q*(x-la))})] }$.* ## Exponential 2 (Exp2){#exp2} Model: $f(x) = a*(e^{\frac{x}{b}}-1)$ Parameters include: - `a` : The y-scalar. If `a` increases, the curve is stretched vertically. If bi-directional fitting is allowed, then $-\infty < a < \infty$. Otherwise, $0 < a <\infty$. - `b` : The x-scalar. If `b` increases, the curve is stretched horizontally (a larger concentration is needed to reach the same response). The model restricts `b` such that $b > 0$ (i.e. positive). ```{r exp2, fig.width=8, fig.height=5, warning=FALSE} fits_exp2 <- data.frame( # change a y1 = exp2(ps = c(a = 20,b = 12), x = ex2_conc), y2 = exp2(ps = c(a = 9,b = 12), x = ex2_conc), y3 = exp2(ps = c(a = 0.1,b = 12), x = ex2_conc), y4 = exp2(ps = c(a = -3,b = 12), x = ex2_conc), # change b y5 = exp2(ps = c(a = 0.45,b = 4), x = ex2_conc), y6 = exp2(ps = c(a = 0.45,b = 9), x = ex2_conc), y7 = exp2(ps = c(a = 0.45,b = 20), x = ex2_conc) ) # shows how changes in parameter 'a' affect the shape of the curve exp2_plot1 <- ggplot(fits_exp2, aes(ex2_conc)) + geom_line(aes(y = y1, color = "a=20")) + geom_line(aes(y = y2, color = "a=9")) + geom_line(aes(y = y3, color = "a=0.1")) + geom_line(aes(y = y4, color = "a=(-3)")) + labs(x = "Concentration", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='a values', breaks=c('a=(-3)', 'a=0.1', 'a=9', 'a=20'), values=c('a=(-3)'='black', 'a=0.1'='red', 'a=9'='blue', 'a=20'='darkviolet')) # shows how changes in parameter 'b' affect the shape of the curve exp2_plot2 <- ggplot(fits_exp2, aes(ex2_conc)) + geom_line(aes(y = y5, color = "b=4")) + geom_line(aes(y = y6, color = "b=9")) + geom_line(aes(y = y7, color = "b=20")) + labs(x = "Concentration", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='b values', breaks=c('b=4', 'b=9', 'b=20'), values=c('b=4'='black', 
'b=9'='red', 'b=20'='blue')) grid.arrange(exp2_plot1, exp2_plot2, ncol = 2) ``` ***Figure 29:** The left plot illustrates how changing `a` (y-scalar) affects the shape of the resulting exponential 2 curves while holding `b` constant ($b=12$). The right plot illustrates how changing `b` (x-scalar) affects the shape of the resulting exponential 2 curves while holding `a` constant ($a=0.45$). Note: These plots use a smaller concentration range from 0 to 3 to better show the impact of `b` on the resulting curves.* ## Exponential 3 (Exp3){#exp3} Model: $f(x) = a*(e^{(x/b)^p} - 1)$ Parameters include: - `a` and `b` are similar to those in Exponential 2. For details and plots, refer back to [Exponential 2](#exp2). - `p` : Power. A measure of how steep the curve is. The further `p` is from 1, the steeper the curve is. The model restricts `p` such that $0.3 \le p \le 8$. ```{r exp3, fig.width=5, fig.height=5, warning=FALSE} fits_exp3 <- data.frame( # change p y1 = exp3(ps = c(a = 1.67,b = 12.5,p = 0.3), x = ex2_conc), y2 = exp3(ps = c(a = 1.67,b = 12.5,p = 0.9), x = ex2_conc), y3 = exp3(ps = c(a = 1.67,b = 12.5,p = 1.2), x = ex2_conc) ) # shows how changes in parameter 'p' affect the shape of the curve exp3_plot <- ggplot(fits_exp3, aes(ex2_conc)) + geom_line(aes(y = y1, color = "p=0.3")) + geom_line(aes(y = y2, color = "p=0.9")) + geom_line(aes(y = y3, color = "p=1.2")) + labs(x = "Concentration", y = "Response") + theme_bw()+ theme(legend.position = c(0.15,0.8)) + scale_color_manual(name='p values', breaks=c('p=0.3', 'p=0.9', 'p=1.2'), values=c('p=0.3'='black', 'p=0.9'='red', 'p=1.2'='blue')) exp3_plot ``` ***Figure 30:** This plot illustrates how changing `p` (power) affects the shape of the resulting exponential 3 curves while holding all other parameters constant ($a = 1.67,b = 12.5$). 
Note: This plot uses a smaller concentration range from 0 to 3 to better show the impact of `p` on the resulting curves.* ## Exponential 4 (Exp4){#exp4} Model: $f(x) = tp*(1-2^{(-\frac{x}{ga})})$ Parameters include: - `tp` : Top parameter. The maximum theoretical response (i.e., horizontal asymptote that the predicted curve is approaching), which may also be negative for decreasing curves. If bi-directional fitting is allowed, then $-\infty < tp < \infty$. b (x-scale)", # quadratic "a (y-scale)
p (power)", # power "tp (top parameter)
ga (gain AC50)
p (gain-power)", # hill "tp (top parameter)
ga (gain AC50)
p (gain power)
la (loss AC50)
q (loss power)", # gain-loss "a (y-scale)
b (x-scale)", # exp2 "a (y-scale)
b (x-scale)
p (power)", # exp3 "tp (top parameter)
ga (AC50)", # exp4 "tp (top parameter)
ga (AC50)
p (power)" # exp5 ) # Fifth column - additional model details. Details <- c( "Parameters always equal 'er'.", # constant "", # linear "", # quadratic "", # power "Concentrations are converted internally to log10 units and optimized with f(x) = tp/(1 + 10^(p*(ga-x))), then ga and ga_sd are converted back to regular units before returning.", # hill "Concentrations are converted internally to log10 units and optimized with f(x) = tp/[(1 + 10^(p*(ga-x)))(1 + 10^(q*(x-la)))], then ga, la, ga_sd, and la_sd are converted back to regular units before returning." , # gain-loss "", # exp2 "", # exp3 "", # exp4 "") # exp5 # Consolidate all columns into a table. output <- data.frame(Model, Abbreviation, Equations, OutputParameters, Details) # Export/print the table into an html rendered table. htmlTable(output, align = 'l', align.header = 'l', rnames = FALSE , css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ', caption="*tcplfit2* model details.", tfoot = "Model descriptions are pulled from tcplFit2 manual at ." ) ``` # Glossary The following glossary, though it may not encompass all terms used in this package, is provided as a quick reference when using `tcplfit2`: a : Model fitting parameter in the following models: exp2, exp3, poly1, poly2, pow ac5 : Active concentration at 5% of the maximal predicted change in response (top) value ac10 : Active concentration at 10% of the maximal predicted change in response (top) value ac20 : Active concentration at 20% of the maximal predicted change in response (top) value ac50 : Active concentration at 50% of the maximal predicted change in response (top) value acc : Active concentration at the cutoff ac1sd : Active concentration at 1 standard deviation of the baseline response b : Model fitting parameter in the following models: exp2, exp3, poly2 bmad : Baseline median absolute deviation. Measure of baseline variability. bmed : Baseline median response. 
If set to zero, then the data are already zero-centered. Otherwise, this value is used to zero-center the data by shifting the entire response series by the specified amount. bmd : Benchmark dose, activity concentration observed at the benchmark response (BMR) level bmdl : Benchmark dose lower confidence limit. Derived using a 90% confidence interval around the BMD to reflect the uncertainty bmdu : Benchmark dose upper confidence limit. Derived using a 90% confidence interval around the BMD to reflect the uncertainty bmr : Benchmark response. Response level at which the BMD is calculated, defined as $BMR = {\text{onesd}}\times{\text{bmr_scale}}$, where the default `bmr_scale` is 1.349 caikwt : Akaike weight of the constant model relative to the winning model, calculated as $\frac{exp(-0.5*AIC_{constant})}{exp(-0.5*AIC_{constant})+exp(-0.5*AIC_{winning})}$. Used in calculating the continuous hitcall. conc : Tested concentrations, typically micromolar ($\mu M$) cutoff : Efficacy threshold. User-specified to define activity and may reflect statistical, assay-specific, and biological considerations er : Model fitting error parameter, measure of the uncertainty in parameters used to define the model and plotting error bars fit_method : Curve fit method ga : AC50 for the rising curve in a Hill model or the gnls model hitc or hitcall : Continuous hitcall value ranging from 0 to 1 mll : Maximum log-likelihood of winning model. 
Used in calculating the continuous hitcall, and computed as $length(modpars) - aic(fit_{method})/2$ la : AC50 for the falling curve in a gain-loss model lc50 : Loss concentration at 50% of maximal predicted change in response (top), corresponding to the loss side of the gnls model n_gt_cutoff : Number of data points above the cutoff p : Model fitting parameter in the following models: exp3, exp5, gnls, Hill, pow q : Model fitting parameter in the gnls model resp : Observed responses at respective concentrations (conc) rmse : Root mean square error of the data points relative to model fit. A lower RMSE indicates that the model fits the data more closely. top_over_cutoff : Ratio of the maximal predicted change in response from baseline value to the cutoff (top/cutoff) top : Response value at the maximal predicted change in response from baseline ($y = 0$) tp : Model fitting parameter in the following models: Hill, gnls, exp4, exp5 - the horizontal asymptote that the predicted curve is approaching (theoretical maximum)
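
Several glossary quantities are simple arithmetic on fit summaries. The base R sketch below works through `mll`, `caikwt`, and `bmr` by hand, using the standard Akaike weight form $exp(-0.5*AIC)$. All input values (`aic_constant`, `aic_winning`, `n_pars`, `onesd`) are made-up illustrative numbers, not output from `tcplfit2`, and the variable names are assumptions chosen to mirror the glossary terms.

```r
# Illustrative inputs (assumed values, not produced by a real fit).
aic_constant <- 120   # AIC of the constant model
aic_winning  <- 100   # AIC of the winning model
n_pars       <- 4     # number of fitted parameters, i.e. length(modpars)
onesd        <- 0.5   # one standard deviation of the baseline response
bmr_scale    <- 1.349 # default scaling factor for the benchmark response

# mll: maximum log-likelihood recovered from the AIC,
# since AIC = 2 * n_pars - 2 * loglik.
mll <- n_pars - aic_winning / 2
mll  # -46

# caikwt: Akaike weight of the constant model relative to the winning model.
caikwt <- exp(-0.5 * aic_constant) /
  (exp(-0.5 * aic_constant) + exp(-0.5 * aic_winning))

# Algebraically equivalent form that avoids very small exponentials.
caikwt_stable <- 1 / (1 + exp((aic_constant - aic_winning) / 2))
all.equal(caikwt, caikwt_stable)

# bmr: benchmark response level used to locate the BMD.
bmr <- onesd * bmr_scale
bmr  # 0.6745
```

A small `caikwt` means the constant model receives little weight relative to the winning model, which is the situation in which the first hitcall component favors activity.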