The main purpose of the adjustedCurves R-Package is to
calculate and plot confounder-adjusted survival curves and
cause-specific confounder-adjusted cumulative incidence functions (CIF)
using a variety of methods. Group- or treatment-specific survival curves
and CIFs are often used to graphically display the treatment (or group)
effect on the survival probability. When the data at hand comes from a
randomized controlled trial with balanced subgroups, simple stratified
estimators of the survival curves or CIFs ignoring other covariates are
unbiased. When randomization fails or was not done at all however,
confounding can lead to biased depictions (see Rubin 1974; Pearl
2009).
Luckily, a lot of different methods to adjust survival curves and
CIFs for confounding have been proposed. With the
adjustedCurves package, these methods can be used with
little effort.
The R-Package is currently not available on CRAN, but can be
installed easily using the devtools package:
library(devtools)
devtools::install_github("RobinDenz1/adjustedCurves")or the remotes package:
library(remotes)
remotes::install_github("RobinDenz1/adjustedCurves")Further packages might have to be installed, depending on the specified method.
Let’s start with the standard survival setting. Using the
sim_confounded_surv function, we simulate some survival
data (Chatton et al. 2020):
library(survival)
library(ggplot2)
library(riskRegression)
library(pammtools)
library(adjustedCurves)
# set random number generator seed to make this reproducible
set.seed(44)
# simulate standard survival data with 300 rows
data_1 <- sim_confounded_surv(n=300, max_t=1.1, group_beta=-0.6)
# code the grouping variable as a factor
data_1$group <- as.factor(data_1$group)
# take a look at the first few rows
head(data_1)##   x1 x2 x3          x4          x5          x6 group       time event
## 1  1  0  0 -0.52520986  0.22776621 -0.05319665     1 0.37181297     1
## 2  0  1  1  0.02441988 -2.16818274  0.58097459     0 0.65649863     1
## 3  1  1  1  0.05159071  0.07329185  1.18737651     1 0.08156474     0
## 4  1  0  0  0.01956631 -1.27205270  1.16594437     0 0.34161603     1
## 5  0  1  0  0.33793112  1.32447922 -1.28991227     0 0.52472847     1
## 6  0  1  0 -2.96505672  2.19434272  0.00528396     1 0.87274894     1Using the default arguments, this function outputs a data.frame with
6 independently drawn covariates (x1 - x6), a
binary group variable (group), the observed event time
(time) and the event indicator (event). In
this setting, there is only one type of event (1). When the
observations are right-censored this indicator is set to 0.
This is the standard data format used in standard time-to-event
analysis.
To calculate confounder-adjusted survival curves fo each group of
this dataset using Direct Standardization (also known as
G-Computation or Corrected Group Prognosis method, see
Makuch (1982) or Chang et al. (1982)), we first have to fit a
coxph model:
# it is important to use X=TRUE in the coxph function call
outcome_model <- survival::coxph(Surv(time, event) ~ x1 + x2 + x3 + x4 + x5 + 
                                   x6 + group, data=data_1, x=TRUE)This model can then be used in a call to the
adjustedsurv function, as shown below:
adjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="direct",
                        outcome_model=outcome_model,
                        conf_int=TRUE)The argument data simply refers to our data.frame, the
argument variable specifies our grouping variable of
interest and the ev_time and event variable
specify the time-to-event variables in our data.frame. Setting the
method to "direct" will result in
G-Computation estimates, based on the previously fit cox-regression
model supplied using the outcome_model argument.
Doing this returns a list with some needed output objects. Most
important however is the adjsurv data.frame in there,
containing the adjusted survival curves and corresponding confidence
intervals (because we used conf_int=TRUE in the original
function call). We can take a look at this object using the following
code:
head(adjsurv$adjsurv)##         time      surv group          se  ci_lower  ci_upper
## 1 0.00000000 1.0000000     0 0.000000000 1.0000000 1.0000000
## 2 0.01457659 0.9959356     0 0.004039858 0.9880177 1.0000000
## 3 0.04062032 0.9917639     0 0.005767859 0.9804591 1.0000000
## 4 0.04232678 0.9876128     0 0.007034862 0.9738247 1.0000000
## 5 0.06129069 0.9834495     0 0.008123567 0.9675276 0.9993714
## 6 0.06271999 0.9792709     0 0.009066121 0.9615017 0.9970402More importantly however, we can plot the survival curves directly
using the plot method:
plot(adjsurv)This plot function comes with many options which are listed in the
documentation. To plot the point-wise confidence intervals, we can set
the argument conf_int to TRUE:
plot(adjsurv, conf_int=TRUE)Instead of using colors to differentiate the curves we can also use
different linetypes, by setting the linetype argument to
TRUE and the color argument to FALSE:
plot(adjsurv, linetype=TRUE, color=FALSE)We can also add small indicator lines for censored observations using
censoring_ind="lines":
plot(adjsurv, color=TRUE, censoring_ind="lines")Indicator lines for the median survival time can be added using
median_surv_lines=TRUE:
plot(adjsurv, color=TRUE, median_surv_lines=TRUE)Many more custom settings are available. For more details and
examples see ?plot.adjustedsurv.
The adjustedsurv function essentially works the same
with every available method. Since the methods are however vastly
different in nature, some additional arguments have to be supplied by
the user. For example, using the Inverse Probability of Treatment
Weighting method (IPTW), we need to model the treatment assignment
mechanism instead of the outcome mechanism (see Xie and Liu
(2005)). This can be done using a logistic regression model as
follows:
treatment_model <- glm(group ~ x1 + x2 + x3 + x4 + x5 + x6,
                       data=data_1, family="binomial"(link="logit"))
adjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="iptw_km",
                        treatment_model=treatment_model,
                        conf_int=TRUE)The resulting curves can be plotted as before:
plot(adjsurv)Since both methods were used correctly here, there are only slight differences in the results. Big differences between the two methods usually indicate that either the cox-regression model or the logistic regression model are incorrectly specified.
Doubly-Robust methods can be helpful in such cases (see Robins and Rotnitzky (1992) or Ozenne et al. (2020)). The standard Augmented Inverse Probability of Treatment Weighting estimator utilized both kinds of models at the same time. If either of the models is correctly specified, the resulting estimates will be unbiased. It can be used in the same way as the other methods, only that this time both models have to be supplied:
adjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="aiptw",
                        treatment_model=treatment_model,
                        outcome_model=outcome_model,
                        conf_int=TRUE)
plot(adjsurv, conf_int=TRUE)In many situations there are multiple mutually exclusive types of events instead of just one event. This is formally known as a competing risks situation. In these situations, survival curves can not be estimated anymore. However, the cumulative incidence function can be used instead. Without randomization, these CIFs face the same problems due to confounding as the survival curves do. Many of the methods to adjust survival curves for confounders can be used to adjust CIFs in very similar fashion.
While the computational details and the underlying theory is slightly
different (see Ozenne et al. (2020)), the syntax for the R-Package stays
pretty much exactly the same. The only major difference is that instead
of using the adjustedsurv function, the
adjustedcif function should be utilized. Additionally, the
user now also has to specify which event-type is of interest (argument
cause).
First we need new example data, mirroring the competing risks
situation. We are going to use the sim_confounded_crisk
function, which does the same thing as the
sim_confounded_surv function, but with competing risks
data:
# simulate the data
data_2 <- sim_confounded_crisk(n=300)
data_2$group <- as.factor(data_2$group)
head(data_2)##   x1 x2 x3          x4          x5          x6 group       time event
## 1  1  1  0  0.96216812 -0.66375234  0.65527293     1 0.02085406     0
## 2  1  1  0 -0.58477811  1.73696566 -0.18714061     1 0.18924205     0
## 3  1  0  0 -0.33549773  0.52008044  0.08290031     0 0.26288996     2
## 4  0  1  1 -0.04008865  0.19841348  0.38858944     1 0.08210559     2
## 5  0  0  1  0.78936972 -0.27545174  0.49322272     1 1.67993934     2
## 6  0  0  0  0.26001387 -0.03590098  0.86231143     1 1.70000000     0Let’s again start with the Direct Standardization method.
Instead of using a simple coxph method, we need to use a
model for the time-to-event process which takes the multiple event-types
into account. One such method is the Cause-Specific
Cox-Regression model. A simple implementation of this model is
contained in the riskRegression R-Package.
outcome_model <- riskRegression::CSC(Hist(time, event) ~ x1 + x2 + x3 + x4 + 
                                       x5 + x6 + group, data=data_2)This model can then be used in a call to the adjustedcif
function, as shown below:
adjcif <- adjustedcif(data=data_2,
                      variable="group",
                      ev_time="time",
                      event="event",
                      method="direct",
                      outcome_model=outcome_model,
                      cause=1,
                      conf_int=TRUE)
plot(adjcif, conf_int=TRUE)This shows the confounder-adjusted CIFs for cause = 1.
By setting cause to 2 we get the
confounder-adjusted CIFs for the other cause:
adjcif <- adjustedcif(data=data_2,
                      variable="group",
                      ev_time="time",
                      event="event",
                      method="direct",
                      outcome_model=outcome_model,
                      cause=2,
                      conf_int=TRUE)
plot(adjcif, conf_int=TRUE)The IPTW estimator can be used exactly the same way as we did with
the adjustedsurv function:
treatment_model <- glm(group ~ x1 + x2 + x3 + x4 + x5 + x6,
                       data=data_2, family="binomial"(link="logit"))
adjcif <- adjustedcif(data=data_2,
                      variable="group",
                      ev_time="time",
                      event="event",
                      method="iptw",
                      treatment_model=treatment_model,
                      cause=1,
                      conf_int=TRUE)
plot(adjcif, conf_int=TRUE)The other estimators can be used using essentially the same syntax.
We used bootstrap=TRUE in the last example, because we will
use functionality relying on it when comparing groups later.
In many applications there are more than two treatments. Some of the methods included in this R-Package allow calculations for an arbitrary number of treatments. In these cases the code does not change at all and can be used exactly in the same way as before. The only difference occurs when using IPTW methods. Here the user has to use a multinomial logistic regression model instead of a regular logistic regression to model the outcome. This is illustrated below.
First we again create a simulated data set (for single event survival data). The function is only able to create binary treatment variables, but we can simply resample all occurrences of 1 into 1 and 2, creating 3 treatments, where 1 and 2 have an identical treatment effect and selection process:
# add another group
# NOTE: this is done only to showcase the method and does not
#       reflect what should be done in real situations
data_1$group <- factor(data_1$group, levels=c("0", "1", "2"))
data_1$group[data_1$group=="1"] <- sample(c("1", "2"), replace=TRUE,
                                          size=nrow(data_1[data_1$group=="1",]))The Direct Adjusted survival curves can be calculated exactly as before:
outcome_model <- survival::coxph(Surv(time, event) ~ x1 + x2 + x3 + x4 + 
                                   x5 + x6 + group, data=data_1, x=TRUE)
adjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="direct",
                        outcome_model=outcome_model,
                        conf_int=TRUE)
plot(adjsurv)For the IPTW based estimates we first fit a multinomial logistic
regression model using the multinom function from the
nnet R-Package and use it with the same code as before:
treatment_model <- nnet::multinom(group ~ x1 + x2 + x3 + x4 + x5 + x6,
                                  data=data_1)## # weights:  24 (14 variable)
## initial  value 329.583687 
## iter  10 value 286.543657
## iter  20 value 283.765460
## final  value 283.765313 
## convergedadjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="iptw_km",
                        treatment_model=treatment_model,
                        conf_int=TRUE,
                        bootstrap=TRUE,
                        n_boot=50)
plot(adjsurv, conf_int=TRUE)We use bootstrap=TRUE here because we will need it in
the next section. In practice, however, 50 bootstrap samples would
definitely not be enough to guarantee convergence. We choose this low
number here only due to speed considerations enforced by CRAN.
The estimated adjusted curves can subsequently be used to compare the treatment groups using additional plots and statistics.
One simple way to compare the groups is to pick a point in time and
simply calculate the differences between the survival or failure
probability at this point. The adjusted_curve_diff function
automates this process. In the following example we calculate the
difference between the adjusted survival probability of group 0 and 1 at
time = 0.4:
adjusted_curve_diff(adjsurv, times=0.4, group_1="0", group_2="1", conf_int=TRUE)##   time         diff         se   ci_lower  ci_upper   p_value
## 1  0.4 -0.008257592 0.09049062 -0.1856159 0.1691008 0.9272911The confidence interval does include 0, suggesting that this difference is not statistically significant. This can also be seen from the associated p-value.
Instead of focusing on a single point in time, we can also plot the
entire difference curve using the plot_curve_diff
function:
plot_curve_diff(adjsurv, group_1="0", group_2="1", conf_int=TRUE, color="blue")These kind of curves are a great way to visualize the group effect
over time. When more than two groups are present, this function can be
called multiple times with changing group_1 and
group_2 values.
Similarly, we can calculate the adjusted survival time quantiles
using the adjusted_surv_quantile function. For example, the
adjusted median survival time is given by:
adjusted_surv_quantile(adjsurv, p=0.5, conf_int=TRUE)##     p group    q_surv  ci_lower  ci_upper
## 1 0.5     0 0.5067310 0.4487885 0.5657139
## 2 0.5     1 0.8014008 0.4068963        NA
## 3 0.5     2 0.4835157 0.4311141 0.6283029In our case, they are all very similar.
An alternative measure is the restricted mean survival time, which
denotes the integral of the survival curve from time 0 up to a specified
point in time (see Royston et al. 2013). A confounder adjusted version
of this statistic can be calculated by simply integrating the confounder
adjusted survival curves instead of a normal Kaplan-Meier curve (Conner
et al. 2019). The adjustedCurves package allows the user to
do this by simply calling the adjusted_rmst function on a
previously created adjustedsurv object:
adjusted_rmst(adjsurv, to=0.4)##   group      rmst
## 1     0 0.3485056
## 2     1 0.3415740
## 3     2 0.3547187The function returns adjusted RMST for every level of the
"group" variable. When bootstrapping was performed when
calling the adjustedsurv function, standard errors and
confidence intervals can be also calculated using this function as well
by setting the conf_int argument to TRUE:
adjusted_rmst(adjsurv, to=0.4, conf_int=TRUE)##   group      rmst          se  ci_lower  ci_upper n_boot
## 1     0 0.3485056 0.009156084 0.3305600 0.3664512     50
## 2     1 0.3415740 0.019812278 0.3027427 0.3804054     50
## 3     2 0.3547187 0.009117535 0.3368487 0.3725888     50Similar to the adjusted_curve_diff function, we can
perform a test of the difference between two RMST values:
adjusted_rmst(adjsurv, to=0.4, conf_int=TRUE, difference=TRUE,
              group_1="0", group_2="1")##          diff         se    ci_lower   ci_upper   p_value
## 1 0.006931522 0.02182568 -0.03584602 0.04970907 0.7507993There seems to be no statistically significant difference between the two RMSTs. However, due to the very low number of bootstrap replications used here these results should not be trusted. In practice one should use a much higher number, such as 500 or 1000 replications.
Instead of focusing on a single point in time, it is also possible to
plot the RMST as it evolves over time using the
plot_rmst_curve function:
plot_rmst_curve(adjsurv)Bootstrap confidence intervals could also be added to this plot using
conf_int=TRUE.
A similar statistic is the adjusted Restricted Mean Time
Lost (RMTL), which is defined as the area under the cause-specific
confounder adjusted CIF. It can be calculated in the same way as the
RMST, but using the adjusted_rmtl function:
adjusted_rmtl(adjcif, to=0.4, conf_int=FALSE)##   group       rmtl
## 1     0 0.05621035
## 2     1 0.04968113In contrast to the adjusted_rmst function, this works
for both adjustedsurv and adjustedcif objects
because the survival curve can easily be transformed to a CIF. It also
allows difference tests, just like the adjusted_rmtl
function.
We can also plot the RMTL over time, this time using the
plot_rmtl_curve function:
plot_rmtl_curve(adjcif)Donald B. Rubin (1974). “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies”. In: Journal of Educational Psychology 66.5, pp. 688-701
Judea Pearl (2009). Causality: Models, Reasoning and Inference. 2nd ed. Cambridge: Cambridge University Press
Arthur Chatton, Florent Le Borgne, Clemence Leyrat, and Yohann Foucher (2020). G-Computation and Inverse Probability Weighting for Time-To-Event Outcomes: A Comparative Study. arXiv:2006.16859v1
Robert W. Makuch (1982). “Adjusted Survival Curve Estimation Using Covariates”. In: Journal of Chronic Diseases 35.6, pp. 437-443
I-Ming Chang, Rebecca Gelman, and Marcello Pagano (1982). “Corrected Group Prognostic Curves and Summary Statistics”. In: Journal of Chronic Diseases 35, pp. 669-674
Jun Xie and Chaofeng Liu (2005). “Adjusted Kaplan-Meier Estimator and Log-Rank Test with Inverse Probability of Treatment Weighting for Survival Data”. In: Statistics in Medicine 24, pp. 3089-3110
James M. Robins and Andrea Rotnitzky (1992). “Recovery of Information and Adjustment for Dependent Censoring Using Surrogate Markers”. In: AIDS Epidemiology: Methodological Issues. Ed. by Nicholas P. Jewell, Klaus Dietz, and Vernon T. Farewell. New York: Springer Science + Business Media, pp. 297-331
Brice Maxime Hugues Ozenne, Thomas Harder Scheike, and Laila Staerk (2020). “On the Estimation of Average Treatment Effects with Right-Censored Time to Event Outcome and Competing Risks”. In: Biometrical Journal 62, pp. 751-763
Weixin Cai and Mark J. van der Laan (2020). “One-Step Targeted Maximum Likelihood Estimation for Time-To-Event Outcomes”. In: Biometrics 76, pp. 722-733
Patrick Royston and Mahesh K. B. Parmar (2013). “Restricted Mean Survival Time: An Alternative to the Hazard Ratio for the Design and Analysis of Randomized Trials with a Time-To-Event Outcome”. In: BMC Medical Research Methodology 13.152
Sarah C. Conner, Lisa M. Sullivan, Emelia J. Benjamin, Michael P. LaValley, Sandro Galea, and Ludovic Trinquart (2019). “Adjusted Restricted Mean Survival Times in Observational Studies”. In: Statistics in Medicine 38, pp. 3832-3860