--- title: "DTEBOP2:An R Package for Designing Two-Arm Multi-Stage Survival Trials with Delayed Treatment Effects" output: rmarkdown::html_vignette #: css: styles.css #word_document bibliography: references.bib vignette: > %\VignetteIndexEntry{DTEBOP2} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} #cat("") # 全局代码块选项设置 knitr::opts_chunk$set( collapse = TRUE, # 合并代码和输出 comment = "#>", # 输出前的注释符号 fig.width = 8, # 默认图像宽度(英寸) fig.height = 6, # 默认图像高度(英寸) dpi = 300, # 图像分辨率(DPI) out.width = "100%" # 图像在 HTML 中的宽度占比 ) ``` ## 1. Introduction We have developed a novel design method to extend the Bayesian Optimal Phase II (BOP2) design [@Zhou:2020] to address delayed treatment effects in a two-arm RCT setting. Below is the corresponding R package, DTEBOP2, along with its user manual. The `DTEBOP2` package serves as multiple purposes, including: - Determining the required interim sample size and final sample sizes in a two-stage design setting. - Identifying the optimal design parameters $\lambda$ and $\gamma$ in $P(\tilde{\mu}_1 < \tilde{\mu}_0 \mid \mathcal{D}_n, S)> 1-\lambda(n/N)^{\gamma}$ to control the type I error rate in multi-stage design setting. - Given the separation time and estimated optimal design parameter $\lambda$ and $\gamma$, it can provide the operating characteristics including Percentage of Rejecting the Null Hypothesis, Probability of Early Stopping, Average Sample Size and Trial Duration. - Conducting interim analyses, including Go/No-Go decisions, as well as the final analysis. Clinical researchers can leverage the functions within this package when conducting clinical trials focused on immunotherapies or molecularly targeted therapies for cancer, where the delayed treatment effect will be anticipated. The following are steps using the proposed methods when designing a study in which DTE is anticipated. 1. **Step 1: Overall Median Survival Times** Provide the overall median survival time for the standard of care (SOC) and experimental arm, denoted as $\bar{\mu}_j,j=0,1$ (0 is for SOC and 1 is for experimental arm). These values correspond to the null and alternative hypotheses in a conventional setting, for example, 2.8 months vs. 3.5 months. 2. **Step 2: Prior Distribution for Delayed Treatment Effect (DTE) Timepoint \(S\) and Most Likely DTE Timepoint \(S\)** Specify the prior distribution for the delayed separation timepoint \(S\) as a truncated Gamma distribution, i.e., \[ S \sim \text{Gamma}(s_1, s_2)I(L, U) \] and the best-guessed most likely $S$. 3. **Step 3: Sample Size Calculation and Optimal Parameter Determination for Two-Stage Design** For a two-stage design, we can provide the calculated sample size for the interim and the final analyses using the function `Two_stage_sample_size(.)`. The function will also return the optimal parameters \(\lambda\) and \(\gamma\) such that \[ P(\tilde{\mu}_1 < \tilde{\mu}_0 \mid \mathcal{D}_n, S) > 1 - \lambda\left(\frac{n}{N}\right)^{\gamma}. \]. If your study plans to have more than two stages, skip Step 3 and go directly to step 4. Note: ($\tilde{\mu}_0,\tilde{\mu}_1$) represents the median of time-to-event before and after separation timepoint $S$. A more detailed explanation will be provided later. 4. 
4. **Step 4: Optimal Parameter Determination**

   If your study plans more than two stages then, similar to the BOP2 design, once the user provides the sample sizes at the interim and final stages, we can compute the optimal parameters \(\lambda\) and \(\gamma\) satisfying
   \[
   P(\tilde{\mu}_1 < \tilde{\mu}_0 \mid \mathcal{D}_n, S) > 1 - \lambda\left(\frac{n}{N}\right)^{\gamma}
   \]
   with the R function `get.optimal_2arm_piecewise(.)`.

5. **Step 5: Conduct Interim and Final Analyses**

   Once data are available, the function `conduct(.)` performs interim and/or final analyses and makes Go/No-Go decisions based on the optimal parameters \(\lambda\) and \(\gamma\) obtained in Step 3 or 4.

Below, we provide a step-by-step explanation of the procedures outlined above.

## 2. Eliciting the Truncated Gamma Prior for the DTE Separation Timepoint \( S \)

In Step 2 above, we require the trialist to specify the prior distribution for the DTE timepoint $S$ as a truncated Gamma distribution, i.e.,
\[
S \sim \text{Gamma}(s_1, s_2)I(L, U),
\]
along with a best guess of the most likely DTE timepoint $S$. To specify $L$ and $U$, we can ask domain experts:

***"What would be reasonable estimates for the lower and upper bounds of the delayed treatment effect timepoint S in population-level survival curves?"***

This part is relatively straightforward, especially if historical data or studies are available for reference. Meanwhile, a simple way to obtain a best guess of the most likely DTE timepoint $S$ is to use the midpoint of the lower and upper bounds, $(L+U)/2$.

Next, we introduce the specification of $s_1$ and $s_2$ for the Gamma distribution. Although default values of $s_1=1$ and $s_2=1$ are used in this package, more informative estimates can be obtained by asking domain experts the following question:

***"From your experience, can you provide at least one of the following statistics for \( S \)?"***

- Mean
- Standard deviation
- Median
- 95% confidence interval

If an answer to any of these questions is available, we can estimate $s_1$ and $s_2$ by minimizing the following *weighted least squares (WLS) function*:
\[
\begin{aligned}
&w_1\sum_{i=1}^n\bigl(\text{mean}(S)-\text{mean from Expert } i\bigr)^2
+ w_2\sum_{i=1}^n\bigl(\text{median}(S)-\text{median from Expert } i\bigr)^2 \\
&+ w_3\sum_{i=1}^n\bigl(\text{sd}(S)-\text{sd from Expert } i\bigr)^2
+ w_4\sum_{i=1}^n\bigl(\text{quantile}(S,0.025)-\text{0.025 quantile from Expert } i\bigr)^2 \\
&+ w_5\sum_{i=1}^n\bigl(\text{quantile}(S,0.975)-\text{0.975 quantile from Expert } i\bigr)^2,
\end{aligned}
\]
where \( S \sim \text{Gamma}(s_1, s_2)I(L, U) \) and \( w_i \) (\( i=1,2,3,4,5 \)) represents the relative importance of each statistic. Since not all experts may provide every statistic, we can still proceed using, for example, just one available statistic to estimate the optimal parameters. The function `trunc_gamma_para(.)` implements this WLS optimization and estimates $s_1$ and $s_2$.

The lower and upper bounds \( L \) and \( U \), along with the estimated Gamma prior parameters, give a complete picture of the DTE timepoint. Biostatisticians may visualize the final prior to facilitate discussion and decision-making, as sketched below.
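For instance, the following base-R sketch draws from a truncated Gamma prior by inverse-CDF sampling restricted to $[L, U]$ and plots its density. The shape and scale values are the ones estimated in Example 1 below; the package's own `rtrunc_gamma(.)` sampler, demonstrated later, can be used instead.

```{r}
# Minimal sketch (base R only): visualize a Gamma(s1, s2)I(L, U) prior for S.
# The shape/scale values are the estimates obtained in Example 1 below.
L <- 2; U <- 2.5
s1 <- 12.86  # shape
s2 <- 0.19   # scale
# Inverse-CDF sampling restricted to [L, U]
p_bounds <- pgamma(c(L, U), shape = s1, scale = s2)
u <- runif(5000, min = p_bounds[1], max = p_bounds[2])
S_draws <- qgamma(u, shape = s1, scale = s2)
plot(density(S_draws), main = "Truncated Gamma prior for S",
     xlab = "Separation timepoint S (months)")
```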
---

## 3. Calculating Pre- and Post-Separation Median Survival Times Given the DTE Timepoint $S$

From the above, we have the **most likely DTE timepoint \( S \)**, and, based on the study's proposed null and alternative hypotheses, the overall median survival times for the **SOC** (\( \bar{\mu}_0 \)) and **experimental** (\( \bar{\mu}_1 \)) arms. We can then back-calculate the **median survival times** before and after the separation timepoint, denoted (\( \tilde{\mu}_0, \tilde{\mu}_1 \)), using the following relationship:
\[
\begin{aligned}
\bar{\mu}_0 &= \tilde{\mu}_0, \\
\bar{\mu}_1 &=
\begin{cases}
\tilde{\mu}_0, & \text{if } \tilde{\mu}_0 < S, \\
\left(1 - \dfrac{\tilde{\mu}_1}{\tilde{\mu}_0}\right) S + \tilde{\mu}_1, & \text{if } \tilde{\mu}_0 \geq S.
\end{cases}
\end{aligned}
\]
The pre- and post-separation median survival times are used internally by our proposed algorithm to obtain optimal estimates of the design parameters \( \lambda \) and \( \gamma \), which are crucial for controlling the type I error rate. A worked numerical example is given below.
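As a quick numerical illustration (plain arithmetic, not a package function), the case $\tilde{\mu}_0 \geq S$ can be inverted to recover $\tilde{\mu}_1$ from the hypothesized overall medians. With the values used in Example 1 below ($\bar{\mu}_0=2.8$, $\bar{\mu}_1=3.5$, $S=2.28$), it reproduces the post-separation median $\tilde{\mu}_1 \approx 6.57$ quoted there:

```{r}
# Illustrative back-calculation (values taken from Example 1 below)
mu0_bar <- 2.8   # overall SOC median, equal to the pre-separation median mu0_tilde
mu1_bar <- 3.5   # overall experimental-arm median
S       <- 2.28  # most likely separation timepoint
# Invert mu1_bar = (1 - mu1_tilde / mu0_tilde) * S + mu1_tilde (case mu0_tilde >= S)
mu1_tilde <- (mu1_bar - S) / (1 - S / mu0_bar)
mu1_tilde  # approximately 6.57
```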
---

## 4. Determination of Sample Size in a Two-Stage Design Setting

The sample size for a two-stage design is determined by minimizing the following weighted expected sample size function, while ensuring that the power is at least \(1-\) (Type II error) whenever \( S \leq U \). To balance two competing objectives, minimizing patient exposure to futile treatments and maximizing access to effective treatments, we define the function as follows:
\[
EN = w\,\frac{PS_{H_0}\times n_1 + (1 - PS_{H_0})(n - n_1)}{n} + (1 - w)\left(1 - \frac{PS_{H_1} \times n_1 + (1 - PS_{H_1})(n - n_1)}{n}\right).
\]
Here, $w\in[0,1]$ is a tuning parameter representing the trade-off between two goals:

- When $w=1$, the function simplifies to
  \[
  EN=E(N\mid H_0)=\frac{PS_{H_0}\times n_1 + (1 - PS_{H_0})(n - n_1)}{n}.
  \]
  This expression reflects the expected sample size under the null hypothesis (scaled by $n$). Minimizing it emphasizes **early stopping when the treatment is futile**, thereby reducing unnecessary patient exposure to ineffective therapies.

- When $w=0$, the function becomes
  \[
  EN=1-E(N\mid H_1)=1 - \frac{PS_{H_1} \times n_1 + (1 - PS_{H_1})(n - n_1)}{n}.
  \]
  In this case, the focus shifts to maximizing the number of patients who can access a truly effective treatment. By minimizing $1-E(N\mid H_1)$, we encourage continuation of the trial when the treatment is promising, avoiding premature termination that could deny patients access to a beneficial therapy.

By varying $w$, this design allows flexible prioritization between safety (minimizing exposure to ineffective treatment) and efficacy (maximizing access to effective treatment), while ensuring adequate statistical power. Although this sample size derivation is presented only for a **two-stage design**, it can be extended to trials with more stages; such extensions are not explored in this package.

Sometimes, the probability of early stopping under the null hypothesis at the first stage, $PS_{H_0}$, as computed purely by the algorithm, may be too low (e.g., 25%). This indicates that the design lacks the ability to stop the study early for futility, which is generally undesirable, especially in phase II trials. To address this, our method gives users the option to specify a minimum desired $PS_{H_0}$. Based on this user-specified threshold, instead of only minimizing $EN$ above, we can still compute the corresponding sample sizes for both the interim and final stages while keeping the power at least \( 1 - \) (Type II error) when \( S \leq U \).

Note: Given that $PS_{H_0}$ increases with $n_1$ for a fixed $n$, our proposed algorithm restricts the ratio $n_1/n$ to be less than 0.75 by default. A minimal sketch of the objective function is given below.
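For intuition, the objective can be written as a one-line R function. This is only an illustrative sketch; the helper name `EN_objective` and the plugged-in stopping probabilities are ours, not part of the package API.

```{r}
# Sketch of the weighted expected-sample-size objective EN (illustrative only).
# PS_H0 / PS_H1: probabilities of early stopping under H0 / H1;
# n1: interim sample size; n: final sample size (per arm).
EN_objective <- function(w, PS_H0, PS_H1, n1, n) {
  EN_H0 <- (PS_H0 * n1 + (1 - PS_H0) * (n - n1)) / n  # scaled E(N | H0)
  EN_H1 <- (PS_H1 * n1 + (1 - PS_H1) * (n - n1)) / n  # scaled E(N | H1)
  w * EN_H0 + (1 - w) * (1 - EN_H1)
}
# Example with equal weight and hypothetical stopping probabilities
EN_objective(w = 0.5, PS_H0 = 0.40, PS_H1 = 0.05, n1 = 28, n = 40)
```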
---

## 5. Conduct the Trial: Make Go/No-Go Decisions

With the optimal \( \lambda \) and \( \gamma \) obtained above, once data are available we can calculate $P(\tilde{\mu}_1 < \tilde{\mu}_0 \mid \mathcal{D}_n, S)$ to determine whether to proceed with the trial (**Go/No-Go decision**).

---

## 6. Summary

This vignette provides a structured, step-by-step approach to designing two-arm multi-stage survival trials with delayed treatment effects. In summary, the process includes:

- **Eliciting expert opinions** to construct a **truncated Gamma prior** for the separation time \( S \).
- **Obtaining optimal estimates** of \( \lambda \) and \( \gamma \).
- **Making Go/No-Go decisions** at the interim and final stages.
- **Determining the sample size** in a two-stage trial setting, if applicable.

When a delayed treatment effect (DTE) is anticipated, the proposed design method can reduce the required sample size compared with methods that do not account for DTE (e.g., the conventional BOP2 design). Details can be found in @Cai:2025.

## 7. Example 1

We consider the CheckMate 017 study [@Julie:2015] as an example. This randomized, open-label, international phase 3 trial evaluated the efficacy and safety of nivolumab, a fully human IgG4 programmed death-1 (PD-1) immune checkpoint inhibitor, compared with docetaxel. The study focused on patients with squamous-cell non-small-cell lung cancer (NSCLC) following prior platinum-based chemotherapy. Here, we focus on progression-free survival (PFS), a secondary endpoint that demonstrated a delayed treatment effect. We now use the DTEBOP2 R package to design this trial, treating the PFS endpoint as the primary endpoint, and also demonstrate the process of conducting the interim and final analyses once data are available.

**Preparation Step: Extract Data from the Published PFS Kaplan-Meier Plot**

```{r,echo=FALSE}
knitr::opts_chunk$set(
  fig.width = 10,    # default figure width (inches)
  fig.height = 7,    # default figure height (inches)
  dpi = 300,         # figure resolution (DPI)
  out.width = "80%"  # figure width in the rendered page
)
```

First, we reconstruct the Kaplan-Meier (KM) plot for PFS using PlotDigitizer. The datasets extracted with PlotDigitizer for the treatment and control arms are stored in the package as `Experimental` and `SOC`.

```{r setup,message=FALSE,warning=FALSE}
library(DTEBOP2)
library(invgamma)
library(truncdist)
library(doParallel)
library(survRM2adapt)
library(reconstructKM)
library(ggplot2)

data("Experimental")
data("SOC")

# Numbers at risk read from the published plot
Experiment_NAR <- data.frame(time=seq(from=0, to=24, by=3),
                             NAR=c(135, 68, 48, 33, 21, 15, 6, 2, 0))
SOC_NAR <- data.frame(time=seq(from=0, to=24, by=3),
                      NAR=c(137, 62, 26, 9, 6, 2, 1, 0, 0))

Experiment_aug <- format_raw_tabs(raw_NAR=Experiment_NAR, raw_surv=Experimental)
SOC_aug <- format_raw_tabs(raw_NAR=SOC_NAR, raw_surv=SOC)

# Reconstruct individual patient data by calling KM_reconstruct()
Experiment_recon <- KM_reconstruct(aug_NAR=Experiment_aug$aug_NAR, aug_surv=Experiment_aug$aug_surv)
SOC_recon <- KM_reconstruct(aug_NAR=SOC_aug$aug_NAR, aug_surv=SOC_aug$aug_surv)

# Put the treatment and control arms into one dataset
Experiment_IPD <- data.frame(arm=1, time=Experiment_recon$IPD_time, status=Experiment_recon$IPD_event)
SOC_IPD <- data.frame(arm=0, time=SOC_recon$IPD_time, status=SOC_recon$IPD_event)
allIPD <- rbind(Experiment_IPD, SOC_IPD)

KM_fit <- survival::survfit(survival::Surv(time, status) ~ arm, data=allIPD)
KM_plot <- survminer::ggsurvplot(
  fit = KM_fit,
  data = allIPD,
  risk.table = TRUE,
  palette = c('red', 'blue'),
  legend = c(0.8, 0.9),
  legend.title = '',
  legend.labs = c('Docetaxel', 'Nivolumab'),
  title = 'PFS',
  ylab = 'Survival (%)',
  xlab = 'Time (Month)',
  risk.table.title = 'Number at Risk',
  break.time.by = 3,
  tables.height = 0.4,
  risk.table.fontsize = 5
)
print(KM_plot,fig.align='center')
```

In the original paper, the hazard ratio of nivolumab versus docetaxel was 0.62 (95% CI, 0.47 to 0.81), meaning that nivolumab performed statistically significantly better than the SOC. Based on the figure above, a delayed treatment effect is evident, with the two PFS curves visibly diverging around 2.2 months. In the following, we assume the results are unknown and provide a step-by-step guide to designing a study with our proposed method while anticipating the delayed treatment effect.

***Step 1: Hypothesis Setup Based on Overall Median Survival Times***

To design a study, one key component is establishing the null and alternative hypotheses. Here, we assume $\bar{\mu}_0 = 2.8$ months for the standard of care (SOC) arm and $\bar{\mu}_1 = 3.5$ months for the experimental arm.

***Step 2.1: Prior Specification for the Delayed Timepoint $S$***

Next, we derive the most likely value and the prior distribution of the delayed separation timepoint $S$. For the prior of $S$, we seek expert opinion to establish the lower ($L$) and upper ($U$) bounds. For example, the clinical team may settle on a range of $L=2$ months to $U=2.5$ months for the delayed separation timepoint $S$ based on the literature and their experience. Suppose the experts have greater confidence in the separation timepoint and provide specific estimates, as shown below:

- Expert A: Mean = 2.2, Median = 2.27
- Expert B: Mean = 2.1, Median = 2.3
- Expert C: Mean = 2.3, Median = 2.31

We can use the following code to obtain the estimated prior parameters $s_1$ and $s_2$ based on the optimization algorithm introduced earlier. As a recap, we assume a truncated Gamma prior for $S$, $\text{Gamma}(s_1, s_2)I(L, U)$. In the code below, `weights = c(4,4,2,1,1)` indicates that the expert-provided means and medians are given equal weight and more emphasis than the other statistics.
```{r,eval=FALSE}
expert_data_correct <- list(
  list(mean = 2.2, median = 2.27, sd = NULL, q25 = NULL, q975 = NULL),  # Expert A
  list(mean = 2.1, median = 2.3,  sd = NULL, q25 = NULL, q975 = NULL),  # Expert B
  list(mean = 2.3, median = 2.31, sd = NULL, q25 = NULL, q975 = NULL)   # Expert C
)
# num_cores specifies the number of cores used for parallel computing.
param <- trunc_gamma_para(L = 2, U = 2.5, expert_data = expert_data_correct,
                          weights = c(4, 4, 2, 1, 1), num_cores = 4, seed = 123)
print(param)
```

The estimated $s_1$ and $s_2$ are 12.86 and 0.19, respectively. Below is the density plot of the truncated Gamma distribution $\text{Gamma}(12.86, 0.19)I(2, 2.5)$:

```{r}
plot(density(rtrunc_gamma(1000, L = 2, U = 2.5, shape = 12.86, scale = 0.19)), xlab = "", main = "")
```

The plot suggests that the prior is close to a uniform distribution over $[2, 2.5]$.

***Step 2.2: Determining the Most Likely DTE Timepoint \(S\) Using Historical Data***

When expert opinions are unavailable, the most likely $S$ can be estimated by the mean of the truncated Gamma prior or by the midpoint of $[L,U]$, i.e., $(L+U)/2$. In this specific example, CheckMate 017, the data are already available, so we treat them as historical data. Our goal here is to show how to use these "historical" data to identify the most likely timepoint $S$, that is, the point at which the Kaplan-Meier (KM) curves begin to diverge. The following R code identifies this separation timepoint: for several survival probabilities, we read off the corresponding survival times in the two arms, and $S$ is taken as the timepoint where the survival times begin to differ noticeably.

```{r}
y_prob <- c(0.54, 0.53, 0.52, 0.51)  # survival probabilities (y-axis) to explore
summary_fit <- summary(KM_fit)
arm_times <- list()
for (i in 1:length(KM_fit$strata)) {
  time_points <- summary_fit$time[summary_fit$strata == names(KM_fit$strata)[i]]
  survival_probs <- summary_fit$surv[summary_fit$strata == names(KM_fit$strata)[i]]
  # For each target probability, find the observed survival probability closest
  # to it and return the corresponding survival time.
  closest_times <- sapply(y_prob, function(tp) {
    closest_index <- which.min(abs(survival_probs - tp))
    time_points[closest_index]
  })
  arm_times[[names(KM_fit$strata)[i]]] <- setNames(closest_times, y_prob)
}
print(arm_times)
```

Based on the above results, when the survival probability reaches 0.53, the survival times of the two arms begin to differ noticeably; the corresponding value on the $x$ axis is 2.28, which is the separation timepoint. Therefore, we select $S=2.28$ as the most likely delayed timepoint. If you have a historical dataset, this procedure offers a simpler alternative to the traditional changepoint methods described in the literature.

Next, given the hypothesized median survival times of 3.5 months for the experimental arm and 2.8 months for the SOC arm, the best-guessed delayed separation timepoint of 2.28 months within the range $L = 2$ to $U = 2.5$ months, an accrual rate of 6 patients per month, a minimum follow-up of 6 months for the last enrolled patient, a type I error controlled at 10%, and a type II error of 15% (i.e., 85% power), and assuming the study is designed with two stages, the estimated interim and final sample sizes and the optimal parameters $\lambda$ and $\gamma$ can be computed with the function `Two_stage_sample_size(.)`:

```{r,eval=FALSE}
median.1 = 2.8               # \bar{mu}_0
median.2 = 3.5               # \bar{mu}_1
L = 2                        # lower bound of S
U = 2.5                      # upper bound of S
S_likely = 2.28              # the most likely delayed timepoint
trunc.para = c(12.86, 0.19)  # prior parameters of S
rate = 6                     # accrual rate; arrival times are assumed to follow a Poisson process with this rate
FUP = 6                      # follow-up time for the last enrolled patient
err1 = 0.1                   # type I error
err2 = 0.15                  # type II error
# Calculate the sample size for the two-stage design
Two_stage_sample_size(median.1=median.1, median.2=median.2, L=L, U=U, S_likely=S_likely,
                      err1=err1, err2=err2, trunc.para=trunc.para, FUP=FUP, rate=rate,
                      weight=0.5, earlystop_prob=NULL, seed=123)
```

```{r,echo=FALSE}
# Cached result of the Two_stage_sample_size(.) call above
result_sample_size = c(28, 40)
print(result_sample_size)
```

The interim and final sample sizes are 28 and 40 per arm, respectively. The optimal parameters $\lambda$ and $\gamma$ are 0.95 and 1, respectively, with an average type I error rate of 0.0871 and power of 0.8667.

Note: In the code above, `weight`$=0.5$ indicates that equal importance is given to the probability of early stopping under the null hypothesis ($H_0$) and under the alternative hypothesis ($H_1$).
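To see what these optimal parameters imply in practice, the cutoff $1-\lambda(n/N)^{\gamma}$, against which $P(\tilde{\mu}_1 < \tilde{\mu}_0 \mid \mathcal{D}_n, S)$ is compared at each look, can be evaluated directly (plain arithmetic, shown for illustration):

```{r}
# Probability cutoff 1 - lambda * (n/N)^gamma at each analysis
lambda <- 0.95; gamma <- 1
n <- c(28, 40); N <- 40
round(1 - lambda * (n / N)^gamma, 3)  # 0.335 at the interim look, 0.050 at the final look
```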
Meanwhile, if we are interested in the corresponding numbers of events at the two stages, we can use the function `event_fun(.)`:

```{r}
event_fun(median.1 = 2.8, median.2 = 3.5, S_likely = 2.28, n.interim = c(28,40), rate = 6, FUP = 6, n.sim = 1000, seed = 123)
```

We anticipate observing approximately 23 events at the interim analysis and a total of 68 events by the final analysis.

With the optimal parameters and sample sizes, we can evaluate the operating characteristics, including:

- the percentage of rejecting the null hypothesis,
- the probability of early stopping under $H_0$,
- the average sample size, and
- the trial duration,

for different values of the true but unknown separation timepoint $S$.

For the sensitivity analyses (SA) below, we assume that the most likely separation timepoint, `S_likely=2.28` months, remains unchanged. Under this condition, the median survival times before and after the separation timepoint for the control (SOC) and treatment (experimental) groups, \(\tilde{\mu}_0=2.8\) and \(\tilde{\mu}_1=6.57\) respectively, also remain unchanged, because they depend solely on the assumed most likely separation timepoint (here, `S_likely=2.28` months).

**Impact of Variation in the True $S$ Under a Fixed Assumed `S_likely`**

We now consider two true but unknown values of $S$, 2 and 2.5 months, to examine their effects on the design. Since the overall median survival time $\bar{\mu}_1$ depends on the true but unknown separation timepoint $S$, its value adjusts accordingly.

***Sensitivity Analysis 1.1: True unknown DTE separation timepoint \( S = 2 \)***

In the following code, we set $L=U=2$ to fix the true separation timepoint at 2. In real trial design practice we would, of course, also set `S_likely=2`; here, however, `S_likely` is kept at 2.28, as before, because the goal is only to check the robustness of our proposed method.

```{r}
## When H0 holds, the rejection rate is the type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2,U=2,S_likely=2.28,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2,U=2,S_likely=2.28,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
```

***Sensitivity Analysis 1.2: True unknown DTE separation timepoint $S=2.5$***

```{r}
## When H0 holds, the rejection rate is the type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2.5,U=2.5,S_likely=2.28,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2.5,U=2.5,S_likely=2.28,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
```

From the above results, we observe that when `S_likely` = 2.28 remains unchanged, the type I error rate is still controlled below 0.1 and the power remains above 0.85 across different true DTE separation timepoints. We also see that the power is lower when $S=2.5$ than when $S=2$, given our best-guessed separation timepoint `S_likely` = 2.28. This is expected: when $S=2.5$ and `S_likely` = 2.28, part of the unseparated portion of the two curves is included in the comparison, leading to a loss of power. Conversely, if the true separation occurs earlier than `S_likely`, the power increases.

**Sensitivity Analysis: Varying the Assumed Separation Timepoint `S_likely`**

Next, we examine scenarios with `S_likely`$=2$ or `S_likely`$=2.5$ as the best-guessed separation timepoint, to evaluate the robustness of our proposed method, while keeping the overall median survival times at $\bar{\mu}_0=2.8$ and $\bar{\mu}_1=3.5$. In these scenarios, the pre- and post-separation median survival times $\tilde{\mu}_0$ and $\tilde{\mu}_1$ change with `S_likely` accordingly. We report the average type I error and power to assess the performance of our method under these scenarios. Specifically, we examine two extreme scenarios. The optimal design parameters were initially determined assuming the separation timepoint falls within the range of $L = 2$ months to $U = 2.5$ months.

- In the first scenario, we evaluate the case where the clinical team believes the most likely separation timepoint is 2 months.
- In the second scenario, we assess the case where the most likely separation timepoint is 2.5 months.

If the operating characteristics (OCs) remain satisfactory in both cases, this demonstrates the robustness of our proposed method.

***Sensitivity Analysis 1.3: Best-guessed DTE separation timepoint `S_likely` $= 2$***

```{r}
## When H0 holds, the rejection rate is the average type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2,U=2.5,S_likely=2,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the average power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2,U=2.5,S_likely=2,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
```

***Sensitivity Analysis 1.4: Best-guessed DTE separation timepoint `S_likely` $= 2.5$***

```{r}
## When H0 holds, the rejection rate is the average type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2,U=2.5,S_likely=2.5,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the average power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2,U=2.5,S_likely=2.5,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
```

From these results, we see that the average type I error remains controlled below 0.1. However, when `S_likely`$=2$, the average power falls below 0.85. This is expected: the sample size was originally determined based on `S_likely`$=2.28$, with corresponding $\tilde{\mu}_0=2.8$ and $\tilde{\mu}_1=6.57$, whereas with `S_likely`$=2$ these values become $\tilde{\mu}_0=2.8$ and $\tilde{\mu}_1=5.25$. That is, if the best-guessed separation timepoint shifts earlier toward the beginning of the trial, the treatment effect is diluted. Conversely, when the best-guessed separation timepoint `S_likely` is larger than 2.28, as in the case `S_likely`$=2.5$ ($\tilde{\mu}_0=2.8$, $\tilde{\mu}_1=9.33$), the average power remains above 0.85, ensuring adequate performance.

As a further exploration, we set `S_likely`$=2$ when determining the required sample size. The results are as follows.

**Example 2: Impact of Setting `S_likely`$=2$**

```{r,eval=FALSE}
median.1 = 2.8               # \bar{mu}_0
median.2 = 3.5               # \bar{mu}_1
L = 2                        # lower bound of S
U = 2.5                      # upper bound of S
S_likely = 2                 # the most likely delayed timepoint
trunc.para = c(12.86, 0.19)  # prior parameters of S
rate = 6                     # accrual rate; arrival times are assumed to follow a Poisson process with this rate
FUP = 6                      # follow-up time for the last enrolled patient
err1 = 0.1                   # type I error
err2 = 0.15                  # type II error
# Calculate the sample size for the two-stage design
Two_stage_sample_size(median.1=median.1, median.2=median.2, L=L, U=U, S_likely=S_likely,
                      err1=err1, err2=err2, trunc.para=trunc.para, FUP=FUP, rate=rate,
                      weight=0.5, earlystop_prob=NULL, seed=123)
```

```{r,echo=FALSE}
# Cached result of the Two_stage_sample_size(.) call above
result_sample_size_2 = c(44, 63)
print(result_sample_size_2)
```

The interim and final sample sizes are 44 and 63 per arm, respectively. The optimal parameters in $P(\tilde{\mu}_1 < \tilde{\mu}_0 \mid \mathcal{D}_n, S)> 1-\lambda(n/N)^{\gamma}$ are $\lambda=0.95$ and $\gamma=1$, with an average type I error of 0.0813 and power of 0.8786. For the numbers of events at the two stages, we have:

```{r}
event_fun(median.1 = 2.8, median.2 = 3.5, S_likely = 2, n.interim = c(44,63), rate = 6, FUP = 6, n.sim = 1000, seed = 123)
```

We expect approximately 46 events at the interim stage and 112 events in total at the final stage.

Taking a further step, with the optimal parameters $\lambda=0.95$ and $\gamma=1$, we consider four scenarios by varying both `S_likely` and $S$.

***Sensitivity Analysis 2.1:*** $S=2$, `S_likely`$=2$

```{r}
## When H0 holds, the rejection rate is the average type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(44,63),L=2,U=2,S_likely=2,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the average power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(44,63),L=2,U=2,S_likely=2,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
```

***Sensitivity Analysis 2.2:*** $S=2.5$, `S_likely`$=2$

```{r}
## When H0 holds, the rejection rate is the type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(44,63),L=2.5,U=2.5,S_likely=2,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(44,63),L=2.5,U=2.5,S_likely=2,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
```

***Sensitivity Analysis 2.3:*** $S=2$, `S_likely`$=2.5$

```{r}
## When H0 holds, the rejection rate is the type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(44,63),L=2,U=2,S_likely=2.5,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(44,63),L=2,U=2,S_likely=2.5,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
```

***Sensitivity Analysis 2.4:*** $S=2.5$, `S_likely`$=2.5$

```{r}
## When H0 holds, the rejection rate is the type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(44,63),L=2.5,U=2.5,S_likely=2.5,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(44,63),L=2.5,U=2.5,S_likely=2.5,FUP=6,trunc.para=c(12.86,0.19),rate=6,nsim=10000,seed=10)
```

**Summary of the above results**

- With a maximum sample size of 63 per arm, the type I error is controlled at 0.1 and the power remains above 0.85 across all combinations of $S$ and `S_likely`, demonstrating a double-robustness (DR) property: robustness to both the true but unknown separation timepoint $S$ and the user-specified best guess `S_likely`, as long as both lie in $[L,U]$.
- A larger upper bound $U$ increases the required sample size. Meanwhile, a smaller $L$ results in a smaller best guess `S_likely`, which means that the undiverged region of the two curves is also included in the comparison, and a larger sample size is thus needed to maintain the power. In short, a wider interval $[L,U]$ reflects greater uncertainty in the specification and leads to an increased required sample size.
- The lower bound $L$ and upper bound $U$ should be chosen carefully based on prior knowledge and experience, as both influence the maximum required sample size. The guiding principle is to choose $L$ and $U$ such that the true separation time is covered with as much certainty as possible.

***Comparison with the Log-Rank Test Without DTE Consideration***

Compared with a log-rank test that does not incorporate information about $S$, given the event rate $P=0.85$, the total sample size under the log-rank test is
\[
\frac{4(z_{0.95} + z_{0.85})^2}{P\left(\log\left(\frac{2.8}{3.5}\right)\right)^2} = 680.
\]
Without incorporating information about $S$, the required sample size is therefore **340 per arm**. In contrast, if we use the post-separation median survival time, 6.57, obtained with `S_likely`$=2.28$, the total sample size under the log-rank test is only
\[
\frac{4(z_{0.95} + z_{0.85})^2}{P\left(\log\left(\frac{2.8}{6.57}\right)\right)^2} = 48.
\]
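The z-quantile arithmetic behind these two totals can be checked directly in R (an illustrative verification only):

```{r}
# Reproduce the log-rank sample-size arithmetic above
P <- 0.85                       # event rate
z <- qnorm(0.95) + qnorm(0.85)  # z_{0.95} + z_{0.85}
n_overall <- 4 * z^2 / (P * log(2.8 / 3.5)^2)   # based on the overall medians
n_post    <- 4 * z^2 / (P * log(2.8 / 6.57)^2)  # based on the post-separation median
c(n_overall, n_post)  # approximately 679.5 and 46.5, rounded up to 680 and 48 in the text
```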
In other words, if we assume that the true separation time is \( S = 0 \), then the required sample size is at least 24 subjects per arm. On the other hand, when the separation time is unknown, the sample size could increase to as much as 340 per arm. The sample size required by our proposed method is therefore expected to lie between 24 and 340 per arm, depending on the actual separation time.

This observation highlights the value of incorporating prior information about the separation time into the design. The key reason for the reduction is the use of the ratio of $\tilde{\mu}_1$ to $\tilde{\mu}_0$, rather than of $\bar{\mu}_1$ to $\bar{\mu}_0$, in the sample size determination; the former is the more appropriate choice when a delayed treatment effect (DTE) is anticipated. By leveraging this information, our method can substantially reduce the required sample size while maintaining statistical power, thereby improving efficiency and lowering resource demands in clinical trial planning.

**Example 3: Applying `conduct(.)` for Go/No-Go Decisions with Mock Data**

We now generate two mock datasets and use the `conduct(.)` function to guide Go/No-Go decision-making at the interim and final stages based on the observed data.

***Example 3.1: Mock Data Scenario Assuming $H_0$ Is True***

The first dataset is generated under the null hypothesis $H_0$.

```{r}
set.seed(126)
n.interim = c(44,63)
nmax = 63
FUP = 6
rate = 6
L = 2
U = 2.5
median.1 = 2.8
median.2 = 3.5
lambda_1 = log(2)/2.8  # hazard rate before S
lambda_2 = log(2)/2.8  # hazard rate after S (equal to lambda_1 under H0)
nobs = n.interim + 1
nobs[length(nobs)] = 63
wait.t = rexp(nmax, rate = rate)
arrival.t = cumsum(wait.t)
tobs = arrival.t[nobs]
tobs[length(tobs)] = tobs[length(tobs)] + FUP
S_likely = 2.1
event.t.E = generate_pe(nmax, t = S_likely, lambda1 = lambda_1, lambda2 = lambda_2)
event.t.C = rexp(nmax, rate = lambda_1)
#### First interim
k = 1
Ind_event_E = ifelse(arrival.t[1:n.interim[k]] + event.t.E[1:n.interim[k]] <= tobs[k], 1, 0)  ## event indicator for the treatment arm; 1 = event
Ind_event_C = ifelse(arrival.t[1:n.interim[k]] + event.t.C[1:n.interim[k]] <= tobs[k], 1, 0)
t.event.E = rep(0, n.interim[k])
t.event.C = rep(0, n.interim[k])
for (l in 1:length(t.event.E)) {
  t.event.E[l] = ifelse(arrival.t[l] + event.t.E[l] <= tobs[k], event.t.E[l], tobs[k] - arrival.t[l])
  t.event.C[l] = ifelse(arrival.t[l] + event.t.C[l] <= tobs[k], event.t.C[l], tobs[k] - arrival.t[l])
}
data.E = data.frame(Time = t.event.E, Status = Ind_event_E)  ## treatment dataset
data.C = data.frame(Time = t.event.C, Status = Ind_event_C)  ## control dataset
print("Treatment Dataset:")
print(head(data.E))
print("Control Dataset:")
print(head(data.C))
conduct(data.E, data.C, median.1=median.1, median.2=median.2, lambda=0.95, gamma=1, nmax=63, S_likely=S_likely, L=L, U=U, trunc.para=c(12.86,0.19))[[4]]
```

The result is `Go`. Assuming that we now have all the data, we move on to the final analysis.

```{r}
k = 2
Ind_event_E = ifelse(arrival.t[1:n.interim[k]] + event.t.E[1:n.interim[k]] <= tobs[k], 1, 0)  ## event indicator for the treatment arm; 1 = event
Ind_event_C = ifelse(arrival.t[1:n.interim[k]] + event.t.C[1:n.interim[k]] <= tobs[k], 1, 0)
t.event.E = rep(0, n.interim[k])
t.event.C = rep(0, n.interim[k])
for (l in 1:length(t.event.E)) {
  t.event.E[l] = ifelse(arrival.t[l] + event.t.E[l] <= tobs[k], event.t.E[l], tobs[k] - arrival.t[l])
  t.event.C[l] = ifelse(arrival.t[l] + event.t.C[l] <= tobs[k], event.t.C[l], tobs[k] - arrival.t[l])
}
data.E = data.frame(Time = t.event.E, Status = Ind_event_E)
data.C = data.frame(Time = t.event.C, Status = Ind_event_C)
print("Treatment Dataset:")
print(head(data.E))
print("Control Dataset:")
print(head(data.C))
conduct(data.E, data.C, median.1=median.1, median.2=median.2, lambda=0.95, gamma=1, nmax=nmax, S_likely=S_likely, L=L, U=U, trunc.para=c(12.86,0.19))[[4]]
```

The result is `Not Reject Null Hypothesis`, which meets our expectation.

***Example 3.2: Mock Data Scenario Assuming $H_1$ Is True***

The second mock dataset is generated under the alternative hypothesis $H_1$.

```{r}
set.seed(126)
n.interim = c(44,63)
nmax = 63
FUP = 6
rate = 6
L = 2
U = 2.5
median.1 = 2.8
median.2 = 3.5
lambda_1 = log(2)/2.8  # hazard rate before S
lambda_2 = log(2)/6.6  # hazard rate after S
nobs = n.interim + 1
nobs[length(nobs)] = 63
wait.t = rexp(nmax, rate = rate)
arrival.t = cumsum(wait.t)
tobs = arrival.t[nobs]
tobs[length(tobs)] = tobs[length(tobs)] + FUP
S_likely = 2.1
event.t.E = generate_pe(nmax, t = S_likely, lambda1 = lambda_1, lambda2 = lambda_2)
event.t.C = rexp(nmax, rate = lambda_1)
#### First interim
k = 1
Ind_event_E = ifelse(arrival.t[1:n.interim[k]] + event.t.E[1:n.interim[k]] <= tobs[k], 1, 0)  ## event indicator for the treatment arm; 1 = event
Ind_event_C = ifelse(arrival.t[1:n.interim[k]] + event.t.C[1:n.interim[k]] <= tobs[k], 1, 0)
t.event.E = rep(0, n.interim[k])
t.event.C = rep(0, n.interim[k])
for (l in 1:length(t.event.E)) {
  t.event.E[l] = ifelse(arrival.t[l] + event.t.E[l] <= tobs[k], event.t.E[l], tobs[k] - arrival.t[l])
  t.event.C[l] = ifelse(arrival.t[l] + event.t.C[l] <= tobs[k], event.t.C[l], tobs[k] - arrival.t[l])
}
data.E = data.frame(Time = t.event.E, Status = Ind_event_E)  ## treatment dataset
data.C = data.frame(Time = t.event.C, Status = Ind_event_C)  ## control dataset
print("Treatment Dataset:")
print(head(data.E))
print("Control Dataset:")
print(head(data.C))
conduct(data.E, data.C, median.1=median.1, median.2=median.2, lambda=0.95, gamma=1, nmax=63, S_likely=S_likely, L=L, U=U, trunc.para=c(12.86,0.19))[[4]]
```

The result is `Go`. Assuming we now have the full dataset, we can conduct the final analysis.

```{r}
k = 2
Ind_event_E = ifelse(arrival.t[1:n.interim[k]] + event.t.E[1:n.interim[k]] <= tobs[k], 1, 0)  ## event indicator for the treatment arm; 1 = event
Ind_event_C = ifelse(arrival.t[1:n.interim[k]] + event.t.C[1:n.interim[k]] <= tobs[k], 1, 0)
t.event.E = rep(0, n.interim[k])
t.event.C = rep(0, n.interim[k])
for (l in 1:length(t.event.E)) {
  t.event.E[l] = ifelse(arrival.t[l] + event.t.E[l] <= tobs[k], event.t.E[l], tobs[k] - arrival.t[l])
  t.event.C[l] = ifelse(arrival.t[l] + event.t.C[l] <= tobs[k], event.t.C[l], tobs[k] - arrival.t[l])
}
data.E = data.frame(Time = t.event.E, Status = Ind_event_E)
data.C = data.frame(Time = t.event.C, Status = Ind_event_C)
print("Treatment Dataset:")
print(head(data.E))
print("Control Dataset:")
print(head(data.C))
conduct(data.E, data.C, median.1=median.1, median.2=median.2, lambda=0.95, gamma=1, nmax=nmax, S_likely=S_likely, L=L, U=U, trunc.para=c(12.86,0.19))[[4]]
```

The result is `Reject Null Hypothesis`, which meets our expectations.
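Since the administrative-censoring logic above is repeated at every analysis, it can be handy to wrap it in a small helper. The function below is our own convenience sketch (`censor_at` is not a DTEBOP2 function); it reproduces the pattern used throughout Example 3:

```{r}
# Hypothetical helper: administratively censor the first n patients at calendar
# time t_cut. Not part of DTEBOP2; shown only to summarize the pattern above.
censor_at <- function(arrival.t, event.t, n, t_cut) {
  arrival <- arrival.t[1:n]
  event   <- event.t[1:n]
  status  <- ifelse(arrival + event <= t_cut, 1, 0)       # 1 = event observed
  time    <- ifelse(status == 1, event, t_cut - arrival)  # otherwise censored at cutoff
  data.frame(Time = time, Status = status)
}
# e.g., the first interim look of Example 3.2:
data.E2 <- censor_at(arrival.t, event.t.E, n.interim[1], tobs[1])
head(data.E2)
```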
## 8. Incorporating the Early Stopping Probability into Sample Size Determination

Now suppose we require the probability of early stopping to be no less than 0.4. We calculate the interim and final sample sizes using the function `Two_stage_sample_size(.)` with `earlystop_prob = 0.4`, under two scenarios: `S_likely=2.28` and `S_likely=2`. In both cases, the resulting sample sizes ensure that the probability of early stopping (PS) is at least 0.4 when $S\leq U$.

***Example 8.1:*** `S_likely`$=2.28$

```{r,eval=FALSE}
median.1 = 2.8               # \bar{mu}_0
median.2 = 3.5               # \bar{mu}_1
L = 2                        # lower bound of S
U = 2.5                      # upper bound of S
S_likely = 2.28              # the most likely delayed timepoint
trunc.para = c(12.86, 0.19)  # prior parameters of S
rate = 6                     # accrual rate; arrival times are assumed to follow a Poisson process with this rate
FUP = 6                      # follow-up time for the last enrolled patient
err1 = 0.1                   # type I error
err2 = 0.15                  # type II error
Two_stage_sample_size(median.1=median.1, median.2=median.2, L=L, U=U, S_likely=S_likely,
                      err1=err1, err2=err2, trunc.para=trunc.para, FUP=FUP, rate=rate,
                      earlystop_prob=0.4, seed=123)
```

```{r,echo=FALSE}
# Cached result of the Two_stage_sample_size(.) call above
earlystop_sample_size_1 = c(39, 52)
print(earlystop_sample_size_1)
```

With these interim and final sample sizes, we can empirically check the early stopping rate under $H_0$ when $S=2$ and $S=2.5$ as follows.

```{r}
## S = 2
n.interim = c(39,52)
getoc_2arm_piecewise(median.true=c(median.1,median.1),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = n.interim,S_likely=2.28,L=2,U=2,FUP=FUP,trunc.para=trunc.para,rate=rate,nsim=10000,seed=10)$earlystop
## S = 2.5
getoc_2arm_piecewise(median.true=c(median.1,median.1),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = n.interim,S_likely=2.28,L=2.5,U=2.5,FUP=FUP,trunc.para=trunc.para,rate=rate,nsim=10000,seed=10)$earlystop
```

The results indicate that, to ensure an early stopping probability of at least 0.4 under $H_0$, the final sample size must increase from 40 to 52. This aligns with expectations: the original probability of early stopping under $H_0$ was approximately 0.24 when $S=2.5$, so achieving a roughly 16-percentage-point increase in the probability of early stopping requires a larger sample size.

***Example 8.2:*** `S_likely`$=2$

```{r,eval=FALSE}
median.1 = 2.8               # \bar{mu}_0
median.2 = 3.5               # \bar{mu}_1
L = 2                        # lower bound of S
U = 2.5                      # upper bound of S
S_likely = 2                 # the most likely delayed timepoint
trunc.para = c(12.86, 0.19)  # prior parameters of S
rate = 6                     # accrual rate; arrival times are assumed to follow a Poisson process with this rate
FUP = 6                      # follow-up time for the last enrolled patient
err1 = 0.1                   # type I error
err2 = 0.15                  # type II error
Two_stage_sample_size(median.1=median.1, median.2=median.2, L=L, U=U, S_likely=S_likely,
                      err1=err1, err2=err2, trunc.para=trunc.para, FUP=FUP, rate=rate,
                      earlystop_prob=0.4, seed=123)
```

```{r,echo=FALSE}
# Cached result of the Two_stage_sample_size(.) call above
earlystop_sample_size_2 = c(46, 65)
print(earlystop_sample_size_2)
```

The results indicate that, to ensure an early stopping probability of at least 0.4 in this case, the final sample size must also increase, from 63 to 65. This aligns with our expectations, as the initial early stopping rate was approximately 0.37 when $S=2.5$; a modest increase in sample size yields an approximately 4-percentage-point improvement in the probability of early stopping.
## 9. Application of the Proposed Method in a Three-Stage Design

Previously, we considered only a two-stage design. Here, we demonstrate how our method applies in a multi-stage setting using a three-stage design. Specifically, we set the interim sample sizes to `n.interim`$= c(13, 28, 40)$ and first obtain the optimal values of $\lambda$ and $\gamma$.

```{r,eval=FALSE}
n.interim = c(13,28,40)
# Get the optimal lambda and gamma
get.optimal_2arm_piecewise(median.1=median.1, median.2=median.2, L=L, U=U, S_likely=2.28,
                           Uniform=FALSE, trunc.para=c(1,1), err1=0.1, n.interim=n.interim,
                           rate=rate, FUP=FUP, track=TRUE, nsim=10000,
                           control=T, control.point=c(L,U))
```

```{r,echo=FALSE}
# Cached result of the get.optimal_2arm_piecewise(.) call above
result_optimal_4 = c(0.9500, 1.0000, 0.0875, 0.8678)
print(result_optimal_4)
```

The optimal values of $\lambda$ and $\gamma$ are 0.95 and 1, respectively. Next, we evaluate the operating characteristics when $S=2$ and $S=2.5$, as before.

***Example 9.1:*** $S = 2$

```{r}
## When H0 holds, the rejection rate is the type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(13,28,40),L=2,U=2,S_likely=2.28,FUP=6,trunc.para=c(1,1),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(13,28,40),L=2,U=2,S_likely=2.28,FUP=6,trunc.para=c(1,1),rate=6,nsim=10000,seed=10)
```

***Example 9.2:*** $S = 2.5$

```{r}
## When H0 holds, the rejection rate is the type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(13,28,40),L=2.5,U=2.5,S_likely=2.28,FUP=6,trunc.para=c(1,1),rate=6,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(13,28,40),L=2.5,U=2.5,S_likely=2.28,FUP=6,trunc.para=c(1,1),rate=6,nsim=10000,seed=10)
```

Compared with the two-stage design, the three-stage design exhibits similar operating characteristics, with only a slight increase in the probability of early stopping under $H_0$. This suggests that incorporating an additional interim analysis enables earlier stopping without substantially altering the overall statistical properties of the design.

## 10. Sensitivity to the Prior Specification for $S$

In this section, we conduct further sensitivity analyses under the following scenarios:

- when an expert-elicited prior with high certainty is used, and
- when the default prior for $S$ is used.
***Example 10.1: Expert-Elicited Prior for $S$***

Suppose three experts provide the following prior information:

- **Expert A**: Mean = 2.28
- **Expert B**: Median = 2.4
- **Expert C**: Mean = 2.4, Median = 2.3

```{r,eval=FALSE}
expert_data_correct <- list(
  list(mean = 2.28, median = NULL, sd = NULL, q25 = NULL, q975 = NULL),  # Expert A
  list(mean = NULL, median = 2.4,  sd = NULL, q25 = NULL, q975 = NULL),  # Expert B
  list(mean = 2.4,  median = 2.3,  sd = NULL, q25 = NULL, q975 = NULL)   # Expert C
)
param_1 <- trunc_gamma_para(
  L = 2, U = 2.5,
  expert_data = expert_data_correct,
  weights = c(4, 4, 2, 1, 1),
  num_cores = 4, seed = 123  # number of cores for parallel computing
)
print(param_1)
```

The estimated $s_1$ and $s_2$ are 19.49 and 0.27, and the density plot of the prior of $S$ is shown below.

```{r}
plot(density(rtrunc_gamma(1000, L = 2, U = 2.5, shape = 19.49, scale = 0.27)), xlab = "", main = "")
```

Compared with the prior used before, the new prior is more concentrated around 2.42. Now, setting `S_likely`$=2.28$, the optimal parameters $\lambda$ and $\gamma$ are obtained as follows.

```{r,eval=FALSE}
n.interim = c(28,40)
trunc.para = param_1
# Get the optimal lambda and gamma
get.optimal_2arm_piecewise(median.1=median.1, median.2=median.2, L=L, U=U, S_likely=S_likely,
                           Uniform=FALSE, trunc.para=trunc.para, err1=err1, n.interim=n.interim,
                           rate=rate, FUP=FUP, track=TRUE, nsim=10000,
                           control=T, control.point=c(L,U))
```

```{r,echo=FALSE}
# Cached result of the get.optimal_2arm_piecewise(.) call above
result_optimal_3 = c(0.9500, 1.0000, 0.0870, 0.8628)
print(result_optimal_3)
```

From the results, the optimal values of $\lambda$ and $\gamma$ remain the same as those obtained under the default prior, with slightly different average type I error and power. Consequently, the operating characteristics, including type I error and power, are nearly identical to those under the default prior. We give one example to illustrate this: suppose `S_likely`$=2.28$ and $S=2.5$.

```{r}
median.1 = 2.8               # \bar{mu}_0
median.2 = 3.5               # \bar{mu}_1
L = 2                        # lower bound of S
U = 2.5                      # upper bound of S
S_likely = 2.28              # the most likely delayed timepoint
trunc.para = c(19.49, 0.27)  # prior parameters of S
rate = 6                     # accrual rate; arrival times are assumed to follow a Poisson process with this rate
FUP = 6                      # follow-up time for the last enrolled patient
err1 = 0.1                   # type I error
err2 = 0.15                  # type II error
## When H0 holds, the rejection rate is the type I error.
getoc_2arm_piecewise(median.true=c(2.8,2.8),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2.5,U=2.5,S_likely=S_likely,FUP=FUP,trunc.para=trunc.para,rate=rate,nsim=10000,seed=10)
## When H1 holds, the rejection rate is the power.
getoc_2arm_piecewise(median.true=c(2.8,3.5),Uniform=FALSE,lambda=0.95,gamma=1,n.interim = c(28,40),L=2.5,U=2.5,S_likely=S_likely,FUP=FUP,trunc.para=trunc.para,rate=rate,nsim=10000,seed=10)
```

This suggests that the operating characteristics are stable with respect to the expert-elicited prior.

***Example 10.2: Default Prior for $S$***

If sufficient information is unavailable to determine the prior parameters $s_1$ and $s_2$ of $S$, we recommend using the default prior with $s_1 = 1$ and $s_2 = 1$. As shown below, the resulting prior distribution is approximately uniform over $[L, U]$, indicating a vague prior for $S$.
```{r}
plot(density(rtrunc_gamma(1000, L = 2, U = 2.5, shape = 1, scale = 1)), xlab = "", main = "")
```

In addition, the optimal parameters, shown below, are identical to those obtained using the expert-elicited prior and yield comparable average type I error and power.

```{r,echo=FALSE}
# Cached result under the default prior
result_optimal = c(0.9500, 1.0000, 0.0876, 0.8694)
print(result_optimal)
```

Therefore, if no expert data for $S$ are available, the default prior for $S$ may be used.

### Conclusion

This work presents a flexible and efficient framework for designing two-arm multi-stage survival trials with delayed treatment effects (DTE). By incorporating prior knowledge about the DTE separation timepoint $S$, our method enables substantial reductions in the required sample size while preserving type I error control and achieving the desired power. Using the ratio $\tilde{\mu}_1/\tilde{\mu}_0$ instead of $\bar{\mu}_1/\bar{\mu}_0$ in the sample size determination offers a more appropriate and robust basis for design when a DTE is anticipated.

Through comparison with the traditional log-rank test, we demonstrated that our method can achieve the same operating characteristics with far fewer events and patients. Sensitivity analyses further confirmed the robustness of our approach under varying prior specifications and design parameters. Additionally, we illustrated how the method applies to three-stage designs, given the sample size at each stage, and how to conduct the trial, e.g., making Go/No-Go decisions.

Overall, this approach offers a principled, data-informed, and computationally feasible solution for designing efficient survival trials under delayed treatment effects, and serves as a valuable tool for researchers and practitioners in clinical trial planning.

## References