The randomized controlled clinical trial (RCT) with a fixed design has been the gold standard for evaluating drug safety and efficacy since its introduction. One drawback of traditional fixed-design RCTs, however, is that they may fail to meet the need for flexibility and adaptivity in modern drug development. For instance, the sample size cannot be adjusted for a lower-than-expected treatment effect size once the study has started. Adaptive study designs, which allow the trial to respond to accumulating efficacy data through pre-planned adjustments, are a desirable alternative that may result in more informative and successful trials. 

To date, many adaptive designs have been developed and implemented in actual trials (Table 1). One example is group sequential designs (GSDs), which allow pre-specified interim analyses—statistical analyses of the accumulating data while the study is ongoing—at which the study can be stopped early for either futility or efficacy. In this way, adaptive designs can be leveraged in RCTs to introduce flexibility and allow the study to change to best meet its objectives. 

Table 1. Overview of adaptive designs with examples of trials that have employed these methods (Pallmann et al., 2018)

Despite these promising features, the uptake of adaptive designs in clinical trials remains low, mainly because of statistical and operational challenges. For instance, in many GSDs, performing multiple interim analyses incurs increasing statistical penalties that can lead to a larger maximum sample size than a fixed design would require. 

An adaptive design option that can be added to a GSD or used on its own, without a penalty at the final analysis, is the “promising zone” method of sample size re-estimation (SSR) introduced by Mehta and Pocock (2011). This design allows the sample size to be re-evaluated based on the evidence accumulated at the interim analysis, to determine whether an increase is needed for the conditional power to meet the desired target. The sample size is increased only when the interim results show some evidence of efficacy but are not strong enough for the conditional power to reach the target power of the study. In this way, SSR can increase the chance of a study showing a significant result when the treatment is efficacious, reducing the risk of expensive failed studies with ambiguous non-significant results. For more details about the benefits of this SSR design, please refer to Mehta and Pocock’s paper. 

 

Application

Trial operating characteristics quantify how a design performs under various scenarios. One example of an operating characteristic is a trial’s power, which measures its ability to detect a clinically significant treatment effect if one exists. While fixed-design RCTs have well-known and easy-to-calculate operating characteristics, regulatory authorities often require the operating characteristics of more complex adaptive designs to be demonstrated in the protocol before the design is considered acceptable. For many adaptive designs, operating characteristics are demonstrated by running many in silico simulations of the experiment. With a better understanding of the trial operating characteristics, the sponsor can make informed decisions that improve trial efficiency. Mehta and Pocock’s theoretical proofs of the promising zone SSR method demonstrate that it has the appropriate operating characteristics, namely type 1 and type 2 error rates (and, thus, power), to provide a valid trial design.  
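
As an illustration of this kind of in silico evaluation, the sketch below estimates the power of a simple fixed-design two-arm trial by simulating it many times. The function name, random seed, test choice (pooled two-proportion z-test), and the response rates and per-arm sample size in the example call are illustrative assumptions, not values from any particular protocol.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2024)

def simulate_power(p_control, p_treatment, n_per_arm, alpha=0.05, n_sims=100_000):
    """Estimate power as the fraction of simulated trials whose pooled
    two-proportion z-test is significant at the two-sided alpha level."""
    x_c = rng.binomial(n_per_arm, p_control, size=n_sims)
    x_t = rng.binomial(n_per_arm, p_treatment, size=n_sims)
    p_pool = (x_c + x_t) / (2 * n_per_arm)
    se = np.sqrt(2 * p_pool * (1 - p_pool) / n_per_arm)
    se = np.maximum(se, 1e-12)  # guard against all-zero/all-one arms (numerator is 0 there too)
    z = (x_t - x_c) / n_per_arm / se
    return np.mean(np.abs(z) > norm.ppf(1 - alpha / 2))

# Illustrative values only: 5% vs. 20% response rates, 101 subjects per arm
print(simulate_power(0.05, 0.20, 101))   # roughly 0.90 under these assumptions
```

The same machinery extends to adaptive designs by simulating the interim decision rules on top of the simulated data.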

In this practical guide, we specifically discuss unblinded SSR with the promising zone design. This design, like many other adaptive designs (e.g., GSDs), relies mainly on conditional power. Conditional power is the probability of rejecting the null hypothesis conditional on the evidence accumulated at the interim analysis, i.e., the interim estimate of the drug effect size. The “promising zone” SSR method pre-defines conditional power boundaries that divide the interim results into three distinct zones: a “favourable” zone where no sample size adjustment is needed; a “promising” zone where a sample size increase can bring the conditional power to the desired level; and an “unfavourable” zone where a significant study result is unlikely.  
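
To make the definition concrete, here is a minimal sketch of conditional power computed under the interim trend, i.e., treating the interim estimate of the effect as the true effect. It assumes a normally distributed test statistic with equal allocation; the function name and the numbers in the example call are ours, not taken from any specific software package.

```python
from math import sqrt
from scipy.stats import norm

def conditional_power(z_interim, n_interim, n_final, alpha=0.05):
    """Probability of rejecting at the final analysis given the interim
    z-statistic, assuming the interim trend continues. Sample sizes are
    total numbers of subjects (equal allocation assumed)."""
    z_crit = norm.ppf(1 - alpha / 2)
    n_remaining = n_final - n_interim
    # Expected second-stage z if the interim effect estimate is the truth
    drift = z_interim * sqrt(n_remaining / n_interim)
    # Second-stage z needed for the conventional final test to reject
    required = (z_crit * sqrt(n_final) - z_interim * sqrt(n_interim)) / sqrt(n_remaining)
    return norm.cdf(drift - required)

# Example: interim z of 1.4 halfway through a planned 202-subject trial
print(round(conditional_power(1.4, n_interim=101, n_final=202), 2))   # about 0.51
```

Other conventions compute conditional power under the originally hypothesized effect rather than the interim estimate; the promising zone construction discussed here conditions on the interim estimate.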

In addition to the trial type 1 error and power specifications, three key parameters must be defined for SSR: (1) the timing of the interim analysis, e.g., when 50% of subjects have reported primary endpoint data; (2) the maximum affordable sample size, e.g., 1.5 times the initial sample size; and (3) the initially calculated sample size (Figure 1). 

Figure 1. Flow diagram of key parameters for re-estimating the sample size at the interim analysis
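
One convenient way to keep these design parameters together is a small container like the sketch below; the class and field names are illustrative only and not taken from any SSR software.

```python
from dataclasses import dataclass

@dataclass
class SSRDesignParameters:
    """The three key SSR inputs, alongside the trial's alpha and power."""
    initial_n: int            # initially calculated total sample size
    interim_fraction: float   # e.g. 0.5: interim when 50% of subjects have endpoint data
    max_n_multiplier: float   # e.g. 1.5: cap the sample size at 1.5x the initial one
    alpha: float = 0.05
    power: float = 0.90

    @property
    def interim_n(self) -> int:
        return round(self.initial_n * self.interim_fraction)

    @property
    def max_n(self) -> int:
        return round(self.initial_n * self.max_n_multiplier)
```

For instance, the case study below corresponds to SSRDesignParameters(initial_n=202, interim_fraction=0.5, max_n_multiplier=1.75).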

We will walk through a case study to give you a better idea of how SSR is applied. Assume a two-arm trial comparing the treatment effect of an anti-viral drug vs. placebo is to be conducted, with the typical two-sided type 1 error rate of 0.05 and power of 90%. Based on the literature, we assume the placebo response rate is 5% and the treatment effect is 0.15 (that is, a treatment response rate of 20%); a total sample size of 202 subjects is then needed for a fixed design. However, the study sponsor acknowledges that an effect of 0.15 is uncertain and may be overly optimistic; a more realistic assumption could be anywhere between 0.105 (the minimally clinically significant difference) and 0.15. We therefore introduce the SSR design to allow dynamic adjustment of the sample size under this uncertainty about the treatment effect. This way, if the treatment effect is smaller than expected, we still have the chance to increase the sample size and ensure the trial retains the desired power. The sponsor can fund up to 1.75 times the planned sample size of 202, so the maximum affordable sample size is 354 subjects. Finally, we assume the sample size will be re-estimated when 50% of the subjects (101 subjects) have reported the primary outcome; at this interim look we examine the data and decide whether the sample size needs to be increased. 
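
The fixed-design numbers above can be reproduced with the standard normal-approximation sample size formula for comparing two proportions, as in the sketch below. This is one common convention (pooled variance under the null, unpooled under the alternative); dedicated software may round or approximate slightly differently, and the function name and rounding choices are ours.

```python
import math
from scipy.stats import norm

def n_per_arm_two_proportions(p_control, p_treatment, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided test of two proportions
    (normal approximation, pooled null variance, unpooled alternative)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p_control + p_treatment) / 2
    sd_null = math.sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = math.sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    n = ((z_a * sd_null + z_b * sd_alt) / (p_treatment - p_control)) ** 2
    return math.ceil(n)

n_total = 2 * n_per_arm_two_proportions(0.05, 0.20)   # 202 subjects in total
n_max = 2 * math.ceil(1.75 * n_total / 2)              # 354, the affordable cap (rounded up to an even total)
n_interim = n_total // 2                                # interim look after 101 subjects
print(n_total, n_max, n_interim)                        # 202 354 101
```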

Based on these assumptions and the formulae developed by Mehta and Pocock, we find that the promising zone corresponds to a conditional power between 37.9% and 90%. The percentage increase in sample size as a function of the conditional power at the interim analysis is plotted in Figure 2: if the conditional power of the actual trial data falls within the promising zone, the sample size will be increased, up to a maximum of 1.75 times the originally planned sample size, to maintain the desired power of 90%. If the conditional power is already at or above 90%, no modification to the trial is necessary. If the conditional power is below the promising zone, there is no sample size increase, and the statistician performing the analysis may give a non-binding recommendation to stop the trial for futility.  
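
The decision rule can be sketched in code as below. The zone boundaries (conditional power between 37.9% and 90%), the 1.75x cap, and the interim timing are taken from the case study above; the conditional power calculation reuses the interim-trend formula from the earlier sketch, the function names are ours, and the conditions under which the conventional final test remains valid after such an increase are established in Mehta and Pocock (2011) rather than re-derived here.

```python
import math
from scipy.stats import norm

def conditional_power(z1, n1, n_final, alpha=0.05):
    # Conditional power under the interim trend, as in the earlier sketch
    z_crit = norm.ppf(1 - alpha / 2)
    n2 = n_final - n1
    return norm.cdf(z1 * math.sqrt(n2 / n1)
                    - (z_crit * math.sqrt(n_final) - z1 * math.sqrt(n1)) / math.sqrt(n2))

def reestimate_total_n(z1, n_interim=101, n_planned=202, n_max=354,
                       cp_lower=0.379, cp_target=0.90, alpha=0.05):
    """Re-estimated total sample size given the interim z-statistic."""
    cp = conditional_power(z1, n_interim, n_planned, alpha)
    if cp >= cp_target:      # favourable zone: keep the planned sample size
        return n_planned
    if cp < cp_lower:        # unfavourable zone: no increase (futility may be discussed)
        return n_planned
    # Promising zone: smallest total sample size, capped at n_max, that
    # raises the conditional power to the target
    for n_new in range(n_planned + 2, n_max + 1, 2):
        if conditional_power(z1, n_interim, n_new, alpha) >= cp_target:
            return n_new
    return n_max

for z1 in (0.8, 1.4, 1.9, 2.4):
    cp = conditional_power(z1, 101, 202)
    print(f"interim z = {z1:.1f}, conditional power = {cp:.2f}, new total n = {reestimate_total_n(z1)}")
```

Stepping the candidate sample sizes by two keeps the total even so the two arms remain balanced.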

In discussion with the sponsor, we can re-calculate the promising zone for various sample sizes, maximum limits, and interim analysis timings, and the optimal design can then be chosen among these alternatives. Finally, while there is no statistical penalty at the final analysis for performing this SSR, there are important considerations, including careful calculation of the final estimates at the end of the study to prevent bias, for which an experienced statistician is needed. 

To fully leverage the benefits of SSR promising zone design, or for more insights about SSR or other adaptive designs, please do not hesitate to contact us.

References

Pallmann, P., Bedding, A.W., Choodari-Oskooei, B. et al. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Med 16, 29 (2018). https://doi.org/10.1186/s12916-018-1017-7

Mehta, C.R. and Pocock, S.J. (2011), Adaptive increase in sample size when interim results are promising: A practical guide with examples. Statist. Med., 30: 3267-3284. https://doi.org/10.1002/sim.4102