All randomized controlled trials are adaptive to some degree. For example, inclusion criteria can be altered, new sites can be enrolled, or recruitment to a trial arm can be stopped. This paper draws a distinction between trials that are adaptive by design and trials that are reactive. Reactive trials change the design of the study in response to unforeseen events; adaptive designs change the conduct of the study in response to accumulating evidence, according to a prespecified protocol.
There is a substantial literature on adaptive designs. The objective of this paper is not to review this literature, but rather to highlight some examples of adaptive designs that can be helpful for surgical trials. In particular, its focus is on the ability to change allocation ratios to trial arms during the course of a study, the use of priors for monitoring a study, and sample size re-estimation. Since Bayesian methods play a role in each of these designs, the next section gives an overview of Bayesian methodology.
The idea at the heart of Bayesian methods is that we learn incrementally. That is, we do not start from zero knowledge, perform a trial, and then base all our inferences on that trial. Rather, before beginning a trial, we have some knowledge about the interventions being studied, and the trial itself will add to that body of knowledge. In traditional frequentist statistics, this concept is recognized by performing a meta-analysis before the trial has begun, conducting and analyzing the trial, and updating the meta-analysis with the results of this new trial. Similarly, Bayesian methods begin by setting a prior distribution that describes the existing knowledge and uncertainty. The information from the trial is gathered and summarized in the likelihood function. This information is used to update the knowledge and uncertainty about the effectiveness of the intervention (i.e., to form the posterior distribution). The key advantage of Bayesian methods is that this updating of beliefs is an integral part of the analysis rather than an optional extra.
Since Bayesian methods express uncertainty about treatment effects through probability distributions, they allow investigators to quantify the probability that the new intervention is better than the old. This is a feature often used in adaptive designs, as is illustrated in the next section.
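As a minimal illustration of this updating, the following Python sketch assumes a binary outcome, independent uniform Beta(1, 1) priors for the success rate in each arm, and invented trial counts; none of these values come from a study cited here. It updates each prior with the observed data and then estimates, by simulation, the probability that the new intervention has the higher success rate.

```python
import random


def posterior_beta(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures


def prob_new_better(post_new, post_old, draws=100_000, seed=1):
    """Monte Carlo estimate of P(rate_new > rate_old) for independent Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        if rng.betavariate(*post_new) > rng.betavariate(*post_old):
            wins += 1
    return wins / draws


# Hypothetical data (assumption): 30 of 50 successes with the new
# intervention and 22 of 50 with the old, each with a Beta(1, 1) prior.
post_new = posterior_beta(1, 1, successes=30, failures=20)
post_old = posterior_beta(1, 1, successes=22, failures=28)
p_better = prob_new_better(post_new, post_old)
print(f"P(new better than old) is approximately {p_better:.2f}")
```

An informative prior, for example one derived from a meta-analysis, would simply replace the Beta(1, 1) parameters; the updating step itself is unchanged.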
As described by Cook1, there are three phases of intervention evaluation in surgical trials: exploratory (early-stage evaluation), explanatory (evaluation under ideal conditions), and pragmatic (evaluation to inform decision-making under realistic conditions). Examples of adaptive Bayesian designs for each stage of evaluation are considered here.
Early-Stage Intervention Evaluation: Dropping Arms in an Exploratory Study
One type of adaptive design is a design that alters allocation ratios to arms of a trial in response to the accumulated data. The simplest example is that of ceasing enrollment to one or more arms after an interim analysis. This might be done, for example, in a dose-finding study that starts with many doses of an active drug, which are compared with standard-of-care at one or more interim analyses with the intention to cease enrollment to those arms that show a low probability of being efficacious.
Example: Use of Probabilities To Drop Arms from a Study
Smith et al.2 described a phase-II randomized trial of a new therapy for neuropathic pain (versus placebo). There were seven candidate doses. The outcome was a pain score, measured on a scale from 0 to 10, and an improvement of 1.5 units over placebo was required for a dose to be considered worthy of future study.
The study was designed to have two interim analyses. At each interim analysis, a given dose was considered futile if there was at least an 80% chance that the true dose effect was less than 1.5. Any dose meeting the futility criterion had no additional patients randomized to it. The sample size was 280 patients (thirty-five patients per group). The study design is illustrated in Figure 1.
At the first interim analysis, 100 patients had been recruited, and eighty of them had evaluable data. All seven doses had a probability of 1 of being futile, and the study was stopped. From the pharmaceutical company’s perspective, it was preferable to discover this after enrolling 100 patients rather than spending the time and money needed to enroll all 280 patients.
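The futility rule in this example can be sketched as follows. The sketch assumes that, at an interim analysis, the posterior distribution of each dose's effect relative to placebo is approximately normal; the dose labels and the posterior means and standard deviations are hypothetical and are not the data from the trial. Only the 1.5-unit target and the 80% threshold are taken from the design described above.

```python
from statistics import NormalDist


def prob_below_target(post_mean, post_sd, target=1.5):
    """Posterior probability that the true dose effect is less than the target."""
    return NormalDist(post_mean, post_sd).cdf(target)


def arms_to_drop(posteriors, target=1.5, threshold=0.80):
    """Return the doses meeting the futility criterion at this interim analysis."""
    return [
        dose
        for dose, (mean, sd) in posteriors.items()
        if prob_below_target(mean, sd, target) >= threshold
    ]


# Hypothetical interim posteriors (assumption): (mean, sd) of the
# improvement in pain score over placebo for three illustrative doses.
interim = {
    "dose_1": (0.2, 0.6),
    "dose_2": (1.1, 0.6),
    "dose_3": (1.8, 0.6),
}
dropped = arms_to_drop(interim)
print("Arms closed to further randomization:", dropped)
```

Under these invented values only the first dose is closed; the second has a posterior mean below the target but does not reach the 80% threshold, so it continues to recruit.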
Bayesian adaptive designs have been widely used in phase-II cancer trials (one example was described by Biswas et al.3). Such designs can allow for a seamless phase II/phase III trial, in which the exploratory and explanatory phases of investigation are combined into a single trial (e.g., by starting with many active doses and dropping arms until just two interventions remain, as described by Stallard and Todd4).
Surgical trials do not generally involve dose-finding. However, being able to drop some arms of the study at interim analyses is still useful. Because of the fast pace of the development of technologies in surgery, randomized controlled studies can be difficult to implement: by the time the trial is completed, the technology may be obsolete. This has led to a call for tracker trials5, which start in periods of rapid development and aim to compare multiple versions of the new technology with standard of care. These tracker trials need to identify superior treatments quickly and drop unpromising arms early. Similar to the dose-finding study, one could specify the minimum effectiveness that the new intervention would have to demonstrate to be worth adopting, and then, at interim analyses, quantify the probability that the intervention meets this threshold. Interventions with a low probability could then be dropped.
When dropping arms from a study, the objective is to discard unpromising arms early, thus targeting resources on the most promising interventions. However, since evidence on the lack of efficacy of dropped arms is limited, this approach is best suited to exploratory trials. In an explanatory trial, one might still consider stopping early, but the indications for stopping would be different. The following section describes how priors can be used to monitor an explanatory study and hence guide decisions on early stopping.
Later-Stage Evaluation: Priors for Monitoring Explanatory Studies
Priors for a study must be specified carefully. There are three main methods of selecting a prior: using a vague prior, thus letting the data dominate; using prior evidence (e.g., from a meta-analysis); or using expert opinion, which should be used cautiously6,7. Since individuals’ beliefs need not be identical, especially if existing evidence is limited, there may not be one “correct” prior, and, instead, a family of priors representing the range of prior beliefs can be used8. One such example is the use of skeptical and enthusiastic priors when considering stopping a trial early for efficacy or futility.
Here the investigators would specify three priors at the start of the study: the prior to be used for the main analysis, a skeptical prior, and an enthusiastic prior. The skeptical prior would represent the beliefs of a reasonable skeptic as to the new treatment’s efficacy. The enthusiastic prior would represent the beliefs of a reasonable enthusiast. One would consider stopping the trial for futility if the reasonable enthusiast could be convinced that the new treatment was not efficacious (i.e., the data would be analyzed with use of the enthusiastic prior, and stopping would be considered if this returned a high probability that the new treatment was no better than the control treatment). Similarly, one would consider stopping for efficacy when there is sufficient evidence to convince a reasonable skeptic that the new treatment is better than the old. Under these circumstances, the data would be analyzed with use of the skeptical prior, and a recommendation for stopping would be considered if this returned a high probability that the new treatment is better than the old8. Thus, stopping is considered only when there is sufficient evidence to bring the skeptic and the enthusiast to a common qualitative conclusion.
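The logic of monitoring with a skeptical and an enthusiastic prior can be sketched in Python as follows. The sketch assumes a treatment effect measured so that positive values favor the new treatment, a normal likelihood summarized by an interim estimate and its standard error, and normal priors; the prior parameters, the 0.95 cutoff, and the interim estimates are all hypothetical choices for illustration rather than recommended values.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, estimate, std_err):
    """Combine a normal prior with a normal likelihood (precision weighting)."""
    prior_prec = 1 / prior_sd**2
    data_prec = 1 / std_err**2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return NormalDist(post_mean, post_var**0.5)


def monitoring_advice(estimate, std_err, skeptical, enthusiastic, cutoff=0.95):
    """Apply the skeptic/enthusiast logic to one interim estimate."""
    post_skeptic = normal_posterior(*skeptical, estimate, std_err)
    post_enthusiast = normal_posterior(*enthusiastic, estimate, std_err)
    # Efficacy: even the skeptic is convinced that the effect is positive.
    if 1 - post_skeptic.cdf(0.0) >= cutoff:
        return "consider stopping for efficacy"
    # Futility: even the enthusiast is convinced that the effect is not positive.
    if post_enthusiast.cdf(0.0) >= cutoff:
        return "consider stopping for futility"
    return "continue"


# Hypothetical priors (assumption): (mean, sd) on an absolute-difference scale.
skeptical = (0.00, 0.06)
enthusiastic = (0.10, 0.06)

print(monitoring_advice(0.09, 0.05, skeptical, enthusiastic))
print(monitoring_advice(0.09, 0.025, skeptical, enthusiastic))
print(monitoring_advice(-0.08, 0.02, skeptical, enthusiastic))
```

The first two calls use the same point estimate with different standard errors: the skeptic is persuaded only once the estimate is precise enough to outweigh the prior.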
Example: The CHART Trials
The CHART trials compared continuous hyperfractionated accelerated radiation therapy (CHART) with conventional radiation therapy in lung cancer9. The study aimed to recruit 600 patients to give 90% power to detect an improvement in overall survival from 15% to 25% at two years. The eleven clinicians taking part in the trial described their beliefs about the effectiveness of CHART, and this formed the enthusiastic prior. The skeptical prior placed a small probability (5%) on the event that CHART would attain the minimal clinically important difference of a 10% improvement in survival.
Interim analysis was done yearly with the enthusiastic and skeptical priors. Although monitoring using conventional statistical procedures would likely have resulted in a recommendation for early stopping (Table I), analysis with the skeptical prior suggested that the evidence would not convince the reasonable skeptic, and, therefore, the trial continued to completion. At the final analysis, CHART was shown to be superior to conventional radiation therapy, giving a 9% improvement in the two-year survival rate. The reasonable skeptic would be convinced that CHART was superior (with a 99% probability), but not that it achieved the minimal clinically important difference (because the probability was only 9%).
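One common way to construct a skeptical prior of the kind described above is to center a normal distribution on no effect and choose its standard deviation so that only a small probability lies beyond the minimal clinically important difference. The sketch below makes those assumptions (a normal prior, centered at zero, on the absolute survival-difference scale); it illustrates the idea and is not a reconstruction of the prior actually used in the CHART analyses.

```python
from statistics import NormalDist


def skeptical_prior_sd(mcid, tail_prob=0.05):
    """SD of a zero-centered normal prior with P(effect > mcid) equal to tail_prob."""
    z = NormalDist().inv_cdf(1 - tail_prob)
    return mcid / z


# A 10% improvement in survival as the minimal clinically important
# difference, with 5% prior probability of exceeding it.
sd = skeptical_prior_sd(0.10, tail_prob=0.05)
prior = NormalDist(0.0, sd)
tail = 1 - prior.cdf(0.10)
print(f"prior sd = {sd:.4f}, P(effect > 0.10) = {tail:.3f}")
```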
Monitoring with priors can be done with or without preset rules. If the trial is intended for regulatory approval, preset rules will be helpful, as the United States Food and Drug Administration (FDA) requires that the frequentist properties of any Bayesian adaptive design be investigated beforehand10.
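Frequentist properties of a preset Bayesian rule are usually investigated by simulation. The following Python sketch shows the general idea under strong simplifying assumptions: a two-arm trial with a normally distributed outcome of known standard deviation, four equally spaced analyses, a zero-centered skeptical normal prior, and a rule that declares efficacy when the posterior probability of a positive effect reaches 0.975. All of these settings, and the effect size used for power, are hypothetical.

```python
import random
from statistics import NormalDist


def trial_declares_efficacy(true_effect, rng, looks=4, n_per_look=50,
                            sigma=1.0, prior_sd=0.25, cutoff=0.975):
    """Simulate one trial; return True if any analysis meets the efficacy rule."""
    sum_new = sum_old = 0.0
    n = 0  # patients per arm so far
    for _ in range(looks):
        for _ in range(n_per_look):
            sum_new += rng.gauss(true_effect, sigma)
            sum_old += rng.gauss(0.0, sigma)
        n += n_per_look
        estimate = (sum_new - sum_old) / n
        data_prec = n / (2 * sigma**2)
        prior_prec = 1 / prior_sd**2
        post_var = 1 / (prior_prec + data_prec)
        post_mean = post_var * data_prec * estimate  # prior mean is zero
        if 1 - NormalDist(post_mean, post_var**0.5).cdf(0.0) >= cutoff:
            return True
    return False


def rejection_rate(true_effect, n_sims=2000, seed=2024):
    """Proportion of simulated trials that declare efficacy."""
    rng = random.Random(seed)
    hits = sum(trial_declares_efficacy(true_effect, rng) for _ in range(n_sims))
    return hits / n_sims


type_one_error = rejection_rate(0.0)  # no true difference between arms
power = rejection_rate(0.4)           # hypothetical true benefit
print(f"type-I error ~ {type_one_error:.3f}, power ~ {power:.3f}")
```

In practice the simulation would be repeated over a range of plausible effect sizes, and the cutoff, the prior, or the timing of the analyses adjusted until the operating characteristics are acceptable.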
Trials Incorporating Economic Evaluation: Sample Size Re-Estimation
For pragmatic trials intended to inform regulatory decisions, Bayesian decision theory can guide trial design. A decision needs to be made as to whether to fund the new intervention; there is a utility associated with making the correct decision and a loss incurred in making an incorrect decision. The Bayesian adaptive design uses prior information to determine the optimal sample size for the trial and to re-estimate the sample size at an interim analysis.
Decisions on whether or not to fund a new intervention are sometimes made on the basis of the intervention’s cost-effectiveness. Better efficacy may come at an increased cost, and thus a trade-off is made between increased efficacy and increased cost. When making a decision, the existing evidence on cost-effectiveness may be inconclusive. A further study would result in more information and, hence, less uncertainty. This, in turn, means there is a greater chance of adopting the more cost-effective treatment and therefore, on average, there is better cost-effectiveness. However, while more information means better cost-effectiveness, more information costs more money. Thus, there is a trade-off between money spent on a trial and money saved after the trial. The optimal sample size is one that maximizes the overall expected gain from doing the trial (i.e., the expected value of information from the trial minus the cost of doing the trial)11,12. This determines the initial sample size. Willan and Kowgier13 described how this method can be used in a group-sequential trial with a single interim analysis. The interim analysis yields a posterior distribution for the cost-effectiveness, which then becomes the prior for the second stage of the study and is used to revise the sample size requirement for the remaining portion of the trial.
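The trade-off between the value of information and the cost of a trial can be illustrated with a simplified single-stage calculation. The Python sketch below is not the method of the cited authors; it assumes a normal prior for the incremental net benefit per patient, a two-arm trial with known between-patient standard deviation, a fixed number of future patients affected by the decision, and a linear trial cost. Every numerical input is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_positive_part(mean, sd):
    """E[max(X, 0)] for X ~ Normal(mean, sd)."""
    if sd == 0:
        return max(mean, 0.0)
    z = mean / sd
    return mean * STD_NORMAL.cdf(z) + sd * STD_NORMAL.pdf(z)


def expected_net_gain(n_per_arm, prior_mean, prior_sd, sigma,
                      population, fixed_cost, cost_per_patient):
    """Expected value of sample information minus the cost of the trial."""
    if n_per_arm == 0:
        return 0.0  # no trial: no information gained and no cost incurred
    sampling_var = 2 * sigma**2 / n_per_arm
    # Standard deviation of the posterior mean, as anticipated before the trial.
    preposterior_sd = prior_sd**2 / sqrt(prior_sd**2 + sampling_var)
    value_per_patient = (expected_positive_part(prior_mean, preposterior_sd)
                         - max(prior_mean, 0.0))
    trial_cost = fixed_cost + 2 * n_per_arm * cost_per_patient
    return population * value_per_patient - trial_cost


def optimal_sample_size(max_n, **settings):
    """Grid search for the per-arm sample size with the largest expected net gain."""
    return max(range(max_n + 1), key=lambda n: expected_net_gain(n, **settings))


# Hypothetical inputs (assumption), in arbitrary monetary units.
settings = dict(prior_mean=100.0, prior_sd=200.0, sigma=2000.0,
                population=100_000, fixed_cost=100_000.0,
                cost_per_patient=1_000.0)
n_opt = optimal_sample_size(2000, **settings)
gain_opt = expected_net_gain(n_opt, **settings)
print(f"optimal n per arm = {n_opt}, expected net gain = {gain_opt:,.0f}")
```

If the search returns zero, no trial of any size has a positive expected net gain under the stated assumptions. In a two-stage design, the same calculation would be repeated at the interim analysis with the interim posterior in place of the prior.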
Example: A Study of Early Versus Late External Cephalic Version in Pregnant Women
In an example of the use of Bayesian decision theory for sample size estimation, Willan and Kowgier13 described a trial that investigated early versus late external cephalic version in pregnant women with a baby presenting in the breech position. The primary effectiveness outcome was a noncaesarean delivery. With a willingness-to-pay of $1000 to avoid a caesarean delivery, and with use of data from the pilot study to form a prior distribution for the cost-effectiveness, the optimal sample size for a single-stage trial was 345 patients per arm. However, with a two-stage design, the optimal sample size for the first stage was 155 patients per arm. On average, the optimal sample size for the second stage was 124 patients per arm, for a total of 279 patients per arm (Table II). In other words, a two-stage design with sample size re-estimation according to a Bayesian decision rule reduced the expected total sample size by 132 patients (19%).
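The reduction quoted in this example follows directly from the per-arm figures, as the short Python check below shows. It assumes that the first-stage figure of 155 is per arm (consistent with the stated total of 279 per arm) and that the trial has two arms.

```python
# Per-arm sample sizes reported in the example.
single_stage_per_arm = 345
stage_one_per_arm = 155
stage_two_expected_per_arm = 124
arms = 2

two_stage_per_arm = stage_one_per_arm + stage_two_expected_per_arm
saving_total = arms * (single_stage_per_arm - two_stage_per_arm)
saving_percent = 100 * saving_total / (arms * single_stage_per_arm)

print(two_stage_per_arm, saving_total, round(saving_percent))
```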
One of the difficulties in conducting surgical trials is that once the technology has stabilized, clinical equipoise may have been lost, even in the absence of evidence from randomized controlled trials1. An additional advantage of using Bayesian decision theory to set sample size is that it provides an alternative definition of equipoise: there is decisional equipoise if the optimal sample size for a new trial is non-zero (i.e., if there is a positive net gain from doing a new trial).
Bayesian adaptive designs offer flexibility to adapt the trial by dropping arms, altering allocation ratios, stopping the trial early, or adjusting the sample size. These are attractive features when investigating an intervention that evolves rapidly, since a traditional trial may risk irrelevance by the time it is completed.
However, Bayesian adaptive designs are not a magic bullet. When arms are dropped early, information on their efficacy is limited. More complex designs are less readily understood, and thus less persuasive. Moreover, sample sizes will not always be smaller with a Bayesian adaptive design: the CHART trial would likely have been stopped early if a frequentist monitoring procedure had been used. If regulatory approval is to be sought, then the power and type-I error rate of the adaptive design will need to be evaluated10, requiring complex simulation studies.
How do we realize the potential of these methods? Firstly, there needs to be education among both trialists and statisticians. Although Bayesian methods are not new14, most students of statistics encounter frequentist statistics first and Bayesian statistics much later, if at all. Beyond education in basic Bayesian methods, education on Bayesian adaptive designs is needed. These designs are often presented overenthusiastically, which naturally raises suspicion.
Secondly, there must be practical experience. This would clarify which methods work best for the various types of questions that are asked in surgical trials, and would iron out practical issues. Moreover, we need experienced teams rather than experienced individuals. Biswas et al.3 described how capacity in Bayesian methods was built at the University of Texas MD Anderson Cancer Center: statisticians familiar with Bayesian methodology were assigned to work with specific clinicians, forming collaborative relationships that facilitated the uptake of suitable designs.
In summary, adaptive designs can allow for unpromising interventions to be dropped early, for seamless transition between exploratory and explanatory phases of evaluation, for conservative evaluation in the explanatory phases, and for designs that are specifically tailored to assess cost-effectiveness. With education and experience, these designs could present a helpful alternative to standard trial designs.