Abstract
Controversy exists about whether or not similar standards apply to the clinical evaluation of orthopaedic implants and pharmaceuticals. The long-lasting dispute is likely to be abandoned shortly, given that certain regulatory bodies in Europe now mandate proof of effectiveness by randomized controlled trials (RCTs) prior to market approval of innovative devices. This is a timely signal—it will help to strengthen both the credibility of orthopaedic researchers among all health-care disciplines and the role of manufacturers as creative minds and scientific partners. Yet, it must be accompanied by substantial changes in the current trial landscape.
Given the level of perfection of available orthopaedic technology, superiority of a new product over an established standard will become a rare finding. Noninferiority or equivalence must be accepted as important trial results by investigators, sponsors, clinicians, and health authorities to enhance the spectrum of therapeutic options and help to individualize patient care.
Specific problems are slow recruitment rates and long intervals from the protocol stage to publication of results. These may counteract the innovative potential of a novel product. Pragmatic trial designs, lean but complete documentation, limited but precise end points, the avoidance of competing trials, and the fostering of international collaboration are possible ways to streamline clinical trials of orthopaedic devices. Finally, RCTs should be conducted conditional on the presumed level of innovation of a new implant and supplemented by data from registries to fully determine the utility, value, and safety of the intervention.
In 1947, the first formally planned randomized controlled trial (RCT) in health care marked the starting point of a new era of clinical research [1]. Under the statistical supervision of Professor (later Sir) Austin Bradford Hill, the British Medical Research Council (MRC) conducted a trial in which 107 patients with pulmonary tuberculosis were assigned according to a random-number list and sealed envelopes to receive the aminoglycoside antibiotic streptomycin (n = 55), which had been discovered in 1943, or bed rest alone (n = 52), which was the established standard of care at that time, to investigate whether the new experimental drug could improve symptoms and reduce mortality.
After six months, twenty-eight of fifty-five patients in the interventional group but only four of fifty-two patients in the control group had considerably improved. The effect of streptomycin on mortality was dramatic—it reduced the rate of tuberculosis-associated fatalities from fourteen of fifty-two to only four of fifty-five, equaling a reduction in the relative risk of death of 73% (risk ratio 0.27, 95% confidence interval [CI], 0.10 to 0.78), which was a highly significant result (p = 0.007).
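The mortality figures above can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes a Wald interval on the log scale; the article does not state which method produced its confidence interval, so the limits computed here may differ slightly from the published 0.10 to 0.78. The function name `risk_ratio` is illustrative only.

```python
import math


def risk_ratio(events_treated, n_treated, events_control, n_control, z=1.96):
    """Risk ratio with an approximate (log-scale Wald) confidence interval."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    # Standard error of log(RR) for two independent binomial samples
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Deaths in the MRC trial: 4 of 55 (streptomycin) versus 14 of 52 (bed rest)
rr, lower, upper = risk_ratio(4, 55, 14, 52)
print(f"risk ratio {rr:.2f}, relative risk reduction {1 - rr:.0%}")
print(f"approximate 95% CI {lower:.2f} to {upper:.2f}")
```

The point estimate reproduces the reported risk ratio of 0.27 and the 73% relative risk reduction.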
The MRC trial is remarkable in many ways, apart from the fact that patients were allocated truly at random to either intervention (rather than in an alternating fashion, as typically used in clinical research in those days). The decision to randomize patients was made because of an ethical dilemma that contrasts with the situation in developed countries today. A promising, specific treatment for a formerly incurable disease with marked impact on public health seemed within reach. However, the toxicity profile of the new antimicrobial drug was unknown, and its supply was highly limited. This forced authorities to choose what was probably the fairest and most impartial way of deciding who should receive the new drug: chance. Nowadays, the ethical dilemma is almost the opposite: there are simply too many alternative treatment options available for certain conditions, making sufficiently powered RCTs an invaluable filter to identify the useful, harmful, useless, safest, equally effective, and, sometimes, surprising candidates to be approved for routine use in clinical practice. In countries with compulsory health insurance systems, data from RCTs are ultimately needed by authorities to decide which interventions should be included in the catalog of benefits that are financially covered by the community. This is especially true for the vast and still increasing number of orthopaedic implants that differ in certain design features but not in the biological or biomechanical concept behind them.
Orthopaedic surgeons (like other surgeons) as well as manufacturers have often been blamed for not adhering to rigorous scientific standards when testing new devices, processes, and procedures, and for ignoring RCTs as a standard methodology for evaluating effectiveness [2-4]. This perception, however, no longer reflects reality.
In 2010, 239 scientific articles appeared in the print version of The Journal of Bone and Joint Surgery (American Volume), the premier international periodical in orthopaedics. There were eighteen RCTs (7.5%; 95% CI, 4.5% to 11.6%) [5-22]. This raw proportion is, however, uninformative and probably misleading. The question is not how many RCTs are conducted in orthopaedic surgery in general, but how often a randomized design is considered when comparing different therapeutic interventions in a head-to-head fashion.
In fact, thirty-three (13.8%) of all 239 original articles appearing in The Journal in 2010 dealt with therapeutic effectiveness research, of which eighteen of thirty-three studies (54.5%; 95% CI, 36.4% to 71.9%) were RCTs.
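The proportions quoted for The Journal can be recomputed from the raw counts. The sketch below assumes an exact (Clopper-Pearson) binomial interval, which appears consistent with the reported limits, although the article does not name its method. The helper names `binom_cdf`, `_solve`, and `exact_ci` are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, lo=0.0, hi=1.0, iterations=60):
    """Bisection for f(p) = target, where f is decreasing in p."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson confidence interval for k events among n observations."""
    lower = 0.0 if k == 0 else _solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# RCTs among all articles (18 of 239) and among effectiveness studies (18 of 33)
for k, n in [(18, 239), (18, 33)]:
    lower, upper = exact_ci(k, n)
    print(f"{k}/{n} = {k / n:.1%} (95% CI, {lower:.1%} to {upper:.1%})")
```

The change of denominator from 239 to 33 is what moves the share of RCTs from under one-tenth to more than one-half.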
Without doubt, there is some room for improvement in convincing orthopaedic scientists, clinicians, and manufacturers that the RCT is the most valid, elegant, clear, and analytically simple design to determine the causal contribution of a certain intervention to a predefined outcome. However, the good and stimulating news is that, in recent years, more and more orthopaedic researchers have adopted and promoted the principle of random allocation in clinical research. RCTs now belong to the orthopaedic methodological toolbox just as any other study design does (e.g., case series, case-control studies, and cohort studies) and, even more important, just as they do in any other health-care discipline. For example, numerous RCTs are available that compare bone-tendon-bone grafts with hamstring grafts for anterior cruciate ligament replacement [23], computer-navigated joint arthroplasty with conventional joint arthroplasty [24], surgical management of scaphoid fractures with conservative management [25], and so on.
RCTs, however, are gaining a more important role far beyond investigator-initiated research, as they will have an impact on the decision about market approval of new orthopaedic products in the future. Following the European policy of reclassifying implants such as total joint replacements from class IIb to the highest class, III, the major changes in the Medical Device Directive 93/42/EEC in 2007, and the significant amendments to the Medicinal Products Act in Germany in 2010, as well as potential adjustments to the 510(k) premarketing process by the United States Food and Drug Administration [26], manufacturers are now obliged to submit data from clinical trials when applying for approval of a new device. This is an entirely new situation that must be mastered jointly by legal authorities, industrial sponsors, and clinical researchers. While it is reasonable to test certain orthopaedic implants according to pharmacological standards before their broad distribution and regular clinical use, there remain specific problems that still await intelligent solutions.
Going back to the MRC trial, no specific treatment for a potentially fatal disease was available in 1947, and the anticipated (and subsequently observed) effect size of streptomycin was large enough to be demonstrated with a rather small sample of only 107 patients. One may argue that an observational study or even a case series would have been sufficient to prove the effectiveness of the experimental intervention in this particular scenario. An important lesson to be learned from the historic trial is that patients may recover spontaneously and without a specific medical or surgical intervention. Mortality in the bed-rest arm was obviously high (27%), but not 100%. The logical consequence is that with any new effective therapy for a certain condition, the expected additional utility of this therapy must diminish, given previous therapeutic advances and the infinite number of confounding variables.
A simple but intriguing example in orthopaedic trauma is the treatment of tibial shaft fractures. Figure 1 illustrates how effect sizes vary according to concept, technique, or type of implant or modification. The largest differences, if any, are observed between different concepts of care, such as cast stabilization compared with surgical treatment. With each downgrade in the order of possible comparisons (e.g., external versus internal fixation, internal fixation by plates or nails, intramedullary fixation by different nails or different locking options), the predicted effect sizes fade gradually.
Of the eighteen RCTs published in The Journal in 2010, thirteen investigated a total of ninety-two continuous end points [6,7,9,11-13,15,16,18-22]. Figure 2 illustrates the observed standardized mean differences between treatment groups (i.e., the difference between the group means divided by the standard deviation). Empirically, effect sizes of <0.5 indicate small effects; 0.5 to 0.8, medium effects; and >0.8, large effects.
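As an illustration of how such an effect size is obtained, the sketch below computes a standardized mean difference with a pooled standard deviation and applies the thresholds quoted above. The pooling formula is an assumption, since the article does not specify which standard deviation was used, and the input values are invented for demonstration only.

```python
import math


def standardized_mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between group means divided by the pooled standard deviation."""
    pooled_variance = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_variance)


def classify_effect(effect_size):
    """Thresholds as quoted in the text: <0.5 small, 0.5 to 0.8 medium, >0.8 large."""
    size = abs(effect_size)
    if size < 0.5:
        return "small"
    if size <= 0.8:
        return "medium"
    return "large"


# Hypothetical functional scores (0 to 100 points) in two groups of 50 patients
d = standardized_mean_difference(80.0, 10.0, 50, 75.0, 10.0, 50)
print(d, classify_effect(d))
```

With these invented numbers, a five-point difference against a standard deviation of ten points falls at the lower edge of the medium category.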
There were fourteen (15.2%; 95% CI, 8.6% to 24.2%) of ninety-two findings that were incompatible with chance. However, medium and large effect sizes were observed with only five (5.4%; 95% CI, 1.8% to 12.2%) and six (6.5%; 95% CI, 2.4% to 13.7%) outcomes, respectively.
Clinical researchers must accept the low probability of observing large effect sizes to avoid frustration. It is understandable that many manufacturers are reluctant to fund costly comparative trials with predictable marginal differences between the interventions under investigation. However, turning a blind eye to ceiling effects in modern health care that preclude big differences in patient-centered and surrogate outcomes between advanced technologies and techniques is not helpful for any stakeholder. Although European device regulations allow for rare exceptions to the trial-dependent approval process, there will be no alternative to the performance of premarketing clinical trials in the long run. Results from clinical trials will clear the market of unnecessary products and streamline the portfolio of manufacturers. Building the necessary logistics and networks for multicenter clinical investigations now will sustain competitiveness and secure advantages in the market in the future.
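The cost implication of shrinking effect sizes can be made concrete with a textbook sample-size approximation. The sketch below assumes a two-sided comparison of two means with equal group sizes, a normal approximation, a 5% significance level, and 80% power; none of these settings is taken from the article, and `patients_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def patients_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per arm needed to detect a standardized mean difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size**2)


for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {patients_per_group(d)} patients per group")
```

Under these assumptions, moving from a large effect (0.8) to a small one (0.2) raises the requirement from about 25 to about 393 patients per group, which illustrates why trials of marginal differences are expensive.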
Information about nonsuperiority, noninferiority, or equivalence of a certain intervention compared with the standard of care in primary end points is an integral decision criterion in health policy. Comparable performance can enhance the spectrum of therapeutic options and help in individualizing treatment decisions.
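How these three conclusions relate to one another can be sketched with a confidence interval for the difference between the new and the standard intervention. The example assumes an outcome for which higher values are better and a prespecified margin of clinical irrelevance; the margin, the interval limits, and the function name `interpret_difference` are hypothetical and not taken from the article.

```python
def interpret_difference(ci_lower, ci_upper, margin):
    """Interpret a confidence interval for (new - standard); higher is better.

    margin is the largest difference still regarded as clinically irrelevant.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    return {
        # The whole interval lies above zero
        "superior": ci_lower > 0,
        # The interval excludes a deficit larger than the margin
        "noninferior": ci_lower > -margin,
        # The whole interval lies within plus or minus the margin
        "equivalent": ci_lower > -margin and ci_upper < margin,
    }


# Hypothetical result: difference of 0.25 points, 95% CI -1.5 to 2.0, margin 3 points
print(interpret_difference(-1.5, 2.0, 3.0))
```

In this invented case the new intervention is not superior, yet it qualifies as noninferior and equivalent, which is the kind of result the text argues should be accepted as informative.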
Obviously, time is crucial when bringing innovations into clinical care and to individual patients. One of the most urgent tasks in refining the culture of clinical trials in orthopaedic surgery and in keeping researchers and sponsors motivated is to shorten the period from the protocol phase to publication of the clinical results.
Of the eighteen RCT reports published in The Journal in 2010, eleven contained complete information about recruitment periods and the number of screened, randomized, and evaluated patients.
The median duration of recruitment was thirty-five months (range, fourteen to eighty-five months), and the median interval from commencing the trial (or trial registration) to print publication was seventy-five months (range, thirty-one to 240 months).
The median number of patients screened was 153 (range, fifty-one to 1256). The median number of randomized and evaluated patients was 105 (range, forty-eight to 362) and eighty-nine (range, thirty-three to 280), respectively. The related proportions of randomized (of all screened) patients and evaluated (of all randomized) patients were 46.4% (95% CI, 44.7% to 48.1%) and 80.0% (95% CI, 77.9% to 81.9%).
Among all eighteen RCTs, the proportion of evaluated (of all randomized) patients was 83.0% (95% CI, 81.6% to 84.5%).
While the 80% margin is typical, generally accepted, and difficult to exceed, joint efforts must be made to optimize the ratio of randomized to screened and potentially eligible patients. Pragmatic designs with a limited number of inclusion and exclusion criteria may ease randomization, accelerate recruitment, and enhance external validity (that is, the transferability of results into practice). Competing small trials should be avoided, and international collaboration must be strengthened to shorten recruiting periods to about one or two years. Easy trial documentation with use of electronic data capture systems with immediate plausibility checks and a limited set of baseline and follow-up data may enhance the cooperation of local investigators and their adherence to protocol, reduce the number of missing values, and facilitate and expedite the final statistical analysis. Principal investigators and sponsors should discipline themselves with regard to the number of end points of interest, since only one outcome can be addressed in a confirmatory fashion anyway. As a rule of thumb, one disease-specific or injury-specific, one generic, one surrogate, and one safety outcome should be sufficient to answer almost all relevant questions that arise with regard to a proposed innovation. It is more likely that a change in clinical practice will occur as a result of data from concise trials with limited but complete and highly precise information based on a large sample of patients than as a result of information from trials with imputed data sets and an abundance of outcomes susceptible to a multiple-testing problem.
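The multiple-testing concern can be quantified with a short calculation. The sketch below assumes statistically independent tests, each at a nominal 5% level, which is a simplification because end points within one trial are usually correlated. The function names are illustrative.

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false-positive finding among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise error near alpha."""
    return alpha / n_tests


# Four outcomes (the rule of thumb above) versus larger sets of end points
for k in (1, 4, 7, 20):
    print(
        f"{k} end points: familywise error {familywise_error(k):.1%}, "
        f"Bonferroni level {bonferroni_alpha(k):.4f}"
    )
```

Under this assumption, even the four outcomes suggested by the rule of thumb carry roughly an 18.5% chance of at least one spurious finding if all are tested at the 5% level, which is why only one can serve as the confirmatory end point.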
Randomized controlled trials give an unbiased estimate of the effectiveness of a certain procedure compared with another, under experimental conditions. Since the duration of follow-up in a clinical trial is usually short (about one to two years) and the number of patients is limited (even in the largest trials) compared with the number of subjects receiving the particular treatment after market approval, large-scale postmarketing surveillance programs or registries are needed to describe the value of an intervention introduced into daily practice and to patients who may not suit the eligibility criteria of an individual trial.
Health-services or outcomes research has become an integral part of health-technology assessment, and routine hospital data, registries, and postmarketing surveillance substantially contribute to this pillar of research. Contrary to a common misperception, registries cannot replace RCTs [27] but, from a public health perspective, may either confirm or reject the findings of RCTs. It is vital to know whether the results from controlled clinical experiments can be reproduced under daily practice conditions and whether the results apply to an unselected population as well. With pragmatic trials that limit the number of inclusion and exclusion criteria, the gap between experimental and outcomes research can be narrowed but definitely not closed.
Thus, researchers, sponsors, and health authorities should be aware that demonstrating the effectiveness of a certain device or procedure under clinical trial conditions does not mean blanket approval forever. If postmarketing studies show a high frequency of unexpected complications or an unfavorable cost-utility ratio, even the results of the most promising RCT must be questioned. It is important to remember that the complications that led to the global recall of a hip-resurfacing system were observed after the typical follow-up period for RCTs [28].
According to industry standards, there are three main levels of innovation: Level I, incremental (i.e., technical modification); Level II, substantial (i.e., marked improvement of an established principle); and Level III, radical (i.e., entirely new concept). It makes little sense to demand RCTs for minor technical modifications of an established principle in orthopaedic surgery that are unlikely to influence patient-centered outcomes. It also does not make sense to call for an RCT if the use of a certain device or procedure is so convincing that it is likely to change current clinical practice. The first scenario is common and the second is the exception; thus, neither poses a big problem to health care. Prioritization is required to maintain the resources needed for evaluating as many innovative devices as possible under the strict criteria of RCTs [29].
Interventions that are meant to meet an intermediate or substantial level of innovation remain the leading problem in orthopaedic surgery. They suggest a measurable advantage in the outcomes of care as a result of the use of implant modifications, distinct surgical techniques, or sophisticated assisting systems on the basis of preclinical or early clinical data.
Figure 3 shows a simple concept of how orthopaedic developments may be rated and evaluated according to their suggested level of innovation. Interventions with a presumed moderate impact on clinical practice and health-care decisions should be given priority for evaluation by large-scale RCTs, followed by postmarketing surveillance and monitoring of device-associated adverse events. Technical modifications may show surprising results in a clinical pilot study, thus requiring an upgrade to a higher level of innovation, while products with predicted radical impact may fail during an early clinical investigation, demanding downgrading and experimental comparison of their effectiveness with the standard of care.
While this proposal is incomplete, it may serve as an early matrix and hierarchical decision tool to efficiently use the capacities for RCTs in the device-approval process. Major challenges will arise with biologically coated hybrid implants that need to be approved under both device and pharmacological regulations and rules.
Contrary to common belief, RCTs already make up about 50% of all therapeutic effectiveness research in orthopaedics. According to recent and upcoming changes in approval processes worldwide, most experimental devices and implants that are intended to be used for fracture fixation and joint replacement must now or in the near future be evaluated under the strict criteria of RCTs.
An important barrier is the ceiling effect, which leaves a low chance of demonstrating superiority of an innovation to the standard of care; noninferiority and equivalence will therefore remain the best achievable outcomes for most new products entering the market and clinical practice. Device companies cannot bypass the RCT prerequisite, and the contribution and commitment of clinical researchers around the world are needed to facilitate this important step of clinical research. To ensure optimal use of resources, prioritization according to the proclaimed level of innovation of a product is necessary.
Note: The author thanks Kathleen Füssler, MSc, and Caspar Ottersbach, Dipl.Psych, for their assistance with data collection.
1. Streptomycin treatment of pulmonary tuberculosis. Br Med J. 1948;2(4582):769-82.
2. McCulloch P, Altman DG, Campbell WB, Flum DR, Glasziou P, Marshall JC, Nicholl J; Balliol Collaboration, Aronson JK, Barkun JS, Blazeby JM, Boutron IC, Campbell WB, Clavien PA, Cook JA, Ergina PL, Feldman LS, Flum DR, Maddern GJ, Nicholl J, Reeves BC, Seiler CM, Strasberg SM, Meakins JL, Ashby D, Black N, Bunker J, Burton M, Campbell M, Chalkidou K, Chalmers I, de Leval M, Deeks J, Ergina PL, Grant A, Gray M, Greenhalgh R, Jenicek M, Kehoe S, Lilford R, Littlejohns P, Loke Y, Madhock R, McPherson K, Meakins J, Rothwell P, Summerskill B, Taggart D, Tekkis P, Thompson M, Treasure T, Trohler U, Vandenbroucke J. No surgical innovation without evaluation: the IDEAL recommendations. Lancet. 2009;374(9695):1105-12.
3. Horton R. Surgical research or comic opera: questions, but few answers. Lancet. 1996;347(9007):984-5.
4. Buchwald H. Surgical procedures and devices should be evaluated in the same way as medical therapy. Control Clin Trials. 1997;18(6):478-87.
5. Barton TM, Gleeson R, Topliss C, Greenwood R, Harries WJ, Chesser TJ. A comparison of the long gamma nail with the sliding hip screw for the treatment of AO/OTA 31-A2 fractures of the proximal part of the femur: a prospective randomized trial. J Bone Joint Surg Am. 2010;92(4):792-8.
6. Bonutti PM, Zywiel MG, Ulrich SD, Stroh DA, Seyler TM, Mont MA. A comparison of subvastus and midvastus approaches in minimally invasive total knee arthroplasty. J Bone Joint Surg Am. 2010;92(3):575-82.
7. Choi WC, Lee S, Seong SC, Jung JH, Lee MC. Comparison between standard and high-flexion posterior-stabilized rotating-platform mobile-bearing total knee arthroplasties: a randomized controlled study. J Bone Joint Surg Am. 2010;92(16):2634-42.
8. Colwell CW Jr, Froimson MI, Mont MA, Ritter MA, Trousdale RT, Buehler KC, Spitzer A, Donaldson TK, Padgett DE. Thrombosis prevention after total hip arthroplasty: a prospective, randomized trial comparing a mobile compression device with low-molecular-weight heparin. J Bone Joint Surg Am. 2010;92(3):527-35.
9. Demey G, Servien E, Pinaroli A, Lustig S, Aït Si Selmi T, Neyret P. The influence of femoral cementing on perioperative blood loss in total knee arthroplasty: a prospective randomized study. J Bone Joint Surg Am. 2010;92(3):536-41.
10. Hamid N, Ashraf N, Bosse MJ, Connor PM, Kellam JF, Sims SH, Stull DE, Jeray KJ, Hymes RA, Lowe TJ. Radiation therapy for heterotopic ossification prophylaxis acutely after elbow trauma: a prospective randomized study. J Bone Joint Surg Am. 2010;92(11):2032-8.
11. Hove LM, Krukhaug Y, Revheim K, Helland P, Finsen V. Dynamic compared with static external fixation of unstable fractures of the distal part of the radius: a prospective, randomized multicenter study. J Bone Joint Surg Am. 2010;92(8):1687-96.
12. Kim YH, Choi Y, Kim JS. Comparison of a standard and a gender-specific posterior cruciate-substituting high-flexion knee prosthesis: a prospective, randomized, short-term outcome study. J Bone Joint Surg Am. 2010;92(10):1911-20.
13. Liebs TR, Herzberg W, Rüther W, Haasters J, Russlies M, Hassenpflug J. Ergometer cycling after hip or knee replacement surgery: a randomized controlled trial. J Bone Joint Surg Am. 2010;92(4):814-22.
14. Nåsell H, Adami J, Samnegård E, Tønnesen H, Ponzer S. Effect of smoking cessation intervention on results of acute fracture surgery: a randomized controlled trial. J Bone Joint Surg Am. 2010;92(6):1335-42.
15. Parker MJ. Iron supplementation for anemia after hip fracture surgery: a randomized trial of 300 patients. J Bone Joint Surg Am. 2010;92(2):265-9.
16. Pihlajamäki H, Hietaniemi K, Paavola M, Visuri T, Mattila VM. Surgical versus functional treatment for acute ruptures of the lateral ligament complex of the ankle in young men: a randomized controlled trial. J Bone Joint Surg Am. 2010;92(14):2367-74.
17. Pospischill M, Kranzl A, Attwenger B, Knahr K. Minimally invasive compared with traditional transgluteal approach for total hip arthroplasty: a comparative gait analysis. J Bone Joint Surg Am. 2010;92(2):328-37.
18. Rompe JD, Cacchio A, Weil L Jr, Furia JP, Haist J, Reiners V, Schmitz C, Maffulli N. Plantar fascia-specific stretching versus radial shock-wave therapy as initial treatment of plantar fasciopathy. J Bone Joint Surg Am. 2010;92(15):2514-22.
19. Trumble TE, Vedder NB, Seiler JG 3rd, Hanel DP, Diao E, Pettrone S. Zone-II flexor tendon repair: a randomized prospective trial of active place-and-hold therapy compared with passive motion therapy. J Bone Joint Surg Am. 2010;92(6):1381-9.
20. Willits K, Amendola A, Bryant D, Mohtadi NG, Giffin JR, Fowler P, Kean CO, Kirkley A. Operative versus nonoperative treatment of acute Achilles tendon ruptures: a multicenter randomized trial using accelerated functional rehabilitation. J Bone Joint Surg Am. 2010;92(17):2767-75.
21. Wong J, Abrishami A, El Beheiry H, Mahomed NN, Roderick Davey J, Gandhi R, Syed KA, Muhammad Ovais Hasan S, De Silva Y, Chung F. Topical application of tranexamic acid reduces postoperative blood loss in total knee arthroplasty: a randomized, controlled trial. J Bone Joint Surg Am. 2010;92(15):2503-13.
22. Wülker N, Lambermont JP, Sacchetti L, Lazaró JG, Nardi J. A prospective randomized study of minimally invasive total knee arthroplasty compared with conventional surgery. J Bone Joint Surg Am. 2010;92(7):1584-90.
23. Biau DJ, Katsahian S, Kartus J, Harilainen A, Feller JA, Sajovic M, Ejerhed L, Zaffagnini S, Röpke M, Nizard R. Patellar tendon versus hamstring tendon autografts for reconstructing the anterior cruciate ligament: a meta-analysis based on individual patient data. Am J Sports Med. 2009;37(12):2470-8.
24. Bauwens K, Matthes G, Wich M, Gebhard F, Hanson B, Ekkernkamp A, Stengel D. Navigated total knee replacement. A meta-analysis. J Bone Joint Surg Am. 2007;89(2):261-9.
25. Buijze GA, Doornberg JN, Ham JS, Ring D, Bhandari M, Poolman RW. Surgical compared with conservative treatment for acute nondisplaced or minimally displaced scaphoid fractures: a systematic review and meta-analysis of randomized controlled trials. J Bone Joint Surg Am. 2010;92(6):1534-44.
26. Hines JZ, Lurie P, Yu E, Wolfe S. Left to their own devices: breakdowns in United States medical device premarket review. PLoS Med. 2010;7(7).
27. Melloh M, Röder C, Staub LP, Zweig T, Barz T, Theis JC, Müller U. Randomized-controlled trials for surgical implants: are registries an alternative? Orthopedics. 2011;34(3):161.
28. Cohen D. Out of joint: the story of the ASR. BMJ. 2011;342.
29. Katz JN, Wright JG, Losina E. Clinical trials in orthopaedics research. Part II. Prioritization for randomized controlled clinical trials. J Bone Joint Surg Am. 2011;93(7):e30.