Trial Identification
We searched four electronic databases (Medline, EMBASE, CINAHL, and CENTRAL) using the following terms: fractures, orthopaedic procedures, fractures/surgery, and fracture fixation. We limited our search to human, English-language, randomized controlled trials published between 1995 and 2004. Two reviewers independently assessed the titles and abstracts of all citations to determine whether they met the following eligibility criteria: the study was a randomized controlled trial; the participants were living humans with a fracture, due to injury, of any bone other than those of the head; at least one of the interventions occurred in an operating room and involved an incision; and the study was a full-text English-language manuscript published in a biomedical journal.
Two reviewers independently applied the eligibility criteria to the full text of all potentially eligible studies. They met and resolved disagreements by discussion and consensus. We combined manuscripts that reported on the same primary trial (such as those with longer follow-up or subgroup analysis).
Data Extraction
Two reviewers independently extracted data, in duplicate, regarding participants, interventions, outcomes, and methodology from each eligible trial using a standardized Internet-based system (SRS; TrialStat, Ottawa, Ontario, Canada). A third individual reviewed all disagreements and made the final decision. We pilot-tested the clarity of our data forms by having all reviewers assess four orthopaedic trauma trials from outside the time frame used in this study.
We classified outcomes as follows: (1) clinical outcomes rated by a clinician or by chart abstraction (e.g., mortality, infection, length of stay, cost, repeat operation, operative and/or fluoroscopy time, complications, or transfusions); (2) physiological outcomes rated by a tester (e.g., range of motion, strength, gait and/or weight-bearing, stability, or functional scores); (3) patient-reported outcomes (e.g., pain, analgesic use, function, or quality-of-life score); (4) radiographic outcomes (e.g., quality of reduction and/or alignment, implant placement, fracture-healing and/or union, or implant failure); and (5) combined outcomes incorporating two or more categories of outcome assessors (e.g., fracture-healing with use of a combined clinical and radiographic assessment).
For each category of outcomes, data extractors recorded whether the manuscript reported the assessors (none, some, or all) as having been blinded, and, if so, how this was achieved. Before the study, we hypothesized that trials might incorporate four possible methods of blinding outcome assessors: (1) use of a tester who was unaware of the intervention group (only possible if the interventions involved the same incision and had no other distinguishing features), (2) covering of scars or incisions with use of garments or bandages (if the intervention involved different incisions but had no other distinguishing features), (3) manipulation of radiographs to conceal or mask an implant, and (4) sham surgery.
Two investigators with surgical expertise then reviewed each trial to determine the feasibility of blinding the patients, clinicians, outcome assessors, and data analysts. For categories in which it would have been possible to blind some or all outcome assessors, we recorded how this could have been accomplished with use of the same classification described above. We did not include a category for sham surgery, since many have ethical concerns about subjecting patients to sham surgery without any possible benefit beyond placebo effects. The two reviewers met and came to consensus for any disagreements. Reviewers were not blinded to the author or journal names of the primary studies.
Statistical Analysis
We measured the agreement between reviewers for trial eligibility and for the assessment of blinding feasibility by calculating the kappa statistic, with quadratic weighting for ordered categories. The guidelines proposed by Landis and Koch9 formed the basis for our interpretation (kappa values of 0 to 0.20 represent slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and >0.80, almost perfect agreement).
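The agreement calculation described above can be sketched as follows. This is a minimal illustration rather than the software used in the study: the function names, the rating labels, and the toy ratings are assumptions, and the handling of values that fall between the published bands (for example, 0.205) is our own choice of inclusive upper bounds.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's kappa with quadratic weights for two raters.

    `categories` lists the ordered rating levels from lowest to highest.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")

    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions: rows are rater A, columns are rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the extremes.
            w = ((i - j) / (k - 1)) ** 2
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * row[i] * col[j]

    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_disagreement / expected_disagreement


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch bands quoted in the text."""
    if kappa < 0:
        return "poor"  # band for negative values; not listed in the text above
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical feasibility ratings from two reviewers (not study data).
levels = ["none", "some", "all"]
reviewer_1 = ["none", "none", "some", "some", "all", "all"]
reviewer_2 = ["none", "some", "some", "some", "all", "some"]
kappa = quadratic_weighted_kappa(reviewer_1, reviewer_2, levels)
label = landis_koch_label(kappa)
```

With only two categories (for example, eligible or not eligible), the quadratic weights reduce to the ordinary unweighted kappa, so the same function covers both uses named in the text.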
We calculated the proportion of trials that noted incorporation of each methodological quality criterion that we evaluated. Similarly, we calculated the number and proportion of trials that measured at least one outcome in each category and the proportion of those trials that described blinding of some or all of the outcome assessors. Finally, we calculated the proportion of trials in each category in which some or all of the outcome assessors could feasibly have been blinded.
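The three proportions described in this paragraph share a pattern: the denominator for the blinding proportions is the number of trials that measured at least one outcome in the category, not the total number of trials. The sketch below shows one way to tabulate them; the `Trial` record, its field names, and the four toy trials are hypothetical and are not drawn from the study data.

```python
from dataclasses import dataclass, field

CATEGORIES = (
    "clinical",
    "physiological",
    "patient-reported",
    "radiographic",
    "combined",
)


@dataclass
class Trial:
    # Outcome categories in which the trial measured at least one outcome.
    measured: set = field(default_factory=set)
    # Categories in which some or all assessors were reported as blinded.
    blinding_reported: set = field(default_factory=set)
    # Categories in which some or all assessors could feasibly have been blinded.
    blinding_feasible: set = field(default_factory=set)


def summarize(trials):
    """Return per-category counts and proportions as a dict of dicts."""
    total = len(trials)
    summary = {}
    for category in CATEGORIES:
        measuring = [t for t in trials if category in t.measured]
        n = len(measuring)
        reported = sum(category in t.blinding_reported for t in measuring)
        feasible = sum(category in t.blinding_feasible for t in measuring)
        summary[category] = {
            "n_measuring": n,
            "prop_measuring": n / total if total else None,
            # Proportions are undefined when no trial measured the category.
            "prop_blinding_reported": reported / n if n else None,
            "prop_blinding_feasible": feasible / n if n else None,
        }
    return summary


# Toy data for illustration only.
toy_trials = [
    Trial({"clinical", "radiographic"}, {"radiographic"}, {"clinical", "radiographic"}),
    Trial({"clinical"}, set(), {"clinical"}),
    Trial({"clinical", "patient-reported"}, set(), set()),
    Trial({"radiographic"}, set(), {"radiographic"}),
]
toy_summary = summarize(toy_trials)
```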
Our initial search yielded 1340 citations, and reviewers judged 188 of them to be potentially eligible (Fig. 1). The full-text reviews eliminated seven ineligible studies and ten duplicate publications, resulting in a final set of 171 unique trials (see Appendix). By the Landis and Koch criteria, reviewers achieved moderate to almost perfect agreement in the application of eligibility criteria (raw agreement, 96.3% to 97.9%; kappa, 0.59 to 0.84).
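Raw agreement near 97% can coincide with a kappa well below 1 when most citations are excluded by both reviewers, because much of the raw agreement is then expected by chance. The sketch below illustrates this with a hypothetical two-by-two screening table; the counts are invented for illustration and are not the study's data, and the function name is an assumption.

```python
def raw_agreement_and_kappa(both_yes, a_only, b_only, both_no):
    """Raw agreement and unweighted Cohen's kappa for a 2x2 screening table."""
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("table is empty")

    raw = (both_yes + both_no) / n

    # Marginal proportion of 'eligible' calls for each reviewer.
    p_a = (both_yes + a_only) / n
    p_b = (both_yes + b_only) / n

    # Agreement expected by chance from the marginals alone.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (raw - expected) / (1 - expected)
    return raw, kappa


# Hypothetical counts: 1000 citations, few judged eligible by either reviewer.
raw, kappa = raw_agreement_and_kappa(both_yes=30, a_only=15, b_only=15, both_no=940)
# Here raw agreement is 0.97, while kappa is about 0.65.
```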
The number of trials published each year ranged from eight to twenty-six, with a trend toward more publications in the last five years (Table I). Single-center trials accounted for the majority of the studies, with investigators in continental Europe having the most trials published. Investigators most commonly studied fractures of the hip, followed by those of the leg, forearm, and wrist. The majority of interventions involved the implantation of hardware.
In general, reports of study methodology had limitations (Table II). Of the 171 trials, almost half did not describe the method of allocation concealment and <4% noted a rigorous method of allocation concealment. In 84% of the trials, it was unclear whether any patients had received an intervention to which they were not allocated, and in most trials in which such protocol violations did occur, the investigators excluded the participants from the analysis, thus violating the intention-to-treat principle10,11. Only three trials (2%) described blinding of the patients, and none reported blinding of clinicians or data analysts.
Trials commonly measured clinical, radiographic, patient-reported, and physiological outcomes, while combined outcomes were less common (Table III). Wound infection was the most frequent outcome, followed by fracture union and quality of reduction.
Although investigators reported blinding of outcome assessors more frequently than blinding of other trial personnel, <10% of the trials in any category reported blinding of outcome assessors (Table IV). Trials that did blind achieved the goal in most cases by utilizing an independent outcome assessor who was unaware of the group allocation. The appendix provides examples of trials that incorporated creative approaches to blinding outcome assessors12-15.
We determined that some or all of the clinical, physiological, and radiographic outcome assessors could have been blinded in >85% of the trials in each category (Fig. 2). The use of an independent assessor would have been sufficient to blind the majority of clinical outcomes, while investigators would have had to conceal incisions or manipulate radiographs to blind the majority of physiological and radiographic outcomes (Table V). In general, the agreement among reviewers for the feasibility assessments was good (Table VI).
Several reports have documented the methodological shortcomings of surgical trials16-19. There are practical issues that are more difficult in surgical trials than in their medical counterparts, including recruiting patients to be randomized to two alternative invasive interventions, standardizing surgical techniques across a group of heterogeneous surgeons, and dealing with issues of surgeon expertise20. There may also be cultural and ethical differences between surgical and medical researchers that add more subtle barriers to conducting rigorous trials7,21. In this systematic review, we explored one potential perception about surgical trials, which is that they cannot be blinded.
Orthopaedic journals have adopted guidelines that advocate comprehensive and transparent reporting of clinical trial methodology3,22. Despite this, a large number of trials in our sample did not describe critical methodological characteristics, such as concealment of allocation (49%), handling of patients who crossed from one intervention to another (84%), loss to follow-up (48%), and analysis of losses to follow-up (54%). In the best-case scenario, authors are incorporating important methodological safeguards into their trials but are not reporting them completely. It is perhaps more likely that authors are not including these critical features in their trial designs, a possibility that jeopardizes the validity of the results of these trials.
For the present study, the most crucial possible omission regarding the description of methods relates to blinding. Rather than contacting the authors, we relied on the published manuscripts to determine the presence of blinding. Although it is possible that published reports may not accurately reflect all of the methodology incorporated into a trial, we considered it unlikely that a substantial number of trials included in this study blinded outcome assessors but failed to report doing so. Such uncertainty would be minimized if authors were more comprehensive in their reporting of trial methodology and if journal editors were more vigilant in ensuring that authors follow their published guidelines.
We chose to focus on blinding of the individuals who assess outcomes, which is critical to reduce the potential for bias, particularly in trials in which the outcomes are subjective, such as quality of life or range of motion2,5,6. Our results indicate that subjective outcomes such as these are common in trials of orthopaedic trauma: over half of the studies included a physiological measurement (at least partly dependent on the tester), a patient-reported outcome (dependent on the patient's perceptions), and a radiographic outcome (dependent on a clinician's interpretation). We believe that researchers should take whatever steps are possible to blind the individuals involved in such subjective outcome assessments.
Although few trials that described subjective outcomes blinded their outcome assessors, those that did so demonstrated the feasibility of the blinding process (see Appendix). Previous reports have suggested similar findings. Balk et al. examined 276 randomized controlled trials identified from meta-analyses in various specialties, including sixty-seven from surgery23. Blinding of outcome assessors was reported in only 16% of the surgical trials compared with 52% of the cardiovascular studies. While there is still much room for improvement, our findings offer grounds for optimism: some surgical investigators are already achieving blinding with imaginative methods that are quite simple and require relatively little effort, such as utilizing an independent assessor or concealing the surgical incision.
Few would question the reduction in bias that the blinding of outcome assessment can achieve, and empirical evidence confirms that blinding in trials does indeed make a difference. In a systematic review of 250 randomized controlled trials identified from thirty-three meta-analyses, researchers observed a significant difference in the size of the estimated treatment effect between trials that described "double-blinding" and those that did not (p = 0.01), with an overall odds ratio that was 17% larger in studies that did not note blinding24. Other studies have confirmed this finding23,25.
In this systematic review, we chose to group outcomes in trials on the basis of the type of individual who would likely assess the outcomes. Categorizing outcomes in this manner might be helpful to clinicians and investigators when considering the feasibility of blinding. Most guidelines for critically appraising the methodology or reporting of a randomized controlled trial include a criterion that asks, "Were the individuals assessing the outcomes blinded?"3,4 Rather than considering blinding as a dichotomous (all-or-nothing) design feature, investigators should view it as a continuum: the more, the better. Thus, it is possible that, within a single trial, some outcome assessors will be blinded, while others might be aware of the group allocation. That would not limit the implementation or interpretation of a trial in any way; clinicians should be more confident about estimates of effect on the blinded than on the nonblinded outcomes. Indeed, the most rigorous approaches to assessing quality of evidence demand outcome-specific ratings of evidence quality26.
This systematic review has a number of strengths, including a comprehensive literature search to include all trials in the field of orthopaedic trauma, duplicate assessment of eligibility and data extraction, and standardized data-collection forms that had been subjected to a pilot test to ensure the accuracy of the results. The subjective nature of our primary research question is a potential limitation. It is possible that other researchers might disagree about some of the outcomes that we considered feasible to blind.
We made this determination as rigorous as possible by including the use of duplicate assessments and the specification of how blinding could be accomplished. Empirical research suggests that two reviewers are sufficient to assess subjective measures such as this, and that increasing to five reviewers rarely alters the consensus outcome27. Although the reviewers did not achieve perfect agreement, the small proportion of disagreements increases confidence in our findings. It is possible, however, that the reviewers achieved good agreement in part because of similar training and biases in favor of blinding. We did not consider invasive and potentially unethical methods such as sham surgery to be feasible. Instead, researchers would need only simple techniques such as incorporating an independent assessor, concealing incisions, or manipulating radiographs (which is easily accomplished in the era of digital imaging) to blind the vast majority of outcomes. It is possible that even more outcomes could be blinded with use of other creative approaches that we did not consider. Of course, researchers must be careful that the strategies that they use to blind do not introduce biases of their own, and they should explicitly report in the manuscript any special techniques or manipulation that are employed.
In summary, the methodological reporting of trials in orthopaedic trauma has been very limited. Randomized controlled trials most commonly measure clinical and radiographic outcomes, and they rarely describe having blinded the individuals who assessed the outcomes. Most trials could feasibly blind the assessors of clinical, physiological, and radiographic outcomes with use of simple techniques. Researchers should assess the feasibility of blinding in each category of outcomes separately, incorporate methodological safeguards to maximize the validity of their conclusions, and include a description of such safeguards in their published reports.