Abstract
Background: Randomization, concealment of treatment allocation, and
blinding are all known to limit bias in clinical research. Nonsurgical studies
that fail to meet these standards have been reported to inflate the
differences between treatment and control groups. While surgical trials can
rarely blind surgeons or patients, they can often blind outcome assessors. The
aim of this systematic review was threefold: (1) to examine the reporting of
outcome measures in orthopaedic trials, (2) to determine the feasibility of
blinding in published orthopaedic trials, and (3) to examine the association
between the magnitude of treatment differences and the blinding of outcome
assessors.
Methods: We identified and reviewed thirty-two randomized,
controlled trials published in The Journal of Bone and Joint Surgery
(American Volume) in 2003 and 2004 for the appropriate use of outcome
measures. These trials represented 3.4% of all 938 studies published during
that time period. All thirty-two trials were reviewed by two authors for (1)
the outcome measures used and (2) the blinding of outcome assessors. We
compared the magnitude of the treatment effect in trials with blinded outcome
assessors with that in trials with unblinded outcome assessors.
Results: Ten (31%) of the thirty-two randomized controlled trials
used a modified outcome instrument. Of the ten trials, four failed to describe
how the outcome instrument was modified. Nine of the ten articles did not
describe how the modified instrument was validated and retested. Sixteen of
the thirty-two randomized controlled trials did not report blinding of outcome
assessors when blinding would have been possible. Among the studies with
continuous outcome measures, unblinded outcomes assessment was associated with
significantly larger treatment effects than blinded outcomes assessment
(standardized mean difference, 0.76 compared with 0.25; p = 0.01). Similarly,
in the studies with dichotomous outcomes, unblinded outcomes assessments were
associated with significantly greater treatment effects than blinded outcomes
assessments (odds ratio, 0.13 compared with 0.42; p < 0.001). The ratio of
odds ratios (unblinded to blinded outcomes assessment) was 0.31, suggesting
that unblinded outcomes assessment was associated with a potential
exaggeration of the benefit of a treatment in our cohort of studies.
Conclusions: In future orthopaedic randomized controlled trials,
emphasis should be placed on detailed reporting of outcome measures to
facilitate generalization, and outcome assessors should be blinded, when
possible, to limit bias.
Randomized controlled trials represent the highest level of evidence for a
surgical therapy [1]. Several reports have identified factors associated with
bias in the conduct and reporting of randomized trials in the medical
literature [2-8]. These factors include concealment of randomization and
blinding of physicians, patients, outcome assessors, and data analysts [2-8].
If differences between treatments are modest, bias can distort the truth [9].
Lack of blinding in medical randomized controlled trials has been associated
with increased magnitudes of observed treatment effects [2,5,7,10-12].
Blinding is a methodological safeguard that is often confused with other
methodological precautions, such as concealment of allocation during the
process of creating comparison groups [12]. Allocation in a trial is concealed
when investigators cannot determine beforehand the allocated treatment of the
next patient enrolled into their study. Allocation concealment is necessary to
prevent selection bias, whereas blinding is important to prevent detection
bias, i.e., a biased assessment of outcome [5].
Unlike pharmaceutical trials, surgical trials can never blind the surgeon
to the type of intervention. Thus, safeguards to prevent bias in surgical
trials include concealment of allocation and blinding of patients and outcome
assessors [9]. Boutron et al. reported that the feasibility of blinding differs
between pharmaceutical and nonpharmaceutical trials [13].
In surgical trials, reporting unbiased, clinically important differences in
treatment effect ideally requires blinded outcomes assessment and the use of
validated outcome instruments [14,15]. Even when the best-validated outcome
instruments are used, the measurement of treatment effect can still be biased
if differences between treatment groups are small and methodological
safeguards are not applied [11,16].
The correct use of outcome measures was recently discussed in an editorial
by Zarins in this journal [17]. In recent decades, an effort has been made to
design patient-oriented outcome measures, but these new measures have not
always been appropriately validated [18-20].
The aim of this systematic review was threefold: (1) to examine the
reporting of outcome measures in orthopaedic trials, (2) to determine the
feasibility of blinding of outcome assessors in published orthopaedic trials,
and (3) to examine the association between the magnitude of treatment
differences and blinding.
We hypothesized that outcome instruments are often modified and that authors
do not report validation of the modified outcome instrument. Furthermore, we
hypothesized that studies describing unblinded outcome assessors were more
likely to report larger treatment effects than those involving blinded outcome
assessors.
Study Design
We conducted a systematic review to describe the reporting of outcome
measures and the conduct and feasibility of blinding in randomized controlled
trials published in The Journal of Bone and Joint Surgery (American
Volume) in 2003 and 2004. We chose The Journal since it is regarded
as the highest-impact general orthopaedic journal. Our review comprised only
randomized controlled trials, since they are designed to detect clinically
important changes with limited bias and are considered to provide the highest
level of evidence [1].
Eligibility Criteria
Two authors (R.W.P. and R.K.) manually searched all issues of The
Journal from January 2003 through December 2004. Eligible studies
included those reported as randomized trials of therapeutic interventions that
involved human subjects. Searches were conducted in duplicate, and any
disagreements were resolved by consensus of three authors (R.W.P., R.K., and
M.B.).
Study Demographic Information
The relevant data from each of the eligible studies were abstracted by one
investigator (R.W.P.) and were rechecked for accuracy by a second investigator
(P.A.A.S.). The data included (1) the profession of the first author (a surgeon; a physician,
but not a surgeon; or an epidemiologist); (2) citation of statistical support
or methodological support by a department of clinical epidemiology,
statistics, or public health; (3) year of publication; (4) total sample size;
(5) number of centers; (6) name of the intervention; (7) trial type, i.e.,
surgery, drug trial (categorized as injection, oral, or topical),
postoperative management, and externally applied energy (shock wave or
ultrasound); (8) body region (upper extremity, long bones of the lower
extremity, spine, hip and knee, foot and ankle, deep venous thrombosis, or
other); (9) financial support (yes or no); (10) direction of results
(positive, if the findings of the randomized trial were significant, or
negative, if they were not significant); and (11) whether the trial was
reported according to the CONSORT (Consolidated Standards of Reporting Trials)
statement [21] (yes or no).
Study Aim 1: Evaluation of Outcome Reporting
Outcome Measure Identification
All identified randomized controlled trials were retrieved and reviewed for
the types of outcome measures used in the study by two reviewers (R.W.P. and
P.A.A.S.), and they were checked for accuracy by a third reviewer (R.K.). We
categorized the outcome measures according to the following criteria described
by Boutron et al. [13]: (1)
"Patient-reported outcomes" (e.g., pain and disabilities), when
the patient is the outcome assessor. (2) "Outcomes that suppose a
contact between patients and outcome assessors" (e.g., clinical
examination [blood pressure], clinical test [walking speed and
stair-climbing], and ultrasound examination). (3) "Outcomes that do not
suppose a contact between patients and outcome assessors" (e.g.,
radiography and magnetic resonance imaging). (4) "Clinical events and
therapeutic outcomes that will be determined by the interaction between
patients and care providers" (e.g., cointerventions, length of
hospitalization, treatment failure, and surgery), in which the care provider
is the outcome assessor. (5) "Clinical events and therapeutic outcomes
that will be assessed from data on the medical form" (e.g., death linked
to myocardial infarction or indication for arthroplasty from clinical and
radiographic data). (6) "Composite outcomes," i.e., those that
require each outcome to be assessed separately.
Outcome Instrument Validation and Appropriate Use
We assessed the studies with regard to the following criteria: (1) Did the
study describe a modification of an outcome instrument? (2) Did the authors
provide a detailed description of the modification in the manuscript when an
outcome instrument was modified? (3) Did the authors provide a detailed
description of the validation process after modification of the outcome
instrument in their study?
Study Aim 2: Quality of Reporting on the Methodological Safeguard of
Blinding
We examined the blinding process for all included randomized controlled
trials. We scored the use of blinding in the report on the basis of three
categories: (1) "Clearly blinded," when blinding was reported. (2) "Not
blinded or blinding status not described, but it was possible to blind," when
blinding was not conducted or not reported and our examination of the
characteristics of the study indicated that blinding would have been possible
(feasible). (3) "Blinding was impossible," when the report stated that
blinding was impossible because of the characteristics of the intervention or
the outcome used in the trial.
This blinding process was scored for treatment providers, patients, outcome
assessors, and data analysts.
Details about the blinding were reported for different trial types. We
categorized the trials into the following subgroups: surgical trial, drug
trial (injection, oral, or topical), postoperative management, physical
therapy trial, and externally applied energy trial.
Study Aim 3: Effect of Blinding on the Reported Treatment
Effects
We compared the magnitude of the treatment effects for studies that
reported outcome blinding with those that did not. We grouped the studies that
were categorized as "not blinded or blinding status not described, but
it was possible to blind" and "blinding was impossible" into
the unblinded group.
For dichotomous outcome measures, we compared pooled odds ratios (95%
confidence intervals) across studies with and without blinding. For continuous
variables (outcome scales), we converted the reported mean differences across
treatments to an effect size, or standardized mean difference (the treatment
mean value minus the comparison mean value divided by the pooled standard
deviation), and compared the effect size across blinded and unblinded
outcomes. We used the same method to calculate the magnitude of the treatment
effect for studies that described the concealment of treatment allocation and
those that did not.
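As a minimal sketch of the effect-size calculation described above (not the software used in this review), the standardized mean difference can be computed from group summary statistics as follows. The function names and the example values are hypothetical, and the small-sample correction that meta-analysis software may apply is omitted.

```python
import math


def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled standard deviation of the treatment and comparison groups."""
    numerator = (n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2
    return math.sqrt(numerator / (n_t + n_c - 2))


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Treatment mean minus comparison mean, divided by the pooled SD."""
    return (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)


# Hypothetical trial: outcome score of 70 (SD 10) in 30 treated patients
# versus 65 (SD 10) in 30 comparison patients.
smd = standardized_mean_difference(70.0, 10.0, 30, 65.0, 10.0, 30)
print(f"standardized mean difference = {smd:.2f}")
```

Because the result is expressed in units of standard deviations, rescaling the outcome (for example, from a 100-point to a 1000-point scale) leaves it unchanged, which is what allows outcomes from different instruments to be compared.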
Statistical Analysis
Descriptive results were presented for the demographic characteristics of
the studies, evaluation of outcomes, and details with regard to blinding and
trial type. Data were analyzed with the SPSS statistical software package
(version 13.0; SPSS, Chicago, Illinois).
Ensuring the Accuracy of the Blinding Rating
We measured agreement between the reviewers for the assessment of the
blinding of outcome assessors with use of a kappa statistic [22]. Landis and
Koch suggested criteria for the interpretation of agreement, with 0 to 0.20
representing slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60,
moderate agreement; and 0.61 to 0.80, substantial agreement. A value of >0.80
is considered almost perfect agreement [22].
Regardless, if two reviewers disagreed, we attempted to come to a consensus
after carefully reading the articles a second time in a consensus meeting.
When discrepancies persisted despite a consensus meeting, a third reviewer was
asked for an opinion on the specific item to reach final consensus. This
method of quality assessment is commonly used in Cochrane reviews. All
reviewers (R.W.P., P.A.A.S., R.K., and M.B.) were well trained in quality
assessments, all had completed a Cochrane review course, and all had
coauthored Cochrane systematic reviews of randomized trials.
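The agreement statistic and the interpretation bands described above can be illustrated with a short sketch. This assumes the unweighted Cohen kappa for two reviewers; the text does not state which variant was used, and the function names and ratings below are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two reviewers rating the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both reviewers must rate the same non-empty set")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance, from each reviewer's marginal totals.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical category
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Interpretation bands quoted in the text (Landis and Koch)."""
    if kappa < 0.0:
        return "poor"  # band below 0 is not quoted in the text
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings of 20 trials: the reviewers disagree on one trial.
reviewer_1 = ["blinded"] * 10 + ["unblinded"] * 10
reviewer_2 = ["blinded"] * 9 + ["unblinded"] * 11
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```

Applied to kappa values such as 0.73 and 0.85, the labelling function returns "substantial" and "almost perfect," respectively.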
We used the chi-square test to assess the relationship between outcome
assessor blinding and the direction of the results. We considered a p value of
<0.05 to represent significance. All tests of significance were
two-tailed.
Magnitude of the Treatment Effect in the Studies
We used the computer program Review Manager (RevMan, version 4.2 for
Windows; The Nordic Cochrane Centre, The Cochrane Collaboration, Copenhagen,
Denmark, 2003) to calculate the magnitude of the treatment effect for the
studies that described blinded outcome assessors and for the studies in which
blinding either was not described or was not possible. For continuous data, we
described treatment effect as the standardized mean difference. The
standardized mean difference is the difference in means divided by the
standard deviation. This standard deviation is the pooled standard deviation
of the outcomes of participants across the whole trial. The standardized mean
difference has the important property that its value does not depend on the
measurement scale.
Thus, we used a standardized mean difference to convert all outcomes to a
common scale, measured in units of standard deviations. For dichotomous data,
we calculated the odds ratios to describe the magnitude of the treatment
effect. We calculated the odds ratios on the premise of preventing a bad
outcome. For the standardized mean difference, we used the fixed effect
inverse variance model. For the odds ratios, we used the fixed effect assumption
and the Mantel-Haenszel risk ratio. Our aim was to describe the magnitude of
the treatment effect. We did not aim to compare data or to pool for
meta-analysis; therefore, we were able to use the fixed effect model. Data
were presented in figures with 95% confidence intervals. Ratios of odds ratios
were calculated as described previously [5,7,23]. A ratio of odds ratios of
<1.0 for outcome assessor blinding indicates that trials with unblinded
outcomes assessments yielded larger (exaggerated) estimates of treatment
effects than trials with blinded outcomes assessments, which served as the
reference group [7]. Conversely, a ratio of odds ratios of >1.0 indicates an
association with smaller treatment effects [7]. The outcome measures used to
calculate treatment effect are listed in the Appendix.
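For illustration only, the fixed effect inverse variance model mentioned above can be sketched as follows. This is the textbook form of the method rather than Review Manager's implementation: each study estimate is weighted by the inverse of its variance. The function name and the two study estimates are hypothetical.

```python
import math


def fixed_effect_pool(estimates, standard_errors, z=1.96):
    """Inverse-variance weighted mean with a 95% confidence interval."""
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


# Two hypothetical standardized mean differences with equal precision.
pooled, lower, upper = fixed_effect_pool([0.2, 0.4], [0.1, 0.1])
print(f"pooled SMD = {pooled:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

More precise studies (smaller standard errors) receive more weight, so a single large trial can dominate the pooled estimate under this model.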
Sample Size
Our study sample included all randomized trials published in The
Journal from January 2003 through December 2004. We required at least
fifty patients per group (the studies with blinded outcome assessment compared
with those with unblinded outcome assessment) across all thirty-two eligible
randomized trials to provide sufficient study power (alpha = 0.05 and beta =
0.20) to detect large differences in treatment effects (odds ratios) between
studies with blinded and unblinded outcome measures. All tests were
two-tailed, and we considered a p value of <0.05 to be the threshold for
significance.
Study Demographic Information
We identified 938 studies in The Journal from January 2003 through
December 2004. Of those studies, thirty-two (3.4%) were randomized controlled
trials (see Appendix). The first author was a surgeon in thirty-one studies
(97%) and a physician who was not a surgeon in the remaining study (3%). In
three randomized trials, at least one of the authors of each study had
training in biostatistics (MPH, MSc, or PhD) or was affiliated with a
department of statistics, public health, or clinical epidemiology. The
thirty-two randomized trials included a total of 3608 patients, with sample
sizes ranging from twenty to 474 patients per randomized controlled trial. Six
of the studies were performed in two or more centers, eleven focused on
interventions related to the treatment of degenerative joint disease, and
seven focused on fractures. Five studies included problems affecting the upper
extremity; six, the foot and ankle; and nine, the knee. Four randomized
controlled trials were reported according to the CONSORT statement [21]
(Table I).
Study Aim 1: Evaluation of Outcome Reporting
Outcome Measure Identification
A total of seventy-nine different outcome measures were reported 147 times
in the thirty-two randomized controlled trials. The seventy-nine outcomes were
classified according to the criteria described by Boutron et al. [13] and were
further categorized into the following subgroups. For the sixteen
patient-reported outcomes, the most frequently used measure was the visual
analogue scale (twelve trials; 38%) followed by the Short Form-36 (SF-36)
(four trials; 13%). For the fifty-one outcomes that suppose a contact between
patients and outcome assessors, the most frequently used measure was range of
motion (nine trials; 28%) followed by other types of clinical examination
(eight trials; 25%). For the eight outcomes that do not suppose a contact
between patients and outcome assessors, radiographs were the measure used in
twenty randomized controlled trials (63%). For the four clinical events and
therapeutic outcomes that are determined by the interaction between patients
and care providers, reoperation and the use of pain medication were the
measures described in two trials. We were not able to identify clinical events
and therapeutic outcomes that are assessed from data on the medical chart or
composite outcomes that require each outcome to be assessed separately. For a
detailed description of the outcomes see the Appendix.
Outcome Measure Validation and Appropriate Use
Ten (31%) of the thirty-two randomized controlled trials used a modified
outcome instrument. Of the ten trials, four failed to describe how the outcome
instrument was modified. Nine of the ten articles did not describe how the
modified instrument was validated and retested.
Study Aim 2: Quality of Reporting on the Methodological Safeguard of
Blinding
The reviewers had substantial to almost perfect agreement in scoring the
blinding of treatment providers (kappa, 0.85), patients (kappa, 0.90), outcome
assessors (kappa, 0.84), and data analysts (kappa, 0.73).
Five randomized controlled trials (16%) did not blind or did not describe
blinding of treatment providers in trials in which blinding was possible. In
twenty-three trials (72%), blinding was impossible; the majority of these
studies (twenty-one; 66%) were surgical trials in which blinding of the
treatment provider was impossible. Fifteen trials (47%) did not blind or did
not describe blinding of patients, when blinding was possible. In eleven
trials, blinding of patients was impossible. For the surgical trials, patients
were clearly blinded in three, blinding was possible but not reported or was
not done in nine trials, and it was impossible to blind patients in nine
trials. All studies could have blinded the data analysts; however,
twenty-three trials did not report blinding or did not blind the data
analysts. The remaining nine trials (28%) blinded the data analysts.
Of the thirty-two randomized controlled trials, fourteen (44%) clearly
blinded the outcome assessors. Sixteen trials did not describe blinding of the
outcome assessors, when blinding of the outcome assessors was possible. In two
trials, which were both surgical trials, it was impossible to blind the
outcome assessors. Of the remaining surgical trials, seven trials clearly
blinded the outcome assessors and twelve trials did not blind the outcome
assessors or did not report blinding although it was possible. Data with
regard to blinding and trial type are summarized in
Table II.
Study Aim 3: The Effect of Outcome Assessor Blinding on the Reported
Treatment Effects
In studies with continuous outcome measures, the treatment effect was
larger in studies with unblinded outcome assessors. The effect size
(standardized mean difference) for three studies describing a blinded outcome
assessor was 0.25 (95% confidence interval, -0.06 to 0.56), whereas the
effect size was 0.76 (95% confidence interval, 0.57 to 0.96) for eight studies
with an unblinded outcome assessor or unreported blinding
(Fig. 1). The difference in
effect magnitude was significant (p = 0.01).
In the studies that described a dichotomous outcome measure, blinding of
the outcome assessors in ten studies (odds ratio, 0.42; 95% confidence
interval, 0.33 to 0.54) was associated with a significantly lower treatment
effect than that associated with unblinded outcome assessments in eleven
studies (odds ratio, 0.13; 95% confidence interval, 0.09 to 0.18) (p <
0.001) (Fig. 2). This
translated to relative risk reductions of 38% for blinded outcome assessments
compared with 71% for unblinded outcome assessments (a difference of 33%). The
ratio of odds ratios was 0.31 (95% confidence interval, 0.20 to 0.47),
indicating an exaggerated treatment effect when outcome assessors were not
blinded. Figure 3 represents a comparison of this ratio of odds ratios with
those of five other studies that described relative odds associated with
blinding of outcome assessment [5,7,23-25].
The effect size (standardized mean difference) for the four studies
describing concealment of treatment allocation was 0.44 (95% confidence
interval, 0.13 to 0.74) compared with 0.68 (95% confidence interval, 0.49 to
0.87) for the seven studies with unconcealed treatment allocation. In the
studies describing a dichotomous outcome measure, the treatment effect in
eight trials with concealment of treatment allocation (odds ratio, 0.32; 95%
confidence interval, 0.23 to 0.45) was lower than that in thirteen trials with
unconcealed treatment allocation (odds ratio, 0.26; 95% confidence interval,
0.20 to 0.33). The ratio of odds ratios was 0.81 (95% confidence interval,
0.53 to 1.24), indicating that there was no significant difference for the
available data.
Our evaluation of the direction of study results, positive or negative, was
underpowered. Of the sixteen trials in which the blinding status was not
described or the outcome assessors were not blinded, twelve noted a positive
result. Of the fourteen trials that described true blinding of the outcome
assessors, eight noted a positive result. The relative risk of reporting a
positive result when outcome assessors were unblinded was 1.21 (95% confidence
interval, 0.70 to 2.1). The difference was not significant (p = 0.41).
Key Findings
We found that (1) only 3.4% of all studies published in The
Journal in 2003 and 2004 were randomized controlled trials, (2)
previously validated outcome measures were commonly modified and not
revalidated, (3) outcome assessors were likely to be unblinded (56%) in
orthopaedic randomized trials, (4) blinding of outcome assessors was possible
in many situations but was not conducted, and (5) studies in which assessors
were not blinded were associated with significantly larger estimates of
reported treatment effect (ratio of odds ratios, 0.31; 95% confidence
interval, 0.20 to 0.47).
Strengths and Limitations
Our study is strengthened by a comprehensive search in duplicate to
identify all published randomized controlled trials in The Journal.
Our findings may not be generalizable to other journals or other study designs.
The number of studies in our review was limited to those published in the
eligible time period. Although the number of studies was small, the
association between blinding status and treatment effect was sufficiently
large to identify a significant difference. The study was, however,
underpowered to detect the association between positive study results and
blinding. Our power analysis suggests that 216 studies (108 per arm) would be
required to achieve 80% study power (a beta value of 0.20) with an alpha value
of 0.05.
Many factors in addition to blinding may explain the heterogeneity in the
size of treatment effects [5]. In our
study, allocation concealment had less influence on the magnitude of treatment
effect than did blinding of the outcome assessors. However, our sample of
studies was not large enough to reach significance. Thus, our results should
be cautiously interpreted. Our findings do, however, raise important
hypotheses to be examined in future studies.
Review of the Relevant Literature
Evaluation of Outcome Reporting
We chose not to use the terms subjective and objective
outcome measures as Zarins did in his recent editorial [17]. In
orthopaedics, so-called hard outcomes (clinical events, such as mortality, and
therapeutic outcomes that are assessed from data in the medical record) are
seldom the key outcome of interest [13]. Outcome measures traditionally
described as objective or hard can even be subject to interrater
disagreement, for example, in imaging of the scaphoid bone and physical
examination of the range of motion [14,26]; therefore, we rely on
patient-reported, subjective outcome measures [13,14]. Well-designed
patient-reported questionnaires have undergone rigorous testing and may be
more objective [14]. Thus, outcome objectivity is not determined by whether a
clinician measures a parameter directly; rather, it is dependent on the
reliability or reproducibility of a finding, among patients and clinicians
alike [27].
Nevertheless, these so-called subjective outcomes, currently in vogue [17],
present great opportunities for bias [11]. For example, Schulz and Grimes
explained: "If outcome assessors who know of the treatment allocation believe
a new intervention is better than an old one, they could register more
generous responses to that intervention." [11] In the article by Boutron et
al., 77% of the 110 studies they evaluated used a patient-reported outcome and
15% of the studies used outcomes that did not suppose contact with patients
(radiology) [13]. The majority of the randomized controlled trials that met the
eligibility criteria for our study (63%) used radiographs as an outcome
measure, whereas patient-reported outcomes were less frequently used (a visual
analogue scale was used by 38%, and the SF-36 was used by 13%).
The study by Harvie et al., on the use of outcome scores in surgery of the
shoulder, revealed that the overall pattern of the application of an outcome
score was highly variable and at times inappropriate [19]. In their study, only
nineteen randomized controlled trials were identified. Changes were made to
the outcome scores, often without proper testing of the modification and
without justification [19]. These results are comparable with our findings in
the thirty-two randomized controlled trials that we reviewed.
Quality of Reporting on the Methodological Safeguard of Blinding
In nonpharmaceutical trials of hip and knee osteoarthritis, blinding was
considered feasible less often than in pharmaceutical trials [13]. Our study
investigated the feasibility and reporting of blinding in a variety of
orthopaedic diseases, whereas Boutron et al. studied only hip and knee
osteoarthritis [13]. As a result, our study covers a wider spectrum of surgical
procedures. If different surgical approaches are used, as in minimally
invasive hip replacement studies, identical wound dressings can be used to
blind patients in the early recovery period, the period in which differences
in treatment effect are most likely to occur [9]. If patients cannot be
blinded, outcome assessors usually can be blinded. If outcome assessors are
blinded, it is less likely they will bias their outcome assessments,
especially if patient-reported (so-called soft or subjective) outcome
measures, such as pain, are used [9,11].
Boutron et al. stated that outcome assessment is blinded in cases when the
patient is the outcome assessor and the patient is blinded, for example, when
self-reported outcome instruments are used [13]. These patient-based
questionnaires are often filled out in clinics with an outcome assessor
present in or near the room. Patients frequently ask research nurses or trial
coordinators (i.e., outcome assessors) about the items in the outcome
instrument. In this situation, an unblinded outcome assessor could influence a
patient's answers when responding to those questions. Therefore, outcome
assessor blinding in orthopaedic research is of paramount importance. The 33%
difference in treatment effect shown in our study further suggests that
blinding is important. A future study should focus in more detail on the
influence of outcome assessors on patients filling out self-reported
questionnaires. In the present study, the majority of the randomized
controlled trials published in The Journal in 2003 and 2004 failed to note
whether outcome assessors were blinded.
A previous study on the quality of reporting of randomized controlled
trials in The Journal categorized the blinding of treatment
providers, patients, outcome assessors, and data analysts into three groups
according to whether there was (1) a clear statement of blinding, (2) a clear
statement of no blinding, or (3) no statement on blinding [28]. That study did
not describe whether blinding would have been feasible, but it noted that the
majority of studies (55%) did not adequately report blinding [28].
Interestingly, that report found a similar proportion of randomized controlled
trials (3%) in the total number of publications.
Reporting guidelines in randomized controlled trials continue to evolve [29].
Studies may not describe blinding of outcome assessors when, in fact, blinding
was done [30]. The CONSORT statement provides guidelines for better reporting
of randomized controlled trials [29]. However, the application of the CONSORT
statement in many randomized controlled trials remains low [29,31].
The Effect of Outcome Blinding on the Reported Treatment Effects
Blinding of outcome assessors is one of the methodological safeguards to
ensure the internal validity of a trial [7,11,24]. Treatment effects are known
to be overestimated in unblinded, non-orthopaedic (pharmaceutical or medical)
studies [5,7,25]. The study by Schulz et al. [7] showed a ratio of odds ratios
of 0.83 (95% confidence interval, 0.71 to 0.96), Kjaergard et al. [25] reported
a ratio of odds ratios of 0.56 (95% confidence interval, 0.33 to 0.98), and
Juni et al. [5] reported a ratio of odds ratios of 0.88 (95% confidence
interval, 0.75 to 1.04). One report described surgical trials and found a
ratio of odds ratios of 0.87 (95% confidence interval, 0.56 to 1.36) [23].
Conversely, another report showed an underestimation of treatment effect
(ratio of odds ratios, 1.11; 95% confidence interval, 0.76 to 1.63) [24]. In
our study, we found strong evidence that the use of unblinded outcome
assessors exaggerates the treatment effect (ratio of odds ratios, 0.31; 95%
confidence interval, 0.20 to 0.47) (Fig. 3). Previously, it had been suggested
that the use of unblinded outcome assessors who score so-called soft outcomes
(patient-reported outcomes such as pain and disability) [13], as is often the
case in orthopaedic trials, may result in biased findings [11]. Our study is
the first, as far as we know, to test this theory for outcome assessment in
randomized controlled trials in orthopaedics.
In conclusion, readers who want to apply the findings of a randomized
controlled trial to their daily clinical work must be able to rely on the
internal and external validity of that trial. Our study showed that reports of
randomized controlled trials were subject to serious threats to internal
validity. Investigators should carefully report outcome measures, use
validated measures, and blind the outcome assessment whenever possible.
Lists of the thirty-two randomized controlled trials and the specific
outcomes studied are available with the electronic versions of this article,
on our web site at
(go to
the article citation and click on "Supplementary Material") and on
our quarterly CD-ROM (call our subscription department, at 781-449-9780, to
order the CD-ROM).
References
1. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am. 2003;85:1-3.
2. Day SJ, Altman DG. Statistics notes: blinding in clinical trials and other studies. BMJ. 2000;321:504.
3. Fergusson D, Glass KC, Waring D, Shapiro S. Turning a blind eye: the success of blinding reported in a random sample of randomised, placebo controlled trials. BMJ. 2004;328:432.
4. Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999;282:1054-60.
5. Juni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ. 2001;323:42-6.
6. Moher D, Cook DJ, Jadad AR, Tugwell P, Moher M, Jones A, Pham B, Klassen TP. Assessing the quality of reports of randomised trials: implications for the conduct of meta-analyses. Health Technol Assess. 1999;3:i-iv, 1-98.
7. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273:408-12.
8. Schulz KF. Assessing allocation concealment and blinding in randomised controlled trials: why bother? Evid Based Med. 2000;5:36-8.
9. Lilford R, Braunholtz D, Harris J, Gill T. Trials in surgery. Br J Surg. 2004;91:6-16.
10. Devereaux PJ, Bhandari M, Montori VM, Manns BJ, Ghali WA, Guyatt GH. Double blind, you are the weakest link—good-bye! ACP J Club. 2002;136:A11.
11. Schulz KF, Grimes DA. Blinding in randomised trials: hiding who got what. Lancet. 2002;359:696-700.
12. Schulz KF, Chalmers I, Altman DG. The landscape and lexicon of blinding in randomized trials. Ann Intern Med. 2002;136:254-9.
13. Boutron I, Tubach F, Giraudeau B, Ravaud P. Blinding was judged more difficult to achieve and maintain in nonpharmacologic than pharmacologic trials. J Clin Epidemiol. 2004;57:543-50.
14. Pynsent PB. Choosing an outcome measure. J Bone Joint Surg Br. 2001;83:792-4.
15. Bombardier C. Outcome assessments in the evaluation of treatment of spinal disorders: summary and general recommendations. Spine. 2000;25:3100-3.
16. Schulz KF, Grimes DA. Allocation concealment in randomised trials: defending against deciphering. Lancet. 2002;359:614-8.
17. Zarins B. Are validated questionnaires valid? J Bone Joint Surg Am. 2005;87:1671-2.
18. Swiontkowski MF, Buckwalter JA, Keller RB, Haralson R. The outcomes movement in orthopaedic surgery: where we are and where we should go. J Bone Joint Surg Am. 1999;81:732-40.
19. Harvie P, Pollard TC, Chennagiri RJ, Carr AJ. The use of outcome scores in surgery of the shoulder. J Bone Joint Surg Br. 2005;87:151-4.
20. Pynsent PB, Fairbank JC, Carr AJ. Outcome measures in orthopaedics and orthopaedic trauma. 2nd ed. New York: Oxford University Press; 2004.
21. Moher D, Schulz KF, Altman DG; CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357:1191-4.
22. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-74.
23. Balk EM, Bonis PA, Moskowitz H, Schmid CH, Ioannidis JP, Wang C, Lau J. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA. 2002;287:2973-82.
24. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352:609-13.
25. Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analysis. Ann Intern Med. 2001;135:982-9.
26. Poolman RW, Hanel DP, Mann FA, Ponsen KJ, Marti RK, Roolker L. Trans-Atlantic hospital agreement in reading first day radiographs of clinically suspected scaphoid fractures. Arch Orthop Trauma Surg. 2002;122:373-8.
27. Suk M, Hanson BP, Norvell DC, Helfet DL. The AO handbook of musculoskeletal outcome measures and instruments. New York: Thieme; 2005.
28. Bhandari M, Richards RR, Sprague S, Schemitsch EH. The quality of reporting of randomized trials in the Journal of Bone and Joint Surgery from 1988 through 2000. J Bone Joint Surg Am. 2002;84:388-96.
29. Mills E, Wu P, Gagnier J, Heels-Ansdell D, Montori VM. An analysis of general medical and specialist journals that endorse CONSORT found that reporting was not enforced consistently. J Clin Epidemiol. 2005;58:662-7.
30. Devereaux PJ, Choi PT, El Dika S, Bhandari M, Montori VM, Schunemann HJ, Garg AX, Busse JW, Heels-Ansdell D, Ghali WA, Manns BJ, Guyatt GH. An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol. 2004;57:1232-6.
31. Mills EJ, Wu P, Gagnier J, Devereaux PJ. The quality of randomized trial reporting in leading medical journals since the revised CONSORT statement. Contemp Clin Trials. 2005;26:480-7.