The topic of positive-outcome bias has been recognized as worthy of study
for decades in the internal medicine
literature1-13,
but it has been relatively unexamined in orthopaedics and other surgical
subspecialties14-20.
This is the case despite the increased emphasis that orthopaedic surgeons have
placed on evidence-based approaches such as meta-analysis, cost-benefit
modeling, and decision analysis, all of which depend entirely on the absence
of systematic biases and influences from nonscientific factors in the review
process21,22.
For example, if there is a bias toward the publication of studies that
demonstrate a positive result over the publication of similarly well-designed
studies that demonstrate a negative result, then subsequent meta-analyses,
which evaluate the published literature more efficiently than they evaluate
unpublished work, will systematically overestimate the size of apparent
treatment
effects23-28.
Previous
studies1,15-17,29-31,
including our
own14,19,
have identified strong associations between commercial funding and positive
research outcomes in the orthopaedic literature; in fact, all studies of which
we are aware have suggested that the print literature contains a higher
proportion of orthopaedic studies with a positive outcome than studies with
negative or nonsupportive
findings14,17,19,20.
However, these studies evaluated only published and presented work and so were
not appropriately designed to answer questions about the presence or absence
of actual bias. In order to assess whether peer review results in
positive-outcome bias, all studies submitted to a journal for peer review must
be analyzed, while controlling for as many reasonable confounding variables as
possible, to determine whether those with a positive outcome are more likely
to be accepted than those with a negative outcome. Only by analyzing that same
denominator (manuscripts submitted for peer review) is it possible to
determine whether the disproportion of positive findings in commercially
funded published research represents an actual bias.
In the present study, performed in cooperation with The Journal of Bone
and Joint Surgery, American Volume, we evaluated all manuscripts
concerning lower-extremity adult reconstruction (hip and knee arthroplasty)
that had been submitted during a seventeen-month period. We performed our
evaluations, before these manuscripts went through The Journal's
peer-review process, in order to test the following hypotheses.
Nonscientific variables, including commercial funding, country of origin,
and the employment of a trained biostatistician in the research process, are
associated with positive outcomes in research submitted for peer review.
Positive study outcomes and nonscientific variables are associated with
acceptance for publication by The Journal of Bone and Joint
Surgery.
Inclusion and Exclusion Criteria
All manuscripts involving original research on the subject of adult hip or
knee reconstruction that were submitted to The Journal of Bone and Joint
Surgery, American Volume, between January 2004 and June 2005 were
considered for this prospective study. Manuscript revisions or resubmissions,
Current Concepts Reviews, Instructional Course Lectures, systematic reviews,
meta-analyses, and basic-science studies were excluded, as were case reports,
brief communications, technical notes, and editorial or ethics commentaries.
Two hundred and nine manuscripts met the inclusion criteria for this
analysis.
Study Blinding
The title, abstract, introduction, materials and methods, results, and
discussion sections of all manuscripts that satisfied the inclusion criteria
were reproduced and redacted such that the authors' names, departmental and
institutional affiliations, academic degrees, study locations, funding
sources, conflict-of-interest statements, and acknowledgments were not visible
to the study reviewers. Staff at The Journal's office carried out
this process of redaction, and no individual identifiers were transmitted to
the study investigators at any point in this research, including during the
formal analysis and statistical review.
Outcome Analysis
Two orthopaedic surgeons (S.S.L. and W.J.W.) evaluated all of the blinded
manuscripts and, according to strict predetermined criteria, categorized the
study outcome as either (1) positive/favorable/significant difference observed
or (2) not positive/unfavorable/no significant difference observed. Studies
that were purely descriptive and not suitable for analysis in this manner
were classified as "not analyzable" and were excluded from
analyses concerning study outcome.
The strict criteria that the reviewers applied to categorize the outcome
are standard in research of this type and have been published in numerous
first-tier medical specialty
journals2,5,9,11,32-34
as well as in work by the senior authors (S.S.L. and W.J.W.) published in the
orthopaedic peer-reviewed
literature14,19
(Tables I and
II). The two investigators
performing this portion of the analysis demonstrated a high degree of
interobserver agreement with this kind of analysis in previous studies on this
topic14,19.
As previously
described14,19,
any manuscript that did not clearly meet the criteria for a
"positive" designation was considered to be "not
positive" in order to minimize the likelihood that any bias toward
positive outcomes would be overstated.
Determination of Study Quality, Level of Evidence, and Sample Size
Two orthopaedic surgeon reviewers (J.R.L. and M.R.A.C.) classified the
manuscripts according to study type (therapeutic, prognostic, diagnostic, or
exposure/risk/harm) and according to the level of evidence as set forth by the
submission requirements of The Journal. Disagreements about
classifications were resolved by consensus. The same reviewers recorded the
sample size given in each manuscript.
Study quality was assessed independently by each of these two reviewers
using evidence-based-medicine study-quality criteria described initially by
Sackett et al. (see
Appendix)18,21,35-43.
Although there are published instruments for rating the quality of randomized
controlled trials, we are not aware of any published instrument that can be
used to analyze all types of studies in terms of quality. For this reason, the
criteria of Sackett et al. were used. These criteria make up the standard
evidence-based-medicine approach that has been adopted across all disciplines,
they have been recommended by The Journal as useful in interpreting
clinical
research18,21,42,
and they are circulated by The Journal to residency programs that
have received The Journal of Bone and Joint Surgery journal-club
grants. These criteria have a high level of face validity as well as a high
level of familiarity to evidence-based-medicine practitioners and readers of
The Journal. As the evidence-based-medicine criteria vary in number,
depending on the type of study being analyzed, the Sackett scores were
normalized to allow the questionnaires for each study type to be compared as
percentages of the highest possible score. Questions were formatted so that
the possible answers were either "yes" or "no and/or
uncertain"; for example, if a manuscript about a surgical therapy did
not provide enough information for the reviewers to determine whether the
outcomes were evaluated in a blinded fashion, the answer would be "no
and/or uncertain." Study quality was graded according to the percentage
of "yes" responses (i.e., the number of "yeses"
divided by the total number of end points analyzed for studies of that
design).
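The normalization described above can be illustrated with a minimal Python sketch. The function name and the example answer sheets are hypothetical and are not taken from the study; the sketch only assumes, as stated in the text, that every criterion is answered either "yes" or "no and/or uncertain" and that the score is the percentage of "yes" answers.

```python
from typing import Sequence


def normalized_quality_score(answers: Sequence[str]) -> float:
    """Return the percentage of "yes" answers among all criteria.

    Any answer other than "yes" (for example "no and/or uncertain")
    counts against the score, mirroring the two-option format in the text.
    """
    if not answers:
        raise ValueError("at least one criterion is required")
    yes_count = sum(1 for answer in answers if answer.strip().lower() == "yes")
    return 100.0 * yes_count / len(answers)


# Hypothetical answer sheets: questionnaires of different lengths
# become comparable once each is expressed as a percentage.
therapy_answers = ["yes", "no and/or uncertain", "yes", "yes", "no and/or uncertain"]
diagnosis_answers = ["yes", "yes", "no and/or uncertain", "yes"]

print(normalized_quality_score(therapy_answers))    # 3 of 5 criteria met
print(normalized_quality_score(diagnosis_answers))  # 3 of 4 criteria met
```

Expressing each questionnaire as a percentage of its own maximum is what allows studies of different designs, with different numbers of criteria, to be compared on one scale.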
If the reviewers who independently assessed the study's quality on the
basis of the Sackett questionnaires disagreed about more than one of the
study-quality questions, the manuscript was reevaluated independently by the
two reviewers to see if a consensus could be achieved; this step was designed
to avoid errors in data recording. If agreement was not reached at that point,
the paper was graded by both reviewers together to reach a consensus. The
kappa statistics for interobserver agreement demonstrated nearly perfect
agreement when the analyses were performed independently (all kappa values
= 0.94) by the two study graders using this study-quality scoring
system.
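For readers unfamiliar with the kappa statistic, the following Python sketch shows how agreement between two graders can be computed. The text does not specify which kappa variant was used, so the unweighted Cohen kappa is assumed here, and the ratings shown are hypothetical rather than study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical answers from two graders to four quality questions.
grader_1 = ["yes", "yes", "no", "no"]
grader_2 = ["yes", "yes", "no", "yes"]
kappa = cohen_kappa(grader_1, grader_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw percentage agreement for the agreement that would be expected by chance alone, which is why it is preferred over simple percent agreement when reporting interobserver reliability.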
Database of Potential Predictor Variables
During the assessment of the outcomes and the study quality, The
Journal maintained a database of the manuscripts under review that
prospectively recorded each manuscript's funding source (commercial,
philanthropic, or nonfunded/other), country or countries of origin, and
presence or absence of a coauthor with expertise in statistics or
epidemiology. The investigators reviewing the outcomes and study quality were
blinded to the contents of this database, and the database was not shared with
the study investigators until all analyses of outcome, study quality, and
sample size in all manuscripts were completed.
Information concerning the funding source was obtained directly from the
authors' disclosure statements in each manuscript or from the acknowledgment
section, if a source of funding was identified there. Manuscripts were placed
into one of three categories on the basis of the funding source: (1)
commercial funding, (2) noncommercial or philanthropic funding (e.g., the
Orthopaedic Research and Education Foundation), and (3) nonfunded/other.
Industry sponsorship was attributed when any commercial benefit, either direct
or indirect, was acknowledged by the authors. When no funding was reported in
disclosures or acknowledgments, a manuscript was classified as
nonfunded/other. One manuscript submission was rejected by The
Journal before the conflict-of-interest form was received from the
submitting authors, and thus the funding source for this particular submission
was not available for analysis.
The country of origin was determined by evaluating the departmental
affiliations of all coauthors, and the study was placed into one of three
categories: (1) United States only, (2) United States and one or more other
countries, and (3) non-United States only.
The listing of a coauthor with expertise in statistics or epidemiology was
also recorded. This variable was considered as a binary category, with credit
for statistical or epidemiologic expertise given to studies when one or more
coauthors listed a departmental affiliation in epidemiology, health studies,
or statistics or when one or more of the coauthors listed an advanced degree
(MSc, PhD, or MPH) in epidemiology or biostatistics. In addition, manuscripts
in which the acknowledgment section identified an individual with statistical
expertise as having assisted with or performed the statistical analysis were
given credit for satisfying this end point.
After a database that included outcomes, study quality, and sample sizes
was created, the database from The Journal that contained the
additional variables of interest was added, and comparisons were made
according to the a priori study hypotheses.
Statistical Analysis
Statistical analyses were performed by the two study statisticians (D.C.S.
and F.M.W.) using SPSS software (version 13.0; Chicago, Illinois). Chi-square
analyses were conducted to test for differences in the percentages of outcomes
measured on nominal scales, and independent t tests were used to examine mean
differences in continuous outcomes. Significance was set at the p = 0.05
level, two-tailed, and statistical trend was set at the p = 0.1 level.
Institutional Review Board Approval
Approval from a local institutional review board was not obtained, as the
Editor of The Journal and The Journal's legal counsel, after review,
determined that it was not necessary. It was decided that manuscript review
posed no risk of harm. Nondisclosure agreements were signed in advance of this
study by all investigators, and blinding of all study reviewers was strictly
maintained at the level of the Editor of The Journal and his staff.
During the electronic submission process, the corresponding authors of all
submitted manuscripts were made aware that a study was ongoing and that the
study was separate from the peer-review process of manuscript submission
carried out by The Journal.
Two hundred and nine manuscripts concerning hip or knee arthroplasty met
the previously described inclusion criteria and served as the cohort for the
proposed analysis. One hundred and thirty-nine articles concerned the topic of
therapy; thirty-four, the topic of exposure/risk/harm; nineteen, the topic of
prognosis; and seventeen, the topic of a diagnostic test. Seventy-one percent
(148) of the 209 studies submitted concluded with a positive outcome.
Thirty-one percent (sixty-four) of the 209 submitted manuscripts were accepted
for publication following The Journal's peer-review process.
Twenty-six percent (fifty-four) of the 208 studies for which the funding
was known were classified as commercially funded, whereas 69% (143) were
classified as having received no funding, commercial or otherwise. As
previously mentioned, information regarding funding sources was unavailable
for one manuscript. Seventy-four percent (forty) of the fifty-four
commercially funded studies concluded with a positive outcome compared with
69% (ninety-nine) of the 143 nonfunded studies; this difference was not
significant, with the numbers available (p = 0.68). The country of origin also
was not associated with the study outcome (p = 0.107), with the numbers
available. Studies that included a biostatistician as a coinvestigator or
collaborator were not more likely to conclude with a positive outcome than
were those that did not employ a statistician (p = 0.637).
Table III summarizes the
relationships between study outcome and other variables of interest.
Of the 148 studies that concluded with a positive outcome, 30% (forty-five)
were accepted for publication. In comparison, 37% (eighteen) of the forty-nine
studies with a non-positive outcome were accepted for publication. The
difference in the publication rate between the studies with and those without
a positive outcome was not significant (p = 0.410). Twelve manuscripts (6%)
were not analyzable according to the predetermined positive/non-positive
outcome criteria; one of these was accepted for publication. Interestingly,
although there was no difference in the publication rate between the studies
with and those without a positive outcome, studies with a non-positive outcome
demonstrated higher scores for study quality than did studies with a positive
outcome (mean Sackett score, 60% compared with 49%; p = 0.003) and had larger
sample sizes (mean, 782 compared with 202; p = 0.05). Manuscript disposition
(acceptance or rejection) was not associated with the presence or absence of a
statistician, study outcome, Sackett score, level of evidence, or sample size
(p = 0.206 to 0.910), with the numbers available
(Table IV).
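The comparison of acceptance rates by study outcome reported above can be checked with a short Python sketch. It uses only the counts given in the text (45 of 148 positive-outcome and 18 of 49 non-positive-outcome manuscripts accepted) and assumes an uncorrected Pearson chi-square test with one degree of freedom; the text does not state whether a continuity correction was applied, so this is an assumption rather than a description of the authors' SPSS settings. The function name is hypothetical.

```python
import math
from typing import Tuple


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Uncorrected Pearson chi-square and two-sided p value for the table

        [[a, b],
         [c, d]]
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one count")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, the upper-tail chi-square probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: positive outcome, non-positive outcome.
# Columns: accepted, rejected (counts taken from the text).
chi2, p_value = pearson_chi_square_2x2(45, 148 - 45, 18, 49 - 18)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

Under these assumptions the sketch gives a p value of roughly 0.41, in line with the reported p = 0.410. The twelve manuscripts classified as not analyzable are left out of the table, as they were in the analyses concerning study outcome.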
Further analysis of studies that were eventually accepted for publication
demonstrated that they were more likely to be commercially funded (p = 0.027;
odds ratio = 2.1, 95% confidence interval = 1.08 to 4.08) and more likely to
have been authored by United States investigators (p = 0.020) than were
manuscripts rejected following the peer-review process
(Table IV). This was the case
despite the fact that commercially funded studies and United States-based
studies were no more likely to be of higher quality as assessed on the basis
of the objective Sackett criteria, level of evidence, and sample size (p =
0.24 to 0.79).
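An odds ratio with a 95% confidence interval, such as the one reported above for commercial funding, is commonly obtained from a two-by-two table with the logit (Woolf) method. The Python sketch below illustrates that calculation. The text does not give the underlying cell counts or the method that SPSS applied, so the counts used here are purely hypothetical and the Woolf interval is an assumption; the output is not expected to reproduce the published estimate.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (logit) confidence interval for the table

        [[a, b],    exposed:   event, no event
         [c, d]]    unexposed: event, no event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the logit method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# HYPOTHETICAL counts for illustration only (not data from this study):
# funded manuscripts: 20 accepted, 10 rejected;
# nonfunded manuscripts: 10 accepted, 20 rejected.
estimate, lower, upper = odds_ratio_with_ci(20, 10, 10, 20)
print(f"odds ratio = {estimate:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

A confidence interval that excludes 1.0, as the reported interval of 1.08 to 4.08 does, corresponds to a significant association at the two-tailed 0.05 level.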
Numerous prior studies of the published literature, including our own, have
identified associations between the receipt of commercial funding and positive
study
conclusions1,14-17,19,29-31.
The present investigation, in which we evaluated not the published literature
but all manuscripts on hip and knee arthroplasty submitted to a first-tier
orthopaedic journal, demonstrated that 74% of commercially funded studies
concluded with a positive outcome compared with 69% of nonfunded
manuscripts—a difference that was not significant with the numbers
available (p = 0.668). We also found that 71% (148) of 209 submitted
manuscripts and 70% (forty-five) of sixty-four accepted manuscripts concluded
with a report of a positive outcome; this difference, again, was not
significant (p = 0.53). In summary, the present study, which to our knowledge
is the first in orthopaedics to analyze submitted manuscripts rather than
being limited to the published literature, revealed that nonscientific
factors, including industry funding, were not associated with a positive study
outcome and that a positive study outcome was not itself a predictor of
acceptance following peer review. While evaluation of the published literature
(as has been the approach in most prior studies) might imply the presence of
bias, only the methodology used in the present report—comparison of the
group of manuscripts submitted with the group eventually accepted for
publication—is sufficiently robust to actually determine whether there
is systematic positive-outcome bias in the peer-review process. If bias is
present, it does not appear to be a dominant force affecting peer review.
It is important to note that, although the likelihood of publication did
not differ significantly between studies with a positive outcome and those
with a non-positive outcome, the latter appeared to be of better quality,
demonstrating significantly higher quality scores as determined with the
Sackett criteria as well as including significantly larger sample sizes. If,
in fact, studies with a non-positive outcome are better, but are not more
likely to be published, this may represent an insidious or low-level form of
bias against non-positive-outcome studies during peer review. There has been
limited experimental evidence of such bias during manuscript evaluation by
expert reviewers44,
but this is a topic that requires further study. If, indeed, reviewers tend to
give lower merit scores to research with a non-positive outcome, this has
implications for evidence-based orthopaedics, as it would lead to
overestimations of the sizes of apparent treatment effects in meta-analyses
and other, analogous forms of synthetic
literature23-28.
However, the magnitude of our finding in this regard was not as dramatic as
was predicted by prior studies that evaluated the published literature;
therefore, this subject warrants further study for confirmation of our
findings. On the basis of our results, though, it does seem reasonable to
caution peer reviewers to give non-positive findings due consideration during
the manuscript evaluation process.
The present report showed that two nonscientific factors, commercial
funding and the origin of the research within the United States, were
associated with a greater likelihood of a manuscript being accepted for
publication. This was the case even though United States-based and
commercially funded manuscripts were not significantly different from
non-United States-based and nonfunded research in terms of sample size, study
quality (on the Sackett scale), or level of evidence. This finding cannot be
considered to be sinister, however, as the peer reviewers were entirely
blinded to the funding source and country of origin throughout the manuscript
evaluation process. One can speculate that these findings may have been the
result of a familiarity effect, with a largely United States-based pool of
reviewers specializing in adult reconstruction tending to be more accepting of
more recognizable study formats (product testing), North American syntax, and
familiar verbiage. Perhaps equally likely is the influence of apparent
relevance; adult reconstruction is both technology- and implant-intensive, so
United States-based commercial studies, which may be more likely than
nonfunded research to test products and equipment in environments like those
familiar to the reviewers, may receive better reviews. Since there are at
least several potential explanations for our findings in this area, it seems
reasonable to continue to recommend that peer reviewers be particularly open
to alternative study designs and themes as well as to research that is not
written in typical United States-English vernacular.
The strengths of this report include the fact that it is the first study in
orthopaedics, to our knowledge, in which publication bias was formally and
properly searched for through a review of an ample cohort of submitted
manuscripts within a subspecialty rather than a review of the published
literature itself. This approach allowed us to comment on actual bias rather
than the "apparent" bias that has been typically assessed in prior
studies in which the methods were less
robust1,14-17,19,30,31.
The criteria that we employed to assess outcomes have been used by
investigators in other specialties as well as orthopaedics and have been
published in first-tier medical
journals2,5,9,11,32-34.
Additionally, the manuscripts were evaluated by reviewers who demonstrated a
high level of interobserver and intraobserver agreement.
Potential shortcomings of the approach used in this study include the fact
that commercial disclosures were self-reported and necessarily were
categorized dichotomously (present or absent). The disclosure statements, as
they were worded at the time of this study, did not allow further
quantification of this variable, and there is evidence that underreporting of
these conflicts may be
commonplace45-48.
Another limitation is the issue of the statistical power of the submitted
manuscripts. The authors of studies with a negative outcome frequently failed
to describe the amount of statistical power of the study, and often there was
insufficient information available in the manuscripts to allow post hoc
calculations; as a result, we relied on sample size as an imperfect, but
helpful, surrogate. Finally, the analyses performed in this report were
limited to manuscripts on adult hip and knee reconstruction submitted to
The Journal of Bone and Joint Surgery, American Volume. We do not
assume that our findings apply more broadly to other subspecialties or
journals. However, our preliminary
work14 suggested
that, if commercial funding had an impact on research outcomes, it would be
most readily detectable in the specialty of adult lower-extremity
reconstruction. Given the differences that were observed between sports
medicine and arthroplasty research in that earlier
study14, it is
quite conceivable that the findings in the present report cannot be
generalized to other subspecialties within or outside of orthopaedic
surgery.
It is worth noting that we sought only to evaluate the possibility of bias
in the peer-review process. There is evidence that, prior to manuscript
submission and peer review, nonscientific factors may exert an effect on study
completion, study outcome, and the decision to submit research for
publication. Specifically, restrictive covenants and gag clauses have delayed
or even prevented manuscript
submission49-51,
and there are numerous examples of commercially funded clinical trials being
halted by the sponsor before completion because trends in the data were
perceived as being unfavorable to the sponsoring corporate
interest52-54.
These sources of bias, if they exist, would influence the scientific process
before peer review begins and thus were not evaluated in the present report. A
curious finding in this report was the absence of significant differences in
study quality (Sackett score) between manuscripts accepted for publication and
those rejected. Potential explanations for this finding include the
possibility that the overall quality of submissions to The Journal is
so high that quality differences among them are too subtle to be discerned
with use of standard evidence-based-medicine scoring approaches or, more
likely, that reviewers make their decisions on the basis of other criteria,
such as timeliness, economic impact, or perceived relevance, in addition to
those recommended in standard evidence-based-medicine texts. In any event,
this finding is outside the scope of our primary study end points as defined
by our a priori hypotheses, and until or unless this finding is validated by
others it should not be accorded great weight.
It is encouraging that, at least for the denominator surveyed, the
peer-review process itself does not appear to suffer from severe
positive-outcome bias and the impact of commercial funding on research
outcomes is less pronounced than was previously suspected. We did identify
what may be subtle influences in both of these areas, but we observed nothing
that could be described as pervasive or sinister. While there is anecdotal
evidence of a commercial impact on research
outcomes1,14-17,19,29-31
and egregious cases of misconduct continue to be (and should be)
documented1,45-48,55-78,
the present report suggests that peer review is functioning better in this
regard than has been perceived.