Background: Since January 2003, all clinical scientific articles
published in the American volume of The Journal of Bone and Joint
Surgery (JBJS-A) have included a level-of-evidence rating. The aim of the
current study was to evaluate interobserver agreement among reviewers with
varying levels of epidemiology training in assigning levels of evidence to
these clinical studies.
Methods: Fifty-one consecutive clinical papers published in JBJS-A from
January 2003 through June 2003 were identified by a computerized search of the
tables of contents. Each paper was blinded so that reviewers received only the
title, the abstract (with the assigned level of evidence removed), and the
methods section. The papers were coded and arranged in random order in a
binder. Six surgeons graded each blinded paper for (1)
the type of study (therapeutic, prognostic, diagnostic test, or economic or
decision analysis), (2) the level of evidence (on a scale of I through V), and
(3) the subcategory within the particular level of evidence. Three surgeons
were members of the JBJS American Editorial Board, two were reviewers for
JBJS-A, and one was an active researcher not formally associated with JBJS-A.
The reviewers received no formal training in applying the classification
system, but each was given a detailed description of the classification system
used by JBJS-A. Intraclass
correlation coefficients with 95% confidence intervals were determined for the
reviewers' agreement regarding the type of study, level of evidence, and
subcategory within the level of evidence.
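Computing intraclass correlation coefficients requires each reviewer's grade to be coded numerically, and the abstract does not describe how that coding was done. A minimal sketch, assuming that levels I through V map to the ordinals 1 through 5 and that the four study types receive arbitrary integer codes, could represent one reviewer's three-part grade as follows. The names `Grade`, `STUDY_TYPES`, and `ROMAN_LEVELS`, and the integer subcategory index, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical coding scheme; the abstract does not say how grades were coded.
STUDY_TYPES = (
    "therapeutic",
    "prognostic",
    "diagnostic test",
    "economic or decision analysis",
)
ROMAN_LEVELS = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}


@dataclass(frozen=True)
class Grade:
    """One reviewer's three-part grade for one blinded paper (hypothetical)."""

    study_type: str   # one of STUDY_TYPES
    level: str        # Roman numeral "I" through "V"
    subcategory: int  # hypothetical index of the subcategory within the level

    def __post_init__(self) -> None:
        # Reject grades that fall outside the classification scheme.
        if self.study_type not in STUDY_TYPES:
            raise ValueError(f"unknown study type: {self.study_type!r}")
        if self.level not in ROMAN_LEVELS:
            raise ValueError(f"level must be I through V, got {self.level!r}")

    def level_code(self) -> int:
        """Ordinal code 1-5 for the level of evidence."""
        return ROMAN_LEVELS[self.level]

    def type_code(self) -> int:
        """Arbitrary integer code for the (unordered) study type."""
        return STUDY_TYPES.index(self.study_type)
```

For example, `Grade("therapeutic", "IV", 1).level_code()` returns 4. The ordering of `type_code` is arbitrary, whereas `level_code` preserves the ranking of the levels.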
Results: Most (69%) of the fifty-one included articles were therapeutic
studies, and 57% constituted Level-IV evidence. Among all reviewers, the
intraclass correlation coefficients for agreement on study type, level of
evidence, and subcategory within the level of evidence ranged from 0.61 to
0.75. Reviewers trained in epidemiology showed greater agreement across all
aspects of the classification system (intraclass correlation coefficients,
0.99 to 1.0) than did reviewers without epidemiology training (intraclass
correlation coefficients, 0.60 to 0.75).
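The abstract does not state which form of intraclass correlation coefficient was used. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in Shrout and Fleiss's notation), the coefficient can be computed from a papers-by-reviewers matrix of numeric grades. A subgroup comparison such as trained versus untrained reviewers corresponds to passing only the columns for the reviewers in that group. The function name `icc_2_1` is hypothetical, and the 95% confidence intervals reported in the study are not computed here.

```python
from statistics import fmean


def icc_2_1(ratings: list[list[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    ratings[i][j] is the numeric grade that reviewer j gave to paper i.
    """
    n = len(ratings)  # number of papers (subjects)
    k = len(ratings[0]) if ratings else 0  # number of reviewers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n >= 2 and k >= 2")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-paper mean square
    ms_cols = ss_cols / (k - 1)               # between-reviewer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when every rating is identical")
    return (ms_rows - ms_error) / denom


if __name__ == "__main__":
    # Illustrative 6-paper x 4-reviewer matrix, not data from this study.
    demo = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(demo), 2))  # -> 0.29
    # Agreement within a subgroup: keep only that group's reviewer columns.
    print(round(icc_2_1([row[:2] for row in demo]), 2))
```

Other ICC forms (for example, consistency rather than absolute agreement) would change the denominator, so results from this sketch are comparable to the reported values only if the assumed form matches the one the authors used.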
Conclusions: These findings suggest that both epidemiology-trained and
non-epidemiology-trained reviewers can apply the levels-of-evidence guide to
published studies with acceptable interobserver agreement. The validity of
this system remains a question for future research.