Critical appraisal of prior evidence, especially prior randomized trials evaluating issues similar to the one of interest, should act as a starting point for researchers who wish to conduct a new trial or study. Prior evidence informs the need for additional research and facilitates interpretation of the results of a new study. Ultimately, this due diligence should discourage researchers from conducting trials for which well-established conclusions already exist, preventing redundancy within the literature, avoiding inefficient use of research time and funding, and eliminating the potential harm to patients involved in unnecessary surgical trials.
The Consolidated Standards of Reporting Trials (CONSORT) statement, developed in 1996 to improve the quality of reporting of randomized controlled trials and subsequently adopted by most major health-care journals, was among the first guidelines to formally recognize the importance of acknowledging prior evidence. The CONSORT recommended that researchers “state general interpretation of the data in light of the totality of available evidence”1. Studies conducted after the introduction of the CONSORT have shown that most researchers still fail to take into account all of the available evidence when reporting the results of their trials2,3.
Fergusson et al.4 demonstrated that inadequate citation of previous randomized trials evaluating aprotinin, a serine protease inhibitor, led to a number of unnecessary trials being conducted long after the drug's efficacy had been well established. Aprotinin is currently not approved by the Food and Drug Administration for use in the United States.
To explore the extent to which previous studies have been cited in investigations of hip fracture treatment, we assessed the reference sections of studies addressing four related topics: (1) internal fixation compared with arthroplasty, (2) total hip arthroplasty compared with hemiarthroplasty, (3) sliding hip screws compared with other forms of fixation, and (4) the effect of the delay of surgical treatment on hip fracture patients. The present study focused on the issue of internal fixation compared with arthroplasty to demonstrate how a lack of reference to prior evidence can lead to redundancy within the literature. We used the three remaining topics (total hip arthroplasty compared with hemiarthroplasty, sliding hip screws compared with other forms of fixation, and the effect of surgical delay on hip fracture patients) to confirm our findings across a variety of research questions related to the treatment of hip fractures.
Eligibility Criteria
A study was eligible for inclusion in this review if it met any one of the following criteria: (1) it was a published randomized trial, quasi-randomized trial, or meta-analysis that compared internal fixation with arthroplasty (total hip arthroplasty or hemiarthroplasty) in elderly patients (more than sixty years old) who had a displaced femoral neck fracture, (2) it was a published randomized or quasi-randomized trial that compared total hip arthroplasty with hemiarthroplasty or that compared sliding hip screws (sliding screw plate, dynamic hip screws, compression hip screws) with other forms of fixation (nails, multiple bone screws, pins) in all patients with a displaced femoral neck fracture, or (3) it was a published prospective or retrospective cohort study that evaluated the effect of preoperative surgical delay on all patients who sustained a hip fracture. All published studies were eligible, regardless of language.
Identification of Studies
We conducted a literature search of the electronic databases MEDLINE and Embase for eligible studies published between 1947 and May 2010. We used specific search terms for each group (internal fixation, fracture fixation, hip arthroplasty, total hip arthroplasty, osteosynthesis, sliding hip screw, surgical delay, operative delay) and our target population (hip fracture or femoral neck fracture). The complete search strategy for the “internal fixation compared with arthroplasty” topic is shown in the Appendix. Additional strategies used to identify relevant studies included reviewing the reference lists of all eligible articles, using the “Related Articles” feature in PubMed for all eligible articles, and consultation with an expert (that is, an orthopaedic surgeon specializing in the treatment of hip fractures who is knowledgeable about the orthopaedic literature).
One reviewer (U.S.) assessed the titles and abstracts of all articles retrieved from the electronic literature search and identified all potential studies meeting our eligibility criteria. All eligible articles for the primary topic (“internal fixation compared with arthroplasty”) were included in the study to carry out a cumulative meta-analysis. For the remaining topics of “total hip arthroplasty compared with hemiarthroplasty,” “sliding hip screws compared with other forms of fixation,” and “effect of surgical delay on hip fracture patients,” we identified an additional thirty-one studies.
Assessment of Methodological Quality
Two reviewers (U.S. and N.S.) independently evaluated the methodological quality of each randomized controlled trial with use of a ten-item, 12-point quality-assessment scale from the Cochrane Database of Systematic Reviews5. This scale grades each study on key features of a randomized trial, such as allocation concealment, blinding of outcome assessors, and whether fewer than 5% of patients were lost to follow-up.
The methodological quality of all meta-analyses was independently assessed by two reviewers (U.S. and N.S.) with use of the Oxman-Guyatt index6,7. This index was developed to assess the scientific quality of review articles and consists of a ten-item checklist. The last item on the checklist provides an overall quality score for the research overview, ranging from 1 (extensive flaws) to 7 (minimal flaws). For the purposes of the present study, this overall score was used to represent the methodological quality of each meta-analysis.
All prospective and retrospective cohort studies were graded for methodological quality with use of the Newcastle-Ottawa Scale for cohort studies8. This 9-point scale was developed to assess the quality of nonrandomized studies and grades each cohort study on selection, comparability of study groups, and outcome (assessment, duration of follow-up, and loss to follow-up)8.
All methodological quality scores were converted to a percentage to standardize scores across study types (i.e., randomized controlled trials, cohort studies, and meta-analyses). On the basis of the consensus of methodological experts, a study with a quality rating of ≥75% was considered to be of high quality whereas one with a quality rating of <75% was deemed to be of low quality9. Discrepancies between the reviewers were resolved through discussion and, when necessary, the study methodology was reassessed until a consensus was reached. A complete version of the Oxman-Guyatt index is included in the Appendix.
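The standardization step can be illustrated with a short Python sketch. It assumes that each percentage is simply the raw score divided by the instrument's maximum (12 for the Cochrane-derived scale, 7 for the Oxman-Guyatt overall score, 9 for the Newcastle-Ottawa Scale); the text does not spell out the conversion, and the names `quality_percent` and `quality_category` are hypothetical.

```python
# Maximum score of each instrument, as described in the Methods.
# Assumption: percentage = raw score / maximum score (conversion not stated in the text).
MAX_SCORE = {
    "rct": 12,            # Cochrane-derived scale for randomized trials
    "meta_analysis": 7,   # Oxman-Guyatt overall score (1 to 7)
    "cohort": 9,          # Newcastle-Ottawa Scale
}
HIGH_QUALITY_CUTOFF = 75.0  # percent


def quality_percent(raw_score: float, design: str) -> float:
    """Express a raw score as a percentage of the instrument's maximum."""
    maximum = MAX_SCORE[design]
    if not 0 <= raw_score <= maximum:
        raise ValueError(f"{design} score must lie between 0 and {maximum}")
    return 100.0 * raw_score / maximum


def quality_category(raw_score: float, design: str) -> str:
    """Return 'high' for a rating of >= 75% and 'low' for < 75%."""
    return "high" if quality_percent(raw_score, design) >= HIGH_QUALITY_CUTOFF else "low"


print(quality_category(9, "rct"))            # 9/12 = 75%   -> high
print(quality_category(5, "meta_analysis"))  # 5/7 ≈ 71%    -> low
print(quality_category(7, "cohort"))         # 7/9 ≈ 78%    -> high
```

Dividing by each instrument's own maximum is what allows a single 75% cutoff to be applied across trials, cohort studies, and meta-analyses.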
Data Extraction
For each of the eligible studies, one of us (U.S.) extracted all of the relevant information, including (1) the name of the journal, (2) the year of publication, (3) the methodological quality rating, (4) the study design (primary study or meta-analysis), (5) the sample size, and (6) the direction of the conclusions made from the results, with positive results favoring arthroplasty over internal fixation, neutral results favoring neither, and negative results favoring internal fixation.
We also extracted data on the rate of revision (that is, a second operation performed after the initial procedure because of implant failure, dislocation, intractable pain, or infection), which was the primary outcome reported in each “internal fixation compared with arthroplasty” trial.
Assessing Reviewer Agreement
We used the intraclass correlation coefficient (ICC) to evaluate interobserver agreement on methodological quality scores. The ICC measures the extent of agreement between reviewers assessing continuous variables and yields values essentially equivalent to a weighted kappa with quadratic weights. We chose an a priori criterion of ICC ≥ 0.65 for adequate agreement, which represents substantial agreement between reviewers9.
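The form of the ICC used is not stated. As an illustration only, the Python sketch below computes the two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in Shrout and Fleiss's notation) from a studies-by-reviewers matrix of scores; the function name and the example scores are hypothetical and are not data from this review.

```python
from __future__ import annotations

from statistics import mean


def icc_2_1(ratings: list[list[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    `ratings` has one row per study and one column per reviewer."""
    n = len(ratings)     # number of studies (targets)
    k = len(ratings[0])  # number of reviewers (raters)
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between studies
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between reviewers
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical 12-point scores from two reviewers for five trials (not study data).
scores = [[9, 9], [6, 7], [11, 11], [4, 5], [8, 8]]
icc = icc_2_1(scores)
print(f"ICC = {icc:.2f}; meets the 0.65 criterion: {icc >= 0.65}")
```

The absolute-agreement form penalizes a reviewer who scores systematically higher than the other, which seems appropriate when the scores are later dichotomized at a fixed 75% cutoff; a consistency form (ICC(3,1)) would ignore such offsets.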
Data Analysis
The citation rate for each study was calculated as the number of previous studies actually cited in the article divided by the total number of previous studies that could potentially have been cited. Potentially citable studies were limited to those that evaluated the same clinical problem. For instance, the potentially citable studies for the “internal fixation compared with arthroplasty” topic were restricted to trials that assessed these two treatment modalities in patients with displaced femoral neck fractures. Authors were given one-, two-, three-, and five-year grace periods to cite the available literature before the publication date of their article. The overall citation rate for all eligible studies was also calculated.
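A minimal Python sketch of the citation-rate calculation follows. It assumes that a prior study counts as citable only if it was published at least `lag` years before the citing study, and that the overall rate is the unweighted mean of per-study rates; neither detail is specified in the text, and the `Study` records are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Study:
    """One eligible study on a single clinical question (hypothetical record)."""
    key: str
    year: int
    cites: set[str] = field(default_factory=set)  # keys of same-topic studies it references


def citation_rate(study: Study, topic: list[Study], lag: int = 1) -> float | None:
    """Fraction of earlier same-topic studies that `study` cites.

    Assumption: a prior study is citable only if it appeared at least
    `lag` years before `study` (the grace period)."""
    citable = {s.key for s in topic if s.key != study.key and s.year <= study.year - lag}
    if not citable:
        return None  # nothing was available to cite
    return len(citable & study.cites) / len(citable)


def overall_citation_rate(topic: list[Study], lag: int = 1) -> float:
    """Unweighted mean of per-study citation rates (an assumption)."""
    rates = [r for s in topic if (r := citation_rate(s, topic, lag)) is not None]
    return sum(rates) / len(rates)


# Hypothetical topic with four studies (not data from this review).
topic = [
    Study("A", 1990),
    Study("B", 1993, {"A"}),
    Study("C", 1996, {"B"}),
    Study("D", 1999, {"A", "B", "C"}),
]
for lag in (1, 2, 3, 5):  # the grace periods used in the sensitivity analysis
    print(lag, round(overall_citation_rate(topic, lag), 3))
```

Lengthening the grace period removes the most recent prior studies from each denominator, which is why the overall rate can only be recalculated, not simply rescaled, for each lag.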
The hit rate for each study was calculated as the number of times that the study was actually cited divided by the total number of times that it could potentially have been cited. A study was deemed “highly cited” if it appeared in ≥70% of the journal articles in which it could have been cited, “moderately cited” if it appeared in >50% but <70% of those articles, and “poorly cited” if it appeared in ≤50% of them.
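The hit rate is the same citation record read from the other direction: for each study, count the later studies that cite it. The Python sketch below assumes that a later study "could have cited" a study only if it appeared at least `lag` years afterward (the text does not state whether the grace period applies here); the `CORPUS` contents are hypothetical.

```python
from __future__ import annotations

# Hypothetical corpus: study key -> (publication year, keys of same-topic studies it cites).
CORPUS: dict[str, tuple[int, set[str]]] = {
    "A": (1990, set()),
    "B": (1993, {"A"}),
    "C": (1996, {"B"}),
    "D": (1999, {"A", "B", "C"}),
}


def hit_rate(target: str, corpus: dict[str, tuple[int, set[str]]], lag: int = 1) -> float | None:
    """Share of later same-topic studies that actually cite `target`.

    Assumption: a later study could have cited `target` only if it appeared
    at least `lag` years after it."""
    target_year = corpus[target][0]
    could_cite = [k for k, (year, _) in corpus.items() if k != target and year - lag >= target_year]
    if not could_cite:
        return None  # too recent to have been cited yet
    cited_by = sum(target in corpus[k][1] for k in could_cite)
    return cited_by / len(could_cite)


def citation_category(rate: float) -> str:
    """Apply the >=70%, >50% to <70%, and <=50% thresholds."""
    if rate >= 0.70:
        return "highly cited"
    if rate > 0.50:
        return "moderately cited"
    return "poorly cited"


for key in CORPUS:
    rate = hit_rate(key, CORPUS)
    print(key, "not yet citable" if rate is None else citation_category(rate))
```

Studies for which no later study exists return `None`, mirroring the most recent studies in each group that had no opportunity to be cited.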
We pooled the data from all of the “internal fixation compared with arthroplasty” studies and calculated risk ratios and associated 95% confidence intervals for our outcome of interest (the revision rate). We chose to examine the two-year revision rate because most studies reported revision surgery at two years of follow-up. We analyzed our results with use of a cumulative random-effects meta-analysis10. A cumulative meta-analysis was used to establish whether robust evidence favoring a particular treatment (internal fixation or arthroplasty) was available at an earlier point in time, thus rendering all subsequent trials redundant. We conducted the cumulative meta-analysis by updating the pooled risk ratio comparing arthroplasty with internal fixation each time the results of a new trial were published. Studies published in the same year were added consecutively in order of their publication date.
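The random-effects estimator is not named in the text. The Python sketch below assumes the DerSimonian-Laird method on the log risk ratio scale, adds 0.5 to every cell of a trial's 2×2 table when any cell is zero (a common continuity correction), and re-pools after each trial in the order supplied. The `Trial` records in the usage lines are hypothetical and are not the trial data analyzed here.

```python
import math
from dataclasses import dataclass


@dataclass
class Trial:
    label: str
    events_arth: int  # revisions in the arthroplasty group
    n_arth: int
    events_if: int    # revisions in the internal fixation group
    n_if: int


def log_rr(t: Trial) -> tuple[float, float]:
    """Log risk ratio (arthroplasty vs internal fixation) and its variance."""
    a, b = t.events_arth, t.n_arth - t.events_arth
    c, d = t.events_if, t.n_if - t.events_if
    if 0 in (a, b, c, d):  # continuity correction for zero cells
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    n1, n2 = a + b, c + d
    return math.log((a / n1) / (c / n2)), 1 / a - 1 / n1 + 1 / c - 1 / n2


def dersimonian_laird(effects: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Pooled risk ratio and 95% CI under a DerSimonian-Laird random-effects model."""
    y = [e for e, _ in effects]
    w = [1 / v for _, v in effects]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0  # between-trial variance
    w_star = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)


def cumulative_meta_analysis(trials: list[Trial]) -> list[tuple[str, float, float, float]]:
    """Re-pool after each trial is added; `trials` must be in publication order."""
    results = []
    for i in range(1, len(trials) + 1):
        rr, lo, hi = dersimonian_laird([log_rr(t) for t in trials[:i]])
        results.append((trials[i - 1].label, rr, lo, hi))
    return results


# Hypothetical trials, listed in publication order.
trials = [
    Trial("Trial 1 (hypothetical)", 4, 95, 20, 95),
    Trial("Trial 2 (hypothetical)", 6, 110, 25, 108),
    Trial("Trial 3 (hypothetical)", 0, 40, 9, 42),
]
for label, rr, lo, hi in cumulative_meta_analysis(trials):
    print(f"{label}: RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Each printed row corresponds to one point on a cumulative forest plot; a stable estimate is one whose confidence interval stays on the same side of 1 and changes little as further trials are added.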
The chi-squared test was used to assess the association of sample size, results, quality rating, study design, and journal of publication with the hit-rate category (highly versus moderately versus poorly cited).
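A Pearson chi-squared test of independence can be sketched in Python with the standard library alone. The p-value uses the standard recurrence for the chi-squared upper tail, Q(x | ν + 2) = Q(x | ν) + (x/2)^(ν/2) e^(−x/2) / Γ(ν/2 + 1). The example table uses the counts reported in the Hit Rate results; collapsing neutral and negative results into a single "not positive" column is our simplification for illustration and is not claimed to reproduce Table II.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared distribution with integer df."""
    # Start from Q(x | 1) = erfc(sqrt(x / 2)) for odd df, or Q(x | 0) = 0 for even df.
    q, nu = (math.erfc(math.sqrt(x / 2)), 1) if df % 2 else (0.0, 0)
    while nu < df:
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def chi2_independence(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-squared test of independence for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Rows: highly, moderately, poorly cited; columns: positive result, not positive.
counts = [[15, 0], [8, 1], [14, 17]]
stat, df, p = chi2_independence(counts)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

Two expected counts in this table fall below 5, so the chi-squared approximation is rough here; an exact test would be a reasonable check in that situation.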
Source of Funding
No funding source had a role in the preparation of this article or the decision to submit it for publication.
Literature Search
Our literature search identified 235 potentially relevant citations, from which a sample of sixty studies proved eligible11-67 and addressed the following issues related to the treatment of hip fractures: (1) internal fixation compared with arthroplasty (twenty-nine studies), (2) total hip arthroplasty compared with hemiarthroplasty (seven studies), (3) sliding hip screws compared with other forms of fixation (seven studies), and (4) the effect of surgical delay on hip fracture patients (seventeen studies). One trial was published as an abstract alone13 and another was published in a language other than English14.
Study Characteristics
Detailed characteristics of the selected studies are shown in a table in the Appendix. The sixty studies were published in twenty different journals, with nineteen (32%) of them published in The Journal of Bone and Joint Surgery (American or British volume). Forty studies (67%) had a positive result favoring arthroplasty, thirteen studies (22%) had a neutral result favoring neither arthroplasty nor internal fixation, and seven studies (12%) had a negative result favoring internal fixation. Of the sixty eligible studies, twenty-six (43%) had a high quality rating (≥75%) whereas the other thirty-four (57%) had a low quality rating (<75%). The agreement between reviewers assessing methodological quality was excellent for randomized controlled trials (intraclass correlation, 0.94; 95% confidence interval, 0.89 to 0.96), meta-analyses (intraclass correlation, 0.81; 95% confidence interval, 0.62 to 0.91), and cohort studies (intraclass correlation, 1.0).
Citation Rate for Studies Comparing Internal Fixation with Arthroplasty
In our primary analysis, only 53% of previous trials were cited by subsequent randomized trials comparing internal fixation with arthroplasty for the treatment of displaced femoral neck fractures, assuming a one-year lag in publication (Fig. 1).
A total of twelve reports12,13,15,19,21,24,25,28,30,32,35,37 (1650 patients) provided data on revision surgery at two years of follow-up. The cumulative meta-analysis demonstrated a clinically important result after the first trial of 190 patients (risk ratio, 0.23; 95% confidence interval, 0.12 to 0.44). By the fifth study, published in 2000, a stable estimate of at least a 60% reduction in the risk of revision with arthroplasty had been reached (Fig. 2). However, six trials were conducted after 2000, enrolling an additional 1118 patients, with no change in findings. Of these 1118 patients, 545 were managed with internal fixation and 261 underwent implant-related reoperations. Of the six trials, three had already begun enrolling patients in 2000, two did not indicate when enrollment began, and one began enrolling patients after 2000.
Citation Rates: Internal Fixation, Arthroplasty, and Surgical Delay
On the average, fewer than 48% of previously published studies evaluating the same research question were cited by subsequent studies, assuming a one-year lag in publication (Table I). In a sensitivity analysis that was conducted to determine the effect of a longer lag in publication, the overall citation rates improved marginally to 51%, 52%, and 56% after assuming a two-year, three-year, and five-year lag in publication, respectively.
Of note, across all comparisons, only five (8%) of the sixty studies had a citation rate of 100%. In addition, only two of the five meta-analyses assessed in our study had a citation rate of 100%. The highest citation rate was found among the trials comparing total hip arthroplasty with hemiarthroplasty (70%).
Hit Rate
Overall, fifteen studies were “highly cited,” nine were “moderately cited,” and thirty-one were “poorly cited.” The five remaining studies were the latest studies published within the four groups and thus did not yet have the opportunity to be cited. Only six of the fifty-five studies sampled had a hit rate of 100%, including only one of the five meta-analyses within the group of studies comparing internal fixation and arthroplasty.
Differences in sample size, quality rating, and study design did not significantly affect the hit rate (p > 0.05), whereas differences in results and journal of publication did (p < 0.05) (Table II). All fifteen (100%) of the “highly cited” articles and eight (89%) of the nine “moderately cited” articles had positive results (i.e., favored arthroplasty), whereas only fourteen (45%) of the thirty-one “poorly cited” studies had positive results. This pattern suggests a bias toward citing positive studies68.
Ideally, the bibliography of a new study should act as an index of all previous studies addressing similar questions. This assures the reader that the researchers consulted previous trials while designing their own trial in order to identify the problems and pitfalls associated with previous studies69. More importantly, it ensures that the authors of the new trial are looking to add to the current knowledge rather than to replicate it. When studies do not add value to the literature, they run the risk of being deemed unethical70. Unnecessary trials not only expose patients to interventions that have been proved to be inferior but also may lack scientific validity71.
The present study demonstrates that reference to prior evidence within the orthopaedic literature, specifically, studies related to the treatment of hip fractures, is lacking. In the four groups of studies that were examined, fewer than half of the studies that could have been cited by newer studies actually were cited. One possible explanation for this finding is that the investigators cited only studies that were available to them at the beginning of the trial; however, this should not have prevented them from referencing recently published studies in the Discussion section of their final report. Furthermore, our sensitivity analysis, which assumed two-, three-, and five-year lag times, showed that even with a longer grace period for citing recent studies, the overall citation rate remained only slightly above 50%.
Our cumulative meta-analysis indicates that the debate over whether internal fixation or arthroplasty is better for patients with displaced femoral neck fractures may in fact have been resolved as early as a decade ago. An overall citation rate of 46% for the seven studies published after the 2000 study by van Dortmont et al.24 suggests that the researchers may not have recognized a clinically important cumulative effect size from past trials.
Our results also suggest that authors are more likely to reference positive studies and studies published in higher-impact journals (journals whose articles are, on average, cited more frequently). For example, nine of the fifteen “highly cited” studies were published in The Journal of Bone and Joint Surgery (American or British volume).
Review of Relevant Literature
In 2005, Fergusson et al. used a cumulative meta-analysis to demonstrate how a lack of citation of previous work can lead to redundancy within the literature4. Their investigation revealed that, by the twelfth trial, aprotinin had already been shown to greatly reduce the need for perioperative transfusion, yet another forty-four randomized trials were performed over the next decade4. Our findings were similar to those of Fergusson et al. in that another decade of randomized trials was conducted even after it had been well established that arthroplasty was associated with a significantly lower risk of reoperation. These observations suggest that, although there is no definitive answer to how many trials are necessary before the results are deemed conclusive, researchers should take into account the cumulative effect size representing the collective results of all previous trials and should determine whether a clinically important difference has already been demonstrated before seeking approval for a new trial.
Gilbert et al. conducted a systematic review of observational studies evaluating the evidence for infant sleeping position and the incidence of sudden infant death syndrome72. Their cumulative meta-analysis showed that a clear association between front sleeping and sudden infant death syndrome had been demonstrated more than twenty-five years earlier72. Our study used a similar methodology to demonstrate poor citation practices within orthopaedics. To our knowledge, poor citation practices have not been demonstrated in the orthopaedic literature previously.
A study by Gøtzsche demonstrated that reference bias exists when researchers simply scan the reference lists of articles to locate relevant literature73. The data in Figure 1 suggest that reference bias may have played a role in the poor hit rate of a number of trials within our study. Our findings are similar to those of Fergusson et al., who also found that studies that were missed early often remained uncited in subsequent articles4. As Fergusson et al. suggested, this finding likely indicates that researchers are relying on the incomplete literature reviews performed by others4.
Strengths and Limitations
Our study had a number of strengths. To our knowledge, the use of a cumulative meta-analysis to assess the precision of evidence over time has not been reported previously in the orthopaedic literature. In addition, calculating the citation rate and the hit rate across studies addressing three additional issues in the treatment of hip fractures allowed us to confirm the findings from our primary question regarding internal fixation and arthroplasty.
The current investigation also had limitations. For illustrative purposes, our cumulative meta-analysis examined only two-year revision rates. However, other outcomes, such as mortality and function, also contribute to assessing the superiority of a particular intervention. A recent meta-analysis showed no differences in long-term mortality rates between arthroplasty and internal fixation and better overall function with arthroplasty38.
An alternative explanation for our findings is that new trials were conducted to improve on retrospective studies. Our data (see Appendix) show gross trends in this direction. Despite our efforts to conduct a comprehensive search for all randomized trials comparing internal fixation and arthroplasty, we may not have identified all relevant articles because of publication and selection bias. We attempted to limit these biases by searching two electronic databases (MEDLINE and Embase), the reference lists of all eligible articles, and the PubMed “Related Articles” feature, and by consulting an expert. Our funnel plot, while inconclusive for publication bias, did show that very few negative studies have been published.
Implications for Future Research
The results of the present study not only confirm the prevalence of incomplete citation practices within the medical literature but also illustrate their consequences, such as the continued exposure of patients to interventions that have proved to be less effective. A number of measures are in place to prevent unnecessary and unethical trials from taking place; among the most important is the institutional review board.
The institutional review board exists to ensure that research involving human participants is conducted in an ethical manner. To do this, the institutional review board must closely examine the background and rationale behind a proposed research idea. At this juncture, the institutional review board can potentially stop a trial before it begins if it suspects that the researcher is proposing a trial for which a well-established conclusion already exists. However, at most institutions, the members of the institutional review board may not have expertise in the topic of the submitted protocol, which limits their ability to adequately interrogate the literature. It remains unknown whether requiring investigators to submit a systematic review of relevant existing research, as part of the institutional review board process, would help to demonstrate that the proposed research is in fact necessary74.
Government funding agencies (e.g., the Canadian Institutes of Health Research) mandate that investigators proposing randomized trials provide strong evidence from a comprehensive review of the literature that their research question is relevant and timely. Authors also may use the National Institutes of Health's online directory of clinical trials (www.clinicaltrials.gov) to help ensure that they are not duplicating research.
Journals that publish the results of these trials are also charged with the responsibility of ensuring that they accept only scientific articles that are ethical and that add value to the current literature. The aforementioned CONSORT statement helps in this regard by providing journal editors and reviewers with a checklist to assess the overall quality of the trial1. A recently updated version of the CONSORT statement once again outlines the importance of presenting study results in the context of a systematic review within the Discussion section75. Despite the CONSORT statement, researchers continue to fail to cite systematic reviews when reporting the results of their trials2,3.
In order to improve poor citation practices in the future, postgraduate medical education needs to place more emphasis on obtaining fundamental research skills as a core competency during residency training.