We conducted a systematic review of meta-analyses in orthopaedics published in 2005 and 2008 and compared them with a previous systematic review7 of meta-analyses from 1984 to 1999.
Eligibility Criteria
In order to be included in our study, a meta-analysis met the following criteria: (1) the study was described as a "meta-analysis," or, if not, statistical pooling of the results was conducted; (2) primary studies included in the meta-analysis were directly relevant to the practice of orthopaedic surgery (that is, they had to involve subjects such as thromboembolism prevention, arthroplasty, the spine, trauma, pediatrics, sports medicine, and the upper extremity); and (3) studies were published in either 2005 or 2008.
Study Identification
We performed a computerized search of MEDLINE, EMBASE, and the Cochrane Database for Systematic Reviews separately for 2005 and 2008, including the following search terms: "meta-analysis" OR "meta-anal:" OR "systematic: review" OR "systematic: overview" AND "orthopedics" OR "musculoskeletal system" OR "fractures" OR "pediatrics" OR "spine" OR "hand injuries" OR "sports medicine." In addition, the bibliography of each meta-analysis was reviewed by two of us (B.G.D. and B.W.K.) for additional relevant studies. Finally, content experts (that is, those with an interest in meta-analyses) were asked to identify additional studies that may have been missed by our search strategy. The title of each article retrieved from our search was independently reviewed by two authors (J.A.K.A. and H.J.C. for 2005 studies and B.G.D. and B.W.K. for 2008 studies) for its relevance to orthopaedic practice. Any discrepancies were settled by consensus, or, if needed, by two senior authors (R.W.P. and M.B.). Whenever a meta-analysis appeared to be eligible by its title alone, the complete article was retrieved.
Assessment of Methodological Quality
Each eligible meta-analysis was independently reviewed by two of us (J.A.K.A. and H.J.C. for 2005 studies and B.G.D. and B.W.K. for 2008 studies) for methodological quality. The Oxman and Guyatt index was used to score the methodology of the meta-analyses9,10. This index is based on ten items, the last of which gives a final score to the overall quality of the review article, ranging from 1 to 7 points, with 7 points indicating that the study contains minimal flaws; 5 points, minor flaws; 3 points, major flaws; or 1 point, extensive flaws (see Appendix). This validated index, developed to assess the scientific quality of research overviews, contains criteria that are related to key tasks entailed in conducting a research overview, such as study identification and validation of studies. The assessors reviewed the instructions of the Oxman and Guyatt index (see Appendix) and clarified any issues with other reviewers and at least one individual with epidemiology training (M.B. and R.W.P.). Any discrepancies in scoring between the reviewers were resolved by consensus.
Assessment of Study Impact
To evaluate the impact of high-quality meta-analyses on clinical research, we assessed the number of articles that cited meta-analyses with minor to minimal flaws (a quality score of 5, 6, or 7) from 2008 with use of the Science Citation Index.
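As an illustration only, the score bands used in this study can be written as a small Python helper. The function names and return labels are hypothetical; the cut-offs follow the groupings reported in this paper (5 to 7 points, minor to minimal flaws; 1 to 4 points, major to extensive flaws).

```python
def flaw_band(score: int) -> str:
    """Map an overall Oxman and Guyatt score (1 to 7) to the band used in this paper."""
    if not 1 <= score <= 7:
        raise ValueError("overall score must be between 1 and 7")
    # 5, 6, or 7 points: minor to minimal flaws; 1 to 4 points: major to extensive flaws
    return "minor to minimal" if score >= 5 else "major to extensive"


def eligible_for_citation_analysis(score: int) -> bool:
    """Hypothetical helper: only reviews with minor to minimal flaws were assessed for citations."""
    return flaw_band(score) == "minor to minimal"


# Example: band every possible overall score.
bands = {score: flaw_band(score) for score in range(1, 8)}
```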
Data Abstraction
For each of the eligible meta-analyses, the relevant data were abstracted by one of us and were rechecked for accuracy by another. Specifically, we abstracted the following information: (1) the name of the journal, (2) the year of publication, (3) citation of a degree (MSc, PhD, or MPH) in epidemiology or biostatistics for any author, (4) the subspecialty or category (fracture treatment, degenerative disease, diagnostics, thrombosis, and miscellaneous), (5) the method of statistical pooling, (6) the design of the primary studies (randomized, quasi-randomized, observational, or mixed), and (7) the direction of the conclusion (positive if the findings of the meta-analysis were significant or negative if no significant difference between variables was reported).
We used similar data from a previous review7 of meta-analyses from 1984 to 1999, published in 2001, to compare the quality and characteristics of meta-analyses over time.
When comparing the proportions of meta-analyses that used different pooling methods over time, we excluded the meta-analyses from 1984 to 1999 that used simple addition, since meta-analyses that pooled by simple addition were not included in our study of meta-analyses from 2005 and 2008.
Assessment of Reviewer Agreement
A kappa statistic was calculated to determine the concordance of reviewers. For variables with more than two categories, we used weighted kappa with quadratic weights, which yields values identical to intraclass correlation coefficients. We chose an a priori criterion of kappa = 0.65 or greater for adequate agreement11.
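A minimal sketch of a quadratically weighted kappa for two reviewers is given below. It is not the software used in this study; the function name and the example ratings are hypothetical, and only the quadratic weighting and the 0.65 threshold come from the text.

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa with quadratic disagreement weights."""
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length rating lists and at least two categories")
    index = {category: i for i, category in enumerate(categories)}

    # Cross-tabulate the paired ratings.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    observed_dis = 0.0
    expected_dis = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            observed_dis += weight * observed[i][j] / n
            expected_dis += weight * row_totals[i] * col_totals[j] / n ** 2
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1 - observed_dis / expected_dis


# Hypothetical overall quality scores (1 to 7) from two reviewers.
scores_a = [1, 3, 5, 7]
scores_b = [1, 3, 5, 5]
kappa = quadratic_weighted_kappa(scores_a, scores_b, categories=list(range(1, 8)))
adequate_agreement = kappa >= 0.65  # a priori criterion stated in the text
```

Because the weights grow with the square of the distance between categories, a disagreement of 7 versus 5 points is penalized far less than a disagreement of 7 versus 1.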
Data Analysis
Prior to analyzing the data, we developed hypotheses regarding the quality and quantity of meta-analyses over time. Specifically, we hypothesized that the number of meta-analyses would have increased since 1999, and that the quality of meta-analyses from 2005 and 2008 would be higher than that of those published prior to 2000. In addition, we hypothesized that meta-analyses with lower quality scores would be more likely to produce a positive result. Finally, we hypothesized that Cochrane meta-analyses would have higher overall quality scores compared with non-Cochrane meta-analyses. The extent to which the meta-analyses fulfilled each item on the Oxman and Guyatt index was compared with use of the chi-square test. The distributions of quality score (1 to 7 points), the fulfillment of items 1 to 9 from the Oxman and Guyatt index, the proportions of meta-analyses with methodological flaws, the designs of the primary studies, and the methods of pooling across different time periods were compared with the chi-square test. Moreover, the chi-square test was used to evaluate the relationships between the overall quality score and the results of the meta-analyses. The mean quality scores of the three time periods (1984 to 1999, 2005, and 2008) and of the five categories of meta-analyses (fracture treatment, treatment of degenerative disease of the spine or joints, thrombosis prevention, evaluation of a diagnostic test, and miscellaneous) were compared with analysis of variance. An independent t test was used to compare the mean quality scores between meta-analyses published in Cochrane and meta-analyses published in other journals. For all statistical analyses, a p value of <0.05 was considered significant and all tests were two-tailed.
Source of Funding
No external funding source was used for this study.
Study Identification
Three thousand two hundred and fifty potentially relevant citations from 2005 and 3484 citations from 2008 were identified. The application of the criteria for eligibility eliminated 3009 and 3397 studies that were not related to orthopaedics from 2005 and 2008, respectively; 147 and forty-three studies, respectively, that were not formal meta-analyses; and forty-nine studies from 2005 that were duplicate articles. Bibliography searches and content expert consultation did not result in additional studies. Thus, forty-five meta-analyses from 2005 and forty-four meta-analyses from 2008 met all of the inclusion criteria (Fig. 1) (see Appendix). Agreement between reviewers with respect to the eligibility of the meta-analyses was substantial (kappa = 0.81 and 0.77 for 2005 and 2008, respectively).
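The study flow above can be checked with simple arithmetic. The sketch below uses only the counts reported in the text; the dictionary layout and names are hypothetical, and the zero for 2008 duplicates is an assumption made because no duplicates are reported for that year.

```python
# Citation counts as reported in the text for each publication year.
flow = {
    2005: {"identified": 3250, "not_orthopaedic": 3009, "not_meta_analysis": 147, "duplicates": 49},
    # Assumption: no duplicate articles in 2008, since none are reported.
    2008: {"identified": 3484, "not_orthopaedic": 3397, "not_meta_analysis": 43, "duplicates": 0},
}


def included(counts):
    """Citations remaining after all reported exclusions are subtracted."""
    excluded = sum(value for key, value in counts.items() if key != "identified")
    return counts["identified"] - excluded


included_by_year = {year: included(counts) for year, counts in flow.items()}
total_included = sum(included_by_year.values())
```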
Characteristics of the Meta-Analyses
The eighty-nine meta-analyses from 2005 and 2008 were published in thirty-five different journals (Table I). A greater number of meta-analyses were published in each of 2005 (forty-five meta-analyses) and 2008 (forty-four meta-analyses) than during the entire sixteen-year period from 1984 to 1999 covered by the original review (forty meta-analyses). Across both years (2005 and 2008), thirty-two studies (36%) included an author with a higher degree in research (MPH, MSc, or PhD) or an author affiliated with a department of statistics, public health, or epidemiology. Similar to the review of studies from the period 1984 to 1999, the largest proportion of meta-analyses pooled data from randomized trials (42% in 2005 and 43% in 2008), whereas significantly fewer meta-analyses included a mix of randomized and observational studies (11% in 2005 and 21% in 2008) compared with those published in 1984 to 1999 (40%) (p = 0.006) (Table II).
Compared with the meta-analyses from 1984 to 1999, more meta-analyses in the present review were published in the Cochrane Database of Systematic Reviews (twenty-five in 2005 and six in 2008) and fewer in internal medicine journals (Table I). Four specific categories of meta-analyses were identified: (1) fracture treatment (twelve reviews from 2005 and ten reviews from 2008), (2) treatment of degenerative joint disease (fifteen and eleven reviews, respectively), (3) prevention of deep venous thrombosis in orthopaedic patients (one and two reviews, respectively), and (4) evaluation of a diagnostic test (zero and nine reviews, respectively).
Methods of Statistical Pooling
The pooling methods used by meta-analyses differed significantly among the periods 1984 to 1999, 2005, and 2008, with random-effects models and a combination of both random and fixed-effects models becoming more popular toward 2008 (p = 0.001 for both analyses) (Table III).
Overall Scientific Quality
The level of agreement between reviewers in assessing the quality of the meta-analyses was substantial (kappa = 0.75 [95% confidence interval, 0.59 to 0.86] for 2005 and 0.71 [95% confidence interval, 0.44 to 0.85] for 2008). The mean score (and standard error of the mean) for the overall quality was 5.2 ± 0.26 points for the forty-five meta-analyses from 2005 and 4.6 ± 0.30 points for the forty-four meta-analyses from 2008. A significantly lower proportion of meta-analyses from 2005 (56%) and 2008 (68%) had methodological flaws (a score of ≤6 points) compared with meta-analyses from 1984 to 1999 (88%) (p = 0.006) (Fig. 2). Of all meta-analyses with methodological flaws, the proportion having major-to-extensive flaws (a score of 1, 2, 3, or 4 points) remained approximately the same from the period 1984 to 1999 (thirteen studies; 37%) to 2005 (eight studies; 32%) and 2008 (thirteen studies; 43%) (p = 0.64) (Fig. 2). Also, fewer meta-analyses from 2008 were given the lowest possible score compared with those from 1984 to 1999 (8% and 13%, respectively)7. Significantly more studies achieved the highest possible Oxman and Guyatt score in 2008 compared with those published before 2000 (32% and 12%, respectively; p = 0.006)7. Cochrane meta-analyses had higher overall quality scores compared with meta-analyses published elsewhere (5.9 and 4.4, respectively; p < 0.001), and were less likely to have major to extensive flaws compared with non-Cochrane reviews (3% and 35%, respectively; p = 0.001)7.
The quality of meta-analyses in the degenerative disease category improved significantly between the period 1984 to 1999 and 2008 (p = 0.018). While the number of meta-analyses published from 1984 to 2008 increased, the mean quality score did not change significantly over time (p = 0.067) (Fig. 3). No significant association was found between the quality of the meta-analyses and the direction of the conclusions in any period (1984 to 1999, 2005, or 2008) (Table IV). That is, positive or negative findings of a meta-analysis were not associated with its quality score.
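The comparison of the proportions of meta-analyses with methodological flaws across the three periods can be sketched as a Pearson chi-square test on a 2 × 3 table. This is an illustration, not the original analysis: the counts are reconstructed from the reported percentages and totals (an assumption), and the function name is hypothetical.

```python
import math

# Columns: 1984 to 1999, 2005, 2008.
# Assumption: counts reconstructed from the reported figures
# (88% of 40, 56% of 45, and 68% of 44 meta-analyses).
with_flaws = [35, 25, 30]
totals = [40, 45, 44]


def chi_square_2xk(successes, group_totals):
    """Pearson chi-square statistic for a 2 x k table of counts."""
    failures = [n - s for s, n in zip(successes, group_totals)]
    grand_total = sum(group_totals)
    statistic = 0.0
    for row in (successes, failures):
        row_total = sum(row)
        for observed, col_total in zip(row, group_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


chi2 = chi_square_2xk(with_flaws, totals)
degrees_of_freedom = len(totals) - 1
# With exactly 2 degrees of freedom the upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi2 / 2)
```

With these reconstructed counts the statistic is about 10.3 and the p value is approximately 0.006, in line with the reported value.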
Individual Item Scores (Items 1 Through 9)
Interobserver agreement among reviewers across questions in the Oxman and Guyatt index (items 1 through 9) ranged from 0.48 to 0.91 in 2005 and from 0.65 to 0.80 in 2008. The proportion of meta-analyses that fulfilled each question ranged from 47% to 100% in 2005 and from 61% to 96% in 2008 (see Appendix). Over four times as many meta-analyses in 2008 satisfied all nine categories in the Oxman and Guyatt quality index as did those from the period 1984 to 1999 (44% and 10%, respectively)7. With the exception of item 4 (bias in the selection of studies avoided), a significantly higher proportion of meta-analyses from 2005 fulfilled the Oxman and Guyatt criteria compared with those from 1984 to 1999. However, we found a significant decline in the proportion of meta-analyses from 2005 to 2008 that fulfilled items 1, 2, 6, 8, and 9 of the criteria.
Meta-analyses from 2008 with positive conclusions were less likely to use and report appropriate criteria for the validity assessment than were meta-analyses reporting negative results (p = 0.04 for item 5 and p = 0.01 for item 6) (see Appendix).
Impact of High-Quality Meta-Analyses
Of nineteen meta-analyses from 2008 with minor to minimal flaws, five were cited by a mean of 2.8 investigations.
We conducted a systematic review of the literature to identify recently published meta-analyses that focused on issues relevant to orthopaedic surgery and compared them with meta-analyses published prior to 2000. Our findings suggest a dramatic increase in the number of orthopaedic-related meta-analyses since 1984, with fewer methodological flaws in those published in 2005 and 2008.
Our review is strengthened by a focused primary question, strict eligibility criteria, a systematic approach including a comprehensive search strategy, duplicate assessment of methodological quality, assessment of agreement among reviewers, and use of a validated measure to assess the methodological rigor of the meta-analyses.
Our review has limitations. Our findings may not be representative of other surgical or medical subspecialties. We were reassured, however, by the variety of journals publishing orthopaedic-related meta-analyses, including those in the fields of orthopaedic surgery, radiology, emergency medicine, rheumatology, and internal medicine, suggesting our findings may, indeed, be generalizable beyond only orthopaedic journals. While the Oxman and Guyatt index measures the methodological rigor of a meta-analysis, it provides little insight into the importance or relevance of the meta-analysis itself. In addition, it is based on what is actually reported in the published paper. Lack of reporting of important methodological safeguards limits the extent to which the index accurately reflects the overall quality of a meta-analysis. The assumption that lack of reporting represents lack of presence of a safeguard has been challenged by Devereaux et al.12. After contacting the authors, they found that methodological safeguards had been used frequently in randomized controlled trials that initially had failed to report them.
Currently, there is no universal acceptance regarding the optimal instrument for assessing meta-analytic quality. As opposed to the validated and reliable AMSTAR instrument for the assessment of multiple systematic reviews13, the Oxman and Guyatt instrument does not include aspects such as the assessment of publication bias and whether the authors had conflicts of interest. In addition, the Quality of Reporting of Meta-analyses (QUORUM) checklist14 has been developed and contains a much broader spectrum of quality aspects compared with the Oxman and Guyatt instrument. For example, the QUORUM checklist also examines whether heterogeneity was evaluated and whether a sensitivity analysis was performed. However, the Oxman and Guyatt index is associated with a high interobserver agreement15 and may be a more feasible instrument as the number of items is limited.
While a comprehensive search of the literature was performed, there is a possibility that potentially relevant meta-analyses were omitted for the following reasons: (1) only meta-analyses published in the English-language literature were identified, (2) only published meta-analyses (or those accepted for publication) were retrieved, and (3) there might be a publication bias against meta-analyses that do not have significant findings. However, the meta-analyses in our study likely are a representative sample of the total number of meta-analyses in this field that would be readily accessible to most orthopaedic surgeons.
An additional limitation is that we retrieved meta-analyses only from 2005 and 2008, thereby not evaluating those published in 2006 and 2007. Our decision to include two time points rather than simply 2008 strengthens our inferences about improved quality and volume over the past four years. Given the nonsignificant differences in findings between 2005 and 2008, we are confident that including 2006 and 2007 meta-analyses would have yielded little additional information.
Strengths of Inference from Meta-Analysis
A meta-analysis is a systematic review in which statistical analyses are used to combine the results of several independent clinical trials2,16. By pooling the evidence, a meta-analysis has the ability to improve the power lacking in small clinical studies17. Other benefits include the ability to detect publication bias and sources of heterogeneity across studies18. The increased number of papers published about meta-analyses in the past ten years reflects their increased influence on medical research2.
The validity of meta-analyses depends on the quality of the primary studies and the extent to which meta-analyses adhere to methodological guidelines. The largest proportion of meta-analyses in our review pooled data from randomized trials (42% in 2005 and 43% in 2008). As the design of a study is not decisive of its quality (e.g., a randomized trial may be of poor quality, and an observational trial may be of high quality), our study did not take into account the quality of primary studies in the assessment of the quality of meta-analyses, and our findings are based solely on methodological aspects. Among eighty-nine meta-analyses reviewed in 2005 and 2008, we identified thirty-four (38%) of high methodological rigor.
Meta-analyses of low quality have been associated with an overestimation of treatment effect19. Dixon et al. reviewed fifty-one meta-analyses and identified a greater magnitude of treatment effect as meta-analysis quality decreased20. Our findings, however, failed to identify an association between the conclusions (positive or negative) of the meta-analyses and the quality score. Nevertheless, we observed a trend toward positive conclusions in meta-analyses with lower quality scores, consistent with previous reviews20,21. Other studies have evaluated the observed effects found in observational primary studies compared with randomized controlled trials and found no relation between the quality of studies and the estimates of treatment effects22,23. These studies, however, related treatment effect size to the quality of primary studies rather than to the quality of meta-analyses.
A meta-analysis distinguishes itself from narrative reviews by pooling the results across studies. Pooling data from different studies is appropriate when the results of the primary studies are sufficiently homogeneous (widely overlapping confidence intervals, similar point estimates of treatment effect, and a nonsignificant statistical test for heterogeneity)24. A previous review of meta-analyses found that more than half of them used simple addition as a pooling method7. Given the potential for misleading conclusions when results from primary studies are simply averaged25, we excluded systematic reviews that used simple addition as a pooling method from this study. When meta-analyses that used simple addition in the period from 1984 to 1999 were excluded, we observed a significant increase in the use of the random-effects model over time.
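To make the distinction between pooling models concrete, the sketch below contrasts inverse-variance fixed-effect pooling with a DerSimonian-Laird random-effects estimate and reports Cochran's Q as the heterogeneity statistic. It is a generic textbook illustration rather than the method of any reviewed meta-analysis; the function name and the effect sizes are hypothetical.

```python
def pool_effects(effects, variances):
    """Return fixed-effect estimate, random-effects estimate, Cochran's Q, and tau^2."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect model: weight each study by the inverse of its variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_weight

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # DerSimonian-Laird estimate of between-study variance, truncated at zero.
    c = total_weight - sum(w ** 2 for w in weights) / total_weight
    tau2 = max(0.0, (q - df) / c)

    # Random-effects model: add tau^2 to each study's variance before weighting.
    re_weights = [1.0 / (v + tau2) for v in variances]
    random_effect = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return fixed, random_effect, q, tau2


# Hypothetical log odds ratios and their variances from three primary studies.
fixed, random_effect, q, tau2 = pool_effects([0.10, 0.45, 0.80], [0.04, 0.02, 0.05])
```

When the studies are homogeneous, tau^2 is estimated as zero and the two models coincide; as heterogeneity grows, the random-effects weights become more equal across studies.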
Quantity and Scientific Quality of Meta-Analyses
The number of meta-analyses related to orthopaedic surgery increased dramatically in the last decade. Our findings suggest a fivefold increase in meta-analyses from 1999 to 2008.
We were reassured that the increase in numbers was accompanied by an improvement in methodological rigor. Over four times as many meta-analyses in 2008 satisfied all nine categories in the Oxman and Guyatt quality index as did those from the period 1984 to 1999 (44% and 10%, respectively)7. Also, fewer meta-analyses from 2008 were given the lowest possible score compared with meta-analyses from 1984 to 1999 (8% and 13%, respectively)7. Significantly more studies achieved the highest possible Oxman and Guyatt score in 2008 compared with those published before 2000 (32% and 12%, respectively; p = 0.006)7.
Similar to the findings of a recent review26 of critical-care meta-analyses published from 1994 to 2001, we found that Cochrane meta-analyses have higher overall quality scores compared with meta-analyses published in other journals and that fewer Cochrane meta-analyses have major to extensive flaws. Our findings are also consistent with other reviews noting a higher quality of reporting in Cochrane reviews than in non-Cochrane reviews27-30. Since Cochrane reviews are of higher quality, our main study findings of improved methodological quality over time may partly be due to the increased number of Cochrane reviews in 2005 and 2008 compared with the period 1984 to 1999.
Meta-analyses in the field of orthopaedic surgery performed favorably compared with those published in other subspecialties. We found mean quality scores of 4.6 points (maximum, 7 points) in 2008. Comparatively, meta-analyses related to general surgery20, critical care31, otolaryngology32, and anxiety disorders33, published from 1994 to 2007, were reported to have mean quality scores ranging from 3.3 to 4.0 points (maximum, 7 points). Similar to our findings, Delaney et al.28 also found that the number of published meta-analyses pertaining to critical care increased from 1994 to 2004.
Value of High-Quality Meta-Analyses
Because a meta-analysis accounts for most if not all clinical research studies on a given question, its results are highly valued and can be used to effect a change in clinical practice. Ideally, high-quality meta-analyses would have a correspondingly major impact on clinical research, followed by adoption of their findings into guidelines and a change in clinical practice. One way to measure the impact and influence of scientific studies is through the Science Citation Index. Of eighteen high-quality meta-analyses from 2008, only five were cited. This may reflect the fact that high-quality meta-analyses in orthopaedics are not always recognized as valuable by clinical investigators.
Implications of Our Findings
Although meta-analyses in orthopaedic surgery have improved in number and methodological quality over the past twenty years, there is considerable room for improvement. In 2005 and 2008, respectively, 18% and 30% of the meta-analyses continued to have major to extensive flaws in their methodology, suggesting an ongoing need for education and training in the area. Given the increased number of published meta-analyses and the fact that the majority of meta-analyses still contain methodological flaws, one might become concerned about this so-called overuse of poorly conducted meta-analyses in orthopaedics. However, this in no way decreases the need to continue to produce high-quality meta-analyses that may assist the orthopaedic surgeon in making valid decisions in clinical practice. Journal editors and readers should consider routine checklists such as the Oxman and Guyatt index as an easy, reliable screen for methodological quality during the peer-review process to ensure that meta-analyses meet scientific methodology requirements for publication.
The value of meta-analysis in our field can only be realized with high-quality methods addressing important clinical questions. The extent to which randomized trials continue to gain popularity in orthopaedics as the primary design to study a therapy is likely to fuel the ongoing rise of meta-analysis as a critical methodology in the practice of evidence-based orthopaedics.