A meta-analysis is a comprehensive systematic review in which statistical analyses are used to combine the results of two or more homogeneous clinical trials1-4. In the paradigm of evidence-based orthopaedics, meta-analysis can be an invaluable tool to increase study power and improve precision when study results are combined appropriately5. Evidence from a meta-analysis, however, is highly dependent on the quality of the primary studies included and on the overall methodological rigor6. Meta-analyses have impacted clinical practice, paved the way for clinical practice guidelines and health policies, and guided subsequent research6,7.
The extent to which meta-analyses in joint arthroplasty, one of the most commonly performed elective procedures in orthopaedics, are of high quality remains largely unknown. Previous reviews have examined orthopaedic meta-analyses in general, with little specific focus on joint arthroplasty (involving the hip, knee, ankle, shoulder, elbow, wrist, and metacarpophalangeal joints)5,8. We are aware of no study that has focused on the numerous subcategories within arthroplasty, whether defined by topic (pharmacology [thromboprophylaxis, blood conservation, antibiotic treatment, and bisphosphonate treatment], anesthesia, surgical outcome, radiology, rehabilitation, and education) or by the joint involved. Thus, it is difficult to draw precise conclusions about the quality of reporting in the arthroplasty literature.
The purpose of our study was to assess the scientific rigor of meta-analyses involving arthroplasty, the overall quality of reporting, and the extent to which the results of these meta-analyses were perceived to be important to clinical practice.
Our systematic review was conducted with adherence to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) reporting guidelines for systematic reviews, as outlined below9.
Search Strategy and Eligibility Criteria
We aimed to identify all published meta-analyses in the field of arthroplasty, including those involving the hip, knee, ankle, shoulder, elbow, wrist, and metacarpophalangeal joints. A comprehensive electronic search of the current literature up to November 2009 was performed with use of the MEDLINE, EMBASE, Cochrane, and CINAHL databases. The key words used for the search strategy were “arthroplasty OR total knee OR total hip OR replacement knee OR replacement hip OR total shoulder OR replacement shoulder OR total elbow OR replacement elbow OR total wrist OR replacement wrist OR total ankle OR replacement ankle OR metacarpal OR replacement metacarpal” AND “metaanalysis OR meta-analysis OR meta-analyses OR meta.”
In order to be included in the study, a meta-analysis was required to meet the following criteria: (1) the study was described as a “meta-analysis,” or, if not, statistical pooling of the results was conducted; (2) the study was directly related to arthroplasty surgery; and (3) the full text was available in English. A study was excluded if it was an abstract, a letter to the editor, or a subset analysis of another meta-analysis.
Study Identification
Two reviewers (C.V. and S.B.) independently scanned titles and abstracts for potential relevance. The reviewers were blinded to authors, geographic location, and hospital affiliation. Disagreements between the reviewers were resolved by consensus. If a disagreement persisted, a third reviewer (R.S.) discussed the inclusion and exclusion criteria with the reviewers, and the criteria were applied until consensus was reached. The full text of each potentially relevant article was then obtained for further evaluation.
Data Extraction and Assessment of Methodological Quality
One author (R.S.) extracted demographic data from each article, including the title, authors' hospital affiliation, journal in which the study was published, year of publication, number of primary studies included, total number of cases included, type of study, name and category of intervention, region of the body, financial support, design of the primary study, and rationale and method of statistical pooling. These data were then cross-referenced by a second reviewer. Two authors (C.V. and S.B.) independently reviewed all eligible meta-analyses. Scoring for the quality of reporting was performed with use of the guidelines in the PRISMA Statement9, and scoring for methodological quality was performed with use of the Oxman and Guyatt index10,11.
The Oxman and Guyatt index (see Appendix) is based on ten items, with the tenth item summarizing the other nine items in a final score that reflects the overall quality of the article10,11. The score can range from 1 to 7 points, with 7 indicating that the study contains minimal flaws; 5 or 6 indicating minor flaws; 3 or 4 indicating major flaws; and 1 or 2 indicating extensive flaws. This validated index focuses on assessing the scientific quality of the research by highlighting tasks that are necessary in performing a well-conducted meta-analysis.
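The mapping from the overall score to a flaw category can be sketched in a few lines of Python. This is only an illustration of the cutoffs stated above; the function name `flaw_category` is hypothetical and is not part of the index or of the study's tooling.

```python
def flaw_category(score: int) -> str:
    """Map an overall Oxman and Guyatt score (1 to 7) to its flaw category.

    Cutoffs follow the text: 7 = minimal, 5-6 = minor,
    3-4 = major, 1-2 = extensive flaws.
    """
    if not 1 <= score <= 7:
        raise ValueError("overall score must be between 1 and 7")
    if score == 7:
        return "minimal"
    if score >= 5:
        return "minor"
    if score >= 3:
        return "major"
    return "extensive"


# Show the category assigned to every possible overall score.
for s in range(1, 8):
    print(s, flaw_category(s))
```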
The PRISMA Statement is an evidence-based minimum set of items needed for high-quality reporting of systematic reviews and meta-analyses9. Its aim is to help authors to improve reporting. The PRISMA Statement consists of a twenty-seven-item checklist (arranged in seven main categories) and a four-phase flow diagram. It is an update and expansion of the QUOROM (Quality of Reporting of Meta-analyses) Statement. The assessors reviewed the instructions accompanying both the Oxman and Guyatt index and the PRISMA Statement. Each item in each scoring tool was then discussed, and any differences in understanding of the scoring process for an item were reviewed with a third independent investigator (R.S.) and resolved by consensus.
Assessment of Clinical Relevance
Three well-established, acclaimed, fellowship-trained joint arthroplasty surgeons independently reviewed the meta-analyses and assessed the extent to which they believed the findings to be important for clinical practice. They were blinded to authors, associated hospitals, year of publication, and journal of publication. Each meta-analysis was rated as (1) not clinically important (and expected to have little impact on clinical practice), (2) potentially clinically important (and possibly having some impact on clinical practice), (3) clinically important (and expected to have some impact on practice), or (4) clinically very important (and expected to have a large impact on practice).
Data Analysis
Agreement between reviewers and between assessors was calculated with use of the kappa statistic. If a variable had more than two possible values, quadratic weighting was used to yield a kappa value identical to the intraclass correlation coefficient. A kappa value of >0.65 was considered to represent adequate interrater concordance5.
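For readers unfamiliar with quadratic weighting, the following Python sketch computes Cohen's weighted kappa with squared-distance disagreement weights. It is a minimal illustration under stated assumptions, not the software used in the study (which is not specified): the function name `quadratic_weighted_kappa` and the example ratings are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa with quadratic disagreement weights.

    rater_a, rater_b: equal-length sequences of ratings.
    categories: ordered list of all possible rating values.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    obs_dis = exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic weight
            obs_dis += w * observed[i][j]
            exp_dis += w * row[i] * col[j]
    if exp_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - obs_dis / exp_dis


# Hypothetical Oxman and Guyatt scores from two reviewers (not study data).
reviewer_1 = [7, 5, 4, 3, 6, 2, 5, 4]
reviewer_2 = [6, 5, 4, 4, 6, 1, 5, 3]
kappa = quadratic_weighted_kappa(reviewer_1, reviewer_2, list(range(1, 8)))
print(f"quadratic weighted kappa = {kappa:.2f}")
```

Under the criterion stated above, a value computed this way would be compared against the 0.65 threshold for adequate interrater concordance.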
The distribution of Oxman and Guyatt scores (which measure scientific rigor and have a maximum score of 7 points) was summarized as the mean and the standard deviation. We transformed the PRISMA checklist to provide an aggregate score with a possible range of 0 to 28 points; for each of the twenty-seven items on the checklist and for the flow chart, the study was given 1 point if the item was present within the reporting or 0 if it was absent. The overall percentage of items that were present was then calculated from this 28-point scoring scale. We chose an a priori criterion of >85% present to denote high quality, 75% to 85% to denote moderate quality, and <75% to denote poor quality of reporting.
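The aggregate PRISMA score and the a priori quality thresholds described above can be expressed as a short Python sketch. The function names `prisma_percentage` and `reporting_quality` are hypothetical and are used only for illustration; the 28 entries are the twenty-seven checklist items plus the flow chart.

```python
def prisma_percentage(items_present):
    """Percentage of the 28 scored items (27 checklist items + flow chart) present."""
    if len(items_present) != 28:
        raise ValueError("expected 28 entries: 27 checklist items plus the flow chart")
    points = sum(1 for present in items_present if present)
    return 100.0 * points / 28


def reporting_quality(percentage):
    """Apply the a priori cutoffs: >85% high, 75% to 85% moderate, <75% poor."""
    if percentage > 85:
        return "high"
    if percentage >= 75:
        return "moderate"
    return "poor"


# Hypothetical meta-analysis reporting 23 of the 28 items (not study data).
example = [True] * 23 + [False] * 5
pct = prisma_percentage(example)
print(f"{pct:.1f}% -> {reporting_quality(pct)}")
```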
Proportions were compared with use of the chi-square test, and means were compared with use of the Student t test. All tests were two-tailed, and a p value of <0.05 was considered significant.
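As an illustration of how two proportions can be compared with the chi-square test, the Python sketch below computes the Pearson statistic for a 2 × 2 table and its two-tailed p value with 1 degree of freedom. It assumes no continuity correction, which the text does not specify; the function name `chi_square_2x2` and the example counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability of a
    chi-square distribution with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts (not study data): an item reported in 30 of 38
# higher-quality and 15 of 39 lower-quality meta-analyses.
chi2, p = chi_square_2x2(30, 8, 15, 24)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}, significant = {p < 0.05}")
```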
Source of Funding
No external funding source was used for this study.
Literature Search
The literature search identified 394 potentially relevant articles. After removing duplicate titles, 262 studies remained. Screening of the titles and abstracts eliminated 182 articles that did not have arthroplasty as the primary focus or were not meta-analyses. The full texts of the remaining eighty articles were screened for eligibility. One of the papers was excluded because it focused on all orthopaedic procedures, and two were excluded because they focused on thromboprophylaxis treatment in multiple surgical procedures rather than only in arthroplasty procedures. The remaining seventy-seven articles were included in the final analysis (Fig. 1).
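The screening counts reported above can be checked with simple arithmetic, sketched here in Python. The variable names are hypothetical, and the number of duplicates removed is derived from the reported totals rather than stated in the text.

```python
# Counts reported in the text.
identified = 394
after_duplicates = 262
excluded_on_title_abstract = 182
excluded_on_full_text = 1 + 2  # one all-orthopaedics paper, two multi-procedure papers

# Derived quantities.
duplicates_removed = identified - after_duplicates          # derived, not stated in the text
full_text_screened = after_duplicates - excluded_on_title_abstract
included = full_text_screened - excluded_on_full_text

print(duplicates_removed, full_text_screened, included)
```

The derived totals for full-text screening and final inclusion agree with the eighty and seventy-seven articles reported in the text.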
Characteristics of the Meta-Analyses
Fifty (65%) of the meta-analyses were published in the most recent five years (2005 through 2009). The number of meta-analyses published in 2009 (eighteen) was nine times the number published in 1993 (two) (Fig. 2). The meta-analyses were published in thirty-four different journals, with The Journal of Bone and Joint Surgery (American Volume) and The Journal of Arthroplasty publishing 34% (twenty-six) of the studies (see Appendix). The majority (58%, forty-five) of the studies were conducted in North America. Thromboprophylaxis and blood conservation topics were the most common (22%, seventeen) (see Appendix). Meta-analyses pooled an average of twenty-three studies (range, three to 181, with two meta-analyses not specifying the number). The total number of patients pooled in the analyses ranged from fifty to 231,943 (mean, 8084).
Scientific Rigor and Quality of Reporting
Oxman and Guyatt Checklist
The overall mean Oxman and Guyatt score suggested the presence of major flaws in the meta-analyses (mean = 4.56, standard deviation = 1.51). Agreement between the reviewers in assessing the quality of the meta-analyses with use of the Oxman and Guyatt score was substantial (kappa = 0.75, 95% confidence interval = 0.6 to 0.9). Five meta-analyses (6%) had extensive flaws, thirty-four (44%) had major flaws, thirty (39%) had minor flaws, and eight (10%) had minimal flaws (Fig. 3). Meta-analyses of higher quality (minor or minimal flaws) were significantly more likely than meta-analyses of lower quality (major or extensive flaws) to report three of the nine items of the Oxman and Guyatt index: the criteria used for assessing the validity of the included studies, the appropriateness of the assessment of validity, and the methods used to combine the included studies (see Appendix). The mean Oxman and Guyatt score remained approximately constant from 1994 through 2009 (see Appendix).
PRISMA Checklist
The mean PRISMA score suggested limitations in reporting in the meta-analyses (mean [and standard deviation] = 74% ± 13.8%, range = 25% to 96%) (see Appendix). Thirty-three (43%) of the meta-analyses had poor-quality reporting, twenty-two (29%) had moderate-quality reporting, and twenty-two (29%) had high-quality reporting. Agreement between the reviewers was good (kappa = 0.66, 95% confidence interval = 0.56 to 0.76). The meta-analyses of higher quality were significantly more likely to report eighteen of the twenty-eight items on the PRISMA checklist (see Appendix). These items were an adequate title, protocol information, all information sources, the process of study selection, the data collection process, all data variables and assumptions, the process of assessing the risk of bias in individual studies, the process of synthesizing the results, the risk of bias affecting the cumulative evidence, any additional analyses done, the extracted characteristics, the risk of bias within individual studies, the results of the individual studies, an assessment of the risk of bias in the cumulative evidence, the results of any additional analyses performed, the limitations, a general interpretation, and the presence of a flow chart. PRISMA scores remained approximately constant from 1993 through 2009.
Impact of the Meta-Analyses
In the judgment of the independent assessors, only 51% (thirty-nine) of the meta-analyses had reported findings that were sufficiently important to impact clinical practice (Fig. 4). In addition, the results of 43% (thirty-three) of the meta-analyses reflected current clinical practice compared with 14% (eleven) that did not. On average, meta-analyses that were of higher quality as assessed with use of the Oxman and Guyatt score were deemed to have greater impact on clinical practice than those with major flaws. Seven (88%) of the eight very-high-quality meta-analyses were consistent with clinical practice. Meta-analyses dealing with thromboprophylaxis and improvements in surgical technique tended to have the greatest impact on clinical practice. Of note, thirty-three meta-analyses (43%) deemed to be clinically unimportant were consistent with current clinical practice.
Our findings were consistent with the generally reported increase in the conduct of meta-analysis over time in the medical literature12,13. The increase in the use of meta-analysis as a research tool is not surprising. Bhandari et al. reviewed 2331 articles published in 2000 in fifteen orthopaedic journals and reported that 15% of the 110 reviews met the criteria for a rigorous systematic review. These reviews received more than twice as many citations as the other reviews (13.8 compared with 6.0, p = 0.008). The rigor of the review was a predictor of the number of citations in both orthopaedic journals (p = 0.01) and nonorthopaedic journals (p = 0.03)14. Meta-analyses, which represent a highly rigorous review of the literature, therefore have a great potential impact on subsequent research. Meta-analyses differ from narrative reviews by addressing a single focused clinical question through a comprehensive search strategy involving strict eligibility criteria and assessment of the validity of included articles, and through pooling results across studies5,8.
Unfortunately, not all published meta-analyses are performed rigorously6,15-23. Their strength is derived from the quality of the primary studies5, and the effect of that quality has been extensively documented. A few studies have found that the magnitude of the estimated treatment effect increased as the quality of the primary studies decreased8,24-29. This overestimation of the effect was as great as 41%, highlighting the challenge of interpreting the results when the quality of the meta-analysis is poor6. Adherence to the use of rigorous methodology and assessment tools is critical to limit bias in meta-analyses5,8,30.
Overall, one-half of the meta-analyses in the arthroplasty literature were found to have major or extensive methodological flaws. Only 10% were found to have minimal flaws. These results were consistent with those of Jadad and McQuay, who reported that seventy-two (90%) of eighty meta-analyses dealing with analgesic intervention were flawed31. Our results were also consistent with those of Bhandari et al., who found that only 10% of meta-analyses in orthopaedics between 1984 and 1999 satisfied all of the criteria in the Oxman and Guyatt quality index8. A follow-up study comparing these older meta-analyses with ones published in 2005 and 2008 showed a fourfold improvement (from 10% to 44%) in the percentage of meta-analyses in orthopaedics that satisfied all nine criteria of the Oxman and Guyatt index5. However, on the basis of our findings, the proportion of published meta-analyses involving arthroplasty that satisfy all of the criteria of Oxman and Guyatt remains as low as the percentage reported by Bhandari et al. in 2001 for meta-analyses in all of orthopaedic surgery (10%)8. The key differences in the meta-analyses with major flaws included a significantly lower tendency to report the validity criteria and the rationale for statistical pooling across the included studies.
In 1987, Sacks et al. evaluated the reporting in eighty-six meta-analyses in the English-language medical literature. They scored the meta-analyses with use of twenty-three items in six major areas (study design, combinability, control of bias, statistical analysis, sensitivity analysis, and application of results). They found that only 28% addressed all six major areas and concluded that an urgent need to improve meta-analysis methodology existed32. A follow-up study nine years later showed little improvement33.
Hypothetically, the extent to which the results of meta-analyses are perceived to be important to changing clinical practice may reflect their quality, with higher-quality meta-analyses tending to have a greater impact on clinical practice. Almost half (49%) of the meta-analyses were perceived by independent reviewers to not have important implications for clinical practice. However, given the lower quality of reporting of many of the included studies, it is not surprising that reviewers were skeptical of their impact in changing clinical practice. Prior to commencement of the study, we had hypothesized that the quality of meta-analyses would not correlate with their perceived clinical relevance. Our findings confirmed this hypothesis. Over half (51%) of the studies were found to be of moderate or high clinical relevance. However, only 29% of the studies were identified as high quality on the basis of the PRISMA criteria, suggesting that the clinical relevance of a study may have little to do with its overall methodological quality.
Our study was strengthened by its duplicated, rigorous, comprehensive literature search. To our knowledge, it was one of the first studies to assess not only the quality of the methodology but also the quality of reporting, as well as one of the first to use expert opinion to assess the clinical relevance of the meta-analyses. Despite our careful efforts, the study does have limitations. First, our clinical questions regarding the quality of meta-analyses pertained only to arthroplasty and may not be generalizable to other procedures. Second, the strong clinical focus of our three assessors involves primarily hip and knee arthroplasty, but they also assessed the clinical relevance of five meta-analyses involving other joints (two ankle studies, two shoulder studies, and one metacarpophalangeal joint arthroplasty study). We do not believe that this introduced a major limitation as none of the assessors identified difficulties in rating these five studies and as the remaining 94% of the seventy-seven meta-analyses that they assessed involved hip or knee arthroplasty. Third, despite being an excellent tool for assessing methodological quality, the Oxman and Guyatt index relies heavily on reporting of quality criteria by the authors of the meta-analysis. Underreporting of important methodological safeguards that were actually used limits the extent to which the index can reflect the overall quality of the meta-analysis. With this in mind, we attempted to compile a more complete assessment of the meta-analyses by focusing on quality of reporting and clinical relevance in addition to methodological quality.
Our findings confirmed that the scientific rigor and reporting quality of meta-analyses in the field of arthroplasty remain suboptimal. Furthermore, study quality may have a poor correlation with expert assessments of the relevance of the study in informing and guiding clinical practice. The acceptability of meta-analysis in joint arthroplasty may be strengthened by improvements in both reporting quality and education of readers in how to recognize meta-analyses with high methodological rigor and high-quality reporting. Adoption and promotion of standard checklists such as the Oxman and Guyatt index and the PRISMA Statement by journals may prove invaluable to increasing both reporting quality and reader awareness of the characteristics of high-quality meta-analyses. With improved emphasis, education, and standards in reporting to surgical investigators, meta-analyses can have a tremendous impact on clinical practice.