A protocol was written before this review was undertaken, as recommended by the Quality of Reporting of Meta-analyses (QUOROM) statement5.
Search Strategy and Identification of Studies
We identified all studies indexed in the Ovid MEDLINE, Ovid EMBASE, Cochrane Library, ISI Web of Science, and Scopus databases from 1950 or inception until January 31, 2009, that evaluated white blood-cell count, erythrocyte sedimentation rate, C-reactive protein level, and interleukin-6 as markers for total hip or total knee arthroplasty infection. With use of a Boolean strategy, the text words and subject headings included (1) type of prosthesis (joint prosthesis or arthroplasty, specific joint-associated procedures) and (exp hip joint/ or exp knee joint/ or hip.mp. or knee) and (2) the various markers (CRP, ESR, WBC, and IL-6), both as text words and as subject headings. The bibliographies of relevant articles were cross-checked to find articles not captured by the search. Studies of patients from all age groups that evaluated the use of markers prior to suspected prosthetic joint infection were considered. The selection of articles was performed by two authors (E.B. and T.M.). Raw data from the articles were used to reconstruct 2 × 2 tables (see data-extraction paragraph below). When raw data were not provided in the original article, the tables were reconstructed by using the reported sensitivity and specificity as well as the prevalence of prosthetic joint infection in the cohort and the total number of patients studied. When only summary qualitative values were reported, the authors of the original article were contacted by e-mail in order to obtain data allowing us to reconstruct the 2 × 2 tables. Two of the authors of the present study (D.O. and M.S.) were asked to review the list of included papers and to conduct their own search. With use of this recapture strategy, no additional relevant publications were identified.
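The reconstruction of a 2 × 2 table from reported sensitivity, specificity, prevalence, and cohort size can be sketched as follows. This is a minimal illustration, not the procedure actually coded by the review team: the function name `reconstruct_2x2`, the rounding to whole patients, and the example figures are all assumptions made for the sketch.

```python
def reconstruct_2x2(sensitivity, specificity, prevalence, n_total):
    """Rebuild approximate 2 x 2 cell counts from summary statistics.

    Assumption: counts are rounded to the nearest whole patient, so the
    rebuilt table may differ slightly from the original raw data.
    """
    n_infected = round(prevalence * n_total)      # reference standard positive
    n_uninfected = n_total - n_infected           # reference standard negative
    tp = round(sensitivity * n_infected)          # infected, test positive
    fn = n_infected - tp                          # infected, test negative
    tn = round(specificity * n_uninfected)        # uninfected, test negative
    fp = n_uninfected - tn                        # uninfected, test positive
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical study: 100 patients, 20% infected, sensitivity 0.90, specificity 0.80.
table = reconstruct_2x2(sensitivity=0.90, specificity=0.80,
                        prevalence=0.20, n_total=100)
print(table)
```

Because the cells are rounded, sensitivity and specificity recomputed from the rebuilt table should be compared with the published values before the table is used for pooling.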
Eligibility Criteria
We included cross-sectional and longitudinal studies that enrolled participants with true diagnostic uncertainty. Tests of interest were blood or serum measurement of the white blood-cell count, C-reactive protein level, interleukin-6 level, and erythrocyte sedimentation rate. Eligible studies had a reference standard for diagnosing prosthetic joint infection and calculated the accuracy of inflammation markers test results, with results expressed (1) as both sensitivity and specificity or (2) as a likelihood ratio. We included studies regardless of their publication status, language, or size.
Quality Assessment
Two reviewers (E.B. and G.T.) working independently and in duplicate analyzed the included articles to assess the reported quality of the methods with use of the tool for quality assessment of studies of diagnostic accuracy included in systematic reviews (QUADAS)6. All authors were contacted to obtain their response to the QUADAS questions. When responses were different from the author's assessment, the paper was reviewed and a consensus answer was derived. As there is not a widely accepted standard and validated definition of prosthetic joint infection, five of the authors (E.B., T.M., J.S., D.O., and M.S.), all experts in the field, agreed on the following grading system for the definition of prosthetic joint infection, with use of a consensus process: (1) two or more periprosthetic cultures showing growth of the same organisms, or the presence of a sinus tract communicating with the prosthesis (best), or (2) presence of acute inflammation on histopathologic examination of periprosthetic tissue or the presence of purulence in the periprosthetic space (good), or (3) any other definition loosely based on culture or operative findings but not further specified (mediocre).
Data Extraction
Three reviewers (E.B., G.T., and T.M.), working independently, used a standardized form to extract descriptions of study participants, including the diagnostic tests performed, the cutoff or range definitions of the tests, whether the cutoff values were derived with use of receiver operator characteristic curves or were predetermined by the study authors, and the nature and characteristics of the reference standard used. To extract data for the estimation of diagnostic accuracy measures, we used the cutoff values that the authors chose to use in the primary studies. If more than one cutoff was reported or if the results were reported at the individual patient level, then we used cutoff values that offered the best test performance.
Author Contact
We sent letters to the corresponding authors (or any other author with a contact address listed on the main manuscript) of each of the eligible studies by e-mail (or by regular mail if we could not obtain an active e-mail address). We asked the authors to verify the data that we had extracted and to complete missing data.
Statistical Analysis
We used Meta-DiSc Software for Meta-analysis for Diagnostic and Screening tests (version 1.4)7. We pooled, using random effects meta-analyses, the sensitivities, specificities, likelihood ratios, and diagnostic odds ratios and estimated the 95% confidence intervals for the outcomes. Because of the interrelation between the pooled sensitivity and specificity, we focused the analysis on estimating and pooling likelihood ratios and diagnostic odds ratios. The likelihood ratio incorporates both the sensitivity and specificity and provides an estimate of how much a test result will change the odds of having a disease. In this case, the likelihood ratio for a positive result indicates how much the odds of prosthetic joint infection increase when a test is positive. The likelihood ratio for a negative result indicates how much the odds of prosthetic joint infection decrease when a test is negative.
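As a worked illustration of how the likelihood ratios follow from sensitivity and specificity, and how they shift the odds of infection, consider the short sketch below. The function names and the numerical inputs are hypothetical and are not taken from the included studies.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical marker with sensitivity 0.90 and specificity 0.80.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)

# Hypothetical pre-test probability of infection of 20%.
prob_if_positive = post_test_probability(0.20, lr_pos)
prob_if_negative = post_test_probability(0.20, lr_neg)
print(lr_pos, lr_neg, prob_if_positive, prob_if_negative)
```

With these assumed inputs, a positive result multiplies the odds of infection by about 4.5 and a negative result multiplies them by about 0.125.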
To simplify comparison across tests, both the likelihood ratio for a positive result and the likelihood ratio for a negative result are incorporated in the diagnostic odds ratio, which provides a global estimate of agreement between a test and a reference standard. The higher the diagnostic odds ratio, the more accurate the test8. The diagnostic odds ratio also allows pooling across studies when the main source of inconsistency is the threshold to consider a test positive (i.e., when there is a common receiver operator characteristic curve across all studies)9.
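The relation between the diagnostic odds ratio and the 2 × 2 table can be illustrated with the sketch below, which computes the ratio and an approximate 95% confidence interval on the log scale. The function name, the 0.5 continuity correction for empty cells, and the example counts are assumptions made for illustration; they are not a description of the settings used in this review.

```python
import math


def diagnostic_odds_ratio(tp, fn, fp, tn, z=1.96):
    """Return (DOR, lower, upper) with a Wald-type interval on the log scale.

    Assumption: if any cell is zero, 0.5 is added to every cell so that
    the ratio and its standard error remain defined.
    """
    cells = [tp, fn, fp, tn]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    tp, fn, fp, tn = cells

    dor = (tp * tn) / (fp * fn)                     # equals LR+ divided by LR-
    se_log_dor = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    lower = math.exp(math.log(dor) - z * se_log_dor)
    upper = math.exp(math.log(dor) + z * se_log_dor)
    return dor, lower, upper


# Hypothetical table: 18 true positives, 2 false negatives,
# 16 false positives, 64 true negatives.
dor, ci_lower, ci_upper = diagnostic_odds_ratio(tp=18, fn=2, fp=16, tn=64)
print(dor, ci_lower, ci_upper)
```

For this hypothetical table the sensitivity is 0.90 and the specificity is 0.80, so the diagnostic odds ratio of 36 is the same value obtained by dividing a positive likelihood ratio of 4.5 by a negative likelihood ratio of 0.125.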
Summary receiver operator characteristic curves depict the consistency of results across studies (answering the question of whether there is a single receiver operator characteristic curve across all of these studies) and the accuracy of the test, as judged by the area under the summary receiver operator characteristic curve. The summary receiver operator characteristic graph is conceptually very similar to the receiver operator characteristic curve. However, each data point comes from a different study, not a different threshold. The curve is obtained by fitting a regression model to the study points, which yields a smooth curve. Like a receiver operator characteristic curve, the summary receiver operator characteristic curve is plotted over the original points (sensitivity, 1 − specificity) on the original axes. The Q value is estimated to reflect the overall accuracy of summary receiver operator characteristic analyses. It is calculated on the basis of the intersection of the summary receiver operator characteristic curve and the antidiagonal line of the square. Its value correlates with the area under the curve. The closer the curve is to the top left corner, the better the accuracy is, and the higher the Q value.
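One common way to fit such a curve is the Moses-Littenberg linear model, in which the difference of the logits of the true-positive and false-positive rates (D) is regressed on their sum (S). The sketch below assumes that model with an unweighted least-squares fit; the text above does not specify the regression, so this choice, the function names, and the three example studies are illustrative assumptions only. Under this model the Q value is the point on the curve where sensitivity equals specificity, which corresponds to S = 0.

```python
import math


def logit(p):
    """Log odds of a proportion; assumes 0 < p < 1 (no continuity correction)."""
    return math.log(p / (1.0 - p))


def fit_sroc(study_points):
    """Fit D = intercept + slope * S by unweighted least squares.

    study_points: list of (sensitivity, specificity) pairs, one per study.
    """
    d_values, s_values = [], []
    for sensitivity, specificity in study_points:
        tpr, fpr = sensitivity, 1.0 - specificity
        d_values.append(logit(tpr) - logit(fpr))   # log diagnostic odds ratio
        s_values.append(logit(tpr) + logit(fpr))   # proxy for the threshold
    n = len(study_points)
    mean_s = sum(s_values) / n
    mean_d = sum(d_values) / n
    sxx = sum((s - mean_s) ** 2 for s in s_values)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_values, d_values))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_s
    return intercept, slope


def q_value(intercept):
    """Point on the fitted curve where sensitivity equals specificity (S = 0)."""
    return 1.0 / (1.0 + math.exp(-intercept / 2.0))


# Three hypothetical studies reporting (sensitivity, specificity).
studies = [(0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
intercept, slope = fit_sroc(studies)
q = q_value(intercept)
print(intercept, slope, q)
```

A Q value of 0.5 corresponds to an uninformative test (intercept of zero), and values approaching 1 correspond to curves that pass close to the top left corner.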
The inconsistency among studies was assessed with use of the I² statistic, which represents the proportion of variability across studies that is not due to chance or random error. For example, an I² value of 30% indicates that 30% of the variability in study results is due to differences in patient populations or study protocols, whereas the remaining variability (70%) is expected to be due to random sampling error. Traditionally, I² values of 25%, 50%, and 75% indicate low, moderate, and high heterogeneity, respectively10.
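The inconsistency statistic is conventionally derived from Cochran's Q, the inverse-variance-weighted sum of squared deviations of the study estimates from their pooled value. The sketch below shows that calculation under the usual definition, 100% × (Q − df)/Q truncated at zero; the function name and the example effect sizes are hypothetical.

```python
def i_squared(effects, variances):
    """Return the inconsistency statistic (as a percentage) for study effects.

    effects: per-study estimates on an additive scale (e.g., log odds ratios).
    variances: the corresponding within-study variances.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    cochran_q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    degrees_of_freedom = len(effects) - 1
    if cochran_q <= 0.0:
        return 0.0  # identical estimates: no observed heterogeneity
    return max(0.0, (cochran_q - degrees_of_freedom) / cochran_q) * 100.0


# Two hypothetical studies with log diagnostic odds ratios of 0 and 2,
# each with a variance of 1.
example_i2 = i_squared(effects=[0.0, 2.0], variances=[1.0, 1.0])
print(example_i2)
```

Here Cochran's Q is 2 with 1 degree of freedom, so half of the observed variability exceeds what chance alone would be expected to produce.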
Subgroup Analyses
A priori hypotheses to explain potential heterogeneity among studies included the site of the prosthesis (hip versus knee), cutoff rationale (receiver operator characteristic-derived versus preestablished cutoff), whether the spectrum of patients was representative (yes versus no), and definition of prosthetic joint infection (best versus good and mediocre). We tested these hypotheses with use of a test for interaction, with the level of significance set at p < 0.05.
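A test for interaction between two subgroups can be sketched as a z test on the difference between the subgroup estimates. The exact form of the test used here is not stated, so the version below (difference of two independent estimates on the log scale, divided by the standard error of the difference) is an assumption, as are the function name and the example values.

```python
import math


def interaction_test(estimate_a, se_a, estimate_b, se_b):
    """Two-sided z test for a difference between two independent estimates.

    Estimates are assumed to be on an additive scale, such as the log of
    the pooled diagnostic odds ratio in each subgroup.
    """
    difference = estimate_a - estimate_b
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    z = difference / se_difference
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return z, p_value


# Hypothetical pooled log diagnostic odds ratios for hip and knee subgroups.
z_stat, p_val = interaction_test(estimate_a=3.0, se_a=0.3,
                                 estimate_b=2.0, se_b=0.4)
print(z_stat, p_val, p_val < 0.05)
```

With these assumed inputs the difference would be judged significant at the p < 0.05 level; two identical subgroup estimates would give z = 0 and p = 1.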
Source of Funding
There was no external source of funding for this study.
Accurate preoperative identification of prosthetic joint infection in patients presenting with joint pain or radiographic periprosthetic lucencies is often difficult. Clinicians often rely on expensive nuclear imaging or more invasive studies such as joint aspiration for a more accurate diagnosis. Despite the best efforts, some patients with prosthetic joint infection will remain undiagnosed until the time of surgery. Therefore, having a marker with high sensitivity and specificity that is inexpensive and is done preoperatively is extremely useful for preoperative planning in cases of suspected infection. Based on this meta-analysis, we observed that serum interleukin-6 was associated with a high accuracy as a marker for periprosthetic infection, followed by the C-reactive protein level, the erythrocyte sedimentation rate, and the white blood-cell count.
Kinetic properties of these markers are important when assessing their use in clinical practice as a diagnostic marker for prosthetic joint infection. Recently published studies have suggested that interleukin-6 may be a more accurate marker for infection than the C-reactive protein level or the erythrocyte sedimentation rate3. Interleukin-6 is produced by stimulated monocytes and macrophages, and it induces the production of several acute-phase proteins, including C-reactive protein. The serum interleukin-6 level in normal individuals is approximately 1 pg/mL, and it can increase to 30 to 430 pg/mL for as long as three days following total joint arthroplasty3,4. Interleukin-6 peaks at two days after uncomplicated arthroplasty and rapidly returns to a normal value. C-reactive protein is an acute-phase reactant that is produced by the liver in response to inflammation, infection, and neoplasm. Its levels are elevated to their peak values two to three days after surgery and return to normal approximately three weeks after surgery. Currently, most experts advocate the use of the blood erythrocyte sedimentation rate and C-reactive protein level as markers for assessing patients with a suspected prosthetic joint infection37-39. When both are negative, the likelihood of infection is extremely low. Following uncomplicated joint arthroplasty, the erythrocyte sedimentation rate increases, reaching a peak at five to seven days postoperatively, and then slowly decreases to preoperative levels in three to twelve months.
The strength of the present review stems from reviewing the literature and collecting data by at least two independent reviewers, contacting authors of individual papers to confirm or correct the published data, accruing patient-level data from several studies, and using a random effects model, which represents a conservative approach that reflects the heterogeneity of the results.
The present study also had several limitations. As the diagnostic threshold of each of the markers used is different among studies, we used the summary receiver operator characteristic curve method rather than a single point analysis. A large spectrum of patients suspected of having a prosthetic joint infection are represented in these studies, allowing generalization of our results to a variety of settings in clinical practice, but the study population and patient-selection methods were not fully reported in all studies. Due to the retrospective nature of most of these studies, there was a fair amount of withdrawal and enrichment with prosthetic joint infection cases. These markers were not routinely obtained for all patients undergoing surgery, resulting in a potential selection bias and exaggeration of accuracy. None of the studies provided information on blinding and test reproducibility. Our results are susceptible to spectrum bias, because diagnostic tests may have different accuracy in patients with early or late infections. Importantly, a strict case definition for prosthetic joint infection was not used in all studies. This could have led to a classification bias between infected and noninfected patients as sometimes the distinction between infection and contamination is difficult with certain organisms such as coagulase-negative staphylococci. However, we were unable to find a significant impact on the accuracy of the markers on the basis of the appropriateness of the case definition. Other factors that could have affected the accuracy but that were not reported in most of the studies included the use of antibiotics and the time between the assessment of serum markers and the validation of an infection. The means of measuring the white blood-cell count, erythrocyte sedimentation rate, C-reactive protein level, and interleukin-6 level were mostly not reported. 
Furthermore, there was some variation among the cutoff values of the tests that were used in different studies. While the accuracy of interleukin-6 for the diagnosis of prosthetic joint infection seems very promising, the current data are driven by only one large and two smaller studies. Only a few studies assessed the accuracy of combining different inflammation markers for the diagnosis of infection12,21,39. Finally, some degree of publication bias is unavoidable in systematic reviews. The small number of events and the wide confidence intervals also imply a level of uncertainty about inferences from these data.
On the basis of this meta-analysis, we can conclude that the serum interleukin-6 level has the highest accuracy for the diagnosis of prosthetic joint infection, followed by the C-reactive protein level, the erythrocyte sedimentation rate, and the white blood-cell count. Further studies evaluating the accuracy of interleukin-6 and other cytokines in different patient populations are needed to confirm the findings of the present study.
Note: The authors of the present study acknowledge all of the authors of the included studies who took the time to reply to our queries and contributed their patient-level data. They also thank Bettina Knoll, MD, and Davud Malekzadeh, Cand. med, for their help with translation of the German-language papers.