To The Editor:
I recently read the article "Treatment of Ruptures of
the Lateral Ankle Ligaments: A Meta-Analysis" (82-A: 761-773,
June 2000), by Pijnenburg et al., with interest. The authors concluded that
functional treatment was better than casting or no treatment, in
agreement with previous reviews. However, the authors also stated
that the summary measures of effectiveness for operative treatment
were better than those for functional treatment, and they stated that
this result differs from those of previous reviews, including one
that I authored¹. There are two
points that I would like to make.
First, I had, in fact, come to the same conclusion that the current
authors had. When my article was written, the studies comparing
the results of surgery with those of early functional treatment were,
overall, in favor of surgery. However, given the higher risks of
surgical treatment and the fact that late reconstruction gave excellent
results, I concluded that it was prudent to use functional treatment
first and to operate only on those patients in whom conservative
treatment had failed. Pijnenburg et al. concurred: "Analysis
of the pooled results showed operative treatment to be superior
to functional treatment, yet there are reasons to question the selection
of operative treatment as the treatment of choice. . . . Finally, when
conservative treatment fails, secondary operative reconstruction
of the ruptured ligaments can be performed, with similar good results,
even years after the initial injury."
Second, the authors indicated that their review is an improvement
over previous reviews because they calculated one risk-ratio summary
statistic for giving-way and another for pain when comparing the
results of surgical treatment with those of early functional treatment.
However, the test of homogeneity failed under these conditions.
If the test of homogeneity fails, a summary statistic is not valid,
because the statistic assumes that the results of the individual
studies differ from the "true value" only through random variation
from study to study. If many studies are done, the
results should vary according to standard patterns on the basis
of sample size and other such factors, and this variation would
allow the true value of the risk ratio to be estimated. However, when
the results of the studies vary more than expected, the test of
homogeneity fails, which means that the different results are unlikely
to be due to random variation. In this case, one should examine
the differences and similarities between the individual studies
in an attempt to discover what the source of heterogeneity might
be, rather than summarize something that can’t be summarized
into one number. This method has been called an "exploratory
meta-analysis" as opposed to the use of the summary statistic, which
is called an "analytical meta-analysis"². Further, readers should be cautious when
interpreting a summary statistic, even if there is statistical homogeneity, because,
in order to be valid, the summary statistic requires data from studies with
similar designs and study populations. As a thought experiment,
let us consider an example in which 80% of the studies
reviewed show a positive effect of a treatment and 20% show
a negative effect, and the studies are statistically homogeneous.
In this case, the summary statistic is likely to show a positive
effect of treatment. Suppose, however, that on closer examination all of the studies
showing negative results involved subjects over the age of sixty
years and all of those showing positive results involved subjects younger
than sixty years of age. Claiming that the treatment is effective
is not incorrect, but it can lead to inappropriate treatment for
a large segment of the population. Unfortunately, there is no test
for "methodological homogeneity," and so the summary statistic
should be used with extreme caution and only after ensuring that methodological
differences do not account for apparently contradictory results.
The indiscriminate use and acceptance of the summary statistic in meta-analyses
may be one reason why subsequent large clinical trials have failed
to confirm the hypotheses generated by many meta-analyses³,⁴.
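The concern about summarizing heterogeneous results in one number can be sketched numerically. The Python example below is an illustration only. The two four-study data sets are hypothetical and are not taken from Pijnenburg et al. or from any trial. Inverse-variance fixed-effect pooling of log risk ratios and Cochran's Q statistic are assumed here as one common way to compute a summary risk ratio and a test of homogeneity; they are not necessarily the methods used in the article.

```python
import math

# Chi-square critical value for alpha = 0.05 with 3 degrees of freedom
# (four studies, so df = 4 - 1 = 3).
CHI2_CRITICAL_DF3 = 7.815


def log_risk_ratio(events_a, n_a, events_b, n_b):
    """Return (log risk ratio, approximate variance) for one two-arm study."""
    rr = (events_a / n_a) / (events_b / n_b)
    variance = 1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b
    return math.log(rr), variance


def pooled_rr_and_q(studies):
    """Inverse-variance pooled risk ratio and Cochran's Q for a list of studies.

    Each study is (events_a, n_a, events_b, n_b).
    """
    effects = [log_risk_ratio(*study) for study in studies]
    weights = [1 / var for _, var in effects]
    pooled_log = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled_log) ** 2 for w, (y, _) in zip(weights, effects))
    return math.exp(pooled_log), q


# Hypothetical data: every study has a risk ratio of 0.5.
homogeneous = [(10, 100, 20, 100), (15, 150, 30, 150),
               (5, 50, 10, 50), (20, 200, 40, 200)]

# Hypothetical data: two studies have a risk ratio of 0.5 and two have 2.0.
heterogeneous = [(10, 100, 20, 100), (20, 200, 40, 200),
                 (20, 100, 10, 100), (40, 200, 20, 200)]

rr_hom, q_hom = pooled_rr_and_q(homogeneous)
rr_het, q_het = pooled_rr_and_q(heterogeneous)

for label, rr, q in (("homogeneous", rr_hom, q_hom),
                     ("heterogeneous", rr_het, q_het)):
    verdict = "fails" if q > CHI2_CRITICAL_DF3 else "passes"
    print(f"{label}: pooled RR = {rr:.2f}, Q = {q:.2f}, homogeneity test {verdict}")
```

In the second hypothetical data set the pooled risk ratio comes out at 1.0, suggesting no effect, although every individual study shows either a halving or a doubling of risk. Q is roughly 22, well above the critical value, which is the signal to explore the differences between the studies rather than report the single number.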
A.C.M. Pijnenburg, C.N. van Dijk, P.M.M. Bossuyt, and
R.K. Marti reply:
We appreciate Dr. Shrier’s insightful and interesting
comments on our study. In his appraisal of the literature, Dr. Shrier
also found positive results for operative treatment. Concerning
his question on summary statistics, we would like to make the following remarks.
It is commonly agreed that the test of homogeneity, used in systematic reviews
and added to ours upon explicit request from one of the reviewers,
is far from perfect. With only a few studies in a review, the test
lacks power. With many studies, it will be positive even
in the absence of clinically meaningful heterogeneity. We agree
that researchers should look at heterogeneity as an opportunity
for further explanation, rather than as a threat to their hypotheses.
Such heterogeneity can be due to methodological shortcomings as
well as to genuine clinical differences among study populations, treatments
given, or the outcome measures used.
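The dependence of the homogeneity test on the number of studies can be sketched as follows. This Python example is illustrative only and rests on stated assumptions: every hypothetical study has the same variance (0.0064 on the log risk ratio scale), the log risk ratios simply alternate between +0.1 and -0.1, and Cochran's Q is used as the test statistic. The helper names are invented for this sketch, and the closed-form chi-square tail probability used here is valid only for an even number of degrees of freedom, which is why odd study counts are chosen.

```python
import math


def cochran_q(log_effects, variances):
    """Cochran's Q: weighted squared deviations from the inverse-variance mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def homogeneity_test(k, spread=0.1, variance=0.0064):
    """Q and p-value for k hypothetical studies alternating +spread / -spread."""
    effects = [spread if i % 2 == 0 else -spread for i in range(k)]
    q = cochran_q(effects, [variance] * k)
    return q, chi2_sf_even_df(q, k - 1)


q_few, p_few = homogeneity_test(5)     # 5 studies, df = 4
q_many, p_many = homogeneity_test(41)  # 41 studies, df = 40

print(f"5 studies:  Q = {q_few:.2f}, p = {p_few:.3f}")
print(f"41 studies: Q = {q_many:.2f}, p = {p_many:.3f}")
```

Under these assumptions the study-to-study spread and the precision of each study are identical in both runs. With five studies the test does not reach the 0.05 level, whereas with forty-one it does, so the verdict reflects the number of studies as much as the size of the differences between them. Whether a spread of this size matters clinically is a separate judgment that the test cannot make.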
The relationship between systematic reviews and new trials is
a difficult one. As we have indicated in our paper, the quality
of the trials performed so far in this area is not very impressive,
leaving ample room for improvement. We fully agree that, due to
these shortcomings, the evidence generated by new, high-quality
trials will easily surpass that found currently in the most comprehensive
systematic reviews. In the meantime, reviews such as ours summarize the
evidence that is available and can help practitioners in making
treatment decisions.