Anterior cruciate ligament reconstruction is the gold-standard surgical treatment for providing stability in the setting of an anterior cruciate ligament rupture. Reconstruction can be performed with use of either autograft or allograft tissue. It is currently unclear if the outcomes of anterior cruciate ligament reconstruction with allograft differ significantly from those with autograft.
Reconstruction with allograft has the major advantage of eliminating donor-site morbidity, with the consequent benefits of less postoperative pain and faster rehabilitation. However, allograft has the major disadvantages of the potential for disease transmission1,2 and limited availability3. In addition, concerns about slower incorporation, inadequate so-called ligamentization, and possible immunogenicity have been raised in association with allograft4-6. Interestingly, the cost of the allograft itself appears to be offset by the increased operating room time and a greater likelihood of overnight hospitalization for autograft procedures7.
A few conventional narrative reviews8-13 have addressed these issues surrounding graft selection for anterior cruciate ligament reconstruction. Firm conclusions regarding the clinical outcomes after reconstruction with autograft or allograft cannot be drawn from those narrative reviews because they may reflect some inherent bias toward summarizing literature that supports the point of view of the authors. In contrast, a systematic review of the literature has been defined as the "application of scientific strategies that limit bias by the systematic assembly, critical appraisal, and synthesis of all relevant studies on a specific topic."14 Furthermore, a meta-analysis has been defined as a quantitative systematic review "that employs statistical methods to combine and summarize the results of several studies."14
Two previous systematic reviews compared the outcomes of anterior cruciate ligament reconstruction with autograft with those of anterior cruciate ligament reconstruction with allograft15,16. The first systematic review only analyzed instrumented laxity measurements, and the value of that review was further limited by the inclusion of studies with a noncomparative study design (Level of Evidence IV)16. The second systematic review evaluated only bone-patellar tendon-bone grafts, and the outcomes of interest did not include patient-oriented outcomes or instrumented laxity15.
The purpose of the current systematic review was to determine if the short-term clinical outcomes of anterior cruciate ligament reconstruction with allograft are significantly different from those with autograft. The clinical outcomes of interest included patient-oriented outcomes, the results of physical examination tests, instrumented laxity measurements, and complications (including graft failure).
Search Strategy
A computerized search of the electronic databases MEDLINE (from 1950 to the fourth week of March 2009) and EMBASE (from 1966 to March 2009) was conducted with use of the three keywords in combination: "auto$," "allo$," and "anterior cruciate ligament." Published studies in all languages were considered for inclusion. The titles and abstracts of these potentially relevant studies from the computerized search were reviewed. If the abstract indicated a possibility that the study had a comparative study design, involved human subjects, and demonstrated any clinically relevant outcome, then the article was retrieved for more detailed evaluation. Subsequently, the references of these articles were searched by hand for any additional relevant studies.
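The search combined two truncated terms with a phrase. The short sketch below assumes that "$" denotes unlimited truncation (as in Ovid-style syntax) and that "in combination" means all three terms had to appear. It uses Python regular expressions as a stand-in for the database engines. The titles are invented and only exercise the matcher; this is not the query syntax actually submitted to MEDLINE or EMBASE.

```python
import re

# Hypothetical regex stand-ins for the search terms. "auto$" and "allo$" are
# read as unlimited truncation: any word that starts with "auto" or "allo".
SEARCH_TERMS = [
    re.compile(r"\bauto\w*", re.IGNORECASE),
    re.compile(r"\ballo\w*", re.IGNORECASE),
    re.compile(r"\banterior cruciate ligament\b", re.IGNORECASE),
]


def matches_all_terms(text: str) -> bool:
    """True only if every term appears in the text (terms combined with AND)."""
    return all(term.search(text) for term in SEARCH_TERMS)


# Invented example titles, used only to exercise the matcher.
titles = [
    "Autograft versus allograft in anterior cruciate ligament reconstruction",
    "Allograft anterior cruciate ligament reconstruction in older patients",
    "Autologous chondrocyte implantation of the knee",
]
hits = [t for t in titles if matches_all_terms(t)]
print(hits)  # only the first title contains all three terms
```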
Selection
In order to be included, a study had to be a therapeutic study with a prospective or retrospective comparative design (Level of Evidence I, II, or III)17. Furthermore, each study had to meet six inclusion criteria related to (1) the population that was considered to be acceptable for study (patients of any age), (2) the procedure of interest (unilateral anterior cruciate ligament reconstruction), (3) the intervention being studied (autograft compared with allograft, with use of the same anatomic graft), (4) the outcomes being evaluated (including any clinically relevant outcome, such as a physical examination measurement, complication, or patient-oriented outcome), (5) the minimum duration of follow-up (two years), and (6) the minimum study size (fifteen patients in each treatment arm).
Of note, all patients in a study had to have been followed for at least two years. An average of two years of follow-up was not sufficient for inclusion.
Any study that failed to meet all of the inclusion criteria was excluded. Specifically, all case series (Level of Evidence IV) were excluded. In addition, a study was excluded if data from the same patients were reported in another study that had longer follow-up.
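The selection rules amount to a conjunction: a study is included only if every criterion holds. The sketch below encodes them as a single predicate. All field names are hypothetical. The follow-up field holds the shortest follow-up of any patient rather than the mean, reflecting the rule that an average of two years was not sufficient.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical record fields; none of these names come from the review itself.
    level_of_evidence: int               # 1 to 4
    unilateral_acl_reconstruction: bool
    same_anatomic_graft: bool            # autograft and allograft of the same tissue
    reports_clinical_outcome: bool
    min_follow_up_years: float           # shortest follow-up of ANY patient, not the mean
    smallest_arm_size: int
    superseded_by_longer_follow_up: bool


def is_eligible(s: Study) -> bool:
    """Apply the inclusion and exclusion rules; every rule must pass."""
    return (
        s.level_of_evidence in (1, 2, 3)     # comparative designs only; case series excluded
        and s.unilateral_acl_reconstruction
        and s.same_anatomic_graft
        and s.reports_clinical_outcome
        and s.min_follow_up_years >= 2.0     # every patient followed for at least two years
        and s.smallest_arm_size >= 15
        and not s.superseded_by_longer_follow_up
    )


ok = Study(2, True, True, True, 2.0, 20, False)
# A study whose mean follow-up may reach two years, but with some patients seen at 1.5 years.
mean_only = Study(2, True, True, True, 1.5, 20, False)
print(is_eligible(ok), is_eligible(mean_only))
```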
Assessment of Study Quality
Two reviewers (J.L.C. and K.P.S.) independently assessed the methodological quality of each included study with respect to study design, determination of intervention type, baseline comparability according to graft source, similarity of surgical technique, utilization of independent examiners, and proportion of patients lost to follow-up. A study was considered to be prospective if it started before the first patient was enrolled. In contrast, a study was considered to be retrospective if it started after the first patient was enrolled.
Data Extraction
Two reviewers (J.L.C. and K.P.S.) independently extracted relevant data from each included study and recorded them on multiple worksheets. The specific data that were extracted included the country where the trial was primarily conducted, the number of surgeons, the date range of the procedures, the number of eligible patients, the number of patients with follow-up, the duration of follow-up, the surgical technique used, allograft properties, the demographic characteristics of the patients, patient-oriented outcomes, composite scales, instrumented laxity measurements, the results of Lachman testing, the results of pivot-shift testing, range-of-motion deficits, the results of one-leg-hop testing, thigh-circumference differences, and complications. These worksheets were subsequently compared, and any discrepancies were resolved by means of a review of the original study and discussion to achieve consensus.
Statistical Methods and Strategies
Heterogeneity was qualitatively assessed by comparing the study designs, study populations, interventions, outcomes, and blinding among the included studies. In addition, statistical tests of homogeneity (chi-square testing for failures and for grouped frequency distribution of instrumented laxity measurements) were employed to determine if any individual study findings refuted the null hypothesis that the findings of the individual studies were the same. If the observed variation among studies was inconsistent with this null hypothesis (p < 0.10), then heterogeneity was assumed.
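The exact chi-square procedure is not specified beyond the description above. Cochran's Q is one common chi-square test of homogeneity, and the sketch below assumes that form. Each study's estimate is weighted by its inverse variance. Q sums the weighted squared deviations from the pooled estimate and is referred to a chi-square distribution with k - 1 degrees of freedom. Heterogeneity is flagged when p < 0.10, as in the review. The chi-square tail probability is computed from the series for the regularized incomplete gamma function so that only the standard library is needed. The log odds ratios and standard errors are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), using the power series
    for the regularized lower incomplete gamma function P."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def cochran_q(estimates, std_errors):
    """Cochran's Q homogeneity statistic, its degrees of freedom, and p-value."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical log odds ratios and standard errors; the third study is an outlier.
log_ors = [0.0, 0.1, 2.5]
ses = [0.3, 0.35, 0.3]
q, df, p = cochran_q(log_ors, ses)
heterogeneous = p < 0.10  # threshold used in the review
print(round(q, 1), df, heterogeneous)
```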
If a study failed the qualitative assessment and statistical tests of homogeneity, it was excluded from meta-analysis because there were presumed to be meaningful differences in the populations studied, the nature of the predictor or outcome variables, or the study results18. Furthermore, a study was withdrawn from the meta-analysis of a particular outcome if that outcome was not studied or was not reported adequately. A Mantel-Haenszel analysis utilizing a random-effects model allowed for the pooling of results according to graft source while accounting for the number of subjects in individual studies19. In order to ensure that the findings were robust, a sensitivity analysis was systematically performed by varying the included studies or variables in the meta-analysis on the basis of several factors: graft type (bone-patellar tendon-bone or hamstring), instrumented laxity cut-off value (3 or 5 mm), secondary sterilization technique (irradiated or non-irradiated), minimum duration of follow-up (two or three years), mean patient age (less than or equal to thirty years or greater than thirty years), or study methodology (prospective or retrospective).
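The pooling step can be illustrated with a random-effects model on the log odds ratio scale. The sketch below uses the DerSimonian-Laird estimator of between-study variance (tau squared) with inverse-variance weights. This is a widely used approach, but it is not necessarily identical to the Mantel-Haenszel random-effects implementation or software settings used by the authors, which are not stated here. Each hypothetical study is a 2 x 2 table, and the odds ratio is expressed as autograft relative to allograft, so that an odds ratio below 1 for failure favors autograft, as in the text. A 0.5 continuity correction is added when any cell is zero. The counts are invented.

```python
import math


def log_odds_ratio(ev_a, n_a, ev_b, n_b):
    """Log odds ratio of arm A versus arm B and its Woolf standard error.
    Adds 0.5 to every cell when any cell is zero (a common continuity correction)."""
    a, b, c, d = ev_a, n_a - ev_a, ev_b, n_b - ev_b
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def random_effects_pool(tables, z_crit=1.959964):
    """DerSimonian-Laird random-effects pooled odds ratio, 95% CI, p-value, tau^2.
    Each table is (events_autograft, n_autograft, events_allograft, n_allograft)."""
    ys, vs = [], []
    for table in tables:
        y, se = log_odds_ratio(*table)
        ys.append(y)
        vs.append(se ** 2)
    w = [1 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]  # adding tau^2 shrinks and evens out the weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(pooled / se) / math.sqrt(2))  # two-sided normal p-value
    return (math.exp(pooled), math.exp(pooled - z_crit * se),
            math.exp(pooled + z_crit * se), p, tau2)


# Hypothetical failure counts for three studies (not data from the review).
tables = [(2, 40, 3, 40), (1, 30, 2, 32), (0, 25, 1, 25)]
or_, lo, hi, p, tau2 = random_effects_pool(tables)
print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f}); p = {p:.2f}; tau^2 = {tau2:.3f}")
```

Because tau squared is added to every within-study variance, heterogeneous data produce a wider interval than a fixed-effect model would. This is why the random-effects approach is usually described as the more conservative choice.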
Source of Funding
Two grants (NIH/NIAMS R01 AR053684-01A1 and NIAMS 5 K23 AR052392-04) provided salary support. The research grant from Smith and Nephew and the unrestricted educational grant from DonJoy did not provide salary support and did not contribute to the development of this manuscript.
Study Identification
The Quality of Reporting of Meta-analyses (QUOROM)20 flow diagram depicts the number of studies identified, included, and excluded as well as the reasons for exclusion (Fig. 1). The initial computerized search identified 300 potentially relevant studies. Subsequent review of the abstracts produced twenty-four articles that were retrieved for more detailed evaluation. Two studies were excluded because data from the same patients were reported in another study that had a longer duration of follow-up21,22. Eleven studies were excluded because of the failure to report a minimum two-year follow-up23-33. One study was excluded because it compared different anatomic grafts34. One study was excluded because one treatment arm had fewer than fifteen patients35. Therefore, nine studies were determined to be appropriate for systematic review36-44 (Table I).
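The counts in the flow diagram can be reconciled with a simple tally. Starting from the twenty-four retrieved articles and subtracting each exclusion category leaves the nine studies included in the review.

```python
retrieved = 24
exclusions = {
    "same patients reported with longer follow-up": 2,
    "no minimum two-year follow-up": 11,
    "different anatomic grafts compared": 1,
    "treatment arm with fewer than fifteen patients": 1,
}
included = retrieved - sum(exclusions.values())
print(included)  # 9
```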
Study Characteristics
Six studies were conducted in a North American country36-38,40,42,43, and three studies were conducted in a European country39,41,44 (see Appendix). All studies involved one or two surgeons. The procedures were performed between 1986 and 2000. With respect to surgical techniques (see Appendix), eight studies compared autograft bone-patellar tendon-bone with allograft bone-patellar tendon-bone36,37,39-44. However, in one of those studies, four of the sixty-four allografts were Achilles tendon40. The ninth study compared autograft quadruple-stranded hamstring with allograft quadruple-stranded hamstring38. The allograft was reported as non-irradiated in five studies36,38,40-42, irradiated with 2.0 Mrad in one study43, and sometimes non-irradiated and sometimes irradiated with an unknown dose in one study37. In another study, it was not reported as irradiated or non-irradiated44. In the remaining study, the allograft was treated with acetone solvent drying, followed by irradiation with 1.5 Mrad39. With regard to storage, the grafts were reported to be fresh-frozen in six studies36,37,40-43 and to be sometimes cryopreserved and sometimes fresh-frozen in one study38; storage was not reported in two studies39,44.
Study Quality
Assessment of the methodological quality of these studies revealed that there were no randomized controlled trials (Level I). Five studies were prospective comparative studies (Level II)38,39,41,42,44, and the other four studies were retrospective comparative studies (Level III)36,37,40,43. The treatment was determined on the basis of patient choice in four studies36,37,42,43, allograft availability in two studies41,44, a combination of patient choice and allograft availability in one study40, a combination of patient choice and randomization in one study38, and chronological division in one study39. More than half of the studies utilized an independent examiner37-39,41,44. In five studies, the baseline demographic characteristics according to graft source were significantly different with respect to patient age, male-to-female ratio, time from injury to reconstruction, or duration of follow-up (see Appendix)36,37,40-42. One study did not demonstrate any significant differences with respect to these factors, perhaps because approximately 75% of the patients consented to undergo randomization38. The other three studies did not investigate all of these demographic characteristics according to graft source39,43,44. Within each study, the surgical approach, fixation technique, and postoperative rehabilitation were consistent for every patient. Five studies had >80% follow-up in both treatment arms36,37,41,43,44.
Assessment of Heterogeneity
A qualitative assessment of heterogeneity demonstrated that the sterilization process and outcomes in the study by Gorschewsky et al.39 were substantially different from those in the other included studies. Specifically, the allografts used in the study by Gorschewsky et al.39 were sterilized with osmotic treatment, oxidation, and solvent drying with acetone. In contrast, the other studies involved the use of fresh-frozen allografts or cryopreservation. The patient-oriented outcomes, physical examination findings, instrumented laxity measurements, and complications in the allograft group in the study by Gorschewsky et al. were much worse than those in the other treatment arms. For example, 45% (thirty-eight) of the eighty-five patients who were managed with allograft in that study were considered to have had a clinical failure39. The next highest clinical failure rate was 12% (three of twenty-five) in the allograft treatment arm of the study by Victor et al.44.
Statistical tests of homogeneity confirmed that the study by Gorschewsky et al.39 was inconsistent with the null hypothesis that the findings of the individual studies were the same with respect to the Lysholm score, instrumented laxity measurements, and clinical failure rate (p < 0.10). Therefore, that study failed the qualitative assessment and statistical tests of homogeneity and was excluded from the meta-analyses of the Lysholm score, instrumented laxity measurements, and clinical failure rate as well as the sensitivity analysis.
The remaining eight studies were all comparative therapeutic studies in which bone-patellar tendon-bone autograft was compared with bone-patellar tendon-bone allograft36,37,40-44 or in which hamstring autograft was compared with hamstring allograft38. Statistical tests of homogeneity supported the null hypothesis that the findings of these eight individual studies were the same with respect to the Lysholm score, instrumented laxity measurements, and clinical failure rate (p > 0.10).
Patient-Oriented Outcomes
A patient-oriented outcome or composite scale was reported in every study except one43 (see Appendix). Within each study, there were no significant differences between autograft and allograft. Lysholm scores were reported as an outcome measure in six studies36-38,41,42,44. The Lysholm scores from those six studies were pooled according to graft source, and the meta-analysis of Lysholm scores estimated a mean difference of 1.5 favoring autograft (95% confidence interval, -1.1 to 4.1; p > 0.25).
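The reported interval and p-value for the Lysholm score can be checked against each other. The check assumes a normal sampling distribution and a symmetric 95% confidence interval. The standard error is the interval width divided by 2 × 1.96, and the two-sided p-value follows from z = mean difference / standard error. This is a consistency check, not a re-analysis of the pooled data.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate two-sided p-value from an estimate and its 95% CI,
    assuming a normal sampling distribution with a symmetric interval."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


p = p_from_ci(1.5, -1.1, 4.1)
print(round(p, 2))  # about 0.26, consistent with the reported p > 0.25
```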
Physical Examination
There were no significant differences between autograft and allograft with respect to Lachman testing or pivot-shift testing (see Appendix). Similarly, there were no significant differences with respect to flexion deficit, one-leg-hop test, or thigh circumference (see Appendix). However, two studies demonstrated a significant difference with respect to extension deficit40,42. Those two studies indicated that the autograft group lost 1.4° to 1.8° more motion than did the allograft group40,42. The other five studies that evaluated extension deficits did not demonstrate a significant difference36,37,39,41,43.
Instrumented Laxity
Every study evaluated instrumented laxity measurements as an outcome measure (see Appendix). Within each study, there were no significant differences between autograft and allograft with respect to instrumented laxity measurements. Seven studies reported the grouped frequency distribution of instrumented laxity measurements36-38,40-43. The instrumented laxity results of those seven studies were pooled according to graft source, and the meta-analysis of instrumented laxity measurements of >5 mm (Fig. 2) estimated an odds ratio of 1.23 favoring allograft (95% confidence interval, 0.52 to 2.92; p = 0.63). The corresponding funnel plot, which visually represents the standard error of the log odds ratio (a measure of precision) as a function of the odds ratio (a measure of the treatment effect), appears essentially symmetrical about the pooled estimate from the meta-analysis and is shaped like an inverted funnel, indicating no gross publication bias (see Appendix).
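A funnel plot places each study at its odds ratio (horizontal axis, usually on a log scale) and the standard error of its log odds ratio (vertical axis, inverted so that precise studies sit at the top). Pseudo 95% limits widen as the standard error grows. The sketch below computes these coordinates and limits from hypothetical 2 × 2 tables of knees with >5 mm laxity, using the pooled odds ratio of 1.23 from the text as the reference line. The left/right count is only a crude visual aid. A formal asymmetry test, such as Egger's regression, is not shown.

```python
import math


def funnel_points(tables):
    """(odds ratio, SE of log odds ratio) per study: the x and y of a funnel plot.
    Each table is (events_autograft, n_autograft, events_allograft, n_allograft)."""
    points = []
    for ev_a, n_a, ev_b, n_b in tables:
        cells = [ev_a, n_a - ev_a, ev_b, n_b - ev_b]
        if 0 in cells:
            cells = [x + 0.5 for x in cells]  # continuity correction
        a, b, c, d = cells
        points.append(((a * d) / (b * c), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)))
    return points


def funnel_limits(pooled_or, se, z_crit=1.959964):
    """Pseudo 95% limits of the funnel at a given SE, symmetric on the log scale."""
    return (pooled_or * math.exp(-z_crit * se), pooled_or * math.exp(z_crit * se))


def side_counts(points, pooled_or):
    """Crude symmetry check: studies to the left and right of the pooled estimate."""
    left = sum(1 for o, _ in points if o < pooled_or)
    right = sum(1 for o, _ in points if o > pooled_or)
    return left, right


# Hypothetical counts of knees with >5 mm laxity; pooled OR taken from the text.
tables = [(5, 50, 5, 50), (4, 30, 3, 30), (2, 40, 3, 40), (6, 45, 4, 44)]
points = funnel_points(tables)
print(side_counts(points, 1.23), funnel_limits(1.23, 0.5))
```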
Complications
With respect to donor-site symptoms (see Appendix), three studies evaluated anterior knee pain, and all three demonstrated no significant difference between graft types36,41,44. Two studies evaluated patellofemoral pain or retropatellar pain, and both studies demonstrated no significant difference between graft types37,43. Peterson et al. reported that the rate of incisional site complaints was 53% (sixteen of thirty) in the autograft group and 7% (two of thirty) in the allograft group42. Gorschewsky et al. reported that the rate of kneeling pain or paresthesias was 50% (fifty of 101) in the autograft group and 0% (zero of eighty-five) in the allograft group39.
Four studies evaluated the deep infection rate and demonstrated no infections in either group36,38,42,43. Similarly, within each study, there were no significant differences between the autograft and allograft groups in terms of arthrofibrosis37,38,42,43 or reoperation rates37,41,43.
Failures were reported in seven of the eight studies that were included in the meta-analysis36-38,41-44. The authors of the other study were contacted but were not able to provide data on clinical failures from that study40. Failures were not defined identically in all studies. The criteria for clinical failure included revision anterior cruciate ligament reconstruction36,38,41,43, traumatic graft rupture37,42,44, positive Lachman testing with complaints of instability42, and a combination of positive Lachman testing, positive pivot-shift testing, and a side-to-side difference of ≥5 mm on arthrometer testing36,43. The clinical failures from these seven studies were pooled according to graft choice, and the meta-analysis (Fig. 3) estimated an odds ratio of 0.61 favoring autograft (95% confidence interval, 0.21 to 1.79; p = 0.37). The corresponding funnel plot appeared somewhat asymmetrical about the pooled estimate and did not have the characteristic inverted funnel shape, reflecting the very low number of events (failures) and also possibly reflecting a publication bias against small studies that favor allograft success (see Appendix). Of note, in the study by Gorschewsky et al. that was excluded from the meta-analysis, the allograft failure rate was 45% (thirty-eight of eighty-five) and the autograft failure rate was 6% (six of 101).
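Two quick calculations put these figures in context. First, a crude odds ratio (autograft relative to allograft) computed from the Gorschewsky et al. failure counts quoted above shows how far that study lies from the pooled estimate of 0.61. It is unadjusted, derived here only from the reported counts, and not reported by the original authors. Second, the p-value implied by the pooled odds ratio and its 95% confidence interval is recovered on the log scale, assuming normality. It agrees with the reported p = 0.37.

```python
import math


def crude_odds_ratio(ev_a, n_a, ev_b, n_b):
    """Unadjusted odds ratio of failure, arm A relative to arm B."""
    return (ev_a / (n_a - ev_a)) / (ev_b / (n_b - ev_b))


def p_from_ratio_ci(ratio, lower, upper, z_crit=1.959964):
    """Two-sided p-value implied by a ratio and its 95% CI (normal on the log scale)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Autograft 6 of 101 failures versus allograft 38 of 85 failures (counts from the text).
gorschewsky_or = crude_odds_ratio(6, 101, 38, 85)
pooled_p = p_from_ratio_ci(0.61, 0.21, 1.79)
print(round(gorschewsky_or, 3), round(pooled_p, 2))  # 0.078 0.37
```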
Sensitivity Analysis
The inclusion of only the studies involving the use of bone-patellar tendon-bone graft did not change the findings. Specifically, the instrumented laxity results of six studies involving bone-patellar tendon-bone graft36,37,40-43 were pooled according to graft source, and the meta-analysis of instrumented laxity measurements of >5 mm estimated an odds ratio of 1.02 favoring allograft (95% confidence interval, 0.40 to 2.59; p = 0.67). Similarly, the clinical failures from six studies of bone-patellar tendon-bone graft36,37,41-44 were pooled according to graft choice, and the meta-analysis estimated an odds ratio of 0.34 favoring autograft (95% confidence interval, 0.09 to 1.27; p = 0.11).
When the instrumented laxity cut-off value for stability was defined as 3 mm (rather than 5 mm as above) and the instrumented laxity results of six studies36-38,41-43 were pooled according to graft source, the meta-analysis of instrumented laxity measurements of >3 mm estimated an odds ratio of 1.03 favoring allograft (95% confidence interval, 0.60 to 1.78; p = 0.91).
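Changing the cut-off changes how individual arthrometer readings are dichotomized, and therefore changes the 2 × 2 counts that enter the pooled odds ratio. The sketch below applies a strict ">" comparison, matching the ">5 mm" and ">3 mm" wording above. The side-to-side differences are hypothetical.

```python
def count_exceeding(side_to_side_mm, cutoff_mm):
    """Number of knees whose side-to-side laxity difference exceeds the cut-off."""
    return sum(1 for d in side_to_side_mm if d > cutoff_mm)


def two_by_two(autograft_mm, allograft_mm, cutoff_mm):
    """(events_autograft, n_autograft, events_allograft, n_allograft) at one cut-off."""
    return (count_exceeding(autograft_mm, cutoff_mm), len(autograft_mm),
            count_exceeding(allograft_mm, cutoff_mm), len(allograft_mm))


# Hypothetical arthrometer side-to-side differences (mm) for one study.
autograft_mm = [1.0, 2.5, 3.0, 3.5, 4.0, 5.0, 5.5, 7.0]
allograft_mm = [0.5, 2.0, 3.5, 4.5, 5.0, 6.0]
for cutoff in (5.0, 3.0):
    print(cutoff, two_by_two(autograft_mm, allograft_mm, cutoff))
```

With the lower cut-off, more knees in both arms count as events, so each study contributes more events to the pooled estimate. That tends to give a narrower confidence interval, as seen when comparing the >3 mm and >5 mm analyses.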
Furthermore, the meta-analysis results regarding instrumented laxity measurements and clinical failure proved to be robust when including studies of a specific secondary sterilization technique (irradiated or non-irradiated), minimum duration of follow-up (two or three years), mean patient age (less than or equal to thirty years or greater than thirty years), or study methodology (prospective or retrospective). Specifically, no significant difference in instrumented laxity measurements or the clinical failure rate was discovered between autograft and allograft in any scenario.
The key findings of the present systematic review and meta-analysis indicate that, in general, the short-term clinical outcomes of anterior cruciate ligament reconstruction with allograft are not significantly different from those with autograft. Specifically, the meta-analysis of Lysholm scores pooled according to graft source estimated a mean difference of 1.5 favoring autograft (95% confidence interval, -1.1 to 4.1; p > 0.25), which was not significant. (Of note, with respect to anterior cruciate ligament injuries, the minimum detectable change for the Lysholm score is 8.9 points45.) Similarly, the meta-analyses of instrumented laxity measurements and the clinical failure rate indicated odds ratios that were not significant. These findings were robust during the sensitivity analysis, which varied the included studies or variables on the basis of graft type, instrumented laxity cut-off value, secondary sterilization technique, duration of follow-up, mean patient age, and study methodology.
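Comparing the confidence interval with the minimum detectable change of 8.9 points cited in this paragraph shows that even the largest difference compatible with the data (4.1 points) is less than half of a change the instrument can reliably detect. The small helper below classifies an interval relative to such a threshold. The function and its category labels are illustrative assumptions.

```python
def relation_to_threshold(lower, upper, threshold):
    """Where a confidence interval for a difference sits relative to a clinical threshold."""
    if max(abs(lower), abs(upper)) < threshold:
        return "entire interval smaller than the threshold"
    if lower >= threshold or upper <= -threshold:
        return "entire interval beyond the threshold"
    return "interval overlaps the threshold"


MDC_LYSHOLM = 8.9  # points, as cited in the text
print(relation_to_threshold(-1.1, 4.1, MDC_LYSHOLM))
```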
The notable exception to these findings was the study by Gorschewsky et al.39, which failed the qualitative assessment and statistical tests of homogeneity and consequently was excluded from the meta-analysis. The allografts used in that study were sterilized with osmotic treatment, oxidation, and solvent drying with acetone. The patient-oriented outcomes, physical examination testing, instrumented laxity measurements, and complications in the allograft group in the study by Gorschewsky et al. were substantially worse than those in the other allograft treatment groups. For example, 45% (thirty-eight) of the eighty-five patients in the allograft group in that study were considered to have had a clinical failure39. The authors of that study suspected that the sterilization process contributed to the high failure rate and were planning to utilize allografts that were fresh-frozen or freeze-dried in the future, if needed39.
With respect to internal validity, the nonrandomized design of the included studies challenges the validity of clinical inferences regarding associations between graft choice and outcome. In particular, selection bias may have been introduced by the determination of treatment on the basis of patient choice. For example, in four of five studies in which patient choice was a component of treatment determination, the mean age of the patients in the autograft group was younger than the mean age of the patients in the allograft group36-38,42,43. None of those studies stratified outcomes according to age or utilized multivariable modeling to mathematically control for age (or any other possible confounder not equally distributed in the two treatment groups). Therefore, another factor or confounder may be masking a true association between graft choice and outcome. Similarly, the associations between graft choice and clinical outcome may be distorted by biological or statistical interaction due to the interdependent operation of these factors, such as age, activity level, and graft choice.
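Stratification is one way to control for a confounder such as age. The sketch below uses the Mantel-Haenszel odds ratio across age strata, an alternative to the multivariable modeling mentioned above. The counts are hypothetical and were chosen so that younger patients both receive autograft more often and fail more often, while graft choice has no effect within either stratum. Collapsing the strata produces a spurious association, and the stratified estimate removes it. Confounding can equally hide a real association, which is the concern raised in the text.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled across strata. Each stratum is (a, b, c, d):
    a/b = autograft failures/non-failures, c/d = allograft failures/non-failures."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def crude_or(strata):
    """Odds ratio after collapsing the strata into one table (ignores age)."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return (a * d) / (b * c)


# Hypothetical counts: age <=30 (mostly autograft, more failures) and age >30.
strata = [(16, 64, 4, 16), (1, 19, 4, 76)]
print(round(crude_or(strata), 2), round(mantel_haenszel_or(strata), 2))  # 2.36 1.0
```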
Furthermore, substantial dropout bias may have been introduced because two of the studies had treatment arms with <60% follow-up of the total number of eligible patients40,42. Missing data may not be missing at random. Additionally, four studies36,40,42,43 did not involve independent examiners, which may have contributed some observer bias—a distortion, conscious or unconscious, in the perception or reporting of measurements18.
With respect to external validity, no characteristics of the study patients were identified in these studies that would preclude generalization of these results to patients in the population with anterior cruciate ligament rupture. However, these results may not be generalizable to specific subsets of patients with an anterior cruciate ligament rupture, such as elite athletes, very young patients, or very old patients.
Our results are consistent with those of a recently published systematic review and meta-analysis of prospective trials involving the use of bone-patellar tendon-bone autograft and bone-patellar tendon-bone allograft tissue for anterior cruciate ligament reconstruction15. After excluding the study by Gorschewsky et al.39 from the analysis, the authors concluded that no significant differences were found between patients managed with bone-patellar tendon-bone autograft and those managed with bone-patellar tendon-bone allograft with respect to graft rupture, the rate of reoperation, normal or near normal International Knee Documentation Committee scores, Lachman testing, pivot-shift testing, patellar crepitus, the hop test, or return to sports activity15. The results of another meta-analysis of arthrometric stability of autografts and allografts after anterior cruciate ligament reconstruction indicated that allografts had significantly lower normal stability rates than autografts16. These findings differ somewhat from ours, primarily for three reasons. First, the selection of studies included case series as well as comparative studies16. Second, the statistical analysis did not employ a random-effects model, which is more conservative than the meta-analytic method that those authors used. Third, the authors did not perform statistical tests of homogeneity and did not exclude the study by Gorschewsky et al.39
The ideal study design to assess the outcomes of autograft as compared with allograft is a randomized clinical trial. However, there are inherent ethical and practical concerns involved with randomizing a patient to possibly receive cadaveric tissue. Many patients have a preference for autograft or allograft tissue. Consequently, a high-quality prospective comparative study is the next-best option. As treatment assignment is nonrandom in this setting, multivariable modeling may be utilized to control mathematically for possible confounding variables (such as age, activity level, and associated injuries) so that the effect of autograft or allograft selection can be estimated with less confounding46.
In the current systematic review and meta-analysis, the short-term clinical outcomes of anterior cruciate ligament reconstruction with allograft were not significantly different from those with autograft, in general. However, it is important to note again that none of these nonrandomized studies stratified outcomes by age or utilized multivariable modeling to control mathematically for age (or any other possible confounder, such as activity level, that is not equally distributed in the two treatment groups). Understanding these limitations of the best-available evidence, the surgeon may incorporate the results of the present systematic review into the informed-consent and shared-decision-making process in order to individualize optimum patient care.