Study Design
The SPORT was conducted at thirteen medical centers with multidisciplinary spine practices in eleven states in the United States. Institutional review board approval was obtained at each center. The trial was registered with ClinicalTrials.gov (NCT00000411). The SPORT included both a randomized cohort and a concurrent observational cohort of patients who declined randomization. This design improves the generalizability of the findings12. Additional background information is available in previous publications8-10,13,14.
Patient Population
All patients had neurogenic claudication or radicular leg pain with associated neurological signs, spinal stenosis seen on cross-sectional imaging, degenerative spondylolisthesis seen on standing lateral radiographs, symptoms that had persisted for at least twelve weeks, and physician confirmation that they were a surgical candidate. Patients with adjacent levels of stenosis were eligible, but those with spondylolysis and isthmic spondylolisthesis were not. Pre-enrollment nonoperative care was not specified but included physical therapy (68%), epidural injections (55%), chiropractic care (25%), anti-inflammatory medications (63%), and opioid analgesics (30%). Enrollment began in March 2000 and ended in February 2005.
Study Interventions
The protocol surgery consisted of a standard posterior decompressive laminectomy with or without bilateral single-level fusion (autogenous iliac crest bone-grafting with or without posterior pedicle screw instrumentation)8. The nonoperative protocol was "usual recommended care," which included, at a minimum, active physical therapy, education and counseling with instructions regarding home exercise, and nonsteroidal anti-inflammatory drugs if the patient could tolerate them8,15.
Study Measures
The primary end points were the Short Form-36 (SF-36) bodily pain and physical function scores16-19 and the American Academy of Orthopaedic Surgeons MODEMS (Musculoskeletal Outcomes Data Evaluation and Management System) version of the Oswestry Disability Index20 measured at six weeks, three months, six months, and yearly up to four years. If surgery was delayed for more than six weeks, additional follow-up data were obtained at six weeks and three months postoperatively. Secondary outcomes included patient self-reported improvement, satisfaction with current symptoms and with care21, the Stenosis Bothersomeness Index2,22, the Low Back Pain Bothersomeness Scale2, and the Leg Pain Bothersomeness Scale2. The treatment effect was defined as the difference in the mean changes, as compared with baseline, between the surgical and nonoperative groups (the difference in differences).
SF-36 scores range from 0 to 100 points, with higher scores indicating less severe symptoms; the Oswestry Disability Index ranges from 0 to 100 points, with lower scores indicating less severe symptoms; the Stenosis Bothersomeness Index ranges from 0 to 24 points, with lower scores indicating less severe symptoms; and the Low Back Pain and Leg Pain Bothersomeness Scales range from 0 to 6 points, with lower scores indicating less severe symptoms.
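As an illustration of the treatment-effect definition, the unadjusted calculation can be sketched in Python. The scores below are hypothetical values chosen for the example, not SPORT data, and the function names are assumptions introduced here; the estimates reported in this study came from adjusted longitudinal models rather than from raw means.

```python
from statistics import mean


def mean_change(baseline, follow_up):
    """Mean within-patient change from baseline (follow-up minus baseline)."""
    return mean(f - b for b, f in zip(baseline, follow_up))


def treatment_effect(surg_base, surg_follow, nonop_base, nonop_follow):
    """Difference in mean changes: surgical group minus nonoperative group."""
    return (mean_change(surg_base, surg_follow)
            - mean_change(nonop_base, nonop_follow))


# Hypothetical SF-36 bodily pain scores (0 to 100, higher = less severe).
surg_base, surg_follow = [30, 40, 35], [60, 70, 65]
nonop_base, nonop_follow = [35, 45, 40], [45, 55, 50]

effect = treatment_effect(surg_base, surg_follow, nonop_base, nonop_follow)
print(effect)  # mean changes are 30 and 10, so the effect is 20
```

Because higher SF-36 scores indicate less severe symptoms, a positive effect favors surgery on the SF-36 scales; on the Oswestry Disability Index and the bothersomeness scales, where lower scores indicate less severe symptoms, a negative effect favors surgery.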
Statistical Methods
The statistical methods for the analysis of this trial have been reported in previous publications9,10,13,14, and these descriptions are repeated or paraphrased here as necessary. In the initial analyses, the baseline characteristics were compared between the patients in the randomized cohort and those in the observational cohort and between the surgical and nonoperative groups in the combined cohort. The extent of missing data and the percentage of patients who had undergone surgery were calculated for each study group at each scheduled follow-up point. Baseline predictors of the time until surgical treatment (including treatment crossovers) in both cohorts were determined through a stepwise proportional-hazards regression model with a criterion of p < 0.1 to enter and p > 0.05 to exit. Predictors of adherence to the assigned treatment and missing follow-up visits at one, two, three, and four years were determined through stepwise logistic regression. The primary analyses consisted of comparisons of surgical and nonoperative treatments, on the basis of the changes from baseline at each follow-up visit, with a mixed-effects model of longitudinal regression that included a random individual effect to account for correlation between repeated measurements. The randomized cohort was initially analyzed on an intent-to-treat basis. Because of crossover, subsequent analyses were based on treatments actually received.
In the as-treated analyses, the treatment indicator was a time-varying covariate, allowing for variable times of surgery. In the intent-to-treat analyses, all times were from enrollment. In the as-treated analyses, the times were from the beginning of treatment (that is, the time of surgery for the surgical group and the time of enrollment for the nonoperative group). Therefore, all changes from baseline before surgery were included in the estimates of the nonoperative treatment effect. Changes after surgery were assigned to the surgical group, with follow-up measured from the date of the surgery.
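The as-treated assignment rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: days are counted from enrollment, the function name and arguments are hypothetical, and a visit on the day of surgery is treated here as postoperative, a boundary choice the text does not specify.

```python
from typing import Optional, Tuple


def as_treated(visit_day: int, surgery_day: Optional[int]) -> Tuple[str, int]:
    """Classify one follow-up visit under the as-treated rule.

    visit_day and surgery_day are days since enrollment (day 0);
    surgery_day is None for a patient who never had surgery.
    Returns (treatment group, days since the start of that treatment).
    """
    if surgery_day is not None and visit_day >= surgery_day:
        # After surgery: surgical group, follow-up measured from the surgery.
        return ("surgical", visit_day - surgery_day)
    # Before surgery (or never operated): nonoperative group,
    # follow-up measured from enrollment.
    return ("nonoperative", visit_day)


# Hypothetical patient who had surgery 200 days after enrollment.
for day in (42, 90, 180, 365):
    print(day, as_treated(day, surgery_day=200))
```

In this example the visits at days 42, 90, and 180 contribute to the nonoperative estimates, and the visit at day 365 is counted as 165 days after surgery.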
Repeated measures of outcomes were used as the dependent variables, and the treatment received was included as a time-varying covariate. Adjustments were made for postoperative visit times with respect to the time of surgery in order to approximate the designated follow-up times. Treatment comparisons were performed at designated follow-up times. In addition, a global significance test was based on the time-weighted average/area under the curve analysis over all time periods23.
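A time-weighted average over the follow-up period can be approximated as the area under the outcome curve divided by the length of follow-up. The sketch below uses the trapezoidal rule, which is an assumption made for illustration; the exact weighting used in the cited method23 is not described here, and the values are hypothetical.

```python
def time_weighted_average(times, values):
    """Area under the curve (trapezoidal rule) divided by total follow-up time."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time points and values")
    area = sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )
    return area / (times[-1] - times[0])


# Hypothetical changes from baseline at the scheduled visits (times in months).
months = [1.5, 3, 6, 12, 24, 36, 48]
changes = [10.0, 14.0, 16.0, 17.0, 16.0, 15.0, 15.0]
print(round(time_weighted_average(months, changes), 1))
```

Weighting by time means that the long intervals between the yearly visits contribute more to the average than the closely spaced early visits.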
As-treated estimates of treatment effect from the randomized and observational cohorts were analyzed to establish comparability. Subsequent analyses combined the two cohorts. To adjust for potential confounding, baseline variables that were associated with missing data or treatment received were included as adjusting covariates in longitudinal regression models. Computations were performed with the use of PROC MIXED for continuous data and PROC GENMOD for binary and non-normal secondary outcomes in the SAS software package (version 9.1; SAS Institute, Cary, North Carolina). Significance was defined as p < 0.05 on the basis of a two-sided hypothesis test with no adjustments made for multiple comparisons. The data for these analyses were collected through May 1, 2008.
Source of Funding
Sources of funding included the National Institute of Arthritis and Musculoskeletal and Skin Diseases (U01-AR45444) and the Office of Research on Women's Health, National Institutes of Health; and the National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention. The analyses and manuscript preparation were performed independently by the investigators.
Results
Overall, 607 of 892 eligible participants were enrolled in the degenerative spondylolisthesis SPORT trial; 304 were enrolled in the randomized cohort and 303, in the observational cohort. In the randomized cohort, 159 patients were assigned to receive surgery and 145, nonoperative treatment. Of the 159 assigned to receive surgery, 64% (101) underwent surgery by two years and 66% (105), by four years. Of the 145 patients assigned to receive nonoperative care, 49% (seventy-one) underwent surgery by two years and 54% (seventy-nine), by four years (Fig. 1). In the observational cohort, 173 patients initially chose surgery and 130 initially chose nonoperative care. Of the 173 who initially chose surgery, 97% (168) underwent surgery by two years, and no additional patients in this group had undergone surgery by four years. Of the 130 patients who initially chose nonoperative treatment, 25% (thirty-three) underwent surgery by two years and 33% (forty-three) underwent surgery by four years. In both cohorts combined, 395 patients underwent surgery within four years, with 87% (345) of them undergoing it within the first year. At four years, 35% (212) of the 607 patients had had only nonoperative treatment. A total of 601 patients (301 [99%] of the 304 enrollees in the randomized cohort and 300 [99%] of the 303 enrollees in the observational cohort), each with at least one follow-up visit in the four-year period, were included in the analysis (Fig. 1). The proportion of enrollees who supplied data at each follow-up interval ranged from 70% to 92%, with losses due to dropouts, missed visits, and deaths.
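The enrollment and surgery counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable and function names are illustrative.

```python
# Patients who underwent surgery by four years, out of each initial group
# (counts as reported in the text).
ENROLLED = 607
surgery_by_4y = {
    "randomized, assigned surgery": (105, 159),
    "randomized, assigned nonoperative": (79, 145),
    "observational, chose surgery": (168, 173),
    "observational, chose nonoperative": (43, 130),
}


def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


total_surgery = sum(n for n, _ in surgery_by_4y.values())
total_enrolled = sum(d for _, d in surgery_by_4y.values())
nonoperative_only = ENROLLED - total_surgery

for group, (n, d) in surgery_by_4y.items():
    print(f"{group}: {n}/{d} = {pct(n, d)}%")
print(total_surgery, nonoperative_only, pct(nonoperative_only, ENROLLED))
```

The four groups sum to the 607 enrollees, the surgery counts sum to the 395 patients reported as having undergone surgery, and the remaining 212 patients correspond to the 35% who had only nonoperative treatment.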
Patient Characteristics
A table in the Appendix shows the baseline characteristics and clinical findings of participants in the randomized and observational cohorts. The cohorts were remarkably similar except for their preferences for surgery (p < 0.001), with more randomized patients unsure of their preference (39% [117 of 301] compared with 7% [twenty of 300] in the observational cohort) and fewer randomized patients preferring either surgery (13% [thirty-eight of 301] compared with 43% [129 of 300]) or nonoperative treatment (15% [forty-four of 301] compared with 28% [eighty-three of 300]).
The table in the Appendix also shows summary statistics for the combined randomized and observational cohorts according to the treatment received. In the combined cohort, the patients who underwent surgery within four years were, at baseline, significantly younger and more likely to be receiving compensation (for example, Workers' Compensation or Social Security benefits) than those who received nonoperative treatment. They also had significantly worse pain, function, disability, and symptoms than the patients in the nonoperative group. The patients in the surgery group were significantly more dissatisfied with their symptoms and more often rated their symptoms as worsening at the time of enrollment; these patients also definitely preferred surgery.
These observations highlight the need to control for baseline differences in the adjusted models. On the basis of the selection procedure for variables associated with treatment, missing data, and outcomes, the final as-treated models controlled for the following covariates: age, sex, work status, body mass index, neuroforamen involvement, depression, osteoporosis, joint problems, duration of current symptoms, reflex deficit, number of moderately or severely stenotic levels, hypertension, treatment preference, other comorbidities (including stroke; cancer; fibromyalgia; chronic fatigue syndrome; posttraumatic stress disorder; alcohol or drug dependency; lung, liver, kidney, blood vessel, and nervous system disease; migraine; and anxiety), baseline SF-36 score, baseline Oswestry Disability Index, baseline Stenosis Bothersomeness score, and center.
Nonoperative Treatments
Nonoperative treatments used during the SPORT included physical therapy (43% [176 of 412]), epidural steroid injections (47% [192 of 412]), nonsteroidal anti-inflammatory drugs (54% [224 of 412]), and opioids (35% [146 of 412]). Nonoperative treatments were similar in the randomized cohort and the observational cohort, although more patients in the randomized cohort than in the observational cohort reported visits to a surgeon (48% [122 of 252] compared with 38% [sixty of 160], p = 0.04), receiving injections (51% [128 of 252] compared with 40% [sixty-four of 160], p = 0.04), and opioid use (40% [100 of 252] compared with 29% [forty-six of 160], p = 0.03).
Surgical Treatment and Complications
The median surgical time for the combined cohort was 198 minutes, with a mean blood loss of 583 mL (see Appendix). There was no significant difference between the cohorts with regard to the rates of intraoperative blood replacement; the postoperative transfusion rate was lower in the randomized cohort than in the observational cohort, but the difference did not reach significance (16% [twenty-nine of 178] compared with 24% [fifty-one of 209], p = 0.08). The most common surgical complication was a dural tear (11% [forty-one of 387]). The four-year reoperation rate was 15% (fifty-nine of 387), and the rate of recurrent stenosis was 5% (nineteen of 387).
Within four years after enrollment, there were seven deaths in the nonoperative group, compared with twenty-two deaths expected on the basis of age- and sex-specific mortality rates24, and seventeen deaths in the surgery group, compared with twenty-eight expected. The hazard ratio based on a proportional-hazards model adjusted for age was 1.9 (95% confidence interval, 0.76 to 4.6; p = 0.17). All twenty-four deaths were independently reviewed, and eighteen were judged to be not related to the treatment. Four deaths were of unknown cause but occurred between 621 and 1379 days following the surgery. Two deaths, both in the surgical group, were judged to be potentially related to the treatment; one patient died of respiratory distress thirty-two days after the surgery, and the other died of sepsis eighty-two days after the surgery.
Crossover
Nonadherence to the treatment assignment affected both arms of the SPORT: patients in the surgical arm chose to delay or declined surgery, and patients in the nonoperative arm crossed over to receive surgery (Fig. 1). The characteristics of crossover patients that differed significantly from those of the patients who did not cross over are shown in a table in the Appendix. Patients who crossed over to receive nonoperative care were older, had less pain and disability, were less bothered by their symptoms, and had stronger baseline treatment preferences for nonoperative care as compared with patients who did not cross over. In the group randomized to receive surgery, improvements in pain and function scores at the early follow-up intervals also predicted nonadherence with surgical treatment. In the group randomized to receive nonoperative care, those who crossed over to receive surgery were younger, more often married, and more dissatisfied with their symptoms, and they had a stronger baseline preference for surgery.
Main Treatment Effects
The intent-to-treat analysis of the randomized cohort showed no significant differences between surgery and nonoperative care on the basis of overall global hypothesis tests for differences in mean changes from baseline (Fig. 2). Estimated treatment effects at four years slightly favored nonoperative treatment but were not significant; these effects were -2.0 for SF-36 bodily pain (95% confidence interval, -8.6 to 4.6, p = 0.56), -3.1 for physical function (95% confidence interval, -9.2 to 3.0, p = 0.32), and 4.1 for the Oswestry Disability Index (95% confidence interval, -0.8 to 9.1, p = 0.10).
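The relationship between these confidence intervals and p values can be illustrated with a normal approximation. This is a rough consistency check under stated assumptions, not the method used in the study: it assumes a symmetric 95% interval of the form estimate ± 1.96 standard errors, whereas the reported values came from mixed-effects models, so small discrepancies from rounding are expected. The function name is hypothetical.

```python
import math


def approx_two_sided_p(estimate, ci_lower, ci_upper, z_crit=1.96):
    """Approximate two-sided p value from an estimate and its 95% CI.

    Assumes the interval is estimate +/- z_crit * SE (normal approximation).
    """
    se = (ci_upper - ci_lower) / (2 * z_crit)
    z = abs(estimate) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(z / math.sqrt(2))


# Four-year intent-to-treat effects as reported in the text.
reported = {
    "SF-36 bodily pain": (-2.0, -8.6, 4.6),
    "SF-36 physical function": (-3.1, -9.2, 3.0),
    "Oswestry Disability Index": (4.1, -0.8, 9.1),
}
for outcome, (est, lo, hi) in reported.items():
    print(f"{outcome}: p is approximately {approx_two_sided_p(est, lo, hi):.2f}")
```

Each interval includes zero, so the approximate p values all exceed 0.05, in agreement with the reported lack of a significant intent-to-treat effect.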
In the as-treated analysis, the treatment effects for the randomized and observational cohorts were similar at four years (Fig. 2). In the randomized and observational groups, respectively, these effects were 17.1 (95% confidence interval, 10.9 to 23.4) and 13.0 (95% confidence interval, 6.6 to 19.4) for bodily pain, 19.2 (95% confidence interval, 13.4 to 25.1) and 18.8 (95% confidence interval, 12.7 to 24.9) for physical function, and -16.2 (95% confidence interval, -20.7 to -11.6) and -12.2 (95% confidence interval, -16.8 to -7.5) for the Oswestry Disability Index.
The global hypothesis test comparing the as-treated treatment effects between the randomized and observational groups over all time periods showed no difference between the cohorts (p = 0.18 for bodily pain, p = 0.17 for physical function, and p = 0.77 for the Oswestry Disability Index).
Results from the intent-to-treat and as-treated analyses of the two cohorts are compared in Figure 2. The as-treated treatment effects significantly favored surgery in both cohorts. In the combined analysis, the treatment effects were significantly in favor of surgery for all primary and secondary outcome measures at each time point out to four years (Table I).
Subgroup Analyses
Table II shows the results of subgroup analyses comparing the time-weighted average outcomes between patients with and those without neurogenic claudication on baseline clinical examination and between patients with and those without a neurological deficit on baseline clinical examination. Approximately 85% of the subjects had neurogenic claudication, while the remaining 15% had more radicular symptoms with evidence of nerve root irritation. Those with neurogenic claudication had similar overall results of surgery, as compared with those without neurogenic claudication, but showed less improvement after nonoperative care; this resulted in a significantly greater surgical treatment effect in the neurogenic claudication group. The presence of a neurological deficit did not result in significant differences in either the surgical or the nonoperative outcomes, and there was no consistent difference in treatment effects.
Only 7% of the subjects had compensation claims, so this subgroup was too small to allow meaningful comparisons. Ancillary studies from the SPORT have provided information on the effect of baseline radiographic predictors (spondylolisthesis grade, degree of mobility on flexion radiographs, and disc space height) on outcomes25 as well as the relative outcomes of surgery in subgroups based on the type of fusion performed26.
Discussion
Our intent-to-treat analysis of patients who had had, for at least twelve weeks, signs and symptoms of degenerative spondylolisthesis with spinal stenosis confirmed by radiographic studies demonstrated that surgery had no significant advantage over nonoperative treatment; at the three- and four-year follow-up points, nonoperative treatment showed a slight but not significant advantage. However, these results must be viewed in the context of substantial rates of nonadherence to the assigned treatment. This mixing of treatments generally biases effect estimates toward the null9,10,13,14,27.
The treatment effect in favor of surgery that was found in the as-treated analysis suggests that the intent-to-treat analysis underestimated the true effect of surgery. The effect was seen as early as six weeks, was maximum by six to twelve months, and persisted over four years. The nonoperative treatment group demonstrated only modest improvement over time. The results in both treatment groups were maintained between two and four years.
This study provided an opportunity to compare the results between patients who were willing to participate in a randomized study (the randomized cohort) and those who were not (the observational cohort). These two cohorts were remarkably similar at baseline. Other than treatment preference, the only significant differences were small ones in the level and location of the stenosis on baseline imaging. The cohorts also had similar outcomes, with no significant differences between the treatment effects in the as-treated analyses, findings that support the validity of the combined analysis. Although these analyses were not based entirely on randomized treatment assignments, the results were strengthened by the use of specific inclusion and exclusion criteria, the large sample size, and the adjustment for potentially confounding baseline factors28.
Comparisons with Other Studies
The characteristics of the participants and the short-term outcomes in the SPORT as previously reported9 are similar to those in studies of degenerative spondylolisthesis and of mixed cohorts of patients who had stenosis with and without degenerative spondylolisthesis.
The surgical outcomes in the SPORT were generally similar to those in previous surgical series. Herkowitz and Kurz7 reported absolute improvements of 33% in scores for back pain and 55% in scores for leg pain (on 6-point scales) at an average of three years, results that are similar to the improvements of 30% and 43% (on 7-point scales), respectively, seen in the SPORT at four years. Also, the improvements at four years after surgery for degenerative spondylolisthesis in the SPORT were similar to the outcomes of surgery in the Maine Lumbar Spine Study (MLSS)2 of a mixed cohort of patients who had stenosis with and without degenerative spondylolisthesis. The improvements in the scores for the bothersomeness of stenosis, leg pain, and low-back pain were nearly identical in the two studies: -9.2, -3, and -2.1 points in the SPORT and -9.4, -3.5, and -1.7 points in the MLSS.
Carragee et al. reported on the outcomes of fusion with or without decompression in patients with isthmic spondylolisthesis29-31. They found improved outcomes with fusion independent of decompression, particularly among those with instability on flexion-extension radiographs. They also found that circumferential fusion had some early advantage over posterior fusion with instrumentation. However, isthmic spondylolisthesis and degenerative spondylolisthesis, the focus of our study, are quite different disease processes; in particular, decompression was important in our population because all patients had neurogenic claudication or radicular leg pain. In addition, Pearson et al. found that patients with degenerative spondylolisthesis who had had more mobility at baseline actually had better nonoperative outcomes than those with less movement as documented radiographically25, the opposite of what was found in patients with isthmic spondylolisthesis.
There was little evidence of harm from either treatment in our study. In the interval between two and four years, there were no cases of paralysis in either the surgical or the nonoperative group. The four-year rate of reoperation for recurrent stenosis or spondylolisthesis was 5%, and the overall reoperation rate increased from 12% at two years to 15% at four years (compared with 6.2% at four years in the MLSS2). The perioperative mortality rate remained unchanged at 0.5%, which is less than the 1.3% rate seen in Medicare patients after fusion surgery for spondylolisthesis24.
The four-year mortality rate was similar in the two treatment groups and was lower than actuarial projections. The rate in the nonoperative group was somewhat lower than that in the operative group, although not significantly so. It should be noted that higher rates of complications have been reported with increasing age and coexisting medical conditions32.
Limitations
A major limitation of this study is the marked degree of nonadherence to the randomized treatment. This reduced the power of the intent-to-treat analysis to demonstrate a treatment effect. Although the as-treated analysis lacked the strong protection from confounding conferred by randomization, these analyses were carefully controlled for important covariates and yielded results similar to those of prior studies2,7,9. Another limitation is the heterogeneity of the treatment interventions. The choice of nonoperative therapies as well as the decision regarding whether and how to perform the fusions were at the discretion of the treating physician and the patient. This resulted in a clinically relevant comparison of the effectiveness of current treatment practices but does not allow us to draw direct conclusions regarding the efficacy of one specific surgical technique compared with one specific nonoperative treatment.
Overview
In the as-treated analysis, combining the randomized and observational cohorts of patients with spinal stenosis secondary to degenerative spondylolisthesis, those treated surgically were found to have significantly greater improvement in scores for pain, function, satisfaction, and self-rated progress over four years compared with patients treated nonoperatively. The results in both groups were stable between two and four years.