Revision surgery after an operation for the treatment of lumbar stenosis is an undesired outcome. In an attempt to further understand the incidence of revision surgery after the initial treatment of lumbar spinal stenosis, the authors of the current paper used the Medicare Provider Analysis and Review (MedPAR) billing database from 1999 to 2008. As with any retrospective study relying on billing data to draw clinical conclusions, one must recognize and understand the limitations of potentially flawed and biased source data that have no easily verifiable inputs or controls. In this case, the authors simply accepted the data as presented.
The MedPAR database contains information about all Medicare hospital claims submitted for reimbursement. Importantly, the database contains only broad, categorically based, facility ICD-9-CM billing codes selected by billing personnel after the fact rather than the surgeon-selected Healthcare Common Procedure Coding System (HCPCS) Level-I Current Procedural Terminology (CPT) code, which itself does not always accurately reflect what was done. These data reflect only what hospitals need to collect payments; they include no other scientific or clinical facts and no discussion of indications or techniques. The multicenter study from 1997 that was cited by the authors to support their analysis of coding accuracy indicated that only seventeen of sixty-one patients who had undergone previous lumbar surgery actually had the appropriate code for that diagnosis1. The abstract states, “Although there were both false-negative and false-positive codes at each institution, most errors were related to the low sensitivity of coding for previous spinal operations: only seventeen (28 per cent) of sixty-one such diagnoses were coded correctly.”1
Newer studies evaluating the accuracy of ICD-9-CM-based databases have shown a similarly disappointing quality of billing data. Over a six-month period in 2008, Campbell et al. performed a prospective assessment of complications following spine surgery at a single institution and then compared the data against the ICD-9-CM codes that had been submitted for billing2. In that study, the ICD-9-CM database captured hospital complications of limited clinical consequence, such as transient hyponatremia, that were not captured by the prospectively collected data, but it underreported cases of infection (p = 0.003), deep wound infection (p < 0.0001), deep-vein thrombosis (p = 0.0025), and new neurological deficits (p = 0.04). This finding suggests that codes submitted for billing might not reflect “reality.”
While underreporting of prior lumbar surgery might blur the differences between groups, underreporting of infection would alter the paper's results; infection is more likely with instrumentation than without. In the current study, the authors conclude that “At four years, complex arthrodeses were associated with the highest probability of repeat surgery.” The absolute risk reduction from complex arthrodesis at four years as compared with simple decompression or arthrodesis was small (2.8%) but significant (p < 0.05). However, if the MedPAR database failed to document infections in just fifty-nine of the 3159 reported patients undergoing a complex arthrodesis, then there would no longer be any significant difference. The appropriate future direction would not be the “need to clarify indications for arthrodesis in spinal stenosis” but could be the need to identify protocols and tools that will decrease the rates of infection—or, better yet, to improve the reporting of factual indications, techniques, and (adverse) occurrences before the data are used as facts.
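The fragility of a finding to a handful of miscoded events can be illustrated with a simple two-proportion z-test. In the sketch below, only the complex-arthrodesis group size (3159) and the fifty-nine-patient shift come from the text above; the comparison-group size and both reoperation counts are assumed purely for illustration and do not come from the study.

```python
# Hypothetical sensitivity check: how miscoding 59 events out of 3159 can
# flip statistical significance. All counts marked "assumed" are invented
# for illustration; only n1 = 3159 and the 59-patient shift are from the text.
from math import sqrt, erfc

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided p from the standard normal
    return z, p

n1 = 3159       # complex-arthrodesis group size (from the study)
n2 = 10_000     # assumed comparison-group size
x2 = 780        # assumed reoperations in the comparison group (7.8%)

z_all, p_all = two_prop_z(335, n1, x2, n2)        # assumed 335 events coded
z_less, p_less = two_prop_z(335 - 59, n1, x2, n2) # 59 events miscoded/missing
print(f"as coded:      z = {z_all:.2f}, p = {p_all:.1e}")
print(f"59 fewer coded: z = {z_less:.2f}, p = {p_less:.3f}")
```

With these assumed rates, shifting fifty-nine events moves the comparison from highly significant (z ≈ 4.9) to nonsignificant at the 0.05 level (z ≈ 1.7, p ≈ 0.09), which is the kind of reversal the commentary describes.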
Clinicians and scientists must work with the tools that they are given. While multicenter, prospective, randomized trials are ideal tools, they are likely to become even more difficult to conduct in an era of global economic slowdown and diminishing research funding. Observational studies such as the one performed here have limited value as research tools. There is often insufficient clinical detail to allow further investigation of the underlying factors that affect outcome, and there is no independent assessment of the reliability or correctness of the data. These details are needed to supplement our understanding of outcomes, particularly those with complications that are delayed or that occur infrequently. The interpretation and discussion of results from database-driven observational studies are very sensitive to bias. These data are at best self-reported or reported by a hospital coder with no clinical perspective or knowledge. There are no controls, no validation, and no verification of the input diagnosis, surgical indications, techniques, or even subsequent surgeries. Analyzing unverified, incomplete, and occasionally flawed data in a statistically “correct” manner, without oversight to ensure standardization or precise inclusion criteria, is not enough to produce scientifically or clinically valid or valuable conclusions. For observational studies to be clinically useful in today's era of evidence-based medicine, there must be independent patient and data reviews to ensure the quality of the source data.
Taken at face value, this paper suggests that arthrodeses increase cost, in both the short term and the long term, because of repeat surgeries. Given the bias potentially inherent in this paper, namely that spine surgeons are performing too many arthrodesis procedures when decompression alone would be sufficient, an insurance carrier may choose to reject all claims for arthrodesis with a diagnosis code of lumbar stenosis. However, this observational report cannot identify cases of regional kyphosis, spondylolisthesis, dynamic instability documented on preoperative imaging, iatrogenic instability, or even more complex disease, all of which are accepted relative indications for arthrodesis following decompression.
As the authors point out, the wide geographic variation in the rates of arthrodesis suggests that surgeon preference plays an important role. Surgeons likely apply different standards and sets of indications when deciding whether or not to perform an arthrodesis. We agree that some surgeons may be performing “too many arthrodeses.” However, this retrospective observational study is unable to provide evidence to support or refute that statement. It provides evidence only that our ability to answer clinical questions with billing databases is limited and imprecise. A difference of just fifty-nine patients can transform the authors' statement that “If patients with more complex surgery had more complex disease, the index procedure did not lower their risk of reoperation to the same level as that for patients having decompression alone” into “Patients with more complex disease undergoing complex surgery were able to lower their risk of reoperation to the same level as patients with early disease undergoing decompression only.” Billing databases do not provide a sufficient mechanism for subgroup analysis.
Spine clinicians should be shepherds of our profession. Ultimately, it should be clinically active physicians (e.g., academic societies), and not insurance carriers, who determine what procedure, if any, is indicated for patients. In this report, the authors attempt to answer clinical questions with use of a billing database provided by an insurance carrier, a source that is both inaccurate and limited in scope because no independently reviewed global data are captured at the time of the patient encounter. Electronic medical records may have made data acquisition easier, but there has not been a similar improvement in distinguishing factual data from noise. Few systems are designed to support robust, accurate, consistent, and reliable retrospective and observational research studies.
The clinical questions and quandaries that physicians and patients face in the next decade will only become more complex as our armamentarium of procedures expands and the demand for patient-outcome and cost-effectiveness data increases. Retrospective analysis of insurance-derived, physician self-entered, or hospital coder-entered databases neither presents these data scientifically nor truly helps to differentiate treatment options. The attempt at “science” through the pure use of statistics, as in this paper, should not, we hope, be our future. We must direct our efforts toward the development of research-grade electronic medical records that can capture accurate, consistent, algorithm-derived clinical data at the time of the patient-surgeon encounter.