Commentary and Perspective
Can Statistics Alone Add Clinical Meaning to Non-Specific Billing Databases? Commentary on an article by Richard A. Deyo, MD, MPH, et al.: “Revision Surgery Following Operations for Lumbar Stenosis”
Alan B.C. Dang, MD; Steven R. Garfin, MD

None of the authors received payments or services, either directly or indirectly (i.e., via his or her institution), from a third party in support of any aspect of this work. One or more of the authors, or his or her institution, has had a financial relationship, in the thirty-six months prior to submission of this work, with an entity in the biomedical arena that could be perceived to influence or have the potential to influence what is written in this work. No author has had any other relationships, or has engaged in any other activities, that could be perceived to influence or have the potential to influence what is written in this work. The complete Disclosures of Potential Conflicts of Interest submitted by authors are always provided with the online version of the article.

Copyright © 2011 by The Journal of Bone and Joint Surgery, Inc.
J Bone Joint Surg Am, 2011 Nov 02;93(21):e128(1-2). doi: 10.2106/JBJS.K.01146
Revision surgery after an operation for the treatment of lumbar stenosis is an undesired outcome. In an attempt to further understand the incidence of revision surgery after the initial treatment of lumbar spinal stenosis, the authors of the current paper used the Medicare Provider Analysis and Review (MedPAR) billing database from 1999 to 2008. As with any retrospective study that relies on billing data to draw clinical conclusions, one must recognize and understand the limitations of the potentially flawed and biased source data, which have no easily verifiable inputs or controls. In this case, the authors simply accepted the data as presented.
The MedPAR database contains information about all Medicare hospital claims submitted for reimbursement. Importantly, the database contains only broad, categorically based, facility ICD-9-CM billing codes selected by billing personnel after the fact, rather than the surgeon-selected Healthcare Common Procedure Coding System (HCPCS) Level-I Current Procedural Terminology (CPT) code, which itself does not always accurately reflect what was done. These data reflect only what hospitals need in order to collect payment; they include no other scientific or clinical facts and no discussion of indications or techniques. The multicenter study from 1997 that the authors cited to support their analysis of coding accuracy indicated that only seventeen of sixty-one patients who had undergone previous lumbar surgery actually had the appropriate code for the diagnosis1. The abstract states, “Although there were both false-negative and false-positive codes at each institution, most errors were related to the low sensitivity of coding for previous spinal operations: only seventeen (28 per cent) of sixty-one such diagnoses were coded correctly.”1
Newer studies evaluating the accuracy of ICD-9-CM-based databases have shown a similarly disappointing quality of billing data. Over a six-month period in 2008, Campbell et al. performed a prospective assessment of complications following spine surgery at a single institution and then compared the data against ICD-9-CM codes that had been submitted for billing2. In that study, the ICD-9-CM database captured hospital complications of limited clinical consequence such as transient hyponatremia that were not captured by the prospectively collected data but underreported cases of infection (p = 0.003), deep wound infection (p < 0.0001), deep-vein thrombosis (p = 0.0025), and new neurological deficits (p = 0.04). This finding suggests that codes for billing might not reflect “reality.”
While underreporting of prior lumbar surgery might blur the differences between groups, underreporting of infection would alter the paper's results; infection is more likely with instrumentation than without. In the current study, the authors conclude that “At four years, complex arthrodeses were associated with the highest probability of repeat surgery.” The absolute increase in the risk of repeat surgery with complex arthrodesis at four years, as compared with simple decompression or arthrodesis, was small (2.8%) but significant (p < 0.05). However, if the MedPAR database failed to document infections in just fifty-nine of the 3159 reported patients undergoing a complex arthrodesis, then there would no longer be any significant difference. The appropriate future direction would not be the “need to clarify indications for arthrodesis in spinal stenosis” but could be the need to identify protocols and tools that will decrease the rates of infection, or, better yet, to improve the reporting of factual indications, techniques, and (adverse) occurrences before the data are used as facts.
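The fragility argument above can be sketched numerically with a standard pooled two-proportion z-test. In the sketch below, the group sizes and reoperation counts are purely illustrative assumptions (apart from the reported group size of 3159 complex-arthrodesis patients); they were chosen only to produce a difference of roughly 2.8 percentage points, not taken from the study's actual figures.

```python
import math

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail probability

# Hypothetical counts: 3159 complex-arthrodesis patients (the study's reported
# group size) with an assumed 499 reoperations, versus an assumed comparison
# group of 3000 patients with 390 reoperations (~2.8-point rate difference).
p_reported = two_proportion_p(499, 3159, 390, 3000)

# The same comparison after reclassifying 59 of the complex-group reoperations
# (e.g., as undocumented infections), per the commentary's argument.
p_adjusted = two_proportion_p(499 - 59, 3159, 390, 3000)

print(f"reported difference: p = {p_reported:.4f}")   # well below 0.05
print(f"59 patients shifted: p = {p_adjusted:.4f}")   # no longer significant
```

Under these assumed counts, shifting just 59 of roughly 6000 patients moves the comparison from clearly significant to non-significant, which is the sense in which the study's headline conclusion is sensitive to small coding errors.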
Clinicians and scientists must work with the tools that they are given. While multicenter, prospective, randomized trials are ideal tools, they are likely to be even more difficult to conduct in our era of a global economic slowdown and diminishing research funding. Observational studies such as the one performed here have limited value as research tools. There is often insufficient clinical detail to allow further investigation of underlying factors that affect outcome as well as no independent assessment of the reliability or correctness of the data. These details are needed to supplement our understanding of outcomes, particularly those with complications that are delayed or that occur infrequently. The interpretation and discussion of results from database-driven observational studies are very sensitive to bias. These data are at best self-reported or reported by a hospital coder with no clinical perspective or knowledge. There are no controls, validation, or verification of the input diagnosis, surgical indications, techniques, or even subsequent surgeries. Analyzing unverified, incomplete, and occasionally flawed data without oversight to ensure standardization or precise inclusion criteria in a statistically “correct” manner is not enough to produce scientifically or clinically valid or valuable conclusions. For observational studies to be clinically useful in today's era of evidence-based medicine, there must be independent patient and data reviews to ensure the quality of the source data.
Taken at face value, this paper suggests that arthrodeses increase cost, in both the short term and the long term, because of repeat surgeries. Given the bias, potentially inherent in this paper, that spine surgeons perform too many arthrodesis procedures when decompression alone would suffice, an insurance carrier might choose to reject all claims for arthrodesis with a diagnosis code of lumbar stenosis. However, this observational report cannot identify cases of regional kyphosis, spondylolisthesis, dynamic instability documented on preoperative imaging, iatrogenic instability, or even more complex disease, all of which are accepted relative indications for arthrodesis following decompression.
As the authors point out, the wide geographic variation in the rates of arthrodesis suggests that surgeon preference plays an important role. Surgeons likely apply different standards and sets of indications when deciding whether or not to perform an arthrodesis. We agree that some surgeons may be performing “too many arthrodeses.” However, this retrospective observational study is unable to provide evidence to support or refute that statement. It only provides evidence that our ability to answer clinical questions using billing databases is limited and imprecise. A difference of just fifty-nine patients can transform the authors' statement that “If patients with more complex surgery had more complex disease, the index procedure did not lower their risk of reoperation to the same level as that for patients having decompression alone” into “Patients with more complex disease undergoing complex surgery were able to lower their risk of reoperation to the same level as patients with early disease undergoing decompression only.” Billing databases do not provide a sufficient mechanism for subgroup analysis.
Spine clinicians should be shepherds of our profession. Ultimately, it should be clinically active physicians (e.g., academic societies) and not insurance carriers who determine what procedure, if any, is indicated for patients. In this report, the authors attempt to answer clinical questions with use of a billing database provided by an insurance carrier, which is both inaccurate and limited in scope. This is because there are no independently reviewed global data that are captured at the time of the patient encounter. Electronic medical records may have made data acquisition easier, but there has not been a similar improvement in distinguishing factual data from noise. Few systems are developed to support robust, accurate, consistent, and reliable retrospective and observational research studies.
The clinical questions and quandaries that physicians and patients face in the next decade will only get more complex as our armamentarium of procedures expands and the demand for patient outcome and cost-effectiveness data increases. Looking retrospectively at insurance-derived, physician self-entered, or hospital coder-entered databases does not scientifically present these data or truly help to differentiate treatment options. The attempt at “science” by the pure use of statistics, as is done in this paper, hopefully will not be our future. We must direct our efforts toward the development of research-grade electronic medical records that can capture accurate, consistent, algorithm-derived clinical data at the time of the patient-surgeon encounter.
References

1. Faciszewski T, Broste SK, Fardon D. Quality of data regarding diagnoses of spinal disorders in administrative databases. A multicenter study. J Bone Joint Surg Am. 1997;79:1481-8.
2. Campbell PG, Malone J, Yadla S, Chitale R, Nasser R, Maltenfort MG, Vaccaro A, Ratliff JK. Comparison of ICD-9-based, retrospective, and prospective assessments of perioperative complications: assessment of accuracy in reporting. J Neurosurg Spine. 2011;14:16-22.
Alan B.C. Dang, Steven R. Garfin
Posted on January 05, 2012
Clarifications to our original commentary
University of California, San Diego

We agree that patients and physicians should be well-informed about major events such as repeat surgery at the time of making clinical decisions, but your study does not provide the appropriate data needed to allow patients and physicians to be well-informed. Additionally, we believe that arthrodesis surgery and complex fusions may not prevent the need for future surgery in all patients, but your study does not provide the appropriate evidence to support that claim. This is not an error in the statistical methods, but an inherent limitation of the MedPAR database. The purpose of our commentary was to 1) discuss the limitations of using hospital self-coded billing databases to attempt to extract clinical information, 2) emphasize the need for the development of new registries and electronic medical records that are capable of answering clinical questions more accurately and 3) put the paper in clinical context. The authors appropriately pointed out the limitations of a randomized clinical trial for studying events with infrequent occurrences. But it is wrong to say that the MedPAR database is 'well-suited' to study rates of repeat surgery. Though it may be a better option than others currently available, there are many limitations, some of which we have discussed. Neither a small randomized trial nor a retrospective look at the MedPAR database is sufficient to study rates of repeat surgery. If no difference between treatment groups is found in a hypothesis-driven research study, researchers are obliged to consider the risk of a Type I and Type II error. What are the chances that a difference was reported as existing, but there was truly no difference, or vice-versa? In a parallel fashion, we should be just as critical with conclusions drawn from retrospective reviews. 
To help determine the strength of the conclusions made, we used infection only as a point of discussion as to how small errors could be made and lead to a major change in the conclusion, and to show that the sensitivity of the data is just fifty-nine patients. Presented another way, if each of the 50 U.S. states made a mistake in categorizing 1.2 patients at any point from 2001 to 2008, the presented conclusion would not have statistical significance. Our discussion of infection was intended only as a hypothetical example. Although you have cited a 2004 paper from Neurology Today (1) about increased coding audits in the field of neurology with quotes and expert opinions, the reality is very different. An independent investigation by ABC News in 2009 estimated that $60 billion in Medicare fraud occurs annually (2). The National Health Care Anti-Fraud Association estimates that $68 billion is lost to fraud annually (3). Most relevant is our referenced paper, published in 2011, which compared coding in a retrospective and prospective manner for patients undergoing spine surgery (4). Even this recently, and even in an academic institution, substantial mistakes were still being made with coding. We brought up the issue of reimbursement and costs because that is integral to the issue. The MedPAR database isn't magical. It's simply convenient. It is not a robust clinical database with the tools we need to answer many clinical questions. The MedPAR database is generated by hospitals entirely for reimbursement and costs. Data entry into the MedPAR database is done by coders with little clinical knowledge and includes only the ICD-9-CM codes that most closely reflect the billable services provided under available guidelines. The information put into the database is not entered with the foresight that the data will be reviewed in the future to help make clinical decisions.
The information in the MedPAR database available for us to analyze is limited to what was determined as important by policy makers and insurers, not clinician scientists. Although your intended audience is the clinician, and your intention is to appropriately warn clinicians that arthrodesis must be used judiciously, JBJS does not exist in a bubble, and the original manuscript as written can be used by an insurance carrier to reject all claims for all arthrodeses with a diagnosis code of lumbar stenosis. Your study highlights the daunting task that we face ahead of us. Even with the best available dataset and the most rigorous statistics available, we are unable to define the exact indications for arthrodesis. The appropriate next step is to improve the quality of the data. This means moving away from databases such as MedPAR that are designed from the point of view of the insurance carrier and policy maker. This is why we stated that we must direct our efforts toward the development of research-grade electronic medical records that can capture accurate, consistent, algorithm-derived clinical data at the time of encounter. To prevent any further miscommunication or misinterpretation, we hope that JBJS readers who are following our current conversation understand: 1) The limitations of using billing databases to attempt to extract clinical information. Specifically, that MedPAR and similar billing databases are not designed to contain useful clinical data. The clinical value is a statistical byproduct. Additionally, most of these billing databases do not distinguish between clinician-driven and coder-driven input. In the specific study, the conclusion presented with statistical significance could lose significance with a margin of error of just 59 patients over the entire 2001-2008 dataset, as in our example with infections. Strengthening the differences between groups may require databases with additional clinical verification of patient information.
2) The need for the development of new registries and electronic medical records. The current study has excellent statistical analysis, but limited clinical value. This is the fault of the source database. Only by supporting the development of clinician-driven databases can we improve our retrospective analyses. If we do not have clinician-driven databases, then all of our decisions are subject to the data that are provided to us by policy makers and insurance companies via their billing databases. 3) The clinical context of the study. Surgeons should recognize that indications for arthrodesis should be specific. Performing complex fusions in order “to prevent further surgery” without any specific indication is not something that is supported by the best available data. In our opinion, arthrodesis should be reserved for regional kyphosis, spondylolisthesis, dynamic instability documented on preoperative imaging, iatrogenic instability, and more complex disease. These are relative, not absolute, indications for arthrodesis following decompression. Fusions should only be used to solve a patient's current needs, when appropriate, taking into consideration short- and long-term consequences. We hope this additional comment clarifies the misconceptions. REFERENCES: (1) Avitzur O. Are you ready for an audit? Neurology Today 2004; 4(3): pp 25,28,31. (2) http://abcnews.go.com/Nightline/medicare-fraud-costs-taxpayers-60-billion-year/story?id=101265553. (3) http://www.gwumc.edu/sphhs/departments/healthpolicy/dhp_publications/pub_uploads/dhpPublication_924894E4-5056-9D20-3DA16EE2DF2E2336.pdf (4) Campbell PG, Malone J, Yadla S, et al. Comparison of ICD-9-based, retrospective, and prospective assessments of perioperative complications: assessment of accuracy in reporting. J Neurosurg Spine 2011;14:16-22.

Richard A. Deyo, MD, MPH; Brook I. Martin, PhD, MPH; William Kreuter, MPA; Jeffrey G. Jarvik, MD, MPH; Heather Angier, MPH; Sohail K. Mirza, MD, MPH
Posted on November 21, 2011
Possible Misconceptions
Department of Family Medicine, Oregon Health and Science University, Portland, OR; Orthopaedic Surgery, Dartmouth-Hitchcock Medical Center, Lebanon, NH; Department of Health Services, University of Wa

We appreciate the commentary by Drs. Dang and Garfin[1] regarding our recent article on revision surgery for spinal stenosis[2], but are concerned about some possible misconceptions in their discussion. We fully appreciate the limitations of the ICD-9-CM diagnosis and procedure codes. Earlier studies have indeed suggested that the ICD-9 codes underestimate the prevalence of previous surgery. This is why we not only searched for ICD-9 codes suggesting repeat surgery, but also examined 3 prior years of actual claims to identify back operations. This process greatly increased the number of previous back operations identified, though undoubtedly some earlier operations remained unrecognized. We are puzzled by the commentary's focus on infections, as we did not attempt to identify infections per se. Instead, we sought to identify repeat surgery for any reason. Reoperations within the first 30 days of the index procedure, when most infections would be treated, were distinctly uncommon, as the graph of cumulative reoperations in figure 1 suggests.[2] It appears that the vast majority of repeat operations are unrelated to infection. The authors refer to Medicare claims data as “unverified, incomplete, and occasionally flawed”, and later “inaccurate”. Undoubtedly, all research data are “occasionally flawed”. However, errors of classification would reduce differences among groups by diluting real differences, rather than accentuating them. For example, if we mistakenly categorized a fusion procedure as a decompression (underrepresenting complexity), it would make the decompression group look more like the fusion group. Most believe that the quality of claims data has improved in recent years, and the Centers for Medicare & Medicaid Services (CMS) has conducted many coding audits with substantial fines for poor documentation.[3] The commentary focuses repeatedly on costs and reimbursement for surgical procedures, yet neither is mentioned in our article.
By publishing our report in a clinical journal, we are targeting those making clinical decisions rather than policy makers or insurers. Patients and physicians should be well-informed about major events such as repeat surgery at the time of making clinical decisions. We have repeatedly heard the argument that more complex fusions are performed in order to prevent the need for future surgery, yet our data fail to support this rationale. We certainly agree that further rigorous investigation of the question is needed. Finally, researchers and clinicians should recognize that for uncommon adverse events, such as repeat surgery or unusual complications, randomized trials are rarely the ideal tool for quantification. Methodologists agree that most randomized trials are too small and too brief to identify infrequent events and those that occur years after treatment, and often exclude high-risk patients.[4] Repeat surgery is the type of event for which Medicare claims data are particularly well-suited, because virtually all inpatient events are identified for all patients. REFERENCES [1]. Dang ABC, Garfin SR. Can statistics alone add clinical meaning to non-specific billing databases? J Bone Joint Surg Am 2011; 93: e128(1-2). [2]. Deyo RA, Martin BI, Kreuter W, Jarvik JG, Angier H, Mirza SK. Revision surgery following operations for lumbar stenosis. J Bone Joint Surg Am. 2011; 93: 1979-86. [3]. Avitzur O. Are you ready for an audit? Neurology Today 2004; 4(3): pp 25,28,31. [4]. Chou R, Helfand M. Challenges in systematic reviews that assess treatment harms. Ann Intern Med 2005; 142: 1090-1099.
