It has become increasingly important for surgeons and hospitals to measure and improve the quality of the surgical care that they deliver. Safety and quality in the perioperative period is one area that has garnered considerable attention, and over the last several years a number of large-scale, nationwide efforts have been initiated to champion this cause. These include the Surgical Care Improvement Project (SCIP) as well as the American College of Surgeons Surgical Quality Alliance and National Surgical Quality Improvement Program (ACS NSQIP)1-4. In the past, these efforts have primarily targeted general and vascular surgery; however, there is growing interest in expanding the scope of these programs to address the specific needs of other surgical specialties such as orthopaedics5.
Orthopaedic surgery is a broad field in which a diverse array of procedures is performed across a wide range of human anatomy. As such, it is not entirely clear where quality-improvement efforts should be directed to have the greatest impact on quality and safety. One approach would be to target procedures with the highest rates of adverse events. Another approach would be to target the most common procedures. A third approach would be to combine the two and target those procedures that generate the greatest number of adverse events6, that is, procedures for which the product of case volume and adverse-event rate is largest. Such a strategy would focus quality-improvement efforts on procedures that not only generate the greatest number of adverse events but also potentially offer the most room for improvement. Prior investigators have applied this exact strategy to general surgery and found that adverse events are indeed concentrated among a small group of approximately ten general surgical procedures6.
In this context, we sought to prioritize procedures for quality-improvement efforts in orthopaedic surgery by attempting to identify those procedures that generate the greatest number of adverse events. We first ranked procedures according to their relative contribution to the overall number of adverse events during the first thirty postoperative days. We then ranked procedures according to their relative contribution to overall excess length of hospital stay, a proxy for the additional costs imposed by complications. Finally, we performed a sensitivity analysis to assess how our findings were affected by using different definitions of adverse events and by adjusting adverse-event counts according to the patient's preoperative risk. Our intention is for surgeons and other stakeholders to use these rank lists as a guide to direct future safety and quality-improvement efforts.
Study Population
The methods used in this study mirror the methods that we previously used to rank general surgery procedures according to their relative contribution to the overall number of adverse events in that specialty6. As in that study, we used data from the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP public use files, 2005-2006 and 2007)2. ACS NSQIP has traditionally focused on general and vascular surgery rather than orthopaedic surgery. As such, data on orthopaedic cases are a small subset of the ACS NSQIP data set, since only eight hospitals participated in the program's multispecialty module during 2005 through 2006 and thirty hospitals did so during 2007. On the basis of data from these hospitals, we created a cohort of 7971 orthopaedic surgery cases by limiting ACS NSQIP data to cases where it was specified that the primary surgeon was an orthopaedic surgeon. Patients under the age of eighteen years were excluded. We also excluded one operative case (a femoral fracture repair) from our cohort because of the improbable combination of a 120-day length of hospital stay without a record of a single adverse event during the stay. After elimination of this case, the final cohort consisted of 7970 cases.
We used the Current Procedural Terminology (CPT) code of the principal operative procedure field to define the primary procedure of the case. We combined CPT codes into forty-four clinically recognizable procedure groups by referring to code groupings used by billing services7. We refined and finalized these groupings on the basis of our clinical judgment by considering each procedure's anatomic location and pathologic indication. The forty-four procedure groups accounted for 6597 (83%) of the 7970 orthopaedic cases in the cohort.
The other variables required for our study were limited to those providing information on adverse events occurring within the first thirty postoperative days. These variables included information on superficial or deep (incisional or organ space) surgical site infection, wound disruption, urinary tract infection, stroke or a cerebrovascular accident with a neurologic deficit, coma lasting longer than twenty-four hours, peripheral nerve injury, cardiac arrest requiring cardiopulmonary resuscitation, myocardial infarction, bleeding requiring transfusion, mechanical failure of an extracardiac graft, deep-vein thrombosis or thrombophlebitis, sepsis, septic shock, a return to the operating room, and death. Additional details about the data are well described elsewhere2.
Analysis
We examined the procedure groups according to their relative contribution to the overall number of adverse events. We began by calculating the total number of adverse events experienced by the patients in the entire cohort. We determined the proportion of this total attributable to each of the forty-four procedure groups. We then ranked the procedure groups in descending order according to their relative contribution to the cohort's total count of adverse events.
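The ranking step described above can be illustrated with a minimal sketch. The function name, the input layout (one pair per case), and the toy data are assumptions made for illustration only; they are not drawn from the ACS NSQIP files or from the code used in the study.

```python
from collections import Counter

def rank_by_event_share(cases):
    """Rank procedure groups by their share of all adverse events.

    `cases` is an iterable of (procedure_group, adverse_event_count) pairs,
    one pair per operative case (a hypothetical layout chosen for this sketch).
    Returns a list of (group, event_count, share_of_total) tuples in
    descending order of event count; ties are broken alphabetically.
    """
    totals = Counter()
    for group, n_events in cases:
        totals[group] += n_events
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(group, count, count / grand_total) for group, count in ranked]

# Toy data, invented for illustration only.
toy_cases = [
    ("hip fracture repair", 2),
    ("knee arthroscopy", 0),
    ("total knee arthroplasty", 1),
    ("hip fracture repair", 1),
    ("knee arthroscopy", 1),
]
ranking = rank_by_event_share(toy_cases)
for position, (group, count, share) in enumerate(ranking, start=1):
    print(position, group, count, f"{share:.0%}")
```

In this toy example, hip fracture repair ranks first with three of the five events (60%), even though it accounts for only two of the five cases.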
We conducted a sensitivity analysis to determine how our findings would be affected by modifying our definition of an adverse event. We first repeated the analysis after excluding superficial surgical site infection as an adverse event. (Superficial surgical site infections are defined by ACS NSQIP as involving only skin or subcutaneous tissues of an incision, but not deep soft tissues like fascial or muscle layers. ACS NSQIP uses separate variables for deeper surgical site infections [i.e., deep incisional and organ space infection].) This was followed by an analysis excluding urinary tract infection as an adverse event. We then repeated the analysis using mortality as the only adverse event. In a final analysis, we calculated the excess length of hospital stay associated with the adverse events within each procedure group. Excess length of hospital stay was calculated as the difference in the mean length of the stay between the cases with and those without adverse events following the operation, multiplied by the number of cases in the procedure group with one or more adverse events.
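As a rough sketch of the excess length-of-stay calculation for a single procedure group, the definition above can be written as follows. The function name and input layout are hypothetical, and the handling of groups that lack either cases with events or cases without events (returning zero) is an assumption of this sketch rather than something specified in the study.

```python
from statistics import mean

def excess_length_of_stay(cases):
    """Excess hospital days for one procedure group.

    `cases` is a list of (length_of_stay_days, had_adverse_event) pairs.
    Excess stay = (mean stay with events - mean stay without events)
                  * number of cases with one or more adverse events.
    """
    with_events = [los for los, had_event in cases if had_event]
    without_events = [los for los, had_event in cases if not had_event]
    if not with_events or not without_events:
        # Assumption: the difference in means is undefined here, so the
        # group is treated as contributing no excess days.
        return 0.0
    return (mean(with_events) - mean(without_events)) * len(with_events)

# Toy data, invented for illustration only.
toy_group = [(3, False), (5, False), (10, True), (8, True)]
excess_days = excess_length_of_stay(toy_group)
print(excess_days)  # (9 - 4) * 2 = 10
```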
Finally, we used the American Society of Anesthesiologists (ASA) physical status classification to adjust adverse-event counts for the patient's underlying illness and preoperative risk8. ASA scores were assigned to each patient before the operation. Scores range from 1, indicating a healthy patient, to 5, indicating a patient who is not expected to survive longer than twenty-four hours, with or without surgery. Between these end points, a score of 2 is assigned to patients with mild systemic illness; 3, to patients with severe systemic illness; and 4, to patients with severe systemic illness that poses a constant threat to life. We adjusted for preoperative risk within the procedure groups by dividing each procedure group's adverse event count by its mean ASA score. As such, greater weight was given to adverse events that occurred among healthier groups of patients than to those that occurred in less healthy patients.
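The adjustment described above amounts to a single division per procedure group. The following minimal sketch uses a hypothetical function name and invented numbers to show how two groups with identical raw counts diverge after adjustment.

```python
def asa_adjusted_count(event_count, asa_scores):
    """Divide a procedure group's adverse-event count by its mean ASA score.

    `asa_scores` holds one ASA class (1 through 5) per case in the group.
    """
    if not asa_scores:
        raise ValueError("at least one ASA score is required")
    mean_asa = sum(asa_scores) / len(asa_scores)
    return event_count / mean_asa

# Toy data, invented for illustration only: both groups have 12 raw events.
healthier_group = asa_adjusted_count(12, [1, 2, 3])    # mean ASA 2.0
sicker_group = asa_adjusted_count(12, [3, 4, 4, 5])    # mean ASA 4.0
print(healthier_group, sicker_group)  # 6.0 3.0
```

With the same raw count, the healthier group retains twice the adjusted count of the sicker group, which is the sense in which events among healthier patients receive greater weight.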
In each instance, we re-ranked the procedure groups according to their relative contribution to the specialty-wide total of the measure in question (excess length of hospital stay, adverse events, or mortality alone). We then compared the resulting rankings with those obtained under the initial assumptions.
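One simple way to compare a sensitivity-analysis ranking with the baseline ranking is to compute each procedure group's change in rank position. The sketch below is an illustration under assumed inputs (dictionaries mapping group names to scores); the study does not specify how the comparison was carried out, and the group labels and values are invented.

```python
def rank_positions(scores):
    """Map each group to its rank (1 = largest score); ties broken by name."""
    ordered = sorted(scores, key=lambda group: (-scores[group], group))
    return {group: position for position, group in enumerate(ordered, start=1)}

def rank_shifts(baseline, alternative):
    """Change in rank per group (positive = fell in the alternative ranking)."""
    base = rank_positions(baseline)
    alt = rank_positions(alternative)
    return {group: alt[group] - base[group] for group in base if group in alt}

# Toy scores, invented for illustration only.
baseline_counts = {"A": 10, "B": 8, "C": 5}
adjusted_counts = {"A": 6, "B": 7, "C": 2}
shifts = rank_shifts(baseline_counts, adjusted_counts)
print(shifts)  # {'A': 1, 'B': -1, 'C': 0}
```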
Source of Funding
No outside funding was received in support of this study.
In this study, we found that the vast majority of adverse events in orthopaedic surgery occur among a very small number of procedures. Only ten procedures accounted for 70% of the adverse events and 65% of the excess hospital days. Hip fracture repairs were responsible for the greatest share of adverse events, followed closely by hip and knee arthroplasty procedures. Other notable procedures include knee arthroscopy, laminectomy, lumbar/thoracic arthrodesis, and femoral fracture repairs. With the exception of knee arthroscopy and femoral fracture repair, the same procedures also accounted for the greatest share of deaths, excess hospital days, and adverse-event counts adjusted for the patient's preoperative risk.
In general surgery, the majority of adverse events are also clustered among a relatively small number of procedures6. Orthopaedics, like general surgery, is a diverse field with a wide variety of procedures. Nonetheless, despite the breadth of these fields, a "vital few" procedures still account for a majority of the adverse events. This is a finding that is consistent with a principle more widely known as the Pareto principle9, or the observation that the majority of some result or effect (in this case adverse events) can usually be accounted for by a minority of factors, agents, or causes (in this case operative procedures).
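The degree of concentration can be summarized by asking how many top-ranked procedure groups are needed to reach a given share of all adverse events. The following is a minimal sketch with a hypothetical function name and invented counts; it is not the calculation used to produce the figures reported in this study.

```python
def procedures_needed(event_counts, target_share):
    """Smallest set of top-ranked groups whose events reach `target_share`.

    `event_counts` maps procedure group -> adverse-event count;
    `target_share` is a fraction between 0 and 1.
    """
    total = sum(event_counts.values())
    if total == 0:
        return []
    selected, running = [], 0
    ordered = sorted(event_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for group, count in ordered:
        selected.append(group)
        running += count
        if running / total >= target_share:
            break
    return selected

# Toy counts, invented for illustration only.
toy_counts = {"A": 50, "B": 30, "C": 10, "D": 6, "E": 4}
vital_few = procedures_needed(toy_counts, 0.70)
print(vital_few)  # ['A', 'B']
```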
Clearly, procedure categories could have been "lumped" or "split" in different ways. For example, we might have opted to group revision joint arthroplasty procedures with primary arthroplasty procedures. Although such modifications would change the rank ordering of the procedures, we believe that these changes would have little effect on our overall conclusions. Hip fracture repairs and joint arthroplasty procedures are, without a doubt, major sources of adverse events in orthopaedic surgery, regardless of whether they rank first or second. We found that reasonable modification of other procedure category definitions resulted in similar, unimportant changes in procedure rank order.
There are a few procedure groups that deserve special attention. First, hip fracture repairs stand out as the largest contributor to perioperative adverse events in orthopaedic surgery. Hip fracture repairs alone accounted for nearly 20% of both excess hospital days and adverse events in the field, and their rank dropped only from first to second when adverse-event counts were adjusted for preoperative risk (ASA score). Moreover, as the baby boomer generation ages, hip fracture repairs are destined to account for an even larger share of cases and therefore adverse events in this field. These data clearly indicate the importance of targeting hip fracture repairs in future quality-improvement initiatives. The same demographic changes are likely to affect both total hip and total knee arthroplasty procedures as well.
Knee arthroscopy also deserves special comment. Knee arthroscopy ranked fifth in terms of its relative contribution to the overall number of adverse events, a finding that is somewhat surprising in light of its low rate of adverse events (approximately 1%). The procedure appeared toward the top of the rank list in large part because of how frequently it is performed; nearly a quarter of the cases in the cohort were knee arthroscopy procedures. As such, this procedure highlights a scenario in which a high-ranking procedure may not necessarily be an efficient target for tracking perioperative adverse events. Reviewing operative procedures is time-consuming and costly. Tracking knee arthroscopy cases for short-term adverse events would require the review of an extremely large number of operative cases in order to capture only 5% of the adverse events.
There are a number of study limitations that deserve consideration. First, if it were possible, we would have ranked procedures on the basis of counts of preventable adverse events (rather than observed events). After all, counts of preventable adverse events would allow us to target our quality-improvement efforts to only those adverse events that we could actually do something about. Some have suggested that preventable adverse events be teased out from nonpreventable events by using characteristics such as age, comorbidities, or severity of illness. This suggestion is closely related to, but not quite the same as, risk adjustment. Risk adjustment is likely to have an important role to play after a target list of procedures has been identified for quality-improvement efforts because it is the means by which we "level the playing field" for comparing outcomes across hospitals or surgeons with differences in case mix. Yet, disentangling the various components of risk in this study would be distracting—if it were feasible at all—because no amount of risk adjustment would enable us to distinguish preventable from nonpreventable adverse events. Moreover, the procedures in our rank lists are likely to require procedure-specific risk-adjustment models, a task that is well beyond the scope of this study and is best saved for future research.
Nonetheless, as a preliminary step in this direction, we weighted adverse-event counts for each procedure group by the group's average ASA score in an attempt to adjust for the patient's preoperative risk of complications. We found that these weights had almost no impact on our rank lists. The same small number of procedures accounted for the vast majority of adverse events in our cohort with only minor changes in rank order. We believe that this finding clearly speaks to the robust nature of our findings since the technique, in essence, gave a healthy cohort's adverse events four to five times the weight of the adverse events in a cohort consisting solely of patients with severe, life-threatening comorbid conditions.
A second concern relates to the relative weight given to different types of complications. Clearly, adverse events are not equal in their morbidity or severity, and it could be suggested that weights be applied to adverse-event counts to better reflect differences in the morbidity of one complication compared with that of another. However, such a process requires a series of ad hoc judgments about adverse-event severity that we chose to avoid. Instead, we addressed this issue by conducting a sensitivity analysis to determine how our findings would be affected by excluding superficial surgical site infection and urinary tract infection (arguably the least severe complications) as adverse events as well as by ranking procedures on the basis of mortality counts alone (almost certainly the most severe complication). Our procedure rankings were nearly identical in all three of these analyses. This homogeneity indicates that, although accounting for the relative severity of complications might be an important consideration, it does not change our central finding.
A third concern is that the types of hospitals included in our sample could affect the ability to generalize our findings. Large teaching hospitals are overrepresented in ACS NSQIP. In 2005 and 2006, 67% of participating hospitals were classified as teaching hospitals. The program is also expensive, and participation is generally limited to larger hospitals that can afford the over $100,000 annual cost of the program. Furthermore, although more than 180 hospitals participate in ACS NSQIP, only a small number participate in the program's multispecialty module that tracks the outcomes of orthopaedic surgery cases. As such, our findings are based on the experience of thirty hospitals and may not be representative of the larger universe of hospitals in the United States. If the thirty participating hospitals had extremely large or small case volumes for a specific procedure group (e.g., a very large or small number of spine cases), it could alter that procedure group's relative ranking. Similarly, a procedure group's relative ranking could also be affected by a hospital's unusually good or bad outcomes for that procedure. Although it is important to question the extent to which our findings can be generalized, our results do make intuitive sense, and we anticipate that a similar study performed on a larger, more diverse sample of hospitals would demonstrate very similar results with only minor changes in the rank order of procedures.
A fourth limitation relates to ACS NSQIP's traditional focus on general and vascular surgery, rather than orthopaedic surgery. This is reflected by two attributes of ACS NSQIP's sampling frame10. The sampling frame is defined by a CPT-code inclusion list, and, although ACS NSQIP samples an extremely wide variety of operations, it underrepresents operations for the repair of foot fractures by excluding the CPT codes for everything but talar and calcaneal fractures. Moreover, the ACS NSQIP sampling frame excludes a large number of procedures performed during acute trauma admissions. Procedures performed after discharge from the acute trauma stay are, however, always included. As a result, ACS NSQIP undercounts certain fracture-repair operations such as pelvic fracture repair. It logically follows that, if data were collected specifically for orthopaedic surgery procedures, certain fracture-repair operations would not only appear in greater numbers but also make a larger relative contribution to adverse-event counts within the cohort.
The absence of orthopaedic-specific outcome measures also reflects the database's traditional focus on general and vascular surgery. While ACS NSQIP does follow complications that have direct relevance to orthopaedics (e.g., deep-vein thrombosis and pulmonary embolism), there are many orthopaedic-specific complications that the program does not track. Examples include periprosthetic fracture, implant failure, and implant loosening. Moreover, the maximum duration of follow-up for ACS NSQIP is only thirty days. As such, the program does not track long-term functional outcomes, nor does it record instances of implant loosening or infection that might manifest months to years after the operation. Clearly, our study would have benefited from the addition of long-term, orthopaedic-specific outcomes; however, this limitation is not unique to our study. Rather, it speaks to the larger problem of the scarcity of good, multicenter sources of outcomes data tailored to the needs of orthopaedic surgery.
Although we cannot comment on long-term outcomes in orthopaedic surgery, our study does provide valuable information about adverse events within the immediate postoperative period, with implications for future quality-measurement and improvement efforts in orthopaedic surgery. Quality measurement and improvement have become increasingly important in recent years. Payers are looking for ways to measure quality as a means to reward hospitals and surgeons for superior performance (i.e., centers of excellence and pay-for-performance programs)11-14. Regulators, such as The Joint Commission, are emphasizing surgical safety as part of the credentialing process1. However, quality-measurement and improvement initiatives can be both expensive and time-consuming, and it is not always clear where efforts should be directed for the greatest impact on patient outcomes14.
One way for stakeholders to prioritize their efforts would be to target procedures that appear at the top of our rank lists. Since the top-ranking procedures generate the greatest numbers of adverse events, they can also be viewed as procedures that offer the most room for improvement. Focusing on a limited number of these high-yield procedures could greatly reduce the cost of data collection and increase the likelihood of success. Prioritizing these high-yield procedures might be a reasonable starting point for quality-measurement and improvement efforts in orthopaedic surgery.
These rank lists should not, however, be the sole consideration when prioritizing quality-improvement efforts. For example, lower-extremity amputation (ranked ninth in terms of adverse events and twelfth when based on ASA-adjusted counts) may not be a high-yield target for quality improvement since procedure outcomes are more likely to be dictated by patient comorbidities than they are by the quality of the operation or other processes of care. Stakeholders might also choose to target at least one procedure in each of the subspecialties, regardless of the procedure's rank, in order to garner broader interest and support for quality-improvement initiatives.
Although the rank lists in our study are imperfect, our intention is for surgeons and other stakeholders to use them as a guide to direct future safety and quality-improvement efforts in orthopaedic surgery. With growing pressure to measure and improve quality in orthopaedic surgery, there is no doubt that there is much work to be done, and orthopaedic surgery is only just out of the starting blocks.