The practice of evidence-based medicine has evolved and entered the rubric of most clinicians during the past several years. Gordon Guyatt, who first coined the term evidence-based medicine in 1990 at McMaster University, conceptualized the practice of evidence-based medicine as the integration of clinical expertise with the best available clinical evidence and patients' values1,2. The term best available evidence implies a hierarchy of evidence with highly valid and believable studies at the top and less valid evidence at the bottom2.
The motivation for this concept has been a desire to minimize the potential for harm to patients by basing clinical decisions on the sorts of evidence that are least likely to be wrong. In order of increasing potential for bias, study designs have been categorized as follows3: systematic reviews of randomized controlled trials with consistent treatment effects (homogeneity) and individual high-quality randomized controlled trials (Level-I evidence); less rigorous randomized controlled trials, cohort studies, or observational studies (Level-II evidence); case-control studies (Level-III evidence); case series (Level-IV evidence); and expert opinion (Level-V evidence). Each of these types of studies is described below, although a more in-depth description of the meaning, history, and applicability of the concept of levels of evidence has been presented elsewhere4.
In a recent survey of the participants at the 2007 Annual Meeting of the American Orthopaedic Association (AOA), 94% of the respondents indicated that they incorporated evidence-based medicine into their decision-making always, often, or sometimes (Fig. 1). However, the same survey also showed little agreement on the type and level of evidence that was primarily used in clinical decision-making; 82% believed that randomized controlled trials were not able to answer a majority of important clinical and research questions, 66% believed that there was a lack of appropriate evidence in the literature relevant to their clinical practice, and only 36% believed that randomized controlled trials would bring about important future advances in the field (Figs. 2 through 5). In this regard, the current reality is that orthopaedic surgery is practiced as both an art and a science in the care of patients.
The purpose of this study, therefore, was to review the scientific literature to compare the degree to which evidence-based medicine (as expressed particularly in randomized controlled trials), observational studies, and expert opinion have each augmented the methodology of orthopaedic research studies and the clinical decision-making processes of orthopaedic surgeons.
Since January 2003, all clinical scientific articles published in The Journal of Bone and Joint Surgery (JBJS) (American Volume) have included a level-of-evidence rating5. Levels of evidence are hierarchical rating systems for classifying study quality. On the basis of a review of several existing evidence ratings, JBJS uses five levels for each of the four different study types (therapeutic, prognostic, diagnostic, and economic or decision-modeling). In a review of articles published in JBJS, Bhandari et al. compared the frequency of Level-I studies (randomized controlled trials) with those with lower levels of evidence5. Among fifty-one articles, 69% were studies of therapy. Only seven (14%) of the fifty-one articles represented the highest level of evidence (Level I). The majority (twenty-nine; 57%) of the scientific articles in JBJS were Level-IV evidence.
To optimize patient outcome, a tool or strategy to understand the quality of the current evidence in orthopaedic surgery would be beneficial. A rough surrogate for the assessment of quality could start with an assessment of the prevalence of Level-I studies (randomized controlled trials) that can assist surgeons in clinical decision-making. In a systematic review of 2468 studies in JBJS from 1988 to 2000, Bhandari et al. identified seventy-two (2.9%) that were randomized controlled trials6. The mean transformed score (and standard error) for overall study quality for the seventy-two studies was 68.1% ± 1.6%. Sixty percent (forty-three) of the seventy-two randomized controlled trials had a score of <75%. Drug trials had significantly greater mean quality scores than did surgical trials (72.8% compared with 63.9%; p < 0.05). Regression analysis revealed that randomized controlled trials indicating an epidemiology affiliation and those with cited funding were associated with higher scores for quality. Failure to (1) conceal randomization, (2) blind outcome assessors, and (3) describe the reasons why patients were excluded resulted in significantly lower quality scores (more than the 5% expected by the removal of each item) than those in the studies that met these individual criteria (p < 0.05).
The correlation of high levels of evidence with quality is an important related issue. Poolman et al. evaluated this very question in a recent review of 938 orthopaedic publications, thirty-two (3.4%) of which were randomized controlled trials7. Twenty-nine randomized controlled trials were reported as Level-I studies and three, as Level-II studies. With use of a Cochrane Collaboration Study Quality assessment tool, they found that the mean scores (and standard deviation) for study quality did not differ significantly between Level-I and Level-II studies (15.2 ± 3.3 compared with 11.7 ± 1.5 points, respectively; with a mean difference of 3.5 points [95% confidence interval, -0.5 to 7.5 points]; p = 0.08). These findings suggest that study quality varies considerably across Level-I studies. In addition, the level-of-evidence grades represent a simplification of a difficult critical appraisal process and may not always reflect the quality of randomized controlled trials.
A number of major factors limit the direct application of orthopaedic evidence to practice. They include (1) biases introduced by the lack of randomization (and the lack of concealment of allocation) in the comparison of interventions, (2) biases introduced by the lack of blinding (or independent outcomes assessment), (3) the lack of so-called patient-important outcomes as the focus of the study, (4) insufficient sample sizes that render many studies inconclusive, (5) a differential expertise bias in many surgical trials that challenges the comparability of the treatments, and (6) the reporting of significant differences even when they have no clinical impact. For some orthopaedic surgeons, these limitations may cast doubt on the reliability of the results of many orthopaedic trials8,9. It should also be noted that these factors are not equal in the extent or nature of their effect on research quality.
Several limitations (namely, items 1, 2, and 4 described above) in the orthopaedic literature have the potential to be overcome by studies in which an adequate number of patients are randomized whenever "randomization is a feasible option." More than 3% of all questions in the orthopaedic field can be answered with a randomized controlled trial. If investigators decide to use the strategy of a randomized controlled trial, attention to details, such as concealment of allocation and blinding whenever possible, will further enhance inferences from the study results. Surgeons should look to innovative designs, such as expertise-based trials. Finally, surgeons should collaborate and ensure adequately powered studies of sufficient sample size.
The paradigm shift toward improving the evidence through the conduct of high-quality, collaborative studies is changing the field of orthopaedics. Several large collaborative groups have modeled this strategy in Canada, the United States, and Europe. Orthopaedic surgeons are far ahead of many other surgical disciplines in making large, collaborative trials a standard in the field. It is inevitable that orthopaedic surgeons will soon begin to make the sort of impact in research studies that cardiovascular, oncology, and osteoporosis trials have enjoyed over the past decade.
Strengths of Randomized Controlled Trials
Although they are difficult to perform, randomized controlled trials have rapidly become the so-called gold-standard study in the fields of medicine and surgery because of their many advantages over retrospective and observational studies. The difficulties associated with randomized controlled trials have been well described and include their expense, the participation of individuals with skill sets traditionally outside those associated with surgery (i.e., statisticians), the involvement of multiple centers, and the prolonged time from inception to completion of the study10. Consequently, quality research is difficult, and it takes time and money. However, the advantages of randomized controlled trials are compelling and include the ability to control for known variables, the treatment of a clearly defined problem in a clearly defined patient group, the elimination of surgical bias, and the comprehensive, independent evaluation of outcome11. In an ideal environment, these factors are compelling reasons to perform randomized controlled trials evaluating the outcome of new techniques or implants prior to their introduction to the general population.
There are numerous examples in the surgical literature in general and the orthopaedic literature in particular of studies on new techniques or implants with stellar initial results that are not reproducible and do not fare well when subjected to rigorous testing through a randomized trial. The extracranial to intracranial artery bypass surgery was highly touted as a means of reducing stroke in an at-risk population with carotid stenosis in retrospective studies by surgeons who had developed the procedure12. On the basis of these and similar studies, it became a common neurosurgical intervention for cerebrovascular disease. However, a large randomized clinical trial subsequently revealed a 14% increase in the risk of stroke (both fatal and nonfatal) in patients undergoing this surgery compared with medical management alone, and the procedure was essentially abandoned13.
Unfortunately, there are similar examples in the orthopaedic literature. In the treatment of humeral shaft fractures, locking humeral nails were initially introduced in the hope that their theoretical advantages would reproduce the excellent clinical results seen with similar devices in the lower extremity, and the initial retrospective reviews were promising. However, randomized controlled trials comparing locked humeral nailing and conventional plating not only failed to show superior results with the nails but also demonstrated a higher complication rate, and the enthusiasm for humeral nailing waned14.
Unreamed femoral nailing was introduced as a means of (theoretically) minimizing embolic load from the fixation of femoral shaft fractures and, on the basis of excellent results from retrospective, single-center reviews, became the standard of care in many institutions. However, an adequately powered randomized controlled trial of reamed compared with unreamed nailing of femoral shaft fractures revealed that the nonunion rate was 4.5 times higher with unreamed nailing (p = 0.049) and that there were no obvious pulmonary or systemic advantages to its use15. These scenarios illustrate the drawbacks of retrospective reviews: in the enthusiasm for a new and innovative technique or product, unanticipated complications may be overlooked or minimized. Also, a single surgeon or center may not treat sufficient numbers of patients to recognize the difference between a 98% union rate (reamed femoral nailing) and a 93% union rate (unreamed femoral nailing). As the quality of orthopaedic care improves, the differences in success rates between treatment methods become incrementally smaller. Thus, as the treatment effect diminishes (e.g., when comparing union rates of 93% and 98%), large multicenter randomized controlled trials become the preferred method for establishing optimal patient treatment.
Lastly, orthopaedic surgery remains a technically based specialty, and the results reported by a subspecialist who has mastered the learning curve with a product or technique he or she developed and has performed multiple times may not be reproducible in the general orthopaedic community. Proprietary interests in products or implants may further exacerbate the discrepancies between single-center reviews and objective, prospective, multicenter investigations.
Weaknesses of Randomized Controlled Trials
While randomized controlled trials are the so-called gold standard, there are concerns about ensuring the proper standardization of this methodology. Several approaches have been employed in this regard.
In order to improve the reporting of randomized controlled trials, the Consolidated Standards of Reporting Trials (CONSORT) statement was proposed in the Journal of the American Medical Association in 1996 and has subsequently been adopted by leading journals such as the Journal of the American Medical Association, the British Medical Journal, The Lancet, and Annals of Internal Medicine16. The CONSORT system involves a checklist and flowchart to assist researchers in preparing their report of a randomized controlled trial. The checklist consists of twenty-one items related to the methods, results, and discussion sections of a report, which help to identify vital information that can be used by authors, reviewers, and editors to assess the report's internal and external validity. The flowchart provides information about the progression of patients through a two-group, parallel-design, randomized controlled trial, such as flow and withdrawal of participants and timing of outcome measures. However, an assessment by Bhandari et al.17 of randomized controlled trials in the orthopaedic fracture care literature demonstrated that an average of only 36% of the critical CONSORT criteria were met in published articles.
Assessing the quality of randomized controlled trials in orthopaedic fracture care can also be done with use of the Detsky quality index6. The index involves fourteen specific items and contains questions under the following five main categories, each of which is assigned a maximum of 4 points: (1) randomization, (2) outcome measures, (3) eligibility criteria and reasons for patient exclusion, (4) interventions, and (5) statistics. The final category on statistics asks an extra question of the researcher for any so-called negative randomized controlled trials (i.e., the findings were not significant), namely, whether confidence intervals were used or post hoc power calculations were performed. Therefore, the total possible scores for positive and negative randomized controlled trials are, respectively, 20 and 21 points. However, with use of a normalized version of the Detsky quality index, where a score of ≥75% was considered to be a high-quality study, Bhandari et al.6 found that during a twelve-year period, the randomized controlled trials published in JBJS had a mean quality score of only 68%, and 60% of the studies had a score of <75%.
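As a rough illustration of how such a normalized score could be computed, the sketch below assumes that the transformed score is simply the raw point total divided by the maximum possible total for that kind of trial (20 points for a positive trial, 21 for a negative trial) and that a high-quality study is one scoring at least 75%. The source does not spell out the transformation, so the function names and the normalization rule are assumptions made for illustration only.

```python
def detsky_percent(raw_points: float, negative_trial: bool) -> float:
    """Convert a raw Detsky total to a percentage of the maximum possible score.

    Assumption: normalization is raw / maximum, where the maximum is 21 points
    for a negative trial (one extra statistics question) and 20 otherwise.
    """
    max_points = 21 if negative_trial else 20
    if not 0 <= raw_points <= max_points:
        raise ValueError(f"raw_points must be between 0 and {max_points}")
    return 100.0 * raw_points / max_points


def is_high_quality(percent: float, threshold: float = 75.0) -> bool:
    """Apply the (assumed inclusive) 75% cutoff described in the text."""
    return percent >= threshold


if __name__ == "__main__":
    # Hypothetical raw scores, not taken from any cited trial.
    for raw, negative in ((15, False), (15, True), (18, True)):
        pct = detsky_percent(raw, negative)
        print(f"raw={raw} negative={negative} -> {pct:.1f}% high_quality={is_high_quality(pct)}")
```

Under this assumed rule, the same raw total can fall on different sides of the cutoff depending on whether the trial is positive or negative, because the denominator differs.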
This quality problem is most manifest when evaluating beta error, which is the chance that a finding of sameness in a randomized controlled trial is incorrect. This comes into play when a trial examines two methods of treatment for a problem, whether it is fixation methods for a specific fracture, implant type for hip replacement surgery, or the choice of operative or nonoperative management of a spinal disorder. Beta error occurs when a study fails to find a difference between groups that do, in fact, differ. Most often, the study notes that "there was no statistical difference between the groups." This finding, when based on not reaching the preset level of significance (typically, p < 0.05), is not proof of equivalence; failing to prove a difference between two groups is different from proving sameness. To accept that two groups are the same, different criteria have to be met, and the relevant statistical term is power. In clinical studies, the acceptable power is usually set at 80%, meaning that, if a true difference of the specified size exists, there is only a 20% chance that the study will fail to detect it. In many areas of orthopaedic surgery, the vast majority of randomized controlled trials are severely underpowered. Lochner et al. found that 91% of randomized controlled trials in fracture care were underpowered in their conclusions18. The rates in other areas, such as spine and hand19,20, have been reported to be >80%.
Similarly, trials can incorrectly conclude that two treatments are different with respect to a given outcome measure when, in fact, they are not. This is an alpha error. Most commonly, alpha error is introduced by testing multiple outcome measures without adjusting for the increased chance of a random finding that this introduces. The threshold of p < 0.05 for significance is set arbitrarily and indicates that, if two groups are found to be different, there is a <5% probability that this is by chance. If we evaluate multiple outcomes, the chance of finding a difference by chance increases with each evaluation. For example, if one evaluates sixteen independent outcomes, there is an approximately 56% chance (1 - 0.95^16) that at least one will have a p value of <0.05 simply by chance. Bhandari et al. reported potential alpha error in 37% of positive findings in sixty orthopaedic studies21.
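The arithmetic behind this inflation can be sketched in a few lines. The example assumes that the outcome measures are statistically independent, which is rarely exactly true in a clinical trial, and the function names are illustrative rather than taken from any cited study. It also shows the Bonferroni-adjusted threshold, a common and conservative correction that is included here only as an example; the studies cited above did not necessarily use it.

```python
def familywise_error_rate(n_outcomes: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha among n independent tests
    when no true difference exists for any of them."""
    if n_outcomes < 1:
        raise ValueError("n_outcomes must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_outcomes


def bonferroni_threshold(n_outcomes: int, alpha: float = 0.05) -> float:
    """Per-outcome significance threshold that keeps the overall
    chance of a spurious finding at or below alpha."""
    if n_outcomes < 1:
        raise ValueError("n_outcomes must be at least 1")
    return alpha / n_outcomes


if __name__ == "__main__":
    for k in (1, 5, 16):
        print(f"{k:2d} outcomes: chance of a spurious finding = "
              f"{familywise_error_rate(k):.3f}, "
              f"Bonferroni threshold = {bonferroni_threshold(k):.5f}")
```

With a single outcome the chance of a spurious finding equals the nominal 5%; with sixteen independent outcomes it is a little over one half.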
Even if a randomized controlled trial meets the statistical burden of proof, demonstrating that two treatment methods are different with respect to an outcome measure does not make the finding clinically relevant. One recently published biomechanical study noted a statistical difference between two fixation techniques in the proximal part of the tibia, with a p value of 0.045. While this sounds important because of the significant p value, the actual difference in the mean displacements of the two groups was <0.75 mm, which is clearly clinically irrelevant. Sung et al. evaluated the importance of a group of positive findings in the orthopaedic trauma literature22. They found that 30% of continuous and 47% of dichotomous positive outcomes failed to meet generally agreed-on levels of clinical significance. While it is incumbent on the reader to determine the clinical relevance of published articles, the way that abstracts are written and, indeed, the direction in which the discussion frequently draws the reader are often misleading. Finally, it must be noted that alpha and beta errors are not limited to randomized controlled trials but may be present in other types of studies.
Problems in the Methodology of Randomized Controlled Trials
A randomized controlled trial is most efficacious for common problems with more than one acceptable treatment that may be compared, particularly if the patient group, treatment method, and outcomes are generalizable; there is a high percentage of follow-up; and the outcomes have clinical importance to patients. Each of these issues may cause a study to be less than convincing. A good example is patient follow-up. A loss of just 20% of a study group may virtually invalidate a study's findings. For example, a hypothetical study of 100 patients that described the results for eighty patients who had an average score of 86 of 100 points may represent a successful intervention. However, if the twenty missing patients were lost because of an average poor outcome of 40 points and they went elsewhere for treatment as a result, then the true outcome, if the entire group had been followed, would be only 77 points, or 9 points lower, on a 100-point scale.
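The hypothetical calculation above is a patient-weighted mean, and a short sketch makes it easy to see how sensitive the overall result is to whatever is assumed about the missing patients. The 80 followed patients with a mean of 86 points and the 20 lost patients with a mean of 40 points come from the example in the text; the other assumed values for the lost patients, and the function name, are illustrative only.

```python
def pooled_mean(groups):
    """Return the patient-weighted mean score for (n_patients, mean_score) pairs."""
    total_patients = sum(n for n, _ in groups)
    if total_patients == 0:
        raise ValueError("at least one patient is required")
    return sum(n * mean for n, mean in groups) / total_patients


if __name__ == "__main__":
    followed = (80, 86.0)  # patients with follow-up and their mean score (from the text)
    reported = followed[1]
    # Assumed mean scores for the 20 patients lost to follow-up:
    # 40 is the value used in the text; 60 and 86 are illustrative only.
    for assumed_lost_mean in (40.0, 60.0, 86.0):
        overall = pooled_mean([followed, (20, assumed_lost_mean)])
        print(f"lost-patient mean {assumed_lost_mean:5.1f} -> overall mean {overall:5.1f} "
              f"(reported {reported:.1f})")
```

Only when the lost patients are assumed to have done exactly as well as those who were followed does the overall mean match the reported one.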
While loss to follow-up is a substantial hurdle, it pales in comparison with problems in generalizability. Most studies are performed by surgeons with a particular interest, and therefore expertise, in the area of investigation. Because their experience likely far outweighs that of the generalist, some findings may be inapplicable to the generalist's practice. For example, Sanders reported that the rate of acceptable reductions for calcaneal fractures treated operatively continued to increase over three years and more than 150 patients23. The average orthopaedic surgeon would probably not see that number of patients in his or her career, which limits the individual applicability of a comparison study done by such an experienced surgeon. The more technically demanding and experience-based an intervention is, the less generalizable the results are and the less useful a randomized controlled trial becomes.
In addition to surgical skill and experience, problems in classification plague orthopaedic studies, although this problem is not unique to randomized controlled trials. The relevance of a study's findings to one's patients is predicated on the similarity of the study group to the individual patient. A number of recent studies involving a variety of fracture types, however, have noted several problems24-33. First, there is often poor interobserver agreement. Audigé et al.24,25 reported poor observer agreement in most fracture classification schemes, with an average kappa value of only 0.23. Second, many of the classification schemes have not been validated quantitatively. While patient-based outcomes have provided a common ground for the evaluation of some outcomes, the objective assessment that should accompany them may be missing. For instance, the quality of the reduction after acetabular fracture surgery has been shown to correlate with the rate of radiographic signs of arthritis, but not within the first two years33. Without the objective information about the quality of the reduction, it would not be possible to tease out the difference in procedures that resulted in perfect compared with imperfect reduction quality. Finally, it may be that future work could entail the development of new, or the augmentation of existing, classification systems to increase both the precision and the accuracy of a given system.
In this regard, Audigé et al.25 proposed a three-phase approach for the validation of fracture classification systems, involving (1) the definition of classification categories and processes with use of diagnostic images that are evaluated by experts to ascertain reliability, accuracy, likelihood ratios, etc., (2) multicenter studies conducted by a representative group of potential future users of the classification system, and (3) the evaluation of the new classification scheme in a prospective clinical study to determine its appropriate use in a clinical setting.
Several logistical issues also deter randomized controlled trials; these relate primarily to how frequently the outcome of interest occurs and to cost. Beta error, or the incorrect conclusion that two arms of a study have the same result, is essentially due to small study size. The greater the sample size, the more a study group represents the population from which it is derived. Sample size calculations based on point estimates are therefore needed to plan a sufficiently powered comparison for the specific outcomes of interest. If the outcome in question occurs commonly, such as pin track infection with long-term external fixation use, then a randomized trial is possible with few subjects if even a moderate treatment effect is expected. In contradistinction, if the outcome is rare, then large numbers of patients are needed to prove even a moderate effect. For example, if a surgeon is interested in fatal pulmonary embolus and the rate is 1%, then, to demonstrate a 20% difference in occurrence, more than 35,000 patients would be needed. The cost of such an endeavor makes it almost impossible to perform. There are limited resources for medical research, and they must be allocated to areas that have a chance for success.
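The scale of the problem with rare outcomes can be illustrated with a standard normal-approximation formula for comparing two proportions. The sketch below is not the calculation used by the authors; it assumes that the 20% difference is a relative reduction (from 1% to 0.8%), a two-sided alpha of 0.05, 80% power, equal-sized arms, and an unpooled variance estimate, none of which is specified in the text. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control: float, p_treatment: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed in each arm to detect a difference between
    two event rates (two-sided test, normal approximation, unpooled variance)."""
    if p_control == p_treatment:
        raise ValueError("the two event rates must differ")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    z_beta = standard_normal.inv_cdf(power)               # about 0.84 for 80% power
    variance = p_control * (1.0 - p_control) + p_treatment * (1.0 - p_treatment)
    effect = p_control - p_treatment
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Rare outcome: fatal pulmonary embolus, 1% versus an assumed 0.8%.
    print("rare outcome, per arm:", patients_per_arm(0.01, 0.008))
    # More common outcome: nonunion rates of 7% versus 2%, taken from the
    # 93% and 98% union rates quoted in the femoral nailing example.
    print("more common outcome, per arm:", patients_per_arm(0.07, 0.02))
```

Under these assumptions the rare outcome requires on the order of 35,000 patients in each arm, whereas the nonunion comparison requires only a few hundred per arm, which is why outcome frequency largely determines whether a trial is feasible.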
Moreover, many important questions cannot be answered by a randomized trial. In a survey of participants at the 2007 Annual Meeting of the AOA, 82% of the respondents believed that more than half of the important questions in orthopaedics could not be answered with use of a randomized trial (Fig. 3). These questions fall under several categories, but all involve situations in which the factor in question cannot be manipulated. Etiologic research, such as the effect of bone mineral density on fracture risk, is an excellent example. Patients either have or do not have low bone mineral density on presentation. Similarly, incidence studies do not lend themselves to randomization. For instance, if an orthopaedic surgeon were interested in the rate of meniscal tears associated with tibial plateau fractures, patients would enter the evaluation with one type of fracture or another and could not be randomized.
Prognostic information, such as that obtained from a diagnostic test or a patient state on presentation, is also not amenable to a randomized evaluation. Recently, tenderness on the medial side of the ankle has been demonstrated not to predict deltoid ligament incompetence in patients with a fibular fracture34. Similarly, Pape et al. reported a difference in the rate of adult respiratory distress syndrome on the basis of the initial inflammatory marker levels35. This is quite important information and may change the management of patients, but it represents a patient state that is not randomizable. Another example of prognostic information occurs when unintended treatment effects are seen, such as the possible effect of unplanned delay to the operating room on infection rates in patients with an open tibial fracture. It is unlikely that a patient would consent to a substantial delay in treatment, but the safety or risk of this type of delay might be shown by an analysis of patients delayed for other reasons.
Finally, there are issues with ethics and randomization. Some problems that would otherwise lend themselves nicely to a randomized controlled trial cannot be addressed because of consent issues. For example, research in the acute management of trauma patients is hindered by the fact that the patients are frequently not able to consent to a complex study in the face of the injury. Other questions may be difficult to answer because of the refusal of surgeons or caregivers to abandon time-tested techniques. For example, the efficacy of external stabilization for open-book pelvic injuries is not likely ever to be examined because of the common belief that some form of mechanical stabilization is helpful and does no harm. Much in the same way, the efficacy of parachutes cannot be evaluated36.
Consequently, in some cases where the reporting of randomized controlled trials is poor and/or simply because of the inherent limitations of randomized controlled trials, the orthopaedic surgeon must rely on his or her own experience and opinion as an "artist" in order to successfully treat his or her patients.
The role of nonrandomized or observational studies in orthopaedic research is an area of continued debate. Nonrandomized studies in which patients receiving a treatment are simply observed over a period of time usually overestimate or underestimate treatment effects. Nonetheless, nonrandomized controlled trials play a major role in certain circumstances. Nonrandomized controlled trials or observational studies make sense when the outcome is rare and the resulting sample size would otherwise be too large, when it is unethical to randomize patients, when so-called surgeon buy-in will be very difficult, or when technical expertise is highly varied across groups. Observational studies are also appropriate when a question is asked about prognosis. An example would be a question concerning the factors that determine whether malunion or osteoarthritis develops after a wrist fracture. In this situation, it is difficult and impractical to randomize patients to prognostic factors such as the quality of reduction.
The major types of study design include cohort studies, case-control studies, and case series. In performing these studies, orthopaedic surgeons are observing for prognostic factors and risk factors. Prognostic factors are variables or factors that predict which patients do better or worse. Risk factors are factors associated with development of the disease. Cohort studies are prospective in nature and determine the prognostic factors that are associated with an outcome. In a prospective study evaluating tibial fractures, a researcher might look at a prognostic factor such as quality of reduction and its effect on outcome, such as union. Case-control studies are retrospective in nature and determine the risk factors that predict an outcome. In a case-control study evaluating tibial fractures, a researcher might look at a risk factor such as smoking and determine how many smokers and nonsmokers went on to have union of the tibial fracture. Case series evaluate the success of a particular procedure in addressing a particular problem, but they are retrospective in nature.
In a cohort study design, the cohort represents a group of people followed over time to see whether an outcome of interest develops. Ideally, this group meets certain predetermined criteria representative of a population of interest and is followed with well-defined outcome variables. An example would be a cohort of patients with calcaneal fractures followed over time to evaluate the influence of certain prognostic factors or interventions, such as age or the quality of reduction, on the desired outcomes, such as the need for a subtalar fusion. The strengths of a prospective cohort include the ability to study several outcomes over time and to ensure that the data collected are relevant and accurate. The weaknesses of a prospective cohort are the expense of tracking a large number of subjects, the long study period required, the limited inferences possible if there is no comparison group, and the problems with confounding variables because random allocation is not used.
A case-control study is a retrospective study that starts with the identification of individuals who already have the outcome of interest and then compares them with a control group without the outcome. An example of a case-control study would be to identify patients with an infected fracture of the tibia and a similar group of patients with a noninfected fracture to see whether the patients with an infected fracture were less likely to receive prophylactic antibiotics. Case-control studies are useful to study rare outcomes, when there are multiple etiologic factors, and when outcomes such as osteoarthritis take time to develop. Case-control studies can be conducted in a short time, require small sample sizes, and are inexpensive. They have limited usefulness because the data may be inaccurate as a result of the study's retrospective nature, and there may be problems in determining the cases and controls.
Case series include a single group of patients who often represent the experience of a single surgeon or center with a procedure or problem. Patients are identified backward in time, or retrospectively. The major benefit of a case series is that it allows the description of results or complications associated with a procedure. If a new technique for shoulder stabilization to treat anterior instability results in a high rate of recurrent instability or stiffness, then it may not be necessary to perform a randomized controlled trial in this situation. Case series are quick and easy to perform, provide information on rare conditions, and generate hypotheses. They have limited usefulness because they are retrospective, may overestimate or underestimate the truth, and are not generalizable.
Because of these limitations in the usefulness and validity of observational studies, the orthopaedic surgeon may be called on by circumstances to rely on his or her own experience as an "artist" in order to assess and treat patients.
With the increasing emphasis over the past decade on practicing evidence-based medicine in the form of randomized controlled trials as the so-called gold standard, the roles of expert opinion and clinical experience in the literature and in the practice of orthopaedic surgery have come into question. However, in our opinion, the important role of clinical experience should not be minimized in the process.
Some authors have suggested that early definitions of evidence-based medicine have deemphasized traditional determinants of clinical decision-making, including individual clinical experience. For example, as originally defined by Sackett et al.37 and reiterated by Spindler et al.38, evidence-based medicine is a "conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients." More recent definitions of evidence-based medicine, however, have stressed combining the best available evidence with clinical experience and individual patient values. This suggests that clinicians must apply their expertise to assess the patient's problem and must also incorporate the research evidence and the patient's preferences and values before making a management recommendation.
Guidelines are certainly useful for standardizing approaches to certain diseases; however, when taken to their extreme, they threaten to eliminate the surgeon's clinical judgment from the decision-making process. For example, practice guidelines, such as those put forth by the Cochrane Collaboration, represent the ultimate application of evidence-based medicine. If medicine could be practiced solely on the basis of the published evidence, then what need would there be for physicians? Perhaps clinical decisions should be left to the insurance claims adjusters with their books of guidelines. Certainly, in the era of pay-for-performance, care must be taken to maintain some level of control over the clinical decision-making process.
What should be the role of expert opinion, or Level-V evidence, in the orthopaedic literature? In many situations that are not amenable to randomized controlled trials, such as when patients have multiple diseases and problems that contribute to their diagnosis or when the disease process has a low prevalence, observational studies, clinical experience, and the expert opinion of others must be relied on when making treatment decisions. Orthopaedic surgeons, however, must look critically at the Level-V evidence in the literature. The principal weakness of Level-V evidence, and even of Level-IV studies, is the lack of controls. Without appropriate controls, whether prospective or historical, it is uncertain whether the outcomes observed were due to the investigator's intervention or merely to the natural history of the disease process.
Medicine is not just an art or a science; rather, it is both. The debate over evidence-based medicine has been presented by some39,40 as a debate over the so-called soul of medicine: Is medicine an art or a science? In 1998, Shaughnessy et al.41 described music as a metaphor for the practice of medicine. Evidence-based medicine and the apparently conflicting concept of clinical experience represent two aspects of medicine. To many physicians, evidence-based medicine seems rigid, highly structured, and uninspiring, like poorly performed baroque music. In contrast, economists, academics, and health authorities view the seemingly unpredictable use of clinical experience as analogous to punk rock, in that it is uncontrollable and chaotic and obeys few rules. The best medical practice is similar to neither baroque nor punk music; instead, it is like good jazz, combining technical mastery with the artistry of improvisation. Clinical jazz combines the structure supplied by patient-oriented evidence with the physician's clinical experience to manage situations of uncertainty, instability, uniqueness, and conflicting values.
While an extensive framework for the practical application of this metaphor cannot be supplied here, we suggest the development of a quasi-quantitative scoring system that a clinician would self-administer to determine whether he or she should apply the findings of randomized controlled trials, the findings of observational studies, or his or her own clinical experience to a particular patient case. This scoring system might tally, in a weighted fashion, such factors as the number of years of experience that the clinician has in the orthopaedic field in general, the number of patients with the same clinical problem whom he or she has treated in the past, the number of randomized controlled trial papers about similar cases that he or she has read, and the number of related observational studies that he or she has read. The score would determine how many more randomized controlled trials, observational studies, and colleagues the clinician should consult before assigning a treatment regimen for the patient.
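To make the proposal concrete, the following is a minimal sketch of how such a weighted tally might look. It is illustrative only: the text above proposes the factors but gives no weights, thresholds, or mapping rules, so every numeric value, along with the names `ClinicianProfile`, `familiarity_score`, and `additional_sources`, is a hypothetical assumption rather than a validated instrument.

```python
import math
from dataclasses import dataclass


@dataclass
class ClinicianProfile:
    years_in_practice: float
    similar_patients_treated: int
    rcts_read: int
    observational_studies_read: int


# Hypothetical weights: the proposal calls for weighting but specifies no values.
WEIGHTS = {
    "years_in_practice": 1.0,
    "similar_patients_treated": 0.5,
    "rcts_read": 2.0,
    "observational_studies_read": 1.0,
}


def familiarity_score(profile, weights=WEIGHTS):
    """Weighted sum of the four factors named in the proposal."""
    return sum(w * getattr(profile, field) for field, w in weights.items())


def additional_sources(score, target=50.0):
    """Map the shortfall below an assumed target to further sources to consult."""
    shortfall = max(0.0, target - score)
    return {
        "rcts": math.ceil(shortfall / 10),
        "observational_studies": math.ceil(shortfall / 20),
        "colleagues": 1 if shortfall > 0 else 0,
    }


profile = ClinicianProfile(years_in_practice=10, similar_patients_treated=20,
                           rcts_read=3, observational_studies_read=4)
score = familiarity_score(profile)
print(score, additional_sources(score))
```

Any real version of this tool would need its weights and thresholds to be derived and validated empirically; the linear form used here is simply the most direct reading of "tally in a weighted fashion."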
On the basis of the current review of the orthopaedic literature, evidence-based medicine in the form of randomized controlled trials represents the current gold standard when seeking information to use in clinical decision-making. While not every clinical problem lends itself to a randomized study, information from this type of study, rather than retrospective reviews of case-control studies or case series, should be used whenever possible. Despite the obvious advantages of randomized controlled trials, there are many deficiencies in their execution and reporting. While education and increased scrutiny may reduce these problems, many clinical questions in the orthopaedic field are not amenable to randomized controlled trials. Moreover, the other aspects of clinical practice, such as clinical experience in the form of expert opinion, should not be ignored. The experienced clinician remains the cornerstone of medicine as a facilitator of optimal patient care. The clinician considers the best available evidence, as well as the patient's values and circumstances, and directs the process of decision-making. Because of the dual and complementary aspects of the role of an orthopaedic surgeon (i.e., evidence-based and instinctual), the profession may be considered to be both a science and an art.
1. Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA. 1992;268:2420-5.
2. Guyatt GH, Haynes RB, Jaeschke RZ, Cook DJ, Green L, Naylor CD, Wilson MC, Richardson WS. Users' Guides to the Medical Literature: XXV. Evidence-based medicine: principles for applying the Users' Guides to patient care. Evidence-Based Medicine Working Group. JAMA. 2000;284:1290-6.
3. Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine: how to practice and teach EBM. New York: Churchill Livingstone; 1997.
4. Petrisor BA, Keating J, Schemitsch E. Grading the evidence: levels of evidence and grades of recommendation. Injury. 2006;37:321-7.
5. Bhandari M, Swiontkowski MF, Einhorn TA, Tornetta P 3rd, Schemitsch EH, Leece P, Sprague S, Wright JG. Interobserver agreement in the application of levels of evidence to scientific papers in the American volume of the Journal of Bone and Joint Surgery. J Bone Joint Surg Am. 2004;86:1717-20.
6. Bhandari M, Richards RR, Sprague S, Schemitsch EH. The quality of reporting of randomized trials in the Journal of Bone and Joint Surgery from 1988 through 2000. J Bone Joint Surg Am. 2002;84:388-96.
7. Poolman RW, Struijs PA, Krips R, Sierevelt IN, Lutz KH, Bhandari M. Does a "Level I Evidence" rating imply high quality of reporting in orthopaedic randomised controlled trials? BMC Med Res Methodol. 2006;6:44.
8. Bhandari M, Guyatt GH, Swiontkowski MF. User's guide to the orthopaedic literature: how to use an article about a surgical therapy. J Bone Joint Surg Am. 2001;83:916-26.
9. Poolman RW, Kerkhoffs GM, Struijs PA, Bhandari M, International Evidence-Based Orthopedic Surgery Working Group. Don't be misled by the orthopedic literature: tips for critical appraisal. Acta Orthop. 2007;78:162-71.
10. McCulloch P, Taylor I, Sasako M, Lovett B, Griffen D. Randomised trials in surgery: problems and possible solutions. BMJ. 2002;324:1448-51.
11. Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342:1878-86.
12. Popp AJ, Chater N. Extracranial to intracranial vascular anastomosis for occlusive cerebrovascular disease: experience in 110 patients. Surgery. 1977;82:648-54.
13. Failure of extracranial-intracranial arterial bypass to reduce the risk of ischemic stroke. Results of an international randomized trial. The EC/IC Bypass Study Group. N Engl J Med. 1985;313:1191-200.
14. Bhandari M, Devereaux PJ, McKee MD, Schemitsch EH. Compression plating versus intramedullary nailing of humeral shaft fractures--a meta-analysis. Acta Orthop. 2006;77:279-84.
15. Canadian Orthopaedic Trauma Society. Nonunion following intramedullary nailing of the femur with and without reaming. Results of a multicenter randomized clinical trial. J Bone Joint Surg Am. 2003;85:2093-6.
16. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, Pitkin R, Rennie D, Schulz KF, Simel D, Stroup DF. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA. 1996;276:637-9.
17. Bhandari M, Guyatt GH, Lochner H, Sprague S, Tornetta P 3rd. Application of the Consolidated Standards of Reporting Trials (CONSORT) in the fracture care literature. J Bone Joint Surg Am. 2002;84:485-9.
18. Lochner HV, Bhandari M, Tornetta P 3rd. Type-II error rates (beta errors) of randomized trials in orthopaedic trauma. J Bone Joint Surg Am. 2001;83:1650-5.
19. Bailey CS, Fisher CG, Dvorak MF. Comment on: Type II error in the spine surgical literature. Spine. 2005;30:164.
20. Chung KC, Kalliainen LK, Hayward RA. Type II (beta) errors in the hand literature: the importance of power. J Hand Surg [Am]. 1998;23:20-5.
21. Bhandari M, Whang W, Kuo JC, Devereaux PJ, Sprague S, Tornetta P 3rd. The risk of false-positive results in orthopaedic surgical trials. Clin Orthop Relat Res. 2003;413:63-9.
22. Sung J, Siegel J, Tornetta P 3rd, Bhandari M. "Statistical difference" does NOT mean clinically important! An evaluation of orthopaedic trauma randomized trials. Read at the Annual Meeting of the Orthopaedic Trauma Association; 2006 Oct 5-7; Phoenix, AZ. Paper no 5.
23. Sanders R. Intra-articular fractures of the calcaneus: present state of the art. J Orthop Trauma. 1992;6:252-65.
24. Audigé L, Bhandari M, Kellam J. How reliable are reliability studies of fracture classifications? A systematic review of their methodologies. Acta Orthop Scand. 2004;75:184-94.
25. Audigé L, Bhandari M, Hanson B, Kellam J. A concept for the validation of fracture classifications. J Orthop Trauma. 2005;19:401-6.
26. Maripuri SN, Rao P, Manoj-Thomas A, Mohanty K. The classification systems for tibial plateau fractures: how reliable are they? Injury. 2008;39:1216-21.
27. Budny AM, Young BA. Analysis of radiographic classifications for rotational ankle fractures. Clin Podiatr Med Surg. 2008;25:139-52, v.
28. Fung W, Jonsson A, Buhren V, Bhandari M. Classifying intertrochanteric fractures of the proximal femur: does experience matter? Med Princ Pract. 2007;16:198-202.
29. Dirschl DR, Ferry ST. Reliability of classification of fractures of the tibial plafond according to a rank-order method. J Trauma. 2006;61:1463-6.
30. Talarico RH, Hamilton GA, Ford LA, Rush SM. Fracture dislocations of the tarsometatarsal joints: analysis of interrater reliability in using the modified Hardcastle classification system. J Foot Ankle Surg. 2006;45:300-3.
31. Malek IA, Machani B, Mevcha AM, Hyder NH. Inter-observer reliability and intra-observer reproducibility of the Weber classification of ankle fractures. J Bone Joint Surg Br. 2006;88:1204-6.
32. Humphrey CA, Dirschl DR, Ellis TJ. Interobserver reliability of a CT-based fracture classification system. J Orthop Trauma. 2005;19:616-22.
33. Matta JM. Fractures of the acetabulum: accuracy of reduction and clinical results in patients managed operatively within three weeks after the injury. J Bone Joint Surg Am. 1996;78:1632-45.
34. McConnell T, Creevy W, Tornetta P 3rd. Stress examination of supination external rotation-type fibular fractures. J Bone Joint Surg Am. 2004;86:2171-8.
35. Pape HC, van Griensven M, Rice J, Gänsslen A, Hildebrand F, Zech S, Winny M, Lichtinghagen R, Krettek C. Major secondary surgery in blunt trauma patients and perioperative cytokine liberation: determination of the clinical relevance of biochemical markers. J Trauma. 2001;50:989-1000.
36. Smith GC, Pell JP. Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials [reprint]. Int J Prosthodont. 2006;19:126-8.
37. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence-based medicine: what it is and what it isn't. BMJ. 1996;312:71-2.
38. Spindler KP, Kuhn JE, Dunn W, Matthews CE, Harrell FE Jr, Dittus RS. Reading and reviewing the orthopaedic literature: a systematic, evidence-based medicine approach. J Am Acad Orthop Surg. 2005;13:220-9.
39. Berg M. Rationalizing medical work: decision-support techniques and medical practices. Cambridge, MA: The MIT Press; 1997.
40. Timmermans S, Mauck A. The promises and pitfalls of evidence-based medicine. Health Aff (Millwood). 2005;24:18-28.
41. Shaughnessy AF, Slawson DC, Becker L. Clinical jazz: harmonizing clinical experience and evidence-based medicine. J Fam Pract. 1998;47:425-8.