To The Editor:
The articles "Intraobserver and Interobserver Reliability of the Classification of Thoracic Adolescent Idiopathic Scoliosis" (80-A: 1097-1106, Aug. 1998), by Lenke et al., and "Interobserver Reliability and Intraobserver Reproducibility of the System of King et al. for the Classification of Adolescent Idiopathic Scoliosis" (80-A: 1107-1111, Aug. 1998), by Cummings et al., questioned the value of the system of King et al. for the classification of idiopathic thoracic scoliosis.
Studies that are designed to test the validity of an evaluation tool often overlook the need to educate the reviewers with regard to the classification system being studied. This education should be included in the protocol for such studies. Otherwise, the study will test the common clinical experience but may not test the validity of the assessment tool.
It may be wrong to assume that surgeons who treat scoliosis actually understand the nuances of the classification system of King et al. when their education consists of reading a description of the method. Learning a classification system requires reading, instruction, feedback, and experience. For example, complex systems for the classification of clubfoot have demonstrated good interobserver reliability after the initial learning phase with properly supervised instruction3. In contrast, fracture classification has been shown to have poor reliability even when experts received instructions and written descriptions and attended a lecture on the principles of the system2. Both of these studies2,3 evaluated the systems and not the assumed knowledge of the observers. The Journal should impose similar standards.
The articles by Lenke et al. and Cummings et al. did not give the system of King et al. a fair chance because both studies assumed that the experts understood the system and knew how to apply it. The figures in the articles clearly illustrate gross errors that are appalling to anyone who has been properly educated about the system. These studies need to be repeated after the observers have been appropriately educated.
These two articles do have some value in that they provide sad evidence that some surgeons who treat scoliosis do not understand an important tool that they are trying to use. This should be the only valid conclusion until the system of King et al. can be more rigorously tested among truly knowledgeable observers.
Charles T. Price, M.D.: Nemours Children's Clinic, P.O. Box 568908, Orlando, Florida 32856-8908
Dr. Lenke, Dr. Betz, Dr. Bridwell, Dr. Clements, Dr. Harms, Dr. Lowe, and Dr. Shufflebarger reply:
Dr. Price does highlight a potential weakness of our study on the reliability of the classification system of King et al.4. As was directly stated in our Discussion section: "One of the strongest criticisms of this study could be that the reviewers lacked an accurate understanding of the proper use of the classification system of King et al. … In other words, the problem with the reliability of the classification may be inherent to the education of the reviewers … Incomplete understanding may have contributed to the poor reliability data." As Dr. Price points out, this is a potential problem inherent to any type of classification system. However, there are other facets of the system of King et al. that contributed to our suboptimum reliability data. These all have as their foundation the use of classification systems as tools to help predict appropriate treatment1. Treatment recommendations were a large component of the original article by King et al., which was based solely on the principles of Harrington instrumentation and was certainly a major advance in the classification of thoracic idiopathic scoliosis. However, times have changed; two- and three-dimensional analysis of scoliosis and treatment with segmental spinal instrumentation systems are now routine and produce different classification and treatment considerations.
We found several additional factors that may have contributed to the poor reliability data in our study, including, most importantly, the unidimensional nature of the system of King et al., which was designed to assess scoliosis in the coronal plane only. Scoliosis surgeons certainly recognize the importance of a careful analysis of lateral radiographs to aid in the identification of abnormalities in the sagittal plane (such as thoracolumbar hyperkyphosis) that may change one's opinion regarding the curve and the type of treatment that should be rendered (for example, from a single thoracic to a double major curve with instrumentation and arthrodesis extending into the middle-to-caudad aspect of the lumbar spine). In addition, thoracic curves do not occur in isolation; the spine extends from the cephalad aspect of the thoracic region superiorly to the thoracolumbar and lumbar regions inferiorly. Thus, even when a structural thoracic curve is present, it must be viewed in the context of the compensatory or structural curves cephalad and caudad to it in order to be classified appropriately; this so-called noncomprehensiveness also reduced the reliability of the classification system of King et al. In a similar sense, some curves certainly satisfy the criteria for two types simultaneously (for example, type IV and type V), and, in fact, King type-I through type-IV deformities can also have a structural curve in the cephalad aspect of the thoracic region that needs to be included in the classification as well as in the operative treatment. Finally, the distinction between a King type-II and a King type-III curve is often still difficult when the apex of the lumbar curve sits near the center sacral line, as there may be uncertainty about whether the apex actually crosses the midline (type II) or not (type III).
These are just some examples of situations in which poor reliability data can result not from the education of the reviewers but from problems inherent in the classification system.
In conclusion, it may be true that a more formalized educational process for the reviewers could have improved the reliability of the classification system of King et al. However, for the reasons given in this response and throughout our manuscript, we do not think that this would have improved the reviewers' ability to analyze the curve or to predict appropriate treatment. Operative treatment of adolescent idiopathic scoliosis goes far beyond the classification of the curve; the true test is in the type of treatment that is rendered and the results of that treatment. In our view, a classification system is only clinically useful if it helps the clinician to organize his or her thoughts and to determine the appropriate treatment, as was discussed and recommended in an editorial in The Journal1. Additional education of scoliosis surgeons in the use of the current system may improve the reliability data somewhat, but the system will still fall far short of being a useful tool for the current comprehensive, three-dimensional analysis and treatment of adolescent idiopathic scoliosis.
Lawrence G. Lenke, M.D.; Keith H. Bridwell, M.D.: Department of Orthopaedic Surgery, Spinal Deformity Service, Washington University, One Barnes Plaza, Suite 11300, St. Louis, Missouri 63110
Randal R. Betz, M.D.: Shriners Hospital, 8400 Roosevelt Boulevard, Philadelphia, Pennsylvania 19152
David H. Clements, M.D.: Temple University Hospital, 3401 North Broad Street, Philadelphia, Pennsylvania 19140
Jürgen Harms, M.D.: Rehabilitationskrankenhaus, Guttmannstr. 1, Postfach 327, 7516 Karlsbad, Langensteinbach, Germany
Thomas G. Lowe, M.D.: 3550 Lutheran Parkway, Suite 201, Wheat Ridge, Colorado 80033
Harry L. Shufflebarger, M.D.: 1150 Campo Sano Avenue, Suite 300, Coral Gables, Florida 33146
Dr. Cummings, Dr. Loveless, Dr. Campbell, Dr. Samelson, and Dr. Mazur reply:
Dr. Price believes that education of the observers who participate in a study that tests the validity of an evaluation tool is necessary to give the tool a fair chance. He opines that, without this education, the study tests the common clinical experience but may not test the validity of the assessment tool. We disagree.
As we stated in our article, the design of a study on the reliability and reproducibility of a classification system should reflect the manner in which most of the individuals who currently use, or are likely to use, the system learn it (perhaps this is what Dr. Price means by "common clinical experience"). All of the authors of our study have read the original article by King et al.4. In addition, we have all read textbook discussions of the system of King et al. as well as numerous articles in which the classification was used. Until the completion of our study, the system of King et al. formed the basis of discussions of operative strategy at our preoperative conferences. Finally, each observer in our study reread the article by King et al. before the study began, and the article was available for reference as each radiograph was examined.
It is true that none of us have attended a conference or a meeting devoted solely to the use of the system of King et al.; we suspect that we are not unique in this regard. A study that includes only experts in the area in question and, more specifically, experts who gathered before the study and standardized their collective interpretation of the classification system would represent a best-case analysis of the reproducibility and reliability of the system. Indeed, their collective interpretation might well differ from the individual or collective interpretation of other well-qualified experts in the area. Furthermore, such a study would have to be repeated at intervals after the tutorial to see whether or not this best-case analysis was sustainable.
Many others who have studied the reliability and reproducibility of orthopaedic classification systems would probably agree with this logic. In contrast to the two studies cited by Dr. Price2,3, several studies that were cited in our article5-8 did not include a prestudy tutorial. As in our study, the observers in those studies5-8 were simply supplied with the description written by the originators of the system.
It is likely that some individuals will blame the investigators who have studied a popular and widely used classification system when those investigators report that the system is less than reliable. It would be beneficial to require authors to supply reliability and reproducibility data when they introduce a classification system. In light of Dr. Price's comments, perhaps authors should also specify how potential users may become properly educated in the nuances of the system.
R. Jay Cummings, M.D.; Eric A. Loveless, M.D.; John M. Mazur, M.D.: Nemours Children's Clinic, 807 Nira Street, Jacksonville, Florida 32207
Joseph Campbell, M.D.: Department of Orthopaedics, Naval Hospital, Camp Lejeune, North Carolina 28547
Stephen Samelson, M.D.: 42nd Medical Group/SGSO, 3030 Kirkpatrick Avenue, Maxwell Air Force Base, Alabama 36112