Abstract
Background:
Radiographic measures such as the rib vertebral angle difference (RVAD), Cobb angle, and space available for the lung (SAL) help to guide treatment and measure treatment effects in patients with infantile idiopathic scoliosis. This study aimed to evaluate the intraobserver and interobserver reliability of these radiographic measures.
Methods:
Forty-five spine radiographs of skeletally immature patients (age, two months to four years) with infantile idiopathic scoliosis were measured with use of Surgimap software. Three pediatric orthopaedic surgeons and a pediatric orthopaedic fellow identified the major curve apex, rib-vertebra phase, Cobb angle, and end vertebrae and calculated the RVAD and SAL values at two separate time points. Interobserver and intraobserver reliability of the RVAD, Cobb angle, and SAL values were assessed with use of intraclass correlation coefficients (ICCs). Fleiss kappa coefficients were calculated for categorical variables.
Results:
The RVAD (ICC = 0.86 to 0.92) and Cobb angle (ICC = 0.99) measurements had high reliability. The SAL value had substantial interobserver reliability (ICC = 0.65) and substantial intraobserver reliability (ICC = 0.73). Despite the high agreement for the Cobb angle, the choice of the end vertebrae of the major curve (kappa = 0.19 to 0.39) and of the apical vertebra (kappa = 0.57 to 0.62) varied. Observers were more likely to choose the same apical vertebra in large curves (r = 0.483, p = 0.001). The agreement for the apical rib-vertebra phase was substantial (kappa = 0.67). Paired RVAD measurements fell within 10° of each other in 82% of cases, but the remaining 18% of the RVAD measurements showed >10° of variation.
Conclusions:
Measurements used to guide treatment of infantile idiopathic scoliosis curves were reliable despite standard radiographic measurement error and the difficulty in obtaining quality images in young patients. Clinicians are dependent on seemingly objective radiographic data. The reliability of the Cobb angle and RVAD measurements in infantile scoliosis was high but not devoid of variability that could skew the ability to accurately and reliably suggest the best course of treatment. The SAL value was a less reliable measure. Treatment recommendations for infantile idiopathic scoliosis should rely on the synthesis of objective and clinically subjective data, as variations in radiographic measurements can lead to inconsistencies in management and to inconsistent treatment outcomes.
Serial clinical examinations and radiographic measures such as the “rib vertebral angle difference” (RVAD), Cobb angle, and “space available for the lung” (SAL) are imperfect means for guiding treatment and measuring the treatment effect in patients with infantile idiopathic scoliosis, a subgroup of patients with early-onset scoliosis1. As with all measures used in clinical decisions, intraobserver and interobserver variations are of concern. Despite their general use, these measures have not been rigorously examined in this population. Radiographic measurement error can be magnified by difficulties in obtaining quality images of the very young patient.
In 1972, in an ambispective review of 138 children who had been diagnosed with infantile idiopathic scoliosis at an age of less than two years, Mehta identified two primary radiographic findings that were useful for distinguishing resolving from progressive idiopathic infantile scoliosis curves on serial radiographs: the apical rib-vertebra relationship and the RVAD value (Fig. 1)2. Mehta described the normal relationship of the rib head to the vertebral body on supine anteroposterior radiographs as being symmetric with a separation of 2 to 4 mm. The apical rib-vertebra relationship on the reviewed radiographs was classified as being in one of two phases. In Phase 1 (resolving or early progressive scoliosis), the rib head does not overlap the vertebral body, whereas in Phase 2 the rib head on the convex side of the spine overlaps the apical vertebral body. The RVAD is the angle formed by the apical thoracic vertebra and its corresponding rib (from the rib head to the rib neck) on the concave side of the curve minus the corresponding angle on the convex side. Mehta concluded that 83% of resolving curves had an RVAD value of <20° on the initial radiographs. Since this first description, several studies have supported her findings3,4. As illustrated in the figure adapted from the original article by Mehta (Fig. 1), identification of the various RVAD landmarks is somewhat subject to individual interpretation and is also dependent on the quality of the radiographs.
The interobserver and intraobserver reliability of the RVAD value has not been rigorously examined. McAlindon and Kruse tested the intraobserver and interobserver variability of RVAD measurements in a juvenile rather than an infantile population, and bone maturity may have minimized error and increased accuracy in that population. They also decreased measurement variability by prelabeling the apical vertebra used for the RVAD measurement5.
The reliability of Cobb angle measurements (see Appendix) has been well documented in the adolescent population but not in the infantile population6,7. The literature indicates intraobserver and interobserver variability of 4° to 8°. Gstoettner et al. reported that the primary source of variability in the Cobb angle measurement was in the identification of the end vertebrae7. Pruijs et al. defined two “phases” in which error occurs when measuring the Cobb angle in scoliosis: (1) in obtaining the radiographic image, and (2) in measuring the actual Cobb angle. The first phase introduced a standard deviation of 3.2° in the measurements, and the second phase introduced 2.0°8. Morrissy et al. found comparable intraobserver variability in the Cobb angle (4.9°) when taking into account differences in the choice of vertebrae, differences among the protractors used to obtain the measurements, and differences due to the observer9.
Campbell et al. described the measurement of the SAL value (Fig. 2). A declining value suggests progressive thoracic deformity10. The SAL value has only recently been included as a radiographic measure in the infantile idiopathic scoliosis population. To our knowledge, no literature exists regarding its reliability.
Understanding the reliability of these measures will aid treating physicians as they weigh clinical and radiographic data in determining the best course of treatment. Infantile idiopathic scoliosis treatments include observation, bracing, casting, and surgical management with expandable spinal and thoracic cage instrumentation. The present study evaluated the intraobserver and interobserver variability of radiographic measurements that help to guide the treatment of infantile idiopathic scoliosis.
A multicenter evaluation of radiographic measurements that are commonly used in the assessment of infantile idiopathic scoliosis was performed by four clinicians from three participating institutions (Columbia University Medical Center, New York, NY; University of Rochester Medical Center, Rochester, New York; and Washington University Medical Center, St. Louis, Missouri). Three of the clinicians were attending pediatric orthopaedic spine surgeons and the fourth was a pediatric orthopaedic surgery clinical fellow.
After appropriate institutional review board approval was obtained, the investigators at each participating site were asked to submit de-identified copies (in JPEG or DICOM format) of existing radiographs of patients with infantile idiopathic scoliosis. Contributing sites ensured that the patients were between two months and four years of age at the time of the radiographic imaging. Patients with known comorbidities, congenital scoliosis (vertebral or rib formation and segmentation abnormalities), and neural axis abnormalities were excluded. A convenience sample was used. Forty-nine posteroanterior or anteroposterior radiographs were initially available for evaluation. Both weight-bearing (standing) and non-weight-bearing (supine) radiographs were submitted. Four of the radiographs showed congenital vertebral abnormalities and were excluded from the final analysis, leaving forty-five radiographs. All radiographs were free of markings, and apical vertebrae were not preselected.
All images were uploaded onto flash drives containing Surgimap Spine software (New York, NY). Surgimap is a tool for viewing and measuring clinical images, and it has built-in image exporting and storage capabilities. Two flash drives were provided to each observer at an interval of three months. At both the first and the second acquisition session, each observer was asked to count the number of thoracic and lumbar vertebrae, to identify the apex of the major curve, to identify the rib-vertebra phase, to measure the Cobb angle of the major curve, to identify the end vertebrae used for the Cobb angle measurement, to measure the rib vertebral angle on the concave side and on the convex side, and finally to calculate the RVAD and SAL values for each of the radiographs that had been provided. All variables were recorded on paper datasheets and were also saved into the Surgimap software database within each flash drive.
Statistical Methods
A descriptive analysis of the data collected by the four observers at the two separate time points was performed. Given the non-normal distribution of the parameters (as confirmed by Shapiro-Wilk testing), the median and the interquartile range were calculated for each variable of interest.
Bivariate analyses of the entire dataset were conducted. A p value of ≤0.05 was considered significant. The calculated Spearman correlation coefficient (r) was interpreted according to convention, with ±0.01 to 0.29 indicating a weak correlation; ±0.30 to 0.49, an intermediate correlation; and ±0.50 to 1.00, a strong correlation.
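As a minimal sketch of the correlation analysis described above, the following Python code computes a Spearman coefficient from average ranks and applies the interpretation bands stated in the text. The function names are illustrative assumptions, the example calls use made-up values rather than study data, and the treatment of coefficients that fall between the stated bands (for example, 0.295) is also an assumption.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = tied_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors.

    Assumes x and y have equal length and that neither is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def correlation_strength(r):
    """Label |r| with the bands used in the text (gap handling is assumed)."""
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "strong"
    if magnitude >= 0.30:
        return "intermediate"
    if magnitude >= 0.01:
        return "weak"
    return "none"
```

Because the coefficient is computed on ranks, any monotonic relationship yields a value of 1 or −1 regardless of its shape, which is why the method suits non-normally distributed measurements.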
Intraclass correlation coefficients (ICCs) were used to assess the interobserver and intraobserver reliability of measurements of the major curve Cobb angle, RVAD, and SAL by multiple observers11. The ICCs were based on a two-way random-effects model utilizing absolute agreement and 95% confidence intervals. In essence, the ICC is a ratio of group or individual variance to the total variance. Interobserver reliability was calculated with use of the data from the first acquisition session only. Intraobserver reliability was calculated with use of the data from both acquisitions for each observer. The calculated ICCs were interpreted according to convention, with 0.00 to 0.20 indicating slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, almost perfect agreement.
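The following sketch illustrates one way in which a two-way random-effects, absolute-agreement ICC can be computed from the analysis-of-variance mean squares. It assumes the single-measure form, often written ICC(2,1); the text does not state whether single or average measures were used, so that choice, the function names, and the small example table are assumptions rather than a description of the study's actual computation. Confidence intervals are omitted.

```python
from statistics import mean


def icc_2_1(ratings):
    """Single-measure, two-way random-effects, absolute-agreement ICC.

    ratings: list of rows, one row per radiograph and one column per observer.
    """
    n = len(ratings)        # number of radiographs (subjects)
    k = len(ratings[0])     # number of observers (raters)
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Partition the total sum of squares into subject, observer, and residual parts.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(value):
    """Apply the agreement bands listed in the text."""
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


# Illustrative table only (6 subjects rated by 4 observers), not study data.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
example_icc = icc_2_1(example)
```

For the illustrative table, the observers rank the subjects similarly but differ systematically in level, so the absolute-agreement coefficient is low (approximately 0.29, or fair agreement). This shows why an absolute-agreement model penalizes consistent offsets between observers.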
Error analysis was performed to calculate the standard deviation of the mean error of the RVAD measurement. A pairwise comparison was performed between measurements by different observers. The percentage of agreement with ranges of <5°, 5° to 8°, >8° to 10°, and >10° was calculated.
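The pairwise comparison described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' analysis code: the function names are hypothetical, each row is assumed to hold the RVAD values (in degrees) recorded for one radiograph by the different observers, and the bin edges follow the ranges given in the text.

```python
from itertools import combinations
from statistics import mean, stdev


def pairwise_differences(measurements):
    """Absolute differences for every pair of observers on each radiograph."""
    diffs = []
    for row in measurements:
        for a, b in combinations(row, 2):
            diffs.append(abs(a - b))
    return diffs


def bin_percentages(diffs):
    """Percentage of paired differences in each range used in the text."""
    bins = {"<5": 0, "5 to 8": 0, ">8 to 10": 0, ">10": 0}
    for d in diffs:
        if d < 5:
            bins["<5"] += 1
        elif d <= 8:
            bins["5 to 8"] += 1
        elif d <= 10:
            bins[">8 to 10"] += 1
        else:
            bins[">10"] += 1
    total = len(diffs)
    return {label: 100.0 * count / total for label, count in bins.items()}


def error_summary(diffs):
    """Mean paired difference and its standard deviation (needs >= 2 pairs)."""
    return mean(diffs), stdev(diffs)
```

With four observers, each radiograph contributes six observer pairs, so the percentages describe pairs of measurements rather than radiographs.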
The Fleiss kappa coefficient was calculated to assess interobserver agreement for categorical variables (the number of thoracic and lumbar vertebrae, the location of the end and apical vertebrae, the rib-vertebra phase, identification of the concave and convex sides of the curve, and assessment of the curve as progressive or resolving). The Fleiss kappa coefficient was chosen because the measurements of more than two observers were being compared12. The calculated Fleiss kappa coefficients were interpreted according to convention, with <0 indicating no agreement; 0.00 to 0.20, slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, almost perfect agreement. The kappa value indicates the actual agreement relative to that expected because of chance; the higher the kappa value, the more likely that the agreement was not due to chance. (Negative kappa values are obtained when agreement occurs less often than would be expected because of chance.)
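A minimal sketch of the Fleiss kappa calculation is given below, assuming that every radiograph was labeled by the same number of observers. The function names and the example labels are illustrative assumptions; the interpretation bands are those listed in the text.

```python
from collections import Counter


def fleiss_kappa(labels):
    """Fleiss kappa for categorical labels.

    labels: list of rows, one row per radiograph, each row holding the
    category chosen by every observer (same number of observers per row).
    """
    n_subjects = len(labels)
    n_raters = len(labels[0])
    counts = [Counter(row) for row in labels]
    categories = set().union(*counts)

    # Proportion of all assignments that went to each category.
    p_category = {
        c: sum(cnt[c] for cnt in counts) / (n_subjects * n_raters)
        for c in categories
    }
    expected = sum(p * p for p in p_category.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    # Observed agreement: share of agreeing observer pairs per radiograph.
    per_subject = [
        (sum(v * v for v in cnt.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    observed = sum(per_subject) / n_subjects

    return (observed - expected) / (1.0 - expected)


def interpret_kappa(value):
    """Apply the agreement bands listed in the text."""
    if value < 0:
        return "no agreement"
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"
```

For example, if four observers split evenly between two apical vertebrae on every radiograph, the observed agreement falls below the chance-expected agreement and the coefficient is negative.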
Source of Funding
No funding was received for this investigation.
Descriptive Analysis
A descriptive analysis of the measured radiographic variables is presented in Table I. The mean Cobb angle was 34.06° (range, 0° to 82°), the mean RVAD value was 13.61° (range, −14° to +62°), and the mean SAL value was 90.11% (range, 66% to 128%).
Spearman Correlation
The Cobb angle correlated with the rib-vertebra phase (r = 0.66, p < 0.001) and with classification of the curve as progressive (r = 0.69, p < 0.001). The RVAD value correlated with the Cobb angle (r = 0.54, p < 0.001), the rib-vertebra phase (r = 0.39, p < 0.001), and classification of the curve as progressive (r = 0.57, p < 0.001). The rib-vertebra phase also correlated with classification of the curve as progressive (r = 0.7, p < 0.001). A secondary analysis demonstrated that greater curve severity correlated with a greater likelihood that the observers would choose the same apical vertebra (r = 0.483, p = 0.001).
Agreement According to Intraclass Correlation
Data from all four observers for the first acquisition were pooled to determine the interobserver agreement for the group with use of ICC values calculated with a two-way random-effects model (Table II). When assessed as a group in this fashion, the interobserver agreement was almost perfect for the RVAD calculation (ICC = 0.92) and the Cobb angle measurement (ICC = 0.99). The SAL value demonstrated substantial interobserver agreement (ICC = 0.65).
Data from all four observers for both acquisition sessions were pooled to calculate the intraobserver agreement for the group (Table II). When assessed as a group in this fashion, the intraobserver agreement was almost perfect for the RVAD calculation (ICC = 0.86) and the Cobb angle measurement (ICC = 0.99). The SAL value demonstrated substantial intraobserver agreement (ICC = 0.73).
Data for both acquisition sessions were utilized to calculate the intraobserver agreement for each individual (Table III). Three observers had almost perfect intraobserver agreement for the RVAD value (ICC = 0.88 to 0.94) and one observer had only substantial agreement (ICC = 0.65). All four observers had almost perfect intraobserver agreement for the Cobb angle (ICC = 0.98 to 0.99). Intraobserver agreement for the SAL value varied among the individual observers; two observers had almost perfect agreement (ICC = 0.88 to 0.92), one had moderate SAL agreement (ICC = 0.41), and one had fair agreement (ICC = 0.29).
Agreement According to Fleiss Kappa Analysis
The group interobserver and intraobserver agreement for categorical variables was calculated with use of the Fleiss kappa value and the percentage of agreement (Table IV).
Data from all four observers for the first acquisition session were pooled to calculate the interobserver agreement for the group. The number of thoracic vertebrae demonstrated no group interobserver agreement beyond chance (kappa = −0.02). The number of lumbar vertebrae (kappa = 0.02) and the identification of the superior vertebra (kappa = 0.19) demonstrated slight agreement. The identification of the inferior vertebra (kappa = 0.32) demonstrated fair agreement. The identification of the apical vertebra (kappa = 0.57) and the assessment of the curve as progressive or resolving (kappa = 0.52) demonstrated moderate agreement. The rib-vertebra phase demonstrated substantial agreement (kappa = 0.67), and the labeling of the concave and convex sides of the spinal deformity demonstrated almost perfect agreement (kappa = 0.82).
Data from all four observers for both acquisition sessions were pooled to calculate the group intraobserver agreement for categorical variables. The numbering of thoracic and lumbar vertebrae (kappa = 0.33 to 0.34) and the identification of the superior vertebra (kappa = 0.24) and the inferior vertebra (kappa = 0.39) demonstrated fair agreement. The identification of the apical vertebra demonstrated substantial agreement (kappa = 0.62). The assessment of the curve as progressive or resolving (kappa = 0.81) and the labeling of the concave and convex sides of the spinal deformity (kappa = 0.91) demonstrated almost perfect agreement.
Data from both acquisition sessions were utilized to calculate the intraobserver agreement for each individual for categorical variables with use of the Fleiss kappa value (Table V). Two observers had slight agreement for the identification of the superior vertebra (kappa = 0.17 to 0.18) and two had fair agreement (kappa = 0.27 to 0.32). One observer had slight agreement for the identification of the inferior vertebra (kappa = 0.12), one had fair agreement (kappa = 0.37), and two had moderate agreement (kappa = 0.41 to 0.58). One observer had fair agreement for the identification of the apical vertebra (kappa = 0.32), one had moderate agreement (kappa = 0.54), one had substantial agreement (kappa = 0.67), and one had almost perfect agreement (kappa = 0.91).
Standard Error According to Pairwise Comparisons
Error analysis revealed that 82% of paired RVAD measurements fell within 10° of each other (Fig. 3). The mean “error” between matched pairs (and standard deviation) was 6.2° ± 5.3°. The mean error was 5.3° ± 4.58° for paired RVAD measurements of ≤20° and 7.93° ± 6.25° for measurements of >20°. The standard error was 2.83° for the Cobb angle measurement, 7.47% for the SAL value, and one vertebral level for the identification of the apical vertebra.
Despite high intraobserver and interobserver reliability statistics, only 82% of all paired RVAD measurements fell within 10° of each other (55% differed by <5°) (Fig. 3). The Cobb angle measurement demonstrated almost perfect interobserver and intraobserver reliability; the measurement of the SAL value was less reliable. The radiographs used during this study reflect a cross-section of patients with idiopathic early-onset scoliosis. Because the radiographs were obtained during standard-of-care visits, they varied in quality and in patient position and are representative of the radiographs typically obtained for this population.
Despite the high statistical reliability found for the RVAD measurement, it is important to note that outliers did exist and that other factors should be considered when recommending treatment for patients with infantile idiopathic scoliosis. Unlike the situation in previous studies, apical vertebrae were not prelabeled5. It was the opinion of the authors that this better represented real-life scenarios, in which clinicians independently choose the vertebra on which to base the RVAD measurement. The authors agreed with their own previous identification of the apical vertebra (intraobserver agreement) only 69% of the time (p = 0.05) and with each other only 47% of the time (interobserver agreement). The intraobserver and interobserver reliability of identifying the same apical vertebra was higher in radiographs from subjects with larger Cobb angles (r = 0.483, p = 0.001). The standard error for the identification of the apical vertebra was one vertebral level. It is not clear how best to minimize error in identification of the apical vertebra for smaller curves. However, our study demonstrated that the effect of selecting varying apical vertebrae was modest. Our data demonstrated high Cobb angle agreement despite the fact that the choice of end vertebrae, on which the Cobb angle was based, did not have such strong agreement. Adjacent vertebrae may have had comparable vertebral end plate tilt, or the difference may have been canceled out by the measurement error, thereby minimally affecting the Cobb angle measurement.
Larger Cobb angles were associated with larger RVAD values (r = 0.54, p < 0.001). Despite the fact that the apical vertebra was easier to identify in these larger curves, the RVAD value for these curves had a larger dispersion (r = 0.656, p < 0.001). This dispersion may be explained by the increased rotation and rib deformity often seen in patients with larger curves. The RVAD measurement error was slightly greater (mean and standard deviation, 7.93° ± 6.25° compared with 5.3° ± 4.58°) if the RVAD was >20°, which is often representative of a larger curve.
Despite the fact that our study found the RVAD value to be statistically reliable as assessed on the basis of the ICC, it is important to note that such variations in the identification of the apical vertebra can have large clinical ramifications. Looked at another way, 18% of measurements showed >10° of variation in the RVAD value—a variation that clearly can make a difference in treatment decisions, given that an RVAD cutoff value of 20° is generally considered important for predicting progression or regression. As an example, the patient in Figure 4 was seen by three pediatric orthopaedic surgeons at three different centers of care, and the same radiograph was measured by all three surgeons. On initial assessment of the radiograph, each clinician chose a different apical vertebra (T10, T11, and T12), and they obtained three different RVAD measurements (35°, 14°, and 9°). The suggested treatments varied from observation to serial Risser casting. One of the three clinicians presumed the likelihood of progression to be high. Such clinical variations can be very disconcerting to parents and can cause confusion in choosing the optimal treatment. Measurement variability and the complexity of developing treatment recommendations should be well described to parents, who often focus on seemingly objective radiographic data such as the RVAD value and the Cobb angle.
Negative RVAD values were measured on seven of the forty-five radiographs by several observers at different acquisition time points. Mehta described negative RVAD values in her original paper, in which early double thoracic and lumbar curves demonstrated symmetrical thoracic ribs yielding RVAD values close to zero, drooping of the twelfth thoracic ribs (concave > convex) yielding a negative RVAD at this level, and rotation of the thoracic and lumbar vertebrae in opposing directions2. However, the negative RVAD values in our study did not represent this type of curve. Instead, the negative RVAD values for these radiographs were attributable to the identification of the apical vertebra on non-standing radiographs and/or the placement of the line across the end plate of the apical vertebrae. The RVAD values for these radiographs were small, and the negative measurements fell within the 10° of measurement error found in our study.
The RVAD is likely quite sensitive to small variations in the rotational position of the patient. Just as the Stagnara views most accurately quantify curve magnitude, radiographs would ideally be obtained orthogonal to the apex of the deformity to best assess the RVAD value. Three-dimensional measures of this relationship may hold future promise.
It is important to note that the Surgimap RVAD tool was utilized for measuring the RVAD value. It is possible that the reliability of the RVAD measurement was increased by utilizing this tool. The Surgimap RVAD tool instructs the observer to draw a line under the end plate of the apical vertebra and to draw lines for the associated ribs (first on the concave side and then on the convex side). The Surgimap RVAD tool automatically calculates the RVAD value by constructing a perpendicular line from the chosen end plate and calculating the difference between the created rib-vertebra angles (Fig. 5). When measuring the RVAD manually, there are more possible sources of measurement error (Fig. 1). The use of digital radiographs has become increasingly popular, but is not yet a universal standard of care. The effect of using tools such as Surgimap should be considered when calculating radiographic measurements.
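The geometry described above (a perpendicular to the end plate, one rib line on each side, and the difference between the two rib-vertebra angles) can be sketched in a few lines. This is only a geometric illustration under stated assumptions and does not represent the Surgimap implementation: the function names are hypothetical, each line is given as two (x, y) points in image coordinates, and lines are treated as undirected, so each rib-vertebra angle is reported between 0° and 90°.

```python
import math


def line_angle_deg(p, q):
    """Direction of the line through points p and q, in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def angle_between_lines(a_deg, b_deg):
    """Smallest angle between two undirected lines (0 to 90 degrees)."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


def rib_vertebra_angle(end_plate, rib):
    """Angle between the perpendicular to the end plate and the rib line."""
    perpendicular = line_angle_deg(*end_plate) + 90.0
    return angle_between_lines(perpendicular, line_angle_deg(*rib))


def rvad(end_plate, concave_rib, convex_rib):
    """Concave-side rib-vertebra angle minus the convex-side angle."""
    return rib_vertebra_angle(end_plate, concave_rib) - rib_vertebra_angle(
        end_plate, convex_rib
    )


# Illustrative coordinates only: a horizontal end plate, a horizontal
# concave-side rib line, and a convex-side rib line sloping at 45 degrees.
end_plate = ((0.0, 0.0), (10.0, 0.0))
concave_rib = ((10.0, 0.0), (20.0, 0.0))
convex_rib = ((0.0, 0.0), (-10.0, -10.0))
example_rvad = rvad(end_plate, concave_rib, convex_rib)
```

In this sketch, a small tilt in the drawn end plate line changes both rib-vertebra angles in opposite directions, which is one way in which line placement can shift the RVAD value or change its sign.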
Assessment of the likelihood of curve progression as described by Mehta can be made on the basis of the first available radiograph; however, this judgment may be modified on the basis of serial radiographs. The study by Mehta revealed that 83% of resolving curves had an RVAD value of <20°; however, it is important to also note that the remaining 17% of resolving curves had an RVAD value of ≥20°. In our study, the standard deviation of the measurement error was 4.58° for RVAD values of ≤20° and 6.25° for RVAD values of >20°. Any curve with an RVAD value that is close to 20° or greater should be treated as progressive until proven otherwise by subsequent radiographs; interpretation of the RVAD value is likely to be subject to the observer’s clinical judgment. Despite objective cutoff values, clinician judgment will always affect interpretation of the data.
Although the RVAD value demonstrated substantial statistical reliability, its use must be tempered by the reality that 18% of paired observations were associated with a >10° difference in measurement. Although the RVAD value is one important metric in describing the deformity and assessing the likelihood of progression in idiopathic infantile scoliosis, it should be used with some care. Assessment of serial radiographs is crucial to increasing the likelihood of correctly discriminating between progressive and resolving curves. Other measures of curve progression need to be identified to better guide treatment decisions in patients with infantile idiopathic scoliosis.
A figure demonstrating measurement of the Cobb angle is available with the online version of this article as a data supplement at jbjs.org.
Note: The authors acknowledge Virginie C. Lafage, PhD (NYU Hospital for Joint Diseases, New York, NY), David P. Roye Jr., MD, and Benjamin D. Roye, MD, MPH (Columbia University College of Physicians and Surgeons, New York, NY), and the design contributions of medical illustrator Michael P. Moran (Made of Cells, New York, NY).
1. Gillingham BL; Fan RA; Akbarnia BA. Early onset idiopathic scoliosis. J Am Acad Orthop Surg. 2006;14(2):101-12.
2. Mehta MH. The rib-vertebra angle in the early diagnosis between resolving and progressive infantile scoliosis. J Bone Joint Surg Br. 1972;54(2):230-43.
3. Ceballos T; Ferrer-Torrelles M; Castillo F; Fernandez-Paredes E. Prognosis in infantile idiopathic scoliosis. J Bone Joint Surg Am. 1980;62(6):863-75.
4. Kristmundsdottir F; Burwell RG; James JI. The rib-vertebra angles on the convexity and concavity of the spinal curve in infantile idiopathic scoliosis. Clin Orthop Relat Res. 1985;(201):205-9.
5. McAlindon RJ; Kruse RW. Measurement of rib vertebral angle difference. Intraobserver error and interobserver variation. Spine (Phila Pa 1976). 1997;22(2):198-9.
6. Carman DL; Browne RH; Birch JG. Measurement of scoliosis and kyphosis radiographs. Intraobserver and interobserver variation. J Bone Joint Surg Am. 1990;72(3):328-33.
7. Gstoettner M; Sekyra K; Walochnik N; Winter P; Wachter R; Bach CM. Inter- and intraobserver reliability assessment of the Cobb angle: manual versus digital measurement tools. Eur Spine J. 2007;16(10):1587-92.
8. Pruijs JE; Hageman MA; Keessen W; van der Meer R; van Wieringen JC. Variation in Cobb angle measurements in scoliosis. Skeletal Radiol. 1994;23(7):517-20.
9. Morrissy RT; Goldsmith GS; Hall EC; Kehl D; Cowie GH. Measurement of the Cobb angle on radiographs of patients who have scoliosis. Evaluation of intrinsic error. J Bone Joint Surg Am. 1990;72(3):320-7.
10. Campbell RM Jr; Smith MD; Mayes TC; Mangos JA; Willey-Courand DB; Kose N; Pinero RF; Alder ME; Duong HL; Surber JL. The characteristics of thoracic insufficiency syndrome associated with fused ribs and congenital scoliosis. J Bone Joint Surg Am. 2003;85-A(3):399-408.
11. Shrout PE; Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420-8.
12. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76(5):378-82.