A variety of methods to classify cervical spine injuries have been
proposed, but none have been well accepted [1-4].
The purposes of a classification system are to aid the physician in grading
the severity of injuries, determining the prognosis, facilitating
communication with other physicians, and, most importantly, developing
effective management strategies [5].
Classification systems provide generalizations that often approximate the
pathologic and anatomic abnormalities. Despite a lack of precision, a
classification system should provide sufficient information to allow
recognition of specific patterns and appropriate categorization. Systems that
include too much detail become burdensome, whereas classifications that are
too broad do not allow sufficient division to be of clinical utility. Given
the complex uses and requirements for classification systems and the variety
of potential patterns of injury in the subaxial cervical spine, it is not
surprising that there is no widely accepted system.
Stability is one of the fundamental factors in the management of cervical
spine injuries. White and Panjabi defined stability as the "ability of
the spine under physiologic loads to maintain its pattern of displacement so
that there is no initial or additional neurologic deficit, no major deformity,
and no incapacitating pain" [6]. Louis provided a similar definition:
"Instability...is a pathological process which can lead to displacement of
vertebrae beyond their normal physiological limits" [1].
Although these definitions are widely accepted, we are aware of no means with
which to accurately determine stability. Stability is treated as a binary
variable (stable or unstable) in the majority of systems of which we are
aware [3,4].
In reality, there are gradations of injury ranging from normal to a total loss
of any connection between adjacent segments.
An accurate measure of stability and precise assessment of neurologic
injury are two important factors that influence the prognosis and treatment of
these injuries. Accordingly, we developed a quantitative continuous scale, the
Cervical Spine Injury Severity Score (CSISS), to measure stability. We believe
that this quantified stability score, when combined with a morphologic
description of the injury and neurologic examination of the patient, provides
sufficient information to adequately classify the injury and fulfill the
criteria described by Mirza et al. [5].
We hypothesized that a quantitative scale can be used to identify cervical
spine injury patterns that are best treated surgically, those that can be
treated nonoperatively, and those that can be treated with either means. The
purpose of this investigation was to measure the intraobserver and
interobserver reliability of a novel quantitative system for assessing the
severity of cervical spine injury.
Cervical Spine Injury Severity Score
The CSISS was developed by the senior author (P.A.A.) and is based on
consideration of both ligamentous and osseous injury and the distribution of
the injury across various portions of the spinal column. In this system, the
cervical spine is divided into four columns: anterior, posterior, right pillar
(right lateral column), and left pillar (left lateral column)
(Fig. 1-A). The anterior column
includes the anterior and posterior longitudinal ligaments, vertebral body,
intervertebral disc, uncinate processes, and transverse process. The posterior
column includes both laminae, the spinous process, the posterior ligamentous
complex, and the ligamentum flavum. The lateral columns or pillars include the
pedicle, lateral masses, superior and inferior articular processes, transverse
processes, and facet capsules.
Each column is scored on an analog scale ranging from 0 to 5 points
(Fig. 1-B), with higher values
given for more severe injuries as judged on the basis of bone and ligamentous
disruption. Ligamentous disruption is presumed on the basis of separation of
normal osseous landmarks. If more than one cervical spine level is injured,
only the most severely affected level is graded. A score of 0 points signifies
no injury, a score of 1 indicates a nondisplaced fracture, and a score of 5
points is given for the worst possible injury to that particular column, such
as a complete fracture-dislocation of a facet (pillar) or a complete
disruption of the posterior ligamentous complex as evidenced by wide
separation of the spinous processes. Fractional values can be used, and the
examiner is free to either upgrade or downgrade the score as he or she deems
appropriate, such as for patients with severe degenerative ankylosis, diffuse
idiopathic spinal hyperostosis, or ankylosing spondylitis that substantially
alters the rigidity of the spine. In the present study, no specific
instructions were given with regard to the amount of modification (upgrading
or downgrading) that the examiners should employ. The overall injury severity
score is the sum of the analog scores for the four columns, with the total
ranging from 0 to 20 points.
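The summation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a validated clinical tool: the names `ColumnScores` and `csiss_total` are hypothetical, and the example values are invented for demonstration rather than taken from any patient in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnScores:
    """Analog scores (0 to 5, fractions allowed) for the four columns."""
    anterior: float
    posterior: float
    right_pillar: float
    left_pillar: float

    def __post_init__(self):
        # Each column score must lie on the 0-to-5 analog scale.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 5.0:
                raise ValueError(f"{name} score {value} is outside 0 to 5")


def csiss_total(scores: ColumnScores) -> float:
    """Sum the four column scores; the total ranges from 0 to 20."""
    return (scores.anterior + scores.posterior
            + scores.right_pillar + scores.left_pillar)


# Illustrative values only (not from the study cohort).
example = ColumnScores(anterior=2.5, posterior=1.0,
                       right_pillar=4.0, left_pillar=0.0)
print(csiss_total(example))
```

The sketch does not model the examiner's discretionary upgrading or downgrading of a score, which the text leaves to clinical judgment.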
Patient Cohort
Thirty-four consecutive patients with subaxial cervical spine injuries were
included in our study. The treatment of the injury was not affected by the
results of the study. Institutional review board approval was obtained. The
patients' informed consent was not required for the use of the deidentified
images.
Image Acquisition, Storage, and Viewing
Lateral digital radiographs and computed tomography scans of the cervical
spine were made for all patients at the time of presentation. The computed
tomography scans were all acquired with multidetector scanners in helical
mode, with 1.25-mm images and 0.6-mm overlap. These source images were then
reformatted into two-dimensional true axial, sagittal, and coronal images,
with a reformatted slice thickness of either 2 or 3 mm. The de-identified
lateral radiographs and two-dimensional computed tomography reformatted images
were then stored in compact-disc read-only format. The compact discs could
each accommodate images of five patients. The images of the thirty-four
patients were stored in random order, with the images of five patients stored
twice to measure intraobserver variability. The viewing software was eFilm
Lite (Merge eFilm, Milwaukee, Wisconsin), which allows images to be viewed on
a standard computer in much the same fashion as images are viewed at a
standard radiographic picture archiving and communication system (PACS)
workstation.
The injury images were evaluated by fifteen different reviewers, who
included orthopaedic and neurosurgical residents and fellows, attending
surgeons experienced in the treatment of spinal trauma, and a musculoskeletal
radiologist. The reviewers were given instructions on the use of the viewing
software and general instructions for utilizing the stability scoring system.
Each reviewer then scored each case in random order without knowledge of any
clinical or treatment data.
Statistical Methods
Intraobserver and interobserver reliabilities were determined with the
intraclass correlation coefficient, which ranges from 0 (no agreement) to 1
(perfect agreement). The intraclass correlation coefficient is a ratio of the
variability among subjects to the overall variability observed in the data. It
is commonly used to assess reliability of ordinal data that have many classes
or to assess continuous data.
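As an illustration of the ratio described above, the following Python sketch computes a one-way random-effects intraclass correlation coefficient from a table of ratings (one row per case, one column per examiner). The specific form of the coefficient used in the study is not stated, so the one-way form is an assumption made here for illustration, and the function name `icc_one_way` is hypothetical. The study's own analyses were performed in SAS.

```python
def icc_one_way(ratings):
    """One-way random-effects ICC for a list of rows (cases),
    each row holding one score per examiner.

    Assumes every case was scored by the same number of examiners.
    """
    n = len(ratings)                      # number of cases
    k = len(ratings[0]) if n else 0       # number of examiners
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 cases and >= 2 scores per case")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    case_means = [sum(row) / k for row in ratings]

    # Between-case mean square: variability among subjects.
    ms_between = k * sum((m - grand_mean) ** 2 for m in case_means) / (n - 1)
    # Within-case mean square: disagreement among examiners.
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, case_means)
                    for x in row) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all ratings are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator
```

When the examiners agree exactly on every case, the within-case mean square is zero and the coefficient equals 1. Note that a sample estimate computed this way can fall below 0 when examiners disagree more than the cases differ.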
The interobserver intraclass correlation coefficient was calculated by
comparing the scores for each case among the examiners and averaging. The
intraobserver intraclass correlation coefficient was calculated by comparing
results between repeat measurements for the five randomly repeated cases. An
intraclass correlation coefficient score of 0 to 0.4 indicates poor
reliability, a score of >0.4 to 0.75 indicates fair or moderate
reliability, and a score of >0.75 indicates excellent reliability [7]. Prior
to institution of the study, a power analysis performed with a minimum of ten
examiners and thirty-four patients predicted that we would have >85% power
to show excellent reliability if it was present. Increasing power was noted
with an increasing number of examiners [8]. A p
value of 0.05 was considered to be significant. All analyses were performed
with use of SAS software (version 9; SAS Institute, Cary, North Carolina).
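The reliability bands cited above can be written as a small lookup. This is a minimal sketch; the function name `interpret_icc` is hypothetical, and the cutoffs are exactly those stated in the text (0 to 0.4, >0.4 to 0.75, and >0.75).

```python
def interpret_icc(icc: float) -> str:
    """Map an intraclass correlation coefficient to the reliability
    band described in the text."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("expected a coefficient between 0 and 1")
    if icc <= 0.4:
        return "poor"
    if icc <= 0.75:
        return "fair or moderate"
    return "excellent"


print(interpret_icc(0.6))
```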
Case Distribution
The CSISS scores were widely distributed from 0 to 20 points. The
distribution was not normal, as there was a concentration of cases toward the
less severe ratings and a spike of cases at the most severe rating (20
points). Figure 2 is a
frequency plot based on divisions of 0.5 units, such as 7.0 to 7.4 or 9.5 to
9.9. In general, the low values represented nondisplaced or minimally
displaced fractures, whereas the high scores were for bilateral facet
dislocations with >5 mm of displacement and injury to each of the four
columns.
Intraobserver Error
The intraobserver error was assessed by comparing each examiner's scores
for two separate evaluations of five randomly selected patients. The mean
intraclass correlation coefficient for the fifteen examiners was 0.977 (range,
0.948 to 1.000), indicating excellent reliability.
Interobserver Error
The mean interobserver intraclass correlation coefficient for the CSISS was
0.883, indicating excellent reliability
(Table I). The mean intraclass correlation coefficients for the anterior
column, the posterior column, and the combined pillars were 0.818, 0.759, and
0.831, respectively. These high intraclass correlation coefficients also
indicate excellent reliability.
Effect of Experience of Examiner
We noted no differences based on the experience of the examiners, with an
intraclass correlation coefficient of 0.871 for residents and fellows compared
with 0.894 for the attending surgeons and the radiologist.
Effect of Fracture Type
The mean differences in the CSISS scores among the examiners are given for
six different fracture patterns in Table II. These differences ranged from
1.64 for isolated fractures to
3.06 for fractures in a spine with ankylosing spondylitis.
Association with Treatment and Neurologic Injury
The CSISS scores were compared with neurologic function and with treatment.
All fourteen patients with a mean CSISS score of ≥7 points were treated
surgically (see Appendix), and only four of the twenty patients who had a
score of <7 points had surgery. Three of those four patients presented
with a neurologic deficit. One of them had a traumatic disc herniation that
had produced a severe central cord syndrome. This disc herniation was two
levels cephalad to a subtle disc distraction injury, and surgical intervention
was undertaken to decompress the spinal cord at the level cephalad to the disc
distraction injury. The other two patients with a neurologic deficit had a
unilateral superior articular facet fracture with fracture displacement into
the neural foramen causing radicular symptoms.
The presence of a neurologic deficit had a strong influence on the decision
to treat a patient surgically. All fourteen patients with a neurologic deficit
were treated surgically. Of the patients who were neurologically intact, all
who had a score of ≥7 points were treated with surgery and only one of the
seventeen who had a score of <7 points was treated surgically. Although the
CSISS accurately predicted the need for surgery, all treatment decisions were
made independently at the time of injury irrespective of the calculation of
that score.
Worst-Case Analysis
The cases that led to the highest variability among the examiners' scores
(the seven most severe cases) were analyzed separately to determine potential
weaknesses of the scoring system (Table III). On retrospective review of the
computed tomography scans of two flexion-axial loading injuries, it was
determined that fractures or subluxation of the facets that were clearly
demonstrated by those scans had been missed or scored too low (Figs. 3-A
through 3-D).
The other five cases that were associated with variability in scoring
illustrate limitations of computed tomography imaging and the analog scoring
system. These cases included major ligamentous and disc disruptions that were
often underestimated by the reviewers. For example, one injury was in an
ankylosed spine, and another was an injury through the disc space (Figs. 4-A
through 4-D). In the latter
case, distraction of the disc space, anterior subluxation, and ankylosis from
spondylosis were seen on the sagittal computed tomography reformation
(Fig. 4-A). Despite the
variability, all reviewers assigned a score of at least 8 points to this case,
and the intraclass correlation coefficients were again high for interobserver
reliability.
Determining the severity or stability of a cervical spine injury has proved
difficult despite the use of high-resolution imaging. Stability has been
considered to be a binary variable (stable or unstable) when in fact it is
likely to be a continuum of injury patterns that can appear similar. For
instance, a burst fracture of C7 with 5 mm of retropulsion of bone into the
spinal canal may be associated with no other injury or with a concurrent
complete posterior osteoligamentous injury [9]. We
therefore developed our classification system to allow measurement of all
components of the spinal column and not just the most obvious injury
region.
The concept of functional stability was probably first described by Nicoll,
who defined it in a population of Welsh miners on the basis of their ability
to return to work [3].
Subsequently, in 1970, Holdsworth described the force vectors of an injury and
the importance of an intact posterior osteoligamentous complex in maintaining
the stability of the injured motion segment [4]. However,
we believe that the criteria for determining the stability of the cervical
spine have not been clearly delineated.
Here we have proposed a system to assess stability of the cervical spine on
the basis of the concept that increasing amounts of skeletal displacement or
osseous separation correlate with the amounts of ligamentous disruption and
instability. The CSISS is determined by evaluating four distinct anatomic
columns, with the injury to each column scored independently. By adding the
scores for the four columns, the amount of overall stability is determined on
the basis of the assumption that combined column injuries will lead to greater
instability. This model is consistent with biomechanical studies demonstrating
increasing neutral zone motion with increasing severity of injury [9]. Also,
it is not dissimilar to the basic principles of the qualitative schemes
described by Allen et al. [2] or the clinical stability checklist described
by White and Panjabi [6].
Our four-column model of the spine is a modification of the model described
by Louis [1]. Louis,
who applied his system to C3 through L5, proposed that the spine is supported
by three pillars, with the anterior pillar consisting of the vertebral body
and disc and the two posterior pillars formed by the articular processes,
reinforced horizontally by the lamina and pedicles. We utilized four columns
in our system by adding the posterior osteoligamentous complex, which has been
shown to be of paramount importance to stability [4,10].
The CSISS demonstrated excellent intraobserver and interobserver
reliability. We believe that this is due to the division of the spine into
four columns and the critical assessment of each. Scoring was simplified by
the use of an analog system. Greater variation in the scores was introduced
primarily by the examiner's lack of appreciation of subtle signs of
substantial ligamentous disruption, such as facet joint diastasis, disc
distraction, an ankylosed spine, or interspinous widening. Because major
ligamentous or disc derangements were underestimated in some of our patients,
the variability in the scores might have been reduced by the addition of
magnetic resonance images. The CSISS performed well for all fracture types and
for relatively minor to grossly unstable injuries. The fracture type
associated with the greatest variation in scores was a fracture in an
ankylosed spine.
The kappa statistic is the most commonly used measure of agreement in an
analysis of categorical data. When there are a large number of classes
(choices), there is less of a chance that two raters will choose the same
category, which results in lower kappa values. A weighted kappa has been
developed that adjusts for the extent of the disagreement by assigning weights
based on the distance between the disagreement cells. When the categories are
equally spaced along one dimension, the weighted kappa is equivalent to the
intraclass correlation coefficient [11]. The
intraclass correlation coefficient is thus useful for measuring observer
reliability in the presence of ordinal data for which there are many
categories or when the data are continuous, as they were in this study. One
must keep in mind that heterogeneous populations, for which there are wide
ranges of scores, often have elevated intraclass correlation coefficients as a
result of large between-subject variances. However, the intraclass correlation
coefficient is still mainly influenced by the amount of variability among
raters compared with the overall variability in the data and is therefore a
good representation of observer reliability.
As a preliminary step to understanding the clinical utility of the CSISS,
we examined the association between the scores and whether surgery had been
performed and the association between the scores and the presence of a
neurologic deficit. A clinically useful tool for assessing stability should be
correlated with these two factors. We found that all of the patients with a
CSISS score of ≥7 points were treated surgically. Similarly, eleven of the
fourteen patients with a score of ≥7 points had a neurologic deficit
compared with only three of the twenty with a score of <7 points.
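The proportions implied by these counts can be worked out directly. The sketch below is purely descriptive arithmetic on the counts reported in the text (fourteen patients with a score of 7 points or more, twenty with a lower score). It is not a statistical test, and the variable and function names are hypothetical.

```python
# Counts as reported in the text, grouped by CSISS score.
groups = {
    "score >= 7": {"patients": 14, "surgery": 14, "deficit": 11},
    "score < 7": {"patients": 20, "surgery": 4, "deficit": 3},
}


def rates(group):
    """Proportion of the group treated surgically and the proportion
    presenting with a neurologic deficit."""
    return {key: group[key] / group["patients"]
            for key in ("surgery", "deficit")}


for label, group in groups.items():
    r = rates(group)
    print(f"{label}: surgery {r['surgery']:.1%}, deficit {r['deficit']:.1%}")
```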
A table showing the injury type, neurologic function, type of treatment,
and CSISS scores for all patients is available with the electronic versions of
this article, on our web site at jbjs.org (go to the article citation and
click on "Supplementary Material") and on our quarterly CD-ROM
(call our subscription department, at 781-449-9780, to order the CD-ROM).