Scoring systems in which subjective and objective factors are
evaluated together and the final result is rated categorically are commonplace
in orthopaedics [1-8].
These systems reflect the patient's functional status from the surgeon's point
of view. Patient-completed health status questionnaires have become popular as
a means of evaluating function from the patient's point of
view [9-12].
Comparison of physician-based scoring systems with health status instruments
can facilitate the investigation of the causes of disagreement between the
physician's and patient's points of view.
While using physician-based elbow scoring systems for the evaluation of
patients in several scientific investigations, we observed that pain seemed to
dominate the scoring systems. This was a cause of some concern because of the
strong influence of psychosocial aspects of illness on the perception of
pain [13,14].
If physician-based scoring systems do not adequately reflect objective
measures of elbow function, they will not provide a basis for comparison of
elbow function according to the physician's and patient's points of view, and
they may undervalue objective improvements achieved by operative
intervention.
In this study, we analyzed the influence of objective and subjective
factors on the quantitative ratings of elbow function generated by several
physician-based and patient-based rating systems in order to test the
hypothesis that pain has a strong influence on these scores and that the
rating systems are therefore less sensitive to objective measures of elbow
dysfunction.
Patients
During a five-year period, health status data and elbow ratings were
systematically gathered during evaluations of patients at various stages of
recovery as part of various prospective and retrospective trials, all approved
by our Human Research Committee. Inclusion criteria for the present study were
(1) a previously sustained intra-articular fracture of the elbow that had been
treated operatively, (2) an age of eighteen years or older, (3) a date of
evaluation more than six months following the most recent surgery, and (4) no
interpositional or prosthetic arthroplasty. One hundred and four patients
satisfied these criteria and represent the study cohort.
There were fifty-two women and fifty-two men with an average age of
forty-six years (range, eighteen to seventy-nine years). The right arm was
involved in forty-five patients and the left arm, in fifty-nine. The dominant
elbow was affected in seventy-five patients (72%). Sixty-six patients were
employed outside the home at the time of the injury; forty-four performed
desk-based work, and twenty-two were laborers. Six patients were unemployed,
four were disabled, fourteen were retired, eleven were homemakers, one was a
volunteer, and two were students.
The initial injury was the result of a fall from a standing height in
forty-nine patients and the result of a higher-energy injury in fifty-five
patients. The injuries included an intra-articular fracture of the distal part
of the humerus in fifty patients, a fracture-dislocation of the elbow in
thirty-one (a posterior fracture-dislocation of the olecranon in eleven and a
posterior dislocation of the elbow with intra-articular fractures in twenty),
a simple elbow dislocation in nine, and an isolated radial head fracture in
fourteen. The patients were evaluated at a mean of fifty-eight months (range,
six to 273 months) after the injury and forty-six months (range, six to 136
months) after the latest surgery.
Evaluation
An investigator who was not involved in the patient's care evaluated each
patient according to the American Shoulder and Elbow Surgeons Elbow Evaluation
(ASES) [2]. The
evaluation consisted of an interview, a physical examination, radiographs, the
completion of three physician-based elbow scoring systems (Mayo Elbow
Performance Index
[MEPI] [3], Broberg and Morrey rating system [4], and American Shoulder and
Elbow Surgeons Elbow Evaluation [ASES] [2]), and the
administration of an upper-extremity-specific health status questionnaire
(Disabilities of the Arm, Shoulder and Hand [DASH]) and a general health
status questionnaire (Short Form-36 [SF-36]). Arthrosis was graded according
to the system of Broberg and
Morrey [4].
Pain
We used the pain subscales of the ASES [2] as a quantitative measure of pain
in all analyses. Patients rated their pain on five 11-point Likert scales
ranging from 0, indicating no
pain, to 10, indicating the worst imaginable pain. The five scales were (1)
pain when it is at its worst, (2) pain at rest, (3) pain when lifting a heavy
object, (4) pain when doing a task with repeated elbow movements, and (5) pain
at night. We added the scores of these five scales for a summary pain score
ranging from 0 to 50 points, with 0 points indicating no pain.
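The summation is simple, but writing it out makes the input checks explicit. The following is a minimal Python sketch, assuming each item is stored as an integer from 0 to 10; the item keys are hypothetical labels chosen for illustration, not names taken from the ASES form.

```python
from typing import Dict

# The five ASES pain items listed above; the keys are hypothetical labels.
PAIN_ITEMS = (
    "worst",
    "at_rest",
    "lifting_heavy_object",
    "repeated_elbow_movements",
    "at_night",
)


def summary_pain_score(ratings: Dict[str, int]) -> int:
    """Add five 0-10 pain ratings into a 0-50 summary score (0 = no pain)."""
    missing = [item for item in PAIN_ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing pain items: {missing}")
    total = 0
    for item in PAIN_ITEMS:
        value = ratings[item]
        if not 0 <= value <= 10:
            raise ValueError(f"{item} must be between 0 and 10, got {value}")
        total += value
    return total


example = {
    "worst": 6,
    "at_rest": 1,
    "lifting_heavy_object": 5,
    "repeated_elbow_movements": 4,
    "at_night": 2,
}
print(summary_pain_score(example))  # 18
```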
Physician-Based Scoring Systems
The MEPI [3] is one
of the most commonly used physician-based elbow-rating systems. This index
divides 100 points among a physician assessment of pain (45 points),
ulnohumeral motion (20 points), stability (10 points), and the ability to
perform five functional tasks (25 points). Pain is rated as none (45 points),
mild (30 points), moderate (15 points), or severe (0 points) by the physician
on the basis of an interview with the patient. The total score ranges from 5
to 100 points, with higher scores indicating better function. Categorical
ratings are assigned, with 90 to 100 points considered to be excellent; 75 to
89 points, good; 60 to 74 points, fair; and <60 points, poor.
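A minimal Python sketch of the MEPI arithmetic described above. The text gives the pain point values and the maximum for each remaining component but not how motion, stability, and function points are assigned, so this sketch assumes those three are passed in as already-scored points. The example shows how heavily the pain item weighs: a patient with the maximum motion, stability, and function points but moderate pain totals 70 points, a fair rating.

```python
# Physician-rated pain category -> MEPI pain points (values from the text).
MEPI_PAIN_POINTS = {"none": 45, "mild": 30, "moderate": 15, "severe": 0}
# Maximum points for the other MEPI components (values from the text).
MEPI_MAX_POINTS = {"motion": 20, "stability": 10, "function": 25}


def mepi_total(pain: str, motion: int, stability: int, function: int) -> int:
    """Sum MEPI points; motion, stability, and function are pre-scored points."""
    if pain not in MEPI_PAIN_POINTS:
        raise ValueError(f"unknown pain category: {pain!r}")
    components = {"motion": motion, "stability": stability, "function": function}
    for name, points in components.items():
        if not 0 <= points <= MEPI_MAX_POINTS[name]:
            raise ValueError(f"{name} points must be 0-{MEPI_MAX_POINTS[name]}")
    return MEPI_PAIN_POINTS[pain] + motion + stability + function


def mepi_category(score: int) -> str:
    """Map a MEPI total to the categorical rating given in the text."""
    if score >= 90:
        return "excellent"
    if score >= 75:
        return "good"
    if score >= 60:
        return "fair"
    return "poor"


# Full motion, stability, and function points, but moderate pain.
score = mepi_total("moderate", motion=20, stability=10, function=25)
print(score, mepi_category(score))  # 70 fair
```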
The rating system of Broberg and
Morrey [4] is also a
100-point system, based on motion (40 points), strength (20 points), stability
(5 points), and pain (35 points). The physician rates pain as none (35
points); mild with activity but requiring no medication (28 points); moderate
with or after activity (15 points); or severe at rest, requiring constant
medication, and disabling (0 points). In the categorical rating, 95 to 100
points indicates an excellent outcome; 80 to 94 points, a good outcome; 60 to
79 points, a fair outcome; and <60 points, a poor outcome.
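Because each system uses its own cutoffs, the same number of points can receive different categorical ratings. The Python sketch below encodes the Broberg and Morrey bands above alongside the MEPI bands given earlier; it is an illustration only, and the band-table layout is an assumption of this sketch.

```python
from typing import Sequence, Tuple

# (lower bound, label) pairs, checked from the highest band down; scores
# below the last bound are "poor" (<60 points) in both systems.
BROBERG_MORREY_BANDS = ((95, "excellent"), (80, "good"), (60, "fair"))
MEPI_BANDS = ((90, "excellent"), (75, "good"), (60, "fair"))


def categorize(score: float, bands: Sequence[Tuple[int, str]]) -> str:
    """Return the categorical rating for a 100-point score."""
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "poor"


for score in (92, 78):
    print(score, categorize(score, MEPI_BANDS),
          categorize(score, BROBERG_MORREY_BANDS))
# 92 excellent good
# 78 good fair
```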
The ASES [2] is a
100-point scale that combines an assessment of pain based on the patient's
completion of five 11-point Likert scales (25 points); the patient's
assessment, on five 11-point Likert scales, of the same five functional tasks
used in the MEPI (30 points); ulnohumeral and radioulnar motion (30 points);
strength (10 points); and stability (5 points). This instrument is therefore
based partly on the perspective of the patient and partly on factors measured
and evaluated by the physician. The scores range from 0 to 100 points, with
higher scores indicating better function. No categorical ratings are
assigned.
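To make the split between patient-rated and physician-measured points concrete, the following Python sketch totals ASES component points and separates the two perspectives. It assumes the component points have already been derived from the Likert responses and the physical examination; the text does not describe that conversion, so it is not implemented here. At the maxima, 55 of the 100 points come from the patient's ratings.

```python
from typing import Dict

# Maximum points per ASES component, as listed in the text.
ASES_MAX_POINTS = {
    "pain": 25,       # patient-rated (five 11-point scales)
    "function": 30,   # patient-rated (five functional tasks)
    "motion": 30,     # physician-measured ulnohumeral and radioulnar motion
    "strength": 10,   # physician-measured
    "stability": 5,   # physician-measured
}
PATIENT_RATED = {"pain", "function"}


def ases_breakdown(points: Dict[str, float]) -> Dict[str, float]:
    """Return the ASES total split into patient-rated and physician-rated parts."""
    if set(points) != set(ASES_MAX_POINTS):
        raise ValueError(f"expected components {sorted(ASES_MAX_POINTS)}")
    for name, value in points.items():
        if not 0 <= value <= ASES_MAX_POINTS[name]:
            raise ValueError(f"{name} points must be 0-{ASES_MAX_POINTS[name]}")
    patient = sum(v for k, v in points.items() if k in PATIENT_RATED)
    physician = sum(v for k, v in points.items() if k not in PATIENT_RATED)
    return {"total": patient + physician,
            "patient_rated": patient,
            "physician_rated": physician}


print(ases_breakdown(dict(ASES_MAX_POINTS)))
# {'total': 100, 'patient_rated': 55, 'physician_rated': 45}
```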
Health Status Questionnaires
The DASH
questionnaire [10] was
developed by the American Academy of Orthopaedic Surgeons in collaboration
with the Council of Musculoskeletal Specialty Societies and the Institute for
Work and Health as an outcomes instrument specific to the upper extremity, and
it is applicable to a wide variety of
problems [10]. The
questionnaire contains thirty items: twenty-one evaluate difficulty with
specific tasks, five evaluate symptoms, and one each evaluates social
function, work function, sleep, and confidence. The score ranges from 0 to 100
points, with higher scores indicating worse upper-extremity function.
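The text gives the DASH score range but not the scoring rule. The sketch below follows the scoring rule published with the DASH (each item answered on a 1-to-5 scale, score = [(sum of n responses / n) - 1] × 25, and no score if more than three items are unanswered). Treat it as a sketch to be checked against the official DASH manual, not as the procedure documented for this study.

```python
from typing import Optional, Sequence

N_ITEMS = 30
MAX_MISSING_ITEMS = 3


def dash_score(responses: Sequence[Optional[int]]) -> float:
    """DASH score (0 = no disability, 100 = most severe) from 30 responses.

    Each answered item is an integer from 1 to 5; None marks an unanswered item.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    answered = [r for r in responses if r is not None]
    if N_ITEMS - len(answered) > MAX_MISSING_ITEMS:
        raise ValueError("too many unanswered items to score")
    if any(r not in (1, 2, 3, 4, 5) for r in answered):
        raise ValueError("responses must be integers from 1 to 5")
    # Rescale the mean response from the 1-5 range onto 0-100.
    return (sum(answered) / len(answered) - 1) * 25


print(dash_score([2] * 30))  # 25.0
```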
The SF-36 [9] is the
most commonly used general health status measure. The physical (PCS) and
mental (MCS) component summary scores were calculated and used for this
analysis. Both component scores range from 0 to 100 points and are
standardized to population norms. A score of 50 points is equal to the mean
score for the general population. Every 10 points above or below 50 represents
one standard deviation from the mean for the general population.
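Because of this norm-based scaling, a component score can be read directly as a number of standard deviations from the population mean. A short Python sketch; the percentile function additionally assumes that norm-based scores are normally distributed in the general population, which the text does not state.

```python
from statistics import NormalDist

POPULATION_MEAN = 50.0
POPULATION_SD = 10.0


def sd_from_population_mean(score: float) -> float:
    """Standard deviations above (+) or below (-) the general-population mean."""
    return (score - POPULATION_MEAN) / POPULATION_SD


def approximate_percentile(score: float) -> float:
    """Population percentile, ASSUMING normally distributed norm-based scores."""
    return 100 * NormalDist(POPULATION_MEAN, POPULATION_SD).cdf(score)


print(sd_from_population_mean(45))           # -0.5
print(round(approximate_percentile(45), 1))  # 30.9
```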
Statistical Analysis
Continuous data are presented in terms of the mean, standard deviation, and
range. Eleven demographic and clinical variables were examined with respect to
each of the outcome instruments; these included age, gender, injury to the
dominant side, distal humeral fracture, time since the last surgery, number of
operations subsequent to the original treatment of the injury, total arc of
flexion and extension, total arc of pronation and supination, ulnar
neuropathy, arthrosis, and pain. The Pearson product-moment correlation
coefficient was used to evaluate the association between continuous predictor
variables and each outcome instrument as well as between the outcome
instruments themselves. A power analysis (nQuery Advisor program, version 4.0;
Statistical Solutions, Saugus, Massachusetts) indicated that a minimum sample
size of 100 patients would provide 90% statistical power (β = 0.1) to
detect a significant moderate correlation (absolute r = 0.50) with use of a
Pearson coefficient to correlate each health outcome score with the patients'
subjective pain scores and a Bonferroni significance level of 0.01 to account
for the different outcome instruments. Subsequent operations were defined
according to three levels (index procedure only, one subsequent operation, and
more than one subsequent operation), and the Spearman rho correlation was used
to test the influence of this variable on each outcome score. In addition to
measuring correlation, the univariate analysis involved Student t tests for
comparing outcome scores between men and women and according to the presence
or absence of categorical variables.
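For readers who want to see the univariate statistics written out, here is a dependency-free Python sketch of the Pearson coefficient and of Spearman rho computed as the Pearson coefficient of average ranks, which handles tied values such as the three-level count of subsequent operations. The data in the example are hypothetical and are not from this study; a statistics package (the study used SPSS) would also supply p values.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho as the Pearson correlation of average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical example: 0 = index procedure only, 1 = one subsequent
# operation, 2 = more than one subsequent operation.
subsequent_operations = [0, 0, 1, 2, 2]
mepi_scores = [95, 88, 80, 70, 72]
print(round(spearman_rho(subsequent_operations, mepi_scores), 2))  # -0.95
```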
Multivariate analysis was based on two statistical approaches. First,
multivariate analysis of variance was performed to identify which of the
eleven variables were independently associated with scores on each outcome
instrument in the entire cohort of 104 patients. The F test was used to judge
the significance of each variable. A backward stepwise procedure was utilized,
with testing of all eleven predictor variables as candidates to determine the
final models, and goodness-of-fit was assessed with use of the adjusted R²
[15]. A
multivariate model containing the significant independent predictors was
established for each health outcome of interest (Broberg and Morrey system,
MEPI, ASES, DASH, SF-36 PCS, and SF-36 MCS). Since the arc of ulnohumeral
motion and pain were found to be important predictors of each outcome score
(with the exception of the SF-36 MCS), the second part of the statistical
modeling approach was to construct predictive equations with use of these
variables, with an ulnohumeral arc of 100° as the cutoff. Finally, on the
basis of the results of the multiple regression analyses, the relationships of
an individual patient's pain score and flexion-extension arc (<100° or
≥100°) with their DASH and MEPI outcome scores were depicted
visually [16]. The
data were analyzed with use of the SPSS software package (version 12.0; SPSS,
Chicago, Illinois).
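The variance-explained figures and F tests reported below rest on two standard least-squares formulas, sketched here in Python: the adjusted R² and the F statistic for dropping q predictors from a nested model. With q = 1, the latter is the per-variable test that a backward stepwise procedure applies when deciding which predictor to remove next. This is a textbook sketch, not a reproduction of the SPSS procedure, and the numbers in the example are hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R squared for a model with p predictors fitted to n cases."""
    df_resid = n - p - 1
    if df_resid <= 0:
        raise ValueError("need more cases than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / df_resid


def partial_f(r2_full: float, r2_reduced: float, n: int, p_full: int, q: int) -> float:
    """F statistic for removing q predictors from a model with p_full predictors.

    Uses ordinary (unadjusted) R squared for both nested models; the result is
    compared with an F distribution on (q, n - p_full - 1) degrees of freedom.
    """
    df_resid = n - p_full - 1
    if df_resid <= 0 or q <= 0:
        raise ValueError("invalid degrees of freedom")
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)


# Hypothetical four-predictor model in 104 patients with R squared = 0.74.
print(round(adjusted_r2(0.74, n=104, p=4), 2))  # 0.73
```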
Results
The scores (mean and standard deviation) for the physician-based
measures were 81 ± 16 points (range, 30 to 100 points) for the MEPI, 85
± 12 points (range, 50 to 100 points) for the Broberg and Morrey
system, and 82 ± 15 points (range, 31 to 100 points) for the ASES. The
scores for the patient-based evaluations were 20 ± 19 points (range, 0
to 73 points) for the DASH, 45 ± 11 points (range, 13 to 63 points) for
the PCS of the SF-36, and 49 ± 8 points (range, 25 to 59 points) for
the MCS of the SF-36.
Correlations Among Evaluation Instruments
Scores derived with the six evaluation instruments had moderate-to-high
levels of correlation with one another. The raw scores derived with the elbow
scoring systems (MEPI, Broberg and Morrey, and ASES) were highly
intercorrelated (Pearson r = 0.86 to 0.89; p < 0.001). The health status
instruments (DASH and SF-36) showed moderate-to-good correlation with the
physician-based systems. The upper-extremity-specific DASH scores showed
stronger correlations (Pearson r = -0.81 to -0.65; p < 0.001; negative because
higher DASH scores indicate worse function) than did the PCS (Pearson r = 0.55
to 0.61; p < 0.001) and the MCS (Pearson r = 0.48 to 0.55; p < 0.001) of the
SF-36.
Predictors of MEPI Scores
In the univariate analysis, the MEPI was strongly correlated with the ASES
pain scores (r = -0.82; p < 0.001) and moderately correlated with the range
of flexion-extension (r = 0.40; p < 0.001), the range of
pronation-supination (r = 0.38; p < 0.001), and the number of subsequent
operations (Spearman rho = -0.26; p < 0.01).
Multivariate analysis revealed age (F = 4.28; p < 0.05),
flexion-extension (F = 7.34; p < 0.01), pronation-supination (F = 5.10; p
< 0.05), and pain (F = 184.32; p < 0.001) to be independent predictors
of MEPI scores (Table I). The
model with these four variables accounted for 73% of the variability in the
MEPI scores. A model including pain alone accounted for 66% of the variability
in the MEPI scores. When pain was excluded from the model, the best model
accounted for only 22% of the variability in the MEPI scores.
The relative importance of the independent variables of pain and motion is
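illustrated by the following equation (Fig. 1), in which the ASES pain score
ranges from 0 to 50 points:
$$Y_{\text{MEPI}} = 85 - 1.1 \times (\text{ASES pain score}) + \begin{cases} 6 & \text{if flexion-extension arc} < 100^\circ \\ 12 & \text{if flexion-extension arc} \ge 100^\circ \end{cases}$$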
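The equation can be evaluated directly to compare the weight of pain with that of motion. A minimal Python sketch, assuming an arc of exactly 100° falls in the ≥100° group; it reproduces the published equation for illustration and is not a validated prediction tool. Crossing the 100° cutoff shifts the prediction by 6 points, the same shift produced by only about 5.5 points on the 0-to-50 pain scale.

```python
ARC_CUTOFF_DEG = 100.0


def predicted_mepi(pain_score: float, arc_deg: float) -> float:
    """Illustrative MEPI prediction from the reported equation.

    pain_score: ASES summary pain score (0-50).
    arc_deg: flexion-extension arc in degrees (>= 100 counts as the upper group).
    """
    if not 0 <= pain_score <= 50:
        raise ValueError("ASES pain score must be between 0 and 50")
    motion_points = 12.0 if arc_deg >= ARC_CUTOFF_DEG else 6.0
    return 85.0 - 1.1 * pain_score + motion_points


# Pain points that shift the prediction as much as crossing the arc cutoff.
pain_equivalent_of_motion = (12.0 - 6.0) / 1.1

print(round(predicted_mepi(0, 120), 1))      # 97.0
print(round(predicted_mepi(20, 80), 1))      # 69.0
print(round(pain_equivalent_of_motion, 1))   # 5.5
```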
Predictors of Broberg and Morrey Scores
The results of the univariate and multivariate analyses of the Broberg and
Morrey evaluation system were similar to those of the MEPI. The univariate
analysis showed a significant correlation with the number of subsequent
operations (Spearman rho = -0.27; p < 0.01), moderate correlation
with flexion-extension (r = 0.54; p < 0.001) and pronation-supination (r =
0.50; p < 0.001), and strong correlation with the ASES pain score (r =
-0.77; p < 0.001).
The independent predictors of the Broberg and Morrey score in the
multivariate analysis were age (F = 4.86; p < 0.05), flexion-extension (F =
31.90; p < 0.001), pronation-supination (F = 21.89; p < 0.001), and pain
(F = 178.38; p < 0.001) (Table
I). The model with these four variables accounted for 79% of the
variability in the Broberg and Morrey scores. The model with pain alone
accounted for 59% of the variability in the Broberg and Morrey scores, whereas
the model without pain accounted for 41% of the variability.
Predictors of ASES Scores
The ASES scores correlated with ASES pain scores (r = -0.76; p < 0.001),
flexion-extension (r = 0.56; p < 0.001), pronation-supination (r = 0.49; p
< 0.001), and number of subsequent operations (Spearman rho =
-0.37; p < 0.001) in the univariate analysis. The ASES was the only measure
that showed significant correlation with arthrosis (t = 2.17; p < 0.05) and
ulnar neuropathy (t = 2.46; p < 0.05) in the univariate analysis, with the
presence of arthrosis or ulnar neuropathy associated with worse ASES
scores.
In the multivariate analysis, only age (F = 9.19; p < 0.05),
flexion-extension (F = 36.57; p < 0.001), pronation-supination (F = 20.28;
p < 0.001), and pain (F = 178.00; p < 0.001) were independent predictors
of the ASES scores (Table I).
The multivariate model accounted for 79% of the variability in the ASES
scores. When age and range of motion were excluded from the model, 57% of the
variability in the ASES scores was accounted for by pain alone. A model
without pain accounted for 41% of the variability in the ASES scores.
Predictors of DASH Scores
Univariate analysis of continuous variables showed significant (p <
0.01) but only moderate correlation of DASH scores with the number of
operations subsequent to the index procedure (Spearman rho = 0.32),
flexion-extension (r = -0.42; p < 0.001), and pronation-supination (r =
-0.34; p < 0.01). The strongest correlation was between the DASH scores and
the ASES pain scores (r = 0.61; p < 0.001). Univariate analysis of
dichotomous variables showed a significant influence of ulnar neuropathy (t =
2.22; p = 0.03).
Multivariate analysis identified pain (F = 49.1; p < 0.001) and
flexion-extension (F = 15.96; p < 0.001) as significant independent
predictors of the DASH scores (Table
II). The model with pain and flexion-extension accounted for 45%
of the variability in the DASH scores, whereas the model with pain alone
accounted for 36% of the variability in the DASH scores and the model with
motion alone accounted for only 17% of the variability.
We used a cutoff point of 100° for the flexion-extension arc to derive
the following equation to illustrate the independent relationship of pain and
flexion-extension with the final DASH scores (Fig. 2):
$$Y_{\text{DASH}} = 17 + 0.93 \times (\text{ASES pain score}) - \begin{cases} 5 & \text{if flexion-extension arc} < 100^\circ \\ 10 & \text{if flexion-extension arc} \ge 100^\circ \end{cases}$$
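Evaluating the DASH equation directly shows how a modest difference in pain outweighs the motion effect. A minimal Python sketch, assuming an arc of exactly 100° falls in the ≥100° group; it reproduces the published equation for illustration only. A 10-point difference on the 0-to-50 pain scale moves the predicted DASH score by 9.3 points, nearly twice the 5-point difference between the two arc groups.

```python
ARC_CUTOFF_DEG = 100.0


def predicted_dash(pain_score: float, arc_deg: float) -> float:
    """Illustrative DASH prediction from the reported equation.

    pain_score: ASES summary pain score (0-50).
    arc_deg: flexion-extension arc in degrees (>= 100 counts as the upper group).
    Higher DASH values indicate worse upper-extremity function.
    """
    if not 0 <= pain_score <= 50:
        raise ValueError("ASES pain score must be between 0 and 50")
    motion_points = 10.0 if arc_deg >= ARC_CUTOFF_DEG else 5.0
    return 17.0 + 0.93 * pain_score - motion_points


print(round(predicted_dash(0, 120), 1))   # 7.0
print(round(predicted_dash(20, 80), 1))   # 30.6
# Effect of 10 extra pain points within the same arc group.
print(round(predicted_dash(30, 120) - predicted_dash(20, 120), 1))  # 9.3
```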
Predictors of SF-36 PCS Scores
Univariate analysis demonstrated correlations between the SF-36 PCS scores
and flexion-extension (r = 0.22; p < 0.05), pronation-supination (r = 0.29;
p < 0.05), number of subsequent operations (Spearman rho = -0.25; p
< 0.05), and pain (r = -0.57; p < 0.001).
In the multivariate analysis, older age (F = 18.8; p < 0.001), a smaller
pronation-supination arc (F = 4.4; p < 0.05), and a higher ASES pain score
(F = 57.4; p < 0.001) were identified as significant independent predictors
of a worse PCS score (Table
II). This model accounted for 45% of the variability in the PCS
scores. A model with pain alone accounted for 32% of the variability in the
PCS scores, and the model without pain accounted for 12% of the
variability.
Predictors of SF-36 MCS Scores
The SF-36 MCS scores had a moderate correlation with pain (r = -0.50; p
< 0.001) and pronation-supination (r = 0.21; p < 0.05) in the univariate
analysis.
The multivariate analysis showed that pain also dominated the MCS scores.
The model with pain (F = 47.6; p < 0.001) and age (F = 13.9; p < 0.001)
accounted for 35% of the variability in the MCS scores, but only 3% of the
variability in the scores was accounted for when pain was removed from the
model.
Discussion
These data confirm our hypothesis that pain dominates measures of
elbow function and health status. Pain was the strongest predictor of all
physician-based and patient-based scores. Objective factors alone were much poorer
predictors of final scores and added relatively little predictive value to
that provided by pain alone in most of the multivariate statistical models.
These statements are most applicable to patients who have recovered from
intra-articular elbow trauma (the focus of this study), but they are likely to
be generalizable to other elbow conditions.
Our data agree with those of Turchin et al. [17], who compared
five physician-derived elbow-scoring systems and reported a lack of agreement
in the categorical rankings derived with those systems but good correlation
among the raw scores. Our finding that the upper-extremity-specific health
status measure (DASH) correlated more closely with the elbow scores than did
the general health status measure (SF-36) is consistent with previous reports
that the DASH is more responsive than the SF-36 to clinical change in the
function of the upper extremity [17-19].
In our study, the surgeon-based scores showed moderate correlation with the
health status measures, with a better correlation with the
upper-extremity-specific measures than with the general health status
measures. One interpretation of those findings is that the various instruments
have good agreement with each other and seem to be measuring elbow function
consistently. An alternative interpretation, supported by our analysis, is
that the instruments are all being driven and dominated by the variable of
pain.
The strong influence of pain on outcome measures has been noted for other
upper-extremity conditions. Tomaino et al. noted that a satisfactory
postoperative function score following limited wrist fusion was more dependent
on pain relief than on residual
motion [20]. Midha et
al. found two significant predictors (p < 0.01) of various outcome measures
for assessing patients with ulnar neuropathy at the elbow: pain accounted for
60% of the variation in the scores whereas objective measures of strength and
function accounted for only 17% of the variability in the
scores [21]. Karnezis
and Fragkiadakis identified grip strength as the only significant objective
predictor of posttraumatic wrist function (p < 0.01) as assessed with a
wrist-specific health status
measure [22]. In the
absence of neuromuscular disorders, grip strength has a strong relationship
with pain.
The experience and expression of pain are strongly influenced by
psychological and sociological factors and are not always explained by
objective
factors [13,14].
As a result of the strong influence of pain on elbow ratings and health status
measures, objective improvements in elbow function achieved by operative
procedures may be devalued by the use of these systems. For example, a patient
with a complex fracture-dislocation of the elbow who regains nearly normal
motion, strength, and stability would be considered to have an excellent
result according to the standards of any orthopaedic surgeon, but several such
patients seen in our practices and evaluated in our research had poor
standardized elbow scores and health status ratings because of substantial
pain. Some patients had a clear secondary gain issue such as a lawsuit, an
insurance claim, or narcotic dependence that we believed caused the
discrepancy between the objective and subjective evaluations of the results.
The reasons for the discrepancy were less clear for other patients, but they
seemed to be related to less obvious psychosocial factors such as psychiatric
illnesses (e.g., anxiety or depression), stress and dissatisfaction (at work
or of a personal nature), or a maladaptive personality and poor coping
mechanisms. These factors have been noted in patients with chronic pain and
idiopathic
pain [13,14,23,24].
We suggest the following as an alternative to elbow scoring systems that
combine objective and subjective factors. The results of reconstructive elbow
procedures can often be evaluated on the basis of one primary objective goal
such as the restoration of elbow mobility. The change in the upper-extremity
health status could then be compared with the change in the objective outcome
measure (e.g., elbow motion). Any discrepancies could then be evaluated to
determine what is preventing the patient from feeling that the elbow is more
functional when improvements in objective measures of function (e.g.,
increased motion) have been observed. In other words, this approach would
facilitate an evaluation of the relationship between achieving the goal of
surgery (e.g., motion) and improvement in function (e.g.,
upper-extremity-specific health status), while accounting independently for
other potentially important objective factors (such as instability, arthrosis,
or ulnar neuropathy) and subjective factors (such as pain, depression,
anxiety, and maladaptive coping skills or personality disorders) with use of
multivariate statistical techniques.
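The screening step of the approach proposed above can be sketched as follows: compare each patient's change in elbow motion with the change in DASH score, and list patients whose motion improved while their reported function did not. This Python sketch is hypothetical throughout: the record fields and both thresholds are illustrative parameters for which this study recommends no values, and the multivariate adjustment for other objective and subjective factors is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PatientChange:
    """Hypothetical before/after record for one patient."""
    patient_id: str
    arc_before_deg: float
    arc_after_deg: float
    dash_before: float
    dash_after: float

    @property
    def arc_gain_deg(self) -> float:
        return self.arc_after_deg - self.arc_before_deg

    @property
    def dash_improvement(self) -> float:
        # Higher DASH scores mean worse function, so improvement is before - after.
        return self.dash_before - self.dash_after


def flag_discrepancies(
    patients: Iterable[PatientChange],
    min_arc_gain_deg: float,
    min_dash_improvement: float,
) -> List[str]:
    """IDs of patients whose motion improved but whose DASH score did not."""
    return [
        p.patient_id
        for p in patients
        if p.arc_gain_deg >= min_arc_gain_deg
        and p.dash_improvement < min_dash_improvement
    ]


cohort = [
    PatientChange("A", 60, 120, 50, 20),  # motion and DASH both improved
    PatientChange("B", 60, 115, 50, 48),  # motion improved, DASH barely changed
    PatientChange("C", 90, 95, 40, 40),   # little change in either
]
# Thresholds are arbitrary illustration values, not recommendations.
print(flag_discrepancies(cohort, min_arc_gain_deg=30, min_dash_improvement=10))  # ['B']
```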