The evaluation of patients with questionnaires after shoulder surgery has
been shown to be reproducible, valid, and reliable [1-7].
Patient-administered questionnaires such as the Simple Shoulder Test (SST) or
the American Shoulder and Elbow Surgeons (ASES) questionnaire provide insight
into patient limitations preoperatively, allowing for direct comparison with
functional outcomes postoperatively with good responsiveness [1,6].
Responses to patient-derived questionnaires have been shown to be similar
to physician assessments in studies of the hip and knee [8-14].
Reliable patient-derived functional outcome assessment is an attractive option
as it minimizes the loss of valuable patient outcomes data and reduces demands
on clinic time and personnel resources. However, concerns remain about the
limitations of this
type of assessment, particularly with regard to differences between patient
and physician assessments of outcome [8,12,13].
We found no reports in the literature comparing patient and physician
assessments of outcomes after shoulder arthroplasty. Furthermore, we found no
study, of any body region, addressing patient assessment of physical
examination parameters such as range of motion, stability, or strength. We
performed the present study to examine the relationships between patient and
surgeon assessments of pain and functional outcomes, including stability,
strength, and range of motion, after shoulder arthroplasties.
Materials and Methods
Patients
From August 2003 to February 2004, consecutive patients who agreed to
participate in this study in the clinical practices of the two senior authors
(R.H.C. and J.W.S.) were evaluated at routine follow-up visits. Approval for
this study was obtained from the overseeing institutional review board.
Only patients who were seen at a minimum of six months after the shoulder
arthroplasty were asked to participate in the study. When a patient had
undergone bilateral surgery, the shoulder that was operated on first was
included in the statistical evaluation. Two patients who were examined could
not participate in the study because of cognitive limitations and were
excluded. Questionnaires left incomplete by either the physician or the patient
were excluded from the analysis. Sixty-nine patients agreed to participate;
however, two did not fill out the questionnaire completely, leaving
sixty-seven patients with a mean age of sixty-one years (range, thirty-eight
to eighty-one years). There were thirty-three men and thirty-four women.
The mean time between the surgery and the completion of the questionnaire
was fifty-one months (range, six months to twenty-two years). Thirteen
shoulders had been followed for less than one year; seventeen, for one to two
years; sixteen, for two to five years; thirteen, for five to ten years; and
eight, for more than ten years. Primary total shoulder arthroplasty had been
performed in forty-seven patients; hemiarthroplasty, in nine; revision of the
humeral and glenoid components, in nine; revision of only the glenoid
component, in one; and revision of only the humeral stem, in one. The
diagnoses leading to the arthroplasty included primary osteoarthritis in
forty-eight patients, rheumatoid arthritis in ten, osteonecrosis in five,
acute trauma in three, and polymyalgia rheumatica in one.
Patients who agreed to participate in the study were given the shoulder
questionnaire, told that the study involved assessment of surgical outcomes,
and asked to respond to all questions. The patients completed the
questionnaire in the examination room and then returned it to a designated
clinic employee, who checked it for completeness before the patient was seen
by the physician. All patients were then evaluated by a physician within two
hours after having received the shoulder questionnaire. The treating physician
recorded the patient's history and carried out a physical examination and then
discussed an identical shoulder questionnaire with the patient and completed
it in the patient's presence.
The two senior authors either directly performed the examinations or
supervised and confirmed the examinations performed by three trainees. The
physicians
answered pain-related questions by asking patients to assess their overall
level of pain, pain at night, pain without activity, and pain with activity on
a scale of 1 (no pain) to 10 (severe). Physicians then performed a physical
examination and reported the findings regarding stability, strength as
measured with manual motor testing, and range of motion (active elevation,
external rotation, and internal rotation) on a questionnaire that was
identical to what the patient filled out. The questionnaires were then
collected by a designated clinical staff member and were stored for later
evaluation.
Shoulder Questionnaire
The questionnaire addressed clinical and functional outcomes. Satisfaction
and pain, including "pain overall," "pain at night,"
"pain without activity," and "pain with activity,"
were subjectively rated on an ordinal scale ranging from 1 (no pain) to 10
(severe) (Fig. 1). Stability
was assessed with one question that asked for a rating on an ordinal scale
ranging from 1 (stable) to 10 (very unstable [dislocates]). Strength was also
assessed with one question that asked for a rating on an ordinal scale ranging
from 1 (normal) to 10 (paralysis). Range of motion was assessed with three
questions, each of which included a pictorial drawing (for forward elevation,
external rotation, and internal rotation)
(Fig. 2). Forward elevation
(0° to 180°) and external rotation (-60° to 90°) were assessed by
choosing among 10° increments, and internal rotation was assessed by
choosing among eight reference points along the spine.
Outcomes were also graded according to a modified Neer rating system for
physician and patient responses [15-17].
The result was considered to be excellent if the patient had no or slight pain
(a pain score of 1 to 4), had external rotation of ≥45°, had active
abduction of ≥140°, and was satisfied with the result. The result was
satisfactory if the patient had no or slight pain or moderate pain only with
vigorous activity (a pain score of 1 to 6), had external rotation of
≥20°, had active abduction of ≥90°, and was satisfied with the
procedure. If any of these criteria were not met, the result was considered
unsatisfactory.
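The grading rule described above can be sketched as a small function. This is only an illustrative reading of the text, not the authors' scoring procedure: it assumes that each motion threshold is a minimum (at least 45° or 20° of external rotation, at least 140° or 90° of active abduction), that a single pain score on the 1-to-10 scale is used, and that satisfaction has been reduced to a yes/no value. The function name and parameters are hypothetical.

```python
def modified_neer_rating(pain, external_rotation_deg, abduction_deg, satisfied):
    """Grade an outcome as 'excellent', 'satisfactory', or 'unsatisfactory'.

    Assumptions (not stated in the text): thresholds are inclusive minimums,
    `pain` is one score on the 1-to-10 scale, and `satisfied` is a boolean.
    """
    if not satisfied:
        # Both favorable grades require that the patient be satisfied.
        return "unsatisfactory"
    if pain <= 4 and external_rotation_deg >= 45 and abduction_deg >= 140:
        return "excellent"
    if pain <= 6 and external_rotation_deg >= 20 and abduction_deg >= 90:
        return "satisfactory"
    return "unsatisfactory"


if __name__ == "__main__":
    # Hypothetical patient: slight pain, 50 degrees of external rotation,
    # 150 degrees of active abduction, satisfied with the result.
    print(modified_neer_rating(2, 50, 150, True))
```

Because the same rule is applied separately to the patient's and the physician's answers, the two resulting grades can then be compared directly.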
Statistical Analysis
The agreement between the physician assessment and the patient responses
was examined with use of the intraclass correlation coefficient along with 95%
confidence intervals [18].
Intraclass correlation is a measure of agreement, with 1.0 indicating perfect
agreement, and is similar to the Pearson correlation coefficient, which has
been used as an indicator of agreement. However, the Pearson correlation
coefficient does not necessarily provide an indication of actual agreement
because it disregards systematic differences. The intraclass correlation
accounts for systematic mean differences and can be interpreted as a
chance-corrected index of agreement. We used the benchmarks for intraclass
correlation set forth by Landis and Koch [19], whereby
0.00 to 0.20, 0.21 to 0.40, 0.41 to 0.60, 0.61 to 0.80, and 0.81 to 1.00
indicate poor, fair, moderate, substantial, and almost perfect agreement,
respectively. Raw agreement was also determined for each of the questions by
calculating the proportions of exact and approximate agreement. A macro
formula was used to calculate the intraclass correlation statistics along with
the 95% confidence intervals [18]. The
intraclass correlation used in this study assumes that all subjects are
assessed by the same raters (themselves and their surgeon), who constitute the
entire population of raters.
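To illustrate why an intraclass correlation differs from a Pearson correlation when one rater is systematically higher, the sketch below computes a single-rating, two-way, absolute-agreement intraclass correlation for two raters and maps it onto the benchmarks quoted above. The text does not state exactly which intraclass correlation form the macro implemented, so this particular formula is an assumption, confidence intervals are omitted, and the function names are hypothetical.

```python
def icc_absolute_agreement(patient, physician):
    """Two-way, single-rating, absolute-agreement ICC for two raters.

    Assumed form (not confirmed by the text):
    (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n, k = len(patient), 2
    rows = list(zip(patient, physician))
    grand_mean = (sum(patient) + sum(physician)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(patient) / n, sum(physician) / n]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), error.
    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def landis_koch_label(icc):
    """Map a coefficient onto the benchmark categories quoted in the text."""
    if icc <= 0.20:
        return "poor"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Made-up scores: the physician is always exactly 2 points higher.
    patient = [1, 2, 3, 4, 5]
    physician = [3, 4, 5, 6, 7]
    icc = icc_absolute_agreement(patient, physician)
    print(round(icc, 3), landis_koch_label(icc))
```

In this made-up example the two sets of scores are perfectly linearly related, so a Pearson coefficient would equal 1.0, yet the absolute-agreement coefficient is well below 1.0 because the constant 2-point offset counts as disagreement.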
Exact agreement, as the term suggests, was defined as those cases
in which the patient and the physician chose identical responses for a given
item. Approximate agreement was defined as agreement within one or
two grades, in a positive or negative direction, for the ordinal items scaled
from 1 to 10, and within 10° to 20° for the items measuring motion.
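The two raw-agreement proportions defined above can be computed as in the following minimal sketch. It assumes that approximate agreement includes the exact matches (any pair differing by no more than the tolerance counts) and that responses are coded numerically; the function name, the tolerance parameter, and the sample data are hypothetical.

```python
def agreement_proportions(patient, physician, tolerance):
    """Return (exact, approximate) agreement as proportions of all pairs.

    `tolerance` is the largest absolute difference still counted as
    approximate agreement, e.g. 2 for the 1-to-10 items or 20 for degrees.
    """
    pairs = list(zip(patient, physician))
    n = len(pairs)
    exact = sum(1 for a, b in pairs if a == b)
    approximate = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return exact / n, approximate / n


if __name__ == "__main__":
    # Made-up pain scores for four patients and their physicians.
    patient = [1, 3, 5, 10]
    physician = [1, 5, 8, 10]
    exact, approximate = agreement_proportions(patient, physician, tolerance=2)
    print(exact, approximate)
```

For the motion items the same function would be called with a tolerance expressed in degrees, and for internal rotation with a tolerance expressed in spinal levels.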
A sample size of sixty-seven patients provided >90% power to detect a
correlation of ≥0.40 (medium effect). Thus, even with a relatively small
sample size, given the observed effects, there was adequate power for the
analyses performed.
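The power statement can be checked approximately with a Fisher z transformation. The text does not say how the power was calculated, so the method here is an assumption, as are the two-sided significance level of 0.05 and the use of a Pearson-type correlation; the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate power to detect a correlation r with n pairs.

    Uses the Fisher z transformation, whose standard error is
    1 / sqrt(n - 3), and a two-sided test at level alpha (both assumed).
    """
    normal = NormalDist()
    z_effect = atanh(r) * sqrt(n - 3)
    z_crit = normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region.
    return normal.cdf(z_effect - z_crit) + normal.cdf(-z_effect - z_crit)


if __name__ == "__main__":
    print(round(correlation_power(0.40, 67), 3))
```

Under these assumptions the approximate power for a correlation of 0.40 with sixty-seven pairs comes out above 0.90, which is in line with the figure reported in the text.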
Results
The physician and patient responses to the four items related to pain were
in exact agreement at least 37% of the time (i.e., at least twenty-five of the
sixty-seven patient responses were in exact agreement with the physician
responses), and they were in approximate agreement (within two values, with
ten possible responses to the question) at least 85% of the time (fifty-seven
of sixty-seven) (Table I). The
responses to the questions regarding stability and strength were in exact
agreement 61% (forty-one of sixty-seven) and 40% (twenty-seven of sixty-seven)
of the time, respectively, and in approximate agreement (within two values on
the 10-point scale) 91% (sixty-one of sixty-seven) and 84% (fifty-six of
sixty-seven) of the time. There was exact physician-patient agreement
regarding active elevation and external rotation 31% (twenty-one of
sixty-seven) and 12% (eight of sixty-seven) of the time and approximate
agreement (within 20°, with nineteen possible responses to the question)
75% (fifty of sixty-seven) and 61% (forty-one of sixty-seven) of the time. The
responses to the question about internal rotation were in exact agreement 19%
of the time (thirteen of sixty-seven) and in approximate agreement (within two
spinal levels, with eight possible responses to the question) 81% of the time
(fifty-four of sixty-seven).
Assessment with intraclass correlation demonstrated "almost
perfect" agreement (intraclass correlation > 0.80) between the
patients and physicians with regard to five (of the nine) questions, including
those related to overall pain, pain at night, pain with activity, stability,
and active elevation (Table II). There was substantial agreement with regard
to two questions:
pain without activity (intraclass correlation = 0.66) and strength (intraclass
correlation = 0.69). The agreement with regard to the remaining two
questions—those relating to external rotation (intraclass correlation =
0.49) and internal rotation (intraclass correlation = 0.40)—was moderate
and fair, respectively.
Calculation of the mean differences between the physician and patient
responses to each question revealed that none of these mean differences was
greater than one response category or >10°, for any of the nine
response items (Table III).
While the differences were small, on the average the physician ratings were
lower (indicating less pain) for all four questions related to pain. The
physicians also rated stability and strength as being closer to normal, and
they reported less active elevation than did the patients.
The patient and physician assessments of the overall outcome (as excellent,
satisfactory, or unsatisfactory) with use of the modified Neer rating system
[15-17] demonstrated exact agreement 69% of the time (forty-six of
sixty-seven) when all three outcome possibilities were used and 87% of the
time (fifty-eight of sixty-seven) when the satisfactory and excellent results
were grouped together. Further analysis demonstrated that the mean differences
between the physician and patient responses were small
(Table IV). The intraclass
correlation coefficient of 0.75 demonstrated substantial agreement. This high
intraclass correlation was related to the much smaller number of possible
responses compared with that in our shoulder questionnaire.
Analysis of the agreement on the basis of the type of surgery (revision or
primary shoulder arthroplasty) and the time from the index operation to
participation in the study demonstrated no significant differences between
these subgroups (p ≥ 0.05). Thus, the sixty-seven patients were treated as
a homogeneous group.
Discussion
This study demonstrated good agreement between patients and physicians
using a shoulder questionnaire to measure pain following shoulder surgery.
Furthermore, a high level of agreement was demonstrated for functional
parameters such as strength, stability, and active elevation. These findings
are similar to those seen in studies following hip and knee arthroplasty, in
which "acceptable" agreement was found between patients and
physicians regarding items that assessed functional outcomes [12,13];
however, the authors of those studies did not attempt to assess range of
motion. To our knowledge, we are the first to examine agreement between
physician and patient-derived values for range of motion.
Physician and patient assessments of outcome with use of the modified Neer
rating system [15-17]
were also found to have substantial agreement (87% agreement if excellent and
satisfactory responses were combined). This higher level of agreement is
directly related to the broader definitions of agreement allowing for more
variation of response. For example, with use of the modified Neer rating
system, a patient estimate of active shoulder elevation of 170° and a
physician estimate of 145° both fall into the category of an
"excellent" result despite differing by 25°. Combining
satisfactory and excellent results, to assess which patients did and did not
do well, allows even more variation between patient and physician examiners
while still obtaining a high level of agreement. We believe that the use of
patient-derived responses could facilitate reliable, cost-effective assessment
of functional outcomes by reducing the amount of clinic personnel and
physician time that would otherwise be spent administering and performing
outcomes assessments.
The information gleaned from this study has some limitations.
Patient-physician agreement in their assessments of pain and function may be
directly related to the ability of the physician to communicate with the
patient. Physicians who allot more time for direct interaction with their
patient may better understand the patient's perception of the problem, and
this study did not account for any difference in the time allotted for such
interaction. In addition, the patients filled out the questionnaire in an
orthopaedic office setting, and their responses may not reflect the responses
to a questionnaire filled out at the patient's home. Finally, we did not
assess the patients' preoperative expectations regarding the postoperative
results. Such expectations could have influenced agreement correlations, as
was noted in a study of patients treated with total hip arthroplasty [9].
The role of patient questionnaires for the assessment of physical
examination parameters such as active elevation should be examined further. We
believe that such questionnaires can play a vital role in identifying which
patients would or would not benefit from an office visit and also strengthen the
ability to perform outcomes research while minimizing the use of medical
resources by the patient and treating physician. There should continue to be
efforts to improve patient-physician agreement, with continued evaluation of
the patient-directed questionnaire with the goal of achieving a balance
between simplicity for patient use and complexity required for research and
monitoring. We concluded that the patient-derived questionnaire used in this
study provides a high level of agreement with the surgeon's assessment, and
its role should continue to be evaluated.