Abstract
Background: The American Academy of Orthopaedic Surgeons (AAOS) has
developed an array of outcomes assessment instruments designed for the
efficient collection of outcomes data from patients of all ages with
musculoskeletal conditions in all body regions. The Lower Limb Instruments
were developed through a process of literature review, consensus-building, and
field-testing.
Methods: The instruments were distributed to a total of 290 subjects
in twenty orthopaedic practices throughout the United States and Canada. Of
the 290 patients, seventy each had a diagnosis in the categories of foot and
ankle, sports/knee, and hip and knee and forty each had a diagnosis in the
categories of trauma and rehabilitation. Retests to be taken twenty-four hours
after the first test were distributed to subsamples of patients for each
instrument. Seventy-one one-year follow-up questionnaires (twenty-five
Sports/Knee, twenty-five Foot and Ankle, sixteen Hip and Knee, and five Lower
Limb Core instruments) were returned.
Results: The Lower Limb Core Scale and the Hip and Knee Core Scale,
each consisting of seven items addressing pain, stiffness and swelling, and
function, performed at an acceptable level. Additional Sports/Knee and Foot
and Ankle Modules proved to have internal and retest reliability of 0.80 or
better, comparable with the values for well-established measures such as the
Short Form-36 (SF-36). All of the new scales were moderately to strongly
correlated with other measures of pain and function, such as physician
ratings, the SF-36, and the Western Ontario and McMaster Universities
Osteoarthritis Index (WOMAC). Seventy-one patients provided follow-up
information for the analysis of sensitivity to change. The Lower Limb Core was
found to contribute independently to the prediction of the transition score
based on the patient and physician assessments of change.
Conclusions: The AAOS Lower Limb Instruments for outcomes assessment
are highly reliable and are correlated with other measures for similar
constructs. They are also sensitive to change in patient status. The Lower
Limb Core Scale may be used with attribution of pain either to the lower limb
or to a specific joint or side without sacrificing reliability. Combined with
the SF-36, the AAOS outcomes assessment instruments comprehensively and
efficiently measure outcomes in orthopaedic patients with lower-limb
conditions.
In the 1990s, the American Academy of Orthopaedic Surgeons (AAOS) began a
commitment to the measurement and analysis of musculoskeletal outcomes. An
initial stage of this program was the development and testing of regional and
pediatric musculoskeletal outcomes assessment instruments. In September 1993,
the AAOS began the process of instrument construction with consensus-building
meetings. By December 1994, a full array of items (questions) pertaining to
all anatomic regions were ready for a scale-building and validation effort.
The purpose of this report is to summarize the development and validation of
the AAOS Lower Limb Core Scale, Hip and Knee Core Scale, Sports/Knee Module,
and Foot and Ankle Module.
Item and Scale Selection
A modified nominal group technique was used to identify the relevant domains
of the Lower Limb Instruments at a jointly sponsored meeting of the AAOS and
Council of Musculoskeletal Specialty Societies (COMSS) in Tarpon Springs,
Florida, in April 1994. Small work groups of clinicians and health-services
researchers with expertise in the hip and knee, foot and ankle, pediatric
disorders, and sports injuries to the knee were asked to focus on the
patient-oriented outcomes that were realistically expected to be affected by
medical and surgical interventions. A comprehensive, collectively exhaustive,
mutually exclusive list was developed. For this exercise, the participants
were asked to defer discussions of how the domains were to be measured and to
concentrate on content only.
After the domains were identified, they were grouped according to
similarity and inspected in relation to published questionnaires, instruments,
and scales that might capture the domain of interest. This compendium had been
prepared prior to the meetings. It included original citations describing the
instruments and their psychometric properties as well as the actual
questionnaires. In separate resource books, the general domains were organized
so that a comparison could be made between the different approaches to asking
similar questions. If no suitable wording or questions existed, the
participants were asked to propose their own.
The agreed-on items were then tested for validity, reliability, and
sensitivity [1].
Validity is the psychometric criterion whereby an outcome measure is tested
for its ability to actually measure what it purports to measure. In the
absence of a so-called gold standard, a number of approaches are used to assess
validity. Construct validity is the extent to which a measure corresponds to
theoretical concepts or constructs concerning the phenomenon of interest. For
example, a measure of lower-limb function might be expected to change with
age. Content validity is the extent to which a measure represents the domain
of interest. Face validity (Does the instrument "look like" it
measures what it purports to measure?) is an example of content validity. The
AAOS instruments were developed by clinicians who continually confirmed their
face validity during the item selection and pilot testing period. Criterion or
concurrent validity is assessed by correlating scores on a new instrument with
external criteria known or believed to measure the attribute.
Reliability reflects the degree to which a measure is free of random error,
or the extent to which the scores are reproducible.
We use the term sensitivity throughout this article to denote the
ability of an instrument to measure any change, as has been suggested by
Fortin et al. [2] and by Liang et al. [3]. In distinction, responsiveness is
the ability of a measure to capture clinically meaningful changes in a
patient's state.
The Medical Outcomes Study [4,5] Short Form-36 (SF-36) was selected as a
companion general health status
questionnaire for all instruments. In addition to reliably assessing the
patient's general health, the SF-36 contains subscales that form a robust
assessment of physical function. The SF-36 has detected improvement
following many orthopaedic interventions. The perceived challenge in 1994 was
the need to construct scales that would provide a higher level of region
specificity and symptom definition and would not only measure outcomes but
also assist in clinical decision-making.
The Western Ontario and McMaster Universities Osteoarthritis Index
(WOMAC) [6] has been used frequently by hip and knee surgeons for outcomes
assessment. It contains
items that address pain (five items), stiffness (two items), and function
(seventeen items). The WOMAC functional scale has substantial item overlap
with the SF-36 physical function subscale. For the purposes of criterion
validation, it was included in the assessment of patients with hip and knee
disorders and compared with the results of the AAOS instruments.
This report summarizes the results of field testing of four major
instruments: (1) the Lower Limb Core Scale, a global scale that was created by
combining seven items into three subscales (pain attributed to the lower limb,
stiffness and swelling, and function); (2) the Hip and Knee Core Scale, which
is identical to the Lower Limb Core Scale, with attribution of pain to the
left or right hip or knee; (3) the Foot and Ankle Module, which contains two
scales—the Global Foot and Ankle Scale (twenty items and four subscales
measuring pain, function, stiffness and swelling, and giving way) and the Shoe
Comfort Scale (five items); and (4) the Sports/Knee Module, which contains the
Lower Limb Core Scale with attribution of symptoms to the knee only and also
contains five other scales with four items each—Knee Giving Way, Knee
Locking or Catching, Preinjury Function, Current (Postinjury) Limitations on
Activity, and Pain on Activity—which can themselves be combined into a
global score.
The Lower Limb Instruments, scoring algorithms, and description of
normative data can be accessed by logging onto
,
clicking on "Research," and clicking on "Outcomes."
After the user passes through the security barriers, the respective items are
under "Lower Extremities," "Hip and Knee,"
"Sports/Knee," and "Foot and Ankle." The instruments
described here were the results of the efforts of several work groups and
committees and of feedback gained by wide distribution of the questionnaires
among musculoskeletal specialists. The series of Lower Limb Instruments
combines features of extant, and in many cases substantially longer,
instruments into shorter, more user-friendly questionnaires. The instruments
may be used alone or supplemented by the SF-36 [4,5], a widely used general
health status questionnaire. The modular approach
employed here was specifically designed to enhance usefulness and flexibility
and to optimize the time required for questionnaire completion.
Study Design
Institutional review board approval was obtained for this study. All
instruments were tested in two stages. The first stage was a cross-sectional
study of internal reliability, validity, and twenty-four-hour test-retest
reliability. Each subject was also asked an open-ended question about whether
the questionnaire captured all of his or her concerns and, if not, what were
the concerns that were not captured. Subsequently, a one-year follow-up study
was done to evaluate the instruments' sensitivity to change. The newly created
scales were compared with physician assessments of the patient's pain and
ability to work, perform self-care, and engage in recreational activities;
with the results of the SF-36 [4,5]; and with the WOMAC [6] scores of patients
with hip or knee arthritis.
Subjects
The patients who were recruited had common orthopaedic conditions covering
a range of functional deficits and chronic, progressive disorders with a
continuum of pain and functional levels that would be expected to demonstrate
a measurable response to treatment over one year. The diagnostic categories
included (1) sports/knee (knee ligament injuries, meniscal tears, and
patellofemoral disorders); (2) foot and ankle (bunions, plantar fasciitis,
hindfoot arthrodesis, ankle arthrodesis, rheumatoid foot deformities, claw
toe, hammer toe, and Morton neuroma); (3) hip and knee (rheumatoid arthritis
and osteoarthritis); (4) trauma (fractures of the femur, tibia, pelvis, and
acetabulum); and (5) rehabilitation (amputation).
Consecutive patients were recruited from twenty orthopaedic practices
representing generalists and subspecialists, with academic and nonacademic
affiliations in metropolitan, suburban, and rural settings.
Sample Size Calculations
Sample sizes were calculated to achieve representation of important
diagnostic, practice, and geographic subgroups and were compared with the
numbers needed for various statistical analyses (psychometric analyses and
assessments of reliability, validity, and comparative sensitivity of
instruments to change). Final sample sizes were adjusted to provide adequate
power for important statistical and subgroup questions.
We determined a target sample size on the basis of several assumptions.
Psychometric analyses of summated scales are best done with at least five to
ten subjects per item, to reduce the effect of chance [7]. The longest
potential summated scale (the Lower Limb Core Scale prior to item reduction)
had a total of twenty-one items, suggesting a need for at least 200 subjects.
This provided sufficient numbers for analyses of test-retest reliability,
validity, and sensitivity to change [8]. In general,
correlations at the levels expected in test-retest situations (e.g., r >
0.80) can be estimated with reasonable precision (95% confidence intervals of
±0.1) with as few as fifty subjects. Validity is determined primarily
by examining patterns of correlation, and 150 subjects provide adequate power
to estimate correlation coefficients with reasonable precision.
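The precision figure quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the original analysis (which was run in SAS); it assumes the standard Fisher z-transformation for an approximate 95% confidence interval around a Pearson correlation, and the function name fisher_ci is illustrative only.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via Fisher's z-transformation."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)    # back-transform to the r scale

# Subjects-per-item rule of thumb (five to ten per item) for a 21-item scale
items = 21
print(items * 5, items * 10)          # 105 210

# Expected test-retest correlation of 0.80 estimated from fifty subjects
low, high = fisher_ci(0.80, 50)
print(round(low, 2), round(high, 2))  # 0.67 0.88
```

Under these assumptions the interval runs from about 0.67 to 0.88, which is broadly in line with the precision of roughly ±0.1 cited for fifty subjects.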
For sensitivity testing, the new scales were compared with the SF-36, with
use of the differences between the scales' correlation coefficients and those
of a common global physician and patient transition report (better, same, or
worse). Alternatively, standardized response means for each scale can be
compared with use of a ratio of paired t tests [9]. In each
case, sensitivity could be adequately assessed, with detection of moderate
differences between scales with 80% power and an alpha of 0.05 with a sample
size of 150.
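For readers unfamiliar with the standardized response mean, it is the mean change score divided by the standard deviation of the change scores. The sketch below illustrates that definition with hypothetical data; the function name and the scores are assumptions made for illustration and are not taken from the study.

```python
import statistics

def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the sample standard deviation of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(changes)

# Hypothetical scores for five patients (0 to 100 scale, higher is better)
baseline = [40, 55, 60, 35, 50]
follow_up = [55, 60, 75, 45, 50]

srm = standardized_response_mean(baseline, follow_up)
paired_t = srm * len(baseline) ** 0.5   # paired t statistic equals SRM * sqrt(n)
print(round(srm, 2), round(paired_t, 2))
```

Because the paired t statistic is the standardized response mean multiplied by the square root of the sample size, a ratio of paired t statistics computed on the same patients equals the ratio of the two standardized response means.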
Data Management
Physician members of the instrument development committee provided patients
and nominated other members of their communities and specialty societies to
participate in field testing. Consecutive patients were recruited from the
practices of twenty physicians, with an average of thirteen patients from each
practice. Physicians filled out their assessment forms after each examination,
and both patient and physician forms were mailed to a central data-management
site. Nine packets in each office contained retest questionnaires, which were
sent home with subjects for completion twenty-four hours later. The retest
forms were mailed directly to the central data-management site. One-year
follow-up questionnaires were mailed to the patients and physicians. All data
were entered twice and 100% verified. Statistical analyses were performed with
use of the SAS statistical package [10] on a Sun microcomputer (Sun
Microsystems, Santa Clara, California).
Statistical Analyses
Scales were constructed and assessed with classic psychometric
techniques [7],
including internal reliability (Cronbach coefficient alpha), test-retest
reliability (Pearson coefficient), and exploratory factor analyses. Face
validity and content validity of scales were determined by the item selection
process. Construct validity of scales was determined by examining patterns of
correlations between them and physician and patient assessments and previously
validated scales (the SF-36 and WOMAC) and by assessing the ability of the new
scales to discriminate between groups, as demonstrated by the F and t tests.
Sensitivity to change was assessed on the basis of correlations between the
change in scores (follow-up minus baseline) and physician- and patient-rated
transition scores (better, same, or worse, as assessed at the time of
follow-up).
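The two reliability statistics named above can be sketched as follows. This is an illustrative Python version of the textbook formulas for the Cronbach coefficient alpha and the Pearson coefficient, not the SAS procedures used in the study; the function names and the small data set are hypothetical.

```python
import math
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding that item's score for every subject."""
    k = len(item_scores)
    sum_item_vars = sum(statistics.variance(item) for item in item_scores)
    totals = [sum(subject) for subject in zip(*item_scores)]   # summated scale score
    return (k / (k - 1)) * (1 - sum_item_vars / statistics.variance(totals))

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical responses: three items answered by five subjects
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)          # internal reliability

# Hypothetical summated scores at the first test and at the retest
test = [4, 7, 9, 13, 14]
retest = [5, 8, 9, 12, 14]
r = pearson_r(test, retest)            # test-retest reliability
```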
Patient Recruitment
The instruments were distributed to a total of 290 subjects, seventy each
with a diagnosis in the foot and ankle, sports/knee, and hip and knee
categories and forty each with a condition in the trauma and rehabilitation
categories. Retests to be taken twenty-four hours after the first test were
distributed to subsamples of patients to provide fifty retests for each new
instrument or subscale. The SF-36 was excluded from the retesting because of
its well-established reliability.
All subjects provided written informed consent. Useable baseline
questionnaires were returned by 205 patients (seventy with a diagnosis in the
foot and ankle category, fifty-nine with a diagnosis in the sports/knee
category, forty-three with a diagnosis in the hip and knee category,
twenty-four with a diagnosis in the rehabilitation category, and nine with a
diagnosis in the trauma category). Twenty-four-hour retests were available for
a total of 168 patients (fifty-one who completed the Sports/Knee
questionnaire, forty who completed the Hip and Knee Core questionnaire,
twenty-nine who completed the Foot and Ankle questionnaire, and forty-eight
who completed the Lower Limb Core questionnaire). There were seventy-one
one-year follow-up questionnaires (twenty-five Sports/Knee, twenty-five Foot
and Ankle, sixteen Hip and Knee Core, and five Lower Limb Core
questionnaires). There were no significant differences between the groups who
returned follow-up questionnaires and those who did not from the standpoint of
physical and mental health measured by the SF-36, age, gender, education, or
diagnosis. There was a trend toward the patients in the follow-up group being
younger than those in the non-follow-up group (forty-six compared with
fifty-two years). Those completing the Sports/Knee questionnaire had a higher
rate of follow-up than did all of those completing the other instruments (42%
compared with 30%).
Patient Characteristics
Fifty-four percent of the subjects were male, and 90% were white. Eleven
percent of the patients had less than a high-school education, 23% were
high-school graduates, 28% had attended college or technical school but had
not graduated, and 38% were college graduates. The average age (and standard
deviation) was 48 ± 16.9 years, with a range of twenty-one to
eighty-five years. Patients who completed the Foot and Ankle questionnaire
reported an aggregate of twenty-three diagnostic categories, whereas those who
completed the Sports/Knee questionnaire reported twenty-four and those who
completed the Hip and Knee Core questionnaire reported fourteen. Patients in
the rehabilitation and trauma groups who completed the Lower Limb Core
questionnaire reported a total of twelve diagnostic categories. The most
frequently reported diagnoses were osteoarthritis (fifty-four patients), toe
conditions (twenty-four), meniscal tear (twenty), knee ligament injuries
(seventeen), plantar fasciitis (fourteen), patellofemoral disorders (nine),
rheumatoid arthritis (seven), ankle fusion (six), and miscellaneous
(forty-six).
Scale Construction and Reliability
Physician Assessment of Patients' Function and Pain
To test criterion validity, physicians rated the patient's ability to
perform work, school, or homemaking activities; self-care; and recreational
activities. The assessment was recorded on three 6-point scales (ranging from
"not limited at all" to "extremely limited or
unable"). The internal reliability of the summative scale was acceptable
(alpha = 0.79). The patient's pain at the time of the office visit was rated
on a 10-point scale (ranging from "no pain" to "severe
pain").
Lower Limb Core Scale
At the initiation of the scale construction process, the intent was to form
a single core scale that had acceptable face validity for all musculoskeletal
specialists and could be used efficiently to assess all lower-limb problems.
At the end of the first iteration of item generation, the large number of
items selected suggested that the core would need to be supplemented with
special items to provide flexibility and alleviate respondent burden.
At the beginning of field testing, there were twenty-eight items in the
instrument, which could be grouped into pain, stiffness and swelling,
function, giving way, and locking or catching subscales. A global scale that
included pain, stiffness and swelling, and function was constructed. The
preliminary individual scales had internal reliability (Cronbach alpha) of
0.81 to 0.95.
Principal factor analyses indicated that there was considerable overlap
between the new scales and the physical subscales of the SF-36. In the
interest of reducing respondent burden and the time for the administration of
the test, the Lower Limb Core Scale was reduced to seven items addressing
pain, stiffness and swelling, and function. This was treated as a summative
scale that could be used alone with attribution of pain to the lower limb or
used in a joint-specific module with attribution of pain to the hip and knee
or to the foot and ankle. The Giving Way Scale and the Locking or Catching
Scale were incorporated into the Sports/Knee Module (described below). The
internal reliability and retest reliability of the Lower Limb Core Scale were
good (alpha = 0.82 and r = 0.91) (Table
I).
Hip and Knee Core Scale
The Hip and Knee Core Scale with attribution of pain to the left or right
hip or knee was completed by forty-three patients with a hip and/or knee
disorder who also completed the Lower Limb Core Scale with attribution of pain
to the lower limb. The two methodologies, therefore, were compared as used for
the same patient. The results were very similar, with a high correlation
between the two core scales (r = 0.95)
(Table II). In addition to
completing the SF-36 and Lower Limb Core Scales, patients with hip and/or knee
disorders completed the WOMAC. For comparison purposes, a global WOMAC scale
was created by averaging the scores for all twenty-four items, and the
instrument was rescaled to a range of 0 to 100 possible points, with 100
representing the best function and least pain.
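The rescaling described above can be illustrated with a short sketch. The text does not state how individual WOMAC items were scored, so the code assumes, purely for illustration, that each of the twenty-four items is scored from 0 (none) to 4 (extreme), with higher item scores indicating worse status; the function name womac_global is hypothetical.

```python
def womac_global(item_scores, max_item_score=4):
    """Average the item scores and rescale so that 100 is best and 0 is worst.

    Assumes each item runs from 0 (best) to max_item_score (worst); this
    scoring range is an assumption, not a detail reported in the article.
    """
    if len(item_scores) != 24:
        raise ValueError("expected 24 WOMAC items")
    mean_score = sum(item_scores) / len(item_scores)
    return 100.0 * (1.0 - mean_score / max_item_score)

print(womac_global([0] * 24))   # 100.0 (no pain, stiffness, or limitation)
print(womac_global([4] * 24))   # 0.0   (worst possible responses)
print(womac_global([1] * 24))   # 75.0
```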
The WOMAC had excellent internal and test-retest reliability, with all
alphas and correlations >0.90 except for those for the two-item stiffness
subscale (alpha = 0.86 and r = 0.86) (Table
III). The WOMAC was highly correlated with the Hip and Knee Core
Scale (r = 0.89) (Table II) and
was correlated with the SF-36 physical function and pain scores (r =
0.68).
Sports/Knee Module
The Sports/Knee Module contains items addressing preinjury knee function,
postinjury knee function, knee pain on activity, activity limitations
resulting from the knee, knee swelling with activity, giving way, and locking
or catching.
A Sports/Knee Core Scale that was identical to the Lower Limb Core Scale,
with attribution of pain to the knee injury, was constructed. It had good to
excellent internal and retest reliability (alpha = 0.86 and r = 0.96)
(Table IV).
Four degrees of giving way were assessed for five levels of activity
(ranging from sedentary to very strenuous) in the full patient population. The
sedentary item was dropped because of poor correlation with the other items.
The resulting four-item scale had excellent internal and retest reliability
(alpha = 0.91 and r = 0.91) (Table
IV). However, 34% of the subjects had missing scores at the higher
levels of activity (very strenuous, strenuous, and moderate) because their
health prevented testing of giving way at those levels. Scoring such an item
as "unable to do" would have created a measure of function rather
than a measure of giving way. The Giving Way Scale appeared to be most
interpretable and valid for persons capable of performing at least at the
strenuous level.
Four degrees of locking or catching were assessed for five levels of
activity (ranging from sedentary to very strenuous) in the full patient
population. The sedentary item was dropped because of poor correlation with
the other items. The resulting four-item scale had excellent internal
reliability (alpha = 0.95) but only moderate retest reliability (r = 0.68)
(Table IV). Inspection of the
individual scores revealed a skewed distribution, with most subjects not
having the symptoms. A nonparametric correlation coefficient was therefore
used to assess retest reliability, and it indicated somewhat better
performance (Spearman r = 0.80). As with the Giving Way Scale, there was a 39%
rate of missing responses to the Locking or Catching Scale. The patients with
the missing responses were unable to perform at the top three activity levels
because of their physical health.
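The switch from a Pearson to a Spearman coefficient for skewed scores can be sketched as below. The Spearman coefficient is the Pearson coefficient applied to ranks, with tied values sharing their average rank. The helper names and the data are hypothetical and serve only to show the mechanics; they do not reproduce the study values.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1       # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical skewed data: most subjects report no locking or catching (score 0)
test = [0, 0, 0, 0, 0, 0, 1, 2, 8]
retest = [0, 0, 0, 0, 0, 1, 0, 3, 4]
print(round(pearson_r(test, retest), 2), round(spearman_r(test, retest), 2))
```

With distributions of this kind a few extreme scores can dominate the Pearson coefficient, whereas the rank-based coefficient depends only on the ordering of subjects.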
The other Sports/Knee Scale items address preinjury and postinjury status,
pain on activity, and functional limitations (general activity and specific
knee impairment). The resulting subscales had excellent internal reliability
with all alpha coefficients but one being >0.90, and all measures but
Locking or Catching had good to excellent retest reliability
(Table IV).
Foot and Ankle Module
The Foot and Ankle Module combines items from the Lower Limb Core Scale
with additional items that more completely assess symptoms and functional
status related to foot and ankle problems. Items that were field tested were
formatted into four subscales (Table
V) that addressed pain (nine items), function (six), stiffness and
swelling (two), and giving way (three). Each subscale had good internal
reliability (alpha = 0.83 to 0.91) except for the two-item stiffness and
swelling subscale (alpha = 0.61). These two items were found to perform better
in the context of a global scale such as the Lower Limb Core Scale. The retest
reliabilities for the subscales were uniformly good to excellent (r = 0.70 to
0.99).
The Global Foot and Ankle Scale combines twenty items for pain, function,
stiffness and swelling, and giving way, which generates a single score. The
internal reliability and retest reliability of this scale were good (alpha =
0.93 and r = 0.79) (Table
V).
The Shoe Comfort Scale, a companion to the Foot and Ankle Module, assesses
the ability to wear a variety of shoe types comfortably (with a response of
yes or no for each type). Orthopaedic shoes were reverse-scaled, as people
tended to wear them because of discomfort with other types of shoes. As
scoring of some items in this scale was contingent, a coefficient alpha was
not an appropriate measure of reliability. Retest reliability was good (r =
0.87) (Table V).
Validity Analyses: Correlations Among Measures
The validity of the Lower Limb Core Scale, the Hip and Knee Core Scale, the
Sports/Knee Scale, and the Foot and Ankle Scale was assessed by comparing them
with the physician assessments, the SF-36, and the WOMAC.
Table II shows correlations
among the Lower Limb Core Scale, specialty scales, SF-36, WOMAC, and physician
assessments of pain and function. The Lower Limb Core Scale was correlated
with physician measures of pain (r = 0.49) and function (r = 0.60), with the
SF-36 Physical Health score (average of SF-36 Bodily Pain, Physical Function,
and Role-Physical Subscales) (r = 0.60), and with the WOMAC global score (r =
0.89). It was strongly correlated with the global Sports/Knee score (r = 0.59)
and the global Foot and Ankle score (r = 0.89). This demonstrates the general
validity and probable usefulness of the brief seven-item scale, which also
demonstrated a high level of correlation with the Hip and Knee Core Scale (r =
0.95). The Hip and Knee Core Scale was nearly as strongly correlated with
physician assessment of function (r = 0.73) as was the WOMAC (r = 0.78).
Sensitivity Analyses: Ability to Measure Patient Change
Differences between the baseline and follow-up scores were calculated for
the Lower Limb Core Scale, the Giving Way Scale, the Locking or Catching
Scale, the SF-36, and the physician ratings of pain and function. This yielded
a change score, which was then compared with a transition score. The
transition score was calculated on the basis of a combination of a physician
and patient-generated questionnaire regarding the perception of improvement
during the year since the baseline evaluation. Two 5-point scales were used by
the patient and the physician (four items in total) to indicate whether pain
and function had improved, stayed the same, or become worse. The physician and
patient scores were averaged, resulting in a highly reliable scale (alpha =
0.88) that was used to best represent the patient's improvement. Patients were
also asked about their satisfaction with their current condition.
The mean score on the global transition scale was 3.9, with 1 indicating
much worse; 2, somewhat worse; 3, about the same; 4, somewhat better; and 5,
much better. Sixty-five percent of the scores were in the "better"
zone (>3.5), 21% were in the "same" zone (2.5 to 3.5), and 14%
were in the "worse" zone (<2.5).
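The construction of the global transition score and its three zones can be sketched as follows. The cut points (greater than 3.5, 2.5 to 3.5, and less than 2.5) are those given in the text; the function names and the example ratings are hypothetical.

```python
def transition_score(patient_items, physician_items):
    """Average the two patient-rated and two physician-rated 5-point items."""
    ratings = list(patient_items) + list(physician_items)
    return sum(ratings) / len(ratings)

def transition_zone(score):
    """Classify a global transition score using the cut points given in the text."""
    if score > 3.5:
        return "better"
    if score < 2.5:
        return "worse"
    return "same"

# Hypothetical pain and function ratings (1 = much worse, 5 = much better)
score = transition_score([4, 5], [4, 3])
print(score, transition_zone(score))   # 4.0 better
```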
All change scores for the various scales correlated positively with the
transition scores (Table VI)
and showed patient improvement. Additionally, the degree of satisfaction
expressed by the seventy-one patients at the time of the one-year follow-up
correlated with the transition scores (r = 0.62). The change in the Lower Limb
Core score and the SF-36 Physical Health score (the average of the scores on
the Bodily Pain, Physical Function, and Role-Physical Subscales) demonstrated
a moderate correlation with the transition scores obtained from the
patient-physician assessment.
To evaluate scale sensitivity to change, a regression analysis was done
with the transition score and the corresponding change score. This analysis
demonstrated which scale best reflected the patients' and physicians' sense of
how much improvement or deterioration had occurred over one year. In the
regression model, both the Lower Limb Core Scale and the SF-36 Physical Health
score contributed independently to prediction of the transition score,
accounting for 33% of the variance. The two best predictors of the transition
score for thirty-six subjects who filled out both the Giving Way and the
Locking or Catching Scales at baseline and at the time of follow-up were the
Lower Limb Core Scale and the Locking or Catching Scale, accounting for 40% of
the variance.
The mean change scores for each measure are shown in
Table VII. The largest T-scores
were for the Lower Limb Core Scale and the SF-36 Physical Health score, which
indicate that they were the most sensitive.
The battery of AAOS Lower Limb Instruments represents one of the first
efforts by professional societies to standardize a focus on patient-oriented
outcomes and marks a shift in paradigm of goals from anatomical restoration to
the effect of interventions on symptoms and function. Its development
emphasized clinical sensibility, practical application, and building on the
work of many others.
The Lower Limb Core Scale, consisting of seven items addressing pain,
stiffness and swelling, and function, is complementary to the SF-36.
Additional Hip and Knee Core, Sports/Knee, and Foot and Ankle Scales proved to
have acceptable internal and retest reliability of 0.80 or better, comparable
with that of other well-established measures. All of the new scales have
construct validity in that they correlated moderately to strongly with other
measures of pain and function, such as physician ratings, the SF-36, and the
WOMAC. The Lower Limb Core Scale was found to contribute independently to
prediction of the transition score constructed from the patient and physician
assessments of change.
The AAOS outcomes assessment instruments measure a wide range of pain and
function. They are generally reliable and are correlated with other measures,
as expected. They are also sensitive to change in patient status. The Lower
Limb Core Scale can be used with attribution of pain either to the lower limb
or to a specific joint or side without sacrificing reliability. Combined with
the SF-36, the AAOS outcomes assessment instruments should prove useful for
the measurement of outcomes in orthopaedic patients with lower-limb
conditions.
Some caveats should be acknowledged. All items were generated primarily by
clinicians. However, patients were asked whether the questionnaires addressed
their concerns, and no unaddressed issues were identified by these open-ended
questions. The use of the correlation coefficient rather than the kappa
statistic or intraclass coefficient for test-retest reliability is a potential
limitation of this study. Potential ceiling and floor effects of the scales
were not tested. However, this is a concern, to one degree or another, with
nearly all questionnaires when they are tested formally, and these effects
should be studied in greater depth in the future. Finally, not all specialty
scales could be studied for their sensitivity since only seventy-one patients
provided follow-up information. The small sample sizes also contributed to the
wide standard deviations that were observed.
Note: The authors gratefully acknowledge the contributions of
the following individuals in the development and validation of the
instruments: Earl R. Bogoch, MD, St. Michael's Hospital, Toronto, ON, Canada;
Arthur L. Boland Jr., MD, Massachusetts General Hospital, Boston, MA; John H.
Bowker, MD, University of Miami, Miami, FL; Robert L. Buly, MD, The Hospital
for Special Surgery, New York, NY; W. Dilworth Cannon Jr., MD, University of
California at San Francisco, San Francisco, CA; Lowell H. Gill, MD, Miller
Orthopaedic Clinic, Charlotte, NC; Michael G. Glover, MD, Wilson Memorial
Hospital, Wilson, NC; Frank A.B. Gottschalk, MD, University of Texas
Southwestern Medical Center at Dallas, Dallas, TX; John J. Harast, Brigham and
Women's Hospital/Harvard Medical School, Boston, MA; Richard C. Johnston, MD,
University of Iowa Hospitals, Iowa City, IA; Clifford R. Kahn, MD,
Encino-Tarzana Hospital, Encino, CA; James F. Kellam, MD, Carolinas Medical
Center, Charlotte, NC; Robert B. Keller, MD, Maine Medical Assessment
Foundation, Manchester, ME; Leland C. McCluskey, MD, Hughston Clinic,
Columbus, GA; Tye Ouzounian, MD, Encino-Tarzana Hospital, Encino, CA; Jennifer
Misius, American Academy of Orthopaedic Surgeons, Rosemont, IL; Charlotte
Phillips, RN, MPH, Maine Medical Center, Portland, ME; Harry E. Rubash, MD,
Pittsburgh Medical Center, Pittsburgh, PA; Melanie Sanders, MD, Orthopaedic
Medicine of Indiana, Indianapolis, IN; and David B. Thordarson, MD, USC
Orthopaedic Surgery Associates, Los Angeles, CA.
1. Liang MH, Jette AM. Measuring functional ability in chronic arthritis: a
critical review. Arthritis Rheum. 1981;24:80-6.
2. Fortin PR, Stucki G, Katz JN. Measuring relevant change: an emerging
challenge in rheumatologic clinical trials. Arthritis Rheum. 1995;38:1027-30.
3. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L. Measuring clinically
important changes with patient-oriented questionnaires. Med Care. 2002;40(4
Suppl):II45-51.
4. Ware JE Jr, Snow KK, Kosinski M, Gandek B. SF-36 health survey: manual and
interpretation guide. Boston: The Health Institute, New England Medical
Center; 1993.
5. Ware JE Jr, Kosinski M, Keller SD. SF-36 physical and mental health summary
scales: a user's manual. Boston: The Health Institute, New England Medical
Center; 1994.
6. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation
study of WOMAC: a health status instrument for measuring clinically-important
patient-relevant outcomes following total hip or knee arthroplasty in
osteoarthritis. J Orthop Rheumatol. 1988;1:95-108.
7. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York:
McGraw-Hill; 1994.
8. Cohen J. Statistical power analyses for the behavioral sciences. 2nd ed.
Hillsdale, NJ: Lawrence Erlbaum Assoc; 1988.
9. Liang MH, Fossel AH, Larson MG. Comparisons of five health status
instruments for orthopedic evaluation. Med Care. 1990;28:632-42.
10. SAS Institute Inc. SAS/STAT user's guide: release 6.03 edition. Cary, NC:
SAS Publishing; 1995.