Abstract
Background:
Health status questionnaires are important, especially with the growing interest in outcome studies. However, these questionnaires continue to be administered in their original paper format. We hypothesized that total hip arthroplasty outcome data derived with computer-based questionnaires do not differ significantly from those derived with established paper-based formats.
Methods:
From January 2006 to January 2007, the clinic schedules of four attending arthroplasty surgeons were screened weekly to identify patients who could potentially be included in the study. Charts were reviewed for subjects who were scheduled for or had received primary total hip arthroplasty. Patients were recruited during their office visit or when they attended a preoperative educational class, and five health status questionnaires (the Harris hip score, WOMAC [Western Ontario and McMaster Universities Osteoarthritis Index], SF-36 [Short Form-36], EQ-5D [EuroQol-5D], and UCLA [University of California at Los Angeles] activity score) were administered in three formats: paper, touch screen, and web-based. Repeated-measures analysis of variance and Pearson correlations were used to compare the questionnaire modes for the Harris hip score (normally distributed data), and the Friedman test and Spearman correlations were used to compare the modes for the other health status scores (non-normally distributed data). The study was designed with 90% power for detecting 10% differences between modes in the entire series of sixty-one patients and with 82% and 87% power in preoperative and postoperative subgroups, respectively.
Results:
The mean age was sixty-three years, with thirty-seven male and twenty-four female patients in the study. Forty-seven hips (77%) had osteoarthritis as the primary diagnosis. No significant differences were detected, for any of the five health outcome systems, among the paper, touch screen, and web-based modes, and there were highly significant correlations among all questionnaire modes in the entire series of patients and in the preoperative and postoperative subgroups (p < 0.001).
Conclusions:
The scores obtained with the paper, touch screen, and web-based modes of the five questionnaires demonstrated excellent agreement. Thus, touch screen and web-based formats can be used to collect and track patient outcome data. Use of electronic formats of these questionnaires will facilitate a more efficient and reliable data collection process.
Total hip arthroplasty is an effective treatment for relieving pain and improving function in the hip affected by severe osteoarthritis1-3. More than 1,000,000 patients worldwide are treated with total hip arthroplasty annually4. There is a growing interest in outcome studies following hip replacement and a demand for evidence-based health care1,5,6. Therefore, health status questionnaires are of particular importance when comparing the cost-to-benefit ratios of surgical interventions such as total hip arthroplasty. These questionnaires, when properly administered, can provide an objective measure of the overall health of the patient as well as of progress and treatment outcomes of specific diseases or conditions.
Two types of outcome instruments are used to assess the effectiveness of total hip arthroplasty: disease-specific and general health. Disease-specific questionnaires contain questions regarding the studied disease without the influence from other diseases. In order to evaluate various hip disabilities and methods of treatment, Harris introduced, in 1969, a new rating scale, which was a numerical classification with a maximum of 100 points (the Harris hip score)4,7,8. This was originally designed to be a physician-administered total hip arthroplasty outcome measure that primarily assessed pain, function, absence of deformity, and hip motion. Since 1969, hip surgeons have administered the Harris hip score, and it is currently the most widely used scoring system for evaluating results of total hip arthroplasty. Today, self-administered scoring systems such as the WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index) have become popular as a result of their high internal consistency and test-retest reliability5,6,9,10. The WOMAC is a well-known disease-specific measure that is widely used for evaluating outcomes after total hip arthroplasty and has the added advantage of being patient-administered.
In contrast to disease-specific questionnaires, general health questionnaires ask patients about their health-related quality of life. One commonly used general health questionnaire is the SF-36 (Short Form-36), a self-administered questionnaire developed for applications in psychometric theory. It is now a widely used measure of general health status5,11,12. The EQ-5D (EuroQol-5D) is a generic multidimensional health-related quality-of-life profile with acceptable values for construct validity, internal consistency, and reliability3,10. It is a standardized generic instrument for describing health-related quality of life. Finally, the UCLA (University of California at Los Angeles) activity score is an important and widely used general health outcome measure for patients treated with total hip arthroplasty that has great validity and reliability. It has recently been shown to be the most appropriate scale for assessment of the physical activity levels of patients treated with total joint arthroplasty12.
Despite the current advances in technology and computer systems, health outcome questionnaires are only slowly being implemented in an electronic format13,14. Administration of paper-based questionnaires is an extremely cumbersome process, and retrieval and analysis of the paper-based data are very inefficient. Additionally, there is a greater risk of loss, inaccurate interpretation, and incomplete collection of these data. It has been shown that the response rate and completeness of the data entered can be significantly improved by using computer-based technology15. While studies in other medical fields have evaluated various data-collection modes of patient-administered questionnaires16,17, their conclusions are specific to the particular questionnaires that they used and the patient demographic that they targeted, and their findings cannot be directly applied to other self-administered surveys or patient groups. It is therefore important that the electronic versions of the specific surveys used to evaluate the outcomes of total hip arthroplasty be evaluated against the gold standard of paper-based outcome assessment. In this study, we hypothesized that total hip arthroplasty outcome data derived from computer-based versions of the standard surveys used in our hip arthroplasty clinic do not significantly differ from those derived with established paper-based formats.
A prospective study protocol was designed and was approved by our institutional review board committee. From January 2006 to January 2007, the clinic schedules of four attending arthroplasty surgeons at our institution were screened weekly to identify potential subjects to be enrolled in the study. Patient charts were reviewed for recruitment of subjects who were scheduled for or had been treated with a primary total hip arthroplasty by one of the four surgeons. Patients who had a revision total hip arthroplasty, were under the age of eighteen, or did not speak English were not included in the study.
Patients were recruited during their office visit or during their attendance of a preoperative educational class. They consented to participate in the study, and they completed the five health status questionnaires (the Harris hip score, WOMAC, SF-36, EQ-5D, and UCLA activity score) in three different formats: paper-based, touch-screen computer-based, and web-based. The order in which the different modes were administered was not standardized or recorded, but was dependent on the patient flow of the clinic.
The computer-based questionnaires contained an initial secure patient log-in screen followed by a list of the five questionnaires to be completed. Patients completed the computer-based (touch screen and web) questionnaires in a step-wise systematic fashion. When using the computer-based formats, patients could not move forward to another page or to a different questionnaire without fully completing each questionnaire sheet. Scores were automatically calculated by the computer system.
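The completion rule described above (no advancing until every item on the current sheet is answered) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the item identifiers, and the convention that an unanswered item is stored as `None` are hypothetical and are not taken from the system used in the study.

```python
# Illustrative sketch only: names and data layout are assumptions,
# not the software used in the study.

def missing_items(answers, required):
    """Return the required item ids that have no recorded answer.

    An answer of 0 is a valid response, so only None counts as missing.
    """
    return [item for item in required if answers.get(item) is None]


def can_advance(answers, required):
    """A patient may move to the next sheet only when nothing is missing."""
    return not missing_items(answers, required)


# Hypothetical sheet with three items, one left blank.
sheet_items = ["pain", "limp", "distance"]
responses = {"pain": 0, "limp": 2, "distance": None}

print(missing_items(responses, sheet_items))
print(can_advance(responses, sheet_items))
```

Checking for `None` rather than for a falsy value matters because a score of 0 is a legitimate answer on several of these instruments.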
Pain and function subscores were tabulated for comparison among the three different questionnaire modes to evaluate for any inconsistencies in reporting among the three different patient subsets (the entire series and preoperative and postoperative subgroups). For the Harris hip score, the pain subscore was determined by direct questioning of the patients and the function subscore was determined by mathematical addition of the scores for the functional aspects of the Harris hip score questionnaire (activity, transportation, sitting, gait, distance, and limp). The WOMAC questionnaire scores are broken down into pain, stiffness, and physical function, and the function subscores were determined by the mathematical addition of the stiffness and physical function subscores. The SF-36 subscores include physical function, role physical, bodily pain, general health, vitality, social function, role emotional, and mental health. The pain subscore was the bodily pain score. The function subscore of the SF-36 was determined by averaging the physical and social function scores. The EQ-5D questionnaire includes questions regarding mobility, self-care, usual activities of daily living, pain/discomfort, and anxiety/depression. The pain subscore was determined by the pain/discomfort question. The function subscore was determined by averaging the mobility, self-care, and usual activities scores. The UCLA questionnaire does not have a pain or function subscore.
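The function subscore rules described above amount to simple sums and averages. The sketch below restates them in code; the function names, dictionary keys, and example values are hypothetical placeholders chosen for illustration and do not reproduce the scoring software or any patient data from the study.

```python
from statistics import mean

# Illustrative sketch only: names, keys, and example values are assumptions.

HARRIS_FUNCTION_ITEMS = (
    "activity", "transportation", "sitting", "gait", "distance", "limp",
)


def harris_function(items):
    """Harris hip function subscore: sum of the functional item scores."""
    return sum(items[name] for name in HARRIS_FUNCTION_ITEMS)


def womac_function(stiffness, physical_function):
    """WOMAC function subscore: stiffness plus physical function."""
    return stiffness + physical_function


def sf36_function(physical_function, social_function):
    """SF-36 function subscore: average of physical and social function."""
    return mean([physical_function, social_function])


def eq5d_function(mobility, self_care, usual_activities):
    """EQ-5D function subscore: average of the three functional items."""
    return mean([mobility, self_care, usual_activities])


# Made-up item scores for a single hypothetical patient.
example = {"activity": 4, "transportation": 1, "sitting": 5,
           "gait": 11, "distance": 8, "limp": 8}
print(harris_function(example))
print(womac_function(3, 20), sf36_function(70, 90), eq5d_function(1, 2, 3))
```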
Our study was designed with 90% power (α = 0.05, β = 0.10) for detecting a difference of 10 points between the different questionnaire modes for the total Harris hip score (and a 5-point mean difference in pain and function scores) as well as similar effect sizes for each of the other health status outcome measures when differences were assessed on the basis of all sixty-one patients. For the subgroup analyses, the twenty-five patients who completed the questionnaires preoperatively and the thirty-six who completed them postoperatively provided 82% and 87% power, respectively, to detect differences among the results of the three questionnaire modes with use of the appropriate parametric or nonparametric paired analysis (version 7.0, nQuery Advisor; Statistical Solutions, Saugus, Massachusetts).
Outcome scores were evaluated for normality with use of the Shapiro-Wilk normality test18, and this assumption was satisfied for the Harris hip score but not for the other measures because of skewness. Therefore, correlations among the scores of the different modes were assessed with use of the Pearson correlation coefficient (r) for the Harris hip score and with use of the Spearman rho correlation for the WOMAC, SF-36, EQ-5D, and UCLA scores. Additionally, since patients were evaluated with use of paper, touch screen, and web-based questionnaire modes, the F test in repeated-measures analysis of variance (ANOVA) and the nonparametric version (Friedman test) were used to compare scores among the modes in order to determine by paired analysis if the scores were mode-dependent. Analysis of preoperative and postoperative subgroups was also performed. Statistical analysis was carried out with the SPSS software package (version 16.0; SPSS, Chicago, Illinois). A more stringent two-tailed value of p < 0.01 was considered significant to adjust for multiple comparisons.
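The choice between the two correlation coefficients can be illustrated with a short sketch. The analysis in the study was run in SPSS; the code below is not that analysis but a minimal stand-alone illustration in which the function names, the `normally_distributed` flag (standing in for the outcome of a normality test that is not implemented here), and the scores are all hypothetical.

```python
import math

# Illustrative sketch only: not the SPSS analysis used in the study.


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average position of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


def mode_correlation(x, y, normally_distributed):
    """Pearson for normally distributed scores, Spearman otherwise."""
    return pearson_r(x, y) if normally_distributed else spearman_rho(x, y)


# Made-up paper and web scores for five hypothetical patients.
paper = [62, 75, 81, 90, 55]
web = [60, 78, 80, 92, 57]
print(mode_correlation(paper, web, normally_distributed=True))
print(mode_correlation(paper, web, normally_distributed=False))
```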
Source of Funding
No outside funding or grants were received in support of the research or preparation of this work.
A total of sixty-six patients were enrolled in the study. All patients who were asked to participate in the study agreed. Five were excluded secondary to incomplete or missing data, leaving a total of sixty-one patients (sixty-one hips) in the study cohort. Twenty-eight of the sixty-one patients completed all questionnaires on the same day; twenty-nine patients, within two weeks; and four patients, within eight weeks. Twenty-five patients completed the questionnaires preoperatively and thirty-six, postoperatively. The mean age was sixty-three years (range, thirty-six to eighty-five years). The majority (thirty-seven of the sixty-one) were male, and the majority (forty-seven) had osteoarthritis as their primary diagnosis. Other diagnoses included rheumatoid arthritis, systemic lupus erythematosus, osteonecrosis, and hip dysplasia.
The Harris hip scores (the total score as well as the pain, function, and range-of-motion subscores) were found to conform to a normal, Gaussian-shaped distribution. Therefore, the Harris hip scores derived with the paper, touch screen, and web-based questionnaire modes were compared by using the F test in repeated-measures ANOVA for the entire series of sixty-one patients and for the preoperative (n = 25) and postoperative (n = 36) subgroups. The other health outcome scores showed some skewness according to the Shapiro-Wilk test and were thus presented as medians and ranges, with the questionnaire modes compared by using the nonparametric Friedman test distributed as a chi-square statistic with two degrees of freedom (overall p value). If the Friedman test showed significance, indicating overall differences between modes, then the Wilcoxon signed-rank test for pairwise comparisons was applied19.
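For readers unfamiliar with the Friedman test, the sketch below shows how the statistic is formed for three related modes and why it is referred to a chi-square distribution with two degrees of freedom. It is a minimal illustration and not the SPSS procedure used in the study: the function names and scores are hypothetical, the tie correction follows the standard textbook form, and the pairwise Wilcoxon follow-up is not implemented. For two degrees of freedom the chi-square upper-tail probability is exactly exp(-Q/2), which is why no statistics library is needed here.

```python
import math

# Illustrative sketch only: not the SPSS analysis used in the study.


def row_ranks(row):
    """Rank one patient's scores across modes; ties get the average rank."""
    ranks = []
    for value in row:
        smaller = sum(1 for other in row if other < value)
        equal = sum(1 for other in row if other == value)
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


def friedman_three_modes(rows):
    """Friedman statistic Q and p value for exactly three modes per patient."""
    n, k = len(rows), 3
    if any(len(row) != k for row in rows):
        raise ValueError("each patient needs exactly three scores")
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        row = list(row)
        for j, rank in enumerate(row_ranks(row)):
            rank_sums[j] += rank
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    q = 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:  # every patient gave identical scores in all modes
        return 0.0, 1.0
    q = max(q / correction, 0.0)
    p = math.exp(-q / 2)  # chi-square upper tail with 2 degrees of freedom
    return q, p


# Made-up (paper, touch screen, web) scores for four hypothetical patients.
scores = [(10, 12, 11), (20, 22, 21), (30, 32, 31), (40, 42, 41)]
q, p = friedman_three_modes(scores)
print(q, p)
```

In this made-up example every patient ranks the modes in the same order, so the statistic is large; when the orderings are balanced across patients the rank sums are equal and the statistic is zero.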
Assessment of the total Harris hip score and pain, function, and range-of-motion subscores derived with the three modes of collection showed similar means and standard deviations within the three subgroups (all patients and the preoperative and postoperative subgroups). There were no significant differences among the three collection modes for the Harris hip total, pain, function, and range-of-motion scores in the total patient group or in either subgroup (Table I). Highly significant Pearson correlations were found among all three questionnaire modes for the Harris hip total, pain, and function scores (p < 0.001) within all three patient subgroups (Table II).
Analysis of the WOMAC total score and pain and function subscores derived with the three modes of collection showed similar medians and ranges in all three patient subgroups. The median WOMAC pain and function subscores did not differ significantly among the collection modes or among the patient subgroups (Table III). Highly significant Spearman correlations were found among all three questionnaire modes for the WOMAC total score and pain and function subscores (p < 0.001) in all three patient subgroups (Table IV). In summary, all three patient self-reported modes of the WOMAC questionnaire captured comparable total outcome scores in all three patient subgroups (Fig. 1).
Assessment of the SF-36 total score and mental, pain, and function subscores derived with the three modes of collection demonstrated similar medians and ranges within the three patient subgroups, with no significant differences shown by the Friedman test (Table V). Spearman correlations for the SF-36 (Table VI) indicated highly significant correlations among the three questionnaire modes for each variable in all three subgroups of patients (p < 0.001). In summary, all three self-reported modes of the SF-36 captured comparable total outcome scores in all three patient subgroups (p > 0.10, Fig. 2). In addition, all eight subscores on the SF-36 indicated significant correlations among the three questionnaire modes (all p < 0.001).
The medians and ranges for the total, pain, and function EQ-5D scores also did not differ significantly among the paper, touch screen, and web-based modes (p > 0.05, Table VII). The median values for the EQ-5D total index were similar for all three modes in the entire series of patients (p = 0.46) and in the preoperative (p = 0.62) and postoperative (p = 0.52) subgroups (Fig. 3). Significant correlations were found among all three questionnaire modes for each variable (Spearman rho values between 0.68 and 0.93, all p < 0.001) (Table VIII).
The UCLA activity questionnaire was also examined for differences among the three modes. The median scores derived with the paper, web, and touch screen formats were all 6 points, with no significant differences among modes, and similar ranges of scores were observed (Table VII). Spearman correlations among the three modes were all very high and highly significant (Spearman rho values between 0.84 and 0.98, all p < 0.001) (Table VIII).
Finally, four patients (three men and one woman, all in the postoperative subgroup) filled out the questionnaires over an eight-week period. This is potentially a long duration since the questionnaire scores are dynamic in nature. We therefore conducted a sensitivity analysis of the data excluding these four patients, and it revealed no significant differences among the three modes and demonstrated significant correlations across the board. Therefore, the delay in the completion of the questionnaires by these four patients did not affect the conclusions of this study.
General health status and outcome questionnaires are important tools for assessing the overall health and progress of a patient as well as determining the effectiveness of a procedure. They are particularly important for determining the cost-to-benefit ratio of surgical interventions such as total hip arthroplasty. These outcome measures have been, and continue to be, administered in their original paper format. Administration of paper-based questionnaires is cumbersome, and retrieval of the data can be inefficient. There is often incomplete retrieval as well as misinterpretation of this important information. Hence, many physicians refrain from administering these questionnaires.
We are not aware of any studies in which computer versions of all of the outcome measures used in our arthroplasty clinic were assessed in a single group of patients who were scheduled to receive or had received a total hip arthroplasty. We hypothesized that data derived with computer-based questionnaires would not significantly differ from those obtained with their established paper-based counterparts. We anticipated that using the computer-based questionnaires would improve collection and interpretation of these data as well as physician utilization of these important outcome measures. To test our hypothesis, we systematically compared the results of computer-based and paper-based questionnaires in the same group of patients who were scheduled to receive or had received a total hip arthroplasty. We also compared the results of the different formats of the questionnaires among different subgroups of patients (the total group and preoperative and postoperative subgroups) and between subscores for pain and function to evaluate for any discrepancies among the different formats. We found significant consistency between the data retrieved with the computer-based questionnaires and those retrieved with their paper-based counterparts. Although the absence of statistically significant differences does not imply that the modes are equal, the computer-based questionnaires showed high validity and reliability.
Interestingly, however, the SF-36 questionnaire domains of bodily pain and role emotional showed significant differences among the paper and computer-based formats in the preoperative and total patient groups (p < 0.001). The implications of those findings are unknown. However, they do demonstrate that, preoperatively, patients rate their emotional status and pain differently on paper and computer versions of a questionnaire. In order to comment on this discrepancy, one would have to further evaluate these patients, their demographics, and the time taken between the completions of the two formats. These factors would then need to be compared with those in the postoperative patient subgroup in order to determine the cause for such a discrepancy.
There are some inherent limitations of our study. We evaluated one specific subset of patients. Although there was a wide range of ages in our cohort (thirty-six to eighty-five years), all patients were English speaking and able to read, write, and use a computer. Furthermore, they all underwent primary total hip arthroplasty. This limits our study to a very specific subset of the population, and therefore our results may not be generalizable to all orthopaedic patients. Furthermore, in order to utilize computer-based questionnaires, patients must have home computers and Internet access or physicians must provide patients with computer stations at their offices. This could potentially lead to low compliance. However, home computers and home Internet access have dramatically increased in the last ten years. The relatively small number (n = 25) of patients who completed the questionnaires preoperatively did show larger variability in all scores. This variability suggests less agreement among the three modes of assessment than we observed in the postoperative subgroup. Consequently, some caution is warranted with regard to the high level of agreement among modes in the preoperative group; these conclusions may need to be confirmed in a larger group of patients. However, in the analyses of all five instruments, there were clearly no significant differences in the results among the modes of data collection in the preoperative subgroup, suggesting good agreement despite the preoperative variability.
The order in which the modes of input were completed by the patients was not randomized in a controlled fashion or recorded. Randomization would have strengthened the generalizability of the data. The order was dictated by the flow of patients in the clinic and the availability of kiosks. Therefore, it was not possible to control for this in the busy clinical setting. Although we do believe that an order effect is unlikely, this was still a limitation of the study. Lastly, one of the difficulties encountered during the study was in the enrollment of patients. This was due to the length of time needed to complete all five questionnaires in three different formats. Hence, this may have affected the number of patients recruited for the duration of the study (sixty-six were recruited, and five of them did not complete all of the questionnaires).
In summary, the results of our study clearly demonstrate that computer-based questionnaires provide data comparable with those obtained with paper-based outcome instruments over a range of patient ages of thirty-six to eighty-five years. Therefore, it is effective to collect this information electronically with either web-based or touch screen formats depending on which format is most convenient for either the patient or the physician. Conversion to the electronic formats of these questionnaires will allow for a less cumbersome and time-intensive process for the administration of each questionnaire, more efficient data retrieval and analysis, and greater protection against loss of data. Furthermore, this conversion may increase the utility of these important outcome measures among more surgeons.
Note: The authors acknowledge Harry Rubash, Andrew Freiberg, and Sara Jane Wessinger for helping with patient recruitment.
References
1. Liang MH, Katz JN, Phillips C, Sledge C, Cats-Baril W. The total hip arthroplasty outcome evaluation form of the American Academy of Orthopaedic Surgeons. Results of a nominal group process. The American Academy of Orthopaedic Surgeons Task Force on Outcome Studies. J Bone Joint Surg Am. 1991;73:639-46.
2. Bellamy N, Campbell J. Hip and knee rating scales for total joint arthroplasty: a critical but constructive review; part I. J Orthop Rheumatol. 1989;2:3-21.
3. Kavanagh BF, Ilstrup DM, Fitzgerald RH Jr. Revision total hip arthroplasty. J Bone Joint Surg Am. 1985;67:517-26.
4. Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am. 1969;51:737-55.
5. Söderman P, Malchau H. Is the Harris hip score system useful to study the outcome of total hip replacement? Clin Orthop Relat Res. 2001;384:189-97.
6. Rubash HE, Harris WH. Revision of nonseptic, loose, cemented femoral components using modern cementing techniques. J Arthroplasty. 1988;3:241-8.
7. Mahomed NN, Arndt DC, McGrory BJ, Harris WH. J Arthroplasty. 2001;16:575-80.
8. Ritter MA, Fechtman RW, Keating EM, Faris PM. The use of a hip score for evaluation of the results of total hip arthroplasty. J Arthroplasty. 1990;5:187-9.
9. McGrory BJ, Morrey BF, Rand JA, Ilstrup DM. Correlation of patient questionnaire responses and physician history in grading clinical outcome following hip and knee arthroplasty. A prospective study of 201 joint arthroplasties. J Arthroplasty. 1996;11:47-57.
10. Dawson J, Fitzpatrick R, Murray D, Carr A. Comparison of measures to assess outcomes in total hip replacement surgery. Qual Health Care. 1996;5:81-8.
11. Patrick DL, Deyo RA. Generic and disease-specific measures in assessing health status and quality of life. Med Care. 1989;27(3 Suppl):S217-32.
12. Naal FD, Impellizzeri FM, Leunig M. Which is the best activity rating scale for patients undergoing total joint arthroplasty? Clin Orthop Relat Res. 2009;467:958-65.
13. Saleh KJ, Radosevich DM, Kassim RA, Moussa M, Dykes D, Bottolfson H, Gioe TJ, Robinson H. Comparison of commonly used orthopaedic outcome measures using palm-top computers and paper surveys. J Orthop Res. 2002;20:1146-51.
14. Kvien TK, Mowinckel P, Heiberg T, Dammann KL, Dale Ø, Aanerud GJ, Alme TN, Uhlig T. Performance of health status measures with a pen based personal digital assistant. Ann Rheum Dis. 2005;64:1480-4.
15. Kongsved SM, Basnov M, Holm-Christensen K, Hjollund NH. Response rate and completeness of questionnaires: a randomized study of Internet versus paper-and-pencil versions. J Med Internet Res. 2007;9:e25.
16. Fouladi RT, McCarthy CJ, Moller NP. Paper-and-pencil or online? Evaluating mode effects on measures of emotional functioning and attachment. Assessment. 2002;9:204-15.
17. Steenhuis MP, Serra M, Minderaa RB, Hartman CA. An Internet version of the Diagnostic Interview Schedule for Children (DISC-IV): correspondence of the ADHD section with the paper-and-pencil version. Psychol Assess. 2009;21:231-4.
18. Shapiro S, Wilk MD. An analysis of variance test for normality. Biometrika. 1965;52:591-611.
19. Petrie A, Sabin C. Medical statistics at a glance. 2nd ed. Malden, MA: Blackwell; 2005. p 49-51.