Scientific Article
Type-II Error Rates (Beta Errors) of Randomized Trials in Orthopaedic Trauma
Heather V. Lochner, MSc; Mohit Bhandari, MD, MSc; Paul Tornetta III, MD
The Journal of Bone & Joint Surgery.  2001; 83:1650-1655 

Abstract

Background: Although an investigator may limit bias through randomization, concealment of patient allocation, and blinding, the results of randomized trials may be less convincing when the sample size is not sufficiently large to reveal a true difference between treatment groups. When the sample size is small, randomized trials are subject to beta errors (type-II errors)—that is, the probability of concluding that no difference between treatment groups exists when, in fact, there is a difference. The purpose of this study of randomized trials involving fracture care published between 1968 and 1999 was twofold: (1) to evaluate type-II error rates and study power (1 - β) for the primary outcomes and (2) to identify whether investigators clearly identified the primary and secondary outcomes.

Methods: To be eligible, studies were required to (1) be published in English, (2) be described as a randomized trial, (3) involve the care of adult patients with fractures, treated either operatively or nonoperatively, and (4) contain sufficient outcome information to enable study power to be calculated. Computer database searches were performed independently by two investigators to identify all potentially relevant study titles. Additional strategies to identify articles included (1) hand searches of selected orthopaedic journals from 1989 to 1999, (2) searches of the bibliographies of potentially relevant articles, and (3) review by content experts to identify missing studies. For each study, a standard power calculation was performed on the primary and secondary outcomes. For those studies in which the primary outcome was not explicitly reported, the most clinically relevant measure was chosen by consensus. Acceptable study power was agreed a priori to be 80% (type-II error of ≤0.20).

Results: We identified 620 potentially relevant citations from MEDLINE, of which only 187 were potentially eligible. We identified nine more articles with other searches, and application of the eligibility criteria to the 196 articles eliminated seventy-nine. Thus, we analyzed 117 studies in which a total of 19,942 patients with orthopaedic trauma had been randomized. Sample sizes ranged from ten to 662 patients (mean and standard deviation, 95 ± 79 patients). Thirty-four percent of the trials involved the treatment of hip fractures. The mean overall study power among the 117 trials was 24.65% (range, 2% to 99%). The type-II error rate for primary outcomes was 90.52%.

Conclusions: Mean type-II error rates in the orthopaedic trauma trials that we analyzed exceeded accepted standards. Investigators can reduce type-II error rates by performing power and sample-size calculations prior to conducting a trial.

    Although there is agreement that a randomized trial is the best study design for the assessment of treatment effectiveness, it is believed that trials of surgical therapies can be too small to have a meaningful impact on clinical practice. Studies with a small sample size are subject to beta errors (type-II errors), that is, the probability of concluding that no difference between treatment groups exists when, in fact, there is a difference1-3. Typically, investigators accept a beta error rate of 20% (β = 0.20), which corresponds with a study power of 80%. Most investigators agree that studies with beta error rates of >20% (study power of <80%) are subject to an unacceptably high risk of false-negative results4-6. Therefore, although an investigator may limit bias through randomization, concealment of patient allocation, and blinding, the results of a randomized trial are less convincing when the sample size is not sufficiently large to reveal a true difference between treatment groups7.
    Previous investigators have examined the prevalence of type-II error rates in many medical fields4-6,8-11. To our knowledge, there has been no systematic appraisal of the type-II error rates in trials of orthopaedic trauma treatment. Given the increased popularity of randomized trials in the orthopaedic trauma literature, the purpose of our study was twofold: (1) to evaluate the type-II error rates for primary and secondary outcomes in studies in which "nonsignificant" results were reported, and (2) to identify whether investigators clearly reported the primary and the secondary outcomes in trials of treatments for orthopaedic trauma.
     
    Fig. 1: Four possible conclusions in a randomized trial.

    TABLE I: Studies Excluded from Analysis
    Reason for Exclusion                           No. of Articles Excluded
    Significant study outcome                      43
    >2 fracture treatment methods                  20
    Insufficient data                              6
    Duplicate publication                          4
    Pediatric study                                3
    Significant primary or secondary end points    2
    Mechanism of injury other than trauma          1
    Total                                          79

    TABLE II: Study Power and Type-II Error Rates
    Outcome Type          Power (1 - β)                                        Type-II Error Rate (β)
                          Average (%)   Standard Deviation (%)   Range (%)     Proportion of Outcomes (%)
    Primary (n = 190)     24.65         27.21                    2.24-99.19    90.52
    Secondary (n = 101)   19.66         21.31                    2.24-99.19    96.58

    Eligibility Criteria

    We included studies that met the following eligibility criteria: (1) published in English, (2) described as a randomized trial, (3) involved the care of adult patients with fractures, treated either operatively or nonoperatively, and (4) contained sufficient outcome information to enable calculation of study power.

    Study Identification

    We conducted a search of MEDLINE from 1968 to 1999 with use of the following keywords: "fractures (MeSH)" and "randomized controlled trials (publication type)." The search was restricted to "human subjects" and "English language" articles. Two independent reviewers applied the eligibility criteria to potentially eligible study titles. One of the two reviewers was trained in health research methodology, and the other was an orthopaedic traumatologist with experience in the conduct of randomized trials. After a second application of the eligibility criteria to abstracts by the independent reviewers, the complete articles for the potentially eligible studies were retrieved. Two of us reviewed the methods section of each of the retrieved articles to ensure that all inclusion criteria were met.
    In addition to the MEDLINE searches, two of us performed a search of the National Institutes of Health PubMed computerized database, and one of us conducted a search of the Cochrane database. For both searches, we used "fractures" and "randomized trials" as keywords.
    Additional strategies to identify relevant citations included: (1) hand searches of the tables of contents of the Journal of Orthopaedic Trauma, Journal of Trauma, Clinical Orthopaedics and Related Research, and Acta Orthopaedica Scandinavica published from 1989 to 1999; (2) review of the reference lists of eligible (included) studies to identify other potentially eligible studies; and (3) a review by content experts (traumatologists) of the list of eligible studies to identify any missing studies.

    Characteristics of Eligible Studies

    Two reviewers independently abstracted general characteristics of each eligible study. These included first author (surgeon, nonsurgeon, or epidemiologist), epidemiology affiliation, geographic location, category of intervention, body region of focus (upper extremity, lower extremity, or spine), number of participating centers, and whether or not the study was funded.

    Determination of Primary Outcomes in Eligible Studies

    In the overwhelming majority (94%) of studies, multiple outcomes were reported but the primary outcome was not identified. The same reviewers independently reviewed all study outcomes in each eligible trial and identified the most relevant outcome measure as the "primary outcome." Relevant outcome measures were considered those that pertained directly to the interventions that were compared. Although in the majority of studies no explicit statement was made about the primary outcome, the study title, abstract, and introduction often contained information that could be used to infer the authors’ intentions. When no information was available, we used our best judgment to designate a primary outcome for the study. We based our choice on the important clinical outcome specific to the interventions that were compared. All other study outcomes were designated as secondary outcomes. Any discrepancies were resolved by consensus. The chosen primary outcome of each study was described as positive (a difference between treatments) or negative (no difference between treatments).

    Calculation of Type-II Error Rates

    A type-II error (beta error) occurs when investigators conclude that there was no difference between two interventions when a difference actually exists. Study power (1 - β) is the ability of a study to show a difference when one actually exists (Fig. 1). Standard post hoc power calculations were conducted for each outcome in the studies that demonstrated no or nonsignificant differences between treatment groups. The method of calculation used to determine study power depended upon the type of outcome (continuous or dichotomous).
    For continuous outcomes (such as time to fracture-healing in weeks), standard power calculations were performed12 (see Appendix [Formulae for Standard Power Calculations]). When outcomes were dichotomous (such as the presence or absence of deep infection), we chose to calculate study power by the method of Pocock12 (see Appendix [Formulae for Standard Power Calculations]). In both instances, the area under the curve for the calculated Zβ values was determined from a standard normal curve table. The power was calculated by subtracting the area from 1. Acceptable study power was agreed a priori to be 80% (type-II error of ≤0.20).
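    The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' actual procedure: it assumes equal group sizes, a two-sided type-I error of 0.05, and the usual normal-approximation form Zβ = (difference / standard error) - Zα. The function names are hypothetical, and the standard library's NormalDist stands in for the printed normal curve table.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal curve (mean 0, SD 1)


def posthoc_power_continuous(mean_t, mean_c, pooled_sd, n_per_group, alpha=0.05):
    """Approximate power for a difference in means (equal group sizes assumed)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    standard_error = pooled_sd * sqrt(2 / n_per_group)
    z_beta = abs(mean_t - mean_c) / standard_error - z_alpha
    return _STD_NORMAL.cdf(z_beta)  # power = 1 - beta


def posthoc_power_dichotomous(p_t, p_c, n_per_group, alpha=0.05):
    """Approximate power for a difference in proportions (equal group sizes assumed)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Average of the two binomial variances, as in the Appendix definition of s.
    s = sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)
    z_beta = sqrt(n_per_group / 2) * abs(p_t - p_c) / s - z_alpha
    return _STD_NORMAL.cdf(z_beta)


if __name__ == "__main__":
    # Hypothetical trial: healing times of 120 vs 140 days, SD 40, 20 patients per arm.
    print(round(posthoc_power_continuous(120, 140, 40, 20), 2))
    # Hypothetical trial: union rates of 95% vs 90%, 20 patients per arm.
    print(round(posthoc_power_dichotomous(0.95, 0.90, 20), 2))
```

    A power below 0.80 from either function would fall short of the threshold that the authors set a priori.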

    Validity of Power Calculations

    To ensure accuracy, the same two investigators independently performed a random sample of the power calculations for thirty articles. The remaining power calculations were not performed until 100% agreement was obtained. Inconsistencies between study results and power calculations were also examined independently as an additional check for validity. We also calculated standard 95% confidence intervals for each outcome and compared them with post hoc power calculations as a final check of validity.
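    As an illustration of the confidence-interval check, the sketch below computes a standard 95% confidence interval for a difference between two proportions. The article does not state which interval formula was used, so the simple Wald interval here is an assumption, and the function name and the example counts are hypothetical.

```python
from math import sqrt


def diff_in_proportions_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Wald 95% confidence interval for the difference p_t - p_c (z = 1.96)."""
    p_t = events_t / n_t
    p_c = events_c / n_c
    diff = p_t - p_c
    standard_error = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return diff - z * standard_error, diff + z * standard_error


if __name__ == "__main__":
    # Hypothetical counts: 95 of 100 united with treatment, 90 of 100 with control.
    low, high = diff_in_proportions_ci(95, 100, 90, 100)
    print(f"{low:.3f} to {high:.3f}")
```

    An interval that spans zero but also includes clinically important differences tells the same story as a low post hoc power: the trial cannot rule out a meaningful treatment effect.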

    Assessing Reviewer Agreement

    Agreement in the application of study eligibility criteria as well as the identification of study outcomes and study results (positive or negative) was quantitated with the kappa statistic with quadratic weighting. The kappa statistic, a measure of the agreement between two or more observers beyond chance, provided a measure of agreement between the reviewers with regard to titles, abstracts, and methods sections of potentially relevant studies. Donner and Klar13 and Fleiss14 provided persuasive arguments in favor of the use of this statistic instead of other measures of interobserver agreement that have been proposed.
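    For readers unfamiliar with the statistic, a quadratically weighted kappa for two reviewers can be sketched as follows. This is a generic textbook formulation, not the authors' software; the function name and the integer category coding (0 to n_categories - 1) are assumptions made for the example. With two categories, such as eligible versus ineligible, it reduces to the unweighted kappa.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Quadratically weighted kappa for two raters; categories coded 0..n_categories-1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(ratings_a)
    k = n_categories

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1 / n

    row = [sum(observed[i]) for i in range(k)]  # rater A marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when no disagreement is expected by chance")
    return 1 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Hypothetical title screening: 1 = potentially eligible, 0 = not eligible.
    reviewer_1 = [1, 1, 0, 0, 1, 0, 0, 1]
    reviewer_2 = [1, 1, 0, 0, 1, 0, 1, 1]
    print(round(weighted_kappa(reviewer_1, reviewer_2, 2), 2))
```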

    Literature Search

    We identified 620 potentially relevant study titles from the MEDLINE database search. Application of the study eligibility criteria eliminated 433 titles and left 187 for further consideration. The advanced PubMed and Cochrane database searches yielded an additional two articles not identified by the MEDLINE search. A review of 11,800 study titles from a hand search of the four journals from the previous ten years identified seven additional potentially eligible studies. Bibliography searches and suggestions from content experts did not yield additional relevant studies. In total, 196 studies were considered to be potentially eligible on the basis of the study title alone and were retrieved for a detailed review (see electronic Appendix [references]). Agreement on the application of eligibility criteria to the study titles was substantial (kappa = 0.88, 95% confidence interval = 0.81 to 0.95).

    Complete Manuscript Review

    Application of the eligibility criteria to 196 complete articles eliminated seventy-nine studies (Table I). The majority of these studies (forty-three [54%] of the seventy-nine) were excluded because the authors reported a positive result (or a significant outcome) and were therefore ineligible for type-II error calculation.
    Twenty studies were eliminated because more than two fracture treatment methods were used. Four articles were found to be duplicate publications of the same research presented in other articles and were removed from the study group. Three studies that involved children were also eliminated because we aimed specifically to identify studies of adult patients. Two studies that had significant primary or secondary end points were removed, and one study was removed because it did not focus on fracture management. Six studies were found to have insufficient information for statistical calculation of the study power and were thus eliminated. Ultimately, 117 trials with nonsignificant results met all of the eligibility criteria and were used for all subsequent power analyses.

    Study Characteristics

    The 117 eligible trials were published in twenty-five different journals (see electronic Appendix [Table E-1]). The majority of the studies (seventy-three [62%] of the 117) were published in The Journal of Bone and Joint Surgery (American and British editions), Acta Orthopaedica Scandinavica, Clinical Orthopaedics and Related Research, and Injury. Forty-six (39%) of the studies were conducted in Scandinavia; twenty-three (20%), in North America; twenty-two (19%), in the United Kingdom; and twelve (10%), in other countries in Europe (see electronic Appendix [Table E-2]). A surgeon was the first author of 108 (92%) of the articles, and nine (8%) articles were written by nonsurgeons. None of the randomized trials had an epidemiologist as the first author, and only four (3%) had at least one author with cited training in biostatistics (MSc or PhD) or affiliation with a department of statistics, public health, or clinical epidemiology. A total of 19,942 patients were randomized in the 117 trials. Study sample sizes ranged from ten to 662 patients (mean and standard deviation, 95 ± 79 patients). One hundred and eight (92%) of the studies were conducted at only one center, and 115 (98%) focused upon interventions related to fracture repair. Fractures of the hip were the primary focus of forty (34%) of the studies.

    Outcomes Assessment

    The vast majority of the eligible trials (110; 94%) involved multiple outcomes, but they were not explicitly defined as primary or secondary end points. On the basis of the nature of the treatment comparisons in each trial, we identified 190 primary outcome measures (see electronic Appendix [Table E-3]). Almost 50% of the articles reported primary outcomes such as clinical or functional scores (accounting for forty-seven [25%] of the 190 primary outcomes), radiographic results or scores (twenty-one; 11%), or complications (twenty; 11%). We found that secondary end points were reported 101 times in the 117 trials. They included complications (accounting for fourteen [14%] of the 101 secondary outcomes), implant failures (twelve; 12%), pain (ten; 10%), and reoperations (nine; 9%) (see electronic Appendix [Table E-4]).

    Study Power and Type-II Error Rates

    The study power for the 190 defined primary outcomes averaged 24.65% (range, 2% to 99%), which corresponded with an average beta value of 0.75 (range, 0.01 to 0.98) (Table II). We found that 172 of 190 primary outcomes were limited by type-II errors. Analysis of secondary outcomes revealed that the study power averaged only 19.66% (range, 2% to 99%); thus, the average beta value was 0.80 (range, 0.01 to 0.98). Type-II errors limited secondary outcomes in 113 of the 117 trials. Of the 117 studies, only five (4%) even mentioned study power in the methods section.
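    The relationship between the reported figures can be checked with simple arithmetic. The short sketch below, with hypothetical helper names, converts average power to the corresponding beta value and expresses the counts of underpowered outcomes as percentages.

```python
def beta_from_power(power_percent):
    """Type-II error rate (beta) as the complement of power given in percent."""
    return 1 - power_percent / 100


def as_percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100 * part / whole


if __name__ == "__main__":
    print(round(beta_from_power(24.65), 2))  # primary outcomes
    print(round(beta_from_power(19.66), 2))  # secondary outcomes
    print(as_percent(172, 190))              # primary outcomes limited by type-II error
    print(as_percent(113, 117))              # trials with limited secondary outcomes
```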
    We performed a systematic review to examine the rates of type-II errors in 117 clinical trials with "negative" outcomes in the orthopaedic trauma literature. The current study was strengthened by our use of predefined eligibility criteria, a comprehensive search of the literature to identify relevant studies, assessment of the reproducibility of study selection, determination of the primary and secondary outcomes for each study by consensus, and detailed calculations of study power performed in duplicate for each eligible study. The majority of studies (95%) that met our eligibility criteria did not meet conventional standards of acceptable type-II error rates (study power of 80% or beta value of ≤0.20) with regard to both their primary and their secondary outcomes.
    The results are limited by the fact that we determined the primary and secondary outcomes by consensus because few, if any, authors explicitly stated the primary outcome in their study. Moreover, as we identified only the articles published in English, it remains uncertain whether these findings can be generalized to articles published in other languages.
    Most surgeons are familiar with the concept that the results of a particular study may appear to be true when, in fact, they are due to chance (or random sampling error). This erroneous false-positive conclusion is designated as a type-I or alpha error (Fig. 1). By convention, most authors of orthopaedic studies adopt an alpha error rate of 0.05. Thus, investigators can expect a false-positive error about 5% of the time5,15.
    Less appreciated by investigators who conduct surgical trials is the risk of concluding "no difference" between treatments when a difference actually exists (Fig. 1). This type of conclusion is termed a type-II error (beta error). It is as important to minimize the probability of a type-II error as it is to minimize the probability of a type-I error. By convention, investigators in clinical orthopaedic trials have designated acceptable type-II error rates as 0.20, a 20% chance of a false-negative conclusion. Cohen16 defended the choice of a type-II error rate that is four times larger than the type-I error rate with the rationale that increasing the study power (or lowering the type-II error rate) would result in large increases in the sample size. For example, decreasing the beta error from 0.20 to 0.05 would increase the sample size from approximately ten patients to 10,000 patients. Such a sample size would be prohibitive for almost all trials in orthopaedics5.
    Additionally, this type of error is seen as less egregious because a wrong conclusion that there is no difference between treatments is not likely to effect a substantial change in the clinical practice of medicine. This is not necessarily the case, however. For example, one study (reference 178 in the list in the electronic Appendix) demonstrated "no difference" between operative and nonoperative management of calcaneal fractures. If the conclusion of that study is true, no patient would be likely to choose to have a calcaneus fixed, given the reported complications of operative treatment. However, if the conclusion is false and there is actually an advantage to reduction and fixation, the reported study will have done a serious disservice to all patients with that injury. The results of randomized studies are given much greater weight than are those of retrospective or case-control studies, but if they are underpowered they can lead to conclusions that may justify an inferior treatment.

    Study Power in Clinical Trials

    The power of a study is the probability that it will demonstrate a difference between two treatments when one actually exists2,3,9. Power (1 - β) is simply the complement of the type-II error (beta error). Thus, if we accept a 20% chance of an incorrect study conclusion (β = 0.20), we also accept the corollary that we will come to the correct conclusion 80% of the time. Study power can be calculated before the start of a clinical trial to assist with the determination of sample size or after the completion of a study to determine whether the negative findings were true or due to chance17.
    The power of a statistical test is typically a function of the magnitude of the treatment effect, the designated type-I error (alpha error) rate, and the sample size5,12. When a clinical trial is designed, investigators can decide upon the desired study power (1 - β) and calculate the sample size needed to achieve this goal. If investigators conduct a post hoc power analysis after the completion of a study, they use the actual sample size to calculate the power of the study.
    The magnitude of the effect is, for example, the difference between the mean functional score of the surgically treated group and that of the nonoperatively treated group. The difference can be divided by the standard deviation of the control group to compensate for the variability of the functional scores in each group (variance or standard deviations about the mean scores). The resultant value is termed the "effect size." Interpretation of the effect size is largely a clinical decision and should represent the point at which a surgeon will change his or her practice if the results are true2,3,5. Cohen16 suggested broad guidelines for the interpretation of effect sizes, with 0.2 considered a small effect; 0.5, a moderate effect; and 0.8, a large effect.
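    The effect-size calculation described in this paragraph can be illustrated with a short sketch. The function names are hypothetical, and treating Cohen's guideline values of 0.2, 0.5, and 0.8 as lower cut-points for each label is an assumption made for the example; Cohen offered them as broad guidelines, not strict thresholds.

```python
def effect_size(mean_treatment, mean_control, sd_control):
    """Difference between group means divided by the control-group standard deviation."""
    return abs(mean_treatment - mean_control) / sd_control


def cohen_label(d):
    """Rough label using Cohen's guideline values as lower cut-points (an assumption)."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "moderate"
    if d >= 0.2:
        return "small"
    return "below small"


if __name__ == "__main__":
    # Healing times from the article's tibial shaft example: 120 vs 140 days, control SD 40.
    d = effect_size(120, 140, 40)
    print(d, cohen_label(d))
```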
    Sample size plays an important role in power analyses. The smaller the difference that an investigator wishes to detect, the larger the sample size required for the study. Extreme examples of large sample sizes needed to detect relatively small treatment effects can be seen in the clinical trials of treatments for cardiovascular disease. In a recent trial of angiotensin-converting-enzyme inhibitor therapy for patients at high risk for cardiovascular events, investigators recruited 9297 patients to identify a 0.5% difference (p = 0.02) in myocardial infarction rates between the treatment and placebo groups18.

    Study Results

    More than 90% of the 117 trials included in the current review were underpowered (<80%) for the treatment effect of their primary outcome. This result is similar to the findings of Chung et al.9, who reported that 86% of thirty-nine trials with negative outcomes in the hand literature lacked sufficient power to identify a moderate treatment effect. Williams et al.6 reviewed forty-one articles in the cardiovascular literature and identified 80% that were insufficiently powered to detect at least one outcome. In a review of fourteen articles that reported negative results in emergency medicine, Brown et al.8 found none that met acceptable standards of study power.
    Not unexpectedly, only five (4%) of the 117 eligible articles in our study included a discussion of sample size and power issues. Brown et al.8 found that only one of fourteen reports in their series provided the study power for the given sample size. Perhaps the lack of consideration of sample size and power issues in the current group of trials was related, in part, to the disproportionately high percentage of single-center initiatives (>90%), led by surgeons. Involvement of someone from a department of epidemiology or biostatistics was infrequent.

    Choice of Outcome Measure: Primary and Secondary

    In their review of thirty-nine reports with negative outcomes in the hand literature, Chung et al.9 identified the choice of outcome measure as a potential source of insufficient study power. The most common primary outcome measure in our series was some measure of patient function, either a score or a qualitative description. Of the forty-seven primary outcomes that were based on patient function, ten were continuous variables (scores) and thirty-seven were dichotomous variables (good versus poor). Only four of the ten studies that reported continuous outcomes and none of the thirty-seven studies that reported dichotomous outcomes were adequately powered.

    Considerations in the Plan of a Clinical Trial

    Given the prevalence of type-II errors in clinical trials involving orthopaedic trauma, future investigators should preplan estimated sample-size requirements on the basis of conventionally accepted standards for study power (80%) and type-I errors (α = 0.05). Small pilot studies on a topic of interest or previous reports in the literature can be helpful to determine the likely treatment effect and to avoid type-II errors. For example, when a trial of alternate strategies for the treatment of tibial shaft fractures is planned, an investigator may identify, on systematic review of the literature, an article that reports that the time to fracture-healing with treatment A is 120 ± 45 days whereas the time to healing with treatment B (control group) can be up to 140 ± 40 days. The expected treatment difference is twenty days, and the effect size is 0.5 (20/40). We know that this is a moderate effect and is likely to be clinically relevant16. The anticipated sample size for this continuous outcome measure is determined by use of the following equation12:
    $$N = \frac{2\,(Z_\alpha + Z_\beta)^2\, s^2}{d^2}$$
    where Zα = 1.96, Zβ = 0.84, s = 40, and d = 20.
    This study will require a total of approximately sixty-three patients to obtain sufficient power to identify a difference of twenty days between treatments. An investigator may then review the records from his or her center for the last year and decide whether enough patients are likely to present to the center to meet the sample-size requirements.
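    The arithmetic behind this figure can be reproduced with a short sketch, assuming the sample-size equation has the usual form N = 2[(Zα + Zβ)s/d]², which yields the reported sixty-three. The function name is hypothetical. In many textbook presentations this expression gives the number of patients in each treatment group, whereas the article describes the result as a total; the sketch only reproduces the number and does not settle that question.

```python
from math import ceil


def sample_size_continuous(sd, difference, z_alpha=1.96, z_beta=0.84):
    """Sample size from N = 2 * [(z_alpha + z_beta) * sd / difference] ** 2."""
    return 2 * ((z_alpha + z_beta) * sd / difference) ** 2


if __name__ == "__main__":
    n = sample_size_continuous(sd=40, difference=20)
    print(n, ceil(n))  # unrounded value, then rounded up to whole patients
```

    Because the sample size scales with the square of s/d, halving the difference to be detected (ten days instead of twenty) would quadruple the requirement.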
    Let us assume that this same investigator chooses nonunion instead of time to union as the primary outcome. On the basis of previous reports in the literature, treatment A will result in a 95% union rate and treatment B (control group) will result in a 90% union rate. A different sample-size calculation, for dichotomous variables, is used12:
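    $$n = \frac{f(\alpha,\beta)\,[P_A(100 - P_A) + P_B(100 - P_B)]}{(P_A - P_B)^2}$$
    where n = number of patients in each treatment group, PA = 95, PB = 90, and f(α,β) = 7.9.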
    A total of 869 patients is required to identify a 5% difference in nonunion rates between treatments. An investigator may realize that this number is too large for the trial to be conducted at one center and may try to obtain support from multiple sites for this trial.
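    The total of 869 can be reproduced with the sketch below, which assumes the per-group form of the calculation for percentages, n = f(α,β)[PA(100 - PA) + PB(100 - PB)]/(PA - PB)², doubled for two treatment groups. The function name is hypothetical, and f(α,β) = 7.9 is the value quoted in the article for α = 0.05 and β = 0.20.

```python
def sample_size_dichotomous(p_a, p_b, f_alpha_beta=7.9):
    """Patients per group for comparing two percentages p_a and p_b (0 to 100)."""
    variance_term = p_a * (100 - p_a) + p_b * (100 - p_b)
    return f_alpha_beta * variance_term / (p_a - p_b) ** 2


if __name__ == "__main__":
    per_group = sample_size_dichotomous(95, 90)
    print(per_group, round(2 * per_group))  # per group, then total for two groups
```

    The contrast with the continuous outcome is the practical point: for the same trial, a dichotomous end point with a small absolute difference demands a far larger sample than a continuous one.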
    In conclusion, a systematic review of articles in the orthopaedic trauma literature showed that the majority of clinical trials are limited with regard to sample size and are insufficiently powered. Investigators can avoid the risk of type-II errors by enlisting the aid of biostatisticians to perform a priori power and sample-size calculations when clinical trials are planned.

    Formulae for Standard Power Calculations

    For continuous variables, we used the equation: $N = 2\{[(Z_\alpha + Z_\beta)s]/\delta\}^2$, where N = sample size, Zα = 1.96, and δ = difference between treatments. The standard deviation (s) was determined by calculating the pooled standard deviation between treatment groups: $s^2 = [(N_{treatment} - 1)s_{treatment}^2 + (N_{control} - 1)s_{control}^2]/(N_{treatment} + N_{control} - 2)$.
    For dichotomous variables, we used the equation: $Z_\beta = \sqrt{n/(2s^2)}\,\delta - Z_\alpha$. The standard deviation was calculated with the formula: $s = \sqrt{[P_T(1 - P_T) + P_C(1 - P_C)]/2}$, where PT and PC = proportion of events in the treatment and control groups, respectively.
    A reference list of the identified randomized trials and tables showing the journals from which the 117 articles were obtained, characteristics of the eligible trials, and primary and secondary outcomes are available with the electronic versions of this article, on our web site at www.jbjs.org (go to the article citation and click on "Supplementary Material") and on our quarterly CD-ROM (call our subscription department, at 781-449-9780, to order the CD-ROM).
    1. Dorrey F, Swiontkowski MF. Statistical tests. What do we learn from a clinical study? p Values versus confidence intervals. Advances Orthop Surg. 1997;21:81-5.
    2. Guyatt GH, Jaeschke R, Heddle N, Cook D, Shannon H, Walter S. Basic statistics for clinicians: 1. Hypothesis testing. CMAJ. 1995;152:27-32.
    3. Guyatt GH, Jaeschke R, Heddle N, Cook D, Shannon H, Walter S. Basic statistics for clinicians: 2. Interpreting study results: confidence intervals. CMAJ. 1995;152:169-73.
    4. Staquet MJ, Rozencweig M, Von Hoff DD, Muggia FM. The delta and epsilon errors in the assessment of cancer clinical trials. Cancer Treat Rep. 1979;63:1917-21.
    5. Streiner DL. Sample size and power and psychiatric research. Can J Psychiatry. 1990;35:616-20.
    6. Williams JL, Hathaway CA, Kloster KL, Layne BH. Low power, type II errors, and other statistical problems in recent cardiovascular research. Am J Physiol. 1997;273:487-93.
    7. Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA. 1993;270:2598-601.
    8. Brown CG, Kelen GD, Ashton JJ, Werman HA. The beta error and sample size determination in clinical trials in emergency medicine. Ann Emerg Med. 1987;16:183-7.
    9. Chung KC, Kalliainen LK, Hayward RA. Type II (beta) errors in the hand literature: the importance of power. J Hand Surg [Am]. 1998;23:20-5.
    10. Freiman JA, Chalmers TC, Smith H Jr, Kuebler RR. The importance of beta, type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 "negative" trials. N Engl J Med. 1978;299:690-4.
    11. Mittendorf R, Arun V, Sapugay AM. The problem of the type II statistical error. Obstet Gynecol. 1995;86:857-9.
    12. Pocock SJ. Clinical trials: a practical approach. New York: Wiley; 1983. p 123-40.
    13. Donner A, Klar N. The statistical analysis of kappa statistics in multiple samples. J Clin Epidemiol. 1996;9:1053-8.
    14. Fleiss JL. Measuring agreement between two judges on the presence or absence of a trait. Biometrics. 1975;31:651-9.
    15. Dorrey F, Swiontkowski MF. Statistical tests. What they tell us and what they don't. Advances Orthop Surg. 1997;21:81-5.
    16. Cohen J. Statistical power analysis for the behavioral sciences. Rev. ed. New York: Academic Press; 1977.
    17. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121:200-6.
    18. Yusuf S, Sleight P, Pogue J, Bosch J, Davies R, Dagenais G. Effect of an angiotensin-converting-enzyme inhibitor, ramipril, on cardiovascular events in high-risk patients. The Heart Outcomes Prevention Study Investigators. N Engl J Med. 2000;342:145-53.

    Submit a comment

    Anchor for JumpAnchor for Jump
    +Fig. 1:Four possible conclusions in a randomized trial.

    TABLE I: Studies Excluded from Analysis
    Reason for Exclusion                            No. of Articles Excluded
    Significant study outcome                       43
    >2 fracture treatment methods                   20
    Insufficient data                               6
    Duplicate publication                           4
    Pediatric study                                 3
    Significant primary or secondary end points     2
    Mechanism of injury other than trauma           1
    Total                                           79

    TABLE II: Study Power and Type-II Error Rates
                           Power (1 - β)                                          Type-II Error Rate (β)
    Outcome Type           Average (%)    Standard Deviation (%)    Range (%)     Proportion of Outcomes (%)
    Primary (n = 190)      24.65          27.21                     2.24-99.19    90.52
    Secondary (n = 101)    19.66          21.31                     2.24-99.19    96.58
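
    The power values summarized in Table II come from a standard power calculation applied to each outcome. The article does not give the formula, so the following is only a minimal sketch of one common approach, assuming a dichotomous outcome, two equally sized groups, a two-sided test, and the normal approximation. The function name, the alpha of 0.05, and the event rates in the usage lines are illustrative assumptions and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided z-test for two proportions.

    Assumptions (not from the article): equal group sizes, normal
    approximation, pooled variance under the null hypothesis.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("event rates must lie strictly between 0 and 1")
    if n_per_group <= 0:
        raise ValueError("n_per_group must be positive")

    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value

    # Standard error under the null (pooled) and under the alternative.
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)

    delta = abs(p1 - p2)
    # Probability of rejecting in the direction of the true difference,
    # plus the (usually negligible) probability of rejecting the other way.
    upper = z.cdf((delta - z_crit * se_null) / se_alt)
    lower = z.cdf((-delta - z_crit * se_null) / se_alt)
    return upper + lower


if __name__ == "__main__":
    # Hypothetical event rates of 10% and 20%, chosen only for illustration.
    for n in (30, 100, 300):
        power = power_two_proportions(0.10, 0.20, n)
        print(f"n per group = {n}: power = {power:.2f}, beta = {1 - power:.2f}")
```

    Under these assumptions, the type-II error rate is one minus the returned power, and a trial would meet the 80% threshold used in this study only when the returned value is at least 0.80.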