Kenneth J. Berry • Janis E. Johnston

Statistical Methods: Connections, Equivalencies, and Relationships

Kenneth J. Berry
Fort Collins, CO, USA

Janis E. Johnston
Alexandria, VA, USA
ISBN 978-3-031-41895-2
ISBN 978-3-031-41896-9 (eBook)
https://doi.org/10.1007/978-3-031-41896-9
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Paper in this product is recyclable.
For Dr. Paul W. Mielke, Jr. (1931–2019), teacher, colleague, collaborator, and friend.
Preface
For many years, the authors have been struck by the connections, equivalencies, and relationships among seemingly disparate statistical tests and measures, many of which were developed independently by researchers in different disciplines, each unaware that the test or measure already existed elsewhere. Some tests and measures are connected or related only under certain conditions, and some were developed for ostensibly unrelated purposes but were later found to be related.
This book addresses connections, equivalencies, and relationships between and among various statistical tests and measures. It is by no means meant to be a comprehensive documentation of connections in the field of statistics; it merely discusses those connections and relationships that have come to the attention of the authors over many years of teaching and research. The book was inspired by memories of a television series that investigated connections among historical events. Older readers will recall a television series from the late 1970s, created and hosted by the British science historian James Burke, titled Connections. The 10-episode made-for-television series first aired in the United Kingdom in 1978 and in the United States in 1979 and demonstrated how seemingly unconnected discoveries, scientific achievements, and world events were, in fact, interconnected to bring about particular aspects of modern technology. For example, Gutenberg invents the printing press, and literacy rates rise, which causes a significant portion of the reading public to require eyeglasses for the first time, which creates a surge of investment in lens-making across Europe, which leads to the invention of the telescope and the microscope. While this book explores connections and relationships among various statistical tests and measures, it is most assuredly a more modest undertaking.
Many of the statistical connections and relationships presented in the book will be well known to practicing statisticians and teachers of statistics, such as the relationship between Student's pooled t test for two independent samples and Fisher's F-ratio test for a completely randomized one-way analysis of variance. Some of the connections may be less well known, such as the relationship between Pearson's product-moment correlation coefficient and Pearson's chi-squared test for independence. Finally, some of the connections presented in the book are more obscure and will be known to only a few statisticians, such as the relationship between Kendall's S measure of ordinal association and Spearman's footrule measure of disarray.
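To preview the first of these connections concretely: for two independent samples, the one-way analysis-of-variance F-ratio equals the square of Student's pooled t statistic, F = t². The short Python sketch below, with invented data, is offered only as an illustration (it is not an example from the book) and verifies the identity numerically.

    # Minimal sketch (invented data): for two independent samples, the
    # one-way ANOVA F-ratio equals the squared pooled two-sample t statistic.
    from scipy import stats

    sample_a = [11.2, 9.8, 10.5, 12.1, 10.9]
    sample_b = [8.7, 9.4, 10.0, 8.9, 9.6]

    t, p_t = stats.ttest_ind(sample_a, sample_b)  # pooled-variance t test
    f, p_f = stats.f_oneway(sample_a, sample_b)   # one-way analysis of variance

    print(t**2, f)   # the two values coincide: F = t**2
    print(p_t, p_f)  # identical two-sided probability values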
The organization of the book follows the structure of a typical introductory textbook in statistical methods, with chapters on one-sample tests, two-sample tests, matched-pairs tests, completely randomized multi-sample tests, randomized-blocks multi-sample tests, measures of regression and correlation, and analyses of goodness of fit and contingency tables. Because not all connections within, say, two-sample tests are to other two-sample tests, some overlap among chapters is unavoidable. For example, Student's two-sample test is related to Pearson's product-moment correlation coefficient; the connection therefore appears both in the chapter on two-sample tests and in the chapter on regression and correlation, but is illustrated with different examples.
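That t-to-correlation connection can likewise be previewed in miniature. In the following Python sketch (again with invented data, not an example from the book), correlating the pooled scores with a 0/1 group indicator yields the point-biserial correlation, from which Student's t statistic is recovered as t = r√(df/(1 − r²)) with df = N − 2.

    # Minimal sketch (invented data): Student's two-sample t recovered from
    # the Pearson/point-biserial correlation between the scores and a 0/1
    # group indicator, via t = r * sqrt(df / (1 - r**2)) with df = N - 2.
    import math
    from scipy import stats

    scores = [11.2, 9.8, 10.5, 12.1, 10.9, 8.7, 9.4, 10.0, 8.9, 9.6]
    groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # sample-membership indicator

    t, _ = stats.ttest_ind(scores[:5], scores[5:])  # pooled two-sample t
    r, _ = stats.pearsonr(groups, scores)           # point-biserial correlation
    df = len(scores) - 2

    print(abs(t), abs(r) * math.sqrt(df / (1 - r**2)))  # magnitudes agree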
The book is oriented toward readers who possess a basic background in parametric and non-parametric statistical methods and are interested in the connections and relationships among a variety of statistical tests and measures. No background in higher mathematics, such as calculus or matrix algebra, is assumed and, to the extent possible, all connections are demonstrated with examples and without derivations or formal proofs.
This is a fairly large book of some 650 pages with over 300 documented connections. On the other hand, it is a hopelessly short one. Many connections and relationships have been omitted due to space constraints. Our hope is that the book will sensitize and encourage the reader to document and illustrate other connections. The aim of the authors in this book is to both inform and entertain. If this modest book does a little bit of both, the authors will consider it successful.
Fort Collins, CO, USA    Kenneth J. Berry
Alexandria, VA, USA    Janis E. Johnston
September 2023
Acknowledgment
The authors wish to thank the editors and staff at Springer-Verlag. Very special thanks to Dr. Eva Hiripi, Senior Editor, Europe, who guided the present project through from beginning to end.
Contents
1 Introduction
  1.1 Overviews of Chapters
    1.1.1 Statistical Methods
    1.1.2 One-Sample Tests
    1.1.3 Two-Sample Tests
    1.1.4 Matched-Pair Tests
    1.1.5 Completely Randomized Designs
    1.1.6 Randomized-Block Designs
    1.1.7 Measures of Interval Association
    1.1.8 Measures of Ordinal Association I
    1.1.9 Measures of Ordinal Association II
    1.1.10 Measures of Nominal Association I
    1.1.11 Measures of Nominal Association II
    1.1.12 Measures of Fourfold Association I
    1.1.13 Measures of Fourfold Association II
  Preview of Chap. 2
2 Statistical Methods
  2.1 Introduction
  2.2 Definitions
    2.2.1 Connections
    2.2.2 Equivalencies
    2.2.3 Relationships
  2.3 Permutation Statistical Methods
    2.3.1 Exact Permutation Methods
    2.3.2 Monte Carlo Permutation Methods
  2.4 Statistical Models
  2.5 The Neyman–Pearson Population Model
    2.5.1 Independence
    2.5.2 Random Sampling
    2.5.3 Normality
    2.5.4 Homogeneity of Variance
    2.5.5 Homogeneity of Covariance
  2.6 A Summary: The Five Assumptions
  2.7 The Fisher–Pitman Permutation Model
    2.7.1 A Historical Perspective
    2.7.2 Permutation and Parametric Tests
    2.7.3 Exact Permutation Tests
    2.7.4 The Neyman–Pearson Population Model
    2.7.5 The Fisher–Pitman Permutation Model
    2.7.6 Analyses with Combinations
    2.7.7 A Second Exact Permutation Example
    2.7.8 Monte Carlo Permutation Methods
    2.7.9 A Monte Carlo Permutation Example
  Summary
  Preview of Chap. 3
3 One-Sample Tests
  3.1 Introduction
  3.2 Student's One-Sample t Test
  3.3 A Permutation One-Sample Test
  3.4 Connections Linking Statistics t and δ
  3.5 Test Statistics t and δ
    3.5.1 A Conventional Analysis
    3.5.2 A Permutation Analysis
    3.5.3 Connections Linking Statistics δ and t
  3.6 The Measurement of Effect Size
    3.6.1 The d Family
    3.6.2 The r Family
    3.6.3 The ℜ Family
  3.7 A Second Example Analysis
    3.7.1 A Conventional Analysis
    3.7.2 An Exact Permutation Analysis
    3.7.3 Connections Linking Effect-Size Measures
  3.8 Measures of Effect Size for the Observed Data
  3.9 Rank-Score Permutation Analyses
    3.9.1 The Wilcoxon Signed-Rank Test
    3.9.2 A Permutation Approach
    3.9.3 An Example Analysis
    3.9.4 An Exact Permutation Analysis
    3.9.5 A Monte Carlo Analysis
    3.9.6 Connections Linking Statistics T and δ
  3.10 Tests of Proportions
    3.10.1 Tests of Proportions for One Sample
    3.10.2 A Chi-Squared Alternative
    3.10.3 An Exact Permutation Analysis
  Summary
  Preview of Chap. 4
4 Two-Sample Tests
  4.1 Introduction
  4.2 Two-Sample Tests of Independence
    4.2.1 Student's Two-Sample t-Test
    4.2.2 A Permutation Two-Sample Test
  4.3 Example Two-Sample Analyses
    4.3.1 Student's t-Test
    4.3.2 A Permutation Analysis
    4.3.3 Connections Linking Statistics t and δ
  4.4 Measures of Effect Size for Two-Sample Tests
    4.4.1 Effect-Size Measures for the Observed Data
    4.4.2 Connections Linking the Five Measures
    4.4.3 Connections with the Observed Data
  4.5 Connections Linking Statistics t and rxy
  4.6 Equivalencies Linking Statistics rxy and rpb
  4.7 Hotelling's T² Test
    4.7.1 An Example Analysis for Hotelling's T²
    4.7.2 A Conventional Hotelling T² Analysis
    4.7.3 Connections Linking Statistics T² and δ
  4.8 Rank-Score Analyses
    4.8.1 Wilcoxon's Rank-Sum Test
    4.8.2 Mann–Whitney's Rank-Sum Test
    4.8.3 Connections Linking Statistics W and U
    4.8.4 A Permutation Analysis
    4.8.5 An Alternative Permutation Analysis
  4.9 Connections Linking Statistics W and δ
  4.10 Connections Linking Statistics W and S
    4.10.1 Calculation of C and D: Method 1
    4.10.2 Calculation of C and D: Method 2
  4.11 Connections Linking Statistics S and W
  4.12 Connections Linking Statistics W and τa
  4.13 Connections Linking Statistics S and τ
    4.13.1 An Example Analysis for Whitfield's τ
    4.13.2 Connections Linking Statistics S, U, and W
  4.14 Connections Linking Statistics W and rrb
    4.14.1 Connections Linking Statistics rrb, W, and U
    4.14.2 Connections Linking U, rrb, τa, and τ
    4.14.3 Connections Linking Statistics U and W
    4.14.4 Connections Linking Statistics U and rrb
    4.14.5 Connections Linking Statistics τa and rrb
    4.14.6 Connections Linking Statistics rrb and τ
  4.15 Goodman and Kruskal's γ-Statistic
    4.15.1 Equivalencies Linking Statistics γ and rrb
    4.15.2 Connections Linking Statistics γ and W
    4.15.3 Connections Linking Statistics γ and U
    4.15.4 Connections Linking Statistics γ and τa
  4.16 Festinger's Rank-Sum Test
    4.16.1 Connections Linking Statistics W and U
    4.16.2 Connections Linking Statistics d and W
    4.16.3 Connections Linking Statistics d and U
  4.17 Haldane–Smith's Rank-Sum Test
  4.18 Van der Reyden's Rank-Sum Test
  4.19 Connections Related to Tests of Proportions
    4.19.1 A Test of Two Independent Proportions
    4.19.2 A Chi-Squared Alternative
    4.19.3 An Exact Permutation Analysis
  Summary
  Preview of Chap. 5
5 Matched-Pair Tests
  5.1 Introduction
  5.2 Matched-Pair Tests
    5.2.1 Advantages of Matched-Pair Tests
  5.3 Student's Matched-Pair t-Test
  5.4 Connections Linking Statistics t and t
  5.5 A Permutation Matched-Pair Test
  5.6 Connections Linking Statistics t and δ
  5.7 An Example Analysis for Student's t-Test
    5.7.1 A Conventional Analysis
    5.7.2 A Permutation Analysis
  5.8 Connections Linking Statistics t and δ
  5.9 Measures of Effect Size for Matched Pairs
    5.9.1 Effect Size Measures for the Observed Data
    5.9.2 Connections Linking Effect Size Measures
    5.9.3 Comparisons with the Observed Data
  5.10 A Second Example Analysis
    5.10.1 Connections Linking Statistics t and δ
  5.11 Multivariate Permutation Analyses
    5.11.1 Hotelling's Matched-Pair T² Test
    5.11.2 An Exact Analysis with v = 2
    5.11.3 An Exact Analysis with v = 1
  5.12 Rank-Score Permutation Analyses
    5.12.1 The Wilcoxon Signed-Rank Test
    5.12.2 A Permutation Analysis
    5.12.3 Connections Linking Statistics W and δ
  5.13 The Sign Test
    5.13.1 Connections Linking Statistics δ and R+
    5.13.2 An Example Analysis for the Sign Test
  5.14 Tests of Proportions for Matched Pairs
    5.14.1 Tests of Two Correlated Proportions
    5.14.2 A Chi-Squared Alternative
  Summary
  Preview of Chap. 6
6 Completely Randomized Designs
  6.1 Introduction
  6.2 Fisher's One-Way Analysis of Variance
  6.3 Connections Linking Statistics F and t
    6.3.1 Fisher's F-Ratio Test Statistic
    6.3.2 Student's Two-Sample t Test Statistic
  6.4 A Permutation Approach
  6.5 Connections Linking Statistics F and δ
  6.6 Test Statistics F and δ Illustrated
    6.6.1 A Conventional Analysis of Variance
    6.6.2 A Monte Carlo Permutation Analysis
    6.6.3 Connections Linking Statistics δ and F
  6.7 Measures of Effect Size for One-Way ANOVA
    6.7.1 Effect-Size Measures for the Observed Data
    6.7.2 Comparisons of Effect-Size Measures
    6.7.3 Comparisons with the Observed Data
  6.8 A Second Example Analysis
    6.8.1 An Exact Permutation Analysis
    6.8.2 Connections Linking Statistics F and δ
    6.8.3 Effect-Size Measures
    6.8.4 Comparisons of Effect-Size Measures
  6.9 The Intraclass Correlation Coefficient
  6.10 Robinson's Measure of Agreement
    6.10.1 An Example Analysis
    6.10.2 The Intraclass Correlation Coefficient
    6.10.3 Connections Linking Statistics A and rI
  6.11 Connections Linking Statistics z, F, rxy, and rI
    6.11.1 Connections Linking Statistics z and F
    6.11.2 Connections Linking Statistics z and rI
    6.11.3 Connections Linking Statistics rI and F
    6.11.4 Connections Linking Statistics rI and rxy
  6.12 Rank-Score Permutation Analyses
    6.12.1 The Kruskal–Wallis Rank-Sum Test
    6.12.2 A Monte Carlo Permutation Analysis
    6.12.3 Connections Linking Statistics H and δ
  Summary
  Preview of Chap. 7
7 Randomized-Blocks Designs
  7.1 Introduction
  7.2 Randomized-Blocks Analysis of Variance
    7.2.1 Fisher's F-Ratio Test Statistic
  7.3 A Permutation Approach to Randomized-Blocks
  7.4 Connections Linking Statistics F and δ
  7.5 Test Statistics F and δ
    7.5.1 Randomized-Blocks and Matched-Pairs
    7.5.2 An Exact Permutation Analysis
    7.5.3 Connections Linking Statistics F and δ
  7.6 An Example with GED Test Scores
    7.6.1 A Monte Carlo Permutation Analysis
  7.7 Measures of Effect Size for Randomized-Blocks
  7.8 An Example with Restaurant Inspection Data
    7.8.1 An Exact Permutation Analysis
    7.8.2 Measures of Effect Size
    7.8.3 Connections Linking Statistics η² and ω̂²
  7.9 Rank-Score Permutation Analyses
    7.9.1 Friedman's Analysis of Variance for Ranks
    7.9.2 A Monte Carlo Permutation Analysis
    7.9.3 Interrelationships Among the Measures
    7.9.4 Connections for the Observed Data
    7.9.5 Probability Values
    7.9.6 Additional Connections and Equivalencies
  Summary
  Preview of Chap. 8
8 Measures of Interval Association
  8.1 Introduction
  8.2 Linear Correlation Analysis
  8.3 A Permutation Approach
  8.4 Connections Linking Statistics rxy and δ
  8.5 An Example Linear Correlation Analysis
    8.5.1 A Regression Analysis
    8.5.2 A Monte Carlo Permutation Analysis
    8.5.3 Connections Linking Statistics ℜ and rxy
    8.5.4 Connections Linking Statistics rxy, t, and F
  8.6 The Point-Biserial Correlation Coefficient
    8.6.1 An Example Analysis for rpb
    8.6.2 Means, Intercepts, and Slopes
    8.6.3 A Probability Value for rpb
    8.6.4 An Alternative Probability Procedure
  8.7 A Rank-Order Correlation Coefficient
    8.7.1 Spearman's Rank Correlation Coefficient
    8.7.2 Connections Linking Statistics rs and δ
    8.7.3 Equivalencies Linking Statistics rs and ℜ
    8.7.4 A Monte Carlo Permutation Analysis
  8.8 Measures of Ordinal–Interval Association
    8.8.1 Jaspen's Index of Ordinal–Interval Association
    8.8.2 A Probability Value for Jaspen's rc
  8.9 Biserial Correlation
    8.9.1 Connections Linking Statistics rpb and rb
    8.9.2 An Example Analysis
  8.10 Pearson's φ² Correlation Coefficient
    8.10.1 Connections Linking Statistics φ² and χ²
  Summary
  Preview of Chap. 9
9 Measures of Ordinal Association I
  9.1 Introduction
  9.2 Spearman's Rank-Order Correlation Coefficient
    9.2.1 A Permutation Perspective
    9.2.2 An Example Analysis
  9.3 Connections Linking Statistics rs, δ, and ℜ
    9.3.1 Connections Linking Statistics ℜ and rs
  9.4 Connections Linking Statistics rs and rxy
  9.5 Kendall's S Test Statistic
    9.5.1 The Calculation of C and D Pairs
  9.6 Kendall and a Recursion Technique
    9.6.1 A Recursion Procedure
  9.7 Mann and a Test of Randomness
  9.8 Connections Linking Statistics rs and S
  9.9 Connections Linking Statistics S and δ
  9.10 Spearman's Footrule Agreement Measure
    9.10.1 The Probability of Spearman's Footrule
    9.10.2 An Example Analysis for R
  9.11 Connections Linking Statistics R and ℜ
  9.12 The Benford Probability Distribution
    9.12.1 A Brief History of the Benford Distribution
    9.12.2 Goodness of Fit for the Benford Distribution
    9.12.3 Nigrini's Goodness-of-Fit Test Statistic
    9.12.4 An Example of Benford's Law
    9.12.5 Connections Linking Statistics δ and R
  9.13 Kendall's τa Measure of Ordinal Association
    9.13.1 An Example Analysis for Kendall's τa
  9.14 Connections Linking Statistics τa and R
    9.14.1 An Example Analysis
  9.15 Connections Linking Statistics τa and γ
  9.16 Connections Linking Statistics τa and δ
    9.16.1 A Monte Carlo Permutation Analysis
  9.17 Connections Linking δ and S
    9.17.1 An Example Analysis
  9.18 Kendall and Babington Smith's u Measure
    9.18.1 An Example Analysis for u
    9.18.2 Connections Linking Statistics u and τa
    9.18.3 A Second Example Analysis for u
  9.19 Kendall's τb Measure of Association
    9.19.1 An Example Analysis for Kendall's τb
    9.19.2 An Exact Permutation Analysis
  Summary
  Preview of Chap. 10
10 Measures of Ordinal Association II
  10.1 Introduction
  10.2 The Analysis of Contingency Tables
  10.3 Kendall's τb Measure of Ordinal Association
    10.3.1 A Monte Carlo Permutation Analysis
    10.3.2 Connections Linking Statistics τb and δ
  10.4 Stuart's τc Measure of Ordinal Association
    10.4.1 A Monte Carlo Permutation Analysis
    10.4.2 Connections Linking Statistics τc and δ
  10.5 Goodman and Kruskal's γ Measure
    10.5.1 A Monte Carlo Permutation Analysis
    10.5.2 Connections Linking Statistics γ and δ
  10.6 Somers' dyx and dxy Measures
    10.6.1 A Monte Carlo Permutation Analysis
    10.6.2 Connections Linking Statistics dyx and δ
    10.6.3 Connections Linking Statistics dxy and δ
    10.6.4 Connections Linking Statistics dyx, dxy, and τb
  10.7 Percentage Differences and dyx and dxy
    10.7.1 Percentage Differences and byx and bxy
  10.8 Wilson's e Measure
    10.8.1 A Monte Carlo Permutation Analysis
    10.8.2 Connections Linking Statistics e and δ
  10.9 Connections Linking Pairwise Measures
    10.9.1 Marginal Frequency Distributions
  10.10 Some Cautions Regarding Ordinal Measures
  10.11 Whitfield's τ Measure
    10.11.1 Connections Linking Statistics S and U
    10.11.2 An Example of Whitfield's Approach
  10.12 Cohen's Weighted κ Measure of Agreement
    10.12.1 An Example Analysis for κw
    10.12.2 Cohen's κw with Linear Weighting
    10.12.3 Cohen's κw with Quadratic Weighting
    10.12.4 Connections Linking Statistics κw and ℜ
    10.12.5 Linear Weighting with v = 1
    10.12.6 Quadratic Weighting with v = 2
    10.12.7 Connections Linking Statistics κw and rxy
  Summary
  Preview of Chap. 11
11 Measures of Nominal Association I
  11.1 Introduction
  11.2 Pearson's Chi-Squared Test of Independence
    11.2.1 An Example Analysis
  11.3 Measures of Effect Size
  11.4 Pearson's φ² Measure of Nominal Association
    11.4.1 Connections Linking Statistics φ² and r²xy
  11.5 Tschuprov's T² Measure of Association
  11.6 Cramér's V Measure of Nominal Association
    11.6.1 Cohen's w Measure of Nominal Association
    11.6.2 Connections Linking Statistics V and w
  11.7 An Exact Permutation Analysis
  11.8 A Second Permutation Analysis
    11.8.1 Measures of Effect Size
    11.8.2 An Exact Permutation Analysis
  11.9 Pearson's C Measure of Nominal Association
    11.9.1 Proper Norming
  11.10 The Maximum Arrangement of Cell Frequencies
  11.11 Goodman–Kruskal's ta and tb Measures
    11.11.1 An Example Analysis for ta
    11.11.2 An Exact Permutation Analysis for ta
    11.11.3 An Example Analysis for tb
    11.11.4 An Exact Permutation Analysis for tb
  11.12 Connections Linking Statistics ta, tb, and χ²
    11.12.1 The First Connections
    11.12.2 The Second Connections
    11.12.3 The Third Connections
    11.12.4 The Fourth Connections
  11.13 Connections Linking Statistics tb, δ, and ℜ
  11.14 McNemar's QM Test for Change
    11.14.1 An Example Analysis for QM
  11.15 Cochran's QC Test for Change
    11.15.1 An Example Analysis for QC
  11.16 Connections Linking Statistics QM and QC
    11.16.1 An Alternative Construction
  11.17 Connections Linking Statistics QM, QC, and δ
    11.17.1 An Example Analysis for QM and QC
    11.17.2 Connections Linking Statistics QC and δ
  11.18 Connections Linking Probability Distributions
    11.18.1 Connections Linking Statistics F and χ²
    11.18.2 Connections Linking Statistics z and χ²
    11.18.3 Connections Linking Statistics z and F
    11.18.4 Connections Linking Statistics z and t
    11.18.5 Connections Linking Statistics t and F
  Summary
  Preview of Chap. 12
12 Measures of Nominal Association II
  12.1 Introduction
  12.2 Cohen's Unweighted κ Measure of Agreement
    12.2.1 An Example Analysis for κ
    12.2.2 Equivalencies Linking Statistics κ and ℜ
  12.3 Cohen's Weighted κ Measure of Agreement
    12.3.1 The Measurement of Agreement
    12.3.2 An Example Analysis for κw
  12.4 Connections Linking Statistics κw and ℜ
    12.4.1 Everitt's Exact Variance
    12.4.2 Linear Weighting with v = 1
    12.4.3 Quadratic Weighting with v = 2
    12.4.4 Linear and Quadratic Weighting Compared
    12.4.5 Connections Linking κw, rxy, rI, and ℜ
    12.4.6 The Advantages of Linear Weighting
    12.4.7 Embedded 2 × 2 Tables
    12.4.8 An Example Analysis
    12.4.9 Connections Linking Statistics κ and κw
  12.5 Connections Linking Statistics χ² and r²xy
  12.6 Connections Linking χ² and r²xy for r × c Tables
  12.7 An Example Orthonormalization Analysis
    12.7.1 Orthonormal Row Weights
    12.7.2 Orthonormal Column Weights
    12.7.3 Connections Linking Statistics rxy and χ²
  12.8 An Analysis with Shadow Tables
  12.9 An Orthonormalization Summary
  12.10 Leik and Gove's dNc Statistic
    12.10.1 The Observed Contingency Table
    12.10.2 The Expected Contingency Table
    12.10.3 The Maximized Contingency Table
    12.10.4 The Calculation of Leik and Gove's dNc
    12.10.5 A Permutation Test for dNc
  12.11 Agresti's δ̂ Statistic
    12.11.1 An Example Analysis for δ̂
  12.12 Freeman's θ Statistic
    12.12.1 An Example Analysis for θ
  12.13 Somers' dyx Test Statistic
    12.13.1 An Example Analysis for dyx
  12.14 Equivalencies Linking Statistics δ̂, θ, and dyx
    12.14.1 Agresti's δ̂ Statistic
    12.14.2 Freeman's θ Statistic
    12.14.3 Somers' dyx Statistic
  Summary
  Preview of Chap. 13
13 Measures of Fourfold Association I
  13.1 Introduction
  13.2 Fourfold Contingency Tables
  13.3 Pearson's φ Measure of Association
    13.3.1 Connections Linking Statistics φ² and r²xy
    13.3.2 Connections Linking Statistics φ² and χ²
  13.4 Pearson's C Measure of Association
    13.4.1 Connections Linking Statistics C and φ²
  13.5 Pearson's rtet Measure of Correlation
    13.5.1 An Example Analysis for rtet
  13.6 Yule's Q Measure of Association
    13.6.1 An Example Analysis for Q
    13.6.2 An Exact Permutation Analysis for Yule's Q
  13.7 Goodman and Kruskal's γ Measure of Association
    13.7.1 Connections Linking Statistics Q and γ
  13.8 Yule's Y Measure of Association
    13.8.1 An Example Analysis for Y
    13.8.2 Connections Linking Statistics Q and Y
    13.8.3 Connections Linking Statistics rtet and Y
  13.9 The Odds Ratio
    13.9.1 An Example Analysis for the Odds Ratio
    13.9.2 Odds Ratios Explained
    13.9.3 Connections Linking Statistics P and ϕ
    13.9.4 An Exact Permutation Analysis
    13.9.5 Connections Linking Statistics ϕ and Q
    13.9.6 Connections Linking Statistics ϕ and Y
    13.9.7 Connections Linking Statistics ϕ and γ
    13.9.8 Connections Linking Statistics ϕ and δ
    13.9.9 Connections Linking Statistics Q and δ
    13.9.10 Some Additional Relationships
  13.10 Goodman–Kruskal's ta and tb Measures
    13.10.1 An Example Analysis for ta
    13.10.2 Connections Linking Statistics ta and δ
    13.10.3 An Example Analysis for tb
    13.10.4 Connections Linking Statistics tb and δ
    13.10.5 Connections Linking Statistics ta, tb, and χ²
  13.11 Somers' dyx and dxy Measures
    13.11.1 An Example Analysis for dyx
    13.11.2 An Example Analysis for dxy
  13.12 Percentage Differences
    13.12.1 Percentage Difference for Variable y
    13.12.2 Percentage Difference for Variable x
  13.13 Kendall's τb Measure of Association
    13.13.1 Connections Linking Statistics τb, dyx, and dxy
    13.13.2 Connections Linking Statistics τb and rxy
    13.13.3 An Example Analysis for τb and r²xy
  13.14 Pearson's Correlation Coefficient
  13.15 Unstandardized Regression Coefficients
  13.16 Pearson's φ² and Cohen's κ Measures
    13.16.1 An Example Analysis
  13.17 Interconnections Among the Measures
  Summary
  Preview of Chap. 14
14 Measures of Fourfold Association II
  14.1 Introduction
  14.2 Symmetric Fourfold Tables
    14.2.1 Connections Linking φ², T², and V²
  14.3 Pearson's rxy Correlation Coefficient
  14.4 Regression Coefficients byx and bxy
  14.5 Leik and Gove's dNc Statistic
    14.5.1 Observed Cell Frequency Values
    14.5.2 Expected Cell Frequency Values
    14.5.3 Maximized Cell Frequency Values
    14.5.4 Leik and Gove's Measure
  14.6 Goodman and Kruskal's ta and tb Statistics
  14.7 Kendall's τb Statistic
  14.8 Stuart's τc Statistic
  14.9 Somers' dyx and dxy Statistics
  14.10 Percentage Differences
  14.11 Yule's Y Statistic
  14.12 Cohen's κ Statistic
  14.13 Goodman and Kruskal's λa and λb Measures
    14.13.1 Problems with λa and λb
    14.13.2 Example λa and λb Analyses
  14.14 The Interconnections Among the Measures
  14.15 Equivalencies Linking κ and Other Measures
    14.15.1 Equivalencies Linking Statistics κ and κw
    14.15.2 Equivalencies Linking Statistics κ and rxy
    14.15.3 Equivalencies Linking Statistics κ and rI
    14.15.4 Equivalencies Linking Statistics κ and dyx
    14.15.5 Equivalencies Linking Statistics κ and dxy
    14.15.6 Equivalencies Linking Statistics κ and τb
    14.15.7 Equivalencies Linking Statistics κ and τc
    14.15.8 Equivalencies Linking Statistics κ and y
    14.15.9 Equivalencies Linking Statistics κ and x
    14.15.10 Equivalencies Linking Statistics κ and byx
    14.15.11 Equivalencies Linking Statistics κ and bxy
    14.15.12 Equivalencies Linking Statistics κ and λx
    14.15.13 Equivalencies Linking Statistics κ and λy
    14.15.14 Equivalencies Linking Statistics κ and Y
    14.15.15 Equivalencies Linking Statistics κ and φ
    14.15.16 Equivalencies Linking Statistics κ and T
    14.15.17 Equivalencies Linking Statistics κ and V
    14.15.18 Equivalencies Linking Statistics κ and ℜ
  14.16 Connections Linking κ to Other Measures
    14.16.1 Connections Linking Statistics κ and tx
    14.16.2 Connections Linking Statistics κ and ty
    14.16.3 Connections Linking Statistics κ and ϕ
    14.16.4 Connections Linking Statistics κ and Q
    14.16.5 Connections Linking Statistics κ and γ
    14.16.6 Connections Linking Statistics κ and dNc
    14.16.7 Connections Linking Statistics κ and δ
  Summary

Epilogue
References
Author Index
Subject Index
Chapter 1
Introduction
For many years the authors have been struck by the connections and relationships among seemingly disparate statistical tests and measures, many of which have been developed by multiple researchers in different disciplines, not realizing that the test or measure might exist elsewhere. Some tests and measures are connected or related only under certain conditions and some tests and measures were developed for ostensibly unrelated purposes, but later were found to be related.
While many of the connections, equivalencies, and relationships described in the book are, to the authors’ knowledge, new, none of the documented connections would have been possible without the contributions of countless statisticians, mathematicians, and quantitative researchers who preceded the authors and to whom the authors are deeply indebted. As two historic figures described the obligation:
If I have seen further, it is by standing on the shoulders of giants. (Isaac Newton in a letter to Robert Hooke, 1675)
And
We are like dwarfs on the shoulders of giants, so that we can see more than they, and things at a greater distance, not by virtue of any sharpness of sight on our part, or any physical distinction, but because we are carried high and raised up by their giant size. (Bernard of Chartres, as recorded by John of Salisbury in the Metalogicon, 1159)
The primary purpose of this book is to introduce, document, and illustrate connections, equivalencies, and relationships between and among a wide variety of statistical tests and measures. Many of the connections are well known to statisticians and teachers of statistics, such as the connections linking Pearson's chi-squared test for independence for 2 × 2 contingency tables and Pearson's product-moment correlation coefficient, or the connections linking Student's t-test for two independent samples and Fisher's one-way completely randomized analysis of variance. Other connections are less well known, such as the connections linking Kendall's S measure of pairwise ordinal association and Mann and Whitney's U two-sample rank-sum test, or the connections linking the odds ratio and Yule's Q measure of contingency. And some connections are much more obscure and will be known only to a few statisticians and quantitative researchers, such as the connections linking percentage differences and the associated unstandardized slopes of regression lines, or the connections linking Pearson's chi-squared test for independence for r × c contingency tables, Pearson's squared product-moment correlation coefficient, and Cramér's V² measure of nominal association.
Although the authors have taught a variety of different courses in conventional statistics under the Neyman–Pearson population model, their major research focus has been on permutation statistical methods, under the Fisher–Pitman permutation model. Consequently, some connections described in the book link two or more conventional tests or measures, some link two or more permutation tests or measures, and some link conventional tests or measures with associated permutation tests or measures. Because the average reader will most likely be unfamiliar with permutation statistical methods, Chap. 2 is devoted to an overview of permutation statistical methods and the relationships between permutation and conventional parametric statistical methods.
Finally, the book is oriented toward those interested readers who possess a basic background in parametric and nonparametric statistical methods and are interested in the connections and relationships among a variety of statistical tests and measures. No background in higher mathematics is assumed and, to the extent possible, all connections are demonstrated with examples and without formal derivations or proofs.1
1.1 Overviews of Chapters
In this section a brief introduction to each chapter is provided. The order of the chapters follows the order in most introductory books on statistics with an introductory chapter followed by chapters on one-sample tests, two-sample tests, matched-pair tests, completely randomized designs, randomized-block designs, correlation and association, measures of ordinal association, measures of nominal association, and final chapters on fourfold contingency tables. There are three exceptions to the conventional organization in this book. The first is the absence of a chapter on central tendency and variability. The authors simply did not uncover any interesting connections relating to central tendency or variability. Because some readers will be unfamiliar with permutation statistical methods, the second exception is the inclusion of Chap. 2 where permutation statistical methods are introduced and compared with conventional statistical methods. The third exception is the addition of two chapters on fourfold contingency tables. There are so many connections between and among measures of categorical association that the discussion of measures of nominal association is divided into four chapters: two chapters on r × c contingency tables and two chapters devoted to 2 × 2 contingency tables.

1 With apologies, one section of Chap. 12 assumes some knowledge of matrix algebra.
Finally, because this book is about connections between and among statistical tests and measures, the division of measures of correlation and association into seven discrete chapters is not always clean. For example, Spearman's rank-order correlation coefficient properly belongs in Chap. 9 on the analysis of ordinal-level variables, but because it is intimately connected to Pearson's product-moment correlation coefficient, it is also included in Chap. 8 on the analysis of interval-level variables. Also, Jaspen's correlation coefficient for one interval-level variable and one ordinal-level variable most properly belongs in Chap. 9, but since Jaspen's correlation coefficient is simply a Pearson product-moment correlation between an interval-level variable and a transformed ordinal-level variable, it is also included in Chap. 8.
As noted, in a few cases the authors have chosen to describe measures in more than one chapter. This is done for two reasons: first, to make the connections linking different measures without forcing the reader to go back and examine either a preceding chapter or a later chapter and second, in the age of electronic books, oftentimes only a single chapter is downloaded from a library or other source. Thus, to the extent possible, each chapter is designed to be complete and self-contained. The result is a small amount of redundancy among chapters.
1.1.1 Statistical Methods
Chapter 2 presents two models of statistical inference: the population model and the permutation model. Most introductory textbooks in statistics and statistical methods present only the Neyman–Pearson population model of statistical inference.2 However, there are a few textbooks on conventional statistical methods that include or have included a section or chapter on permutation methods, for example, Howell [92, pp. 635–662] and May, Masson, and Hunter [143, pp. 219–229]. The Neyman–Pearson population model is named from two contemporaries, Jerzy Neyman (1894–1981)3 and Egon Pearson (1895–1980),4 and was designed to make inferences about population parameters and provide approximate probability values under the appropriate null hypothesis. The Neyman–Pearson population model is characterized by the concepts of null (H0) and alternative (H1) hypotheses, Type I (α) and Type II (β) errors, power (1 − β), and the assumptions of independence, random sampling, normally distributed populations, homogeneity of variance, and homogeneity of covariance, where appropriate.

2 It should be noted that the Neyman–Pearson population model of statistical inference is usually not accurately presented in introductory texts [95].
3 Jerzy Spława-Neyman (1894–1981) was a Polish mathematician and statistician who emigrated to the United Kingdom in 1925 and then to the United States in 1938. He was the founder and first Chair of the Department of Statistics at the University of California at Berkeley.
4 Egon Pearson (1895–1980) was the son of Karl Pearson (1857–1936) and an important and influential statistician in his own right. His father has often been credited with establishing the discipline of mathematical statistics.
While the Neyman–Pearson population model will be familiar to most readers and needs little or no introduction, the Fisher–Pitman permutation model of statistical inference is less likely to be familiar to most readers. The Fisher–Pitman permutation model is named from R.A. Fisher (1890–1962)5 and E.J.G. Pitman (1897–1993).6 In contrast to conventional statistical tests based on the Neyman– Pearson population model, tests based on the Fisher–Pitman permutation model are distribution-free, entirely data-dependent, appropriate for nonrandom samples, provide exact probability values, and are ideal for small datasets.
1.1.2 One-Sample Tests
Chapter 3 presents connections linked to tests of a single sample. In many introductory textbooks, two one-sample tests are described: a one-sample z-test and a one-sample t-test. The distinction between the two tests is based on the well-known relationship between Student's t-distribution and the unit-normal z-distribution: Student's t-distribution approaches the z-distribution as degrees of freedom (df) → ∞. The difference between a one-sample z-test and a one-sample t-test is based on knowledge of the population standard deviation, σx. If σx is known, then the sampling distribution of sample means is approximated by the z-distribution, but if σx is not known and is estimated by the sample standard deviation, sx, then the sampling distribution of sample means is approximated by Student's t-distribution.
In this book on statistical connections, equivalencies, and relationships, one-sample z-tests are not presented as the population standard deviation is generally not known for statistical data pertaining to experiments and surveys. The one-sample z-test finds its application, most generally, in testing and measurement, where σx is usually known or otherwise established, such as the IQ, ACT, SAT, GRE, and other standardized tests and measures.
The chapter describes connections, equivalencies, and relationships relating to one-sample tests of null hypotheses. First, Student's conventional one-sample t-test is described. Second, a permutation one-sample test is presented and the connections linking the two tests are established. An example analysis illustrates the differences in the two approaches and the connections linking the two tests. Third, measures of effect size for one-sample tests are presented for both Student's one-sample t-test and the permutation one-sample test, and the connections linking the various measures are described. Fourth, Wilcoxon's nonparametric one-sample signed-rank test is introduced for rank-score data and illustrated with an example analysis. A permutation alternative to Wilcoxon's test is described and the connections linking the two tests are established. Finally, the connections linking a conventional one-sample z-test for proportions and Pearson's chi-squared goodness-of-fit test are documented and illustrated with an example analysis.

5 Ronald Aylmer Fisher (1890–1962) was for many years the Director of the Statistical Laboratory at the Rothamsted Experimental Station. Later, Fisher was the Galton Professor of Eugenics at University College, London, and then the Arthur Balfour Chair of Genetics at the University of Cambridge.
6 Edwin James George Pitman (1897–1993) was a mathematician and statistician at the University of Tasmania. Pitman produced three formative articles on permutation statistical methods in 1937 and 1938.
1.1.3 Two-Sample Tests
Chapter 4 presents connections linked to tests of two independent samples. Statistical tests for differences between two samples are of two basic varieties.7 The first variety examines differences between datasets with values obtained from two independent samples of subjects. For example, a study might seek to compare grades in an introductory course in psychology for majors and non-majors, for transfer and non-transfer students, or for juniors and seniors. The second variety of two-sample tests examines differences between datasets with values obtained on the same or matched subjects. For example, a study might compare the same subjects at two different time periods, such as before and after an intervention or program change, or matched subjects on two different formats for academic instruction. Connections related to tests for two independent samples are presented in this chapter. Connections related to tests for two matched samples are presented in Chap. 5.
The chapter describes connections, equivalencies, and relationships relating to two-sample tests of null hypotheses. First, Student's conventional two-sample t-test is described. Second, a permutation two-sample test is presented and the connections linking the two tests are established. An example analysis illustrates the differences in the two approaches and the connections linking the two methods. Third, measures of effect size for two-sample tests are presented for both Student's two-sample t-test and the permutation two-sample test and the connections linking the measures are described and illustrated. Fourth, the Wilcoxon–Mann–Whitney two-sample rank-sum test is presented with a permutation alternative for rank-score data and the connections linking the two tests are described and illustrated. Multiple connections linking the Wilcoxon–Mann–Whitney two-sample rank-sum test to other tests and measures are described. Finally, the connections linking a conventional two-sample z-test for proportions and Pearson's chi-square test of independence are described and illustrated with an example analysis.
7 There are, of course, a multitude of two-sample tests, but these are the two basic types from which most other two-sample tests are constructed.
1.1.4 Matched-Pair Tests
Chapter 5 presents connections linked to tests of two matched samples. Two-sample tests of experimental differences between matched samples constitute the backbone of research in such diverse fields as horticulture, biology, experimental psychology, education, animal husbandry, and agriculture, where breeding, cloning, embryo transplants, gene splicing, and other forms of genetic engineering produce closely matched subjects. A matched-pair study might examine the difference between two sets of related subjects, such as identical twin studies, or between the same subjects, before and after an intervention, experimental treatment, program initiation, or program change.
The chapter describes connections, equivalencies, and relationships relating to matched-pair tests of null hypotheses. First, Student's conventional matched-pair t-test is described. Second, a permutation matched-pair test is presented and the connections linking the two matched-pair tests are described. An example analysis illustrates the differences in the two approaches and the connections linking the two tests. Third, measures of effect size for matched-pair tests are presented for both Student's matched-pair t-test and the permutation matched-pair test and the connections linking the various measures are described and illustrated. Fourth, Wilcoxon's nonparametric signed-rank test is introduced and illustrated with an example analysis. A permutation alternative to Wilcoxon's test is described and the connections linking the two tests are established. Finally, the connections linking a conventional matched-pair z-test for proportions and Pearson's chi-squared test of goodness of fit are described and illustrated with an example analysis.
1.1.5 Completely Randomized Designs
Chapter 6 presents connections linked to one-way completely randomized analysis-of-variance designs. There are two major types of multi-sample tests: tests for experimental differences among three or more independent samples (completely randomized designs) and tests for experimental differences among three or more dependent samples (randomized-block designs).8 Connections relating to statistical methods for randomized-block designs are presented in Chap. 7. Connections relating to statistical methods for completely randomized designs are presented in this chapter.
The chapter describes connections, equivalencies, and relationships relating to multi-sample tests of null hypotheses. First, Fisher's conventional one-way completely randomized analysis of variance is described. Second, a multi-sample permutation test is presented and the connections linking the two tests are established. An example analysis illustrates the differences in the two approaches and the connections linking the two tests. Third, measures of effect size for multiple independent samples are described and the interconnections among the measures of effect size are detailed. Fourth, the connections linking the analysis of variance and the intraclass correlation coefficient are described and illustrated. Fifth, the Kruskal–Wallis g-sample rank-sum test is described and illustrated with a small rank-score dataset. A permutation alternative multi-sample rank-sum test is introduced and the connections linking the Kruskal–Wallis H-test statistic and permutation multi-sample test statistic δ are described and illustrated with an example analysis.

8 There are, of course, a multitude of multi-sample tests, but these are the two basic types from which most other multi-sample tests are constructed.
1.1.6 Randomized-Block Designs
Chapter 7 presents connections linked to one-way randomized-block analysis-of-variance designs. As explained in the overview of Chap. 6, there are two types of multi-sample tests: tests for experimental differences among three or more independent samples (completely randomized designs) and tests for experimental differences among three or more dependent samples (randomized-block designs). Connections relating to statistical methods for completely randomized designs are presented in Chap. 6. Connections relating to statistical methods for randomized-block designs are presented in this chapter.
The chapter describes connections, relationships, and equivalencies relating to one-way randomized-block analysis-of-variance designs. First, Fisher's conventional one-way randomized-block analysis of variance is described. Second, a permutation test is presented for randomized-block data and the connections linking the two approaches are established. An example analysis illustrates the differences in the two approaches and the connections linking the two tests. Third, measures of effect size for multiple related samples are described and the interconnections linking the various measures of effect size are detailed and illustrated. Fourth, Friedman's two-way analysis of variance for ranks is described and illustrated with a small rank-score dataset. Finally, a permutation multi-sample rank-sum test is introduced and the connections linking Friedman's test statistic and the permutation randomized-block test statistic are described and illustrated with an example analysis.
1.1.7 Measures of Interval Association
Chapter 8 presents connections linked to measures of association between interval-level variables, commonly called measures of correlation. Bivariate measures of linear correlation and regression constitute the backbone of research in such diverse fields as biology, education, psychology, sociology, clinical studies, and genetics and underpin more advanced statistical procedures such as multiple linear regression, logistic regression, path analysis, and hierarchical linear modeling.
The chapter describes connections, equivalencies, and relationships relating to bivariate linear correlation and association. First, Pearson’s product-moment correlation coefficient is described. Second, a permutation alternative to Pearson’s correlation coefficient is presented for bivariate data and the connections linking the two measures are described. Third, an example analysis illustrates the differences and similarities of the two measures and asymptotic and Monte Carlo probability values are calculated and compared. Fourth, the point-biserial correlation coefficient is described, an example analysis illustrates the point-biserial correlation coefficient, and a Monte Carlo probability value for the point-biserial correlation coefficient is generated. The connections linking the point-biserial correlation coefficient and Pearson’s product-moment correlation coefficient are established.
Fifth, the connections linking Spearman’s rank-order correlation coefficient and Pearson’s product-moment correlation coefficient are detailed and illustrated with an example analysis. Sixth, Jaspen’s multi-serial correlation coefficient for one ordinal-level variable and one interval-level variable is described and the connections linking Jaspen’s coefficient and Pearson’s product-moment correlation coefficient are established. Finally, the biserial correlation coefficient is described and the connections linking the biserial correlation coefficient, the point-biserial correlation coefficient, Jaspen’s multi-serial correlation coefficient, and Pearson’s product-moment correlation coefficient are documented and illustrated with an example analysis.
1.1.8 Measures of Ordinal Association I
Chapter 9 is the first of two chapters presenting connections linked to measures of association between ordinal-level (rank) variables. It is common for researchers to analyze data consisting of simple rank scores. Ordinal measurements, or rank scores, generally arise in two different ways. First, oftentimes the data gathered do not conform to one or more of the assumptions underlying classical parametric measures, such as normality or homogeneity. Nonparametric measures of ordinal association, such as discussed in this chapter, generally have fewer and less stringent assumptions. Thus, the raw data may be converted to ranked data. Second, some data are gathered as ordinal data; for example, respondents may be asked to rank entities such as their favorite airlines to fly, their preferred chain of hotels in which to stay, their favorite restaurants at which to dine, or a list of political candidates (ranked-choice voting).
The chapter describes connections, equivalencies, and relationships relating to measures of ordinal association. First, Spearman's rank-order correlation coefficient is presented and the connections linking Spearman's rank-order correlation coefficient and Pearson's product-moment correlation coefficient are described. Next, the connections linking Spearman's rank-order correlation coefficient and Mielke and Berry's ℜ chance-corrected measure are documented. Second, Kendall's S-test statistic is described and illustrated with an example analysis. Third, Mann's test of randomness is described and the connections linking Mann's test with Kendall's S-test statistic are described. Fourth, the connections linking Kendall's S-test statistic and Spearman's rank-order correlation coefficient are described and illustrated. Next, the connections linking Kendall's S and permutation test statistic δ are documented. Fifth, Spearman's footrule measure of ordinal association is presented and the connections linking Spearman's footrule with permutation test statistic δ are demonstrated with an example analysis. Sixth, the Benford probability distribution and Nigrini's goodness-of-fit measure are introduced and the connections with Spearman's footrule measure are demonstrated. Seventh, Kendall's τa measure is described and illustrated with an example analysis. Eighth, Kendall and Babington Smith's u measure of ordinal association is described and the connections linking Kendall's τa and Babington Smith's u are described. Finally, Kendall's τb measure of ordinal association is described and illustrated with an example analysis.
1.1.9 Measures of Ordinal Association II
Chapter 10 is the second of two chapters presenting connections linked to measures of association between ordinal-level (rank) variables. In contrast to Chap. 9, Chap. 10 examines connections and relationships among measures of ordinal association as applied to contingency tables.
The chapter describes connections, equivalencies, and relationships relating to measures of ordinal association. First, Kendall's τa and τb measures of ordinal association are described and the connections linking Kendall's τa and τb to test statistic δ are defined. Second, Stuart's τc measure of ordinal association is described and the connections linking Stuart's τc and test statistic δ are documented. An example illustrates the connections among S, τa, τb, τc, and δ. Third, Goodman and Kruskal's γ measure of ordinal association is presented and the connections linking Goodman and Kruskal's γ and permutation test statistic δ are detailed. Fourth, Somers' dyx and dxy measures of ordinal association are described and the connections linking Somers' dyx and dxy and test statistic δ are established and illustrated. Fifth, Wilson's e measure of ordinal association is described and the connections linking Wilson's e and permutation test statistic δ are delineated. Finally, the connections linking Kendall's S measure of ordinal association and Mann and Whitney's U two-sample rank-sum statistic are described and illustrated with an example analysis.
1.1.10 Measures of Nominal Association I
Chapter 11 is the first of two chapters presenting connections linked to measures of association between nominal-level (categorical) variables. It is common for researchers to analyze data consisting of simple count scores. Nominal-level measurements, or categorical scores, generally arise in two different ways. First, oftentimes data gathered do not conform to one or more of the assumptions underlying classical parametric measures, such as normality or homogeneity. Nonparametric measures of nominal association, such as discussed in this chapter, generally have fewer and less stringent assumptions. Second, oftentimes data are simply gathered as nominal-level data. The most common occurrence is the gathering of simple count data such as the number of COVID-19 cases recorded each week for a given city or county, the number of children attending school on a given day, the number of felons incarcerated for a particular offense, or the number of times a speaker says “well, like, you know.”
The chapter describes connections, equivalencies, and relationships relating to measures of nominal association. First, Pearson's chi-squared test of independence is described and illustrated with an example analysis. Second, Pearson's φ², Pearson's C, Tschuprov's T², and Cramér's V² measures of nominal association are described and the connections linking the measures to Pearson's χ² are established and illustrated. Third, the connections linking Goodman and Kruskal's ta and tb measures with Pearson's χ²-test of independence and permutation test statistic δ are established. Finally, McNemar's QM and Cochran's QC measures of change are described and the connections linking QM, QC, and permutation test statistic δ are established and illustrated with an example analysis.
1.1.11 Measures of Nominal Association II
Chapter 12 is the second of two chapters presenting connections linked to measures of association between nominal-level (categorical) variables. First, Cohen's unweighted kappa (κ) and weighted kappa (κw) measures are described and the connections linking κ, κw, and the permutation-based measure of chance-corrected agreement, ℜ, are established. Second, the connections linking Pearson's χ² test of independence and Pearson's r²xy squared product-moment correlation coefficient are described for r × c contingency tables. Third, Leik and Gove's dNc measure of nominal association, Agresti's δ̂ measure of nominal association, Freeman's θ measure of nominal association, and Somers' dyx and dxy measures of ordinal association are presented. Finally, the connections linking Leik and Gove's dNc, Agresti's δ̂, Freeman's θ, and Somers' dyx and dxy are described and illustrated with an example analysis.
1.1.12 Measures of Fourfold Association I
Chapter 13 is the first of two chapters presenting connections linked to the analysis of fourfold contingency tables. The statistical analysis of fourfold contingency tables is so prevalent in the current research literature, in such a wide variety of disciplines, that Chap. 13 is devoted entirely to connections relating to 2 × 2 contingency tables. While 2 × 2 tables appear deceptively simple and uncomplicated at first consideration, the analysis of 2 × 2 tables is fraught with controversy and has been for over a century.
The chapter describes connections, equivalencies, and relationships relating to fourfold contingency tables. First, Pearson's mean-square contingency coefficient, φ², and tetrachoric correlation coefficient, rtet, are described. Second, Yule's Q and Y measures are described and the connections linking Q and Y are established. Third, the odds ratio is described and the connections linking the odds ratio, Yule's Q, and Yule's Y are detailed. Fourth, Goodman and Kruskal's ta and tb asymmetric measures of association are presented and the connections linking ta, tb, χ², Pearson's φ², and Pearson's r²xy are described and illustrated. Fifth, Somers' dyx and dxy asymmetric measures are described, the connections linking dyx and dxy with simple percentage differences, Δy and Δx, are established, and the connections with the corresponding unstandardized sample regression coefficients, byx and bxy, are detailed. Sixth, Kendall's τb measure of association is described and the connections among τb, χ², Pearson's φ, and Pearson's rxy are delineated. Finally, the connections linking Pearson's φ² and Cohen's κ are documented and illustrated with an example analysis.
1.1.13 Measures of Fourfold Association II
Chapter 14 is the second of two chapters presenting connections linked to the analysis of fourfold contingency tables. The previous chapter on fourfold association described connections, equivalencies, and relationships relating to fourfold contingency tables in general. Chapter 14 describes connections, equivalencies, and relationships relating to symmetric fourfold contingency tables. A symmetric fourfold contingency table is a 2 × 2 contingency table in which N is even and each marginal frequency total is equal to N/2.
The chapter describes connections, equivalencies, and relationships relating to symmetric fourfold contingency tables. First, the connections linking Pearson's χ², Pearson's φ², Tschuprov's T², and Cramér's V² are described and illustrated. Second, the connections linking Pearson's rxy product-moment correlation coefficient, regression coefficients byx and bxy, Leik and Gove's dNc measure of nominal association, and Goodman and Kruskal's ty and tx asymmetric measures of nominal association are described. Third, the connections linking Kendall's τb measure of ordinal association, Stuart's τc measure of ordinal association, percentage differences Δy and Δx, and Somers' dyx and dxy asymmetric measures of ordinal association are described and illustrated. Fourth, the connections linking Yule's Y measure of nominal association, Cohen's unweighted κ measure of interrater agreement, and Goodman and Kruskal's λy and λx measures of nominal association are described. Finally, the connections linking Cohen's unweighted κ measure with a variety of other measures are described and illustrated with an example analysis.
Preview of Chap. 2
Chapter 2 provides comprehensive introductions to two models of statistical inference: the population model first put forward by Jerzy Neyman and Egon Pearson in 1928 and the permutation model developed by R.A. Fisher, R.C. Geary, T. Eden, F. Yates, H. Hotelling, M.R. Pabst, and E.J.G. Pitman in the 1920s and 1930s. Exact and Monte Carlo permutation statistical methods are described in Chap. 2 and compared with conventional parametric tests, and the assumptions of independence, random sampling, normality, and homogeneity of variance are examined for a variety of data sources.
Chapter 2
Statistical Methods
This chapter sets the background for the connections, equivalencies, and relationships between and among a wide variety of statistical tests and measures. Many of the documented connections are between conventional parametric and nonparametric tests and measures, other connections are between permutation tests and measures, and still other connections are between conventional and permutation tests and measures. While the reader is assumed to be familiar with most of the conventional tests and measures presented in the various chapters, permutation tests and measures may be less familiar. To this end, the chapter provides a brief overview of permutation statistical methods, comparisons with conventional tests and measures, and illustrative examples of the two approaches to statistical methods and delineates some of the advantages and disadvantages of permutation statistical methods when compared with conventional statistical methods.
Because of the many statistical tests and measures presented in the book, it may be helpful for the reader to have access to computer programs designed for many of the tests. The reader is referred to Permutation Statistical Methods with R by Berry, Kvamme, Johnston, and Mielke [17] containing 70 R scripts for many of the tests and measures described in the book, both conventional and permutation. R is an open-source computer language originally designed for statistical calculation and graphical display by Ross Ihaka and Robert Gentleman. It may be downloaded free of charge from https://cran.r-project.org/. The authors also recommend downloading RStudio, a free companion program containing an editor and other applications, available at https://rstudio.com/products/rstudio/.
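For readers new to R, the following two lines are a minimal sanity check of a fresh installation. This is an illustrative sketch, not one of the companion scripts in [17]; the second line simply previews the kind of normal-distribution calculation that appears later in this chapter.

R.version.string      # prints the installed R version
2 * pnorm(-2.0831)    # a two-sided normal probability of the kind used in Sect. 2.2.2: 0.0372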
2.1 Introduction
The chapter is organized as follows. First, the concepts of connections, equivalencies, and relationships, given in the subtitle of the book, are defined and illustrated with example analyses. Second, the Neyman–Pearson population model of statistical inference is summarized, which is the standard model taught in all introductory courses on statistical methods. Third, the assumptions underlying the population model of independence, random sampling, normality, homogeneity of variance, and homogeneity of covariance are examined. Fourth, a brief historical perspective on the Fisher–Pitman permutation model of statistical inference is presented, with which most readers are assumed to be unfamiliar. Fifth, the two models of statistical inference are compared. Sixth, the differences between the Neyman–Pearson population model and the Fisher–Pitman permutation model are illustrated with an analysis of a small dataset. Finally, exact and Monte Carlo permutation methods are further described, compared, and illustrated utilizing a historical dataset.
2.2 Definitions
The subtitle of the book is Connections, Equivalencies, and Relationships. The meanings of these three terms and their differences are important in understanding the many examples and illustrations presented in the book. In this section the three concepts are defined and illustrated with example analyses.
2.2.1 Connections
Two statistical tests or measures are considered connected when one test or measure can be defined in terms of the second, and vice versa, but the two yield different results. An example is the connections linking the biserial and point-biserial correlation coefficients.
Both the biserial and point-biserial correlation coefficients calculate the correlation between a dichotomous variable and a continuous interval-level variable, for example, the correlation between juniors and seniors and grade-point average. The difference lies in the definition of dichotomous. For the point-biserial correlation coefficient, the dichotomy is naturally occurring, such as true/false or pro/con. For the biserial correlation coefficient, the dichotomy is arbitrary, such as dividing a continuous variable into two pieces, for example, dividing IQ into under 100 and over 100 or dividing height into under 5’10” and over 5’10”.
To illustrate the statistical connections linking the biserial and point-biserial correlation coefficients, consider the bivariate correlation data listed in Table 2.1 with dummy (0, 1) coded variable x and continuous interval-level variable y.

Table 2.1 Example dummy (0, 1) coded data for the biserial and point-biserial correlation coefficients

Score   x    y
1       0   10
2       0   20
3       0   30
4       1   20
5       1   30
6       1   40
7       1   40
8       1   50
The Biserial Correlation Coefficient

The biserial correlation coefficient is given by

rb = (ȳ0 − ȳ1) n0 n1 / (N² u Sy) ,   (2.1)

where n0 denotes the number of scores coded 0, n1 denotes the number of scores coded 1, N denotes the total number of scores, ȳ0 denotes the sample mean of the y scores coded 0, ȳ1 denotes the sample mean of the y scores coded 1, Sy denotes the sample standard deviation of the y scores coded 0 and 1, and u denotes the ordinate of the normal distribution at p = n0/N.

For the bivariate correlation data listed in Table 2.1, the number of scores coded 0 is n0 = 3, the number of scores coded 1 is n1 = 5, the sample mean of the y scores coded 0 is

ȳ0 = (1/n0) Σ_{i=1}^{n0} y0i = (10 + 20 + 30)/3 = 20.00 ,   (2.2)

the sample mean of the y scores coded 1 is

ȳ1 = (1/n1) Σ_{i=1}^{n1} y1i = (20 + 30 + 40 + 40 + 50)/5 = 36.00 ,   (2.3)

the sample mean of the y scores coded 0 and 1 is

ȳ = (1/N) Σ_{i=1}^{N} yi = (10 + 20 + 30 + 20 + 30 + 40 + 40 + 50)/8 = 30.00 ,   (2.4)

the sample standard deviation of the y scores coded 0 and 1 is1

Sy = [ (1/N) Σ_{i=1}^{N} (yi − ȳ)² ]^{1/2} = { [ (10 − 30.00)² + (20 − 30.00)² + ··· + (50 − 30.00)² ] / 8 }^{1/2} = 12.2474 ,   (2.5)

the standard score that defines the lower p = n0/N = 3/8 = 0.3750 of the unit-normal distribution is z = −1.1503, the ordinate of the standard unit-normal distribution at z = −1.1503 is

u = exp(−z²/2) / √(2π) = exp[−(−1.1503)²/2] / √[(2)(3.1416)] = 0.7731 ,   (2.6)

and the biserial correlation coefficient is

rb = (ȳ0 − ȳ1) n0 n1 / (N² u Sy) = (20.00 − 36.00)(3)(5) / [(8²)(0.7731)(12.2474)] = −0.3961 .   (2.7)
The Point-Biserial Correlation Coefficient

The point-biserial correlation coefficient is given by

rpb = (ȳ0 − ȳ1) √(pq) / Sy ,   (2.8)

where ȳ0 denotes the sample mean of the y scores coded 0, ȳ1 denotes the sample mean of the y scores coded 1, Sy denotes the sample standard deviation of the y scores coded 0 and 1, p denotes the proportion of y scores coded 0, and q denotes the proportion of y scores coded 1.

1 Note that the summation is divided by N and not N − 1, yielding the sample standard deviation and not the estimated population standard deviation.

For the bivariate correlation data listed in Table 2.1, the number of observations coded 0 is n0 = 3, the number of observations coded 1 is n1 = 5, the proportion of observations coded 0 is p = 3/8 = 0.3750, the proportion of observations coded 1 is q = 5/8 = 0.6250, the sample mean of the y scores coded 0 is ȳ0 = 20.00, the mean of the y scores coded 1 is ȳ1 = 36.00, the mean of the scores coded 0 and 1 is ȳ = 30.00, the sample standard deviation of the y scores coded 0 and 1 is Sy = 12.2474, and the point-biserial correlation coefficient is

rpb = (ȳ0 − ȳ1) √(pq) / Sy = (20.00 − 36.00) √[(0.3750)(0.6250)] / 12.2474 = −7.7460/12.2474 = −0.6325 .   (2.9)
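For readers following along in R, the Table 2.1 calculations can be verified with a few lines of code. The sketch below is an illustration rather than one of the companion R scripts cited earlier, and the ordinate u = 0.7731 is taken as given in Eq. (2.6) rather than recomputed.

# Minimal R sketch reproducing the Table 2.1 calculations
x <- c(0, 0, 0, 1, 1, 1, 1, 1)
y <- c(10, 20, 30, 20, 30, 40, 40, 50)
n0 <- sum(x == 0); n1 <- sum(x == 1); N <- length(y)
p  <- n0 / N; q <- n1 / N                      # 0.3750 and 0.6250
Sy <- sqrt(mean((y - mean(y))^2))              # divide by N, not N - 1: 12.2474
rpb <- (mean(y[x == 0]) - mean(y[x == 1])) * sqrt(p * q) / Sy   # -0.6325, Eq. (2.9)
u  <- 0.7731                                   # ordinate value given in Eq. (2.6)
rb <- (mean(y[x == 0]) - mean(y[x == 1])) * n0 * n1 / (N^2 * u * Sy)  # -0.3961, Eq. (2.7)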
Connections Linking Statistics rb and rpb

The biserial correlation coefficient can be defined in terms of the point-biserial correlation coefficient and vice versa. Thus, the biserial correlation coefficient, defined in terms of the point-biserial correlation coefficient, is

rb = rpb √(pq) / u = (−0.6325) √[(0.3750)(0.6250)] / 0.7731 = −0.3062/0.7731 = −0.3961   (2.10)

and the point-biserial correlation coefficient, defined in terms of the biserial correlation coefficient, is

rpb = rb u / √(pq) = (−0.3961)(0.7731) / √[(0.3750)(0.6250)] = −0.3062/0.4841 = −0.6325 .   (2.11)
|
||
|
||
Thus, the connections linking the biserial and point-biserial correlation coefficients are demonstrated.
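In R, the two conversions reduce to single expressions. The following sketch is again illustrative and simply plugs in the values from Eqs. (2.6), (2.7), and (2.9).

p <- 0.3750; q <- 0.6250; u <- 0.7731      # values from Eqs. (2.6) and (2.8)
-0.6325 * sqrt(p * q) / u                  # Eq. (2.10): rb from rpb = -0.3961
-0.3961 * u / sqrt(p * q)                  # Eq. (2.11): rpb from rb = -0.6325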
2.2.2 Equivalencies
In contrast to statistical connections, two statistical tests or measures are considered equivalent when one test or measure can be defined in terms of the second, and vice versa, and both yield identical results. An example is the equivalency between the Wilcoxon and Mann–Whitney two-sample rank-sum tests. To illustrate the statistical equivalency between the Wilcoxon and Mann–Whitney tests, consider the rank-score data listed in Table 2.2, where for two samples, the number of scores in Sample 1 is n = 8, the number of scores in Sample 2 is m = 12, and there are N = n + m = 20 total scores.
Table 2.2 Example rank-score data (R1 and R2) for a comparison of a Wilcoxon and a Mann–Whitney two-sample rank-sum test

Sample 1        Sample 2
Score   R1      Score   R2
 64      3       60      1
 71      6       78      9
 68      5       90     16
 81     11       94     18
 77      8       83     12
 67      4       86     14
 85     13       97     19
 75      7       79     10
                 61      2
                 98     20
                 89     15
                 92     17
It is well established that Wilcoxon's and Mann–Whitney's two-sample rank-sum tests are equivalent, to the extent that the test is often referred to as the Wilcoxon–Mann–Whitney test [190, p. 128]. Wilcoxon's W test statistic is simply a linear function of Mann and Whitney's U test statistic. The two statistics differ only by a constant, given by

nm + n(n + 1)/2 ,
where n denotes the number of scores in the sample with the smaller sum of ranks and m denotes the number of scores in the sample with the larger sum of ranks.
For the rank-score data listed in Table 2.2, Wilcoxon’s two-sample rank-sum test is simply the sum of the ranks in the smaller of the two sample sums, i.e., the sum of ranks in Sample 1 is2
W = Σ_{i=1}^{n} Ri = 3 + 6 + 5 + 11 + 8 + 4 + 13 + 7 = 57   (2.12)

and Mann–Whitney's two-sample rank-sum test is

U = Σ_{i=1}^{n} Σ_{j=1}^{m} Sij ,   (2.13)

where for no tied ranks,

Sij = 1 if R1 < R2, and Sij = 0 otherwise.   (2.14)

2 The sum of ranks in Sample 2 is 153.
To illustrate the calculation of Mann and Whitney's U statistic, consider the rank-score data listed in Table 2.2. Since R1 = 3 is less than all but two of the 12 values (1 and 2) in column R2, U = 10; next, R1 = 6 is less than all but two of the 12 values (1 and 2) in column R2, U = 10 + 10 = 20; next, R1 = 5 is less than all but two of the 12 values (1 and 2) in column R2, U = 10 + 10 + 10 = 30; next, R1 = 11 is less than eight values (16, 18, 12, 14, 19, 20, 15, and 17) in column R2, U = 10 + 10 + 10 + 8 = 38; next, R1 = 8 is less than all but two of the 12 values (1 and 2) in column R2, U = 10 + 10 + 10 + 8 + 10 = 48; next, R1 = 4 is less than all but two of the 12 values in column R2, U = 10 + 10 + 10 + 8 + 10 + 10 = 58; next, R1 = 13 is less than seven values (16, 18, 14, 19, 20, 15, and 17) in column R2, U = 10 + 10 + 10 + 8 + 10 + 10 + 7 = 65; finally, R1 = 7 is less than all but two of the 12 values in column R2, U = 10 + 10 + 10 + 8 + 10 + 10 + 7 + 10 = 75.
Then, Wilcoxon's rank-sum test, defined in terms of Mann and Whitney's rank-sum test, is

W = nm + n(n + 1)/2 − U = (8)(12) + (8)(8 + 1)/2 − 75 = 57   (2.15)

and Mann and Whitney's rank-sum test, defined in terms of Wilcoxon's rank-sum test, is

U = nm + n(n + 1)/2 − W = (8)(12) + (8)(8 + 1)/2 − 57 = 75 .   (2.16)
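Both rank-sum statistics and the linear relation linking them are easily verified in R. The sketch below is an illustration using the Table 2.2 ranks, not the book's companion code.

R1 <- c(3, 6, 5, 11, 8, 4, 13, 7)
R2 <- c(1, 9, 16, 18, 12, 14, 19, 10, 2, 20, 15, 17)
n <- length(R1); m <- length(R2)
W <- sum(R1)                               # 57, Eq. (2.12)
U <- sum(outer(R1, R2, "<"))               # 75, Eq. (2.13): count of pairs with R1 < R2
n * m + n * (n + 1) / 2 - U                # 57, recovering W via Eq. (2.15)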
For the rank-score data listed in Table 2.2, the mean value of Wilcoxon's W test statistic is

μW = n(n + m + 1)/2 = (8)(8 + 12 + 1)/2 = 84 ,   (2.17)
the variance of test statistic W is

σ²W = nm(n + m + 1)/12 = (8)(12)(8 + 12 + 1)/12 = 168 ,   (2.18)
the standard normal deviate for Wilcoxon's W is

z = (W − μW)/√(σ²W) = (57 − 84)/√168 = −27.00/12.9615 = −2.0831 ,   (2.19)
and the N(0, 1) probability value of z = −2.0831 is P = 0.0372.

For the rank-score data listed in Table 2.2, the mean value of Mann and Whitney's U test statistic is

μU = nm/2 = (8)(12)/2 = 48 ,   (2.20)
the variance of test statistic U is

σ²U = nm(n + m + 1)/12 = (8)(12)(8 + 12 + 1)/12 = 168 ,   (2.21)
the standard normal deviate for Mann and Whitney's U is

z = (U − μU)/√(σ²U) = (75 − 48)/√168 = +27.00/12.9615 = +2.0831 ,   (2.22)
and the N(0, 1) probability value of z = +2.0831 is P = 0.0372. The equivalencies linking Wilcoxon's and Mann and Whitney's two-sample rank-sum tests are hereby demonstrated with identical probability values.
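A self-contained R sketch (illustrative, not from the companion scripts) reproduces the normal-approximation results; note that the reported P = 0.0372 corresponds to a two-sided probability value.

n <- 8; m <- 12; W <- 57; U <- 75          # values from Eqs. (2.12) and (2.13)
muW  <- n * (n + m + 1) / 2                # 84, Eq. (2.17)
varWU <- n * m * (n + m + 1) / 12          # 168, Eqs. (2.18) and (2.21)
zW <- (W - muW) / sqrt(varWU)              # -2.0831, Eq. (2.19)
zU <- (U - n * m / 2) / sqrt(varWU)        # +2.0831, Eq. (2.22)
2 * pnorm(-abs(zW))                        # 0.0372, identical for both statistics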
2.2.3 Relationships
Two statistical tests or measures are considered related when one test or measure is incorporated into another test or measure. The fact that measures are related does not preclude connections linking some or all of the measures. Two measures exhibiting relationships with other measures are Pearson's χ² and Kendall's S.
Pearson's χ² Measure

A typical measure that is incorporated into other measures is Pearson's χ²-test of independence. A few of the measures of nominal association incorporating Pearson's χ² measure are Pearson's φ², Pearson's C, Tschuprov's T², and Cramér's V² measures. To illustrate a statistical relationship, consider the frequency data given in Table 2.3 with r = 2 rows, c = 3 columns, and N = 100 observations.
Table 2.3 Example 2 × 3 contingency data with observed cell frequencies in (a) and corresponding expected cell frequencies in (b)

(a) Observed cell frequencies
             Variable y
Variable x    y1   y2   y3   Total
x1            15   25   20    60
x2             5    5   30    40
Total         20   30   50   100

(b) Expected cell frequencies
             Variable y
Variable x    y1   y2   y3   Total
x1            12   18   30    60
x2             8   12   20    40
Total         20   30   50   100

For the frequency data given in Table 2.3 with Oij denoting an observed cell frequency and Eij denoting an expected cell frequency for i = 1, ..., r and j = 1, ..., c, Pearson's chi-squared test statistic is
χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (Oij − Eij)²/Eij = (15 − 12)²/12 + (25 − 18)²/18 + ··· + (30 − 20)²/20 = 17.0139 .   (2.23)
Then, Pearson's φ² measure, incorporating χ², is

φ² = χ²/N = 17.0139/100 = 0.1701 ,   (2.24)

Pearson's C measure of contingency is

C = [χ²/(χ² + N)]^{1/2} = [17.0139/(17.0139 + 100)]^{1/2} = 0.3813 ,   (2.25)

Tschuprov's T² measure is

T² = χ²/[N √((r − 1)(c − 1))] = 17.0139/[100 √((2 − 1)(3 − 1))] = 0.1203 ,   (2.26)

and Cramér's V² measure is

V² = χ²/[N min(r − 1, c − 1)] = 17.0139/[100 min(2 − 1, 3 − 1)] = 0.1701 .   (2.27)
Since the four measures are defined in terms of Pearson's χ², it follows that they are all connected. For example, Pearson's C, defined in terms of Pearson's φ², is

C = [φ²/(1 + φ²)]^{1/2} = [0.1701/(1 + 0.1701)]^{1/2} = 0.3813   (2.28)

and Pearson's φ², defined in terms of Pearson's C, is

φ² = C²/(1 − C²) = (0.3813)²/[1 − (0.3813)²] = 0.1454/0.8546 = 0.1701 .   (2.29)
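The χ²-based family of measures can be reproduced in R directly from the Table 2.3 frequencies; the following lines are an illustrative sketch, not the book's companion code.

O <- matrix(c(15, 25, 20,
               5,  5, 30), nrow = 2, byrow = TRUE)
E <- outer(rowSums(O), colSums(O)) / sum(O)    # expected frequencies, Table 2.3(b)
chi2 <- sum((O - E)^2 / E)                     # 17.0139, Eq. (2.23)
N <- sum(O); nr <- nrow(O); nc <- ncol(O)
phi2 <- chi2 / N                               # 0.1701, Eq. (2.24)
C <- sqrt(chi2 / (chi2 + N))                   # 0.3813, Eq. (2.25)
T2 <- chi2 / (N * sqrt((nr - 1) * (nc - 1)))   # 0.1203, Eq. (2.26)
V2 <- chi2 / (N * min(nr - 1, nc - 1))         # 0.1701, Eq. (2.27)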
Kendall's S Measure

A second measure that is incorporated into other measures is Kendall's S test statistic. A few of the measures of ordinal association incorporating Kendall's S measure are Kendall's τa, Kendall's τb, Stuart's τc, Goodman and Kruskal's γ, Wilson's e, and Somers' dyx and dxy measures. To illustrate a statistical relationship, consider the frequency data given in Table 2.4 with r = 2 rows, c = 3 columns, and N = 27 observations. Kendall's S is defined as the number of concordant pairs minus the number of discordant pairs.

Table 2.4 Example data for Kendall's S test statistic with r = 2 rows, c = 3 columns, and N = 27 cross-classified observations

             Variable y
Variable x    y1   y2   y3   Total
x1             2    3    4     9
x2             5    6    7    18
Total          7    9   11    27

For the frequency data given in Table 2.4, the number of concordant (like) pairs is

C = Σ_{i=1}^{r−1} Σ_{j=1}^{c−1} nij ( Σ_{k=i+1}^{r} Σ_{l=j+1}^{c} nkl ) = (2)(6 + 7) + (3)(7) = 47 ,   (2.30)

the number of discordant (unlike) pairs is

D = Σ_{i=1}^{r−1} Σ_{j=1}^{c−1} n_{i,c−j+1} ( Σ_{k=i+1}^{r} Σ_{l=1}^{c−j} nkl ) = (4)(5 + 6) + (3)(5) = 59 ,   (2.31)
the number of pairs tied on variable x is

Tx = Σ_{i=1}^{r} Σ_{j=1}^{c−1} nij ( Σ_{k=j+1}^{c} nik ) = (2)(3 + 4) + (3)(4) + (5)(6 + 7) + (6)(7) = 133 ,   (2.32)

and the number of pairs tied on variable y is

Ty = Σ_{j=1}^{c} Σ_{i=1}^{r−1} nij ( Σ_{k=i+1}^{r} nkj ) = (2)(5) + (3)(6) + (4)(7) = 56 .   (2.33)

For the frequency data given in Table 2.4, Kendall's S test statistic is

S = C − D = 47 − 59 = −12 ;   (2.34)
|
||
|
||
Kendall’s .τa symmetric measure of ordinal association, incorporating S, is
|
||
|
||
.τa
|
||
|
||
=
|
||
|
||
2S N (N − 1)
|
||
|
||
=
|
||
|
||
(2)(−12) (27)(27 − 1)
|
||
|
||
=
|
||
|
||
−24 702
|
||
|
||
=
|
||
|
||
−0.0342
|
||
|
||
;
|
||
|
||
(2.35)
|
||
|
||
Kendall’s .τb symmetric measure of ordinal association is
|
||
|
||
S .τb = (C + D + Tx )(C + D + Ty ) 1/2
|
||
|
||
=
|
||
|
||
(47
|
||
|
||
+
|
||
|
||
59
|
||
|
||
+
|
||
|
||
−12 133)(47
|
||
|
||
+
|
||
|
||
59 +
|
||
|
||
56)
|
||
|
||
1/2
|
||
|
||
=
|
||
|
||
−0.0610
|
||
|
||
;
|
||
|
||
(2.36)
|
||
|
||
Stuart’s .τc symmetric measure of ordinal association is
|
||
|
||
2mS
|
||
|
||
(2)(2)(−12)
|
||
|
||
.τc = N 2(m − 1) = (272)(2 − 1) = −0.0658 ,
|
||
|
||
(2.37)
|
||
|
||
where .m = min(r, c) = min(2, 3) = 2; Goodman and Kruskal’s .γ symmetric measure of ordinal association is
|
||
|
||
.γ
|
||
|
||
=
|
||
|
||
C
|
||
|
||
S +D
|
||
|
||
=
|
||
|
||
−12 47 + 59
|
||
|
||
=
|
||
|
||
−0.1132
|
||
|
||
;
|
||
|
||
(2.38)
|
||
|
||
Wilson’s e symmetric measure of ordinal association is
|
||
|
||
S
|
||
|
||
−12
|
||
|
||
.e = C + D + Tx + Ty = 47 + 59 + 133 + 56 = −0.0407 ;
|
||
|
||
(2.39)
|
||
|
||
24
|
||
|
||
2 Statistical Methods
|
||
|
||
Somers’ .dyx asymmetric measure of ordinal association is
|
||
|
||
.dyx
|
||
|
||
=
|
||
|
||
C
|
||
|
||
S + D + Ty
|
||
|
||
=
|
||
|
||
−12 47 + 59 + 56
|
||
|
||
= −0.0741 ,
|
||
|
||
(2.40)
|
||
|
||
and Somers’ .dxy asymmetric measure of ordinal association is
|
||
|
||
.dxy
|
||
|
||
=
|
||
|
||
C
|
||
|
||
S + D + Tx
|
||
|
||
=
|
||
|
||
−12 47 + 59 + 133
|
||
|
||
= −0.0502 .
|
||
|
||
(2.41)
|
||
|
||
Thus, the seven measures of ordinal association detailed above are related to Kendall’s S test statistic in the same sense that children are related to their common mother. Since all seven measures are defined in terms of Kendall’s S, it follows that they are all interconnected. For example, the connection linking Somers’ .dyx and .dxy asymmetric measures and Kendall’s .τb symmetric measure is given by the geometric mean of .dyx and .dxy , i.e.,
|
||
|
||
.τb = dyx dxy = (−0.0741)(−0.0502) = ±0.0610 .
|
||
|
||
(2.42)
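A minimal Python sketch of the same chain of calculations, assuming only the cell frequencies of Table 2.4 (the loop indices are zero-based and the variable names are ours):

    # Table 2.4 cell frequencies: rows = variable x, columns = variable y
    n = [[2, 3, 4],
         [5, 6, 7]]
    r, c, N = 2, 3, 27

    C = sum(n[i][j] * n[k][l]                       # concordant pairs, Eq. (2.30)
            for i in range(r - 1) for j in range(c - 1)
            for k in range(i + 1, r) for l in range(j + 1, c))
    D = sum(n[i][c - 1 - j] * n[k][l]               # discordant pairs, Eq. (2.31)
            for i in range(r - 1) for j in range(c - 1)
            for k in range(i + 1, r) for l in range(c - 1 - j))
    Tx = sum(n[i][j] * n[i][k]                      # pairs tied on x, Eq. (2.32)
             for i in range(r) for j in range(c - 1) for k in range(j + 1, c))
    Ty = sum(n[i][j] * n[k][j]                      # pairs tied on y, Eq. (2.33)
             for j in range(c) for i in range(r - 1) for k in range(i + 1, r))

    S = C - D                                       # 47 - 59 = -12
    tau_a = 2 * S / (N * (N - 1))                   # -0.0342
    tau_b = S / ((C + D + Tx) * (C + D + Ty)) ** 0.5  # -0.0610
    m = min(r, c)
    tau_c = 2 * m * S / (N ** 2 * (m - 1))          # -0.0658
    gamma = S / (C + D)                             # -0.1132
    e = S / (C + D + Tx + Ty)                       # -0.0407
    d_yx = S / (C + D + Ty)                         # -0.0741
    d_xy = S / (C + D + Tx)                         # -0.0502

Every measure in the final block is a ratio with S in the numerator, which is the computational face of the "common mother" relationship described above.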
Although much of the book explores connections, equivalencies, and relationships between and among statistical tests and measures, there are other connections that are not related to tests and measures. As academics, the authors have taught a variety of courses in statistics and have used numerous different textbooks. The result is exposure to a variety of equivalent, but different equations for the same test. Students often have trouble connecting definitional and computational formulas for the same test or measure and then get even more confused when they discover there are a variety of computational equations.
As programmers and coders, the authors sometimes use formulas that are quite different than those used in ordinary textbooks, as the formulas are designed for calculation efficiency. And as specialists in permutation statistical methods, the authors often reduce equations to only their variable components when generating probability values from all possible arrangements of the observed data.
2.3 Permutation Statistical Methods
There are two models of statistical inference considered in this book. This chapter describes both: the “population” model first put forward by contemporaries Jerzy Neyman (1894–1981) and Egon Pearson (1895–1980) in 1928 and the “permutation” model developed by R.A. Fisher (1890–1962),
E.J.G. Pitman (1897–1993), and others in the 1920s and 1930s. In this chapter, exact and Monte Carlo permutation statistical methods are described and compared with conventional (classical) parametric methods.
2.3.1 Exact Permutation Methods
Exact permutation methods generate all possible arrangements of the observed data, calculating the specified test statistic on each arrangement of the observed data. The exact probability value is the proportion of generated test statistic values that are equal to or greater than the observed test statistic value.
2.3.2 Monte Carlo Permutation Methods
When the reference set of possible arrangements is too large to generate efficiently, Monte Carlo permutation statistical methods can be employed. Monte Carlo permutation methods generate a large random sample of all possible arrangements of the data, calculating the specified test statistic on each randomly selected arrangement of the data. A Monte Carlo probability value is the proportion of randomly generated test statistic values that are equal to or greater than the observed test statistic value. Given a sufficient number of randomly selected arrangements of the data, a Monte Carlo probability value can be computed to any reasonable accuracy.
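A minimal Python sketch of the Monte Carlo procedure, assuming a two-sample absolute mean-difference test statistic; the data values are hypothetical and the choice of 100,000 random arrangements is arbitrary:

    import random

    group_1 = [12.1, 14.3, 11.8]          # hypothetical sample 1
    group_2 = [15.2, 16.0, 14.9, 17.4]    # hypothetical sample 2
    pooled = group_1 + group_2
    n1 = len(group_1)

    def mean_diff(values):
        # absolute difference between the two group means
        g1, g2 = values[:n1], values[n1:]
        return abs(sum(g1) / len(g1) - sum(g2) / len(g2))

    observed = mean_diff(pooled)
    trials, count = 100_000, 0
    for _ in range(trials):
        random.shuffle(pooled)                # one random arrangement of the data
        if mean_diff(pooled) >= observed:     # equal to or greater than observed
            count += 1

    print(count / trials)                     # Monte Carlo probability value

Increasing the number of random arrangements shrinks the sampling error of the estimated probability value, which is the sense in which the Monte Carlo value can be computed to any reasonable accuracy.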
2.4 Statistical Models
Similar to many fields of inquiry that bring different theoretical approaches to the description and analysis of data, the field of statistics approaches data analysis from a number of different schools of thought, commonly called statistical models. The two models considered here are the Neyman–Pearson population model and the Fisher–Pitman permutation model.3 The Neyman–Pearson population model of statistical inference is the standard model taught in all introductory statistics classes and will be familiar to most readers.
The Neyman–Pearson population model is named after J. Neyman (1894–1981) and E. Pearson (1895–1980) and is designed to make inferences about population parameters and provide approximate probability values.
3 Another popular model of statistical inference is the Bayesian model, named after English statistician and Presbyterian minister Thomas Bayes (1701–1761), which is not included in this book. Bayesian statistical models are widely used in economics and business.
The Fisher–Pitman permutation model of statistical inference is less well known than the Neyman–Pearson population model. The Fisher–Pitman permutation model is named after R.A. Fisher (1890–1962) and E.J.G. Pitman (1897–1993) and, as noted above, includes two different permutation methodologies: exact permutation statistical methods and Monte Carlo permutation statistical methods. For a concise comparison of the different approaches to statistical analysis developed by Neyman and Fisher, see a small volume on Fisher, Neyman, and the Creation of Classical Statistics by Erich Lehmann [124].
2.5 The Neyman–Pearson Population Model
The population model of statistical inference, formally proposed by contemporaries Jerzy Neyman (1894–1981) and Egon Pearson (1895–1980) in a seminal two-part article on statistical inference published in Biometrika in 1928, is the model taught almost exclusively in introductory courses on statistical methods. The Neyman–Pearson population model of statistical inference assumes random sampling with replacement from one or more specified populations [161, 162]. Under the Neyman–Pearson population model, the level of statistical significance that results from applying a statistical test to the results of an experiment or survey corresponds to the frequency with which the null hypothesis (.H0) would be rejected in repeated random samplings from the same specified population or populations.
Because repeated sampling of the specified population(s) is prohibitive, it is assumed that an approximating theoretical distribution such as z, t, F , or .χ 2 conforms to the discrete sampling distribution of the test statistics generated under repeated random sampling. That assumption then allows researchers to integrate the distribution and determine an approximate probability value for the experiment, field research, clinical trial, or survey.
Under the Neyman–Pearson population model, two hypotheses concerning a population parameter or parameters are advanced: the null hypothesis symbolized by .H0 and a mutually exclusive, exhaustive alternative hypothesis symbolized by .H1. For example, for a two-sample test the null hypothesis might stipulate no difference between the means of the populations from which the samples have been randomly drawn, i.e., .H0: μ1 = μ2, and the alternative hypothesis might simply state that there is a difference, i.e., .H1: μ1 ≠ μ2. The probability of rejecting a true .H0 is determined by the researcher and specified as Type I or .α error, a defined region of rejection in the tail or tails of the theoretical distribution delimited corresponding to .α; for example, .α = 0.05 or .α = 0.01, and .H0 is rejected if the observed test statistic value falls into the region(s) of rejection with probability of Type I error equal to or less than .α.
The introduction of an approximating theoretical distribution such as z, t, F , or .χ 2 imparts a host of requirements, commonly called assumptions. These assumptions include independence of observations, random sampling from a normally distributed population, and homogeneity of variance and homogeneity of
covariance, when the specified model requires them. Each of these assumptions is examined in more detail below.
2.5.1 Independence
In conventional statistical analyses the observations are assumed to be independent of each other. Thus, for any two observations with an experimental treatment, it is assumed that knowing how one of the observations stands relative to the treatment mean implies nothing about the other observation. This is the major motivation for randomization of subjects to treatments [93, p. 301]. Consider, for example, a researcher who wants to randomly assign subjects into two groups: a treatment group and a control group. A random sample of N subjects is arranged in single file and the researcher assigns each subject to one of the two conditions by the toss of a balanced coin, moving down the line of subjects where “heads” assigns the subject to the control group and “tails” assigns the subject to the experimental group, or vice versa. If three groups were used instead of two, e.g., one control group and two treatment groups, the researcher might utilize a fair die where a “1” or “2” assigns a subject to the control group, a “3” or “4” assigns the subject to the first treatment group, and a “5” or “6” assigns the subject to the second treatment group. Violation of the independence assumption can result in serious misinterpretations for the subsequent statistical analysis [113].
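A minimal sketch of the coin-toss assignment just described, with the fair coin simulated by Python's random module; the subject labels are hypothetical:

    import random

    subjects = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
    control, treatment = [], []

    for subject in subjects:
        # "heads" assigns the subject to the control group,
        # "tails" assigns the subject to the experimental group
        if random.choice(["heads", "tails"]) == "heads":
            control.append(subject)
        else:
            treatment.append(subject)

    print(control, treatment)

As with a real coin, the two group sizes are not forced to be equal; each subject is assigned independently.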
In the 1920s and 1930s, statisticians expressed considerable concern about the normality or nonnormality of distributions. R.A. Fisher, on the other hand, felt that nonnormality was less of a factor than independence. He observed that the fertility of adjacent plots, rain in successive hours of the day, or yield on successive milkings of a cow were not independent, but were highly correlated. Almost certainly, it was the evident lack of independence of field observations that led Fisher to seek a foundation for an analysis which did not involve this assumption. He found the answer in permutation [randomization] methods [25, p. 148].
There is some confusion over the use of the word “randomization.” On the one hand, randomization refers to the random assignment of subjects to treatments. For example, a sample of subjects may be randomly assigned to a treatment and a control group using a fair coin. On the other hand, in a permutation context randomization refers to the random shuffling of subjects within, between, or among treatments. For example, given a typical correlation analysis, one variable can be held constant and the other variable can be randomly shuffled, yielding .N! possible arrangements. Then, a correlation coefficient is calculated on each arrangement. In the early permutation literature, randomization referred to the latter analysis. In the contemporary literature, the term “Monte Carlo” is preferred over “randomization,” thus avoiding any confusion with the random assignment of subjects to treatments.
2.5.2 Random Sampling
It is important to note that the mathematical theorems that justify most statistical procedures under the Neyman–Pearson population model of statistical inference apply only to random samples drawn with replacement from a completely specified sampling frame. However, if the sample is not a random sample drawn with replacement from a well-defined population, the validity of the hypothesis test may be questionable [176]. It should be emphasized that random sampling from a normally distributed population is seldom met in practice. Conventional statistics specify sampling with replacement, while in practice sampling is always done without replacement. Provided the sample is a small proportion of the population, there is little difference in the results.
As many have noted, the requirement of random sampling is fundamental to classical statistics and of paramount importance to statistical inference under the Neyman–Pearson population model. Thus, random sampling, i.e., a sampling technique in which every element in a specified population has an equal, or otherwise known, opportunity of being selected, is the single most important requirement in contemporary statistical research. Done properly, random sampling permits the researcher to generalize results to the target population.
A variety of mathematical theorems constitute the underpinnings of conventional statistics. Thus, it is important to understand that the mathematical principles that justify most statistical procedures apply only to statistics calculated on data derived from random samples drawn with replacement from a completely specified and valid sampling frame. For example, the model assumptions for the most basic statistic—a one-sample z-test—are a simple random sample of a random variable drawn from a normally distributed population. Consider all possible simple random samples of size N from random variable x. Then, if the model assumptions and the null hypothesis are both true, the sampling distribution of sample means, .x̄, will be approximately normal with mean .μx̄ equal to the value specified by the null hypothesis, .μ0, and standard error .σx̄ given by .σx/√N, where .σx denotes the population standard deviation. However, if the sample is not a simple random sample from a well-defined sampling frame, then the validity of the hypothesis test is questionable.
A common misconception is that under the central limit theorem (CLT), if the sample size is equal to or greater than 30, the sampling distribution of the test statistic will be approximately normal, regardless of the distribution of the population from which the random samples have been drawn. However, this applies only to sample means (.x¯). The authors commonly see the central limit theorem applied to correlation coefficients, both simple and multiple, which is contrary to mathematical theory. If, for example, the population correlation between education and income is .ρxy = +0.60, the sampling distribution of .rxy will be negatively skewed, no matter the number of samples drawn nor the size of the samples. In 1928 Fisher provided a normalizing function for the sampling distribution of .rxy based on the hyperbolic arc tangent [61, 62]. An attendant problem, often observed by the authors, is the use of more sophisticated sampling procedures, such as systematic,
stratified, or cluster sampling, where the researcher then submits the sampled data to a statistical package for analysis, unaware (or unconcerned) that the package assumes a simple random sample.
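The normalizing function mentioned above is the inverse hyperbolic tangent. A minimal sketch, assuming an illustrative correlation of r = 0.60 echoing the education–income example; the variable names are ours:

    import math

    r = 0.60               # illustrative sample correlation coefficient
    z = math.atanh(r)      # hyperbolic arc tangent of r
    print(round(z, 4))     # 0.6931

The transformed values are far closer to normally distributed than the skewed sampling distribution of r itself, which is what makes the transformation useful for inference about correlation coefficients.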
As John Ludbrook noted many years ago, statisticians such as R.A. Fisher, Frank Yates, and Oscar Kempthorne, early founders of permutation statistical methods, but who typically analyzed agricultural data with conventional statistical methods under the Neyman–Pearson population model of statistical inference, readily acknowledged that in their extensive agricultural research, random samples were never drawn from, nor represented, well-defined populations. In their experiments at the Rothamsted Experimental Station, 25 miles northwest of London, plant varieties or different fertilizers were assigned to blocks of land within a field by a process of randomization. The field was not a random sample of the population of all possible fields, or even a random sample of fields from a defined category [130, p. 675].
This holds true for contemporary research where samples of patients, laboratory animals, students, the homeless, or incarcerated felons seldom are drawn from a well-defined population and are usually acquired in a nonrandom fashion, then randomized into subgroups in which one or more subgroups receive an intervention or treatment and the other subgroup is maintained as a control group. In fact, a number of authors have documented that the requirement of obtaining a random sample from a well-defined population is seldom met in practice; see, for example, articles by Altman and Bland [3], Bradbury [26], Feinstein [58], Frick [70], LaFleur and Greevy [121], Ludbrook [130], Ludbrook and Dudley [132], and Still and White [197].
Social scientists and psychologists, in particular, have been especially concerned with problems of random sampling because, perhaps, random samples from well-described populations are so difficult to obtain in the social sciences. Writing in Psychological Bulletin in 1966, psychologist Eugene Edgington stated his position unequivocally: “statistical inferences cannot be made concerning populations that have not been randomly sampled” [51, p. 485]. In 1988 psychologist William Hays wrote:
The point is that some probability structure must be known or assumed to underlie the occurrence of samples if statistical inference is to proceed. . . . This is why our assumption of random sampling is not to be taken lightly. . . . Unless this assumption is at least reasonable, the probability results of inferential methods mean very little, and these methods might as well be omitted. [86, p. 212]4
Writing in Canadian Psychology in 1993, psychologists Michael Hunter and Richard May noted that random sampling is of particular importance to psychologists, “who rarely use random sampling or any other sort of probability sampling” [96, p. 385].
To reiterate an earlier point, the problem of random sampling is especially acute in survey research where typically only a small proportion of respondents return questionnaires or respond to telephone calls. It is probably safe to conclude that
4 Emphasis in the original.
very few studies in, for example, the social sciences even attempt to achieve a truly random sample.
2.5.3 Normality
The assumption of normality is so basic to classical statistics that it deserves special attention. Two points should be emphasized. The assumption of normality by conventional tests is always unrealistic and never justified in practice [138]. The fact that statisticians recognize the challenges of normality is obvious, given the frequent discussions about the robustness of various statistics when the normality assumption is violated.5
It is important to recognize that the normality assumption is not new; statisticians have described and discussed the issue for decades. In 1917 the French physicist and Nobel laureate in physics, Gabriel Lippmann, once wrote in a letter to Henri Poincaré à propos the normal curve: “Experimentalists think that it is a mathematical theorem, while mathematicians believe it to be an experimental effect” (Gabriel Lippman, quoted in D’Arcy Wentworth Thompson’s On Growth and Form [203, p. 121]). A decade later, R.C. Geary famously proclaimed: “Normality is a myth; there never has, and never will be, a normal distribution” [73, p. 241], and in 1938 Joseph Berkson wrote: “we may assume that it is practically certain that any series of real observations does not actually follow a normal curve with absolute exactitude in all respects” [8, p. 526];6 Robert Matthews once described the normal distribution as “beautiful, beguiling and thoroughly dangerous” [138, p. 193].
In 1947 George Barnard, responding to a paper published by Egon Pearson [167] on the analysis of .2 × 2 contingency tables, noted that “in the case of normal distributions, and only in this case, the mean and variance of samples are independently distributed” [4, p. 169].7 Finally, in 1954 I.D.J. Bross pointed out, in reference to the normal distribution, that statistical methods “are based on certain assumptions— assumptions which not only can be wrong, but in many situations are wrong” [29, p. 815].8
Other researchers have empirically demonstrated the prevalence of highly skewed and heavy-tailed distributions in a variety of academic disciplines; see, for example, discussions by Schmidt and Johnson [187], Bradley [27], Saal, Downey, and Lahey [182], Bernardin and Beatty [9], Matthews [139], and Murphy and Cleveland [158], the best known of which is Micceri’s widely quoted 1989
5 Statistical robustness refers to statistical tests and measures that are not unduly affected by extreme values or small departures from the Neyman–Pearson population model assumptions, such as normality or homogeneity of variance.
6 Emphasis in the original.
7 Emphasis in the original.
8 Emphasis in the original.
article on “The unicorn, the normal curve, and other improbable creatures” in the Psychological Bulletin [147].
Extreme values or outliers are prevalent in applied research, leading to nonnormal distributions. In the social sciences most variables are not even close to normally distributed and many are highly skewed, often positively. Some examples of positively skewed variables in the social sciences are family income, net worth, prices of houses sold in a given month, age at first marriage, length of first marriage, monthly apartment rent, and amount of student debt.
Newman, in a discussion of power-law distributions, provided other examples of skewed distributions: sales of book titles, populations of cities, frequencies of words in human languages, the number of “hits” on web pages, the number of citations of academic papers, the financial worth of individuals, the magnitudes of earthquakes and solar flares, and the sizes of craters on the moon [160].
Finally, a number of authors have documented that the assumption of normality is rarely satisfied in real-data situations; see, for example, articles by Bross [29], Feinstein [58], and Geary [72]. See also a discussion by Stephen Stigler in The Seven Pillars of Statistical Wisdom [196, pp. 91–92].
2.5.4 Homogeneity of Variance
Homogeneity of variance is fundamental to the independent samples t-test and the multi-sample completely randomized F-test, e.g., for k independent random samples, .H0: σ1² = σ2² = · · · = σk². When confounded with unequal sample sizes, serious problems can arise. When sample sizes are unequal and the homogeneity assumption does not hold, t and F-ratio test statistics tend to be in error. Specifically, for Student's two-sample t-test, if the smaller of the two samples has the greater variance, a Type I error is more likely to occur, which can cause the null hypothesis to be falsely rejected.
On the other hand, for Student’s two-sample t-test, if the larger of the two samples has the greater variance, a Type II error is more likely to occur, which can cause the null hypothesis to fail to be rejected. This does not cause the same problem as falsely rejecting the null hypothesis; however, it can cause a decrease in the power of the test [24, 79, 84, 89, 94]. In addition, it has been well documented that equal sample sizes provide little protection against inflated error rates for the t and F -ratio tests when variances are unequal [75, 84].
2.5.5 Homogeneity of Covariance
The assumption of homogeneity of covariance is more complicated than the assumption of homogeneity of variance. Homogeneity of covariance is an assumption underlying randomized-block analysis-of-variance and other analysis-of-variance
Table 2.5 Example data comparing five variances and 10 covariances

                      Factor A
       a1      a2      a3      a4      a5
a1     s₁²     s₁₂²    s₁₃²    s₁₄²    s₁₅²
a2     s₂₁²    s₂²     s₂₃²    s₂₄²    s₂₅²
a3     s₃₁²    s₃₂²    s₃²     s₃₄²    s₃₅²
a4     s₄₁²    s₄₂²    s₄₃²    s₄²     s₄₅²
a5     s₅₁²    s₅₂²    s₅₃²    s₅₄²    s₅²

Table 2.6 Example data comparing three variances among differences between three sets of ages

Subject    Age 1   Age 2   Age 3   Ages 1–2   Ages 1–3   Ages 2–3
1             20      22      25         −2         −5         −3
2             24      28      36         −4        −12         −8
3             21      23      23         −2         −2          0
4             31      35      39         −4         −8         −4
5             19      25      28         −6         −9         −3
6             25      30      35         −5        −10         −5
Variance                              2.5667    13.0667     6.9667
designs. There are two assumptions on the variance–covariance matrix, illustrated in Table 2.5 with five levels of Factor A. One assumption is that all the estimated population variances on the principal diagonal are identical, i.e., s₁² = s₂² = · · · = s₅². The second assumption is that all the estimated population covariances off the principal diagonal are identical, i.e., s₁₂² = s₁₃² = · · · = s₄₅² and s₂₁² = s₃₁² = · · · = s₅₄². However, the estimated population variances and the estimated population covariances need not be the same, and seldom are. Taken together, the two assumptions are known as the assumption of compound symmetry and can be thought of as an extension to the assumption of homogeneity of variance in completely randomized analysis-of-variance designs. The assumption of compound symmetry is actually more stringent than necessary and in recent years it has been replaced by the assumption of sphericity—equality of variances of the differences between all pairs of factor levels, of which, for k treatments, there will be k(k − 1)/2 pairs of levels. Table 2.5 displays estimated variances and covariances for five treatments.
The Sphericity Assumption
To illustrate the concept of sphericity as equality of variances of the differences between all pairs of factor levels, consider the example data given in Table 2.6. These data are from a fictitious study designed to demonstrate the problem of unequal variances and describe the ages of the first three arrests by police for six subjects. For example, Subject 1 was first arrested at age 20, next arrested at age 22, and arrested a third time at age 25. There are two years between the first and second
arrests, five years between the first and third arrests, and three years between the second and third arrests for Subject 1. Likewise, there are four years between the first and second arrests, 12 years between the first and third arrests, and eight years between the second and third arrests for Subject 2. These differences in the ages of arrest are computed for each subject for each of the three arrests in Table 2.6. The variance for each set of comparisons is included at the bottom of Table 2.6. In particular, as is readily apparent in Table 2.6, the variances of the paired differences are not equal: s₁₂² = 2.5667, s₁₃² = 13.0667, and s₂₃² = 6.9667, with differences of Δ₁₂ = 2.5667 − 13.0667 = −10.50, Δ₁₃ = 2.5667 − 6.9667 = −4.40, and Δ₂₃ = 13.0667 − 6.9667 = +6.10. The unequal variances indicate a potential departure from the assumption of sphericity.
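A minimal Python sketch verifying the three variances of the paired differences in Table 2.6 (sample variances with N − 1 in the denominator; the variable names are ours):

    ages_1 = [20, 24, 21, 31, 19, 25]
    ages_2 = [22, 28, 23, 35, 25, 30]
    ages_3 = [25, 36, 23, 39, 28, 35]

    def sample_variance(x):
        # unbiased sample variance with N - 1 in the denominator
        mean = sum(x) / len(x)
        return sum((v - mean) ** 2 for v in x) / (len(x) - 1)

    d12 = [a - b for a, b in zip(ages_1, ages_2)]   # Ages 1-2 differences
    d13 = [a - b for a, b in zip(ages_1, ages_3)]   # Ages 1-3 differences
    d23 = [a - b for a, b in zip(ages_2, ages_3)]   # Ages 2-3 differences

    print(round(sample_variance(d12), 4))   # 2.5667
    print(round(sample_variance(d13), 4))   # 13.0667
    print(round(sample_variance(d23), 4))   # 6.9667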
2.6 A Summary: The Five Assumptions
These five assumptions of the Neyman–Pearson population model of statistical inference (independence, random sampling, normality, homogeneity of variance, and homogeneity of covariance) have been studied extensively over the past 100 years. Some assumptions are more important than others. Student’s t and Fisher’s F are quite robust to violations of the assumption of normality, especially if the populations are skewed in the same direction, but neither test is very robust to violations of the assumption of homogeneity of variance. This is especially concerning when the samples differ with respect to both size and variance. A relatively larger sample with a relatively greater variance increases the risk of Type II error, i.e., failure to reject a false null hypothesis, while a relatively smaller sample with a relatively greater variance increases the risk of Type I error, i.e., rejection of a true null hypothesis.
In general it can be said that test statistics such as Student's t-tests and Fisher's F-tests are robust to violations of the assumptions, meaning Student's t-tests and Fisher's F-tests are not unduly affected by the presence of a few extreme values or by small departures from the Neyman–Pearson model assumptions. While Student's t-tests and Fisher's F-tests are relatively robust to the assumption of normality, the problem becomes intractable when, first, two or more assumptions are violated, e.g., normality and homogeneity of variance, and, second, when one of the assumptions is severely violated. Examples might include a survey that returns only 35% of the respondents or the inclusion of a highly skewed variable, such as family income.
It is imperative that researchers clearly demonstrate that the assumptions underlying a statistical analysis have been met and explain what they have done to ensure that their data meet the assumptions. Ignorance or uncritical assessment of the adequacy and limitations of statistical methods is the primary source of error in published papers. The importance of testing and satisfying assumptions cannot be
emphasized strongly enough: The results of a statistical analysis under the Neyman–Pearson population model are only as credible as the assumptions underpinning the analysis. If the assumptions have not been met, the reported results have been compromised.
2.7 The Fisher–Pitman Permutation Model
The best-known methods of statistical inference assume that the form of the probability distribution from which the measurements were taken is known, i.e., parametric statistical methods. For example, under the Neyman–Pearson population model, it is assumed that the sample measurements have been randomly drawn from a normally distributed population. The researcher then proceeds to test hypotheses about the population’s parameters, such as the mean, .μx, or the variance, .σx2. Frequently, however, one cannot comfortably make an assumption about the form of the probability distribution under study. The evidence may not support the chosen form or, worse, may sharply deny it.
In such circumstances it may be preferable to employ statistical methods whose strengths do not depend on the shape of the distribution. Thus, researchers often turn to statistical methods based on signs of differences, ranks of measurements, and counts of objects or events falling into categories. Such methods do not rest heavily on the explicit parameters of the probability distribution and, for this reason, are called nonparametric statistical methods. The point here is that many nonparametric statistical methods stand up well against various shapes of distributions and failures of assumptions. One form of nonparametric methods is permutation statistical methods. What sets permutation statistical methods apart from the family of nonparametric methods is that permutation methods are not limited to signs, ranks, or counts, but readily accept interval- and ratio-level measurements, much like their parametric counterparts, but without the attendant assumptions.
Given the stringent assumptions of statistical analyses under the Neyman–Pearson population model, there is a pressing need for a method of statistical analysis for those cases where the assumptions required of the parent population(s) cannot be met [185]. The Neyman–Pearson population model is an elegant and highly refined mathematical model when used as originally intended. However, it is an inappropriate model for many datasets. The fault lies not with the model, but with the data. Enter statistical methods under the Fisher–Pitman permutation model.
Formally, “permutations” refer to the number of arrangements of N objects considered n at a time. For example, for .N = 4 and .n = 2, the number of permutations is given by
.P(N, n) = N!/(N − n)! = 4!/(4 − 2)! = 24/2 = 12 .
Thus, for .x = a, b, c, d, the 12 permutations are ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, and dc.
In contrast, the number of combinations is
.C(N, n) = N!/[n!(N − n)!] = 4!/[2!(4 − 2)!] = 24/4 = 6 .
Thus, for .x = a, b, c, d, the six combinations are ab, ac, ad, bc, bd, and cd.
For permutations, order matters; thus, ab is different than ba for permutations, but for combinations, order does not matter; thus, there is no difference between combinations ab and ba. The reader may find it interesting that a “combination” lock is actually a permutation lock, as the lock combination 5, 7, 3, 2 is a different combination than 7, 3, 2, 5.
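A minimal Python sketch of the distinction, using the standard library's itertools with the text's four objects a, b, c, d:

    from itertools import combinations, permutations

    x = ["a", "b", "c", "d"]

    perms = list(permutations(x, 2))   # order matters: ab and ba both count
    combs = list(combinations(x, 2))   # order ignored: ab and ba are the same

    print(len(perms), len(combs))      # 12 6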
While the Neyman–Pearson population model of statistical inference is familiar to most researchers, the Fisher–Pitman permutation model of inference will be less familiar. For the interested reader, a number of excellent presentations of the two models are available. See, especially, discussions by Curran-Everett [44], Feinstein [58], Hubbard [95], Kempthorne [102], Kennedy [112], Lachin [120], Ludbrook [130, 131], and May and Hunter [142]. For a more comprehensive overview of permutation statistical methods, see A Chronicle of Permutation Statistical Methods by Berry, Johnston, and Mielke [13].
2.7.1 A Historical Perspective
It is important to place permutation statistical methods into a historical perspective. While permutation methods predate many conventional statistical methods, it has only been in recent decades that permutation methods have emerged as a viable alternative to conventional statistical methods. Permutation methods were initiated in the 1920s and 1930s with the contributions of R.A. Fisher9 [64], R.C. Geary [72], T. Eden and F. Yates [49], H. Hotelling and M.R. Pabst [91], and E.J.G. Pitman [174–176]. The original application of permutation statistical procedures was to validate t- and F -tests when the assumptions underlying those tests were questionable. Thus, in the 1920s and 1930s, permutation tests were considered the gold standard for validating conventional tests and measures.
In 1925, Fisher [64] calculated an exact probability value using the binomial probability distribution. Although the binomial distribution is not technically a permutation test, the binomial distribution does yield an exact probability value,
9 After 1952, Sir Ronald A. Fisher.
as noted by Kempthorne [102]. The following year, Geary used an exact analysis to demonstrate the utility of asymptotic approaches for data analysis in an investigation of the properties of correlation and regression in finite populations [72]. In 1933, Eden and Yates examined height measurements of wheat grown in eight blocks [49]. Their work included comparisons of simulated and theoretical probabilities based on the normality assumption and the results were found to be in close agreement, supporting the assumption of normality.
In 1936 at Columbia University, Harold Hotelling and Margaret Richards Pabst used permutation statistical methods to calculate exact probability values for small samples of ranked data in an examination of correlation methods [91]. This important article utilized the calculation of a probability value that incorporated all permutations of the data, under the null hypothesis that all arrangements of the observed data are equally likely. In 1937 and 1938, E.J.G. Pitman, a mathematician ensconced at the University of Tasmania, contributed three seminal papers on permutation statistical methods. The first paper utilized permutation statistical methods to analyze two independent samples [174], the second paper utilized permutation statistical methods in analyzing linear correlation [175], and the third paper utilized permutation statistical methods in an analysis-of-variance design [176]. The influence of these three papers by Pitman on the subsequent development of permutation statistical methods should not be underestimated.
The 1930s, 1940s, and 1950s witnessed a proliferation of nonparametric tests, i.e., statistical tests that are usually applied to nominal and ordinal measurements and do not assume that the underlying distribution is known, for example, Friedman’s two-way analysis of variance for ranks in 1937 [71], Kendall’s rank-order correlation coefficient in 1938 [103], Wilcoxon’s signed-rank test in 1945 [211], Festinger’s two-sample rank-sum test in 1946 [60], Kruskal and Wallis’s C-sample rank-sum test in 1952 [119], and Dwass’ modified randomization test in 1957 [48]. Permutation statistical methods were often employed to generate tables of exact probability values for small samples. A theme that was repeated in this period involved difficulty of computation and, in response, conversion of raw data to ranks to simplify calculations.
Permutation statistical methods are generally recognized as a subset of nonparametric statistical methods. Henry Scheffé is best remembered for his contributions to the analysis of variance [186]. In the 1940s, however, Scheffé was deeply interested in hypothesis testing and nonparametric inference. In 1943 in an article titled “Statistical inference in the nonparametric case” in The Annals of Mathematical Statistics, Scheffé presciently wrote:
Only a small fraction of the extensive literature of mathematical statistics is devoted to the nonparametric case, and most of this is of the last decade. We may expect this branch to be rapidly explored however: The prospects of a theory freed from specific assumptions about the form of the population distribution should excite both the theoretician and the practitioner since such a theory might combine elegance of structure with wide applicability. [185, p. 305]
In the 1960s and 1970s, mainframe computers became available to academicians at major research universities, and by the end of this period, personal computers,
although not common, were available to many researchers. In addition, the speed of computing increased greatly between 1960 and 1980. Permutation statistical methods arrived at a level of maturity during the period between 1980 and 2000 primarily as a result of two factors: greatly improved computer clock speeds and widely available and relatively inexpensive desktop computers. These two factors enabled permutation statistical methods to branch out from their home in statistics to include a variety of other disciplines, most notably in psychology with an article on randomization tests in 1964 followed by a book titled Randomization Tests in 1980 by psychologist Eugene Edgington at the University of Calgary [50, 52].10 By the early 2000s, computing power had advanced sufficiently that permutation statistical methods were available to provide probability values in an efficient manner for a wide variety of statistical tests and measures [16].
Thus, while today researchers take high-speed computing for granted, it has been only in the last 30 or so years that computing speed has evolved, allowing permutation statistical methods to achieve their potential, resulting in a suite of practical statistical methods with many research applications. Because of their simplicity and lack of assumptions, permutation statistical methods have supplanted conventional statistical methods in several fields of study.
Thus, while today researchers take high-speed computing for granted, it has been only in the last 30 or so years that computing speed has evolved, allowing permutation statistical methods to achieve their potential, resulting in a suite of practical statistical methods with many research applications. Because of their simplicity and lack of assumptions, permutation statistical methods have supplanted conventional statistical methods in several fields of study.
2.7.2 Permutation and Parametric Tests
Permutation statistical tests, based on the Fisher–Pitman permutation model, differ from traditional parametric tests, based on the Neyman–Pearson population model, in a number of respects. For a permutation statistical test in its most basic form, a test statistic is computed for the observed data—often the same test statistic found in the Neyman–Pearson population model, such as Student's t-test for two independent samples, Fisher's one-way completely randomized analysis of variance, or Pearson's product-moment correlation coefficient. The observations are then permuted over all possible arrangements of the observed data and the specified statistic is computed for each possible, equally likely arrangement of the observed data. For a simple example, if the observed data are {.a, b, c}, there are six possible, equally likely arrangements of the data: the observed arrangement {.a, b, c} and permutations {.a, c, b}, {.b, a, c}, {.b, c, a}, {.c, a, b}, and {.c, b, a}.
10 The term “randomization” is often used as a synonym for “permutation.”
The null hypothesis under the Fisher–Pitman permutation model simply states that all arrangements of the observed data are equally likely [91]. The proportion of arrangements in the reference set of all possible arrangements possessing test statistic values that are equal to or more extreme than the observed test statistic value constitutes the exact probability of the observed test statistic value.
Statistical tests and measures based on the Fisher–Pitman permutation model possess several advantages over statistical tests and measures based on the Neyman– Pearson population model. First, tests based on the permutation model are less complex than analogous tests based on the population model, meaning permutation tests and measures possess fewer assumptions and requirements than conventional tests. Therefore, the results are much easier to communicate to statistically naïve audiences.
Second, permutation tests provide exact probability values based on the discrete permutation distribution of equally likely test statistic values. Tests based on the Neyman–Pearson population model provide only vague results such as .P < 0.05.11
Third, permutation tests are entirely data-dependent in that all the information required for analysis is contained within the observed data. There is no reliance on factors external to the observed data, such as population parameters, assumptions about theoretical approximating distributions, or alternative hypotheses. While all statistical tests and measures are dependent on the nature of the data, permutation tests are “data-dependent” in the sense that nothing that is not contained in the data is considered relevant.
Fourth, permutation tests are appropriate for nonrandom samples, such as are common in many fields of research. It is unusual, especially in the social and behavioral sciences, to collect a truly random sample of subjects. To be perfectly clear, random sampling, as it is generally understood under the Neyman–Pearson population model, is not germane to permutation statistical methods and permutation methods, under the Fisher–Pitman model, never assume random sampling. On the other hand, given a random sample, permutation statistical methods are entirely appropriate.
Fifth, permutation tests are distribution-free in that they do not depend on the assumptions associated with conventional tests under the Neyman–Pearson population model, such as normality or homogeneity of variance.
Sixth, permutation tests are ideal for small datasets, where conventional tests often are problematic when attempting to fit a continuous theoretical distribution to only a few unique values.
One drawback to permutation statistical methods under the Fisher–Pitman permutation model is that permutation methods are neither designed to estimate nor to test population parameters. Permutation methods simply ascertain, for example, if there is a difference between a control and an experimental group. Two points are relevant here. First, in the absence of random sampling and target variables that
11 In this book, an uppercase P indicates a cumulative probability value and a lowercase p indicates a point probability value.
are not normally distributed with heterogeneous variances (where applicable), it is inappropriate to estimate population parameters under the traditional Neyman–Pearson population model. Given the nature of many datasets, there exists a serious disjunction between the data and the assumptions underlying the Neyman–Pearson population model of statistical inference. The fault lies more with the data than with the model.
Second, given random sampling from a specified population, permutation statistical methods are perfectly capable of making valid estimates of population parameters, with or without the assumptions of normality and homogeneity of variance. As a result, it should be noted that when the assumptions of independence, random sampling, normality, and homogeneity of variance are satisfied, the results under the Neyman–Pearson population model and the Fisher–Pitman permutation model tend to agree [202].
Because permutation statistical methods are inherently computationally intensive, it took the development of high-speed computing for permutation methods to achieve their potential. Today, a small laptop computer outperforms even the largest and fastest mainframe computers of previous decades. Both conventional statistical methods under the Neyman–Pearson model and permutation statistical methods under the Fisher–Pitman model originated in the 1920s and 1930s, but because high-speed computing was lacking for 80 or so years, statistical methods under the Neyman–Pearson population model advanced by orders of complexity, while statistical methods under the Fisher–Pitman permutation model languished in relative obscurity. One can only speculate what the development of modern statistical methods would look like had modern computers been available 100 years ago.
Two types of permutation methods are common in the literature: exact and Monte Carlo permutation methods. Exact permutation statistical methods generate and evaluate all possible arrangements of the observed data, while Monte Carlo permutation statistical methods generate and evaluate a large random sample drawn from all possible arrangements of the observed data.
2.7.3 Exact Permutation Tests
As described previously, the first step in an exact permutation test is to calculate a test statistic value for the data. Second, a reference set of all possible, equally likely arrangements of the data is systematically generated. Third, the desired test statistic is calculated for each arrangement in the reference set. Fourth, the probability of obtaining the test statistic, or one more extreme, is the proportion of the test statistics in the reference set with values that are equal to or more extreme than the observed test statistic.
Figure 2.1 presents a flowchart detailing the calculation of an exact permutation probability value under the Fisher–Pitman permutation model. The first step is to initialize two counters: in this case, counter A and counter B. Counter A provides
Fig. 2.1 Flowchart for the calculation of an exact permutation probability value: Begin; initialize counter A and counter B; calculate the desired test statistic on the set of observed data; permute a new arrangement of the set of observed data; calculate a new test statistic and increase counter B; if the new statistic is equal to or greater than the observed statistic, increase counter A; if this is not the last arrangement of the observed set of data, permute a new arrangement and repeat; otherwise, divide the results of counter A by the results of counter B; Stop.

Table 2.7 Example comparing sentencing differences between n1 = 3 females and n2 = 4 males

Females              Males
Subject   Years      Subject   Years
A             3      D             5
B             4      E             7
C             6      F             8
                     G             9
a count of all test statistic values that are equal to or greater than the observed test statistic value. Counter B provides a count of all possible arrangements of the data. Second, the desired test statistic is calculated on the data. Third, a new arrangement of the data is generated while preserving the sample size(s) and counter B is increased by 1. Fourth, the desired test statistic is calculated on the new arrangement of the data and compared with the original test statistic value calculated on the observed set of data. If the value of the new test statistic is equal to or greater than the value of the observed test statistic, counter A is increased by 1. If not, a check is made to see if this arrangement is the last in the reference set of all possible arrangements. If it is, then counter A divided by counter B yields the exact probability value, that is, the proportion of all possible test statistic values that are equal to or greater than the observed test statistic value. Otherwise, a new arrangement of the data is generated and the process is repeated.
To illustrate an exact permutation statistical analysis, examination of a very small dataset is preferred, due to the potentially large number of possible arrangements of the data. Consider the example data listed in Table 2.7 consisting of two independent samples with .n1 = 3 females and .n2 = 4 males convicted of identity theft. The dependent variable is sentencing in years. The next sections will utilize the traditional Student t-test for two independent samples to analyze the data, followed by an alternative permutation test for two independent samples for comparison.
2.7.4 The Neyman–Pearson Population Model
Using the identity-theft data listed in Table 2.7, under the Neyman–Pearson population model with null hypothesis .H0: μ1 = μ2, the sample sizes are .n1 = 3 and .n2 = 4, the total sample size is .N = n1 + n2 = 3 + 4 = 7, the two sample means are
.x̄1 = (1/n1) Σ_{i=1}^{n1} x1i = (3 + 4 + 6)/3 = 4.3333   (2.43)

and

.x̄2 = (1/n2) Σ_{i=1}^{n2} x2i = (5 + 7 + 8 + 9)/4 = 7.2500 ,   (2.44)

the two sample variances are

.s1² = [1/(n1 − 1)] Σ_{i=1}^{n1} (x1i − x̄1)² = [(3 − 4.3333)² + (4 − 4.3333)² + (6 − 4.3333)²]/(3 − 1) = 2.3333   (2.45)

and

.s2² = [1/(n2 − 1)] Σ_{i=1}^{n2} (x2i − x̄2)² = [(5 − 7.2500)² + (7 − 7.2500)² + (8 − 7.2500)² + (9 − 7.2500)²]/(4 − 1) = 2.9167 ,   (2.46)

the pooled estimate of the population variance is

.sp² = [(n1 − 1)s1² + (n2 − 1)s2²]/(N − 2) = [(3 − 1)(2.3333) + (4 − 1)(2.9167)]/(7 − 2) = 2.6833 ,   (2.47)

and Student's two-sample pooled test statistic is

.t = (x̄1 − x̄2)/[sp²(1/n1 + 1/n2)]^{1/2} = (4.3333 − 7.2500)/[(2.6833)(1/3 + 1/4)]^{1/2} = −2.3313 .   (2.48)
Under the Neyman–Pearson null hypothesis, .H0: μ1 = μ2, test statistic t is asymptotically distributed as Student's t with .N − 2 degrees of freedom. With .N − 2 = 7 − 2 = 5 degrees of freedom, the asymptotic two-tail probability value of .t = −2.3313 is .P = 0.0671, under the assumptions of independence, random sampling, normality, and homogeneity of variance.
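A minimal sketch reproducing this result, assuming SciPy is available; scipy.stats.ttest_ind computes the pooled (equal-variance) two-sample t-test by default:

    from scipy import stats

    females = [3, 4, 6]
    males = [5, 7, 8, 9]

    t, p = stats.ttest_ind(females, males)   # pooled t-test, two-sided p-value
    print(round(t, 4), round(p, 4))          # -2.3313 0.0671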
2.7.5 The Fisher–Pitman Permutation Model
Under the Fisher–Pitman permutation model, there are exactly
.M = (n1 + n2)!/(n1! n2!) = (3 + 4)!/(3! 4!) = 35   (2.49)
possible, equally likely arrangements in the reference set of all permutations of
the sentencing data listed in Table 2.7. Since 35 is a relatively small number, it is possible to list the .M = 35 arrangements in Table 2.8, along with the corresponding values of .|t|, ordered by .|t| values from high (.|t| = 3.8730) to low (.|t| = 0.0000). The arrangements for which permuted .|t| values are equal to or greater than the observed .|t| value are indicated by asterisks; the observed arrangement of values is indicated by an underscore.
Under the Fisher–Pitman permutation model, the exact probability of an observed .|t| value is the proportion of .|t| test statistic values computed on all possible arrangements of the data that are equal to or greater than .|t| = 2.3313. The value of .|t| obtained for the realized data listed in Table 2.7 is unusual since only four of the test statistic values listed in Table 2.8 are equal to or greater than
Table 2.8 All M = 35 arrangements of the sentencing data listed in Table 2.7 with corresponding pooled Student |t| values

Number   Arrangement               |t|          Number   Arrangement           |t|
 1*      3,4,5   6,7,8,9           3.8730        19      4,5,7   3,6,8,9       0.6742
 2*      7,8,9   3,4,5,6           3.8730        20      4,7,9   3,5,6,8       0.6742
 3*      _3,4,6   5,7,8,9_ (a)     2.3313 (b)    21      5,6,9   3,4,7,8       0.6742
 4*      6,8,9   3,4,5,7           2.3313        22      5,7,8   3,4,6,9       0.6742
 5       3,4,7   5,6,8,9           1.5811        23      5,6,8   3,4,7,9       0.3262
 6       3,5,6   4,7,8,9           1.5811        24      3,5,9   4,6,7,8       0.3262
 7       6,7,9   3,4,5,8           1.5811        25      3,6,8   4,5,7,9       0.3262
 8       5,8,9   3,4,6,7           1.5811        26      3,7,9   4,5,6,8       0.3262
 9       3,4,8   5,6,7,9           1.0742        27      4,5,8   3,6,7,9       0.3262
10       3,5,7   4,6,8,9           1.0742        28      4,6,7   3,5,8,9       0.3262
11       4,5,6   3,7,8,9           1.0742        29      4,6,9   3,5,7,8       0.3262
12       4,8,9   3,5,6,7           1.0742        30      4,7,8   3,5,6,9       0.3262
13       5,7,9   3,4,6,8           1.0742        31      3,6,9   4,5,7,8       0.0000
14       6,7,8   3,4,5,9           1.0742        32      3,7,8   4,5,6,9       0.0000
15       3,4,9   5,6,7,8           0.6742        33      4,5,9   3,6,7,8       0.0000
16       3,5,8   4,6,7,9           0.6742        34      4,6,8   3,5,7,9       0.0000
17       3,6,7   4,5,8,9           0.6742        35      5,6,7   3,4,8,9       0.0000
18       3,8,9   4,5,6,7           0.6742

* Indicates those arrangements for which |t| values are equal to or greater than |t| = 2.3313
(a) The underscored arrangement in row 3 denotes the observed arrangement of the data
(b) Indicates the realized value of Student's |t|
.|t| = 2.3313. The rows containing the highest four .|t| values are indicated with asterisks and the data arrangement yielding the observed value of t is underscored in Table 2.8. If all M arrangements of the data occur with equal chance, the exact probability value of .|t| = 2.3313 under the Fisher–Pitman null hypothesis is
.P(|t| ≥ |to|) = (number of |t| values ≥ |to|)/M = 4/35 = 0.1143 ,   (2.50)
where .|to| denotes the observed value of .|t|.
|
||
|
||
There is a considerable difference between the asymptotic probability value of .P = 0.0671 under the Neyman–Pearson population model and the exact probability value of .P = 0.1143 under the Fisher–Pitman permutation model. The difference is . P = 0.1143−0.0671 = 0.0472. This result is not surprising, however. A continuous mathematical function such as Student’s t cannot be expected to provide a precise fit to .M = 35 values, of which seven values (0.0000, 0.3262, 0.6742, 1.0742, 1.5811, 2.3313, and 3.8730) are unique. Figure 2.2 plots the seven unique values in a histogram, demonstrating the difficulty of fitting a continuous mathematical function such as Student’s t to a discrete probability distribution with only seven different values. The darker gray bars represent the four values equal to or greater than the observed value of .|t| = 2.3313.
|
||
|
||
Fig. 2.2 Histogram of the frequencies of the seven unique Student t values from Table 2.7 (frequencies 5, 8, 8, 6, 4, 2, and 2 for $|t| = 0.0000$, 0.3262, 0.6742, 1.0742, 1.5811, 2.3313, and 3.8730, respectively)
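The exact analysis above is easy to reproduce computationally. The following minimal Python sketch is our own illustration, not part of the original text (the function and variable names are ours); it enumerates all 35 combinations of the sentencing data and recovers the exact probability value of Eq. (2.50).

    from itertools import combinations
    from math import sqrt

    data = [3, 4, 5, 6, 7, 8, 9]        # pooled sentencing data of Table 2.7
    n1 = 3                              # size of the first group
    N = len(data)

    def pooled_t(group1, group2):
        """Student's two-sample pooled t test statistic."""
        m1 = sum(group1) / len(group1)
        m2 = sum(group2) / len(group2)
        ss1 = sum((x - m1) ** 2 for x in group1)
        ss2 = sum((x - m2) ** 2 for x in group2)
        sp2 = (ss1 + ss2) / (len(group1) + len(group2) - 2)
        return (m1 - m2) / sqrt(sp2 * (1 / len(group1) + 1 / len(group2)))

    t_obs = abs(pooled_t([3, 4, 6], [5, 7, 8, 9]))   # observed |t| = 2.3313

    # Enumerate all C(7,3) = 35 assignments of three observations to group 1.
    extreme = 0
    for idx in combinations(range(N), n1):
        g1 = [data[i] for i in idx]
        g2 = [data[i] for i in range(N) if i not in idx]
        if abs(pooled_t(g1, g2)) >= t_obs - 1e-12:   # tolerance for float ties
            extreme += 1

    print(extreme, extreme / 35)        # 4 and 4/35 = 0.1143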
2.7.6 Analyses with Combinations
The arrangements listed in Table 2.8 represent the

$${}_{N}C_{n_1} = \frac{N!}{n_1!\,(N - n_1)!} = \frac{7!}{3!\,(7 - 3)!} = 35 \qquad (2.51)$$

combinations of the $N = 7$ observations listed in Table 2.7, not the

$${}_{N}P_{n_1} = \frac{N!}{(N - n_1)!} = \frac{7!}{(7 - 3)!} = 210 \qquad (2.52)$$

possible permutations. While permutation statistical methods are known by the attribution "permutation methods," they are generally not based on all permutations of the observed data. Instead, exact permutation methods are based on all possible combinations of arrangements of the observed data. Since, in general, there are fewer combinations than permutations of a dataset, analysis of combinations of the observed data greatly reduces the number of calculations required.
To illustrate the efficiency achieved by analyzing all combinations of the observed data instead of all permutations, consider $N = 10$ observations that are to be randomized into two groups, A and B, where $n_A = n_B = 5$ observations. Suppose that the purpose is to compare differences between the two groups, such as a mean difference. Let the $n_A = 5$ observations be designated as a through e and the $n_B = 5$ observations be designated as f through j. For Group A, the first observation can be chosen in 10 different ways, the second observation in nine ways, the third observation in eight ways, the fourth observation in seven ways, and the fifth observation in six ways. Once the five observations of Group A have been selected, the remaining five observations are assigned to Group B.
Of the $10 \times 9 \times 8 \times 7 \times 6 = 30{,}240$ ways in which the five observations can be arranged for Group A, each individual quintet of observations will appear in a series of permutations. Thus, the quintet $\{a, b, c, d, e\}$ can be permuted as $\{a, b, c, e, d\}$, $\{a, b, d, e, c\}$, $\{a, b, d, c, e\}$, and so on. Each permutation of the five observations will yield the same mean value. The number of different permutations for a group of five observations is $5! = 120$. Thus, each distinctive quintet will appear in 120 ways among the 30,240 possible arrangements. Therefore, 30,240 divided by 120 yields 252 distinctive quintets of observations that can be formed by dividing $N = 10$ observations into two groups of five observations each. The number of quintets can conveniently be expressed as

$$\frac{(n_A + n_B)!}{n_A!\, n_B!} = \frac{(5 + 5)!}{5!\, 5!} = 252\;. \qquad (2.53)$$

However, half of these 252 arrangements are similar, but opposite. Thus, a quintet such as $\{a, b, c, d, e\}$ might be assigned to Group A and the quintet $\{f, g, h, i, j\}$ might be assigned to Group B, or vice versa, yielding the same absolute mean
difference. Consequently, there are only $252/2 = 126$ distinctly different pairs of quintets to be considered. A substantial amount of calculation can be eliminated by considering all possible combinations of arrangements of the observed data instead of all permutations, with no loss of accuracy. Even in this small example, a reduction from 30,240 equally likely arrangements of the observed data to 126 arrangements constitutes a substantial increase in efficiency.
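The counts in this section can be checked in a few lines of Python; the sketch below is our own and is not part of the original text.

    from math import comb, factorial, perm

    # Ordered selections of 5 of 10 observations for Group A: 10 x 9 x 8 x 7 x 6
    print(perm(10, 5))                    # 30240
    # Each quintet appears in 5! = 120 orderings, so the distinct quintets number
    print(perm(10, 5) // factorial(5))    # 252, the same value as comb(10, 5)
    print(comb(10, 5))                    # 252
    # Complementary assignments (Groups A and B swapped) yield the same
    # absolute mean difference, leaving
    print(comb(10, 5) // 2)               # 126 distinct pairs of quintets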
2.7.7 A Second Exact Permutation Example
|
||
|
||
For a second example of an exact permutation analysis, consider a test of differences between means. Table 2.9 contains the per capita relief expenditures in 1831, in shillings, for $N = 22$ parishes in two counties in the United Kingdom: Southampton and Suffolk counties. In 1831, Southampton county consisted of $n_1 = 15$ parishes with a mean relief expenditure of

$$\bar{x}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{1i} = \frac{231.2565}{15} = 15.4171 \qquad (2.54)$$
Table 2.9 Average per capita relief expenditures for Southampton and Suffolk counties in shillings: 1831*

Southampton           Suffolk
Parish   Relief       Parish   Relief
1         6.731808    1        26.383673
2        16.156615    2        16.727664
3        14.760218    3        27.628032
4        15.057353    4        19.914255
5        11.001482    5        13.833671
6        29.089955    6        33.827534
7        11.818136    7        19.050737
8        16.002180
9        18.761256
10       32.443278
11       15.447992
12       15.756267
13        4.257547
14        8.611310
15       15.361136

* Table 2.9 is adapted from Johnston, Berry, and Mielke, "Quantitative historical methods: A permutation alternative" [99, p. 37]
shillings and a sample standard deviation of

$$s_1 = \left[ \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} \left( x_{1i} - \bar{x}_1 \right)^2 \right]^{1/2} = \left[ \frac{(6.731808 - 15.4171)^2 + \cdots + (15.361136 - 15.4171)^2}{15 - 1} \right]^{1/2} = 7.4081 \qquad (2.55)$$

shillings, and Suffolk county consisted of $n_2 = 7$ parishes with a mean relief expenditure of

$$\bar{x}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} x_{2i} = \frac{157.3656}{7} = 22.4808 \qquad (2.56)$$

shillings and a sample standard deviation of

$$s_2 = \left[ \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} \left( x_{2i} - \bar{x}_2 \right)^2 \right]^{1/2} = \left[ \frac{(26.383673 - 22.4808)^2 + \cdots + (19.050737 - 22.4808)^2}{7 - 1} \right]^{1/2} = 7.0321 \qquad (2.57)$$

shillings.

For the Southampton and Suffolk county relief data given in Table 2.9, the pooled estimate of the population variance is
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{N - 2} = \frac{(15 - 1)(7.4081)^2 + (7 - 1)(7.0321)^2}{22 - 2} = 53.2511\;, \qquad (2.58)$$

the value of Student's two-sample pooled test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\left[ s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right) \right]^{1/2}} = \frac{15.4171 - 22.4808}{\left[ (53.2511) \left( \dfrac{1}{15} + \dfrac{1}{7} \right) \right]^{1/2}} = \frac{-7.0637}{3.3403} = -2.1147\;, \qquad (2.59)$$
and with $n_1 + n_2 - 2 = 15 + 7 - 2 = 20$ degrees of freedom, the two-tail probability value under the null hypothesis is $P = 0.0472$.

There are only
$$M = \frac{(n_1 + n_2)!}{n_1!\, n_2!} = \frac{(15 + 7)!}{15!\, 7!} = 170{,}544 \qquad (2.60)$$

possible, equally likely arrangements of the Southampton and Suffolk county relief data given in Table 2.9, making an exact permutation analysis feasible. Exactly 9010 $|t|$ values in the reference set of the $M = 170{,}544$ possible $|t|$ values are equal to or more extreme than the observed value of $|t| = 2.1147$, yielding an exact two-tail probability value under the null hypothesis of
$$P(|t| \ge |t_o|) = \frac{\text{number of } |t| \text{ values} \ge |t_o|}{M} = \frac{9010}{170{,}544} = 0.0528\;, \qquad (2.61)$$

where $|t_o|$ denotes the observed value of $|t|$.

When the assumptions of independence, random sampling, normality, and homogeneity of variance underlying Student's two-sample t-test have been satisfied, the asymptotic and exact probability values often tend to agreement. In this case, the difference between the asymptotic and exact probability values is only $\Delta P = 0.0528 - 0.0472 = 0.0056$.
2.7.8 Monte Carlo Permutation Methods
As sample sizes increase, the size of the reference set of all possible arrangements of the observed data can become quite large, and exact permutation methods are quickly rendered impractical. For example, permuting two independent, relatively small samples of sizes $n_1 = n_2 = 35$ yields

$$M = \frac{(n_1 + n_2)!}{n_1!\, n_2!} = \frac{(35 + 35)!}{35!\, 35!} = 112{,}186{,}277{,}816{,}662{,}845{,}432 \qquad (2.62)$$

equally likely arrangements of the data, or in words, approximately 112 billion billion different arrangements of the data: too many statistical values to compute in a reasonable amount of time.
When exact permutation procedures become intractable, a random subset of all possible arrangements of the data can be analyzed, providing approximate, but highly accurate, probability values. Monte Carlo permutation methods generate and examine a random subset of all possible, equally likely arrangements of the data. As with exact permutation statistics, a test statistic is calculated on the observed data.
Then, in lieu of calculating a test statistic value for all possible arrangements of the data, a test statistic is calculated for each randomly selected arrangement of the data.

The probability of obtaining the observed test statistic, or one more extreme, is the proportion of the randomly selected test statistics that are equal to or more extreme than the observed test statistic. Given a sufficient number of randomly selected arrangements of the data, a Monte Carlo probability value can be computed to any reasonable accuracy. Provided the probability value is not too small, the current recommended practice is to use $L = 1{,}000{,}000$ randomly selected arrangements of the data to ensure a probability value with three decimal places of accuracy. To ensure four decimal places of accuracy, the number of randomly selected arrangements must be increased by two orders of magnitude; that is, $L = 100{,}000{,}000$ [98].
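As an illustration of the procedure just described, here is a minimal Monte Carlo sketch in Python; it is our own construction rather than the authors' software, and the function names are ours. It shuffles the pooled data L times and counts how often the permuted $|t|$ equals or exceeds the observed $|t|$.

    import random
    from math import sqrt

    def pooled_t(g1, g2):
        """Student's two-sample pooled t test statistic."""
        m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
        ss = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
        sp2 = ss / (len(g1) + len(g2) - 2)
        return (m1 - m2) / sqrt(sp2 * (1 / len(g1) + 1 / len(g2)))

    def monte_carlo_p(g1, g2, L=1_000_000, seed=1):
        """Approximate two-tail permutation probability value for |t|."""
        rng = random.Random(seed)
        pooled, n1 = list(g1) + list(g2), len(g1)
        t_obs = abs(pooled_t(g1, g2))
        extreme = 0
        for _ in range(L):
            rng.shuffle(pooled)           # one random arrangement of the data
            if abs(pooled_t(pooled[:n1], pooled[n1:])) >= t_obs - 1e-12:
                extreme += 1
        return extreme / L

    # Toy usage; applied to the Table 2.10 data with L = 1,000,000, the same
    # procedure should closely approximate the exact value reported in the
    # example that follows.
    print(monte_carlo_p([3, 4, 6], [5, 7, 8, 9], L=100_000))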
Meyer Dwass is usually credited with the formal development of Monte Carlo permutation methods, first presented in an article on “Modified randomization tests for nonparametric hypotheses” published in The Annals of Mathematical Statistics in 1957 [48]. Dwass provided the first rigorous investigation into the accuracy of Monte Carlo probability approximations, although Dwass relied heavily on the theoretical contributions of an article titled “On the theory of some non-parametric hypotheses” by Erich Lehmann and Charles Stein published in The Annals of Mathematical Statistics in 1949 [125].
Presently, Monte Carlo permutation methods are the method of choice for most researchers, with exact permutation tests reserved for smaller datasets. There are three notable advantages to Monte Carlo permutation tests. First, Monte Carlo permutation tests are highly efficient given the ready availability of high-speed computers and the recent development of rapid pseudorandom number generators such as the Mersenne Twister, on which Monte Carlo permutation tests are highly dependent. Second, in some applications a Monte Carlo permutation test is much more efficient than an exact permutation test, even for small samples. For example, in the permutation analysis of contingency tables, an exact permutation test must calculate a hypergeometric point probability value for each of, potentially, thousands of cell frequency arrangements, while a Monte Carlo permutation test need only count the number of cell arrangements as extreme or more extreme than the observed cell arrangement. Third, algorithms for exact permutation tests are nonexistent or completely impractical for analyzing certain problems, such as multi-way contingency tables, while an efficient Monte Carlo algorithm is presently available for multi-way tables [153].
2.7.9 A Monte Carlo Permutation Example
To illustrate a Monte Carlo permutation analysis, consider Table 2.10 containing the per capita relief expenditures in 1831, in shillings, for $N = 36$ parishes in two counties in the United Kingdom: Oxford and Hertford counties. In 1831, Oxford county consisted of $n_1 = 24$ parishes with a mean relief expenditure of

$$\bar{x}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{1i} = \frac{486.6384}{24} = 20.2766 \qquad (2.63)$$
Table 2.10 Average per capita relief expenditures for Oxford and Hertford counties in shillings: 1831*

Oxford                Hertford
Parish   Relief       Parish   Relief
1        20.361860    1        27.974783
2        29.086095    2         6.417284
3        14.931757    3        10.484120
4        24.123211    4        10.005750
5        18.207501    5         9.769865
6        20.728732    6        15.866521
7         8.119472    7        19.342360
8        14.020071    8        17.145218
9        18.424789    9        13.134206
10       34.546600    10       10.041964
11       16.092713    11       15.083824
12       24.616592    12        6.398451
13       25.468298
14       12.563194
15       13.278003
16       27.302973
17       29.605508
18       13.613192
19       11.371418
20       21.524807
21       20.940801
22       11.595229
23       18.235469
24       37.880889

* Table 2.10 is adapted from Johnston, Berry, and Mielke, "Quantitative historical methods: A permutation alternative" [99, p. 37]
shillings and a sample standard deviation of

$$s_1 = \left[ \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} \left( x_{1i} - \bar{x}_1 \right)^2 \right]^{1/2} = \left[ \frac{(20.361860 - 20.2766)^2 + \cdots + (37.880889 - 20.2766)^2}{24 - 1} \right]^{1/2} = 7.6408 \qquad (2.64)$$

shillings, and Hertford county consisted of $n_2 = 12$ parishes with a mean relief expenditure of

$$\bar{x}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} x_{2i} = \frac{161.6640}{12} = 13.4720 \qquad (2.65)$$

shillings and a sample standard deviation of

$$s_2 = \left[ \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} \left( x_{2i} - \bar{x}_2 \right)^2 \right]^{1/2} = \left[ \frac{(27.974783 - 13.4720)^2 + \cdots + (6.398451 - 13.4720)^2}{12 - 1} \right]^{1/2} = 6.1270 \qquad (2.66)$$

shillings.

For the Oxford and Hertford county relief data given in Table 2.10, the pooled estimate of the population variance is
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{N - 2} = \frac{(24 - 1)(7.6408)^2 + (12 - 1)(6.1270)^2}{36 - 2} = 51.6389\;, \qquad (2.67)$$

the value of Student's two-sample pooled test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\left[ s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right) \right]^{1/2}} = \frac{20.2766 - 13.4720}{\left[ (51.6389) \left( \dfrac{1}{24} + \dfrac{1}{12} \right) \right]^{1/2}} = \frac{6.8046}{2.5406} = +2.6783\;, \qquad (2.68)$$
and with $n_1 + n_2 - 2 = 24 + 12 - 2 = 34$ degrees of freedom, the two-tail probability value under the null hypothesis is $P = 0.0113$.

There are
$$M = \frac{(n_1 + n_2)!}{n_1!\, n_2!} = \frac{(24 + 12)!}{24!\, 12!} = 1{,}251{,}677{,}700 \qquad (2.69)$$

possible, equally likely arrangements of the Oxford and Hertford county relief data given in Table 2.10, making an exact permutation analysis impractical. Based on $L = 1{,}000{,}000$ random arrangements of the observed data with $n_1 = 24$ and $n_2 = 12$ preserved for each arrangement, exactly 8478 of the calculated $|t|$ values are equal to or more extreme than the realized value of $|t| = 2.6783$, yielding a Monte Carlo probability value under the null hypothesis of
$$P(|t| \ge |t_o|) = \frac{\text{number of } |t| \text{ values} \ge |t_o|}{L} = \frac{8478}{1{,}000{,}000} = 0.0085\;, \qquad (2.70)$$

where $|t_o|$ denotes the observed value of $|t|$.

While an exact permutation analysis is impractical for the Oxford and Hertford county relief data in Table 2.10, it is not impossible. The exact probability value based on all $M = 1{,}251{,}677{,}700$ possible arrangements of the observed data is

$$P(|t| \ge |t_o| \,|\, H_0) = \frac{\text{number of } |t| \text{ values} \ge |t_o|}{M} = \frac{10{,}635{,}310}{1{,}251{,}677{,}700} = 0.0085\;.$$

More precisely, the Monte Carlo probability value, based on 1,000,000 random arrangements of the observed data, is $P = 0.008478$ and the exact probability value, based on all 1,251,677,700 possible arrangements of the observed data, is $P = 0.008497$. The difference is only $\Delta P = 0.008497 - 0.008478 = 0.000019$, thereby demonstrating the precision possible with Monte Carlo permutation methods. In this example analysis, the difference between the asymptotic and exact probability values is only $\Delta P = 0.011315 - 0.008497 = 0.002818$.
Summary
Chapter 2 provided definitions and illustrative examples of the concepts of connections, equivalencies, and relationships. Next, two models of statistical inference were introduced: the population model first put forward by Jerzy Neyman and Egon Pearson in 1928 and the permutation model developed by R.A. Fisher, R.C. Geary, T. Eden, F. Yates, H. Hotelling, M.R. Pabst, and E.J.G. Pitman in the 1920s and 1930s. Exact and Monte Carlo permutation statistical methods were described and compared with conventional parametric tests, and the assumptions of independence, random sampling, normality, homogeneity of variance, and homogeneity of covariance were examined.
Preview of Chap. 3
Chapter 3 describes connections, equivalencies, and relationships relating to one-sample tests of null hypotheses. First, Student's conventional one-sample t-test is described. Second, a permutation one-sample test is presented and the connections linking the two tests are established. An example analysis illustrates the differences in the two approaches and the connections linking the two tests. Third, measures of effect size for one-sample tests are presented for both Student's one-sample t-test and the permutation one-sample test and the connections linking the various measures are demonstrated. Fourth, Wilcoxon's nonparametric one-sample signed-rank test is introduced for rank-score data and illustrated with an example analysis. A permutation alternative to Wilcoxon's test is described and the connections linking the two tests are established. Finally, the connections linking a conventional one-sample z-test for proportions and Pearson's chi-squared goodness-of-fit test are described and illustrated with an example analysis.
Chapter 3
One-Sample Tests
This chapter describes connections, equivalencies, and relationships relating to one-sample tests of null hypotheses. Under the Neyman–Pearson model of statistical inference, one-sample tests evaluate a null hypothesis that posits a value for a population mean; for example, $H_0\colon \mu_x = 0$ or $H_0\colon \mu_x = 100$. One-sample tests are the most elementary of a large family of statistical tests and provide an introduction to the two-sample and multi-sample tests presented in later chapters.

The chapter is organized as follows. First, Student's conventional one-sample t-test is described. Second, a permutation one-sample test is presented and the connections linking the two tests are established. An example analysis illustrates the differences in the two approaches and the connections linking the two methods. Third, measures of effect size for one-sample tests are presented for Student's one-sample t-test and a permutation one-sample test. The connections linking the various measures of effect size are described. Fourth, a second abbreviated example illustrates the connections linking Student's t-test statistic, the permutation test statistic, and three measures of effect size: Cohen's $\hat{d}$, Pearson's $r^2$, and Mielke and Berry's $\Re$. Fifth, Wilcoxon's signed-rank test is introduced and illustrated with a small dataset. A permutation alternative to Wilcoxon's signed-rank test is described and the connections linking the two tests are established. Finally, the connections linking a conventional z-test for proportions and Pearson's chi-squared test of goodness of fit are described and illustrated with an example analysis.
3.1 Introduction
In many introductory textbooks, two one-sample tests are described: a one-sample z-test and a one-sample t-test. The distinction between the two tests is based on the well-known relationship between Student's t-distribution and the unit-normal z-distribution: Student's t-distribution approaches the z-distribution as degrees of freedom (df) $\to \infty$. The difference between a one-sample z-test and a one-sample t-test is based on knowledge of the population standard deviation, $\sigma_x$. If $\sigma_x$ is known, then the sampling distribution of sample means is approximated by the z-distribution, but if $\sigma_x$ is not known and is estimated by the sample standard deviation, $s_x$, then the sampling distribution of sample means is approximated by Student's t-distribution with df given by $N - 1$, where N denotes the sample size.
Bowing to convention, in this book the population standard deviation for variable x is denoted by $\sigma_x$ and the sample standard deviation is denoted by $s_x$. Technically,

$$s_x = \left[ \frac{1}{N - 1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 \right]^{1/2}$$

is the estimated population standard deviation, sometimes denoted by $\hat{\sigma}_x$, especially in the older literature. Most modern textbooks designate $s_x$ as the sample standard deviation, so the authors follow this convention in the book. Occasionally, the true sample standard deviation given by

$$S_x = \left[ \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 \right]^{1/2}$$

is used for permutation methods, where the sum of squared deviations is divided by N instead of $N - 1$, and is identified in the text by an upper-case letter S. The reason that the sum of squared deviations is divided by N instead of $N - 1$ is that degrees of freedom are not relevant to permutation statistical methods.
The one-sample test statistics for z and Student's t are given by

$$z = \frac{\bar{x} - \mu_x}{\sigma_{\bar{x}}} \qquad \text{and} \qquad t = \frac{\bar{x} - \mu_x}{s_{\bar{x}}}\;, \qquad (3.1)$$

respectively, where $\bar{x}$ denotes the sample mean, $\mu_x$ denotes the hypothesized population mean, and

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}} \qquad \text{and} \qquad s_{\bar{x}} = \frac{s_x}{\sqrt{N}} \qquad (3.2)$$
denote the population standard error of $\bar{x}$ and the estimated population standard error of $\bar{x}$, respectively. The single source of variability for test statistic z is the sample mean ($\bar{x}$), as $\mu_x$, $\sigma_x$, and N are constants. In contrast, there are two sources of variability for test statistic t: the sample mean ($\bar{x}$) and the estimated population standard deviation ($s_x$). Thus, the t-distribution typically exhibits more variability than the z-distribution and tends to have heavier tails.

In this book on statistical connections and relationships, one-sample z-tests are not presented, as the population standard deviation is generally not known for statistical data pertaining to experiments, clinical trials, field research, and surveys. The one-sample z-test finds its application, most generally, in testing and measurement, where $\sigma_x$ is usually known or otherwise established, such as IQ, SAT, ACT, GMAT, and other standardized tests.

The 10 sections of Chap. 3 present Student's conventional one-sample t-test and a permutation alternative to Student's test, appropriate measures of effect size for one-sample tests, the calculation of both exact and Monte Carlo probability values for these tests, conventional and permutation statistical methods applied to one-sample rank-score data, and a comparison of z and $\chi^2$ tests for proportion or percentage data. Section 3.2 describes Student's conventional one-sample t-test. Section 3.3 presents a permutation alternative to Student's one-sample test. Section 3.4 describes connections linking Student's one-sample t-test and a permutation one-sample alternative. Section 3.5 illustrates a permutation analysis of a one-sample test and the construction of an exact probability value.

Measures of effect size express the clinical or substantive significance of a difference between a sample statistic (often a mean) and a hypothesized population parameter, in contrast to the statistical significance of the difference. Section 3.6 describes three measures of effect size for one-sample tests: Cohen's $\hat{d}$, Pearson's $r^2$, and Mielke and Berry's $\Re$. Section 3.7 illustrates the connections linking Student's one-sample t-test and a permutation one-sample test, and the connections linking effect-size measures $\hat{d}$, $r^2$, and $\Re$. Section 3.8 provides further details of the connections linking Cohen's $\hat{d}$, Pearson's $r^2$, and Mielke and Berry's $\Re$. Section 3.9 introduces permutation statistical methods for rank-score data, connecting Wilcoxon's one-sample signed-rank test with a permutation one-sample rank-score test. Section 3.10 presents a comparison between test statistic z and Pearson's $\chi^2$ for testing sample proportions or percentages under both the Neyman–Pearson population model and the Fisher–Pitman permutation model and details the connections linking z and Pearson's $\chi^2$ for one-sample tests.
3.2 Student’s One-Sample t Test
One-sample tests of hypotheses are an important component of contemporary research. The conventional one-sample test is Student's t-test, wherein the null hypothesis ($H_0$) posits a value for a population parameter, such as a population mean, from which a random sample has been drawn, i.e., $H_0\colon \mu_x = \theta$, where $\theta$ is a specified population parameter. For example, the null hypothesis might state that the average IQ score in the population from which a sample has been randomly drawn is $H_0\colon \mu_x = 100$. The test does not determine whether or not the null hypothesis is true, but only provides the probability that, if the null hypothesis is true, the sample has been randomly drawn from a population with the specified value.

A null hypothesis such as given by $H_0\colon \mu_x = \theta$ with an alternative hypothesis given by $H_1\colon \mu_x \ne \theta$ defines a nondirectional, two-sided, or two-tail test wherein, under the Neyman–Pearson population model, rejection regions corresponding to $\alpha/2$ are defined for each tail of the distribution. Alternatively, directional, one-sided, or one-tail null and alternative hypotheses may be defined: $H_0\colon \mu_x \le \theta$ and $H_1\colon \mu_x > \theta$, or $H_0\colon \mu_x \ge \theta$ and $H_1\colon \mu_x < \theta$, wherein the rejection region corresponding to $\alpha$ is defined for only one tail of the distribution: the right tail of Student's t-distribution for alternative hypothesis $H_1\colon \mu_x > \theta$ and the left tail of Student's t-distribution for alternative hypothesis $H_1\colon \mu_x < \theta$. In this book, nondirectional or two-sided tests are employed almost exclusively, as two-sided tests are most common in the contemporary literature.
For a one-sample test with N observations, Student's t-test statistic is given by

$$t = \frac{\bar{x} - \theta}{s_{\bar{x}}}\;, \qquad (3.3)$$

where $\theta$ denotes a value for a hypothesized population mean, i.e., $H_0\colon \mu_x = \theta$, $\bar{x}$ denotes the arithmetic mean of the observed sample values given by

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i\;, \qquad (3.4)$$

$x_i$ denotes the ith sample value for $i = 1, \ldots, N$, $s_{\bar{x}}$ denotes the sample-estimated standard error of $\bar{x}$ given by

$$s_{\bar{x}} = \frac{s_x}{\sqrt{N}}\;, \qquad (3.5)$$

and $s_x$ denotes the sample standard deviation of variable x given by

$$s_x = \left[ \frac{1}{N - 1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 \right]^{1/2}. \qquad (3.6)$$

The assumptions underlying Student's one-sample t-test are as follows: (1) the observations are independent, (2) the data are a random sample from a well-defined population, and (3) target variable x is normally distributed in the population.
3.3 A Permutation One-Sample Test
Consider a one-sample test under the Fisher–Pitman permutation model of statistical inference. A permutation alternative to Student's conventional one-sample t-test based on paired differences between sample values is easily defined [15, pp. 103–108]. Let $x_i$ denote the sample values for $i = 1, \ldots, N$. Permutation test statistic $\delta$ for a one-sample test is given by

$$\delta = \binom{N}{2}^{-1} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left| x_i - x_j \right|^v\;, \qquad (3.7)$$

where the scaling function is set to $v = 2$ for correspondence with Student's one-sample t-test. Test statistic $\delta$ is the principal test statistic underlying a large variety of permutation statistical methods. Statistic $\delta$ provides an average of paired values raised to the v power, where v is typically set to 1 for ordinary scaling or set to 2 for squared scaling, as described in Chap. 2.

Under the Fisher–Pitman null hypothesis, the exact probability value of an observed $\delta$ is the proportion of $\delta$ test statistic values calculated on all possible arrangements of the data that are equal to or less than $\delta$, i.e.,

$$P(\delta \le \delta_o \,|\, H_0) = \frac{\text{number of } \delta \text{ values} \le \delta_o}{M}\;, \qquad (3.8)$$

where $\delta_o$ denotes the observed value of permutation test statistic $\delta$ and M represents the number of possible, equally likely arrangements of the data. In words, the probability of obtaining a test statistic value that is equal to or less than $\delta_o$ under the null hypothesis ($H_0$) is the proportion of the M possible $\delta$ values equal to or less than $\delta_o$. For a one-sample test, the number of possible, equally likely arrangements of the data is given by

$$M = 2^{N-1}\;. \qquad (3.9)$$

For example, given a random sample of $N = 30$ observations, the number of possible arrangements of the data is

$$M = 2^{N-1} = 2^{30-1} = 536{,}870{,}912\;. \qquad (3.10)$$

When M is large, as in Eq. (3.10), an approximate probability value for test statistic $\delta$ may be obtained from a Monte Carlo permutation procedure, i.e.,

$$P(\delta \le \delta_o \,|\, H_0) = \frac{\text{number of } \delta \text{ values} \le \delta_o}{L}\;, \qquad (3.11)$$

where L denotes the number of randomly sampled test statistic values. Typically, L is set to a large number to ensure accuracy, e.g., $L = 1{,}000{,}000$ [98].
3.4 Connections Linking Statistics t and δ
Student's one-sample t-test statistic and permutation test statistic $\delta$ are directly connected. However, whenever a one-sample t-test posits a value for the population mean other than zero, comparing Student's test statistic to a permutation test statistic requires a correction to compensate for the hypothesized mean value. Thus, consider Student's one-sample t-test given by

$$t = \frac{\bar{x} - \theta}{s_{\bar{x}}}\;. \qquad (3.12)$$

Then,

$$t^2 = \frac{(\bar{x} - \theta)^2}{s_{\bar{x}}^2} = \frac{\bar{x}^2 - 2\bar{x}\theta + \theta^2}{s_{\bar{x}}^2} = \frac{\bar{x}^2}{s_{\bar{x}}^2} - \frac{\theta(2\bar{x} - \theta)}{s_{\bar{x}}^2}\;, \qquad (3.13)$$

where the term $\bar{x}^2 / s_{\bar{x}}^2$ represents Student's squared test statistic when $\theta = 0$. Let C represent the correction factor given by the last term in Eq. (3.13), i.e.,

$$C = \frac{\theta(2\bar{x} - \theta)}{s_{\bar{x}}^2}\;. \qquad (3.14)$$

Then, under the Neyman–Pearson null hypothesis, $H_0\colon \mu_x = \theta$, with $\theta \ne 0$ and $v = 2$, the connections linking permutation test statistic $\delta$ and Student's one-sample t-test statistic are given by

$$\delta = \frac{2 \displaystyle\sum_{i=1}^{N} x_i^2}{t^2 + N - 1 + C} \qquad (3.15)$$

and

$$t = \left( \frac{2}{\delta} \sum_{i=1}^{N} x_i^2 - N + 1 - C \right)^{1/2}. \qquad (3.16)$$

Note that permutation test statistic $\delta$ and Student's one-sample t-test statistic are inversely related, as is evident in Eqs. (3.15) and (3.16). Thus, as test statistic t increases, $\delta$ decreases, and vice versa.
3.5 Test Statistics t and δ
In this section a permutation analysis for a one-sample test is detailed step by step to illustrate the calculation of permutation test statistic $\delta$ and the exact probability of $\delta$. An example analysis serves to illustrate the connections linking test statistics t and $\delta$ for a one-sample test. Illustrations of statistical relationships are best accomplished with very small datasets. The small datasets analyzed in this book are designed to demonstrate the connections and relationships linking tests and measures, and any differences in results should be taken as illustrative. Moreover, the small datasets employed throughout the various chapters of the book oftentimes permit the listing of all permutations of the data, enabling detailed illustrations of exact permutation statistical methods.
3.5.1 A Conventional Analysis
Consider the small dataset with $N = 6$ observations given in Table 3.1 and let the null hypothesis be $H_0\colon \mu_x = 3.00$. For the example data listed in Table 3.1 under the Neyman–Pearson population model of statistical inference, the sample mean of variable x is

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i = \frac{9 + 8 + 7 + 6 + 4 + 2}{6} = 6.00\;, \qquad (3.17)$$

the sample standard deviation of variable x is

$$s_x = \left[ \frac{1}{N - 1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 \right]^{1/2} = \left[ \frac{(9 - 6.00)^2 + (8 - 6.00)^2 + \cdots + (2 - 6.00)^2}{6 - 1} \right]^{1/2} = 2.6077\;, \qquad (3.18)$$

the sample-estimated standard error of $\bar{x}$ is

$$s_{\bar{x}} = \frac{s_x}{\sqrt{N}} = \frac{2.6077}{\sqrt{6}} = 1.0646\;, \qquad (3.19)$$
Table 3.1 Example dataset for a one-sample test with $N = 6$ observations

Variable   Scores
x          9  8  7  6  4  2
Student's one-sample t-test statistic is

$$t = \frac{\bar{x} - \mu_x}{s_{\bar{x}}} = \frac{6.00 - 3.00}{1.0646} = +2.8180\;, \qquad (3.20)$$

and following Eq. (3.14), the correction factor is

$$C = \frac{\mu_x(2\bar{x} - \mu_x)}{s_{\bar{x}}^2} = \frac{(3.00)\left[ (2)(6.00) - 3.00 \right]}{(1.0646)^2} = 23.8227\;. \qquad (3.21)$$

Under the Neyman–Pearson null hypothesis, $H_0\colon \mu_x = 3.00$, test statistic t is asymptotically distributed as Student's t with $N - 1$ degrees of freedom. With $N - 1 = 6 - 1 = 5$ degrees of freedom, the two-tail probability value of $t = +2.8180$ is $P = 0.0372$, under the assumptions of independence, random sampling, and normality.
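Readers who wish to verify the conventional analysis can do so with SciPy's standard one-sample t routine; the short sketch below is our own illustration, not part of the original text.

    from scipy import stats

    data = [9, 8, 7, 6, 4, 2]                 # Table 3.1
    result = stats.ttest_1samp(data, popmean=3.00)
    print(round(result.statistic, 4))         # 2.818
    print(round(result.pvalue, 4))            # 0.0372, the two-tail probability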
Technically, under the Neyman–Pearson population model of statistical inference, probability values have no place. As Hubbard noted:

  The p value is not a Type I error rate, long-run or otherwise; it is a measure of inductive evidence against $H_0$. Type I errors play no role in Fisher's paradigm. [95]

In the Neyman–Pearson decision model, the researcher is only allowed to say whether or not the result fell in the critical region, not where it fell, as might be indicated by a probability value [95, p. 309]. Thus, if the Type I error rate, $\alpha$, is fixed at its usual 0.05 value, the study is carried out, and the researcher subsequently obtains a probability value of, say, $P = 0.0014$, this exact value cannot be reported in a Neyman–Pearson hypothesis test [165]. As Gigerenzer noted:

  For Fisher, the exact level of significance is a property of the data (i.e., a relation between a body of data and a theory); for Neyman and Pearson, alpha is a property of the test, not of the data. Level of significance [p value] and alpha are not the same thing. [74, p. 317]

A proper analysis of the data listed in Table 3.1 would, for example, provide a null hypothesis, $H_0\colon \mu_x = 3.00$; an alternative hypothesis, $H_1\colon \mu_x \ne 3.00$; a probability of Type I error, $\alpha = 0.05$; degrees of freedom, $N - 1 = 6 - 1 = 5$; and a critical value, $\pm 2.571$. Since the observed t-value is $t = +2.8180$, the null hypothesis would be rejected, i.e., $2.8180 > 2.571$. In breaking with the Neyman–Pearson model, computer statistical packages provide probability values instead of alpha levels.
Following modern convention, the authors have chosen to provide probability values under the Neyman–Pearson population model for two reasons. First, it is convenient to compare probability values under the Neyman–Pearson population model and the Fisher–Pitman permutation model. Second, while permutation methods are considered the gold standard for evaluating conventional statistical analyses, as the reader will see, conventional methods compare favorably throughout the book for a large variety of statistical tests and measures, even with the very small samples utilized in many of the example analyses.
3.5.2 A Permutation Analysis
For the data listed in Table 3.1 with $N = 6$ observations, under the Fisher–Pitman permutation model, employing squared scaling with $v = 2$ for correspondence with Student's one-sample t-test, the sum of the squared absolute differences between all pairs of observations is

$$\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left| x_i - x_j \right|^2 = |9 - 8|^2 + |9 - 7|^2 + |9 - 6|^2 + |9 - 4|^2 + |9 - 2|^2 + |8 - 7|^2 + |8 - 6|^2 + |8 - 4|^2 + |8 - 2|^2 + |7 - 6|^2 + |7 - 4|^2 + |7 - 2|^2 + |6 - 4|^2 + |6 - 2|^2 + |4 - 2|^2 = 204 \qquad (3.22)$$

and following Eq. (3.7) on p. 59 with scaling function $v = 2$, permutation test statistic $\delta$ is

$$\delta = \binom{N}{2}^{-1} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left| x_i - x_j \right|^2 = \binom{6}{2}^{-1} (204) = \frac{(2)(204)}{(6)(6 - 1)} = 13.60\;. \qquad (3.23)$$

Alternatively, permutation test statistic $\delta$ can be defined in terms of the sum of squared deviations from the mean of the observations when scaling function $v = 2$ is adopted. Thus, with sample mean $\bar{x} = 6.00$ as given in Eq. (3.17), permutation test statistic $\delta$ is

$$\delta = \frac{2}{N - 1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 = \frac{(2)\left[ |9 - 6.00|^2 + |8 - 6.00|^2 + |7 - 6.00|^2 + \cdots + |2 - 6.00|^2 \right]}{6 - 1} = \frac{(2)(34.00)}{5} = 13.60\;. \qquad (3.24)$$
3.5.3 Connections Linking Statistics δ and t
Following the expressions given in Eqs. (3.15) and (3.16) on p. 60 for the connections linking test statistics $\delta$ and t, permutation test statistic $\delta$, defined in terms of Student's one-sample t-test statistic, is

$$\delta = \frac{2 \displaystyle\sum_{i=1}^{N} x_i^2}{t^2 + N - 1 + C} = \frac{(2)(9^2 + 8^2 + 7^2 + 6^2 + 4^2 + 2^2)}{(+2.8180)^2 + 6 - 1 + 23.8227} = 13.60 \qquad (3.25)$$

and Student's one-sample t-test statistic, defined in terms of permutation test statistic $\delta$, is

$$t = \left( \frac{2}{\delta} \sum_{i=1}^{N} x_i^2 - N + 1 - C \right)^{1/2} = \left[ \frac{(2)(9^2 + 8^2 + 7^2 + 6^2 + 4^2 + 2^2)}{13.60} - 6 + 1 - 23.8227 \right]^{1/2} = \pm 2.8180\;. \qquad (3.26)$$

Because of the inverse relationship between test statistics $\delta$ and t, as detailed in Eqs. (3.15) and (3.16), the probability value for permutation test statistic $\delta$ is taken from the lower (left) tail of the permutation distribution and the probability value for $|t|$ is taken from both tails of Student's t-distribution. Thus, the probability values given by

$$P(\delta \le \delta_o) = \frac{\text{number of } \delta \text{ values} \le \delta_o}{M} \qquad (3.27)$$
and
$$P(|t| \ge |t_o|) = \frac{\text{number of } |t| \text{ values} \ge |t_o|}{M} \qquad (3.28)$$

are equivalent under the Fisher–Pitman null hypothesis, where $\delta_o$ and $|t_o|$ denote the observed values of test statistics $\delta$ and $|t|$, respectively, and M is the number of possible, equally likely arrangements of the data.

To establish the exact probability of $\delta = 13.60$ (or $t = +2.8180$) under the Fisher–Pitman permutation model, it is necessary to enumerate all possible arrangements of the data, of which there are only

$$M = 2^{N-1} = 2^{6-1} = 32 \qquad (3.29)$$

possible, equally likely arrangements in the reference set of all permutations of the data.

For test statistic $\delta$, there is only one test statistic value that is equal to or less than $\delta = 13.60$. Then, if all M arrangements of the $N = 6$ observations occur with equal chance under the Fisher–Pitman null hypothesis, the exact probability of $\delta = 13.60$ is

$$P(\delta \le \delta_o) = \frac{\text{number of } \delta \text{ values} \le \delta_o}{M} = \frac{1}{32} = 0.0313\;, \qquad (3.30)$$

where $\delta_o$ denotes the observed value of test statistic $\delta$ and M is the number of possible, equally likely arrangements of the data, as illustrated in Table 3.2, where the row containing the lowest (highest) $\delta$ ($|t|$) value is indicated with an asterisk and the data arrangement yielding the lowest $\delta$ value and the highest $|t|$ value is underscored.

Alternatively, for test statistic t, there is only one $|t|$ value that is equal to or greater than $|t| = 2.8180$. Then, if all $M = 32$ arrangements of the $N = 6$ observations occur with equal chance, the exact probability value under the Fisher–Pitman null hypothesis is

$$P(|t| \ge |t_o|) = \frac{\text{number of } |t| \text{ values} \ge |t_o|}{M} = \frac{1}{32} = 0.0313\;, \qquad (3.31)$$

where $|t_o|$ denotes the observed value of test statistic $|t|$.
When the assumptions of independence, random sampling, and normality underlying Student's one-sample t-test have been satisfied, the asymptotic and exact probability values often tend to agreement. In this case, the difference between the asymptotic and exact probability values is quite small, i.e., $\Delta P = 0.0372 - 0.0313 = 0.0059$.
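The exact permutation analysis of this section is straightforward to reproduce. The sketch below is our own construction (not the authors' code); it enumerates the $M = 2^{N-1} = 32$ sign arrangements of the scores, computes $\delta$ for each, and recovers the exact probability value of Eq. (3.30).

    from itertools import product

    data = [9, 8, 7, 6, 4, 2]                 # Table 3.1
    N = len(data)

    def delta(values):
        """Permutation test statistic delta with squared scaling (v = 2)."""
        m = sum(values) / len(values)
        return 2 * sum((x - m) ** 2 for x in values) / (len(values) - 1)

    d_obs = delta(data)                       # 13.60 for the observed scores

    # Fixing the first sign at +1 enumerates one member of each complementary
    # pair of sign arrangements, giving M = 2**(N - 1) = 32 distinct delta values.
    M = extreme = 0
    for signs in product([1, -1], repeat=N - 1):
        y = [data[0]] + [s * x for s, x in zip(signs, data[1:])]
        M += 1
        if delta(y) <= d_obs + 1e-12:         # lower-tail comparison
            extreme += 1

    print(M, extreme, extreme / M)            # 32, 1, and 1/32 = 0.03125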
Table 3.2 Calculation of $\delta$ and $|t|$ values for $x_1 = 9$, $x_2 = 8$, $x_3 = 7$, $x_4 = 6$, $x_5 = 4$, and $x_6 = 2$; each numbered row is one of the $M = 2^{N-1} = 32$ equally likely sign arrangements of the observed scores

Number      δ        |t|      Number      δ        |t|
1* a    13.6000    2.8180     17      93.3333    1.6733
2       31.7333    1.4349     18      93.3333    0.4781
3       47.7444    0.8357     19      93.3333    1.6733
4       61.6000    0.4414     20      95.7333    0.5901
5       61.6000    0.4414     21      95.7333    0.5901
6       67.7333    0.2806     22      97.6000    0.7013
7       73.3333    0.1348     23      97.6000    1.4026
8       73.3333    0.1348     24      97.6000    0.7013
9       78.4000    0.0000     25      97.6000    1.4026
10      78.4000    0.0000     26      98.9333    0.8126
11      82.9333    0.1268     27      98.9333    0.8126
12      82.9333    0.1268     28      99.7333    0.9250
13      86.9333    0.2477     29      99.7333    1.1562
14      86.9333    0.2477     30      99.7333    1.1562
15      90.4000    1.8217     31      99.7333    0.9250
16      90.4000    1.8217     32     100.0000    1.0392

* Indicates the arrangement for which the permuted $\delta$ value is equal to or less than the observed $\delta$ value and the permuted $|t|$ value is equal to or greater than the observed $|t|$ value
a Row 1 is the observed arrangement of scores, $+9, +8, +7, +6, +4, +2$
3.6 The Measurement of Effect Size
The fact that a statistical test produces a low probability value indicates only that there are differences among the variables that (possibly) cannot be attributed to error. The obtained probability value does not indicate whether or not these differences possess any practical value. For a one-sample test, measures of effect size express the practical, substantive, or clinical significance of a difference between the sample mean and the hypothesized population mean, as contrasted with the statistical significance of the difference. Measures of effect size have become increasingly important in recent years, as they index the magnitude of a treatment effect and indicate the practical significance of the research. As Kirk noted many years ago, a test statistic and its associated probability value provide no information as to the size of treatment effects, only whether or not the effects are statistically significant [114, p. 135]. It was American psychologists who spearheaded the reporting of effect sizes in academic journals. Consequently, over the years more and more editors of academic research journals began requiring the reporting of measures of effect size as a condition of publication.

Statisticians and quantitative methodologists have raised a number of issues and concerns with null hypothesis statistical testing (NHST). There are now literally hundreds of articles, chapters, editorials, and blogs dealing with the problems of NHST, far too many to be summarized here. However, a brief overview of the limitations of null hypothesis testing will suffice for these purposes.

First, the null hypothesis is almost never literally true, so rejection of the null hypothesis is relatively uninformative. Second, tests of significance are highly dependent on sample size. When sample sizes are small, important effects can be nonsignificant, and when sample sizes are large, even trivial effects can produce very small probability values. Third, the requirement of obtaining a random sample from a well-defined population is seldom met in practice. Fourth, the assumption of normality is never satisfied in real-data situations.
3.6.1 The d Family
Three types of measures of effect size have been advanced to represent the magnitude of treatment effects for one-sample tests [179]. One type, designated the d family, is based on one or more measures of the differences among the samples or among levels of an independent variable. Representative of the eponymous d family is Cohen's $\hat{d}$, which measures the effect size by the number of standard deviations separating the sample mean from the hypothesized population mean for a one-sample test and may also be defined in terms of Student's t-test statistic.
In this book, Cohen's measure of effect size is denoted by the letter d with a caret overscore ($\hat{d}$) to distinguish Cohen's measure of effect size from a simple d indicating a difference between values. Some authors denote Cohen's measure by a bold d.

Convenient expressions for Cohen's measure of effect size for a one-sample test in terms of Student's one-sample t-test statistic are given by

$$\hat{d} = \left( \frac{t^2}{N} \right)^{1/2} \qquad \text{and} \qquad \hat{d} = \frac{t}{\sqrt{N}}\;, \qquad (3.32)$$

where $t^2$ is Student's squared one-sample t-test statistic.¹ Cohen suggested that when $\hat{d} \le 0.20$, the effect size should be considered "small"; when $0.20 < \hat{d} < 0.80$, the effect size should be considered "medium" or "moderate"; and when $\hat{d} \ge 0.80$, the effect size should be considered "large" [41], although these values are not universally accepted [144].
3.6.2 The r Family
The second type of measure of effect size, designated the r family, represents some sort of relationship that exists among variables. Measures of effect size in the r family are typically measures of correlation or association, the most prominent being Pearson's squared product-moment correlation coefficient, $r^2$. Since $0 \le r^2 \le 1$, many researchers find Pearson's $r^2$ measure easily interpretable. In addition, $r^2$ can be defined in terms of Student's one-sample t-test statistic. The usual expression for Pearson's $r^2$ measure of effect size for a one-sample test, defined in terms of Student's one-sample t-test statistic, is given by

$$r^2 = \frac{t^2}{t^2 + N - 1}\;. \qquad (3.34)$$

Cohen suggested that when $r^2 \le 0.09$, the effect size should be considered "small"; when $0.09 < r^2 < 0.25$, the effect size should be considered "medium" or "moderate"; and when $r^2 \ge 0.25$, the effect size should be considered "large" [41]. It should be noted that $r^2$ has come under substantial criticism as a measure of effect size; see discussions by D'Andrade and Dart [45], Ozer [166], and Rosenthal and Rubin [180, 181].
¹ Another common expression for Cohen's measure of effect size for Student's one-sample t-test is

$$\hat{d} = \frac{\bar{x} - \mu_x}{s_x}\;, \qquad (3.33)$$

where $\bar{x}$ denotes the sample mean, $s_x$ denotes the sample standard deviation, and $\mu_x$ denotes the hypothesized population mean.
3.6.3 The ℜ Family
The third type of measure of effect size, designated the $\Re$ family, represents chance-corrected measures of effect size, sometimes termed "improvement-over-chance" measures of effect size [88]. For example, $\Re = +0.60$ is interpreted as 60% agreement among the observations above that expected by chance. If all $N = 6$ observations in Table 3.1 possessed the same value, $\Re$ would be $+1.00$. Chance-corrected measures of effect size possess significant advantages in both application and interpretation over conventional measures of effect size such as Cohen's $\hat{d}$ and Pearson's $r^2$.

Chance-corrected measures yield values that are interpreted as a proportion above that expected by chance. Chance-corrected agreement measures provide clear and meaningful interpretations of the amount of, or lack of, agreement present in the data. In general, chance-corrected measures of agreement are equal to $+1$ when perfect agreement among the measurements occurs, 0 when agreement is equal to that expected under independence, and negative when agreement among the measurements is less than expected by chance. For example, define a chance-corrected measure such that
$$A_i = 100 \left( \frac{O_i - E_i}{N - E_i} \right), \qquad i = 1, \ldots, m\;,$$

where $O_i$ and $E_i$ denote the observed (earned) and expected (chance) scores from purely guessing, respectively, on a multiple-choice examination with N questions for the ith student in a class of m students.
Thus, on a 50-question multiple-choice examination with five choices per question, chance would indicate that a student could answer $50 \times 0.20 = 10$ questions correctly simply by guessing. If a student answered 40 questions correctly, then a chance-corrected measure of agreement would yield a grade of

$$A = 100 \left( \frac{40 - 10}{50 - 10} \right) = 100 \left( \frac{30}{40} \right) = 75\;.$$
If a student answered all 50 questions correctly, a chance-corrected measure of agreement would yield a grade of

$$A = 100 \left( \frac{50 - 10}{50 - 10} \right) = 100 \left( \frac{40}{40} \right) = 100\;.$$
If a student answered only eight questions correctly, a chance-corrected measure of agreement would yield a grade of

$$A = 100 \left( \frac{8 - 10}{50 - 10} \right) = 100 \left( \frac{-2}{40} \right) = -5$$

since the number of questions answered correctly (8) was less than expected by chance (10). The lowest grade would occur when a student answered all 50 questions incorrectly, yielding a score of

$$A = 100 \left( \frac{0 - 10}{50 - 10} \right) = 100 \left( \frac{-10}{40} \right) = -25\;.$$

Note that the lowest possible score is $-25$, not $-100$. Thus, distributions of chance-corrected measures are usually asymmetric.
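A few lines of Python make the grading example concrete; the function name is our own invention.

    def chance_corrected(correct, n_questions=50, n_choices=5):
        """Chance-corrected agreement A = 100 (O - E) / (N - E)."""
        expected = n_questions / n_choices    # 10 correct by pure guessing
        return 100 * (correct - expected) / (n_questions - expected)

    for score in (40, 50, 8, 0):
        print(score, chance_corrected(score))  # 75.0, 100.0, -5.0, -25.0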
The three types of measures represent different approaches to the measurement of effect size. For one-sample tests under the Neyman–Pearson population model, Cohen's $\hat{d}$ expresses effect size in terms of the number of standard deviations separating $\bar{x}$ and $\mu_x$, Pearson's $r^2$ expresses effect size as the proportion of total variability attributable to treatment, and Mielke and Berry's $\Re$ expresses effect size as the proportion of within-sample agreement above that expected by chance. However, Cohen's $\hat{d}$, Pearson's $r^2$, and Mielke and Berry's $\Re$ are highly related and can be defined in terms of each other.
3.7 A Second Example Analysis
For a second example analysis of one-sample tests, consider the one-sample data listed in Table 3.3 with observations on a sample of $N = 9$ youthful offenders. The x-values represent increases ($+$) and decreases ($-$) in jail time for bad/good behavior, e.g., number of days added to or taken away from a sentence. The data are deliberately kept small for purposes of illustration.
Table 3.3 Example of data for a one-sample permutation test with univariate measurements on a sample of $N = 9$ youthful offenders

Offender   1    2    3    4    5    6    7    8    9
Time (x)   +4   +6   +6   −5   +4   +3   +7   −3   +4
3.7.1 A Conventional Analysis
|
||
|
||
For the example data listed in Table 3.3, the hypothesized population mean is .μx = 0, the sample size is .N = 9, the sample mean of variable x is
|
||
|
||
.x¯
|
||
|
||
=
|
||
|
||
1 N
|
||
|
||
N
|
||
xi
|
||
i=1
|
||
|
||
=
|
||
|
||
4+6+6+···+4 9
|
||
|
||
=
|
||
|
||
2.8889
|
||
|
||
,
|
||
|
||
the sample standard deviation of variable x is
|
||
|
||
(3.35)
|
||
|
||
.sx =
|
||
|
||
1N N − 1 i=1
|
||
|
||
xi − x¯ 2
|
||
|
||
1/2
|
||
|
||
= (4 − 2.8889)2 + (6 − 2.8889)2 + · · · + (4 − 2.8889)2 1/2 9−1
|
||
|
||
= 4.1366 ,
|
||
|
||
(3.36)
|
||
|
||
the sample-estimated standard error of x¯ is

sx¯ = sx/√N = 4.1366/√9 = 1.3789 ,  (3.37)
and Student’s one-sample test statistic for the data given in Table 3.3 is
|
||
|
||
.t = x¯ − μx = 2.8889 − 0 = +2.0951 .
|
||
|
||
sx¯
|
||
|
||
1.3789
|
||
|
||
(3.38)
|
||
|
||
Under the Neyman–Pearson null hypothesis, H0: μx = 0, test statistic t is asymptotically distributed as Student's t with N − 1 degrees of freedom. With N − 1 = 9 − 1 = 8 degrees of freedom, the two-tail probability value of t = +2.0951 is P = 0.0695, under the assumptions of independence, random sampling, and normality.
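The conventional analysis is easily reproduced with standard software. A minimal sketch, assuming NumPy and SciPy are available:

    import numpy as np
    from scipy import stats

    x = np.array([4, 6, 6, -5, 4, 3, 7, -3, 4])  # Table 3.3
    t, p = stats.ttest_1samp(x, popmean=0)       # two-tail by default
    print(round(t, 4), round(p, 4))              # t = 2.0951, P = 0.0695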
3.7.2 An Exact Permutation Analysis
For the sample data listed in Table 3.3, permutation test statistic δ is

δ = [N(N − 1)/2]^{−1} Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} (xi − xj)² = [9(9 − 1)/2]^{−1}(1232) = (2)(1232)/[(9)(9 − 1)] = 34.2222 ,  (3.39)
where the double summation in Eq. (3.39) is

Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} (xi − xj)² = [(+4) − (+6)]² + [(+4) − (+6)]² + [(+4) − (−5)]² + ··· + [(−3) − (+4)]² = 1232 .  (3.40)
Alternatively, permutation test statistic δ can be defined in terms of the sum of squared deviations from the mean of the observations when scaling function v = 2 is adopted. Thus, with sample mean

x¯ = (1/N) Σ_{i=1}^{N} xi = (4 + 6 + 6 + ··· − 3 + 4)/9 = 2.8889 ,  (3.41)

permutation test statistic δ is
δ = [2/(N − 1)] Σ_{i=1}^{N} |xi − x¯|² = (2)[ |+4 − 2.8889|² + |+6 − 2.8889|² + ··· + |+4 − 2.8889|² ]/(9 − 1) = (2)(136.8889)/8 = 34.2222 .  (3.42)
The exact expected value of permutation test statistic δ is

μδ = [N(N − 1)/2]^{−1} Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} (xi² + xj²) = [9(9 − 1)/2]^{−1}(1696) = (2)(1696)/[(9)(9 − 1)] = 47.1111 ,  (3.43)
where the double summation in Eq. (3.43) is

Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} (xi² + xj²) = [(+4)² + (+6)²] + [(+4)² + (+6)²] + [(+4)² + (−5)²] + ··· + [(−3)² + (+4)²] = 1696 .  (3.44)
Finally, the chance-corrected measure of effect size is

ℜ = 1 − δ/μδ = 1 − 34.2222/47.1111 = +0.2736 ,  (3.45)

indicating approximately 27% agreement among the N = 9 values listed in Table 3.3 above that expected by chance.
The connections linking ℜ and Student's t are easily demonstrated. For the data listed in Table 3.3, the effect-size measure ℜ, defined in terms of Student's t, is

ℜ = (t² + C − 1)/(t² + C + N − 1) = [(+2.0951)² + 0.00 − 1]/[(+2.0951)² + 0.00 + 9 − 1] = 3.3894/12.3897 = +0.2736  (3.46)

and Student's t, defined in terms of ℜ, is
t = { [1 + ℜ(C + N − 1) − C]/(1 − ℜ) }^{1/2} = { [1 + (0.2736)(0.00 + 9 − 1) − 0.00]/(1 − 0.2736) }^{1/2} = ±2.0951 ,  (3.47)
where the correction factor in this case is

C = μx(2x¯ − μx)/s²x¯ = (0)[(2)(2.8889) − 0]/(1.3789)² = 0.00/1.9014 = 0.00 .  (3.48)
Permutation test statistic δ, defined in terms of Student's t-test statistic, is

δ = 2 Σ_{i=1}^{N} xi² / (t² + N − 1 + C) = (2)(212)/[(+2.0951)² + 9 − 1 + 0.00] = 424/12.3897 = 34.2222  (3.49)
and Student’s t-test statistic, defined in terms of permutation test statistic .δ, is
|
||
|
||
.t =
|
||
|
||
2 δ
|
||
|
||
N
|
||
|
||
xi2 − N + 1 − C
|
||
|
||
1/2
|
||
|
||
i=1
|
||
|
||
=
|
||
|
||
(2)(212)
|
||
|
||
− 9 + 1 − 0.00
|
||
|
||
1/2
|
||
= ±2.0951 ,
|
||
|
||
34.2222
|
||
|
||
(3.50)
|
||
|
||
where

Σ_{i=1}^{N} xi² = (+4)² + (+6)² + ··· + (+4)² = 212 .  (3.51)
Under the Fisher–Pitman null hypothesis, the exact probability of an observed δ value, δo, is the proportion of δ test statistic values calculated on all possible arrangements of the data that are equal to or less than δo, i.e.,

P(δ ≤ δo|H0) = (number of δ values ≤ δo)/M ,  (3.52)
where δo denotes the observed value of permutation test statistic δ and M represents the number of possible, equally likely arrangements of the data. For the sample data listed in Table 3.3 with N = 9 observations, there are only

M = 2^{N−1} = 2^{9−1} = 256  (3.53)
possible, equally likely arrangements, making an exact permutation analysis feasible. The exact probability is the proportion of δ values calculated on all arrangements of the data that are equal to or less than δo = 34.2222, i.e.,

P(δ ≤ δo|H0) = (number of δ values ≤ δo)/M = 22/256 = 0.0859 .  (3.54)
When the assumptions of independence, random sampling, and normality underlying Student's one-sample t-test have been satisfied, the asymptotic and exact probability values tend to agree. In this case, the difference between the asymptotic probability value and the exact probability value is only ΔP = 0.0859 − 0.0695 = 0.0164.
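The exact analysis can be reproduced by brute-force enumeration. The sketch below, assuming NumPy, enumerates all 2^9 = 512 sign assignments; because δ is unchanged when every sign is reversed, these collapse into the M = 256 equally likely arrangements of Eq. (3.53), leaving the proportion unaffected:

    from itertools import product
    import numpy as np

    x = np.array([4, 6, 6, -5, 4, 3, 7, -3, 4])  # Table 3.3
    N = len(x)

    def delta(v):
        # delta = [N(N - 1)/2]**(-1) * sum of (v_i - v_j)**2 over i < j
        return ((v[:, None] - v[None, :]) ** 2).sum() / (N * (N - 1))

    d_obs = delta(x)
    hits = sum(delta(np.abs(x) * np.array(s)) <= d_obs + 1e-10
               for s in product((-1, 1), repeat=N))
    print(round(d_obs, 4), hits / 2 ** N)        # 34.2222, 0.0859 (= 22/256)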
3.7.3 Connections Linking Effect-Size Measures
Since Cohen's dˆ, Pearson's r², and Mielke and Berry's ℜ measures of effect size can all be defined in terms of Student's t-test statistic, the three measures are necessarily interconnected and can be defined in terms of each other for a one-sample test. Note that Cohen's dˆ is a measure based on the number of standard deviations separating the sample mean and the hypothesized population mean, Pearson's r² is a measure of linear correlation, and Mielke and Berry's ℜ is a measure of chance-corrected within-sample agreement: three different but interconnected measures. The connections linking Student's one-sample t-test statistic and Cohen's dˆ measure of effect size are given by
t = (N dˆ²)^{1/2} and dˆ = (t²/N)^{1/2} ,  (3.55)
the connections linking Student’s one-sample t-test statistic and Pearson’s .r2 measure of effect size are given by
|
||
|
||
r2(N − 1) 1/2 .t = 1 − r2
|
||
|
||
and
|
||
|
||
r2
|
||
|
||
=
|
||
|
||
t2
|
||
|
||
t2 +N
|
||
|
||
−1
|
||
|
||
,
|
||
|
||
(3.56)
|
||
|
||
the connections linking Student’s one-sample t-test statistic and Mielke and Berry’s . measure of effect size are given by
|
||
|
||
1 + (C + N − 1) − C 1/2
|
||
|
||
.t =
|
||
|
||
1−
|
||
|
||
and
|
||
|
||
=
|
||
|
||
1
|
||
|
||
−
|
||
|
||
t2
|
||
|
||
+
|
||
|
||
N C+
|
||
|
||
N
|
||
|
||
−
|
||
|
||
1
|
||
|
||
,
|
||
|
||
(3.57)
|
||
|
||
where the correction factor is given by

C = μx(2x¯ − μx)/s²x¯ ,  (3.58)
the connections linking Cohen’s .dˆ measure of effect size and Pearson’s .r2 measure of effect size are given by
|
||
|
||
.dˆ =
|
||
|
||
r2(N − 1) 1/2 N (1 − r2)
|
||
|
||
and
|
||
|
||
r2
|
||
|
||
=
|
||
|
||
1
|
||
|
||
−
|
||
|
||
N −1 N (dˆ 2 + 1)
|
||
|
||
−
|
||
|
||
1
|
||
|
||
,
|
||
|
||
(3.59)
|
||
|
||
the connections linking Cohen’s .dˆ measure of effect size and Mielke and Berry’s . measure of effect size are given by
|
||
|
||
.dˆ =
|
||
|
||
1+
|
||
|
||
(C + N − 1) − C 1/2 N(1 − )
|
||
|
||
and
|
||
|
||
=
|
||
|
||
N dˆ 2 N dˆ 2 +
|
||
|
||
+C C+
|
||
|
||
− N
|
||
|
||
1 −
|
||
|
||
1
|
||
|
||
,
|
||
|
||
(3.60)
|
||
|
||
76
|
||
|
||
3 One-Sample Tests
|
||
|
||
and the connections linking Pearson's r² measure of effect size and Mielke and Berry's ℜ measure of effect size are given by

r² = [ℜ(N + C − 1) − C + 1]/[N − C(1 − ℜ)] and ℜ = 1 − N(1 − r²)/[N − 1 + C(1 − r²)] .  (3.61)
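These identities are easy to confirm numerically. A small sketch using only the Python standard library, with t, N, and C taken from the worked example above:

    import math

    t, N, C = 2.0951, 9, 0.0                     # from the worked example

    d_hat = math.sqrt(t ** 2 / N)                # Eq. (3.55)
    r2 = t ** 2 / (t ** 2 + N - 1)               # Eq. (3.56)
    R = 1 - N / (t ** 2 + C + N - 1)             # Eq. (3.57)

    # round-trip checks of Eqs. (3.59)-(3.61)
    assert math.isclose(r2, 1 - (N - 1) / (N * (d_hat ** 2 + 1) - 1))
    assert math.isclose(R, (N * d_hat ** 2 + C - 1) / (N * d_hat ** 2 + C + N - 1))
    assert math.isclose(r2, (R * (N + C - 1) - C + 1) / (N - C * (1 - R)))
    print(round(d_hat, 4), round(r2, 4), round(R, 4))   # 0.6984 0.3543 0.2736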
3.8 Measures of Effect Size for the Observed Data
Measures of effect size express the clinical significance of statistical tests, as contrasted with a statistical test of significance, as noted in Sect. 3.6. Three measures of effect size are commonly used for determining the magnitude of treatment effects for one-sample tests: Cohen's dˆ, Pearson's r², and Mielke and Berry's ℜ. For the example data listed in Table 3.3 with a sample of N = 9 youthful offenders and null hypothesis H0: μx = 0, Cohen's measure of effect size is
dˆ = (t²/N)^{1/2} = [(+2.0951)²/9]^{1/2} = ±0.6984 ,  (3.62)
indicating a moderate effect size (0.20 ≤ |dˆ| ≤ 0.80), with approximately 0.70 standard deviations separating x¯ and μx, and Pearson's measure of effect size is
r² = t²/(t² + N − 1) = (+2.0951)²/[(+2.0951)² + 9 − 1] = 0.3543 ,  (3.63)
indicating a large effect size (r² ≥ 0.25), with approximately 35% of the total variability attributable to treatment.
For Mielke and Berry’s measure of effect size, permutation test statistic .δ is .δ = 34.2222, the exact expected value of .δ is
|
||
|
||
2
|
||
|
||
N −1 N
|
||
|
||
.μδ = N (N − 1)
|
||
|
||
xi2 + xj2
|
||
|
||
=
|
||
|
||
(2)(1696) (9)(9 − 1)
|
||
|
||
=
|
||
|
||
47.1111
|
||
|
||
,
|
||
|
||
i=1 j =i+1
|
||
|
||
(3.64)
|
||
|
||
where the double summation in Eq. (3.64) is

Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} (xi² + xj²) = [(+4)² + (+6)²] + [(+4)² + (+6)²] + [(+4)² + (−5)²] + ··· + [(−3)² + (+4)²] = 1696 ,  (3.65)
and the chance-corrected measure of effect size is

ℜ = 1 − δ/μδ = 1 − 34.2222/47.1111 = +0.2736 ,  (3.66)
indicating approximately 27% agreement among the N = 9 scores listed in Table 3.3 above that expected by chance.
The three measures of effect size often yield very different answers: Cohen's dˆ = ±0.6984, Pearson's r² = 0.3543, and Mielke and Berry's ℜ = +0.2736. The reason is that the three measures are expressed in entirely different units. Cohen's dˆ is in standard-deviation units, as illustrated by the alternative formula for dˆ presented below. For the youthful offender data listed in Table 3.3,
dˆ = (t²/N)^{1/2} = (x¯ − μx)/sx = (2.8889 − 0.00)/4.1366 = +0.6984 .
Thus, the sample mean (x¯) and the population mean (μx) are approximately 0.70 standard deviations apart, Pearson's r² measure represents the proportion of the variance attributable to treatment, and Mielke and Berry's ℜ measure represents the chance-corrected agreement among the sample values. In general, measures of agreement, such as ℜ, tend to be more conservative than measures of correlation, such as r².
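All three measures can also be computed directly from the raw data. A minimal NumPy sketch, using the simplification μδ = 2 Σ xi²/N, which follows from Eq. (3.64) because the double summation equals (N − 1) Σ xi²:

    import numpy as np

    x = np.array([4, 6, 6, -5, 4, 3, 7, -3, 4])  # Table 3.3
    N, mu_x = len(x), 0.0

    d_hat = (x.mean() - mu_x) / x.std(ddof=1)    # standard-deviation units
    t = d_hat * np.sqrt(N)
    r2 = t ** 2 / (t ** 2 + N - 1)               # proportion of variability
    delta = ((x[:, None] - x[None, :]) ** 2).sum() / (N * (N - 1))
    R = 1 - delta / (2 * (x ** 2).sum() / N)     # chance-corrected agreement
    print(round(d_hat, 4), round(r2, 4), round(R, 4))   # 0.6984 0.3543 0.2736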
3.9 Rank-Score Permutation Analyses
In contemporary research it is often necessary to analyze rank-score data. Two scenarios manifest themselves. First, the data are collected as ranks where, for example, experienced travelers are asked to rank preferences among hotel chains, airlines, airports, or cruise lines. Second, and more commonly, the raw data are converted to ranks because one or more of the assumptions for Student's one-sample t-test cannot be satisfied, with a concomitant loss of information. There is no need for the second scenario with permutation methods, as the conventional assumptions underlying Student's t-test are moot. The conventional approach to one-sample rank-score data under the Neyman–Pearson population model is Wilcoxon's one-sample signed-rank test, published by Wilcoxon in Biometrics Bulletin in 1945 [211].
3.9.1 The Wilcoxon Signed-Rank Test
Consider a one-sample rank test for N univariate rank scores under the Neyman–Pearson population model. Wilcoxon's one-sample signed-rank test statistic is simply the smaller of the sums of the like-signed ranks.² For convenience, let n denote the number of negative ranks and m denote the number of positive ranks. If the sum of the negative ranks is less than the sum of the positive ranks, Wilcoxon's test statistic is given by

T = Σ_{i=1}^{n} Ri− ,  (3.67)
and if the sum of the positive ranks is less than the sum of the negative ranks, Wilcoxon's test statistic is given by

T = Σ_{i=1}^{m} Ri+ ,  (3.68)

where R− and R+ represent individual negative and positive signed ranks, respectively.
Wilcoxon’s signed-rank test statistic is asymptotically distributed .N (0, 1) under the Neyman–Pearson null hypothesis as .N → ∞. The mean value of test statistic T is given by
μT = N(N + 1)/4 ,  (3.69)
where N denotes the total number of positive and negative signed ranks, the standard deviation of test statistic T is given by

σT = [N(N + 1)(2N + 1)/24]^{1/2} ,  (3.70)
and the standard normal deviate of test statistic T is given by

z = (T − μT)/σT .  (3.71)
² Some computer packages define the test statistic as the larger of the sums of the like-signed ranks; the results are the same.
3.9.2 A Permutation Approach
For a permutation analysis of one-sample rank-score data under the Fisher–Pitman permutation model, let v = 2, employing squared differences between the paired signed ranks for correspondence with Wilcoxon's signed-rank test. Let xi denote the observed rank-score values for i = 1, . . . , N; then permutation test statistic δ is given by

δ = [N(N − 1)/2]^{−1} Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} |xi − xj|^v .  (3.72)
3.9.3 An Example Analysis
To illustrate a permutation analogue of Wilcoxon's one-sample signed-rank test, consider the example set of N = 18 signed ranks listed in Table 3.4. The scores represent the differences in times for N = 18 police recruits running a 10K over the same course on two attempts.

Table 3.4 Example rank-score data for the Wilcoxon one-sample signed-rank test

Difference     Signed rank    Difference     Signed rank
+1 min 30 s    +1             −3 min 00 s    −10
+2 min 10 s    +2             +4 min 10 s    +11
−1 min 50 s    −3             −3 min 20 s    −12
+2 min 30 s    +4             +4 min 25 s    +13
−2 min 10 s    −5             −3 min 50 s    −14
−2 min 20 s    −6             −4 min 05 s    −15
−2 min 25 s    −7             −4 min 20 s    −16
−2 min 35 s    −8             −4 min 35 s    −17
−2 min 45 s    −9             −4 min 45 s    −18

The sums of the negative (R−) and positive (R+) signed ranks in Table 3.4 are
T = Σ_{i=1}^{n} Ri− = 3 + 5 + 6 + 7 + 8 + 9 + 10 + 12 + 14 + 15 + 16 + 17 + 18 = 140  (3.73)
and
T = Σ_{i=1}^{m} Ri+ = 1 + 2 + 4 + 11 + 13 = 31 ,  (3.74)
respectively. Thus, Wilcoxon's test statistic is T = 31, the smaller of the two sums. For the N = 18 signed ranks listed in Table 3.4, the mean value of Wilcoxon's T-test statistic is
μT = N(N + 1)/4 = (18)(18 + 1)/4 = 85.50 ,  (3.75)

the standard deviation of test statistic T is
σT = [N(N + 1)(2N + 1)/24]^{1/2} = {(18)(18 + 1)[(2)(18) + 1]/24}^{1/2} = 22.9619 ,  (3.76)

and the standard normal deviate of test statistic T, corrected for continuity, is
z = (|T − μT| − 0.50)/σT = (|31 − 85.50| − 0.50)/22.9619 = +2.3517 ,  (3.77)
yielding an asymptotic two-tail probability value of P = 0.0187 under the assumptions of independence, random sampling, and normality.
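The asymptotic calculation is easily verified from the signed ranks themselves. A minimal Python sketch using only the standard library (NormalDist requires Python 3.8 or later):

    import math
    from statistics import NormalDist

    ranks = [1, 2, -3, 4, -5, -6, -7, -8, -9,
             -10, 11, -12, 13, -14, -15, -16, -17, -18]  # Table 3.4
    N = len(ranks)
    T = min(-sum(r for r in ranks if r < 0),
            sum(r for r in ranks if r > 0))              # T = 31
    mu = N * (N + 1) / 4                                 # Eq. (3.75)
    sigma = math.sqrt(N * (N + 1) * (2 * N + 1) / 24)    # Eq. (3.76)
    z = (abs(T - mu) - 0.5) / sigma                      # Eq. (3.77)
    p = 2 * (1 - NormalDist().cdf(z))                    # two-tail
    print(T, round(z, 4), round(p, 4))                   # 31 2.3517 0.0187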
3.9.4 An Exact Permutation Analysis
Consider an analysis of the one-sample signed ranks listed in Table 3.4 under the Fisher–Pitman permutation model with scaling function v = 2 for correspondence with Wilcoxon's one-sample signed-rank test. Following Eq. (3.72), for the rank-score data with N = 18 rank scores and scaling function v = 2, permutation test statistic δ is
δ = [N(N − 1)/2]^{−1} Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} |ri − rj|^v
= [2/((18)(18 − 1))] { [(+1) − (+2)]² + [(+1) − (−3)]² + [(+1) − (+4)]² + ··· + [(−17) − (−18)]² }
= (2)(26,081)/306 = 170.4641 .  (3.78)
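The observed value of δ for the signed ranks is likewise a short computation; a minimal NumPy sketch:

    import numpy as np

    r = np.array([1, 2, -3, 4, -5, -6, -7, -8, -9,
                  -10, 11, -12, 13, -14, -15, -16, -17, -18])  # Table 3.4
    N = len(r)
    delta = ((r[:, None] - r[None, :]) ** 2).sum() / (N * (N - 1))
    print(round(delta, 4))   # 170.4641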