Last Updated on November 28, 2019

A quick-reference guide to the 17 statistical hypothesis tests that you need in applied machine learning, with sample code in Python.

Although there are hundreds of statistical hypothesis tests that you could use, there is only a small subset that you may need in a machine learning project. In this post, you will discover a cheat sheet for the most popular statistical hypothesis tests for a machine learning project, with examples using the Python API.

Each statistical test is presented in a consistent way, including:

The name of the test.
What the test is checking.
The key assumptions of the test.
How the test result is interpreted.
The Python API for using the test.
Note: when it comes to assumptions such as the expected distribution of data or sample size, the results of a given test are likely to degrade gracefully rather than become immediately unusable if an assumption is violated. Generally, data samples need to be representative of the domain and large enough to expose their distribution to analysis. In some cases, the data can be corrected to meet the assumptions, such as correcting a nearly normal distribution to be normal by removing outliers (a short sketch follows the tutorial overview below), or using a correction to the degrees of freedom in a statistical test when samples have differing variance (see the Welch's t-test note in the Student's t-test section), to name two examples.

Finally, there may be multiple tests for a given concern, e.g. normality. We cannot get crisp answers to questions with statistics; instead, we get probabilistic answers. As such, we can arrive at different answers to the same question by considering the question in different ways. Hence the need for multiple different tests for some questions we may have about data.

Discover statistical hypothesis testing, resampling methods, estimation statistics and nonparametric methods in my new book, with 29 step-by-step tutorials and full source code.

Let's get started.

Statistical Hypothesis Tests in Python Cheat Sheet. Photo by davemichuda, some rights reserved.

Tutorial Overview

This tutorial is divided into 5 parts; they are:

1. Normality Tests: Shapiro-Wilk Test, D'Agostino's K^2 Test, Anderson-Darling Test
2. Correlation Tests: Pearson's Correlation Coefficient, Spearman's Rank Correlation, Kendall's Rank Correlation, Chi-Squared Test
3. Stationarity Tests: Augmented Dickey-Fuller, Kwiatkowski-Phillips-Schmidt-Shin
4. Parametric Statistical Hypothesis Tests: Student's t-test, Paired Student's t-test, Analysis of Variance Test (ANOVA), Repeated Measures ANOVA Test
5. Nonparametric Statistical Hypothesis Tests: Mann-Whitney U Test, Wilcoxon Signed-Rank Test, Kruskal-Wallis H Test, Friedman Test
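As mentioned in the note above, a nearly normal sample can sometimes be corrected toward normality by removing outliers before testing. Below is a minimal sketch using a simple 1.5 * IQR rule; the cutoff and the injected outlier value are assumptions made for illustration only.

# Trim outliers with a 1.5 * IQR rule, then re-check normality (illustrative only)
import numpy as np
from scipy.stats import shapiro
data = np.array([0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869, 9.5])
q1, q3 = np.percentile(data, [25, 75])
cutoff = 1.5 * (q3 - q1)
trimmed = data[(data >= q1 - cutoff) & (data <= q3 + cutoff)]
stat, p = shapiro(trimmed)
print('kept %d of %d observations, p=%.3f' % (len(trimmed), len(data), p))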
1. Normality Tests

This section lists statistical tests that you can use to check if your data has a Gaussian distribution.

Shapiro-Wilk Test

Tests whether a data sample has a Gaussian distribution.

Assumptions

Observations in each sample are independent and identically distributed (iid).

Interpretation

H0: the sample has a Gaussian distribution.
H1: the sample does not have a Gaussian distribution.

Python Code
# Example of the Shapiro-Wilk Normality Test
from scipy.stats import shapiro
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
stat, p = shapiro(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably Gaussian')
else:
    print('Probably not Gaussian')
D'Agostino's K^2 Test

Tests whether a data sample has a Gaussian distribution.

Assumptions

Observations in each sample are independent and identically distributed (iid).

Interpretation

H0: the sample has a Gaussian distribution.
H1: the sample does not have a Gaussian distribution.

Python Code
# Example of the D'Agostino's K^2 Normality Test
from scipy.stats import normaltest
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
stat, p = normaltest(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably Gaussian')
else:
    print('Probably not Gaussian')
Anderson-Darling Test

Tests whether a data sample has a Gaussian distribution.

Assumptions

Observations in each sample are independent and identically distributed (iid).

Interpretation

H0: the sample has a Gaussian distribution.
H1: the sample does not have a Gaussian distribution.

Python Code
# Example of the Anderson-Darling Normality Test
from scipy.stats import anderson
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
result = anderson(data)
print('stat=%.3f' % (result.statistic))
for i in range(len(result.critical_values)):
    sl, cv = result.significance_level[i], result.critical_values[i]
    if result.statistic < cv:
        print('Probably Gaussian at the %.1f%% level' % (sl))
    else:
        print('Probably not Gaussian at the %.1f%% level' % (sl))
2. Correlation Tests

This section lists statistical tests that you can use to check if two samples are related.

Pearson's Correlation Coefficient

Tests whether two samples have a linear relationship.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Interpretation

H0: the two samples are independent.
H1: there is a dependency between the samples.

Python Code
# Example of the Pearson's Correlation test
from scipy.stats import pearsonr
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = pearsonr(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably independent')
else:
    print('Probably dependent')
Spearman's Rank Correlation

Tests whether two samples have a monotonic relationship.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.

Interpretation

H0: the two samples are independent.
H1: there is a dependency between the samples.

Python Code
# Example of the Spearman's Rank Correlation Test
from scipy.stats import spearmanr
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = spearmanr(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably independent')
else:
    print('Probably dependent')
Kendall's Rank Correlation

Tests whether two samples have a monotonic relationship.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.

Interpretation

H0: the two samples are independent.
H1: there is a dependency between the samples.

Python Code
# Example of the Kendall's Rank Correlation Test
from scipy.stats import kendalltau
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = kendalltau(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably independent')
else:
    print('Probably dependent')
Chi-Squared Test

Tests whether two categorical variables are related or independent.

Assumptions

Observations used in the calculation of the contingency table are independent.
Each cell of the contingency table has a reasonably large expected count (five or more is a common rule of thumb).

Interpretation

H0: the two variables are independent.
H1: there is a dependency between the variables.

Python Code
# Example of the Chi-Squared Test
from scipy.stats import chi2_contingency
table = [[10, 20, 30], [6, 9, 17]]
stat, p, dof, expected = chi2_contingency(table)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably independent')
else:
    print('Probably dependent')
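In practice, the contingency table is usually built from two raw categorical variables rather than typed in by hand. A minimal sketch using pandas; the variable names and values below are made up for illustration:

# Build a contingency table from two categorical variables, then run the Chi-Squared test
import pandas as pd
from scipy.stats import chi2_contingency
group = ['a', 'a', 'b', 'b', 'a', 'b', 'b', 'a', 'a', 'b']
outcome = ['yes', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no']
table = pd.crosstab(pd.Series(group, name='group'), pd.Series(outcome, name='outcome'))
print(table)
stat, p, dof, expected = chi2_contingency(table)
print('stat=%.3f, p=%.3f' % (stat, p))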
3. Stationarity Tests

This section lists statistical tests that you can use to check if a time series is stationary or not.

Augmented Dickey-Fuller Unit Root Test

Tests whether a time series has a unit root, e.g. has a trend or, more generally, is autoregressive.

Assumptions

Observations are temporally ordered.

Interpretation

H0: a unit root is present (the series is non-stationary).
H1: a unit root is not present (the series is stationary).

Python Code
# Example of the Augmented Dickey-Fuller unit root test
from statsmodels.tsa.stattools import adfuller
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
stat, p, lags, obs, crit, t = adfuller(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably not Stationary')
else:
    print('Probably Stationary')
Kwiatkowski-Phillips-Schmidt-Shin

Tests whether a time series is trend stationary or not.

Assumptions

Observations are temporally ordered.

Interpretation

H0: the time series is trend-stationary.
H1: the time series is not trend-stationary.

Python Code
# Example of the Kwiatkowski-Phillips-Schmidt-Shin test
from statsmodels.tsa.stattools import kpss
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
stat, p, lags, crit = kpss(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    # fail to reject H0: the series is probably stationary
    print('Probably Stationary')
else:
    print('Probably not Stationary')
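Note that, in the statsmodels API, kpss defaults to testing stationarity around a constant level; to test trend stationarity as described above, the regression argument can be set to 'ct'. A minimal sketch:

# KPSS with a deterministic trend term, i.e. a test of trend stationarity
from statsmodels.tsa.stattools import kpss
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
stat, p, lags, crit = kpss(data, regression='ct')
print('stat=%.3f, p=%.3f' % (stat, p))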
4. Parametric Statistical Hypothesis Tests

This section lists statistical tests that you can use to compare data samples.

Student's t-test

Tests whether the means of two independent samples are significantly different.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Interpretation

H0: the means of the samples are equal.
H1: the means of the samples are unequal.

Python Code
# Example of the Student's t-test
from scipy.stats import ttest_ind
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = ttest_ind(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
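If the equal-variance assumption is in doubt, the degrees-of-freedom correction mentioned in the note at the top of this post can be applied by passing equal_var=False to ttest_ind, which runs Welch's t-test. A minimal sketch:

# Welch's t-test: does not assume equal variances in the two samples
from scipy.stats import ttest_ind
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = ttest_ind(data1, data2, equal_var=False)
print('stat=%.3f, p=%.3f' % (stat, p))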
Paired Student's t-test

Tests whether the means of two paired samples are significantly different.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Observations across each sample are paired.
Interpretation

H0: the means of the samples are equal.
H1: the means of the samples are unequal.

Python Code
# Example of the Paired Student's t-test
from scipy.stats import ttest_rel
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = ttest_rel(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
Analysis of Variance Test (ANOVA)

Tests whether the means of two or more independent samples are significantly different.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Interpretation

H0: the means of the samples are equal.
H1: one or more of the means of the samples are unequal.

Python Code
# Example of the Analysis of Variance Test
from scipy.stats import f_oneway
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
stat, p = f_oneway(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
Repeated Measures ANOVA Test

Tests whether the means of two or more paired samples are significantly different.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Observations across each sample are paired.
Interpretation

H0: the means of the samples are equal.
H1: one or more of the means of the samples are unequal.

Python Code

This test is not currently supported directly in SciPy, but the statsmodels library provides an implementation in its AnovaRM class; a minimal sketch follows.
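The sketch below is illustrative rather than a drop-in recipe: it assumes the measurements are arranged in long format (one row per subject and condition), and the column names and values are made up for the example.

# Example of a Repeated Measures ANOVA using statsmodels' AnovaRM (illustrative sketch)
import pandas as pd
from statsmodels.stats.anova import AnovaRM
# long-format data: every subject is measured under every condition
df = pd.DataFrame({
    'subject':   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'condition': ['A', 'B', 'C'] * 4,
    'score':     [0.87, 1.14, -0.21, 2.82, -0.43, 0.70,
                  0.12, -0.94, 0.93, -0.95, -0.73, -1.15],
})
result = AnovaRM(df, depvar='score', subject='subject', within=['condition']).fit()
print(result)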
5. Nonparametric Statistical Hypothesis Tests

This section lists statistical tests that you can use to compare data samples.

Mann-Whitney U Test

Tests whether the distributions of two independent samples are equal or not.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.

Interpretation

H0: the distributions of both samples are equal.
H1: the distributions of both samples are not equal.

Python Code
# Example of the Mann-Whitney U Test
from scipy.stats import mannwhitneyu
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = mannwhitneyu(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
Wilcoxon Signed-Rank Test

Tests whether the distributions of two paired samples are equal or not.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.
Observations across each sample are paired.
Interpretation

H0: the distributions of both samples are equal.
H1: the distributions of both samples are not equal.

Python Code
# Example of the Wilcoxon Signed-Rank Test
from scipy.stats import wilcoxon
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = wilcoxon(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
Kruskal-Wallis H Test

Tests whether the distributions of two or more independent samples are equal or not.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.

Interpretation

H0: the distributions of all samples are equal.
H1: the distributions of one or more samples are not equal.

Python Code
# Example of the Kruskal-Wallis H Test
from scipy.stats import kruskal
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = kruskal(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
Friedman Test

Tests whether the distributions of two or more paired samples are equal or not.

Assumptions

Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.
Observations across each sample are paired.
Interpretation

H0: the distributions of all samples are equal.
H1: the distributions of one or more samples are not equal.

Python Code
# Example of the Friedman Test
from scipy.stats import friedmanchisquare
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
stat, p = friedmanchisquare(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
Summary

In this tutorial, you discovered the key statistical hypothesis tests that you may need to use in a machine learning project.

Specifically, you learned:

The types of tests to use in different circumstances, such as checking normality, relationships between variables, and differences between samples.
The key assumptions for each test and how to interpret the test result.
How to implement each test using the Python API.
Do you have any questions? Ask your questions in the comments below and I will do my best to answer. Did I miss an important statistical test or key assumption for one of the listed tests? Let me know in the comments below.