Non-parametric tests
Multiple non-parametric tests. Here, Pingouin is mostly a wrapper around HypothesisTests.jl.
Pingouin.cochran
— Method cochran(data[, dv, within, subject])
Cochran Q test. A special case of the Friedman test when the dependent variable is binary.
Arguments
- data::DataFrame
- dv::Union{Nothing,String,Symbol}: Name of column containing the binary dependent variable.
- within::Union{Nothing,String,Symbol}: Name of column containing the within-subject factor.
- subject::Union{Nothing,String,Symbol}: Name of column containing the subject identifier.
Returns
stats::DataFrame
- 'Q': The Cochran Q statistic.
- 'p-unc': Uncorrected p-value.
- 'ddof': Degrees of freedom.
Notes
The Cochran Q test [1] is a non-parametric test for ANOVA with repeated measures where the dependent variable is binary.
Data are expected to be in long-format. NaN are automatically removed from the data.
The Q statistic is defined as:
\[Q = \frac{(r-1)(r\sum_j^rx_j^2-N^2)}{rN-\sum_i^nx_i^2}\]
where $N$ is the total sum of all observations, $x_j$ is the total of condition $j$, with $j=1,...,r$ and $r$ the number of repeated measures, and $x_i$ is the total of subject $i$, with $i=1,...,n$ and $n$ the number of observations per condition.
The p-value is then approximated using a chi-square distribution with $r-1$ degrees of freedom:
\[Q \sim \chi^2(r-1)\]
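As an illustration of the two formulas above, here is a minimal sketch that computes Q and its chi-square p-value directly from a subjects × conditions matrix of 0/1 values (wide format, unlike the long-format DataFrame the function expects; `cochran_q` is a hypothetical helper, not part of the Pingouin API, and Distributions.jl is assumed):

```julia
using Distributions  # for the chi-square approximation

# Hand-rolled Cochran Q from a subjects × conditions matrix of 0/1 values.
function cochran_q(X::AbstractMatrix{<:Integer})
    n, r = size(X)                 # n subjects, r repeated measures
    xj = vec(sum(X, dims=1))       # x_j: condition (column) totals
    xi = vec(sum(X, dims=2))       # x_i: subject (row) totals
    N = sum(X)                     # total sum of all observations
    Q = (r - 1) * (r * sum(xj .^ 2) - N^2) / (r * N - sum(xi .^ 2))
    p = ccdf(Chisq(r - 1), Q)      # p-value from the χ²(r-1) approximation
    return Q, p
end
```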
References
[1] Cochran, W.G., 1950. The comparison of percentages in matched samples. Biometrika 37, 256–266. https://doi.org/10.1093/biomet/37.3-4.256
Examples
Compute the Cochran Q test for repeated measurements.
julia> data = Pingouin.read_dataset("cochran");
julia> cochran(data, dv="Energetic", within="Time", subject="Subject")
1×4 DataFrame
│ Row │ Source │ ddof │ Q │ p_unc │
│ │ String │ Int64 │ Float64 │ Float64 │
├─────┼────────┼───────┼─────────┼───────────┤
│ 1 │ Time │ 2 │ 6.70588 │ 0.0349813 │
Pingouin.friedman
— Method friedman(data, dv, within, subject, method)
Friedman test for repeated measurements.
Arguments
- data::DataFrame
- dv::Union{String,Symbol}: Name of column containing the dependent variable.
- within::Union{String,Symbol}: Name of column containing the within-subject factor.
- subject::Union{String,Symbol}: Name of column containing the subject identifier.
- method::String: Statistical test to perform. Must be "chisq" (chi-square test) or "f" (F test). See notes below for an explanation.
Returns
stats::DataFrame
- "W": Kendall's coefficient of concordance, corrected for ties.
If method="chisq":
- "Q": The Friedman Q statistic, corrected for ties.
- "p-unc": Uncorrected p-value.
- "ddof": Degrees of freedom.
If method="f":
- "F": The Friedman F statistic, corrected for ties.
- "p-unc": Uncorrected p-value.
- "ddof1": Degrees of freedom of the numerator.
- "ddof2": Degrees of freedom of the denominator.
Notes
The Friedman test is used for one-way repeated measures ANOVA by ranks.
Data are expected to be in long-format.
Note that if the dataset contains one or more other within-subject factors, an automatic collapsing to the mean is applied on the dependent variable (same behavior as the ezANOVA R package). As such, results can differ from those of JASP. If you can, always double-check the results.
Due to the assumption that the test statistic has a chi-squared distribution, the p-value is only reliable for n > 10 and more than 6 repeated measurements.
NaN values are automatically removed.
The Friedman test is equivalent to the test of significance of Kendall's coefficient of concordance (Kendall's W). Most commonly, a Q statistic, which has an asymptotic chi-squared distribution, is computed and used for testing. However, Marozzi [1] showed the chi-squared test to be overly conservative for small numbers of samples and repeated measures. Instead, he recommends the F test, which has the correct size, behaves like a permutation test, and is computationally much easier.
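To make the ranking step concrete, here is a minimal sketch of the Q statistic without tie correction (unlike the actual function, which corrects for ties; `friedman_q` is a hypothetical helper working on a subjects × conditions matrix):

```julia
# Uncorrected Friedman Q: rank within each subject (row), then compare
# the rank sums of the conditions (columns). Assumes no ties.
function friedman_q(X::AbstractMatrix{<:Real})
    n, k = size(X)
    ranks = mapslices(r -> sortperm(sortperm(r)), X, dims=2)  # within-row ranks
    Rj = vec(sum(ranks, dims=1))                              # rank sums per condition
    return 12 / (n * k * (k + 1)) * sum(Rj .^ 2) - 3 * n * (k + 1)
end
```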
References
[1] Marozzi, M. (2014). Testing for concordance between several criteria. Journal of Statistical Computation and Simulation, 84(9), 1843–1850. https://doi.org/10.1080/00949655.2013.766189
Examples
Compute the Friedman test for repeated measurements.
julia> data = Pingouin.read_dataset("rm_anova")
julia> Pingouin.friedman(data,
dv="DesireToKill",
within="Disgustingness",
subject="Subject")
1×5 DataFrame
Row │ Source W ddof Q p_unc
│ String Float64 Int64 Float64 Float64
─────┼───────────────────────────────────────────────────────
1 │ Disgustingness 0.0992242 1 9.22785 0.00238362
This time we will use the F test method.
julia> data = Pingouin.read_dataset("rm_anova")
julia> Pingouin.friedman(data,
dv="DesireToKill",
within="Disgustingness",
subject="Subject",
method="f")
1×6 DataFrame
Row │ Source W ddof1 ddof2 F p_unc
│ String Float64 Float64 Float64 Float64 Float64
─────┼───────────────────────────────────────────────────────────────────
1 │ Disgustingness 0.0992242 0.978495 90.0215 10.1342 0.00213772
Pingouin.harrelldavis
— Function harrelldavis(x[, q, dim])
EXPERIMENTAL Harrell-Davis robust estimate of the $q^{th}$ quantile(s) of the data. TESTS NEEDED
Arguments
- x::Array{<:Number}: Data, must be a one- or two-dimensional array.
- q::Union{Float64,Array{Float64}}: Quantile or sequence of quantiles to compute, must be between 0 and 1. Default is $0.5$.
- dim::Int64: Axis along which the quantile is computed. Can be either 1 or 2. Default is the last axis.
Returns
y::Union{Float64,Array{Float64}}: The estimated quantile(s). If q is a single quantile, returns a float; otherwise each quantile is computed separately and an array of floats is returned.
Notes
The Harrell-Davis method [1] estimates the $q^{th}$ quantile by a linear combination of the order statistics. Results have been tested against a Matlab implementation [2]. Note that this method is also used to measure the confidence intervals of the difference between quantiles of two groups, as implemented in the shift function [3].
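The linear combination of order statistics can be sketched in a few lines for the 1-D case (a hypothetical `hd_quantile` helper, not the Pingouin API; Distributions.jl is assumed for the Beta CDF):

```julia
using Distributions  # Beta distribution for the order-statistic weights

# Harrell-Davis quantile for a 1-D sample: a Beta-weighted average of
# the order statistics.
function hd_quantile(x::AbstractVector{<:Real}, q::Real=0.5)
    n = length(x)
    xs = sort(x)                                            # order statistics
    d = Beta((n + 1) * q, (n + 1) * (1 - q))
    w = [cdf(d, i / n) - cdf(d, (i - 1) / n) for i in 1:n]  # weights sum to 1
    return sum(w .* xs)
end
```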
References
[1] Frank E. Harrell, C. E. Davis, A new distribution-free quantile estimator, Biometrika, Volume 69, Issue 3, December 1982, Pages 635–640, https://doi.org/10.1093/biomet/69.3.635
[2] https://github.com/GRousselet/matlab_stats/blob/master/hd.m
[3] Rousselet, G. A., Pernet, C. R. and Wilcox, R. R. (2017). Beyond differences in means: robust graphical methods to compare two groups in neuroscience. Eur J Neurosci, 46: 1738-1748. https://doi.org/doi:10.1111/ejn.13610
Examples
Estimate the 0.5 quantile (i.e., the median) of 100 observations drawn from a normal distribution with zero mean and unit variance.
julia> using Distributions, Random
julia> d = Normal(0, 1)
julia> x = rand(d, 100);
julia> Pingouin.harrelldavis(x, 0.5)
-0.3197175569523778
Several quantiles at once
julia> Pingouin.harrelldavis(x, [0.25, 0.5, 0.75])
3-element Array{Float64,1}:
-0.8584761447019648
-0.3197175569523778
0.30049291160713604
On the last axis of a 2D array (default)
julia> using Distributions, Random
julia> d = Normal(0, 1)
julia> x = rand(d, (100, 100));
julia> Pingouin.harrelldavis(x, 0.5)
100×1 Array{Float64,2}:
0.08776830864191214
0.03470963005927001
-0.0805646920967012
0.3314919956251108
0.3111971350475172
⋮
0.10769293112437549
-0.10622118136247076
-0.13230506142402296
-0.09693123033727057
-0.2135938540892071
On the first axis
julia> Pingouin.harrelldavis(x, 0.5, 1)
1×100 Array{Float64,2}:
0.0112259 -0.0409635 -0.0918462 ...
On the first axis with multiple quantiles
julia> Pingouin.harrelldavis(x, [0.5, 0.75], 1)
1×100 Array{Float64,2}:
0.0112259 -0.0409635 -0.0918462 ...
Pingouin.kruskal
— Method kruskal(data[, dv, between, detailed])
Kruskal-Wallis H-test for independent samples.
Arguments
- data::DataFrame: DataFrame.
- dv::String: Name of column containing the dependent variable.
- between::String: Name of column containing the between factor.
Returns
stats::DataFrame
- 'H': The Kruskal-Wallis H statistic, corrected for ties.
- 'p-unc': Uncorrected p-value.
- 'ddof': Degrees of freedom.
Notes
The Kruskal-Wallis H-test tests the null hypothesis that the population medians of all groups are equal. It is a non-parametric version of ANOVA. The test works on two or more independent samples, which may have different sizes.
Due to the assumption that H has a chi-square distribution, the number of samples in each group must not be too small. A typical rule is that each sample must have at least 5 measurements.
NaN values are automatically removed.
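A minimal sketch of the H statistic, including the tie correction, for a vector of group samples (`kruskal_h` is a hypothetical helper, not the Pingouin API; StatsBase.jl is assumed for tiedrank and countmap):

```julia
using StatsBase  # tiedrank, countmap

# Kruskal-Wallis H with tie correction, from a vector of group samples.
function kruskal_h(groups::Vector{<:AbstractVector{<:Real}})
    pooled = reduce(vcat, groups)
    N = length(pooled)
    r = tiedrank(pooled)              # ranks over the pooled sample, ties averaged
    H, i = 0.0, 1
    for g in groups
        Rj = sum(r[i:i+length(g)-1])  # rank sum of this group
        H += Rj^2 / length(g)
        i += length(g)
    end
    H = 12 / (N * (N + 1)) * H - 3 * (N + 1)
    ties = values(countmap(pooled))   # tie counts for the correction factor
    return H / (1 - sum(t^3 - t for t in ties) / (N^3 - N))
end
```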
Examples
Compute the Kruskal-Wallis H-test for independent samples.
julia> data = Pingouin.read_dataset("anova")
julia> Pingouin.kruskal(data, dv="Pain threshold", between="Hair color")
1×4 DataFrame
│ Row │ Source │ ddof │ H │ p_unc │
│ │ String │ Int64 │ Float64 │ Float64 │
├─────┼────────────┼───────┼─────────┼───────────┤
│ 1 │ Hair color │ 3 │ 10.5886 │ 0.0141716 │
Pingouin.madmedianrule
— Method madmedianrule(a)
Robust outlier detection based on the MAD-median rule.
Arguments
a::Array{<:Number}: Input array. Must be one-dimensional.
Returns
outliers::Array{Bool}: Boolean array indicating whether each sample is an outlier (true) or not (false).
See also
Statistics.mad
Notes
The MAD-median rule ([1], [2]) declares $X_i$ an outlier if
\[\frac{\left | X_i - M \right |}{\text{MAD}_{\text{norm}}} > K,\]
where $M$ is the median of $X$, $\text{MAD}_{\text{norm}}$ is the normalized median absolute deviation of $X$, and $K$ is the square root of the 0.975 quantile of a $\chi^2$ distribution with one degree of freedom, which is roughly equal to 2.24.
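The rule can be sketched directly from this definition (a hypothetical `mad_median_outliers` helper, not the Pingouin API; StatsBase.jl is assumed for the normalized MAD and Distributions.jl for the chi-square quantile):

```julia
using StatsBase, Distributions

# MAD-median rule: flag X_i when |X_i - M| / MAD_norm exceeds
# K = sqrt of the 0.975 quantile of χ²(1), roughly 2.24.
function mad_median_outliers(a::AbstractVector{<:Real})
    M = median(a)
    madn = mad(a, normalize=true)         # normalized MAD
    K = sqrt(quantile(Chisq(1), 0.975))
    return abs.(a .- M) ./ madn .> K
end
```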
References
[1] Hall, P., Welsh, A.H., 1985. Limit theorems for the median deviation. Ann. Inst. Stat. Math. 37, 27–36. https://doi.org/10.1007/BF02481078
[2] Wilcox, R. R. Introduction to Robust Estimation and Hypothesis Testing. (Academic Press, 2011).
Examples
julia> a = [-1.09, 1., 0.28, -1.51, -0.58, 6.61, -2.43, -0.43]
julia> Pingouin.madmedianrule(a)
8-element Array{Bool,1}:
0
0
0
0
0
1
0
0
Pingouin.mwu
— Method mwu(x, y)
Mann-Whitney U Test (= Wilcoxon rank-sum test). It is the non-parametric version of the independent T-test.
Arguments
x, y::Array{<:Number}: First and second set of observations. x and y must be independent.
Returns
stats::DataFrame
- 'U-val': U-value.
- 'p-val': p-value.
- 'RBC': Rank-biserial correlation.
- 'CLES': Common language effect size.
Notes
The Mann–Whitney U test [1] (also called the Wilcoxon rank-sum test) is a non-parametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample. The test assumes that the two samples are independent. This test corrects for ties and by default uses a continuity correction (see HypothesisTests.MannWhitneyUTest for details).
The rank-biserial correlation [2] is the difference between the proportion of favorable evidence and the proportion of unfavorable evidence.
The common language effect size is the proportion of pairs where $x$ is higher than $y$. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney 2000 [4]:
\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]
The advantages of this method are twofold. First, the brute-force approach pairs each observation of $x$ to its $y$ counterpart, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.
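A brute-force sketch of both effect sizes, assuming RBC = 1 − 2U/(n_x·n_y) with U the tie-adjusted count of favorable pairs (`mwu_effects` is a hypothetical helper, not the Pingouin API):

```julia
# Brute-force effect sizes for the Mann-Whitney U test:
# CLES = P(X > Y) + 0.5·P(X = Y), RBC = 1 − 2U / (n_x·n_y).
function mwu_effects(x, y)
    U = sum((xi > yj) + 0.5 * (xi == yj) for xi in x, yj in y)
    nxy = length(x) * length(y)
    return 1 - 2U / nxy, U / nxy    # (RBC, CLES)
end
```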
References
[1] Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics, 50-60.
[2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.
[3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological bulletin, 111(2), 361.
[4] Vargha, A., & Delaney, H. D. (2000). A Critique and Improvement of the “CL” Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics: A Quarterly Publication Sponsored by the American Educational Research Association and the American Statistical Association, 25(2), 101–132. https://doi.org/10.2307/1165329
Examples
julia> x = [1,4,2,5,3,6,9,8,7]
julia> y = [2,4,1,5,10,1,4,9,8,5]
julia> Pingouin.mwu(x, y)
1×4 DataFrame
│ Row │ U_val │ p_val │ RBC │ CLES │
│ │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼──────────┼────────────┼──────────┤
│ 1 │ 46.5 │ 0.934494 │ -0.0333333 │ 0.516667 │
Compare with HypothesisTests
julia> using HypothesisTests
julia> MannWhitneyUTest(x, y)
Approximate Mann-Whitney U test
-------------------------------
Population details:
parameter of interest: Location parameter (pseudomedian)
value under h_0: 0
point estimate: 0.5
Test summary:
outcome with 95% confidence: fail to reject h_0
two-sided p-value: 0.9345
Details:
number of observations in each group: [9, 10]
Mann-Whitney-U statistic: 46.5
rank sums: [91.5, 98.5]
adjustment for ties: 90.0
normal approximation (μ, σ): (1.5, 12.1666)
Pingouin.wilcoxon
— Method wilcoxon(x, y)
Wilcoxon signed-rank test. It is the non-parametric version of the paired T-test.
Arguments
x, y::Array{<:Number}: First and second set of observations. $x$ and $y$ must be related (e.g., repeated measures) and therefore have the same number of samples. Note that a listwise deletion of missing values is automatically applied.
Returns
stats::DataFrame
- 'W-val': W-value.
- 'p-val': p-value.
- 'RBC': Matched pairs rank-biserial correlation (effect size).
- 'CLES': Common language effect size.
See also
HypothesisTests.SignedRankTest, mwu.
Notes
The Wilcoxon signed-rank test [1] tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences $x - y$ is symmetric about zero. A continuity correction is applied by default (see HypothesisTests.SignedRankTest for details).
The matched pairs rank biserial correlation [2] is the simple difference between the proportion of favorable and unfavorable evidence; in the case of the Wilcoxon signed-rank test, the evidence consists of rank sums (Kerby 2014):
\[r = f - u\]
The common language effect size is the proportion of pairs where $x$ is higher than $y$. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney 2000 [4]:
\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]
The advantages of this method are twofold. First, the brute-force approach pairs each observation of $x$ to its $y$ counterpart, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.
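The $r = f - u$ formula can be sketched from the signed ranks of the nonzero differences (a hypothetical `wilcoxon_rbc` helper, not the Pingouin API; StatsBase.jl is assumed for tiedrank, and zero differences are dropped, the standard Wilcoxon treatment):

```julia
using StatsBase  # tiedrank

# Matched-pairs rank-biserial correlation r = f − u, from the signed
# ranks of the nonzero paired differences.
function wilcoxon_rbc(x, y)
    d = filter(!iszero, x .- y)    # nonzero paired differences
    r = tiedrank(abs.(d))          # ranks of |d|, ties averaged
    T = sum(r)                     # total rank sum = n(n+1)/2
    Tpos = sum(r[d .> 0])          # favorable (positive) rank sum
    return (2Tpos - T) / T         # f − u = (T⁺ − T⁻) / T
end
```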
References
[1] Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics bulletin, 1(6), 80-83.
[2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.
[3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological bulletin, 111(2), 361.
[4] Vargha, A., & Delaney, H. D. (2000). A Critique and Improvement of the “CL” Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics: A Quarterly Publication Sponsored by the American Educational Research Association and the American Statistical Association, 25(2), 101–132. https://doi.org/10.2307/1165329
Examples
Wilcoxon test on two related samples.
julia> x = [20, 22, 19, 20, 22, 18, 24, 20, 19, 24, 26, 13]
julia> y = [38, 37, 33, 29, 14, 12, 20, 22, 17, 25, 26, 16]
julia> Pingouin.wilcoxon(x, y)
1×4 DataFrame
│ Row │ W_val │ p_val │ RBC │ CLES │
│ │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼──────────┼───────────┼──────────┤
│ 1 │ 20.5 │ 0.288086 │ -0.378788 │ 0.395833 │
Compare with HypothesisTests
julia> using HypothesisTests
julia> SignedRankTest(x, y)
Exact Wilcoxon signed rank test
-------------------------------
Population details:
parameter of interest: Location parameter (pseudomedian)
value under h_0: 0
point estimate: -1.5
95% confidence interval: (-9.0, 2.5)
Test summary:
outcome with 95% confidence: fail to reject h_0
two-sided p-value: 0.2881
Details:
number of observations: 12
Wilcoxon rank-sum statistic: 20.5
rank sums: [20.5, 45.5]
adjustment for ties: 6.0