Non-parametric tests

Multiple non-parametric tests. Here, Pingouin is mostly a wrapper around HypothesisTests.jl.

Pingouin.cochran - Method
cochran(data[, dv, within, subject])

Cochran Q test. A special case of the Friedman test when the dependent variable is binary.

Arguments

  • data::DataFrame
  • dv::Union{Nothing,String,Symbol}: Name of column containing the binary dependent variable.
  • within::Union{Nothing,String,Symbol}: Name of column containing the within-subject factor.
  • subject::Union{Nothing,String,Symbol}: Name of column containing the subject identifier.

Returns

  • stats::DataFrame
    • 'Q': The Cochran Q statistic,
    • 'p-unc': Uncorrected p-value,
    • 'ddof': degrees of freedom.

Notes

The Cochran Q test [1] is a non-parametric test for ANOVA with repeated measures where the dependent variable is binary.

Data are expected to be in long-format. NaN values are automatically removed from the data.

The Q statistic is defined as:

\[Q = \frac{(r-1)\left(r\sum_{j=1}^{r} x_j^2 - N^2\right)}{rN - \sum_{i=1}^{n} x_i^2}\]

where $x_j$ is the column (condition) total, $x_i$ the row (subject) total, $N$ the total sum of all observations, $r$ the number of repeated measures ($j=1,...,r$), and $n$ the number of observations per condition ($i=1,...,n$).

The p-value is then approximated using a chi-square distribution with $r-1$ degrees of freedom:

\[Q \sim \chi^2(r-1)\]
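For illustration, here is a minimal sketch of the statistic above, assuming `X` is an $n \times r$ matrix of binary (0/1) responses with subjects as rows and conditions as columns (`cochran_q` is a hypothetical helper, not part of Pingouin):

using Distributions

function cochran_q(X::AbstractMatrix{<:Real})
    n, r = size(X)
    xj = vec(sum(X, dims=1))  # condition (column) totals
    xi = vec(sum(X, dims=2))  # subject (row) totals
    N = sum(X)                # total sum of all observations
    Q = (r - 1) * (r * sum(abs2, xj) - N^2) / (r * N - sum(abs2, xi))
    p = ccdf(Chisq(r - 1), Q) # chi-squared approximation with r-1 dof
    return Q, p
end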

References

[1] Cochran, W.G., 1950. The comparison of percentages in matched samples. Biometrika 37, 256–266. https://doi.org/10.1093/biomet/37.3-4.256

Examples

Compute the Cochran Q test for repeated measurements.

julia> data = Pingouin.read_dataset("cochran");
julia> cochran(data, dv="Energetic", within="Time", subject="Subject")
1×4 DataFrame
│ Row │ Source │ ddof  │ Q       │ p_unc     │
│     │ String │ Int64 │ Float64 │ Float64   │
├─────┼────────┼───────┼─────────┼───────────┤
│ 1   │ Time   │ 2     │ 6.70588 │ 0.0349813 │
Pingouin.friedman - Method
friedman(data, dv, within, subject, method)

Friedman test for repeated measurements.

Arguments

  • data::DataFrame,
  • dv::Union{String,Symbol}: Name of column containing the dependent variable,
  • within::Union{String,Symbol}: Name of column containing the within-subject factor,
  • subject::Union{String,Symbol}: Name of column containing the subject identifier,
  • method::String: Statistical test to perform. Must be "chisq" (chi-square test) or "f" (F test).

See notes below for explanation.

Returns

  • "W": Kendall's coefficient of concordance, corrected for ties,
  • stats::DataFrame, if method="chisq"
    • "Q": The Friedman Q statistic, corrected for ties,
    • "p-unc": Uncorrected p-value,
    • "ddof": degrees of freedom.
  • stats::DataFrame, if method="f":
    • "F": The Friedman F statistic, corrected for ties,
    • "p-unc": Uncorrected p-value,
    • "ddof1": degrees of freedom of the numerator,
    • "ddof2": degrees of freedom of the denominator.

Notes

The Friedman test is used for one-way repeated measures ANOVA by ranks.

Data are expected to be in long-format.

Note that if the dataset contains one or more other within-subject factors, the dependent variable is automatically collapsed to the mean over them (the same behavior as the ezANOVA R package). As such, results can differ from those of JASP. If you can, always double-check the results.

Due to the assumption that the test statistic follows a chi-squared distribution, the p-value is only reliable for n > 10 and more than 6 repeated measurements.

NaN values are automatically removed.

The Friedman test is equivalent to the test of significance of Kendall's coefficient of concordance (Kendall's W). Most commonly, a Q statistic, which has an asymptotic chi-squared distribution, is computed and used for testing. However, Marozzi [1] showed the chi-squared test to be overly conservative for small numbers of samples and repeated measures. Instead, they recommend the F test, which has the correct size and behaves like a permutation test, but is computationally much easier.
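As a cross-check, both statistics can be derived directly from $W$ via $Q = n(r-1)W$ and $F = W(n-1)/(1-W)$, with numerator degrees of freedom $(r-1)-2/n$ and denominator degrees of freedom $(n-1)\left[(r-1)-2/n\right]$. A minimal sketch, assuming $n$ subjects and $r$ repeated measures (`friedman_stats` is a hypothetical helper, not part of Pingouin):

using Distributions

function friedman_stats(w, n, r)
    q = n * (r - 1) * w        # Friedman Q, asymptotically χ²(r-1)
    f = w * (n - 1) / (1 - w)  # F statistic recommended in [1]
    ddof1 = (r - 1) - 2 / n    # numerator degrees of freedom
    ddof2 = (n - 1) * ddof1    # denominator degrees of freedom
    (Q=q, p_chisq=ccdf(Chisq(r - 1), q), F=f, p_f=ccdf(FDist(ddof1, ddof2), f))
end

With $W = 0.0992242$, $n = 93$ and $r = 2$, this reproduces the Q, F and degrees of freedom reported in the examples below.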

References

[1] Marozzi, M. (2014). Testing for concordance between several criteria. Journal of Statistical Computation and Simulation, 84(9), 1843–1850. https://doi.org/10.1080/00949655.2013.766189

Examples

Compute the Friedman test for repeated measurements.

julia> data = Pingouin.read_dataset("rm_anova");
julia> Pingouin.friedman(data,
                         dv="DesireToKill",
                         within="Disgustingness",
                         subject="Subject")
1×5 DataFrame
 Row │ Source          W          ddof   Q        p_unc      
     │ String          Float64    Int64  Float64  Float64    
─────┼───────────────────────────────────────────────────────
   1 │ Disgustingness  0.0992242      1  9.22785  0.00238362

This time we will use the F test method.

julia> data = Pingouin.read_dataset("rm_anova");
julia> Pingouin.friedman(data,
                         dv="DesireToKill",
                         within="Disgustingness",
                         subject="Subject",
                         method="f")
1×6 DataFrame
 Row │ Source          W          ddof1     ddof2    F        p_unc      
     │ String          Float64    Float64   Float64  Float64  Float64    
─────┼───────────────────────────────────────────────────────────────────
   1 │ Disgustingness  0.0992242  0.978495  90.0215  10.1342  0.00213772
Pingouin.harrelldavis - Function
harrelldavis(x[, q, dim])

EXPERIMENTAL Harrell-Davis robust estimate of the $q^{th}$ quantile(s) of the data. TESTS NEEDED

Arguments

  • x::Array{<:Number}: Data, must be a one- or two-dimensional array.
  • q::Union{Float64,Array{Float64}}: Quantile or sequence of quantiles to compute, must be between 0 and 1. Default is $0.5$.
  • dim::Int64: Axis along which the quantiles are computed. Can be either 1 or 2; for a two-dimensional array, the default reduces along the last axis (see examples below).

Returns

  • y::Union{Float64,Array{Float64}}: The estimated quantile(s). If q is a single quantile, a float is returned; otherwise each quantile is computed separately and an array of floats is returned.

Notes

The Harrell-Davis method [1] estimates the $q^{th}$ quantile by a linear combination of the order statistics. Results have been tested against a Matlab implementation [2]. Note that this method is also used to measure the confidence intervals of the difference between quantiles of two groups, as implemented in the shift function [3].
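For intuition, here is a one-dimensional sketch of the estimator: the weights are increments of a Beta CDF evaluated at $i/n$ (`hd_sketch` is a hypothetical name, not Pingouin's implementation):

using Distributions

function hd_sketch(x::AbstractVector{<:Real}, q::Real=0.5)
    n = length(x)
    dist = Beta(q * (n + 1), (1 - q) * (n + 1))
    # weight of the i-th order statistic
    w = [cdf(dist, i / n) - cdf(dist, (i - 1) / n) for i in 1:n]
    sum(w .* sort(x))
end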

See Also

plot_shift

References

[1] Frank E. Harrell, C. E. Davis, A new distribution-free quantile estimator, Biometrika, Volume 69, Issue 3, December 1982, Pages 635–640, https://doi.org/10.1093/biomet/69.3.635

[2] https://github.com/GRousselet/matlab_stats/blob/master/hd.m

[3] Rousselet, G. A., Pernet, C. R. and Wilcox, R. R. (2017). Beyond differences in means: robust graphical methods to compare two groups in neuroscience. Eur J Neurosci, 46: 1738-1748. https://doi.org/10.1111/ejn.13610

Examples

Estimate the 0.5 quantile (i.e. the median) of 100 observations drawn from a normal distribution with zero mean and unit variance.

julia> using Distributions, Random
julia> d = Normal(0, 1);
julia> x = rand(d, 100);
julia> Pingouin.harrelldavis(x, 0.5)
-0.3197175569523778

Several quantiles at once

julia> Pingouin.harrelldavis(x, [0.25, 0.5, 0.75])
3-element Array{Float64,1}:
 -0.8584761447019648
 -0.3197175569523778
  0.30049291160713604

On the last axis of a 2D array (the default)

julia> using Distributions, Random
julia> d = Normal(0, 1);
julia> x = rand(d, (100, 100));
julia> Pingouin.harrelldavis(x, 0.5)
100×1 Array{Float64,2}:
  0.08776830864191214
  0.03470963005927001
 -0.0805646920967012
  0.3314919956251108
  0.3111971350475172
  ⋮
  0.10769293112437549
 -0.10622118136247076
 -0.13230506142402296
 -0.09693123033727057
 -0.2135938540892071

On the first axis

julia> Pingouin.harrelldavis(x, 0.5, 1)
1×100 Array{Float64,2}:
 0.0112259  -0.0409635  -0.0918462 ...

On the first axis with multiple quantiles

julia> Pingouin.harrelldavis(x, [0.5, 0.75], 1)
Pingouin.kruskal - Method
kruskal(data[, dv, between, detailed])

Kruskal-Wallis H-test for independent samples.

Arguments

  • data::DataFrame,
  • dv::String: Name of column containing the dependent variable,
  • between::String: Name of column containing the between factor.

Returns

  • stats::DataFrame
    • 'H': The Kruskal-Wallis H statistic, corrected for ties,
    • 'p-unc': Uncorrected p-value,
    • 'ddof': degrees of freedom.

Notes

The Kruskal-Wallis H-test tests the null hypothesis that the population medians of all groups are equal. It is a non-parametric version of ANOVA. The test works on 2 or more independent samples, which may have different sizes.

Due to the assumption that H has a chi-squared distribution, the number of samples in each group must not be too small. A typical rule is that each sample must have at least 5 measurements.
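For reference, a minimal sketch of the tie-corrected H statistic and its chi-squared p-value, assuming `groups` is a vector of per-group sample vectors (`kruskal_sketch` is a hypothetical helper, not Pingouin's implementation):

using Distributions, StatsBase

function kruskal_sketch(groups::Vector{<:AbstractVector{<:Real}})
    pooled = vcat(groups...)
    N = length(pooled)
    ranks = tiedrank(pooled)             # midranks of the pooled sample
    stops = cumsum(length.(groups))
    starts = [1; stops[1:end-1] .+ 1]
    H = 12 / (N * (N + 1)) *
        sum(sum(ranks[a:b])^2 / (b - a + 1) for (a, b) in zip(starts, stops)) -
        3 * (N + 1)
    ties = values(countmap(pooled))
    H /= 1 - sum(t^3 - t for t in ties) / (N^3 - N)   # correction for ties
    H, ccdf(Chisq(length(groups) - 1), H)
end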

NaN values are automatically removed.

Examples

Compute the Kruskal-Wallis H-test for independent samples.

julia> data = Pingouin.read_dataset("anova");
julia> Pingouin.kruskal(data, dv="Pain threshold", between="Hair color")
1×4 DataFrame
│ Row │ Source     │ ddof  │ H       │ p_unc     │
│     │ String     │ Int64 │ Float64 │ Float64   │
├─────┼────────────┼───────┼─────────┼───────────┤
│ 1   │ Hair color │ 3     │ 10.5886 │ 0.0141716 │
Pingouin.madmedianrule - Method
madmedianrule(a)

Robust outlier detection based on the MAD-median rule.

Arguments

  • a::Array{<:Number}: Input array. Must be one-dimensional.

Returns

  • outliers::Array{Bool}: Boolean array indicating whether each sample is an outlier (true) or not (false).

See also

StatsBase.mad

Notes

The MAD-median rule ([1], [2]) declares $X_i$ an outlier if

\[\frac{\left| X_i - M \right|}{\text{MAD}_{\text{norm}}} > K,\]

where $M$ is the median of $X$, $\text{MAD}_{\text{norm}}$ is the normalized median absolute deviation of $X$, and $K$ is the square root of the 0.975 quantile of a $\chi^2$ distribution with one degree of freedom, which is roughly equal to 2.24.
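A direct transcription of the rule as a sketch (`mad_median_rule` here is illustrative, not necessarily Pingouin's exact implementation):

using Distributions, Statistics, StatsBase

function mad_median_rule(a::AbstractVector{<:Real})
    M = median(a)
    madn = StatsBase.mad(a, normalize=true)  # normalized MAD
    K = sqrt(quantile(Chisq(1), 0.975))      # ≈ 2.2414
    abs.(a .- M) ./ madn .> K
end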

References

[1] Hall, P., Welsh, A.H., 1985. Limit theorems for the median deviation. Ann. Inst. Stat. Math. 37, 27–36. https://doi.org/10.1007/BF02481078

[2] Wilcox, R. R. Introduction to Robust Estimation and Hypothesis Testing. (Academic Press, 2011).

Examples

julia> a = [-1.09, 1., 0.28, -1.51, -0.58, 6.61, -2.43, -0.43];
julia> Pingouin.madmedianrule(a)
8-element Array{Bool,1}:
 0
 0
 0
 0
 0
 1
 0
 0
Pingouin.mwu - Method
mwu(x, y)

Mann-Whitney U Test (= Wilcoxon rank-sum test). It is the non-parametric version of the independent T-test.

Arguments

  • x, y::Array{<:Number}: First and second set of observations. x and y must be independent.

Returns

  • stats::DataFrame
    • 'U-val': U-value
    • 'p-val': p-value
    • 'RBC': rank-biserial correlation
    • 'CLES': common language effect size

See also

  • HypothesisTests.MannWhitneyUTest,
  • wilcoxon.

Notes

The Mann–Whitney U test [1] (also called the Wilcoxon rank-sum test) is a non-parametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample. The test assumes that the two samples are independent. This test corrects for ties and by default uses a continuity correction (see HypothesisTests.MannWhitneyUTest for details).

The rank biserial correlation [2] is the difference between the proportion of favorable evidence minus the proportion of unfavorable evidence.

The common language effect size is the proportion of pairs where $x$ is higher than $y$. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney 2000 [4]:

\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]

The advantages of this method are twofold. First, the brute-force approach pairs each observation of $x$ with its $y$ counterpart, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.
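Both effect sizes follow directly from these definitions. A minimal sketch, assuming `u` is the U-value reported by mwu (`cles` and `rbc` are hypothetical helpers):

# brute-force common language effect size, as in the formula above
cles(x, y) = sum(xi > yi ? 1.0 : xi == yi ? 0.5 : 0.0
                 for xi in x, yi in y) / (length(x) * length(y))

# rank-biserial correlation from the U statistic of x
rbc(u, nx, ny) = 1 - 2u / (nx * ny)

With the example below, `cles(x, y)` returns 0.516667 and `rbc(46.5, 9, 10)` returns -0.0333333, matching the table.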

References

[1] Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics, 50-60.

[2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11.IT.3.1.

[3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological bulletin, 111(2), 361.

[4] Vargha, A., & Delaney, H. D. (2000). A Critique and Improvement of the “CL” Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics: A Quarterly Publication Sponsored by the American Educational Research Association and the American Statistical Association, 25(2), 101–132. https://doi.org/10.2307/1165329

Examples

julia> x = [1,4,2,5,3,6,9,8,7];
julia> y = [2,4,1,5,10,1,4,9,8,5];
julia> Pingouin.mwu(x, y)
1×4 DataFrame
│ Row │ U_val   │ p_val    │ RBC        │ CLES     │
│     │ Float64 │ Float64  │ Float64    │ Float64  │
├─────┼─────────┼──────────┼────────────┼──────────┤
│ 1   │ 46.5    │ 0.934494 │ -0.0333333 │ 0.516667 │

Compare with HypothesisTests

julia> using HypothesisTests
julia> MannWhitneyUTest(x, y)
Approximate Mann-Whitney U test
-------------------------------
Population details:
    parameter of interest:   Location parameter (pseudomedian)
    value under h_0:         0
    point estimate:          0.5

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.9345

Details:
    number of observations in each group: [9, 10]
    Mann-Whitney-U statistic:             46.5
    rank sums:                            [91.5, 98.5]
    adjustment for ties:                  90.0
    normal approximation (μ, σ):          (1.5, 12.1666)
Pingouin.wilcoxon - Method
wilcoxon(x, y)

Wilcoxon signed-rank test. It is the non-parametric version of the paired T-test.

Arguments

  • x, y::Array{<:Number}: First and second set of observations. $x$ and $y$ must be related (e.g. repeated measures) and, therefore, have the same number of samples. Note that a listwise deletion of missing values is automatically applied.

Returns

  • stats::DataFrame
    • 'W-val': W-value
    • 'p-val': p-value
    • 'RBC': matched pairs rank-biserial correlation (effect size)
    • 'CLES': common language effect size

See also

  • HypothesisTests.SignedRankTest,
  • mwu.

Notes

The Wilcoxon signed-rank test [1] tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences $x - y$ is symmetric about zero. A continuity correction is applied by default (see HypothesisTests.SignedRankTest for details).

The matched pairs rank biserial correlation [2] is the simple difference between the proportion of favorable and unfavorable evidence; in the case of the Wilcoxon signed-rank test, the evidence consists of rank sums (Kerby 2014):

\[r = f - u\]
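In code, the rank sums are those of the signed ranks of the nonzero differences. A minimal sketch (`rbc_paired` is a hypothetical helper, not Pingouin's implementation):

using StatsBase

function rbc_paired(x, y)
    d = filter(!iszero, x .- y)  # drop zero differences, as in the test
    r = tiedrank(abs.(d))        # midranks of the absolute differences
    (sum(r[d .> 0]) - sum(r[d .< 0])) / sum(r)
end

For the example below, this returns -0.378788, the RBC reported in the table.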

The common language effect size is the proportion of pairs where $x$ is higher than $y$. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney 2000 [4]:

\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]

The advantages of this method are twofold. First, the brute-force approach pairs each observation of $x$ with its $y$ counterpart, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.

References

[1] Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics bulletin, 1(6), 80-83.

[2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11.IT.3.1.

[3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological bulletin, 111(2), 361.

[4] Vargha, A., & Delaney, H. D. (2000). A Critique and Improvement of the “CL” Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics: A Quarterly Publication Sponsored by the American Educational Research Association and the American Statistical Association, 25(2), 101–132. https://doi.org/10.2307/1165329

Examples

Wilcoxon test on two related samples.

julia> x = [20, 22, 19, 20, 22, 18, 24, 20, 19, 24, 26, 13];
julia> y = [38, 37, 33, 29, 14, 12, 20, 22, 17, 25, 26, 16];
julia> Pingouin.wilcoxon(x, y)
1×4 DataFrame
│ Row │ W_val   │ p_val    │ RBC       │ CLES     │
│     │ Float64 │ Float64  │ Float64   │ Float64  │
├─────┼─────────┼──────────┼───────────┼──────────┤
│ 1   │ 20.5    │ 0.288086 │ -0.378788 │ 0.395833 │

Compare with HypothesisTests

julia> using HypothesisTests
julia> SignedRankTest(x, y)
Exact Wilcoxon signed rank test
-------------------------------
Population details:
    parameter of interest:   Location parameter (pseudomedian)
    value under h_0:         0
    point estimate:          -1.5
    95% confidence interval: (-9.0, 2.5)

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.2881

Details:
    number of observations:      12
    Wilcoxon rank-sum statistic: 20.5
    rank sums:                   [20.5, 45.5]
    adjustment for ties:         6.0