Two-sample Anderson-Darling tests

Is there a function in Matlab that can perform a two-sample Anderson-Darling test, as described on Cross Validated or as available in R?
Indeed, it looks to me like the Matlab function adtest is not distribution-free (i.e. it is not non-parametric): it is limited to a few known distributions ('norm', 'exp', 'ev', 'logn' and 'weibull'). However, the Anderson-Darling test should be distribution-free, which means that the two-sample test can take any empirical distributions as inputs. In my case, I have two empirical distributions (coming from two datasets) that I would like to use as inputs for the two-sample Anderson-Darling test.

Answer (1)

Star Strider
Star Strider on 4 Aug 2024

1 vote

I am not certain what you want; however, the two-sample Kolmogorov-Smirnov test (kstest2) appears to be distribution-free, at least as I read its documentation. It may do what you want.
From the documentation: it ‘returns a test decision for the null hypothesis that the data in vectors x1 and x2 are from the same continuous distribution’.
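For reference, a minimal sketch (not from the thread; the data are made up for illustration) of the same distribution-free two-sample KS test in Python using scipy.stats.ks_2samp, which can serve as a cross-check for kstest2:

```python
import numpy as np
from scipy import stats

a = np.array([7.0, 42.0, 61.0, 81.0, 115.0, 137.0, 80.0, 100.0])  # made-up data
b = a + 50.0                                                      # clearly shifted sample

# Identical samples: the KS distance D is 0, so the p-value is 1.
same = stats.ks_2samp(a, a.copy())
# Shifted samples: D > 0, so the p-value drops below 1.
diff = stats.ks_2samp(a, b)

print(same.statistic, same.pvalue)
print(diff.statistic, diff.pvalue)
```

Like kstest2, ks_2samp makes no assumption about the shape of the underlying distribution.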

5 Comments

Sim
Sim on 4 Aug 2024
Edited: Sim on 4 Aug 2024
Thanks @Star :-) Actually, I was looking for the two-sample Anderson-Darling test, not the two-sample Kolmogorov-Smirnov test... The AD test is more sensitive to differences in the tails of the two distributions (i.e. the 2 samples), while the KS test is more sensitive to differences in their central parts...
About the two-sample test: it means that I compare two empirical distributions, which come from two arrays of data. There is also a one-sample test, for both KS and AD, where one single empirical distribution (i.e. one single array of data) is compared against a known distribution, such as a normal distribution... :-)
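To make the one-sample/two-sample distinction concrete, here is a small sketch (synthetic data, assuming scipy is available) of the one-sample case, where a single array is tested against a named distribution family, which is what MATLAB's adtest does:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(10.0, 2.0, 500)  # one single array of data (synthetic)

# One-sample AD test: the empirical distribution of x is compared
# against a *named* family ('norm' here), as in MATLAB's adtest.
res = stats.anderson(x, dist='norm')
print(res.statistic)           # A^2 statistic (nonnegative)
print(res.critical_values)     # critical values at the levels below
print(res.significance_level)  # 15, 10, 5, 2.5, 1 percent
```

The two-sample test replaces the named family with a second empirical sample, which is why it needs no distributional assumption.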
Star Strider
Star Strider on 4 Aug 2024
My pleasure!
I also searched for the two-sample Anderson-Darling test in the File Exchange and in a general web search; however, I cannot find a MATLAB implementation of it.
The only reference I can find is the ‘Non-parametric k-sample tests’ section of the Wikipedia article on the Anderson-Darling test; however, that would require coding it. (I am not averse to doing that, however it would first require understanding the assumptions behind it, writing the code, and then devising a way to test it. That would take a while, since your question is the first I have heard of it.) I also do not know how that statistic is itself distributed, and whether a confidence interval could be calculated for it.
Sim
Sim on 5 Aug 2024
Edited: Sim on 5 Aug 2024
Hi @Star Strider, thanks a lot for your research and time!
About the two-sample Anderson-Darling test, I found the k-sample Anderson-Darling test.
Indeed, from the paper of Scholz and Stephens (1987), it looks like the k-sample Anderson-Darling test is the generalization of the two-sample Anderson-Darling test, which is recovered when k = 2.
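As a quick illustration (a sketch with made-up data, not from the thread), running scipy's anderson_ksamp with exactly two samples gives the two-sample test; with two well-separated samples the statistic lands far beyond even the smallest tabulated significance level:

```python
import numpy as np
from scipy import stats

x = np.arange(1.0, 51.0)          # made-up sample 1
y = np.arange(1.0, 51.0) + 100.0  # made-up sample 2, clearly shifted

# With k = 2, the Scholz-Stephens k-sample test reduces to the
# two-sample Anderson-Darling test.
res = stats.anderson_ksamp([x, y])
print(res.statistic)        # standardized AD rank statistic
print(res.critical_values)  # at 25%, 10%, 5%, 2.5%, 1%, 0.5%, 0.1%
```

Note that without a permutation method scipy interpolates the p-value and caps it to the range [0.001, 0.25], warning when the true value falls outside that range.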
I found a Matlab implementation (AnDarksamtest) and a Python implementation (anderson_ksamp) of the k-sample Anderson-Darling test, and compared them to each other. In particular, I compared one distribution with itself; in that case, both implementations should fail to reject the null hypothesis (where, in this case, the null hypothesis is that both distributions are identical).
Matlab implementation (AnDarksamtest) of the k-sample Anderson-Darling test.
% Test
a = [7 42 61 81 115 137 80 100 121 140 127 110 81 39 59 45 38 32 29 27 35 25 22 20 19 14 12 9 8 6 3 2 2 0 0 1 0 1 0 0];
b = a;
AnDarksamtest(vertcat([a' repelem(1,length(a))'],[b' repelem(2,length(b))']))
% Results
K-sample Anderson-Darling Test
----------------------------------------------------------------------------
Number of samples: 2
Sample sizes: 40 40
Total number of observations: 80
Number of ties (identical values): 47
Mean of the Anderson-Darling rank statistic: 1
Standard deviation of the Anderson-Darling rank statistic: 0.7438456
----------------------------------------------------------------------------
Not adjusted for ties.
----------------------------------------------------------------------------
Anderson-Darling rank statistic: 0.0000000
Standardized Anderson-Darling rank statistic: -1.3443650
Probability associated to the Anderson-Darling rank statistic = 0.8741461
With a given significance = 0.050
The populations from which the k-samples of data were drawn are identical:
natural groupings have no significant effect (unstructurated).
----------------------------------------------------------------------------
Adjusted for ties.
----------------------------------------------------------------------------
Anderson-Darling rank statistic: 0.0000000
Standardized Anderson-Darling rank statistic: -1.3443650
Probability associated to the Anderson-Darling rank statistic = 0.8741461
With a given significance = 0.050
The populations from which the k-samples of data were drawn are identical:
natural groupings have no significant effect (unstructurated).
----------------------------------------------------------------------------
Python implementation (anderson_ksamp) of the k-sample Anderson-Darling test.
# Test
import numpy as np
from scipy import stats
a = [7, 42, 61, 81, 115, 137, 80, 100, 121, 140, 127, 110, 81, 39, 59, 45, 38, 32, 29, 27, 35, 25, 22, 20, 19, 14, 12, 9, 8, 6, 3, 2, 2, 0, 0, 1, 0, 1, 0, 0]
b = a
rng = np.random.default_rng()
method = stats.PermutationMethod(n_resamples=9999, random_state=rng)
res = stats.anderson_ksamp([a, b], method=method)
print(res.statistic)
print(res.critical_values)
print(res.pvalue)
# Results
-1.3443650498654214 # <-- statistic
[0.325 1.226 1.961 2.718 3.752 4.592 6.546] # <-- critical_values
1.0 # <-- pvalue
Summary notes and doubts.
Although both implementations generally agree, their p-values differ!
Since I compare a distribution with itself, I would expect a p-value of 1 from both implementations. However, the Matlab implementation reports a p-value of 0.8741461, while the Python implementation reports a p-value of 1. This raises some doubts:
  1. Why do the p-values differ?
  2. Might the Matlab implementation have an issue in the calculation of the p-value?
  3. As an alternative, if the Matlab implementation does have an issue with the p-value, would it be possible to call the Python script I wrote from Matlab?
Star Strider
Star Strider on 5 Aug 2024
You will have to ask the author of the MATLAB function about that. (All I did was to provide the search information.)
Sim
Sim on 5 Aug 2024
Thanks @Star Strider :-) I would like to contact the author, but it looks like he has retired and has not checked Matlab Answers in four years...


Asked: Sim on 4 Aug 2024

Edited: Sim on 5 Aug 2024
