Two-sample Anderson-Darling tests

Is there a function in Matlab that can perform a two-sample Anderson-Darling test, as described on Cross Validated or as available in R?
Indeed, it looks to me like the Matlab function adtest is not distribution-free (i.e. it is not non-parametric): it is limited to a few known distributions ('norm', 'exp', 'ev', 'logn' and 'weibull'). However, the Anderson-Darling test should be distribution-free, which means that the two-sample test can take any empirical distributions as inputs. In my case, I have two empirical distributions (coming from two datasets) that I would like to use as inputs for the two-sample Anderson-Darling test.

Answer (1)

Star Strider
Star Strider on 4 Aug 2024

1 vote

I am not certain what you want; however, the two-sample Kolmogorov-Smirnov test (kstest2) appears to be distribution-free, at least as I read its documentation. It may do what you want.
From the documentation: it ‘returns a test decision for the null hypothesis that the data in vectors x1 and x2 are from the same continuous distribution’.
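For reference, a minimal sketch (not from the thread; the data are made up for illustration) of the same distribution-free two-sample KS test in Python using scipy.stats.ks_2samp, which can serve as a cross-check for kstest2:

```python
import numpy as np
from scipy import stats

a = np.array([7.0, 42.0, 61.0, 81.0, 115.0, 137.0, 80.0, 100.0])  # made-up data
b = a + 50.0                                                      # clearly shifted sample

# Identical samples: the KS distance D is 0, so the p-value is 1.
same = stats.ks_2samp(a, a.copy())
# Shifted samples: D > 0, so the p-value drops below 1.
diff = stats.ks_2samp(a, b)

print(same.statistic, same.pvalue)
print(diff.statistic, diff.pvalue)
```

Like kstest2, ks_2samp makes no assumption about the shape of the underlying distribution.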

5 Comments

Sim
Sim on 4 Aug 2024
Edited: Sim on 4 Aug 2024
Thanks @Star :-) Actually, I was looking for the two-sample Anderson-Darling test, not the two-sample Kolmogorov-Smirnov test... The AD test is more sensitive to differences in the tails of the two distributions (i.e. the 2 samples), while the KS test is more sensitive to differences in their central parts...
About the two-sample test: it means that I compare two empirical distributions, which come from two arrays of data. There is also a one-sample test, for both KS and AD, where one single empirical distribution (i.e. one single array of data) is compared against a known distribution, such as a normal distribution... :-)
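To make the one-sample/two-sample distinction concrete, here is a small sketch (synthetic data, assuming scipy is available) of the one-sample case, where a single array is tested against a named distribution family, which is what MATLAB's adtest does:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(10.0, 2.0, 500)  # one single array of data (synthetic)

# One-sample AD test: the empirical distribution of x is compared
# against a *named* family ('norm' here), as in MATLAB's adtest.
res = stats.anderson(x, dist='norm')
print(res.statistic)           # A^2 statistic (nonnegative)
print(res.critical_values)     # critical values at the levels below
print(res.significance_level)  # 15, 10, 5, 2.5, 1 percent
```

The two-sample test replaces the named family with a second empirical sample, which is why it needs no distributional assumption.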
Star Strider
Star Strider on 4 Aug 2024
My pleasure!
I also searched for the two-sample Anderson-Darling test in the File Exchange and in a general web search; however, I cannot find a MATLAB implementation of it.
The only reference I can find is the ‘Non-parametric k-sample tests’ section of the Wikipedia article on the Anderson-Darling test; however, that would require coding it. (I am not averse to doing that, however it would first require understanding the assumptions behind it, writing the code, and then devising a way to test it. That would take a while, since your question is the first I have heard of it.) I also do not know how that statistic is itself distributed, and whether a confidence interval could be calculated for it.
Sim
Sim on 5 Aug 2024
Edited: Sim on 5 Aug 2024
Hi @Star Strider, thanks a lot for your research and time!
About the two-sample Anderson-Darling test, I found the k-sample Anderson-Darling test.
Indeed, from the paper of Scholz and Stephens (1987), it looks like the k-sample Anderson-Darling test is the generalization of the two-sample Anderson-Darling test, which is recovered when k = 2.
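As a quick illustration (a sketch with made-up data, not from the thread), running scipy's anderson_ksamp with exactly two samples gives the two-sample test; with two well-separated samples the statistic lands far beyond even the smallest tabulated significance level:

```python
import numpy as np
from scipy import stats

x = np.arange(1.0, 51.0)          # made-up sample 1
y = np.arange(1.0, 51.0) + 100.0  # made-up sample 2, clearly shifted

# With k = 2, the Scholz-Stephens k-sample test reduces to the
# two-sample Anderson-Darling test.
res = stats.anderson_ksamp([x, y])
print(res.statistic)        # standardized AD rank statistic
print(res.critical_values)  # at 25%, 10%, 5%, 2.5%, 1%, 0.5%, 0.1%
```

Note that without a permutation method scipy interpolates the p-value and caps it to the range [0.001, 0.25], warning when the true value falls outside that range.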
I found a Matlab implementation (AnDarksamtest) and a Python implementation (anderson_ksamp) of the k-sample Anderson-Darling test, and compared them to each other. In particular, I compared one distribution with itself; in that case, both implementations should fail to reject the null hypothesis (where, in this case, the null hypothesis is that both distributions are identical).
Matlab implementation (AnDarksamtest) of the k-sample Anderson-Darling test.
% Test
a = [7 42 61 81 115 137 80 100 121 140 127 110 81 39 59 45 38 32 29 27 35 25 22 20 19 14 12 9 8 6 3 2 2 0 0 1 0 1 0 0];
b = a;
AnDarksamtest(vertcat([a' repelem(1,length(a))'],[b' repelem(2,length(b))']))
% Results
K-sample Anderson-Darling Test
----------------------------------------------------------------------------
Number of samples: 2
Sample sizes: 40 40
Total number of observations: 80
Number of ties (identical values): 47
Mean of the Anderson-Darling rank statistic: 1
Standard deviation of the Anderson-Darling rank statistic: 0.7438456
----------------------------------------------------------------------------
Not adjusted for ties.
----------------------------------------------------------------------------
Anderson-Darling rank statistic: 0.0000000
Standardized Anderson-Darling rank statistic: -1.3443650
Probability associated to the Anderson-Darling rank statistic = 0.8741461
With a given significance = 0.050
The populations from which the k-samples of data were drawn are identical:
natural groupings have no significant effect (unstructurated).
----------------------------------------------------------------------------
Adjusted for ties.
----------------------------------------------------------------------------
Anderson-Darling rank statistic: 0.0000000
Standardized Anderson-Darling rank statistic: -1.3443650
Probability associated to the Anderson-Darling rank statistic = 0.8741461
With a given significance = 0.050
The populations from which the k-samples of data were drawn are identical:
natural groupings have no significant effect (unstructurated).
----------------------------------------------------------------------------
Python implementation (anderson_ksamp) of the k-sample Anderson-Darling test.
# Test
import numpy as np
from scipy import stats
a = [7, 42, 61, 81, 115, 137, 80, 100, 121, 140, 127, 110, 81, 39, 59, 45, 38, 32, 29, 27, 35, 25, 22, 20, 19, 14, 12, 9, 8, 6, 3, 2, 2, 0, 0, 1, 0, 1, 0, 0]
b = a
rng = np.random.default_rng()
method = stats.PermutationMethod(n_resamples=9999, random_state=rng)
res = stats.anderson_ksamp([a, b], method=method)
print(res.statistic)
print(res.critical_values)
print(res.pvalue)
# Results
-1.3443650498654214 # <-- statistic
[0.325 1.226 1.961 2.718 3.752 4.592 6.546] # <-- critical_values
1.0 # <-- pvalue
Summary notes and doubts.
Although both implementations generally agree, their p-values differ!
Since I compare a distribution with itself, I would expect a p-value of 1 from both implementations. However, the Matlab implementation reports a p-value of 0.8741461, while the Python implementation reports a p-value of 1. This raises some doubts:
  1. Why do the p-values differ?
  2. Might the Matlab implementation have an issue in the calculation of the p-value?
  3. As an alternative, if the Matlab implementation does have an issue with the p-value, would it be possible to call the Python script I wrote from Matlab?
Star Strider
Star Strider on 5 Aug 2024
You will have to ask the author of the MATLAB function about that. (All I did was to provide the search information.)
Sim
Sim on 5 Aug 2024
Thanks @Star Strider :-) I would like to contact the author, but it looks like he has retired and has not checked Matlab Answers in four years...


Asked: Sim on 4 Aug 2024

Edited: Sim on 5 Aug 2024
