How to check for data normality using kstest?

Question

DANIEL KONG LEN HAO on 16 Sep 2021

https://jp.mathworks.com/matlabcentral/answers/1454254-how-to-check-for-data-normality-using-kstest

Commented: Rik on 18 Sep 2021

Suppose I have a data set of about 100 numbers, listed below. How do I properly determine whether this data set is normally distributed using kstest()? The documentation mentions subtracting the mean and then dividing by the standard deviation before passing the data to kstest(), but do I need to do that in this case?

Dataset = [64 66 80 66 76 55 57 72 76 68 81 70 82 80 71 74 83 80 76 78 72 74 76 65 61 75 68 80 88 73 76 71 70 74 70 76 66 72 80 75 81 82 84 86 71 82 77 78 80 78 88 77 73 72 74 68 75 62 65 71 72 75 72 75 76 73 81 71 61 61 71 81 73 67 77 77 80 57 70 73 80 75 70 75 74 70 68 80 85 81 71 80 80 78 75 75 80 76 82 75 57];

PS: I only need to test whether the data is normal, and I must use kstest to do it.

Answer 1

Rik on 16 Sep 2021

https://jp.mathworks.com/matlabcentral/answers/1454254-how-to-check-for-data-normality-using-kstest#answer_788519

If you want to test if your data is from a standard normal distribution you should not change it before calling kstest.

If you want to test if your data is normally distributed (but not necessarily from the standard normal distribution), you will first have to normalize it by subtracting the mean and dividing by the standard deviation.

Which of the two is relevant for your case depends on your context. I'm guessing you want the second one, otherwise you don't need the test.
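As a minimal sketch of the second option, the data from the question can be standardized and then passed to kstest. The variable name z is my own choice, and this assumes the Statistics and Machine Learning Toolbox is available:

```matlab
% Data from the question
Dataset = [64 66 80 66 76 55 57 72 76 68 81 70 82 80 71 74 83 80 76 78 ...
           72 74 76 65 61 75 68 80 88 73 76 71 70 74 70 76 66 72 80 75 ...
           81 82 84 86 71 82 77 78 80 78 88 77 73 72 74 68 75 62 65 71 ...
           72 75 72 75 76 73 81 71 61 61 71 81 73 67 77 77 80 57 70 73 ...
           80 75 70 75 74 70 68 80 85 81 71 80 80 78 75 75 80 76 82 75 57];

% Standardize: subtract the sample mean, divide by the sample standard deviation
z = (Dataset - mean(Dataset)) / std(Dataset);

% Test the standardized data against the standard normal distribution N(0,1)
[h, p] = kstest(z);
fprintf('h = %d, p = %.4g\n', h, p);
```

Note that the mean and standard deviation are estimated from the same sample that is being tested, so the p-value from this approach is only approximate and tends to be conservative. The kstest help text lists lillietest under "See also", which is designed for that situation, but if kstest is a requirement this is the usual way to apply it.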

DANIEL KONG LEN HAO on 18 Sep 2021

Alright, thank you! I was looking for a normal distribution in general. One more question: does a smaller p-value (probability) in the K-S test mean the data is more likely or less likely to be normally distributed?

Rik on 18 Sep 2021

That is easy to determine: since your data is absolutely not from a standard normal distribution, you can feed it your unaltered data and see the result. You can also read the documentation:

help kstest
 KSTEST Single sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.
    H = KSTEST(X) performs a Kolmogorov-Smirnov (K-S) test to determine if
    a random sample X could have come from a standard normal distribution,
    N(0,1). H indicates the result of the hypothesis test:
       H = 0 => Do not reject the null hypothesis at the 5% significance
       level. 
       H = 1 => Reject the null hypothesis at the 5% significance
       level.
 
    X is a vector representing a random sample from some underlying
    distribution, with cumulative distribution function F. Missing 
    observations in X, indicated by NaNs (Not-a-Number), are ignored.
 
    [H,P] = KSTEST(...) also returns the asymptotic P-value P.
 
    [H,P,KSSTAT] = KSTEST(...) also returns the K-S test statistic KSSTAT
    defined above for the test type indicated by TAIL.
 
    [H,P,KSSTAT,CV] = KSTEST(...) returns the critical value of the test CV.
 
    [...] = KSTEST(X,'PARAM1',val1,'PARAM2',val2,...) specifies one or
    more of the following name/value pairs:
 
        Parameter       Value
        'alpha'         A value ALPHA between 0 and 1 specifying the
                        significance level. Default is 0.05 for 5% significance.
        'CDF'           CDF is the c.d.f. under the null hypothesis.  It can
                        be specified either as a ProbabilityDistribution object
                        or as a two-column matrix. Default is the standard
                        normal, N(0,1).
        'Tail'          A string indicating the type of test. The one-sample
                        K-S test tests the null hypothesis that F = CDF
                        (that is, F(x) = CDF(x) for all x)
                        against the alternative specified by TAIL:
             'unequal' -- "F not equal to CDF" (two-sided test) (Default)
             'larger'  -- "F > CDF" (one-sided test)
             'smaller' -- "F < CDF" (one-sided test)
 
    Let S(X) be the empirical c.d.f. estimated from the sample vector X, F(X)
    be the corresponding true (but unknown) population c.d.f., and CDF be the
    known input c.d.f. specified under the null hypothesis.
    For TAIL = 'unequal', 'larger', and 'smaller', the test statistics are
    max|S(X) - CDF(X)|, max[S(X) - CDF(X)], and max[CDF(X) - S(X)], respectively.
 
    In the matrix version of CDF, column 1 contains the x-axis data and
    column 2 the corresponding y-axis c.d.f data. Since the K-S test
    statistic will occur at one of the observations in X, the calculation
    is most efficient when CDF is only specified at the observations in X.
    When column 1 of CDF represents x-axis points independent of X, CDF is
    're-sampled' at the observations found in the vector X via
    interpolation. In this case, the interval along the x-axis (the column
    1 spread of CDF) must span the observations in X for successful
    interpolation.
 
    The decision to reject the null hypothesis is based on comparing the
    p-value P with ALPHA, not by comparing the statistic KSSTAT with the
    critical value CV.  CV is computed separately using an approximate
    formula or by interpolation in a table.  The formula and table cover
    the range 0.01<=ALPHA<=0.2 for two-sided tests and 0.005<=ALPHA<=0.1
    for one-sided tests.  CV is returned as NaN if ALPHA is outside this
    range.  Since CV is approximate, a comparison of KSSTAT with CV may
    occasionally lead to a different conclusion than a comparison of P with
    ALPHA.  
 
    See also KSTEST2, LILLIETEST, CDFPLOT.

    Documentation for kstest
       doc kstest
[h,p]=kstest([64	66	80	66	76	55	57	72	76	68	81	70	82	80	71	74	83	80	76	78	72	74	76	65	61	75	68	80	88	73	76	71	70	74	70	76	66	72	80	75	81	82	84	86	71	82	77	78	80	78	88	77	73	72	74	68	75	62	65	71	72	75	72	75	76	73	81	71	61	61	71	81	73	67	77	77	80	57	70	73	80	75	70	75	74	70	68	80	85	81	71	80	80	78	75	75	80	76	82	75	57])
h = logical
   1
p = 3.2646e-90

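So you can see your answer here: a small p-value is evidence against the null hypothesis, which here is that the data come from the standard normal distribution. The smaller the p-value, the less plausible it is that the data come from that distribution.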
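To see how the p-value depends on the null distribution, here is a hedged sketch that runs the test twice on the same data: once against the default N(0,1), and once against a normal distribution whose mean and standard deviation are taken from the sample. It uses the 'CDF' and 'alpha' options described in the help text above. The names x, pd, h0 and p0 are my own, and makedist is assumed to be available from the Statistics and Machine Learning Toolbox:

```matlab
% Data from the question
x = [64 66 80 66 76 55 57 72 76 68 81 70 82 80 71 74 83 80 76 78 ...
     72 74 76 65 61 75 68 80 88 73 76 71 70 74 70 76 66 72 80 75 ...
     81 82 84 86 71 82 77 78 80 78 88 77 73 72 74 68 75 62 65 71 ...
     72 75 72 75 76 73 81 71 61 61 71 81 73 67 77 77 80 57 70 73 ...
     80 75 70 75 74 70 68 80 85 81 71 80 80 78 75 75 80 76 82 75 57];

% Null hypothesis 1: data come from the standard normal N(0,1) (the default)
[h0, p0] = kstest(x);

% Null hypothesis 2: data come from a normal distribution with the
% sample mean and sample standard deviation
pd = makedist('Normal', 'mu', mean(x), 'sigma', std(x));
[h, p] = kstest(x, 'CDF', pd, 'alpha', 0.05);

fprintf('Against N(0,1):        h = %d, p = %.4g\n', h0, p0);
fprintf('Against fitted normal: h = %d, p = %.4g\n', h, p);
```

In both calls, h = 1 means the null hypothesis is rejected at the chosen significance level, and h = 0 means it is not rejected. A result of h = 0 does not prove the data are normal; it only means the test found no significant deviation.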
