Why are p-values uniformly distributed when the null hypothesis is true?
I am struggling to understand why p-values are uniformly distributed when the null hypothesis is true. To me, it sounds very counterintuitive (although I know it is true) that p-values have a uniform distribution, rather than a distribution with more of its values toward the end near 1. I would really appreciate it if someone could explain the reason in plain language.
Jeff Miller, 2 Aug 2018
Edited: Jeff Miller, 2 Aug 2018
This is not really a MATLAB question, but OK, I’ll have a go.
First a technical detail: p values are only uniformly distributed when the test statistic has a continuous distribution. If the test statistic is discrete (e.g., the number of heads in 100 coin tosses, or any transformation of that number), then the distribution of p values is also discrete, hence non-uniform.
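To make the discrete case concrete, here is a small Python sketch (not MATLAB, and not part of the original answer) for the coin-toss example. It assumes a one-sided test of a fair coin with 100 tosses, where the p-value for k heads is P(X >= k); the function name and the 0.05 cutoff are illustrative choices only.

```python
from fractions import Fraction
from math import comb

def upper_tail_p_values(n):
    """Exact p-values P(X >= k), k = 0..n, for X ~ Binomial(n, 1/2)."""
    total = 2 ** n
    # Probability of exactly k heads under the null hypothesis (fair coin)
    probs = [Fraction(comb(n, k), total) for k in range(n + 1)]
    p_values = [Fraction(0)] * (n + 1)
    tail = Fraction(0)
    for k in range(n, -1, -1):  # accumulate the upper tail from k = n downward
        tail += probs[k]
        p_values[k] = tail
    return p_values, probs

p_values, probs = upper_tail_p_values(100)

# The p-value can only take as many values as there are possible head counts
distinct = len(set(p_values))

# Probability, under H0, of observing a p-value of at most 0.05
alpha = Fraction(1, 20)
prob_reject = sum(pr for p, pr in zip(p_values, probs) if p <= alpha)

print(distinct)
print(float(prob_reject))
```

For a truly uniform p-value, the second number would be exactly 0.05. Here it comes out below 0.05, because the p-value jumps over 0.05 from one head count to the next rather than passing through it.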
Next: When you are sampling from any continuous distribution at all, you are equally likely to sample a score at any percentile of that distribution. In other words, even though the scores themselves are not uniformly distributed, the percentiles of the scores are uniformly distributed. To visualize this, imagine that you arrange a population in a long line by height, from shortest to tallest. If you randomly select one individual, he or she is equally likely to be anywhere in the line, that is, equally likely to be at any percentile of the distribution. Note that this would be true regardless of the distribution of heights; for example, the population might include both humans and giraffes so the height distribution would be bimodal, but you could still line them all up from shortest to tallest. (Remember, continuous distribution, so no exact ties.)
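The line-up argument can be checked with a quick simulation. The following is only a Python sketch under made-up assumptions: the mixture weights, means, and spreads for "humans" and "giraffes" are invented for illustration, and a large simulated population stands in for the true distribution.

```python
import random
from bisect import bisect_left

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw_height():
    """One height (in metres) from a hypothetical bimodal mixture."""
    if rng.random() < 0.7:
        return rng.gauss(1.7, 0.1)   # "humans" (assumed parameters)
    return rng.gauss(5.0, 0.5)       # "giraffes" (assumed parameters)

# The long line: a large reference population sorted from shortest to tallest
population = sorted(draw_height() for _ in range(100_000))

def percentile(height):
    """Fraction of the reference population shorter than this height."""
    return bisect_left(population, height) / len(population)

# Sample new individuals and record where each one would stand in the line
draws = [percentile(draw_height()) for _ in range(20_000)]

# Count how many land in each tenth of the line
bins = [0] * 10
for u in draws:
    bins[min(int(u * 10), 9)] += 1
fractions = [b / len(draws) for b in bins]

print(fractions)
```

Each of the ten fractions should be close to 0.1, even though the heights themselves pile up in two separate clumps.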
Getting back to hypothesis testing, the null hypothesis allows you to compute the exact distribution of test statistic values that you should get if it is true. When you analyze some data, your computed test statistic is just a random sample from that distribution if H0 is true. In one-tailed testing, p is (by definition) the percentile of your computed test statistic, or one minus that percentile, depending on which tail you are testing. Since the test statistic is equally likely to lie at any percentile of its distribution, by the argument of the previous paragraph, p has to be uniform. The idea is the same with two-tailed testing, though it’s a little harder to visualize.
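If you want to see this numerically, the sketch below simulates many experiments in which H0 is true and looks at the resulting p-values. It is only an illustration in Python, not MATLAB, and it assumes a z-test of a mean with known standard deviation; the sample size, number of experiments, and function names are arbitrary choices.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def z_statistic(sample, mu0=0.0, sigma=1.0):
    """z statistic for testing mean == mu0 with known sigma."""
    n = len(sample)
    return (sum(sample) / n - mu0) / (sigma / math.sqrt(n))

def one_tailed_p(z):
    """Upper-tail p-value: 1 - Phi(z), i.e. one minus the percentile of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tailed_p(z):
    """Two-tailed p-value: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

one_tailed, two_tailed = [], []
for _ in range(20_000):
    # Data generated with H0 true: the mean really is 0 and sigma really is 1
    sample = [rng.gauss(0.0, 1.0) for _ in range(20)]
    z = z_statistic(sample)
    one_tailed.append(one_tailed_p(z))
    two_tailed.append(two_tailed_p(z))

def fraction_at_most(p_values, alpha):
    """Share of simulated p-values that are <= alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)

alphas = [0.05, 0.25, 0.5, 0.9]
for alpha in alphas:
    print(alpha, fraction_at_most(one_tailed, alpha), fraction_at_most(two_tailed, alpha))
```

For a uniform distribution, the fraction of p-values at or below any cutoff equals the cutoff itself, so each printed fraction should sit close to its alpha, for the one-tailed and the two-tailed p-values alike.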