Generate normally distributed sample from data

Andrea C on 8 Dec 2019
Commented: Andrea C on 9 Dec 2019
Hi,
I have an array with many (>800000) rows. From one column, I want to select 51 values to build a new array of 51 normally distributed values. The values range from 0 to 10.
How can I do that?
Thanks,
Andrea

Accepted Answer

Thiago Henrique Gomes Lobato on 8 Dec 2019
Edited: Thiago Henrique Gomes Lobato on 8 Dec 2019
I need to be careful not to start a discussion about how one actually defines a normal distribution, but assuming you don't need the data to be normal in a strict sense, you can use the Anderson-Darling test. The idea is to randomly sample 51 points from your array and then check whether they are normal. To make it more robust, you can simply keep the sample with the highest p-value:
rng(33)                                    % fix the seed so the run is reproducible
ArraySize = 80000;
A = rand(ArraySize,1);                     % uniform data, not normal
A(500:1000) = randn(501,1);                % a block of normal data
Founded = 0;
MaxIter = 1000;
Maxp = 0;
Ite = 1;
while ~Founded && Ite<=MaxIter             % <= so that all MaxIter draws are tried
    SampledIndex = randperm(ArraySize,51); % sample 51 rows from your array
    Asampled = A(SampledIndex);
    [h,p] = adtest(Asampled);              % Anderson-Darling test (Statistics and Machine Learning Toolbox)
    % You can in theory uncomment the next line to stop at the first sample that passes.
    % I however believe that keeping the max p is more robust
    %Founded = ~h; % h is 0 when the null hypothesis (the sample is normal) cannot be rejected
    if p>Maxp % keep the sample with the highest p-value so far
        BestAsoFar = Asampled;
        Maxp = p;
    end
    Ite = Ite+1;
end
histogram(BestAsoFar)
2 Comments
Walter Roberson on 8 Dec 2019
? This looks like it cherry picks samples to find a subset that is approximately normally distributed??
Andrea C on 9 Dec 2019
Great, it works.
This is exactly what I was looking for!
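
The cherry-picking concern raised in the comment above can be checked with a small experiment. The following is only a minimal sketch, not part of the original answer: it runs the same max-p search on data that is assumed to be purely uniform on [0,10] (the stand-in array `A` and the names `maxIter`, `maxP` and `best` are illustrative), so any sample it returns was selected for looking normal rather than drawn from a normal population. It needs `adtest` from the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: repeat the max-p search on data that contains no normal values at all.
rng(33)                              % fixed seed so the run is repeatable
A = 10 * rand(80000, 1);             % assumed stand-in data: uniform on [0,10], not normal
maxIter = 1000;                      % number of random 51-point subsets to try
maxP = 0;
warnState = warning('off', 'all');   % adtest may warn when p is outside its tabulated range
for k = 1:maxIter
    s = A(randperm(numel(A), 51));   % 51 rows drawn without replacement
    [~, p] = adtest(s);              % H0: the sample comes from a normal distribution
    if p > maxP                      % keep the subset with the largest p-value so far
        best = s;
        maxP = p;
    end
end
warning(warnState);                  % restore the previous warning settings
fprintf('Largest p-value over %d draws: %.3f\n', maxIter, maxP);
```

If the largest p-value comes out high here, that would only show that one of the tried subsets failed to look non-normal. It would say nothing about the distribution of the column as a whole.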


More Answers (1)

Walter Roberson on 8 Dec 2019
You can only do that if the column already contains normally distributed samples. In that case you could use randperm() to select the indices to extract.
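
A minimal sketch of that randperm() selection, assuming the column really is normal already. The vector `col` below is made-up stand-in data, not the poster's array, and `randn` is used only to fabricate it for the illustration:

```matlab
% Sketch: draw 51 distinct rows from a column that is assumed to be normal already.
rng(1);                              % fixed seed so the run is repeatable
col = 5 + 1.5 * randn(800000, 1);    % hypothetical stand-in for the real column
idx = randperm(numel(col), 51);      % 51 unique indices, sampled without replacement
sample = col(idx);                   % 51-by-1 vector of the selected values
```

Because randperm(n, k) returns k unique integers from 1 to n, no row is picked twice.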
However, values confined to the range 0 to 10 cannot be exactly normally distributed: a normal distribution has infinite tails in both directions. With a fixed finite range such as 0 to 10, the closest you can get is a Beta distribution.
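
As a rough illustration of the Beta suggestion, the sketch below draws 51 bounded values and fits Beta parameters back to them. The shape parameters `a = 4` and `b = 4` are assumptions chosen only because equal values give a symmetric, bell-like shape. `betarnd` and `betafit` come from the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: a bell-shaped sample that stays inside [0,10], using a rescaled Beta distribution.
rng(2);                              % fixed seed so the run is repeatable
a = 4; b = 4;                        % assumed shape parameters; a == b gives a symmetric shape
x = 10 * betarnd(a, b, 51, 1);       % Beta values on [0,1], rescaled to [0,10]
phat = betafit(x / 10);              % fit back on the [0,1] scale; returns [a_hat b_hat]
histogram(x)                         % visual check of the bounded sample
```

To apply the same idea to real data, the column would first be divided by 10 so that it lies in (0,1), which `betafit` expects.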

Release

R2019a
