Generate normally distributed sample from data

Andrea C on 8 Dec 2019
Commented: Andrea C on 9 Dec 2019
Hi,
I have an array with many (>800000) rows. I want to select 51 values from one column to generate a new array of 51 normally distributed values. The values range from 0 to 10.
How can I do that?
Thanks,
Andrea


Accepted Answer

Thiago Henrique Gomes Lobato on 8 Dec 2019

I need to be careful not to start a discussion about how one actually defines a normal distribution, but assuming you don't need the data to be exactly normal, you can use the Anderson-Darling test. The idea is to randomly sample 51 points from your array and then check whether they look normal. To make it more robust, you can simply keep the sample with the highest p-value:
rng(33)                                     % fix the seed so the result is repeatable
ArraySize = 80000;
A = rand(ArraySize,1);                      % uniform, i.e. not normal
A(500:1000) = randn(501,1);                 % a small block of normal values
Found = false;
MaxIter = 1000;
Maxp = 0;
Ite = 1;
while ~Found && Ite < MaxIter
    SampledIndex = randperm(ArraySize,51);  % sample 51 indices without replacement
    Asampled = A(SampledIndex);
    [h,p] = adtest(Asampled);               % Anderson-Darling test for normality
    % You can in principle uncomment the next line to stop at the first sample
    % that passes; I believe, however, that looking at the max p is more robust
    % Found = ~h;  % h = 0 means the null hypothesis (data is normal) is not rejected
    if p > Maxp                             % keep the sample with the highest p-value so far
        BestAsoFar = Asampled;
        Maxp = p;
    end
    Ite = Ite + 1;
end
histogram(BestAsoFar)

2 Comments

Walter Roberson on 8 Dec 2019
This looks like it cherry-picks samples to find a subset that is approximately normally distributed?
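To illustrate the concern, here is a minimal sketch that runs the same kind of repeated sampling on purely uniform data, with no normal values in it at all. It assumes the Statistics and Machine Learning Toolbox for adtest; the names U, nDraws and p are made up for this illustration and are not from the answer above. adtest may print warnings when a p-value falls outside its tabulated range.

```matlab
rng(1)                              % fixed seed, chosen arbitrarily
U = rand(80000,1);                  % uniform data only, nothing normal in it
nDraws = 1000;                      % number of random 51-point subsets to try
p = zeros(nDraws,1);
for k = 1:nDraws
    idx = randperm(numel(U),51);    % 51 distinct indices
    [~,p(k)] = adtest(U(idx));      % p-value of the Anderson-Darling normality test
end
% Report the best p-value found and how often normality was not rejected at 5%
fprintf('max p = %.3f, fraction with p > 0.05 = %.3f\n', max(p), mean(p > 0.05));
```

If the largest p-value here comes out well above 0.05, that would suggest the "best" subset says more about the search than about the underlying data.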
Andrea C on 9 Dec 2019
Great, it works.
This is exactly what I was looking for!


Walter Roberson on 8 Dec 2019
You can only do that if the column already contains normally distributed samples. In that case you could use randperm() to select the indices to extract.
However, values in the range 0 to 10 are not normally distributed: normally distributed values have infinite tails in both directions. When you have a fixed finite range such as 0 to 10, then the closest you can get is a Beta distribution.
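A minimal sketch of the randperm() selection, assuming the data sits in a column vector. The variable col and its contents are stand-ins invented for this example, not the poster's data:

```matlab
rng(0)                               % fixed seed, chosen arbitrarily
col = 10*rand(800000,1);             % stand-in column with values between 0 and 10
nPick = 51;                          % number of values to extract
idx = randperm(numel(col),nPick);    % 51 distinct row indices, chosen uniformly at random
sample = col(idx);                   % the selected values, 51-by-1
histogram(sample)                    % the shape follows whatever distribution col has
```

A selection made this way keeps the distribution of the column on average, so the 51 values would only look normal if the column itself does.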


R2019a
