Fixing biased random number generation
Hello. I am currently generating a dataset for a machine learning model; however, I am having trouble with one of the variables.
First, I generate 3000 values of a variable H, uniformly distributed between 100 and 1000.
Then I need a variable a that can take any value between 10% and 90% of the corresponding value of H. The problem is that when I look at the histogram of a, it is clearly biased, and I can't figure out why.
How could I generate an unbiased a?
Here is a piece of the code I am currently using:
N = 3000;
H = 100 + (1000 - 100) * rand(N, 1);
a_min = 0.1 * H;
a_max = 0.9 * H;
a = a_min + (a_max - a_min) .* rand(N, 1);
This is the histogram I am getting: [histogram image not included]
Some additional context:
The goal is to develop a machine learning (ML) model that can predict stress concentration factors (SCF) in a sheet with a hole in it. This has already been solved analytically, but I am doing it as an exercise. The SCF depends on H and a, which are geometric parameters: they define the width of the plate and the position of the hole, respectively. That is why a depends on the value of H. The range 0.1*H to 0.9*H is just there to make sure the hole is not located outside the plate and that it has a reasonable size. Of course, there are other geometric constraints and additional steps before getting to the ML part, but I believe this is enough for this post.
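For reference, the shape of that histogram can be derived directly. Assuming, as in the code above, that H ~ U(100, 1000) and, given H, a ~ U(0.1H, 0.9H), the marginal density of a follows from integrating the conditional density 1/(0.8h) against the density of H (1/900) over every h that admits a given a, i.e. max(100, a/0.9) <= h <= min(1000, 10a):

$$
f_a(a) = \int_{\max(100,\,a/0.9)}^{\min(1000,\,10a)} \frac{1}{0.8\,h}\cdot\frac{1}{900}\,dh
       = \frac{1}{720}\,\ln\frac{\min(1000,\,10a)}{\max(100,\,a/0.9)},
\qquad 10 \le a \le 900,
$$

which, written piecewise, is

$$
f_a(a) = \frac{1}{720}
\begin{cases}
\ln(a/10), & 10 \le a \le 90,\\
\ln 9, & 90 \le a \le 100,\\
\ln(900/a), & 100 \le a \le 900.
\end{cases}
$$

The density is zero at a = 10, peaks on the plateau 90 <= a <= 100, and decays slowly back to zero at a = 900. Small plates give narrow, tall conditional densities 1/(0.8h), so small values of a are over-represented compared with a flat histogram; the "bias" is what the two-step sampling implies, not a defect of rand.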
2 Comments
The constraint
% 0.1 H <= a <= 0.9 H
makes the distribution of a conditional on H, so a must be "biased". This is simply an inevitable fact.
It can be shown with the following rejection-sampling code, which starts from a uniform unconditional distribution:
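N = 3000;
H = 100 + (1000 - 100) * rand(N, 1);
clear a
for k = N:-1:1          % loop backwards so a is fully allocated on the first pass
    a(k) = randone(H(k), 1000);
end
histogram(a)

function a = randone(H, maxa)
% Rejection sampling: draw candidates uniformly on [0, maxa] and return
% the first one that falls strictly inside (0.1*H, 0.9*H).
L = 0.1*H;
U = 0.9*H;
while true
    a10 = maxa*rand(1,10);      % 10 candidates per attempt
    k = find(a10 > L & a10 < U, 1, 'first');
    if ~isempty(k)
        a = a10(k);
        return
    end
end
end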
Jose David
4 November 2024
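To check the rejection-sampling argument numerically, here is a minimal sketch (not from the thread) that draws a once with the direct method from the question and once with a vectorized version of the same rejection idea as randone, then prints the fraction of samples per bin next to what a flat histogram on [10, 900] would give. The bin edges and the sample size are arbitrary choices for illustration.

```matlab
% Sketch: compare the direct method from the question with a vectorized
% version of the rejection sampler (same idea as randone above).
N = 1e5;
H = 100 + (1000 - 100) * rand(N, 1);

% Direct method: given H, a is uniform on [0.1*H, 0.9*H].
aDirect = 0.1 * H + 0.8 * H .* rand(N, 1);

% Rejection method: propose a uniformly on [0, 1000], keep proposals
% with 0.1*H < a < 0.9*H, and redraw only the rejected entries.
aReject = nan(N, 1);
todo = true(N, 1);
while any(todo)
    idx = find(todo);
    cand = 1000 * rand(numel(idx), 1);
    ok = cand > 0.1 * H(idx) & cand < 0.9 * H(idx);
    aReject(idx(ok)) = cand(ok);
    todo(idx(ok)) = false;
end

% Fraction of samples per bin for both methods, next to the fraction
% a flat histogram on [10, 900] would give.
edges = [10 100 200 400 900];
for j = 1:numel(edges) - 1
    inBin = @(x) mean(x >= edges(j) & x < edges(j + 1));
    fprintf('[%3d, %3d): direct %.3f  rejection %.3f  flat %.3f\n', ...
        edges(j), edges(j + 1), inBin(aDirect), inBin(aReject), ...
        (edges(j + 1) - edges(j)) / 890);
end
```

Both methods should give nearly the same fractions, with clearly more mass below 400 and much less above it than a flat histogram. For the full shape, histogram(aDirect, 'Normalization', 'pdf') can be used instead of the printed table.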
Accepted Answer
Other Answers (2)
% a_max .* rand(N, 1)
is a product of two independent uniform random variables. Since a_max is not constant, it is NOT a uniform random variable as you expect.
x1 = rand(1,10000);
x2 = rand(1,10000);
x1Xx2 = x1.*x2;
histogram(x1Xx2)
This alone makes your "intuition" fall apart.
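The point can be made quantitative with a standard result: if X1 and X2 are independent and uniform on (0, 1), their product Z has density -ln z on (0, 1), so its CDF is P(Z <= t) = t - t ln t rather than t. The minimal sketch below compares the empirical CDF of x1.*x2 with that formula at a few arbitrarily chosen points t.

```matlab
% Sketch: for Z = X1.*X2 with X1, X2 independent U(0,1),
% P(Z <= t) = t - t*log(t), not t as for a uniform variable.
n = 1e5;
z = rand(n, 1) .* rand(n, 1);
t = [0.1 0.25 0.5 0.75];
empirical = arrayfun(@(s) mean(z <= s), t);
productCdf = t - t .* log(t);
% Rows: t, empirical CDF, product-of-uniforms CDF, uniform CDF (= t).
disp([t; empirical; productCdf; t])
```

At t = 0.5, for example, the product CDF is 0.5 + 0.5 ln 2, about 0.85, instead of 0.5, so the product piles up near zero.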
3 Comments
Perhaps what you want is this
N = 3000;
H = 100 + (1000 - 100) * rand(N, 1);
a_min = 0.1 * min(H);
a_max = 0.9 * max(H);
a = a_min + (a_max - a_min) .* rand(N, 1);
histogram(a)
Jose David
3 November 2024
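The suggestion above draws a uniformly on [0.1*min(H), 0.9*max(H)], independently of each H. A quick sketch (an added check, not part of the thread) counts how many of the resulting (H, a) pairs break the constraint 0.1*H <= a <= 0.9*H from the question.

```matlab
% Same draws as the suggestion above: a no longer depends on each H.
N = 1e5;
H = 100 + (1000 - 100) * rand(N, 1);
a_min = 0.1 * min(H);
a_max = 0.9 * max(H);
a = a_min + (a_max - a_min) .* rand(N, 1);

% Fraction of (H, a) pairs that violate the geometric constraint
% 0.1*H <= a <= 0.9*H from the question.
violates = a < 0.1 * H | a > 0.9 * H;
fprintf('Pairs outside 0.1H <= a <= 0.9H: %.1f%%\n', 100 * mean(violates));
```

Roughly half of the pairs fall outside the constraint (mostly with a > 0.9*H), so whether this is acceptable depends on whether the hole-position constraint is needed for the ML dataset.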
Bruno Luong
3 November 2024
Edited: Bruno Luong
3 November 2024
In what way do you need a to be uniform in your ML problem? Does it really matter? Why?
It looks like what you want is not possible. (What on earth are you really trying to do, as some of us keep asking?)
Maybe what you call "bias" is simply a wrong expectation.
One thing people often want is for a conditional probability to have the same distribution as the unconditional one. This assumption is wrong in general.
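To illustrate that the order of conditioning decides which marginal ends up uniform, here is a hypothetical alternative (an assumption for illustration, not something proposed in the thread): draw a uniformly on [10, 900] first, then draw H uniformly over the plate widths compatible with that a. For a given a, the constraint 0.1*H <= a <= 0.9*H together with 100 <= H <= 1000 leaves H in [max(100, a/0.9), min(1000, 10*a)].

```matlab
% Hypothetical alternative: draw a first, then H given a.
N = 1e5;
a = 10 + (900 - 10) * rand(N, 1);        % a ~ U(10, 900)

% Plate widths compatible with each a:
% 0.1*H <= a <= 0.9*H and 100 <= H <= 1000.
H_lo = max(100, a / 0.9);
H_hi = min(1000, 10 * a);
H = H_lo + (H_hi - H_lo) .* rand(N, 1);

% Now a is uniform and every pair satisfies the constraint,
% but H is no longer uniform on [100, 1000].
fprintf('P(a < 455) = %.3f (uniform: 0.500)\n', mean(a < 455));
fprintf('P(H < 550) = %.3f (uniform: 0.500)\n', mean(H < 550));
```

With this order, P(H < 550) comes out well below 0.5 (about 0.22), because large values of a are only compatible with wide plates. The non-uniformity has moved from a to H rather than disappeared, which is the point about conditional and unconditional distributions.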
John D'Errico
3 November 2024
Edited: John D'Errico
3 November 2024
You should recognize that the result of this operation will not be uniformly distributed. Does that matter? Or is it just a surprise to you because, having started with uniform random variables, you expected the result to be uniform too? Is there a reason why you need the result to be uniform?