Generating random numbers from 0 - 1 with limit on the sum

Question

Alexandra 2016 年 7 月 14 日

0
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/295688-generating-random-numbers-from-0-1-with-limit-on-the-sum

コメント済み: Alexandra 2016 年 7 月 18 日

Hi,

I am trying to generate values from 0 to 1 for several variables (x1 x2 x3).

These results must assure that x1*a1 + x2*a2 + x3*a3 <= D. a1 a2 a3 and D are other variables I define. How can I do this?

Thank you very much,

2 件のコメント
なしを表示なしを非表示

Adam 2016 年 7 月 15 日

How random do you want the values to be? e.g. there are ways to easily ensure that the variables will obey the constraint, but in a ways that bias the variables in some way or another rather than allowing as much free choice as might be required.

e.g. obviously every x1, x2, x3 will be no larger than 1 so if you divide each random number by (a1 + a2 + a3) you will guarantee that your condition is met, but not necessarily in a satisfactory manner because it is simply using basic upper bound maths.

Alexandra 2016 年 7 月 15 日

The numbers can't be biased. The x variables must be uniform.

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

John D'Errico 2016 年 7 月 15 日

4
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/295688-generating-random-numbers-from-0-1-with-limit-on-the-sum#answer_228760

編集済み: John D'Errico 2016 年 7 月 15 日

MATLAB Online で開く

Ok, lets talk about the problem as defined. It is possible to solve the problem, but things are not always simple in high dimensions. In a high number of dimensions, the solution may be highly computationally intensive.

The set of numbers typically generated by rand are uniform, in 3 dimensions or in n dimensions. In n-dimensions, we can think of the points as living in a hypercube, an n-cube.

But if you require that the weighted sum

x1*a1 + x2*a2 + x3*a3 <= D

be limited to some constant D, then you actually want a UNIFORM random sample that lives in a truncated cube, but with one part of the cube sliced away, as if eaten by rats. :)

So, consider the problem in just 3 dimensions. I'll pick some arbitrary values for a1, a2, a3, D to see what happens here. For example, suppose that we chose the a's (and D) as:

a = [2;3;4];
D = 4;

I'm using a vector here, because numbered variables are just SO bad a programming solution. I won't do it. LEARN TO USE VECTORS. Were this a 10-dimensional problem, this solution would be far more readable and writable. In fact, as you see below, to change the solution to 10 dimensions from 3 dimensions, the change was trivial.

I've also used a column vector for a. So if X is a vector, of length 3 that comes from rand:

X = rand(1,3);

Well, actually, lets generate a large sample, say 1000 such samples, in 3 dimensions.

N =1000;
X = rand(N,3);

So now each row of X may be interpreted as one point in the unit 3-cube. We could plot them in 3-d, but many of them are outside of the linear constraint plane. In fact, only 353 of the 1000 samples we generated were within the constraint.

sum(X*a < D)
ans =
   353

Time for a picture.

ind = X*a < D;
plot3(X(ind,1),X(ind,2),X(ind,3),'ro')
hold on
plot3(X(~ind,1),X(~ind,2),X(~ind,3),'g+')
box on
view(42.5,19.6)

Ok, so it looks like roughly 35% of the 1000 points lie underneath the constraint plane. In fact, if D were a little smaller (in this case with these values for a, no larger then 2) then the region of interest would be described by a tetrahedron. But I chose D=4 above. In fact, it is possible to create the region of interest as a solid in 3 dimensions.

So it is a somewhat complex solid. Not just a simple tetrahedron. In fact, the code that I used to generate the picture created 14 different tetrahedrae in the simplicial complex that describes the volume. You can see the edges of them in that plot, where they touch the surface. Ok, better, using transparency...

So a bit messy.

Regardless ... the simple solution to this problem, to generate perhaps 1000 points that lie in the required region, is to just generate too many random samples, then throw away those that lie beyond the constraint.

So, lets suppose that we desire to sample N points, randomly and uniformly from the above volume, so that no point exceeds the required limit D. Simple (sort of) is a rejection sampling scheme:

a = [2;3;4];
D = 4;
N = 1000;
Xfinal = zeros(N,3);
Nfound = 0;
Nneed = N - Nfound;
while Nfound < N
  X = rand(N,3);
  % rejection step
  ind = find(X*a <= D);
  if ~isempty(ind)
    if numel(ind) > Nneed
      ind = ind(1:Nneed);
    end
    Xfinal(Nfound + (1:numel(ind)),:) = X(ind,:);
    Nfound = Nfound + numel(ind);
    Nneed = N - Nfound;
  end
end

Here, I've plotted the final set of 1000 points, with the required volume overlaid in transparent form to see that indeed, they do satisfy the constraint.

The nice thing is, the above solution is rather easily converted to work in a higher number of dimensions. The downside is that in a higher number of dimensions, the rejection step may be come slow. The curse of dimensionality nails you here. There are other, more sophisticated solutions that can be employed, but even there, they get nasty in a high number of dimensions.

5 件のコメント
3 件の古いコメントを表示3 件の古いコメントを非表示

John D'Errico 2016 年 7 月 15 日

編集済み: John D'Errico 2016 年 7 月 15 日

MATLAB Online で開く

From your comment, I see that the real requirement is N==10. I cannot plot a result in 10 dimensions. Unless your hyper-dimensional monitor is working. Mine is always broken, and those hyper-dimensional parts to repair it just never seem to arrive from the shipper on Alpha Centauri.

So, lets pick some arbitrary constraint plane, in 10 dimensions.

a = rand(10,1)*2;
D = 5;

I've chosen D to be reasonable.

Fix the code I wrote above by simply changing 3 to 10 everywhere.

Ndims = 10;
N = 1000;
Xfinal = zeros(N,Ndims);
Nfound = 0;
Nneed = N - Nfound;
while Nfound < N
  X = rand(N,Ndims);
  % rejection step
  ind = find(X*a <= D);
  if ~isempty(ind)
    if numel(ind) > Nneed
      ind = ind(1:Nneed);
    end
    Xfinal(Nfound + (1:numel(ind)),:) = X(ind,:);
    Nfound = Nfound + numel(ind);
    Nneed = N - Nfound;
  end
end

I can't plot the result. Sorry. :)

sum(Xfinal*a < D)
ans =
        1000

Here are the first few results from that sample:

Xfinal(1:3,:)
ans =
      0.12019      0.23908      0.72263      0.53145   0.00037582      0.26143      0.69143      0.23628      0.65201       0.6688
      0.48987      0.86883      0.18079       0.5219    0.0095166     0.044184      0.40472      0.32727      0.30587      0.52428
       0.5297      0.62398      0.19386      0.87128        0.753      0.34415      0.14606      0.51514      0.29419      0.22033

Each row of Xfinal is one point, in the 10 dimensional space. All obey the required linear constraint. And we know the sampling is uniform, because they came from rand, and we never transformed the set, only threw those away that failed the linear constraint. So they are uniform within constraint space.

Note that for SOME values of D, the above sampling scheme might be slower than others, since many points will be rejected. So, if I had chosen D as 0.01, the sampling time would have been seriously long.

Ok, in my need for perfection, I should point out there does exist a rather simple solution that would work efficiently for tiny values of D, as long as the result lies inside a single simplex in 10 dimensions, to generate uniformly sampled points inside a single 10-simplex. I won't show that unless someone really wishes to see the answer though, since it only applies for small values of D here.

John D'Errico 2016 年 7 月 15 日

編集済み: John D'Errico 2016 年 7 月 15 日

MATLAB Online で開く

I think you misunderstand what it means to be "uniform".

Lets look at a similar problem in two dimensions. So much easier to plot.

I'll generate many points that are UNIFORM in the (x,y) plane originally from the unit square [0,1]X[0,1]. I'll require they also satisfy the constraint

x + y <= 1

There are many ways I can do this, but the simple rejection scheme I offered before is entirely adequate.

xy = rand(1e4,2);
xy(sum(xy,2) > 1,:) = [];
plot(xy(:,1),xy(:,2),'k.')
axis equal
axis square
grid on

So I expect to find roughly 5000 points in the random sample, with a rejection rate of 50%. After plotting them, I think we can be quite comfortable in the uniformity. (I chose just a large enough sample size that the points are clearly uniform in density, but not to many to overwhelm the pixel resolution. Had I chosen sufficiently large sample size, all of those holes would be filled in quite neatly.)

So, IF you agree that this sampling is uniform, then lets go to the next step. Lets plot a histogram of xy*[1;1]=sum(xy,2). Essentially, how many points are near zero in terms of the sum, and how many are close to D? Surely that histogram must be pretty constant, IF they are uniform in the way you are asking? NOT. This is not true at all.

hist(sum(xy,2),100)

The reason is, there are VERY few sample points from a unit square that have a sum of some number very near zero. Both x AND y must be VERY near zero for that to happen. At the same time, there are many points in the unit square that sum to a number near 1, many ways for that to happen.

So you need to be careful what you compute on a uniform sample. The result need not be uniform at all. This is now starting to verge into the domain of measure theory in statistics.

Alexandra 2016 年 7 月 18 日

Hi John, That makes sense, thanks. It was what I meant by the data in "a" generating those results, because of the combinations. Just wanted to make sure I'm seing everything. Statistics is not the hurdle =)

サインインしてコメントする。

Answer 2

Azzi Abdelmalek 2016 年 7 月 14 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/295688-generating-random-numbers-from-0-1-with-limit-on-the-sum#answer_228671

編集済み: Azzi Abdelmalek 2016 年 7 月 14 日

MATLAB Online で開く

a1=2
a2=3
a3=4
D=5
v=rand(1,2)
x1=v(1)
x2=v(2)
x3=rand*(D-a1*x1-a2*x2)/a3

Check

x1*a1 + x2*a2 + x3*a3<D

5 件のコメント
3 件の古いコメントを表示3 件の古いコメントを非表示

John D'Errico 2016 年 7 月 15 日

編集済み: John D'Errico 2016 年 7 月 15 日

ARGH. This answer is bad for several reasons.

It does not insure the sum is less than 30. It ensures that the sum IS 30.

The question was to have a solution such that the sum is NO GREATER THAN D.

As bad is the non-uniformity issue. Even if this solution were only applied to the set of points that fell above the limit, it would then have too many points packed on the hyper-plane at the limit.

Finally, even if the question was about forcing the sum to be exactly D, this quasi-solution would STILL be a bad idea, because it does not even result in a uniform sampling on the indicated hyper-plane.

So bad for three reasons. I'll post an answer that goes into some depth.

Alexandra 2016 年 7 月 15 日