# how to generate a random vector with fixed sum and bounded elements

36 ビュー (過去 30 日間)
Saifullah Khalid 2020 年 2 月 18 日

I want to generate a random N element vector with a fixed sum: and each element is bounded as . is the fixed sum and specify upper bound on the element of the vector . I would appreciate any guidance.

#### 0 件のコメント

サインイン to comment.

### 採用された回答

Jon 2020 年 2 月 18 日
Maybe I am not understanding something, but it seems like your problem is overconstrained.
You can easily make a length n vector whose elements are bounded by d, for example using a uniform distribution
x = rand(N,1).*d % where d is a vector of upper bounds for each element
you can also make a random vector with a specified sum using
x = rand(N,1);
xfixedSum = x/sum(x)*Etotal
But it seems that in general it is not possible to hold both constraints.

#### 12 件のコメント

John D'Errico 2020 年 3 月 4 日
I guess I was too late to see the question posed, but yes, it is quite possible to solve the problem where bounds are applied, and you have a sum constraint. As well, the problem is quite efficiently solvable, using the correct algorithms.
Jon 2020 年 3 月 4 日
@John Please look at my last algorithm, that I just posted before your comment. I'm sure it could be further optimized but overall I think it is quite efficient, with no rejection and very few operations.
Saifullah Khalid 2020 年 3 月 5 日
Jon, Thank you very much for all your kind support. I spent some time to understand the solution yesterday, as things were well beyond my mathematical knowhow. Thanks to your tons of comments about every step.Now, I am trying to establish the randomness properties of the generated data and also want to understand the underlying mathn in more formal way. I wish if John or you can kindly comment on these aspects.

サインイン to comment.

### その他の回答 (2 件)

John D'Errico 2020 年 3 月 4 日

If your bounds are the same for all elements, then use randfixedsum. It is amazingly efficient, even for high dimensional problems. The number of dimensions is N in this sort of problem.
For example, to find 3 vectors, of length 10, such that the samples are uniformly distributed over the set of possible solutions, with a sum of 1, lower bound of 0, and upper bound of 1...
X = randfixedsum(10,3,1,0,1)
X =
0.11264 0.16122 0.1933
0.090028 0.13915 0.076298
0.10823 0.02528 0.024352
0.15859 0.31793 0.028457
0.062117 0.09639 0.056485
0.013329 0.051958 0.025118
0.10416 0.058997 0.3023
0.010921 0.021139 0.11247
0.21689 0.0064835 0.040133
0.12309 0.12146 0.14109
sum(X,1)
ans =
1 1 1
As I said, randfixedsum is quite efficient. The time to find 500 vectors, eah of length 1000 is about as good as you can possibly home for.
timeit(@() randfixedsum(1000,500,1,0,1))
ans =
0.062201
So only .06 seconds for the call. Find randfixedsum on the file exchange, here:
A problem arises when your bounds are not the same for all elements of the resulting vectors. Or, if the linear combination chosen is not just a simple sum. In these cases, as long as N is not too large, then use randfixedlinear combination.
For exampel, suppose we want to find vectors of length 5, such that the sum is also 1, but where the elements have an upper bound as the vector [.1 .2 .3 .4 .5]?
X = randFixedLinearCombination(3,1,ones(1,5),0,[.1 .2 .3 .4 .5])
X =
0.034225 0.15026 0.21493 0.27429 0.32629
0.091852 0.049101 0.27878 0.37448 0.20579
0.032732 0.086649 0.23589 0.36987 0.27486
sum(X,2)
ans =
1
1
1
So the first element lives in [0, 0.1], the 5th element lives in [0, 0.5], etc.
randFixedLinearCombination is not as efficient when the dimension N gets large however.
For example, it is extremely good in a low number of dimensions:
timeit(@() randFixedLinearCombination(3,1,ones(1,10),0,1))
ans =
0.00069653
>> timeit(@() randFixedLinearCombination(3,1,ones(1,15),0,1))
ans =
0.0067253
>> timeit(@() randFixedLinearCombination(3,1,ones(1,20),0,1))
ans =
0.39572
But as I start to push things a bit, say 20 dimensions, things get slower.
Find randFixedLinearCombination here:
A virtue of both codes is they assue a result that is uniformly distributed over the domain of interest.
Be very careful however, as many solutions are not so good, especially ones that seem trivial. For example, you might consider the classic rescaling solution, something that someone on the site is always fast to propose.
X = rand(1,3);
X = X./sum(X)
X =
0.55967 0.0077895 0.43254
It seems REALLY easy to do. And I will admit, it is fast. One problem is, the result is a set of vectors that are not uniformly distributed over the domain of all possible solutions. Worse, they may not satisfy the bound constraints as required. So, if we try plotting the result of 10000 such samples, we would see...
X = rand(10000,3);
X = X./sum(X,2);
plot3(X(:,1),X(:,2),X(:,3),'.')
grid on
box on
The result is not uniformly distributed over that set. As you can see, the samples are considerably less dense in some regions of the implied constraint plane.
randfixedsum and randFixedLinearCombination have no such problem however.
Y = randFixedLinearCombination(10000,1,ones(1,3),0,1);
plot3(Y(:,1),Y(:,2),Y(:,3),'.')
grid on
box on
As I said, another problem is rescaling can mess up the bound constraints. That is never a problem with the two tools I have recommended however.

#### 15 件のコメント

Jon 2020 年 3 月 9 日
Hi John, Thank so much for your really indepth, and beautifully illustrated explanation of the underlying geometry of the problem. For some reason I have become really captivated by this problem, although I don't have any direct application for it. It was really great to find that this morning, as the day before I had spent a somewhat distracted afternoon hiking with my dogs trying to visuallize all of this in higher dimensions. So your discussion was particulary appreciated. Actually up until thinking about all of this, I had always just thought of hyperplanes, as a natural generatlization of planes, and imagined them as infinitetly thin sheets (like a plane in 3d or a line in 2d), but now am realizing that they are in general, in some sense quite "fat", being only one dimension less than the one they are embedded in. I know that I am going off on a tangent here but, it makes me wonder what is really the defining nature of a plane, maybe that it has a direction that is perpindicual to it that is not in the set, or that it divides the overall space into two halves one that is on one "side" of the plane and one that is on the other side of the plane.
I also just looked briefly at the following relevant link:https://mathoverflow.net/questions/76255/random-sampling-a-linearly-constrained-region-in-n-dimensions, and in case you were interested ran down the references that it linked to, some of which were broken and attached them here. The concept of all of those is sampling the bounded region through random walks, which seems quite interesting. I will also say that although I understand the very big picture concept, the details of those papers are well beyond what I could follow.
I also attached an interesting one on the hyperbox- hyperplane intersection problem.
I wanted to let you know too, that in my suggested ideas for solving this problem I wasn't in any way suggesting that your approach wouldn't be the way to go as long as the dimensionality was not limiting. In generating some ideas, I very optimistically thought I could come up with a nice clean way to do this that would not suffer too much from the "curse of dimensionallity". I am now quite humbled, by the subtleties of the problem.
Thanks again for all of the time you have put into, this. As I said the topic somehow really grabbed me and your explanations have been really enlightening.
Jon 2020 年 3 月 9 日
Now that I know some better terms to search under, I found this in the MATLAB file exchange from Tim Benham. Haven't had a chance to try it but perhaps it might be interesting. Looks like it uses the random walk approach (actually has options for various specific methods) https://www.mathworks.com/matlabcentral/fileexchange/34208-uniform-distribution-over-a-convex-polytope
John D'Errico 2020 年 3 月 10 日
Jon - I'm glad you have found the ideas here to have been stimulating. There are often a variety of ways to solve any problem, with some schemes better for large or small problems. Sometimes that requires an understanding of widely differing areas of mathematics, seeing how you can bring tools from a variety of domains to bear on the problem at hand. And it seems you can almost always find some new way to tweak an algorithm, getting an extra boost.
If you do come up with a nice code, then don't forget to post it on the file exchange. Spend some time if you can to test the properties of that scheme. Is it uniform, or perhaps just very close? Provably so, or is the algorithm a bit too ad hoc to see any direct proof? How fast is it? Does it have perform especially well in a high number of dimensions, or is there some sort of a rough limit on that?

サインイン to comment.

Jon 2020 年 3 月 10 日
Hi John, In case I have a chance to work on this some more, and for future reference, do you have any advice on quantitative tests that can be made to assess to what extent a collection of high dimensional points can be considered to have come from a uniform distribution.
In other words, if a scheme isn't uniform by construction, like yours, is there a way to quantitavely assess how well it is doing in terms of achieving uniform sampling?
I took a quick look, but couldn't find anything that was quickly accessible in terms of an easily implemented formulae, or even better something already in the file exchange. Thanks

#### 0 件のコメント

サインイン to comment.

サインイン してこの質問に回答します。

Translated by