how to generate a random vector with fixed sum and bounded elements

53 ビュー (過去 30 日間)
Saifullah Khalid
Saifullah Khalid 2020 年 2 月 18 日
回答済み: Jon 2020 年 3 月 10 日
I want to generate a random N element vector with a fixed sum: and each element is bounded as . is the fixed sum and specify upper bound on the element of the vector . I would appreciate any guidance.

採用された回答

Jon
Jon 2020 年 2 月 18 日
Maybe I am not understanding something, but it seems like your problem is overconstrained.
You can easily make a length n vector whose elements are bounded by d, for example using a uniform distribution
x = rand(N,1).*d % where d is a vector of upper bounds for each element
you can also make a random vector with a specified sum using
x = rand(N,1);
xfixedSum = x/sum(x)*Etotal
But it seems that in general it is not possible to hold both constraints.
  12 件のコメント
Jon
Jon 2020 年 3 月 4 日
編集済み: Jon 2020 年 3 月 4 日
@John Please look at my last algorithm, that I just posted before your comment. I'm sure it could be further optimized but overall I think it is quite efficient, with no rejection and very few operations.
Saifullah Khalid
Saifullah Khalid 2020 年 3 月 5 日
Jon, Thank you very much for all your kind support. I spent some time to understand the solution yesterday, as things were well beyond my mathematical knowhow. Thanks to your tons of comments about every step.Now, I am trying to establish the randomness properties of the generated data and also want to understand the underlying mathn in more formal way. I wish if John or you can kindly comment on these aspects.

サインインしてコメントする。

その他の回答 (2 件)

John D'Errico
John D'Errico 2020 年 3 月 4 日
編集済み: John D'Errico 2020 年 3 月 4 日
If your bounds are the same for all elements, then use randfixedsum. It is amazingly efficient, even for high dimensional problems. The number of dimensions is N in this sort of problem.
For example, to find 3 vectors, of length 10, such that the samples are uniformly distributed over the set of possible solutions, with a sum of 1, lower bound of 0, and upper bound of 1...
X = randfixedsum(10,3,1,0,1)
X =
0.11264 0.16122 0.1933
0.090028 0.13915 0.076298
0.10823 0.02528 0.024352
0.15859 0.31793 0.028457
0.062117 0.09639 0.056485
0.013329 0.051958 0.025118
0.10416 0.058997 0.3023
0.010921 0.021139 0.11247
0.21689 0.0064835 0.040133
0.12309 0.12146 0.14109
sum(X,1)
ans =
1 1 1
As I said, randfixedsum is quite efficient. The time to find 500 vectors, eah of length 1000 is about as good as you can possibly home for.
timeit(@() randfixedsum(1000,500,1,0,1))
ans =
0.062201
So only .06 seconds for the call. Find randfixedsum on the file exchange, here:
A problem arises when your bounds are not the same for all elements of the resulting vectors. Or, if the linear combination chosen is not just a simple sum. In these cases, as long as N is not too large, then use randfixedlinear combination.
For exampel, suppose we want to find vectors of length 5, such that the sum is also 1, but where the elements have an upper bound as the vector [.1 .2 .3 .4 .5]?
X = randFixedLinearCombination(3,1,ones(1,5),0,[.1 .2 .3 .4 .5])
X =
0.034225 0.15026 0.21493 0.27429 0.32629
0.091852 0.049101 0.27878 0.37448 0.20579
0.032732 0.086649 0.23589 0.36987 0.27486
sum(X,2)
ans =
1
1
1
So the first element lives in [0, 0.1], the 5th element lives in [0, 0.5], etc.
randFixedLinearCombination is not as efficient when the dimension N gets large however.
For example, it is extremely good in a low number of dimensions:
timeit(@() randFixedLinearCombination(3,1,ones(1,10),0,1))
ans =
0.00069653
>> timeit(@() randFixedLinearCombination(3,1,ones(1,15),0,1))
ans =
0.0067253
>> timeit(@() randFixedLinearCombination(3,1,ones(1,20),0,1))
ans =
0.39572
But as I start to push things a bit, say 20 dimensions, things get slower.
Find randFixedLinearCombination here:
A virtue of both codes is they assue a result that is uniformly distributed over the domain of interest.
Be very careful however, as many solutions are not so good, especially ones that seem trivial. For example, you might consider the classic rescaling solution, something that someone on the site is always fast to propose.
X = rand(1,3);
X = X./sum(X)
X =
0.55967 0.0077895 0.43254
It seems REALLY easy to do. And I will admit, it is fast. One problem is, the result is a set of vectors that are not uniformly distributed over the domain of all possible solutions. Worse, they may not satisfy the bound constraints as required. So, if we try plotting the result of 10000 such samples, we would see...
X = rand(10000,3);
X = X./sum(X,2);
plot3(X(:,1),X(:,2),X(:,3),'.')
grid on
box on
The result is not uniformly distributed over that set. As you can see, the samples are considerably less dense in some regions of the implied constraint plane.
randfixedsum and randFixedLinearCombination have no such problem however.
Y = randFixedLinearCombination(10000,1,ones(1,3),0,1);
plot3(Y(:,1),Y(:,2),Y(:,3),'.')
grid on
box on
As I said, another problem is rescaling can mess up the bound constraints. That is never a problem with the two tools I have recommended however.
  15 件のコメント
Jon
Jon 2020 年 3 月 9 日
Now that I know some better terms to search under, I found this in the MATLAB file exchange from Tim Benham. Haven't had a chance to try it but perhaps it might be interesting. Looks like it uses the random walk approach (actually has options for various specific methods) https://www.mathworks.com/matlabcentral/fileexchange/34208-uniform-distribution-over-a-convex-polytope
John D'Errico
John D'Errico 2020 年 3 月 10 日
Jon - I'm glad you have found the ideas here to have been stimulating. There are often a variety of ways to solve any problem, with some schemes better for large or small problems. Sometimes that requires an understanding of widely differing areas of mathematics, seeing how you can bring tools from a variety of domains to bear on the problem at hand. And it seems you can almost always find some new way to tweak an algorithm, getting an extra boost.
If you do come up with a nice code, then don't forget to post it on the file exchange. Spend some time if you can to test the properties of that scheme. Is it uniform, or perhaps just very close? Provably so, or is the algorithm a bit too ad hoc to see any direct proof? How fast is it? Does it have perform especially well in a high number of dimensions, or is there some sort of a rough limit on that?

サインインしてコメントする。


Jon
Jon 2020 年 3 月 10 日
Hi John, In case I have a chance to work on this some more, and for future reference, do you have any advice on quantitative tests that can be made to assess to what extent a collection of high dimensional points can be considered to have come from a uniform distribution.
In other words, if a scheme isn't uniform by construction, like yours, is there a way to quantitavely assess how well it is doing in terms of achieving uniform sampling?
I took a quick look, but couldn't find anything that was quickly accessible in terms of an easily implemented formulae, or even better something already in the file exchange. Thanks

カテゴリ

Help Center および File ExchangeLogical についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by