The proper way to sample 3 normally or lognormal distributed variables added up to 1

Question

Jesse Chao 2023 年 1 月 2 日

0
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1

コメント済み: Paul 2023 年 1 月 5 日

Hello team,

Is there any way to generate 3 normally or lognormal distributed variables added up to 1?

For example, I have a human tissue with a volume as 1 L. And there are three components vascular space (R_vas), interstitial space (R_int), cellular space (R_cell) to composite this tissue. Thus, in this case, R_vas + R_int + R_cell = 1.

To further consider variation in different human indivisual, I would like to normally or lognormally sample these three parameters (R_vas, R_int and R_cell have a mean 0.1, 0.1, 0.8, and coefficient variation 0.3, 0.3, 0.3, respectively) but ensure that these three parameters add up to 1.

I don't think I could just randomly sample two of the parameters and use 1 to minus the sum of the two randomly smapled parameters to get the third one. In this case, I think the third parameter is not followed the predefined mean and coefficient variation. or is it?

How could I achieve this task?

It will be great to have your help.

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

John D'Errico 2023 年 1 月 2 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1#answer_1139822

編集済み: John D'Errico 2023 年 1 月 2 日

MATLAB Online で開く

Let me answer your second question separately. How would you sample three NORMALLY distributed random variables that sum to 1? This part is easy, since it merely requires you to properly construct the necessary covariance matrix.

First, assume we have three variables, with means and standard deviations:

mu = [.1 .1 .8];
sigma = [.3 .3 .3];

But really, we need a covariance matrix that is not a diagonal one. I'll construct it as if I knew the eigenvalue decomposition.

Q = [null([1 1 1]),[1;1;1]/sqrt(3)]
Q = 3×3
   -0.5774   -0.5774    0.5774
    0.7887   -0.2113    0.5774
   -0.2113    0.7887    0.5774
C = Q*diag(3/2*[.3^2 .3^2 0])*Q'
C = 3×3
    0.0900   -0.0450   -0.0450
   -0.0450    0.0900   -0.0450
   -0.0450   -0.0450    0.0900

As you can see, the VARIANCES (so the sqrt of the diagonal elements) of the covariance matrix are each 0.3. But they are now correlated.

Now we can generate set of numbers that have the desired property, at least over the long term, and within floating point trash. What I did was to carefully construct a singular covariance matrix.

X = mvnrnd(mu,C,10000);

hist(sum(X,2))

mean(X)

ans = 1×3

0.1037 0.1011 0.7952

std(X)

ans = 1×3

0.2987 0.3059 0.2997

format long g

[min(sum(X,2)),max(sum(X,2))]

ans = 1×2

0.9999999675926 1.00000003250349

So to within floating point trash, they sum to 1. At least as close as we can come based on double precision arithmetic in what I did. They have the desired means and variances.

Note that these variables have a problem perhaps in what you are doing, in that some of those elements will often be negative.

hist(min(X,[],2))

That is a likely event when the sum is required to be 1, and you are asking about NORMALly distributed random variables. This is also a reason why your goal MUST fail for the lognormal case.

8 件のコメント
6 件の古いコメントを表示6 件の古いコメントを非表示

Paul 2023 年 1 月 5 日

MATLAB Online で開く

I would attack the problem this way

N = [eye(2);[-1 -1]];

N forms a basis for the null space of [1 1 1]

[1 1 1]*N
ans = 1×2
     0     0

Define C as follows

syms sigma_x sigma_y rho_xy

C = N*[sigma_x^2 rho_xy*sigma_x*sigma_y;rho_xy*sigma_x*sigma_y sigma_y^2]*N.'

C =

We see that we can define the covariance of X and Y (upper 2x2 block), but we get what we get for the variance of Z and its correlation with the other two variables.

Example from @John D'Errico formed by

C = N*[.3^2 -0.5*0.3*.3;-0.5*.3*.3 .3^2]*N'
C = 3×3
    0.0900   -0.0450   -0.0450
   -0.0450    0.0900   -0.0450
   -0.0450   -0.0450    0.0900

But we can pick other parameters. For example

C = N*[0.3^2 0.7*0.3*1.3;0.7*0.3*1.3 1.3^2]*N'

C = 3×3

0.0900 0.2730 -0.3630 0.2730 1.6900 -1.9630 -0.3630 -1.9630 2.3260

mu = [0.2 0.3 0.5]; % must sum to 1

histogram(sum(mvnrnd(mu,C,1e5),2));

Geometrically, the ellipsoid is collapsing down to an ellipse centered on mu and lying in the plane x+y+z=1. There's probably even a further degenerate case where the ellipsoid collapses to a line going through my and lying in that plane.

Bruno Luong 2023 年 1 月 5 日

The problem is - as John pointed out - x, y, z ca get negative values. If that negative quantities has physical interpretation for fractional volume of human tissu then it's OK.

Paul 2023 年 1 月 5 日

As John did, I'm just addressing the question of how to specify mu and C such that mvnrnd(mu,C,n) returns values that sum to unity. Maybe that's not the correct question to ask for what the OP is really interested in.

サインインしてコメントする。

Answer 2

Torsten 2023 年 1 月 2 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1#answer_1139802

You can define a common probability distribution of the three variables on the triangle

x + y + z = 1, x, y, z >= 0

but this cannot be a normal or lognormal distribution for each of the random variables as you suggested.

You can also divide the result of individual sampling by R_vas + R_int + R_cell, but this will change the assumed distributions for R_vas, R_int and R_cell.

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

Jesse Chao 2023 年 1 月 5 日

Hello Torsten,

Thank you for the suggestions and reminders.

Jesse

サインインしてコメントする。

Answer 3

John D'Errico 2023 年 1 月 2 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1#answer_1139807

編集済み: John D'Errico 2023 年 1 月 2 日

Magic?

You have three variables, with means that will at least get you in the right ballpark. The goal however, its to insure the sum is exactly 1. What property does a lognormal variable have? You can generate one by generating a normally distributed random variable, and then exponentiating it. So effectively, the log of a lognormal variate, is Normally distributed.

Now, if your goal was to find three lognormally distributed variables, where the PRODUCT was 1, this problem would be far easier. You find a set of three normal variables where the sum was zero. When you exponentiate, the product is automatically 1. And finding a set of normal variates with a sum of zero is almost trivial. (No reason to get into that here, as it is not pertinent to your problem.)

Anyway, you are correct in that you CANNOT just sample two variates, then choose the third to force that sum to 1. But let me back up. Does this entire question even make mathematical sense? Sadly, no. Remember that a lognormal variable lives on the interval (0,inf). So there is a strong chance that any one of those variables themselves are greater than 1. And when that happens, the sum can NEVER be 1 because the other two terms in the sum can never be negative. So you cannot have a situation where the sum is constrained to be a constant. Sorry.

At best, you could think about a set of TRUNCATED lognormal variables, that sum to 1. And even then, the truncation point would be difficult to quantify.

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

Answer 4

Bruno Luong 2023 年 1 月 2 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1#answer_1139837

編集済み: Bruno Luong 2023 年 1 月 2 日

MATLAB Online で開く

This will generate 3 random variables positives and sum to 1, with mean 8/10, 1/10, 1/10:

nu=[8 1 1];
B=arrayfun(@(n) ones(1,n),nu,'unif', 0);
A=blkdiag(B{:});
m=size(A,2);
p=3; % larger p will give smaller std
% this two lines will generate proper uniform conditioninf distributions,
% row-sum to 1 for all p columns
r=-log(rand(m,p)); 
r = sum(r./sum(r,1),2)/p;
%
r=A*r;
r
r = 3×1
    0.7922
    0.1397
    0.0680

It will NOT be normal, but it kind of "normal" in the sense of central limit theorem, i.e. a limit of sum of a "large" independent random variables.

2 件のコメント
なしを表示なしを非表示

Bruno Luong 2023 年 1 月 3 日

編集済み: Bruno Luong 2023 年 1 月 3 日

MATLAB Online で開く

Here is the histogram of the results using my code. It will not Gaussian curve but skewed bell shape:

When you increase p, the histogram will be narrower but less skewed.

nu=[8 1 1];
B=arrayfun(@(n) ones(1,n),nu,'unif', 0);
A=blkdiag(B{:});
m=size(A,2);
p=3; % larger p will give smaller std
rtab = zeros(size(A,1),1e6);
for k=1:size(rtab,2)
    % this two lines will generate proper uniform conditioninf distributions,
    % row-sum to 1 for all p columns
    r=-log(rand(m,p));
    r = sum(r./sum(r,1),2)/p;
    %
    rtab(:,k)=A*r;
end
figure
subplot(2,2,1)
histogram(rtab(1,:))
subplot(2,2,3)
histogram(rtab(2,:))
subplot(2,2,4)
histogram(rtab(3,:))

Run with p = 20

Jesse Chao 2023 年 1 月 5 日

Hello Bruno,

Thank you for the solutions.

This somehow could solve the problem.

Jesse

サインインしてコメントする。

The proper way to sample 3 normally or lognormal distributed variables added up to 1

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

採用された回答

8 件のコメント
6 件の古いコメントを表示6 件の古いコメントを非表示

その他の回答 (3 件)

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

2 件のコメント
なしを表示なしを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

The proper way to sample 3 normally or lognormal distributed variables added up to 1

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

採用された回答

8 件のコメント 6 件の古いコメントを表示6 件の古いコメントを非表示

その他の回答 (3 件)

1 件のコメント -1 件の古いコメントを表示-1 件の古いコメントを非表示

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

2 件のコメント なしを表示なしを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

8 件のコメント
6 件の古いコメントを表示6 件の古いコメントを非表示

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

2 件のコメント
なしを表示なしを非表示