## Probability distribution of a multiple variable sum

Asked by Rémy Bretin on 10 May 2019
Answered by Rémy Bretin on 14 May 2019
Accepted answer by Torsten

Hi everyone,
I'm looking for some fairly advanced statistics/probability advice, as I'm a beginner in this field.
I would like to know the probability distribution of a variable TAU_total defined as TAU_total = TAU1 + TAU2 + ... + TAU129.
The variables TAUi are independent of each other.
For each of them I have a sample of 20,000 values; the histograms in the attachment show some examples of their distributions.
My question is the following: how can I determine the probability that TAU_total exceeds a certain value Xmax?
Regards,
Rémy

### John D'Errico (view profile)

on 10 May 2019
Remember that one of the underlying assumptions of the classical CLT is that the variables are i.i.d., that is, independent and identically distributed. The CLT will apply to some extent, since almost everything is asymptotically normal, but that does not mean it will be of value here.
On the other hand, a simulation gives you a very simple way to estimate the desired result, and it will be reasonably accurate, far more so than assuming the CLT does indeed apply. The one thing in the CLT's favor is that there are 129 terms in the sum, which makes it possibly viable. It depends on how far into the tails you need to go.
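To make the CLT option concrete, here is a minimal sketch of the normal approximation for the upper tail. It is not code from the thread: the `samples` matrix is synthetic stand-in data (skewed, with a different scale per column), and the threshold `Xmax` is chosen arbitrarily for illustration. With real data, `samples` would be the 20,000-by-129 matrix of observed TAUi values.

```matlab
% Hypothetical stand-in for the real data: one column of 20,000 values per TAUi.
rng(0);                                        % reproducible illustration
nSamples = 20000;
N = 129;
scale = 0.5 + rand(1, N);                      % assumed: each TAUi has its own scale
samples = -log(rand(nSamples, N)) .* scale;    % skewed (exponential) toy data

% For independent terms, the means add and the variances add.
muTotal    = sum(mean(samples, 1));
sigmaTotal = sqrt(sum(var(samples, 0, 1)));

% Normal (CLT) approximation of the upper tail, using erfc from base MATLAB.
Xmax = muTotal + 2*sigmaTotal;                 % assumed threshold, illustration only
z = (Xmax - muTotal) / sigmaTotal;
pTailCLT = 0.5 * erfc(z / sqrt(2));            % same as 1 - normcdf(z)
fprintf('P(TAU_total > Xmax) is approximately %.4f\n', pTailCLT);
```

The approximation ignores any skewness in the sum, so it is likely to be least reliable far out in the tails, which is the caveat raised in the comment.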

### Walter Roberson (view profile)

on 11 May 2019
I did a quick simulation of the same size as the original question, using randn with a range of standard deviations. The std() of the totals was roughly 10% larger than the sum() of the individual standard deviations divided by sqrt(129). The i.i.d. calculations were thus not exactly applicable, but they were pretty close. A hist() of the totals looked like a textbook illustration of a pure normal distribution until I got up to 56 bins, at which point you could finally start to see statistical differences from a perfect curve.
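The direction of that gap is what the variance rule for independent terms predicts. Writing $\sigma_i$ for the standard deviation of TAUi and $N = 129$ (notation introduced here, not in the thread):

$$
\operatorname{Var}\!\left(\sum_{i=1}^{N} \mathrm{TAU}_i\right) = \sum_{i=1}^{N} \sigma_i^2,
\qquad
\sigma_{\mathrm{total}} = \sqrt{\sum_{i=1}^{N} \sigma_i^2} \;\ge\; \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \sigma_i .
$$

The inequality follows from Cauchy-Schwarz, $\left(\sum_i \sigma_i\right)^2 \le N \sum_i \sigma_i^2$, with equality only when all $\sigma_i$ are equal. So the i.i.d. shortcut on the right-hand side underestimates the spread of the total whenever the standard deviations differ, and the size of the gap depends on how much they differ.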

### John D'Errico (view profile)

on 11 May 2019
Admittedly, when I first saw this question, I read it as the sum of 12 terms, not 129. 129 terms will cause pretty much anything to look as if it is normally distributed. :)
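```matlab
N = 129;
% Random beta shape parameters, each in [0.5, 2.5]
alph = rand(1,N)*2 + .5;
bet = rand(1,N)*2 + .5;
% Exact mean and variance of each beta distribution
betamean = alph./(alph + bet);
betavar = alph.*bet./((alph + bet).^2.*(alph + bet + 1));
% Means and variances add for a sum of independent variables
CLTmean = sum(betamean);
CLTvar = sum(betavar);
CLTstd = sqrt(CLTvar);
% Monte Carlo: one column of draws per term
nsim = 100000;
X = zeros(nsim,N);
for i = 1:N
    X(:,i) = betarnd(alph(i),bet(i),[nsim,1]);
end
MCsum = sum(X,2);
MCmean = mean(MCsum);
MCvar = var(MCsum);
MCstd = std(MCsum);
% Rows: mean, variance, std. Columns: CLT prediction, Monte Carlo estimate
[CLTmean, MCmean;CLTvar,MCvar;CLTstd,MCstd]
```

```
ans =
   61.6267682855143   61.6366984653151
   7.56366115010423   7.53757246937701
   2.75021111009759   2.74546398071018
```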
Comparing the histograms, we see:
```matlab
histogram(MCsum,100,'Normalization','pdf')
hold on
fplot(@(x) normpdf(x,CLTmean,CLTstd),[min(MCsum),max(MCsum)],'r')
```

So I don't see any problem using either approach. With only 12 terms in the sum, I'd probably go with the Monte Carlo.


## 2 Answers

### Torsten (view profile)

on 10 May 2019
Edited by Walter Roberson on 10 May 2019

Take samples from your empirical data and use Monte Carlo simulation to estimate the above probability.
This code should help to take the samples:
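The code referred to in the answer is not included in this copy of the thread. As a stand-in, and not Torsten's original, here is a minimal sketch of the resampling idea: draw one value with replacement from each TAUi sample, add them up, repeat many times, and count how often the total exceeds the threshold. The `samples` matrix is synthetic toy data and `Xmax` is an arbitrary illustrative threshold; both are assumptions.

```matlab
% Hypothetical stand-in for the real data: one column of 20,000 values per TAUi.
rng(1);                                        % reproducible illustration
nSamples = 20000;
N = 129;
scale = 0.5 + rand(1, N);                      % assumed per-variable scale
samples = -log(rand(nSamples, N)) .* scale;    % skewed (exponential) toy data

% Monte Carlo: resample each column independently, with replacement.
nSim = 100000;
total = zeros(nSim, 1);
for i = 1:N
    idx = randi(nSamples, nSim, 1);            % random row indices for TAUi
    total = total + samples(idx, i);           % add the resampled TAUi values
end

% Assumed threshold: two standard deviations above the mean of the total.
Xmax = sum(mean(samples, 1)) + 2*sqrt(sum(var(samples, 0, 1)));

pTailMC = mean(total > Xmax);                  % fraction of totals above Xmax
seMC = sqrt(pTailMC*(1 - pTailMC)/nSim);       % binomial standard error
fprintf('P(TAU_total > Xmax) ~ %.4f (+/- %.4f)\n', pTailMC, seMC);
```

The standard error shrinks only as 1/sqrt(nSim), so very small tail probabilities need a correspondingly large number of simulated totals. Resampling also cannot produce values beyond those already observed in each sample, which limits how far into the tails the estimate can be trusted.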


### John D'Errico (view profile)

on 10 May 2019
Yes.
