Create a Boxplot with two variables; one to separate into bins based on the other

Sayantan Sahu on 27 Jun 2015
Commented: Walter Roberson on 28 Jun 2015
I want to create a boxplot from two variables, where one variable is binned based on the other. To clarify, say one variable is time and the other is distance travelled. I want to divide the time into several bins, group the distance values according to the time bin they fall in, and generate a boxplot of distance for each bin. How can I achieve this in MATLAB? I also want the whiskers to be at the 5th and 95th percentiles.

Answers (1)

Walter Roberson on 27 Jun 2015
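numbins = 5;
binedges = linspace(min(FirstVariable), max(FirstVariable), numbins+1);
binedges(end) = inf;   % so that the maximum value lands in the last regular bin
[~, binnumbers] = histc(FirstVariable, binedges);
% 'positions' needs one entry per plotted group, so use the left edge of each populated bin
boxpositions = binedges(unique(binnumbers));
boxplot(SecondVariable, binnumbers, 'positions', boxpositions, 'whisker', 0.7193313666);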
The whisker value of about 0.7193 comes from solving
erf((s + 2*s*w)/sqrt(2)) == 9/10
for w, where s = 0.6744897500 is the solution of erf(s/sqrt(2)) == 1/2, that is, the z score for 50% coverage (in agreement with the first entry in the table at Wikipedia). The 9/10 reflects that you want 5% left below and 5% left above the whiskers, leaving 90% within the whiskers.
The s + 2*s*w term comes from the whisker formula q3 + w*(q3-q1), where q1 and q3 are the first and third quartiles. For normally distributed data, q1 = -s*sigma and q3 = +s*sigma, so q3 - q1 = 2*s*sigma and the upper whisker limit sits at (s + 2*s*w)*sigma. Because the derivation assumes normality, the whiskers match the 5th and 95th percentiles only approximately for other data; boxplot also draws each whisker to the most extreme data point inside the limit, not to the limit itself.
Calculating the right whisker length was the hardest part of this, which is why I show the work here; using it, you can calculate what whisker length to use if you decide to change your 95% criterion. Others might find it useful as well. And some day I will probably look back at this post to work it out for another question.
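The calculation above can be reproduced numerically. The following is a minimal sketch using only base MATLAB (erfinv); the variable names coverage, s, z, and w are illustrative and do not appear in the original answer. It assumes the data in each bin are roughly normally distributed, as the derivation does.

```matlab
% Whisker multiplier w such that q3 + w*(q3 - q1) sits at the upper
% edge of the central "coverage" fraction of a normal distribution.
coverage = 0.90;                  % 5% below and 5% above the whiskers
s = sqrt(2) * erfinv(0.5);        % z score of the third quartile, about 0.6745
z = sqrt(2) * erfinv(coverage);   % z score bounding the central 90%, about 1.6449
w = (z - s) / (2 * s);            % solves s + 2*s*w == z, giving about 0.7193
fprintf('whisker = %.4f\n', w);
```

Changing coverage to, say, 0.98 would give the multiplier for whiskers near the 1st and 99th percentiles under the same normality assumption.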
4 Comments
Sayantan Sahu on 28 Jun 2015
The bins should all be of uniform width. What if I want the bins to run from 40 to 2700 in steps of 250?
Walter Roberson on 28 Jun 2015
binedges = 40:250:2700;
and carry on with the rest, such as
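binedges(end) = inf;   % the last edge (2540) becomes inf, so the final bin is 2290 and up
[~, binnumbers] = histc(FirstVariable, binedges);
keep = binnumbers > 0;   % histc returns 0 for values below the first edge (40)
boxplot(SecondVariable(keep), binnumbers(keep), 'positions', binedges(unique(binnumbers(keep))), 'whisker', 0.7193313666);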
Replacing the last edge with inf is needed because, for histc(), the final bin counts only values that are exactly equal to the last edge. If you had edges at 1 5 9, that would give 3 bins: the first would count values from 1 to less than 5, the second would count values from 5 to less than 9, and the third would count values that are exactly 9. Replacing the final edge with inf to make 1 5 inf causes the second bin to run from 5 upward (excluding infinity itself), so the value 9 is included in the second bin.
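The edge behaviour described above can be checked on a small vector. This is a minimal sketch; the sample data x and the names binsFinite and binsOpen are made up for illustration.

```matlab
x = [1 4 5 8 9];

% With a finite last edge, the value 9 falls into its own "exactly 9" bin.
[~, binsFinite] = histc(x, [1 5 9]);     % bin numbers: 1 1 2 2 3

% With inf as the last edge, 9 joins the bin that starts at 5.
[~, binsOpen] = histc(x, [1 5 inf]);     % bin numbers: 1 1 2 2 2

disp(binsFinite);
disp(binsOpen);
```

In releases that provide discretize (R2015a and later), the last bin includes its right edge, so discretize(x, [1 5 9]) should place 9 in the second bin without the inf substitution.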
