Create a Boxplot with two variables; one to separate into bins based on the other
I want to create a boxplot from two variables, where one variable is binned and the other is grouped according to those bins. To clarify, say one variable is time and the other is distance travelled. I want to divide time into several bins, group the distance values by the time bin they fall in, and draw one box per bin. How can I achieve this in MATLAB? I also want the whiskers to be at the 5th and 95th percentiles.
Answers (1)
Walter Roberson
27 Jun 2015
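numbins = 5;
% Equally spaced bin edges spanning the range of the binning variable
binedges = linspace(min(FirstVariable), max(FirstVariable), numbins+1);
% Open up the last bin so that the maximum value falls into bin numbins
binedges(end) = inf;
[~, binnumbers] = histc(FirstVariable, binedges);
% boxplot needs one position per group, so use the left edges of the bins that actually occur
boxplot(SecondVariable, binnumbers, 'positions', binedges(unique(binnumbers)), 'whisker', 0.7193313666);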
The value 0.7193... comes from solving
erf((s + 2*s*w)/sqrt(2)) == 9/10
for w, where s = solve(erf(s/sqrt(2)) == 1/2) = 0.6744897500 is the z score for 50% coverage (in agreement with the first entry in the table at Wikipedia). The 9/10 reflects that you want 5% below the lower whisker and 5% above the upper whisker, leaving 90% within the whiskers.
The s + 2*s*w comes from the whisker formula q3 + w*(q3-q1), where q1 and q3 are the first and third quartiles. For normally distributed data, measured in units of the standard deviation, q3 = s and q1 = -s, so q3 - q1 = 2*s, with s the 0.674... value noted above. The resulting whisker length therefore matches the 5th and 95th percentiles only to the extent that the data in each bin are approximately normal.
Calculating the right whisker length was the hardest part of this, which is why I show the work here; with it you can calculate what whisker length to use if you decide to change your 95% criterion. Others might find it useful as well. And some day I will probably look back at this post to work it out for another question.
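For anyone who wants a different coverage, the calculation above can be sketched numerically with erfinv instead of a symbolic solve. This is only a sketch under the same normality assumption; the variable names (coverage, zq, zw, w) are made up for illustration and are not part of the original answer.

```matlab
% Sketch: whisker multiplier for a chosen central coverage, assuming normal data.
coverage = 0.90;                   % fraction of data inside the whiskers (5% in each tail)
zq = sqrt(2) * erfinv(0.5);        % z score of the third quartile, about 0.6745
zw = sqrt(2) * erfinv(coverage);   % z score at which the upper whisker should end
w  = (zw / zq - 1) / 2;            % rearranged from zq + w*(2*zq) == zw
fprintf('whisker = %.4f\n', w);    % about 0.7193 for 90% coverage
```

Changing coverage to, say, 0.98 would give the multiplier for whiskers near the 1st and 99th percentiles, again only for roughly normal data.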
4 Comments
Walter Roberson
28 Jun 2015
binedges = 40:250:2700;
and carry on with the rest, such as
binedges(end) = inf;
[~, binnumbers] = histc(FirstVariable, binedges);
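% One position per populated bin; assumes no value of FirstVariable is below binedges(1), so no bin number is 0
boxplot(SecondVariable, binnumbers, 'positions', binedges(unique(binnumbers)), 'whisker', 0.7193313666);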
Replacing the last edge with inf is needed because of how histc() treats the final edge: it gets a bin of its own that counts only values exactly equal to that edge. With edges 1 5 9 you get 3 bins: the first counts values from 1 to less than 5, the second counts values from 5 to less than 9, and the third counts only values exactly equal to 9. Changing the edges to 1 5 inf makes the second bin run from 5 upward (excluding infinity itself), so the value 9 lands in the second bin.
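A small check of that edge behaviour, using made-up sample values (x, binsClosed and binsOpen are illustrative names only):

```matlab
x = [1 4 5 8.9 9];                        % made-up sample values
[~, binsClosed] = histc(x, [1 5 9]);      % last edge kept at 9
[~, binsOpen]   = histc(x, [1 5 inf]);    % last edge replaced with inf
disp(binsClosed)                          % bin numbers: 1 1 2 2 3
disp(binsOpen)                            % bin numbers: 1 1 2 2 2
```

With the edge left at 9, the value 9 ends up alone in a third bin; with inf it joins the second bin. In newer MATLAB releases histc is no longer recommended, and discretize, which includes the last edge in the final bin by default, may be a simpler choice.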