How to divide a column in equal density bins.

Tushar Aggarwal on 27 Sep 2015
Commented: Tushar Aggarwal on 27 Sep 2015
I have a column of values, some of which are missing (NaN). I want to implement a function that discretizes them into 10 equal-density bins (not equal-width), so that each bin holds approximately the same number of samples, and that returns the original indices of the values in each bin. Note: NaN values must be ignored. I tried quantile, but the number of values in each bin is different. Any help?

Accepted Answer

Walter Roberson on 27 Sep 2015
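sortvals = sort(YourData(~isnan(YourData)));   % drop NaNs, then sort ascending
binwidth = floor(length(sortvals)/10);         % samples per bin (a count, not a value range)
leftover = length(sortvals) - binwidth*10;     % samples that do not divide evenly
% mat2cell (not cell2mat) splits the column; the row counts must sum to length(sortvals)
bincontents = mat2cell(sortvals(:), [binwidth*ones(1,9), binwidth+leftover], 1);
The extras that do not fit into equal-sized bins are allocated arbitrarily to the last bin.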
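The question also asks for the original index of every value in each bin, which the snippet above does not return. A minimal sketch of one way to get it, assuming the second output of sort is used to carry the original positions along; the sample data, nBins, and the names validIdx, sortedIdx, and binIndices are made up for illustration:

```matlab
% Hypothetical example data; YourData stands in for the asker's column.
YourData = [5; NaN; 3; 8; 1; NaN; 9; 2; 7; 4; 6; 10; 12; 11];
nBins = 3;   % kept small for readability; the question uses 10

validIdx = find(~isnan(YourData));        % original positions of the non-NaN values
[~, order] = sort(YourData(validIdx));    % ascending order of the valid values
sortedIdx = validIdx(order);              % original positions, ordered by value

binwidth = floor(numel(sortedIdx)/nBins);
leftover = numel(sortedIdx) - binwidth*nBins;
counts = [binwidth*ones(1,nBins-1), binwidth+leftover];   % extras go to the last bin

% binIndices{k} holds the original row numbers of the samples in bin k
binIndices = mat2cell(sortedIdx(:), counts, 1);
```

YourData(binIndices{k}) then recovers the values in bin k, and the NaN rows appear in no bin.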
4 Comments
Walter Roberson on 27 Sep 2015
For lack of other instructions, I will distribute them evenly over the interior.
sortvals = sort(YourData(~isnan(YourData)));
binwidth = floor(length(sortvals)/10);
cellwidths = binwidth*ones(1,10);
%distribute leftovers evenly in interior
leftover = length(sortvals) - binwidth*10;
leftovers_at = floor(linspace(0,11,leftover+2)); %not linspace(1,10) !!
leftovers_at = leftovers_at(2:end-1); %trim 0, 11
cellwidths(leftovers_at) = cellwidths(leftovers_at) + 1;
bincontents = mat2cell(sortvals(:), cellwidths, 1); % mat2cell, not cell2mat, does the splitting
Note: this code to distribute over the interior will not necessarily work correctly if the number of bins is not 10. In particular, when there are a lot of leftovers relative to the number of bins, I do not promise that floor() will not create a duplicate. I don't think it would, but I have not proven that it cannot, such as due to round-off error.
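For a bin count other than 10, one possible alternative (not the method above) is to round evenly spaced cut points and take their differences. This is only a sketch under the assumption that the leftovers may land in any bins as long as the sizes stay balanced; n, nBins, and cutPoints are illustrative names, and the data is a stand-in:

```matlab
% Stand-in for the sorted, NaN-free data from the code above.
sortvals = (1:107)';
n = numel(sortvals);
nBins = 10;                       % any positive integer

% Cut points 0, n/nBins, 2n/nBins, ..., n, rounded to whole samples.
cutPoints = round(linspace(0, n, nBins+1));
counts = diff(cutPoints);         % samples per bin; sums to n because the ends are 0 and n

bincontents = mat2cell(sortvals(:), counts, 1);
```

Because the cut points are rounded from an evenly spaced sequence, the bin sizes should differ by at most one, and no index is picked twice since no indexing into a width vector is involved.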
Tushar Aggarwal on 27 Sep 2015
Thanks


More Answers (1)

Image Analyst on 27 Sep 2015
Perhaps compute CDF and scan along putting 10% of the total into each new, variable-width bin:
% Make random data of "density" so I assume it's a histogram.
myHistogram = randi(20, 1, 1234);
% Randomly make some of them nan's
% Not sure how this would happen with a histogram, but whatever....
nanLocations = randi(length(myHistogram), 1, 33);
myHistogram(nanLocations) = NaN;
% Now we can start
% First make NaNs zero.
myHistogram(isnan(myHistogram)) = 0;
% Now compute CDF
myCDF = cumsum(myHistogram);
myCDF = myCDF / myCDF(end);
% plot(myCDF);
% grid on;
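% Find out how many bins to sum together
% so that we get 10 new bins (informational only; not used below).
binsToUse = round(length(myHistogram)/10);
% Rebin into 10 bins
% edges(b) is the last old bin belonging to new bin b-1, so new bin b
% covers old bins edges(b)+1 through edges(b+1).
edges(1) = 0; % Nothing precedes the first new bin.
for b = 1 : 9
% Find the last old bin whose CDF is still below 10%, 20%, ..., 90%
endingBin = find(myCDF < b*0.1, 1, 'last');
edges(b+1) = endingBin;
% Sum those bins to form the new histogram, without re-counting the
% bin that ended the previous group
newHist(b) = sum(myHistogram(edges(b)+1:edges(b+1)));
end
% Finish up with the last bin: everything after the 9th cut point.
newHist(10) = sum(myHistogram(edges(10) + 1:end));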
% Print to command line
edges
newHist
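If the column holds raw samples rather than histogram counts, the same "10% per bin" idea can be applied to the ranks of the samples, which act as an empirical CDF. The following is a rough sketch under that assumption, not part of the answer above; the data and the names ranks, binOfSample, and idxInBin are hypothetical, and tied values are split in whatever order sort returns them:

```matlab
% Hypothetical example data with missing values.
YourData = [5; NaN; 3; 8; 1; NaN; 9; 2; 7; 4; 6; 10; 12; 11];
nBins = 4;   % the question uses 10

valid = ~isnan(YourData);
n = nnz(valid);

% Rank of each valid sample (1 = smallest).
[~, order] = sort(YourData(valid));
ranks = zeros(n, 1);
ranks(order) = (1:n)';

% Scale rank/n (the empirical CDF) to bin numbers 1..nBins; NaN rows stay NaN.
binOfSample = nan(size(YourData));
binOfSample(valid) = ceil(ranks * nBins / n);

% Original row numbers of the samples in each bin.
idxInBin = arrayfun(@(k) find(binOfSample == k), (1:nBins)', 'UniformOutput', false);
```

binOfSample keeps the original row order, so it can be stored next to the data, while idxInBin gives the per-bin index lists the question asks for.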
