## How to calculate the mean of values based on bins created from corresponding values?

### Palash Dhande

Asked on 13 Nov 2019 at 9:46

Latest comment on 13 Nov 2019 at 20:14

An answer has been accepted

I have two column vectors, let's call them A and B, and I have created ordered pairs from the values in these two vectors.
I would like to make bins from the A values.
Then I would like to calculate the mean, max, and standard deviation of the corresponding B values in the bins created from the A values.
I have tried using histcounts, splitapply, and accumarray, but I haven't been able to find a correct solution. Any hints?
The A and B vectors are distance and intensity, respectively.
range_intensity is the combined matrix of these two column vectors.
range_intensity =
[NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
NaN NaN
26.040001 0.011764706
26.080000 0.019607844
26.112000 0.023529412
26.232000 0.023529412
26.184000 0.031372551
26.240000 0.027450981
26.260000 0.031372551
26.271999 0.031372551
26.275999 0.031372551
26.316000 0.035294119
26.312000 0.035294119
26.351999 0.031372551
26.351999 0.031372551
26.372000 0.031372551
26.424000 0.031372551
26.424000 0.031372551
26.452000 0.031372551
26.480000 0.039215688
26.496000 0.035294119
26.572001 0.031372551
26.552000 0.035294119
26.604000 0.031372551
26.620001 0.035294119
26.680000 0.035294119
26.684000 0.035294119
26.719999 0.035294119
26.747999 0.027450981
26.784000 0.031372551
26.820000 0.031372551
26.848000 0.027450981
26.875999 0.031372551
26.872000 0.031372551
26.920000 0.027450981
26.944000 0.027450981
26.972000 0.031372551
27.020000 0.031372551
27.044001 0.027450981
27.115999 0.035294119
27.132000 0.031372551
27.164000 0.031372551
27.184000 0.035294119]
edges = [0:0.5:250];
[distance_count, indx] = histc(range_intensity(:,1), edges);
% function res=my_mean_omitnan(in)
% res=mean(in,'omitnan');
% end
mean = accumarray(indx+1, range_intensity(indx+1,2), [],@(x)mean(x,'omitnan'));
max = accumarray(indx+1, range_intensity(indx+1,2), [], @max);
std = accumarray(indx+1, range_intensity(indx+1,2), [], @std);
bar(max);
hold on;
plot(mean);
grid on;
One problem is that the length of the edges vector does not match the length of the mean and max vectors, so I can't plot the mean and max against the edges.
There are also NaN values in the two vectors, which should be discarded for the mean, max, and standard deviation calculations.
Furthermore, what would be the best way to visualize this data?

### KALYAN ACHARJYA

13 Nov 2019 at 10:57
Can you share examples of A and B?
Looking for>>
1.....
2.....
### Guillaume

13 Nov 2019 at 11:44
It's a bit unclear what binning method you want to use. An example would indeed be useful.
accumarray or groupsummary is probably the easiest way to do what you want. mean has an 'omitnan' option, so ignoring NaNs is not a problem.


## 1 Answer

13 Nov 2019 at 15:41

13 Nov 2019 at 16:45
Accepted Answer

Generally, the edges should cover the span of your data, no more and no less. The one exception is that the final edge should be slightly larger than your maximum value: the last bin includes its right edge, so a maximum sitting exactly on that edge would otherwise be absorbed into the final bin.
I suggest using discretize() to group the values in column 1 into discrete groups. The line below uses the range of your data to determine the range of bin edges.
edges = floor(min(range_intensity(:,1))) : .5 : ceil((max(range_intensity(:,1))+.001)*10/5)*5/10;
bins = discretize(range_intensity(:,1), edges);
The code above uses floor() to define the minimum bin edge. Bins are 0.5 units wide. It uses ceil() to define the maximum bin edge, but to ensure that the maximum edge doesn't fall on your maximum data value, it adds 0.001 and then rounds up to the nearest 0.5 (hence the *10/5 followed by *5/10).
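To make the edge arithmetic concrete, here is a minimal sketch on a hypothetical five-element vector x (not the poster's data). It writes the rounding step as *2 and /2, which is arithmetically the same as *10/5 and *5/10, and it relies on min() and max() skipping NaN by default.

```matlab
% Hypothetical toy distances, including one NaN row.
x = [26.04; 26.48; 26.50; 27.184; NaN];

% Lower edge: round the minimum down to a whole number.
% Upper edge: nudge the maximum by 0.001, then round up to the next 0.5.
edges = floor(min(x)) : 0.5 : ceil((max(x) + 0.001) * 2) / 2;

% Each bin includes its left edge, so 26.50 lands in bin 2.
% NaN inputs get a NaN bin index.
bins = discretize(x, edges);

disp(edges)   % 26.0  26.5  27.0  27.5
disp(bins')   % 1  1  2  3  NaN
```

Note that the NaN rows survive as NaN bin indices, so they still need to be filtered out before any function that requires positive integer group numbers.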
Computing group statistics
If you have the Statistics and Machine Learning Toolbox, use grpstats() to compute grouped statistics.
[meanVal, maxVal, stdVal] = grpstats(range_intensity(:,2),bins,{@mean, @max, @std});
If you do not have access to the Statistics and Machine Learning Toolbox, use splitapply() (or accumarray() or other alternatives) to compute your grouped stats.
meanVal = splitapply(@mean,range_intensity(:,2), bins); % Repeat for other stats
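Since the question's data contains NaN rows, and discretize() returns NaN for them, one toolbox-free option is to drop those rows first and let accumarray() fill empty bins. The following is only a sketch under assumptions: the toy matrix and the fixed edges are hypothetical, chosen so that the last bin is empty.

```matlab
% Hypothetical toy data: column 1 is distance, column 2 is intensity.
range_intensity = [NaN NaN; 26.04 0.01; 26.48 0.03; 26.50 0.02; 27.184 0.04];
edges = 26:0.5:28;                        % 4 bins; the last one stays empty
bins  = discretize(range_intensity(:,1), edges);

% Keep only rows with a valid bin index and a non-NaN intensity.
valid = ~isnan(bins) & ~isnan(range_intensity(:,2));
nBins = numel(edges) - 1;
b = bins(valid);
v = range_intensity(valid, 2);

% Forcing the output size to nBins keeps every result aligned with the bins;
% bins without data are filled with NaN instead of 0.
% The result names avoid mean/max/std so the built-in functions stay callable.
meanVal = accumarray(b, v, [nBins 1], @mean, NaN);
maxVal  = accumarray(b, v, [nBins 1], @max,  NaN);
stdVal  = accumarray(b, v, [nBins 1], @std,  NaN);
```

Each output then has exactly numel(edges)-1 rows, which addresses the length mismatch described in the question.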
Plotting the results
By definition, there is always one more bin edge than there are bins. One way to plot binned data is to compute the bin center and use that as the x-value.
binCenters = edges(2:end) - (edges(2)-edges(1))/2;
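The formula above assumes that all bins share the width of the first one. As a small sketch with hypothetical edges, the midpoints can also be computed with diff(), which gives the same result for equal-width bins and still works if the widths differ.

```matlab
% Hypothetical equal-width edges.
edges = [26 26.5 27 27.5];

% Midpoint of each bin: left edge plus half of that bin's own width.
binCenters = edges(1:end-1) + diff(edges) / 2;

% Same values as the fixed-width formula when all bins are 0.5 wide.
fixedWidthCenters = edges(2:end) - (edges(2) - edges(1)) / 2;

disp(binCenters)   % 26.25  26.75  27.25
```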
If the bin edges were set up correctly following the steps above, you should end up with a vector of binCenters that is the same size as your grouped stats values. Plotting is then as simple as
figure()
bar(binCenters,maxVal)
hold on
plot(binCenters, meanVal,'ms')
grid on

#### 8 Comments

13 Nov 2019 at 17:06
I wonder if there are bins that do not contain any data and whether groupsummary is merely skipping over those bins.
### Guillaume

13 Nov 2019 at 19:19
"I don't know why I keep forgetting about groupsummary()"
Probably because you have the stats toolbox and I don't.
groupsummary returns as many rows as numel(unique(bins)), so if some bin indices are not present, these will indeed be skipped. The second output of groupsummary gives you the bins matching the rows of the first output, so:
[meanmaxstd, bin] = groupsummary(range_intensity(:,2), bins, {'mean', 'max', 'std'});
edit: Or put the whole lot (range_intensity and bins) into a table and you'll get everything as one neat table as output (including number of elements used for each bin).
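The table route mentioned in the edit might look like the sketch below. The toy vectors and the variable names bin and intensity are hypothetical, and rmmissing() is used here as one way to drop the NaN rows before grouping.

```matlab
% Hypothetical toy data: distances x and intensities y, with one NaN row.
x = [NaN; 26.04; 26.48; 26.50; 27.184];
y = [NaN; 0.01; 0.03; 0.02; 0.04];

bin = discretize(x, 26:0.5:27.5);

% Put bin index and intensity side by side, then drop rows containing NaN.
T = table(bin, y, 'VariableNames', {'bin', 'intensity'});
T = rmmissing(T);

% One row per bin that actually contains data. The output holds the bin,
% a GroupCount column, and mean_intensity, max_intensity, std_intensity.
S = groupsummary(T, 'bin', {'mean', 'max', 'std'}, 'intensity');
disp(S)
```

Bins without data do not appear in S, so S.bin is what maps each row back to its bin.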