MATLAB Answers

Chad Greene

Efficient moving average of scattered data

Asked by Chad Greene on 28 Jun 2016
Latest activity: answered by Chris Turnes on 9 Mar 2017
I have some scattered data and I'd like to take something similar to a moving average, where I average all values within some radius of each point. I can do this with a loop, but I'd like a more efficient approach. Any ideas?
Here's a working example I'd like to make more efficient:
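x = randi(100,45,1) + 20 + 3*randn(45,1);
y = 15*sind(x) + randn(size(x)) + 3;
radius = 10;
ymean = NaN(size(x));
for k = 1:length(x)
    % Indices of all points within the specified radius of x(k):
    ind = abs(x-x(k))<radius;
    % Mean of y values within the radius:
    ymean(k) = mean(y(ind));
end
% The original plot calls did not survive; these two are reconstructed
% from the legend entries, and the marker styles are a guess.
plot(x,y,'bo')
hold on
plot(x,ymean,'r.')
legend('scattered data','radial average','location','southeast')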

1 Comment

When I read the title I thought you might mean "sparse", and was thinking about how I might do an efficient moving average on sparse data.



3 Answers

Answer by Chad Greene on 30 Jun 2016
Accepted Answer

I turned this into a generalized function called scatstat1, which is available on the File Exchange.

1 Comment

And a 2D version called scatstat2.


Answer by Chris Turnes on 9 Mar 2017

If you can upgrade to R2017a, this functionality can now be achieved through the 'SamplePoints' name-value pair in the moving statistics. For your example, you would do something like movmean(y, 2*radius, 'SamplePoints', x); (though you'd need to sort your x values first).
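A minimal sketch of that suggestion, reusing the variable names from the question. It assumes R2017a or later and that the x values are unique, since movmean expects sorted, non-repeating sample points. The sort-and-restore step is my own addition so that the result lines up with the original, unsorted x.

```matlab
% Sketch only: assumes R2017a+ and unique x values.
x = randi(100,45,1) + 20 + 3*randn(45,1);
y = 15*sind(x) + randn(size(x)) + 3;
radius = 10;

% movmean needs the sample points in sorted order.
[xs, order] = sort(x);

% A window of total length 2*radius, centered on each sample point.
ymean_sorted = movmean(y(order), 2*radius, 'SamplePoints', xs);

% Put the averages back in the original (unsorted) order of x.
ymean = NaN(size(x));
ymean(order) = ymean_sorted;
```

I have not checked how movmean treats points that lie exactly on the edge of the window, so in that edge case the result could differ from the loop's strict < comparison.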

0 Comments


Answer by Walter Roberson on 28 Jun 2016

Use pdist() to get all of the pairwise distances at once. Compare them to the radius and store the resulting mask. Multiply the mask by a repmat() of the y values and sum along one dimension. Then sum the mask along the same dimension and divide the value sum by that count. The result should be the moving average.
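One way the steps above might look in code, as a rough sketch rather than the answerer's exact implementation. It assumes the Statistics and Machine Learning Toolbox for pdist and squareform, and it swaps the repmat-multiply-and-sum step for an equivalent matrix product.

```matlab
% Sketch only: requires the Statistics and Machine Learning Toolbox.
x = randi(100,45,1) + 20 + 3*randn(45,1);
y = 15*sind(x) + randn(size(x)) + 3;
radius = 10;

% pdist returns the pairwise distances as a condensed vector;
% squareform expands it to an N-by-N matrix with zeros on the diagonal.
D = squareform(pdist(x));

% mask(i,j) is true when point j lies within the radius of point i.
% The zero diagonal means each point always counts itself.
mask = D < radius;

% Row i of mask*y is the sum of the y values near point i;
% dividing by the row counts turns those sums into means.
ymean = (double(mask) * y) ./ sum(mask, 2);
```

The distance matrix and the mask each hold N^2 elements, so memory and time grow quadratically with the number of points.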

3 Comments

Interesting idea! I got your solution working, but for N of 20,000 points the pdist function takes a bit of time. As it turns out, looping is faster.
I wonder if looping pdist2() would be efficient? Eh, it probably just adds unnecessary overhead to a simple Euclidean calculation.
Also adds a Stats Toolbox dependency. I'll have to keep pdist in mind for future applications though. Thanks for the suggestion!

