Efficient moving average of scattered data

Asked by Chad Greene on 28 Jun 2016
Latest answer by Chris Turnes on 9 Mar 2017
Chad Greene's answer was accepted
I have some scattered data and I'd like to take something similar to a moving average, where I average all values within some radius of each point. I can do this with a loop, but I'd like a more efficient approach. Any ideas?
Here's a working example I'd like to make more efficient:
x = randi(100,45,1) + 20+3*randn(45,1) ;
y = 15*sind(x) + randn(size(x)) + 3;
figure
plot(x,y,'bo')
ymean = NaN(size(x));
radius = 5; % averaging radius along x
for k = 1:length(x)
   % Indices of all points within the specified radius:
   ind = abs(x - x(k)) <= radius;
   % Mean of y values within radius:
   ymean(k) = mean(y(ind));
end
hold on
plot(x,ymean,'ks')

Walter Roberson

Walter Roberson on 28 Jun 2016
When I read the title I thought you might mean "sparse", and was thinking about how I might do an efficient moving average on sparse data.


3 Answers

Answer by Chad Greene on 30 Jun 2016

I turned this into a generalized function called scatstat1, which is on the File Exchange here.
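If I recall the interface correctly, usage is along these lines (a sketch only; the exact signature and optional arguments should be checked against the File Exchange documentation):

```matlab
% Assumed usage of scatstat1 -- verify against the File Exchange page:
radius = 5;                       % averaging radius along x
ymean  = scatstat1(x, y, radius); % local mean of y within radius of each x
```

This replaces the explicit loop in the question with a single call.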

Chad Greene on 30 Jun 2016
And a 2D version called scatstat2.


Answer by Chris Turnes on 9 Mar 2017

If you can upgrade to R2017a, this functionality can now be achieved through the 'SamplePoints' name-value pair in the moving statistics functions. For your example, you would do something like movmean(y, 2*radius, 'SamplePoints', x) (though you'd need to sort your x values first).
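A minimal sketch of that approach for the example above (the window length is the full diameter, 2*radius, in the units of x; sorting by x keeps x and y aligned):

```matlab
radius = 5;                 % same radius as in the loop version
[xs, order] = sort(x);      % 'SamplePoints' must be sorted and increasing
ys = y(order);
ymean = movmean(ys, 2*radius, 'SamplePoints', xs);
hold on
plot(xs, ymean, 'ks')
```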



Answer by Walter Roberson on 28 Jun 2016

Use pdist() to get all of the distances simultaneously. Compare them to the radius and store the resulting mask. Multiply the mask by a repmat() of the y values and sum along a dimension; sum the mask along the same dimension and divide the value sum by that count. The result should be the moving average.
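One way to sketch that recipe, assuming distance is measured along x alone as in the example above (pdist needs the Statistics and Machine Learning Toolbox; the matrix product mask * y is equivalent to the repmat-then-sum step):

```matlab
radius = 5;
D    = squareform(pdist(x));        % N-by-N pairwise distances in x
mask = D <= radius;                 % true where points fall within the radius
% Sum of y over each row of the mask, divided by the count in that row:
ymean = (mask * y) ./ sum(mask, 2);
```

Note the mask is N-by-N, so memory grows quadratically with the number of points.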

Chad Greene on 30 Jun 2016
Interesting idea! I got your solution working, but for N of 20,000 points the pdist function takes a bit of time. As it turns out, looping is faster.
Walter Roberson on 30 Jun 2016
I wonder if looping pdist2() would be efficient? Eh, it probably just adds unnecessary overhead to a simple Euclidean calculation.