フィルターのクリア

Replace multiple intervals in array with NaN - No loops

2 ビュー (過去 30 日間)
Michael Madelaire
Michael Madelaire 2017 年 2 月 14 日
コメント済み: Jan 2017 年 2 月 15 日
Hi
I am working on a project with a lot of data. In the area of 10.000.000 rows. I therefore cannot use for loops. The data has to be cleaned, with respect to certain parameters. These parameters are found independently and are not on measured with the same timesteps. Below is a simplified example that explains the idea of the problem. :)
The main dataset, let's call it:
dataset = 1:30;
Two arrays are now given. They indicate intervals within which measurements has to replaced with NaN.
upperbound = [2,5,10,18,27];
lowerbound = [0,3,8,15,25];
With a for loop I would normally say:
for i = 1:length(upperbound);
dataset(lowerbound(i):upperbound(i)) = NaN;
end
But with so much data this is not possible.
Do any of you have a good idea to solve this ?
  2 件のコメント
Jan
Jan 2017 年 2 月 14 日
The smallest lowerbound index is zero. Is this a typo, because indices must be >= 1?
Michael Madelaire
Michael Madelaire 2017 年 2 月 15 日
Yes, that is a typo!

サインインしてコメントする。

採用された回答

Jan
Jan 2017 年 2 月 14 日
編集済み: Jan 2017 年 2 月 15 日
With logical indexing:
data = 1:30;
upper = [2,5,10,18,27];
lower = [1,3,8,15,25]; % Not 0 as 1st element
% FAILING for upper==lower or overlapping intervals:
index = zeros(1, numel(data));
index(lower) = 1;
index(upper) = -1;
toNaN = (cumsum(index) == 1) | (index == -1); % Thanks Adam!
data(toNaN) = NaN;
Alternatively use Bruno's FEX: mcolon:
v = mcolon(lower, upper);
data(v) = NaN;
If this is the bottleneck of your code, use a C-Mex function.
// File: BlockCopy.c
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *low, *up, *out, NaN = mxGetNaN(), *p, *pend;
size_t n, lowi, upi, maxi;
maxi = mxGetNumberOfElements(prhs[0]);
low = mxGetPr(prhs[1]);
up = mxGetPr(prhs[2]);
n = mxGetNumberOfElements(prhs[1]);
if (n != mxGetNumberOfElements(prhs[1])) {
mexErrMsgTxt("*** BlockCopy[mex]: Index vectors must have the same length.");
}
plhs[0] = mxDuplicateArray(prhs[0]);
out = mxGetPr(plhs[0]) - 1; // For 1-based Matlab indexing
while (n--) {
lowi = (size_t) *low++; // Check validity of indices
upi = (size_t) *up++;
if (lowi < 1 || upi > maxi) {
mexErrMsgTxt("*** BlockCopy[mex]: Index out of bounds.");
}
p = out + lowi;
pend = out + upi;
while (p <= pend) {
*p++ = NaN;
}
}
return;
}
This works with overlapping intervals also.
Attention: The C-code will crash if any index is <= 0 or > then the length of the input array or if the inputs are no DOUBLE arrays. Either care for proper inputs or add checks of each index.
  7 件のコメント
Michael Madelaire
Michael Madelaire 2017 年 2 月 15 日
Thank you so much for the thorough answer answer! The elapsed time table is very cool. I think i will stick to the loops then. But really had expected the vectorized solution to be much faster.
Jan
Jan 2017 年 2 月 15 日
@Michael: Feel free to use the C-mex. I've attached the commented C-code and a pre-compiled function for Win/64.

サインインしてコメントする。

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeLoops and Conditional Statements についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by