Removing rows - duplicates based on a condition

4 ビュー (過去 30 日間)
Nick
Nick 2019 年 12 月 1 日
コメント済み: Turlough Hughes 2019 年 12 月 1 日
Hi,
I have a table with a number of columns and would like to delete some rows based on some conditions.
Here is an example:
A = table([1;2;3;3;3;3;4;4;4], [ 0;0;0;0;1;1;1;1;1], [1999;1999;1999;2000;2001;2002;2000;2001;2004]);
A.Properties.VariableNames = {'ID' 'Size' 'Date'};
The condidions are:
If ID is the same and within that ID the size = 1 then if there are more than one rows, keep the row that contains the earliest Date for that ID.
So in this case the new table would look like this:
B = table([1;2;3;3;3;4], [ 0;0;0;0;1;1], [1999;1999;1999;2000;2001;2000]);
B.Properties.VariableNames = {'ID' 'Size' 'Date'};
Thanks!
  2 件のコメント
Rik
Rik 2019 年 12 月 1 日
I can't come up with anything other than a solution involving a nested loop, which will have horrible performance on larger sets of data. How large do you expect your real data to be?
Nick
Nick 2019 年 12 月 1 日
Hi Rik,
It is a prety big data set. 3million rows x 50 columns.

サインインしてコメントする。

採用された回答

dpb
dpb 2019 年 12 月 1 日
編集済み: dpb 2019 年 12 月 1 日
[~,ia]=unique(A(A.Size==1,1),'first');
B=[A(A.Size==0,:);A([ia+find(A.Size==0,1,'last')],:)];
returns
>> B
B =
6×3 table
ID Size Date
__ ____ ____
1 0 1999
2 0 1999
3 0 1999
3 0 2000
3 1 2001
4 1 2000
>>
I'm guessing the selection on Size==1 is only artificial given no duplicates by inspection. Logic works more simply without having to not subset that grouping (the mess about the complicated indexing expression for B)
The solution relies on the Date field being sorted so unique returns the first/lowest date...if possibly not, then sort first.
  1 件のコメント
Turlough Hughes
Turlough Hughes 2019 年 12 月 1 日
The above also assumes that A.Size is sorted, if this is not the case in your data you should first sort the Size and then the Date where Size=1 as follows:
A = table([1;2;3;3;3;3;4;4;4;5;5;5;5], [ 0;0;0;0;1;1;1;1;1;0;0;1;1], ...
[1999;1999;1999;2000;2001;2002;2000;2001;2004;2000;2005;2005;2001], ...
'VariableNames',{'ID','Size','Date'}) % sample data to illustrate
T=sortrows(A,2); % Sort the size first.
T2=sortrows(T(find(T.Size),:),3) % then where Size=1, sort Date
A2=[A(A.Size==0,:);T2];
Then you just sub A2 in for A into dbp's solution.

サインインしてコメントする。

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeDates and Time についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by