Deleting duplicates based on conditions of multiple columns

Question

Nick, 28 December 2020

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/703957-deleting-duplicates-based-on-conditions-of-multiple-columns

Answered: Akash kumar, 31 July 2022

Hi,

I have a large dataset (100m rows x 40 columns) and I would like to delete any row that duplicates another row on a few specific columns. See the example below:

A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

I would like to delete every row whose values in the three columns all match those of another row. In this example, rows 2, 4 and 9 would be deleted: for instance, rows 1 and 2 match in each of the three columns, so I'd want to delete one of the two (it doesn't matter which one).

I suspect the answer is somewhere along the use of unique and logical indexing but haven't managed to figure it out. Any help would be much appreciated. (I'm using Matlab 2018b)

Thanks

3 comments (1 older comment not shown)

Nick, 28 December 2020

Thanks for this, but unfortunately I think it would work for this sample only. The actual dataset has 40 columns and I'd like to remove rows based on duplicates in 3 columns only, rather than all of them.

Nick, 28 December 2020


Just found the answer. This way you can find the unique rows amongst a number of columns (in this case, columns 1, 2 and 3) and then produce the original table without the duplicate values.

[C,ia] = unique(A(:,1:3),'rows')
A_new = A(ia,:)
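One thing to keep in mind with the snippet above: without an ordering option, unique returns the rows in sorted order, so A_new ends up sorted by the key columns rather than in the original row order. If the original order matters, the 'stable' option is one way to keep it. A minimal sketch on the sample data from the question (the variable names iaSorted, iaStable, A_sorted and A_stable are just illustrative):

```matlab
% Sample data from the question
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; ...
     2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

% Default call: one index per unique key, in sorted order of the keys
[~, iaSorted] = unique(A(:,1:3), 'rows');

% 'stable' returns the indices in order of first appearance instead
[~, iaStable] = unique(A(:,1:3), 'rows', 'stable');

A_sorted = A(iaSorted, :);   % [4 23 24] is placed before [4 25 23]
A_stable = A(iaStable, :);   % [4 25 23] stays before [4 23 24], as in A
```

Both results contain the same 10 rows; only the order of the last two differs for this sample.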


Answer 1

Nick, 28 December 2020

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/703957-deleting-duplicates-based-on-conditions-of-multiple-columns#answer_586042

[C,ia] = unique(A(:,1:3),'rows')

A_new = A(ia,:)
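Since the question mentions logical indexing, the same ia output can also be turned into a keep/delete mask. This is only a sketch of that variant, not something from the thread; the names keep and removedRows are made up for illustration. It reports which rows were dropped and leaves the remaining rows in their original order:

```matlab
% Sample data from the question
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; ...
     2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

% ia holds one row index per unique combination of columns 1-3
[~, ia] = unique(A(:,1:3), 'rows');

% Logical mask: true for the rows that are kept
keep = false(size(A,1), 1);
keep(ia) = true;

removedRows = find(~keep);   % indices of the rows treated as duplicates
A(~keep, :) = [];            % delete them; remaining rows keep their order
```

On the sample data this drops rows 2, 4 and 10. The question lists row 9 rather than 10, but rows 9 and 10 are identical, so either choice satisfies "doesn't matter which one".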


Answer 2

Akash kumar, 31 July 2022

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/703957-deleting-duplicates-based-on-conditions-of-multiple-columns#answer_1018540


% With the index output: index shows which rows of A were extracted.
% 'stable' keeps them in their original order. I think this can help you.
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];
[B, index] = unique(A(:,1:3), 'rows', 'stable')
B = 10×3
     1    10     4
     1    11     5
     1    12     6
     1    12     7
     1    13     8
     2     4    25
     2    10    28
     3     5    33
     4    25    23
     4    23    24
index = 10×1
     1
     3
     5
     6
     7
     8
     9
    11
    12
    13
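Note that B contains only the key columns. When the data has more columns than the ones used for comparison (as in the 40-column dataset described in the question), the full rows can be recovered by indexing the original matrix with index. A minimal sketch, using a made-up 6-by-5 matrix and a hypothetical keyCols variable:

```matlab
% Hypothetical data: columns 1-3 are the keys, columns 4-5 are extra data
A = [1 10  4 100 7;
     1 10  4 101 8;
     1 11  5 102 9;
     2 10 28 103 1;
     2 10 28 104 2;
     3  5 33 105 3];

keyCols = [1 2 3];   % columns that define a duplicate (assumed for this example)

% Compare rows on the key columns only
[~, index] = unique(A(:, keyCols), 'rows', 'stable');

% Pull the complete rows (all 5 columns) for the first occurrence of each key
A_new = A(index, :);
```

Here rows 2 and 5 are dropped even though their extra columns differ from rows 1 and 4, because only the key columns are compared.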
