Aggregating while removing the individual data

Danielle Leblance on 13 Jan 2017
Edited: dpb on 13 Jan 2017
Hi,
Is there a way to aggregate the data of a matrix in MATLAB based on ID while removing the individual observations? I will put a small sample here, but the real matrix has nearly 1 million rows and 60 columns.
A =
3 7 8 5 4 1800
5 6 8 6 2 1600
4 5 7 7 3 1800
1 23 67 3 15 1800
4 4 5 7 12 1100
45 6 56 6 56 1100
The last column is the column of IDs. There are 3 observations for the firm with ID 1800, 1 for the firm with ID 1600, and 2 for the firm with ID 1100. Is it possible to get another matrix in which the columns are summed over the rows that share the same ID? That is, can I obtain another matrix B with the following output:
ans =
8 35 82 15 22 1800
5 6 8 6 2 1600
49 10 61 13 68 1100
thanks

Accepted Answer

dpb on 13 Jan 2017
Edited: dpb on 13 Jan 2017
Hmmm... there ought to be a way with accumarray in one go, but it doesn't come to me at the moment how to work around the restriction (which I have never understood the need for) that the values being accumulated must be a vector. So, looping over the columns:
[u,~,iu]=unique(A(:,end)); % index of unique ID locations
nCol=size(A,2);
B=zeros(length(u),nCol); % preallocate output
for i=1:nCol-1
B(:,i)=accumarray(iu,A(:,i)); % sums by ID for each column
end
B(:,end)=u; % augment with corresponding ID
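As a quick check, the loop can be run against the sample matrix from the question. One thing to be aware of: unique returns the IDs sorted by default, so the rows of B come out in ascending ID order (1100, 1600, 1800) rather than in the order shown in the question. Passing 'stable' to unique, as in the sketch below, is one way to keep first-appearance order; that option is an addition here and not part of the code above.

```matlab
% Sample matrix from the question; the last column holds the IDs
A = [ 3  7  8 5  4 1800
      5  6  8 6  2 1600
      4  5  7 7  3 1800
      1 23 67 3 15 1800
      4  4  5 7 12 1100
     45  6 56 6 56 1100];

% 'stable' (an addition to the original code) keeps IDs in order of first appearance
[u,~,iu] = unique(A(:,end),'stable');
nCol = size(A,2);
B = zeros(numel(u),nCol);            % preallocate output
for i = 1:nCol-1
    B(:,i) = accumarray(iu,A(:,i));  % sums by ID for each column
end
B(:,end) = u;                        % append the corresponding IDs
disp(B)
```

Dropping 'stable' gives the same sums with the rows sorted by ID instead.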
ADDENDUM
>> table2array(grpstats(array2table(A),'A6',@sum))
ans =
1100 2 49 10 61 13 68
1600 1 5 6 8 6 2
1800 3 8 35 82 15 22
>>
or, if a table is good enough for output, you can dispense with the table2array conversion back...
For the dynamic case, you'll need to build the name of the grouping variable (here 'A6') from the array size, something like
grp=num2str(size(A,2),'A%d'); % group on last column in A
2 Comments
Guillaume on 13 Jan 2017
There is sort of a way with accumarray, using an anonymous function and cell array output. It may well be slower than the loop over the columns:
[ids, ~, subs] = unique(A(:, end));
B = [cell2mat(accumarray(subs, (1:size(A, 1))', [], @(rows) {sum(A(rows, 1:end-1), 1)})), ids]
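For what it's worth, accumarray also accepts a two-column subs matrix of [row, column] output subscripts, which gives another route to a single call without the cell array. The following is only a sketch of that idea and was not posted by either commenter; the names rowSub, colSub and vals are made up for illustration, and repelem needs R2015a or later.

```matlab
% Sample matrix from the question; the last column holds the IDs
A = [ 3  7  8 5  4 1800
      5  6  8 6  2 1600
      4  5  7 7  3 1800
      1 23 67 3 15 1800
      4  4  5 7 12 1100
     45  6 56 6 56 1100];

[ids,~,subs] = unique(A(:,end));     % subs(k) is the group index of row k
[n,nCol] = size(A);
nVal = nCol - 1;                     % number of value columns

vals   = reshape(A(:,1:nVal),[],1);  % all values, column-major, as one vector
rowSub = repmat(subs,nVal,1);        % output row (group) for each value
colSub = repelem((1:nVal)',n);       % output column for each value

% One accumarray call: sum vals into a numel(ids)-by-nVal matrix
B = [accumarray([rowSub,colSub],vals,[numel(ids),nVal]), ids];
disp(B)
```

This builds two index vectors as long as the data itself, so for a 1-million-by-60 array the extra memory may or may not be worth it compared with the loop.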
dpb on 13 Jan 2017
As almost certainly will the table solution. Let's see... yeah, here, just for the sample above, it's 16X the loop cost owing to the table conversion overhead (although removing the conversion back to an array makes no appreciable difference in the total). Roughly half the total appears to be the conversion to the table, so the operation on the table itself is also comparatively quite slow. While cute syntax, it will probably be a killer for the OP's size of array.
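A rough way to repeat this kind of comparison on something closer to the OP's shape is sketched below. The data are synthetic, and the row count, number of IDs and value range are assumptions chosen only so the script runs quickly. It uses tic/toc for simplicity; timeit would give steadier numbers. The grpstats variant is left out because it needs the Statistics and Machine Learning Toolbox, but it could be timed the same way.

```matlab
% Synthetic data (assumed sizes, not the OP's data): 59 value columns + 1 ID column
rng(0);
n = 1e5; nCol = 60; nIDs = 5000;
A = [randi(100,n,nCol-1), randi(nIDs,n,1)];

% (1) Loop over columns, one accumarray call per column
tic
[u,~,iu] = unique(A(:,end));
B1 = zeros(numel(u),nCol);
for i = 1:nCol-1
    B1(:,i) = accumarray(iu,A(:,i));
end
B1(:,end) = u;
tLoop = toc;

% (2) Single accumarray call over row indices with cell array output
tic
[ids,~,subs] = unique(A(:,end));
B2 = [cell2mat(accumarray(subs,(1:n)',[],@(rows) {sum(A(rows,1:end-1),1)})), ids];
tCell = toc;

% Both approaches should agree exactly on integer-valued data
assert(isequal(B1,B2))
fprintf('loop: %.3f s, cell accumarray: %.3f s\n',tLoop,tCell);
```

Timings will vary with release, machine and the number of distinct IDs, so any ratio this prints is only indicative.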
