multiple condition sumif without loop

9 ビュー (過去 30 日間)
ahmed920
ahmed920 2022 年 5 月 26 日
回答済み: Seth Furman 2022 年 6 月 2 日
I have the following table and want to sum over unique dates and Accounts to get the daily changes per account
TransactionID = {'102001';'102002';'102003';'102004';'102005';'102006';'102007';'102008';'102009';'102010'};
Date = datetime({'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-09';'2021-02-09';'2021-02-09';'2021-02-17'},"InputFormat","uuuu-MM-dd");
AmountDebited = [3000; 0; 0; 3000; 4443; 2.02877864; 430; 50; 200; 0.297598984636403];
AccountDebited = ["TD-CAD";"LC-CAD";"LC-CAD";"SI-CAD";"SP-CAD";"LC-USD";"SP-CAD";"SP-CAD";"SP-CAD";"LC-USD"];
AmountCredited = [3000; 0; 0; 3000; 2.03048665; 4450; 0.18713085; 0.02177471; 0.0869854; 0.295498984636403];
AccountCredited = ["SP-CAD"; "MM-LPT"; "KN-DIR"; "SP-CAD"; "LC-USD"; "SI-CAD"; "LC-USD"; "LC-USD"; "LC-USD"; "MM-USD"];
X = table(TransactionID, Date, AmountDebited, AccountDebited, AmountCredited, AccountCredited);
Right Now I can do it using the following for loop to compute the result but My actual table has thousands of records and the calculation takes forever so have been trying to get it to work using indexes but had no luck so far.
UniqueDebit = unique(X.AccountDebited);
UniqueCredit = unique(X.AccountCredited);
UniqueConc = unique([UniqueDebit;UniqueCredit]);
UniqueDates = unique(X.Date);
for i = 1:length(UniqueDates)
for j = 1:height(UniqueConc)
RowsCredited = find(X.Date == UniqueDates(i) & X.AccountCredited == UniqueConc(j));
RowsDebited = find(X.Date == UniqueDates(i) & X.AccountDebited == UniqueConc(j));
DailyCredited = sum(X.AmountCredited(RowsCredited));
DailyDebited = sum(X.AmountDebited(RowsDebited));
DailyChanges = DailyCredited - DailyDebited;
XMatrix(i,j) = DailyChanges;
end
end
XTable=array2table(XMatrix);
XTable.Properties.VariableNames = UniqueConc;
XTable.Properties.RowNames = string(UniqueDates);

採用された回答

Jan
Jan 2022 年 5 月 26 日
編集済み: Jan 2022 年 5 月 26 日
Start with avoiding repeated work:
XMatrix = zeros(length(UniqueDates), height(UniqueConc)); % Pre-allocate!!!
xCredit = X.AccountCredited; % Abbreviation
xDebit = X.AccountDebited;
for i = 1:length(UniqueDates)
tmp = (X.Date == UniqueDates(i)); % Move out of the inner loop
for j = 1:height(UniqueConc)
RowsCredited = (tmp & xCredited == UniqueConc(j));
RowsDebited = (tmp & xDebited == UniqueConc(j));
DailyCredited = sum(X.AmountCredited(RowsCredited)); % [EDITED]
DailyDebited = sum(X.AmountDebited(RowsDebited)); % [EDITED]
XMatrix(i,j) = DailyCredited - DailyDebited;
end
end
Avoid the find(), because logical indexing is faster.
Is XMatrix preallocated?
  3 件のコメント
Jan
Jan 2022 年 5 月 26 日
編集済み: Jan 2022 年 5 月 26 日
It is hard to optimize code without having realistic input data. I cannot estimate, if an accumarray approach is faster here. Are you sure, that the loop is the bottleneck?
Another idea is to use ismember once, such that Matlab can search in numerical indices instead of string arrays:
xAccDebit = X.AccountDebited;
xAccCredit = X.AccountCredited;
xAmDebit = X.AmountDebited;
xAmCredit = X.AmountCredited;
uDebit = unique(xAccDebit);
uCredit = unique(xAccCredit);
uConc = unique([uDebit; uCredit]);
[uDates, ~, uDateInd] = unique(X.Date);
[~, creditInd] = ismember(xAccCredit, uConc);
[~, debitInd] = ismember(xAccDebit, uConc);
X = zeros(numel(uDates), numel(uConc)); % Pre-allocate
for i = 1:numel(uDates)
tmp = (uDateInd == i); % same as: (X.Date == uDates(i));
mCreditInd = creditInd .* tmp; % mask FALSE in tmp as 0 in vector
mDebitInd = debitInd .* tmp; %
for j = 1:numel(uConc)
RowsCredited = (mCreditInd == j);
RowsDebited = (mDebitInd == j);
DailyCredited = sum(xAmCredit(RowsCredited));
DailyDebited = sum(xAmDebit(RowsDebited));
X(i,j) = DailyCredited - DailyDebited;
end
end
Is numel(uDates) much larger than numel(uConc)? Maybe swapping the loops is faster.
I've edited the code in my first answer: I've abbreviated "X.AccountDebited" and "X.AmountDebited" both by "xDebited".
Are you working with less than 65536 elements? Then it might save some time, if you convert mCreditInd and mDebitInd to UINT16 and the loop e.g. to:
for j = uint16(1):uint16(numel(uConc))
ahmed920
ahmed920 2022 年 5 月 27 日
編集済み: ahmed920 2022 年 5 月 27 日
yeah number uDates is 4 times larger than numel uConc. for now yeah less than 65536 elements. Thanks this works it further reduced my time by a factor of 4.

サインインしてコメントする。

その他の回答 (1 件)

Seth Furman
Seth Furman 2022 年 6 月 2 日
It's worth noting that a lot of these computations can be expressed more succinctly using groupsummary, outerjoin, and fillmissing.
TransactionID = {'102001';'102002';'102003';'102004';'102005';'102006';'102007';'102008';'102009';'102010'};
Date = datetime({'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-09';'2021-02-09';'2021-02-09';'2021-02-17'},"InputFormat","uuuu-MM-dd");
AmountDebited = [3000; 0; 0; 3000; 4443; 2.02877864; 430; 50; 200; 0.297598984636403];
AccountDebited = ["TD-CAD";"LC-CAD";"LC-CAD";"SI-CAD";"SP-CAD";"LC-USD";"SP-CAD";"SP-CAD";"SP-CAD";"LC-USD"];
AmountCredited = [3000; 0; 0; 3000; 2.03048665; 4450; 0.18713085; 0.02177471; 0.0869854; 0.295498984636403];
AccountCredited = ["SP-CAD"; "MM-LPT"; "KN-DIR"; "SP-CAD"; "LC-USD"; "SI-CAD"; "LC-USD"; "LC-USD"; "LC-USD"; "MM-USD"];
X = table(TransactionID, Date, AmountDebited, AccountDebited, AmountCredited, AccountCredited);
Find total amounts credited/debited for each date-account pair
credited = groupsummary(X,["Date","AccountCredited"],"sum",["AmountCredited"])
credited = 7×4 table
Date AccountCredited GroupCount sum_AmountCredited ___________ _______________ __________ __________________ 06-Feb-2021 "KN-DIR" 1 0 06-Feb-2021 "LC-USD" 1 2.0305 06-Feb-2021 "MM-LPT" 1 0 06-Feb-2021 "SI-CAD" 1 4450 06-Feb-2021 "SP-CAD" 2 6000 09-Feb-2021 "LC-USD" 3 0.29589 17-Feb-2021 "MM-USD" 1 0.2955
debited = groupsummary(X,["Date","AccountDebited"],"sum",["AmountDebited"])
debited = 7×4 table
Date AccountDebited GroupCount sum_AmountDebited ___________ ______________ __________ _________________ 06-Feb-2021 "LC-CAD" 2 0 06-Feb-2021 "LC-USD" 1 2.0288 06-Feb-2021 "SI-CAD" 1 3000 06-Feb-2021 "SP-CAD" 1 4443 06-Feb-2021 "TD-CAD" 1 3000 09-Feb-2021 "SP-CAD" 3 680 17-Feb-2021 "LC-USD" 1 0.2976
Match total amounts credited/debited by date-account pair
credited.GroupCount = [];
debited.GroupCount = [];
sums = outerjoin(credited,debited,LeftKeys=["Date","AccountCredited"],RightKeys=["Date","AccountDebited"],MergeKeys=true)
sums = 11×4 table
Date AccountCredited_AccountDebited sum_AmountCredited sum_AmountDebited ___________ ______________________________ __________________ _________________ 06-Feb-2021 "KN-DIR" 0 NaN 06-Feb-2021 "LC-CAD" NaN 0 06-Feb-2021 "LC-USD" 2.0305 2.0288 06-Feb-2021 "MM-LPT" 0 NaN 06-Feb-2021 "SI-CAD" 4450 3000 06-Feb-2021 "SP-CAD" 6000 4443 06-Feb-2021 "TD-CAD" NaN 3000 09-Feb-2021 "LC-USD" 0.29589 NaN 09-Feb-2021 "SP-CAD" NaN 680 17-Feb-2021 "LC-USD" NaN 0.2976 17-Feb-2021 "MM-USD" 0.2955 NaN
Fill missing data
sums = fillmissing(sums,"constant",0,DataVariables=[3 4])
sums = 11×4 table
Date AccountCredited_AccountDebited sum_AmountCredited sum_AmountDebited ___________ ______________________________ __________________ _________________ 06-Feb-2021 "KN-DIR" 0 0 06-Feb-2021 "LC-CAD" 0 0 06-Feb-2021 "LC-USD" 2.0305 2.0288 06-Feb-2021 "MM-LPT" 0 0 06-Feb-2021 "SI-CAD" 4450 3000 06-Feb-2021 "SP-CAD" 6000 4443 06-Feb-2021 "TD-CAD" 0 3000 09-Feb-2021 "LC-USD" 0.29589 0 09-Feb-2021 "SP-CAD" 0 680 17-Feb-2021 "LC-USD" 0 0.2976 17-Feb-2021 "MM-USD" 0.2955 0
Compute net change by date-account pair
sums.NetChange = sums.sum_AmountCredited - sums.sum_AmountDebited
sums = 11×5 table
Date AccountCredited_AccountDebited sum_AmountCredited sum_AmountDebited NetChange ___________ ______________________________ __________________ _________________ _________ 06-Feb-2021 "KN-DIR" 0 0 0 06-Feb-2021 "LC-CAD" 0 0 0 06-Feb-2021 "LC-USD" 2.0305 2.0288 0.001708 06-Feb-2021 "MM-LPT" 0 0 0 06-Feb-2021 "SI-CAD" 4450 3000 1450 06-Feb-2021 "SP-CAD" 6000 4443 1557 06-Feb-2021 "TD-CAD" 0 3000 -3000 09-Feb-2021 "LC-USD" 0.29589 0 0.29589 09-Feb-2021 "SP-CAD" 0 680 -680 17-Feb-2021 "LC-USD" 0 0.2976 -0.2976 17-Feb-2021 "MM-USD" 0.2955 0 0.2955

カテゴリ

Help Center および File ExchangeLoops and Conditional Statements についてさらに検索

製品


リリース

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by