request advice on groupfilter in timetables

Would greatly appreciate some advice on what should be a simple operation with groupfilter. The script under construction below needs to filter for (isolate) all the rows & columns pertaining to one ticker at a time.
load SPMIDFund.mat;
Fbatch = {'GGG','WSM','CGNX','CRL','FICO','MOH'};
for corpind = 1:length(Fbatch);
corp = Fbatch{corpind};
%groupfilter(SPMIDFund,
date = SPMIDFund.CALENDARDATE;
repdate = SPMIDFund.LASTUPDATED;
sls = SPMIDFund.REVENUE;
ps = SPMIDFund.PS
% etc.
% etc.
% assignin
% save
end
The data source is a 191x91 timetable already binned by ticker (fragment attached)--I need to also filter for that same ticker. 'TICKER' is the only column denoting the company associated with the earnings data in the rest of the table. Have not been able to find a groupfilter or rowfun syntax that works. Suggestions will be appreciated.
CC

8 件のコメント

Peter Perkins
Peter Perkins 2021 年 7 月 29 日
cris, can you say more specifically what you need to do? groupfilter applies a function to each group of rows in a table, and the function typically removes some of those rows. Your example fle seems clear, but It would help for you to give an example of what you want to do to it.
cris
cris 2021 年 7 月 29 日
Peter,
Thanks for the follow up. The flow is: isolate all the rows pertaining to a single TICKER, say, GGG (not in the attached file, just first security in the batch above) to start, and all of the data columns for that ticker only; execute statistical analysis on desired data variables; label (assignin) results with the TICKER and save to current folder; resume loop with security #2, etc. Would love to keep the timetable frame intact if possible. Have used assignin many times, the problem is just with limiting the rows to one unique ticker. The data vendor does not permit efficient filtering by ticker--that sample table is their format. I am using ML financial toolbox as well. Hope this helps.
CC
Peter Perkins
Peter Perkins 2021 年 7 月 29 日
Several things don't make sense to me.
What are "the data columns for that ticker only"? The tickers label the rows of your timetable.
assignin creates workspace variables. What does that mean to "save to the current folder". Are you saving mat files?
What do you mean to "keep the timetable frame intact"? What is the timetable frame?
If your plan is to create one workspace variable for each ticker, that's generally thought of as begin a very bad idea.
You need to give a clear example of the operations you want to perform, step by step.
cris
cris 2021 年 7 月 30 日
Using the attached timetable as an example. I will need to perform timeseries analyses on CGNX and to use the ticker to identify the correct rows. So that means rows 1 through 21, columns 1 through 7. Then repeat the same process for CRL. I cannot assume that each ticker will have 21 rows or that col 7 will always contain data for ASSETS (per vendor). Filtering rows for each ticker is the part of the loop where assistance would be helpful.
Peter Perkins
Peter Perkins 2021 年 7 月 30 日
"You need to give a clear example of the operations you want to perform, step by step."
Without that, the only advice I can give would be to use groupsummary, grouptransform, or groupfilter.
cris
cris 2021 年 7 月 30 日
All I need is to create a new timetable from SPMIDFund1.mat that contains just the rows with CGNX in the TICKER column, then do the same for CRL, etc. The computations that follow are routine. This is the only stumbling block.
Peter Perkins
Peter Perkins 2021 年 7 月 30 日
You almost certainly don't want to do that. What you probably want is to work in place, rather than creating a bunch of things in your workspace with dynamic names. It's a FAQ that maybe someone can point to.
What you more likely want to do is to use groupsummary, or one of its siblings, or maybe a grouped rowfun. The lowwing is rowfun, group* can do the same things.
function [x,y] = myfun1(a,b,c,d) % a nonsense reduction function
x = mean(a - b);
y = var(c - d);
end
>> load SPMIDFund1.mat
>> rowfun(@myfun1,SPMIDFund1,"InputVariables",["DATEKEY" "LASTUPDATED" "ASSETS" "ACCOCI"],...
"GroupingVariables","TICKER","OutputVariableNames",["X" "Y"], "OutputFormat","table")
ans =
2×4 table
TICKER GroupCount X Y
______ __________ ____________ __________
CGNX 21 -21990:51:25 1.2562e+17
CRL 5 -39360:00:00 9.1939e+16
function [x,y] = myfun2(a,b,c,d) % a nonsense transformation function
x = a - b;
y = c - mean(c)./d;
end
>> rowfun(@myfun2,SPMIDFund1,"InputVariables",["DATEKEY" "LASTUPDATED" "ASSETS" "ACCOCI"],...
"GroupingVariables","TICKER","OutputVariableNames",["X" "Y"])
ans =
26×4 timetable
CALENDARDATE TICKER GroupCount X Y
____________ ______ __________ ____________ __________
31-Mar-2016 CGNX 21 -43944:00:00 9.1364e+08
30-Jun-2016 CGNX 21 -41760:00:00 9.6444e+08
30-Sep-2016 CGNX 21 -39576:00:00 1.0201e+09
31-Dec-2016 CGNX 21 -36984:00:00 1.0386e+09
31-Mar-2017 CGNX 21 -35208:00:00 1.0962e+09
30-Jun-2017 CGNX 21 -33024:00:00 1.161e+09
30-Sep-2017 CGNX 21 -30840:00:00 1.2585e+09
31-Dec-2017 CGNX 21 -28248:00:00 1.2879e+09
31-Mar-2018 CGNX 21 -26472:00:00 1.2689e+09
30-Jun-2018 CGNX 21 -24288:00:00 1.2763e+09
30-Sep-2018 CGNX 21 -22104:00:00 1.3247e+09
31-Dec-2018 CGNX 21 -19512:00:00 1.2897e+09
31-Mar-2019 CGNX 21 -17736:00:00 1.3538e+09
30-Jun-2019 CGNX 21 -15552:00:00 1.3499e+09
30-Sep-2019 CGNX 21 -13368:00:00 1.3905e+09
31-Dec-2019 CGNX 21 -10776:00:00 1.8859e+09
31-Mar-2020 CGNX 21 -9000:00:00 1.8851e+09
30-Jun-2020 CGNX 21 -6768:00:00 1.9349e+09
30-Sep-2020 CGNX 21 -4584:00:00 2.0605e+09
31-Dec-2020 CGNX 21 -2040:00:00 1.8007e+09
31-Mar-2021 CGNX 21 -24:00:00 1.909e+09
31-Mar-2016 CRL 5 -43824:00:00 2.1037e+09
30-Jun-2016 CRL 5 -41640:00:00 2.763e+09
30-Sep-2016 CRL 5 -39456:00:00 2.6902e+09
31-Dec-2016 CRL 5 -36960:00:00 2.7118e+09
31-Mar-2017 CRL 5 -34920:00:00 2.6971e+09
If you are determined to work on one timetable at a time, write a loop over the unique values of SPMIDFund1, and grab a subtable at each iteration:
for ticker = unique(SPMIDFund1.Ticker)'
t_i = SPMIDFund1(SPMIDFund1.Ticker == ticker,:)
<do something>
end
cris
cris 2021 年 8 月 2 日
Will try your solution tomorrow (Mon). Unfortunately, it will be necessary to split these tables in due course in order to use in a different script.

サインインしてコメントする。

回答 (0 件)

カテゴリ

製品

リリース

R2021a

質問済み:

2021 年 7 月 29 日

コメント済み:

2021 年 8 月 2 日

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by