MATLAB Answers

Translated by

このページのコンテンツは英語から自動翻訳されています。自動翻訳をオフにする場合は「<a class="turn_off_mt" href="#">ここ</a>」をクリックしてください。

dpb
1

Rearrange cell content by groups

dpb
さんによって質問されました 2017 年 12 月 31 日
最新アクティビティ dpb
さんによって コメントされました 2018 年 12 月 23 日
There's got to be a simple way to do this...
Have a cell array of data by date wherein there are N (or fewer) instances of M quantities for each date. I need to rearrange to reflect each of the M quantities for a given date on a single row. For example given
Date Fund Shares Value
___________ _____________________________________________ _______________ ______
30-Nov-2015 'American International Growth & Income Fund' '67.958 shares' 1979.6
30-Nov-2015 'American Mutual Fund' '54.930 shares' 1997.3
30-Nov-2015 'Capital Income Builder' '34.843 shares' 1997.2
30-Nov-2015 'Europacific Growth Fund' '41.528 shares' 1988.8
30-Nov-2015 'Income Fund of America Fund' '95.102 shares' 2001.9
31-Dec-2015 'American International Growth & Income Fund' '67.958 shares' 1936.6
31-Dec-2015 'American Mutual Fund' '54.930 shares' 1952.1
31-Dec-2015 'Capital Income Builder' '34.843 shares' 1964.2
31-Dec-2015 'Europacific Growth Fund' '41.528 shares' 1941.1
31-Dec-2015 'Income Fund of America Fund' '95.102 shares' 1975.3
31-Jan-2016 'American International Growth & Income Fund' '67.958 shares' 1847.1
31-Jan-2016 'American Mutual Fund' '54.930 shares' 1894.5
31-Jan-2016 'Capital Income Builder' '34.843 shares' 1939.6
31-Jan-2016 'Europacific Growth Fund' '41.528 shares' 1823
31-Jan-2016 'Income Fund of America Fund' '95.102 shares' 1930.4
...
30-Sep-2017 'American International Growth & Income Fund' '67.958 shares' 44.02
30-Sep-2017 'Europacific Growth Fund' '41.528 shares' 101.21
31-Oct-2017 'American International Growth & Income Fund' '67.958 shares' 44.3
31-Oct-2017 'Europacific Growth Fund' '41.528 shares' 103.35
30-Nov-2017 'American International Growth & Income Fund' '67.958 shares' 44.3
30-Nov-2017 'Europacific Growth Fund' '41.528 shares' 103.35
one observes there are five Funds for each of the first few dates but only two at the end. The desired output would be
Date Fund1 Shares1 Value1 Fund2 Shares2 Value2 ...
___________ _________________ _______________ ______ _________________ _______________ ____________ ...
30-Nov-2015 'American Int...' '67.958 shares' 1979.6 'American Mut...' '54.930 shares' 1997.3 ...
31-Dec-2015 'American Int...' '67.958 shares' 1936.6 'American Mut...' '54.930 shares' 1952.1 ...
...
30-Nov-2017 'American Int...' '67.958 shares' 44.3 '' '' '' ...
where I truncated the lines showing only the first two of five. Of course, there would be empty cells at the end where there are only the two of five still extant.
It's easy enough to get grouping indices; I just haven't figured out the simple way to process them to build the output array from the grouping numbers.

  1 件のコメント

dpb
2018 年 12 月 23 日
It's been a year since I posted...I had to drop the problem for many months owing to more pressing matters until this past week I returned (with yet another year's data to add to the mix!) still with individual entries only documented as comments...
However, I did get the initial "stripper" completed last night and now do have all those from all years and all months from the inception of the capaital campaign initial gift so I will be able to then got through entry by entry and discover where the disconnect first occurred and, hopefully, that will point to the data entry error...
MC & HNY to all!!!, just wanted to let know that the input did bear fruit, eventually... :)

サインイン to comment.

製品

2 件の回答

Sean de Wolski
回答者: Sean de Wolski
2018 年 1 月 5 日
編集済み: Sean de Wolski
2018 年 1 月 5 日
 採用された回答

If you use the table datatype, which I strongly encourage over cells(!), then this is a simple one-liner. You can also specify the new variable names as an option if the defaults don't suit.
unstack(dpbtable, {'Shares', 'Value'}, 'Fund')
Attached is a MAT file with dpbtable.
More: Just for fun and because accumarray is my favorite MATLAB function, here's a cell version that gets roughly what you want:
dpbcell = table2cell(dpbtable)
% Unique of each date and fund
[udate,~,idxdate] = unique(vertcat(dpbcell{:,1}));
[ufund,~,idxfund] = unique(vertcat(dpbcell{:,2}));
% Accumulate date x fund
[num2cell(udate), accumarray([idxdate, idxfund], (1:numel(idxfund))', [], @(x){dpbcell(x, 2:4)}, {nan})]

  13 件のコメント

dpb
2018 年 1 月 12 日
Appreciate your efforts a bunch, Sean...the November worksheet didn't balance and it's taken the bookkeeper an incredible amount of time to track down why; obviously having data buried in the comments is a major reason that it is so difficult plus there's just a tremendous excess of complexity and redundancy in the whole mess.
If I can just get to the point of building a table from the comments and being able to use it for data entry and populate the rest of the spreadsheet from it for new gifts it'll be a major step forward; if I could dump XLSREAD in favor of READTABLE or an augmented version thereof, it would likely make things easier on my end knowing virtually nothing about Excel/VBA I've found the learning curve for writing Excel-only solutions even more painful...
Sean de Wolski
2018 年 1 月 16 日
readtable, even with supplied options, is embarrassingly slow for those sheets. I'll be reporting it to dev.
Using the import tool, I was able to pull all of the data into a table pretty quickly and generate the function to automate it for any sheet that looks like this. The advantage is that the data come in as strings, numeric, and categorical automatically and you could perhaps take advantage of exclusion/default rules.
Unfortunately, I don't know that this offers much over just managing xlsread/raw yourself, especially if the sheets have changing columns.
The generated function is attached.
dpb
2018 年 1 月 16 日
OK, again thanks for taking a look, Sean. Hopefully if nothing else it can serve as useful fodder for dev. to improve the routine.
The penchant for importdata to build functions with hardcoded column numbers is an issue; over the years additional columns have been inserted here and there to either contain some additional information or more often than not, just add some spacing for aesthetics. Consequently, I'm reduced to having to search for column headers to locate which is where. That's not too bad; at least the headings of the data columns that are carried over haven't changed over the time frame over which I've had to work and that goes back a number of years.
Below is a snippet that really is all I've done uniquely; everything else is the home-brew xlsread routine that I just copied in order to add the optional TRIM flag to keep the consistency between various types.
% parse data into numeric and string arrays, keep sizes commensurate
trimData=false;
[numericData, textData] = SplitNumericAndText(rawData,trimData);
commentData=xlsComments(Excel,rangeObject);
% Would be more efficient to find stuff inside the range rather than read
% all, but Matlab syntax is much simpler when don't know VBA/Object Model
% So, first find the Pool heading rows, columns
[ir,ic]=cellfun(@(txt) findtxtincell(txt,textData),pools,'uniform',0);
for i=1:length(ir) % for each pool we found
rr=[]; % initialize accumulator for funds
j=ir{i}+1; % begin with row after pool name
while ~isempty(textData{j,2}) % find account rows in pool
rr=[rr; j]; % any none-empty row is an account
j=j+1;
end
rows{i,1}=[ir{i}; rr]; % save row indices for each pool
end
idx=cell2mat(rows); % get indices as addressing vector
textData=textData(idx,:);
numericData=numericData(idx,:);
rawData=rawData(idx,:);
commentData=commentData(idx,:);
if ~isempty(customFun)
customOutput=customOutput(idx,:);
end
The section returns the desired Pool subset(s) plus the comments associated with them. At this point it is then trivial to build the table representation as the returned output is now regular and includes the pieces wanted by identifiable section indices.
It took me a while to work out the VBA-equivalent for the COM interface to get at the comments but I posted a Question and the subsequent solution on that separately.
Again, thanks a lot for the time and interest...

サインイン to comment.


回答者: Cedric Wannaz
2017 年 12 月 31 日
編集済み: Cedric Wannaz
2017 年 12 月 31 日

Hi dpb,
Assuming that you have a cell array (and that you are using this table type of output just for a display purpose), here is a "neither-too-elegant-nor-too-satisfactory" approach:
data = {'10/8/17','A',10,100; '10/8/17','B',8,30; ...
'11/8/17','E',17,70; '11/8/17','A',14,80; '11/8/17','C',5,110; ...
'12/8/17','B',9,50} ;
[groupId, dates] = findgroups( data(:,1) ) ;
% - Build row/col IDs in output array.
rowId = reshape( (groupId * ones(1,3)).', [], 1 ) ;
colId = arrayfun( @(n)1+(1:3*n), accumarray(groupId, 1), 'Unif', 0 ) ;
colId = [colId{:}].' ;
% - Build output array.
outData = cell( max(rowId), max(colId) ) ;
outData(:,1) = dates ;
outData(sub2ind( size(outData), rowId, colId )) = reshape( data(:,2:end).', [], 1 ) ;
where the input data is:
>> data
data =
6×4 cell array
{'10/8/17'} {'A'} {[10]} {[100]}
{'10/8/17'} {'B'} {[ 8]} {[ 30]}
{'11/8/17'} {'E'} {[17]} {[ 70]}
{'11/8/17'} {'A'} {[14]} {[ 80]}
{'11/8/17'} {'C'} {[ 5]} {[110]}
{'12/8/17'} {'B'} {[ 9]} {[ 50]}
and the output is:
>> outData
outData =
3×10 cell array
{'10/8/17'} {'A'} {[10]} {[100]} {'B' } {[ 8]} {[ 30]} {0×0 double} {0×0 double} {0×0 double}
{'11/8/17'} {'E'} {[17]} {[ 70]} {'A' } {[ 14]} {[ 80]} {'C' } {[ 5]} {[ 110]}
{'12/8/17'} {'B'} {[ 9]} {[ 50]} {0×0 double} {0×0 double} {0×0 double} {0×0 double} {0×0 double} {0×0 double}
Hoping that you'll get a more elegant approach ..
Happy new year!
Cedric
EDIT 1 : Using SPLITAPPLY can help to some extent:
>> outData = splitapply( @(x){reshape(x.',[],1).'}, data(:,2:end), groupId )
outData =
3×1 cell array
{1×6 cell}
{1×9 cell}
{1×3 cell}
but then we must implement padding for the concatenation.

  7 件のコメント

dpb
2018 年 1 月 3 日
splitapply is still a struggle for me to think of how to apply it for more than the most straightforward cases as are shown in the documentation (one of the problems in the doc is that only the most obvious uses are often the only examples given).
I was looking at another poster's query a few days ago (and I need to get back and see if another ever did post a working solution) that needed a two-tier grouping that couldn't seem to find a way to write a nested expression that would work correctly.
I don't know what the correct answer is, but I can't help but think there's a better overall approach than the tack TMW has headed down--but again, it may be that the constraints of adding it on top of existing Matlab means such higher abstractions just aren't feasible/possible.
Cedric Wannaz
2018 年 1 月 3 日
I have the same feeling, there should be a better approach!
About SPLITAPPLY, I think that it is not intuitive because, in the beginning, we have a hard time guessing how the input array is split and to what the function passed as 1st arg is applied. A good way to check that out is to DISP it with no output arg:
>> splitapply( @disp, data(:,2:end), groupId )
'American International Growt…' '67.958 shares' [1.9796e+03]
'American Mutual Fund' '54.930 shares' [1.9973e+03]
'Capital Income Builder' '34.843 shares' [1.9972e+03]
'Europacific Growth Fund' '41.528 shares' [1.9888e+03]
'Income Fund of America Fund' '95.102 shares' [2.0019e+03]
'American International Growt…' '67.958 shares' [1.9366e+03]
'American Mutual Fund' '54.930 shares' [1.9521e+03]
'Capital Income Builder' '34.843 shares' [1.9642e+03]
'Europacific Growth Fund' '41.528 shares' [1.9411e+03]
'Income Fund of America Fund' '95.102 shares' [1.9753e+03]
'American International Growt…' '67.958 shares' [1.8471e+03]
'American Mutual Fund' '54.930 shares' [1.8945e+03]
'Capital Income Builder' '34.843 shares' [1.9396e+03]
'Europacific Growth Fund' '41.528 shares' [ 1823]
'Income Fund of America Fund' '95.102 shares' [1.9304e+03]
'American International Growt…' '67.958 shares' [ 44.0200]
'Europacific Growth Fund' '41.528 shares' [101.2100]
'American International Growt…' '67.958 shares' [ 44.3000]
'Europacific Growth Fund' '41.528 shares' [103.3500]
'American International Growt…' '67.958 shares' [ 44.3000]
'Europacific Growth Fund' '41.528 shares' [103.3500]
Seeing this, it becomes obvious why I used:
@(x){reshape(x.',[],1).'}
in my previous comment, and why I pass all columns but the first.
dpb
2018 年 1 月 4 日
Yeah, I often do that disp "trick" with all the various XXXXfun incantations to discover just what the argument list passed to the (often anonymous) function really is--it isn't always all that intuitive as you point out.

サインイン to comment.



Translated by