textscan difficulties with mixed datatypes

Question

0 votes

Hi

I am having difficulty solving a particular problem. I might just be missing the wood for the trees but here goes:

I have a large (> 1 million rows) cellstr in the following format (only a 3-row example is shown):

    blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

I then attempt to textscan through each cellstr (for loop, as textscan is not "vectorized" for cellstr) using one of the following two syntaxes:

    temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',0)

or

    temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',1)

Now, the problem is that temp comes out as a cell that contains cells and matrices, i.e. indexing within indexing on different datatypes. I can't afford to index each one individually inside the loop (large dataset, as mentioned), but I need the output to come out as:

   ans = 
    'record1'    [  2]    [  3]    'string4'    's5'  
    'rec2'       [ 22]    [ 33]    'str4'       'str5'
    'r3'         [222]    [333]    's4'         'st5'

[Edited for clarity (hopefully)]: Instead I get something like (CollectOutput is false):

ans =

    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}
    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}
    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}

or (CollectOutput is true):

ans =

    {1x1 cell}    [1x2 double]    {1x2 cell}
    {1x1 cell}    [1x2 double]    {1x2 cell}
    {1x1 cell}    [1x2 double]    {1x2 cell}

With CollectOutput set to false I would expect to see what I stated above. Instead I get a cell within a cell, which makes any indexing very difficult.

I hope this makes sense. I'm sure I'm missing something simple.

PS: I think textscan is inconsistent because when you read the example from an actual file (instead of a cellstr) it works exactly like I want the outcome to be without any for loop or indexing.

Regards, Phillip

2 Comments

per isakson on 27 May 2014

Why use textscan in the first place?

Phillip on 28 May 2014

Why not? I have tried a couple of things and it seemed to be best. Please elaborate if you think it's not, so that I can reply appropriately.


Answer 1

Cedric on 28 May 2014

Edited: Cedric on 28 May 2014

3 votes

Why do you get the CSV content as a cell array of rows? If you cannot change this, you could just merge/concatenate all these rows inserting line breaks, and use TEXTSCAN on the whole.

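 % Pair every row with a newline, then transpose so {:} expands row, newline, row, ...
 merger = [blockCSV, repmat({sprintf('\n')}, numel(blockCSV), 1)].' ;
 % Parse the concatenated text in a single call (one output cell per field).
 data   = textscan([merger{:}], '%s%f%f%s%s', 'Delimiter', ',') ;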

with that you get

 >> data
 data = 
    {3x1 cell}    [3x1 double]    [3x1 double]    {3x1 cell}    {3x1 cell}

which is most appropriate memory-wise and for further indexing, as numeric entries are stored in numeric arrays, and non-numeric entries in cell arrays.
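If the row-wise layout from the question is still needed, the column output can be converted afterwards. The following is only a sketch: it assumes `strjoin` is available (R2013a or later) as an alternative way to build the merged text, and the variable names `allText` and `rows` are made up for illustration.

```matlab
% Sample data from the question (3 rows; the real data set is much larger).
blockCSV = {'record1,2,3,string4,s5'; 'rec2,22,33,str4,str5'; 'r3,222,333,s4,st5'};

% Join all rows into one newline-separated char vector, then parse once.
% (Assumption: strjoin exists in the MATLAB release being used.)
allText = strjoin(blockCSV(:).', sprintf('\n'));
data    = textscan(allText, '%s%f%f%s%s', 'Delimiter', ',');

% Optional: rebuild a row-wise cell array like the one shown in the question.
% num2cell wraps each number in its own cell so that all five columns
% can be concatenated side by side.
rows = [data{1}, num2cell(data{2}), num2cell(data{3}), data{4}, data{5}];
```

Keeping `data` in its column form is likely cheaper for a large data set; `rows` is probably only worth building if later code really needs one cell per value.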

1 Comment

Phillip on 28 May 2014

Nice use of the "inconsistency". Should have thought of that. Speeds up the code nicely and now I can finally generalise the larger code. Thanks!


Answer 2

dpb on 27 May 2014

1 vote

This is only one of the many inconsistencies/quirks in textscan...

AFAIK about the best you can do is to post-process in another step, replacing each nested cell in the three string columns with the value it contains. With a for loop, that is:

>> for i=1:3,t(i,1)=t{i,1};t(i,4)=t{i,4};t(i,5)=t{i,5};end
>> t
t = 
  'record1'    [  2]    [  3]    'string4'    's5'  
  'rec2'       [ 22]    [ 33]    'str4'       'str5'
  'r3'         [222]    [333]    's4'         'st5'
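The snippet above assumes `t` already exists. As a rough sketch of the whole idea, `t` could be built as below (this construction is an assumption, the thread does not show it), and the unwrapping can also be written without an explicit loop using `cellfun`. The name `strCols` is made up for illustration.

```matlab
% Sample data from the question.
blockCSV = {'record1,2,3,string4,s5'; 'rec2,22,33,str4,str5'; 'r3,222,333,s4,st5'};

% Assumed construction of t: one textscan call per row, stored as one row of t.
n = numel(blockCSV);
t = cell(n, 5);
for i = 1:n
    % Each call returns a 1x5 cell; the %s fields come back as 1x1 cells.
    t(i, :) = textscan(blockCSV{i}, '%s%f%f%s%s', 'Delimiter', ',');
end

% Unwrap the nested 1x1 cells in the three string columns in one step.
strCols = [1 4 5];
t(:, strCols) = cellfun(@(c) c{1}, t(:, strCols), 'UniformOutput', false);
```

This still calls textscan once per row, so it will probably be slow on a very large cellstr.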

1 Comment

Phillip on 28 May 2014

Yes, it's a bit frustrating to be honest. The solution from Cedric below uses that inconsistency nicely to get it working though. Thanks for the response.
