フィルターのクリア

Converting numeric strings in cell to number for very large dataset

1 回表示 (過去 30 日間)
Peter Mills
Peter Mills 2018 年 1 月 10 日
コメント済み: Guillaume 2018 年 1 月 12 日
I have read this post which does something similar: https://uk.mathworks.com/matlabcentral/answers/375506-coverting-numeric-strings-in-cell-to-numbers
However I wanted to know if there is a more efficient way to do this as my data-set is very large. My dataset contains both text and numbers I what to convert numbers only to number format.
load('Data.mat')
tic
Qnum=str2double(Q);
toc
This code takes about 124 seconds to run
I have tried preallocating Qnum but the run time is about the same:
Qnum=zeros(length(Q),1);
  6 件のコメント
Peter Mills
Peter Mills 2018 年 1 月 11 日
All the required files are attached in the zip including an extract of the data. I have tested that the files in the zip run the same way with this extract of the data-set as the full data-set.
Peter Mills
Peter Mills 2018 年 1 月 11 日
For an explanation of my MATLAB code formatSpec = '%D %f %f %s %s %*s%*s% ... in line 26 see the Excel file attached where this code is a concatenation of A2:EC2. This file is not required to run the code. All files required to run the code are in the zip in my last comment.

サインインしてコメントする。

採用された回答

Guillaume
Guillaume 2018 年 1 月 11 日
The simplest way to read the example file, assuming you're on R2013b or later:
data = readtable('midas_marine-obs-lon-band-f_200901-200912.txt');
That's it! Numbers are read as numbers, text as text, dates as datetime. readtable figures it all out for you.
If you want to name the columns according to the other file, it's just as easy:
headers = readtable('MO_Column_Headers.txt');
data.Properties.VariableNames = headers.Properties.VariableNames
All done, only takes a few seconds to run.
  2 件のコメント
Peter Mills
Peter Mills 2018 年 1 月 12 日
Thank you for some very simple code, would have saved me about half the week if I had written my code like that in the first place. This code takes about 150 sec to run for a file of data size 935722x133. Is it possible to not spend run-time importing the columns I don't need?
Guillaume
Guillaume 2018 年 1 月 12 日
"Is it possible to not spend run-time importing the columns I don't need?"
Possibly. readtable is extremely customisable. You have two options:
  • Use the Format option of readtable and use %*fieldtype for the columns you want to ignore just as you would with textscan. The downside is that you now have to figure out the format.
  • Use detectimportoptions on the file and then edit the SelectedVariableNames property of the options, then pass that to readtable. I've not looked at the details of how detectimportoptions work. It may read the whole file which would not save you any time.

サインインしてコメントする。

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeData Type Conversion についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by