How to read in a large mixed csv data file

3 ビュー (過去 30 日間)
Noah
Noah 2015 年 6 月 30 日
コメント済み: Walter Roberson 2015 年 7 月 1 日
Hi,
Please excuse me if this is difficult to understand - I am new to Matlab and coding in general.
I’m trying to read in a large mixed data file that I can then manipulate. The file is 53 columns by about 8.8 million observations (rows) and is in csv format separated by commas. It is arranged as follows:
Numbers only: Columns 1-28, 30-34, 36, 50, 53
All other columns are text only (e.g. Johannesburg) or mixed text and numbers (e.g. E44). Some include spaces (e.g.Cape Town) and others symbols like slashes (e.g. A00-A09).
It is not clear to me if the first row is headings or not.
I’m assuming I need to use either readtable or textscan, but so far have been unsuccessful with the code.
Thanks for the help! Noah
  2 件のコメント
Image Analyst
Image Analyst 2015 年 7 月 1 日
Crop out just a few rows, say 20, and upload the cropped file here with the paper clip icon. I think readtable should work, unless it got confused at some columns that have both numbers only in them and letters only in them and perhaps a mix. Post your readtable() call.
per isakson
per isakson 2015 年 7 月 1 日
編集済み: per isakson 2015 年 7 月 1 日
There are many alternatives. Did you try A = importdata(filename)?

サインインしてコメントする。

回答 (1 件)

Walter Roberson
Walter Roberson 2015 年 7 月 1 日
fmts = repmat({'%^[,]'}, 1, 53); % %^[,] includes spaces but %s ends at spaces
fmts([1:28, 30:34, 36, 50, 53]) = {'%g'};
fmt = [fmts{:}];
headers = 1; %maybe 0?
fid = fopen('YouFile.csv','rt');
datacell = textscan(fids, fmt, 'Headerlines', headers, 'Delimiter', ',');
fclose(fid);
and now datacell{1} is the first column, datacell{2} is the second column, and so on. The text columns will be a cell array of strings, one per row.
  2 件のコメント
Noah
Noah 2015 年 7 月 1 日
Hi Walter,
Thanks for your time on this.
I had success as far as the penultimate command, (datacell = textscan...) At that point I got the error:
Error using textscan
Badly formed format string
If you have suggestions that would be great, but otherwise will keep trying.
Thanks again.
Walter Roberson
Walter Roberson 2015 年 7 月 1 日
Sorry, I had two errors. The correction is
fmts = repmat({'%[^,]'}, 1, 53); % %[^,] includes spaces but %s ends at spaces
fmts([1:28, 30:34, 36, 50, 53]) = {'%f'};

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeLarge Files and Big Data についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by