MATLAB Answers

Read and seperate CSV data

179 ビュー (過去 30 日間)
Dan Howard
Dan Howard 2016 年 2 月 8 日
コメント済み: Noam Greenboim 2020 年 5 月 21 日
Hello everyone,
Im back again with a simple question which I cant fathom out.
Have a CSV file that Im writing a script to:
  1. open and read the file
  2. seperate the data i.e have tail in column A, type_ref in column B, date in column C etc
  3. plot against date and amp
% [data, txtdata] = xlsread('Radial.csv');
length = size(data);
length = length(1,1);
rowoffset = 11 % from first csv row of numerical data
I hope this makes sense as I don't know how else to explain this:
The first current line puts the csv into a [64x1] cell but does not seperate the data.
'FPG123,FLYDATA,2013-12-16 11:21:51.110,3,3,FLYDATA Engines,FLYDATA 170 No 1 Engine (9),FLYDATA Engine Ground,7,FLYDATA SERIAL1,FLYDATA No 1 Engine Radial,FLYDATA No 1 Engine Shaft ,FLYDATA No Freq Passband,0.72825962305069'
The above example shows all the headings in one row and column whilse all the other data follows under from row two onwards.
I have been trying to get the data to show in either seperate cells ie:
tail type_ref datetime seq etc......... ( 14 columns altogether)
FPG123 FLYDATA 2013-12-16 11:21:51.110 3
FPG126 FLYDATA 2014-12-17 11:28:32.101 3
FPG128 FLYDATA 2015-12-02 11:28:47.100 3
I have tried using the delimiter by calling each heading as in :
% [tail type_ref datetime seq] = csvimport('Radial.csv', columns, [1:4],'delimiter',',') % only for 4 headings and data
or show all the data on one sheet. Really could appreciate some help on this. Many thanks.
  5 件のコメント
dpb 2016 年 2 月 9 日
I don't know, but my first guess would be Excel if somebody pasted the whole thing record-by-record into a column without using the import tool to split columns...I think that's what it would then do on output to csv as each cell would have to be a string.
Well, let's just try...hmmm, ok, that's what happens excepting--this version of Excel, at least, used double quotes instead of single ones as the delimiter. Not sure if that's a user-settable option or dependent on locale, maybe.


回答 (2 件)

Walter Roberson
Walter Roberson 2016 年 2 月 15 日
You have a problem that your file (at least the fixedtest.csv that you posted) is UTF-16 Little Endian, which is an encoding format that MATLAB does not support. It needs to be translated. I wrote a routine that detects UTF encoding and offers a conversion routine from bytes; I have attached the source.
Using that routine:
input_file = 'fixedtest.csv';
[file_encoding, bytes_per_char, BOM_size, bytes2char] = detect_UTF_encoding(input_file);
if isempty(file_encoding)
error('No usable input file');
fid = fopen(input_file,'rt');
fread(fid, [1, BOM_size], '*uint8'); %skip the Byte Order Mark
thisbuffer = fgets(fid);
extra = mod(length(thisbuffer), bytes_per_char);
if extra ~= 0
%in little-endian modes, newline would be found in first byte and the 0's after need to be read
thisbuffer = [thisbuffer, fread(fid, [1, bytes_per_char - extra], '*uint8')];
thisline = bytes2char(thisbuffer);
data_cell = textscan(thisline, '%s', 'delimiter', ','); %will ignore the end of lines
header_fields = reshape(data_cell{1}, 1, []);
num_field = length(header_fields);
thisbuffer = fread(fid, [1 inf], '*uint8');
extra = mod(length(thisbuffer), bytes_per_char);
if extra ~= 0
thisbuffer = [thisbuffer, zeros(1, bytes_per_char - extra, 'uint8')];
thisline = bytes2char(thisbuffer);
fmt = repmat('%s', 1, num_field);
data_cell = textscan(thisline, fmt, 'delimiter', ',');
data_fields_text = horzcat(data_cell{:});
data_fields = data_fields_text;
data_fields(:,3) = num2cell(datenum(data_fields_text(:,3))); %or use datetime() instead of datenum
data_fields(:,4) = num2cell(str2double(data_fields_text(:,4)));
data_fields(:,5) = num2cell(str2double(data_fields_text(:,5)));
data_fields(:,9) = num2cell(str2double(data_fields_text(:,9)));
data_fields(:,13) = num2cell(str2double(data_fields_text(:,13)));
data_table = cell2table(data_fields, 'VariableNames', header_fields);
  4 件のコメント
Noam Greenboim
Noam Greenboim 2020 年 5 月 21 日
Here is a simple way I made to detect and write BOM to unicode files:
CurrentBOM = BOM(UnicodeFileName, Encoding, FileToWrite)


dpb 2016 年 2 月 8 日
編集済み: dpb 2016 年 2 月 8 日
Your basic problem is the file was created as a text file of multiple strings which include the comma delimiters within them, not as a csv file, strictly speaking.
But, you can deal with it fairly simply...
>> fid=fopen('dan.csv'); % open a copy of your two lines
>> hdr=textscan(fid,'%s',14,'delimiter',',','whitespace',[' \b\t' ''''],'collectoutput',1)
hdr =
{14x1 cell}
>> hdr{:}
ans =
...[elided middle values for brevity]...
>> dat=textscan(fid,[repmat('%s',1,13) '%f'],'delimiter',',','whitespace',[' \b\t' ''''],'collectoutput',1)
dat =
{1x13 cell} [7.2826e-01]
>> fid=fclose(fid);
Only tricky part is the addition of the single quote mark into the whitespace list; otherwise you'll end up with an extra tick for the first and last header names and potentially the last floating point value will error (altho I didn't test it; I'm sure it'll work for the first record, whether the internal record position will be ok to do subsequent lines or not I don't know for certain. The sure way is to ensure it's skipped as above.)
  16 件のコメント
dpb 2016 年 2 月 18 日
See Walter Roberson's Answer -- he recognized a problem I didn't as I just opened the above file in Excel to see whether it did or did not have the quote marks as given in your very first post. That encoding keeps the Matlab normal i/o functions from functioning correctly.
I can't answer why you're getting the strrep error; it works here w/o error but on trying on the actual file textread only returns a few characters instead of the whole file.
Looking back, I note that you used xlsread originally and successfully read the text; the problem is the data weren't separated by cell in Excel. Thus xlsread can read the file as it is stored so you can either use Walter's converter script/function or stick with an Excel solution.
Probably still the simplest since you have an Excel spreadsheet and your apparent level of familiarity with the tools (assuming you do have Excel and not just the file) is to use the import tool in it when you open the file with it and convert the single cell per row into a regular spreadsheet format and save that then read that file as a "real" .xls file. Then xlsread will return the values separated by column to be imported into the table class.
Or, once you've done the conversion, presuming as noted that it's not something about your OS/locale creating the specific file encoding, if you save the spreadsheet as a .csv file where each column is a cell, then the direct readtable solution should work.
Either way, you've got to work around the Matlab limitation for the file encoding as Walter discovered.
I'm presuming that the resulting file will not also be UTF-16 Little Endian here...if that's inherent in your system for some reason owing to a locale setting, or somesuch, you're going to have much difficulty with Matlab it would seem.


Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by