MATLAB Answers

Read and separate CSV data

166 views (last 30 days)
Dan Howard on 8 Feb 2016
Commented: Walter Roberson on 2 Mar 2016
Hello everyone,
I'm back again with a simple question which I can't fathom out.
I have a CSV file and I'm writing a script to:
  1. open and read the file
  2. separate the data, i.e. have tail in column A, type_ref in column B, date in column C, etc.
  3. plot amp against date
% [data, txtdata] = xlsread('Radial.csv');
nrows = size(data,1);  % number of data rows (avoid reusing "length", which shadows the built-in)
rowoffset = 11;        % offset from the first CSV row of numerical data
I hope this makes sense as I don't know how else to explain this:
The first line (currently commented out) puts the CSV into a [64x1] cell but does not separate the data.
eg.
'tail,type_ref,datetime,seq,seq_no,sys_ref,ref,reg_ref,res_seq,ci_ref,chan_ref,component_ref,passband_ref,amp'
'FPG123,FLYDATA,2013-12-16 11:21:51.110,3,3,FLYDATA Engines,FLYDATA 170 No 1 Engine (9),FLYDATA Engine Ground,7,FLYDATA SERIAL1,FLYDATA No 1 Engine Radial,FLYDATA No 1 Engine Shaft ,FLYDATA No Freq Passband,0.72825962305069'
The above example shows all the headings in one row and column, while all the other data follows underneath from row two onwards.
I have been trying to get the data to show in separate cells, i.e.:
tail type_ref datetime seq etc. (14 columns altogether)
FPG123 FLYDATA 2013-12-16 11:21:51.110 3
FPG126 FLYDATA 2014-12-17 11:28:32.101 3
FPG128 FLYDATA 2015-12-02 11:28:47.100 3
I have tried using the delimiter and calling each heading, as in:
% [tail, type_ref, datetime, seq] = csvimport('Radial.csv', 'columns', [1:4], 'delimiter', ','); % only for 4 headings and their data
or to show all the data on one sheet. I'd really appreciate some help on this. Many thanks.

5 Comments

Show 2 older comments
Image Analyst on 8 Feb 2016
Can you try to avoid having the whole line enclosed in single quotes? If so, then try readtable(). If not, then perhaps textscan() with the proper format specifier string. Attach the file so people might try some things.
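For reference, a minimal sketch of both suggestions, assuming a well-formed Radial.csv with the 14 columns listed in the question (13 text fields followed by the numeric amp); file and variable names here are only placeholders:
% Option 1: readtable -- simplest if the file really is a plain CSV
T = readtable('Radial.csv');                      % one table variable per column
% Option 2: textscan with an explicit format specifier
fid = fopen('Radial.csv', 'rt');
hdr = fgetl(fid);                                 % skip the header line
C   = textscan(fid, [repmat('%s',1,13) '%f'], 'Delimiter', ',');
fclose(fid);
amp = C{14};                                      % numeric amp column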
Stephen Cobeldick on 9 Feb 2016
Here is an almost identical question, where each line was written as one string between quotation marks (and the user also thought it was a CSV file):
What tool creates these terrible file formats? Who writes something that can't even get a CSV file right, and is so buggy that it places the whole line within one set of quotation marks (and ignores quoting the fields that really are strings)? Very puzzling.
dpb on 9 Feb 2016
I don't know, but my first guess would be Excel if somebody pasted the whole thing record-by-record into a column without using the import tool to split columns...I think that's what it would then do on output to csv as each cell would have to be a string.
Well, let's just try...hmmm, ok, that's what happens excepting--this version of Excel, at least, used double quotes instead of single ones as the delimiter. Not sure if that's a user-settable option or dependent on locale, maybe.

Sign in to comment.

Answers (2)

dpb on 8 Feb 2016
Edited: dpb on 8 Feb 2016
Your basic problem is that the file was created as a text file of multiple strings which include the comma delimiters within them; it is not, strictly speaking, a CSV file.
But, you can deal with it fairly simply...
>> fid=fopen('dan.csv'); % open a copy of your two lines
>> hdr=textscan(fid,'%s',14,'delimiter',',','whitespace',[' \b\t' ''''],'collectoutput',1)
hdr =
{14x1 cell}
>> hdr{:}
ans =
'tail'
'type_ref'
'datetime'
...[elided middle values for brevity]...
'passband_ref'
'amp'
>> dat=textscan(fid,[repmat('%s',1,13) '%f'],'delimiter',',','whitespace',[' \b\t' ''''],'collectoutput',1)
dat =
{1x13 cell} [7.2826e-01]
>> fid=fclose(fid);
The only tricky part is adding the single quote mark to the whitespace list; otherwise you'll end up with an extra tick on the first and last header names, and potentially the last floating-point value will error (although I didn't test it; I'm sure it'll work for the first record, but whether the internal record position will be OK for subsequent lines I don't know for certain. The sure way is to ensure the quote is skipped, as above).
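A possible extension of the above to read every record and plot amp against the date (a sketch only, not run against the real file; it assumes the same 14-column layout and stray single quotes shown in the question):
fid = fopen('dan.csv');
hdr = textscan(fid, '%s', 14, 'delimiter', ',', ...
               'whitespace', [' \b\t' ''''], 'collectoutput', 1);
dat = textscan(fid, [repmat('%s',1,13) '%f'], 'delimiter', ',', ...
               'whitespace', [' \b\t' ''''], 'collectoutput', 1);  % reads to end of file
fid = fclose(fid);
txt = dat{1};                                   % N-by-13 cell array of strings
amp = dat{2};                                   % N-by-1 double, the amp column
t   = datenum(txt(:,3), 'yyyy-mm-dd HH:MM:SS.FFF');   % datetime column as serial date numbers
plot(t, amp, 'o-'), datetick('x'), ylabel('amp')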

16 Comments

Show 13 older comments
dpb on 15 Feb 2016
Well, post the code, any errors, and the explanation of the uncertainties outlined above, starting with the "real" answer regarding the presence or absence of the spurious quote marks in the input file...
Dan Howard on 18 Feb 2016
Ok, thanks dpb.
I think I have a better understanding of what you have said.
Using your method below does exactly what you said: it opens and reads the file and writes a new one:
f = textread('fixedtest.csv', '%s', 'delimiter', '\n'); % cell string array, one line per cell
f = strrep(f, '''', '');          % remove the single quotes
fid = fopen('newfile.csv', 'wt'); % make a new file
for i = 1:size(f,1)
    fprintf(fid, '%s\n', char(f(i)));
end
fid = fclose(fid);
Can I ask: when I get the errors below, is this because the file (fixedtest.csv) does not have quote marks around each string and numeric value?
1. When I run the code above I get an error stating:
Error using strrep
Cell elements must be character arrays.
Error in filerun (line 18)
f=strrep(f,'''','');
2. If I comment that line out:
%f=strrep(f,'''','');
it still gives an error, although newfile.csv is still created:
Error using char
Cell elements must be character arrays.
Error in filerun (line 21)
fprintf(fid,'%s\n',char(f(i)));
So I thought I would try using readtable with newfile.csv:
>> readtable('newfile.csv')
ans =
empty 0-by-0 table
I have included each step in what I have done.
Do you know why this is returning an empty file?
dpb on 18 Feb 2016
See Walter Roberson's Answer -- he recognized a problem I didn't, as I had only opened the above file in Excel to see whether or not it had the quote marks shown in your very first post. That encoding keeps MATLAB's normal i/o functions from working correctly.
I can't answer why you're getting the strrep error; it works here w/o error, but when I try it on the actual file, textread returns only a few characters instead of the whole file.
Looking back, I note that you used xlsread originally and successfully read the text; the problem was that the data weren't separated by cell in Excel. So xlsread can read the file as it is stored, and you can either use Walter's converter script/function or stick with an Excel solution.
Given that you have an Excel spreadsheet and your apparent level of familiarity with the tools (assuming you do have Excel and not just the file), the simplest route is probably to use Excel's import tool when you open the file, convert the single cell per row into a regular spreadsheet layout, save that, and then read the result as a "real" .xls file. Then xlsread will return the values separated by column, ready to be imported into the table class.
Or, once you've done the conversion (presuming, as noted, that it's not something about your OS/locale that is creating the specific file encoding), if you save the spreadsheet as a .csv file where each column is a cell, then the direct readtable solution should work.
Either way, you've got to work around the MATLAB limitation on the file encoding, as Walter discovered.
I'm presuming that the resulting file will not also be UTF-16 Little Endian; if that encoding is inherent in your system for some reason, owing to a locale setting or some such, you're going to have much difficulty with MATLAB, it would seem.
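If you go that route (one value per cell, re-saved as a plain CSV in the default encoding), the direct read might look like the sketch below; 'fixed_radial.csv' is only a hypothetical name for the re-saved file, and the datetime column is assumed to come back as text in R2015b:
T = readtable('fixed_radial.csv');                    % header row becomes the variable names
t = datenum(T.datetime, 'yyyy-mm-dd HH:MM:SS.FFF');   % convert the date strings
plot(t, T.amp, 'o-')
datetick('x'), xlabel('date'), ylabel('amp')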

Sign in to comment.


Walter Roberson on 15 Feb 2016
Your problem is that your file (at least the fixedtest.csv that you posted) is UTF-16 Little Endian, which is an encoding format that MATLAB does not support, so it needs to be translated. I wrote a routine that detects the UTF encoding and returns a routine for converting the bytes to characters; I have attached the source.
Using that routine:
input_file = 'fixedtest.csv';
[file_encoding, bytes_per_char, BOM_size, bytes2char] = detect_UTF_encoding(input_file);
if isempty(file_encoding)
    error('No usable input file');
end
fid = fopen(input_file, 'rt');
fread(fid, [1, BOM_size], '*uint8');    %skip the Byte Order Mark
thisbuffer = fgets(fid);
extra = mod(length(thisbuffer), bytes_per_char);
if extra ~= 0
    %in little-endian modes, the newline is found in the first byte and the 0 bytes after it need to be read
    thisbuffer = [thisbuffer, fread(fid, [1, bytes_per_char - extra], '*uint8')];
end
thisline = bytes2char(thisbuffer);
data_cell = textscan(thisline, '%s', 'delimiter', ','); %will ignore the end of lines
header_fields = reshape(data_cell{1}, 1, []);
num_field = length(header_fields);
thisbuffer = fread(fid, [1 inf], '*uint8');
fclose(fid);                            %done with the file
extra = mod(length(thisbuffer), bytes_per_char);
if extra ~= 0
    thisbuffer = [thisbuffer, zeros(1, bytes_per_char - extra, 'uint8')];
end
thisline = bytes2char(thisbuffer);
fmt = repmat('%s', 1, num_field);
data_cell = textscan(thisline, fmt, 'delimiter', ',');
data_fields_text = horzcat(data_cell{:});
data_fields = data_fields_text;
data_fields(:,3)  = num2cell(datenum(data_fields_text(:,3)));     %or use datetime() instead of datenum
data_fields(:,4)  = num2cell(str2double(data_fields_text(:,4)));  %seq
data_fields(:,5)  = num2cell(str2double(data_fields_text(:,5)));  %seq_no
data_fields(:,9)  = num2cell(str2double(data_fields_text(:,9)));  %res_seq
data_fields(:,14) = num2cell(str2double(data_fields_text(:,14))); %amp (column 14 per the header row)
data_table = cell2table(data_fields, 'VariableNames', header_fields);
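From there, step 3 of the original question (plotting amp against the date) might look like the following sketch; it assumes the variable names 'datetime' and 'amp' from the header row and the datenum/str2double conversions above:
plot(data_table.datetime, data_table.amp, 'o-')   % datenums on x, numeric amp on y
datetick('x')                                     % label the datenum axis as dates
xlabel('date'), ylabel('amp')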

3 Comments

Dan Howard on 17 Feb 2016
Good morning Walter, many thanks for your help on this. I need to ask a couple of questions.
Just to make sure: I am running MATLAB R2015b.
I have run your routine to detect UTF encoding and need to make sure I am running it correctly (please excuse my slow brain).
The first line of the source code:
function [encoding, bytes_per_char, BOM_size, byte2char] = detect_UTF_encoding(FILENAME)
With the routine, when you press the Run button, do you enter the FILENAME 'fixedtest.csv'?
Question 1
I have tried this and this error comes up:
>> Unicodedecoder('fixedtest.csv')
Error: File: Unicodedecoder.m Line: 69 Column: 1
This statement is not inside any function.
(It follows the END that terminates the definition of the function "detect_UTF_encoding".)
The reason I ask is that I have also tried the filename that 'fixedtest.csv' is currently stored under, and this spits out:
Undefined function or variable.
Question 2
With the source code in place, do I include and run the second routine you included, i.e.
input_file = 'fixedtest.csv';
[file_encoding, bytes_per_char, BOM_size, bytes2char] = detect_UTF_encoding(input_file);
if isempty(file_encoding)
error('No usable input file');
end
etc... etc.
Does this code run after the first source file?
The reason I ask such a silly question is that
input_file
comes up with:
Parse error at 'input_file': usage might be invalid MATLAB syntax.
When I run the whole source the following error comes up:
>> Unicodedecoder('fixedtest.csv')
Error: File: Unicodedecoder.m Line: 70 Column: 1
This statement is not inside any function.
(It follows the END that terminates the definition of the function "Unicodedecoder".)
I hope I have explained the issues I am having; again, thank you for your help.
Walter Roberson on 29 Feb 2016
"Unicodedecoder" is not my source code. My detect_UTF_encoding code only has 59 lines and the script I posted only has 41 lines. If you stored that script into Unicodedecoder.m then you must have added a "function" line to it as what I posted was a script that would not take arguments.
If you did create such a function and if you put the detect_UTF_encoding into the same file instead of its own file, then you might not have created your function properly. In particular my detect_UTF_encoding.m uses the newer syntax in which there is an "end" matching each "function" line; if you put that code onto the bottom of a "function" you created which did not use that syntax, then you could get that error about the end being out of place (but more correct would have been an error about attempting to mix static functions with legacy functions.) Just leave it in its own file and you will not have to worry about whether it uses that syntax or not.
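For clarity, the arrangement described here might look like the sketch below (the script file name is hypothetical): detect_UTF_encoding stays in its own file, and the conversion code posted in the Answer is saved as a plain script with no "function" line, which simply calls it.
% detect_UTF_encoding.m -- the function in its own file, closed by its own "end":
%   function [encoding, bytes_per_char, BOM_size, byte2char] = detect_UTF_encoding(FILENAME)
%       ...body as attached...
%   end

% convert_fixedtest.m -- a plain script (hypothetical name); run it by typing its name:
input_file = 'fixedtest.csv';
[file_encoding, bytes_per_char, BOM_size, bytes2char] = detect_UTF_encoding(input_file);
if isempty(file_encoding)
    error('No usable input file');
end
% ...continue with the rest of the conversion code posted in the Answer above...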

Sign in to comment.

Sign in to answer this question.

