Why are the header lines not always properly detected by readtable?

pietro, 10 May 2018
Edited: Walter Roberson, 10 May 2018
Hi all,
I have many .csv files to import into MATLAB. The files are exported automatically from Scopus. With most of the downloaded files I have no problem, but for some, like the one you can download from this link, the headers are totally wrong: MATLAB skips the first line.
With other files, like this one, MATLAB returns the following error:
Error using readtable (line 198)
Reading failed at line 3. All lines of a text file must have the same
number of delimiters. Line 3 has 1164 delimiters, while preceding
lines have 365.
Note: readtable detected the following parameters:
'Delimiter', ' ', 'HeaderLines', 1, 'ReadVariableNames', false,
'Format',
'%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%f%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%f%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%f%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%f%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%f%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%f%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%f%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%f%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%f%q%q%q'
Here is the code I used for both files:
M = readtable('TestFile.csv', 'Encoding', 'UTF-8');
How can I solve both problems?
Thanks.
Best regards,
pietro

Accepted Answer

Guillaume, 10 May 2018
Edited: Guillaume, 10 May 2018
There is actually a weird character at the start of the file. It is a UTF-8 byte order mark (BOM), the bytes EF BB BF. The Unicode standard neither requires nor recommends a BOM for UTF-8.
Note that, despite the BOM, MATLAB R2018a imports the file correctly apart from the first header: it does not know how to interpret the BOM, so the Authors header becomes x__Authors. The rest is as it should be.
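Since only the first header is affected, one option is to repair that name after the import. A minimal sketch, assuming the only damage is an x__ prefix on the first variable name; the table below is a hypothetical stand-in for what readtable returned, not data from the actual Scopus file:

```matlab
% Hypothetical stand-in for the table returned by readtable on a file
% that starts with a BOM: the first variable name carries an x__ prefix.
t = table({'Smith J.'; 'Doe A.'}, [2017; 2018], ...
    'VariableNames', {'x__Authors', 'Year'});

% Strip the prefix from the first variable name only, so that other
% names that legitimately start with x are left alone.
firstName = t.Properties.VariableNames{1};
t.Properties.VariableNames{1} = regexprep(firstName, '^x_+', '');

disp(t.Properties.VariableNames)
```

This leaves the file on disk untouched, so it has to be repeated after every import.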
Edit: As far as I know, there is nothing you can do at the readtable level, but you could always check the files beforehand and remove the BOM:
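files = {....}; % list of files
folder = 'C:\somewhere';
for fileidx = 1:numel(files)
    filepath = fullfile(folder, files{fileidx});
    % read the whole file as raw bytes
    fid = fopen(filepath);
    content = fread(fid);
    fclose(fid);
    % a UTF-8 BOM is the three bytes EF BB BF (239 187 191)
    if numel(content) >= 3 && isequal(content(1:3), [239; 187; 191])
        % rewrite the file without its first three bytes
        fid = fopen(filepath, 'w');
        fwrite(fid, content(4:end));
        fclose(fid);
    end
end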
Comments
Guillaume, 10 May 2018
The problem with your second file is that MATLAB misdetects the delimiter. I've not stepped through the code to find out why, but it's easily fixed by specifying the delimiter explicitly:
t = readtable('testfile2.csv', 'Delimiter', ',')
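To see what will be used before importing, the import options can be detected and inspected first. A rough sketch, assuming a release that provides detectImportOptions (R2016b or later); the sample file and its contents are made up for illustration and only mimic the shape of a Scopus export:

```matlab
% Hypothetical sample file, written to a temporary location.
csvFile = [tempname '.csv'];
fid = fopen(csvFile, 'w');
fprintf(fid, 'Authors,Title,Year\n');
fprintf(fid, '"Smith J., Doe A.","A title, with a comma",2017\n');
fprintf(fid, '"Roe B.","Another title",2018\n');
fclose(fid);

% Detect the import options, but force the delimiter instead of
% letting it be guessed from the file contents.
opts = detectImportOptions(csvFile, 'Delimiter', ',');
disp(opts.Delimiter)        % delimiter that readtable will use
disp(opts.VariableNames)    % header names as detected

% Import with the inspected options, then clean up the sample file.
t = readtable(csvFile, opts);
delete(csvFile);
```

Looking at opts before calling readtable makes a wrong guess visible without having to decode the long 'Format' string in the error message.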
