readtable behaves different after editing CSV?

12 ビュー (過去 30 日間)
iwantrugs
iwantrugs 2021 年 12 月 8 日
編集済み: Stephen23 2021 年 12 月 24 日
I have one CSV file output from some software. The original CSV file (one attached without 'edited' appended) has some header lines which I don't care about and a column of times which I do care about.
The column of times in the original file is incorrect since some stuff happened during the acquisition. I have edited the said CSV file to fix the errors. When I use the code below on the edited file I get the error at the final line saying Input data must be a real, numeric array. If I run the same code on the unedited CSV file it runs fine and behaves as expected; outputting a vector of the original times in seconds, which is what I want. I'm really unsure why this is happening - I've not changed anything else in the CSV files aside from adding a few times, but the number format of all the cells afaik is the same in both CSV files.
ons_csv = readtable('S17.csv');
ons_times = ons_csv(:,1);
ons_times = table2array(ons_times);
ons_seconds = seconds(ons_times);
  1 件のコメント
Stephen23
Stephen23 2021 年 12 月 24 日
編集済み: Stephen23 2021 年 12 月 24 日
"I've not changed anything else in the CSV files aside from adding a few times, but the number format of all the cells afaik is the same in both CSV files."
No, it isn't. The problem is that you used Excel to alter a CSV text file. Every time you open and then save a CSV file with Excel, it will completely re-write the file using what Excel thinks is a CSV format. There is nothing preventing Excel from changing the file format or data formats, e.g. delimiters, dates, string quotations, numbers, etc. Even if you think you only changed some small part of the CSV file, Excel makes zero effort to keep the "rest" of the file unchanged.
We learn two lessons from this:
  • If you want to compare CSV files then use a reputable file comparison tool (MATLAB has one inbuilt).
  • Do not use Excel if a particular file format and/or data needs to be preserved.
CSV files are just text: if you only need to make a few changes then you should use a reputable text editor:

サインインしてコメントする。

採用された回答

Cris LaPierre
Cris LaPierre 2021 年 12 月 24 日
編集済み: Cris LaPierre 2021 年 12 月 24 日
I think the issue is more obvious if you open the csv file in a text editor instead of Excel.
The original file does not have the same number of commas in the header rows. Readtable is able to therefore detect where the data starts, and skip the header.
In the edited file, Excel adds commas so every row has the same number of columns. Readtable is not able to automatically detect the header anymore, so starts importing the data from row 1. This is leading to your error.
The fix is to tell MATLAB the number of header lines when using the edited file.
One more thing - Excel also has changed the formatting of your time (drops the hours - again, check in a text editor). This means MATLAB is no longer able to autodetect the datatype, so that column is being read in as a character array. You will need to manually correct the data type and set the input format so that it is imported correctly.
Once you've done that, there is no need to convert your table to an array to get the time values. See the Access Data in a Table documentation page.
Here's how I might import your file.
opts2 = detectImportOptions("S17edit.csv",'NumHeaderLines',8);
opts2 = setvartype(opts2,"StartTime","duration");
opts2 = setvaropts(opts2,"StartTime",'InputFormat','mm:ss.S');
ons_csv = readtable('S17edit.csv',opts2)
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
ons_csv = 36×5 table
StartTime ProtocolType Pre_Rest Task Pos_Rest _________ ____________ ________ ____ ________ 01:57.2 2 45 45 0 02:27.2 2 45 45 0 02:57.3 2 45 45 0 03:27.5 2 45 45 0 05:37.0 2 45 45 0 06:07.2 2 45 45 0 06:37.6 2 45 45 0 07:07.9 2 45 45 0 08:41.1 2 45 45 0 09:11.3 2 45 45 0 09:41.5 2 45 45 0 10:11.6 2 45 45 0 11:37.8 2 45 45 0 12:08.2 2 45 45 0 12:38.5 2 45 45 0 13:08.8 2 45 45 0
% Obtain time in seconds
ons_seconds = seconds(ons_csv.StartTime)
ons_seconds = 36×1
117.2000 147.2000 177.3000 207.5000 337.0000 367.2000 397.6000 427.9000 521.1000 551.3000

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeSpreadsheets についてさらに検索

製品


リリース

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by