Read csv with quotes and datetime stamp (readtable configuration)

6 ビュー (過去 30 日間)
AG15
AG15 2022 年 9 月 14 日
コメント済み: Walter Roberson 2022 年 9 月 15 日
Hi,
I have a csv file which contains the following format for each line. I understand how fscanf can be used for parsing data if the first digit is just a number: https://www.mathworks.com/matlabcentral/answers/1805220-read-csv-with-quotes-and-numbers. I tried setting this up for a datetime entry in the first field. However this gets tricky as I have to scan individual fields in the date and time if using fscanf.
Is there a more efficient way to parse and scan this using readtable where I can use the datatime variable format so that the date and time can be directly included in the post processing of the data?
Unfortunately I am not able to remove the quotes at this time due to the way in which the datafile is generated. Thanks a lot !
Data format:
14-Sep-2022 10:02:38.140000,"1034,19,AB"
14-Sep-2022 10:02:38.140000,"1034,19,AB"

採用された回答

Walter Roberson
Walter Roberson 2022 年 9 月 14 日
lines = {'14-Sep-2022 10:02:38.140000,"1034,19,AB"', '14-Sep-2022 10:02:38.140000,"1034,19,AB"'};
filename = tempname + ".txt";
writelines(lines, filename);
dbtype(filename)
1 14-Sep-2022 10:02:38.140000,"1034,19,AB" 2 14-Sep-2022 10:02:38.140000,"1034,19,AB"
fmt = '%{dd-MMM-uuuu}D %{hh:mm:ss.SSSSSS}T"%f%f%[^"]"';
fid = fopen(filename, 'r');
datacell = textscan(fid, fmt, 'whitespace', ',');
fclose(fid);
Time = datacell{1} + datacell{2};
Time.Format = 'dd-MMM-uuuu HH:mm:ss.SSSSSS';
T = table(Time, datacell{3:end})
T = 2×4 table
Time Var2 Var3 Var4 ___________________________ ____ ____ ______ 14-Sep-2022 10:02:38.140000 1034 19 {'AB'} 14-Sep-2022 10:02:38.140000 1034 19 {'AB'}
  2 件のコメント
AG15
AG15 2022 年 9 月 14 日
Thanks a lot for the help. This worked perfectly!
Walter Roberson
Walter Roberson 2022 年 9 月 15 日
There are two tricks here:
  1. it is difficult to get a %D format to include a space when parsing a datetime specification. It is possible by changing the Whitespace specification, but that tends to mess up other fields. So it is typically easier to read the date portion with %D and the time portion with %T and then add the two together
  2. When using datetime() the input format specification for 24 hour days is HH and where hh would be used for 12 hour days (with there typically being an 'a' field for AM/PM indicator). But when you are using duration() then the number of hours is purely a count, not a time of day -- for example 78:32 would be valid for 78 hours 32 minutes for duration() purposes. So there is a conflict between datetime() and duration() as to what to use for hours beyond 12 in a day -- you need HH for datetime but hh for duration. You cannot simply break a datetime() format up into date and time portions; you have to change the HH of datetime to hh for duration purposes.

サインインしてコメントする。

その他の回答 (1 件)

dpb
dpb 2022 年 9 月 14 日
readtable will import the file time field as a datetime variable automagically; you don't need to do anything except read it.
You'll then have to split the second column that will be imported as a cellstr array
I thought could add the " to the 'Whitespace' but that didn't work by itself -- would take a fair amount of effort to modify an import options object to account for the additional variables; think it's simpler to just import as is and split. Or, of course, you could read the file as text and delete the " and rewrite or parse from memory instead.
  1 件のコメント
AG15
AG15 2022 年 9 月 14 日
編集済み: AG15 2022 年 9 月 14 日
Thank, yes I was thinking oriniginally of replacing the "" by a comma in a text editor as an intermediate step to allow parsing the data, that I think would also allow the readtable to parse data

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeDates and Time についてさらに検索

タグ

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by