formatstring error when using tabularTextDatastore

Hello all,
I am trying to read a large CSV file (~500GB) using tabularTextDatastore.
My code is a single line:
ds = tabularTextDatastore('filename.csv')
The error I'm getting is:
Error using tabularTextDatastore (line 147)
Output argument "formatString" (and maybe others) not assigned during call to "matlab.io.internal.text.determineFormatString>convertDatatypeToFormatString".

I couldn't find any material on this error online. Can someone please advise?

thanks!

7 Comments

Jeremy Hughes on 17 Nov 2022
If you can share a sample version of the file which reproduces the error, that would help. Also what version are you using?
sani on 22 Nov 2022
Hi,
I'm using 2020a
It is a bit problematic to send a sample since I couldn't split the files.
Generally there are 2 headers: TIMETAG; ENERGY
thanks
sani on 8 Dec 2022
Moved: Stephen23 on 8 Dec 2022
anyone?
Jeremy Hughes on 8 Dec 2022
Moved: Stephen23 on 8 Dec 2022
Without a sample file, anyone who might know how to help cannot reproduce the issue; this is probably why no one has responded. You'll have better luck getting help by adding an example file along with the code sample which reproduces the issue.
The code is there, but the file is an integral part of the reproduction. It doesn't have to be the exact file as long as you see the same issue. Try copying the first few lines of the file and see if that file still reproduces the issue.
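One possible way to copy the first few lines without opening the whole file in an editor is sketched below. The function name copyHeadLines and its arguments are hypothetical, not part of MATLAB; it reads line by line, so only one line is held in memory at a time.

```matlab
function nCopied = copyHeadLines(srcFile, dstFile, nLines)
% copyHeadLines  Copy the first nLines lines of srcFile into dstFile.
% (Hypothetical helper for building a small sample of a very large file.)
    fin = fopen(srcFile, 'r');
    assert(fin ~= -1, 'Could not open %s for reading.', srcFile);
    closeIn = onCleanup(@() fclose(fin));    % close input on exit or error

    fout = fopen(dstFile, 'w');
    assert(fout ~= -1, 'Could not open %s for writing.', dstFile);
    closeOut = onCleanup(@() fclose(fout));  % close output on exit or error

    nCopied = 0;
    while nCopied < nLines
        line = fgets(fin);          % keeps the line terminator; -1 at end of file
        if ~ischar(line)
            break
        end
        fprintf(fout, '%s', line);  % write the line back unchanged
        nCopied = nCopied + 1;
    end
end
```

For example, copyHeadLines('filename.csv', 'sample_head.csv', 1000) would write the first 1000 lines to a new file (the output name and line count here are arbitrary). If the small file no longer triggers the error, the problem may depend on content further into the file.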
I'd also suggest trying this same reproduction on another installation, on another machine. Sometimes these kinds of errors coming from MathWorks-written code are due to files that are missing which should be available, or additional files on the path which are shadowing standard MathWorks functions. I'm not sure that's the case here, but I can't really try to reproduce it.
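A quick check for shadowing on the current installation might look like the sketch below. It only inspects tabularTextDatastore itself and assumes that a definition located outside matlabroot is suspicious; it does not detect missing files.

```matlab
% Release string of the running MATLAB, useful when comparing against a bug report.
disp(version)

% Every definition of tabularTextDatastore visible on the path, as a cell array.
hits = which('tabularTextDatastore', '-all');
disp(hits)

% Assumption: anything found outside the MATLAB installation folder may be
% shadowing the shipped function.
outside = hits(~startsWith(hits, matlabroot));
if isempty(outside)
    disp('No tabularTextDatastore definitions found outside matlabroot.')
else
    disp('Possible shadowing files:')
    disp(outside)
end
```

If a file does show up outside matlabroot, renaming it or removing its folder from the path and retrying is one way to test whether it is involved.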
The second possibility is a bug in the existing code, which is being uncovered with this specific file, and we'd need to understand what's happening in order to fix it.
Without that file, they might be guessing at the content to try to reproduce it, but if they can't figure that out then they aren't likely to continue.
sani on 8 Dec 2022
Moved: Stephen23 on 8 Dec 2022
Hi, and thanks for your answer and patience :)
This is a thin version of the file I'm trying to read. The same error occurs.
Stephen23 on 8 Dec 2022
Edited: Stephen23 on 8 Dec 2022
What MATLAB version and OS are you using? The file works fine here:
ds = tabularTextDatastore('sample file.csv')
ds =
  TabularTextDatastore with properties:

                      Files: {
                             '/users/mss.system.JLUjl3/sample file.csv'
                             }
                    Folders: {
                             '/users/mss.system.JLUjl3'
                             }
               FileEncoding: 'UTF-8'
   AlternateFileSystemRoots: {}
         VariableNamingRule: 'modify'
          ReadVariableNames: true
              VariableNames: {'BOARD', 'CHANNEL', 'TIMETAG' ... and 3 more}
             DatetimeLocale: en_US

  Text Format Properties:
             NumHeaderLines: 0
                  Delimiter: ';'
               RowDelimiter: '\r\n'
             TreatAsMissing: ''
               MissingValue: NaN

  Advanced Text Format Properties:
            TextscanFormats: {'%f', '%f', '%f' ... and 3 more}
                   TextType: 'char'
         ExponentCharacters: 'eEdD'
               CommentStyle: ''
                 Whitespace: ' \b\t'
    MultipleDelimitersAsOne: false

  Properties that control the table returned by preview, read, readall:
      SelectedVariableNames: {'BOARD', 'CHANNEL', 'TIMETAG' ... and 3 more}
            SelectedFormats: {'%f', '%f', '%f' ... and 3 more}
                   ReadSize: 20000 rows
                 OutputType: 'table'
                   RowTimes: []

  Write-specific Properties:
     SupportedOutputFormats: ["txt" "csv" "xlsx" "xls" "parquet" "parq"]
        DefaultOutputFormat: "txt"
sani on 8 Dec 2022
I'm using MATLAB R2020a on Windows 10.


Answers (1)

Jeremy Hughes on 8 Dec 2022
Edited: Jeremy Hughes on 8 Dec 2022

It turns out this was a bug in R2020a that has been fixed in R2020b. See: https://www.mathworks.com/support/bugreports/2263913
You can work around this with:
ds = tabularTextDatastore('sample file.csv','Delimiter',';','TextscanFormats',"%f%f%f%f%f%q")
or
ds = tabularTextDatastore('sample file.csv','Delimiter',';','TextscanFormats',"%f%f%f%f%f%x")

5 Comments

Jeremy Hughes on 8 Dec 2022
Actually, the fix should be in R2020a Update 4. Check to see if you can update to that version.
sani on 13 Dec 2022
Thank you, I will try this update :)
I will report back here on whether it solves the issue.
sani on 13 Dec 2022
The temporary workaround you suggested produces this error:
Error using matlab.io.datastore.TabularTextDatastore/readData (line 78)
Mismatch between file and format character vector.
Trouble reading 'Numeric' field from file (row number 10106, field number 3) ==> x0\n
Learn more about errors encountered during GATHER.
Error in matlab.io.datastore.TabularDatastore/read (line 174)
[t, info] = ds.readData();
Error in tall/gather (line 50)
[varargout{:}, readFailureSummary] = iGather(varargin{:});
Caused by:
Reading the variable name 'TIMETAG' using format '%f' from file:
'F:\sample\UNFILTERED\sample_file.csv'
starting at offset 3221225498.
Jeremy Hughes on 13 Dec 2022
This looks like a separate issue: the contents of the file do not match the expected format. You'll need to check that the contents match on each row. If the rows aren't consistent with each other, then you might need to clean up the file to work with tabularTextDatastore.
If you're reading one file, then try readtable.
Jeremy Hughes on 13 Dec 2022
The other alternative is to use all %q fields:
ds = tabularTextDatastore('sample file.csv','Delimiter',';','TextscanFormats',"%q%q%q%q%q%q")
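With every field read as text, the rows that do not parse as numbers can be located afterwards. The sketch below is only an illustration: it writes a tiny demo file whose last three column names and all of whose values are made up (only BOARD, CHANNEL and TIMETAG appear in the datastore display above), then lists the TIMETAG entries that str2double cannot convert.

```matlab
% Build a small demo file; COL4..COL6 and all values are hypothetical.
demoFile = fullfile(tempdir, 'demo_mixed_rows.csv');
fid = fopen(demoFile, 'w');
fprintf(fid, 'BOARD;CHANNEL;TIMETAG;COL4;COL5;COL6\r\n');
fprintf(fid, '0;1;1000;52;3;a\r\n');
fprintf(fid, '0;1;x0;53;4;b\r\n');      % malformed TIMETAG on purpose
fprintf(fid, '0;1;3000;54;5;c\r\n');
fclose(fid);

% Read all six columns as text so that no row fails numeric conversion.
ds = tabularTextDatastore(demoFile, 'Delimiter', ';', ...
    'ReadVariableNames', true, ...
    'TextscanFormats', repmat({'%q'}, 1, 6));

% Scan block by block and collect TIMETAG entries that are not numeric.
badValues = {};
while hasdata(ds)
    T = read(ds);
    isBad = isnan(str2double(T.TIMETAG));   % NaN marks unparseable text
    badValues = [badValues; T.TIMETAG(isBad)]; %#ok<AGROW>
end
disp(badValues)
```

Note that str2double also returns NaN for empty fields and for the literal text NaN, so those entries would be listed as well. On the real file, the same loop can be pointed at the original file name instead of demoFile.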



Asked: 17 Nov 2022

Commented: 13 Dec 2022
