formatstring error when using tabularTextDatastore

sani, 17 November 2022
Commented: Jeremy Hughes, 13 December 2022

Hello all,
I am trying to read a large CSV file (~500 GB) using tabularTextDatastore.
My code is a single line:
ds = tabularTextDatastore('filename.csv')
The error I'm getting is:
Error using tabularTextDatastore (line 147)
Output argument "formatString" (and maybe others) not assigned during call to "matlab.io.internal.text.determineFormatString>convertDatatypeToFormatString".

I couldn't find any material on this error online. Can someone please advise?

Thanks!

7 Comments
Stephen23, 8 December 2022
Edited: Stephen23, 8 December 2022
What MATLAB version and OS are you using? The file works fine here:
ds = tabularTextDatastore('sample file.csv')
ds =
  TabularTextDatastore with properties:

                      Files: {
                             '/users/mss.system.JLUjl3/sample file.csv'
                             }
                    Folders: {
                             '/users/mss.system.JLUjl3'
                             }
               FileEncoding: 'UTF-8'
   AlternateFileSystemRoots: {}
         VariableNamingRule: 'modify'
          ReadVariableNames: true
              VariableNames: {'BOARD', 'CHANNEL', 'TIMETAG' ... and 3 more}
             DatetimeLocale: en_US

  Text Format Properties:
             NumHeaderLines: 0
                  Delimiter: ';'
               RowDelimiter: '\r\n'
             TreatAsMissing: ''
               MissingValue: NaN

  Advanced Text Format Properties:
            TextscanFormats: {'%f', '%f', '%f' ... and 3 more}
                   TextType: 'char'
         ExponentCharacters: 'eEdD'
               CommentStyle: ''
                 Whitespace: ' \b\t'
    MultipleDelimitersAsOne: false

  Properties that control the table returned by preview, read, readall:
      SelectedVariableNames: {'BOARD', 'CHANNEL', 'TIMETAG' ... and 3 more}
            SelectedFormats: {'%f', '%f', '%f' ... and 3 more}
                   ReadSize: 20000 rows
                 OutputType: 'table'
                   RowTimes: []

  Write-specific Properties:
     SupportedOutputFormats: ["txt" "csv" "xlsx" "xls" "parquet" "parq"]
        DefaultOutputFormat: "txt"
sani, 8 December 2022
I'm using MATLAB R2020a on Windows 10.


Answers (1)

Jeremy Hughes, 8 December 2022
Edited: Jeremy Hughes, 8 December 2022
It turns out this was a bug in R2020a that has been fixed in R2020b. See: https://www.mathworks.com/support/bugreports/2263913
You can work around this with:
ds = tabularTextDatastore('sample file.csv','Delimiter',';','TextscanFormats',"%f%f%f%f%f%q")
or
ds = tabularTextDatastore('sample file.csv','Delimiter',';','TextscanFormats',"%f%f%f%f%f%x")
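Because the real file is far larger than memory, a datastore built with explicit formats would normally be read in chunks rather than with readall. The following is a minimal sketch of that pattern, assuming a ';'-delimited file with five numeric columns followed by one text column, as in the workaround above. The stand-in file, the column names COL4 to COL6, and the tiny ReadSize are hypothetical and exist only so the snippet runs on its own. The formats are written in the one-specifier-per-column cell form.

```matlab
% Create a tiny stand-in for the real file so the sketch is self-contained.
% COL4..COL6 are hypothetical names; the thread only shows the first three.
fname = fullfile(tempdir, 'sample file.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'BOARD;CHANNEL;TIMETAG;COL4;COL5;COL6\n');
fprintf(fid, '1;0;100;10;0.5;abc\n');
fprintf(fid, '1;1;200;20;1.5;def\n');
fprintf(fid, '2;0;300;30;2.5;ghi\n');
fclose(fid);

% Supply the formats explicitly so automatic format detection is not relied on.
ds = tabularTextDatastore(fname, ...
    'Delimiter', ';', ...
    'ReadVariableNames', true, ...
    'TextscanFormats', {'%f', '%f', '%f', '%f', '%f', '%q'});
ds.ReadSize = 2;   % rows per read; deliberately tiny here, use thousands for real data

% Process the file one chunk at a time instead of calling readall.
nRows = 0;
while hasdata(ds)
    chunk = read(ds);              % table with at most ReadSize rows
    nRows = nRows + height(chunk);
end
fprintf('Read %d rows in total.\n', nRows);
```

Any per-chunk processing (filtering, accumulating statistics, writing out a reduced file) would go inside the loop in place of the row count.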
5 Comments
Jeremy Hughes, 13 December 2022
This looks like a separate issue: the format of the file does not match the expected format. You'll need to check that the contents match on each row. If the rows aren't consistent with each other, you might need to clean up the file before it will work with tabularTextDatastore.
If you're reading one file, then try readtable.
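One rough way to check whether rows are consistent is to count the delimiters on each line and compare against the header row. The sketch below assumes a ';' delimiter and a header on the first line. The stand-in file, its contents, and the maxLines cap are hypothetical. It also miscounts quoted fields that contain the delimiter, so treat the result as a hint, not a verdict.

```matlab
% Stand-in file with one malformed row (hypothetical data and column names).
fname = fullfile(tempdir, 'inconsistent sample.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'BOARD;CHANNEL;TIMETAG;COL4;COL5;COL6\n');
fprintf(fid, '1;0;100;10;0.5;abc\n');
fprintf(fid, '1;1;200;20\n');              % too few fields
fprintf(fid, '2;0;300;30;2.5;ghi\n');
fclose(fid);

delim = ';';
maxLines = 1e6;                            % cap the scan; walking a huge file fully is slow

fid = fopen(fname, 'r');
header = fgetl(fid);
nExpected = count(header, delim) + 1;      % number of fields in the header row
badLines = [];
lineNo = 1;
while lineNo < maxLines
    txt = fgetl(fid);
    if ~ischar(txt)                        % fgetl returns -1 at end of file
        break
    end
    lineNo = lineNo + 1;
    if count(txt, delim) + 1 ~= nExpected
        badLines(end+1) = lineNo;          %#ok<AGROW>
    end
end
fclose(fid);
disp(badLines)                             % line numbers whose field count differs
```

The reported line numbers can then be inspected by hand or skipped when writing a cleaned copy of the file.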
Jeremy Hughes, 13 December 2022
The other alternative is to use all %q fields:
ds = tabularTextDatastore('sample file.csv','Delimiter',';','TextscanFormats',"%q%q%q%q%q%q")
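With every column read as %q, the numeric columns arrive as text and have to be converted afterwards. A minimal sketch of that step, assuming the first five columns are meant to be numeric. The stand-in file, the names COL4 to COL6, and the malformed value 'oops' are hypothetical. str2double turns any text it cannot parse into NaN, which also makes the bad entries easy to locate.

```matlab
% Stand-in file with one non-numeric entry in a numeric column (hypothetical data).
fname = fullfile(tempdir, 'sample file q.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'BOARD;CHANNEL;TIMETAG;COL4;COL5;COL6\n');
fprintf(fid, '1;0;100;10;0.5;abc\n');
fprintf(fid, 'oops;1;200;20;1.5;def\n');
fclose(fid);

% Read every column as text.
ds = tabularTextDatastore(fname, ...
    'Delimiter', ';', ...
    'ReadVariableNames', true, ...
    'TextscanFormats', repmat({'%q'}, 1, 6));
t = read(ds);

% Convert the columns that should be numeric; unparseable text becomes NaN.
numericNames = t.Properties.VariableNames(1:5);
for k = 1:numel(numericNames)
    name = numericNames{k};
    t.(name) = str2double(t.(name));
end

badRows = find(isnan(t.BOARD));   % rows where BOARD could not be parsed
disp(t)
```

For the full file, the same conversion would be applied to each chunk returned by read inside a hasdata loop.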
