MATLAB readmatrix inconsistently reading CSV files

Christian Taylor on 24 Aug 2023
Edited: Stephen23 on 24 Aug 2023
I'm using MATLAB's readmatrix function to read data from CSV files and store it in a variable. The CSV files are identical in format, with a block of text lines at the start before the data begins at line 21. However, readmatrix seems to behave inconsistently: sometimes it captures all the text at the start of the file and stores it as NaN, and other times it ignores the header lines and grabs only the data. Why is this? What is a better way to do this?
  7 Comments
Christian Taylor on 24 Aug 2023
Update: I have just opened my csv files in a text editor. Whilst the headers look identical in Excel, in the text editor there are a number of comma delimiters after most lines on one of the files. Perhaps this explains the different behaviour.
Stephen23 on 24 Aug 2023
Edited: Stephen23 on 24 Aug 2023
"I have just opened my csv files in a text editor. Whilst the headers look identical in Excel, in the text editor there are a number of comma delimiters after most lines on one of the files. Perhaps this explains the different behaviour."
Yes, differences between the files are most likely the cause.
Of course, the algorithm used by READTABLE et al. is not perfect (there is no such thing) and it cannot read minds: what is obvious to a human is not obvious to a machine. It is always possible to trick or confuse an algorithm with the right combination of data; such things are mathematically unavoidable.
Note that relying on what files "look like" in MS Excel is a number one mistake that you should avoid: MS Excel mangles data in all sorts of horrible ways that are indistinguishable from inside Excel, e.g. adding or changing delimiters. It can also change data without any warning.
If you want reliable data processing do NOT open and save text files using MS Excel. It is a great tool for Excel spreadsheets... but for anything else... beware of dragons!
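The trailing delimiters mentioned in this thread can be checked from MATLAB itself rather than by eye. The following is only a rough sketch: the two sample files and their contents are made up for illustration, and it assumes readlines is available (R2020b or later).

```matlab
% Two hypothetical sample files that would look the same in a spreadsheet view.
fileA = [tempname '.csv'];
fileB = [tempname '.csv'];
fidA = fopen(fileA, 'w');
fidB = fopen(fileB, 'w');
for k = 1:20
    fprintf(fidA, 'Header %d\n', k);      % no trailing delimiters
    fprintf(fidB, 'Header %d,,,\n', k);   % three trailing commas per header line
end
fprintf(fidA, '1,2,3,4\n');
fprintf(fidB, '1,2,3,4\n');
fclose(fidA);
fclose(fidB);

% Read the raw text and count the commas on each of the 20 header lines.
linesA = readlines(fileA);
linesB = readlines(fileB);
commasA = count(linesA(1:20), ',');
commasB = count(linesB(1:20), ',');

% Side-by-side comparison: one row per header line, one column per file.
disp([commasA commasB])

delete(fileA);
delete(fileB);
```

If the two columns differ, the files are not identical as far as a text importer is concerned, whatever they look like in Excel.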

Accepted Answer

Steven Lord on 24 Aug 2023
If you know exactly how many header lines your file contains, I would specify the NumHeaderLines name-value argument in your readmatrix call.
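As a minimal sketch of that first suggestion, assuming 20 header lines (the question says the data starts at line 21); the sample file written here is hypothetical and only stands in for the real CSV files:

```matlab
% Build a small stand-in file: 20 header lines, numeric data from line 21.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
for k = 1:20
    fprintf(fid, 'Header_%d,,\n', k);   % header text with trailing commas
end
fprintf(fid, '1,2,3\n');
fprintf(fid, '4,5,6\n');
fclose(fid);

% State the number of header lines explicitly instead of relying on detection.
M = readmatrix(fname, 'NumHeaderLines', 20);
disp(M)

delete(fname);
```

Because the header count is given explicitly, the result should not depend on whether a particular file happens to carry trailing delimiters in its header block.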
Alternatively, you can create a file import options object using detectImportOptions. Once it's been created, check that the properties that specify where the data is located (either DataRange or DataLines) and where any variable metadata is located (VariableNamesLine, VariableDescriptionsLine, VariableUnitsLine, or the corresponding Range properties for SpreadsheetImportOptions) match the expected format of your files. Once you've confirmed that they match, pass that import options object into readmatrix as the opts input argument.
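The import options workflow might look roughly like the sketch below. The sample file is hypothetical, and the override of DataLines to [21 Inf] and the conversion of every variable to double are assumptions based on the layout described in the question, not something detection is guaranteed to produce.

```matlab
% Build a small stand-in file: 20 header lines, numeric data from line 21.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
for k = 1:20
    fprintf(fid, 'Header_%d,,\n', k);
end
fprintf(fid, '1,2,3\n');
fprintf(fid, '4,5,6\n');
fclose(fid);

% Let MATLAB detect the layout, then inspect what it decided.
opts = detectImportOptions(fname);
disp(opts.DataLines)           % where MATLAB thinks the data is
disp(opts.VariableNamesLine)   % where MATLAB thinks the variable names are

% Override the detected values if they do not match the known layout.
opts.DataLines = [21 Inf];            % data runs from line 21 to end of file
opts = setvartype(opts, 'double');    % read every column as numeric

M = readmatrix(fname, opts);
disp(M)

delete(fname);
```

Printing the detected properties for a "good" file and a "bad" file side by side is also a quick way to see where detection diverges between them.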
If the import options properties don't match what you expect, and reviewing the file doesn't indicate to you why MATLAB is detecting the values for those properties that it is, please send a sample data file that demonstrates this behavior to Technical Support using this link along with the import options object and describe the results you expect. It's possible that you've identified a bug or an ambiguous edge case in the import options detection algorithm.

Release

R2021a
