datastore = detectImportOptions Pro Max?

fa wu on 22 Jul 2023
Edited: fa wu on 24 Jul 2023
I compared datastore and detectImportOptions, and datastore seems more powerful: it appears to cover almost everything detectImportOptions does.
1. datastore can read multiple files, or a specified file, from different folders at once, while detectImportOptions works on only one file at a time.
2. readall on a datastore automatically concatenates the contents of multiple files.
3. detectImportOptions can set MissingRule, but missing values can also be handled after T = readall(ds), where ds is the datastore, by using anymissing | rmmissing | fillmissing | missing | isnan | ismissing.
It seems there is nothing that detectImportOptions can do and datastore cannot. Is there any situation where only detectImportOptions can be used and datastore cannot?
Can fds = fileDatastore(location,"ReadFcn",@fcn) be used to read a "specific format file" instead of using detectImportOptions?
I think this comment is very helpful. Thanks a lot to Walter Roberson for the help!

Answers (1)

Walter Roberson on 22 Jul 2023
When you call readmatrix() or readtable() on a file, the import options are now detected automatically. However, the automatically detected options are not always the right ones for your situation. Sometimes you need to call detectImportOptions() to get a baseline set of options, modify the detected options, and then pass the modified options to the appropriate reading routine.
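The detect, modify, then read workflow described above can be sketched as follows. This is only an illustration under assumptions: the temporary file, its contents, and the variable names id and value are made up for the example and are not from the original thread.

```matlab
% Write a small CSV with one empty numeric field (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id,value\n1,10\n2,\n3,30\n');
fclose(fid);

% Step 1: let MATLAB guess the import options for this file.
opts = detectImportOptions(fname);

% Step 2: adjust the guesses. Here id is read as string instead of numeric,
% and missing entries in value are filled with 0 rather than NaN.
opts = setvartype(opts, 'id', 'string');
opts = setvaropts(opts, 'value', 'FillValue', 0);
opts.MissingRule = 'fill';

% Step 3: pass the modified options to the reading routine.
T = readtable(fname, opts);
disp(T)

delete(fname);   % clean up the temporary file
```

The point of the sketch is that the options object sits between detection and reading, so any guess that is wrong for a particular file can be corrected before the data is imported.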
The default reading routines for datastores use the default options, so they might not always read the data correctly. However, if you are aware that this is happening, you can specify a custom reading function that takes appropriate steps to read the data correctly.
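As a rough sketch of the custom reading function idea, the example below builds a fileDatastore whose ReadFcn reads semicolon-delimited files that start with one extra header line. The folder, file names, and file layout are assumptions invented for the illustration.

```matlab
% Create two sample files with a leading comment line (hypothetical layout).
folder = tempname;
mkdir(folder);
for k = 1:2
    fid = fopen(fullfile(folder, sprintf('run%d.csv', k)), 'w');
    fprintf(fid, 'exported by a hypothetical instrument\n');
    fprintf(fid, 'time;value\n%d;2.5\n%d;3.5\n', 2*k - 1, 2*k);
    fclose(fid);
end

% Custom reader: skip the first line and split on semicolons.
readOne = @(f) readtable(f, 'FileType', 'text', 'Delimiter', ';', ...
    'HeaderLines', 1, 'ReadVariableNames', true);

% fileDatastore calls readOne once per file. UniformRead tells readall
% to vertically concatenate the per-file tables into one table.
fds = fileDatastore(folder, 'ReadFcn', readOne, ...
    'FileExtensions', '.csv', 'UniformRead', true);
T = readall(fds);
disp(T)

rmdir(folder, 's');   % clean up the temporary folder
```

An anonymous function is used here so that the sketch runs as a plain script; a named function in its own file would work the same way.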
detectImportOptions is a utility routine that was never intended to manage sets of data, nor to read the data and make it available. It is only intended to make good guesses about the format of specific data files in order to inform reading routines such as readtable().
  5 Comments
Walter Roberson on 23 Jul 2023
When datastore() internally automatically calls readmatrix() or readtable(), those routines call detectImportOptions() or similar routines. The detection of the import options can be relatively expensive -- the detection functions will read up to the first 100 megabytes to try to guess the file format accurately.
Because of that, it can be more efficient to call detectImportOptions() once ahead of time on one representative file, make small adjustments (such as setting variable types or datetime time zone formats), and store the resulting import options. Then configure the reading routine (which datastore invokes each time it needs to read a file from the list of files) to use the stored import options. That avoids having to guess the file structure for every file.
This can be especially important for efficiency if you are reading from a network drive such as OneDrive or Google Drive, as reading from network drives can be fairly slow.
fa wu on 24 Jul 2023
Thanks a lot for your comment. It is very helpful!
"Because of that, it can be more efficient to call detectImportOptions() once ahead of time, on one representative file, and make small adjustments (like setting variable types or setting datetime timezone formats), and to store the resulting import options. Then configure the reading routine (that datastore will invoke each time it needs to read a file from the list of files) to use the stored import options. That avoids having to guess the file structure for every file." It sounds like detectImportOptions and datastore work together? Is there any example code?
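One possible way to combine the two, as a minimal sketch rather than a definitive recipe: detect the options once on a representative file, adjust them, and let a fileDatastore reuse the stored options for every file through its ReadFcn. The folder, the file names part1.csv to part3.csv, and the variable names id and tag are hypothetical.

```matlab
% Build three small CSV files with the same layout (hypothetical data).
folder = tempname;
mkdir(folder);
for k = 1:3
    part = table((1:2)' + 10*k, ["x"; "y"], 'VariableNames', {'id', 'tag'});
    writetable(part, fullfile(folder, sprintf('part%d.csv', k)));
end

% Detect the import options ONCE, on one representative file, and adjust.
opts = detectImportOptions(fullfile(folder, 'part1.csv'));
opts = setvartype(opts, 'tag', 'string');

% Reuse the stored options for every file, so readtable does not have to
% guess the format again each time the datastore reads a file.
fds = fileDatastore(folder, 'ReadFcn', @(f) readtable(f, opts), ...
    'FileExtensions', '.csv', 'UniformRead', true);
allData = readall(fds);
disp(allData)

rmdir(folder, 's');   % clean up the temporary folder
```

This assumes all files share the layout of the representative file. If some files differ, the stored options would need to be adjusted per file or the detection repeated for those files.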

Release

R2020a
