Any alternative to readstruct to speed up importing large XML files?

phenan08 on 15 Jun 2022
Answered: Pratyush on 23 Oct 2023
I often import large mzXML files into MATLAB using the mzxmlread function of the Bioinformatics Toolbox, which is based on the readstruct function. Note that mzXML files are basically XML files.
My files typically take 1 to 3 GB of disk space each, so importing them takes a long time (3 to 10 minutes).
Are there any tricks or alternatives to the readstruct function, for example ones that use parallel computation, to import a single mzXML file faster?
Regards.
  3 Comments
phenan08 on 15 Jun 2022
Thank you for your comment. I will try, and report my trials in this topic.
Oskar Munk Kronik on 20 Jan 2023
Hi Phenan08,
I'm having the same challenges. Did your trials work?
Thanks in advance

Answers (1)

Pratyush on 23 Oct 2023
Hi phenan08,
I understand that you work with large mzXML files. These files take a long time to import into MATLAB, and you are looking for a faster way to work with them.
Here are a few suggestions that may help:
  1. Memory Mapping: Instead of loading the entire mzXML file into memory, you can consider memory mapping. MATLAB provides functions such as 'memmapfile' that let you access the contents of a file on demand without loading it entirely. This can reduce memory usage and, in some cases, improve performance. Note that 'memmapfile' only gives access to raw bytes; it does not parse XML, so you still have to locate and decode the data you need yourself. For details, see the MATLAB documentation page for 'memmapfile' (Create memory map to a file).
  2. Read Subset of Data: If you only need a subset of the data in the mzXML file, you can consider reading only the required portions. This can be done by writing your own reader in place of readstruct, for example with lower-level XML parsing functions such as xmlread, and extracting only the necessary data. Keep in mind that xmlread builds the complete Document Object Model (DOM) tree in memory, so for multi-gigabyte files a streaming (SAX-style) parser, which processes elements as they are read, is usually more practical. For details, see the MATLAB documentation page for 'xmlread' (Read XML document and return Document Object Model node).
  3. External Libraries: MATLAB can call compiled C/C++ code through MEX files. You can explore libraries designed for efficient XML parsing, such as libxml2 or Xerces-C++, and wrap one in a custom MEX function to import the mzXML files faster. Both libraries also offer streaming interfaces, which avoid holding the whole document in memory.
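
A minimal sketch of the memory-mapping idea from suggestion 1 follows. It is written in Python with the standard mmap module rather than in MATLAB, because the technique itself is language-independent; in MATLAB the rough analogue would be memmapfile with 'Format' set to 'uint8'. The function name find_tag_offsets is hypothetical, and the sketch assumes every scan starts with the literal bytes "<scan " (the tag name followed by a space), which may not hold for every file. It only locates byte positions and does not parse or validate the XML.

```python
import mmap


def find_tag_offsets(path, tag=b"<scan "):
    """Return the byte offsets of every occurrence of `tag` in the file.

    Hypothetical helper. The file is memory-mapped read-only, so the
    operating system pages data in on demand instead of the whole file
    being copied into one Python bytes object. Assumes `tag` (by default
    b"<scan ") marks the start of each scan; nothing is parsed as XML.
    Note: mmap cannot map an empty file (it raises ValueError).
    """
    offsets = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(tag)
        while pos != -1:
            offsets.append(pos)
            # Continue searching just past the previous match.
            pos = mm.find(tag, pos + 1)
    return offsets
```

For example, find_tag_offsets("sample.mzXML") (a hypothetical file name) returns the starting byte of each scan, which could serve as split points if you want to process chunks in parallel; nested scans, as found in some MS/MS files, would need extra care. Many mzXML files also store a scan offset index near the end of the file (the <index> and <indexOffset> elements); if yours do, reading that index is cheaper than searching every byte.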

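Suggestion 2 can be sketched with a streaming (event-based) parser that collects only scan header attributes and never keeps the whole document tree. This sketch is again Python, using xml.etree.ElementTree.iterparse from the standard library, not a MATLAB API; MATLAB can call Python modules through its py. interface if you prefer to stay in one session. The function name scan_headers is hypothetical, and the attribute names (num, msLevel, retentionTime, peaksCount) are assumed from the mzXML scan element and should be checked against your own files.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace such as '{http://...}scan' down to 'scan'."""
    return tag.rsplit("}", 1)[-1]


def scan_headers(source, keep=("num", "msLevel", "retentionTime", "peaksCount")):
    """Stream through an mzXML document and return selected <scan> attributes.

    Hypothetical helper. `source` is a file path or a binary file object.
    Only the attributes listed in `keep` are collected (missing ones are
    skipped); peak data is never decoded.
    """
    headers = []
    stack = []  # currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are already available when the start event fires.
            if _local(elem.tag) == "scan":
                headers.append({k: elem.get(k) for k in keep if k in elem.attrib})
            stack.append(elem)
        else:
            stack.pop()
            if stack:
                # Detach the finished element from its parent so the
                # in-memory tree does not grow with the file size.
                stack[-1].remove(elem)
    return headers
```

Because each finished element is detached from its parent, the parsed tree stays small; only the returned list of headers grows with the number of scans. The parser still reads every byte of the file, so this mainly saves memory and conversion time rather than disk I/O, and the base64-encoded peak data in <peaks> would need separate decoding if you need it.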