Why does Matlab transpose hdf5 data?
There is an apparent bug in Matlab's HDF5 read/write utilities that breaks interoperability with other code. Simple array datasets are read and written as the transpose of their actual shape. I imagine this is because Matlab uses column-major (Fortran-style) order, whereas the HDF5 standard uses row-major (C-style) order.
Minimal example that illustrates the problem:
h5create('test.h5', '/dataset', [2,3]);
h5write('test.h5', '/dataset', reshape(1:6,[2,3]))
Running the HDF5 utility h5ls on the output reveals the problem:
$ h5ls test.h5
dataset Dataset {3, 2}
This is not evident if only using the HDF5 tools from within Matlab, since reading the dataset in also transposes it back.
>> h5read('test.h5', '/dataset')
ans =
1 3 5
2 4 6
Matlab should either fix this in future versions or mention the convention in the documentation, since people mostly choose HDF5 for interoperability with other systems, and this can be a tricky bug to find.
In versions:
- h5ls: Version 1.8.14
- Matlab 8.6.0.267246 (R2015b) GLNXA64
1 Comment
Daniel Döhring
24 May 2019
Edited: Daniel Döhring
24 May 2019
Actually this bug seems to be still around. In my case, a (pseudo) multiarray of dimensions [...] is internally permuted by Matlab to [...]. As a consequence, it is impossible to write back a multiarray in dimensions [...], since Matlab does not represent matrices in [...] manner.
Accepted Answer
James Tursa
20 Oct 2016
Edited: Walter Roberson
21 Oct 2016
In the following link:
I read the following under Data Layout:
"Contiguous: The array is stored in one contiguous area of the file. This layout requires that the size of the array be constant"
"The offset of an element from the beginning of the storage area is computed as in a C array."
"The first dimension stored in the list of dimensions is the slowest changing dimension and the last dimension stored is the fastest changing dimension."
So yes, it appears clear that the data storage order in the file follows the "C" array convention, and I can find no options that allow a "Fortran" array convention.
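The correspondence between the two conventions can be checked with a small calculation. The sketch below is only an illustration, not part of the HDF5 or MATLAB documentation: it uses the 2-by-3 example from the question, and the variable names (`cDims`, `matlabOffset`, `cOffset`) are made up for this example. It compares the zero-based memory offset of each element of a MATLAB array with the offset of the same element in a C array whose dimensions are listed in reverse.

```matlab
% Column-major offsets in MATLAB vs. row-major offsets in a C array
% with the dimension list reversed. All names here are illustrative.
sz = [2, 3];                          % MATLAB size of the example array
[i, j] = ndgrid(1:sz(1), 1:sz(2));    % every 1-based subscript pair (i, j)

% Zero-based offset of element (i, j) in MATLAB memory (first index fastest).
matlabOffset = sub2ind(sz, i, j) - 1;

% Zero-based offset of element [j-1][i-1] in a C array declared with
% dimensions {3, 2} (last index fastest).
cDims = fliplr(sz);                   % [3, 2]
cOffset = (j - 1) * cDims(2) + (i - 1);

% The two layouts address exactly the same bytes.
assert(isequal(matlabOffset, cOffset));
```

In other words, a 2-by-3 column-major array and a {3, 2} row-major array share one memory layout, which is consistent with the dimensions h5ls reports in the question.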
That being said, the dimensions that got stored in the file appear to be correct. I.e., the slowest changing dimension (3) did in fact get stored in the file first, followed by the fastest changing dimension (2). This assumes of course that the data was written into the file in the order 1, 2, 3, 4, 5, 6. So the data appears to be written to the file correctly as far as that goes (i.e., the dimensions stored in the file match the data order in the file). It just didn't get written out in the order you expected. So it looks like you would need to manually transpose for 2D (or permute for nD) on the MATLAB side, as you suggested, if you want the data in the file to have the "same" dimensions as the MATLAB variable.
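A minimal sketch of that manual permutation, assuming the behavior reported in the question (MATLAB dimensions appear reversed in the file). The scratch file name and the variable names are hypothetical, and the h5ls output mentioned in the comment is inferred from the question's example rather than verified here.

```matlab
% Reverse the dimension order before writing, so that row-major tools
% (h5ls, C programs) should report the dimensions MATLAB shows for A.
A = reshape(1:6, [2, 3]);             % 2-by-3 in MATLAB
fname = [tempname, '.h5'];            % hypothetical scratch file

nd = ndims(A);
B = permute(A, nd:-1:1);              % 3-by-2 here; works for nD as well

h5create(fname, '/dataset', size(B));
h5write(fname, '/dataset', B);        % expected in h5ls: Dataset {2, 3}

% Reading back in MATLAB: undo the permutation to recover A.
C = permute(h5read(fname, '/dataset'), nd:-1:1);
assert(isequal(C, A));

delete(fname);                        % clean up the scratch file
```

The cost is one extra copy of the array on each read and write, which is the overhead that writing in native storage order avoids.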
Maybe submit a bug report and see what TMW has to say about all this. I don't know if I would classify this as a "bug" per se, since the dimensions and data storage in the file appear to match each other. What I might expect is that MATLAB would match whatever the official Fortran HDF5 interface subroutines do. If the official Fortran API routines do the same thing as MATLAB, then I would say MATLAB did it correctly (but should document this behavior). But if the official Fortran API routines permute the data into "C" array storage order, then MATLAB is out of step with this and I might call it a bug even though the file is written correctly (it just didn't match the apparent expectation of the HDF Group). Maybe contact the HDF Group and ask them that question.
More Answers (3)
Kameron Harris
20 Oct 2016
Edited: Kameron Harris
20 Oct 2016
1 Comment
James Tursa
20 Oct 2016
The HDF Group intent seems to be that applications should be able to write to the file in a native storage order. This seems reasonable to me, especially from a speed standpoint. Why cripple column-ordered languages (Fortran, MATLAB) with a hard requirement to permute the data each time you read/write?
Kameron Harris
20 Oct 2016
Edited: Kameron Harris
20 Oct 2016
2 Comments
James Tursa
20 Oct 2016
Well, so this pretty much answers the question. The HDF Group intended the various applications (Fortran, MATLAB, C, C++, Python, etc) to be able to write to the file in a native storage order and simply list the dimensions of the data in the file in a specified order (slowest changing first ... fastest changing last). It is then incumbent on the user to know what storage order his/her applications use if they are to share data through this file format ... and permute the data accordingly if necessary.
So given this language in the HDF doc, I would say MATLAB is doing everything correctly (but maybe could help the user out with some documentation about interoperability with other languages/applications).