Using save with -v7.3 takes a long time and the mat file size is enormous
I tried saving with -v7 and the file size was 18 MB, while with -v7.3 it was 6 GB!
4 comments
Walter Roberson
10 Nov 2016
Can you make the 18 megabyte version available through something like Google Drive?
Accepted Answer
George
10 Nov 2016
"Note: Version 7.3 MAT-files use an HDF5 based format that requires some overhead storage to describe the contents of the file. For cell arrays, structure arrays, or other containers that can store heterogeneous data types, Version 7.3 MAT-files are sometimes larger than Version 7 MAT-files."
Using the -v7 option was my remedy as well.
4 comments
Mike
27 Feb 2020
I have a scenario where saving with -v7.3 results in a 750 MB MAT-file, whereas saving with -v7 results in a 3.4 MB MAT-file. The data I was saving was an array of Simulink.SimulationOutput objects returned from a parsim command.
Leon
28 May 2024
Edited: Leon on 1 Jun 2024
This is absurd. Eight years later, and MATLAB still hasn't fixed this rather important issue. @George, are you able to escalate this?
In my case I have a table with 6 columns. When I save the columns individually, they are about 75 MB each, so about 450 MB of disk space is actually required. When I save all 6 columns in one file, I'm forced to use -v7.3, and the .mat takes ages to save and is 17 GB. Using whos in MATLAB tells me that the uncompressed size of each column is about 920 MB, so even uncompressed they would total about 5.5 GB.
Even if MathWorks wants to use HDF5, surely they could store each column as a separate uncompressed variable, which would make the file less than a third of the size, and I would have thought that MathWorks could compress each column of a table before saving it. We're talking about serious space savings and time savings. Even taking the 17 GB file and zipping it in 7-Zip with the default settings results in a file that is 750 MB.
I'm having to write a slightly awkward script that checks the size of the columns, splits the table into separate columns (in my current case I can use pairs of columns) before saving, and then recombines the columns into the table after loading, but some people will have columns that don't even fit in a -v7 .mat. Is there a better way of doing this?
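The column-splitting workaround described above can be sketched roughly as follows. The helper names, file-naming scheme, and the assumption that each table variable individually fits within the -v7 per-variable limit are all mine, not from the thread:

```matlab
function saveTableByColumn(T, basename)
% Sketch: save each table variable to its own -v7 MAT-file so the
% compact compressed format can be used instead of -v7.3.
for k = 1:width(T)
    col = T.(k); %#ok<NASGU>  % used by SAVE via its name
    save(sprintf('%s_col%d.mat', basename, k), 'col', '-v7');
end
end

function T = loadTableByColumn(basename, varNames)
% Sketch: reload the per-column files and reassemble the table.
cols = cell(1, numel(varNames));
for k = 1:numel(varNames)
    s = load(sprintf('%s_col%d.mat', basename, k), 'col');
    cols{k} = s.col;
end
T = table(cols{:}, 'VariableNames', varNames);
end
```

Usage would be something like `saveTableByColumn(T, 'mydata')` followed later by `T = loadTableByColumn('mydata', T.Properties.VariableNames)`; batching rows instead of columns would follow the same pattern for variables that exceed the -v7 limit on their own.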
More Answers (1)
Rik van der Weij
8 Jun 2020
Edited: Walter Roberson on 8 Jun 2020
I tried the following:
a = ones(15000);
save('a.mat', 'a');          % ~800 KB file
save('b.mat', 'a', '-v7.3'); % ~11 MB file
I have the same problem with real data. My file gets flagged against the 2 GB limit even though any file I actually save is much smaller, so I'm forced to save with -v7.3, and then the file size gets really, really large.
2 comments
Walter Roberson
8 Jun 2020
-v7 MAT-files have 32-bit size counters. For any particular variable, the process is to generate the uncompressed serialized variable (which must therefore stay within the limits of the 32-bit counters), and then run a compression routine on it and store the compressed version. There is no clever algorithm to do piecewise packing into segments that each individually fit into 2 GB or 4 GB compressed; in -v7 files there is just the raw (uncompressed, not-clever) serialized representation and its zlib-compressed version.
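Since it is the uncompressed serialized size that trips the limit, one can check variables against it before choosing a format. This is a sketch under my own assumptions (the file name, the workspace variables, and treating `whos` bytes as a proxy for the serialized size):

```matlab
% Sketch: pick a save format based on per-variable uncompressed size.
info = whos;                 % sizes of variables in the current workspace
limitBytes = 2^31 - 1;       % 32-bit counter limit (~2 GB per variable)
tooBig = {info([info.bytes] > limitBytes).name};
if isempty(tooBig)
    save('data.mat', '-v7');     % compact, zlib-compressed format
else
    fprintf('Too large for -v7: %s\n', strjoin(tooBig, ', '));
    save('data.mat', '-v7.3');   % HDF5-based; larger, but no 2 GB limit
end
```

The `whos` byte count is the in-memory size, which for simple arrays closely tracks the serialized size; for objects and containers the serialized form can differ, so this is a heuristic rather than a guarantee.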
Unfortunately, yes: -v7.3 HDF5 files are not nearly as compact as one might hope.
Poking at b.mat with an HDF5 viewer, I see that it was created with GZIP level 3 compression at a 169.972:1 compression ratio, i.e. 99.4% space saved. When I wrote those 1s out in binary with no overhead (just double-precision numbers), I found that gzip -3 does indeed compress to 99.4% (though to a smaller file than the .mat). Even gzip -9 only compresses to 99.8%, leaving a file that is over 2.5 megabytes.
Now, if I take that gzip -9 result and pass it through gzip -9 again, I get a very small file, only 8553 bytes, so there is still a lot of redundant information left after the 99.4% or 99.8% compression, but gzip -3 or gzip -9 cannot find it in one pass.
It looks to me as if the HDF5 specification permits a couple of compression options that could sometimes be more effective, but what MATLAB invokes is not unreasonable -- it isn't MathWorks' fault that zlib's gzip -3, or even gzip -9, does not do nearly as well as one might hope.
Leon
28 May 2024
Edited: Leon on 28 May 2024
Thanks for your comments. Do you know if there is a workaround for this? In my case I have six columns that are 920 MB each before compression (5.5 GB total) and 75 MB per column (450 MB total) if saved individually with standard compression, but 17 GB if saved together as a -v7.3 .mat. Is my only other option to save the columns (or batches of rows) separately and then reconstruct the table after loading? Thanks.