How to efficiently integrate big data without using memory / (How to create big data)

  • In a study I will produce large arrays.
  • Each array will be at least 500 MB in size.
  • Each array will have the same number of rows.
  • The total size of the dataset will be approximately 20 GB or more.
  • Somehow I have to create a single variable/array of about 20 GB that includes all the data.
matfile seems like a good solution. However, as the file grows, it gets slower. How can I handle this problem?

9 Comments

blaat on 18 Aug 2015
How you store and access big data is strongly dependent on what you need to do with it. Do you need all arrays at the same time? Do you need specific values from these arrays at the same time? Can you somehow partition your calculations to a subset of the data?
Without more information it is very difficult to give advice on your problem.
Mehmet OZC on 18 Aug 2015
I do not need all arrays at the same time. I think I can easily pick a few columns with the matfile command. The issue is bringing all the variables together into a single file. I must write all variables to a single file, which will be large. I don't know how to combine/integrate the separate files.
Any help is appreciated.
Mehmet OZC on 18 Aug 2015
For example:
A1 = 3918x1330 (21 MB on disk)
A2 = 3918x10000
A3 = 3918x20000
...
A99 = 3918x100000
All variables will have the same number of rows. What I want to do is write the following to a single file:
[A1 A2 A3 ... A99];
blaat on 18 Aug 2015
If you can process the arrays separately, perhaps it would be more convenient to keep them as separate files. Or is there another reason you want a single, large file?
If a single file is required, I would advise against storing everything in a single variable. As far as I know, there is no way of reading only part of an array from a .mat-file, so the file will require 20 GB of memory to load.
Steven Lord on 18 Aug 2015
Why do you need to write them to a single file? Why not put each in its own file; that way if something were to happen to one of the files you wouldn't lose all of your data?
Mehmet OZC on 18 Aug 2015
Edited: Mehmet OZC on 18 Aug 2015
There can be more than one way to solve a problem. Combining all the separate files into a single file on the hard disk will make my other calculations easier.
MATLAB can easily load part of a variable.
The issue is writing such a big file.
blaat on 18 Aug 2015
So, if I understand correctly, your problem is this: you want to write 20 GB of data to a single variable in a .mat-file, but it's getting unworkably slow? Or doesn't it work at all?
Mehmet OZC on 18 Aug 2015
Edited: Mehmet OZC on 18 Aug 2015
It works to a degree. When I try to append a 2 GB file to a 4 GB file it gets slower. MATLAB does wonderful things. I believe it can handle this, or is it impossible to create a really large file on an ordinary computer?
Walter Roberson on 18 Aug 2015
I wonder if compression is leading to slowdowns? I do not know whether -v7.3 with matfile uses compression; see discussion http://www.mathworks.com/matlabcentral/answers/15521-matlab-function-save-and-v7-3 and http://www.mathworks.com/matlabcentral/answers/137592-compress-only-selected-variables-when-saving-to-mat
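One way to check whether compression is a factor is to time a save with and without it. The following is only a rough sketch, assuming MATLAB R2017a or later, where save accepts '-nocompression' together with '-v7.3'. The file names and the test array are hypothetical, and the timings say nothing definitive about how matfile itself writes data.

```matlab
% Hypothetical test data: 8 MB of highly compressible values
X = zeros(1000);
fComp = fullfile(tempdir, 'x_compressed.mat');
fRaw  = fullfile(tempdir, 'x_uncompressed.mat');

% Time a default (compressed) v7.3 save and an uncompressed one
tic; save(fComp, 'X', '-v7.3');                   tComp = toc;
tic; save(fRaw,  'X', '-v7.3', '-nocompression'); tRaw  = toc;

% Compare the resulting file sizes on disk
dComp = dir(fComp);
dRaw  = dir(fRaw);
fprintf('compressed:   %8d bytes, %.3f s\n', dComp.bytes, tComp);
fprintf('uncompressed: %8d bytes, %.3f s\n', dRaw.bytes,  tRaw);
```

Real measurement data usually compresses far less than an array of zeros, so the comparison is most meaningful when run on a representative block of the actual data.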


 Accepted Answer

JMP Phillips on 19 Aug 2015
Edited: Walter Roberson on 19 Aug 2015


Here are some things you could try:
Use the matfile function, which allows you to access and change variables directly in MAT-files, without loading into memory: http://au.mathworks.com/help/matlab/large-mat-files.html http://au.mathworks.com/help/matlab/ref/matfile.html
Structure your data differently: if you are representing the data as doubles, maybe you can afford less precision, e.g. use int32. For example, you can use a scaling factor of 1e4 to represent a double value such as 100.3425 as the integer 1003425.
With MATLAB:
  • use the 64-bit version of MATLAB
  • try disabling compression when saving the files, with the -v6 option
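
The integer scaling suggested above can be sketched as follows. This is only an illustration, assuming four decimal places are enough and that the scaled values fit in the int32 range (original magnitudes below roughly 214748); values outside that range would saturate. The variable names are hypothetical.

```matlab
scale = 1e4;                       % keeps four decimal places
x = [100.3425, -0.5, 2.25];        % example double values (8 bytes each)

xi = int32(round(x * scale));      % stored form, 4 bytes per element
xBack = double(xi) / scale;        % convert back before doing arithmetic

maxErr = max(abs(xBack - x));      % rounding error introduced by the scaling
```

For in-range data the rounding error stays within half a unit of the last kept decimal place, that is, 0.5/scale.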
Optimize your PC for your task:

2 Comments

Walter Roberson on 19 Aug 2015
The -v6 option is incompatible with matfile and with objects over 2 GB.
In one of the links provided above I came across the following code:
example = matfile('example.mat','Writable',true);
[nrowsB,ncolsB] = size(example,'B');
for row = 1:nrowsB
    % read one row from disk, scale it, and write it back
    example.B(row,:) = row * example.B(row,:);
end
That solved my problem. Thanks.
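
The same matfile indexing can also build the single wide array from the original question one block at a time. The following is a minimal sketch, not the poster's actual code. It assumes each block is already stored in its own MAT-file under a variable named A; the file names (block_01.mat, combined.mat), the variable name Aall, and the tiny sizes are hypothetical. The total width is computed first so that the combined variable is sized once and each block is written into a known column range.

```matlab
% Hypothetical setup: three small blocks with the same number of rows,
% each saved to its own v7.3 file under the variable name A.
nRows  = 4;
widths = [3 5 2];
srcFiles = cell(1, numel(widths));
for k = 1:numel(widths)
    A = k * ones(nRows, widths(k));
    srcFiles{k} = fullfile(tempdir, sprintf('block_%02d.mat', k));
    save(srcFiles{k}, 'A', '-v7.3');
end

% Combined file: start from a clean file and open it for writing.
outFile = fullfile(tempdir, 'combined.mat');
if exist(outFile, 'file')
    delete(outFile);
end
out = matfile(outFile, 'Writable', true);

% Assign the last element once so the variable gets its full size up front.
totalCols = sum(widths);
out.Aall(nRows, totalCols) = 0;

% Copy the blocks in, one column range at a time.
col = 0;
for k = 1:numel(srcFiles)
    src = matfile(srcFiles{k});
    [~, nColsK] = size(src, 'A');
    out.Aall(:, col+1:col+nColsK) = src.A;   % only this block is in memory
    col = col + nColsK;
end

[nR, nC] = size(out, 'Aall');   % size is read without loading the data
part = out.Aall(:, 4:8);        % loads only columns 4 to 8 from disk
```

Sizing the variable once avoids growing it on every append, which may help with the slowdown described in the question, although that has not been measured here.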
