Memory preallocation, dataset array

Hi All,
How can I preallocate memory for a dataset array that should have 69 rows and 740 columns? There is another dataset array of the same size in my workspace. Can I do something like NewArray = dataset(size(OldArray)), i.e. use the size of an already built dataset array as the size arguments?
Regards,
AMD.

Accepted Answer

Daniel Shub on 3 May 2012

1 vote

The dataset class is basically a container holding pointers to other variables/memory locations. Even if you can preallocate the dataset array, I am not sure it will improve performance by much.

8 Comments

ARS on 3 May 2012
So which container should I use to improve performance? I am using the following nested for loop:
for c = 1:739
    for r = 1:60
        % if/else statements
    end
end
Also, how can I vectorize the above? My code takes 4927 seconds, measured with tic/toc.
Regards,
AMD.
Daniel Shub on 3 May 2012
If it is a preallocation issue, then the loop over r when c is equal to 1 will take longer (probably much) than when c is equal to 739 (assuming the processing inside the loop is the same).
You are doing 44340 iterations in 4927 seconds for a rate of about 10 iterations per second or 100 ms per iteration. Assuming what you are doing inside the loop is nontrivial, that doesn't seem that long.
You should look at profiling your code to see what the bottleneck is.
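One rough way to check the preallocation hypothesis above is to time each pass of the outer loop separately and compare early passes with late ones; a large systematic difference would point to an allocation issue. The sketch below is only illustrative: `doWork` is a hypothetical placeholder for the real loop body, and the loop bounds are the ones given in the question.

```matlab
% Sketch: time each outer-loop pass to see whether the cost changes over time.
% doWork is a hypothetical stand-in for the real if/else processing.
doWork = @(r, c) r + c;

nCols = 739;
nRows = 60;
elapsed = zeros(1, nCols);        % preallocate the timing vector itself

for c = 1:nCols
    t = tic;
    for r = 1:nRows
        doWork(r, c);
    end
    elapsed(c) = toc(t);          % seconds spent on this outer pass
end

% Average rate reported in the thread: 44340 iterations in 4927 seconds
nIter = nCols * nRows;            % 44340
ratePerSec = nIter / 4927;        % roughly 9 iterations per second
msPerIter = 1000 * 4927 / nIter;  % roughly 111 ms per iteration
```

Plotting `elapsed` against the column index (for example with `plot(elapsed)`) makes any trend easy to spot.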
ARS on 3 May 2012
Hi Daniel,
The profiler reports the following for the code above:
time       calls     line
3445.98    1550916   150   var_j = a.data{varIndices};
That line alone accounts for 3445.98 seconds.
I am matching values in a dataset array, fetching values from fts objects to add them up, and putting the results back in the dataset array.
Can a parfor loop speed it up, for example one parfor with a for loop nested inside it?
Daniel Shub on 3 May 2012
First, I don't think this is related to your original question any longer. Consider asking it as a new question (and possibly voting for or accepting the answers that helped).
Second, the profiler results suggest to me that the bottleneck is not a preallocation issue. I don't know exactly what the profiler line means, but if it means 1550916 calls in 3445.98 seconds, then that is about 2.2 ms per call. Assuming varIndices is fairly large, that does not seem like much time. Speeding up such a simple call is going to be difficult. It might be that using a dataset array is just too slow.
You have provided nowhere near enough information to determine how to speed up what you are doing. You will need to break your problem down into something small enough to post as a minimal working example (MWE) if you want optimization help.
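The per-call figure can be checked directly from the profiler numbers quoted above. The last two lines assume that the profiled run used the same 739-by-60 loop as in the earlier comment, which the thread does not confirm.

```matlab
totalTime = 3445.98;    % seconds attributed to the profiled line
nCalls = 1550916;       % number of calls reported by the profiler

msPerCall = 1000 * totalTime / nCalls;   % about 2.2 ms per call

% Assumption: the profiled run is the same 739 x 60 loop
nIter = 739 * 60;
callsPerIter = nCalls / nIter;           % about 35 calls per loop iteration
```

If that assumption holds, each loop iteration reaches the profiled line about 35 times, so reducing the number of dataset subscripting operations per iteration may matter more than speeding up any single call.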
James Tursa on 3 May 2012
The expression a.data{varIndices} creates a shared data copy of the variable, meaning that it allocates a new variable structure but the data pointers are pointing to the original data. I.e., there is no data copying going on. Is the "a" variable global? That can slow things down. I am in the same boat with Daniel ... we really don't have enough information to help you since the line in question is not doing much at all. You will have to post more code.
ARS on 3 May 2012
Thanks Daniel.
From what I have found by searching Google, this is happening because indexing into a dataset array is slow.
Now I am trying to convert my dataset array to a cell array, but the function dataset2cell produces this error message:
Undefined function 'dataset2cell' for input arguments of type 'dataset'.
What does this error mean?
How can I convert my dataset array to a cell array?
Daniel Shub on 3 May 2012
"What does the above error mean" and "how to convert my dataset array to a cell array" are new questions. I will not attempt to answer them in a comment on an answer to an unrelated question. I have never used the dataset class, so I have little insight into converting it to other classes, or even into what it is good for. You will be much better off asking these as new questions.
ARS on 3 May 2012
Sorry for the short code. I will post the other questions as new independent questions.


More Answers (4)

Oleg Komarov on 3 May 2012

1 vote

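One way:
n = 300;   % number of variables (columns)
% Build the variable names 'C1', 'C2', ..., 'Cn'
names = arrayfun(@(x) sprintf('C%d',x), 1:n, 'UniformOutput', false);
% One variable per column of the 30-by-n matrix
dataset([{zeros(30,n)}, names])
Or:
% A single variable that holds the whole matrix
dataset(zeros(size(OldArray)));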

2 Comments

ARS on 3 May 2012
Hi Oleg,
I tried ABC = dataset(zeros(size(oldarray)));
It gets the number of rows right (69), but the result has only one column, i.e. it is 69x1, while size(oldarray) is 69x740.
Why is that?
Oleg Komarov on 3 May 2012
The first solution creates 300 columns (you can set 'n' to be anything) and the second solution is slightly different. Depends what you need to do.
I am not sure but the second might be more efficient.


jeff wu on 3 May 2012

0 votes

NewArray = zeros(size(Oldarray))

1 Comment

ARS on 3 May 2012
No Jeff, that creates a numeric matrix of the required size. I am talking about creating/preallocating a dataset array.


ARS on 3 May 2012

0 votes

Hi All,
For the information of the MATLAB community: when I used a dataset array in my loops, which ran 44340 iterations, the task completed in 4927 seconds, a rate of about 10 iterations per second. That is approximately 82 minutes. (My whole day was wasted running that script 4 times.)
After I used a cell array in place of the slow dataset array, the same task completed in
Elapsed time is 872.837510 seconds.
That is about 14.5 minutes (down from 82 minutes).
This was amazing, with no other changes to the code.
Life is easy with cell arrays.
Regards,
AMD
Peter Perkins on 3 May 2012

0 votes

Ahmad, there are several things going on in this thread. Let me try to answer them one by one.
A dataset array can hold just about any type of variable, so just specifying what size you want is not sufficient to create one. As you noted, Oleg's suggestion will create a dataset with a single variable, and Jeff's suggestion will create a numeric array. You need to create a dataset array. I can't say exactly what you need to do to preallocate, because you have not provided any specific information about types. In the simplest case, if you want the new array to have the same data types in each variable as the old array, all you need to do to preallocate is
dsNew = dsOld;
and then just overwrite the existing values with the new ones. There's no particular reason to start with zeros, empty strings, or the like, unless you will only be overwriting some of the elements, and there are ways to do that too. The same goes for the case where the new array is to contain different types of data than the old array.
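A minimal sketch of the points above, assuming the Statistics Toolbox `dataset` class is available. `dsOld` here is a small hypothetical stand-in (3 rows, 4 numeric variables) for the 69-by-740 array in the question, and the variable names `C1` to `C4` are made up.

```matlab
% Hypothetical stand-in for the existing dataset array
names = arrayfun(@(k) sprintf('C%d', k), 1:4, 'UniformOutput', false);
dsOld = dataset([{rand(3, 4)}, names]);   % 3 observations, 4 variables

% A plain matrix input becomes ONE variable, so the dataset has one column
% (this is the 69x1 result reported earlier in the thread)
dsSingle = dataset(zeros(3, 4));
size(dsSingle)

% Copying keeps the size, variable names, and types of the old array
dsNew = dsOld;
size(dsNew)

% Overwrite values in place; here variable C2 is reset to zeros
dsNew.C2 = zeros(size(dsNew, 1), 1);
```

The copy is used purely as a template here: every variable already has the right type and length, so later assignments replace values without changing the container's shape.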
You must be running a version of MATLAB older than R2012a, which is why dataset2cell is not found.
As for your comments about dataset vs. cell:
Elsewhere you've asked a question or two about manipulating data in a dataset array. One of them involved a nested loop with some string comparison. Not sure if your comments here are related to that or not, but it is often possible to avoid the kind of loop you had there by appropriate use of vectorized operations. So in that case, strcmp. Yes, dataset can be much slower than cell for scalar access, but it is often possible to write code that is both faster to write and to run, and easier to read, by using vectorized operations. In this case, I can't say what that would be without more information.
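As a rough illustration of the vectorization idea, the sketch below replaces a scalar string-comparison loop with a single `strcmp` call on a cell array of strings. The variable names and data are made up for the example and are not taken from the original code.

```matlab
% Made-up example data: a column of string keys and matching values
keys = {'AAA'; 'BBB'; 'CCC'; 'BBB'};
values = [10; 20; 30; 40];
target = 'BBB';

% Loop version: one comparison per row
totalLoop = 0;
for r = 1:numel(keys)
    if strcmp(keys{r}, target)
        totalLoop = totalLoop + values(r);
    end
end

% Vectorized version: strcmp compares the whole cell array at once
match = strcmp(keys, target);      % logical column vector
totalVec = sum(values(match));
```

With a dataset array the same pattern would apply to a variable extracted once with dot subscripting (for example `strcmp(ds.Name, target)`, where `Name` is a hypothetical variable), rather than indexing the dataset element by element inside the loop.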
I can't tell what you are doing with your data, but you may ultimately find that a cell array is not the best way to store it. dataset provides a convenient way to wrap mixed types of data into one container, and still be able to access individual variables as their "native" type using dot subscripting. For example, with a dataset array it's easy to say mean(data.Var1). From what I can tell from your other questions, you have strings and numbers. You can put all that in a cell array, but you'll likely end up disappointed if you are storing scalar numeric data that way, because cell arrays are too general and lack, for example, the ability to do simple math. That will also not scale well, again because cell arrays are too general: internally, a cell array stores each scalar value as a separate MATLAB array. On the other hand, if what you have is all string data, then yes, a cell array is the right container.
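The storage point can be seen with a small base-MATLAB comparison. This is only a sketch with made-up data; the exact byte counts depend on the MATLAB version and platform, so none are quoted here.

```matlab
n = 1000;
x = rand(n, 1);            % numeric column: one contiguous array
c = num2cell(x);           % cell column: one separate array per scalar

% Math works directly on the numeric array
mNumeric = mean(x);

% The cell array has to be converted back before doing math
mCell = mean(cell2mat(c));

% Compare the memory footprints of the two containers
infoX = whos('x');
infoC = whos('c');
bytesRatio = infoC.bytes / infoX.bytes;   % greater than 1: the cell costs more
```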
Hope this helps.

2 Comments

ARS on 3 May 2012
Thanks Peter, that was very helpful.
I am using MATLAB R2011b.
Yes, I am doing string comparisons between two dataset containers in a for loop (with a nested for loop) that has 69 if-elseif conditions in the nested loop.
For string comparisons (strings from two different datasets), dataset arrays take a lot of time, while placing my data in cell arrays just for the string comparison is much faster.
For my purpose, I can use a cell array in this computation to save time, and soon afterwards I export the results back to Excel for further work.
Thank you for your comments; I will seek your guidance in the future as well.
Regards,
AMD.
avantika on 29 Aug 2013
Hi!
I am trying to convert a dataset to a cell array of strings so that I can use the unique command in MATLAB R2009b. However, when I use the command C = dataset2cell(ds3);
I get the following error message:
??? Undefined variable "dataset2cell" or class "dataset2cell".
Is there any solution to this problem in MATLAB R2009b?
