Performance comparison among Struct Array, Cell Array and Table

I am unsure when to use which data structure. There are three common ways to store data in MATLAB: 1. Cell arrays; 2. Tables; 3. Struct arrays.
I searched online for performance comparisons among the three: struct is reportedly the fastest, but it is still not clear to me when to use which.
Can someone give me a general idea of how these MATLAB data structures compare in performance?
Thanks :)

3 Comments

James Tursa on 2 Nov 2018
How are you accessing the data downstream in your code? Can you give a short example of how you would be accessing your data using the three methods you mention? I.e., some code or pseudo-code showing what you are thinking about implementing.
Stephen23 on 6 Nov 2018
"There are three common ways to store data in MATLAB: 1. Cell arrays; 2. Tables; 3. Struct arrays."
You only list container classes. What about the simpler ways of storing data: the numeric array (single, double, uint*, and int*), the character array, and the logical array? These are faster to access than the ones that you list. Is there a reason why you do not list them?
Michael O'Brien on 18 Mar 2022
I found this super helpful due to the discussion it generated. Thanks, Kat Lee!


Answers (3)

Bruno Luong on 2 Nov 2018

3 votes

My general rules of thumb:
  • A simple array is the fastest.
  • Use a cell array if you don't have a choice (mixed classes or non-uniform sizes) and don't care about how to "name" elements.
  • The next recommendation is a struct of arrays and/or cell arrays, which allows meaningful field names and flexible data exchange.
  • Avoid at all costs an array of structs for a large number of records (say > 10); sooner or later this will incur a big speed penalty. I can't remember the last time I used one, probably in my youth, and I never did it again.
  • A table is a sort of object-oriented layer built on top of CELL; personally, I have never felt a need to use it. I recognize it's very attractive for people who like Excel sheets. ;-)
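
As a rough illustration of the options listed above, here is a minimal sketch that stores the same made-up data in each kind of container. The variable names and values are hypothetical and only meant to show what each layout looks like, not to recommend one.

```matlab
% Hypothetical data: five positions and a value at each (illustrative only).
x = 1:5;
v = sin(x);

% 1) Plain numeric array: one contiguous block, one row per quantity.
A = [x; v];

% 2) Cell array: useful when classes or sizes differ between elements.
C = {'label', x, v > 0};

% 3) Scalar struct of arrays: named fields, each holding a whole vector.
S = struct('x', x, 'value', v);

% 4) Array of structs: one small struct per record (the layout advised against).
R = struct('x', num2cell(x), 'value', num2cell(v));

% 5) Table: named variables, each stored as a whole column.
T = table(x(:), v(:), 'VariableNames', {'x', 'value'});

% The same mean, computed from each container that holds the values.
m = [mean(A(2,:)), mean(S.value), mean([R.value]), mean(T.value)];
```

Note that only the array of structs needs the extra gathering step `[R.value]` before a vectorized function can be applied.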

5 Comments

Kat Lee on 2 Nov 2018
Edited: Kat Lee on 2 Nov 2018
Thank you so much, @Bruno, for such a specific answer.
According to your answer, the speed ranking would be Cell > Struct > Table, am I right?
Could you explain a bit more about "Avoid at all costs an array of structs for a large number of records (say > 10)"? I feel that > 10 is very easy to reach, since I am dealing with large numbers most of the time.
Bruno Luong on 2 Nov 2018
Edited: Bruno Luong on 2 Nov 2018
Struct of arrays versus array of structs:
% struct of array, recommended
>> s = struct('x', 1:100, 'value', sin(1:100))
s =
struct with fields:
x: [1×100 double]
value: [1×100 double]
% array of struct, not recommended
>> s = struct('x', num2cell(1:100), 'value', num2cell(sin(1:100)))
s =
1×100 struct array with fields:
x
value
The latter will be very slow and impractical to process in MATLAB, since the data are scattered all over memory. Just avoid structuring your data the second way, in contrast to languages like C/C++, where such a data structure is perfectly efficient to handle.
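
To make the difference concrete, the following sketch processes both layouts and converts an array of structs into a struct of arrays. The names `soa`, `aos`, and `conv` are hypothetical; the conversion loop is just one simple way to do it.

```matlab
n = 1e4;

% Struct of arrays: a 1x1 struct whose fields are whole vectors.
soa = struct('x', 1:n, 'value', sin(1:n));

% Array of structs: a 1xn struct array with one scalar per field per element.
aos = struct('x', num2cell(1:n), 'value', num2cell(sin(1:n)));

% Struct of arrays: the fields can be used directly in vectorized code.
y1 = soa.value .* soa.x;

% Array of structs: each field must first be gathered into a temporary vector.
y2 = [aos.value] .* [aos.x];

% Converting an existing array of structs into a struct of arrays.
fn = fieldnames(aos);
conv = struct();
for k = 1:numel(fn)
    conv.(fn{k}) = [aos.(fn{k})];   % concatenate the scalars of one field
end
```

The gathering step `[aos.value]` copies every element on each use, which is one reason the second layout tends to be slow for large record counts.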
Peter Perkins on 6 Nov 2018
Bruno, tables are NOT built on top of cell, at least not in the way that you probably mean. Compare the memory requirements:
>> x = randn(1000000,10);
>> t = array2table(x);
>> c = num2cell(x);
>> whos x t c
  Name            Size                 Bytes  Class     Attributes

  c         1000000x10            1200000000  cell
  t         1000000x10              80003090  table
  x         1000000x10              80000000  double
A struct array has memory footprint similar to a cell, while a scalar struct of vectors has a footprint similar to a table.
Performance-wise, a double array wins. A cell array or a struct array is likely gonna need a loop, since there's no simple way to get a contiguous vector of values corresponding to one of x's columns. A table, and a scalar struct will have good performance for vectorized operations.
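
The comparison above can be extended to the two struct layouts with a sketch like the following. It uses a smaller array than the 1,000,000-row example to keep it quick, and the variable names `a`, `b`, `c` are hypothetical. Exact byte counts depend on the MATLAB release, so none are quoted here.

```matlab
n = 1e4;                          % smaller than the example above
x = randn(n, 3);
names = {'a', 'b', 'c'};          % hypothetical variable/field names

t  = array2table(x, 'VariableNames', names);          % table
ss = struct('a', x(:,1), 'b', x(:,2), 'c', x(:,3));   % scalar struct of vectors
c  = num2cell(x);                                      % n-by-3 cell array
sa = cell2struct(c, names, 2);                         % n-by-1 struct array

% Report the memory used by each representation.
w = whos('x', 't', 'ss', 'c', 'sa');
for k = 1:numel(w)
    fprintf('%-3s %12d bytes\n', w(k).name, w(k).bytes);
end

% Table and scalar struct: the column is already a contiguous vector.
m_t  = mean(t.a);
m_ss = mean(ss.a);

% Cell array and struct array: the column must be gathered first.
m_c  = mean([c{:,1}]);
m_sa = mean([sa.a]);
```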
Jaromir on 28 Nov 2019
Peter,
Tables are built on top of cell arrays. Your example is misleading since you're comparing two very different things. Your cell array c is literally a 1000000-by-10 array. Your table t is built on top of a 1-by-1 cell array, where the entire numeric array x is placed in one cell. This is how tables work: each "variable" in the table language is placed in its own cell. The table t is hence sort of equivalent to a cell array { x }.
Walter Roberson on 28 Nov 2019
Notice Peter's phrase, "at least not in the way that you probably mean."
In particular, many people tend to think that a table with N rows and V variables is stored as an N by V cell array, but instead it is stored as a struct that contains a 1 x V cell array each entry of which is an object with N rows.
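
The distinction described above can be pictured with ordinary cell arrays. This is only an analogy built from the description in this thread, not a view into the actual internals of the table class; the names `perElement` and `perVariable` are hypothetical.

```matlab
N = 1000;
a = rand(N, 1);
b = rand(N, 1);

% What many people imagine: an N-by-V cell with one cell per value.
perElement = num2cell([a b]);     % N-by-2 cell, 2*N separate small arrays

% Closer to the description above: one cell per variable, N rows in each.
perVariable = {a, b};             % 1-by-2 cell, two N-by-1 vectors

% A table hands back each variable as a whole column, like perVariable.
t = table(a, b);
sameColumn = isequal(t.a, perVariable{1});
```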


Matt J on 1 Nov 2018
Edited: Matt J on 1 Nov 2018

1 vote

They should all be about the same speed. If speed matters and the data is large, however, you shouldn't be using any of these. You should be storing data in numeric arrays instead. That way the data will be held contiguously in RAM and accessing it will be very fast.

5 Comments

Walter Roberson on 1 Nov 2018
My expectation is that cell would be slightly faster than struct, as struct involves a symbol lookup whereas cell is just following pointers. Either one should be faster than table objects, as those have overhead for object processing.
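
Expectations like these are easy to check on your own machine. The sketch below times summing one column stored four ways, using `timeit`; the data and variable names are made up, and the results will vary with MATLAB release and hardware, so no numbers are claimed here.

```matlab
n = 1e5;
v = rand(n, 1);                                 % plain numeric vector
c = num2cell(v);                                % n-by-1 cell array
s = struct('value', v);                         % scalar struct holding the vector
t = table(v, 'VariableNames', {'value'});       % table with one variable

% Time the same reduction on each container.
tNum    = timeit(@() sum(v));
tCell   = timeit(@() sum([c{:}]));              % must gather the cells first
tStruct = timeit(@() sum(s.value));
tTable  = timeit(@() sum(t.value));

fprintf('numeric: %.3g s\ncell:    %.3g s\nstruct:  %.3g s\ntable:   %.3g s\n', ...
        tNum, tCell, tStruct, tTable);
```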
Matt J on 1 Nov 2018
Kat Lee's comment moved here:
Thank you for answering my question. In my case, storing the data in a numeric array won't work, since I also need a field name associated with each number.
Also, I do see a time difference among these three when I run my scripts.
Matt J on 1 Nov 2018
Edited: Matt J on 1 Nov 2018
"Thank you for answering my question. In my case, storing the data in a numeric array won't work, since I also need a field name associated with each number."
But it is better to do this,
s.name=rand(10000,1);
than it is to do this,
z=num2cell(rand(10000,1));
[s(1:10000).name]=z{:};
Stephen23 on 6 Nov 2018
Edited: Stephen23 on 6 Nov 2018
"storing the data in a numeric array won't work, since I also need a field name associated with each number."
That is a very poor reason not to use numeric arrays, especially if you then ask about efficiency accessing data!
Simply keep an array of text data (e.g. cell array of char vectors, string array) and a corresponding array of numeric data (any numeric class). This will make your data processing much simpler and more efficient than messing about with numeric data pointlessly split up into a cell array.
A table might be a good solution (it effectively does the same thing).
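
A minimal sketch of the parallel-array idea described above, with the equivalent table alongside it. The labels and numbers are made up for illustration.

```matlab
% Hypothetical labels and their numeric values, kept in two parallel arrays.
names  = {'alpha', 'beta', 'gamma'};
values = [1.5, 2.5, 4.0];

% Look up a value by its name.
betaValue = values(strcmp(names, 'beta'));

% Numeric processing touches only the numeric array.
scaled = values * 10;

% The same pairing expressed as a table.
t = table(names(:), values(:), 'VariableNames', {'Name', 'Value'});
gammaValue = t.Value(strcmp(t.Name, 'gamma'));
```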
Bruno Luong on 6 Nov 2018
Especially when one can put the numerical array inside a struct with a meaningful fieldname.


Peter Perkins on 6 Nov 2018

0 votes

Kat, there's no way you are gonna get a useful answer without providing more information. The best representation of your data is gonna depend on your data and what you are doing, and how you plan on writing your code. Without knowing that, any answer is just guessing.

5 Comments

Kat Lee on 6 Nov 2018
Thanks, Peter, for pointing this out. We have several different cases of data that need to be stored in MATLAB. In one case, the data is stored as a table with 3 columns for attribute names and multiple rows holding calculated values. In another case, we have been using a cell array of size 2 x N (10 < N < 30), where the first row holds names (char) and the second row holds numbers. We then apply interpolation based on this cell array to obtain more rows. For these two cases, which data structure would be the most efficient to use?
Peter Perkins on 6 Nov 2018
You are probably going to be very unhappy with cell, if you mean what it sounds like you mean.
My advice is to start with either numeric arrays or tables. Only go to something else if you run into trouble.
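
As a sketch of what that advice could look like for the 2-by-N cell case, the example below assumes (this is a guess at the layout, not something stated in the thread) that row 1 holds names and row 2 holds equal-length numeric column vectors, with the first column acting as the interpolation grid. All names and values are hypothetical.

```matlab
% Hypothetical 2-by-N cell: names in row 1, column vectors in row 2.
raw = {'time',  'speed',       'load'; ...
       (0:4)',  [0 2 3 5 8]',  [1 1 2 2 3]'};

% One table variable per column of the cell array.
t = table(raw{2,:}, 'VariableNames', raw(1,:));

% Interpolate every other variable onto a finer time grid in one call.
tq = (0:0.5:4)';
vq = interp1(t.time, t{:, raw(1,2:end)}, tq);   % linear interpolation by default

% Collect the finer-grained rows in a new table with the same variable names.
fine = array2table([tq vq], 'VariableNames', raw(1,:));
```

Because each table variable is a whole numeric column, `interp1` can work on all of them at once, with no loop over cells.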
Kat Lee on 6 Nov 2018
I really doubt table performance, since what I have seen of it before was not very good.
Matt J on 6 Nov 2018
Edited: Matt J on 6 Nov 2018
"I really doubt table performance, since what I have seen of it before was not very good."
That's not a good reason in and of itself to doubt the performance of tables. The person who was demonstrating their performance to you may have been an inexperienced programmer who didn't use properly vectorized methods to get the best performance.
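
To illustrate the point about vectorized methods, here is a sketch that computes the same new table variable row by row and in one vectorized statement. The table contents are made up, and the measured times will differ between machines and releases, so they are recorded but not quoted.

```matlab
n = 2000;
t = table((1:n)', rand(n, 1), 'VariableNames', {'id', 'x'});

% Row-by-row: every iteration indexes into the table separately.
tLoop = t;
tLoop.y = zeros(n, 1);
tic
for i = 1:n
    tLoop.y(i) = 2 * tLoop.x(i) + 1;
end
loopTime = toc;

% Vectorized: one operation on the whole column.
tVec = t;
tic
tVec.y = 2 * tVec.x + 1;
vecTime = toc;

fprintf('loop: %.3g s, vectorized: %.3g s\n', loopTime, vecTime);
```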
Walter Roberson on 18 Mar 2022
Also, MathWorks has improved table() performance over the years.



Asked: 1 Nov 2018

Commented: 18 Mar 2022
