Tall vs distributed array

I see that MATLAB has both tall and distributed arrays.
Tall arrays divide data into chunks.
Distributed arrays also divide data into chunks!
What's the difference here?
And how is either of these connected to parallel computing?

Answers (1)

Edric Ellis on 14 May 2018


Both tall and distributed arrays are designed for processing large amounts of data, but they have somewhat different capabilities.
distributed arrays are spread across the memory of several MATLAB worker processes, so the largest distributed array you can create is limited by the total amount of physical memory available to those workers. Also, distributed arrays are more oriented towards dense and sparse linear algebra. distributed arrays require Parallel Computing Toolbox, and are most effective when used with MATLAB Distributed Computing Server (which lets you distribute the data across multiple machines).
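As a rough illustration of the linear-algebra use case, here is a minimal sketch that assumes Parallel Computing Toolbox is installed and a parallel pool can be started. The matrix size and the test system are arbitrary choices made for the example, not anything specific to the toolbox.

```matlab
% Sketch only: requires Parallel Computing Toolbox. Sizes are illustrative.
if isempty(gcp('nocreate'))
    parpool;                        % start the default parallel pool
end

n = 2000;
% For simplicity the matrix is built on the client and then distributed.
% The n*eye(n) term makes it diagonally dominant, so the solve is well behaved.
A = distributed(rand(n) + n*eye(n)); % each worker now holds part of A in memory
b = sum(A, 2);                       % row sums, so the exact solution is ones(n,1)
x = A \ b;                           % dense linear solve carried out by the workers
err = gather(max(abs(x - 1)));       % bring the scalar error back to the client
fprintf('max abs error: %g\n', err);
```

Because A lives in worker memory, n is bounded by the combined RAM of the workers in the pool.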
The data for tall arrays exists on disk, and so their size is not limited by the amount of memory you have available. However, as the name implies, tall arrays can be large only in the first dimension. tall arrays are more geared towards data analytics. tall arrays ship with MATLAB itself, but there is enhanced support in both Parallel Computing Toolbox (which enables parallel processing in a single computer) and MATLAB Distributed Computing Server (which enables parallel processing across a cluster, including Hadoop/Spark clusters).
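For comparison, a minimal tall-array sketch might look like the following. It assumes the sample file airlinesmall.csv that ships with MATLAB; the choice of the ArrDelay column is just for illustration. The call to mapreducer(0) keeps the computation in the local MATLAB session, so no extra toolbox is needed.

```matlab
% Sketch only: uses the airlinesmall.csv sample file shipped with MATLAB.
mapreducer(0);                       % run in the local session, no parallel pool

ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};

tt = tall(ds);                       % tall table backed by the file on disk
m = mean(tt.ArrDelay, 'omitnan');    % deferred: no data has been read yet
meanDelay = gather(m);               % reads the file chunk by chunk and evaluates
fprintf('mean arrival delay: %g minutes\n', meanDelay);
```

Only one chunk of the file needs to fit in memory at a time, which is why the number of rows can exceed available RAM.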

3 Comments

Pey on 14 May 2018
Thanks, Edric, for the reply.
Data is always stored on disk! I guess it's a matter of how you read from and write to disk and communicate with the CPU and GPU.
Both also work better with the two products, Parallel Computing Toolbox and MATLAB Distributed Computing Server. Distributed arrays require Parallel Computing Toolbox and tall arrays don't. (One difference here, but why? Is it mathematical, technical, or marketing?)
So if we remove several dimensions of a distributed array so that it has only one large dimension, it turns into a tall array, right? I don't see the point of having two types of arrays. Why don't we have only distributed arrays?
Without knowing the details of how you handle memory and reads/writes, tall arrays seem redundant to me.
Edric Ellis on 15 May 2018
The fundamental difference is where the data is held once you've created the array. distributed arrays are more restricted in size because the contents are always in memory, but they are more capable. tall arrays can be much larger - as long as you have the disk space.
Pey on 15 May 2018
Thanks. So if I understood correctly, can I summarize it in this way?

Asked: Pey on 11 May 2018
