Big Data Analysis with Linear Regression

Joao Saavedra on 14 Jul 2021
Answered: Alvaro on 29 Dec 2022
Hi all,
I am doing a project to predict how many CPUs will be needed to process a huge climate data file (.nc) in less than 2 hours (7200 s). Processed sequentially, it takes more than 100,000 seconds.
I have the entire program done to process the data sequentially and in parallel, with up to 8 workers (the limit of my CPU). The program takes the data file, which holds an entire day of climate data, and divides it into hours (25) so that it can be processed hourly. I used a stopwatch in the code to record the time taken for each number of workers.
To make the parallel processing easier to run and test, I am using a subset of the data (the entire file has more than 270,000 blocks of data).
How can I use the timings from a subset of the data to extrapolate the number of CPUs needed for the entire data file? I have been stuck on this problem all day...
Thanks in advance!

Answers (1)

Alvaro on 29 Dec 2022
It's not straightforward to calculate the number of workers that you would need to process your data in less than 2 hours.
Amdahl's law might give you a bit of a formal approach if you are looking to write down some rough calculations.
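For reference, Amdahl's law in its usual form is written out here. The symbols are introduced only for illustration: p is the fraction of the run time that can be parallelized, n is the number of workers, and S(n) is the speedup over the serial run.

$$
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}
$$

Under this model, a 14x speedup is reachable only if 1/(1 - p) > 14, that is, if more than 13/14 (about 93%) of the run time parallelizes.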
I would try to determine the number of cores you need by trial and error. Since you are looking for approximately a 14x speedup over the serial computation (100,000 s / 7200 s is about 13.9), a rough guess would be to start with 14 workers in your cluster and clock the time. This assumes that your computations are highly suited to parallelization, but, as noted above, it's likely not that simple. From there, try more or fewer cores until you fine-tune it to the time you are looking for. If you need to analyze a large number of those data files in less than 2 hours each, it could be worth doing a more thorough experiment to determine the optimal number of workers for your process.
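As a starting point for that first guess, the timings you already recorded for 1 to 8 workers could be fitted with a linear regression. This is only a rough sketch under two assumptions: that the run time follows Amdahl's law, T(n) = a + b/n (a is the serial time, b the parallelizable time, so T is linear in 1/n), and that the run time grows in proportion to the number of blocks. The timings and the subset size below are synthetic placeholders, not measurements, and all variable names are hypothetical.

```matlab
% Synthetic placeholder timings: replace with your own stopwatch results.
workers = [1 2 4 8];                 % worker counts tested on the subset
tSubset = [1000 525 287.5 168.75];   % seconds (made-up values, not measured)

% Amdahl's law is linear in 1/n: T(n) = a + b/n,
% where a is the serial time and b the parallelizable time.
X = [ones(numel(workers), 1), 1 ./ workers(:)];
coef = X \ tSubset(:);               % least-squares linear regression
a = coef(1);
b = coef(2);

% Assumption: run time grows in proportion to the number of blocks.
subsetBlocks = 2700;                 % hypothetical subset size
totalBlocks  = 270000;               % approximate size of the full file
scale = totalBlocks / subsetBlocks;
aFull = a * scale;
bFull = b * scale;

target = 7200;                       % seconds (2 hours)
if aFull >= target
    nNeeded = Inf;                   % serial part alone exceeds the target
else
    % Smallest n with aFull + bFull/n <= target
    nNeeded = ceil(bFull / (target - aFull));
end

fprintf('Estimated serial fraction: %.3f\n', a / (a + b));
fprintf('Estimated workers needed: %g\n', nNeeded);
```

If the scaled serial part alone exceeds 7200 s, the model says that no number of workers will meet the target, and the serial portion of the program has to be reduced first. Treat the result as a first guess to test on the cluster. Communication and file I/O overhead usually grow with the worker count and are not captured by this model.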


Release

R2020a
