Speeding Up a File Processing Script with Parfor
This video uses a different recording style from my others. Rather than recording continuously while I work, I pause recording when my code changes are taking a long time to execute or I have some repetitive editing tasks. The pausing of the recording effectively edits my video down to a shorter duration.
This lets me show you real projects and problems that typically take many hours to solve—ones that involve lots of troubleshooting, investigating, debugging, and trial-and-error thinking.
I worked on this particular example for most of a day, but the resulting video is just 90 minutes, which, yes, is still too long. Feel free to play it at a higher speed and skip around. Next time, I will try to be more aggressive in my pausing.
So, getting to the problem itself: I have a script that processes hundreds of large CSV files, each describing a graph of the connections between our website pages on a given day. It takes several minutes to load and analyze each file, and the total running time is several hours. So I want to see how much I can speed it up.
I plan to work on these aspects:
- Using the profiler to look for places I can speed up my serial code
- Using parfor on my local machine with six physical and 12 logical processors
- Making sure my filenames work on Windows® and Linux®
- Getting it working on a 128-processor network Linux cluster
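As a rough illustration of the plan above, here is a minimal sketch of a parfor loop over a folder of CSV files, with paths built using fullfile so the same code works on Windows and Linux. It is not the script from the video: the folder name, the file names, the src/dst columns, and the per-file "analysis" (just counting rows) are all hypothetical stand-ins, and the demo writes its own tiny files so it can run anywhere.

```matlab
% Hypothetical demo folder; the real script would point at the actual CSV data.
dataDir = fullfile(tempdir, 'graph_csv_demo');
if ~exist(dataDir, 'dir')
    mkdir(dataDir);
end

% Write a few tiny stand-in files (the column names src/dst are assumptions).
for d = 1:4
    demoTable = table((1:d)', (1:d)' + 1, 'VariableNames', {'src', 'dst'});
    writetable(demoTable, fullfile(dataDir, sprintf('day%02d.csv', d)));
end

% fullfile inserts the correct separator for the current platform,
% so no hard-coded '\' or '/' appears in the path.
files = dir(fullfile(dataDir, '*.csv'));

% Preallocate a sliced output variable: each iteration writes only element k.
edgeCounts = zeros(numel(files), 1);

% Iterations are independent, so they can be spread across workers.
% Without Parallel Computing Toolbox, parfor simply runs serially.
parfor k = 1:numel(files)
    T = readtable(fullfile(files(k).folder, files(k).name));
    edgeCounts(k) = height(T);   % placeholder for the real per-file analysis
end

disp(edgeCounts);
```

Before parallelizing, the serial version of the loop can be wrapped in profile on and profile viewer to see which lines dominate the running time. The key design point in the sketch is that each iteration reads one file and writes one output element, which is what lets parfor distribute the work without any communication between iterations.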
Features covered in this code-along-style video include:
(Originally posted on Stuart’s MATLAB Videos blog)
Recorded: 10 Apr 2023