How can I improve MATLAB performance?

When I import a large amount of data (using the wizard) it takes a very long time to complete. For example, importing a 3.6 GB text file takes more than 5 minutes. Then, when I try to plot some of that data, it takes an additional few minutes. If I try to 'probe' the data using the Data Cursor, it once again takes minutes for the data value to appear!
I need to reduce this lag from minutes to seconds, as it's making the use of MATLAB unbearable.
Relevant info: Win7 Pro 64-bit, i5-2400 CPU (3.4 GHz with turbo, 4 cores, 4 logical processors), 8 GB RAM, 500 GB HDD @ 7200 RPM, MATLAB R2011b.
While the data was being imported I noticed that 25% of the CPU was being used on average (one core maxed), just over 4 GB of RAM in use, and a 5-10 MB/s HDD read rate.
What do I need to do to improve performance? Is there an option I'm unaware of inside MATLAB? Should I buy an SSD? Perhaps upgrade to 16 GB?

Answer (1)

per isakson
per isakson on 5 Sep 2012
Edited: per isakson on 5 Sep 2012

0 votes

Reading the text files
  1. Windows Task Manager and Resource Monitor. Does "Physical Memory, Free" decrease to zero during reading? The Windows file cache is hard to understand - IMO.
  2. I guess (based on googling) that an SSD will improve reading speed by something like a factor of three. However, that does not apply to your text files - reading them is CPU bound.
  3. 16 GB. It is a serial read, thus it is no problem that old "chunks" are removed from the file cache. I guess it won't help much for reading.
What does your use case look like?
Spontaneously, I would say:
  1. Transfer the data of the text file to a binary file in an unattended batch job.
  2. Make a test with HDF5 and the high-level API. The low-level API is too low-level for this.
  3. Make a test with saving the data to a version 7.3 MAT-file and accessing it with the function matfile.
  4. SQLite might be an alternative, see http://www.sqlite.org/ and http://mksqlite.berlios.de/.
  5. Reading binary files from an SSD will be fast.
However, I don't know what kind of data you have.
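To illustrate point 1, here is a minimal sketch of converting a tab-delimited text file with 5 numeric columns to a flat binary file and reading it back. The file names and chunk size are placeholders, not from the thread:

```matlab
% Convert a 5-column tab-delimited text file to a flat binary file,
% reading in chunks so memory use stays bounded.
fin  = fopen('data.txt', 'r');
fout = fopen('data.bin', 'w');
while ~feof(fin)
    C = textscan(fin, '%f%f%f%f%f', 1e6, ...
                 'Delimiter', '\t', 'CollectOutput', true);
    fwrite(fout, C{1}.', 'double');   % transpose: 5 values per sample on disk
end
fclose(fin);  fclose(fout);

% Later, the whole file (or a slice) loads far faster than text parsing:
fid  = fopen('data.bin', 'r');
data = fread(fid, [5, Inf], 'double').';  % back to an N-by-5 matrix
fclose(fid);
```

The one-time conversion still pays the text-parsing cost, but every subsequent session reads the binary file at close to disk speed.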
You say "Then when I try to plot some of that data it takes an additional few minutes." How many points do you try to plot? Which plotting function do you use? What does the function profile say?
Data Cursor shouldn't take that long. How many points do you have?
Conclusions:
  1. Convert the text files to binary
  2. Study your code with the function profile
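Running the profiler around the slow step is a one-liner; a minimal sketch, assuming `data` is the imported N-by-5 matrix:

```matlab
profile on
plot(data(:,1), data(:,5))   % the operation under investigation
profile viewer               % interactive report of time spent per function
```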
--- In response to comments 1 to 5 ---
[...] I'm partially too lazy to do so.
In my comments below I'll try to remember that you want to focus on your domain data rather than clever Matlab code.
I collect 5 values (time, voltage, etc) every 20ms for 10 days, so approximately 5 million points.
  1. I still think you should convert the text files to binary. With 5 long time series, fwrite/fread is an alternative. SQLite is not appropriate and HDF5 might be overkill. How many such 3.6 GB files will you have during the next year? If many, will you need to revisit old data sets?
  2. I bet nobody at The MathWorks ever thought anybody would try to plot a time series with 5e6 elements :). That is thousands of elements per pixel on the screen - nobody would want that. However, several years ago I made a tool to "browse" time series data. With typically 5e4 elements I had real problems with the response times. My solution in short: i) use the graphics functions to show a "time-window", ii) keep the full data outside the Handle Graphics objects, iii) update the data on display with set( line_handle, 'XData', X(time_window), ... ). This approach works very well and I still use my Databrowser. I don't know if the difference is as large with recent releases of Matlab. As a side effect, Datacursor will become quicker. Con: it certainly takes some effort to develop a tool.
  3. Second thought: would the accuracy suffer if you down-sample the signals? "filtfilt, zero-phase digital filtering" or something similar, in an unattended batch job. Why not read the text, down-sample, and save a few different versions of the data to binary files?
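The "time-window" idea in point 2 can be sketched as follows. This is illustrative only: the variable names, the window start `t0`, and the window length `winLen` are assumptions, not code from the Databrowser tool mentioned above:

```matlab
% Keep the full series in workspace variables; push only the visible
% window into the line object.
t = data(:,1);  q = data(:,5);
t0 = t(1);  winLen = 3600;                % e.g. a one-hour window
win = t >= t0 & t < t0 + winLen;          % logical index of the window
h = plot(t(win), q(win));

% Scrolling: update the existing line instead of re-plotting everything.
t0  = t0 + winLen;
win = t >= t0 & t < t0 + winLen;
set(h, 'XData', t(win), 'YData', q(win));
```

Because the line object only ever holds one window's worth of points, redraws and Data Cursor hits stay cheap regardless of the full series length.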
but the Data Cursor (and zoom, highlight, etc) delay is not something I can allow to endure.
  1. I don't think the standard Zoom, Datacursor, etc. can handle your use case.
  2. In my Databrowser I have "replaced" Zoom by changing "time_window". I also have a search function, which allows me to jump to the next "time_window" in which some condition is true (e.g. find_peak).
upgrading the video/graphics
  1. Ten years ago the graphic card did certainly matter even for 2D.
  2. Does Matlab throw "5 million points" at the graphic card? Sounds scary, but I have no idea.
used the Data Cursor I saw the "Physical Memory Free" drop to nearly zero then jump to 8 GB
  1. That tells me the Datacursor is not designed to work on this kind of time series.
  2. Did the graph in Physical Memory Usage History increase gradually and then drop quickly? Speculation: if so, does that mean that Datacursor creates and releases a temporary variable of that size?
  3. I once customized Datacursor to work on one year of hourly data. I never succeeded in making it responsive.
Star Strider: "use the findpeaks function to identify peaks"
  1. This is a good point
  2. In the File Exchange there are a number of find-peak contributions, which might be alternatives to the Signal Processing Toolbox. Some of them comprise a GUI and plotting.
  3. However, one should not forget to inspect the data.

6 Comments

Daniel T
Daniel T on 5 Sep 2012
Edited: Daniel T on 5 Sep 2012
Windows Task Manager and Resource Monitor. Does "Physical Memory, Free" decrease to zero during reading?
Yes, "Physical Memory Free" decreases to zero in the first 30 seconds of reading.
You say "Then when I try to plot some of that data it takes an additional few minutes." How many points do you try to plot? Which plotting function do you use? What does the function profile say?
I collect 5 values (time, voltage, etc) every 20ms for 10 days, so approximately 5 million points. The file is tab delimited so the data is kept in 5 separate columns. Using "data" as the name for the data the plotting was invoked like so:
plot( data(:,1) , data(:,5) )
This plots time vs. coulombs (I'm charging and discharging a battery). It ends up making a sinusoid shape in this particular case, as expected. When I try to label the peaks and valleys of the 'sinusoid' using the Data Cursor, it's not possible in a reasonable amount of time.
I can accept the initial importing delay and even the plotting delay, but the Data Cursor (and zoom, highlight, etc) delay is not something I can allow to endure.
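One quick experiment worth trying here (an editor's sketch, assuming `data` is the imported N-by-5 matrix, not a suggestion from the thread) is plotting a decimated copy; if the figure then becomes responsive, the point count itself is the bottleneck:

```matlab
% Plot every 100th sample (~50k points instead of ~5 million)
plot(data(1:100:end,1), data(1:100:end,5))
```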
Greg Heath
Greg Heath on 5 Sep 2012
What about writing code to find and mark the peaks and valleys?
Jason Ross
Jason Ross on 5 Sep 2012
Per makes a number of great points. I would seriously consider the 16 GB RAM route and an SSD.
With the SSD, you should also consider upgrading the drive controller to 6 Gb/s (SATA III). Access time on the drive will change from something like 10 ms to 0.1 ms, although as Per indicates, this will only get you so far because you are waiting on the CPU.
The RAM will let you essentially operate without a swap file, and I wouldn't be surprised if the cost of 16 GB was around $75 for your system. That's going to make things all-around peppy, especially since your swap file will also be on the SSD.
The other suggestions for changing the file type are great, too.
Daniel T
Daniel T on 5 Sep 2012
What about writing code to find and mark the peaks and valleys?
The short answer is, in this case, I'm partially too lazy to do so. Additionally, I don't want to mark ALL the peaks and valleys, nor do I intend to mark the same specific peaks and valleys from file to file. I'm using MATLAB as a data probing tool in this case. No code is written.
Per makes a number for great points. I would seriously consider the 16GB RAM route and SSD.
I just added 8 GB to my computer, now totaling 16 GB. When the file is importing, "Physical Memory Free" no longer decreases to zero; instead there is ~5 GB remaining. I haven't noticed any obvious speed-up, but at least there have been no 'stalls' or crashes. I should note that when I used the Data Cursor I saw the "Physical Memory Free" drop to nearly zero then jump to 8 GB. It behaves in this manner every time I add a new data cursor point. It currently (with 16 GB) takes ~40 seconds for the Data Cursor label to appear once I click on the figure.
I am in discussions about upgrading to a 120 GB SSD at the moment. Our IT guy at the company suggested first upgrading the video/graphics card with one we have lying around, so I'll try that, but I predict it will have no effect given this is a simple 2D plot.
Star Strider
Star Strider on 5 Sep 2012
If you have the Signal Processing Toolbox, use the findpeaks function to identify peaks and, with another line or two of code, the valleys as well. The function has a number of options to deal with noisy data and other constraints.
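A minimal sketch of that suggestion, assuming `data` is the imported N-by-5 matrix; the `'MinPeakDistance'` value is a placeholder that would need tuning to the actual charge/discharge period:

```matlab
% Find peaks and valleys with findpeaks (Signal Processing Toolbox).
y = data(:,5);
[pks,  locs ] = findpeaks( y, 'MinPeakDistance', 500);  % peaks
[vls,  vlocs] = findpeaks(-y, 'MinPeakDistance', 500);  % valleys = peaks of -y
plot(data(:,1), y, ...
     data(locs,1),  pks,  '^', ...
     data(vlocs,1), -vls, 'v')
```

This marks every extremum programmatically, sidestepping the per-click Data Cursor delay entirely.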
per isakson
per isakson on 5 Sep 2012
Edited: per isakson on 5 Sep 2012
Bump: See my response above.

Asked: 5 Sep 2012
