Appending to a saved dataset

Anathea Pepperl, 20 June 2011
I'm trying to read data from a text file, do some data analysis, save the results in a dataset, and export my dataset into a .dat file using the export function.
The problem arises when I have several text files and end up with well over 100,000 observations and about 200 parameters. Right now I read the data from each text file, save the analysis results in an interim dataset, concatenate the interim dataset onto my complete dataset, and call the export function once at the very end. So my code looks something like:
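complete_ds = [];
for i = 1:numel(textfiles)
    current_file = textfiles{i};       % assumes textfiles is a cell array of file names
    fid = fopen(current_file);
    data = ReadFile(fid);              % user-defined: parses the text file into a matrix
    fclose(fid);
    interim_ds = AnalyzeData(data);    % user-defined: returns a dataset array
    complete_ds = vertcat(complete_ds, interim_ds);   % grows and copies on every pass
end
export(complete_ds, 'file', 'Allmydata.dat');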
This is taking a lot of time and I'd like to be able to append to the exported dataset instead. Any suggestions? Also, I know that preallocating may help, but it is difficult to predict how much memory I want to set aside for the dataset since each text file may have a different number of observations.
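One way to avoid regrowing complete_ds on every pass, without knowing the row counts in advance, is to keep each interim dataset in a cell array and concatenate once after the loop. The following is a minimal sketch, assuming the Statistics Toolbox dataset class is available. fake_analyze is a hypothetical stand-in for ReadFile plus AnalyzeData, and the row counts are made up.

```matlab
% Hypothetical stand-in for ReadFile + AnalyzeData: returns a dataset
% with n rows and two variables, x and y.
fake_analyze = @(n) dataset({rand(n, 2), 'x', 'y'});

rows_per_file = [3 5 2];                  % made-up row counts, one per "file"
parts = cell(numel(rows_per_file), 1);    % one slot per file, sized up front

for i = 1:numel(rows_per_file)
    parts{i} = fake_analyze(rows_per_file(i));   % store only, no concatenation
end

complete_ds = vertcat(parts{:});          % single concatenation at the end
```

Only the cell array of handles is sized up front, so each file can contribute a different number of observations.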
  3 comments
Image Analyst, 21 June 2011
How many text files? How much time? Minutes? Hours? What is the difference between observations and parameters (if that matters)? You can take a guess at preallocating by looking at the file size. If you have 50,000 lines (estimated from a file size of, say, 50 kb), then preallocating say 40 or 50 thousand rows in the array would be faster than allocating none at all, even if you have to extend it a few rows or truncate it a few rows because you didn't use them all. Inside AnalyzeData(), can you possibly estimate the number of rows that interim_ds will need?
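The file-size idea can be sketched roughly as follows. The sample file, its two-column comma-separated layout, and the bytes_per_line figure are all assumptions made for illustration; a plain numeric matrix stands in for the real results.

```matlab
% Write a small sample file so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
for k = 1:1000
    fprintf(fid, '%d,%f\n', k, rand);
end
fclose(fid);

bytes_per_line = 12;                      % assumed average; tune for your files
info = dir(fname);                        % info.bytes is the file size
est_rows = ceil(info.bytes / bytes_per_line);

results = NaN(est_rows, 2);               % preallocate from the estimate
n = 0;                                    % rows actually filled
fid = fopen(fname, 'rt');
tline = fgetl(fid);
while ischar(tline)                       % fgetl returns -1 at end of file
    n = n + 1;
    if n > size(results, 1)               % estimate too low: double the capacity
        results = [results; NaN(size(results))];
    end
    results(n, :) = sscanf(tline, '%d,%f')';
    tline = fgetl(fid);
end
fclose(fid);

results = results(1:n, :);                % truncate the unused rows
delete(fname);
```

If the estimate is too low the array doubles, so a poor guess costs a few extra copies rather than one per row.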
Anathea Pepperl, 21 June 2011
Jan, ReadFile is a function that I would use to read the text file and convert it into a MATLAB matrix for easier "digestion" by the AnalyzeData function. If the data were put into a regular text file, it wouldn't be so bad; however, my data has missing values, which are not handled well when put into a text file. Hence the need to use the dataset array (unique to the Statistics Toolbox).
Image Analyst, thanks for reminding me that I can look at the file size! This is probably going to be the best option for me.


Answers (1)

Matt Tearle, 21 June 2011
If it just comes down to "I'd like to be able to append to the exported dataset instead", then here's one way to do it, but it's a bit of a nasty hack...
  1. Find the directory $MATLAB\toolbox\shared\statslib\@dataset (where $MATLAB is your installation directory, e.g. C:\Program Files\MATLAB\R2011a).
  2. Copy the entire @dataset directory to somewhere local.
  3. Inside @dataset, make a copy of export.m and call it export_app.m (or whatever).
  4. Edit export_app.m. On line 1, change export to export_app. Change line 169 (in R2011a, at least -- it might be slightly different in other releases) from fid = fopen(filename,'wt'); to fid = fopen(filename,'at'); Save the file.
Then
>> export(x1,'file','testappend.dat')
>> export_app(x2,'file','testappend.dat','WriteVarNames',false)
should work for you.
Note, though, that you're now using a local version of the dataset class, so funky instabilities may ensue... Use with caution! Probably best to hide it away in a directory somewhere and go into that directory only for this purpose!
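If a local copy of the @dataset class feels too fragile, a similar result can probably be had without touching any toolbox files: export each new dataset to a scratch file without variable names, then append that file's text to the main file. This is a different technique from the hack above, sketched on the assumption that export ends every row with a newline. x1, x2 and the file names are made up.

```matlab
x1 = dataset({[1 2; 3 4], 'a', 'b'});          % made-up data, 2 rows
x2 = dataset({[5 6; 7 8; 9 10], 'a', 'b'});    % made-up data, 3 rows

main_file = [tempname '.dat'];
scratch   = [tempname '.dat'];

export(x1, 'file', main_file);                         % header plus first rows
export(x2, 'file', scratch, 'WriteVarNames', false);   % rows only, no header

txt = fileread(scratch);             % raw text of the new rows
fid = fopen(main_file, 'a');         % 'a' appends instead of overwriting
fprintf(fid, '%s', txt);
fclose(fid);
delete(scratch);
```

Inside a loop over the text files, only one interim dataset needs to be in memory at a time, at the cost of one extra small file write per pass.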
