problem in this code

Hi,
I have been running this code for more than 4 hours and it has not completed yet. Where is the problem?
I read 1000 files, but the running time is unreasonable:
%%%%%%%%%%%%%%%%%%5
arr1=sparse(1000,232944);
targetdir = 'd:\social net\dataset\netflix\netflix_2\training_set';
%%nofusers=480189
targetfiles = '*.txt';
fileinfo = dir(fullfile(targetdir, targetfiles));
for i = 1:1000
thisfilename = fullfile(targetdir, fileinfo(i).name);
f = fopen(thisfilename,'r');
c = textscan(f, '%f %f %s', 'Delimiter', ',', 'headerLines', 1);
fclose(f);
c1=sparse(length(c));c2=sparse(length(c1));c3=sparse(length(c));
c1 = c{1};
c3=c{3};
L(i)=length(c1);
format long
dat=round(datenum(c3,'yyyy-mm-dd'));
arr=[c1 dat];
arr1(i,1:L(i)*2)=reshape(arr.',1,[]);
end

10 Comments

Fangjun Jiang on 22 Nov 2011
Please format your code.
Image Analyst on 22 Nov 2011
Put disp(i) in the loop and see how many i's it prints out to the command line. You might also see if the printouts start to slow down as the count gets higher.
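A minimal sketch of that check, timing each iteration with tic/toc rather than only printing the counter. The iteration count and the pause call are hypothetical placeholders for the real per-file work; they are not part of the original code.

```matlab
nIter = 5;                      % hypothetical; the question loops over 1000 files
elapsed = zeros(nIter, 1);      % seconds spent in each iteration
for i = 1:nIter
    t = tic;
    pause(0.01);                % placeholder for reading and storing one file
    elapsed(i) = toc(t);
    fprintf('iteration %d took %.3f s\n', i, elapsed(i));
end
```

If the printed times grow steadily with i, the cost is coming from something that accumulates across iterations rather than from any single file.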
huda nawaf on 23 Nov 2011
Thanks,
I formatted it:
arr1=sparse(1000,232944);
targetdir = 'd:\social net\dataset\netflix\netflix_2\training_set';
%%nofusers=480189
targetfiles = '*.txt';
fileinfo = dir(fullfile(targetdir, targetfiles));
for i = 1:1000
thisfilename = fullfile(targetdir, fileinfo(i).name);
f = fopen(thisfilename,'r');
c = textscan(f, '%f %f %s', 'Delimiter', ',', 'headerLines', 1);
fclose(f);
c1=sparse(length(c));c2=sparse(length(c1));
c3=sparse(length(c));
c1 = c{1};
c3=c{3};
L(i)=length(c1);
format long
dat=round(datenum(c3,'yyyy-mm-dd'));
arr=[c1 dat];
arr1(i,1:L(i)*2)=reshape(arr.',1,[]);
end
huda nawaf on 23 Nov 2011
I printed i.
i is the number of files I have read so far.
Yes, after some files have been read the run slows down.
Thanks
huda nawaf on 23 Nov 2011
To clarify, i is the ID of the file.
Image Analyst on 23 Nov 2011
No, "f" is the id of the file. "i" is your loop counter. So it just slows down more and more at each iteration until it finally grinds to a halt? I can't really help much since I don't have your files. How big does i get before it takes more than about 5 seconds per iteration? Why do you need to reshape arr? Why can't you just construct it in the correct shape to begin with?
the cyclist on 23 Nov 2011
I have not looked at your code in detail, but is it possible that as your code runs, you are using more and more memory? Maybe after a while, you are starting to use virtual memory, which will slow everything down dramatically. You can monitor that.
huda nawaf on 23 Nov 2011
I have 17777 files of different sizes, but I prefer to read 1000 at a time because of the long running time.
I checked some of these files and found that they are a few KB each, but I can tell you that the largest file contains 232944 integer values.
I used textscan because the files are structured like this:
1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26
Thanks
huda nawaf on 23 Nov 2011
Sorry, I forgot to format it:
1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26
Daniel Shub on 23 Nov 2011
Formatting doesn't work in comments (but thanks for trying).


Accepted Answer

Daniel Shub on 23 Nov 2011

1 vote

On every iteration you create 3 sparse matrices:
c1=sparse(length(c));c2=sparse(length(c1));c3=sparse(length(c));
You then overwrite 2 of them and never use the third:
c1 = c{1};
c3=c{3};
The variable L is growing in the loop. This probably doesn't matter since it is not that long ...
L(i)=length(c1);
I believe the datenum function is slow (search for Jan Simon and datenum for answers with faster alternatives):
dat=round(datenum(c3,'yyyy-mm-dd'));
This bit of code looks crazy to me:
arr1(i,1:L(i)*2)=reshape(arr.',1,[]);
First, I have no idea how it doesn't crash since I think arr should have length L(i)+1, which only equals L(i)*2 if L(i) is equal to 1. You initialized arr1 to be a sparse matrix with a huge number of columns (but it seems like you only use 1). Also, it is unclear why you want arr1 to be sparse. A cell array might be better.
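
As an illustration of the datenum point, one possible way to avoid the string-parsing form is to extract the numeric year, month and day first and call datenum with numeric inputs. This is only a sketch under the assumption that every date is strictly in yyyy-mm-dd form; it is not necessarily the approach from the answers mentioned above, and the sample dates are taken from the file excerpt in the comments.

```matlab
% Sample date strings in the same layout as the third column of the files
c3 = {'2005-09-06'; '2005-05-13'; '2005-10-19'; '2005-12-26'};

% Concatenate all strings and parse them in one sscanf call.
% The result is 3-by-N (year; month; day), so transpose to N-by-3.
ymd = sscanf(sprintf('%s', c3{:}), '%4d-%2d-%2d', [3 Inf]).';

% Numeric inputs skip the format-string parsing inside datenum
dat = datenum(ymd(:,1), ymd(:,2), ymd(:,3));
```

Since the dates carry no time of day, the resulting serial date numbers are already whole numbers, so the round call from the original loop would not change them.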

3 Comments

huda nawaf on 23 Nov 2011
Thanks.
L can be very long in some iterations; L*2 is the length of the row built from the file.
In each iteration I read one file, and the file sizes differ.
It seems to me you are not comfortable with this bit of code:
arr1(i,1:L(i)*2)=reshape(arr.',1,[]);
In this step, I try to convert each file into one row of the array.
I gave the structure of my files above: each line has three values, and I need just two of them,
so the length of the row will be L*2.
If you have any suggestion for how I can convert the values of each file into a row of an array, I will be grateful.
As for why I used sparse: without sparse, I run into an out-of-memory problem.
Daniel Shub on 23 Nov 2011
Thank you, I missed the reshape part. The reason I ask about the sparse matrix is that the non-zero elements are not distributed throughout the matrix. For each row i, only the first N_i elements will possibly be non-zero. By using a sparse matrix, the memory required for the matrix changes on each iteration (and MATLAB needs to allocate and copy the entire sparse matrix). If you use a cell array, then MATLAB only has to allocate space for the new 2L elements and doesn't have to copy anything. In the end you will have the same number of nonzero elements and will essentially use the same amount of memory. The cell array will probably use less memory than the sparse matrix.
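A small sketch of the cell-array idea: each cell holds one row of whatever length that file needs, so no common column count has to be chosen in advance. The row lengths and the rand data below are hypothetical stand-ins for 2*L(i) and the real file contents.

```matlab
rowLengths = [8 2 6];                 % hypothetical stand-ins for 2*L(i)
arr1 = cell(numel(rowLengths), 1);    % one (initially empty) cell per file
for i = 1:numel(rowLengths)
    % Only the new row is allocated; earlier cells are left untouched
    arr1{i} = rand(1, rowLengths(i));
end
storedLengths = cellfun(@numel, arr1).';   % length of each stored row
```

Every row stays available afterwards as arr1{1}, arr1{2}, and so on, each with its own length.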
huda nawaf on 23 Nov 2011
Sorry, I have not used cell arrays before.
Please tell me, what about the number of columns in each row (232944)? In
arr1 = cell(1000, 1);
why is it 1 instead?
Also, regarding
arr1{i} = reshape(arr.',1,[]);
how can I save the values of the other files? It seems this will keep just the current values.
Note that very few files may actually need 232944 columns, but I have to assign this dimension to the matrix; I have no other choice.


More Answers (1)

Daniel Shub on 23 Nov 2011

0 votes

I would try replacing
arr1=sparse(1000,232944);
with
arr1 = cell(1000, 1);
and
arr1(i,1:L(i)*2)=reshape(arr.',1,[]);
with
arr1{i} = reshape(arr.',1,[]);
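
Put together, the loop with these two replacements might look like the sketch below. To keep it self-contained it writes two tiny sample files into a temporary folder instead of reading the real training set; the file names, the sample contents and the "1:" style header line are assumptions made for illustration. L is also preallocated and the unused sparse temporaries are dropped.

```matlab
% Create two small sample files (hypothetical data in the
% "user,rating,date" layout from the question, one header line each)
targetdir = tempname;
mkdir(targetdir);
samples = {sprintf('1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n'), ...
           sprintf('2:\n30878,4,2005-12-26\n')};
for k = 1:numel(samples)
    fid = fopen(fullfile(targetdir, sprintf('sample_%d.txt', k)), 'w');
    fprintf(fid, '%s', samples{k});
    fclose(fid);
end

fileinfo = dir(fullfile(targetdir, '*.txt'));
nFiles = numel(fileinfo);
arr1 = cell(nFiles, 1);   % one cell per file; rows may differ in length
L = zeros(nFiles, 1);     % preallocated instead of growing in the loop
for i = 1:nFiles
    f = fopen(fullfile(targetdir, fileinfo(i).name), 'r');
    c = textscan(f, '%f %f %s', 'Delimiter', ',', 'HeaderLines', 1);
    fclose(f);
    c1 = c{1};                                  % user ids
    dat = round(datenum(c{3}, 'yyyy-mm-dd'));   % serial date numbers
    L(i) = numel(c1);
    arr1{i} = reshape([c1 dat].', 1, []);       % [id1 date1 id2 date2 ...]
end
rmdir(targetdir, 's');    % remove the temporary sample files
```

After the loop, arr1{i} holds the row for file i with 2*L(i) entries, and all earlier rows remain stored in their own cells.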
