Quicker way than a for loop for reading columns from different CSV files in the same folder
Hello everyone,
I have a folder containing a large number of files, say 10600 CSV files (useful_stator_files). Each CSV file contains a large number of columns (about 100), and the number of rows varies from 10 to 60. I am using the code below:
for stupid_k = 1:length(useful_source_files_stator)
    final_path_stator{stupid_k} = useful_source_files_stator(stupid_k).name; % take the name of the final path depending on the csv files I kept
    [SIR_AVG, SIR_MIN, SIR_MAX] = csvimport(fullfile(source_dir, final_path_stator{stupid_k}), 'columns', {'RES_INS_STA_LOG_AVG_PRI', 'RES_INS_STA_LOG_MIN_PRI', 'RES_INS_STA_LOG_MAX_PRI'}, 'noHeader', false, 'delimiter', ','); % read the 3 columns from each useful file
    [SPEED_AVG, SPEED_MIN, SPEED_MAX] = csvimport(fullfile(source_dir, final_path_stator{stupid_k}), 'columns', {'SPD_ACT_LOG_AVG_PRI', 'SPD_ACT_LOG_MIN_PRI', 'SPD_ACT_LOG_MAX_PRI'}, 'noHeader', false, 'delimiter', ','); % read the 3 columns from each useful file
    [Date_Time] = csvimport(fullfile(source_dir, final_path_stator{stupid_k}), 'columns', {'Date_Time_ms'}, 'noHeader', false, 'delimiter', ','); % read the column from each useful file
    Big_SIR_AVG{stupid_k}   = SIR_AVG;   % update big cell array
    Big_SIR_MIN{stupid_k}   = SIR_MIN;   % update big cell array
    Big_SIR_MAX{stupid_k}   = SIR_MAX;   % update big cell array
    Big_SPEED_AVG{stupid_k} = SPEED_AVG; % update big cell array
    Big_SPEED_MIN{stupid_k} = SPEED_MIN; % update big cell array
    Big_SPEED_MAX{stupid_k} = SPEED_MAX; % update big cell array
    Big_Date_Time{stupid_k} = Date_Time;
end
I have a stable path (source_dir) and a path that changes (final_path). I open each file, extract the columns I want, and keep them in cell arrays, since they are either double vectors or string vectors. I took the csvimport function from the File Exchange.
It works, but all this takes a lot of time. I am also reading another 4-5 signals besides the 7 in the code above, but that is the idea.
Answers (1)
Guillaume
4 Dec 2015
Parsing 10600 text files is always going to be slow, particularly on Windows, which probably struggles with that many files in a single directory. File I/O is probably the major bottleneck in what you're doing, and there's not much you can do about it short of using a more efficient form of storage for your data.
Parsing the same files three times (three calls to csvimport per file) is certainly not going to help. There's no guarantee that the csvimport code has been written optimally either (after a quick look, the file-reading part certainly isn't efficient). You would be much better off calling csvread (which comes with MATLAB) only once per file and doing the splitting into individual columns yourself (assuming that step is even necessary).
Preallocating your Big_* cell arrays would also help marginally.
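For illustration, a minimal sketch combining both suggestions: preallocate the collector cell arrays outside the loop and make one csvimport call per file. This assumes csvimport accepts all seven column names in a single call, which the question's own code suggests (it already passes three per call):
nFiles = numel(useful_source_files_stator);
% preallocate all collectors once, outside the loop
[Big_SIR_AVG, Big_SIR_MIN, Big_SIR_MAX, ...
 Big_SPEED_AVG, Big_SPEED_MIN, Big_SPEED_MAX, Big_Date_Time] = deal(cell(1, nFiles));
wanted = {'RES_INS_STA_LOG_AVG_PRI', 'RES_INS_STA_LOG_MIN_PRI', 'RES_INS_STA_LOG_MAX_PRI', ...
          'SPD_ACT_LOG_AVG_PRI', 'SPD_ACT_LOG_MIN_PRI', 'SPD_ACT_LOG_MAX_PRI', 'Date_Time_ms'};
for k = 1:nFiles
    fname = fullfile(source_dir, useful_source_files_stator(k).name);
    % one pass over each file instead of three
    [Big_SIR_AVG{k}, Big_SIR_MIN{k}, Big_SIR_MAX{k}, ...
     Big_SPEED_AVG{k}, Big_SPEED_MIN{k}, Big_SPEED_MAX{k}, Big_Date_Time{k}] = ...
        csvimport(fname, 'columns', wanted, 'noHeader', false, 'delimiter', ',');
end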
2 Comments
Guillaume
4 Dec 2015
Whichever function you use, the biggest and simplest speed-up you can make is to read each file once instead of three times. So ask for all your columns at once rather than in three different calls to the reading function.
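As an illustration of the read-once advice with a built-in function (a sketch, not this answer's exact suggestion): readtable, available since R2013b, parses the file a single time and lets you pick columns by header name afterwards. Unlike csvread, it also copes with the non-numeric Date_Time_ms column:
T = readtable(fullfile(source_dir, useful_source_files_stator(k).name), 'Delimiter', ',');
% the headers are valid MATLAB identifiers, so they become the table's variable names
Big_SIR_AVG{k}   = T.RES_INS_STA_LOG_AVG_PRI;
Big_SPEED_AVG{k} = T.SPD_ACT_LOG_AVG_PRI;
Big_Date_Time{k} = T.Date_Time_ms;
% ...and likewise for the MIN/MAX columns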