Efficient way to import large date sets

Question

Tulio 2013 年 12 月 10 日

0
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/109262-efficient-way-to-import-large-date-sets

編集済み: Tulio 2013 年 12 月 12 日

Hi All,

For the past couple of month I have been working with Intraday quotes for stocks and futures. I was focusing more on structing the data and running the codes for a given day, so uploading the data wasn't my main concern. However I finally got past it and now the process of putting the data into Matlab is taking forever. It's taking me half an hour to upload the data sets which contains 10 columns of numbers, 2 columns of strings (date vectors) and 1.7 million rows on average.

I have no clue of how long it is suppouse to take to get the data set uploaded. If is it too much how can I speed it up?

I have tried the basic txt import and ODBC connection to an Access database and both seemed really slow.

Can someone help me out?

Thanks

Túlio

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

per isakson 2013 年 12 月 11 日

編集済み: per isakson 2013 年 12 月 11 日

I think you can read your file in less that 20 seconds with textscan. Could you post a few lines of the file?

Tulio 2013 年 12 月 12 日

編集済み: Tulio 2013 年 12 月 12 日

MATLAB Online で開く

sure,

please find the file attached.

So how come it takes too long using the generated script( also attached)?

Thanks you

if true
% filename = 'M:\Trading Strategies\Arbitrage\Data\Ind Compra.txt';
% delimiter = ';';
% 
% %%Read columns of data as strings:
% % For more information, see the TEXTSCAN documentation.
% formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';
% 
% %%Open the text file.
% fileID = fopen(filename,'r');
% 
% %%Read columns of data according to format string.
% % This call is based on the structure of the file used to generate
% % this code. If an error occurs for a different file, try regenerating
% % the code from the Import Tool.
% dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter,  'ReturnOnError', false);
% 
% %%Close the text file.
% fclose(fileID);
% 
% %%Convert the contents of columns containing numeric strings to numbers.
% % Replace non-numeric strings with NaN.
% raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
% for col=1:length(dataArray)-1
%     raw(1:length(dataArray{col}),col) = dataArray{col};
% end
% numericData = NaN(size(dataArray{1},1),size(dataArray,2));
% 
% for col=[1,3,4,5,6,7,9,10,11,12]
%     % Converts strings in the input cell array to numbers. Replaced
%     % non-numeric strings with NaN.
%     rawData = dataArray{col};
%     for row=1:size(rawData, 1);
%         % Create a regular expression to detect and remove non-numeric
%         % prefixes and suffixes.
%         regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\.]*)+[\,]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\.]*)*[\,]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
%         try
%             result = regexp(rawData{row}, regexstr, 'names');
%             numbers = result.numbers;
%             
%             % Detected commas in non-thousand locations.
%             invalidThousandsSeparator = false;
%             if any(numbers=='.');
%                 thousandsRegExp = '^\d+?(\.\d{3})*\,{0,1}\d*$';
%                 if isempty(regexp(thousandsRegExp, '.', 'once'));
%                     numbers = NaN;
%                     invalidThousandsSeparator = true;
%                 end
%             end
%             % Convert numeric strings to numbers.
%             if ~invalidThousandsSeparator;
%                 numbers = strrep(numbers, '.', '');
%                 numbers = strrep(numbers, ',', '.');
%                 numbers = textscan(numbers, '%f');
%                 numericData(row, col) = numbers{1};
%                 raw{row, col} = numbers{1};
%             end
%         catch me
%         end
%     end
% end
% 
% %%Split data into numeric and cell columns.
% rawNumericColumns = raw(:, [1,3,4,5,6,7,9,10,11,12]);
% rawCellColumns = raw(:, [2,8]);
% 
% 
% %%Replace non-numeric cells with NaN
% R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),rawNumericColumns); % Find non-numeric cells
% rawNumericColumns(R) = {NaN}; % Replace non-numeric cells
% %%Allocate imported array to column variable names
% 
% Data_Bid = rawCellColumns(:, 1);
% Type_Bid = cell2mat(rawNumericColumns(:, 3));
% Sequence_Bid = cell2mat(rawNumericColumns(:, 4));
% ID_Bid = cell2mat(rawNumericColumns(:, 5));
% Event_Bid = cell2mat(rawNumericColumns(:, 6));
% Time_Bid = rawCellColumns(:, 2);
% Priority_Bid = cell2mat(rawNumericColumns(:, 7));
% Price_Bid = cell2mat(rawNumericColumns(:, 8))/1000000;
% Quote_Size_Bid = cell2mat(rawNumericColumns(:, 9));
% Trade_Size_Bid = cell2mat(rawNumericColumns(:, 10));
% Avaiable_Size_Bid=Quote_Size_Bid-Trade_Size_Bid;
% 
% %%Clear temporary variables
% clearvars filename delimiter formatSpec fileID dataArray ans raw col numericData rawData row regexstr result numbers invalidThousandsSeparator thousandsRegExp me rawNumericColumns rawCellColumns R;
end

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

per isakson 2013 年 12 月 11 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/109262-efficient-way-to-import-large-date-sets#answer_118014

編集済み: per isakson 2013 年 12 月 11 日

MATLAB Online で開く

Try

    len = 1e5;
    str = '2013-12-10, 00:01:02, 1,2,3,4,5,6,7,8,9,0';
    fid = fopen( 'c:\m\cssm\data_2.txt', 'w' );
    for jj = 1 : len
        fprintf( fid, '%s\n', str );
    end
    fclose( fid );
    tic
    fid = fopen( 'c:\m\cssm\data_2.txt', 'r' );
    cac = textscan( fid, '%s%s%f%f%f%f%f%f%f%f%f%f'             ... 
                       , 'CollectOutput',true, 'Delimiter', ',' );  
    fclose( fid );
    toc
    tic
    buf = cell2mat( cac{1} );
    sdn = datenum( buf, 'yyyy-mm-ddHH:MM:SS' ); 
    toc

I get

    Elapsed time is 0.782288 seconds.
    Elapsed time is 1.648444 seconds.

If you have 1GB extra ram for the data, it will scale nicely.

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

Efficient way to import large date sets

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

採用された回答

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

その他の回答 (0 件)

参考

カテゴリ

タグ

製品

Community Treasure Hunt

Efficient way to import large date sets

4 件のコメント 2 件の古いコメントを表示2 件の古いコメントを非表示

採用された回答

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

その他の回答 (0 件)

参考

カテゴリ

タグ

製品

Community Treasure Hunt

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示