Are there any suggestions for writing files in a timewise sequential manner when the files are created by a parfor loop?

Thank you for any comments and help you may have for my question. Please feel free to comment on coding inefficiencies, etc., that you see.
I can successfully get stock data for individual days of a specified year using a parfor statement in the getYearlyData function below. The parfor loop variable indexes a vector of date strings obtained from the MATLAB busdays function for the specified year. The parfor loop (in getYearlyData) calls another function, getUSStocksEOD, which obtains the data using the MATLAB webread function and writes it to a directory created in getYearlyData. Code excerpts for both functions are provided.
The parfor loop in getYearlyData writes the files in a manner that is not sequential by time. The date for the data to be obtained is taken from the dateIndex vector (created by the MATLAB busdays function) in the first block of code below and passed to the second block by the parfor loop's call to getUSStocksEOD. The file writing is performed by the MATLAB save function in the second block of code (the getUSStocksEOD function).
My understanding of how parfor loops work is minimal. I am just happy when they work. I have a 16 core processor so I use 28 workers in the parallel pool. The MATLAB disp function within the parfor loop indicates that the first 28 indices/dates used by the parfor loop are sequential. After that they are non-sequential. I am assuming that this non-sequentiality arises because the order depends on when each worker finishes its call to getUSStocksEOD. Is this a correct assumption?
I know I can get date sequentiality of the data files by prepending an incrementing value, from 01 to length(dateIndex), to the date string used in the filename that is created in getUSStocksEOD. I am wondering if there is a slick way to get this sequentiality by modifying how the parfor loop works, so that the files for the dates in dateIndex are created in sequential time order?
getYearlyData function code excerpt:
existingYears = struct2table(dir('E:\US_Stocks\'));
% Remove the '.' and '..' table rows from the results of the dir command so
% that all rows in the table are datafile names.
existingYears([1,2],:) = [];
existingYears = sortrows(existingYears, 1,'ascend');
beginYear = 1999;
yearIndex = 2019; %str2double(cell2mat(existingYears{1, 1})) - 1;
dateIndex = string(datestr(busdays(strcat('01/01/',string(yearIndex)),strcat('02/28/',string(yearIndex)))));
directory = strcat('E:\US_Stocks\',string(yearIndex));
if yearIndex > beginYear
    mkdir(directory);
    cd(directory);
    j = length(dateIndex);
    parfor i = 1:j
        dataDate = dateIndex(i);
        disp(dataDate)
        getUSStocksEOD(string(dataDate),string(yearIndex),i);
    end
end
getUSStocksEOD function code excerpt:
datetime.setDefaultFormats('defaultdate','dd-MMM-yyyy')
% inputDate = "06-27-2019"; %this datetime format is desired for timetables and Financial Toolbox
EOD_Date = datestr(inputDate,'yyyy-mm-dd'); % this datetime format is desired for the
% API call to EODHistoricalData.com
url1 = 'http://eodhistoricaldata.com/api/eod-bulk-last-day/US?&api_token=xxxxxxxxxx';
url2 = strcat('&date=',EOD_Date,'&fmt=json&filter=extended');
url3 = strcat(url1, url2);
% Reads input from EODHistoricalData.com and puts it in the structure jsonData
jsonData = webread(url3, weboptions('timeout',180));
disp(string(inputDate))
% Convert json structure from webread to table.
jsonData = struct2table(jsonData);
% Get various properties from jsonData
[rows,cols] = size(jsonData);
varNames = jsonData.Properties.VariableNames;
varTypes = {'string', 'string', 'string', 'datetime',...
'double', 'double', 'double', 'double', 'double', 'double', 'double',...
'double', 'double', 'double', 'double', 'double', 'double', 'double'};
% Create second table to contain data converted to correct type. This
% avoids errors if one were to try to use:
% jsonData.MarketCapitalization = str2double(jsonData.MarketCapitalization)
% or other variablename that may contain nulls, [], etc. The data
% stream fed by EOD Historical Data is replete with invalid values.
workingTable = table('Size',[rows,cols],'VariableNames',varNames,'VariableTypes',varTypes);
workingTable.code = jsonData.code;
workingTable.name = jsonData.name; % Cannot convert to string due to [] character being present
workingTable.exchange_short_name = jsonData.exchange_short_name;
workingTable.date = datetime(jsonData.date);
workingTable.MarketCapitalization = jsonData.MarketCapitalization;
workingTable.open = (jsonData.open);
workingTable.high = (jsonData.high);
workingTable.low = (jsonData.low);
workingTable.close = (jsonData.close);
workingTable.adjusted_close = (jsonData.adjusted_close);
workingTable.ema_50d = str2double(jsonData.ema_50d);
workingTable.ema_200d = str2double(jsonData.ema_200d);
workingTable.hi_250d = str2double(jsonData.hi_250d);
workingTable.lo_250d = str2double(jsonData.lo_250d);
workingTable.avgvol_14d = str2double(jsonData.avgvol_14d);
workingTable.avgvol_50d = str2double(jsonData.avgvol_50d);
workingTable.avgvol_200d = str2double(jsonData.avgvol_200d);
% Save workingTable to data directory for US Stocks as a timetable
workingTable = table2timetable(workingTable, 'RowTimes', 'date');
dataDirectory = strcat('E:\US_Stocks\',inputYear);
fileString = strcat('US_Stock_Data','_',inputDate);
% fileString = strcat('US_Stock_Data','_',string(index),'_', inputDate);
save(strcat(dataDirectory,'\',fileString,'.mat'),'workingTable', '-v7.3');
Please note that this code is a work in progress. It has some duplicated functionality between the two functions and does not yet have error trapping or the code needed to deal with directories that already contain data files. At the moment I am focusing on making the files written by the parfor loop sortable so that the file names display sequentially by the date written into the file name.
Thank you for any help
3 Comments
Mark Smith on 24 Sep 2019
Thank you Edric. In retrospect, your answer about using a different date string format in the file names is so obvious. I can use this answer readily. I consider this question answered.
Edric Ellis on 26 Sep 2019
No problem - it's a trick I use all the time - not necessarily related to parallel computing at all.


Accepted Answer

Mark Smith on 25 Sep 2019
Edited: Edric Ellis on 26 Sep 2019
Edric Ellis provided a great answer in the first comment. I am using this "answer" so that I can accept the answer and close this question if I have the power.
EDIT: Here's the answer (moved from comment):
One way to force that is to use yyyy-MM-dd format dates - these will be listed in order regardless of the order in which the files were created.
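As a small illustration of why this works, here is a sketch using three made-up dates listed out of order, much as parfor workers might finish them. The "US_Stock_Data_" prefix is taken from the question; the dates themselves are assumptions for the example only.

```matlab
% Hypothetical dates, deliberately out of chronological order.
d = datetime(2019, [2 1 1], [1 15 2]);   % 1 Feb, 15 Jan, 2 Jan

% yyyy-MM-dd names: alphabetical order is also chronological order.
isoNames  = "US_Stock_Data_" + string(d, 'yyyy-MM-dd');
sortedIso = sort(isoNames);

% dd-MMM-yyyy names: alphabetical order follows the day number first,
% so 1 Feb sorts ahead of both January dates.
dmyNames  = "US_Stock_Data_" + string(d, 'dd-MMM-yyyy');
sortedDmy = sort(dmyNames);

disp(sortedIso(:))
disp(sortedDmy(:))
```

A directory listing sorted by name behaves the same way as sort here, which is why the creation order of the files stops mattering.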
If you're using serial date numbers, then datestr(..., 29) returns that format:
>> datestr(busdays('1/1/2019', '2/2/2019'), 29)
ans =
22×10 char array
'2019-01-02'
'2019-01-03'
'2019-01-04'
'2019-01-07'
'2019-01-08'
'2019-01-09'
'2019-01-10'
'2019-01-11'
'2019-01-14'
'2019-01-15'
'2019-01-16'
'2019-01-17'
'2019-01-18'
'2019-01-22'
'2019-01-23'
'2019-01-24'
'2019-01-25'
'2019-01-28'
'2019-01-29'
'2019-01-30'
'2019-01-31'
'2019-02-01'
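Applied to the file name built in getUSStocksEOD, this could look roughly like the sketch below. It assumes inputDate arrives as a dd-mmm-yyyy string such as "02-Jan-2019", uses tempdir as a stand-in for the E:\US_Stocks\<year> directory, and saves an empty placeholder table instead of the real timetable. Since the question's code already computes EOD_Date in yyyy-mm-dd form for the API call, reusing that variable in the file name would presumably achieve the same thing.

```matlab
% Assumption: inputDate looks like the strings produced in getYearlyData.
inputDate = "02-Jan-2019";

% Convert to yyyy-mm-dd (datestr format 29) for the file name.
isoDate = datestr(datenum(char(inputDate), 'dd-mmm-yyyy'), 29);

% tempdir is a stand-in for the real data directory.
dataDirectory = tempdir;
fileName = fullfile(dataDirectory, ['US_Stock_Data_' isoDate '.mat']);

% Placeholder for the timetable built from the web data.
workingTable = table();
save(fileName, 'workingTable', '-v7.3');
disp(fileName)
```

With names of this form, sorting the directory by name lists the files in date order no matter which parfor worker finished first.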
