Why does innerjoin not work in parfor?
I am trying to track down an error that appears when I use parfor. I found that the call to innerjoin (lines 10-12 below) causes the problem: the code runs fine in an ordinary for-loop but fails with parfor. Why does it cause a problem? I use innerjoin to randomly sample 'id' (one of the variables in my data) and merge the sample with the original dataset (dta2 here). Any ideas or solutions? Please let me know if anything needs clarifying.
parpool(4)
N_boot = 5;
coeff_out2 = zeros(N_boot,N_coef);
parfor i = 1:N_boot
dta2 = dta;
decisions2 = unique(dta2.decision_id);
Ndecisions2 = size(decisions2,1);
sampled_id01 = randsample(decisions2,Ndecisions2,true);
sampled_id2 = dataset2table(mat2dataset(sampled_id01));
sampled_id2.Properties.VariableNames{1} = 'decision_id';
resample_dta = innerjoin(sampled_id2,dta2,'Keys','decision_id');
resample_dta = table2array(resample_dta);
result1 = mean(resample_dta(:,1:4));
coeff_out2(i,:) = result1;
end
Answers (2)
Edric Ellis
8 May 2018
(x-post from identical question on stackoverflow)
Unfortunately, innerjoin uses the inputname function, which is causing the "transparency violation" error. There's a simple workaround, which is to wrap the call to innerjoin, like so:
innerjoinFcn = @(varargin) innerjoin(varargin{:});
parfor ...
...
resample_dta = innerjoinFcn(sampled_id2,dta2,'Keys','decision_id');
end
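For anyone who wants to try the wrapper end to end, here is a minimal sketch that combines it with synthetic data in the style of the other answer on this page. The data, the sizes, and the use of randi in place of randsample (to avoid a Statistics Toolbox dependency) are my assumptions, not the original poster's setup, and I have not checked it on every release.

```matlab
% Synthetic stand-in for the poster's table (assumed shape: key plus 3 numeric columns).
ids_all = randi([1 9], 50, 1);
d1 = randi([-10 10], 50, 1);
d2 = randi([-2 2], 50, 1);
d3 = randi([0 255], 50, 1);
dta = table(ids_all, d1, d2, d3, ...
    'VariableNames', {'decision_id', 'd1', 'd2', 'd3'});

N_boot = 5;
N_coef = 4;   % decision_id, d1, d2, d3
coeff_out2 = zeros(N_boot, N_coef);

% Wrapper created outside the loop, as in the workaround above.
innerjoinFcn = @(varargin) innerjoin(varargin{:});

parfor i = 1:N_boot
    ids = unique(dta.decision_id);
    % Sample ids with replacement (randi used here instead of randsample).
    decision_id = ids(randi(numel(ids), numel(ids), 1));
    % Variable names are given explicitly so nothing depends on workspace names.
    sampled_id2 = table(decision_id, 'VariableNames', {'decision_id'});
    resample_dta = innerjoinFcn(sampled_id2, dta, 'Keys', 'decision_id');
    coeff_out2(i, :) = mean(table2array(resample_dta(:, 1:4)));
end
```

Each row of coeff_out2 then holds the column means of one resampled table.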
Walter Roberson
5 May 2018
I can get further:
decision_id = randi([1 9], 50, 1);
d1 = randi([-10 10], 50, 1);
d2 = randi([-2 2], 50, 1);
d3 = randi([0 255], 50, 1);
dta = table(decision_id, d1, d2, d3);
N_coef = 4;
cp = gcp('nocreate');
if isempty(cp); parpool(4); end
N_boot = 5;
coeff_out2 = zeros(N_boot,N_coef);
parfor i = 1:N_boot
dta2 = dta;
decisions2 = unique(dta2.decision_id);
Ndecisions2 = size(decisions2,1);
decision_id = randsample(decisions2,Ndecisions2,true);
sampled_id2 = table(decision_id, 'VariableNames', {'decision_id'});
resample_dta = innerjoin(sampled_id2,dta2,'Keys','decision_id');
resample_dta = table2array(resample_dta);
result1 = mean(resample_dta(:,1:4));
coeff_out2(i,:) = result1;
end
With this version the failure now occurs at the innerjoin call rather than earlier.
The conversion to a table ran into problems when no variable names were supplied at construction. That could be explained if the variable names were not guaranteed to be the same on the workers, because by default table uses the name of the workspace variable being converted as the column name.
We could hypothesize that something similar is happening with innerjoin.
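To make the point about default column names concrete, here is a small sketch run outside any parfor loop. The values and the name 'id' are made up for illustration.

```matlab
decision_id = [3; 1; 2];

% Column name is taken from the name of the workspace variable.
t1 = table(decision_id);

% An expression has no variable name, so table falls back to 'Var1'.
t2 = table([3; 1; 2]);

% Explicit names do not depend on what the input variable is called.
t3 = table(decision_id, 'VariableNames', {'id'});

disp(t1.Properties.VariableNames)
disp(t2.Properties.VariableNames)
disp(t3.Properties.VariableNames)
```

Passing 'VariableNames' explicitly, as in the parfor code above, removes the dependence on the caller's variable names.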
I am not sure how to fix it yet, because I am still trying to work out the intent of the code, especially what should happen when multiple table rows share the same key.
Or is it safe to assume that the decision_id values are unique? If so, the call to unique would seem to be redundant.
3 comments
Walter Roberson
5 May 2018
Right, but to do this efficiently I need to know whether decision_id is unique in dta, and if it is not, what sampling with it is supposed to mean.
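The reason uniqueness matters can be seen in a tiny sketch with made-up tables: when a key value repeats on both sides, innerjoin returns every combination of the matching rows, so the result can have more rows than either input.

```matlab
% Left side: ids sampled with replacement, so id 1 appears twice.
left = table([1; 1; 2], 'VariableNames', {'decision_id'});

% Right side: id 1 also appears twice in the data; id 3 is never sampled.
right = table([1; 1; 2; 3], [10; 20; 30; 40], ...
    'VariableNames', {'decision_id', 'd1'});

joined = innerjoin(left, right, 'Keys', 'decision_id');

% Key 1 contributes 2 x 2 = 4 rows, key 2 contributes 1 row, key 3 none.
disp(height(joined))
disp(joined)
```

Whether that multiplication of rows is the intended resampling scheme is something only the original poster can confirm.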