MATLAB Answers

How to make a MAT-file that can be used to create a Datastore for MapReduce?

Asked by Mehrdad Oveisi on 19 Nov 2014
Latest activity: commented on by Oleg Komarov on 4 Dec 2014
This page tells you how to Read and Analyze Data in KeyValueDatastore for MAT-File. However, it only "shows how to create a datastore for key-value pair data in a MAT-file that is the output of mapreduce." The question is: how can you make a MAT-file yourself from which a datastore can be created?
I found the following reply by Rick Amos in another thread useful: Currently, the one very specific form of MAT-file that can be read by datastore is the output of another mapreduce call. An unofficial shortcut that creates such a MAT-file is the following code:
data.Key = {'Test'};
data.Value = {struct('a', 'Hello World!', 'b', 42)};
save('myMatFile.mat', '-struct', 'data');
ds = datastore('myMatFile.mat');
readall(ds)
This is nice to know, and it works well with one key-value pair. In the general case, how do you save multiple key-value pairs for datastore (such that readall(ds) produces multiple rows)? I have tried two alternatives without success: saving two same-sized cell arrays for keys and values, and saving one struct array of key-value pairs. Thank you!

1 Answer

Answer by Rick Amos on 24 Nov 2014
Accepted Answer

In R2014b there is currently no direct way of creating a MAT-file datastore. However, there are several indirect ways to create one in R2014b.
The first method is to use the output of a mapreduce operation. That is, create an input file 'input.txt' that has the following contents:
Filename
myMatFile.mat
mySecondMatFile.mat
Then create a 'myMapper.m' with the following contents:
function myMapper(data, ~, intermediateOutput)
    % Each row of input.txt is a file name; emit it as both key and value.
    filenames = data.Filename;
    addmulti(intermediateOutput, filenames, filenames);
end
And a 'myReducer.m' with the following contents:
function myReducer(filename, ~, finalOutput)
    % This should be changed depending on the input data.
    % It purely converts a struct array into a cell array of structs for addmulti.
    data = load(filename);
    values = num2cell(data.myStructArrayVariable);
    keys = repmat({'SomeKey'}, size(values));
    addmulti(finalOutput, keys, values);
end
With all of this in place, do:
ds = datastore('input.txt');
mapFunction = @myMapper;
reduceFunction = @myReducer;
outputFolder = '/my/output/folder';
resultDS = mapreduce(ds, mapFunction, reduceFunction, 'OutputFolder', outputFolder)
This creates a collection of MAT-files in the given output folder that contain the original data and can be used with datastore.
The second method is an unofficial shortcut to this. Do the following:
% Suppose keys and values are two arrays with the same number of elements, such as:
keys = {'TestKey1'; 'TestKey2'};
values = struct('Foo', {1,2}, 'Bar', {3,4});
% Then this stores the data in such a way that it can likely be read by datastore:
if ~iscell(keys)
    keys = num2cell(keys);
end
if ~iscell(values)
    values = num2cell(values);   % one scalar struct per cell
end
data.Key = keys(:);       % column cell array of keys
data.Value = values(:);   % column cell array of values
save('myMatFile.mat', '-struct', 'data');
ds = datastore('myMatFile.mat');
readall(ds)
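As a rough sketch of how the result of this shortcut might be checked, the snippet below writes the same two pairs and then looks up one value by key. It assumes the unofficial Key/Value layout described above is still accepted by datastore in your release (it was reported for R2014b and is not documented); the file name kvSketch.mat is hypothetical.

```matlab
% Sketch only: relies on the unofficial Key/Value MAT-file layout described above.
keys   = {'TestKey1'; 'TestKey2'};
values = num2cell(struct('Foo', {1, 2}, 'Bar', {3, 4}));  % cell array of scalar structs

data = struct();
data.Key   = keys(:);
data.Value = values(:);

matFile = fullfile(tempdir, 'kvSketch.mat');  % hypothetical file name
save(matFile, '-struct', 'data');             % saves variables Key and Value

ds = datastore(matFile);
t  = readall(ds);                             % table with variables Key and Value

% Look up the value stored under one key.
idx = strcmp(t.Key, 'TestKey2');
s   = t.Value{idx};                           % scalar struct with fields Foo and Bar
```

If datastore rejects the file in a newer release, the mapreduce-based method above remains the supported route.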

2 Comments

Mehrdad Oveisi on 24 Nov 2014
Thank you, Rick!
Oleg Komarov on 4 Dec 2014
I find this useful (thus +1), since it provides a workaround to store multiple values. However, it is not a scalable option, because each tuple of values is saved in its own scalar structure, and that structure is repeated for every row.
I have not tested it, but I think the benefit of the compression provided by the MAT-file will be outweighed by the overhead of the structs.
