Hash function for Matlab struct

Benjamin Bechtel on 16 Mar 2011
Commented: Stephen23 on 2 Mar 2017
Is there a function to generate a single hash value from a whole struct? For background: I am storing all preferences for an algorithm in a struct. Before processing, I'd like to check whether this combination has been processed before, without comparing every single setting.
Thanks in advance!
  1 Comment
Rik on 1 Mar 2017
This is why I love this forum and the FEX. I need to do exactly this, and as I need to include the hash in a filename, it is not possible to compare the structs.

Accepted Answer

Jan on 16 Mar 2011
Edited: Jan on 1 Mar 2017
[EDITED] See FEX: DataHash for a complete version.
If you calculate a hash value for a struct, each field must be processed anyway. Therefore I do not see a big advantage in using a hash for the comparison. I suggest keeping a simple copy of the struct and using ISEQUAL.
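The ISEQUAL suggestion could look roughly like the following sketch. The field names `Window` and `Method` and the list `Done` are made up for illustration; the only assumption about MATLAB itself is that ISEQUAL treats structs with the same contents as equal even if their fields are in a different order.

```matlab
% Hypothetical preference sets that have already been processed.
Done = {struct('Window', 256, 'Method', 'fft'), ...
        struct('Window', 512, 'Method', 'fft')};

% New settings: same content as Done{2}, but the fields are in another order.
New = struct('Method', 'fft', 'Window', 512);

% Compare the new settings with each stored copy.
wasProcessed = any(cellfun(@(S) isequal(S, New), Done));
```

This works well as long as the processed settings are still in memory. It does not give a short identifier that can be stored elsewhere.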
But if you really want a struct Hash:
  • EDITED: Consider non numeric values:
  • EDITED (30-Mar-2011): Bugs fixed, LOGICAL, empty arrays, shape of data
  • EDITED (31-Mar-2011): Function handles
function H = DataHash(Data)
% Create an MD5 hash (as hex string) for the input.
Engine = java.security.MessageDigest.getInstance('MD5');
H = CoreHash(Data, Engine);
H = sprintf('%.2x', H);   % To hex string

function H = CoreHash(Data, Engine)
% Consider the type of empty arrays:
S = [class(Data), sprintf('%d ', size(Data))];
Engine.update(typecast(uint16(S(:)), 'uint8'));
H = double(typecast(Engine.digest, 'uint8'));
if isa(Data, 'struct')
    n = numel(Data);
    if n == 1  % Scalar struct:
        F = sort(fieldnames(Data));  % ignore order of fields
        for iField = 1:length(F)
            H = bitxor(H, CoreHash(Data.(F{iField}), Engine));
        end
    else  % Struct array:
        for iS = 1:n
            H = bitxor(H, CoreHash(Data(iS), Engine));
        end
    end
elseif isempty(Data)
    % No further actions needed
elseif isnumeric(Data)
    Engine.update(typecast(Data(:), 'uint8'));
    H = bitxor(H, double(typecast(Engine.digest, 'uint8')));
elseif ischar(Data)  % Silly TYPECAST cannot handle CHAR
    Engine.update(typecast(uint16(Data(:)), 'uint8'));
    H = bitxor(H, double(typecast(Engine.digest, 'uint8')));
elseif iscell(Data)
    for iS = 1:numel(Data)
        H = bitxor(H, CoreHash(Data{iS}, Engine));
    end
elseif islogical(Data)
    Engine.update(typecast(uint8(Data(:)), 'uint8'));
    H = bitxor(H, double(typecast(Engine.digest, 'uint8')));
elseif isa(Data, 'function_handle')
    H = bitxor(H, CoreHash(functions(Data), Engine));
else
    warning(['Type of variable not considered: ', class(Data)]);
end
If the struct contains large arrays, James Tursa's TYPECAST implementation (FEX: typecast-c-mex-function) saves processing time, because it does not create a deep copy of the data.
  2 Comments
Jan on 3 May 2011
Edited: Jan on 9 Feb 2013
NOTE: The function shown above returns the same hash for "struct('a', 1, 'b', 2)" and "struct('a', 2, 'b', 1)"! Combining the partial hashes with BITXOR does not take the order of the data into account, and the field names are not considered either. A better (and faster!) method is to use just Engine.update. See the FEX submission: http://www.mathworks.com/matlabcentral/fileexchange/31272-datahash
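As a rough illustration of the "just Engine.update" idea from the note above, the sketch below feeds class, size, field names and values into a single MD5 engine and calls digest only once at the end. `StructHashSketch` and `FeedEngine` are hypothetical names chosen here. This is not the code of the FEX DataHash submission, and it deliberately handles only a few types.

```matlab
function H = StructHashSketch(Data)
% Hypothetical helper: one engine, one digest at the end.
Engine = java.security.MessageDigest.getInstance('MD5');
FeedEngine(Data, Engine);
H = sprintf('%.2x', double(typecast(Engine.digest, 'uint8')));
end

function FeedEngine(Data, Engine)
% Class and size go first, so that e.g. int8(1) and 1 give different hashes.
Header = [class(Data), sprintf(' %d', size(Data))];
Engine.update(uint8(Header));
if isstruct(Data)
    F = sort(fieldnames(Data));            % the order of fields shall not matter
    for iS = 1:numel(Data)
        for iF = 1:numel(F)
            Engine.update(uint8(F{iF}));   % the field name is hashed as well
            FeedEngine(Data(iS).(F{iF}), Engine);
        end
    end
elseif iscell(Data)
    for iC = 1:numel(Data)
        FeedEngine(Data{iC}, Engine);
    end
elseif isempty(Data)
    % Class and size are in the hash already
elseif ischar(Data)
    Engine.update(typecast(uint16(Data(:)), 'uint8'));
elseif islogical(Data)
    Engine.update(uint8(Data(:)));
elseif isnumeric(Data) && isreal(Data) && ~issparse(Data)
    Engine.update(typecast(Data(:), 'uint8'));
else
    error('StructHashSketch:UnsupportedType', ...
          'Type not handled in this sketch: %s', class(Data));
end
end
```

Because the values are fed in sequence, swapping the values of two fields changes the byte stream and therefore the hash, while sorting the field names keeps the result independent of the field order.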
Stephen23 on 2 Mar 2017
I accepted this answer as it clearly answers the question.

More Answers (3)

Benjamin Bechtel on 16 Mar 2011
Thanks to both of you. Maybe I should specify the problem. I'm not just looking for a way to compare structs, but for a short identifier for a full struct. The idea is that I can store the results for specific settings in a file with the ID as its name (e.g. the hex string of the MD5). This way I can see from the file list whether this parameter set has been processed, without opening each file or keeping a separate list.
I like the idea of using the Java class. However, not all of the elements are numeric, so the typecast doesn't work.
Any idea how to get a (more or less) unique id for a struct?
Regards, Benni
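The file-name idea described in this post can be sketched as follows. The hash string is a fixed placeholder standing in for the hash of the settings struct, `ResultDir` is a hypothetical result folder, and `magic(3)` replaces the real computation.

```matlab
Hash      = '900150983cd24fb0d6963f7d28e17f72';  % placeholder, e.g. from a struct hash
ResultDir = tempname;                            % hypothetical result folder
mkdir(ResultDir);
File      = fullfile(ResultDir, [Hash, '.mat']);

if exist(File, 'file') == 2
    % These settings have been processed before: reuse the stored result.
    Loaded = load(File, 'Result');
    Result = Loaded.Result;
else
    Result = magic(3);                           % placeholder for the real computation
    save(File, 'Result');
end
```

A directory listing of `ResultDir` then shows which parameter sets have been processed, without opening any file.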
  2 Comments
Jan on 16 Mar 2011
What types do the fields have? It would be helpful, if you post such details... I'll update my function to catch cells also.
Jan on 2 Mar 2017
Edited: Jan on 2 Mar 2017
This answer has been accepted by John BG. I have no idea why he prefers this question asked by the author. Therefore I've unaccepted this answer.

Francois Rongère on 28 Mar 2011
Hello,
Did you find an answer to your question? I am also interested in that feature.
I also have all my simulation parameters in a structure (arrays, scalar values, function handles, strings, cell arrays...) and I would like to refer to one simulation run by its unique identifier based on the structure. The goal is also to put these identifiers into a light database in order to check whether a simulation has already been run or not...
Kind regards,
François.
  8 Comments
Francois Rongère on 5 Apr 2011
I can tell you now that your code and modifications are part of my PhD code... Thank you (you are mentioned in the comments of my source code).
Jan on 5 Apr 2011
You are welcome. It seems to be more important for you than for Benjamin ;-)

Arindam Bose on 9 Feb 2013
Well, thanks a ton, Jan Simon. You probably don't know that you have helped me a lot in several projects. Wherever I go and search for something in this community, I find you. Now my problem is how I can decode this MD5 hash. Is it possible to get back the original text from the generated hash? Can you help me, please? I don't have much knowledge of Java.
  2 Comments
Walter Roberson on 9 Feb 2013
No, MD5 hashes are irreversible. They are also "hashes", so by definition many different original items "hash" to the same value.
Jan on 10 Feb 2013
@Arindam: I'm glad that some of my postings have been useful for others. Then it has been worth spending the time.
MD5 creates a 16-byte hash (128 bits) for the input. Because there are far more possible inputs than 128-bit hash values, different inputs must share the same hash, a so-called collision. Consequently, you can only hope to reconstruct an input reliably when you know that it is short. Besides time-consuming brute-force attacks, there is a lot of research on cracking MD5 faster. You can find some explanations at http://en.wikipedia.org/wiki/MD5 .
Why do you need to get the clear text of an MD5 sum? Because this conflicts with the purpose and design of a hash sum, there could be a better method to get what you need.
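The size of an MD5 digest can be checked directly with the same Java class as in the accepted answer. This is only a small sketch; the input 'abc' is an arbitrary example.

```matlab
Engine = java.security.MessageDigest.getInstance('MD5');
Engine.update(uint8('abc'));
Digest = typecast(Engine.digest, 'uint8');   % Java replies int8, convert to uint8
nBytes = numel(Digest);                      % 16 bytes, i.e. 128 bits
Hex    = sprintf('%.2x', double(Digest));    % '900150983cd24fb0d6963f7d28e17f72'
```

Whatever the length of the input, the digest always has 16 bytes, so the hash alone cannot tell which of the many possible inputs produced it.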
