Trying to average values from specific cells in a similarity matrix

Question

Wendi Fellner, 3 September 2022

https://jp.mathworks.com/matlabcentral/answers/1794155-trying-to-average-values-from-specific-cells-in-a-similarity-matrix

Edited: Wendi Fellner, 10 September 2022

I have a group of 10 vectors that represent 10 unique items, which I've compared pairwise to assess their similarity to each other. Items are assigned to the same category if their similarity exceeds a threshold. What I have from this process is an upper-triangle similarity matrix that looks something like this, where the top row and left column are the names of the categories:

     10     20     20     20     20      7      7      7      7     12
    NaN      0      0      0      0   51.3   50.5   50.4   50.5   76.5
    NaN    NaN   99.7   99.6   99.3   85.3   86.0   85.9   85.9      0
    NaN    NaN    NaN   99.5   99.3   85.2   85.8   85.8   85.8      0
    NaN    NaN    NaN    NaN   99.5   85.4   86.0   86.0   86.0      0
    NaN    NaN    NaN    NaN    NaN   85.3   85.9   85.9   85.9      0
    NaN    NaN    NaN    NaN    NaN    NaN   99.2   99.0   99.2      0
    NaN    NaN    NaN    NaN    NaN    NaN    NaN   99.8   99.7      0
    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   99.7      0
    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN      0
    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN

For my next step, I want to find the average similarity for items that have been placed into a category together, as compared to their similarity with items that do not share their category. That is, I want to average the similarity of the Cat20s (99.7, 99.6, 99.3, 99.5, 99.3, and 99.5) and the Cat7s (99.2, 99.0, 99.2, 99.8, 99.7, and 99.7) so that I can compare it to the similarity values of out-of-category items (0, 0, 0, 0, 51.3, 50.5, 50.4, 50.5, 76.5, 85.3, 86.0, 85.9, 85.9, 0, etc.). What I'm trying to do is assess the effectiveness of the categorization scheme.

I have tried to think through this, but I can't find an approach that I think will work. (I'm pretty new at this, so maybe there is something obvious I haven't thought of.)

Many thanks in advance!

Answer 1

Wendi Fellner, 10 September 2022

https://jp.mathworks.com/matlabcentral/answers/1794155-trying-to-average-values-from-specific-cells-in-a-similarity-matrix#answer_1050890

Edited: Wendi Fellner, 10 September 2022

I went back to the drawing board and figured out a way to do it. :-) Here's what I came up with. (Thank you dpb for all the time and effort and patience in working on this. I may not have communicated clearly what I was trying to do.)

% Create one logical index for values that are within-category and another
% for those that are between categories
idxwithin = false(size(label_matrix));  % marks cells whose row and column headers match
idxbetween = false(size(label_matrix)); % marks cells whose row and column headers differ
for column = 2:size(label_matrix,2)     % loop across each column header
    for row = 2:size(label_matrix,1)    % loop down each row header
        if label_matrix(1,column) == label_matrix(row,1) % if column header = row header...
            idxwithin(row,column) = true;  % mark the intersection as within-category
        else
            idxbetween(row,column) = true; % otherwise mark it as between-category
        end
    end
end

% Find the means of within- and between-category values, excluding NaNs
withinCatMean = mean(label_matrix(idxwithin),'all','omitnan')
betweenCatMean = mean(label_matrix(idxbetween),'all','omitnan')
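
For comparison, the same two indices can be built without the double loop. This is only a sketch on a hypothetical 4-item example (the values in S are made up); it assumes label_matrix has the layout used above, with category labels in the first row and first column and NaN in the corner.

```matlab
% Hypothetical 4-item example laid out like label_matrix in the answer:
% first row and first column hold category labels, NaN in the corner.
cats = [20 20 7 7];
S = [NaN  99  85  86;
     NaN NaN  84  85;
     NaN NaN NaN  98;
     NaN NaN NaN NaN];
label_matrix = [NaN cats; cats.' S];

rowCat = label_matrix(2:end,1);         % row headers (column vector)
colCat = label_matrix(1,2:end);         % column headers (row vector)
vals   = label_matrix(2:end,2:end);     % similarity values only

% true where the row label equals the column label; bsxfun is used so the
% comparison also expands on releases without implicit expansion
sameCat = bsxfun(@eq, rowCat, colCat);
valid   = ~isnan(vals);                 % cells that actually hold a similarity

withinCatMean  = mean(vals(sameCat & valid));
betweenCatMean = mean(vals(~sameCat & valid));
```

Dropping the NaN cells before calling mean avoids the need for the 'all' and 'omitnan' options, so this form should also run on older releases.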

Answer 2

dpb, 3 September 2022

https://jp.mathworks.com/matlabcentral/answers/1794155-trying-to-average-values-from-specific-cells-in-a-similarity-matrix#answer_1041290

Edited: dpb, 5 September 2022

Not too bad ... use logical addressing to find the locations and the mean with the 'omitnan' argument over the values returned...

Generically, you can write something like the following (augment the array with a NaN in the 1,1 position, or build the CATS array independently as here, depending on how you have the data originally):

CATS=[10 20 20 20 20 7 7 7 7 12].';     % the categories in respective position in array
C=unique(CATS);                         % the unique categories over which to iterate
%A=A(2:end,2:end);     % or A if you don't include the extraneous row/column to begin with
M=zeros(size(C));                       % how many means there are possible -- one/category
for i=1:numel(M)
  ixcat=(CATS==C(i));                   % get the index into the array column/row -- same since symmetric
  M(i)=mean(A(logical(ixcat.*ixcat.')),'all','omitnan');    % expand vector to logical array, select, compute ('all','omitnan' are arguments to mean, not subscripts of A)
end

results in

>> disp([C M])
    7.0000   99.4333
   10.0000       NaN
   12.0000       NaN
   20.0000   99.4833
>> 

In this case only two of the categories have any finite elements, but the above will work in general regardless of the size or the number of rows/columns per category. You can always retain only the finite results at the end.
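
As a sketch of that last step, the loop can be run on the example matrix from the question, the non-finite means dropped, and the mask inverted to get one overall between-category mean. The names keep, withinByCat, sameCat, and betweenMean are introduced here for illustration, and the 'all' option assumes a reasonably recent release.

```matlab
% Category labels and upper-triangle similarities from the question
CATS = [10 20 20 20 20 7 7 7 7 12].';
A = NaN(10);
A(1,2:10) = [0 0 0 0 51.3 50.5 50.4 50.5 76.5];
A(2,3:10) = [99.7 99.6 99.3 85.3 86.0 85.9 85.9 0];
A(3,4:10) = [99.5 99.3 85.2 85.8 85.8 85.8 0];
A(4,5:10) = [99.5 85.4 86.0 86.0 86.0 0];
A(5,6:10) = [85.3 85.9 85.9 85.9 0];
A(6,7:10) = [99.2 99.0 99.2 0];
A(7,8:10) = [99.8 99.7 0];
A(8,9:10) = [99.7 0];
A(9,10)   = 0;

C = unique(CATS);                       % unique categories, sorted
M = zeros(size(C));
for i = 1:numel(C)
    ixcat = (CATS == C(i));             % logical column for this category
    M(i) = mean(A(logical(ixcat.*ixcat.')), 'all', 'omitnan');
end

keep = isfinite(M);                     % categories with at least one pair
withinByCat = [C(keep) M(keep)];        % one row per retained category

% Between-category: every cell whose row and column labels differ
sameCat = bsxfun(@eq, CATS, CATS.');
betweenMean = mean(A(~sameCat), 'all', 'omitnan');
```

Categories with a single member (10 and 12 here) select only their NaN diagonal cell, which is why their means come out as NaN and are filtered by isfinite.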

9 comments

dpb, 7 September 2022

Edited: dpb, 7 September 2022

Compare the output of the expression

logical(ixcat.*ixcat.')

to the array and you'll see it is precisely the selection that is the intersection of the same values in both directions -- the only presumption is that the categories are the same in both directions, since only the one vector is used for both. The selection is NOT the whole row/column; it's the product and is a square logical addressing array the size of the array, with TRUE elements at the specific intersections.

ADDENDUM

Oh. I don't recall when the automatic array expansion was introduced -- the above is the same as matrix multiplication to return a matrix product with recent releases of MATLAB. You MAY need to write the above as

logical(ixcat*ixcat.')

instead to get the matrix multiplication in earlier releases.
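
A small sketch of the equivalence, using a hypothetical short label vector. It assumes ixcat is a column vector; with a row vector, ixcat*ixcat.' is an inner product and collapses to a scalar instead of a square mask. As far as I know, implicit expansion arrived in R2016b, so the bsxfun form is included as the variant for older releases.

```matlab
CATS  = [10 20 20 7].';              % hypothetical labels, as a column
ixcat = (CATS == 20);                % logical column: [0;1;1;0]

m1 = logical(ixcat .* ixcat.');      % element-wise product, relies on implicit expansion
m2 = logical(ixcat *  ixcat.');      % outer product via matrix multiplication
m3 = bsxfun(@and, ixcat, ixcat.');   % explicit expansion, no implicit expansion needed

allAgree = isequal(m1, m2, m3);      % true when all three masks are identical
nSelected = nnz(m1);                 % number of cells the mask selects
```

Each mask is 4-by-4 and is true only where both the row and the column belong to category 20.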

I don't know when the 'all' syntax was introduced; the early MATLAB idiom would be (:) which returns the whole array as a vector and serves thus the same purpose as 'all'. To apply the colon reference, however, requires having a temporary variable; MATLAB doesn't support the syntax to dereference a function return. So, another idiom one will often see, particularly in older code, is the somewhat peculiar-looking

mean(mean(x))

which serves the same purpose: since mean is vectorized to return column means from a 2D array, the first call returns a vector of column means and the second then averages those for the overall array average. The above is for a 2D array; one has to continue to add calls as the dimensionality of the array increases, of course, which is why the alternate syntax was introduced.

However, if the 'all' syntax isn't supported, the 'omitnan' argument may not be either -- I don't recall (and am too lazy to go back thru the release notes to look it up) if they were introduced at the same time or not. If this is an issue, then there's a (now deprecated) family of special-purpose functions nanXXX for the various statistics, where XXX is mean, std, var, min, max, ..., that older releases can still use.
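
If neither option is available, one fallback is to drop the NaNs by hand, which needs nothing beyond isnan and plain mean. The sketch below uses a made-up 2-by-2 array and also illustrates a caveat: nesting per-column means is only equivalent to the overall mean when every column has the same number of non-missing values.

```matlab
x = [1 NaN;
     3   4];                          % hypothetical data with one missing value

v = x(:);                             % temporary variable, then (:) for all elements
overallMean = mean(v(~isnan(v)));     % (1+3+4)/3, no 'all' or 'omitnan' needed

% Nested version: average each column's non-NaN values, then average those.
colMeans = zeros(1, size(x,2));
for k = 1:size(x,2)
    col = x(:,k);
    colMeans(k) = mean(col(~isnan(col)));
end
nestedMean = mean(colMeans);          % (2+4)/2, weights the columns equally
```

Here overallMean is 8/3 while nestedMean is 3, so for a NaN-padded upper-triangle matrix the vectorized (:) form is the safer of the two.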

All these little warts and improvements, plus the fact that R2016 is now pretty old (as releases go), make me suggest you look into updating your version to something closer to current.

Wendi Fellner, 10 September 2022

Edited: dpb, 10 September 2022

I have tried the 2020b version and everything in the script seems to be working until it gets to the M(i) line. I've tried with ixcat.*ixcat and also ixcat*ixcat. I'll post my code below. Perhaps I've not incorporated your code correctly. 's_matrix' is the full similarity matrix where both upper and lower triangles are included and there are no labels along the top row or left column, so the first part of my script creates the 'label_matrix' matrix, which removes the lower triangle and adds the category names. Then I use your code to try to extract and average the within-category values. (I'll also need to extract and average the between-category values at some point, but would like to solve this part first and then maybe I'll understand how to do the between-category values.) The code and then the error messages are below. Can you see where I've gone wrong?

% modify the s_matrix to remove the lower triangle and diagonal values to
% eliminate repeats
idx = ones(size(s_matrix)); %generate a matrix of ones the same size as the similarity matrix
idx = logical(triu(idx,1)); %keep only upper triangle and make into 'logical'
s_uptri_matrix = NaN(numSamples); %create a new matrix filled with NaN
s_uptri_matrix(idx) = s_matrix(idx); %create 'upper triangle' matrix with only the upper triangle values from s_matrix
% add DATA.category values to the s_uptri-matrix as row and column headers
cats = [DATA.category];
l_cats = [NaN(1); cats'];
label_matrix = [cats; s_uptri_matrix]; %add row of category numbers from ARTwarp's DATA struct
label_matrix = [l_cats, label_matrix]; %add column of category numbers transposed from ARTwarp's DATA struct
% Identify within-category values
C = unique(cats); %create vector of unique category names
M = zeros(size(C)); %create matrix of 0s that is the same size as C
for i=1:numel(M)
    ixcat = (cats == C(i)); %create an index of where the category names equal the 'for loop' counter?
    M(i) = mean(label_matrix(logical(ixcat.*ixcat.'), 'all', 'omitnan'));
end

Error when I include the period:

The logical indices in position 1 contain a true value outside of the
array bounds.
Error in sim_matrix_wf (line 42)
M(i) = mean(label_matrix(logical(ixcat.*ixcat.'), 'all',
'omitnan'));

Error when I don't include the period:

Index in position 2 exceeds array bounds (must not exceed 81).
Error in sim_matrix_wf (line 42)
M(i) = mean(label_matrix(logical(ixcat*ixcat.'), 'all', 'omitnan'));

Thanks for your help!

dpb, 10 September 2022

>> sim_matrix_wf
Error using load
Unable to read file 'ARTwarp095_0.mat'. No such file or directory.
Error in sim_matrix_wf (line 6)
    load ARTwarp095_0.mat; %load the .mat file that was generated in the ARTwarp run 
>> 

So, no...but it also very belligerently clear'ed my workspace....that was rude!

>> whos -file s_matrix.mat
  Name           Size            Bytes  Class     Attributes
  s_matrix      80x80            51200  double              
>>

Clearly from the above your CATS array must be wrong -- the data array is 80x80 but you're generating a reference to position 81. Ergo, it must be one element too long to match.
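
Another possible reading of the two error messages, not confirmed anywhere in the thread: in the failing line the 'all' and 'omitnan' arguments sit inside the indexing parentheses of label_matrix, so MATLAB treats them as subscripts (the characters of 'all' have codes 97 and 108, both above 81). In addition, cats is a row vector there, so ixcat*ixcat.' is a scalar, and the mask built from 80 labels does not match the 81-by-81 label_matrix. A hedged sketch of a corrected loop, on made-up values and indexing the unlabeled s_uptri_matrix instead:

```matlab
% Toy stand-ins (hypothetical values) for the variables in the script above
cats = [20 20 7 7];                   % row vector, as [DATA.category] would be
s_uptri_matrix = [NaN  99  85  86;
                  NaN NaN  84  85;
                  NaN NaN NaN  98;
                  NaN NaN NaN NaN];

C = unique(cats);                     % unique category names
M = zeros(size(C));
for i = 1:numel(C)
    ixcat = (cats(:) == C(i));        % force a column so the mask is N-by-N
    mask  = ixcat & ixcat.';          % same size as s_uptri_matrix (implicit expansion)
    % Close the indexing parenthesis BEFORE the options, so that
    % 'all' and 'omitnan' are passed to mean and not used as subscripts.
    M(i) = mean(s_uptri_matrix(mask), 'all', 'omitnan');
end
```

With these toy values C is [7 20] and M is [98 99], one within-category mean per category.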

Wendi Fellner, 10 September 2022

I'm sorry about that!
