Trying to average values from specific cells in a similarity matrix

Wendi Fellner on 3 Sep 2022
Edited: Wendi Fellner on 10 Sep 2022
I have a group of 10 vectors that represent 10 unique items I've compared to each other to assess their similarity in relation to each other. That is, they've been assigned into categories if their similarity exceeds a threshold. What I have from this process is an upper triangle similarity matrix that looks something like this where the top row and left column are the names of the categories:
        10    20    20    20    20     7     7     7     7    12
10     NaN     0     0     0     0  51.3  50.5  50.4  50.5  76.5
20     NaN   NaN  99.7  99.6  99.3  85.3  86.0  85.9  85.9     0
20     NaN   NaN   NaN  99.5  99.3  85.2  85.8  85.8  85.8     0
20     NaN   NaN   NaN   NaN  99.5  85.4  86.0  86.0  86.0     0
20     NaN   NaN   NaN   NaN   NaN  85.3  85.9  85.9  85.9     0
7      NaN   NaN   NaN   NaN   NaN   NaN  99.2  99.0  99.2     0
7      NaN   NaN   NaN   NaN   NaN   NaN   NaN  99.8  99.7     0
7      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  99.7     0
7      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN     0
12     NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
For my next step, I want to find the average similarity for items that have been placed into a category together, as compared to their similarity with items that do not share their category. That is, I want to average the similarity of the Cat20s (99.7, 99.6, 99.3, 99.5, 99.3, and 99.5) and the Cat7s (99.2, 99.0, 99.2, 99.8, 99.7, and 99.7) so that I can compare it to the similarity values of out-of-category items (0, 0, 0, 0, 51.3, 50.5, 50.4, 50.5, 76.5, 85.3, 86.0, 85.9, 85.9, 0, etc.). What I'm trying to do is assess the effectiveness of the categorization scheme.
I have tried to think through this, but I can't find an approach that I think will work. (I'm pretty new at this, so maybe there is something obvious I haven't thought of.)
Many thanks in advance!

Accepted Answer

Wendi Fellner on 10 Sep 2022
Edited: Wendi Fellner on 10 Sep 2022
I went back to the drawing board and figured out a way to do it. :-) Here's what I came up with. (Thank you dpb for all the time and effort and patience in working on this. I may not have communicated clearly what I was trying to do.)
% Create one index for values that are within-category and another
% for values that are between categories
idxwithin  = false(size(label_matrix)); % marks cells whose row and column share a category
idxbetween = false(size(label_matrix)); % marks cells whose row and column do NOT share a category
for column = 2:size(label_matrix,2)     % loop across each column header
    for row = 2:size(label_matrix,1)    % loop down each row header
        if label_matrix(1,column) == label_matrix(row,1) % column header equals row header
            idxwithin(row,column) = true;   % mark the intersection as within-category
        else
            idxbetween(row,column) = true;  % otherwise mark it as between-category
        end
    end
end
% Find the means of the within- and between-category values, excluding NaNs.
% Logical indexing returns a vector, so the 'all' option (R2018b or later) is not needed.
withinCatMean = mean(label_matrix(idxwithin),'omitnan')
betweenCatMean = mean(label_matrix(idxbetween),'omitnan')
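The same within/between split can also be written without loops. This is only a minimal sketch, not the poster's method: it assumes the labels are kept in a separate vector (`cats`, a hypothetical name) rather than in the first row and column of `label_matrix`, and that the similarity values sit strictly above the diagonal of a matrix `S` (also a hypothetical name). The `==` comparison relies on implicit expansion, which is available from R2016b.

```matlab
% Example data from the question: one label per item, similarities in the
% strict upper triangle, NaN everywhere else. Names are illustrative only.
cats = [10 20 20 20 20 7 7 7 7 12];
S = NaN(10);
S(1,2:10) = [0 0 0 0 51.3 50.5 50.4 50.5 76.5];
S(2,3:10) = [99.7 99.6 99.3 85.3 86.0 85.9 85.9 0];
S(3,4:10) = [99.5 99.3 85.2 85.8 85.8 85.8 0];
S(4,5:10) = [99.5 85.4 86.0 86.0 86.0 0];
S(5,6:10) = [85.3 85.9 85.9 85.9 0];
S(6,7:10) = [99.2 99.0 99.2 0];
S(7,8:10) = [99.8 99.7 0];
S(8,9:10) = [99.7 0];
S(9,10)   = 0;

sameCat = (cats(:) == cats(:).');   % true where row and column labels match
upper   = triu(true(size(S)), 1);   % strict upper triangle: each pair counted once

withinVals  = S(sameCat & upper);   % pairs sharing a category
betweenVals = S(~sameCat & upper);  % pairs from different categories

withinCatMean  = mean(withinVals);
betweenCatMean = mean(betweenVals);
```

With the example data, `withinVals` holds the twelve Cat20 and Cat7 values and `withinCatMean` comes out at about 99.458. Restricting to the upper triangle means no NaN is selected, so `'omitnan'` is not needed here.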

More Answers (1)

dpb on 3 Sep 2022
Edited: dpb on 5 Sep 2022
Not too bad: use logical indexing to find the locations, then take the mean of the returned values with the 'omitnan' option.
Generically, you can write something like the following. Either augment the array with a NaN in the (1,1) position, or build the CATS array independently as here, depending on how you have the data originally:
CATS = [10 20 20 20 20 7 7 7 7 12].';  % the category of each item, in row/column order
C = unique(CATS);                       % the unique categories over which to iterate
%A = A(2:end,2:end);                    % strip the label row/column first if A includes them
M = zeros(size(C));                     % one mean per category
for i = 1:numel(M)
    ixcat = (CATS == C(i));             % logical index of the rows/columns in this category
    mask = ixcat & ixcat.';             % expand the vector to a logical array (implicit expansion, R2016b+)
    M(i) = mean(A(mask), 'omitnan');    % select the within-category cells and average, ignoring NaN
end
results in
>> disp([C M])
7.0000 99.4333
10.0000 NaN
12.0000 NaN
20.0000 99.4833
>>
In this case only two categories have any finite elements, but the above will work in general regardless of the size or the number of rows/columns per category. You can always retain only the finite results at the end.
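Keeping only the finite results could look like the sketch below. The values of `C` and `M` are copied from the output shown above so the snippet runs on its own; `keep` and `results` are hypothetical names.

```matlab
% Categories and per-category means as displayed above
C = [7; 10; 12; 20];
M = [99.4333; NaN; NaN; 99.4833];

keep = isfinite(M);             % false for categories with no within-category pairs
results = [C(keep) M(keep)];    % one row per category that has a finite mean
disp(results)
```

Categories with a single member (10 and 12 here) have no within-category pair to average, which is why their means are NaN and they drop out.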
9 Comments
dpb on 10 Sep 2022
>> sim_matrix_wf
Error using load
Unable to read file 'ARTwarp095_0.mat'. No such file or directory.
Error in sim_matrix_wf (line 6)
load ARTwarp095_0.mat; %load the .mat file that was generated in the ARTwarp run
>>
So, no...but it also very belligerently clear'ed my workspace....that was rude!
>> whos -file s_matrix.mat
  Name          Size            Bytes  Class     Attributes
  s_matrix      80x80           51200  double
>>
Clearly from the above your CATS array must be wrong -- the data array is 80x80 but you're generating a reference to position 81. Ergo, it must be one element too long to match.
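A quick size check before indexing would catch this kind of mismatch early. The sketch below uses stand-in data, since the real `CATS` vector is not shown in the thread: a random 80x80 matrix in place of `s_matrix` and a label vector deliberately made one element too long. `sizesMatch` is a hypothetical name.

```matlab
% Stand-in data (assumptions): an 80x80 matrix and 81 labels
s_matrix = rand(80);
CATS = ones(81, 1);

% The label vector needs exactly one entry per row and per column
sizesMatch = (size(s_matrix,1) == size(s_matrix,2)) && ...
             (numel(CATS) == size(s_matrix,1));

if ~sizesMatch
    fprintf('CATS has %d labels but the matrix is %dx%d.\n', ...
            numel(CATS), size(s_matrix,1), size(s_matrix,2));
end
```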
Wendi Fellner on 10 Sep 2022
I'm sorry about that!

Release: R2016b
