How to subset a matrix based on the first 3 columns?

Clarisha Nijman on 1 Nov 2018
Commented: Clarisha Nijman on 3 Nov 2018
Hello, I am trying to find subsets (sub-matrices) of matrix A, based on the first 3 columns, and then compute probabilities. For such a small task, the code I wrote looks tremendously long and the results are not good at all! Is there a better way to do this in MATLAB? Working with for loops and while loops is very difficult for me.
%given matrix
A=[ 1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
2 3 4 1 2 3;
2 3 4 2 3 4;
1 2 3 3 4 2;
1 4 3 2 3 4;
1 3 4 3 2 4;
1 4 3 1 2 3;
2 3 4 1 2 3];
%Subsets B, grouping the rows of A that share the same first three columns (A(i,1:3)), should be:
This part of the code works!
1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
1 2 3 3 4 2;
2 3 4 1 2 3;
2 3 4 2 3 4;
2 3 4 1 2 3;
1 4 3 2 3 4;
1 4 3 1 2 3;
1 3 4 3 2 4;
%final result matrix C, with the probability of each element within its subset, should be:
This is my problem! How to find the correct probabilities.
size(B,1)=4
1 2 3 2 3 4 2/4;
1 2 3 3 2 4 ¼;
1 2 3 3 4 2 ¼ ;
size(B,1)=2
2 3 4 1 2 3 ½ ;
2 3 4 2 3 4 ½ ;
size(B,1)=2
1 4 3 2 3 4 ½ ;
1 4 3 1 2 3 ½ ;
size(B,1)=1
1 3 4 3 2 4 1;
The code:
%add column to matrix for indicator variable
indicator=zeros(size( A,1),1);
A=[A indicator];
for i=1:size(A,1)
    if A(i,size(A,2))==0 %consider only not adjusted indicators
        k=0;
        while i+k<=size(A,1)%takes care that index is not exceeded
            if A(i,1:3)==A(i+k,1:3)
                A(i+k,size(A,2))=i;%indicator variable
            end
            k=k+1;
        end
    end
end
%add column to matrix for frequency in the subset
freq=zeros(size( A,1),1);
A=[A freq];
%start subsetting and compute the pdf
j=1;
while j<=max(A(:,size(A,2)-1))
    B=A(A(:,size(A,2)-1)==j,:);%save the j-th subset in B
    for i=1:size(B,1)
        if B(i,size(B,2))==0 %consider only not adjusted indicators
            k=0;
            while i+k<=size(B,1)%takes care that index is not exceeded
                if B(i,1:6)==B(i+k,1:6)
                    B(i+k,size(B,2))=i;%indicator variable
                    B
                    %subsetting to find frequencies
                    for v=1:max(B(:,size(B,2)))
                        C=B(B(:,size(B,2))==v,:);%save the j-th subset in B
                        %computing probability of each element in subset
                        for w=1:size(C,1)
                            C(w,size(C,2))= 1/ C(w,size(C,1));
                            C
                        end
                        for w=1:size(C,1)
                            z=1;
                            while z+w<size(C,1)
                                if C(w,1:6)==C(w+z,1:6)
                                    C(w,size(C,2))=C(w,size(C,2))+C(w+z,size(C,2));
                                    C(w+z,size(C,2))=0;
                                end
                                z=z+1;
                            end
                            %remove lines with probability zero
                            % Specify conditions, which rows should be
                            % removed
                            weg = C(:,size(C,2))==0;
                            % remove
                            C(weg,:) = [];
                            E=[E;C];
                        end
                    end
                end
                k=k+1;
            end
        end
    end
    j=j+1;
end
  3 Comments
JohnGalt on 1 Nov 2018
Agreed with Bruno... "Hello, I am trying to find subsets/matrices in matrix A, based on the first 3 columns, and then computing probabilities" - find sub-matrices of what form? - computing probabilities of what?
Guillaume on 1 Nov 2018
My understanding is that all rows with identical columns 1 to 3 belong to the same subset. The probability of a row is the number of times it appears in the matrix divided by the number of rows in the subset it belongs to.
I too have not tried to understand the code.
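To make that definition concrete, here is a minimal hand check for a single row, assuming the matrix A from the question. The names row, inSubset, isSameRow and p are illustrative only and do not come from the thread.

```matlab
% Matrix from the question
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];

row = A(1, :);                                      % row to examine: 1 2 3 2 3 4
inSubset  = ismember(A(:, 1:3), row(1:3), 'rows');  % rows sharing columns 1 to 3
isSameRow = ismember(A, row, 'rows');               % rows identical in all 6 columns
p = nnz(isSameRow) / nnz(inSubset)                  % 2 / 4 = 0.5
```

The row appears twice and its subset has four rows, which matches the 2/4 listed in the question.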


Accepted Answer

Guillaume on 1 Nov 2018
Edited: Guillaume on 2 Nov 2018
If I understood correctly:
A=[ 1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
2 3 4 1 2 3;
2 3 4 2 3 4;
1 2 3 3 4 2;
1 4 3 2 3 4;
1 3 4 3 2 4;
1 4 3 1 2 3;
2 3 4 1 2 3];
[~, ~, uid] = unique(A, 'rows'); %get unique id for each row of A
count = accumarray(uid, 1); %get count of how many times each unique row of A appears
count = count(uid); %and assign to each row
[~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
subsetcount = subsetcount(subset); %and assign to each row
probability = count ./ subsetcount; %calculate the probability of each row in its subset
%for pretty display
table(A, subset, probability)
I'm using accumarray to compute histograms. You could replace each instance of accumarray(x, 1) with histcounts(x, 'BinMethod', 'integers')' if that's clearer for you (the trailing ' transposes the row vector returned by histcounts into a column).
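As a small sketch of that equivalence, the two calls can be compared on a made-up id vector. It assumes the ids are the integers 1..n with none missing, which is what the third output of unique produces; the vector uid below is invented for illustration.

```matlab
% Group ids such as those returned by the third output of unique
uid = [1; 3; 1; 2; 3; 3];

countAccum = accumarray(uid, 1);                         % column vector of counts
countHist  = histcounts(uid, 'BinMethod', 'integers')';  % row vector, transposed to a column

sameCounts = isequal(countAccum, countHist)              % true: both are [2; 1; 3]
```

If some ids could be missing at the low end (for example ids starting at 2), the two results would no longer line up, because accumarray always starts at index 1 while histcounts starts its bins at the smallest value present.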
  4 Comments
Guillaume on 2 Nov 2018
You'll notice I used meaningful names in my answer. I have no idea what D, E, F are in your code. Code whose variables have meaningful names is instantly easier to understand.
Note that the sort in unique(sort(x)) is pointless. unique does a sort anyway, unless you use the 'stable' option.
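A quick illustration of that point, using a made-up vector x:

```matlab
x = [3 1 2 1 3];

sortedUnique = unique(x)                          % 1 2 3: the result is already sorted
stableUnique = unique(x, 'stable')                % 3 1 2: order of first appearance is kept
sortIsRedundant = isequal(unique(sort(x)), unique(x))  % true
```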
If you don't want the repeated rows in each subset, one method:
[rows, urow, uid] = unique(A, 'rows'); %get unique rows, where they come from, and unique id for each
count = accumarray(uid, 1); %histogram of rows, matches the rows variable
[~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
subsetcount = subsetcount(subset); %and assign to each row
probability = count ./ subsetcount(urow);
%for pretty display
subset = subset(urow);
table(rows, subset, probability)
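One way to sanity-check the result is to confirm that the probabilities within each subset add up to 1. The sketch below repeats the steps above on the question's A so that it runs on its own; the names subsetOfRow and total are illustrative additions, not part of the original answer.

```matlab
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];

[rows, urow, uid] = unique(A, 'rows');        % unique rows and where they come from
count = accumarray(uid, 1);                   % occurrences of each unique row
[~, ~, subset] = unique(A(:, 1:3), 'rows');   % subset id of every row of A
subsetcount = accumarray(subset, 1);          % number of rows in each subset
subsetcount = subsetcount(subset);            % expanded to one value per row of A
probability = count ./ subsetcount(urow);     % one probability per unique row

subsetOfRow = subset(urow);                   % subset id of each unique row
total = accumarray(subsetOfRow, probability); % sum of probabilities per subset
allSumToOne = all(abs(total - 1) < 1e-12)     % tolerance guards against rounding
```

With this A there are eight unique rows in four subsets. The subset starting with 2 3 4 contains three rows of A, so under this definition its two distinct rows get 2/3 and 1/3.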
Clarisha Nijman on 3 Nov 2018
Thanks a lot, Guillaume!


More Answers (0)
