Extract identical values from a data set

Hi everyone,
I have a data set for polar coordinates from this command:
[TH,R,Z] = cart2pol(xdata,ydata,zdata);
I then want to find the repeated values of R and the values of Z that correspond to them. I have tried these commands:
u=unique(R);
n=histc(R,u);
u(n>1);
find(R==u(n>1))
which gives me an error:
Error using ==
Matrix dimensions must agree.
Error in NSMatlabNPC (line 108)
find(R==u(n>1))
I also tried
find(diff(R)==0)
which returned an empty 1-by-0 matrix.
The problem is probably that I have several groups of identical R values, each group with a different value, and these functions might not be suitable for identifying them separately. Is there another way to access the identical R values?

Accepted Answer

Geoff Hayes on 2 Jul 2014

1 vote

The error "Error using == Matrix dimensions must agree" is raised because u(n>1) contains only a subset of the values in R, so the two vectors have different lengths. The == operator compares arrays element by element, so it fails when the sizes differ.
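As a minimal sketch of one way around the size mismatch (the sample R below is made up for illustration and is not the poster's data), ismember tests each element of R against the whole list of repeated values, so the two vectors no longer need the same length:

```matlab
% Hypothetical sample data; the real R comes from cart2pol
R = [3; 1; 3; 2; 1; 5];

u = unique(R);              % sorted distinct values: [1; 2; 3; 5]
n = histc(R, u);            % occurrences of each value in u (as in the question)
repeatedVals = u(n > 1);    % values that occur more than once

% R has 6 elements and repeatedVals has 2, so R == repeatedVals is not an
% elementwise comparison. ismember checks membership instead.
isRepeated = ismember(R, repeatedVals);   % logical mask, same size as R
positions  = find(isRepeated);            % indices into R of repeated values
```

This only matches values that are exactly equal, so it does not help when the repeated radii differ by rounding error.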
Since R is a distance value (see http://www.mathworks.com/help/matlab/ref/cart2pol.html), you may have values that should be considered the same but differ slightly. Because they are stored as doubles and unique compares for exact equality, you will not be able to group them as identical using (say) unique. For example,
1.1234567
1.1234568
Should the above two numbers be considered the same? If so, then you can group like values in R that should be considered identical, given a tolerance.
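To illustrate the point with the two numbers above, here is a small sketch. The rounding step is only one possible way to apply a tolerance and is an assumption of this example; it is not the sort-and-compare approach described next, and the variable names are made up.

```matlab
a = 1.1234567;
b = 1.1234568;

% unique compares for exact equality, so the two values stay distinct
numExact = numel(unique([a; b]));

% Assumed alternative: snap each value to a grid of width myTol
myTol = 1e-5;
key = round([a; b] / myTol);        % both values map to the same integer key
numBinned = numel(unique(key));
```

Note that rounding to a grid can still separate two close values that happen to fall on either side of a grid boundary, which is one reason to compare sorted neighbours instead.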
We sort the data first, and then do the grouping. If we assume that TH, R, and Z are all vectors (as opposed to matrices), we can combine them as the columns of one matrix:
data = [R(:) Z(:) TH(:)]; % (:) forces each input to be a column vector
Now sort the data as
sortedData = sortrows(data,1);
In the above, we just sort on the first column of data which corresponds to our R data. We can then iterate over the data and find all those elements that should be considered part of the same group given a tolerance
myTol = 0.00001; % use myTol to determine if 2 values should be considered the same
grpStartIdx = 1; % start index of group
grpStopIdx = 1; % stop index of group
groupedData = {}; % a cell array for the grouped data
atGroupIdx = 0;
% iterate over each row of sortedData starting at the second row
for k=2:size(sortedData,1)
    % compare the R value of this row with that of the previous row
    if abs(sortedData(k,1)-sortedData(k-1,1)) < myTol
        % the two consecutive elements are considered to be identical
        % so must be part of the same group
        grpStopIdx = k;
    else
        % the two consecutive elements are not identical since their difference
        % is at least myTol, so we have come to a new group
        grpStopIdx = k-1;
        % copy all data (R,Z,TH) as a group to the cell array
        atGroupIdx = atGroupIdx + 1;
        groupedData{atGroupIdx} = sortedData(grpStartIdx:grpStopIdx,:);
        % update next group indices
        grpStartIdx = k;
        grpStopIdx = k;
    end
end
% copy the last group
atGroupIdx = atGroupIdx + 1;
groupedData{atGroupIdx} = sortedData(grpStartIdx:grpStopIdx,:);
The above may contain a little more code than you wish (and is not as neat as using the built-in MATLAB functions), but each group of near-identical R data (as determined by myTol) is combined in the cell array groupedData for further processing.
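For comparison, the same grouping can be sketched without an explicit loop. This is an assumed alternative rather than the code posted above: it marks the start of each group with diff, numbers the groups with cumsum, and splits the sorted matrix with mat2cell. The sample data and the names isNewGroup, groupId, and groupSizes are hypothetical.

```matlab
% Hypothetical sample data (column vectors)
R  = [2.000001; 1; 2; 1.000002; 3];
Z  = [10; 20; 30; 40; 50];
TH = zeros(5, 1);
myTol = 1e-5;

sortedData = sortrows([R Z TH], 1);

% A row starts a new group when it differs from the previous row by myTol or more
isNewGroup = [true; abs(diff(sortedData(:,1))) >= myTol];
groupId    = cumsum(isNewGroup);          % group number of each row
groupSizes = accumarray(groupId, 1);      % number of rows in each group

% Split the rows into one cell per group (transposed to a 1-by-N cell array)
groupedData = mat2cell(sortedData, groupSizes, 3).';
```

As with the loop, membership is decided by comparing consecutive sorted values, so a long run of closely spaced values ends up in a single group even if its first and last elements differ by more than myTol.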

8 Comments

Sofya on 2 Jul 2014
That's fantastic! Thank you very much!
I want to use the data further, so that for each group of identical radii I can find the average of Z and then plot R vs Z.
I tried using cellfun(@mean,groupedData(:,2),'UniformOutput', false) but it only accesses groupedData{1} and doesn't find the mean.
I have also tried writing a for loop, but I can't get all the values from the loop collected neatly in a list.
Do you have any ideas or suggestions on how to do it?
Geoff Hayes on 2 Jul 2014
You almost have it with the cellfun but rather than trying to pass the mean function, you can create your own anonymous one that will operate on each cell of groupedData
cellfun(@(X)mean(X(:,2)),groupedData);
Since groupedData is a cell array of ?x3 matrices (where each matrix has a different number of rows) we can say - for each matrix X (in the cell array), take the second column X(:,2) and find the mean of it. We do that with the anonymous function
@(X)mean(X(:,2))
Try using this implementation of cellfun and see what happens!
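A small sketch of how this could be used for the averaging and plotting step follows. The contents of groupedData are made up, and taking the mean of R within each group as the plotted radius is an assumption of this example. Note that indexing the cell array itself, as in groupedData(:,2), selects cells rather than a column of each matrix, which is why the column has to be picked inside the anonymous function.

```matlab
% Hypothetical grouped data: each cell holds rows of [R Z TH]
groupedData = {[1 10 0; 1 20 0], [2 30 0], [3 40 0; 3 60 0; 3 80 0]};

% One number per cell, so the default uniform (numeric) output works
meanZ = cellfun(@(X) mean(X(:,2)), groupedData);   % average Z per group
meanR = cellfun(@(X) mean(X(:,1)), groupedData);   % representative R per group

plot(meanR, meanZ, 'o-');
xlabel('R');
ylabel('mean Z');
```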
Sofya on 3 Jul 2014
Great! That's exactly, what I was looking for. Thank you very much!
Geoff Hayes on 3 Jul 2014
Glad to have been able to help!
Sofya on 23 Jul 2014
Dear Geoff,
My data has slightly changed and I need to be able to control myTol more. Could I ask you if you have any idea on how to try and iterate through all elements and not just consecutive? So instead of
if abs(sortedData(k,1)-sortedData(k-1,1)) < myTol
something like
if abs(sortedData(k,1)-sortedData(:,1)) < myTol
this particular example doesn't work but maybe you would know how to modify it properly?
Also just to give you an example of R in nm:
R = [52.6219 53.0314 54.4405 55.8140]
and I want to have myTol=1 (nm).
Thanks in advance for your help!
Geoff Hayes on 23 Jul 2014
Hi Sofya - I think that you almost have the solution. You can use the find function to return all indices of elements in the first column that satisfy this condition
% get indices of all elements in first column that satisfy
% the absolute difference being less than the tolerance
idcs = find(abs(sortedData(k,1)-sortedData(:,1)) < myTol);
% idcs has at least one index, k, so we have to remove it
idcs(idcs==k) = [];
% if idcs is non-empty, then we have some work to do!
if ~isempty(idcs)
    % do something for each index in idcs
end
Try the above!
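As a worked sketch of the snippet above, here it is wrapped in a loop over the four R values quoted in the question, with myTol = 1. The single-column sortedData and the cell array name neighbours are assumptions made for this example.

```matlab
% The four example radii (in nm) from the comment above, already sorted
sortedData = [52.6219; 53.0314; 54.4405; 55.8140];
myTol = 1;

numRows = size(sortedData, 1);
neighbours = cell(numRows, 1);      % neighbours{k}: rows within myTol of row k
for k = 1:numRows
    idcs = find(abs(sortedData(k,1) - sortedData(:,1)) < myTol);
    idcs(idcs == k) = [];           % drop the row itself
    neighbours{k} = idcs;
end
```

Only the first two radii lie within 1 nm of each other, so neighbours{1} is 2, neighbours{2} is 1, and the last two cells are empty. Each close pair is therefore found twice, once from each side, which matters if the results are collected for every k.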
Sofya on 24 Jul 2014
Hi Geoff, Thanks for your quick reply!
If I try and extract data with the indices idcs, I think idcs(idcs==k) = []; doesn't help as at the end length(new)=length(sortedData), so it repeats data with same indices.
for k=1:length(sortedData)
    % get indices of all elements in first column that satisfy
    % the absolute difference being less than the tolerance
    idcs = find(abs(sortedData(k,1)-sortedData(:,1)) < myTol);
    % idcs has at least one index, k, so we have to remove it
    idcs(idcs==k) = [];
    % if idcs is non-empty, then we have some work to do!
    if ~isempty(idcs)
        % do something for each index in idcs
        new{k}=sortedData(idcs,1);
    end
end
Do you know where the problem is?
Geoff Hayes on 24 Jul 2014
Sofya - where (and why) in the code are you comparing length(new)==length(sortedData)? (With == and not =.) Have you stepped through the code and made sure that idcs(idcs==k) = []; does reduce the idcs vector by one with the removal of the k index?


More Answers (1)

the cyclist on 2 Jul 2014

0 votes

I think you probably want to use the ismember() command.

3 Comments

Sofya on 2 Jul 2014
Thank you for your quick response!
I don't really see how you would use ismember() in this case. Could you please give an example?
P.S. I only have one R array, from which I want to extract the repeated elements along with their positions in the array, so that I can then get the corresponding values of Z. Sorry for not making that clear at the beginning.
Sofya on 2 Jul 2014
Following your kind advice I have tried this command:
u=unique(R);
[counts, bins] = histc(R, u);
[u counts];
[R bins];
R(ismember(bins, find(counts > 1)))
This correctly gives an output with all unique values eliminated. But how can I then get the positions of these values in R?
the cyclist on 2 Jul 2014
I have to admit I am not digging into the details here (sorry!), but note that the ismember command has a second output argument, which is an index that might help you. See
doc ismember
for details.
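A hedged sketch of how that second output might be used here: for each element of R, loc gives the position of its match in the list of repeated values (0 when there is no match), so it can pick out the Z values belonging to each repeated radius. The sample R and Z and the name zPerValue are made up for illustration.

```matlab
% Hypothetical sample data (column vectors of equal length)
R = [3; 1; 3; 2; 1; 5];
Z = [10; 20; 30; 40; 50; 60];

u = unique(R);
counts = histc(R, u);               % occurrences of each distinct value
repeatedVals = u(counts > 1);       % here: [1; 3]

% tf marks repeated elements of R; loc indexes into repeatedVals (0 = no match)
[tf, loc] = ismember(R, repeatedVals);
positions = find(tf);               % positions of repeated values in R

% Z values for each repeated radius, one cell per entry of repeatedVals
zPerValue = arrayfun(@(g) Z(loc == g), (1:numel(repeatedVals)).', ...
    'UniformOutput', false);
```

With this sample data, zPerValue{1} holds the Z values where R equals 1 and zPerValue{2} those where R equals 3.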

Asked: 2 Jul 2014

Commented: 24 Jul 2014