Here are the x-y coordinates of the blue dots. The original dataset is too big to be shared here.
Mean shift clustering - issue with finding the center of my clusters
Hi all, as you can see from the attached image, I cannot detect the centers of my dots (in blue) using mean shift clustering. My code is below. Note that I get the same result no matter what value I use for the bandwidth. Thanks a lot for your help.
[attached image]
My code:
%%
% Import the data
% Prompt the user to choose a file
[filename, filepath] = uigetfile('*.txt', 'Select a text file');
file_name = filename;
remove = '.txt';
file_name_clean = strrep(file_name, remove, '');
%%
% Plotting
plot_name = ['Intensity_' file_name_clean '.svg'];
% Import data from text file
opts = delimitedTextImportOptions("NumVariables", 28);
opts.DataLines = [2, Inf];
opts.Delimiter = "\t";
opts.VariableNames = ["channel_name", "x", "y", "x_c", "y_c"];
opts.SelectedVariableNames = ["x", "y"]; % Only select the x and y columns
opts.VariableTypes = ["string", "double", "double", "double", "double"];
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Construct the full file path
file_path = fullfile(filepath, file_name);
data = readmatrix(file_path, opts);
% Perform Mean Shift clustering
bandwidth = 50; % bandwidth parameter for Mean Shift
[cluster_centers, data2cluster, cluster2dataCell] = MeanShiftCluster(data, bandwidth);
% Plot the data points and the detected cluster centers
figure;
plot(data(:,2), data(:,1), '.', 'MarkerSize', 10, 'DisplayName', 'XY coordinates');
hold on;
% Set x-axis limit starting from 0
xlim([0, max(data(:,2))]);
% Set y-axis limit starting from 0
ylim([0, max(data(:,1))]);
% Plot cluster centers
hold on;
plot(cluster_centers(:,2), cluster_centers(:,1), 'kx', 'MarkerSize', 15, 'LineWidth', 3, 'DisplayName', 'Cluster Centers');
hold off;
xlabel('X');
ylabel('Y');
title('Mean Shift Clustering');
legend('XY coordinates', 'Cluster Centers');
Answers (3)
Mathieu NOE
13 May 2024
Hello Marco,
It seems your issue is simply that the function expects row-oriented data: one row per dimension and one column per point.
See these lines in MeanShiftCluster.m:
%**** Initialize stuff ***
[numDim,numPts] = size(dataPts);
So, with the data file you provided, I needed to transpose the data array:
[cluster_centers, data2cluster, cluster2dataCell] = MeanShiftCluster(data', bandwidth); % NB : data' (transposed)
Full code:
%%
clc
clearvars
close all
% Import the data
% Prompt the user to choose a file
[filename, filepath] = uigetfile('*.txt', 'Select a text file');
file_name = filename;
remove = '.txt';
file_name_clean = strrep(file_name, remove, '');
%%
% Plotting
plot_name = ['Intensity_' file_name_clean '.svg'];
% Import data from text file
opts = delimitedTextImportOptions("NumVariables", 28);
opts.DataLines = [2, Inf];
opts.Delimiter = "\t";
opts.VariableNames = ["channel_name", "x", "y", "x_c", "y_c"];
opts.SelectedVariableNames = ["x", "y"]; % Only select the x and y columns
opts.VariableTypes = ["string", "double", "double", "double", "double"];
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Construct the full file path
file_path = fullfile(filepath, file_name);
% data = readmatrix(file_path, opts);
data = readmatrix(file_path); % <= works better in this case without opts
% Perform Mean Shift clustering
bandwidth = 50; % bandwidth parameter for Mean Shift
[cluster_centers, data2cluster, cluster2dataCell] = MeanShiftCluster(data', bandwidth); % NB : data' (transposed)
% Plot the data points and the detected cluster centers
figure;
plot(data(:,2), data(:,1), '.', 'MarkerSize', 15, 'DisplayName', 'XY coordinates');
hold on;
% Set x-axis limit starting from 0
xlim([0, max(data(:,2))]);
% Set y-axis limit starting from 0
ylim([0, max(data(:,1))]);
% Plot cluster centers
hold on;
% plot(cluster_centers(:,2), cluster_centers(:,1), 'kx', 'MarkerSize', 15, 'DisplayName', 'Cluster Centers');
plot(cluster_centers(2,:), cluster_centers(1,:), 'kx', 'MarkerSize', 15, 'DisplayName', 'Cluster Centers');
hold off;
xlabel('X');
ylabel('Y');
title('Mean Shift Clustering');
legend('XY coordinates', 'Cluster Centers');
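To illustrate the orientation issue, here is a minimal sketch using random stand-in data (the variable `data` below is hypothetical, not the poster's file). It only reproduces the `size` call quoted from MeanShiftCluster.m and does not run the clustering itself. When an N-by-2 array is passed as is, that line reads it as 2 points in N dimensions, which may also explain why changing the bandwidth made no visible difference.

```matlab
% Hypothetical stand-in for the imported coordinates: 1000 points, one per row.
data = rand(1000, 2);

% MeanShiftCluster.m starts with this call, so rows are read as dimensions
% and columns as points.
[numDim, numPts] = size(data);
fprintf('As passed:  %d dimensions, %d points\n', numDim, numPts);

% Transposing gives one column per point, the layout the function expects.
dataPts = data.';
[numDim, numPts] = size(dataPts);
fprintf('Transposed: %d dimensions, %d points\n', numDim, numPts);
```

The same check can be placed right before the call to MeanShiftCluster to catch a wrongly oriented array early.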
8 comments
Mathieu NOE
16 May 2024
I have to say I'm not an expert in image processing (and I don't have the required toolbox either), but there are many answers on this forum about how to detect circles or blobs in images and find their centers,
and probably dozens more examples if you search the File Exchange (FEX).
Mathieu NOE
16 May 2024
This is probably my best contribution so far, and I post it here in the hope that you will find it interesting enough to accept it! :)
I followed my idea of splitting the data into smaller chunks: split along the x axis only, repeat the clustering in each x window, then concatenate the resulting cluster centers.
One thing I noticed, though, is that you may get duplicate centers at the junction between two data batches. The trick here is to apply the same process once more to the concatenated cluster centers, which gives you the "unique" centers.
I also tried different split factors (x_inter in the code below) to see where the best trade-off lies between the clustering time and the time needed to concatenate the results. There is an optimum to find.
The results on your data file are:
x_inter = 10; Elapsed time is 5.850134 seconds.
x_inter = 50; Elapsed time is 2.427936 seconds.
x_inter = 100; Elapsed time is 2.262699 seconds.
x_inter = 200; Elapsed time is 2.621037 seconds.
x_inter = 500; Elapsed time is 5.565575 seconds.
Here is the code:
%%
clc
clearvars
close all
% Import the data
% Prompt the user to choose a file
% [filename, filepath] = uigetfile('*.txt', 'Select a text file');
filepath = pwd;
filename = 'selected_dataset.txt';
remove = '.txt';
file_name_clean = strrep(filename, remove, '');
%%
% Plotting
plot_name = ['Intensity_' file_name_clean '.svg'];
% Import data from text file
opts = delimitedTextImportOptions("NumVariables", 28);
opts.DataLines = [2, Inf];
opts.Delimiter = "\t";
opts.VariableNames = ["channel_name", "x", "y", "x_c", "y_c"];
opts.SelectedVariableNames = ["x", "y"]; % Only select the x and y columns
opts.VariableTypes = ["string", "double", "double", "double", "double"];
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Construct the full file path
file_path = fullfile(filepath, filename);
% data = readmatrix(file_path, opts);
data = readmatrix(file_path);
%% Split the big data set in smaller chunks
x_inter = 100; % split the data along x intervals
minx = min(data(:,2));
maxx = max(data(:,2));
dx = (maxx - minx)/x_inter;
cx_all = [];
cy_all = [];
% Perform Mean Shift clustering
bandwidth = 50; % bandwidth parameter for Mean Shift
tic
for ck = 1:x_inter
    xmin = minx+(ck-1)*dx;
    xmax = xmin+dx;
    if ck < x_inter
        ind = (data(:,2)>=xmin) & (data(:,2)<xmax);
    else
        ind = (data(:,2)>=xmin); % last chunk: also keep the points at the maximum x
    end
    data_batch = data(ind,:);
    if ~isempty(data_batch) % if you split too finely, data_batch may be empty, so check it!
        % Perform Mean Shift clustering
        [cluster_centers, ~, ~] = MeanShiftCluster(data_batch', bandwidth); % NB: data_batch' (transposed, row-oriented array)
        cx = cluster_centers(2,:);
        cy = cluster_centers(1,:);
        cx_all = [cx_all cx];
        cy_all = [cy_all cy];
    end
end
% As there may be some redundant cluster centers due to the data splitting
% process, we repeat the MeanShiftCluster process once more on the result
[cluster_centers, ~, ~] = MeanShiftCluster([cx_all;cy_all], bandwidth);
cx = cluster_centers(1,:);
cy = cluster_centers(2,:);
toc
% Plot the data points and the detected cluster centers
figure;
plot(data(:,2), data(:,1), '.', 'MarkerSize', 15, 'DisplayName', 'XY coordinates');
hold on;
% Set x-axis limit starting from 0
xlim([0, max(data(:,2))]);
% Set y-axis limit starting from 0
ylim([0, max(data(:,1))]);
% Plot cluster centers
hold on;
% plot(cluster_centers(2,:), cluster_centers(1,:), 'kx', 'MarkerSize', 15, 'DisplayName', 'Cluster Centers');
plot(cx, cy, 'kx', 'MarkerSize', 15, 'DisplayName', 'Cluster Centers');
hold off;
xlabel('X');
ylabel('Y');
title('Mean Shift Clustering');
legend('XY coordinates', 'Cluster Centers');
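When chunks are defined as half-open intervals [xmin, xmax), a point lying exactly at the maximum x can end up outside every chunk. One possible way to avoid this is to compute a chunk index for every point up front. The sketch below uses a small made-up vector `x` (in the script above it would be data(:,2)) and only prints the chunk sizes; the clustering call is left as a comment.

```matlab
% Hypothetical x coordinates; in the script above these would be data(:,2).
x = [0; 2.5; 5; 7.5; 10];
x_inter = 4;                          % number of chunks along x

minx = min(x);
dx = (max(x) - minx) / x_inter;       % chunk width (assumes max(x) > min(x))

% Chunk index of every point; min() folds the maximum x into the last chunk.
chunk = min(floor((x - minx) / dx) + 1, x_inter);

for ck = 1:x_inter
    ind = (chunk == ck);
    if any(ind)
        % Here the script would call MeanShiftCluster(data(ind,:)', bandwidth).
        fprintf('chunk %d holds %d point(s)\n', ck, nnz(ind));
    end
end
```

With these values the last chunk holds two points (x = 7.5 and x = 10), so no point is dropped at the right edge.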
7 comments
Image Analyst
21 May 2024
How did you read in selected_dataset.rtf? readmatrix() does not like that extension.
I don't think dbscan should take a long time. I'm attaching a demo of it. It should work for random (x,y) locations but if you have data in a regular grid, such that the locations can be considered pixels on an image, then you can use image analysis to find things like centroids, areas, diameters, etc.
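For readers without the attached demo, here is a minimal sketch of what a dbscan-based approach could look like. It is not the attached demo: the data are synthetic, the values of `epsilon` and `minpts` are assumptions that would need tuning for real data, and `dbscan` requires the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-in data: three tight blobs of (x, y) points.
rng(0);
blobCenters = [100 100; 400 150; 250 400];
nPerBlob = 50;
xy = zeros(nPerBlob * size(blobCenters, 1), 2);
for k = 1:size(blobCenters, 1)
    rows = (k-1)*nPerBlob + (1:nPerBlob);
    xy(rows, :) = blobCenters(k, :) + 5 * randn(nPerBlob, 2);
end

epsilon = 25;   % neighborhood radius (assumed value, tune for your data)
minpts  = 5;    % minimum number of neighbors for a core point (assumed value)
idx = dbscan(xy, epsilon, minpts);    % idx == -1 marks noise points

% Center of each cluster = mean of its member points (noise is skipped).
labels = unique(idx(idx > 0));
centers = zeros(numel(labels), 2);
for k = 1:numel(labels)
    centers(k, :) = mean(xy(idx == labels(k), :), 1);
end
disp(centers);
```

Unlike mean shift, dbscan does not return centers directly, so the sketch averages the points of each cluster afterwards.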