How to average multiple convex hulls? (or, bootstrapping with convhullln)
6 ビュー (過去 30 日間)
古いコメントを表示
I'm trying to bootstrap a 6-dimensional cluster outline that describes 90% of my data for a group of data points.
The idea is basically this: 1) subsample 90% of the data and calculate a convex hull with convhulln(); 2) repeat step 1 many times; 3) calculate an average hull; 4) do cool analyses with this outline
Unsurprisingly, the size of the each generated hull is a bit different. I've tried to interpolate some extra points to match the hull with the largest number of 'corners' and then tried to calculate an average hull from there (see code below). This doesn't work: I think this is mostly because my hull-corners aren't actually alligned in any useful way. The resulting median outline doesn't end up matching the original data well at all.
Is there a way to calculate an average hull from a group of N-dimensional outlines/hulls? I'd be grateful for any pointers in a helpful direction; my googling has so far been unsuccessful.
My real data has is in a 6-dimensional space, but here I'm trying to solve the problem in 3D (I find it a bit easier to follow). A solution would have to work in a multidimensional space, though.
samp_frac = 0.9; %take 90% of samples
%% make some data
PC1Band = randn(543,3);
%% generate 1000 medians and outlines for each cluster
ss_hulls_idx = cell(1,500);
ss_hulls_vals = cell(1,500);
for o = 1:500
[thesevals,theseidx] = datasample(PC1Band,floor(length(PC1Band)*samp_frac),1,'Replace',false);
[thishull,~] = convhulln(thesevals);
thesecorners = unique(thishull); %indexes of corner samples in this subset
ss_hulls_vals{o} = thesevals(thesecorners,:);
ss_hulls_idx{o} = theseidx(thesecorners); %indexes in original PC1Band
end
%% adjust length of outlines to match longest outline
Lens_outl = cellfun(@length,ss_hulls_vals); %outline lengths
MaxOutline = max(Lens_outl,[],2); %length of longest outline
%for each shorter outline, add some 'missing' values, then interpolate
%those. This shouldn't change the outline shape (facets are also linear)
interp_outlines = zeros(MaxOutline,3,500);
for o = 1:500
thisOutlineVals = ss_hulls_vals{o}; %values at corner indexes
thisOutlineIdx = ss_hulls_idx{o};
newOutlineVals = zeros(MaxOutline,3); %placeholder vector for upsampled corner values
thisLen = Lens_outl(o); %nr of corners in this outline
MissVal = MaxOutline - thisLen; % nr of missing values in this outline
if MissVal > 0 %skip outlines with correct length
MissVali = randperm(MaxOutline-2,MissVal);
MissVali = MissVali + 1; % exclude first and last value (for interpolation reasons)
MissVali = sort(MissVali);
%replace placeholder values at MissVali with NaN
for m = 1:MissVal
newOutlineVals(MissVali(m),:) = [NaN,NaN,NaN];
end
% fill not-missing values
newOutlineVals(~isnan(newOutlineVals)) = thisOutlineVals; % non-missing values are thisDimVals
% linear interpolate NaNs
interp_Outline = fillmissing(newOutlineVals,'linear');
end
interp_outlines(:,:,o) = interp_Outline;
end
%% calculate median outlines
% math works fine, outputs are stupid :(
ss_median_outline = median(interp_outlines,3);
ss_std_outline = std(interp_outlines,1,3);
3 件のコメント
Bjorn Gustavsson
2023 年 1 月 12 日
Would thinking about this something like this help (In 2-D first, I'm scared about the cursed monster of dimensionality):
Set all points inside each convex hull to one, on suitably fine-scaled discretizations,
Add those binary images together, possibly smooth for "taste",
Take the contour at half the max of the sum as an "average hull".
It is not entirely clear to me what you'd want to get from this for the case with very poorly overlapping hulls, but I can at least imagine that if there are many hulls that vary randomly a little bit around some centre-curve this might give something somewhat sensible.
回答 (0 件)
参考
カテゴリ
Help Center および File Exchange で Bounding Regions についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!