Defining boundaries of a curve

Question

Gabriel Stanley on 12 Sep 2023

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/2020156-defining-boundaries-of-a-curve

Edited: Image Analyst on 14 Sep 2023

HistogramData.mat

Context: I have attached some example histograms I've extracted from my data. As a simple/quick form of data clustering, I would like to find the boundaries of the curves present in the histograms (I've changed the raw counts to percentages).

Problem: None of the methods I have tried so far (gradient, findchangepts) has given me precise or robust results. As this is not my area of expertise, I'm not sure how to refine my question beyond the following:

Question: How can I set up an algorithm that will approximately identify the following index pairs in the given data sets?

Dat1: [3, 18], [21, (24 or 25)], [25, 31], [33, 37]

Dat2: [6, 17], [52, 54]

Dat3: [(4 or 5, even 6 would be acceptable in a pinch), 15].

I will emphasise that these are just the examples I've pulled out of my data so far. Ideally, the algorithm should handle an arbitrary number of curves with no a priori knowledge. It is entirely possible, though unlikely, that a data set has no curves/clusters at all, or only very weakly defined/low-prominence ones.

Stephen23 on 12 Sep 2023

S = load('HistogramData.mat')
S = struct with fields:
    Dat1: [37×2 double]
    Dat2: [54×2 double]
    Dat3: [16×2 double]
scatter(S.Dat1(:,1),S.Dat1(:,2))
scatter(S.Dat2(:,1),S.Dat2(:,2))
scatter(S.Dat3(:,1),S.Dat3(:,2))

Answer 1 (accepted)

Stephen23 on 12 Sep 2023

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/2020156-defining-boundaries-of-a-curve#answer_1308451

S = load('HistogramData.mat')
S = struct with fields:
    Dat1: [37×2 double]
    Dat2: [54×2 double]
    Dat3: [16×2 double]
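P = 8e-4; % threshold (called "prominence" here): values above P count as part of a curve
% Pad with false so runs touching either end are closed; diff then gives +1 at the
% first index of each run and -1 one index past its last
D1 = diff([false;S.Dat1(:,2)>P;false]);
D2 = diff([false;S.Dat2(:,2)>P;false]);
D3 = diff([false;S.Dat3(:,2)>P;false]);
M1 = [find(D1>0),find(D1<0)-1]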
M1 = 4×2
     3    18
    22    24
    26    31
    34    37
M2 = [find(D2>0),find(D2<0)-1]
M2 = 2×2
     6    17
    53    54
M3 = [find(D3>0),find(D3<0)-1]
M3 = 1×2
     5    15
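The code above repeats the same threshold-and-diff step once per data set. A minimal sketch of the same technique in a loop over every field of a struct, so it copes with any number of data sets and any number of runs, including none. The struct contents are synthetic stand-ins assuming the same layout as HistogramData.mat (values in column 2); the field names DatA/DatB and the value of P are illustrative only, and the returned pairs are row indices, not column-1 values.

```matlab
% Synthetic stand-in for the attached file; with the real data use
% S = load('HistogramData.mat') instead. Column 2 holds the values.
S = struct();
S.DatA = [(1:9).', [0 2 5 3 0 0 1 4 0].'];
S.DatB = [(1:5).', zeros(5, 1)];          % a data set with no curve at all
P = 0.5;                                  % hand-picked threshold (illustrative)

names = fieldnames(S);
runs = struct();
for k = 1:numel(names)
    y = S.(names{k})(:, 2);               % values for this data set
    % Pad with false so runs touching either end are closed; diff gives +1
    % at the first row of each run and -1 one row past its last row
    D = diff([false; y > P; false]);
    runs.(names{k}) = [find(D > 0), find(D < 0) - 1];   % [start end] rows
end
runs
```

A data set with nothing above P simply yields a 0×2 result, which covers the "no curves" case. The threshold itself is still hand-picked, so weak curves below P are missed.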

Gabriel Stanley on 14 Sep 2023

That's what I was expecting to hear, though perhaps not with so much annoyance in tone. If I knew enough about the data to predict the geometry of the clusters, I wouldn't be coming here to ask these questions. Another way to describe my problem: what I'm trying to do is akin to density-based clustering with a minpts of 1, but without a priori knowledge of a good value for epsilon (or k-means without knowing the number of clusters).

That said, I'm looking at triangle thresholding & local minima/maxima to help refine the curves. I will add the "curve shape matching" term to my self-education. Thank you for your help.
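Triangle thresholding draws a straight line (chord) from the histogram peak to the end of the tail and takes the point farthest from that line. A minimal sketch adapting that idea to find both feet of a single curve, using one chord on each side of the peak. It assumes the segment contains exactly one peak (for example, one run from the threshold step above); it is an adaptation for illustration, not the standard image-thresholding routine. Because each chord is fixed, the perpendicular distance to it is proportional to the vertical gap between chord and curve, so the sketch maximises that gap. The values are synthetic.

```matlab
% Synthetic single-peak curve (illustrative values, not the attached data)
y = [0 0 1 4 9 4 1 0 0 0].';
n = numel(y);
[~, p] = max(y);                          % index of the peak

% Left side: chord from (1, y(1)) up to the peak (p, y(p))
k = (1:p).';
chord = y(1) + (y(p) - y(1)) .* (k - 1) ./ max(p - 1, 1);
[~, lo] = max(chord - y(1:p));            % point lying farthest below the chord

% Right side: chord from the peak (p, y(p)) down to (n, y(n))
k = (p:n).';
chord = y(p) + (y(n) - y(p)) .* (k - p) ./ max(n - p, 1);
[~, j] = max(chord - y(p:n));
hi = p + j - 1;                           % convert back to an index into y

bounds = [lo, hi]                         % [3 7] for this synthetic curve
```

The max(..., 1) guards stop a division by zero when the peak sits at the first or last sample; the foot on that side is then the peak itself. Overlapping curves would first need splitting (for example at local minima), which this sketch does not attempt.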

Stephen23 on 14 Sep 2023

"I will add the "curve shape matching" term to my self-education."

You might find something useful in this toolbox:

https://www.mathworks.com/help/curvefit/index.html

Another option might be to try some kind of machine learning to classify those curves:

https://www.mathworks.com/solutions/machine-learning.html

https://www.mathworks.com/products/statistics.html

Answer 2

Image Analyst on 14 Sep 2023

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/2020156-defining-boundaries-of-a-curve#answer_1310606

Edited: Image Analyst on 14 Sep 2023

dbscan_demo.m

"what I'm trying to do is akin to density-based clustering"

You might like to learn about dbscan

help dbscan
 DBSCAN Density-Based algorithm for clustering
    IDX = DBSCAN(X, EPSILON, MINPTS) partitions the points in the N-by-P
    data matrix X into clusters based on parameters EPSILON and MINPTS.
    EPSILON is a threshold for a neighborhood search query. MINPTS is a
    positive integer used as a threshold to determine whether a point is a
    core point. IDX is an N-by-1 vector containing cluster indices. An
    index equal to '-1' implies a noise point.
 
    IDX = DBSCAN(D, EPSILON, MINPTS, 'DISTANCE', 'PRECOMPUTED') is an
    alternative syntax that accepts distances D between pairs of
    observations instead of raw data. D may be a vector or matrix as
    computed by PDIST or PDIST2, or a more general dissimilarity vector or
    matrix conforming to the output format of PDIST or PDIST2.
 
    [IDX, COREPTS] = DBSCAN(...) returns a logical vector COREPTS
    indicating indices of core-points as identified by DBSCAN.
 
    IDX = DBSCAN(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies optional
    parameter name/value pairs to control the algorithm used by DBSCAN.
    Parameters are:
 
    'Distance'      -   a distance metric which can be any of the distance 
                        measures accepted by the PDIST2 function. The 
                        default is 'euclidean'. For more information on 
                        PDIST2 and available distances, type HELP PDIST2. 
                        An additional choice is:
      'precomputed' -   Needs to be specified when a custom distance matrix
                        is passed in
 
     'P'            -   A positive scalar indicating the exponent of Minkowski
                        distance. This argument is only valid when 'Distance'
                        is 'minkowski'. Default is 2.
   
     'Cov'          -   A positive definite matrix indicating the covariance
                        matrix when computing the Mahalanobis distance. This
                        argument is only valid when 'Distance' is
                        'mahalanobis'. Default is NANCOV(X).
   
     'Scale'        -   A vector S containing non-negative values, with length
                        equal to the number of columns in X. Each coordinate
                        difference between X and a query point is scaled by the
                        corresponding element of S. This argument is only valid
                        when 'Distance' is 'seuclidean'. Default is NANSTD(X).
 
    Example:
       % Find clusters in data X, using the default distance metric 
       % 'euclidean'.
       X = [rand(20,2)+2; rand(20,2)];
       idx = dbscan(X,0.5,2);
 
    See also KMEANS, KMEDOIDS, PDIST2, PDIST.

    Documentation for dbscan
       doc dbscan

Wikipedia description with diagram:

https://en.wikipedia.org/wiki/DBSCAN

I've also attached a demo.
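For the 1-D case described in the question, DBSCAN simplifies a lot. Assuming MinPts is small enough that every point is a core point (MinPts = 1 in the formulation where a point counts towards its own neighbourhood), clusters are just chains of sorted points whose neighbour-to-neighbour gap is at most epsilon, and there is no noise. A minimal base-MATLAB sketch of that special case, with no toolbox required; x stands for, e.g., the indices of bins above a threshold (synthetic here) and epsilon is an illustrative value. With x = find(y > P) and epsilon = 1, this reproduces the runs found by the threshold answer; a larger epsilon bridges small gaps between runs.

```matlab
% Synthetic 1-D points, e.g. indices of bins above a threshold (illustrative)
x = sort([1 2 3 7 8 15].');
epsilon = 2;                              % hypothetical neighbourhood radius

if isempty(x)
    idx = zeros(0, 1);                    % no points, no clusters
    bounds = zeros(0, 2);
else
    % A gap larger than epsilon between sorted neighbours starts a new cluster
    bigGap = diff(x) > epsilon;
    bigGap = bigGap(:);                   % keep a column even for a single point
    isStart = [true; bigGap];             % first point of each cluster
    isEnd = [bigGap; true];               % last point of each cluster
    idx = cumsum(isStart);                % cluster label for each point
    bounds = [x(isStart), x(isEnd)]       % [first last] of each cluster
end
```

This does not remove the need to choose epsilon; it only shows that, with minpts = 1 in 1-D, that choice plays the same role as the threshold and gap tolerance.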
