KNN classifier with ROC Analysis

Aaronne, 19 March 2013
Hi smart guys,
I wrote the following code to plot the ROC curve for my KNN classifier:
load fisheriris;
features = meas;
featureSelected = features;
numFeatures = size(meas,2); % columns of meas are features; rows are observations
% Define ground truth
groundTruthGroup = species;
% Construct a KNN classifier
KNNClassifierObject = ClassificationKNN.fit(featureSelected, groundTruthGroup, 'NumNeighbors', 3, 'Distance', 'euclidean');
% Predict resubstitution response of k-nearest neighbor classifier
[KNNLabel, KNNScore] = resubPredict(KNNClassifierObject);
% Encode the classes numerically (1 = setosa, 0 = versicolor, -1 = virginica)
groundTruthNumericalLable = [ones(50,1); zeros(50,1); -1.*ones(50,1)];
% Compute the ROC curve for setosa (label 1) from the first score column
[FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(groundTruthNumericalLable(:,1), KNNScore(:,1), 1);
Then we can plot FPR against TPR to get the ROC curve.
However, the FPR and TPR differ from what I get with my own implementation. The code above produces only three points on the ROC curve, whereas my implementation (below) produces 151 points, since the data set has 150 observations.
patternsKNN = [KNNScore(:,1), groundTruthNumericalLable(:,1)];
patternsKNN = sortrows(patternsKNN, -1);
groundTruthPattern = patternsKNN(:,2);
POS = cumsum(groundTruthPattern==1);
TPR = POS/sum(groundTruthPattern==1);
NEG = cumsum(groundTruthPattern==0);
FPR = NEG/sum(groundTruthPattern==0);
FPR = [0; FPR];
TPR = [0; TPR];
How can I configure `perfcurve` so that it outputs all the points on the ROC curve? Thanks a lot.
A.
  1 comment
Alessandro, 20 March 2013 (edited 20 March 2013)
Try adding 'xvals','all':
[FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(groundTruthNumericalLable(:,1), KNNScore(:,1), 1, 'xvals', 'all');


Accepted Answer

Ilya, 19 March 2013
For 3 neighbors, the posterior probability has at most 4 distinct values, namely (0:3)/3. There are likely fewer for the Fisher iris data, because the classes are well separated. With 4 distinct score values, you won't see more than 4 points on the ROC curve. Your implementation does not account for such ties.
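To make the point about distinct score values concrete, here is a minimal sketch. The scores and labels are made up for illustration (they are not the Fisher iris results) and are restricted to the vote fractions a 3-neighbor classifier can produce. It counts how many distinct thresholds are available.

```matlab
% Hypothetical 3-nearest-neighbor scores: each is a vote fraction,
% so it can only be 0, 1/3, 2/3, or 1. Illustrative values only.
scores = [3 3 3 2 2 1 0 0 0 0]' / 3;
labels = [1 1 1 1 0 1 0 0 0 0]';   % 1 = positive class, 0 = negative class

% Each distinct score value is one possible decision threshold.
thresholds = unique(scores);
numThresholds = numel(thresholds);

fprintf('%d observations, %d distinct score values\n', ...
        numel(scores), numThresholds);
```

With ten observations there are still only four distinct thresholds, so the number of observations does not determine the number of ROC points.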
  2 comments
Aaronne, 20 March 2013
Hi Ilya,
Thanks for your reply. Does that mean my implementation is wrong?
Why can't we have more than 4 points on the ROC curve if there are 4 distinct score values? I thought the number of points on the ROC curve was the size of the data plus one.
A.
Ilya, 20 March 2013
Yes, it does mean that your implementation is wrong. As I said, you can't have more points on an ROC curve than distinct threshold values. This is actually quite simple - you just need to think about it.
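A tie-aware ROC computation can be sketched as follows. This is an illustration under stated assumptions, not how `perfcurve` is implemented: the scores and labels are made up, and the (0,0) origin is prepended by hand, as in the question's own code. Every observation sharing a score crosses the threshold at the same time, so each distinct score contributes exactly one point.

```matlab
% Hypothetical 3-nearest-neighbor scores and labels (illustrative only).
scores = [3 3 3 2 2 1 0 0 0 0]' / 3;
labels = [1 1 1 1 0 1 0 0 0 0]';   % 1 = positive class, 0 = negative class

% One threshold per distinct score, from strictest to most lenient.
thr  = sort(unique(scores), 'descend');
nPos = sum(labels == 1);
nNeg = sum(labels == 0);

% Row 1 stays at (0,0); row k+1 holds the point for threshold thr(k).
TPR = zeros(numel(thr) + 1, 1);
FPR = zeros(numel(thr) + 1, 1);
for k = 1:numel(thr)
    predictedPositive = scores >= thr(k);   % tied scores move together
    TPR(k + 1) = sum(predictedPositive & (labels == 1)) / nPos;
    FPR(k + 1) = sum(predictedPositive & (labels == 0)) / nNeg;
end

disp([FPR TPR]);
```

A row-by-row cumulative sum over sorted scores would instead add a point for every observation, and the points inside a group of tied scores would depend on the arbitrary order in which the sort happens to place them.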
