Comparing classification performance using the Friedman test

77 views (last 30 days)
MByk on 27 Nov 2025 at 11:34
Commented: dpb on 29 Nov 2025 at 12:38
I am trying to compare the performance of three classifiers across four performance metrics using the Friedman test in MATLAB. Since MATLAB does not include a built-in Nemenyi post-hoc test, I used the multcompare function as suggested in related discussions, and obtained the results below. If I understand correctly, a high p-value indicates that there is no significant difference between the classifier performances. How should I interpret the values in c and m? Am I doing something wrong? Also, can Pearson's r be used to compare the classifiers instead of the Friedman test and a post-hoc test? Thanks for the help.
PrfMat = [0.9352 0.9697 0.7475 0.9877;
0.9670 0.8713 0.8414 0.7052;
0.6944 0.6841 0.9851 0.9897];
[p,~,stats] = friedman(PrfMat, 1, 'on')
[c,m] = multcompare(stats, 'CType', 'tukey-kramer')
p = 0.8013
c = 1.0000 2.0000 -2.3747 0.3333 3.0413 0.9891
1.0000 3.0000 -2.0413 0.6667 3.3747 0.9216
1.0000 4.0000 -3.0413 -0.3333 2.3747 0.9891
2.0000 3.0000 -2.3747 0.3333 3.0413 0.9891
2.0000 4.0000 -3.3747 -0.6667 2.0413 0.9216
3.0000 4.0000 -3.7080 -1.0000 1.7080 0.7785
m = 2.6667 0.7454
2.3333 0.7454
2.0000 0.7454
3.0000 0.7454
  1 Comment
Umar on 29 Nov 2025 at 5:27

@MByk, you mentioned: “The classification performances of this database are quite similar, but I'd like to understand how to interpret the p, c, and m values. Would PCC be sufficient instead of running two separate tests? With a high p-value, would we conclude that there is no statistically significant difference between the three classifiers across the four metrics? I'd also like to understand what c and m represent. If I'm not mistaken, pairwise comparisons (Classifier 1 vs. C2, C1 vs. C3, and C2 vs. C3) should also be performed to show where the differences occur, but I don't see these in the results. Maybe multcompare is not performing what I want.”

My feedback: please don't give up on the Friedman test, because you're actually in an excellent position to get meaningful results. The confusion in this thread comes from everyone analyzing the small toy example you posted at the beginning, which genuinely failed because it only had three classifiers with four measurements each and extremely high variability. That tiny example was statistically underpowered and correctly showed no differences, but your real study is completely different. You mentioned you have 1500 patients with 12 features, testing seven classifiers from the Classification Learner app, and collecting four performance metrics for each classifier. This is a robust experimental design with plenty of statistical power to detect real differences if they exist.

The main technical issue you encountered was data orientation. MATLAB's friedman function expects columns to be the groups you're comparing and rows to be the repeated measurements. You had classifiers as rows and metrics as columns, which made MATLAB compare your four metrics instead of your seven classifiers. For your real analysis, you need to create a matrix that's four rows by seven columns, where each row is one of your performance metrics and each column is one of your seven classifiers. So the first row would contain the accuracy scores for all seven classifiers, the second row would be precision for all seven classifiers, the third row recall, and the fourth row F1 scores. Then when you run friedman on this properly arranged matrix, it will actually compare your classifiers across the four metrics.
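A minimal sketch of that arrangement, using made-up metric values purely for illustration (your real numbers, and your own classifier ordering, would replace these):
% Hypothetical per-metric scores, one value per classifier (7 classifiers, fixed order)
acc  = [0.91 0.88 0.90 0.87 0.92 0.89 0.90];   % accuracy  (illustrative values)
prec = [0.89 0.86 0.88 0.85 0.90 0.87 0.88];   % precision (illustrative values)
rec  = [0.90 0.85 0.89 0.84 0.91 0.86 0.89];   % recall    (illustrative values)
f1   = [0.89 0.85 0.88 0.84 0.90 0.86 0.88];   % F1 score  (illustrative values)
% Rows = repeated measurements (metrics), columns = groups (classifiers),
% which is the orientation friedman expects
PerfMatrix = [acc; prec; rec; f1];   % 4-by-7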

Regarding the output interpretation, the c matrix from multcompare shows pairwise comparisons between your classifiers. The first two columns tell you which two classifiers are being compared, the middle three columns give you the confidence interval and estimated difference in their rankings, and the last column is the p-value telling you whether that pair differs significantly. The m matrix is simpler, just showing the mean rank for each classifier and its standard error. Lower ranks mean better performance. When you run this on your actual data with seven classifiers, you should get meaningful results because you have 1500 patients worth of data, which is 375 times more than the example that failed.
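If it helps to read that output, one optional step (assuming c and m are the outputs of the multcompare call) is to wrap them in labeled tables; the column names below simply restate the documented column order of c and m:
% Label the pairwise-comparison matrix: pair indices, confidence bounds
% around the rank difference, and the p-value for that pair
cTbl = array2table(c, 'VariableNames', ...
    {'ClassifierA','ClassifierB','LowerCI','RankDiff','UpperCI','pValue'})
% Label the mean ranks and their standard errors
mTbl = array2table(m, 'VariableNames', {'MeanRank','StdErr'})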

About using Pearson correlation instead, that's not appropriate for what you're trying to do. Pearson correlation measures whether two variables move together in a linear relationship, but you want to know if your classifiers perform differently from each other. These are fundamentally different statistical questions. The Friedman test is correct for comparing multiple groups that are measured repeatedly, which is exactly your situation with seven classifiers each evaluated on four metrics.

The critical thing to understand is that even if your Friedman test comes back non-significant with a high p-value, that's a valid scientific finding, not a failure of your analysis. It would mean your seven classifiers perform equivalently on your dataset, and you should then choose among them based on practical considerations like training time, interpretability, computational cost, or ease of deployment. Many published studies report no significant differences among methods, and this is honest, valuable scientific information. You would simply report something like "The Friedman test revealed no statistically significant differences among the seven classifiers tested on 1500 patients, suggesting that all performed comparably on this dataset."

Your workflow should be straightforward. First, arrange your data correctly as that four by seven matrix. Second, visualize it with a simple boxplot to see what the distributions look like. Third, run the Friedman test with the command friedman on your matrix with one replication. Fourth, only if the overall Friedman test gives you a p-value less than 0.05 should you proceed with post-hoc pairwise comparisons using multcompare. If the p-value is greater than 0.05, you stop there and report that no significant differences were found. The mistake many researchers make is trying multiple different tests hoping to find significance somewhere, which is statistically invalid. Whatever your Friedman test tells you is the answer, whether significant or not.

The reason both dpb and I spent so much time explaining the example's failure was educational, showing you how sample size and variability affect statistical power. Those explanations demonstrated that with only four observations per classifier and high variability, you can't detect differences even if they exist. But this lesson doesn't apply to your real study. You have 1500 patients, which provides robust statistical power. The Classification Learner app likely used cross-validation, which means your performance metrics are reliable estimates. Your study design is solid and appropriate for the Friedman test.

If you want to proceed confidently, here's exactly what to do with your real data. Create your performance matrix PerfMatrix as a four-by-seven array, with row one holding all seven accuracy values, row two the seven precision values, row three the seven recall values, and row four the seven F1 scores. Make a boxplot of this matrix to visualize the distributions. Run the Friedman test with [p, tbl, stats] = friedman(PerfMatrix, 1, 'on'). Look at the p-value: if it's less than 0.05, run multcompare on the stats output to see which specific pairs differ; if it's greater than 0.05, you're done and can report that all classifiers performed similarly.
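As a sketch of that sequence, assuming PerfMatrix is the four-by-seven matrix described above:
boxplot(PerfMatrix)                                % visualize the distributions first
[p, tbl, stats] = friedman(PerfMatrix, 1, 'on');   % overall Friedman test, 1 replicate per cell
if p < 0.05
    % post-hoc pairwise comparisons only if the overall test is significant
    [c, m] = multcompare(stats, 'CType', 'tukey-kramer')
else
    disp('No significant differences among classifiers; report and stop here.')
end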

The confusion about whether MATLAB's multcompare is equivalent to the Nemenyi post-hoc test is resolved in the literature, which confirms that multcompare with Tukey-Kramer critical values after a Friedman test is mathematically equivalent to the Nemenyi test for ranked data. So you're using the correct procedure. When you write up your results for publication, you can state that classifier performance was compared using the Friedman test with post-hoc pairwise comparisons conducted using the Nemenyi test implemented via MATLAB's multcompare function with Tukey-Kramer critical values when the overall test was significant at alpha equals 0.05.
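For reference, the Nemenyi procedure is usually stated (for example in Demšar, 2006) as a critical difference in mean ranks: two classifiers differ significantly if their mean ranks differ by more than CD = q_alpha * sqrt(k*(k+1)/(6*N)), where k is the number of classifiers, N is the number of measurements per classifier (your four metrics), and q_alpha is the Studentized range critical value divided by sqrt(2). The Tukey-Kramer comparison that multcompare applies to the Friedman rank statistics uses the same Studentized range critical values, which is the basis for the equivalence described above.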

The fundamental point is this: you have excellent data for this analysis. The toy example failed for legitimate statistical reasons that don't apply to your study. You're using the correct test. Your confusion came from a simple data orientation error that's easy to fix. Whatever results you get, whether showing differences or not, will be scientifically valid and publishable. Don't abandon this approach right when you're on the verge of getting your actual results. Fix the matrix orientation, run the test on your real 1500-patient data, and trust what the statistics tell you.


Accepted Answer

Umar on 28 Nov 2025 at 2:03
Moved: Image Analyst on 28 Nov 2025 at 3:18

Hi @MByk,

You're actually doing this correctly, and contrary to common belief, MATLAB's multcompare function does perform the equivalent of the Nemenyi post-hoc test when you pass it Friedman test statistics. The Tukey-Kramer method that multcompare uses is mathematically equivalent to Nemenyi for ranked data.

Regarding your results, yes, you're correct that the high p-value of 0.8013 indicates no significant difference between your classifiers. However, there's an important issue with how you set up your data matrix. You have three classifiers and four metrics, but the way you arranged your data, MATLAB interpreted it backwards. The friedman function expects columns to represent the groups you're comparing and rows to represent the repeated measurements. Your current setup has classifiers as rows and metrics as columns, which means MATLAB compared your four metrics instead of your three classifiers. You need to transpose your data matrix like this: friedman(PrfMat.', 1, 'on').

Now let me explain what c and m represent. The c matrix shows pairwise comparisons between groups. The first two columns tell you which two groups are being compared, columns three through five give you the lower confidence limit, the estimated mean difference, and the upper confidence limit for that comparison, and the last column is the p-value for that specific pairwise test. In your case, since there's no significant difference, all the estimated differences are close to zero or small values, and the p-values are all high. The m matrix shows the mean rank and standard error for each group. The first column is the mean rank assigned to each group by the Friedman test, and the second column is the standard error of that rank.

After you transpose your data correctly, you'll see that all three classifiers have identical mean ranks of 2.0 with a Friedman p-value of 1.0, meaning there is absolutely no statistical difference detected between your three classifiers. This isn't because your test is wrong, it's because with only four measurements per classifier and the large variability in your data, there simply isn't enough statistical power to detect any differences. Looking at your actual performance values, each classifier varies wildly across the four metrics, with standard deviations around 0.11 to 0.17, which is huge compared to the small differences between classifier means. This high within-classifier variability masks any between-classifier differences.
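A quick way to see that spread, assuming PrfMat is the 3-by-4 matrix as originally posted (classifiers as rows):
% Standard deviation of each classifier's scores across the four metrics
std(PrfMat, 0, 2)   % roughly 0.11, 0.11 and 0.17 for the three classifiers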

As for using Pearson's r instead, that would not be appropriate for this type of comparison. Pearson correlation measures linear relationships between two variables, but you're trying to determine if three classifiers perform differently across multiple metrics, which is a question about group differences, not correlations. The Friedman test is the right choice here because it's a non-parametric test for comparing multiple related groups, which is exactly your situation. Using correlation would be answering a completely different question.

The real issue you're facing is a fundamental statistical power problem. With only four data points per classifier, you cannot reliably distinguish small differences in performance. To get meaningful results, you would need either more performance metrics, more datasets to test on, or metrics with less variability. Given your current data, the statistically honest conclusion is that based on these four metrics, you cannot say that any of the three classifiers performs significantly better or worse than the others.

More Answers (1)

dpb on 27 Nov 2025 at 15:53
Moved: dpb on 27 Nov 2025 at 15:53
PrfMat=[0.9352 0.9697 0.7475 0.9877;
0.9670 0.8713 0.8414 0.7052;
0.6944 0.6841 0.9851 0.9897];
[mean(PrfMat); std(PrfMat); std(PrfMat)./mean(PrfMat)]
ans = 3×4
    0.8655    0.8417    0.8580    0.8942
    0.1491    0.1451    0.1197    0.1637
    0.1722    0.1724    0.1395    0.1830
boxplot(PrfMat); hAx=gca; hAx.YAxis.TickLabelFormat='%0.2f'; ylim([0.6 1.1])
Note the means are all quite similar, as shown by the boxplot; the ranges overlap almost identically. With only three observations and one replication, the power to distinguish any small discrepancies that might be present is very low. I think it would be impossible to draw any other conclusion from these data with any test statistic.
  3 Comments
dpb on 27 Nov 2025 at 20:30
I presumed your data were for four treatments based on the orientation of the array -- in keeping with MATLAB's general convention, friedman() treats the columns as the effects and the rows as observations, with the replicates for each grouped sequentially if there are any.
Consequently, we have to transpose the data:
PrfMat=[0.9352 0.9697 0.7475 0.9877;
0.9670 0.8713 0.8414 0.7052;
0.6944 0.6841 0.9851 0.9897].'
PrfMat = 4×3
    0.9352    0.9670    0.6944
    0.9697    0.8713    0.6841
    0.7475    0.8414    0.9851
    0.9877    0.7052    0.9897
[mean(PrfMat); std(PrfMat); std(PrfMat)./mean(PrfMat)]
ans = 3×3
    0.9100    0.8462    0.8383
    0.1105    0.1082    0.1722
    0.1214    0.1279    0.2054
boxplot(PrfMat); hAx=gca; hAx.YAxis.TickLabelFormat='%0.2f'; ylim([0.6 1.1])
This still shows very little difference between treatments compared to the in-treatment variability, so it's highly unlikely there will be any statistically significant differences, but we can see what it thinks...
[p,~,stats] = friedman(PrfMat, 1,'off')
p = 1
stats = struct with fields:
       source: 'friedman'
            n: 4
    meanranks: [2 2 2]
        sigma: 1
[c,m] = multcompare(stats, 'Display', 'off')
c = 3×6
    1.0000    2.0000   -1.6572         0    1.6572    1.0000
    1.0000    3.0000   -1.6572         0    1.6572    1.0000
    2.0000    3.0000   -1.6572         0    1.6572    1.0000
m = 3×2
    2.0000    0.5000
    2.0000    0.5000
    2.0000    0.5000
As for the c array, it is described in the output-arguments section of the documentation, but briefly: the first two columns identify the pair being compared, followed by the lower 95% limit, the estimate, and the upper 95% limit for the difference. The differences are in the effects' rankings, and since there is no significant difference, the estimated mean rank for each effect is 2 (the median), so the differences are all identically 0. The last column is the p-value, which here is identically 1.
The m values are the estimated mean ranks and their standard errors; as above, since there is no statistical difference, they're all 2.
MByk on 29 Nov 2025 at 12:14
Thank you both for the detailed explanations. I wish I understood the topic as well as you do. I'm not a statistician, but I'm trying my best. We're planning to write a paper, and I thought it would be helpful to include a statistical analysis of classification performance rather than simply presenting the results in a table. That's why I asked this question, but from what I've seen, the work isn't just about running tests and sharing the results.
dpb on 29 Nov 2025 at 12:38
NOTA BENE: My comment is not to say the Friedman test is the incorrect one, only that
  1. I am always reluctant to make a definitive recommendation without very detailed knowledge of the application (burnt too many times in a former life), and
  2. it's possible there are alternatives with more power than the nonparametric test, if their assumptions can be met.
@Umar is correct that a negative result is also a valid conclusion, even if it may be disappointing to the researcher that a specific idea doesn't pan out as hoped; weeding out what doesn't work is as much a part of advancing science as finding what does.
If you are proposing to write a paper, my last suggestion would be to find a university consulting statistician with whom you can discuss this; if it will be submitted to a refereed journal, any bad decisions now will almost certainly be questioned.
