How to determine the most common (most frequently occurring) number in a column of a large data set with more than 100,000 entries

Question

Gali Musa, 2 May 2018

https://jp.mathworks.com/matlabcentral/answers/398714-how-to-determine-the-most-common-most-occurring-number-in-column-of-a-large-data-of-more-than-100

Commented: Siyu Guo, 3 May 2018

How can I determine the most common (most frequently occurring) number in a column of a large data set with more than 100,000 entries?

Answer 1

Star Strider, 2 May 2018

https://jp.mathworks.com/matlabcentral/answers/398714-how-to-determine-the-most-common-most-occurring-number-in-column-of-a-large-data-of-more-than-100#answer_318353

If you want to count a range of values rather than exact values, one option is the histogram function (or the hist function). Either function lets you specify the number of bins. To define the bins yourself, pass the bin edges to histogram or the bin centres to hist.
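As a rough illustration of the binning idea above, the sketch below uses histcounts, which returns the same counts as histogram without drawing a plot. The data vector A, the seed, and the bin width of 0.01 are assumptions chosen for the example, not values from the question.

```matlab
% Hypothetical data for illustration: 100,000 values clustered near 0.825.
rng(0);                                   % fixed seed so the run is repeatable
A = 0.825 + 0.05*randn(100000,1);

% Count the values in bins of width 0.01; histcounts chooses the bin limits.
[counts, edges] = histcounts(A, 'BinWidth', 0.01);

% Locate the fullest bin and report its range and midpoint.
[maxCount, k] = max(counts);
modalRange  = edges(k:k+1);               % lower and upper edge of the fullest bin
modalCentre = mean(modalRange);
fprintf('Fullest bin: [%.2f, %.2f) with %d values\n', ...
        modalRange(1), modalRange(2), maxCount);
```

The result depends on where the bin edges fall, so a cluster that straddles an edge can be split across two bins.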

Answer 2

Ameer Hamza, 2 May 2018

https://jp.mathworks.com/matlabcentral/answers/398714-how-to-determine-the-most-common-most-occurring-number-in-column-of-a-large-data-of-more-than-100#answer_318337

Use mode().

mostOccuringVal = mode(A);
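A small sketch of what mode returns, in case ties matter. The vectors A and B below are made-up examples. The rounding step is only an assumption about how near-equal values might be grouped; it is not part of the answer above.

```matlab
A = [3; 1; 2; 3; 2; 5];        % hypothetical column in which 2 and 3 both occur twice

% m: the mode (smallest value when there is a tie), f: how often it occurs,
% c: cell array whose first element lists every value that occurs f times.
[m, f, c] = mode(A);
tiedValues = c{1};             % sorted column vector of all tied values

% Assumption: values that agree after rounding to two decimals count as equal.
B = [0.8249; 0.8251; 0.8212; 0.4100; 0.8203];
mRounded = mode(round(B, 2));  % most frequent value after rounding
```

With these inputs, m is 2, f is 2, and tiedValues holds 2 and 3, so the second and third outputs show when the single value returned by mode hides a tie.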

2 Comments

Gali Musa, 2 May 2018

It's working, but I have a range of most frequently occurring values (the values are almost the same after rounding), i.e. the most frequent values range from 82% to 83% (0.82 to 0.83), and of course I want to use all of them. Thank you for your kind support.

Jan, 3 May 2018

@Gali: Please mention the important details in the question already, not only in a comment after somebody has posted an answer.

Answer 3

John D'Errico, 2 May 2018

https://jp.mathworks.com/matlabcentral/answers/398714-how-to-determine-the-most-common-most-occurring-number-in-column-of-a-large-data-of-more-than-100#answer_318365

Edited: John D'Errico, 2 May 2018

There are many things you can do. But none will likely be perfectly satisfactory. For example, you could use uniquetol to do the "counting".

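[Vuniq,I,J] = uniquetol(V,0.01);               % group values that agree to within the tolerance
counts = accumarray(J,1,[numel(Vuniq),1]);     % number of elements assigned to each group
[cmax,ind] = max(counts)                       % size and index of the largest group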
cmax =
     1094
ind =
     37
Vuniq(ind)
ans =
      0.36033

So the most frequent value, with a bin width of 0.01 and a count of 1094, was 0.36033. The bins that uniquetol implicitly creates have a width of approximately 0.01. This is essentially the same result a histogram tool would give, as long as the bin boundaries were the same.

That is, the first 10 such unique results obtained from uniquetol are:

    Vuniq(1:10)'
  ans =
     6.2251e-06     0.010016     0.020018     0.030019     0.040023     0.050026     0.060033     0.070049     0.080051     0.090053
  diff(ans)
  ans =
        0.01001     0.010002     0.010001     0.010004     0.010003     0.010007     0.010016     0.010002     0.010002
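A self-contained sketch of the uniquetol counting described above, on made-up data (uniform noise plus a dense cluster near 0.365). By default uniquetol scales the tolerance by max(abs(V)), which is close to 1 for data on [0,1). The 'DataScale' option used here is an addition to make the tolerance explicitly absolute; it was not in the answer's code.

```matlab
% Hypothetical data: uniform noise on [0,1) plus a dense cluster near 0.365.
rng(1);
V = [rand(90000,1); 0.365 + 0.002*randn(10000,1)];

tol = 0.01;
% 'DataScale',1 makes tol an absolute tolerance instead of one scaled by max(abs(V)).
[Vuniq, ~, J] = uniquetol(V, tol, 'DataScale', 1);

% J maps each element of V to its representative in Vuniq, so accumulating
% ones over J counts how many elements each representative covers.
counts = accumarray(J, 1, [numel(Vuniq), 1]);
[cmax, ind] = max(counts);
fprintf('Representative %.5f covers %d values\n', Vuniq(ind), cmax);
```

Sizing the accumarray output with numel(Vuniq) avoids hard-coding the number of groups.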

But was 0.36033 truly the most common value? Suppose the densest group of values happened to straddle two such bins.

As I said, there is no perfect solution, at least probably not if you want it to be fast. Are you looking for ANY interval of width 0.01 that contains the largest number of elements? If so, this gets more difficult. It is still doable, but possibly a bit slower and with more effort. You can see that I intentionally chose a vector V that would be very difficult in this respect.

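Vs = sort(V(:));                                          % sorted column vector
[~,~,upperbin] = histcounts(Vs + 0.01,Vs);                % index of the last element <= Vs(i) + 0.01 (0 if past the maximum)
[Vmaxcount,Vsind] = max(upperbin' - (0:numel(Vs) - 1))    % largest number of elements in any [Vs(i), Vs(i) + 0.01]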
Vmaxcount =
        1106
Vsind =
       35963

So the interval of width 0.01 with the MAXIMUM number of elements in the vector V appears to be [0.36073, 0.36073 + 0.01].

Vs(Vsind)
ans =
      0.36073

As a test:

sum((Vs >= Vs(Vsind)) & (Vs < Vs(Vsind) + 0.01))
ans =
        1106

So arguably, the true moving mode, with an interval width of 0.01 is:

Vs(Vsind) + 0.01/2
ans =
      0.36573

Surprisingly, the best interval of width 0.01 was one that overlapped the bin that uniquetol found. But there is no reason this must happen. Had I chosen a different random set of data, that could easily change.

Anyway, because I was able to use efficient tools for this, it was even pretty fast.
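The sliding-interval search above can be sketched on a small made-up data set, where a direct count is cheap enough to check the result. The data, the seed, and the names w, windowCount, and iBest are assumptions for illustration. The sketch also assumes the sorted values are distinct, which holds for this random data.

```matlab
% Hypothetical data: 1,800 uniform values plus 200 packed into [0.36, 0.364).
rng(2);
V = [rand(1800,1); 0.36 + 0.004*rand(200,1)];
w = 0.01;                                   % interval width

Vs = sort(V);
n  = numel(Vs);

% upperbin(i) = k means Vs(k) <= Vs(i) + w < Vs(k+1);
% it is 0 when Vs(i) + w lies beyond max(Vs).
[~, ~, upperbin] = histcounts(Vs + w, Vs);

% Number of sorted values in [Vs(i), Vs(i) + w]. Intervals that run past
% max(Vs) give non-positive entries, so max ignores them.
windowCount = upperbin(:) - (0:n-1)';
[maxCount, iBest] = max(windowCount);

bestInterval = [Vs(iBest), Vs(iBest) + w];
% Direct count for the winning interval, as a sanity check.
checkCount = sum(Vs >= bestInterval(1) & Vs <= bestInterval(2));
fprintf('Best interval [%.5f, %.5f] holds %d values\n', ...
        bestInterval(1), bestInterval(2), maxCount);
```

Only intervals that start at a data point are tried. That is enough, because any interval can be shifted right until its left end touches a data point without losing elements.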

Answer 4

Siyu Guo, 3 May 2018

https://jp.mathworks.com/matlabcentral/answers/398714-how-to-determine-the-most-common-most-occurring-number-in-column-of-a-large-data-of-more-than-100#answer_318374

Suppose v is your data vector.

u = unique(v);
h = hist(v,u);
[~,i] = max(h);
value_with_most_occurrences = u(i);
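For comparison, a sketch of the same exact-value count that avoids hist, which the MathWorks documentation lists as not recommended in favour of histogram and histcounts. It uses the third output of unique with accumarray instead. The vector v here is a made-up example.

```matlab
v = [4; 7; 4; 1; 7; 4; 9];               % hypothetical data vector

[u, ~, j] = unique(v);                   % u: sorted distinct values, j: position of each v(k) in u
counts = accumarray(j, 1);               % occurrences of each distinct value
[~, i] = max(counts);                    % first maximum, i.e. the smallest value among ties
value_with_most_occurrences = u(i);      % 4 for this vector
```

Unlike mode, this version also leaves the full count table (u and counts) available.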

2 Comments

John D'Errico, 3 May 2018

Which does exactly the same thing as mode(v), but takes 4 lines, instead of 1.

Siyu Guo, 3 May 2018

Thanks. Learnt one more function.
