Random Sampling of Repeated Numbers in an Array

Question

Masato Koizumi on 21 Apr 2018

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/396452-random-sampling-of-repeated-numbers-in-an-array

Commented: Jan on 22 Apr 2018

Dear MATLAB Experts,

Hello. I would like to solve the following problem. I have a column vector Q in which some numbers repeat and others do not. I would like to obtain a new column vector QQ that contains only the unique values of my original Q. I understand that I could simply use unique(Q).

However, I would like to go one step further.

I would like to select randomly from each group of repeated numbers. In the following example, there are three 7's in rows 1, 2 and 3, and I would like to randomly select one 7 from these three rows. Likewise, there are three 8's, and I would like to randomly select one of them from rows 4, 5 and 6.

My code is shown below. However, I am getting a dimension mismatch error caused by the random selection of rows. I would greatly appreciate any advice on how to code this efficiently without an excessive number of loops.

Q = [7 7 7 8 8 8 10 18 27 42 65 49 54 65 78 78 78 82 87 98 98]';
B = unique(Q); 
Ncount = histc(Q, B);
i = 1;
while i < length(Q)
  QQ(i) = Q(i);
  if Ncount(i) > 1
    [row col] = find(Q == B(i));
    row_select = randsample(row,1);     
    QQ(i) = QQ(row_select);
  end
  i = i + 1;
end

Thank you.

Sincerely,

Masato

2 Comments

Jan on 21 Apr 2018

All 7s are equal. What is the meaning of selecting a specific one?

dpb on 21 Apr 2018 (edited 21 Apr 2018)

Agree with Jan: if you only save the value, there is no difference. It appears that in your problem it matters which one is selected, because there is something unique about the position, not just the magnitude. If that isn't so, you may as well use the result of unique, since one seven is as good as another; the choice matters only if it makes a difference which seven is picked.

Answer 1

John BG on 21 Apr 2018 (edited 21 Apr 2018)

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/396452-random-sampling-of-repeated-numbers-in-an-array#answer_316428

Hi Masato

this is John BG jgb2012@sky.com

I have made the following corrections to your code:

1.- the mismatch error was caused by the i=i+1 in the while loop attempting to index one past the length of the vector.

A for loop suffices.

2.- there's no need for the variable row_select

3.- there's no need for the find that produces row_select

4.- return 0, not 1, in QQ for all those single values of Q that do not involve random selection.

Setting those to 1 may be misleading, because 1 is a possible index for a sub-selection within the partial ranges.

Q = [7 7 7 8 8 8 10 18 27 42 65 49 54 65 78 78 78 82 87 98 98];
B = unique(Q)
Ncount = histc(Q, B)
i = 1;
for i=1:1:length(B) % < length(Q)
  QQ(i) = 0 % Q(i);
  % or  QQ(i) = 1, but 1 may be index of random selection, thus potentially confusing
  if Ncount(i) > 1
%     [row col] = find(Q == B(i));
   % no need for variable row_select
    QQ(i) =  randsample(Ncount(i),1);     
  end
%   i = i + 1;
end

If you find this answer useful, would you please be so kind as to consider marking my answer as Accepted Answer?

To any other reader, if you find this answer useful please consider clicking on the thumbs-up vote link.

Thanks in advance for your time and attention.

John BG

jgb2012@sky.com

additional comments:

1.- as usual, and as a compliment, Jan Simon is at the top of his game, always providing extremely useful insight and solutions to all questions he contributes to,

but in my opinion there is no need, for this particular answer, for any additional functions like runlength.m

https://uk.mathworks.com/matlabcentral/fileexchange/241-runlength-m?s_tid=srchtitle

or RunLength.m

https://uk.mathworks.com/matlabcentral/fileexchange/41813-runlength?s_tid=srchtitle

Many people do not even have a compiler installed, so the screen fills up with all the checks and the suggestion to install one or to download something from a website. Again, I find RunLength a powerful function, but in the context of generating a random selection of sub-sections, there's no need for such an advanced function.

2 Comments

John BG on 22 Apr 2018

Thanks, any time, happy to help :)

John BG

jgb2012@Sky.com

Jan on 22 Apr 2018

Does this output match the question?

QQ = [2, 1, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 2]

There is no need for i = 1 before the loop for i=1:1:length(B).
A proper pre-allocation of QQ would accelerate the code.
The FileExchange submission RunLength contains the M-file RunLength_M, so you do not need a C-compiler.

Answer 2

Jan on 21 Apr 2018 (edited 21 Apr 2018)

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/396452-random-sampling-of-repeated-numbers-in-an-array#answer_316412

B = unique(Q);
while i < length(Q)
  [row col] = find(Q == B(i));
  i = i + 1;
end

This cannot work, because B is shorter than Q, when Q is not unique. i must be <= length(B).

I do not understand the purpose of the code. All elements with the same value are identical, so it does not matter which e.g. 7 you select.
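To illustrate the bound on i, here is a minimal sketch of the loop running over B instead of Q. It is not the original poster's code: it assumes the goal is one randomly chosen row of Q per unique value, it uses randi rather than randsample, and the names RowSel and rows are made up for this example.

```matlab
Q = [7 7 7 8 8 8 10 18 27 42 65 49 54 65 78 78 78 82 87 98 98]';
B = unique(Q);                       % sorted unique values, shorter than Q
RowSel = zeros(size(B));             % pre-allocate: one selected row per unique value
for i = 1:length(B)                  % loop over B, not over Q
    rows = find(Q == B(i));          % all rows of Q that hold the value B(i)
    RowSel(i) = rows(randi(numel(rows)));  % pick one of these rows uniformly at random
end
QQ = Q(RowSel);                      % the selected values, one per unique value
```

QQ holds the same values as unique(Q), while RowSel records which row each value was taken from. Indexing with randi also sidesteps a pitfall of randsample(rows,1): as far as I know, when rows happens to be a single number n, randsample samples from 1:n instead of returning n.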

[EDITED] Another solution using FEX: RunLength:

Q = [7 7 7 8 8 8 10 18 27 42 65 49 54 65 78 78 78 82 87 98 98 7 7]';
[B, N, Index] = RunLength(Q);
Select        = floor(Index + rand(size(N)) .* N)
Value         = Q(Select)

This treats the two separate blocks of 7s as different groups. If Q is sorted, though, this might also be an efficient solution.
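For readers without the File Exchange submission, the same idea can be sketched with built-in functions only. This is an illustration, not the RunLength implementation: it assumes Index is the first row of each block and N its length, as the formula above implies, and the helper name isStart is made up here.

```matlab
Q = [7 7 7 8 8 8 10 18 27 42 65 49 54 65 78 78 78 82 87 98 98 7 7]';
isStart = [true; diff(Q) ~= 0];               % true where a new block of equal values begins
Index   = find(isStart);                      % first row of each block
N       = diff([Index; numel(Q) + 1]);        % length of each block
Select  = Index + floor(rand(size(N)) .* N);  % random row inside each block
Value   = Q(Select);                          % one value per block, in block order
```

Because rand returns values strictly between 0 and 1, floor(rand .* N) lies in 0..N-1, so Select never leaves its block. As with RunLength, the trailing pair of 7s forms its own block.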

4 Comments

Jan on 21 Apr 2018 (edited 21 Apr 2018)

A rough speed test:

Q = sort(randi([1,1e4], 1e5, 1));

tic;
for k = 1:100
   [B, N, Index] = RunLength(Q);  % Compiles C-Mex
   Select        = floor(Index + rand(size(N)) .* N);
   Value         = Q(Select);
end
toc

tic;
for k = 1:100
   [B, N, Index] = RunLength_M(Q);  % Matlab version
   Select        = floor(Index + rand(size(N)) .* N);
   Value         = Q(Select);
end
toc

tic;
for k = 1:100
  cnt   = hist(Q,unique(Q));
  idx   = arrayfun(@randi,cnt) + cumsum([0,cnt(1:end-1)]);
  Value = Q(idx);
end
toc

tic;
for k = 1:100
   cnt = hist(Q,unique(Q));
   idx = floor(rand(size(cnt)) .* cnt) + cumsum([1,cnt(1:end-1)]);
   Value = Q(idx);   
end
toc

Results on R2016b, 64-bit, Windows 7:

 Elapsed time is 0.116764 seconds.   % RunLength MEX
 Elapsed time is 0.203199 seconds.   % RunLength Matlab
 Elapsed time is 4.921286 seconds.   % arrayfun(@randi,cnt)
 Elapsed time is 0.868590 seconds.   % floor(rand * cnt)
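The last two variants in the speed test differ only in how the random offset is formed: randi yields 1..cnt and is added to a cumulative sum starting at 0, while floor(rand .* cnt) yields 0..cnt-1 and is added to a cumulative sum starting at 1. A small sketch with made-up group sizes (cnt here is hypothetical, not taken from the test data) shows that both land inside the same row ranges of a sorted Q:

```matlab
cnt   = [3 3 1 2];                    % hypothetical group sizes of a sorted vector
first = cumsum([1, cnt(1:end-1)]);    % first row of each group: 1 4 7 8
last  = cumsum(cnt);                  % last row of each group:  3 6 7 9
idxA  = arrayfun(@randi, cnt) + cumsum([0, cnt(1:end-1)]);  % offset 1..cnt on a 0-based start
idxB  = floor(rand(size(cnt)) .* cnt) + first;              % offset 0..cnt-1 on a 1-based start
```

Both idxA and idxB always fall between first and last for every group, so the large timing gap presumably comes from calling randi once per group through arrayfun rather than from any difference in the result.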

Jan on 21 Apr 2018 (edited 21 Apr 2018)

@Stephen: I've updated the speed test to include the C-Mex and Matlab version of RunLength. The default flags in RunLength_M are set to "vectorized" mode.

Stephen23 on 21 Apr 2018 (edited 21 Apr 2018)

+1 very thorough!

Note that Jan's submission RunLength also includes the M-file RunLength_M, which does not require compilation. So you can choose between a very efficient MEX file and an almost-as-efficient M-file!

Answer 3

Bruno Luong on 22 Apr 2018 (edited 22 Apr 2018)

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/396452-random-sampling-of-repeated-numbers-in-an-array#answer_316502

p=randperm(length(Q));
[~,I]=unique(Q(p));
RandomSelect=p(I), % Q(RandomSelect) is equal to unique(Q)
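The three lines above are given without explanation, so here is a commented, self-contained version for reference. The reading of why it works is my own, not Bruno's: unique reports one representative position per value, and because the vector was shuffled first, that representative is a random one among the duplicates.

```matlab
Q = [7 7 7 8 8 8 10 18 27 42 65 49 54 65 78 78 78 82 87 98 98]';
p = randperm(length(Q));     % random order of the row numbers of Q
[B, I] = unique(Q(p));       % I: position of one representative per value in the shuffled vector
RandomSelect = p(I);         % map those positions back to row numbers of the original Q
```

Whichever occurrence unique reports (first or last, depending on version and options), the shuffle makes every duplicate equally likely to be picked. This also works when equal values are not adjacent, such as the two 65s in Q.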

1 Comment

Masato Koizumi on 22 Apr 2018

Thank you so much Bruno. I greatly appreciate your time and effort on your advice!

~Masato

Answer 4

dpb on 21 Apr 2018

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/396452-random-sampling-of-repeated-numbers-in-an-array#answer_316420

On the assumption made in the comment above...

>> [B,ia]=unique(Q);
>> isMult3=(Ncount==3);
>> arrayfun(@(x) randperm(x,1),Ncount(isMult3))  % random index into matching groups
ans =
   1
   3
   2
>>

Now, fix up to get the index into the original location:

>> ix3=arrayfun(@(x) randperm(x,1),Ncount(isMult3))+ia(find(Ncount==3))-1
ix3 =
     1
     5
    15
>> Q(ix3)
ans =
     7
     8
    78
>>

2 Comments

Jan on 21 Apr 2018

I do not see anything in the question saying that only the blocks with 3 repetitions should be considered.

dpb on 21 Apr 2018

Agreed. Just shows one way to deal with a group; left generalization to OP as "exercise for student" :).
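One possible way to do the generalization mentioned above, offered only as a sketch and not as dpb's intended solution, is to group the row numbers by unique value with accumarray and pick one row per group. The names rows, pickOne and ix are illustrative.

```matlab
Q = [7 7 7 8 8 8 10 18 27 42 65 49 54 65 78 78 78 82 87 98 98]';
[B, ~, ic] = unique(Q);                        % ic(k): group number of row k
rows    = (1:numel(Q))';                       % row numbers of Q
pickOne = @(r) r(randi(numel(r)));             % choose one row of a group at random
ix      = accumarray(ic(:), rows, [], pickOne);  % one random row per unique value
```

Q(ix) then equals B, and ix tells which row each value came from. Unlike the offset arithmetic with ia, this does not rely on equal values being adjacent in Q.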
