Split dataset with probability weights

Hello everyone,
I have a 20 x 6 dataset (npop1) that I want to split into two datasets according to probability 'weights'. The weights come from the random vector r1. I can't figure out how to store the remaining rows in a second variable (rw2). I don't use randperm because it has no option for probability weights. I would appreciate any help. Thank you.
Sorry for my bad English.
pc = 0.4; % fraction of the data to split off
nfitA = length(fitA);
PC = pc*npop; % number of rows to draw; must be a positive integer
r1 = rand(nfitA,1); % random numbers used as probability weights
rw1 = datasample(npop1,PC,'Weights',r1);

Accepted Answer

Fabio Freschi on 15 Oct 2019


datasample has two outputs; the second is the index of the selected rows in your npop1. So:
[rw1,idSelected] = datasample(npop1,PC,'Weights',r1);
idUnselected = setdiff(1:nfitA,idSelected);
rw2 = npop1(idUnselected,:);

7 Comments

Genz on 15 Oct 2019
Thank you for the fast reply, sir.
This is almost what I am looking for, but is there a bug in setdiff? The result is different every time I run it. I tried it with a 10 x 6 npop1: rw1 is always 4 x 6, but rw2 is sometimes 6 x 6, 7 x 6, or 8 x 6. How can I fix this?
Thank you.
Sorry for my bad English.
Fabio Freschi on 15 Oct 2019
Can you provide an MWE (minimal working example) of your code so we can figure out what's going wrong?
Jos (10584) on 15 Oct 2019
datasample samples with replacement by default, so idSelected may not contain 4 distinct values, leaving anywhere from 6 to 9 rows unselected.
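To make the effect visible, here is a minimal sketch of sampling with replacement. It assumes the Statistics and Machine Learning Toolbox (for datasample) and uses made-up stand-in data in place of the asker's npop1 and fitA; the variable nDistinct is introduced here only for illustration.

```matlab
% Hypothetical stand-in data: 10 rows, 6 columns (not the asker's real npop1)
rng(1);                          % fix the seed so a run can be repeated
npop1 = rand(10,6);
r1    = rand(10,1);              % sampling weights, one per row
PC    = 4;                       % number of rows to draw

% Default behaviour of datasample: sampling WITH replacement,
% so the same row index can appear more than once in idSelected.
[rw1,idSelected] = datasample(npop1,PC,'Weights',r1);

nDistinct    = numel(unique(idSelected));            % anywhere from 1 to PC
idUnselected = setdiff(1:size(npop1,1),idSelected);  % setdiff drops duplicates
rw2          = npop1(idUnselected,:);

fprintf('%d draws, %d distinct rows, rw2 has %d rows\n', ...
        PC, nDistinct, size(rw2,1));
```

rw1 always has PC rows, but rw2 has 10 - nDistinct rows, which is why its size changes from run to run when the seed is not fixed.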
Genz on 15 Oct 2019
Edited: Genz on 15 Oct 2019
My entire code is only 8 lines, so I don't know if it can be called an MWE, but here it is:
npop = 10;
pc = 0.4; % fraction of the data to split off
nfitA = length(fitA);
PC = pc*npop; % number of rows to draw; must be a positive integer
r1 = rand(nfitA,1); % random numbers used as probability weights
[rw1,idSelected] = datasample(npop1,PC,'Weights',r1);
idUnselected = setdiff(1:nfitA,idSelected);
rw2 = npop1(idUnselected,:);
I have also attached the .m file and the .mat files.
Run it more than 3 times and you will probably see rw2 change size.
Genz on 15 Oct 2019
Sorry sir, I don't quite understand. fitA only contains 10 x 1 values; it is data from another file.
Only the npop1 data (10 x 6) is split.
Fabio Freschi on 15 Oct 2019
Got the point! As Jos highlighted, datasample samples with replacement. This means that each row can be selected more than once. See for example this link.
If you don't want replacement, you must call datasample this way:
[rw1,idSelected] = datasample(npop1,PC,'Weights',r1,'Replace',false);
You can leave the rest of the code unchanged. Let me know if this fits your initial request.
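For reference, a self-contained sketch of the whole split without replacement might look like the following. It assumes the Statistics and Machine Learning Toolbox and uses made-up data in place of npop1; nRows and isSelected are illustrative names, and the logical mask is just an alternative to setdiff, not a required change.

```matlab
% Hypothetical stand-in data: 10 rows, 6 columns (not the asker's real npop1)
rng(1);                          % fix the seed so a run can be repeated
npop1 = rand(10,6);
nRows = size(npop1,1);

pc = 0.4;                        % fraction of rows that go into rw1
PC = round(pc*nRows);            % round so the sample size is an integer
r1 = rand(nRows,1);              % sampling weights, one per row

% 'Replace',false guarantees PC distinct row indices in idSelected
[rw1,idSelected] = datasample(npop1,PC,'Weights',r1,'Replace',false);

% Everything that was not drawn goes into rw2
isSelected             = false(nRows,1);
isSelected(idSelected) = true;
rw2 = npop1(~isSelected,:);

fprintf('rw1: %d rows, rw2: %d rows\n', size(rw1,1), size(rw2,1));
```

With distinct indices, rw1 and rw2 together contain every row of npop1 exactly once, so rw2 always has nRows - PC rows.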
Genz on 15 Oct 2019
It works well. I didn't realize there was "replacement".
Many thanks, sir. GBU.


Release: R2016b

Asked: 15 Oct 2019

Commented: 15 Oct 2019
