split training data and testing data
abdulaziz marie
18 Jan 2018
Commented: Abhijit Bhattacharjee
4 Mar 2023
Hello, I have a 54000 x 10 matrix. I want to split it into 70% training and 30% testing. What's the easiest way to do that?
1 Comment
Delvan Mjomba
6 Jun 2019
Use the randperm function to ensure random splitting. It's very easy.
For example, if you have 150 items (the rows of a matrix, called data here) to split for training and testing, proceed as below:
indices = randperm(150);
trainingSet = data(indices(1:105), :);
testingSet = data(indices(106:end), :);
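The same randperm idea can be written without hard-coded sizes, so it applies directly to the 54000 x 10 matrix from the question. This is only a sketch: the random matrix is a stand-in for the real data, and the names nRows, nTrain, trainingSet and testingSet are made up for illustration.

```matlab
% Stand-in for the real 54000 x 10 matrix (assumption: rows are samples).
data = rand(54000, 10);

nRows  = size(data, 1);
nTrain = round(0.7 * nRows);        % number of rows in the 70% training part

% Shuffle the row numbers once, then cut the shuffled list in two.
indices = randperm(nRows);
trainingSet = data(indices(1:nTrain), :);
testingSet  = data(indices(nTrain+1:end), :);
```

Because both parts come from one permutation, no row can end up in both sets.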
Accepted Answer
Akira Agata
18 Jan 2018
Edited: the cyclist
16 Aug 2022
% Sample data (54000 x 10)
data = rand(54000,10);
% Hold-out partition for validation (train: 70%, test: 30%)
cv = cvpartition(size(data,1),'HoldOut',0.3);
idx = cv.test;
% Separate to training and test data
dataTrain = data(~idx,:);
dataTest = data(idx,:);
11 Comments
Rishikesh Shetty
9 Jan 2023
Hi Akira,
Thank you for this straightforward approach.
After following these steps, I was able to estimate my model accuracy as expected.
My next question is: how do I split my data for all possible combinations?
For example, I have a 13*2 array that will split 70/30 into 9*2 (training) and 4*2 (testing). I would like to repeat this split for all possible combinations (13C9) and then obtain an average of the model prediction accuracy.
Any advice is deeply appreciated.
Abhijit Bhattacharjee
4 Mar 2023
Rishikesh,
The cvpartition function randomizes the selection of the training and test sets, so to get a new random split, just run it again. I am not sure it is advisable to try all combinatorial possibilities, as it is questionable whether that will give a much better result than you could get with considerably less effort. Just retrain with a new random partition a few times (say, 10 times) and average the results. This is repeated hold-out validation (sometimes called Monte Carlo cross-validation). It is related to, but not the same as, k-fold cross-validation, in which the data is divided once into k disjoint folds and each fold serves as the test set exactly once.
Best,
Abhijit
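For a data set as small as the 13*2 example, enumerating every 9/4 split is cheap, so here is a minimal sketch of how that could look. Everything about the model is an assumption: the data is synthetic, column 1 is treated as a predictor and column 2 as a response, and a straight-line fit with mean squared test error stands in for whatever model and accuracy measure is actually used.

```matlab
% Hypothetical 13-by-2 data: column 1 is a predictor, column 2 a response.
x = (1:13)';
data = [x, 2*x + 1 + 0.1*randn(13, 1)];

nTotal = size(data, 1);
nTrain = 9;                               % 9 training rows, 4 test rows

% Each row of trainSets is one way to pick 9 training rows out of 13.
trainSets = nchoosek(1:nTotal, nTrain);
nSplits   = size(trainSets, 1);           % nchoosek(13, 9) = 715

testErr = zeros(nSplits, 1);
for k = 1:nSplits
    trainIdx = trainSets(k, :);
    testIdx  = setdiff(1:nTotal, trainIdx);   % the 4 rows left out

    % Stand-in model: least-squares straight line fitted on the training rows.
    p    = polyfit(data(trainIdx, 1), data(trainIdx, 2), 1);
    pred = polyval(p, data(testIdx, 1));

    % Stand-in score: mean squared error on the held-out rows.
    testErr(k) = mean((pred - data(testIdx, 2)).^2);
end

meanTestErr = mean(testErr);              % average over all 715 splits
```

For larger data sets the number of combinations grows too quickly for this to be practical; drawing trainIdx from randperm a fixed number of times, instead of looping over trainSets, gives the repeated random split version.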
More Answers (4)
Gilbert Temgoua
19 Apr 2022
Edited: Gilbert Temgoua
20 Apr 2022
I find dividerand very straightforward, see below:
% randomly select indexes to split data into 70%
% training set, 0% validation set and 30% test set.
[train_idx, ~, test_idx] = dividerand(54000, 0.7, 0, 0.3);
% slice training data with train indexes
%(take training indexes in all 10 features)
x_train = x(train_idx, :);
% select test data
x_test = x(test_idx, :);
1 Comment
uma
28 Apr 2022
How do I split the data into trainx, trainy, testx, testy format, where trainx and trainy have the same first dimension and testx and testy have the same first dimension? For example, I have a 1000*9 dataset. trainx should contain 1000*9, trainy should contain 1000*1, testx should contain 473*9 and testy should contain 473*1.
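One way to keep the first dimensions matched is to index the features and the labels with the same row indices. The sketch below assumes a 1000*9 feature matrix X, a separate 1000*1 label vector y (both filled with made-up values here), and a 70/30 split; the sizes in the question would need their own index ranges.

```matlab
% Hypothetical inputs: 1000 samples, 9 features, one label per sample.
X = rand(1000, 9);
y = randi([0 1], 1000, 1);

n      = size(X, 1);
nTrain = round(0.7 * n);

% One shuffled index list, shared by X and y so rows stay paired.
idx      = randperm(n);
trainIdx = idx(1:nTrain);
testIdx  = idx(nTrain+1:end);

trainx = X(trainIdx, :);   trainy = y(trainIdx);
testx  = X(testIdx, :);    testy  = y(testIdx);
```

Since trainx and trainy are built from trainIdx, and testx and testy from testIdx, each pair always has the same number of rows.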
Vrushal Shah
14 Mar 2019
If we want to split the data set into training and testing sets, what is the best way to do that?
0 Comments
Jere Thayo
28 Oct 2022
What if both the training and testing data are already in files, i.e., X_train.mat, y_train.mat, x_test.mat and y_test.mat?
0 Comments
Syed Iftikhar
1 Jan 2023
I have an input variable named 's' whose data is stored only in columns. Its size is 1000000. I want to split off 20% for testing and save that data in another variable, because I am going to use the test data in a Python script. Any idea how to do this?
0 Comments
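A possible sketch for holding out 20% and saving it for use outside MATLAB. It assumes s is a 1000000-by-1 column vector (random values stand in for it here); the names sTest, sTrain and the output file s_test.mat are made up for illustration.

```matlab
% Stand-in for the real variable (assumption: a 1000000-by-1 column vector).
s = rand(1000000, 1);

n     = numel(s);
nTest = round(0.2 * n);             % 20% of the entries go to the test set

idx    = randperm(n);
sTest  = s(idx(1:nTest));           % held-out 20%
sTrain = s(idx(nTest+1:end));       % remaining 80%

% Save the test part as a version 7 MAT-file in the temporary folder.
outFile = fullfile(tempdir, 's_test.mat');
save(outFile, 'sTest', '-v7');
```

A version 7 MAT-file can be read on the Python side with scipy.io.loadmat; writing a CSV file instead would be another option.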