Split training data and testing data

% Sample data (54000 x 10)
data = rand(54000,10);
% Hold-out partition (train: 70%, test: 30%)
cv = cvpartition(size(data,1),'HoldOut',0.3);
idx = test(cv);   % logical index of the test rows
% Separate into training and test data
dataTrain = data(~idx,:);
dataTest  = data(idx,:);
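If the Statistics and Machine Learning Toolbox (which provides cvpartition) is not available, a comparable 70/30 hold-out split can be sketched in base MATLAB with randperm. This is an alternative to the code above, not part of it, and rounding the test size with round is an assumption of this sketch.

```matlab
% Sample data (54000 x 10), as above
data = rand(54000, 10);
n = size(data, 1);

% Number of test rows: 30% of the data, rounded to an integer (assumption)
nTest = round(0.3 * n);

% Random permutation of the row indices 1..n
perm = randperm(n);

% The first nTest shuffled rows form the test set, the rest the training set
testIdx  = perm(1:nTest);
trainIdx = perm(nTest+1:end);

dataTrain = data(trainIdx, :);
dataTest  = data(testIdx, :);
```

Calling rng with a fixed seed before randperm makes the split reproducible.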


Rishikesh Shetty on 9 Jan 2023

Hi Akira,

Thank you for this straightforward approach.

After following these steps, I was able to predict my model accuracy as expected.

My next question is: how do I split my data for all possible combinations?

For example, I have a 13x2 array that splits 70/30 into 9x2 (training) and 4x2 (testing). I would like to repeat this split for all possible combinations (13C9) and then average the model's prediction accuracy.

Any advice is deeply appreciated.
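Enumerating every split, as asked, amounts to leave-p-out cross-validation with p = 4, and nchoosek can list all C(13,9) = 715 training sets. The sketch below assumes hypothetical 13x2 data (column 1 as predictor, column 2 as response) and uses a straight-line fit scored by mean absolute error purely as a placeholder for the actual model and accuracy measure.

```matlab
% Hypothetical 13 x 2 data: column 1 = predictor, column 2 = response
data = [(1:13)', 2*(1:13)' + randn(13, 1)];
n = size(data, 1);
nTrain = 9;                            % 9 training rows, 4 test rows

% Every possible set of 9 training rows: a 715 x 9 matrix (13 choose 9)
combos = nchoosek(1:n, nTrain);
nCombos = size(combos, 1);

scores = zeros(nCombos, 1);
for k = 1:nCombos
    trainIdx = combos(k, :);
    testIdx  = setdiff(1:n, trainIdx); % the remaining 4 rows

    % Placeholder model: straight-line fit on the training rows
    p = polyfit(data(trainIdx, 1), data(trainIdx, 2), 1);

    % Placeholder score: mean absolute error on the test rows
    pred = polyval(p, data(testIdx, 1));
    scores(k) = mean(abs(pred - data(testIdx, 2)));
end

% Average score over all 715 splits
meanScore = mean(scores);
```

The number of combinations grows quickly with the data size, so this exhaustive approach is only practical for very small data sets like this one.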

Abhijit Bhattacharjee on 4 Mar 2023

Rishikesh,

The CVPARTITION function randomizes the selection of the training and test datasets, so to get a new random combination, just run it again. I am not sure it is advisable to try all combinatorial possibilities, as it is questionable whether that will return a much better model than you could get with considerably less effort. Just retrain with a new random partition a few times (say, 10 times). Note that this is repeated random sub-sampling (Monte Carlo cross-validation) rather than 10-fold cross-validation: in k-fold cross-validation, the data is divided once into k disjoint folds, and each fold serves as the test set exactly once.

Best,

Abhijit
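For comparison, a k-fold partition can be sketched with the 'KFold' option of cvpartition, where every row lands in exactly one of the k test folds. The sample data and k = 10 are illustrative, and the model-training step is left as a comment. For the repeated random hold-out described above, calling repartition on a hold-out cvpartition object draws a new random split.

```matlab
% Sample data (54000 x 10), as in the accepted answer
data = rand(54000, 10);
k = 10;

% Partition the rows into k disjoint folds
cv = cvpartition(size(data, 1), 'KFold', k);

% Count how many times each row is used as test data
timesTested = zeros(size(data, 1), 1);
for i = 1:cv.NumTestSets
    trainIdx = training(cv, i);   % logical index of the training rows
    testIdx  = test(cv, i);       % logical index of fold i

    dataTrain = data(trainIdx, :);
    dataTest  = data(testIdx, :);
    % ... train the model on dataTrain and evaluate it on dataTest ...

    timesTested = timesTested + testIdx;
end
```

After the loop, every entry of timesTested is 1, which is the defining property of k-fold cross-validation.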


Answer 2

Gilbert Temgoua on 19 Apr 2022

Edited: Gilbert Temgoua on 20 Apr 2022


4 votes

I find dividerand very straightforward; see below:

    % Sample data (54000 x 10), as in the accepted answer
    x = rand(54000, 10);
    % Randomly select indices to split the data into a 70% training set,
    % a 0% validation set, and a 30% test set.
    [train_idx, ~, test_idx] = dividerand(size(x, 1), 0.7, 0, 0.3);
    % Slice the training data with the training indices
    % (keeping all 10 features)
    x_train = x(train_idx, :);
    % Select the test data
    x_test = x(test_idx, :);


uma on 28 Apr 2022

How do I split the data into trainx, trainy, testx, and testy, such that trainx and trainy have the same first dimension, and testx and testy also have the same first dimension? For example, I have a 1000x9 dataset: trainx should be 1000x9, trainy 1000x1, testx 473x9, and testy 473x1.
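A minimal sketch, assuming the features and responses are already separate arrays X and Y with the same number of rows (both names and the random sample data are hypothetical): create one partition and index both arrays with the same logical masks so that their rows stay paired.

```matlab
% Hypothetical inputs: X holds the features (one row per observation)
% and Y holds the matching responses, so size(X,1) == size(Y,1)
X = rand(1000, 9);
Y = rand(1000, 1);

% One hold-out partition over the rows (70% train, 30% test)
cv = cvpartition(size(X, 1), 'HoldOut', 0.3);
trainMask = training(cv);
testMask  = test(cv);

% Index X and Y with the SAME masks so their rows stay paired
trainX = X(trainMask, :);
trainY = Y(trainMask);
testX  = X(testMask, :);
testY  = Y(testMask);
```

The number of test rows follows from the hold-out fraction (here about 300 of 1000), so choose the fraction that gives the sizes you need.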


Answer 3

Vrushal Shah on 14 Mar 2019

3 votes

If we want to split the dataset into training and testing sets, what is the best way to do that?
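One common option for classification data is a stratified hold-out split, which keeps the class proportions similar in the training and test sets. When cvpartition receives the class labels instead of the number of observations, it stratifies the split. The labelled sample data below (three classes in a 60/30/10 ratio) is hypothetical.

```matlab
% Hypothetical labelled data: 1000 observations, 3 classes (1, 2, 3)
X = rand(1000, 4);
labels = [ones(600, 1); 2*ones(300, 1); 3*ones(100, 1)];

% Passing the class labels (instead of the number of rows) makes
% cvpartition stratify the hold-out split by class
cv = cvpartition(labels, 'HoldOut', 0.3);

XTrain = X(training(cv), :);  yTrain = labels(training(cv));
XTest  = X(test(cv), :);      yTest  = labels(test(cv));

% Class proportions in each set should be close to 60/30/10
trainShare = histcounts(yTrain, 1:4) / numel(yTrain);
testShare  = histcounts(yTest, 1:4) / numel(yTest);
```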


Answer 4

Jere Thayo on 28 Oct 2022

0 votes

What if the training and test data are already in separate files, i.e. X_train.mat, y_train.mat, x_test.mat, and y_test.mat?
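When the data is already split, no partitioning is needed; the files only have to be loaded. A minimal sketch, assuming each .mat file holds exactly one variable whose name inside the file is unknown, so the sketch takes whatever single variable is stored. The folder and the sample arrays written first are hypothetical and exist only so the block runs standalone.

```matlab
% Demo setup only: write small sample files to tempdir.
% With real data, skip this part and set folder to where your files are.
folder = tempdir;
X_train = rand(70, 3); y_train = rand(70, 1);
x_test  = rand(30, 3); y_test  = rand(30, 1);
names = {'X_train', 'y_train', 'x_test', 'y_test'};
for i = 1:numel(names)
    save(fullfile(folder, [names{i} '.mat']), names{i});
end
clear X_train y_train x_test y_test

% Load each file and keep its (assumed single) variable in a struct
loaded = struct();
for i = 1:numel(names)
    S = load(fullfile(folder, [names{i} '.mat']));
    vals = struct2cell(S);         % values of all variables in the file
    loaded.(names{i}) = vals{1};   % assumes one variable per file
end
```

The split sets are then available as loaded.X_train, loaded.y_train, loaded.x_test, and loaded.y_test.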


Answer 5

Syed Iftikhar on 1 Jan 2023

0 votes

I have an input variable named 's' whose data is stored in a single column; its size is 1000000. I want to split off 20% of it as test data and save that in another variable, because I am going to use the test data in a Python script. Any idea how to do this?
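A minimal sketch for a single column vector, using randperm to pick 20% of the elements at random. The random stand-in for s, the file names, and the use of tempdir as the output folder are assumptions. Saving with '-v7' keeps the MAT-file readable by scipy.io.loadmat in Python, which does not read the HDF5-based '-v7.3' format; a CSV file is a simpler alternative.

```matlab
% Hypothetical stand-in for the user's column vector s (1000000 x 1)
s = rand(1000000, 1);
n = numel(s);

% Randomly choose 20% of the elements as the test set
perm = randperm(n);
nTest = round(0.2 * n);
sTest  = s(perm(1:nTest));        % column vector, like s
sTrain = s(perm(nTest+1:end));

% Save the test data in a MAT-file version that scipy.io.loadmat can read
outFile = fullfile(tempdir, 's_test.mat');
save(outFile, 'sTest', '-v7');

% Alternatively, write a plain CSV file (writematrix requires R2019a or later)
writematrix(sTest, fullfile(tempdir, 's_test.csv'));
```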
