How to divide a dataset into training and test sets (train/test split)?

Hello,
I'm trying to split my dataset into X_train, X_test, y_train and y_test, in a similar fashion to scikit-learn's train_test_split in Python, but I'm struggling to find a method to do so. Is this possible in MATLAB?
I've tried doing the following:
seed = 42;
rng(seed);
cv = cvpartition(size(dataset,1), "HoldOut", 0.2);
idx = cv.test;
X_train = subsample(~idx,:);
y_test = subsample(idx,:);
but I'm not entirely sure how to go about deriving X_test and y_train.
Does anybody have a good solution to this? Apologies, as I'm fairly new to MATLAB!
Thank you!

Accepted Answer

Ameer Hamza on 4 Nov 2020

Does the variable subsample contain both 'X' and 'y' values? If yes, then you don't need to create two variables for 'X' and 'y'. Just use
subsample_train = subsample(cv.training, :)
subsample_test = subsample(cv.test, :)
However, if subsample contains 'X' values and another variable (say, 'y') contains y values, then you can do something like this:
X_train = subsample(cv.training, :);
y_train = y(cv.training, :);
X_test = subsample(cv.test, :);
y_test = y(cv.test, :);
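
As a minimal sketch of this second case, the snippet below uses made-up data (the variables X and y and their random values are assumptions for illustration, not the asker's data) and checks that the training and test masks returned by cvpartition are complementary. Note that cvpartition requires the Statistics and Machine Learning Toolbox.

```matlab
rng(42);                               % fix the seed so the split is reproducible
X = rand(100, 5);                      % hypothetical feature matrix: 100 rows, 5 features
y = randi([0 1], 100, 1);              % hypothetical label vector, one label per row

cv = cvpartition(size(X, 1), "HoldOut", 0.2);   % hold out about 20% of the rows for testing
trainIdx = training(cv);               % logical mask, true for training rows
testIdx  = test(cv);                   % logical mask, true for test rows

% Apply the same masks to features and labels so the rows stay aligned
X_train = X(trainIdx, :);
y_train = y(trainIdx, :);
X_test  = X(testIdx, :);
y_test  = y(testIdx, :);

% Every row lands in exactly one of the two sets
assert(all(xor(trainIdx, testIdx)));
```

Because the same logical masks index both X and y, each feature row keeps its matching label in whichever set it ends up in.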

6 Comments

Ziad on 4 Nov 2020
Hey Ameer,
thanks for the reply!
I noticed my code had a typo; the correct code is:
seed = 42;
rng(seed);
cv = cvpartition(size(subsample,1), "HoldOut", 0.2);
idx = cv.test;
X_train = subsample(~idx,:);
y_test = subsample(idx,:);
Could you elaborate more on your second point,
"However, if subsample contains 'X' values and another variable (say, 'y') contains y values, then you can do something like this",
as I'm not sure I understood it entirely?
Thank you!
Ameer Hamza on 4 Nov 2020
To explain the point, can you specify what data is stored in 'subsample'?
Ziad on 4 Nov 2020
Yes, of course.
My subsample variable holds my scaled, cleaned dataset, with the features and classes that I'll train my models on. It's a 664x31 double.
Ameer Hamza on 4 Nov 2020
If the first 30 columns are features and the last column is the label, then you can do this:
X_train = subsample(cv.training, 1:30);
y_train = subsample(cv.training, 31);
X_test = subsample(cv.test, 1:30);
y_test = subsample(cv.test, 31);
X_train and X_test are feature matrices and y_train and y_test are label vectors.
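
For anyone who wants to try this end to end, here is a rough sketch that uses a random 664x31 matrix as a stand-in for the real data (the random values and the binary labels are assumptions, chosen only to match the stated size). The labels are read from column 31 of subsample itself, since no separate y variable exists in this setup.

```matlab
rng(42);                                         % reproducible split
% Hypothetical stand-in for the 664x31 matrix: 30 feature columns plus 1 label column
subsample = [rand(664, 30), randi([0 1], 664, 1)];

cv = cvpartition(size(subsample, 1), "HoldOut", 0.2);   % about 20% of rows go to the test set

X_train = subsample(cv.training, 1:30);          % features, training rows
y_train = subsample(cv.training, 31);            % labels, training rows
X_test  = subsample(cv.test, 1:30);              % features, test rows
y_test  = subsample(cv.test, 31);                % labels, test rows

% Report how many rows ended up in each set
fprintf("train: %d rows, test: %d rows\n", size(X_train, 1), size(X_test, 1));
```

The TrainSize and TestSize properties of cv report the same row counts, which is a quick way to sanity-check the split.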
Ziad on 6 Nov 2020
Thank you very much, Ameer!
Ameer Hamza on 6 Nov 2020
I am glad to be of help!

More Answers (0)

Asked: 4 Nov 2020

Commented: 6 Nov 2020
