Is there a way to hold out specific data?
I'm building decision trees (both classification and regression) on my dataset, and I want to use one specific set of rows as the training set and another specific set as the test set. Is there a way to do this?
For example, if my dataset consists of 100 rows, is there a way to tell the software to use rows 1-75 as the training set and rows 76-100 as the test set?
Thanks in advance
Answers (1)
Udit06
September 23, 2024
Hi Mark,
You can use array indexing to select the rows that make up the training and test sets. The following code snippet does this:
% Define the training and testing indices
trainIndices = 1:75;
testIndices = 76:100;
% Split the data
trainData = data(trainIndices, :);
testData = data(testIndices, :);
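To connect this to the decision trees in the question, here is a minimal sketch that trains both a classification tree (fitctree) and a regression tree (fitrtree) on rows 1-75 and evaluates them on rows 76-100. It assumes the carsmall sample data set that ships with Statistics and Machine Learning Toolbox (it has 100 rows); the predictors (Weight, Displacement) and responses (Cylinders for classification, MPG for regression) are illustrative choices, not part of the original question.

```matlab
% Sample data set with 100 observations (assumption: used only for illustration)
load carsmall
X = [Weight, Displacement];   % predictor matrix, 100-by-2

% Fixed, non-random split: rows 1-75 for training, rows 76-100 for testing
trainIdx = 1:75;
testIdx  = 76:100;

% Classification tree: predict the number of cylinders (numeric class labels)
ctree   = fitctree(X(trainIdx, :), Cylinders(trainIdx));
cylPred = predict(ctree, X(testIdx, :));
classAccuracy = mean(cylPred == Cylinders(testIdx));

% Regression tree: predict fuel economy; MPG contains NaN values,
% so they are skipped when averaging the squared errors
rtree   = fitrtree(X(trainIdx, :), MPG(trainIdx));
mpgPred = predict(rtree, X(testIdx, :));
testMSE = mean((mpgPred - MPG(testIdx)).^2, 'omitnan');

fprintf('Classification accuracy on rows 76-100: %.2f%%\n', 100 * classAccuracy);
fprintf('Regression MSE on rows 76-100: %.3f\n', testMSE);
```

Because this split does not shuffle the data, a test set taken from the end of the table may not be representative if the rows are stored in some order (for example by date or by class).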
I hope this helps.
2 Comments
Udit06
September 23, 2024
Hi Mark,
When you use the fitctree function with the 'CrossVal','on' option, MATLAB automatically performs cross-validation by splitting the data into multiple folds (10 by default). This is described on the fitctree page of the MathWorks documentation.
However, if you want to specify the training and test sets yourself, split the data manually instead of relying on the built-in cross-validation. The following code snippet trains the model on a manually specified training set and evaluates it on a manually specified test set:
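For comparison, this sketch shows what 'CrossVal','on' actually returns: a partitioned model that holds one tree per fold, rather than a single tree you can evaluate on rows you chose. The related 'Holdout' option of fitctree also selects the held-out rows at random, so neither option lets you name specific rows. The rng call is an addition here, used only to make the random fold assignment reproducible.

```matlab
% Built-in cross-validation on the ionosphere sample data
load ionosphere
rng(0);                                      % fix the seed so the folds are reproducible
cvMdl = fitctree(X, Y, 'CrossVal', 'on');    % ClassificationPartitionedModel, 10 folds by default

numFolds = cvMdl.KFold;                      % number of folds used
cvError  = kfoldLoss(cvMdl, 'LossFun', 'classiferror');  % misclassification rate over all folds
fprintf('%d-fold cross-validated error: %.3f\n', numFolds, cvError);

% Each fold's tree is stored in the Trained cell array
firstFoldTree = cvMdl.Trained{1};
```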
% Clear workspace
clear;
% Load the ionosphere dataset
load ionosphere;
% Define training data ratio and calculate number of training samples
trainRatio = 0.7;
numTrainSamples = round(trainRatio * size(X, 1));
% Split data into training and test sets
X_train = X(1:numTrainSamples, :);
Y_train = Y(1:numTrainSamples);
X_test = X(numTrainSamples+1:end, :);
Y_test = Y(numTrainSamples+1:end, :);
% Train a decision tree classifier
MdlA = fitctree(X_train, Y_train);
% Visualize the decision tree
view(MdlA, 'Mode', 'graph');
% Predict labels for the test set
Y_pred = predict(MdlA, X_test);
% Y_test and Y_pred are cell arrays of class labels ('b' or 'g');
% strcmp compares them element by element without converting them to char arrays
% Calculate and display accuracy
accuracy = mean(strcmp(Y_pred, Y_test));
fprintf('Test Set Accuracy: %.2f%%\n', accuracy * 100);
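A single accuracy value does not show which class the errors fall into. As a possible extension, here is a minimal sketch that uses confusionmat (Statistics and Machine Learning Toolbox) on the same first-70%/last-30% split; the variable names trainRows and testRows are illustrative.

```matlab
% Same manual split as above: first 70% of rows for training, the rest for testing
load ionosphere
numTrain  = round(0.7 * size(X, 1));
trainRows = 1:numTrain;
testRows  = numTrain+1:size(X, 1);

Mdl    = fitctree(X(trainRows, :), Y(trainRows));
Y_pred = predict(Mdl, X(testRows, :));

% Rows of C are true classes, columns are predicted classes, listed in 'order'
[C, order] = confusionmat(Y(testRows), Y_pred);
disp(order')
disp(C)

% Diagonal entries are the correctly classified test rows
accuracy = trace(C) / sum(C(:));
fprintf('Test Set Accuracy: %.2f%%\n', 100 * accuracy);
```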
I hope this helps.