Extracting testing and training data from a single dataset

I have a dataset of size 14400 x 14, where the first 2 columns represent a user's x- and y-position, each ranging from 1 to 121.
Example:
first_col  second_col  . . .
1          1
1          2
1          3
...        (up to 121)
2          1
2          2
...        (up to 121)
3          1 ... 121
...
121        1 ... 121
I want to separate out the testing data based on user location: the rows where both the first column and the second column lie in the range 1 to 30.
I am using a for loop, but it takes a lot of time.
I would really appreciate any suggestions on this issue.
Thank you

2 Comments

Rahul Gulia on 28 Oct 2022 (edited 28 Oct 2022)
I also want to be able to split the dataset into training and testing sets, and later combine both sets back into one for further use.
I think the row index values could be used for this.
Khushboo on 31 Oct 2022
Hi Rahul,
I am sorry, I did not fully understand how you want your test data to look. Could you elaborate with an example? From what I understand, slicing (indexing) should work for your use case.


Accepted Answer

Rahul Gulia on 31 Oct 2022

0 votes

I was able to solve this. It turned out to be a simple matter of joining 2 matrices and ordering the rows by the values in the first column of both matrices.
Example code:
**************************************************************
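% Two matrices whose first column holds a row index
xx = [1 7 8; 4 9 10; 5 11 12];
yy = [2 13 14; 3 15 16; 6 17 18];
zz = [xx; yy]            % stack them vertically
n = size(zz, 1);         % number of rows (length() returns the largest dimension)
ww = [];
% For each index value pp, append the row whose first column equals pp
for pp = 1:n
    for qq = 1:n
        if pp == zz(qq, 1)
            ww = [ww; zz(qq, :)];
        end
    end
end
ww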
*****************************************************************
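Because the first column already holds a unique row index, the nested loop above (n^2 comparisons) can most likely be replaced by a single call to `sortrows`, which sorts the rows by the values in column 1. A minimal sketch using the same example matrices:

```matlab
% Same example matrices as above; column 1 holds a row index
xx = [1 7 8; 4 9 10; 5 11 12];
yy = [2 13 14; 3 15 16; 6 17 18];
zz = [xx; yy];

% Sort the stacked rows by their first column (ascending)
ww = sortrows(zz, 1)
```

One difference to keep in mind: the loop silently drops rows whose index falls outside 1..n, while `sortrows` keeps every row.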

More Answers (2)

Rajeev on 31 Oct 2022

0 votes

Hi Rahul,
Logical Indexing can be used to extract the required data from the array.
Assuming the matrix is named "location", the rows whose user location lies between 1 and 30 in both coordinates can be extracted as follows:
% logical indexing finds the rows whose coordinate in each column is at most 30
first_col_index = location(:,1) <= 30;
second_col_index = location(:,2) <= 30;
% element-wise & (AND) gives the rows where both coordinates are less than or equal to 30
location_index = first_col_index & second_col_index;
% the logical index is a column vector with one entry per row of "location", so it selects the required rows
location_new = location(location_index,:);
Here is the documentation for logical indexing: Matrix Indexing in MATLAB - MATLAB & Simulink (mathworks.com)
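The same mask can give both splits at once: rows where the mask is true form the test set, and its negation `~` gives the training set, with no loop. A minimal sketch on a synthetic 121-by-121 grid of positions with 12 random feature columns (the grid, the `rand` features, and the names `testMask`, `testData`, `trainData` are illustrative assumptions, not the poster's data):

```matlab
% Synthetic stand-in for the 14-column dataset (assumption): columns 1-2 are
% the x/y position on a 121-by-121 grid, columns 3-14 are random features
[X, Y] = meshgrid(1:121, 1:121);
location = [X(:), Y(:), rand(numel(X), 12)];

% Logical mask: true for rows where both coordinates lie in 1..30
testMask = location(:,1) <= 30 & location(:,2) <= 30;

% Rows inside the region form the test set; all remaining rows the training set
testData  = location(testMask, :);
trainData = location(~testMask, :);

size(testData)    % 900 x 14 (30 x 30 positions)
size(trainData)   % 13741 x 14 (the other 14641 - 900 rows)
```

Note that the element-wise `&` is needed here; the short-circuit `&&` only works on scalars and would error on these column vectors.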
Rahul Gulia on 31 Oct 2022

0 votes

I figured out a way to create the training and testing data based on the location of the users. Here is how I did it.
My DatasetTmp_14 looks like this (note: the first column contains the row index):
1 0 0.5 40.36 43.05 0 1 60 0 54.5 0.5 1 15 5 2301
2 0 1 40.02 42.74 0 1 60 0 54 1 1 15 5 2336
3 0 1.5 39.69 42.43 0 1 60 0 53.5 1.5 1 15 5 2311
4 0 2 39.37 42.13 0 1 60 0 53 2 1 15 5 2327
5 0 2.5 39.05 41.83 0 1 60 0 52.5 2.5 1 15 5 2318
DatasetTmp_14 is 13310 x 15.
Now,
*****************************************************
idx1 = (1:size(DatasetTmp_13,1))';   % row index for every row
DatasetTmp_14 = [idx1 DatasetTmp_13];  % prepend it as column 1
quadrant_data_test = [];
quadrant_data_train = [];
for pp = 1:size(DatasetTmp_14,1) % Takes too long to execute
    % test region: column 2 <= 30 and column 3 <= 27.5
    if (DatasetTmp_14(pp,2)<=30 && DatasetTmp_14(pp,3)<=27.5)
        tmp1 = DatasetTmp_14(pp,1:15);
        quadrant_data_test = [quadrant_data_test; tmp1];
    else
        quadrant_data_train = [quadrant_data_train; DatasetTmp_14(pp,1:15)];
    end
end
*****************************************************
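The loop is slow mainly because `quadrant_data_test` and `quadrant_data_train` grow by one row per iteration, which copies the whole array each time. The same split can probably be done in one step with a logical mask on columns 2 and 3 (the thresholds 30 and 27.5 are taken from the loop above). A self-contained sketch in which `DatasetTmp_13` is replaced by a small made-up matrix, since the real data is not available:

```matlab
% Hypothetical stand-in for DatasetTmp_13 (the real data is not shown)
DatasetTmp_13 = [0 0.5 40.36; 0 28 39.02; 31 1 38.2; 10 27.5 37.5; 45 50 36.1];

% Prepend the row index, as in the original code
n = size(DatasetTmp_13, 1);
DatasetTmp_14 = [(1:n)' DatasetTmp_13];

% Test rows: column 2 <= 30 and column 3 <= 27.5 (same condition as the loop)
testMask = DatasetTmp_14(:,2) <= 30 & DatasetTmp_14(:,3) <= 27.5;

% One indexing operation per set instead of growing arrays row by row
quadrant_data_test  = DatasetTmp_14(testMask, :);
quadrant_data_train = DatasetTmp_14(~testMask, :);
```

Here rows 1 and 4 land in the test set and rows 2, 3 and 5 in the training set. With the real 15-column data, `(testMask, :)` selects the same columns as `(pp,1:15)` in the loop.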
Next, I would like to recombine the two datasets in their original order using the index values, which I tried as follows. This is where I am stuck: the rows of the new matrix do not come out in the proper sequence. Any suggestions on my code would be appreciated.
*****************************************************
test_heatmap_data_tmp = [quadrant_data_test; quadrant_data_train];
recreated_dataset = [];
for pp = 1:length(test_heatmap_data_tmp)
    for qq = 1:length(test_heatmap_data_tmp)
        if (pp == test_heatmap_data_tmp(qq,1))
            tmp = test_heatmap_data_tmp(pp,:);
            recreated_dataset = [recreated_dataset; tmp];
        end
    end
end
*****************************************************
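In the nested loop above, the condition checks row `qq` (`test_heatmap_data_tmp(qq,1)`), but the row that gets appended is row `pp` (`test_heatmap_data_tmp(pp,:)`). With unique indices 1..N, each `pp` matches exactly once, so the loop just copies the stacked matrix in its stacked order. Appending row `qq` instead should give the intended order; a simpler option is to sort by the index column. A small sketch with made-up data (hypothetical), assuming column 1 holds unique indices 1..N:

```matlab
% Hypothetical example: column 1 is the original row index
quadrant_data_test  = [1 10; 4 40];
quadrant_data_train = [2 20; 3 30; 5 50];

test_heatmap_data_tmp = [quadrant_data_test; quadrant_data_train];

% Option 1: sort the stacked rows by the index column
recreated_dataset = sortrows(test_heatmap_data_tmp, 1);

% Option 2: write each row straight back to the position given by its index
recreated_dataset2 = zeros(size(test_heatmap_data_tmp));
recreated_dataset2(test_heatmap_data_tmp(:,1), :) = test_heatmap_data_tmp;

disp(recreated_dataset)
```

Option 2 relies on the indices being exactly 1..N with no gaps or duplicates; `sortrows` does not need that assumption.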
For reference, this is how the recreated and original images should look.

Asked: 28 Oct 2022

Answered: 31 Oct 2022

