How to vectorize a loop over rows?

Hello!
I have the following code for Kaggle's Digit Recognizer using KNN, but I am unable to replace the for loop below with a vectorized implementation.
Each iteration takes one row of the test data matrix and compares it with every row of the training data.
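knn_mat = zeros(m_test,1);
for i = 1:m_test
    fprintf('i is %d \n',i);
    % Replicate test row i so it can be subtracted from every training row
    compare_mat = repmat(x_test(i,:),m_train,1);
    % Squared Euclidean distance from test row i to each training row
    distance_mat = sum(power((compare_mat - x_train),2),2);
    % b is the index of the nearest training row
    [a,b] = min(distance_mat);
    knn_mat(i) = y(b);
end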
Thank you!

10 Comments

Matz Johansson Bergström on 19 Jul 2014
The following variables are undefined
m_test = ? x_test = ? m_train = ? x_train = ? y = ?
Could you provide information so I can run the code?
Karan on 19 Jul 2014
Hi! I'm sorry for the late reply. Here is the entire code:
clc, clear;
% Read both CSV files, skipping the header row
x_train = dlmread('train.csv',',',1,0);
x_test = dlmread('test.csv',',',1,0);
[m_train,n_train] = size(x_train);
[m_test, n_test] = size(x_test);
% The first column of train.csv is used as the labels; the rest are the features
y = x_train(:,1);
x_train = x_train(:,2:n_train);
knn_mat = zeros(m_test,1);
for i = 1:m_test
    fprintf('i is %d \n',i);
    compare_mat = repmat(x_test(i,:),m_train,1);
    distance_mat = sum(power((compare_mat - x_train),2),2);
    [a,b] = min(distance_mat);
    knn_mat(i) = y(b);
end
Matz Johansson Bergström on 19 Jul 2014
Please upload train.csv and test.csv then we're in business ;-)
Karan on 19 Jul 2014
Well, uploading them would take a lot of time, but here is the link. You will have to register first, though.
dpb on 19 Jul 2014
Any small test input case would suffice...
Karan on 19 Jul 2014
Matz Johansson Bergström on 19 Jul 2014
Edited: Matz Johansson Bergström on 19 Jul 2014
I registered before you uploaded and took a look. As I read the description on Kaggle, you compare one character, taken as a row of x_test, with every character in the training set x_train. I was not able to vectorize the code; perhaps someone else here can.
For anyone else who might be interested in this (it's a contest on Kaggle), here is how you can visualize the first 10 characters from the training set:
for i = 1:10
    % Each row is a flattened 28x28 image; transpose after reshape to display it upright
    image(reshape(x_train(i,:),28,28)');
    pause(0.5);
end
Karan on 19 Jul 2014
Edited: Karan on 19 Jul 2014
I was able to make an entry on Kaggle at 97% accuracy, but this code takes around 3 hours on an i5 with 8 GB of RAM. I was unable to find a way to vectorize the loop, which selects the test image rows one by one and compares each with the training images to find the minimum distance, as KNN requires.
Matz Johansson Bergström on 19 Jul 2014
I think that a different approach, or calling compiled code, would help more. I'm thinking of storing x_train in a spatial data structure so that the closest neighbour of each x_test row in x_train can be found quickly.
Karan on 19 Jul 2014
Thank you! How do we do that?
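One possible reading of the "spatial data structure" suggestion is a sketch like the one below, which assumes the Statistics and Machine Learning Toolbox is installed so that knnsearch is available. The small matrices are made-up stand-ins for x_train, y and x_test, not the Kaggle data. As far as I know, knnsearch only builds a kd-tree by default for low-dimensional data, so with 784 pixel columns it will probably fall back to an exhaustive search and may not be much faster than the loop.

```matlab
% Made-up stand-ins for the real data (illustration only)
x_train = [0 0; 10 10; 0 10];   % one training sample per row
y       = [1; 2; 3];            % label of each training row
x_test  = [1 1; 9 9; 1 8];      % one test sample per row

% For every row of x_test, return the index of the closest row of x_train
% (Euclidean distance, one neighbour by default).
b = knnsearch(x_train, x_test);

% Look up the label of each nearest training row
knn_mat = y(b);
disp(knn_mat')
```

With the real data, the same two calls would replace the whole for loop.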

Accepted Answer

Jan on 19 Jul 2014
Edited: Jan on 19 Jul 2014

This is about 25% faster for your small test data file:
knn_mat = zeros(m_test,1);
for i = 1:m_test
    % bsxfun expands the single test row against x_train, so repmat is not needed
    distance_mat = sum(bsxfun(@minus, x_test(i,:), x_train).^2, 2);
    [a,b] = min(distance_mat);
    knn_mat(i) = y(b);
end
The creation of the large intermediate array x_test-x_train might be the bottleneck here for larger arrays. A complete vectorization would make this problem worse, especially as the data size grows.
In that case a C-MEX function would be much faster. Are you familiar with writing C-MEX functions?

1 Comment

Karan on 20 Jul 2014
Edited: Karan on 20 Jul 2014
No, this is my first program in MATLAB. I was using Octave, but its performance was poor. I'm not familiar with C-MEX functions. But even if we did vectorize it, what would the code be?
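A minimal sketch of what a vectorized version could look like, processing the test rows in blocks so that the intermediate array stays small. It relies on the expansion ||a - b||^2 = ||a||^2 - 2*a*b' + ||b||^2, which turns the distance computation into one matrix product per block. The tiny matrices and the name block_size are made up for illustration. The memory figures assume about 42,000 training rows and 28,000 test rows, sizes that are not stated in this thread: a full 28,000 x 42,000 distance matrix of doubles would need about 9.4 GB, while a block of 500 test rows needs about 168 MB.

```matlab
% Made-up stand-ins for the real data (illustration only)
x_train = [0 0; 10 10; 0 10];
y       = [1; 2; 3];
x_test  = [1 1; 9 9; 1 8; 11 10];

m_test     = size(x_test, 1);
block_size = 2;                      % hypothetical tuning knob; something like 500 on real data
train_sq   = sum(x_train.^2, 2)';    % squared norm of every training row, computed once
knn_mat    = zeros(m_test, 1);

for first = 1:block_size:m_test
    rows = first:min(first + block_size - 1, m_test);
    % The ||a||^2 term is the same for every training row, so it does not
    % change which training row is closest and is left out here.
    d = bsxfun(@minus, train_sq, 2 * (x_test(rows,:) * x_train'));
    % Column index of the smallest value in each row = nearest training row
    [~, b] = min(d, [], 2);
    knn_mat(rows) = y(b);
end

disp(knn_mat')
```

The values in d are not true distances, only distances shifted by a per-row constant, so they are suitable for picking the minimum but not for reporting how far apart two samples are. Whether this beats the bsxfun loop on the real data would need to be timed.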

More Answers (1)

Matz Johansson Bergström on 20 Jul 2014

Jan: Nice. That was actually my first thought, but I only tried it on x_test (registered and downloaded from Kaggle before Karan uploaded the files), and then I only got a 5-8% speedup, unfortunately.
Karan: So, no. As I mentioned earlier, compiled code is the way to go here; there is no simple way to vectorize this code inside MATLAB.
If you wish to call compiled code from MATLAB you can, as Jan states, use C-MEX. I would write the code directly in C and/or preprocess the data a little. For more information about C-MEX, see link to documentation.

1 Comment

Karan on 20 Jul 2014
Thank you all!

Asked: 19 Jul 2014
Commented: 20 Jul 2014