MATLAB Answers

Comparison between elements of matrix of different data type

Asked by Stewart Tan on 30 Aug 2019
Latest activity: commented on by Guillaume on 4 Sep 2019
I recently wrote a few lines of code to compare adjacent pairs of elements in a matrix whose values are integers:
test_mat = [99 100 54 32 14; 89 4 41 2 3; 87 64 32 19 20];
The matrix I actually have is 200,000x5. When I pass it in for comparison, it takes roughly 2 minutes to complete. However, I have another matrix that contains:
test_mat2 = [0.0482 0.0050 0.0516 0.0063 0.0058; 0.0847 0.0008 0.0071 0.0086 0.0502];
and the one I'm actually using is also a 200,000x5 matrix containing data like test_mat2 above. I notice that the comparison takes much longer than for the first matrix of integers. Is there a reason for this? Is comparison more expensive for numbers with decimals?
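The comparison code itself is not shown in the question. As a rough sketch only, assuming "adjacent pairs" means each element and its right-hand neighbour in the same row, and assuming the test is `>` (both are guesses, not the asker's actual method), the whole comparison can be written as one vectorized expression with no loop:

```matlab
% Assumption: "adjacent pair" = an element and its right-hand neighbour in a row.
% Assumption: the comparison is ">" (the question does not say which test is used).
test_mat = [99 100 54 32 14; 89 4 41 2 3; 87 64 32 19 20];

left  = test_mat(:, 1:end-1);   % columns 1..4
right = test_mat(:, 2:end);     % columns 2..5

is_greater = left > right;      % 3x4 logical, one entry per adjacent pair
disp(is_greater)
```

An expression of this kind on a 200,000x5 matrix would be expected to finish in milliseconds rather than minutes, regardless of whether the values have a fractional part.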

  3 Comments

Guillaume
4 Sep 2019
When you say that the first matrix is integer, what is its class? In your example it's still a matrix of class double, so floating-point values which just happen to be integers. There should be no difference in speed between your two example matrices unless your comparison algorithm does something very strange.
If your first test matrix is actually of an integer class, e.g.:
test_mat = uint8([99 100 54 32 14; 89 4 41 2 3; 87 64 32 19 20])
then yes, there could be a difference in speed, as per Nikhil's answer, due to the difference in memory footprint. However, I would be surprised if that were noticeable.
In any case, 2 minutes sounds like a long time, so maybe there is something odd going on with your comparison algorithm. Can you share it?
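To check which case applies, the class and memory footprint of a matrix can be inspected directly. The following is a minimal sketch using the small example matrix from the question; the variable names `a`, `b`, `info_a`, and `info_b` are chosen here for illustration:

```matlab
% Same values stored in two different classes.
a = [99 100 54 32 14; 89 4 41 2 3; 87 64 32 19 20];   % numeric literal -> class double
b = uint8(a);                                          % explicit integer class

disp(class(a))   % double
disp(class(b))   % uint8

% whos reports the storage used by each variable.
info_a = whos('a');
info_b = whos('b');
fprintf('double: %d bytes, uint8: %d bytes\n', info_a.bytes, info_b.bytes);
```

With 15 elements this reports 120 bytes for the double matrix (8 bytes per element) and 15 bytes for the uint8 matrix (1 byte per element).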
Edit: actually, the difference is somewhat noticeable, but it still shouldn't take 2 minutes to compare pairs of numbers:
>> mint = randi([0 255], 2e5, 5, 'uint8'); %create a 200,000 x 5 matrix of integers (uint8)
>> mdouble = double(mint); %store the same integers in a matrix of class double
>> mdouble2 = mdouble + rand(2e5, 5); %add a fractional part to show that it doesn't matter if the numbers are integer in a double array
>> timeit(@() mint(1:2:end) == mint(2:2:end)) %compare pair of integers stored as integer
ans =
0.0021422
>> timeit(@() mdouble(1:2:end) == mdouble(2:2:end)) %compare pairs of integers stored as double
ans =
0.0038102
>> timeit(@() mdouble2(1:2:end) == mdouble2(2:2:end)) %compare pairs of non-integers
ans =
0.0037652
As you can see, whether a double array contains integers or not doesn't matter. However, comparison for integer classes is faster (fewer bytes to compare).
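For readers unsure which elements the timed expressions pair up: `M(1:2:end)` and `M(2:2:end)` use linear (column-major) indexing, so each pair consists of two vertically adjacent elements within a column, provided the number of rows is even. A small sketch with a made-up 4x2 matrix:

```matlab
M = [1 4; 1 4; 2 5; 3 7];   % column-major linear order: 1 1 2 3 4 4 5 7

first  = M(1:2:end);         % linear elements 1,3,5,7 -> [1 2 4 5]
second = M(2:2:end);         % linear elements 2,4,6,8 -> [1 3 4 7]

same = first == second;      % [1 0 1 0]
disp(same)
```

With an odd number of rows, some pairs would straddle two columns, so this pairing would need adjusting.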
Jan
4 Sep 2019
@Guillaume: Your tests time not only the comparison but also the creation of the vectors. mdouble(1:2:end) needs more time than mint(1:2:end), because it has to allocate and write more bytes.
mint = randi([0 255], 2e5, 5, 'uint8');
mdouble = double(mint);
mdouble2 = mdouble + rand(2e5, 5);
timeit(@() mint == mint)
>> 0.000305
timeit(@() mdouble == mdouble)
>> 0.00031
timeit(@() mdouble2 == mdouble2)
>> 0.00031
In principle the UINT8 comparison is cheaper, because for double the comparison NaN==NaN must be treated as a special case. It looks like this is already handled in the CPU, so both take the same time.
I'd expect a difference in the timings due to memory bandwidth if the data do not fit into the processor cache. I've tested this in MATLAB Online only, so please repeat the test on a real machine.
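One way to separate the two costs while still comparing pairs is to extract the operands before timing. This is a sketch only; the variable names are chosen here, and the actual timings depend on the machine and MATLAB release, so none are quoted:

```matlab
mint    = randi([0 255], 2e5, 5, 'uint8');
mdouble = double(mint);

% Extract the operands once, outside the timed function.
ai = mint(1:2:end);      bi = mint(2:2:end);
ad = mdouble(1:2:end);   bd = mdouble(2:2:end);

% Indexing + comparison (creates two temporary vectors on every call).
t_index_and_compare = timeit(@() mdouble(1:2:end) == mdouble(2:2:end));

% Comparison only, on operands that already exist.
t_compare_double = timeit(@() ad == bd);
t_compare_uint8  = timeit(@() ai == bi);

fprintf('index+compare (double): %g s\n', t_index_and_compare);
fprintf('compare only  (double): %g s\n', t_compare_double);
fprintf('compare only  (uint8):  %g s\n', t_compare_uint8);
```

The gap between the first and second figures indicates how much of the measured time is spent building the indexed vectors.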
Guillaume
4 Sep 2019
@Jan, indeed. However, there doesn't appear to be much difference in timing between allocating uint8 and double:
>> timeit(@() randi([0 255], 2e5, 5, 'uint8'))
ans =
0.012459
>> timeit(@() randi([0 255], 2e5, 5, 'double'))
ans =
0.01323

1 Answer

Answer by Nikhil Sonavane on 4 Sep 2019

Floating-point numbers are represented in memory very differently from integers. Hence, the algorithm used to compare floating-point numbers also differs from the one used for integers. I would suggest you go through the Floating-Point Representation documentation to understand this better. Also, floating-point numbers generally take more memory than integers (a double is 8 bytes, a uint8 is 1 byte). For more information, please refer to the documentation on integers and floating-point numbers.

  2 Comments

Jan
4 Sep 2019
For the == operator, the floating-point representation matters only for NaNs, because NaN==NaN must return false even if the bit representations are equal. For everything but NaN, comparing a double is equivalent to comparing a vector of 8 UINT8.
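The NaN behaviour can be checked by comparing the value and its raw bytes side by side. The sketch below uses `typecast` to view a double as 8 uint8 values. It also includes signed zero, which the comment does not mention, as a second case where a byte-wise comparison and `==` disagree (there the bits differ but the values compare equal):

```matlab
a = NaN;
b = a;                                    % same bit pattern as a

nan_bits_equal  = isequal(typecast(a, 'uint8'), typecast(b, 'uint8'));  % true
nan_value_equal = (a == b);                                             % false
nan_aware_equal = isequaln(a, b);                                       % true, treats NaN as equal

% Signed zero: different bit patterns, but equal under ==.
zero_bits_equal  = isequal(typecast(0, 'uint8'), typecast(-0, 'uint8')); % false
zero_value_equal = (0 == -0);                                            % true

disp([nan_bits_equal nan_value_equal nan_aware_equal zero_bits_equal zero_value_equal])
```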
Guillaume
4 Sep 2019
And of course, if the original vector is of a 64-bit integer type, then there is the same number of bytes to compare. I would still expect the double comparison to be marginally slower because of the need to test for NaN. Plus, if I recall correctly, modern processors have different pipelines for floating-point and integer operations.
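This expectation can be tested by timing an int64 matrix against a double matrix of the same size, so that memory footprint is no longer a factor. A minimal sketch, with variable names chosen here and no timings quoted since they vary by machine:

```matlab
vals = randi([0 255], 2e5, 5);   % class double by default
i64  = int64(vals);              % same values, 64-bit integer class

info_d = whos('vals');
info_i = whos('i64');
fprintf('double: %d bytes, int64: %d bytes\n', info_d.bytes, info_i.bytes);

t_double = timeit(@() vals == vals);
t_int64  = timeit(@() i64 == i64);
fprintf('double == : %g s\nint64  == : %g s\n', t_double, t_int64);
```

Both matrices occupy 8,000,000 bytes, so any remaining timing difference would come from the comparison itself and not from the amount of data read.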
