partialcorr outputs differ greatly for single- and double-precision formats

marco ciapparelli on 12 Jul 2023
Commented: marco ciapparelli on 12 Jul 2023
Hi,
I'm using partialcorr to compute the partial rank correlation coefficient between y and x controlling for z.
When the data are in single-precision format, the correlation coefficients are very different from those I obtain from the same tests with double-precision format. This seems to happen only for the partialcorr function with type 'Spearman'. Using Pearson does not produce this problem, nor does using the partialcorri function with Spearman. Rounding the decimal places has almost no effect, suggesting that the issue is not related to numeric precision. Here is a reproducible example:
% generate data
rng(1);
y = randn(10000,1);
x = y.*randn(10000,1);
z = x + randn(10000,1);
partialcorr(single(y),single(x),single(z),'Type','Spearman') % rho output = 0.2227
partialcorr(y,x,z,'Type','Spearman') % rho output = 0.0155
partialcorr(single(y),single(x),single(z),'Type','Pearson') % rho output = 0.0255
partialcorr(y,x,z,'Type','Pearson') % rho output = 0.0255
What could be the issue that leads to obtaining very different coefficients when using Spearman?
Thanks for your help!

Answers (1)

Diya Tulshan on 12 Jul 2023
Edited: Diya Tulshan on 12 Jul 2023
Hi Marco Ciapparelli,
I understand you are asking why 'partialcorr' gives different results for single- and double-precision inputs with the 'Spearman' type but not with 'Pearson'.
The difference in output that you observe when using 'partialcorr' with the 'Spearman' type between single-precision and double-precision formats could be due to the algorithm used for computing the partial rank correlation coefficient.
The 'Spearman' type in 'partialcorr' computes the partial rank correlation coefficient using Spearman's rank correlation formula. The algorithm involves sorting the data, assigning ranks, and then calculating the correlation based on the ranks. So, when you use single-precision data, the sorting and ranking process can differ because of the limited precision of single-precision numbers.
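As a rough illustration of the steps described above, the following sketch computes a partial rank correlation by hand: rank each variable with tiedrank, remove the linear effect of the ranked z from the ranked y and x, and correlate the residuals. It follows the textbook definition and is not necessarily how partialcorr is implemented internally. The variable names (ry, rx, rz, Z, resY, resX, rhoManual, rhoBuiltin) are chosen for illustration only.

```matlab
% generate data (same as in the question)
rng(1);
y = randn(10000,1);
x = y.*randn(10000,1);
z = x + randn(10000,1);

% Step 1: convert each variable to ranks (ties receive average ranks)
ry = tiedrank(y);
rx = tiedrank(x);
rz = tiedrank(z);

% Step 2: regress the ranks of y and x on the ranks of z (with intercept)
% and keep the residuals
Z = [ones(size(rz)) rz];
resY = ry - Z*(Z\ry);
resX = rx - Z*(Z\rx);

% Step 3: the Pearson correlation of the residuals is the partial rank correlation
rhoManual = corr(resY, resX);

% For double-precision inputs this should agree closely with partialcorr
rhoBuiltin = partialcorr(y, x, z, 'Type', 'Spearman');
fprintf('manual: %.4f, partialcorr: %.4f\n', rhoManual, rhoBuiltin);
```

Repeating steps 2 and 3 with the ranks cast to single may help show at which step the two precisions start to diverge.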
The 'Pearson' type in 'partialcorr' calculates the partial correlation coefficient using Pearson's correlation formula. Pearson's correlation is based on the covariances and standard deviations of the data, which are much less sensitive to the precision of the numbers. Thus, you observe consistent results.
To obtain accurate results for partial rank correlation using the 'Spearman' type, it is recommended to use double-precision data instead of single-precision. The 'Spearman' type relies on the ranks of the data, and the limited precision of single-precision numbers can introduce inconsistencies in the ranking process.
For example, you can convert your data to double precision, as shown below:
% generate data
rng(1);
y = randn(10000,1);
x = y.*randn(10000,1);
z = x + randn(10000,1);
partialcorr(double(y),double(x),double(z),'Type','Spearman') % rho output = 0.0155
ans = 0.0155
partialcorr(y,x,z,'Type','Spearman') % rho output = 0.0155
ans = 0.0155
partialcorr(single(y),single(x),single(z),'Type','Pearson') % rho output = 0.0255
ans = single 0.0255
partialcorr(y,x,z,'Type','Pearson') % rho output = 0.0255
ans = 0.0255
Alternatively, if you want to keep single-precision data, partialcorri with 'Spearman' would be a better choice:
% generate data
rng(1);
y = randn(10000,1);
x = y.*randn(10000,1);
z = x + randn(10000,1);
partialcorri(single(y),single(x),single(z),'Type','Spearman') % rho output = 0.0155
ans = 0.0155
partialcorr(y,x,z,'Type','Spearman') % rho output = 0.0155
ans = 0.0155
partialcorr(single(y),single(x),single(z),'Type','Pearson') % rho output = 0.0255
ans = single 0.0255
partialcorr(y,x,z,'Type','Pearson') % rho output = 0.0255
ans = 0.0255
Also, kindly refer to the links mentioned below for a better understanding:
Hope this helps!
1 Comment
marco ciapparelli on 12 Jul 2023
Hi Diya,
Thanks for your answer. I see that numerical precision may affect the ranking step and thus the Spearman partial correlation coefficients. However, I'm still not convinced this is the issue I'm facing. The difference in precision between single- and double-precision formats should be too small to have an appreciable effect on the coefficients, or at least not an effect of this magnitude. Consistent with this, if I round the numbers to a few decimal places, the impact on the coefficients is negligible. I also see that the ranks obtained with single-precision and double-precision formats are equal (except for y, where only 2 of 10000 values are flipped in the ranking):
[~,ranksx] = sort(single(x),'ascend');
[~,rankx] = sort(x,'ascend');
isequal(rankx,ranksx) % = 1
Perhaps I'm missing something!
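One way to test this more directly, offered as a sketch and not as a diagnosis, is to compare the ranks themselves and then run the partial correlation on the single-derived ranks with the arithmetic carried out in double precision. Note that the second output of sort is the sorting permutation, while tiedrank returns the rank of each element. The names rys, rxs, rzs, nDiff and rhoRanksDouble are illustrative.

```matlab
% generate data (same as in the question)
rng(1);
y = randn(10000,1);
x = y.*randn(10000,1);
z = x + randn(10000,1);

% Ranks computed from the single-precision inputs
rys = tiedrank(single(y));
rxs = tiedrank(single(x));
rzs = tiedrank(single(z));

% Number of ranks that differ from the double-precision ranks, for y, x and z
nDiff = [nnz(rys ~= tiedrank(y)), nnz(rxs ~= tiedrank(x)), nnz(rzs ~= tiedrank(z))];
disp(nDiff)

% Pearson partial correlation of the single-derived ranks, computed in double
rhoRanksDouble = partialcorr(double(rys), double(rxs), double(rzs), 'Type', 'Pearson');
disp(rhoRanksDouble)
```

If rhoRanksDouble comes out close to the double-precision Spearman result, the ranking step is unlikely to be the cause, and the discrepancy would more plausibly arise in the single-precision arithmetic performed on the ranks afterwards.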

Release

R2020b