Weighted correlation and covariance (weightedcorrs)
Weighted correlation and covariance (weightedcorrs)
Python, Jupyter notebook, and MATLAB function to calculate weighted correlation coefficients, covariance, and standard deviations
Installation for MATLAB
Download the weightedcorrs.m file from this github repository (https://github.com/gjpelletier/weightedcorrs) or MATLAB File Exchange and copy it to your working directory or session search path folder.
Installation for Python, Jupyter Notebook, and Google Colab
First install weightedcorrs as follows with pip or !pip in your notebook or terminal:
pip install git+https://github.com/gjpelletier/weightedcorrs.git
Next import the weightedcorrs function as follows in your notebook or python code:
from weightedcorrs import weightedcorrs
As an alternative, you can also download weightedcorrs.py from this github repository (https://github.com/gjpelletier/weightedcorrs) and add it to your own project.
Syntax for MATLAB
SYNTAX:
- [R,p,wcov,wstd,wmean] = weightedcorrs(X,w)
List of outputs:
-
'R' is the output of the weighted Pearson correlation coefficients calculated from an input nobs-by-nvar matrix X whose rows are observations and whose columns are variables and an input nobs-by-1 vector w of weights for the observations. This function may be a valid alternative to MATLAB's corrcoef if observations are not all equally relevant and need to be weighted according to some theoretical hypothesis or knowledge.
-
'p' is the output of the p-values of the Pearson correlation coefficients
-
'wcov' is the output of the weighted covariance matrix
-
'wstd' is the output of the weighted standard deviations
-
'wmean' is the output of the weighted means
Input of w is optional. If w=0, 1, or is omitted, then the function assigns w = np.ones(nobs). If w=0 or omitted, then the covariance and standard deviations are unweighted and normalizd to nobs-1, and the means are unweighted. If w=1, then the covariance and standard deviations are unweighted and normlizd to nobs, and the means are unweighted. Otherwise, if w is an input vector of weights, then results are normalized to nobs for output of standard deviations and covariance, and the means are weighted by w. If w=0 or omitted, or if the input vector of w = ones(nobs,1), then there is no difference between weightedcorrs(X, w) and corrcoef(X).
Syntax for Google Colab, Jupyter Notebooks, and Python
SYNTAX:
- results = weightedcorrs(X,w)
weightedcorrs returns a results dictionary that contains the following outputs: R, p, wcov, wstd, and wmean.
-
results['R'] is the output of the weighted Pearson correlation coefficients calculated from an input nobs-by-nvar matrix X whose rows are observations (nobs) and whose columns are variables (nvar), and an input nobs-by-1 vector w of weights for the observations. This function may be a valid alternative to np.corrcoef if observations are not all equally relevant and need to be weighted according to some theoretical hypothesis or knowledge.
-
results['p'] is the output of the p-values of the Pearson correlation coefficients
-
results['wcov'] is the output of the weighted covariance matrix
-
results['wstd'] is the output of the weighted standard deviations
-
results['wmean'] is the output of the weighted means
Input of w is optional. If w=0, 1, or is omitted, then the function assigns w = np.ones(nobs). If w=0 or omitted, then the covariance and standard deviations are unweighted and normalizd to nobs-1, and the means are unweighted. If w=1, then the covariance and standard deviations are unweighted and normlizd to nobs, and the means are unweighted. Otherwise, if w is an input vector of weights, then results are normalized to nobs for output of standard deviations and covariance, and the means are weighted by w. If w=0 or omitted, or if the input vector of w = np.ones(nobs), then there is no difference between weightedcorrs(X, w) and np.corrcoef(X,rowvar=False).
Example for MATLAB
load hospital
X = [hospital.Weight hospital.BloodPressure];
w = hospital.Age;
[wrho,p,wcov,wstd,wmean] = weightedcorrs(X,w)
wrho =
1.0000 0.1554 0.2307
0.1554 1.0000 0.5104
0.2307 0.5104 1.0000
p =
0 0.1226 0.0209
0.1226 0 0.0000
0.0209 0.0000 0
wcov =
683.6229 27.4898 41.7875
27.4898 45.7666 23.9182
41.7875 23.9182 47.9886
wstd =
26.1462 6.7651 6.9274
wmean =
154.4530 122.9480 83.0643
Example for Google Colab, Jupyter Notebook, and Python
The first step is to install weightedcorrs from github as follows:
!pip install git+https://github.com/gjpelletier/weightedcorrs.git
Next we need to import the weightedcorrs function and numpy as follows:
from weightedcorrs import weightedcorrs
import numpy as np
Now we are ready to show an example of doing an analysis of some data using weightedcorrs. We will use a data set from the MATLAB example data packages called 'hospital'. The X array will be a three column array, where the first column is the patient's weight, the second column is the systolic blood pressure, and the third column is the diastolic blood pressure.
Note that weightedcorrs requires the first index of X to be rows of observations, and the second index to be columns of random variables.
# First we assign the X array with the first index as random variables
# and second index as observations (therefore we will need to transpose X after this):
X = [np.array([176., 163., 131., 133., 119., 142., 142., 180., 183., 132., 128.,
137., 174., 202., 129., 181., 191., 131., 179., 172., 133., 117.,
137., 146., 123., 189., 143., 114., 166., 186., 126., 137., 138.,
187., 193., 137., 192., 118., 180., 128., 164., 183., 169., 194.,
172., 135., 182., 121., 158., 179., 170., 136., 135., 147., 186.,
124., 134., 170., 180., 130., 130., 127., 141., 111., 134., 189.,
137., 136., 130., 137., 186., 127., 176., 127., 115., 178., 131.,
183., 194., 126., 186., 188., 189., 120., 132., 182., 120., 123.,
141., 129., 184., 181., 124., 174., 134., 171., 188., 186., 172.,
177.]),
np.array([124., 109., 125., 117., 122., 121., 130., 115., 115., 118., 114.,
115., 127., 130., 114., 130., 124., 123., 119., 125., 121., 123.,
114., 128., 129., 114., 113., 125., 120., 127., 134., 121., 115.,
127., 121., 127., 136., 117., 124., 120., 128., 116., 132., 137.,
117., 116., 119., 123., 116., 124., 129., 130., 132., 117., 129.,
118., 120., 138., 117., 113., 122., 115., 120., 117., 123., 123.,
119., 110., 121., 138., 125., 122., 120., 117., 125., 124., 121.,
118., 120., 118., 118., 122., 134., 131., 113., 125., 135., 128.,
123., 122., 138., 124., 130., 123., 129., 128., 124., 119., 136.,
114.]),
np.array([93., 77., 83., 75., 80., 70., 88., 82., 78., 86., 77., 68., 74.,
95., 79., 92., 95., 79., 77., 76., 75., 79., 88., 90., 96., 77.,
80., 76., 83., 89., 92., 83., 80., 84., 92., 83., 90., 85., 90.,
74., 92., 80., 89., 96., 89., 77., 81., 76., 83., 78., 95., 91.,
91., 86., 89., 79., 74., 82., 76., 81., 77., 73., 85., 76., 80.,
80., 79., 82., 79., 82., 75., 91., 74., 78., 85., 84., 75., 78.,
81., 79., 85., 79., 82., 80., 80., 92., 92., 96., 87., 81., 90.,
77., 91., 79., 73., 99., 92., 74., 93., 86.])]
# Next we transpose X so that the first index is rows of observations (nobs),
# and the second index is columns of random variables (nvar)
X = np.transpose(X)
Next we will define the weighting factors to use for the analysis. In this example we will use the patients Age from the hospital data set as the weighting factors w.
w = np.array([38., 43., 38., 40., 49., 46., 33., 40., 28., 31., 45., 42., 25.,
39., 36., 48., 32., 27., 37., 50., 48., 39., 41., 44., 28., 25.,
39., 25., 36., 30., 45., 40., 25., 47., 44., 48., 44., 35., 33.,
38., 39., 44., 44., 37., 45., 37., 30., 39., 42., 42., 49., 44.,
43., 47., 50., 38., 41., 45., 36., 38., 29., 28., 30., 28., 29.,
36., 45., 32., 31., 48., 25., 40., 39., 41., 33., 31., 35., 32.,
42., 48., 34., 39., 28., 29., 32., 39., 37., 49., 31., 37., 38.,
45., 30., 48., 48., 25., 44., 49., 45., 48.])
Now we are ready to show the example of using weightedcorrs to calculate the weighted correlation coefficients, weighted covariance, and weighted standard deviations as follows:
output = weightedcorrs(X,w)
R = output['R']
p = output['p']
wcov = output['wcov']
wstd = output['wstd']
wmean = output['wmean']
print('weighted correlation coefficient matrix of X: ','\n',R,'\n')
print('p-values of the correlation coefficients: ','\n',p,'\n')
print('weighted covariance matrix: ','\n',wcov,'\n')
print('weighted standard deviations: ','\n',wstd,'\n')
print('weighted means: ','\n',wmean)
weighted correlation coefficient matrix of X:
[[1. 0.1554138 0.23071152]
[0.1554138 1. 0.51036961]
[0.23071152 0.51036961 1. ]]
p-values of the correlation coefficients:
[[0.00000000e+00 1.22589252e-01 2.09237757e-02]
[1.22589252e-01 0.00000000e+00 5.81286900e-08]
[2.09237757e-02 5.81286900e-08 0.00000000e+00]]
weighted covariance matrix:
[[683.62291955 27.48984917 41.78750454]
[ 27.48984917 45.76662876 23.91817879]
[ 41.78750454 23.91817879 47.9885557 ]]
weighted standard deviations:
[26.14618365 6.76510375 6.92737726]
weighted means:
[154.45297806 122.94801463 83.06426332]
Reference:
The mathematical formulas in matrix notation, together with MATLAB code, is also available in F. Pozzi, T. Di Matteo, T. Aste, "Exponential smoothing weighted correlations", The European Physical Journal B, Volume 85, Issue 6, 2012. DOI:10.1140/epjb/e2012-20697-x.
Adapted from weightedcorrs.m (Pozzi et al., 2012). Modified by Greg Pelletier 24-Jan-2024 to output p-values of correlation coefficients, weighted covariance matrix, and weighted standard deviations, and allow optional input of weighting factors for use with unweighted analysis or normalization to nobs-1
引用
Gregory Pelletier (2024). Weighted correlation and covariance (weightedcorrs) (https://github.com/gjpelletier/weightedcorrs), GitHub. に取得済み.
MATLAB リリースの互換性
プラットフォームの互換性
Windows macOS Linuxタグ
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!GitHub の既定のブランチを使用するバージョンはダウンロードできません
バージョン | 公開済み | リリース ノート | |
---|---|---|---|
1.0.7 | updated version number |
|
|
1.0.0 |
|