If the definition of covariance is (x-mean(x))'*(x-mean(x)), why does cov(x) not return the same result? Thank you.

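The 'cov' function normalizes by dividing by N-1, where N is the number of observations, which in this case is the number of rows in your matrix x. Your expression (x-mean(x))'*(x-mean(x)) is only the sum of the products of the deviations from the column means; cov(x) returns that sum divided by N-1.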
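For readers who want to see the relationship outside MATLAB, here is a minimal pure-Python sketch (not MATLAB's implementation). It assumes x is a list of rows, one row per observation and one column per variable. The helper names `column_means`, `centered_cross_product` and `sample_cov` are hypothetical: the first reproduces the expression from the question, the second divides it by N-1 as cov does by default.

```python
def column_means(x):
    """Mean of each column of x, where x is a list of equal-length rows."""
    n = len(x)
    return [sum(row[j] for row in x) / n for j in range(len(x[0]))]

def centered_cross_product(x):
    """Analogue of (x-mean(x))'*(x-mean(x)): sums of products of deviations."""
    mu = column_means(x)
    d = [[v - m for v, m in zip(row, mu)] for row in x]
    p = len(mu)
    return [[sum(r[i] * r[j] for r in d) for j in range(p)] for i in range(p)]

def sample_cov(x):
    """Default normalization: divide the cross-product matrix by N-1 (N = rows)."""
    n = len(x)
    return [[v / (n - 1) for v in row] for row in centered_cross_product(x)]

# Three observations (rows) of two variables (columns).
x = [[1, 2],
     [2, 4],
     [3, 3]]
print(centered_cross_product(x))  # [[2.0, 1.0], [1.0, 2.0]]
print(sample_cov(x))              # [[1.0, 0.5], [0.5, 1.0]]
```

With N = 3 the two results differ exactly by the factor N-1 = 2. For the same data, cov([1 2; 2 4; 3 3]) in MATLAB should likewise give [1 0.5; 0.5 1].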

cov computation in Matlab

Bi Bu on 27 Oct 2017

Thanks. So by default MATLAB computes the sample covariance (dividing by n-1). Is this correct?

Roger Stafford on 27 Oct 2017

Yes. By default it divides by the number of samples minus one, except that with a single sample (heaven forbid!) it divides by 1.

Bi Bu on 27 Oct 2017

Thank you!

Steven Lord on 27 Oct 2017

And if you want it to normalize by N instead of N-1 even when N > 1, specify the input argument named w in the documentation as 1, as in cov(x,1), instead of omitting it or specifying it as 0.
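The normalization rules described in the replies above can be summarized in a small Python sketch. The function `cov_normalizer` is hypothetical; it only mirrors the described behavior of the scalar w argument (w = 0 divides by N-1, or by 1 when N = 1; w = 1 divides by N) and does not cover passing a vector of weights as w.

```python
def cov_normalizer(n, w=0):
    """Divisor applied to the sum of products of deviations (n = number of observations)."""
    if n < 1:
        raise ValueError("need at least one observation")
    if w == 0:
        # Default: sample covariance divides by N-1, but a single observation divides by 1.
        return n - 1 if n > 1 else 1
    if w == 1:
        # Normalize by the number of observations N.
        return n
    raise ValueError("this sketch only handles scalar w = 0 or w = 1")

for n in (1, 2, 5):
    print(n, cov_normalizer(n), cov_normalizer(n, w=1))
```

With a single observation every deviation from the mean is zero, so the choice of divisor only avoids a division by zero; the covariance comes out as 0 either way.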

Bi Bu on 27 Oct 2017

Thanks, this is helpful.

Bi Bu on 28 Oct 2017

Dear Steven, one more question came up: does the "mean" function in MATLAB have an option to divide by n or by n-1? In the case of "cov", the expected value (mean) of the products is taken by dividing by n-1 rather than n. So if I wanted to rewrite this formula using "mean", I would not have the option to use n-1. Thanks.

Roger Stafford on 28 Oct 2017


@Bi Bu: It would make no sense to divide by n-1 when taking the mean. To get an unbiased estimate from the sum of n samples, one needs to divide by exactly n. That is, assuming the samples each have the same expected value, the sum of n of them has an expected value of n times the expected value of any one of them, so such a sum should be divided by n.
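Written out, the argument in the previous paragraph is a single application of linearity of expectation, assuming each sample x_i has expected value μ:

```latex
E\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n}\, n\mu = \mu,
\qquad
E\left[\frac{1}{n-1}\sum_{i=1}^{n} x_i\right] = \frac{n}{n-1}\,\mu \neq \mu \quad (\text{for } \mu \neq 0).
```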

However, the definition of the covariance between two variables involves the mean of each of them. If one uses the samples to estimate these means while also estimating the covariance, it can be shown with fairly simple mathematics that the sum of products must be divided by n-1 rather than n to yield an unbiased estimate of the theoretical covariance. This is due to the expected deviation of the sample means from the true means. If you are interested in the mathematics involved, there are many such demonstrations on the internet. One is located at:
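The bias can also be checked exactly, without simulation noise. The sketch below is an illustrative example that is not part of the original discussion: it assumes a made-up joint distribution in which (X, Y) takes the values (0, 0), (1, 2) and (2, 1) with probability 1/3 each, so the true covariance is 1/3. It enumerates every possible sample of n independent draws, computes the exact expected value of the sum of products of deviations from the sample means, and compares dividing by n-1 with dividing by n. Python's fractions module keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of (X, Y): three equally likely points.
points = [(0, 0), (1, 2), (2, 1)]
prob = Fraction(1, len(points))

def true_cov(pts):
    """Population covariance of the equally weighted points."""
    mx = sum(Fraction(x) for x, _ in pts) * prob
    my = sum(Fraction(y) for _, y in pts) * prob
    return sum((x - mx) * (y - my) for x, y in pts) * prob

def sum_of_products(sample):
    """Sum of (x_i - xbar)(y_i - ybar), using the sample's own means."""
    n = len(sample)
    xbar = Fraction(sum(x for x, _ in sample), n)
    ybar = Fraction(sum(y for _, y in sample), n)
    return sum((x - xbar) * (y - ybar) for x, y in sample)

def expected_estimates(n):
    """Exact (E[S/(n-1)], E[S/n]) over all n-tuples of independent draws; needs n >= 2."""
    e_s = Fraction(0)
    for sample in product(points, repeat=n):
        e_s += prob ** n * sum_of_products(sample)
    return e_s / (n - 1), e_s / n

sigma = true_cov(points)
unbiased, biased = expected_estimates(3)
print(sigma, unbiased, biased)  # 1/3 1/3 2/9
```

Dividing by n-1 reproduces the true covariance exactly, while dividing by n falls short by the factor (n-1)/n, which is the reduction caused by estimating the means from the same sample.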

https://www.youtube.com/watch?v=D1hgiAla3KI

Bi Bu on 28 Oct 2017

Thank you so much! Great response. I will definitely watch the video. However, I don't see why the simple mean of a sample would not be just as biased as the means of the samples used to compute the covariance. They are samples in both cases, after all.

Roger Stafford on 28 Oct 2017

Edited: Roger Stafford on 28 Oct 2017

@Bi Bu: No, the two expressions approximating the mean and the covariance are of a different nature. In the case of the mean, the expression is a simple sum, so its expected value is simply the sum of the n separate means, and that clearly indicates the need to divide by n, not n-1 (where n is the number of terms). Dividing by n-1 would give a biased estimate.

On the other hand, the expression approximating the covariance is a sum of products, which depend in part on approximations to the means of the two variables. It is this additional source of variation that somewhat reduces the expected value of the sum and makes it necessary to divide by the smaller n-1 rather than n. The simple mean computation has no such feature.

Remember that an estimate is unbiased, by definition, when the expected value of the approximation is exactly equal to the theoretical mean or covariance, so there is no choice in the matter in either case.

By the way, the website demonstration I mentioned above is actually concerned with the variance of one variable rather than the covariance of two variables. However, its argument is very similar to that needed for covariance, so it should serve to show the need for dividing by n-1 for the covariance computation. I would give the proof here, but I’m afraid it would take up quite a lot of space in this supposedly simple “answer”.
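For reference, here is a compact version of the standard textbook argument for covariance; it is a sketch, not the proof from the linked video. It assumes the pairs (x_i, y_i) are independent draws with means μ_x and μ_y and covariance σ_xy, so that E[x_i y_i] = σ_xy + μ_x μ_y and, by independence, E[x_i y_j] = μ_x μ_y for i ≠ j:

```latex
\begin{aligned}
S &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y},\\
E\Big[\sum_{i=1}^{n} x_i y_i\Big] &= n\,(\sigma_{xy} + \mu_x\mu_y),\\
E[\bar{x}\,\bar{y}] &= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E[x_i y_j]
  = \frac{1}{n^2}\big(n(\sigma_{xy} + \mu_x\mu_y) + n(n-1)\,\mu_x\mu_y\big)
  = \frac{\sigma_{xy}}{n} + \mu_x\mu_y,\\
E[S] &= n(\sigma_{xy} + \mu_x\mu_y) - n\Big(\frac{\sigma_{xy}}{n} + \mu_x\mu_y\Big) = (n-1)\,\sigma_{xy}.
\end{aligned}
```

Hence S/(n-1) has expected value σ_xy, while S/n has expected value (n-1)σ_xy/n. Setting y = x recovers the variance case.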

Bi Bu on 29 Oct 2017

It would be great if you could write out the proof. Is it too long?
