The "relativeEntropy" function implements the equation for the one-dimensional case given just below equation (5.14) on page 176 (Section 5.5.1) of the book: Theodoridis, Sergios, and Konstantinos Koutroumbas. Pattern Recognition, 2nd ed. Amsterdam; Boston: Academic Press, 2003. In the notation used in the code below, that equation reads:

d_12 = 0.5 * (var2/var1 + var1/var2 - 2) + 0.5 * (mean1 - mean2)^2 * (1/var1 + 1/var2)
The code snippet at the end of this answer demonstrates how the "relativeEntropy" function works internally by implementing this equation. A few things to note about the equation:
- As the natural logarithm was used in its derivation, the output has units of nats.
- In the equation above, d_ij = d_ji. Hence, the calculation is symmetric.
- The entropy calculation assumes that the data in the input "X" follows a Gaussian distribution, as mentioned in the documentation:
Z = relativeEntropy(X,I) calculates the one-dimensional Kullback-Leibler divergence of two independent subsets of data set X that are grouped according to the logical labels in I. The relative entropy provides a metric for ranking features according to their ability to separate two classes of data, such as healthy and faulty machines. The entropy calculation assumes that the data in X follows a Gaussian distribution.
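To see the symmetry noted above in practice, the following sketch (not part of the original demonstration) flips the logical labels and checks that the output is unchanged up to a small floating-point tolerance:

```matlab
% Because d_ij = d_ji, swapping which class is labeled "true"
% should not change the divergence.
rng default
A = randn(50,1);           % samples from the first class
B = 2*randn(50,1) + 1;     % samples from the second class
X = [A; B];
I = logical([ones(50,1); zeros(50,1)]);
assert(abs(relativeEntropy(X,I) - relativeEntropy(X,~I)) < 1e-12)
```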
In the code snippet below, we do the following:
- Given two pairs of means and variances, we sample 1000 measurements each from two different Gaussian probability density functions (PDFs). Let the two sets of samples from these two PDFs be X1 and X2 respectively.
- Compute the KL-divergence between X1 and X2 using the "relativeEntropy" function and store the result in "Z".
- Compute the KL-divergence between X1 and X2 by substituting the ground-truth means and variances into the equation above and store the result in "Z_hat1".
- Compute the KL-divergence between X1 and X2 by substituting the sample estimates of the means and variances (computed with "mean" and "var") into the equation above and store the result in "Z_hat2".
- Compare "Z" to "Z_hat2" to show that they are equal up to machine precision.
% Sample n measurements from each of two Gaussian distributions
n = 1000;
var1 = 4;      % variance of the first distribution
var2 = 25;     % variance of the second distribution
mean1 = 3;     % mean of the first distribution
mean2 = 7;     % mean of the second distribution
X1 = sqrt(var1) * randn(n,1) + mean1;
X2 = sqrt(var2) * randn(n,1) + mean2;

% Stack the samples and mark the first n rows as one class
X = [X1;X2];
I = logical([ones(1,n),zeros(1,n)]);

% KL-divergence as computed by relativeEntropy
Z = relativeEntropy(X,I)

% KL-divergence from the ground-truth means and variances
Z_hat1 = 0.5 * ((var2 / var1) + (var1 / var2) - 2) + ...
    0.5 * (mean1 - mean2)^2 * ((1/var1) + (1/var2))

% KL-divergence from the sample means and variances
var1_hat = var(X1);
var2_hat = var(X2);
mean1_hat = mean(X1);
mean2_hat = mean(X2);
Z_hat2 = 0.5 * ((var2_hat / var1_hat) + (var1_hat / var2_hat) - 2) + ...
    0.5 * (mean1_hat - mean2_hat)^2 * ((1/var1_hat) + (1/var2_hat))

% Z and Z_hat2 agree up to machine precision
assert(abs(Z - Z_hat2) < eps)
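Finally, since the documentation excerpt above describes the relative entropy as a metric for ranking features, here is a short sketch of that use case. This is an illustration, not part of the original demonstration; it assumes X may contain one column per feature, with "relativeEntropy" returning one divergence per column (check the documentation for your release):

```matlab
n = 1000;
F1 = [randn(n,1) - 3; randn(n,1) + 3];   % feature 1: well-separated class means
F2 = [randn(n,1); randn(n,1) + 0.1];     % feature 2: nearly identical distributions
X  = [F1, F2];
I  = logical([ones(n,1); zeros(n,1)]);
Z  = relativeEntropy(X,I);               % one divergence value per column
[~, ranking] = sort(Z, 'descend')        % feature 1 should rank first
```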