The proof follows from the relationship with entropy, as shown below.
Notice the analogy to the union, difference, and intersection of two sets: in this respect, all the formulas given above are apparent from the Venn diagram shown at the beginning of the article.
The other identities above are proved in the same way. The general case (not just discrete) follows the same argument, with integrals replacing sums.
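For discrete variables, the entropy relationship can also be checked numerically. The following is a minimal sketch, assuming the standard definition of mutual information as a sum over the joint distribution and the identity I(X;Y) = H(X) + H(Y) − H(X,Y); the function names and the joint table are illustrative choices, not taken from the article.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) in bits from the definition: sum of p(x,y) * log2(p(x,y) / (p(x) p(y)))."""
    px = [sum(row) for row in joint]        # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]  # marginal of Y (column sums)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Hypothetical joint distribution p(x, y): rows index x, columns index y.
joint = [[0.30, 0.10],
         [0.15, 0.45]]

h_x = entropy([sum(row) for row in joint])
h_y = entropy([sum(col) for col in zip(*joint)])
h_xy = entropy([p for row in joint for p in row])
mi = mutual_information(joint)

# The two numbers agree up to floating-point rounding.
print(mi, h_x + h_y - h_xy)
```

The same check with a product distribution, such as a uniform 2×2 table, gives zero mutual information, as expected for independent variables.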
Similarly, this identity can be established for jointly continuous random variables.
Several variations on mutual information have been proposed to suit various needs. Among these are normalized variants and generalizations to more than two variables.
Many applications require a metric, that is, a distance measure between pairs of points. The quantity
Sometimes it is useful to express the mutual information of two random variables conditioned on a third.
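As a small numerical illustration, the sketch below computes conditional mutual information for discrete variables, assuming the standard entropy form I(X;Y|Z) = H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z). The helper names and both example distributions are hypothetical, chosen so that conditioning raises the mutual information in one case and lowers it in the other.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, axes):
    """Marginalize a dict {(x, y, z): p} onto the given tuple of axis indices."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return out

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), ignoring Z."""
    return (entropy(marginal(joint, (0,))) + entropy(marginal(joint, (1,)))
            - entropy(marginal(joint, (0, 1))))

def conditional_mutual_information(joint):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(marginal(joint, (0, 2))) + entropy(marginal(joint, (1, 2)))
            - entropy(joint) - entropy(marginal(joint, (2,))))

# Case 1: X and Y are independent fair bits and Z = X XOR Y.
xor_joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

# Case 2: X, Y, and Z are all copies of one fair bit.
copy_joint = {(b, b, b): 0.5 for b in (0, 1)}

for name, joint in (("xor", xor_joint), ("copy", copy_joint)):
    print(name, mutual_information(joint), conditional_mutual_information(joint))
```

In the XOR case the unconditional mutual information is 0 bits and the conditional value is 1 bit; in the copy case the values are 1 bit and 0 bits respectively.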
Conditioning on a third random variable may either increase or decrease the mutual information, but it is always true that
Some authors reverse the order of the terms on the right-hand side of the preceding equation, which changes the sign when the number of random variables is odd. (In this case, the single-variable expression becomes the negative of the entropy.) Note that
which attains a minimum of zero when the variables are independent and a maximum value of
Using the ideas of Kolmogorov complexity, one can consider the mutual information of two sequences independent of any probability distribution:
The equation above can be derived as follows for a bivariate Gaussian:
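One standard route is sketched here, under the assumption that the equation in question is the closed form for the mutual information of a bivariate Gaussian in terms of its correlation coefficient. The symbols are assumptions introduced for this sketch: ρ is the correlation coefficient, σ₁ and σ₂ are the marginal standard deviations, and Σ is the covariance matrix. The derivation combines the differential entropies of the marginals with that of the joint distribution.

```latex
\begin{align}
H(X_i) &= \tfrac{1}{2}\log\!\left(2\pi e\,\sigma_i^2\right), \qquad i = 1, 2, \\
H(X_1, X_2) &= \tfrac{1}{2}\log\!\left((2\pi e)^2 \det\Sigma\right)
             = \tfrac{1}{2}\log\!\left((2\pi e)^2\,\sigma_1^2\sigma_2^2\,(1-\rho^2)\right), \\
I(X_1; X_2) &= H(X_1) + H(X_2) - H(X_1, X_2)
             = -\tfrac{1}{2}\log\!\left(1-\rho^2\right).
\end{align}
```

The variances cancel in the last line, so under these assumptions the result depends only on ρ: it is zero when ρ = 0 and grows without bound as |ρ| approaches 1.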
In many applications, one wants to maximize mutual information (thus increasing dependencies), which is often equivalent to minimizing conditional entropy. Examples include: