Mutual information and its cousin, the Uncertainty coefficient (Theil’s U) are useful tools from Information Theory for discovering dependencies between variables that are not necessary described by a linear relationship.
Plenty of good material already exists on the subject: see Section 1.6 in “Pattern Recognition and Machine Learning” by Bishop, freely available as PDF online. What I hope to contribute here are some trivial examples, to help build intuition, based on categorical/multinomial variables.
[Read More]