**1. Shannon Entropy**

- Shannon entropy measures the uncertainty of information (how hard the information is to guess). It also expresses the minimum number of yes/no questions needed, on average, to identify the information.

**Note**: log is base 2

- We have 3 strings: "AAAAAAAA", "AAAABBCD", "AABBCCDD"

"AAAAAAAA" has S = -(8/8)*log(8/8) = 0

"AAAABBCD" has S = -(4/8)*log(4/8) - (2/8)*log(2/8) - (1/8)*log(1/8) - (1/8)*log(1/8) = 1.75

"AABBCCDD" has S = -(2/8)*log(2/8) - (2/8)*log(2/8) - (2/8)*log(2/8) - (2/8)*log(2/8) = 2

- The uncertainty of "AABBCCDD" is the largest.
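The three values above can be reproduced with a short script (the function name `shannon_entropy` is mine, not from the post):

```python
from collections import Counter
from math import log2

def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string's character distribution, in bits (log base 2)."""
    n = len(s)
    counts = Counter(s)
    # S = -sum_i p_i * log2(p_i), written as p_i * log2(1/p_i) to keep zero positive
    return sum((c / n) * log2(n / c) for c in counts.values())

print(shannon_entropy("AAAAAAAA"))  # 0.0
print(shannon_entropy("AAAABBCD"))  # 1.75
print(shannon_entropy("AABBCCDD"))  # 2.0
```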


**2. Apply in Deep Learning - Classification**

Classification in deep learning uses a modified version of Shannon Entropy => Cross-Entropy.

**2.1 Binary Cross-Entropy Loss**

The output takes only 2 classes (e.g. 0 or 1).

**yi**: True label

**p(yi)**: Predicted probability that sample i belongs to class 1
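A sketch of this loss under the usual convention (natural log, one predicted probability per sample; the helper name is my own):

```python
from math import log

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy; eps keeps log() away from 0."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)          # clip predicted probability
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(y_true)

# Confident, correct predictions give a small loss; wrong ones a large loss
good = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
bad = binary_cross_entropy([1, 0, 1], [0.1, 0.9, 0.2])
```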

**2.2 Cross-Entropy Loss**

The output can take n (> 2) classes.

**q(yc)**: True label, one-hot encoded

**p(yc)**: Predicted label passed through softmax
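A minimal sketch of these two steps, softmax then cross-entropy (function names are mine; deep learning libraries typically use the natural log rather than base 2):

```python
from math import exp, log

def softmax(logits):
    """Convert raw scores into predicted probabilities p(yc)."""
    m = max(logits)                      # shift for numerical stability
    exps = [exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(q_onehot, p_probs, eps=1e-12):
    """H(q, p) = -sum_c q(yc) * log p(yc)."""
    return -sum(q * log(max(p, eps)) for q, p in zip(q_onehot, p_probs))

p = softmax([2.0, 1.0, 0.1])   # predicted distribution over 3 classes
q = [1, 0, 0]                  # one-hot true label
loss = cross_entropy(q, p)
```

Because q is one-hot, only the predicted probability of the true class contributes to the loss.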

Comparing with Shannon Entropy: if p(yc) moves closer to q(yc) (minimizing the cross-entropy), Cross-Entropy converges to Shannon Entropy. But Cross-Entropy is always at least as large as Shannon Entropy, and the gap between them is the Kullback-Leibler Divergence, which measures the divergence between **q(yc)** and **p(yc)**.
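This relationship, D_KL(q || p) = H(q, p) - H(q), can be checked directly (a sketch with my own helper names):

```python
from math import log

def entropy(q, eps=1e-12):
    """Shannon entropy H(q), natural log."""
    return -sum(qi * log(max(qi, eps)) for qi in q if qi > 0)

def cross_entropy(q, p, eps=1e-12):
    """Cross-entropy H(q, p)."""
    return -sum(qi * log(max(pi, eps)) for qi, pi in zip(q, p) if qi > 0)

def kl_divergence(q, p):
    """D_KL(q || p) = H(q, p) - H(q); zero only when p matches q."""
    return cross_entropy(q, p) - entropy(q)

q = [0.5, 0.25, 0.25]
print(kl_divergence(q, q))                 # 0.0 when p == q
print(kl_divergence(q, [0.4, 0.4, 0.2]))   # positive otherwise
```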

