Supervised Learning
1 Performance with statistical output
1.1 BC Coefficient
Measures the overlap between two probability distributions \(P(x)\) and \(Q(x)\); it ranges from 0 (no overlap) to 1 (identical distributions).
\begin{equation} BC(P,Q) = \int_{x} \sqrt{P(x)Q(x)}\,dx \end{equation}
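A minimal sketch of the discrete analogue, assuming two distributions on the same support (the example values are made up):

```python
# Bhattacharyya coefficient for two discrete distributions.
import numpy as np

def bhattacharyya(p, q):
    """BC(P, Q) = sum_x sqrt(P(x) * Q(x)) in the discrete case."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(bhattacharyya(p, q))  # close to 1: heavy overlap
print(bhattacharyya(p, p))  # exactly 1: identical distributions
```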
1.2 KL Divergence
Measures how much information is lost when distribution Q is used to approximate distribution P.
\begin{equation} KL(P\|Q) = \int_{x} P(x)\log{\frac{P(x)}{Q(x)}}\,dx \end{equation}
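A minimal sketch of the discrete analogue, in nats, assuming \(Q(x) > 0\) wherever \(P(x) > 0\) (the example values are made up):

```python
# KL divergence between two discrete distributions.
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with P(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # > 0: some information is lost using Q for P
print(kl_divergence(p, p))  # 0: nothing is lost when Q = P
```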
2 Cross Entropy
2.1 Entropy
Measures the average level of surprise in the outcome of a random variable. A probability distribution with peaks has low entropy; a uniform one has high entropy. Entropy is also called average information: it measures the number of bits needed to transmit a randomly selected event from a probability distribution. An event carries more information the less likely it is.
\begin{equation} H(X) = -\sum_{x \in X}{p(x)\log{p(x)}} = \mathbb{E}[-\log{p(x)}] \end{equation}
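A minimal sketch in bits; the two made-up distributions illustrate the peaked-vs-uniform contrast described above:

```python
# Shannon entropy of a discrete distribution.
import numpy as np

def entropy(p):
    """H(X) = -sum_x p(x) * log2(p(x)), with 0 * log(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: uniform, high entropy
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24 bits: peaked, low entropy
```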
2.2 Cross Entropy
Intro
Cross entropy is formally defined like this:
\begin{align} H(P,Q) &= H(P) + KL(P\|Q) \\ &= -\int_{x} P(x)\log{P(x)}\,dx + \int_{x} P(x)\log{\frac{P(x)}{Q(x)}}\,dx \\ &= -\int_{x} P(x)\log{P(x)}\,dx + \int_{x} P(x)\left(\log{P(x)} - \log{Q(x)}\right)dx \\ &= -\int_{x} P(x)\log{P(x)}\,dx + \int_{x} P(x)\log{P(x)}\,dx - \int_{x} P(x)\log{Q(x)}\,dx \\ &= -\int_{x} P(x)\log{Q(x)}\,dx \end{align}
Cross entropy is the average number of bits needed to encode data coming from a source distributed according to P when using model Q.
- Expected value of the cross entropy measurement in the discrete case (a numeric sketch follows this list):
\begin{equation} H(P,Q) = -\frac{1}{|D|}\sum_{i \in D}\sum_{c=1}^{k} y_{i,c}\log{\hat{y}_{i,c}} \end{equation}
where \(D\) is the dataset, the problem is a \(k\)-class classification problem, \(y_{i,c}\) is the true (one-hot) probability of class \(c\) for sample \(i\), and \(\hat{y}_{i,c}\) is the model's predicted probability.
- Formula for the binary case, where \(y_{i} \in \{0,1\}\):
\begin{equation} H(P,Q) = -\frac{1}{|D|}\sum_{i \in D}\left[y_{i}\log{\hat{y}_{i}} + (1-y_{i})\log{(1-\hat{y}_{i})}\right] \end{equation}
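A minimal sketch of the \(k\)-class formula above; the one-hot labels y and the predicted probabilities y_hat are made-up data:

```python
# Discrete cross entropy over a small dataset.
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """Mean over D of -sum_c y[i, c] * log(y_hat[i, c])."""
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(np.asarray(y) * np.log(y_hat), axis=1)))

y = [[1, 0, 0],
     [0, 1, 0]]                 # true classes of two samples (one-hot)
y_hat = [[0.8, 0.1, 0.1],
         [0.2, 0.7, 0.1]]       # model Q's predicted probabilities
print(cross_entropy(y, y_hat))  # low: Q puts most mass on the true classes
```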
Proof
- HP: by the Kraft-McMillan theorem, a value \(x_{i}\) can be encoded with \(l_{i}\) bits, which corresponds to the implied probability \(Q(x_{i}) = 2^{-l_{i}}\).
- TH: \(H(P,Q) = -\sum_{x}P(x)\log{Q(x)}\)
- Proof: the real probability of \(x_{i}\) (and hence of a codeword of length \(l_{i}\)) being drawn is \(P(x_{i})\), so the expected code length under \(P\) is \begin{align} \mathbb{E}_{P}[l] &= \sum_{x_{i}}\underbrace{P(x_{i})}_{P(l_{i})}\underbrace{\left[-\log_{2}Q(x_{i})\right]}_{l_{i}} \\ &= -\sum_{x_{i}}P(x_{i})\log_{2}Q(x_{i}) = H(P,Q) \end{align}
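A numeric check of the proof under made-up distributions P and Q: building code lengths \(l_{i} = -\log_{2}Q(x_{i})\) and averaging them under \(P\) reproduces \(H(P,Q)\):

```python
# Expected code length under P for a code built from Q.
import numpy as np

P = np.array([0.5, 0.25, 0.25])   # true source distribution
Q = np.array([0.25, 0.25, 0.5])   # model used to build the code
lengths = -np.log2(Q)             # l_i implied by Kraft-McMillan

expected_length = np.sum(P * lengths)
cross_entropy = -np.sum(P * np.log2(Q))
print(expected_length, cross_entropy)  # both 1.75 bits
```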