Kullback divergence
Solomon Kullback, Information Theory and Statistics
If $H_i$, $i = 1, 2$, is the hypothesis that $x$ is from the statistical population with probability measure $\mu_i$, where $\mu_1$ and $\mu_2$ are absolutely continuous with respect to a common measure $\lambda$ with generalized densities $f_i(x) = \mathrm{d}\mu_i / \mathrm{d}\lambda$, and $P(H_i)$ is the prior probability of the respective hypothesis, $P(H_1) + P(H_2) = 1$, it follows from Bayes' theorem that, conditional on $x$,

$$P(H_i \mid x) = \frac{P(H_i)\, f_i(x)}{P(H_1)\, f_1(x) + P(H_2)\, f_2(x)}, \qquad i = 1, 2,$$

except on a set of $\lambda$-measure zero.
One obtains

$$\log \frac{f_1(x)}{f_2(x)} = \log \frac{P(H_1 \mid x)}{P(H_2 \mid x)} - \log \frac{P(H_1)}{P(H_2)},$$

except on a set of $\lambda$-measure zero.
This logarithm of the likelihood ratio, $\log \frac{f_1(x)}{f_2(x)}$, is defined as the information in $x$ for discrimination in favour of $H_1$ against $H_2$.
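As a quick numerical illustration (not from the book), the following Python sketch picks two toy discrete densities and toy priors and checks that the log-likelihood ratio equals the posterior log-odds minus the prior log-odds at an observed point; all concrete numbers are assumptions of the example.

    # Toy check of Bayes' rule and the identity above; the densities and
    # priors below are illustrative choices, not taken from Kullback.
    import math

    f1 = {0: 0.5, 1: 0.3, 2: 0.2}   # density of x under H1
    f2 = {0: 0.2, 1: 0.3, 2: 0.5}   # density of x under H2
    p1, p2 = 0.7, 0.3               # prior probabilities P(H1), P(H2)

    x = 0                           # an observed value
    evidence = p1 * f1[x] + p2 * f2[x]
    post1 = p1 * f1[x] / evidence   # P(H1 | x)
    post2 = p2 * f2[x] / evidence   # P(H2 | x)

    lhs = math.log(f1[x] / f2[x])                      # information in x for H1 against H2
    rhs = math.log(post1 / post2) - math.log(p1 / p2)  # posterior log-odds minus prior log-odds
    print(lhs, rhs)                 # the two agree up to floating-point error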
The mean information for discrimination in favour of $H_1$ against $H_2$ over a set $E$ (with $\mu_1(E) > 0$) is

$$I(1:2; E) = \frac{1}{\mu_1(E)} \int_E \log \frac{f_1(x)}{f_2(x)} \, \mathrm{d}\mu_1(x).$$
Over the entire space,

$$I(1:2) = \int \log \frac{f_1(x)}{f_2(x)} \, \mathrm{d}\mu_1(x) = \int f_1(x) \log \frac{f_1(x)}{f_2(x)} \, \mathrm{d}\lambda(x).$$
It is non-negative, $I(1:2) \ge 0$. It is zero if $f_1(x) = f_2(x)$ except on a set of $\lambda$-measure zero. It attains $+\infty$ when, for example, the space can be divided into two sets, on one of which $f_1 > 0$ and $f_2 = 0$, and on the other $f_1 = 0$ and $f_2 > 0$.
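The non-negativity is easy to see numerically. Below is a minimal sketch of $I(1:2)$ for discrete densities, where sums stand in for the integrals above; the two distributions are again toy choices of this example.

    # I(1:2) for discrete densities; sums stand in for the integrals.
    import math

    def directed_divergence(f, g):
        """I = sum_x f(x) * log(f(x) / g(x)), assuming f, g > 0 on the support."""
        return sum(f[x] * math.log(f[x] / g[x]) for x in f)

    f1 = {0: 0.5, 1: 0.3, 2: 0.2}
    f2 = {0: 0.2, 1: 0.3, 2: 0.5}

    print(directed_divergence(f1, f2))  # strictly positive, since f1 != f2
    print(directed_divergence(f1, f1))  # 0.0: zero when the densities coincide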
The prior information, $\log \frac{P(H_1)}{P(H_2)}$, still remains in $I(1:2)$:

$$I(1:2) = \int \log \frac{P(H_1 \mid x)}{P(H_2 \mid x)} \, \mathrm{d}\mu_1(x) - \log \frac{P(H_1)}{P(H_2)}.$$
Likewise,

$$I(2:1) = \int \log \frac{f_2(x)}{f_1(x)} \, \mathrm{d}\mu_2(x) = \int f_2(x) \log \frac{f_2(x)}{f_1(x)} \, \mathrm{d}\lambda(x).$$
One may define the divergence $J(1,2)$,

$$J(1,2) = I(1:2) + I(2:1) = \int \bigl(f_1(x) - f_2(x)\bigr) \log \frac{f_1(x)}{f_2(x)} \, \mathrm{d}\lambda(x).$$
If we slightly rewrite it,

$$J(1,2) = \int \log \frac{P(H_1 \mid x)}{P(H_2 \mid x)} \, \mathrm{d}\mu_1(x) - \int \log \frac{P(H_1 \mid x)}{P(H_2 \mid x)} \, \mathrm{d}\mu_2(x).$$
If one swaps the positions of hypotheses 1 and 2 one obtains the same quantity, $J(2,1) = J(1,2)$. So it is symmetric, and the prior information has disappeared. Kullback proposes to call $I(1:2)$ or $I(2:1)$ the directed divergence.
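A small numerical confirmation of the symmetry, using the same toy distributions as before: $J(1,2)$ and $J(2,1)$ coincide, and the priors never enter, matching the remark that the prior information disappears.

    # J(1,2) = I(1:2) + I(2:1) is symmetric, and no prior appears in it.
    import math

    def I(f, g):
        return sum(f[x] * math.log(f[x] / g[x]) for x in f)

    f1 = {0: 0.5, 1: 0.3, 2: 0.2}
    f2 = {0: 0.2, 1: 0.3, 2: 0.5}

    J12 = I(f1, f2) + I(f2, f1)
    J21 = I(f2, f1) + I(f1, f2)
    print(J12, J21)                 # identical

    # the (f1 - f2) * log(f1 / f2) form of the integrand gives the same value
    print(sum((f1[x] - f2[x]) * math.log(f1[x] / f2[x]) for x in f1))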
As $I(1:2) \ge 0$ and $I(2:1) \ge 0$, we have $0 \le J(1,2) \le +\infty$.
Both bounds are attainable.
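For instance, in a two-point toy space (an assumption of this sketch, not a computation from the book), identical densities give $J = 0$, while densities concentrated on disjoint sets give $J = +\infty$:

    # Both bounds of 0 <= J(1,2) <= +inf in a two-point toy space.
    import math

    def term(a, b):
        if a == 0.0:
            return 0.0              # 0 * log(0 / b) is taken as 0
        if b == 0.0:
            return math.inf         # a * log(a / 0) diverges
        return a * math.log(a / b)

    def J(f1, f2, support):
        return sum(term(f1[x], f2[x]) + term(f2[x], f1[x]) for x in support)

    support = [0, 1]
    print(J({0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}, support))  # 0.0
    print(J({0: 1.0, 1: 0.0}, {0: 0.0, 1: 1.0}, support))  # inf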