Differential Entropy

What is "differential" in differential entropy?

Jun 12, 2023

3 mins

Entropy for continuous random variables is technically called differential entropy. I’ve always wondered what the differential means, and I finally have an answer.

Discrete Random Variables

Shannon’s groundbreaking work in information theory¹ defined information as a measure of surprise. Specifically, for a discrete random variable $X$, the information in an outcome is $-\log{p(X)}$, where $p(X)$ is its probability mass. Consequently, the average information, or entropy $H$, is defined as²

H(X) = -\sum_{i} p(X_i)\log{p(X_i)}.
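To make the definition concrete, here is a minimal sketch in Python. The function name `discrete_entropy` and the two coin distributions are my own illustrative choices, and I use the natural logarithm, so the result is in nats.

```python
import math

def discrete_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution given as a list of probabilities."""
    # Outcomes with zero probability contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]

print(discrete_entropy(fair_coin))    # log(2), about 0.693
print(discrete_entropy(biased_coin))  # smaller: a biased coin is less surprising on average
```

The fair coin has the larger entropy because neither outcome can be anticipated better than the other.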

Extending this definition to continuous random variables, however, is tricky as we’ll see next.

Continuous Random Variables

Discrete probability masses are often visualized as histograms. In a similar spirit, instead of thinking in terms of a continuous random variable $X$, we are going to think in terms of its discretized version $\Delta X$, binned into buckets of width $dX$.³

To construct the entropy of such a discretized distribution, we need to define $p(\Delta X)$. One way is to think in terms of the area of one bin relative to the total area occupied by all bins. If a bin contains $n(\Delta X)$ values, its area is $a = n(\Delta X) \times dX$ (a thin rectangle). With the total area across all bins $A = \sum a$, the probability of a bin is $p(\Delta X) = a/A$. This construction is normalized by design, so that $\sum p(\Delta X) = 1$, i.e. the probabilities of all bins sum to $1$.

Now that we have a normalized histogram, we can instead work with normalized counts $n(\Delta X)/A$, which we denote by $q(\Delta X)$. Under such a normalization, the area itself defines the probability of the bin:

p(\Delta X) = q(\Delta X) \times dX.
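As a quick check of this construction, the sketch below builds both versions of the bin probability with NumPy. The Gaussian samples, the seed, and the choice of 50 bins are assumptions made purely for illustration; `np.histogram(..., density=True)` returns counts normalized so that the bin areas sum to one, which plays the role of $q(\Delta X)$ here.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)  # illustrative data; any continuous sample works

# Raw counts n(ΔX) and the common bin width dX (equal-width bins).
counts, edges = np.histogram(samples, bins=50)
dX = edges[1] - edges[0]

# Area-based construction: a = n * dX, p = a / A.
areas = counts * dX
p_from_area = areas / areas.sum()

# Normalized counts q(ΔX): density=True scales counts so that sum(q * dX) = 1.
q, _ = np.histogram(samples, bins=50, density=True)
p_from_density = q * dX

print(np.allclose(p_from_area, p_from_density))  # both constructions agree
print(p_from_density.sum())                      # bin probabilities sum to 1 (up to rounding)
```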

Instead of our original continuous random variable $X$, let us now work with this definition of probability for the discretized version $\Delta X$.

Entropy of Discretized Random Variable

Let’s plug the definition of discretized probability $p(\Delta X)$ into entropy. We have

\begin{aligned} H(\Delta X) &= - \sum p(\Delta X) \log{p(\Delta X)} \\ &= -\sum q(\Delta X) dX \log{q(\Delta X)} - \log{dX} \times \underbrace{\sum q(\Delta X) dX}_{\sum p(\Delta X) = 1} \\ &= - dX \left[\sum q(\Delta X) \log{q(\Delta X)}\right] - \log{dX} \end{aligned}
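The decomposition can be checked numerically. The sketch below is only an illustration under my own assumptions (Gaussian samples, 30 equal-width bins, natural logarithm): it computes the entropy directly from the bin probabilities and again from the split into a $q \log q$ term and a $\log{dX}$ term.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)  # illustrative data

q, edges = np.histogram(samples, bins=30, density=True)
dX = edges[1] - edges[0]

# Empty bins have p = 0 and contribute nothing to either side.
nonempty = q > 0
p = q[nonempty] * dX

# Left-hand side: entropy computed directly from the bin probabilities.
lhs = -np.sum(p * np.log(p))

# Right-hand side: -dX * sum(q log q) - log(dX), which relies on sum(p) = 1.
rhs = -dX * np.sum(q[nonempty] * np.log(q[nonempty])) - np.log(dX)

print(lhs, rhs)
print(np.isclose(lhs, rhs))
```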

As the bin width $dX$ approaches zero, the sum turns into an integral while $-\log{dX}$ grows without bound, so the entropy becomes:

H(X) = -\int_{\mathcal{X}} q(X)\log{q(X)} \, dX + \infty

This result is trouble: the entropy of every continuous random variable is infinite. In principle, this result is not wrong. As the precision of our continuous quantity’s measurement increases (i.e. the bin width decreases), the average surprise in the measurement increases. But it leaves us with an unworkable definition of entropy for continuous random variables, since we always need to know the bin width.
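To see the divergence in numbers, here is a small sketch for a standard normal, chosen only because its bin probabilities can be computed from the CDF and its integral term $-\int q \log{q} \, dX$ has the known closed form $\frac{1}{2}\log(2\pi e)$. The helper names, the truncation of the real line to $[-10, 10]$, and the bin widths are my own assumptions.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def discretized_entropy(dX, half_range=10.0):
    """Entropy (in nats) of a standard normal binned into width-dX buckets on [-half_range, half_range]."""
    n_bins = int(round(2 * half_range / dX))
    H = 0.0
    for i in range(n_bins):
        left = -half_range + i * dX
        p = normal_cdf(left + dX) - normal_cdf(left)  # probability mass of this bin
        if p > 0:  # skip bins whose mass underflows to zero in the far tails
            H -= p * math.log(p)
    return H

integral_term = 0.5 * math.log(2 * math.pi * math.e)  # about 1.4189

for dX in [1.0, 0.1, 0.01]:
    H = discretized_entropy(dX)
    # H keeps growing as dX shrinks, while H + log(dX) settles near the integral term.
    print(dX, H, H + math.log(dX))
```

Each tenfold reduction in bin width adds roughly $\log{10}$ nats to the discretized entropy, while the part left after adding back $\log{dX}$ stays close to the integral term.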