The Ledrappier-Walters formula
In this post I discuss the Ledrappier-Walters formula.
Set Up
EDIT: This post is still unfinished
Throughout, let \((X,d)\) be a compact metric space and \(T : X \rightarrow X\) a continuous map (which is our dynamical system). For \(\delta > 0\) and \(n \in \mathbb{N}\) we say a set \(E \subseteq X\) is \((n,\delta)\)-separated if for any \(x \neq y \in E\) we have
\[\max_{0 \leq i < n} d(T^i(x), T^i(y)) \geq \delta.\]Let \(f : X \rightarrow \mathbb{R}\) be any function and define
\[f_n(x) := \sum_{i=0}^{n-1} f(T^i(x)).\]We can define the exponential accumulation by
\[S(f, T, \delta, n) := \sup\left\{\sum_{x \in E} \exp(f_n(x)) : E \subseteq X \text{ is an } (n,\delta)\text{-separated set} \right\}.\]The topological pressure is then defined as
\[P(T, f) := \lim_{\delta \rightarrow 0} \limsup_{n \rightarrow \infty} \frac{1}{n} \log(S(f,T,\delta,n)).\]Note that we recover the topological entropy if we let \(f = 0\) in the above definition, so
\[h(T) = P(T, 0).\]One observation to make is that the space \(M(X,T)\) of \(T\)-invariant Borel probability measures on \(X\) is compact (in the weak* topology) and convex, with extreme points the ergodic measures (this can be found in Walters' book).
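To see the definition in action, here is a quick numerical sketch in Python. The doubling map \(T(x) = 2x \bmod 1\) is my choice of toy example, not anything from the paper, and a greedy scan of a finite grid produces only one particular \((n,\delta)\)-separated set, so this is just a lower bound for the supremum; still, the growth rate is visible.

```python
import math

def doubling(x):
    """The doubling map T(x) = 2x mod 1 on the circle."""
    return (2.0 * x) % 1.0

def circle_dist(x, y):
    """Arc-length distance on the circle R/Z."""
    return min(abs(x - y), 1.0 - abs(x - y))

def d_n(x, y, n):
    """Bowen distance: max_{0 <= i < n} d(T^i x, T^i y)."""
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(x, y))
        x, y = doubling(x), doubling(y)
    return best

def birkhoff(f, x, n):
    """The sum f_n(x) = f(x) + f(Tx) + ... + f(T^{n-1} x)."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = doubling(x)
    return total

def S(f, n, delta, grid=1200):
    """Greedy lower bound for sup { sum_{x in E} exp(f_n(x)) :
    E an (n, delta)-separated set }, drawing candidates from a grid."""
    E, total = [], 0.0
    for k in range(grid):
        x = k / grid
        if all(d_n(x, y, n) >= delta for y in E):
            E.append(x)
            total += math.exp(birkhoff(f, x, n))
    return total

# With f = 0 the pressure is the topological entropy of the doubling
# map, log 2 ~ 0.693; the estimates approach it from above as n grows.
for n in (3, 5, 7):
    print(n, math.log(S(lambda x: 0.0, n, 0.1)) / n)
```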
The next goal is to define measure theoretic entropy. Recall that a measurable partition for a measure space \((X, \mathcal{M}, \mu)\) is a collection of measurable subsets \(\xi := \{A_i\}\) such that the following hold:
(1) \(\mu \left( X \setminus \left(\bigcup_i A_i \right) \right) = 0.\)
(2) \(\mu(A_i \cap A_j) = 0\) for all \(i \neq j\).
A measurable partition is then a collection of subsets that cover \(X\) up to measure zero and are pairwise disjoint up to measure zero. We will make the additional assumption that \(\mu(A_i) > 0\) for each \(i\); in other words, we ignore sets of measure \(0\) in our calculations. All measurable partitions are going to be either finite or countable with finite entropy.
Let \(\xi\) be a measurable partition. We define the entropy of \(\xi\) to be
\[H_\mu(\xi) := -\sum_{i \in I} \mu(A_i) \log(\mu(A_i)).\]If one doesn’t want to ignore zero measure sets in their partition, then we use the same definition with the convention that \(0 \log(0) = 0\).
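As a quick sanity check, this formula is a couple of lines of Python (the test values are my own):

```python
import math

def partition_entropy(masses):
    """H_mu(xi) = -sum_i mu(A_i) log mu(A_i), with the convention 0 log 0 = 0."""
    return -sum(m * math.log(m) for m in masses if m > 0)

print(partition_entropy([0.5, 0.5]))   # log 2 ~ 0.693: two equal atoms
print(partition_entropy([0.9, 0.1]))   # ~ 0.325: a lopsided partition carries less information
print(partition_entropy([1.0, 0.0]))   # 0.0: one atom of full measure tells us nothing
```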
We define the conditional measure by
\[\mu(A \mid B) := \frac{\mu(A \cap B)}{\mu(B)}.\]If we have two measurable partitions \(\xi := \{A_i\}_{i \in I}\) and \(\eta := \{B_j\}_{j \in J}\), then we define the conditional entropy of \(\xi\) with respect to \(\eta\) by
\[H_\mu(\xi \mid \eta) := -\sum_{j \in J} \mu(B_j) \sum_{i \in I} \mu(A_i \mid B_j) \log(\mu(A_i \mid B_j)).\]We also define the join of two measurable partitions by
\[\xi \vee \eta := \{A \cap B : A \in \xi, B \in \eta\}.\]If \(\xi\) is a measurable partition and \(T : X \rightarrow X\) a measure preserving transformation, then we define the joint partition by
\[\xi_{n}^T := \bigvee_{i=0}^{n-1} T^{-i}(\xi).\]The measure entropy of a measure preserving transformation \(T : X \rightarrow X\) and a measurable partition \(\xi\) is defined by
\[h_\mu(T, \xi) := \lim_{n \rightarrow \infty} \frac{1}{n} H_\mu(\xi_{n}^T).\]The measure entropy of just \(T\) is then defined as
\[h_\mu(T) := \sup\{h_\mu(T,\xi) : \xi \text{ is a finite measurable partition of }X\}.\]Suppose now \(\pi : X \rightarrow Y\) is a measure preserving transformation of \(X\) onto a probability space \(Y\) so that \(\pi T = S \pi\) for a measure preserving transformation \(S : Y \rightarrow Y\). If \(\xi\) is a measurable partition of \(X\), then we define the relative entropy of \(T\) with respect to \(S\) and the partition \(\xi\) as
\[h_\mu(T \mid S, \xi) := \lim_{n \rightarrow \infty} \frac{1}{n} H_\mu(\xi_n^T \mid \pi^{-1}(\epsilon_Y)),\]where \(\epsilon_Y\) is the partition of \(Y\) into points. We define the relative entropy of \(T\) with respect to \(S\) by
\[h_\mu(T \mid S) := \sup\{ h_\mu(T \mid S, \xi) : \xi \text{ is a finite measurable partition of } X\}.\]Claim: Let \(\xi, \eta\) be finite partitions of \(X\). We have
\[h_\mu(T \mid S, \xi) \leq h_\mu(T \mid S, \eta) + H_\mu(\xi \mid \eta \vee \pi^{-1}(\epsilon_Y)).\]Proof: Recall that we have the following nice properties for entropy.
(1) For any finite measurable partitions \(\xi, \eta\) we have
\[H_\mu(\xi \vee \eta) = H_\mu(\xi) + H_\mu(\eta \mid \xi).\](2) For any finite measurable partitions \(\xi, \eta, \rho\) we have
\[H_\mu(\eta \vee \xi \mid \rho) = H_\mu(\xi \mid \rho) + H_\mu(\eta \mid \xi \vee \rho).\]Using the above, we see that for fixed \(n\) we have
\[H_\mu(\xi_n^T \mid \eta_n^T \vee \pi^{-1}(\epsilon_Y)) + H_\mu(\eta_n^T \mid \pi^{-1}(\epsilon_Y)) = H_\mu(\xi_n^T \vee \eta_n^T \mid \pi^{-1}(\epsilon_Y)) \geq H_\mu(\xi_n^T \mid \pi^{-1}(\epsilon_Y)).\]To show our goal, we just need to study the term \(H_\mu(\xi_n^T \mid \eta_n^T \vee \pi^{-1}(\epsilon_Y))\). The trick is to proceed inductively. Notice
\[H_\mu(\xi_n^T \mid \eta_n^T \vee \pi^{-1}(\epsilon_Y)) = H_\mu(\xi \mid \eta_n^T \vee \pi^{-1}(\epsilon_Y)) + H_\mu(T^{-1}(\xi_{n-1}^T) \mid \xi \vee \eta_n^T \vee \pi^{-1}(\epsilon_Y)).\]We can get an upper bound:
\[H_\mu(\xi_n^T \mid \eta_n^T \vee \pi^{-1}(\epsilon_Y)) \leq H_\mu(\xi \mid \eta \vee \pi^{-1}(\epsilon_Y)) + H_\mu(T^{-1}(\xi_{n-1}^T) \mid \eta_n^T \vee \pi^{-1}(\epsilon_Y)).\]We can continue in this fashion to eventually get
\[H_\mu(\xi_n^T \mid \eta_n^T \vee \pi^{-1}(\epsilon_Y)) \leq n H_\mu(\xi \mid \eta \vee \pi^{-1}(\epsilon_Y)).\]The result now follows after dividing by \(n\) and taking limits. See Katok and Hasselblatt, Proposition 4.3.10, for more details. \(\blacksquare\)
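To make the entropy definitions above concrete, here is a small numerical sketch, again using the doubling map \(T(x) = 2x \bmod 1\) with Lebesgue measure and the partition \(\xi = \{[0,1/2), [1/2,1)\}\) (my choice of example, not from the paper). An atom of \(\xi_n^T\) is determined by which half of the circle each of the first \(n\) iterates lands in, and \(H_\mu(\xi_n^T)\) is estimated by sampling.

```python
import math
import random

def doubling(x):
    """The doubling map T(x) = 2x mod 1."""
    return (2.0 * x) % 1.0

def atom(x, n):
    """Code the atom of xi_n^T containing x by the itinerary of x
    through the partition {[0, 1/2), [1/2, 1)}."""
    word = []
    for _ in range(n):
        word.append(0 if x < 0.5 else 1)
        x = doubling(x)
    return tuple(word)

def H_join(n, samples=100_000):
    """Monte Carlo estimate of H_mu(xi_n^T) for mu = Lebesgue measure."""
    counts = {}
    for _ in range(samples):
        w = atom(random.random(), n)
        counts[w] = counts.get(w, 0) + 1
    return -sum((c / samples) * math.log(c / samples) for c in counts.values())

# Every atom of xi_n^T is a dyadic interval of measure 2^{-n}, so
# H_mu(xi_n^T) = n log 2 and each printed value should sit near 0.693.
for n in (2, 5, 8):
    print(n, H_join(n) / n)
```

So \(h_\mu(T, \xi) = \log 2\) here, and since the dyadic partition generates the Borel \(\sigma\)-algebra, this is also \(h_\mu(T)\).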
We define the boundary of a measurable partition \(\xi := \{A_1, \ldots, A_n\}\) to be
\[\partial \xi := \bigcup_{i=1}^n \partial A_i.\]We define the measure \(\delta_x\) to be the probability measure concentrated at the point \(x\) (that is, the Dirac measure). Also recall that if \(\pi : X \rightarrow Y\) is a measurable transformation and \(\mu\) is a measure on \(X\), then the pushforward measure is defined by
\[\pi^*(\mu)(E) = \mu(\pi^{-1}(E)).\]Finally if \(\pi : X \rightarrow Y\) is continuous, \(T : X \rightarrow X\) is continuous, \(S : Y \rightarrow Y\) is continuous, and \(\pi T = S \pi\), then we say that \(S\) is a factor of \(T\).
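For a finitely supported measure the pushforward is just a regrouping of mass; here is a minimal sketch (the dictionary representation is my own illustration):

```python
def pushforward(mu, pi):
    """pi^*(mu)(E) = mu(pi^{-1}(E)), spelled out for a finitely supported
    measure mu stored as a dict mapping points to masses."""
    nu = {}
    for x, mass in mu.items():
        y = pi(x)
        nu[y] = nu.get(y, 0.0) + mass
    return nu

# Project a measure on pairs onto its first coordinate.
mu = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.5}
print(pushforward(mu, lambda p: p[0]))  # {'a': 0.5, 'b': 0.5}
```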
Theorem
With all of this, we can now actually state the major theorem (see Theorem 2.1 in the paper).
Theorem: Let \(X, Y\) be compact metric spaces and let \(T : X \rightarrow X\), \(S : Y \rightarrow Y\), \(\pi : X \rightarrow Y\) be continuous maps so that \(\pi\) is surjective and \(\pi T = S \pi\). Let \(f : X \rightarrow \mathbb{R}\) be continuous and let \(\nu \in M(Y,S)\). Then
\[\sup \left\{h_\mu(T \mid S) + \int f d\mu : \mu \in M(X,T) \text{ and } \pi^*(\mu) = \nu\right\} = \int_Y P(T,f,\pi^{-1}(y))d\nu(y).\]As a sanity check, when \(Y\) is a single point the relative entropy \(h_\mu(T \mid S)\) reduces to \(h_\mu(T)\), every \(\mu \in M(X,T)\) projects to \(\nu\), and the theorem becomes the classical variational principle \(\sup\{h_\mu(T) + \int f d\mu\} = P(T,f)\).
Sketch of Proof
Throughout the proof, we assume the conditions of the theorem above, namely that \(X, Y\) are compact metric spaces, \(T : X \rightarrow X\), \(S : Y \rightarrow Y\), \(\pi : X \rightarrow Y\) are all continuous, \(\pi\) is surjective, and \(S\) is a factor of \(T\). The first lemma says that relative entropy with respect to a factor map behaves similarly to conditional entropy with respect to a partition. This is Lemma 3.1 in the paper.
Lemma: Let \(Z\) be a compact metric space and \(R : Z \rightarrow Z\) a factor of \(S\) via a factor map \(\psi : Y \rightarrow Z\). If \(\mu \in M(X,T)\) then we have
\[h_\mu(T \mid R) = h_{\pi^*(\mu)}(S \mid R) + h_\mu(T \mid S).\]The proof essentially follows from Pinsker's lemma (see Rokhlin’s lecture notes). While a little technical, it is essentially the same as its partition counterpart. The next lemma is also important, but a little bit technical. This is Lemma 3.2 in the paper.
Lemma: (i) Let \(\xi, \eta\) be finite partitions of \(X\) and let \(\mu_1, \mu_2\) be two Borel probability measures on \(X\). If \(0 \leq p \leq 1\) then
\[p H_{\mu_1}(\xi \mid \eta) + (1-p) H_{\mu_2} (\xi \mid \eta) \leq H_{p \mu_1 + (1-p)\mu_2}(\xi \mid \eta).\](ii) Suppose \(\xi, \eta\) are finite partitions of \(X\) and let \(\mu\) be a Borel probability measure on \(X\) with \(\mu(\partial \xi) = \mu(\partial \eta) = 0\). Let \(\{\mu_n\}\) be a sequence of Borel probability measures with \(\mu_n \rightarrow \mu\) in the weak* topology. Then
\[\limsup_{n \rightarrow \infty} H_{\mu_n}(\xi \mid \eta) \leq H_\mu(\xi \mid \eta).\](iii) The function \(\mu \mapsto h_\mu(T \mid S)\) defined on \(M(X,T)\) is affine.
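Part (i) says that conditional entropy is concave in the measure. Here is a minimal numerical check on a pair of \(2 \times 2\) joint tables (the tables are my own made-up example):

```python
import math

def cond_entropy(joint):
    """H_mu(xi | eta) from the table joint[i][j] = mu(A_i ∩ B_j)."""
    H = 0.0
    for j in range(len(joint[0])):
        muB = sum(row[j] for row in joint)          # mu(B_j)
        if muB == 0:
            continue
        for row in joint:
            p = row[j] / muB                        # mu(A_i | B_j)
            if p > 0:
                H -= muB * p * math.log(p)
    return H

mu1 = [[0.4, 0.1], [0.1, 0.4]]       # a correlated measure
mu2 = [[0.25, 0.25], [0.25, 0.25]]   # an independent one
p = 0.3
mix = [[p * a + (1 - p) * b for a, b in zip(r1, r2)]
       for r1, r2 in zip(mu1, mu2)]

lhs = p * cond_entropy(mu1) + (1 - p) * cond_entropy(mu2)
print(lhs, cond_entropy(mix))  # prints ~0.635 <= ~0.677, as part (i) predicts
```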
We omit the proof of Lemma 3.2 as well; its main purpose is to feed into the next lemma. With this in mind, if we let \(\|f\|\) denote the supremum norm on \(C(X)\), then we can state the following lemma (this is Lemma 3.3 in the paper).
Lemma: Let \(f \in C(X)\). The mappings \(y \mapsto P_n(T, f, \pi^{-1}(y), \delta)\) from \(Y\) to \(\mathbb{R}\) are Borel measurable for \(n \geq 1\). Moreover for each \(\delta > 0\) there is a constant \(C(\delta)\) so that
\[\frac{1}{n} \log(P_n(T, f, \pi^{-1}(y), \delta)) < C(\delta) \text{ for } n \geq 1, y \in Y.\]Sketch of Proof: Recall that
\[P_n(T,f,\pi^{-1}(y), \delta) := \sup \left\{ \sum_{x \in E} \exp(f_n(x)) : E \text{ is } (n,\delta)\text{-separated and } E \subseteq \pi^{-1}(y) \right\}.\]Ledrappier and Walters break this up into smaller, more manageable pieces which are easy to study. The idea is to fix \(k \in \mathbb{N}\) and \(\delta > 0\) and look at \(k\)-tuples of separated points in a fiber whose exponential sums exceed \(e^{nt}\) for some \(t \in \mathbb{R}\). The first step is to look at the collection of \(k\)-tuples of points in \(X\) that all project down to a common point. That is, we look at
\[D_k := \{(x_1, \ldots, x_k) \in X^k : \pi(x_1) = \cdots = \pi(x_k)\}.\]Now we want them to stay separated, so look at
\[E_k^{n,\delta} := \{(x_1, \ldots, x_k) \in X^k : \max_{0 \leq m < n} d(T^m(x_i), T^m(x_j)) \geq \delta \text{ if } i \neq j\}.\]Finally we want the exponential sum to exceed \(e^{nt}\) by a definite margin, so we look at
\[F_{k,l}^{n,t} := \left\{ (x_1, \ldots, x_k) \in X^k : \sum_{i=1}^k \exp(f_n(x_i)) \geq e^{nt} + 1/l\right\}.\]The advantage now is that each of these sets is closed: the first is the preimage of a closed set under a continuous map, the second is a finite intersection of closed sets, and the third is again the preimage of a closed set under a continuous map. Define \(\phi : X^k \rightarrow Y\) by \(\phi(x_1, \ldots, x_k) = \pi(x_1)\). Taking the intersection and mapping it through \(\phi\), we get
\[G_{k,l}^{n,\delta,t} = \phi(D_k \cap E_k^{n,\delta} \cap F_{k,l}^{n,t}).\]Since the intersection is compact and \(\phi\) is continuous, this is a compact set, and these sets are the slices (in terms of \(k\) and \(l\)) of the set we actually care about. Taking the union over all \(k\) and \(l\) gives us a Borel set which is exactly
\[\bigcup_{k,l} G_{k,l}^{n,\delta,t} = \{y : P_n(T, f, \pi^{-1}(y), \delta) > e^{nt}\}.\]This tells us the mapping is Borel measurable. For the last condition, observe that \(P_n(T,f,\pi^{-1}(y), \delta) \leq e^{n \|f\|} s_n(T,\pi^{-1}(y), \delta),\) where \(s_n(T,\pi^{-1}(y),\delta)\) is the maximal cardinality of an \((n,\delta)\)-separated subset of the fiber; since \(s_n(T, X, \delta)\) grows at most exponentially in \(n\) at a rate depending only on \(\delta\), the bound \(C(\delta)\) follows. \(\blacksquare\)
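To get a feel for fiber pressure, consider a toy product example of my own choosing: \(X = Y \times Y\) with \(Y\) the circle, \(T(y,z) = (2y \bmod 1, 2z \bmod 1)\), \(S\) the doubling map, and \(\pi\) the first-coordinate projection. Two points of the same fiber \(\pi^{-1}(y) = \{y\} \times Y\) agree in the first coordinate forever, so their Bowen distance is driven entirely by the second coordinate, and \(P_n(T, 0, \pi^{-1}(y), \delta)\) grows at rate \(\log 2\) independently of \(y\).

```python
import math

def doubling(x):
    """Second-coordinate dynamics on a fiber: z -> 2z mod 1."""
    return (2.0 * x) % 1.0

def circle_dist(x, y):
    return min(abs(x - y), 1.0 - abs(x - y))

def d_n(z, w, n):
    """Bowen distance of two points in the SAME fiber {y} x circle:
    the first coordinates agree forever, so only z and w matter."""
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(z, w))
        z, w = doubling(z), doubling(w)
    return best

def fiber_sep_count(n, delta, grid=1200):
    """Greedy (n, delta)-separated subset of the fiber pi^{-1}(y)."""
    E = []
    for k in range(grid):
        z = k / grid
        if all(d_n(z, w, n) >= delta for w in E):
            E.append(z)
    return len(E)

# With f = 0, P_n(T, 0, pi^{-1}(y), delta) is the largest such count;
# the growth rate approaches log 2 ~ 0.693 for every y.
for n in (3, 5, 7):
    print(n, math.log(fiber_sep_count(n, 0.1)) / n)
```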
The final lemma is one that should seem familiar from ergodic theory. This is Lemma 3.4 in the paper.
Lemma: If \(K \subseteq X\) is compact and \(m \in \mathbb{N}\) then
\[P(T^m, f_m, K) = m P(T,f,K).\]Sketch of Proof: It’s mostly just a matter of running through definitions. Suppose \(E\) is \((n,\delta)\)-separated for \(T^m\). This means that for \(x \neq y\) in \(E\) we have
\[\max_{0 \leq i < nm} d(T^i(x), T^i(y)) \geq \max_{0 \leq k < n} d(T^{km}(x), T^{km}(y)) \geq \delta.\]Thus \(E\) is an \((mn,\delta)\)-separated set for \(T\). Since the Birkhoff sum of \(f_m\) under \(T^m\) over \(n\) steps is exactly \(f_{nm}\), we get
\[P_n(T^m, f_m, K, \delta) \leq P_{nm}(T,f,K, \delta).\]Thus
\[\frac{1}{n} \log(P_n(T^m, f_m, K, \delta)) \leq m \cdot \frac{1}{nm} \log(P_{nm}(T,f,K,\delta)).\]Taking the limit as \(n \rightarrow \infty\) and then \(\delta \rightarrow 0\) gives the inequality in one direction. The other direction follows from a clever choice of \(\delta\) and \(\delta'\) together with the same kind of inequality tricks. \(\blacksquare\)
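Here is a small numerical check of the separation step in the proof, once more assuming the doubling map as a toy example: an \((n,\delta)\)-separated set for \(T^2\), built greedily, is verified to be \((2n,\delta)\)-separated for \(T\).

```python
def doubling(x):
    """The doubling map T(x) = 2x mod 1."""
    return (2.0 * x) % 1.0

def Tm(x):
    """T^m with m = 2: two applications of the doubling map."""
    return doubling(doubling(x))

def circle_dist(x, y):
    return min(abs(x - y), 1.0 - abs(x - y))

def d_n(T, x, y, n):
    """Bowen distance max_{0 <= i < n} d(T^i x, T^i y) for the map T."""
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(x, y))
        x, y = T(x), T(y)
    return best

def greedy_separated(T, n, delta, grid=1500):
    """Greedily build an (n, delta)-separated set for T from a uniform grid."""
    E = []
    for k in range(grid):
        x = k / grid
        if all(d_n(T, x, y, n) >= delta for y in E):
            E.append(x)
    return E

m, n, delta = 2, 4, 0.1
E = greedy_separated(Tm, n, delta)
# As in the proof: separation for T^m along n steps forces separation
# for T along nm steps, because the T^m-orbit is a sub-orbit of the T-orbit.
assert all(d_n(doubling, E[i], E[j], n * m) >= delta
           for i in range(len(E)) for j in range(i + 1, len(E)))
print(len(E), "points pass both separation tests")
```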