Mixing time and expansion of non-negatively curved Markov chains

We establish three remarkable consequences of non-negative curvature for sparse Markov chains. First, their conductance decreases logarithmically with the number of states. Second, their displacement is at least diffusive until the mixing time. Third, they never exhibit the cutoff phenomenon. The first result provides a nearly sharp quantitative answer to a classical question of Ollivier, Milman and Naor. The second settles a conjecture of Lee and Peres for graphs with non-negative curvature. The third offers a striking counterpoint to the recently established cutoff for non-negatively curved chains with uniform expansion.


Introduction
In Riemannian geometry, a lower bound on the Ricci curvature classically implies an array of powerful estimates for the underlying manifold, including diameter bounds, volume growth, comparison principles, splitting theorems, spectral estimates, and concentration inequalities [10]. Over the past decade, those remarkable implications have motivated the development of non-smooth analogues of curvature that can be applied to discrete geometries [27, 15, 9, 8, 21, 11, 20, 19]. In particular, Ollivier [21] proposed a transportation-based definition that makes sense on arbitrary metric spaces, hence in particular on graphs and Markov chains. Informally, a metric space has non-negative Ollivier-Ricci curvature if balls are at least as close to each other as their centers are. The simplest example of a finite non-negatively curved graph is a cycle. It is classical that this graph has poor expansion, that the random walk on it exhibits a diffusive behavior, and that its mixing time is of the same order as the inverse spectral gap. The aim of the present paper is to show that those three properties are in fact shared by all sparse Markov chains with non-negative curvature. Before we state our results in full generality, let us describe their content in the simple but important special case of random walk on graphs.
Non-negatively curved graphs. -Let G = (V, E) be a finite simple graph, and let P denote the random-walk transition matrix of G. Thus, P acts on any function f : V → R as follows:

(P f)(x) := (1/deg(x)) ∑_{y∼x} f(y),

where the notation y ∼ x indicates that {x, y} ∈ E. Following Ollivier [21, 22], we say that G has non-negative curvature if P contracts the Lipschitz norm, i.e.,

∥P f∥_lip ⩽ ∥f∥_lip, where ∥f∥_lip := max_{y∼x} |f(y) − f(x)|.

This fundamental property is satisfied by many natural families of graphs, including all Abelian Cayley graphs and, more generally, all Cayley graphs whose generating set is conjugacy-invariant. Additional details, including a more effective formulation in terms of couplings, will be provided in the next section when we discuss curvature for general Markov chains.
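To make the Lipschitz-contraction property concrete, here is a minimal numerical sketch (our own illustration, not from the paper) checking ∥Pf∥_lip ⩽ ∥f∥_lip on a cycle, the emblematic non-negatively curved graph. The helper names (`cycle_P`, `lip_norm`, `apply_P`) are ours.

```python
import random

def cycle_P(n):
    # Random-walk transition matrix of the n-cycle: move to either neighbour w.p. 1/2.
    return [[0.5 if (j - i) % n in (1, n - 1) else 0.0 for j in range(n)]
            for i in range(n)]

def lip_norm(f, n):
    # ||f||_lip = max over edges {x, x+1} of |f(y) - f(x)| on the cycle.
    return max(abs(f[(i + 1) % n] - f[i]) for i in range(n))

def apply_P(P, f):
    # (Pf)(x) = sum_y P(x, y) f(y).
    return [sum(P[i][j] * f[j] for j in range(len(f))) for i in range(len(f))]

n = 12
P = cycle_P(n)
random.seed(0)
f = [random.uniform(-1, 1) for _ in range(n)]
contraction_holds = lip_norm(apply_P(P, f), n) <= lip_norm(f, n) + 1e-12
```

On the cycle, (Pf)(x) averages the two neighbouring values, so the gradient of Pf along an edge is an average of two gradients of f, which is why the contraction holds for every f.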
Expansion. -Our first result concerns the expansion of graphs with non-negative curvature. Write ∂A for the edge-boundary of a set A ⊆ V, and deg(A) for the sum of the degrees of all vertices in A. With this notation, the conductance (also known as Cheeger constant, or bottleneck ratio) of G is

Φ(G) := min_{deg(A) ⩽ deg(V)/2} |∂A| / deg(A).

Sequences of bounded-degree graphs whose size diverges but whose conductance remains bounded away from zero are famously known as expanders. Whether such graphs can have non-negative curvature is an important question, which explicitly appears in a survey by Ollivier [22, Probl. T], and is therein attributed to Milman and Naor. The problem had remained open until very recently, when a negative answer was given by the second author [26]. Specifically, the latter used the notion of entropy for graph limits to prove that non-negative curvature and expansion are incompatible "at infinity", and the conclusion was then transferred to finite graphs using a compactness argument. A clear drawback of this approach is its non-quantitative nature. In particular, the second author asked for a direct, quantitative relation between volume, degree and expansion on non-negatively curved graphs. This is precisely the content of our first main result.
Theorem 1 (Expansion). -If G has non-negative curvature, then

Φ(G) ⩽ c √( (d log d) / log n ),

where n is the number of vertices, d the maximum degree, and c a universal constant.
In other words, large graphs cannot simultaneously enjoy non-negative curvature and uniform expansion unless their maximum degree grows at least like log n / log log n. We note that this is sharp up to the log log n correction. Indeed, a celebrated result of Alon and Roichman asserts that random Cayley graphs with logarithmic degrees have uniform expansion with high probability [1], and specializing this result to random Cayley graphs of Abelian groups produces examples of non-negatively curved graphs with logarithmic degrees and uniform expansion.
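As a toy illustration of the definitions above (ours, not part of the paper), the conductance of a small cycle can be computed by brute force over admissible sets; cutting out an arc of ⌊n/2⌋ vertices gives Φ ≈ 2/n, confirming the poor expansion of the cycle mentioned in the Introduction. The helper `conductance_cycle` is ours.

```python
from itertools import combinations

def conductance_cycle(n):
    # Phi(G) = min over A with deg(A) <= deg(V)/2 of |edge boundary(A)| / deg(A);
    # on the n-cycle every vertex has degree 2, so the constraint is |A| <= n/2.
    vertices = range(n)
    edges = [(i, (i + 1) % n) for i in range(n)]
    best = float("inf")
    for k in range(1, n // 2 + 1):
        for A in combinations(vertices, k):
            A = set(A)
            boundary = sum(1 for (u, v) in edges if (u in A) != (v in A))
            best = min(best, boundary / (2 * k))
    return best

phi10 = conductance_cycle(10)   # an arc of 5 vertices gives |dA| = 2, deg(A) = 10
```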
Mixing times. -Our second result is a complete determination of the order of magnitude of the mixing time of all vertex-transitive graphs with bounded degrees and non-negative curvature. Suppose that G is vertex-transitive, with degree d and volume n. Fix an arbitrary origin x ∈ V (the choice is irrelevant, by transitivity), and consider the lazy simple random walk on G started at x, i.e., the Markov chain (X_t)_{t⩾0} on V with initial condition X_0 = x and transition matrix (P + I)/2. The mixing time of G is a fundamental graph-theoretical parameter, defined as follows [13]:

t_mix := min { t ⩾ 0 : ∥P(X_t ∈ ·) − π∥_tv ⩽ 1/4 },

where π denotes the uniform distribution on V. An important, closely related quantity is the so-called relaxation time t_rel := (1 − λ₂)⁻¹, where 1 = λ₁ ⩾ λ₂ ⩾ ⋯ ⩾ λ_n denote the ordered eigenvalues of P. It is classical that t_mix ⩾ t_rel, and that this inequality can be off by a factor as large as log n, as is the case for expanders (see [13]).
Theorem 2 (Mixing times). -All vertex-transitive graphs with non-negative curvature satisfy t_mix ≍_d t_rel, where the notation a ≍_d b means that the ratio a/b is bounded from above and below by positive constants that depend only on the degree d.
This has the following remarkable consequence. For a sequence of graphs (G_n)_{n⩾1}, the condition t_rel(G_n) = o(t_mix(G_n)) as n → ∞ is known as the product condition. It is well-known to be necessary (see [13, Prop. 18.4]) for the occurrence of the so-called cutoff phenomenon, a celebrated but still mysterious phase transition in the approach to equilibrium of certain Markov chains (see [13, Chap. 18] for the precise definition). Thus, Theorem 2 implies that vertex-transitive graphs with fixed degree and non-negative curvature never exhibit cutoff. This stands in stark contrast with recent results due to the second author, showing that many non-negatively curved graphs with logarithmic degree do exhibit cutoff [25]. Interestingly, the conclusion of Theorem 2 is known to hold for fixed-degree Cayley graphs of moderate growth [6]. This geometric condition was later shown to be equivalent to the much simpler requirement that the diameter is algebraically large in the volume [5] (see the recent paper [28] for an extension to vertex-transitive graphs). This raises the following question. A positive answer would be surprisingly strong, but we have not been able to produce any counter-example.
Question 1 (Moderate growth?) -Do all non-negatively curved graphs with degree at most d satisfy diam(G) ⩾ n^{ε_d}, where n is the number of vertices, and ε_d > 0 a constant depending only on d?
Indeed, this question was answered affirmatively in the case of a modified Bakry-Émery curvature-dimension condition in [2], and later by the first author in the case of the weaker, unmodified Bakry-Émery curvature-dimension condition [17].
Diffusivity. -Finally, our last result concerns the speed of random walk on vertex-transitive graphs with non-negative curvature. Many infinite graphs such as the line Z are known to exhibit a diffusive behavior, in the sense that the typical graph distance between X_t and X_0 grows like √t. On a finite graph, the distance to the starting point can of course no longer grow indefinitely with time, but one may still hope for a diffusive behavior on appropriate time-scales. This vague statement was recently given a powerful rigorous content by Lee and Peres [12], who showed that the simple random walk on any finite vertex-transitive graph satisfies the diffusive lower-bound

E dist(X_0, X_t) ⩾ c √(t/d), for all 1 ⩽ t ⩽ t_rel.

The example of the cycle Z_n shows that this lower-bound is sharp. However, the authors conjectured that the time-scale on which the diffusive behavior remains valid should actually be much longer, namely, of order t_mix [12, Conj. 2.5]. Our next result confirms this prediction in the case of non-negatively curved graphs.
Theorem 3 (Diffusivity). -If G is vertex-transitive with non-negative curvature, then for all 1 ⩽ t ⩽ t_mix,

E dist(X_0, X_t) ⩾ c √(t/d),

where c is a universal constant.
J.É.P. -M., 2023, tome 10

We emphasize that our estimates are not restricted to simple random walks on graphs. Analogous results will be stated for general Markov chains with non-negative curvature. In particular, neither reversibility, nor even the symmetry of the support of P are actually required for a version of Theorem 1 to hold. Ollivier curvature with respect to a directed metric has been explored before in [30, 23, 24, 7]. However, the specific consequences of non-negative curvature seem, to the best of our knowledge, to be unexplored. Our general results are presented in Section 2 below, and are proved in Section 3.

Main results
In the remainder of the paper, we consider an arbitrary, irreducible stochastic matrix P on a finite state space V. A natural measure of the "distance" from a state x ∈ V to a state y ∈ V is the minimum number of transitions needed for the chain to move from x to y, namely

dist(x, y) := min { t ⩾ 0 : P^t(x, y) > 0 }.

This quantity is not necessarily symmetric, but it clearly satisfies the two other axioms of a distance. We may then use optimal transport to extend this notion to probability measures as follows: write P(V) for the set of probability measures on V, and define

W(µ, ν) := inf E[dist(X, Y)],

where the infimum runs over all possible random pairs (X, Y) whose marginals are µ and ν. Again, this quantity is not necessarily symmetric, but it always satisfies the two other axioms of a distance. Due to Kantorovich duality [29, Th. 5.10 & Particular Case 5.4], we can write

W(µ, ν) = max { µf − νf : f(x) − f(y) ⩽ dist(x, y) for all x, y ∈ V }.

Finally, we say that P has non-negative curvature if it is a contraction under W, i.e.,

(1) ∀µ, ν ∈ P(V), W(µP, νP) ⩽ W(µ, ν).

Ollivier curvature with a non-symmetric distance has been studied in [30, 23, 7, 24]. Due to Kantorovich duality and as in the introduction, non-negative curvature is equivalent to ∥P f∥_lip ⩽ ∥f∥_lip. By convexity, it is in fact sufficient to check property (1) on Dirac masses µ = δ_x, ν = δ_y, x, y ∈ V. Moreover, by the triangle inequality, we may further restrict our attention to the case where y is a neighbor of x (by which we mean that dist(x, y) = 1, and which we denote by y ∼ x), i.e.,

(2) ∀y ∼ x, W(P(x, ·), P(y, ·)) ⩽ 1.

This local condition is easily verified in practice. For example, it holds for random walks on Abelian groups and, more generally, random walks with a conjugacy-invariant support, as we now explain.
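The local condition (2) can be checked numerically by solving the transport problem as a small linear program. The sketch below (our own illustration, relying on `scipy.optimize.linprog`) verifies W(P(x, ·), P(y, ·)) ⩽ 1 for neighbouring states of the lazy walk on a cycle; the helpers `graph_dist_cycle` and `W1` are ours.

```python
import numpy as np
from scipy.optimize import linprog

def graph_dist_cycle(n):
    # dist(x, y) on the n-cycle (symmetric here, but the LP works for any dist).
    return np.array([[min((j - i) % n, (i - j) % n) for j in range(n)]
                     for i in range(n)])

def W1(mu, nu, dist):
    # Kantorovich LP: minimise E[dist(X, Y)] over couplings chi of (mu, nu).
    n = len(mu)
    c = dist.reshape(-1)                     # chi is flattened row-major
    A_eq, b_eq = [], []
    for i in range(n):                       # row marginals: sum_v chi(u, v) = mu(u)
        row = np.zeros(n * n); row[i * n:(i + 1) * n] = 1
        A_eq.append(row); b_eq.append(mu[i])
    for j in range(n):                       # column marginals: sum_u chi(u, v) = nu(v)
        col = np.zeros(n * n); col[j::n] = 1
        A_eq.append(col); b_eq.append(nu[j])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

n = 8
dist = graph_dist_cycle(n)
P = np.zeros((n, n))
for i in range(n):                           # lazy random walk on the cycle
    P[i, i] = 0.5
    P[i, (i + 1) % n] += 0.25
    P[i, (i - 1) % n] += 0.25

# local curvature condition (2): W(P(x, .), P(y, .)) <= 1 for neighbours x ~ y
w = W1(P[0], P[1], dist)
```

Here the translation coupling of Example 1 achieves cost exactly 1, and the LP confirms that no coupling does better or worse than dist(x, y) = 1 on this example.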
Example 1 (Random walks on groups). -Suppose that V is a group, and fix µ ∈ P(V). By definition, the random walk on V with increment distribution µ is the Markov chain whose transitions correspond to left-multiplication by a µ-distributed element, i.e., P(x, y) := µ(yx⁻¹). This chain has non-negative curvature as soon as the set S := {u ∈ V : µ(u) > 0} is conjugacy-invariant, i.e.,

(3) ∀z ∈ V, zSz⁻¹ = S.

Indeed, this assumption implies that dist(zx, zy) = dist(x, y) for all x, y, z ∈ V. In particular, if Z denotes a random variable with law µ, then the "obvious" coupling of P(x, ·) and P(y, ·) given by X := Zx and Y := Zy verifies (2). Note that the condition (3) trivially holds if the group is Abelian. An emblematic non-Abelian example is the transposition walk on the symmetric group [3].
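The distance invariance dist(zx, zy) = dist(x, y) underlying Example 1 can be verified directly on a small Abelian example. The sketch below (ours, not from the paper) computes the directed distance on Z_9 for the non-symmetric increment set {1, 3} by repeated relaxation, and checks translation invariance; the helper `directed_dist` is ours.

```python
def directed_dist(n, support):
    # dist(x, y) = minimum number of mu-increments needed to go from x to y in Z_n.
    INF = float("inf")
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for _ in range(n):  # Bellman-Ford style relaxation; every transition has length 1
        for x in range(n):
            for y in range(n):
                if dist[x][y] < INF:
                    for s in support:
                        z = (y + s) % n
                        dist[x][z] = min(dist[x][z], dist[x][y] + 1)
    return dist

n = 9
support = [1, 3]          # a non-symmetric increment set: the distance is directed
D = directed_dist(n, support)
invariant = all(D[(z + x) % n][(z + y) % n] == D[x][y]
                for x in range(n) for y in range(n) for z in range(n))
```

Note that D is genuinely asymmetric here (one step suffices from 0 to 1, but not back), yet it is still translation-invariant, which is all the "obvious" coupling needs.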
To avoid periodicity issues, we now assume that P is lazy, i.e., P(x, x) ⩾ 1/2 for all x ∈ V. This is more than enough to guarantee that the chain mixes, i.e.,

∀x ∈ V, P^t(x, ·) → π as t → ∞,

where π = πP denotes the unique invariant distribution. Quantifying the speed at which this convergence to equilibrium occurs is a fundamental question, with many applications [13, 16]. Formally, this amounts to estimating the so-called mixing time:

t_mix := min { t ⩾ 0 : max_{x∈V} ∥P^t(x, ·) − π∥_tv ⩽ 1/4 }.

Here ∥µ − ν∥_tv denotes the total-variation distance between µ, ν ∈ P(V), defined as

∥µ − ν∥_tv := max_{A⊆V} |µ(A) − ν(A)| = min P(X ≠ Y),

where the infimum in the last expression runs over all possible couplings (X, Y) of µ and ν. Thus, a natural way to estimate mixing times is to exhibit good couplings, and this is precisely where curvature comes into play. Indeed, an elementary but crucial reformulation of the non-negative curvature assumption (1) is that the trajectories (X_t)_{t⩾0} and (Y_t)_{t⩾0} emanating from any two states X_0 = x and Y_0 = y can be coupled in such a way that their distance t → dist(X_t, Y_t) forms a super-martingale.
When combined with an appropriate diffusive estimate for super-martingales, this observation turns out to imply the following O(1/ √ t) decay for the total-variation distance between the laws of X t and Y t .
Theorem 4 (Total-variation decay). -If P is lazy and non-negatively curved, then, for all x, y ∈ V and all t ⩾ 0,

∥P^t(x, ·) − P^t(y, ·)∥_tv ⩽ 10 dist(x, y) / √((t + 1) P_min),

where P_min denotes the smallest non-zero entry of P.
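As a sanity check (our own, on an example rather than a proof), one can verify on the lazy cycle that the total-variation distance between P^t(x, ·) and P^t(y, ·) is non-increasing in t and stays below a bound of the form 10 dist(x, y)/√((t + 1) P_min):

```python
import numpy as np

n = 30
P = np.zeros((n, n))
for i in range(n):                 # lazy simple random walk on the n-cycle
    P[i, i] = 0.5
    P[i, (i + 1) % n] += 0.25
    P[i, (i - 1) % n] += 0.25
P_min = 0.25                       # smallest non-zero entry of P

tvs, bound_ok = [], True
Pt = P.copy()
for t in range(1, 301):
    tv = 0.5 * np.abs(Pt[0] - Pt[1]).sum()       # x = 0, y = 1, dist(x, y) = 1
    tvs.append(tv)
    bound_ok = bound_ok and tv <= 10.0 / np.sqrt((t + 1) * P_min)
    Pt = Pt @ P

# data processing: applying P to both laws can only decrease their TV distance
monotone = all(tvs[i + 1] <= tvs[i] + 1e-12 for i in range(len(tvs) - 1))
```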
Variants of this result have appeared in a number of works, under various forms [4, 14, 18, 26]. However, all proofs use the fact that the increments of the process t → dist(X_t, Y_t) are uniformly bounded (by 2), and this property may dramatically fail in our more general setup where the metric is directed. Nevertheless, the conclusion turns out to remain valid, and a proof is presented in Section 3.3. The most "obvious" application of Theorem 4 consists in taking a maximum over all states x, y ∈ V to obtain the following mixing-time estimate, which is new in our directed setup.
Corollary 1 (Diameter bound). -If P is lazy and non-negatively curved, then

t_mix ⩽ 1600 diam² / P_min,

where diam := max_{x,y} dist(x, y) denotes the diameter of the state space.
While interesting in its own right, this estimate is actually not the key to the new results mentioned in the Introduction. Our main finding is that a significantly finer estimate can be deduced from Theorem 4, provided we replace the worst-case mixing time by its average version:

t♯_mix := min { t ⩾ 0 : ∑_{x∈V} π(x) ∥P^t(x, ·) − π∥_tv ⩽ 1/4 }.

Remark 1 (Transitive chains). -Obtaining a bound on t♯_mix rather than t_mix is not a huge drawback. For example, we have t♯_mix = t_mix for all random walks on groups and, more generally, for all transitive chains (P is transitive if for each x, y ∈ V, there is a bijection f : V → V which maps x to y and preserves the transition kernel, i.e., P(f(u), f(v)) = P(u, v) for all u, v ∈ V).
Throughout the paper, we let X = (X_t)_{t⩾0} denote a Markov chain with transition matrix P starting from stationarity (X_0 ∼ π). Our main new estimate on t♯_mix depends on two statistics of this chain. The first one is the mean displacement in t steps:

E dist(X_0, X_t) = ∑_{x,y∈V} π(x) P^t(x, y) dist(x, y).

The second is the escape probability in t steps, i.e., the conductance of P^t:

Φ(P^t) := min_{π(A)⩽1/2} (1/π(A)) ∑_{x∈A, y∉A} π(x) P^t(x, y).

Theorem 5 (Main estimate). -If P is lazy and non-negatively curved, then

(4) t♯_mix ⩽ inf_{t⩾1} { 1600 (E dist(X_0, X_t))² / (P_min Φ(P^t)²) },

where we recall that P_min denotes the smallest non-zero entry of P.
Theorem 5 has a number of notable consequences, which we now enumerate. The simplest one is an "average" version of Corollary 1, obtained by sending t → ∞ in the infimum (4):

(5) t♯_mix ⩽ 6400 (diam♯)² / P_min,

where diam♯ := ∑_{x,y∈V} π(x)π(y) dist(x, y) denotes the effective diameter. Note that the latter can be significantly smaller than the true diameter appearing in Corollary 1 (consider, e.g., the biased random walk on a segment). A much more refined consequence of Theorem 5 is obtained by taking t = 1 in the infimum (4): writing Φ = Φ(P), we readily obtain the following surprising bound.
Corollary 2 (Conductance bound). -If P is lazy and non-negatively curved, then

t♯_mix ⩽ 400 / (P_min Φ²).

This offers a considerable improvement over (5) in situations where the effective diameter diverges while the conductance remains bounded away from 0 (consider, e.g., random walk on a random Abelian Cayley graph with logarithmic degree). More importantly, by virtue of an elementary combinatorial lower-bound on t♯_mix (see, e.g., [13, §7.1.1]), Corollary 2 implies the quantitative non-existence of non-negatively curved expanders promised in Theorem 1. For general chains, we will show that t♯_mix can be bounded below by diam♯, leading to the following result.
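On the cycle, t♯_mix = t_mix grows like n² while Φ decays like 1/n, so a conductance bound of this shape predicts that t_mix · P_min · Φ² stays bounded along the family. The following sketch (ours, not from the paper) checks this numerically on small cycles; the helpers `conductance` and `mixing_time` are ours.

```python
import numpy as np
from itertools import combinations

def lazy_cycle(n):
    P = np.zeros((n, n))
    for i in range(n):
        P[i, i] = 0.5
        P[i, (i + 1) % n] += 0.25
        P[i, (i - 1) % n] += 0.25
    return P

def conductance(P):
    # Phi(P) = min over pi(A) <= 1/2 of sum_{x in A, y not in A} pi(x) P(x,y) / pi(A).
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)       # uniform = stationary (P is doubly stochastic)
    best = float("inf")
    for k in range(1, n // 2 + 1):
        for A in combinations(range(n), k):
            mask = np.zeros(n, dtype=bool); mask[list(A)] = True
            Q = (pi[mask, None] * P[mask][:, ~mask]).sum()
            best = min(best, Q / pi[mask].sum())
    return best

def mixing_time(P, eps=0.25):
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    mu = np.zeros(n); mu[0] = 1.0
    t = 0
    while 0.5 * np.abs(mu - pi).sum() > eps:
        mu = mu @ P; t += 1
    return t

# a bound of the form t_mix <= c / (P_min * Phi^2) predicts bounded products
products = []
for n in (8, 10, 12):
    P = lazy_cycle(n)
    products.append(mixing_time(P) * 0.25 * conductance(P) ** 2)
```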

Corollary 3 (Poor expansion). -If P is non-negatively curved, then

Φ ⩽ c / √(P_min diam♯),

where c is a universal constant.
Thus, non-negatively curved chains which are large (diam♯ ≫ 1) and sparse (P_min bounded away from 0) must have poor expansion (Φ ≪ 1). This constitutes a precise quantitative answer to the Markov-chain generalization of the question of Milman, Naor and Ollivier [22, Probl. T]. Note that there are examples of sparse chains with non-negative curvature and arbitrarily many states (consider, e.g., a biased random walk on a segment). However, the fact that their effective diameter is bounded forces their stationary measure to concentrate on a bounded number of states.
Corollary 2 is sharp in the important case where P is transitive, reversible and sparse. Indeed, we have the classical lower-bound t_mix ⩾ t_rel, where t_rel := (1 − λ₂)⁻¹ denotes the inverse spectral gap of P (see, e.g., [13]), and the first author established in [18] the Buser inequality t_rel ⩾ P_min / (12 Φ²) for any non-negatively curved, reversible chain. When combined with Corollary 2, this yields the following result, of which Theorem 2 is clearly a special case.

Corollary 4 (No cutoff). -Fix p ∈ (0, 1). Then, any lazy reversible transitive chain with non-negative curvature and P_min ⩾ p satisfies t_mix ≍_p t_rel, where the notation a ≍_p b means that the ratio a/b is bounded from above and below by positive constants that depend only on p. In particular, no family of such chains can exhibit cutoff.
An important observation here is that the transitivity of the chain is only used to ensure that t_mix = t♯_mix. Consequently, Corollary 4 extends to any collection of chains which are "spatially homogeneous" in the mild sense that t_mix ≍ t♯_mix. Finally, a last notable consequence of Theorem 5 is that the expected displacement of the chain over short time-scales is already substantial. More precisely, assuming that P is reversible, we have

Φ(P^t) ⩾ (1 − λ₂^t)/2 ⩾ (1 − e^{−t/t_rel})/2,

and the right-hand side is at least (1 − e⁻¹)/2 for all t ⩾ t_rel, yielding the following estimate.
Corollary 5 (Fast escape). -If P is lazy, reversible and non-negatively curved, then for all t ⩾ t_rel,

E dist(X_0, X_t) ⩾ c √(P_min t♯_mix),

where c is a universal constant, and where we recall that E dist(X_0, X_t) = ∑_{x,y} π(x) P^t(x, y) dist(x, y).
For reversible transitive chains, Lee and Peres [12] proved the diffusive lower-bound

E dist(X_0, X_t) ⩾ c √(P_min t),

for all t such that P_min⁻¹ ⩽ t ⩽ t_rel, where c > 0 is a universal constant. They conjectured that this diffusive lower-bound should remain valid until the mixing time [12, Conj. 2.5]. Corollary 5 readily implies that this is true in the non-negatively curved case, and Theorem 3 follows as a special case.
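The diffusive scaling E dist(X_0, X_t) ≈ c√t can be observed directly. The sketch below (our illustration, not from the paper) computes the mean displacement of the lazy walk on a large cycle at two times and checks that it roughly doubles when t quadruples; the helper `mean_displacement` is ours.

```python
import numpy as np

n = 400                            # large cycle: no wrap-around effects for t <= 100
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i + 1) % n] += 0.25
    P[i, (i - 1) % n] += 0.25

dist = np.array([min(j, n - j) for j in range(n)])   # dist(0, j) on the cycle

def mean_displacement(t):
    # E dist(X_0, X_t) started from 0 (equal to the stationary start, by transitivity)
    mu = np.linalg.matrix_power(P, t)[0]
    return float(mu @ dist)

r25, r100 = mean_displacement(25), mean_displacement(100)
ratio = r100 / r25                 # diffusive scaling predicts sqrt(100/25) = 2
```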

Proofs
Section 3.1 below is devoted to the proof of our main result, namely the relation between conductance, displacement and mixing times (Theorem 5). The latter exploits the diffusive total-variation decay of non-negatively curved chains (Theorem 4), which will be proved independently in Section 3.3. Once Theorem 5 is established, all announced corollaries follow effortlessly, except for Corollary 3: the latter requires a lower bound on the average mixing time in terms of the effective diameter, which we prove in Section 3.2.

3.1. Mixing time vs. conductance. -In this section, we prove Theorem 5. We will make crucial use of Theorem 4, as well as the following L¹ version of Cheeger's inequality. An important remark is that the latter holds without any assumption on the transition matrix P, and will thus also apply to powers of P.

Lemma 1 (L¹ Cheeger inequality). -For any f : V → R with πf = 0, we have

Φ(P) ∑_{x∈V} π(x) |f(x)| ⩽ ∑_{x,y∈V} π(x) P(x, y) |f(x) − f(y)|.
Proof. -Upon replacing f with −f, we may assume that π(f ⩾ 0) ⩾ 1/2. For any t ⩾ 0, we may take A = {f ⩾ t} in the definition of Φ(P) to obtain

Φ(P) π(f ⩾ t) ⩽ ∑_{x,y∈V} π(x) P(x, y) 1_{f(x) ⩾ t > f(y)}.

Integrating over t ∈ R₊ and interchanging the sum and integral, we obtain

Φ(P) π(f₊) ⩽ ∑_{x,y∈V} π(x) P(x, y) (f(x) − f(y))₊,

where a₊ := max(0, a) denotes the positive part of a. Now, since f is centered under π, the left-hand side does not change if we replace f₊(x) by |f(x)|/2. Similarly, since any gradient is centered under the measure (x, y) → π(x)P(x, y), the right-hand side does not change if we replace (f(x) − f(y))₊ by |f(x) − f(y)|/2. This proves the claimed inequality for any centered observable f : V → R. □

Proof of Theorem 5. -Fix s ⩾ 1, t ∈ N and z ∈ V, and let us apply the above inequality to the transition matrix P^s and the observable f(x) := P^t(x, z) − π(z), which is centered because πP^t = π. We readily obtain

Φ(P^s) ∑_{x∈V} π(x) |P^t(x, z) − π(z)| ⩽ ∑_{x,y∈V} π(x) P^s(x, y) |P^t(x, z) − P^t(y, z)|.
We may now sum over all z ∈ V and use Theorem 4 to get

Φ(P^s) ∑_{x∈V} π(x) ∥P^t(x, ·) − π∥_tv ⩽ 10 E dist(X_0, X_s) / √((t + 1) P_min).

Finally, choosing t so that the right-hand side is smaller than Φ(P^s)/4 shows that

t♯_mix ⩽ 1600 (E dist(X_0, X_s))² / (P_min Φ(P^s)²).

The result follows by taking an infimum over all s ⩾ 1. □

3.2. Effective diameter vs. conductance. -Here we prove Corollary 3. We will use the following concentration inequality.
Lemma 2 (Concentration inequality). -For any f : V → R and a > 0,

π(f ⩾ πf + a) ⩽ (1/(aΦ)) min { max_{P(x,y)>0} (f(x) − f(y)), max_{P(x,y)>0} (f(y) − f(x)) },

and the same holds for π(f ⩽ πf − a).
We remark that if "∼" is symmetric, then the minimum is the Lipschitz constant of f .
Proof. -Upon replacing f by f − πf if necessary, we may assume that f is centered under π, i.e., πf = 0. Now, we use Markov's inequality and Lemma 1 to write

π(f ⩾ a) ⩽ π(f₊)/a = π(|f|)/(2a) ⩽ (1/(2aΦ)) ∑_{x,y∈V} π(x) P(x, y) |f(x) − f(y)|.

Note that, since the function (x, y) → f(x) − f(y) is centered under the measure (x, y) → π(x)P(x, y), the integral of its positive part equals that of its negative part. This establishes the first claim, and the second is obtained by replacing f with −f. □

We will use this lemma to prove the following lower-bound on the mixing time.
Lemma 3 (Diameter lower-bound). -For any lazy chain P, we have

(6) t♯_mix ⩾ diam♯ − 8/Φ.

Proof. -Let us first note that for a lazy chain, Lemma 2 holds with the better constant 2Φ instead of Φ (just apply the lemma to the non-lazy chain 2P − I). Now, fix x ∈ V and t ∈ N, and write B_x(t) := {y ∈ V : dist(x, y) ⩽ t}. By definition, P^t(x, ·) is supported on B_x(t), so that

∥P^t(x, ·) − π∥_tv ⩾ P(Y ∉ B_x(t)),

where Y denotes a π-distributed random variable. Averaging over x ∈ V, we obtain a lower bound on the average total-variation distance at time t. Now, for fixed x ∈ V, the function f : y → dist(x, y) satisfies f(z) ⩽ f(y) + 1 whenever z ∼ y, by the triangle inequality. Thus, Lemma 2 ensures that π(B_x(t)) is small as long as t lies sufficiently far below the mean of f. On the other hand, by the triangle inequality again, the function f : x → ∑_y π(y) dist(x, y) is Lipschitz along the transitions of the chain, so that Lemma 2 forces it to concentrate around its mean diam♯. Combining those two estimates readily yields (6). □

Recall that

W(P(x, ·), P(y, ·)) = inf E[dist(X, Y)],

where the infimum runs over all probability distributions χ ∈ P(V²) with marginals P(x, ·) and P(y, ·). Minimizers are called optimal couplings. As in [4, 18], our first task consists in showing that they can be chosen so as to assign a "decent" probability to the "good" set

Γ := { (u, v) ∈ V² : dist(u, v) ≠ dist(x, y) }.

Lemma 4 (Good optimal couplings). -If P is lazy and x ≠ y, then there is an optimal coupling χ of P(x, ·), P(y, ·) such that χ(Γ) ⩾ P_min.
Proof. -Our starting point is the following easily verified inequality: for all x ∈ R,

e^{−x} ⩾ 1 − x + (1 ∧ x²)/4.

In particular, for all t ∈ N and λ ∈ [0, 1], we have on the event {τ > t},
By the Markov property, the first condition implies that the process Z := (Z_t)_{t⩾0} defined by Z_t := dist(X_t, Y_t) is a super-martingale with respect to the natural filtration (F_t)_{t⩾0} of (X_t, Y_t)_{t⩾0}, and the second implies that P(Z_{t+1} ≠ Z_t | F_t) ⩾ P_min 1_{Z_t ≠ 0}. Thus, Lemma 5 applies and yields

P_{x,y}(X_t ≠ Y_t) ⩽ 10 dist(x, y) / √((t + 1) P_min).
where X is π-distributed and independent of Y. The claim follows if we can show that the right-hand side is suitably small.

Proof of Corollary 3. -If P is lazy and non-negatively curved, then Corollary 2 and Lemma 3 give

diam♯ ⩽ t♯_mix + 8/Φ ⩽ c / (P_min Φ²),

because P_min, Φ ⩽ 1/2. If P is not lazy, we apply the above result to (P + I)/2. The latter is still non-negatively curved, but its conductance and minimal entry are half those of P, so we lose a factor of 8 and obtain the claimed bound. □

3.3. Diffusive total-variation decay. -In this section, we prove Theorem 4. Fix two distinct states x ≠ y, and recall that

E[e^{−(λ/2)(Z_{t+1} − Z_t)} | F_t] ⩾ 1 − (λ/2) E[Z_{t+1} − Z_t | F_t] + (1/4) E[1 ∧ ((λ/2)(Z_{t+1} − Z_t))² | F_t],

where the second line uses our assumptions on Z. It inductively follows that, for t ∈ N, the quantity E[e^{−(λ/2)Z_{t∧τ}}] grows at least geometrically in the number of steps performed before τ. Since Z_{t∧τ} − Z_0 ⩾ −Z_0 = −z_0, we deduce that