A NEW INEQUALITY ABOUT MATRIX PRODUCTS AND A BERGER-WANG FORMULA

Abstract. — We prove an inequality relating the norm of a product of matrices A_n ⋯ A_1 with the spectral radii of the subproducts A_j ⋯ A_i with 1 ≤ i ≤ j ≤ n. Among the consequences of this inequality, we obtain the classical Berger-Wang formula as an immediate corollary, and give an easier proof of a characterization of the upper Lyapunov exponent due to I. Morris. As the main ingredient of the proof, we show that for large enough n, the product A_n ⋯ A_1 is zero under the hypothesis that A_j ⋯ A_i is nilpotent for all i, j such that 1 ≤ i ≤ j ≤ n.


Introduction
Let k be a field, and let M_d(k) be the algebra of d × d matrices with coefficients in k. If k = R or C, let ‖·‖ be any norm on k^d, with the corresponding operator norm on M_d(k) also denoted by ‖·‖. The spectral radius of a matrix A will be denoted by ρ(A). Given a bounded set M ⊂ M_d(k), the joint spectral radius of M is defined by the formula

(1)   R(M) = lim_{n→∞} sup { ‖A_n ⋯ A_1‖^{1/n} : A_1, …, A_n ∈ M }.

By a submultiplicativity argument, this quantity is well defined and finite, and the limit in the right hand side of (1) can be replaced by the infimum over n.
The joint spectral radius was introduced by Rota and Strang [26], and for a set M ⊂ M_d(k) it represents the maximal exponential growth rate of the sequence of partial products (A_1 ⋯ A_n)_n of sequences of matrices A_1, A_2, … with A_i ∈ M. For this reason, this quantity has appeared in several mathematical contexts, making it an important object of study (see e.g. [13,14,22,29]). In particular, the question of whether the joint spectral radius may be approximated by periodic sequences plays an important role. The Berger-Wang formula gives a positive answer to this question in the case of bounded sets of matrices [2]:

Theorem 1.1 (Berger-Wang formula). — For every bounded set M ⊂ M_d(k),

(2)   R(M) = lim sup_{n→∞} sup { ρ(A_n ⋯ A_1)^{1/n} : A_1, …, A_n ∈ M }.

This result has been generalized by Morris to the context of linear cocycles (including infinite dimensional ones) [21], using multiplicative ergodic theory. In the finite dimensional case, the problem of finding a formula similar to (2) when there is a Markov-type constraint on the allowed products was presented by Kozyakin [16]. Although the result of Morris already applies to this kind of constraint, the novelty in Kozyakin's proof is that his arguments are purely linear algebraic, and are consequences of Theorem 1.1.
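As a quick numerical illustration (ours, not part of the argument), for a finite set of matrices one can compare the norm-based estimates coming from (1) with the spectral-radius-based estimates coming from (2). The pair below is a classical example for which the alternating product AB already gives a good periodic lower bound:

```python
import itertools
import numpy as np

def jsr_estimates(mats, n):
    """Crude length-n estimates for the joint spectral radius of a finite set.

    Returns (norm_est, rho_est): the max over all products P of length n of
    ||P||^(1/n) and of rho(P)^(1/n).  By submultiplicativity the first
    over-estimates R(M); by the Berger-Wang formula the second approximates
    R(M) from below along a subsequence of lengths.
    """
    norm_est, rho_est = 0.0, 0.0
    for word in itertools.product(mats, repeat=n):
        P = np.linalg.multi_dot(word) if n > 1 else word[0]
        norm_est = max(norm_est, np.linalg.norm(P, 2) ** (1.0 / n))
        rho_est = max(rho_est, max(abs(np.linalg.eigvals(P))) ** (1.0 / n))
    return norm_est, rho_est

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [1.0, 1.0]])
hi, lo = jsr_estimates([A, B], 6)
# rho(P) <= ||P|| for every product P, so the periodic estimate is below
# the norm estimate; for this pair the word (AB)^3 already gives lo > 1.6.
assert 1.6 < lo <= hi + 1e-9
```

Here `jsr_estimates` is a name of ours; the brute-force enumeration is exponential in n and is meant only to make definitions (1) and (2) concrete.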
Another tool to obtain results related to the joint spectral radius was found by Bochi in [4]. In that work, he proved some inequalities that may be seen as lower bounds for joint spectral radii of sets of matrices in terms of the norms of such matrices. Following that method, the purpose of this article is to present an inequality relating the norm of a product of matrices with the spectral radii of its subproducts. We will give an upper bound for the norm of the product A_N ⋯ A_1 in terms of the spectral radii of the subproducts A_β A_{β−1} ⋯ A_{α+1} A_α. This inequality will allow us to obtain relations similar to (2). It holds in an arbitrary local field where the notions of absolute value, norm, and spectral radius are well defined (see Section 4 for a detailed explanation). Our main result is the following:

Theorem 1.2. — There exist N = N(d) ∈ N, r = r(d) ∈ N and C = C(d, ‖·‖) > 1 such that for all n ≥ N and A_1, …, A_n ∈ M_d(k):

(3)   ‖A_n ⋯ A_1‖ ≤ C (‖A_1‖ ⋯ ‖A_n‖) max_{1 ≤ α ≤ β ≤ n} ( ρ(A_β ⋯ A_α) / (‖A_α‖ ⋯ ‖A_β‖) )^{1/r},

where the right hand side is treated as zero if one of the A_i is the zero matrix.
So for large enough n, if the norm of the product A_n ⋯ A_1 is comparable to (that is, not much smaller than) the product of the norms ‖A_1‖ ⋯ ‖A_n‖, then there exists a subproduct A_β ⋯ A_α whose spectral radius is comparable to (that is, not much smaller than) the product ‖A_α‖ ⋯ ‖A_β‖. In addition, when k = C, the constant C in (3) may be chosen independent of the norm ‖·‖ and found explicitly, provided that ‖·‖ is an operator norm (see Proposition 4.2 and Remark 4.5).
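This reading can be made concrete numerically. The sketch below (ours; it does not compute the constants C and r of (3)) evaluates, for the spectral norm, the two competing ratios: the normalized norm of the full product, and the best normalized spectral radius over all subproducts:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def norm_vs_spectral_data(mats):
    """For A_1, ..., A_n, compare ||A_n ... A_1|| / (||A_1|| ... ||A_n||)
    with max over subproducts of rho(A_beta ... A_alpha) normalized by
    the product of the corresponding norms (spectral norm throughout)."""
    n = len(mats)
    norms = [np.linalg.norm(A, 2) for A in mats]
    P = np.linalg.multi_dot(list(reversed(mats))) if n > 1 else mats[0]
    q = np.linalg.norm(P, 2) / np.prod(norms)
    s = 0.0
    for a, b in itertools.combinations_with_replacement(range(n), 2):
        sub = mats[a:b + 1]
        S = np.linalg.multi_dot(list(reversed(sub))) if len(sub) > 1 else sub[0]
        rho = max(abs(np.linalg.eigvals(S)))
        s = max(s, rho / np.prod(norms[a:b + 1]))
    return q, s

mats = [rng.standard_normal((2, 2)) for _ in range(5)]
q, s = norm_vs_spectral_data(mats)
# Both ratios lie in [0, 1], since rho(A) <= ||A|| and ||AB|| <= ||A|| ||B||;
# inequality (3) says q is controlled by a fixed power of s, up to C.
assert 0.0 <= q <= 1.0 + 1e-9 and 0.0 <= s <= 1.0 + 1e-9
```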
The approach of using inequalities to prove results similar to (2) was first used by Elsner [9] in his proof of the Berger-Wang formula (with an inequality of a different nature from Bochi's work). Inequalities like (3) have also been applied by I. Morris to study matrix pressure functions [23] and by the author in the context of isometries of Gromov hyperbolic spaces [24]. The novelty of the inequality presented here is that it respects the order in which the matrices are multiplied. While previous works considered a sum or a maximum over all possible subproducts of length N with respect to a given alphabet of matrices, in Theorem 1.2 we consider just one product of length N together with its subproducts, hence our inequality does not follow from previously known Bochi-type results. In addition, our error in the upper bound in terms of spectral radii is multiplicative (the constant C) and not additive as in the case of Elsner's work. These distinctions allow inequality (3) to be used in cases where only some specific kinds of products are allowed (see Theorem 1.4 below), as well as to relate asymptotic quantities (like the joint spectral radius) to non-asymptotic expressions, in a uniformly controlled way (see Theorem 4.4).
The proof of this inequality is based on the non-trivial case of equality, where the right hand side of (3) is zero but the matrices A_i are non-zero. This occurs when all the subproducts A_β ⋯ A_α are nilpotent. The particular case of (3) that we highlighted can be restated as follows:

Theorem 1.3. — There exists N = N(d) ∈ N such that if A_1, …, A_N ∈ M_d(k) and A_β ⋯ A_α is nilpotent for all 1 ≤ α ≤ β ≤ N, then the product A_N ⋯ A_1 is zero.
The proof of Theorem 1.3 is purely linear algebraic, exploiting the properties of the n-th exterior power functor. This result may be compared with Levitzki's Theorem [25, Th. 2.1.7], which asserts that for an algebraically closed field k, every semigroup S ⊂ M_d(k) of nilpotent matrices is simultaneously triangularizable. That is, there is some B ∈ GL_d(k) such that BAB^{−1} is upper triangular with zero diagonal for every A ∈ S (compare also with the Burnside-Schur Theorem for semigroups of matrices [20]). In particular, if A_1, …, A_d ∈ S, then the product A_1 ⋯ A_d is zero. As we show in Section 2.1, the optimal N(d) in Theorem 1.3 is in general larger than d, so the result presented here follows neither from Levitzki's Theorem nor from the Burnside-Schur Theorem, and we do not expect to obtain any information about the semigroup generated by A_1, …, A_N. In general, the matrices satisfying the hypothesis of Theorem 1.3 admit no normal form as simple as in Levitzki's Theorem.
Applications to Ergodic theory. — Let (X, F, µ) be a probability space, and let T : X → X be a measure preserving map. By a linear cocycle over X, we mean a measurable map A : X → M_d(k) together with the family of maps A^n defined by the formula

A^n(x) = A(T^{n−1}x) ⋯ A(Tx) A(x).

These maps satisfy the multiplicative cocycle relation A^{n+m}(x) = A^n(T^m x) A^m(x). We usually denote a linear cocycle by A = (X, T, A), and say that A is integrable if max(log ‖A‖, 0) is integrable. In this case, Kingman's theorem implies that, for µ-almost all x ∈ X, the limit

λ(x) = lim_{n→∞} (1/n) log ‖A^n(x)‖ ∈ [−∞, ∞)

exists, and moreover, λ is T-invariant. This function is the upper Lyapunov exponent of A, and is one of the most important concepts in multiplicative ergodic theory.
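To fix ideas, here is a minimal sketch (ours; the rotation T and the generator A below are arbitrary illustrative choices) of the maps A^n, together with a check of the multiplicative cocycle relation:

```python
import numpy as np

# A toy linear cocycle over the circle rotation T(x) = x + alpha (mod 1).
alpha = 0.5 ** 0.5            # an irrational rotation number

def T(x):
    return (x + alpha) % 1.0

def A(x):
    """Generator of the cocycle: a matrix depending measurably on x."""
    return np.array([[1.0 + x, 1.0], [0.0, 1.0 / (1.0 + x)]])

def A_n(n, x):
    """A^n(x) = A(T^{n-1} x) ... A(T x) A(x)."""
    P = np.eye(2)
    for _ in range(n):
        P = A(x) @ P
        x = T(x)
    return P

# Check the multiplicative cocycle relation A^{n+m}(x) = A^n(T^m x) A^m(x).
x0, n, m = 0.3, 5, 7
xm = x0
for _ in range(m):
    xm = T(xm)                # xm = T^m(x0)
lhs = A_n(n + m, x0)
rhs = A_n(n, xm) @ A_n(m, x0)
assert np.allclose(lhs, rhs)
```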
As an application of our inequality, we reprove the following theorem due to I. Morris [21, Th. 1.6] (first tested numerically in [11] and proved by Avila-Bochi for SL_2(R) in [1, Th. 15]):

Theorem 1.4. — Let T be a measure-preserving transformation of a probability space (X, F, µ) and let A : X → M_d(k) be an integrable linear cocycle. If λ is as before, then for µ-almost all x ∈ X we have

(4)   lim sup_{n→∞} (1/n) log ρ(A^n(x)) = λ(x).
While Morris's proof of this result relies on Oseledets Theorem, we will mainly use Theorem 1.2 and a quantitative version of Poincaré's Recurrence Theorem.
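Theorem 1.4 is easy to observe numerically. The sketch below (ours; an i.i.d. random product, which fits the setting by taking T to be the shift) compares log ρ(A^n(x))/n with log ‖A^n(x)‖/n along one long orbit, rescaling the running product at each step to avoid overflow:

```python
import numpy as np

rng = np.random.default_rng(3)

# Three fixed random generators; at each step one is applied at random.
B = [rng.standard_normal((2, 2)) for _ in range(3)]

P, log_scale, n = np.eye(2), 0.0, 3000
for _ in range(n):
    P = B[rng.integers(3)] @ P
    s = np.linalg.norm(P, 2)
    P, log_scale = P / s, log_scale + np.log(s)   # rescale, remember scale

# After the loop ||P|| = 1, so log||A^n(x)|| = log_scale, while
# rho(A^n(x)) = exp(log_scale) * rho(P).
log_norm = log_scale / n
log_rho = (log_scale + np.log(max(abs(np.linalg.eigvals(P))))) / n
# Both normalized quantities approximate the Lyapunov exponent lambda.
assert abs(log_norm - log_rho) < 0.05
```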
Organization of the paper. -In Section 2 we prove Theorem 1.3 and compute N (d) for d = 2, 3. Then in Section 3, via Nullstellensatz we translate this theorem into a polynomial identity, from which we deduce Theorem 1.2 in Section 4. We prove Theorem 1.4 in Section 5, and discuss some geometric consequences and analogies of this result in Section 6.
Acknowledgment. -I am very grateful to J. Bochi for very interesting and valuable discussions throughout all this work. I also thank G. Urzúa for valuable discussions about Nullstellensatz, and the referee for the detailed report and the suggestions and corrections to the text.

Proof of Theorem 1.3
We begin the proof of Theorem 1.3 with some useful results. For a given vector space V (over an arbitrary field), let End(V) be the algebra of linear endomorphisms of V. The dimension of the image of a linear transformation T ∈ End(V) will be denoted by rank(T). Also, let N_n(V) be the set of n-tuples (T_1, …, T_n) ∈ End(V)^n such that T_j ⋯ T_i is nilpotent for all 1 ≤ i ≤ j ≤ n. With our previous notation, we have N_n(k^d) = N_n^d(k).

Proposition. — Let (T_1, …, T_n) ∈ N_n(V) be such that rank(T_j) ≤ 1 for all 1 ≤ j ≤ n, and let v ∈ V be such that T_n ⋯ T_1 v ≠ 0. Then the vectors v, T_1 v, T_2 T_1 v, …, T_n ⋯ T_1 v are all distinct and form a linearly independent set.
Proof. — We proceed by induction on n. The case n = 1 follows from the nilpotence of T_1. So, assume that the result holds for tuples in N_{n−1}(V), and let (T_1, …, T_n) ∈ N_n(V) and v ∈ V be as in the hypothesis. Take a linear combination of v, T_1 v, …, T_n ⋯ T_1 v of the form

(5)   λ_0 v + λ_1 T_1 v + ⋯ + λ_n T_n ⋯ T_1 v = 0,

and suppose that this linear combination is non-trivial. As (T_1, …, T_{n−1}) ∈ N_{n−1}(V) also satisfies the hypothesis with respect to v, by our inductive assumption we have λ_n = 0. Now, apply T_n ⋯ T_1 to (5). The rank condition on the maps T_j and the fact that (T_1, …, T_n) ∈ N_n(V) imply that (T_j ⋯ T_1)² = 0 for all 1 ≤ j ≤ n. Hence, the left hand side of (5) becomes λ_0 T_n ⋯ T_1 v, forcing λ_0 = 0. But in that case, equation (5) would be a non-trivial linear combination of {w, T_2 w, T_3 T_2 w, …, T_n ⋯ T_2 w}, with w = T_1 v. This is impossible by our inductive assumption, since (T_2, …, T_n) ∈ N_{n−1}(V) satisfies the hypothesis of the proposition with respect to w. We conclude that all linear combinations of the form (5) are trivial, and hence this set is linearly independent with exactly n + 1 elements.
Proof. — Assume the contrary, and let v ∈ V be such that T_d ⋯ T_1 v ≠ 0. Then by the previous proposition, the d + 1 vectors v, T_1 v, …, T_d ⋯ T_1 v form a linearly independent subset of the d-dimensional space V, which is impossible.

For the next steps in our proof we need some facts about exterior powers. Recall that if V is a vector space of dimension d, the r-fold exterior power Λ^r V is the vector space of alternating r-linear forms on the dual space V* (see e.g. [18, XIX.1]). The exterior power also induces a map Λ^r : End(V) → End(Λ^r V) given by the linear extension of (Λ^r T)(w_1 ∧ ⋯ ∧ w_r) = T w_1 ∧ ⋯ ∧ T w_r. This map is functorial: the relation Λ^r(ST) = Λ^r(S) Λ^r(T) holds for all S, T ∈ End(V). Another important fact is that, when T ∈ N(V) and rank(T) = r > 0, then rank(Λ^r T) = 1. This is because the image of Λ^r T is generated by any r-form associated to the r-dimensional subspace T(V). This remark is crucial at the end of our proof.
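Both facts just recalled can be checked numerically via the matrix of Λ^r T in the standard basis of Λ^r V, whose entries are the r × r minors of T (the r-th compound matrix). A sketch (ours):

```python
import itertools
import numpy as np

def exterior_power(T, r):
    """Matrix of Lambda^r T in the basis e_I = e_{i1} ^ ... ^ e_{ir}:
    its entries are the r x r minors of T."""
    d = T.shape[0]
    idx = list(itertools.combinations(range(d), r))
    return np.array([[np.linalg.det(T[np.ix_(I, J)]) for J in idx]
                     for I in idx])

rng = np.random.default_rng(4)

# Functoriality Lambda^r(S T) = Lambda^r(S) Lambda^r(T) (Cauchy-Binet).
S, T = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
assert np.allclose(exterior_power(S @ T, 2),
                   exterior_power(S, 2) @ exterior_power(T, 2))

# If rank(T) = r > 0, then rank(Lambda^r T) = 1: the image of Lambda^r T
# is spanned by the r-form attached to the r-plane T(V).
U = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 4))   # rank 2
assert np.linalg.matrix_rank(exterior_power(U, 2)) == 1
```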
Proof of Theorem 1.3. — By our inductive hypothesis, we obtain rank(T̃_j) ≤ d − ℓ − 1. So, we are in the assumption of 2.3 with r = d − ℓ − 1, and we conclude that rank(T_{r(ℓ+1)} ⋯ T_1) = rank(T̃_{(d−ℓ−1)+1} ⋯ T̃_1) < d − ℓ − 1. This proves the claim and concludes the proof of the theorem.
In particular, we conclude that N(2) = 2, and for higher dimensions we get the bounds 3 ≤ N(3) ≤ 9 and 4 ≤ N(4) ≤ 96. We end this section by finding a better bound for N(3).
To prove this, we need a lemma:

A polynomial identity
For the proof of Theorem 1.2 we need some notation. Let k be a field with algebraic closure k̄, and let R_{d,N} = k[x_{i,j} : 1 ≤ i ≤ N, 1 ≤ j ≤ d²]. We identify an N-tuple of matrices (A_1, …, A_N) ∈ M_d(k)^N with the point ((a_{i,j})_{i,j}) ∈ k^{d²N}, where (a_{i,j})_j are the coefficients of A_i in some fixed order.
We say that a monomial in R_{d,N} is multihomogeneous of multidegree λ = (λ_1, …, λ_N) if it is of the form c ∏_{i,j} x_{i,j}^{u_{i,j}}, where c ∈ k, u_{i,j} ≥ 0 and Σ_j u_{i,j} = λ_i for all 1 ≤ i ≤ N, and that a polynomial p ∈ R_{d,N} is multihomogeneous of multidegree mdeg p if it is a finite sum of multihomogeneous monomials of multidegree mdeg p. This is equivalent to saying that, for each 1 ≤ i ≤ N, p is homogeneous of degree λ_i in the variables x_{i,1}, …, x_{i,d²}. We have the direct sum decomposition

(6)   R_{d,N} = ⊕_λ R_{d,N,λ},

where R_{d,N,λ} denotes the vector space of multihomogeneous polynomials of multidegree λ. For 1 ≤ j ≤ d², let f_j ∈ R_{d,N} be the polynomial corresponding to the j-th entry of A_N ⋯ A_1. Also, for 1 ≤ ℓ ≤ d and 1 ≤ α ≤ β ≤ N, let T^{α,β}_ℓ ∈ R_{d,N} be the polynomial that represents the map sending (A_1, …, A_N) to the ℓ-th symmetric function of the eigenvalues of A_β ⋯ A_α. It is not hard to see that the f_j are multihomogeneous of multidegree (1, 1, …, 1, 1) and that T^{α,β}_ℓ is multihomogeneous of multidegree (0, …, 0, ℓ, …, ℓ, 0, …, 0), with the ℓ's in positions α, α + 1, …, β.
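For α = 1, β = N, the values of the T^{α,β}_ℓ and their multihomogeneity can be observed numerically, since they are, up to sign, the coefficients of the characteristic polynomial of A_N ⋯ A_1. A sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(5)

def T_coeffs(mats):
    """Values of T_l(A_1, ..., A_N) for alpha = 1, beta = N: up to sign,
    the coefficients of the characteristic polynomial of A_N ... A_1,
    i.e. the elementary symmetric polynomials of its eigenvalues."""
    P = np.linalg.multi_dot(list(reversed(mats)))
    return np.poly(P)[1:]        # drop the leading coefficient 1

mats = [rng.standard_normal((3, 3)) for _ in range(2)]
c = T_coeffs(mats)

# Multihomogeneity: scaling A_1 by t scales T_l by t^l (l = 1, 2, 3).
t = 2.0
scaled = T_coeffs([t * mats[0], mats[1]])
expected = [c[ell] * t ** (ell + 1) for ell in range(3)]
assert np.allclose(scaled, expected)
```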
Our purpose is to prove the following:

Theorem 3.1. — There exists r = r(d) ∈ N such that for all 1 ≤ j ≤ d² there exist multihomogeneous polynomials p^{α,β}_{j,ℓ} ∈ R_{d,N} of multidegree r · mdeg f_j − mdeg T^{α,β}_ℓ ∈ N^N such that

(7)   f_j^r = Σ_{1 ≤ α ≤ β ≤ N} Σ_{ℓ=1}^{d} p^{α,β}_{j,ℓ} T^{α,β}_ℓ

for all j. Finally, by the direct sum decomposition (6) and by comparing multidegrees, we may assume that the p^{α,β}_{j,ℓ} are multihomogeneous of multidegree r · mdeg f_j − mdeg T^{α,β}_ℓ.

Proof of Theorem 1.2
Theorem 3.1 is the fundamental relation that we will need to prove inequality (3). From now on we will assume that k is a local field, that is, a field together with an absolute value |·| : k → R_+ that induces a non-discrete locally compact topology on k via the induced metric. Examples include R and C with the standard absolute values, and the fields of p-adic numbers Q_p for p prime. For more information about local fields, see [19].
We will work on the finite dimensional vector space k^d, where k is a local field with absolute value |·|. In this situation, we consider the norm on M_d(k) given by ‖A‖₀ = max_{1 ≤ j ≤ d²} |a_j|, where the a_j are the entries of A. Since the absolute value on k extends in a unique way to an absolute value on k̄ (see Lang's Algebra [18, XII.2, Prop. 2.5]), the spectral radius of a matrix A ∈ M_d(k) is then defined in the usual way. The height h(f) of a polynomial f ∈ k[y_1, …, y_m] is defined as the logarithm of the maximum modulus of its coefficients.
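Over k = R, two basic properties of ‖·‖₀ used later in this section, namely ‖AB‖₀ ≤ d ‖A‖₀ ‖B‖₀ and (via Gelfand's formula) ρ(A) ≤ d ‖A‖₀, admit a quick numerical sanity check. A sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(6)

def norm0(A):
    """||A||_0 = maximum of the absolute values of the entries of A."""
    return np.max(np.abs(A))

d = 4
for _ in range(100):
    A, B = rng.standard_normal((d, d)), rng.standard_normal((d, d))
    # ||AB||_0 <= d ||A||_0 ||B||_0: each entry of AB is a sum of d terms.
    assert norm0(A @ B) <= d * norm0(A) * norm0(B) + 1e-12
    # rho(A) <= d ||A||_0, consistent with ||A^n||_0 <= d^{n-1} ||A||_0^n.
    rho = max(abs(np.linalg.eigvals(A)))
    assert rho <= d * norm0(A) + 1e-12
```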
We begin with a lemma.
The lemma then follows by noting that a multihomogeneous polynomial of multidegree (λ 1 , . . .
Proof of Theorem 1.2. — Let N = N(d) and r > 1 be given by Theorems 1.3 and 3.1 respectively, and consider first n = N and the norm ‖·‖₀. First, note that for 1 ≤ α ≤ β ≤ N and 1 ≤ ℓ ≤ d, T^{α,β}_ℓ(A_1, …, A_N) is the ℓ-th symmetric polynomial evaluated at the eigenvalues of A_β ⋯ A_α. Hence |T^{α,β}_ℓ(A_1, …, A_N)| is at most the binomial coefficient (d over ℓ) times ρ(A_β ⋯ A_α)^ℓ. Also, as the polynomials p^{α,β}_{j,ℓ} in the statement of Theorem 3.1 have multidegree (r, …, r, r − ℓ, …, r − ℓ, r, …, r), Lemma 4.1 gives the corresponding bound for all j, α, β, ℓ. Thus, from (7) we obtain the following:

Now, let
An easy computation shows that ‖AB‖₀ ≤ d ‖A‖₀ ‖B‖₀ for all A, B ∈ M_d(k). Moreover, by Gelfand's formula ρ(A) = lim_{n→∞} ‖A^n‖₀^{1/n} we obtain ρ(A) ≤ d ‖A‖₀ for all A ∈ M_d(k). These facts together imply that Λ ≤ d^{N+1}, and hence Λ^d ≤ d^{(N+1)(d−1)} Λ. Also, depending on whether Λ is greater than 1 or not, we obtain the corresponding estimate. Applying the r-th root to the last inequality, we obtain

This implies
and proves the statement for n = N. For general n ≥ N, the result follows by applying (3) to the sequence A_1, …, A_{N−1}, A_N A_{N+1} ⋯ A_n, and then using the submultiplicativity of ‖·‖. We will need the following lemma, which is a consequence of John's ellipsoid theorem [27, Th. 10.12.2] (see also [4, Lem. 3.2]): Given

This inequality was first proved by Bochi in [4] (without giving an effective bound on N(d)), and it has Theorem 1.1 as an immediate consequence. Breuillard used this result to study semigroups of invertible matrices.
For an arbitrary operator norm ‖·‖ on M_d(C), take the supremum over A_i ∈ M on both sides of (3). We obtain (8). Now, recall that R(M) = inf_{‖·‖} sup_{A ∈ M} ‖A‖, where the infimum is taken over all operator norms on M_d(C) (for a proof, see [26]), and let ‖·‖_n be a sequence of operator norms on M_d(C) such that sup_{A ∈ M} ‖A‖_n → R(M). Taking a subsequence, we may assume that for all ‖·‖_n, the maximum in the right hand side of (8) is achieved by the same index j ∈ {1, …, N}. Then, taking the limit as n tends to infinity in (8) we will have (9) (here is where we use Proposition 4.2, since C(d) does not depend on n). If R(M) = 0 the conclusion is obvious. Otherwise, dividing by R(M)^{N−j/r} and taking the j/r-th root in (9), we obtain the desired inequality.

Ergodic-theoretical consequences
For the proof of Theorem 1.4, we will need the following result, which may be seen as a quantitative version of Poincaré's Recurrence Theorem for measure preserving transformations. It is a consequence of the Birkhoff Ergodic Theorem, and of the fact that for a measurable set U of positive measure, for almost all points x in U, the frequency of points of the sequence x, Tx, T²x, … that belong to U is positive (compare with the subadditive ergodic theorem of Karlsson-Gouëzel [12, Th. 1]).
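The positive-frequency phenomenon behind this lemma is easy to visualize numerically. A sketch (ours; we use an irrational rotation, which also preserves Lebesgue measure, rather than the doubling map, whose floating-point orbits degenerate to 0 after a few dozen iterations):

```python
# Quantitative recurrence for the rotation T(x) = x + alpha (mod 1):
# by unique ergodicity, orbits visit U = [0, 0.1) with frequency
# mu(U) = 0.1, so returns to U happen with positive frequency.
alpha = 0.5 ** 0.5            # an irrational rotation number
x, visits, n = 0.05, 0, 100_000
for _ in range(n):
    visits += (0.0 <= x < 0.1)
    x = (x + alpha) % 1.0
freq = visits / n
assert abs(freq - 0.1) < 0.01
```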

This is a measurable T-invariant set, and since ρ(A) ≤ ‖A‖ for all A ∈ M_d(k), both sides of (4) equal −∞ for µ-almost all x ∈ X \ Y. So we only have to check the result µ-a.e. on Y.
Assume the contrary. That is, assume the existence of some ε > 0, K ∈ N and a measurable set U ⊂ Y of positive measure such that, for all x ∈ U, if n ≥ K, then log ρ(A^n(x))/n + ε ≤ λ(x). By Egorov's theorem, and restricting to a smaller subset if necessary, we may assume that on U, log ‖A^n(x)‖/n converges uniformly to λ(x).
Let N, r and C be as in the statement of Theorem 1.2, and let ε′ = ε/(2 + 6Nr). By the uniform convergence assumption, there is some M ≥ 1 such that n ≥ M implies (10). Take x ∈ U and N_0(x) ∈ N such that Lemma 5.1 holds with γ = 1/3N, and let n ≥ max(3NM, 3NK, 3Nr log C/ε′, N_0(x)). Let m_0 = 0, and given 1 ≤ i ≤ N, let m_i be such that 1 ≤ m_i ≤ n, (11) holds, and T^{m_i} x ∈ U. Setting A_i = A^{m_i − m_{i−1}}(T^{m_{i−1}} x), the cocycle relation gives A_N ⋯ A_1 = A^{m_N}(x), and hence (12) holds for some 1 ≤ α ≤ β ≤ N. But, by definition, T^{m_i} x ∈ U for all i, and as m_i − m_{i−1} ≥ M, (10) applies; we combine it with (12). On the other hand, by (11) we have m_N ≤ n and m_β − m_{α−1} ≥ n(β − α + 1)/N − 2n/3N ≥ n/3N.

Geometric remarks
We can observe that the main ingredients of the proof of Theorem 1.4 are Theorem 1.2 and Poincaré's Recurrence Theorem. Therefore, if we have another situation where an analogue of inequality (3) holds, then we should obtain a result similar to Theorem 1.4. This is the case for cocycles of isometries of Gromov hyperbolic spaces. For the definition and further properties of Gromov hyperbolicity, see [6,7,8].
As was proved in [24, Th. 1.2], if M is a Gromov hyperbolic space with distance d, then there is a constant C > 0 such that, for all o ∈ M and all isometries f, g of M, we have

where d_∞(h) = lim_{n→∞} d(h^n o, o)/n is the stable length.
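The stable length itself is easy to estimate numerically; the toy sketch below (ours, in the non-hyperbolic Euclidean plane, used only to illustrate the definition of d_∞) shows that a genuine translation has positive stable length, while an isometry with a fixed point has stable length zero:

```python
import numpy as np

def d_inf(h, o, n=10_000):
    """Numerical estimate of the stable length lim d(h^n o, o)/n for a
    planar isometry h : p -> R p + b, encoded as the pair (R, b)."""
    R, b = h
    p = o.copy()
    for _ in range(n):
        p = R @ p + b
    return np.linalg.norm(p - o) / n

o = np.zeros(2)
translation = (np.eye(2), np.array([3.0, 4.0]))        # d_inf = 5
theta = 1.0
rotation = (np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]]),
            np.array([1.0, 0.0]))                      # has a fixed point

assert abs(d_inf(translation, o) - 5.0) < 1e-9
assert d_inf(rotation, o) < 1e-2
```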
In this context, given a probability space (X, F, µ) and a measure preserving map T : X → X, a cocycle of isometries of M is a measurable map A : X → Isom(M), where Isom(M) is the group of isometries of M, endowed with the Borel σ-algebra induced by the compact-open topology. We say that the cocycle A is integrable if the map x → d(A(x)o, o) is integrable for some (and hence all) o ∈ M. In the same way as for linear cocycles, we define the family of maps A^n : X → Isom(M). For references about cocycles of isometries, see e.g. [12,15].
Following the same steps as in the proof of Theorem 1.4, we can obtain the following: A result similar to Proposition 6.1 is far from being true if we do not assume a negative curvature condition on M.
Example 6.2. — Let X = S¹ and let µ be the Lebesgue measure on X. Let T(z) = z² be the doubling map on X, which preserves µ, and let R_a(p) = p + a be the translation by a ≠ 0 on R². Define a cocycle A : S¹ → Isom(R²) by A(z)p = T(z)R_a(z⁻¹p) for all p ∈ R². Note that A^n(z)p = T^n(z)R_a^n(z⁻¹p), and hence the limit lim_{n→∞} d(A^n(z)p, p)/n exists and equals |a| > 0 for all z ∈ S¹ and p ∈ R². On the other hand, if z is not a periodic point of T, then A^n(z) is not a translation and hence has a fixed point. Thus d_∞(A^n(z)) = 0 for all n ∈ N and all z in the set of non-periodic points of T, which has full µ-measure.