Product set growth in Burnside groups

Given a periodic quotient of a torsion-free hyperbolic group, we provide a fine lower estimate of the growth function of any sub-semi-group. This generalizes results of Razborov and Safin for free groups.


Introduction
If V is a subset of a group G, we denote by $V^r \subset G$ the set of all group elements that are represented by a product of exactly r elements of V. In this paper we are interested in the growth of $V^r$. Such a problem has a long history which goes back (at least) to the study of additive combinatorics, see for instance [Nat96,TV06]. In the context of nonabelian groups, it leads to the theory of approximate subgroups, see [Tao08,BGT12], and relates to spectral gaps in linear groups, see [Hel08,BG08,BG12], as well as exponential growth rates of negatively curved groups, [Kou98,AL06,BF21,FS20].
If G is a free group, Safin [Saf11], improving former results by Chang [Cha08] and Razborov [Raz14], proves that there exists c > 0 such that for every finite subset V ⊂ G, either V is contained in a cyclic subgroup, or for every r ∈ N, we have $|V^r| \geq (c|V|)^{[(r+1)/2]}$. This estimate can be thought of as a quantified version of the Tits alternative in G. A similar statement holds for $\mathrm{SL}_2(\mathbb{Z})$ [Cha08], free products, limit groups [But13] and groups acting on δ-hyperbolic spaces [DS20]. All these groups display strong features of negative curvature, inherited from a non-elementary acylindrical action on a hyperbolic space. Some results are also available for solvable groups [Tao10,But13], as well as mapping class groups and right-angled Artin groups [Ker21].
By contrast, in this work, we focus on a class of groups which do not admit any non-elementary action on a hyperbolic space, namely the set of infinite groups with finite exponent, often referred to as Burnside groups.
1.1. Burnside groups of odd exponent. -Given a group G and an integer n, we denote by $G^n$ the subgroup of G generated by all its n-th powers. We are interested in quotients of the form $G/G^n$ which we call Burnside groups of exponent n. If $G = F_k$ is the free group of rank k, then $B_k(n) = G/G^n$ is the free Burnside group of rank k and exponent n. The famous Burnside problem asks whether a finitely generated free Burnside group is necessarily finite.
Here, we focus on the case where the exponent n is odd. By Novikov's and Adian's solution of the Burnside problem, it is known that $B_k(n)$ is infinite provided $k \geq 2$ and n is a sufficiently large odd integer [Adi79]. See also [Ol'82,DG08]. More generally, if G is a non-cyclic, torsion-free, hyperbolic group, then the quotient $G/G^n$ is infinite provided n is a sufficiently large odd exponent [Ol'91, DG08]. Our main theorem extends Safin's result to this class of Burnside groups of odd exponents.
Remark 1.1. -Note that free Burnside groups of sufficiently large even exponents are also infinite. This was independently proved by Ivanov [Iva94] and Lysenok [Lys96]. Moreover any non-elementary hyperbolic group admits infinite Burnside quotients, see [IO96,Cou18b]. Nevertheless in the remainder of this article we will focus on torsion-free hyperbolic groups and odd exponents. In Section 1.4 we discuss the difficulties in extending our results to the case of even exponents.
Theorem 1.2. -Let G be a non-cyclic, torsion-free hyperbolic group. There are numbers $n_0 > 0$ and c > 0 such that for all odd integers $n \geq n_0$ the following holds. Given a finite subset $V \subset G/G^n$, either V is contained in a finite cyclic subgroup, or for all r ∈ N, we have $|V^r| \geq (c|V|)^{[(r+1)/2]}$.
Observe that the constant c only depends on G and not on the exponent n. Recall that Burnside groups do not act, at least in any useful way, on a hyperbolic space. Indeed, any such action is either elliptic or parabolic. On the other hand, it is well-known that any linear representation of a finitely generated Burnside group has finite image. Thus our main theorem is not a direct application of previously known results.
Let us mention some consequences of Theorem 1.2. If V is a finite subset of a group G, one defines its entropy by $h(V) = \limsup_{r \to \infty} \frac{1}{r} \ln |V^r|$. The group G has uniform exponential growth if there exists ε > 0 such that for every finite symmetric generating subset V of G, h(V) > ε. In addition, G has uniform uniform exponential growth if there exists ε > 0 such that for every finite symmetric subset V ⊂ G, either V generates a virtually nilpotent group, or h(V) > ε.
Corollary 1.3. -Let G be a non-cyclic, torsion-free hyperbolic group. There are numbers $n_0 > 0$ and α > 0 such that for all odd integers $n \geq n_0$, the following holds. Given a finite subset $V \subset G/G^n$ containing the identity, either V is contained in a finite cyclic subgroup, or $h(V) \geq \alpha \ln |V| \geq \alpha \ln 3$.
In particular, G/G n has uniform uniform exponential growth.
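To make the notion of entropy concrete, here is a minimal sketch, assuming the entropy is the limit superior of $\frac{1}{r} \ln |V^r|$ and working in a free monoid rather than in a Burnside group. The set V = {1, a, b} and the function name `entropy_estimates` are illustrative assumptions, not objects from this article. Since $V^r$ is then the set of all words of length at most r, one has $|V^r| = 2^{r+1} - 1$ and the estimates decrease to ln 2.

```python
import math

def entropy_estimates(V, r_max):
    """Return the list of (r, |V^r|, ln|V^r| / r) for r = 1, ..., r_max.

    Elements are words (strings) and the product is concatenation, so this
    models a free monoid only (an assumption made for illustration).
    """
    rows = []
    current = {""}  # the empty word plays the role of the identity
    for r in range(1, r_max + 1):
        # V^r is obtained from V^(r-1) by multiplying on the right by V.
        current = {w + v for w in current for v in V}
        rows.append((r, len(current), math.log(len(current)) / r))
    return rows

# Hypothetical example: V = {1, a, b} inside the free monoid on a and b.
V = ["", "a", "b"]
for r, size, h_r in entropy_estimates(V, 10):
    print(r, size, round(h_r, 4))
```

In this toy case the finite-r estimates stay above ln 2, which is the entropy of V.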
It was already known that free Burnside groups of sufficiently large odd exponent have uniform exponential growth, see Osin [Osi07,Cor. 1.4] and Atabekyan [Ata09,Cor. 3]. Note that Theorem 2.7 in [Osi07] actually shows that free Burnside groups have uniform uniform exponential growth. Nevertheless, to the best of our knowledge, the result was not proved for Burnside quotients of hyperbolic groups. We shall also stress the fact that, unlike in Corollary 1.3, the growth estimates provided in [Osi07,Ata09] depend on the exponent n. The reason is that the parameter M given for instance by [Osi07,Th. 2.7] is a quadratic function of n.
Given a group G with uniform exponential growth, a natural question is whether or not there exists a finite generating set that realizes the minimal growth rate. The first inequality in Corollary 1.3 is a statement à la Arzhantseva-Lysenok for torsion groups, see [AL06, Th. 1]. The philosophy is the following: if the set V has a small entropy, then it cannot have a large cardinality. In particular, if we expect the minimal growth rate to be achieved, we can restrict our investigation to generating sets with fixed cardinality. Note that this is exactly the starting point of the work of Fujiwara and Sela in the context of hyperbolic groups, [FS20].
Let us now discuss the power arising in Theorem 1.2. We claim that, as our estimate is independent of the exponent n, the power [(r + 1)/2] is optimal. For this purpose we adapt an example of [Saf11].
Example 1.4. -Let g and h be two elements in $B_2(n)$ such that g generates a group of order n that does not contain h. Consider the set $V_N = \{1, g, g^2, \dots, g^N, h\}$. Whenever the exponent n is sufficiently large compared to N, we have $|V_N^r| \sim N^{[(r+1)/2]}$ while $|V_N| = N + 2$.
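The growth of order $N^{[(r+1)/2]}$ in this example can be observed numerically. The following is a minimal sketch under a simplifying assumption: it replaces the Burnside group by the free monoid on two letters, with g = a and h = b, which only models the situation where the exponent n is very large compared to N and r. The function names `product_set_size` and `example_set` are hypothetical helpers introduced for illustration.

```python
def product_set_size(V, r):
    """Cardinality of V^r, where elements are words and the product is concatenation."""
    current = {""}  # the empty word plays the role of the identity
    for _ in range(r):
        current = {w + v for w in current for v in V}
    return len(current)

def example_set(N):
    """Free-monoid analogue of {1, g, g^2, ..., g^N, h} with g = 'a' and h = 'b'."""
    return ["a" * k for k in range(N + 1)] + ["b"]

# For each r, print the expected exponent [(r+1)/2] and |V_N^r| for N = 2, 4, 8.
for r in (1, 2, 3, 4):
    sizes = [product_set_size(example_set(N), r) for N in (2, 4, 8)]
    print(r, (r + 1) // 2, sizes)
```

Doubling N roughly doubles the sizes for r = 1, 2 and roughly quadruples them for r = 3, 4, in line with the exponent [(r + 1)/2].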
Button observed the following fact. Assume that there are c > 0 and ε > 0 with the following property: for all finite subsets V in a group G that are not contained in a virtually nilpotent subgroup, we have $|V^3| \geq c|V|^{2+\varepsilon}$. Then G is either virtually nilpotent, or of bounded exponent [But13, Prop. 4.1]. We do not know if such a non-virtually nilpotent group exists.
1.2. Groups acting on hyperbolic spaces. -In the first part of our paper, we revisit product set growth for a group G acting on a hyperbolic space X, see [DS20, Th. 1.14]. For this purpose, we use the notion of an acylindrical action, see [Sel97,Bow08]. Given a subset U of G, we exploit its $\ell^\infty$-energy defined as $\lambda(U) = \inf_{x \in X} \sup_{u \in U} |ux - x|$.
Remark 1.5. -Unlike in [DS20], we will not make use of the $\ell^1$-energy. Our motivation is mostly technical. We explain this choice in Section 1.3.
Theorem 1.6 (see Theorem 8.1). -Let G be a group acting acylindrically on a hyperbolic length space X. There exists a constant C > 0 such that for every finite subset U ⊂ G with λ(U) > C, (1) either $|U| \leq C$, (2) or there is a subset $W \subset U^2$ freely generating a free sub-semigroup of cardinality $|W| \geq \frac{1}{C} \frac{|U|}{\lambda(U)}$.
Remark 1.7. -For simplicity we stated here a weakened form of Theorem 8.1. Actually we prove that the constant C only depends on the hyperbolicity constant of the space X and the acylindricity parameters of the action of G. The set W is also what we call strongly reduced, see Definition 3.1. Roughly speaking this means that the orbit map from the free semi-group $W^*$ to X is a quasi-isometric embedding.
There is quite some literature on finding free sub-semigroups in powers of symmetric subsets U in groups of negative curvature, see [Kou98,AL06,BF21]. We can for example compare Theorem 8.1 to Theorem 1.13 of [BF21]. In this theorem, under the additional assumption that U is symmetric, the authors construct a 2-element set in $U^r$ that generates a free sub-semigroup, where the exponent r depends only on the doubling constant of the space. Let us highlight two important differences. First we do not assume that the set U is symmetric. In particular, we cannot build the generators of a free sub-semigroup by conjugating a given hyperbolic element. Hence the proofs require different techniques. Moreover, for our purpose, it is important that the cardinality of W grows linearly with that of U. For the optimality of our estimates discussed in the previous paragraph, we require that W is contained in $U^2$. The price that we pay for this is the correction term of the order of the $\ell^\infty$-energy of U.
As the set W constructed in Theorem 1.6 freely generates a free sub-semigroup, we obtain the following estimate on the growth of U r .
Corollary 1.8 (see Corollary 8.2). -Let G be a group acting acylindrically on a hyperbolic length space X. There exists a constant C > 0 such that for every finite U ⊂ G with λ(U) > C, and for all integers $r \geq 0$, we have $|U^r| \geq \left( \frac{1}{C\lambda(U)} |U| \right)^{[(r+1)/2]}$.
As in the previous statement, the constant C actually only depends on the parameters of the action of G on X. Corollary 1.8 is a variant of [DS20, Th. 1.14], where the correction term of the order of log |U| in this theorem is replaced by a geometric quantity, the $\ell^\infty$-energy of U. Note that the conclusion is void whenever $|U| \leq C\lambda(U)$. This can be compared with Theorem 1.2 which is not relevant for small subsets V.
1.3. Strategy for Burnside groups. -Let us explain the main idea behind the proof of Theorem 1.2. For simplicity we restrict ourselves to the case of free Burnside groups of rank 2. Let n be a sufficiently large odd exponent. Any known strategy to prove the infiniteness of $B_2(n)$ starts in the same way. One produces a sequence of groups (1) $F_2 = G_0 \to G_1 \to G_2 \to \cdots \to G_i \to G_{i+1} \to \cdots$ that converges to $B_2(n)$, where each $G_i$ is a hyperbolic group obtained from $G_{i-1}$ by means of small cancellation. The approach provided by Delzant and Gromov associates to each group $G_i$ a hyperbolic space $X_i$ on which it acts properly co-compactly. An important point is that the geometry of $X_i$ is somewhat "finer" than that of the Cayley graph of $G_i$. In particular, one controls uniformly along the sequence $(G_i, X_i)$ the hyperbolicity constant of $X_i$ as well as the acylindricity parameters of the action of $G_i$, see Proposition 10.1. As we stressed before, the constant C involved in Theorem 1.6 only depends on those parameters. Thus it holds, with the same constant C, for each group $G_i$ acting on $X_i$.
Consider now a subset $V \subset B_2(n)$ that is not contained in a finite subgroup. Our idea is to choose a suitable step j and a pre-image $U_j$ in $G_j$ such that the $\ell^\infty$-energy $\lambda(U_j)$ is greater than C and at the same time bounded from above by a constant $C'$ that does not depend on j. The strategy for choosing j is the following. The metric spaces $X_i$ defined above come with uniformly contracting maps $X_i \to X_{i+1}$. Hence if $\tilde V$ stands for a finite pre-image of V in $F_2$, then the energy of its image $V_i$ in $G_i$ is a decreasing sequence converging to zero. Hence there is a smallest index j such that V admits a pre-image $U_{j+1}$ in $G_{j+1}$ whose energy is at most C. Working with the $\ell^\infty$-energy now plays an important role. Indeed we have a control of the length of every element in $U_j$. This allows us to lift $U_{j+1}$ to a finite subset $U_j \subset G_j$ whose energy is controlled (i.e. bounded above by some $C'$).
It follows from the minimality of j that the energy of $U_j$ is also bounded from below by C. The details of the construction are given in Section 10.3. By Theorem 1.6, we find a "large" subset $W \subset U_j^2$ that freely generates a free sub-semigroup. By large we mean that the cardinality of W is linearly bounded from below by the cardinality of $U_j$ (hence of V).
At this point we get an estimate for the cardinality of $W^r$, hence for that of $U_j^r \subset G_j$, see Corollary 1.8. However the map $G_j \to G/G^n$ is not one-to-one. Nevertheless there is a sufficient condition to check whether two elements g and g′ in $G_j$ have distinct images in $G/G^n$: roughly speaking, if none of them "contains a subword" of the form $u^m$, with $m \geq n/3$, then g and g′ have distinct images in $G/G^n$. This formulation is purposely vague here. We refer the reader to Definition 4.1 for a rigorous definition of power-free elements in $G_j$. In particular, the projection $G_j \to G/G^n$ is injective when restricted to a suitable set of power-free elements. Hence it suffices to count the number of power-free elements in $W^r$. This is the purpose of Sections 3 and 4. The computation is done by induction on r following the strategy of the first author from [Cou13].
Again, we would like to draw the attention of the reader to the fact that in this procedure, we took great care to make sure that all the involved parameters do not depend on j.
1.4. Burnside groups of even exponent. -Burnside groups of even exponent have a considerably different algebraic structure. For instance it turns out that the approximation groups $G_j$ in the sequence (1) contain elementary subgroups of the form $D_\infty \times F$ where F is a finite subgroup with arbitrarily large cardinality that embeds in a product of dihedral groups. In particular one cannot control acylindricity parameters along the sequence $(G_i)$, which means that our strategy fails here. It is very plausible that Burnside groups of large even exponents have uniform uniform exponential growth. Nevertheless we wonder if Theorem 1.2 still holds for such groups.
Acknowledgements. -The second author thanks Thomas Delzant for related discussion during his stay in Strasbourg. We thank the coffeeshop Bourbon d'Arsel for welcoming us when the university was closed down during the pandemic, and for serving a wonderful orange cake. We thank the referees for their careful reading and helpful comments.

Hyperbolic geometry
We collect some facts on hyperbolic geometry in the sense of Gromov [Gro87], see also [CDP90,GdlH90].
2.1. Hyperbolic spaces. -Let X be a metric length space. The distance between two points x and y in X is denoted by |x − y|, or $|x - y|_X$ if we want to indicate that we measure the distance in X. If A ⊂ X is a set and x a point, we write $d(x, A) = \inf_{a \in A} |x - a|$ to denote the distance from x to A. Let $A^{+\alpha} = \{x \in X \mid d(x, A) \leq \alpha\}$ be the α-neighborhood of A. Given x, y ∈ X, we write [x, y] for a geodesic from x to y (provided that such a path exists). Recall that there may be multiple geodesics joining two points. We recall that the Gromov product of y and z at x is defined by $(y, z)_x = \frac{1}{2}\left(|x - y| + |x - z| - |y - z|\right)$. We will often use the following facts each of which is equivalent to the triangle inequality: for every x, y, z, t ∈ X, A similar useful inequality is Indeed, after unwrapping the definition of Gromov's products, it boils down to the triangle inequality.
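As an illustration of the Gromov product, taken here with the usual formula $(y, z)_x = \frac{1}{2}(|x - y| + |x - z| - |y - z|)$, the following sketch computes it on the vertices of the Cayley tree of the free group on a and b. The encoding of vertices as reduced words (with A and B standing for the inverses of a and b) and all function names are assumptions made for this example. In such a tree, the Gromov product of y and z at the identity is the length of their common prefix, and the bounds $0 \leq (y, z)_x \leq \min\{|x - y|, |x - z|\}$ are instances of the triangle inequality.

```python
from itertools import product

def common_prefix_length(u, v):
    """Length of the longest common prefix of the words u and v."""
    n = 0
    for s, t in zip(u, v):
        if s != t:
            break
        n += 1
    return n

def dist(u, v):
    """Distance between the vertices u and v of the tree (reduced words)."""
    return len(u) + len(v) - 2 * common_prefix_length(u, v)

def gromov_product(y, z, x):
    """Gromov product of y and z at x."""
    return (dist(x, y) + dist(x, z) - dist(y, z)) / 2

def reduced_words(max_len):
    """All reduced words of length at most max_len over a, A, b, B."""
    inverse = {"a": "A", "A": "a", "b": "B", "B": "b"}
    words, frontier = [""], [""]
    for _ in range(max_len):
        frontier = [w + s for w in frontier for s in "aAbB"
                    if not w or inverse[w[-1]] != s]
        words.extend(frontier)
    return words

points = reduced_words(2)
# At the identity, the Gromov product is the length of the common prefix.
for y, z in product(points, repeat=2):
    assert gromov_product(y, z, "") == common_prefix_length(y, z)
# Consequences of the triangle inequality at an arbitrary base point.
for x, y, z in product(points, repeat=3):
    assert 0 <= gromov_product(y, z, x) <= min(dist(x, y), dist(x, z))
```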
Definition 2.1. -Let $\delta \geq 0$. The space X is δ-hyperbolic if for every x, y, z and t ∈ X, the four point inequality holds, that is $(x, z)_t \geq \min\{(x, y)_t, (y, z)_t\} - \delta$. If δ = 0 and X is geodesic, then X is an R-tree. From now on, we assume that δ > 0 and that X is a δ-hyperbolic metric length space. We denote by ∂X the boundary at infinity of X. Hyperbolicity has the following consequences. ( The distance |s − t| is bounded above by and γ is an L-local (k, ℓ)-quasi-geodesic if any subpath of γ whose length is at most L is a (k, ℓ)-quasi-geodesic. The next lemma is used to construct (bi-infinite) quasi-geodesics.
(3) If, in addition, X is geodesic, then $[x_1, x_n]$ lies in the 5δ-neighborhood of the broken geodesic γ = [x 1 ,
Remark 2.4. -Note that the result still holds if n = 1 or n = 2. Indeed the statement is mostly void, or follows from the definition of Gromov products. One just needs to replace the error term 2(n − 3)δ in (1) by zero. Thus in the remainder of the article, we will invoke Lemma 2.3 regardless of how many points are involved.
We denote by $L_0$ the smallest positive number larger than 500 such that for every $\ell \in [0, 10^5\delta]$, the Hausdorff distance between any two $L_0\delta$-local (1, ℓ)-quasi-geodesics with the same endpoints is at most $(2\ell + 5\delta)$.
2.3. Quasi-convex subsets. -A subset Y ⊂ X is α-quasi-convex if for every two points x, y ∈ Y, and for every point z ∈ X, we have $d(z, Y) \leq (x, y)_z + \alpha$. For instance, geodesics are 2δ-quasi-convex.
If Y ⊂ X, we denote by $|\cdot|_Y$ the length metric induced by the restriction of $|\cdot|_X$ to Y. A subset Y that is connected by rectifiable paths is strongly quasi-convex if it is 2δ-quasi-convex and if for all y, y′ ∈ Y, $|y - y'|_X \leq |y - y'|_Y \leq |y - y'|_X + 8\delta$.

2.4. Isometries. -Let G be a group that acts by isometries on X. Let g ∈ G. The translation length of g is $\|g\| = \inf_{x \in X} |gx - x|$. The stable translation length of g is $\|g\|^\infty = \lim_{n \to \infty} \frac{1}{n} |g^n x - x|$. Those two quantities are related by the following inequality: $\|g\|^\infty \leq \|g\| \leq \|g\|^\infty + 16\delta$. See [CDP90, Ch. 10, Prop. 6.4]. The isometry g is hyperbolic if, and only if, its stable translation length is positive, [CDP90, Ch. 10, Prop. 6.3].
Definition 2.5. -Let d > 0 and U ⊂ G. The set of d-quasi-fixpoints of U is defined by $\mathrm{Fix}(U, d) = \{x \in X \mid \forall u \in U,\ |ux - x| \leq d\}$. The axis of g ∈ G is the set $A_g = \mathrm{Fix}(g, \|g\| + 8\delta)$.
(2) given x ∈ X and $L \geq 0$, if $\sup_{u \in U} |ux - x| \leq d + 2L$, then $x \in \mathrm{Fix}(U, d)^{+L+7\delta}$.
Corollary 2.7 ([DG08, Prop. 2.3.3]). -Let g be an isometry of X. Then $A_g$ is 10δ-quasi-convex and g-invariant. Moreover, for all x ∈ X,
2.5. Acylindricity. -We recall the definition of an acylindrical action. The action of G on X is acylindrical if there exist two functions $N, \kappa : \mathbb{R}_+ \to \mathbb{R}_+$ such that for every $r \geq 0$, for all points x and y at distance $|x - y| \geq \kappa(r)$, there are at most N(r) elements g ∈ G such that $|x - gx| \leq r$ and $|y - gy| \leq r$.
Recall that we assumed X to be δ-hyperbolic, with δ > 0. In this context, acylindricity satisfies a local-to-global phenomenon: if there exist $N_0, \kappa_0 \in \mathbb{R}_+$ such that for all points x and y at distance $|x - y| \geq \kappa_0$, there are at most $N_0$ elements g ∈ G such that $|x - gx| \leq 100\delta$ and $|y - gy| \leq 100\delta$, then the action of G is acylindrical, with the following estimates for the functions N and κ: (4) $\kappa(r) = \kappa_0 + 4r + 100\delta$ and N (r) = r 5δ See [DGO17, Prop. 5.31]. This motivates the next definition.
Definition 2.8. -Let $N, \kappa \in \mathbb{R}_+$. The action of G on X is (N, κ)-acylindrical if for all points x and y at distance $|x - y| \geq \kappa$, there are at most N elements g ∈ G such that $|x - gx| \leq 100\delta$ and $|y - gy| \leq 100\delta$.
We need the following geometric invariants of the action of G on X. The limit set of G acting on X consists of the accumulation points in the Gromov boundary ∂X of X of the orbit of one (and hence any) point in X. By definition, a subgroup E of G is elementary if the limit set of E consists of at most two points.
Definition 2.10. -The acylindricity parameter is defined as where U runs over the subsets of G that do not generate an elementary subgroup.
Definition 2.11. -The ν-invariant is the smallest natural number ν = ν(G, X) such that for every g ∈ G and every hyperbolic h ∈ G the following holds: if $g, hgh^{-1}, \dots, h^{\nu} g h^{-\nu}$ generate an elementary subgroup, then so do g and h.
Remark 2.12. -In the above definitions we adopt the following conventions. The diameter of the empty set is zero. If G does not contain any hyperbolic isometry, then τ (G, X) = ∞. If every subgroup of G is elementary, then A(G, X) = 0.
The parameters A(G, X) and ν(G, X) allow us to state the following version of Margulis' lemma.
If there is no ambiguity we simply write τ (G), A(G), and ν(G) for τ (G, X), A(G, X), and ν(G, X) respectively. Sometimes, if the context is clear, we even write τ , A, or ν.
If the action of G on X is (N, κ)-acylindrical, then $\tau \geq \delta/N$, while A and ν are finite. In fact, one could express upper bounds on A and ν in terms of N, κ, δ, and $L_0$. See for instance [Cou16, §6]. However, for our purpose we need a finer control on these invariants.
From now on we assume that $\kappa \geq \delta$ and that the action of G on X is (N, κ)-acylindrical.
2.6. Loxodromic subgroups. -An elementary subgroup is loxodromic if it contains a hyperbolic element. Equivalently, an elementary subgroup is loxodromic if it has exactly two points in its limit set. If h is a hyperbolic isometry, we denote by E(h) the maximal loxodromic subgroup containing h. Let $E^+(h)$ be the maximal subgroup of E(h) fixing pointwise the limit set of E(h). It is known that the set F of all elliptic elements of $E^+(h)$ forms a (finite) normal subgroup of $E^+(h)$ and the quotient $E^+(h)/F$ is isomorphic to Z.
Definition 2.14 (Invariant cylinder). -Let E be a loxodromic subgroup with limit set {ξ, η}. The E-invariant cylinder, denoted by $C_E$, is the 20δ-neighborhood of all $L_0\delta$-local (1, δ)-quasi-geodesics with endpoints ξ and η at infinity.

Lemma 2.15 (Invariant cylinder). -Let E be a loxodromic subgroup. Then
• $C_E$ is 2δ-quasi-convex and invariant under the action of E. If, in addition, X is proper and geodesic, then $C_E$ is strongly quasi-convex [Cou14, Lem. 2.31],

Periodic and aperiodic words
Let U be a finite subset of G containing at least two elements. We denote by $U^*$ the free monoid generated by U. We write $\pi : U^* \to G$ for the canonical projection. When there is no ambiguity, we abuse notation and still write w for an element in $U^*$ and its image under π. We fix a base point p ∈ X. Recall that the action of G on X is (N, κ)-acylindrical.
The set U is α-strongly reduced (at p) if, in addition, for every distinct u 1 , u 2 ∈ U , we have We say that U is reduced at p (respectively strongly reduced at p) if there exists α > 0 such that U is α-reduced at p (respectively α-strongly reduced at p).
In practice, the base point p is fixed once and for all. Thus we simply say that U is (α-)reduced or (α-)strongly reduced.
Remark 3.3. -Roughly speaking, the geodesic extension property has the following meaning: if the geodesic [p, w′p] extends [p, wp] as a path in X, then w′ extends w as a word over U.
Proof. -We first prove the geodesic extension property. Let $w = u_1 \cdots u_m$ and $w' = u'_1 \cdots u'_{m'}$ be two words in $U^*$ such that $(p, w'p)_{wp} \leq \alpha + 145\delta$. We denote by r the largest integer such that $u_i = u'_i$ for every i ∈ {1, . . . , r − 1}. For simplicity we let $q = u_1 \cdots u_{r-1} p = u'_1 \cdots u'_{r-1} p$.
Assume now that, contrary to our claim, w is not a prefix of w′, that is r − 1 < m. We claim that $(wp, w'p)_q < |u_r p - p| - \alpha - 148\delta$. If r − 1 = m′, then w′p = q and the claim holds. Hence we can suppose that r − 1 < m′. It follows from our choice of r that $u_r \neq u'_r$. We let $t = u_1 \cdots u_r p$ and $t' = u'_1 \cdots u'_r p$.
Since U is α-strongly reduced, we have It follows then from the four point inequality that Applying Lemma 2.3(2) with the sequence of points $q = u_1 \cdots u_{r-1} p$, $t = u_1 \cdots u_r p$, $u_1 \cdots u_{r+1} p$, . . . , $wp = u_1 \cdots u_m p$, we get $(q, wp)_t \leq (q, u_1 \cdots u_{r+1} p)_t + 2\delta = (u_r^{-1} p, u_{r+1} p)_p + 2\delta \leq \alpha + 2\delta$ (note that the last inequality follows from the fact that U is α-reduced). Hence Thus the minimum in (5) cannot be achieved by $(t, wp)_q$. Similarly, it cannot be achieved by $(w'p, t')_q$ either. Thus which completes the proof of our claim.
Using Lemma 2.3(1) with the sequence of points Consequently, $|wp - q| \geq |u_r p - p|$. Combined with the previous claim, it yields Applying again the four point inequality, we get It follows from our previous computation that the minimum cannot be achieved by $(q, w'p)_{wp}$. We proved previously that $|wp - q| \geq |u_r p - p|$. Reasoning as in our first claim, Lemma 2.3(2) yields $(p, wp)_q \leq \alpha + 2\delta$. Since U is α-reduced we get Hence the minimum in (6) cannot be achieved by $(p, q)_{wp}$ either, which is a contradiction. Consequently, w is a prefix of w′.
Let us prove now that U freely generates a free sub-semi-group of G. Let $w_1, w_2 \in U^*$ be two words whose images in G coincide. In particular $(p, w_1 p)_{w_2 p} = 0 = (p, w_2 p)_{w_1 p}$. It follows from the geodesic extension property that $w_1$ is a prefix of $w_2$ and conversely. Thus $w_1 = w_2$ as words in $U^*$.
3.1. Periodic words. -From now on, we assume that U is α-strongly reduced (in the sense of Definition 3.1). We let $\lambda = \max_{u \in U} |up - p|$. We denote by $|w|_U$ the word metric of $w \in U^*$. Given an element $w = u_1 \cdots u_m$ in $U^*$, we let
Remark 3.5. -Note that the definition does not require m to be an integer. Let E be a maximal loxodromic subgroup such that p belongs to the (α + 100δ)-neighborhood of $C_E$. Let $v \in U^*$ be a word whose image in G is a hyperbolic element of E. Then for every integer $m \geq 0$, the element $v^{m+1}$ is m-periodic with period E. The converse is not true; that is, an m-periodic word with period E is not necessarily contained in E.
If m is sufficiently large, then periods are unique in the following sense.
Proposition 3.6. -There exists $m_0 \geq 0$ which only depends on δ, A, ν, τ and α such that for every $m \geq m_0$ the following holds.
Hence there exists $m_0 \geq 0$ which only depends on δ, A, ν, τ and α such that if It follows from [Cou16, Prop. 3.44] that $h_1$ and $h_2$ generate an elementary subgroup,
Remark 3.7. -For all $w \in U^*$, we have $\lambda |w|_U \geq |wp - p|$. In particular, if w is an m-periodic word with period E, then $|w|_U > m\tau(E)/\lambda$.
Consider now a general non-empty word $w = u_1 \cdots u_r$ in $U^*$. We claim that |wp − p| > 2α + 298δ|w| U . Indeed applying Lemma 2.3(1) with the sequence of points $p, u_1 p, u_1 u_2 p, \dots, wp = u_1 \cdots u_r p$, Combining the previous inequalities we get the announced estimate. Consequently,
Proposition 3.8. -Let E be a maximal loxodromic subgroup. Let $m \geq 0$. There are at most two elements in $U^*$ which are m-periodic with period E, but whose proper prefixes are not m-periodic.
Proof. -Let E be a maximal loxodromic subgroup. Let $P_E$ be the set of m-periodic words $w \in U^*$ with period E. Assume that $P_E$ is non-empty, otherwise the statement is void. Let $\eta^-$ and $\eta^+$ be the points of ∂X fixed by E and γ : R → X be an $L_0\delta$-local (1, δ)-quasi-geodesic from $\eta^-$ to $\eta^+$. For any $w \in P_E$, the points p and wp lie in the (α + 100δ)-neighborhood of $C_E$, hence in the (α + 120δ)-neighborhood of γ. Without loss of generality, we can assume that q = γ(0) is a projection of p on γ. We decompose $P_E$ in two parts as follows: an element $w \in P_E$ belongs to $P_E^+$ (respectively $P_E^-$) if there is a projection γ(t) of wp on γ with $t \geq 0$ (respectively $t \leq 0$). Observe that a priori $P_E^-$ and $P_E^+$ are not disjoint, but that will not be an issue for the rest of the proof.
We are going to prove that $P_E^+ \cap U^*$ contains at most one word satisfying the proposition. Let $w_1$ and $w_2$ be two words in $P_E^+ \cap U^*$ which are m-periodic with period E, and whose proper prefixes are not m-periodic. We write $q_1 = \gamma(t_1)$ and $q_2 = \gamma(t_2)$ for the respective projections of $w_1 p$ and $w_2 p$ on γ. Without loss of generality we can assume that $t_1 \leq t_2$. We are going to prove that $(p, w_2 p)_{w_1 p} \leq \alpha + 145\delta$. As a quasi-geodesic, γ is 9δ-quasi-convex [Cou14, Cor. 2.7(2)]. According to Remark 3.7, the word $w_2$ is not empty and $|w_2 p - p| > 2\alpha + 298\delta$. Applying the triangle inequality we get $|q_2 - q| > 19\delta$. Recall that q and $q_2$ are respective projections of p and $w_2 p$ on the quasi-convex γ. Hence Cor. 2.12(2)]. Since $q_1$ lies on γ between q and $q_2$ we also have Combining the previous two inequalities, we get Thus $(w_2 p, p)_{q_1} \leq 25\delta$. According to the triangle inequality, we get which completes the proof of our claim.
Applying the geodesic extension property (see Lemma 3.2) we get that $w_1$ is a prefix of $w_2$. As $w_1$ is m-periodic, it cannot be a proper prefix, hence $w_1 = w_2$. Similarly, $P_E^- \cap U^*$ has at most one element satisfying the statement.

The growth of aperiodic words
Definition 3.9. -Let $w \in U^*$ and let E be a maximal loxodromic subgroup. We say that the word w contains an m-period of E if w splits as $w = w_0 w_1 w_2$, where the word $w_1$ is m-periodic with period E. If the word w does not contain any m-period, we say that w is m-aperiodic.
Observe that containing a period is a property of the word w ∈ U * and not of its image π(w) in G: one could find two words w 1 and w 2 , where w 1 is m-aperiodic while w 2 is not, and that have the same image in G. However since U is strongly reduced, it freely generates a free sub-semigroup of G. Hence this pathology does not arise in our context.
We denote by $U^*_m$ the set of m-aperiodic words in $U^*$. Recall that p is a base point of X and the parameter λ is defined by $\lambda = \max_{u \in U} |up - p|$. Indeed, for all u ∈ U and loxodromic subgroups E, So, by Remark 3.7, u cannot be m-periodic.
We denote by S(r) the sphere of radius r in $U^*$. Similarly $B(r) \subset U^*$ stands for the ball of radius r, that is the subset of elements $w \in U^*$ of word length $|w|_U \leq r$. We note that $|B(r)| \leq |U|^{r+1}$, since $|U| \geq 2$.
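The bound on the cardinality of the ball comes from a geometric sum: for an integer r, the ball of radius r in the free monoid on an alphabet with $q \geq 2$ letters contains $1 + q + \dots + q^r = (q^{r+1} - 1)/(q - 1) \leq q^{r+1}$ words. The sketch below checks this by brute force on small cases. The function names are hypothetical and the alphabets are arbitrary test data.

```python
from itertools import product

def ball_size(alphabet, r):
    """Number of words of length at most r over the alphabet, by enumeration."""
    return sum(1 for k in range(r + 1) for _ in product(alphabet, repeat=k))

def geometric_sum(q, r):
    """Closed form 1 + q + ... + q^r, valid for an integer q >= 2."""
    return (q ** (r + 1) - 1) // (q - 1)

# Compare the enumeration, the closed form and the upper bound q^(r+1).
for alphabet in ("ab", "abc"):
    q = len(alphabet)
    for r in range(5):
        size = ball_size(alphabet, r)
        assert size == geometric_sum(q, r)
        assert size <= q ** (r + 1)
        print(q, r, size, q ** (r + 1))
```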
Proposition 3.11. -Let U be an α-strongly reduced subset of G, with at least two elements. There exists $m_1$ which only depends on λ, α, A, ν, τ, and δ with the following property. For all $m \geq m_1$, and r > 0, we have
Proof. -We adapt the counting arguments of [Cou13]. We first fix some notation. Let $m_0$ be the parameter given by Proposition 3.6. Recall that $m_0$ only depends on α, A, ν, τ, and δ. Let U ⊂ G be an α-strongly reduced subset, with at least two elements. Let $m > m_0 + 5\lambda/\tau$. We let We denote by $\mathcal{E}$ the set of all maximal loxodromic subgroups in G. For each $E \in \mathcal{E}$, let $Z_E \subset Z$ be the subset of all w ∈ Z that split as a product $w = w_1 w_2$, where $w_1 \in U^*_m$ and $w_2 \in U^*$ is an m-periodic word with period E.
Proof. -Let w ∈ Z contain an m-period of a loxodromic subgroup $E \in \mathcal{E}$. By definition of Z, we have $w = w_0 u$, where u ∈ U and the prefix $w_0 \in U^*$ does not contain any m-period. On the other hand w contains a subword $w_2$ which is an m-period with period E. Since $w_2$ cannot be a subword of $w_0$, it is a suffix of w.
Recall that if W ⊂ U * , then |W | stands for the cardinality of the image of W in G. However, since U freely generates a free sub-semi-group (Lemma 3.2), we can safely identify the elements of U * with their images in G. It follows from Lemma 3.12, that for all natural numbers r, The next step is to estimate each term in the above inequality.
Lemma 3.13. -For all real numbers r, Proof. -It is a direct consequence of the fact that U freely generates a free sub-semi-group.
Lemma 3.14. -Let $E \in \mathcal{E}$. For all real numbers r, Since w also belongs to Z, the prefix consisting of all but the last letter does not contain m-periods. Thus no proper prefix of $w_2$ can be m-periodic. It follows from Proposition 3.8 that there are at most two possible choices for $w_2$. Hence the result.
Lemma 3.15. -For all real numbers r, the following inequality holds: Remark 3.16. -Note that the terms in the series on the right hand side are all non-negative. Hence if the series diverges, the statement is void. Later we will apply this lemma in a setting where the series actually converges.
Proof. -Given $j \geq 1$, we define $\mathcal{E}_j$ as the set of all maximal loxodromic subgroups $E \in \mathcal{E}$ such that $j\tau \leq \tau(E) < (j + 1)\tau$ and $U^*$ contains a word that is m-periodic with period E. We split the left-hand sum as follows Indeed if $U^*$ does not contain a word that is m-periodic with period E, then the set $Z_E$ is empty. Observe that for every $E \in \mathcal{E}_j$ we have by Lemma 3.14 Thus it suffices to bound the cardinality of $\mathcal{E}_j$ for every $j \geq 1$.
Let $j \geq 1$. For simplicity we let $d_j = (j + 1) m_0 \tau/\delta + 1$. We claim that $|\mathcal{E}_j| \leq |U|^{d_j + 1}$. To that end we are going to build a one-to-one map $\chi : \mathcal{E}_j \to B(d_j)$. Let $E \in \mathcal{E}_j$. By definition of $\mathcal{E}_j$, there exists a word $w \in U^*$ that is m-periodic with period E. Let w′ be the shortest prefix of w that is $m_0$-periodic with period E. Note that such a prefix always exists since $m \geq m_0$. By Remark 3.7, w′ belongs to $B(m_0 \tau(E)/\delta + 1)$, hence to $B(d_j)$. We define χ(E) to be w′. Observe that there is at most one E such that w′ is $m_0$-periodic with period E (Proposition 3.6). Hence χ is one-to-one. This completes the proof of our claim and the lemma.
We now complete the proof of Proposition 3.11. Let us define first some auxiliary parameters. We fix once for all an arbitrary number ε ∈ (0, 1/2). In addition we let , and M = mτ λ .
Since |U| ≥ 2, we observe that σ ≤ 1/2. We claim that there exists m_1 ≥ m_0 which only depends on λ, α, A, ν, τ, and δ such that provided that m ≥ m_1. The computation shows that then the previous inequality yields We can see from there that there exists m_1 ≥ m_0 which only depends on λ, m_0, τ, and δ, such that as soon as m ≥ m_1 the right-hand side of Inequality (8) is nonpositive, which completes the proof of our claim. Up to increasing the value of m_1, we can assume that M ≥ 1, provided m ≥ m_1.
Let us now estimate the number of aperiodic words in U^*. From now on we assume that m ≥ m_1. For every integer r, we let c(r) = |U^*_m ∩ B(r)|. We claim that for every integer r, we have c(r) ≥ µc(r − 1). The proof goes by induction on r. In view of Example 3.10, the inequality holds true for r = 1. Assume that our claim holds for every s ≤ r. In particular for every integer t ≥ 0, we get c(r − t) ≤ µ^{−t}c(r). It follows from (7) that Note that jM − 1 ≥ 0, for every j ≥ 1. Thus applying the induction hypothesis we get We defined µ as µ = (1 − ε)|U|, hence it suffices to prove that Recall that γ/µ^M ≤ σ ≤ 1/2. Hence the series converges. Moreover This completes the proof of our claim for r + 1.

Power-free elements
Let G be a group that acts (N, κ)-acylindrically on a δ-hyperbolic geodesic space X. We fix a basepoint p ∈ X. Recall our convention: the diameter of the empty set is zero, see Remark 2.12. If g ∈ G does not contain any m-power, we say that g is m-power-free.
Let U ⊂ G be a finite subset. We recall that λ = max u∈U |up − p| and that U * is the set of all words over the alphabet U . The idea of the next statement is the following. Take a word w ∈ U * . If w, seen as an element of G, contains a sufficiently large power, then the word w already contains a large period. Proof. -Let w = u 1 · · · u l . As w contains a m-power, there is a loxodromic subgroup E and a geodesic [p, wp] such that u p] be a broken geodesic joining p to wp. Let p 1 and p 2 be the respective projections of x 1 and x 2 on γ w . By Lemma 2.3, the geodesic [p, wp] is contained in the 5δ-neighborhood of γ w . Hence p 1 and p 2 are 15δ-close to C E . Moreover, Up to permuting x 1 and x 2 we can assume that p, p 1 , p 2 and wp are ordered in this way along γ w . In particular, there is i − 1 such that p 1 ∈ (u 1 · · · u i ) · [p, u i+1 p], and j − 1 such that p 2 ∈ (u 1 · · · u j ) · [p, u j+1 p]. Since p 1 comes before p 2 on γ w , we have i j. Note that actually i < j. Indeed if i = j, we would have which contradicts our assumption. Let us set w 0 = u 1 · · · u i+1 and take the word w 1 such that u 1 · · · u j = w 0 w 1 . At this stage w 1 could be the empty word. But we will see that this is not the case. Indeed Applying Lemma 2.3 to the subpath γ of γ w bounded by p 1 and p 2 , we get that γ lies in the (α + 14δ)-neighborhood of the geodesic [p 1 , p 2 ]. However p 1 and p 2 are in the 15δ-neighborhood of C E which is 2δ-quasi-convex. Thus γ is contained in the (α + 31δ)-neighborhood of C E . We conclude that w 1 is m -periodic with period w −1 0 Ew 0 .

Energy and quasi-center
Let G be a group acting by isometries on a δ-hyperbolic length space X. Recall that we assume for simplicity that δ > 0. In the next sections, we denote by S(x, r) the sphere in X of radius r centered at x. (This should not be confused with the spheres in U^* used in the previous section.) Let U ⊂ G be a finite subset. In order to apply the counting results from Section 3, we explain in this section and the following ones how to build a strongly reduced subset of U^2. To that end we define the notion of energy of U.
Definition 5.1. -The ℓ∞-energy λ(U, x) of U at x is defined by λ(U, x) = max_{u∈U} |ux − x|. The ℓ∞-energy of U is given by λ(U) = inf_{x∈X} λ(U, x).
(A, B) to be the set of elements u ∈ U satisfying the following conditions (A, A), and, if there is no ambiguity, U(A, B) = U_x(A, B) for short.
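To make the definition of the energy concrete, here is a minimal Python sketch for the action of the free group on a, b on its Cayley tree by left multiplication. Everything in it is an assumption made for illustration: the word encoding (upper-case letters for inverses), the helper names, and above all the fact that the infimum is only searched among the vertices of a finite ball, which need not give the true value of the energy in general.

```python
from itertools import product


def reduce_word(word):
    """Freely reduce a word; an upper-case letter is the inverse of the lower-case one."""
    out = []
    for letter in word:
        if out and out[-1] == letter.swapcase():
            out.pop()
        else:
            out.append(letter)
    return "".join(out)


def inverse(word):
    """Inverse of a word in the free group."""
    return word[::-1].swapcase()


def distance(x, y):
    """Distance between the vertices x and y of the Cayley tree, i.e. the length of x^-1 y."""
    return len(reduce_word(inverse(x) + y))


def energy_at(U, x):
    """Energy of U at the vertex x: the largest displacement |ux - x| over u in U."""
    return max(distance(reduce_word(u + x), x) for u in U)


def ball(radius):
    """All vertices (reduced words) at distance at most radius from the identity."""
    vertices = []
    for n in range(radius + 1):
        for letters in product("aAbB", repeat=n):
            word = "".join(letters)
            if reduce_word(word) == word:
                vertices.append(word)
    return vertices


def energy_over_ball(U, radius):
    """Minimum of energy_at over the vertices of a ball (a stand-in for the infimum)."""
    return min(energy_at(U, x) for x in ball(radius))


U = ["abA", "abbA"]  # conjugates of b and b^2 by a
print(energy_at(U, ""), energy_at(U, "a"), energy_over_ball(U, 2))
```

Here the energy at the identity vertex is 4, while moving the base point to the vertex a lowers it to 2, which is the translation length of the second element and therefore cannot be improved.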
Proposition 5.3. -Let q be a point that almost-minimizes the ℓ∞-energy of U. There exists a quasi-center p for U such that |p − q| ≤ λ(U).
Remark 5.4. -The existence of a quasi-center is already known by [DS20]. The authors prove there that any point almost-minimizing the ℓ1-energy is a quasi-center. However such a point could be very far from any point almost-minimizing the ℓ∞-energy.
Proof. -We describe a recursive procedure to find a quasi-center p. The idea is to construct a quasi-geodesic from q to a quasi-center p. Let x 0 = q and suppose that x 0 , . . . , x i−1 , x i ∈ X are already defined. If x i is a quasi-center for U , we let p = x i and stop the induction. Otherwise, there is a point Our idea is to apply Lemma 2.3 to the sequence of points x 0 , x 1 , . . . , . Like this we can write the distance from x 0 to ux 0 as a function of the index i. We will observe that this function diverges to infinity, which forces the procedure to stop. To do this, we collect the following observations. By construction, we have: Remark 5.6. -Roughly speaking, this lemma tells us that x i−1 , x i , ux i and ux i−1 are aligned in the order of their listing along the neighborhood of the geodesic [x i−1 , ux i−1 ].

Proof
The first point is just a reformulation of the definition of the set According to the triangle inequality we have However, by construction |ux i−1 −x i−1 | > 2|x i−1 −x i |+2δ. Hence the maximum in (9) has to be achieved by (x i−1 , ux i−1 ) xi . The same argument works for (x i , ux i−1 ) uxi . Proof. -We note that |U xi−1 (x +100δ i ) ∩ U xi (x +100δ i+1 )| > |U |/2. Let us fix an element u in this intersection. By Lemma 5.5, (x i−1 , ux i ) xi 102δ and (x i , ux i ) xi+1 101δ. According to the four point inequality we have Observe that Since |x i − x i+1 | = 10 3 δ, the minimum cannot be achieved by (x i+1 , ux i ) xi , whence the result.
This means that the induction used to build the sequence (x_i) stops after finitely many steps. Moreover, when the process stops we have x_i = p and λ(U) ≥ 10^3(i + 5)δ. For every j ≤ i − 1 we have |x_j − x_{j+1}| ≤ 10^3δ, thus |p − q| ≤ λ(U).

Sets of diffuse energy
In this section we assume that the action of G on X is (N, κ)-acylindrical, with κ > 50 · 10^3δ. Let U ⊂ G be a finite subset. Let p be a quasi-center of U. In this section we assume that U is of diffuse energy (at p), that is, for at least 99/100 of the elements u of U, we have |up − p| > 2κ.
6.1. Reduction lemma. -We first prove the following variant of the reduction lemmas in [DS20].
Remark 6.2. -In the case of trees, Proposition 6.1 follows directly from [DS20, Lem. 6.4], and the proof of this lemma is due to Button [But13]. The situation is different in the case of hyperbolic spaces. Indeed, in contrast to the reduction lemmas in [DS20, §6.1], the cardinality of U_1 in Proposition 6.1 does not depend on the cardinality of balls in X, as in [DS20, Lem. 6.3], and the estimates on the Gromov products do not depend on the logarithm of the cardinality of U, as in [DS20, Lem. 6.8].
We postpone for the moment the proof of this lemma and first complete the proof of Proposition 6.1. In case (1) of Lemma 6.3, we set v = u_0. In case (2) of Lemma 6.3, we may assume, up to exchanging the roles of U_1 and U_2, that there is v ∈ U_2 such that for all u_1 ∈ U_1, |u_1p − p| ≤ |vp − p|. This yields Proposition 6.1. See Section 5 for the definition of U_p(A, B). The definition of hyperbolicity implies the following useful lemma.
Observe that the complement in U of the previous set is the union of U (z +6δ 0 , S) and U (S, y +6δ 0 ). Recall that |U | > (1 − η)|U |. Thus we can now assume that Since p is a quasi-center, the cardinality of both U 1 and U 2 is bounded above by 3|U |/4. It follows from (11) that each of them contains at least (1/4−4η)|U | elements. Observe also that |y 0 −z 0 | > 30δ. Indeed otherwise both U 1 and U 2 are contained in U (y +100δ 0 ). Hence (11) contradicts the fact that p is a quasi-center. Applying Lemma 6.4 we conclude that U 1 and U 2 satisfy (2).

6.2. Construction of free sub-semi-groups. -We recall that λ(U) denotes the ℓ∞-energy of the finite subset U ⊂ G. By Proposition 5.3, we can assume that the quasi-center p, which we fixed at the beginning of this section, is at distance at most λ(U) from a point almost-minimizing the ℓ∞-energy of U. We still assume that the energy of U is diffuse (at p). We treat p as the base point of X.
Remark 6.5. -According to the triangle inequality, we have |up − p| ≤ 3λ(U) + δ, for every u ∈ U. Since the energy of U is diffuse at p, there is an element u ∈ U that moves p by a large distance. As a consequence λ(U) ≥ δ, and thus |up − p| ≤ 4λ(U), for every u ∈ U. These estimates are far from being optimal, but sharp enough for our purpose.
Proposition 6.6. -There exists v ∈ U and a subset W ⊂ Uv such that W is 1002δ-strongly reduced and |W| ≥ (1/(10^6 N)) (δ/λ(U)) |U|.
Proof. -For simplicity we let α = 1002δ. We fix U 1 and v given by Proposition 6.1. We set T = U 1 v.
Proof. -We write t = uv and t = u v with u, u ∈ U 1 . Applying twice the four point inequality (3) we have Observe that Similarly we prove that (t p, u p) p > α. Hence the minimum in (12) is achieved by (t −1 p, t p) p which proves the first point. By definition of Gromov products we have For every w ∈ T , we set Note that w ∈ A w .
In order to define W, we construct by induction an increasing sequence (W_i) of subsets of T. We first let W_0 = ∅. Assume now that W_i has been defined for some integer i ≥ 0. If the set T \ ⋃_{w∈W_i} A_w is empty, then the process stops and we let W = W_i (note that this will inevitably happen as T is finite). Otherwise, we choose an element w_{i+1} in this set for which |p − w_{i+1}p| is maximal and let W_{i+1} = W_i ∪ {w_{i+1}}.
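The inductive selection above is a greedy procedure, which can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `greedy_selection`, the callback `displacement` (standing for w ↦ |p − wp|) and the callback `shadow` (standing for w ↦ A_w) are hypothetical, and the toy data at the end has no geometric meaning.

```python
def greedy_selection(T, displacement, shadow):
    """Greedily build W: repeatedly pick an uncovered element of maximal displacement.

    T            -- finite collection of candidates
    displacement -- function w -> |p - wp| (any real-valued function works here)
    shadow       -- function w -> set of candidates covered by w (plays the role of A_w)
    """
    remaining = set(T)
    W = []
    while remaining:
        # sorted() only makes tie-breaking deterministic
        w = max(sorted(remaining), key=displacement)
        W.append(w)
        # w always belongs to its own shadow in the text; we enforce it to guarantee termination
        remaining -= shadow(w) | {w}
    return W


# Toy data: candidates are integers, and w covers every candidate within distance 2 of it.
T = range(10)
W = greedy_selection(T, lambda t: t, lambda w: {t for t in T if abs(t - w) <= 2})
print(W)
```

By construction the displacements of the selected elements are non-increasing along the sequence, and the shadows of the selected elements cover T. These are the two features used in the proofs of Lemmas 6.8 and 6.10.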
Lemma 6.8. -The set W is α-strongly reduced.
Proof. -By Lemma 6.7, the set T (hence W) is α-reduced. It suffices to prove that for every distinct w, w′ ∈ W we have (wp, w′p)_p ≤ min{|wp − p|, |w′p − p|} − α − 150δ.
Using the notation above, we write w_1, w_2, . . . , w_n for the elements of W in the order they have been constructed. Let i, j ∈ {1, . . . , n} be such that |p − w_jp| ≤ |p − w_ip|. If i < j, then w_j does not belong to A_{w_i}, thus Assume now that j < i. Note that the sequence {|p − w_kp|} is non-increasing, hence |p − w_jp| = |p − w_ip|. Since w_i does not belong to A_{w_j}, thus
Lemma 6.9. -For every w ∈ T, we have |A_w| ≤ 2065Nλ(U)/δ.
Proof. -Let w ∈ T. The proof goes in two steps. First we give an upper bound for subsets of sparse elements in A_w. Let m ≥ 0 be an integer.
We assume in addition that |u_ip − u_jp| > 6 · 10^3δ, for every distinct i, j ∈ {0, . . . , m}. Let γ : [a, b] → X be a (1, δ)-quasi-geodesic from p to wp. We are going to give an upper bound for m. To that end we claim that the points u_0p, . . . , u_mp lie close to γ. Since the points u_ip are sparse, this will roughly say that m ≤ |wp − p|/ max{|u_ip − u_jp|}. More precisely, the argument goes as follows. For every i ∈ {0, . . . , m}, we write p_i for a projection of u_ip onto γ. Up to reindexing the elements we can suppose that the points p, p_0, p_1, . . . , p_m, wp are aligned in this order along γ.
Since t_i belongs to A_w, we have On the other hand, we know by construction of U_1 and v that (p, t_ip)_{u_ip} = (u_i^{−1}p, vp)_p is at most 10^3δ, see Proposition 6.1. Hence the triangle inequality yields, see (2), Cor. 2.7(2)]. It follows that |u_ip − p_i| = d(u_ip, γ) is at most 2161δ. According to the triangle inequality we get Observe now that Recall that w is a two-letter word in U, while λ(U) is very large compared to δ. Hence 1678mδ ≤ 9λ(U). To simplify the rest of the computations, we will use the following generous estimate: m ≤ λ(U)/δ.
We now start the second step of the proof. Using acylindricity we reduce the counting of elements in A_w to the case of a sparse subset. Any element t ∈ A_w can be written t = u_tv with u_t ∈ U_1. Consider now t, t′ ∈ A_w.
Indeed the second inequality is just the triangle inequality, while the first one is equivalent to the following known fact: (u_t^{−1}p, vp)_p ≤ 10^3δ. Similarly we have The difference of the previous two inequalities yields Plugging this inequality in (13) we obtain Finally, by the triangle inequality |p − u_tp| − |p − u_{t′}p| ≤ |u_tp − u_{t′}p|. This implies the claim.
We let M = 2065N. According to acylindricity (see (4) applied with r = 10306δ), the set F = {g ∈ G | |gp − p| ≤ 6000δ and |gvp − vp| ≤ 10306δ} contains at most M elements. It follows that for every t ∈ A_w, there are at most M elements t′ ∈ A_w such that |u_tp − u_{t′}p| ≤ 6 · 10^3δ. Indeed, if |u_tp − u_{t′}p| ≤ 6 · 10^3δ, our previous claim implies that u_t^{−1}u_{t′} belongs to F. So we can extract a subset B ⊂ A_w containing m ≥ |A_w|/M elements such that for every distinct t, t′ ∈ B we have |u_tp − u_{t′}p| > 6 · 10^3δ. It follows from the previous discussion that m ≤ λ(U)/δ. Consequently, |A_w| ≤ Mm ≤ 2065Nλ(U)/δ.
Lemma 6.10. -The cardinality of W is bounded from below as follows:
Proof. -Recall that w ∈ A_w for every w ∈ T. Thus, by construction, the collection of sets {A_w}_{w∈W} covers T. We have seen in Lemma 6.9 that the cardinality of each of them is at most 2065Nλ(U)/δ. Hence the result.
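The extraction of a sparse subset B used in the proof of Lemma 6.9 is a standard packing argument: if every element has at most M elements within a given threshold (itself included), then a greedy pass keeps at least a proportion 1/M of the elements, pairwise farther apart than the threshold. The Python sketch below illustrates this on toy data; the names `extract_separated` and `max_close_count`, and the use of integers with the usual distance, are assumptions made for the example.

```python
def extract_separated(points, dist, threshold):
    """Greedily keep points that are farther than threshold from all points kept so far."""
    kept = []
    for t in points:
        if all(dist(t, b) > threshold for b in kept):
            kept.append(t)
    return kept


def max_close_count(points, dist, threshold):
    """Largest number of points (the point itself included) within threshold of a given point."""
    return max(sum(1 for s in points if dist(t, s) <= threshold) for t in points)


def gap(x, y):
    """Usual distance on the integers, standing in for |u_t p - u_t' p|."""
    return abs(x - y)


A = [0, 1, 2, 10, 11, 20]
B = extract_separated(A, gap, 3)
M = max_close_count(A, gap, 3)
print(B, M)
# Every discarded point is close to some kept point, hence len(A) <= M * len(B).
```

In the paper the bound M on the number of close elements comes from acylindricity; in the sketch it is simply computed from the data.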
The previous lemma completes the proof of Proposition 6.6.

Sets of concentrated energy
We still assume here that the action of G on X is (N, κ)-acylindrical, with κ > 50 · 10^3δ. Let U ⊂ G be a finite subset and p ∈ X a base point. In this section we also assume that U has concentrated energy (at p), that is, there exists U_1 ⊂ U with |U_1| ≥ |U|/100 such that |up − p| ≤ 2κ, for all u ∈ U_1. The goal of the section is to prove the following statement.
(1) either |U| ≤ 100M;
(2) or there exists v ∈ U and a subset W ⊂ Uv such that W is 25κ-strongly reduced and |W| ≥ |U|/100M − 1.
Proof. -We assume that |U| > 100M, so that |U_1| > M. The proof follows the exact same ideas as Lemmas 5.2 and 5.3 of [DS20]. Since the energy λ(U, p) at p is larger than 100κ, there exists v ∈ U satisfying |vp − p| > 100κ. For every u ∈ U_1, we let Note that by the triangle inequality, |uvp − p| > |vp − p| − |up − p| ≥ 98κ, for every u ∈ U_1. Hence u ∈ B_u. Let us fix first an element u ∈ U_1. We claim that the cardinality of B_u is at most M. Recall that X is a length space, hence there is a point m in X such that |p − m| = 21κ − δ and (p, vp)_m ≤ δ. Let u′ ∈ B_u. The element u′u^{−1} moves the point up by at most 4κ. We now show that u′u^{−1} moves um by at most 4κ + 8δ. By Lemma 2.2(1) we have On the one hand, we have On the other hand, the triangle inequality yields If we plug in the last two inequalities in (14) we get (p, uvp)_{um} ≤ 2δ. Now observe that Similarly (p, u′vp)_{u′m} ≤ 2δ and |p − u′m| − |p − m| ≤ 2κ. In particular both |p − um| and |p − u′m| are at most (uvp, u′vp)_p. By Lemma 2.2(3) we have |um − u′m| ≤ max{|p − um| − |p − u′m| + 4δ, 0} + 4δ ≤ 4κ + 8δ, which corresponds to our announcement.
Note that the points up and um, which are "hardly" moved by u′u^{−1}, are far away. More precisely |up − um| = |p − m| = 21κ − δ.
Recall that M = 2κN/δ. Using acylindricity (see (4) with r = 4κ + 8δ), we get that B_u contains at most M elements, which completes the proof of our claim. Recall that u ∈ B_u, for every u ∈ U_1. We now fix a maximal subset U_2 ⊂ U_1 such that for every u ∈ U_1, any two distinct u_1, u_2 ∈ U_2 never belong to the same subset B_u. The cardinality of U_2 satisfies |U_2| ≥ |U_1|/M. Indeed by maximality of U_2, the set U_1 is covered by the collection (B_u)_{u∈U_2}. We claim that there is at most one element u ∈ U_2 such that (v^{−1}p, uvp)_p > 23κ. Assume on the contrary that it is not the case. We can find two distinct elements u, u′ ∈ U_2 such that Thus u′ belongs to B_u, which contradicts the definition of U_2. Recall that |U_1| > M, hence U_2 contains at least 2 elements. We then define U_3 from U_2 by removing if necessary the element u ∈ U_2 such that (v^{−1}p, uvp)_p > 23κ. Note that We now let W = U_3v. We are going to prove that W is 25κ-strongly reduced. Note first that |wp − p| ≥ |vp − p| − 2κ > 98κ > 50κ + 300δ for every w ∈ W. Let w = uv and w′ = u′v be two elements in W. It follows from the triangle inequality that By construction of U_3, no element w ∈ W has a large Gromov product with v^{−1}. Hence (w^{−1}p, w′p)_p ≤ 25κ. Thus the set W is 25κ-reduced. By choice of U_2 we also have (wp, w′p)_p < 23κ − δ for every distinct w, w′ ∈ W. Recall that Consequently, W is 25κ-strongly reduced.

Growth in groups acting on hyperbolic spaces
As a warm-up for the study of Burnside groups we first prove the following statement.
Proof of Theorem 8.1. -Let U ⊂ G be a finite subset such that λ(U ) > 100κ.
Choice of the base-point. -Let q be a point almost-minimizing the ℓ∞-energy of U. We now fix the base-point p to be a quasi-center for U. By Proposition 5.3, we can assume that |p − q| ≤ λ(U).
Case 1: diffuse energy. -Let us first assume that U is of diffuse energy at p. That is, there is a subset U′ ⊂ U such that |U′| ≥ 99|U|/100 and such that for all u′ ∈ U′ we have |u′p − p| > 2κ. Then, by Proposition 6.6, there exists v ∈ U and a subset W ⊂ Uv such that W is α-strongly reduced (with α = 1002δ) and whose cardinality satisfies |W| ≥ (1/(10^6 N)) (δ/λ(U)) |U|.
Case 2: concentrated energy. -Otherwise U is of concentrated energy at p. Indeed, there is a subset U′ ⊂ U of cardinality |U′| ≥ |U|/100 such that |u′p − p| ≤ 2κ, for all u′ ∈ U′. Recall that λ(U) > 100κ. Assume that |U| > 400κN/δ. By Proposition 7.1, there exists v ∈ U and a subset W ⊂ Uv such that W is α-strongly reduced (with α = 25κ) and whose cardinality satisfies |W| ≥ |U|/100M − 1, where M = 2κN/δ. This completes the proof of Theorem 8.1.
Corollary 8.2. -Let δ > 0, κ ≥ 50 · 10^3δ, and N > 0. Assume that the group G acts (N, κ)-acylindrically on a δ-hyperbolic length space. For every finite U ⊂ G such that λ(U) > 100κ and for all integers r ≥ 0, we have
Proof. -Without loss of generality we can assume that |U| > 400κN/δ. Indeed otherwise the base of the exponential function on the right-hand side of the stated inequality is less than one, hence the statement is void. According to Theorem 8.1, there exists v ∈ U and a subset W ⊂ Uv such that W is α-strongly reduced and Let s ≥ 0 be an integer. On the one hand, Recall that W is contained in Uv and freely generates a free sub-semi-group of G by Lemma 3.2. It follows that for every integer r ≥ 0,
We now combine Theorem 8.1 with our estimates on the growth of aperiodic words, see Proposition 3.11. If we use Proposition 4.2 to compare the notion of aperiodic words and power-free elements we obtain the following useful growth estimate.
(2) There is v ∈ U with the following property. For every r > 0 and m ≥ m_2, denote by K(m, r) the set of all m-power-free elements in (Uv)^r. Then,
Proof. -Let U ⊂ G be a finite subset such that λ(U) > 100κ. Without loss of generality we can assume that |U| > max{4κN/δ, 4 · 10^6Nλ(U)/δ}. By Theorem 8.1 there exists v ∈ U and a subset W ⊂ Uv such that W is α-strongly reduced with α ≤ 25κ and |W| ≥ (1/(10^6 N)) (δ/λ(U)) |U|.
It follows from our choice that |W| ≥ 4 and λ(W) ≤ 2λ(U). Before moving on, let us recall some notations from Section 3. For every integer m, the set W^*_m stands for the collection of m-aperiodic words in W^*. In addition S(r) and B(r) are respectively the sphere and the ball of radius r in W^* (for the word metric with respect to W). In view of Proposition 3.11, there exists m_1 > 0, which only depends on δ, N, κ and λ_0 such that for every m ≥ m_1, for every r ≥ 0, we have, Let us now focus on the cardinality of spheres. As W is α-strongly reduced, it generates a free sub-semi-group (Lemma 3.2). Thus If we combine this inequality with (15) and the fact that |W|/4 ≥ 1, we obtain that

Small cancellation groups
In this section we recall the necessary background on small cancellation theory with a special attention on acylindricity, see Proposition 9.9. The presentation follows [Cou14] in content and notations.
9.1. Cones. -Let Y be a metric length space and let ρ > 0. The cone of radius ρ over Y is the set Z(Y) = Y × [0, ρ]/∼, where ∼ is the equivalence relation which identifies all the points of the form (y, 0) for y ∈ Y. If x ∈ Z(Y), we write x = (y, r) to say that (y, r) represents x. We let v = (y, 0) be the apex of the cone. If y, y′ are in Y, we let θ(y, y′) = min{π, |y − y′|/ sinh ρ} be their angle at v. There is a metric on Z(Y) that is characterized as follows, see [BH99, Ch. I.5]. Let x = (y, r) and x′ = (y′, r′) in Z(Y). Then cosh |x − x′| = cosh r cosh r′ − sinh r sinh r′ cos θ(y, y′). We let ι : Y → Z(Y) be the embedding defined as ι(y) = (y, ρ). The metric distortion of ι is controlled by a function µ : R_+ → [0, 2ρ] that is characterized as follows: for every t ∈ R_+, cosh µ(t) = cosh^2 ρ − sinh^2 ρ cos(min{π, t/ sinh ρ}).
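The two formulas above are easy to evaluate numerically. The following Python sketch computes the cone distance and the distortion function µ, taking as input the radial coordinates of the two points and the distance |y − y′| in Y. The function names and the clamping of the argument of `acosh` to guard against rounding errors are choices made for this illustration, not part of the paper.

```python
import math


def angle(d, rho):
    """Angle at the apex between two points of Y at distance d."""
    return min(math.pi, d / math.sinh(rho))


def cone_distance(r1, r2, d, rho):
    """Distance in Z(Y) between (y, r1) and (y', r2), where d = |y - y'| in Y."""
    c = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(angle(d, rho)))
    # c >= 1 in exact arithmetic; the clamp only absorbs floating-point error
    return math.acosh(max(1.0, c))


def mu(t, rho):
    """Distortion function: cosh(mu(t)) = cosh^2(rho) - sinh^2(rho) * cos(min(pi, t / sinh(rho)))."""
    c = math.cosh(rho) ** 2 - math.sinh(rho) ** 2 * math.cos(min(math.pi, t / math.sinh(rho)))
    return math.acosh(max(1.0, c))


rho = 2.0
print(cone_distance(rho, rho, 1.5, rho), mu(1.5, rho))  # two points on the base of the cone
print(mu(1e9, rho), 2 * rho)                            # far apart in Y: the path through the apex
print(cone_distance(0.0, 1.3, 0.7, rho))                # distance from the apex is the radial coordinate
```

Two points of the base (y, ρ) and (y′, ρ) are at distance µ(|y − y′|) in the cone, and µ saturates at 2ρ once |y − y′| ≥ π sinh ρ, since then the shortest path goes through the apex.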

It turns out that, for all y, y′ ∈ Y, we have |ι(y) − ι(y′)| = µ(|y − y′|). (16)
Let us mention some properties of µ for later use. Let H be a group that acts by isometries on Y. Then H acts by isometries on Z(Y) by hx = (hy, r). We note that H fixes the apex of the cone.
9.2. The cone off space. -From now, we assume that X is a proper, geodesic, δ-hyperbolic space, where δ > 0. We fix a parameter ρ > 0, whose value will be made precise later. In addition, we consider a group G that acts properly co-compactly by isometries on X. We assume that this action is (N, κ)-acylindrical.
We let Q be a collection of pairs (H, Y ) such that Y is closed strongly-quasi-convex in X and H is a subgroup of Stab(Y ) acting co-compactly on Y . Suppose that Q is closed under the action of G given by the rule g(H, Y ) = (gHg −1 , gY ). In addition we assume that Q/G is finite. Furthermore, we let Observe that if ∆(Q) is finite, then H is normal in Stab(Y ), for every (H, Y ) ∈ Q.
Let (H, Y) ∈ Q. We denote by | · |_Y the length metric on Y induced by the restriction of | · | to Y. As Y is strongly quasi-convex, for all y, y′ ∈ Y, |y − y′|_X ≤ |y − y′|_Y ≤ |y − y′|_X + 8δ.
We write Z(Y ) for the cone of radius ρ over the metric space (Y, | · | Y ).
We let the cone-off space Ẋ = Ẋ(Y, ρ) be the space obtained by gluing, for each pair (H, Y) ∈ Q, the cone Z(Y) on Y along the natural embedding ι : Y → Z(Y). We let V denote the set of apices of Ẋ. We endow Ẋ with the largest metric | · |_Ẋ such that the map X → Ẋ and the maps Z(Y) → Ẋ are 1-Lipschitz, see [Cou14, §5.1]. It has the following properties.
We recall that µ is the map that controls the distortion of the embedding ι of Y in its cone, see (16). It also controls the distortion of the map X → Ẋ.
The action of G on X then extends to an action by isometries on Ẋ: given any g ∈ G, a point x = (y, r) in Z(Y) is sent to the point gx = (gy, r) in Z(gY). We denote by K the normal subgroup generated by the subgroups H such that (H, Y) ∈ Q.
9.3. The quotient space. -We let X̄ = Ẋ/K and Ḡ = G/K. We denote by ζ the projection of Ẋ onto X̄ and write x̄ for ζ(x) for short. Furthermore, we denote by V̄ the image in X̄ of the apices V. We consider X̄ as a metric space equipped with the quotient metric, that is for every We note that the action of G on Ẋ induces an action by isometries of Ḡ on X̄. The following theorem summarizes Proposition 3.15 and Theorem 6.11 of [Cou14].
(3) Let (H, Y) ∈ Q. If v ∈ V stands for the apex of the cone Z(Y), then the projection from G onto Ḡ induces an isomorphism from Stab(Y)/H onto Stab(v̄).
We use point (2) of Theorem 9.4 to compare the local geometry ofẊ and X. To compare the global geometry, we use the following proposition. If, for all v ∈ V , we have Z ∩ B(v, ρ/5 + d + 1210δ) = ∅, then there is a pre-image Z ⊂Ẋ such that the projection ζ induces an isometry from Z onto Z.
In addition, if S ⊂ G such that S Z ⊆ Z +d , then there is a pre-image S ⊂ G such that for every g ∈ S, z, z ∈ Z , we have |g z − z | = |gz − z |Ẋ .
9.4. Group action on X. -We collect some properties of the action of G.
-If v̄ ∈ V̄ and ḡ ∈ Ḡ \ Stab(v̄), then for every x̄ ∈ X̄ we have |ḡx̄ − x̄| ≥ 2(ρ − |x̄ − v̄|).
In combination with assertion (2) of Theorem 9.4, the previous lemma implies that local properties of the action are often inherited from the action of G on the cone-off space. For example, if F is an elliptic subgroup of G, then either F ⊆ Stab(v) for some v ∈ V or it is the image of an elliptic subgroup of G, see [Cou14,Prop. 6.12].
There is a lower bound on the injectivity radius of the action on X, and an upper bound on the acylindricity parameter.
We recall that L_0 is the number fixed in Section 2.2 using stability of quasi-geodesics. Note that the proposition actually does not require that finite subgroups of G have odd order. This assumption in [Cou14, Prop. 6.15] was mainly made to simplify the overall exposition in this paper. The error of the order of π sinh(2L_0δ) in the above estimates is reminiscent of the distortion of the embedding of X into Ẋ, measured by the map µ, see Proposition 9.1.
9.5. Acylindricity. -Let us assume that all elementary subgroups of G are cyclic (finite or infinite). In particular, it follows that ν(G, X) = 1, see for instance [Cou14, Lem. 2.40]. Moreover, we assume that for every pair (H, Y) ∈ Q, there is a primitive hyperbolic element h ∈ G and a number n such that H = ⟨h^n⟩ and Y is the cylinder C_H of H.
Remark 9.10. -It is already known that if G acts acylindrically on X, then so does Ḡ on X̄, see Dahmani-Guirardel-Osin [DGO17, Prop. 2.17, 5.33]. However in their proof κ̄ is much larger than ρ. For our purpose we need a sharper control on the acylindricity parameters. With our statement, we will be able to ensure that κ̄ ≪ ρ. Later we will use this statement during an induction process for which we also need to control uniformly the value of N̄. Unlike in [DGO17], if N is very large, our estimates tell us that N̄ ≤ N.
and let us assume that diam Z κ. We are going to prove that S contains at most N elements. We distinguish two cases: either S fixes an apex v ∈ V or not.
Recall that κ > 100δ. Denote by z the point on the geodesic [v, x] at distance 100δ from v, so that z ∈ B(v, ρ/2). Since Z is 10δ-quasi-convex, z lies in the 10δ-neighborhood of Z. In particular, for all s ∈ S, we have |sz − z| ≤ 120δ.
Let v be a pre-image of v and z a pre-image of z in the ball B(v, ρ/2). For every s ∈ S, we choose a pre-image s ∈ G such that |sz − z|Ẋ 120δ and write S for the set of all pre-images obtained in this way. Observe that by the triangle inequality, |sv − v|Ẋ ρ + 120δ, for every s ∈ S. However any two distinct apices inẊ are at a distance at least 2ρ. Thus S is contained in Stab(v). If (H, Y ) ∈ Q is such that v is the apex of the cone Z(Y ), then, by Lemma 9.
Let y be a radial projection of z on Y. By the very definition of the metric on Z(Y), we get that |sy − y| < π sinh ρ. Recall that every elementary subgroup is cyclic, in particular so is Stab(Y). Consequently, the number of elements g ∈ Stab(Y) such that |gy − y| ≤ r is linear in r. More precisely, using Lemma 2.15, we have which yields the claim.
Lemma 9.12. -If S does not stabilize any v ∈ V, then |S| ≤ N.
Suppose first that this subgroup E is loxodromic. It is infinite cyclic by assumption. Recall that the translation length of any element in S is at most d. Hence, as previously we get Suppose now that E is an elliptic subgroup. In particular, the set Fix(S, 14δ) ⊂ X is non-empty, and by Lemma 2.6, Fix(S, d) is contained in the d/2-neighborhood of Fix(S, 14δ). In particular the diameter of Fix(S, 14δ) is larger than κ − 200δ − d, hence larger than κ. Consequently, by acylindricity, |S| ≤ N.
This completes the proof of Proposition 9.9.
J.É.P. -M., 2022, tome 9
9.6. ℓ∞-energy. -In this section we compare the ℓ∞-energy of a finite subset U ⊂ G and that of its image Ū ⊂ Ḡ.
Proposition 9.13. -Let Ū ⊂ Ḡ be a finite set such that λ(Ū) ≤ ρ/5. If, for all v̄ ∈ V̄, the set Ū is not contained in Stab(v̄), then there is a pre-image U ⊂ G of Ū of energy λ(U) ≤ π sinh λ(Ū).

Product set growth in Burnside groups of odd exponent
We finally prove Theorem 1.2.
10.1. The induction step. -We will use the following.
Let n_1 ≥ n_0 and n ≥ n_1 be an odd integer. Let G act properly co-compactly by isometries on a proper geodesic δ_1-hyperbolic space X such that (1) the elementary subgroups of G are cyclic or finite of odd order n, (2) A(G, X) ≤ A_0 and τ(G, X) ≥ ρ_0L_0δ_1/4n_1, and (3) the action of G is (N, A_0)-acylindrical, for some integer N.
Let P be the set of primitive hyperbolic elements h of translation length ‖h‖ ≤ L_0δ_1. Let K be the normal closure of the set {h^n | h ∈ P} in G.
Then there is a proper geodesic δ_1-hyperbolic space X̄ on which Ḡ = G/K acts properly co-compactly by isometries. Moreover, • (1) and (2) hold for the action of Ḡ on X̄;
Remark 10.2. -Note that Assumptions (2) and (3) are somewhat redundant. Indeed, if the action of G on X is (N, κ)-acylindrical, then the parameters A(G, X) and τ(G, X) can be estimated in terms of δ, N and κ only. However, we chose to keep them both, to make it easier to apply existing results in the literature.
Proof. -This is essentially [Cou14,Prop. 7.1]. The only additional observation is point (3). For details of the proof, we refer the reader to [Cou14]. Here, we only give a rough idea of the proof and fix some notation for later use.
Without loss of generality we can assume that δ 0 , ∆ 0 δ 1 while ρ 0 L 0 δ 1 . In particular A 0 ρ 0 /500. Following [Cou14, pp. 319], we define a rescaling constant as follows. Let We note for later use that if ρ 0 is sufficiently large (which we assume here) we have ε n 1/ √ n, for every n > 0. We then choose n 0 such that for all n n 0 , the following holds These are the same conditions as in [Cou14,p. 319] (in this reference, ε is denoted by λ). We now fix n 1 n 0 and an odd integer n n 1 . For simplicity we let ε = ε n1 . Moreover, let Q = h n , C E(h) | h ∈ P .
As explained in [Cou14, Lem. 7.2], the small cancellation hypotheses needed to apply Theorem 9.4 are satisfied by Q for the action of G on εX. We let Ḡ and X̄ be as in Section 9.3 (applied to G acting on εX). Observe, for later use, that the map is ε-Lipschitz. Assertions (1) and (2) follow from Lemmas 7.3 and 7.4 in [Cou14]. By Proposition 9.9 the action of Ḡ on X̄ is (N̄, κ̄)-acylindrical where N̄ ≤ max{N, 3π sinh ρ/τ(G, εX) + 1} and κ̄ = max{A(G, εX), εA_0} + 5π sinh(150δ_1).
It follows from the definition of ε and our hypothesis on τ(G, X) that N̄ ≤ max{N, n_1}. On the other hand by (18) we have Hence the action of Ḡ on X̄ is (N̄, A_0)-acylindrical as we announced. Consider now a subset Ū of Ḡ such that λ(Ū) ≤ ρ_0/5 and Ū does not generate a finite subgroup. Hence, applying Proposition 9.13, we see that there exists a pre-image U ⊂ G of Ū such that the ℓ∞-energy of U for the action of G on εX is bounded above by π sinh λ(Ū). Thus, for the action of G on X, we obtain that This is the lifting property stated at the end of Proposition 10.1.
Assume now that G is a non-elementary, torsion-free hyperbolic group. Proposition 10.1 can be used as the induction step to build from G a sequence of hyperbolic groups (G_i) that converges to the infinite periodic quotient G/G^n, provided n is a sufficiently large odd exponent. For our purpose, we need a sufficient condition to detect whenever an element g ∈ G has a trivial image in G/G^n. This is the goal of the next statement, see [Cou18a, Th. 4.13]. The result is reminiscent of the key argument used by Ol'shanskiȋ in [Ol'91, §10]. Recall that the definition of containing a (large) power (Definition 4.1) involves the choice of a basepoint p ∈ X.
Theorem 10.3. -Let G be a non-elementary torsion-free group acting properly co-compactly by isometries on a hyperbolic geodesic space X. We fix a basepoint p ∈ X. There are n_0 and ξ such that for all odd integers n ≥ n_0 the following holds. If g_1 and g_2 are two elements of G whose images in G/G^n coincide, then one of them contains a (n/2 − ξ)-power.
Here, we need a stronger result. Indeed we will have to apply this criterion for any group (G i ) approximating G/G n . In particular we need to make sure that the critical exponent n 0 appearing in Theorem 10.3 does not depend on i. For this reason, we use instead the following statement.
Let n_1 ≥ n_0 and set ξ = n_1 + 1. Fix an odd integer n ≥ max{100, 50n_1}. Let G be a group acting properly, co-compactly by isometries on a proper, geodesic, δ_1-hyperbolic space X with a basepoint p ∈ X, such that (1) the elementary subgroups of G are cyclic or finite of odd order n, (2) A(G, X) ≤ A_0 and τ(G, X) ≥ ρ_0L_0δ_1/4n_1.
If g 1 and g 2 are two elements of G whose images in G/G n coincide, then one of them contains a (n/2 − ξ)-power.
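To get a feel for the constants in this statement, here is a purely illustrative computation. It reads the hypothesis on the exponent as $n \geq \max\{100, 50 n_1\}$, and the value $n_1 = 3$ in the second display is an assumption chosen for readability, not a value taken from the text (the actual $n_0$ given by Proposition 10.1 is not made explicit here).

```latex
% General observation: the power threshold n/2 - xi is always positive.
\[
  n \geq 50 n_1
  \quad\Longrightarrow\quad
  \frac{n}{2} - \xi \;\geq\; 25 n_1 - (n_1 + 1) \;=\; 24 n_1 - 1 \;>\; 0 .
\]
% Hypothetical instance (n_1 = 3 is an assumed value, for illustration only).
\[
  n_1 = 3
  \;\Longrightarrow\;
  \xi = 4, \qquad
  \max\{100, 50 n_1\} = 150, \qquad
  n = 151 \text{ (smallest admissible odd exponent)}, \qquad
  \frac{n}{2} - \xi = 71.5 .
\]
```

In particular, the size of the power that must appear grows linearly with the exponent $n$, while the correction $\xi$ depends only on $n_1$.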
Remark 10.5. -The "novelty" of Theorem 10.4 compared to Theorem 10.3 is that the critical exponent $n_0$ does not depend on $G$ but only on the parameters of the action of $G$ on $X$ (acylindricity, injectivity radius, etc.). Note that the critical exponent given by Ol'shanskiȋ in [Ol'91] only depends on the hyperbolicity constant of the Cayley graph of $G$. However, this parameter will explode along the sequence $(G_i)$. Thus we cannot formally apply this result. Although it is certainly possible to adapt Ol'shanskiȋ's method, we rely here on the material of [Cou18a].
Sketch of proof. -The arguments follow verbatim those of [Cou18a, §4]. Observe first that the parameters $\delta_1$, $L_0$, $\rho_0$, $A_0$ and $n_0$ in [Cou18a, p. 797] are chosen in a similar way as in the proof of Proposition 10.1 (note that the rescaling parameter that we denote by $\varepsilon_n$ is called $\lambda_n$ there). Once $n_1 \geq n_0$ has been fixed, we set, exactly as in [Cou18a, p. 797], $\xi = n_1 + 1$ and $n_2 = \max\{100, 50 n_1\}$. We now fix an odd integer $n \geq n_2$. At this point in the proof of [Cou18a] one chooses a non-elementary torsion-free group $G$ acting properly co-compactly on a hyperbolic space $X$ with a basepoint $p \in X$. Note in particular that the basepoint $p$ is chosen after fixing all the other parameters. Next one uses an analogue of Proposition 10.1 to build a sequence of hyperbolic groups $(G_i)$ converging to $G/G^n$. The final statement, that is Theorem 10.3, is then proved using an induction on $i$, see [Cou18a, Prop. 4.6].
Observe that the fact that $G$ is torsion-free is not necessary here. We only need that the initial group $G$ satisfies the induction hypothesis, that is: (1) $X$ is a geodesic $\delta_1$-hyperbolic space on which $G$ acts properly co-compactly by isometries.
These are exactly the assumptions stated in Theorem 10.4. In particular, we can build as in [Cou18a] a sequence of hyperbolic groups $(G_i)$ converging to $G/G^n$. The theorem is proved using an induction on $i$ just as in [Cou18a]. Actually the proof is even easier, since we only need a sufficient condition to detect elements of $G$ which are not trivial in $G/G^n$, while [Cou18a] provides a necessary and sufficient condition for this property.
10.2. The approximating sequence. -Let $G$ be a non-elementary torsion-free hyperbolic group. The periodic quotient $G/G^n$ is the direct limit of a sequence of infinite hyperbolic groups $G_i$ that can be constructed recursively as follows. We let $\delta_1$, $\rho_0$, $L_0$, $n_0$, and $A_0 \geq 50 \cdot 10^3 \delta_1$ be the parameters given by Proposition 10.1. Let $G_0 = G$ and let $X_0$ be its Cayley graph. Up to rescaling $X_0$, we can assume that $X_0$ is a $\delta_1$-hyperbolic geodesic metric space and $A(G_0, X_0) \leq A_0$. We choose $n_1 \geq n_0$ such that $\tau(G_0, X_0) \geq \rho_0 L_0 \delta_1 / (4 n_1)$.
Let $n \geq n_2$ be an odd integer, where $n_2 = \max\{100, 50 n_1\}$ as above. It follows from our choices that the assumptions of Proposition 10.1 are then satisfied for the action of $G_0$ on $X_0$.
Let us suppose that $G_i$ is already given, and acts on a $\delta_1$-hyperbolic space $X_i$ such that the assumptions of Proposition 10.1 are satisfied. Then $G_{i+1} = \bar G_i$ and $X_{i+1} = \bar X_i$ are given by Proposition 10.1. In particular, the action of $G_{i+1}$ on $X_{i+1}$ is $(N', A_0)$-acylindrical, with $N' = \max\{N, n_1\}$. However, we chose $N \geq n_1$. Hence the action of $G_{i+1}$ on $X_{i+1}$ is $(N, A_0)$-acylindrical. It follows from the construction that $G/G^n$ is the direct limit of the sequence $(G_i)$. Compare with [Cou14, Th. 7.7].
The proof now goes as in Corollary 8.2. Let $s \geq 0$ be an integer. On the one hand, $(V\pi(v))^s$ is contained in $V^{2s}$, hence $|V^{2s}| \geq |(V\pi(v))^s|$. On the other hand, $(V\pi(v))^s V$ is contained in $V^{2s+1}$. Right multiplication by $\pi(v)$ induces a bijection from $G/G^n$ to itself. Hence $|V^{2s+1}| \geq |(V\pi(v))^s V| = |(V\pi(v))^{s+1}|$. It follows that $|V^r| \geq (a|V|)^{[(r+1)/2]}$ for every integer $r \geq 0$.
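The passage from the two containments to the final bound is a parity argument, which can be sketched as follows. The sketch assumes that the preceding step (as in Corollary 8.2, not reproduced here) provides $|(V\pi(v))^s| \geq (a|V|)^s$ for every $s \geq 0$; this bound is an assumption of the sketch, and $[x]$ denotes the integer part of $x$.

```latex
% Requires amsmath. Case distinction on the parity of r.
\[
\begin{aligned}
  r = 2s: &\quad
    |V^{2s}| \;\geq\; |(V\pi(v))^{s}| \;\geq\; (a|V|)^{s}
    \;=\; (a|V|)^{[(2s+1)/2]}, \\
  r = 2s+1: &\quad
    |V^{2s+1}| \;\geq\; |(V\pi(v))^{s}V| \;=\; |(V\pi(v))^{s+1}|
    \;\geq\; (a|V|)^{s+1} \;=\; (a|V|)^{[(2s+2)/2]} .
\end{aligned}
\]
```

In both cases the exponent equals $[(r+1)/2]$, which is why the estimate is stated with this integer part rather than with $r/2$.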
This completes the proof of Theorem 1.2.
Proof of Corollary 1.3. -Let $n_0 > 0$ and $a > 0$ be the constants given by Theorem 1.2. We fix $N$ such that $a^3 N > 1$. Let $n \geq n_0$. Let us take a subset $V \subset G/G^n$ that is not contained in a finite subgroup and that contains the identity. Then, for all $k \geq 1$, we have $V^{k-1} \subseteq V^k$. As $V$ is not contained in a finite subgroup, this implies that $|V^k| > |V^{k-1}|$, so that $|V^N| > N$. Thus $a^3 |V^N| > 1$. We now apply Theorem 1.2 twice, first with the set $V^{3N}$, and second with $V^N$. For every integer $r \geq 0$, we have $|V^{3Nr}| \geq (a|V^{3N}|)^{[(r+1)/2]} \geq (a^3 |V^N|^2)^{[(r+1)/2]} \geq |V|^{[(r+1)/2]}$. Taking the logarithm and passing to the limit we get $h(V) \geq \frac{1}{6N} \ln(|V|)$.
Since V does not lie in a cyclic subgroup and contains the identity, it has at least three elements, whence the second inequality in our statement.
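The two applications of Theorem 1.2 and the limit step in the proof above can be made explicit. The sketch below assumes that $h(V)$ denotes the exponential growth rate $\lim_{k \to \infty} \frac{1}{k} \ln |V^k|$ (the definition is not restated in this section); since this limit exists by submultiplicativity of $k \mapsto |V^k|$, it can be computed along the subsequence $k = 3Nr$.

```latex
% Requires amsmath. [x] denotes the integer part of x.
\[
\begin{aligned}
  |V^{3N}| &= |(V^{N})^{3}| \;\geq\; (a|V^{N}|)^{[(3+1)/2]} = a^{2}|V^{N}|^{2}, \\
  a\,|V^{3N}| &\geq \bigl(a^{3}|V^{N}|\bigr)\,|V^{N}| \;>\; |V^{N}| \;\geq\; |V|, \\
  |V^{3Nr}| &= |(V^{3N})^{r}| \;\geq\; \bigl(a|V^{3N}|\bigr)^{[(r+1)/2]}
             \;\geq\; |V|^{[(r+1)/2]}, \\
  h(V) &= \lim_{r \to \infty} \frac{1}{3Nr}\,\ln|V^{3Nr}|
        \;\geq\; \lim_{r \to \infty} \frac{[(r+1)/2]}{3Nr}\,\ln|V|
        \;=\; \frac{1}{6N}\,\ln|V| .
\end{aligned}
\]
```

The choice $a^3 N > 1$ is exactly what absorbs the constant $a$: the first application of Theorem 1.2 costs a factor $a^2$, the second a factor $a$, and $|V^N| > N$ compensates for both.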