key: cord-0043247-vjs0gvb5
authors: Dick, Jeffery; Hutchinson, Laura K.; Mercaş, Robert; Reidenbach, Daniel
title: Reducing the Ambiguity of Parikh Matrices
date: 2020-01-07
journal: Language and Automata Theory and Applications
DOI: 10.1007/978-3-030-40608-0_28
sha: 4b24c945fc0da202421897c48441676546b4a67e
doc_id: 43247
cord_uid: vjs0gvb5

The Parikh matrix mapping allows us to describe words using matrices. Although compact, this description comes with a level of ambiguity since a single matrix may describe multiple words. This work looks at how considering the Parikh matrices of various transformations of a given word can decrease that ambiguity. More specifically, for any word, we study the Parikh matrix of its Lyndon conjugate as well as that of its projection to a smaller alphabet. Our results demonstrate that ambiguity can often be reduced using these concepts, and we give conditions on when they succeed.

An approach for a more compact representation of data can be provided by histograms, which are also a well established statistical tool used in a wide range of applications. The concept of a Parikh vector [15] represents a type of such histograms that is specific to the analysis of sequences of symbols (or: words), considering the number of occurrences of each letter that exists in a word.

Parikh vectors can be easily computed and are guaranteed to be logarithmic in the size of the word they represent, but they are ambiguous; that is, multiple words typically share the same Parikh vector. Following this, in [14] the authors look at a refinement of the vector notion which is meant to reduce this ambiguity, and introduce an extension for it in the form of a Parikh matrix. A Parikh matrix not only contains the Parikh vector of the word, but also information regarding some of the word's (scattered) subwords. Such a matrix has the same asymptotic compactness as a Parikh vector and is associated to a significantly smaller number of words. However, it does not normally remove ambiguity entirely.

The bulk of the work done on the Parikh matrix mapping concerns the ambiguity that Parikh matrices exhibit. A lot of effort is spent on identifying an alternative to the Parikh matrix concept that would make a mapping from a word injective, or less ambiguous in general [1, 2, 8–11, 18]. These include even more refined versions of the matrices by inclusion of polynomials, various extensions on the mappings, or both. For Parikh matrices explicitly, due to the difficulty arising from this ambiguity, the primary focus was on investigating this property on binary [4–7, 17] and ternary [3, 13, 16, 19] alphabets, leaving alphabets of size greater than three relatively unexplored.

In terms of reducing the ambiguity of a word, the investigation was based on either gathering more information about the specific word by altering the order of the alphabet, known as the dual order [6, 14], or by considering the reverse image of the word [6]. Hence an under-studied aspect that may reduce the ambiguity of a matrix concerns the information acquired by altering the word itself, or considering other alterations of the alphabet. In this work we present and investigate two different methods that reduce the ambiguity of the original Parikh matrices in the form of P-Parikh matrices and L-Parikh matrices.

The first of the two transformations, the P-Parikh matrix mapping, considers the Parikh matrices associated to a projection morphism of the initial word, where the considered alphabet is reduced to the subset of the alphabet used within the defined transformation. These represent a particular case of the extended mapping presented in [18], where we only consider a subset of the original alphabet. For example, consider the words abcaabaac and abacabcaa. It is easy to see that both share the same number of letters, and of subwords ab, bc and abc, respectively, making their Parikh matrices equal and therefore ambiguous. The P-Parikh matrices associated to them with respect to {a, c} consider the number of subwords ac, which is 6 in the former, but only 5 in the latter of the words. Hence, there exist P-Parikh matrices not shared by the words.

We show that, using P-Parikh matrices, we can reduce the ambiguity of the vast majority of words. We also explore when P-Parikh matrices do not reduce ambiguity, as well as provide some insight into the types of words that cannot be uniquely described by a P-Parikh matrix.

However, since P-Parikh matrices are defined for a subset of the initial alphabet, they prove useless when dealing with binary sequences. We therefore consider an alternative transformation of words: the Lyndon conjugate, first introduced in [20], which is defined as the lexicographically smallest circular rotation of a word. Lyndon conjugates were used previously as a tool for ambiguity reduction. In [17], the authors define the Lyndon image of a Parikh matrix as the lexicographically smallest word describing such a matrix. Hence every Parikh matrix has exactly one distinct Lyndon image, which therefore allows each Parikh matrix to be described uniquely. In the context of this paper, we use the Lyndon conjugate differently, i.e., we consider the Parikh matrix of the Lyndon conjugate of a word, and we call the resulting matrix the L-Parikh matrix of the original word.

Consider the Parikh matrix of the Lyndon conjugates of the two previously given words. Observe that aabaacabc has 7 occurrences of ab, whereas aaabacabc has 8, making their Parikh matrices different. Hence, the ambiguity of their Parikh matrix can be reduced using L-Parikh matrices.

While L-Parikh matrices are a useful concept for any alphabet size, we focus on the cases where they reduce ambiguity in the binary alphabet and show that this happens in most cases. We give specific conditions of when L-Parikh matrices do not help reduce the ambiguity of the given word, and investigate the words for which these criteria apply. This leads us to our main result of the paper, a characterisation of words whose ambiguity can be reduced using L-Parikh matrices.

We end this section with a brief breakdown of our paper. In Sect. 2 we present some basic definitions and notions. Section 3 examines the first of the two notions we introduce, the P-Parikh matrix, establishing conditions for when they can or cannot achieve ambiguity reduction. In Sect. 4, we study equivalent questions for L-Parikh matrices, largely focusing on binary alphabets in some cases. We end our paper with conclusions as well as directions for future work.

It is assumed the reader is familiar with the basics of combinatorics on words. If needed, [12] can be consulted. Throughout this paper, N refers to the set of natural numbers starting with 1.

We refer to a string of arbitrary letters as a word which is formed by concatenation of letters. The set of all letters used to create our words is called an alphabet. We represent an ordered alphabet as $\Sigma_k = \{a_1 < \cdots < a_k\}$, where $k \in \mathbb{N}$ is the size of the alphabet, and by convention $a_i$ is the $i$th letter in the Latin alphabet. Whenever the alphabet size is irrelevant or understood, we omit this from notation using only $\Sigma$. All alphabets referred to in this paper have an order imposed on them.

We define the concatenation of two words $u$ and $v$ as $uv$. The length of a word is the total number of not necessarily distinct letters it contains and the empty word of length zero is denoted $\varepsilon$. The Kleene star, denoted $^*$, is the operation that, once applied to a given alphabet, generates the set of all finite words that result from concatenating any words in that alphabet. Further, we denote the $i$th letter in a word $w$ as $w[i]$.

The reversal of a word, denoted rev, is defined as $\mathrm{rev}(w) = w[|w|]\, w[|w| - 1] \cdots w[1]$.

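We say that a factor $v$ is in $w$ if and only if $w$ can be written as $w = w_1 v w_2$, where $w_1, w_2 \in \Sigma^*$. We say that $u = u[1] u[2] \cdots u[m]$ is a (scattered) subword of $w$ if $w$ can be written as $w = v_0 u[1] v_1 u[2] \cdots u[m] v_m$, where $v_0, v_1, \ldots, v_m \in \Sigma^*$.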

We use $|v|_u$ to denote the number of distinct occurrences of $u$ as a subword in $v$.
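As a small illustration of the quantity $|v|_u$, the following sketch counts scattered-subword occurrences with a standard dynamic programme and re-checks the two words from the introduction. The function name `count_subword` is an illustrative choice, not notation from the paper.

```python
def count_subword(word: str, pattern: str) -> int:
    """Count the occurrences of `pattern` as a scattered subword of `word`."""
    # ways[j] = number of ways to match the first j letters of `pattern`
    # inside the part of `word` read so far.
    ways = [1] + [0] * len(pattern)
    for letter in word:
        # Right-to-left, so one letter of `word` extends a match only once.
        for j in range(len(pattern) - 1, -1, -1):
            if pattern[j] == letter:
                ways[j + 1] += ways[j]
    return ways[-1]


# The two words from the introduction agree on ab, bc and abc but not on ac.
for u in ("ab", "bc", "abc", "ac"):
    print(u, count_subword("abcaabaac", u), count_subword("abacabcaa", u))
```

The counts for ab, bc and abc coincide for both words, while the count for ac is 6 for the first word and 5 for the second, matching the discussion in the introduction.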

The Parikh vector [15] $\varphi$ associated with a word $w$ is obtained through a mapping $\varphi : \Sigma^* \to \mathbb{N}^k$, defined as $\varphi(w) = [|w|_{a_1}, |w|_{a_2}, \ldots, |w|_{a_k}]$. For a matrix $M$ of size $k \times k$, the $j$-diagonal is defined as all elements of $M$ that are in the position $M_{i,i+j}$ for $i = 1, \ldots, k - j$. A word is associated with a matrix, called its Parikh matrix, if the matrix is obtained from that word following the process detailed in the following explanatory definition. For a technical version of the definition we refer to [14].

Let $w \in \Sigma_k^*$. The Parikh matrix, denoted $\Psi(w)$, that $w$ is associated with has size $(k + 1) \times (k + 1)$. The diagonal of the matrix is populated with 1's and all elements below it are 0. The counts of all subwords of $w$ that consist of consecutive letters in $\Sigma_k$ and are of length $n$ are found on the $n$-diagonal, for $1 \le n \le k$.
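The explanatory definition above can be turned into a short sketch: entry $(i, j)$ with $i < j$ holds the number of occurrences of the subword formed by the consecutive letters $a_i \cdots a_{j-1}$. The helper names are illustrative, and the alphabet is passed as a string of letters in increasing order, which is an assumption of this sketch.

```python
def count_subword(word: str, pattern: str) -> int:
    """Count the occurrences of `pattern` as a scattered subword of `word`."""
    ways = [1] + [0] * len(pattern)
    for letter in word:
        for j in range(len(pattern) - 1, -1, -1):
            if pattern[j] == letter:
                ways[j + 1] += ways[j]
    return ways[-1]


def parikh_matrix(word: str, alphabet: str) -> list:
    """Parikh matrix of `word` over the ordered alphabet given as a string."""
    k = len(alphabet)
    matrix = [[0] * (k + 1) for _ in range(k + 1)]
    for i in range(k + 1):
        matrix[i][i] = 1  # 1's on the main diagonal, 0's below it
        for j in range(i + 1, k + 1):
            # alphabet[i:j] is a block of j - i consecutive letters, so this
            # entry lies on the (j - i)-diagonal.
            matrix[i][j] = count_subword(word, alphabet[i:j])
    return matrix


for row in parikh_matrix("abcaabaac", "abc"):
    print(row)
```

For the word abcaabaac this yields the first row 1, 5, 4, 5 (the counts of a, ab and abc), and the same matrix is obtained for abacabcaa.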

One notion we introduce in this paper relies on a change in alphabet size. As such, to emphasise the size $n$ of the alphabet used for a Parikh matrix, we write $\Psi_n(w)$. We say that a Parikh matrix describes a word if the word is associated to the matrix. Notice that due to the associativity of matrix multiplication, the Parikh matrix of a word can be constructed from the Parikh matrices of its factors. For a word $w = u_1 u_2$, we have $\Psi_n(w) = \Psi_n(u_1)\Psi_n(u_2)$. For the rest of this work we refine our notation for a Parikh matrix by removing the elements that do not depend on the associated word. By definition a Parikh matrix is an upper triangular matrix with 1's on the diagonal regardless of the word described. Removing this redundant part leaves us with a triangular structure that holds the same information as the original matrix. For a binary word $w$, the three remaining entries read, in order, $|w|_a$, $|w|_{ab}$ and $|w|_b$.
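The factorisation property $\Psi_n(u_1 u_2) = \Psi_n(u_1)\Psi_n(u_2)$ can be checked with a minimal sketch that builds the matrix letter by letter. It assumes the usual technical definition from the literature, in which the matrix of a single letter $a_q$ is the identity with one additional 1 in position $(q, q + 1)$; all function names are illustrative.

```python
def letter_matrix(letter: str, alphabet: str) -> list:
    """Identity matrix with one extra 1 directly right of the diagonal."""
    k = len(alphabet)
    matrix = [[int(r == c) for c in range(k + 1)] for r in range(k + 1)]
    q = alphabet.index(letter)
    matrix[q][q + 1] = 1
    return matrix


def mat_mul(x: list, y: list) -> list:
    """Plain product of two square integer matrices of equal size."""
    size = len(x)
    return [[sum(x[r][t] * y[t][c] for t in range(size)) for c in range(size)]
            for r in range(size)]


def parikh_matrix(word: str, alphabet: str) -> list:
    """Parikh matrix as the product of the matrices of the letters of `word`."""
    k = len(alphabet)
    result = [[int(r == c) for c in range(k + 1)] for r in range(k + 1)]
    for letter in word:
        result = mat_mul(result, letter_matrix(letter, alphabet))
    return result


left, right = "abca", "abaac"
product_of_factors = mat_mul(parikh_matrix(left, "abc"), parikh_matrix(right, "abc"))
print(product_of_factors == parikh_matrix(left + right, "abc"))
```

Building the matrix as a product makes the factorisation property immediate, at the cost of one matrix multiplication per letter.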

Two words $w$ and $w'$ are conjugates if we can write $w = uv$ and $w' = vu$. For a word $w$, we say that the conjugacy class of $w$, denoted $C(w)$, is the class of all of its possible conjugates. A conjugacy class is associated to a Parikh matrix if at least one word belonging to that class is associated to the matrix.

Example 2. The matrix 4 4 2 has only the words aabbaa, abaaba, baaaab associated to it. The words aabbaa and baaaab are members of the same conjugacy class, while abaaba belongs to a different conjugacy class. Hence this matrix has two conjugacy classes associated to it.
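Example 2 can be reproduced by brute force. The sketch below enumerates all binary words with a given triple $(|w|_a, |w|_{ab}, |w|_b)$ and groups them by their lexicographically smallest rotation, which serves as a canonical representative of the conjugacy class. The function names are illustrative and the enumeration is exponential in the word length, so it is only meant for short words.

```python
from itertools import product


def triple(word: str) -> tuple:
    """(|w|_a, |w|_ab, |w|_b) for a binary word over {a < b}."""
    a_seen = ab = 0
    for letter in word:
        if letter == "a":
            a_seen += 1
        else:
            ab += a_seen  # every earlier a pairs with this b
    return (a_seen, ab, len(word) - a_seen)


def smallest_rotation(word: str) -> str:
    """Lexicographically smallest conjugate of `word`."""
    return min(word[i:] + word[:i] for i in range(max(1, len(word))))


def words_with_triple(target: tuple) -> list:
    """All binary words whose Parikh matrix has the given three entries."""
    length = target[0] + target[2]
    candidates = ("".join(p) for p in product("ab", repeat=length))
    return [w for w in candidates if triple(w) == target]


words = words_with_triple((4, 4, 2))
classes = {smallest_rotation(w) for w in words}
print(words, len(classes))
```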

A Parikh matrix can be associated to multiple words, as seen above, although cases exist where a matrix describes a single word, e.g., aabb is the unique word associated to 2 4 2. We say that two words are amiable if they are associated to the same Parikh matrix. If two or more words are associated to a single Parikh matrix, we say that the matrix is ambiguous. Later in this paper, we reduce the ambiguity of a word using both its Parikh matrix and the Parikh matrix of an altered form of that word to describe it. As such, we introduce a formal definition of the ambiguity that multiple functions may have based on the set of all words that satisfy all functions. We are then able to use this when considering the ambiguity of the notions we introduce later.

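Let $w \in \Sigma^*$ and let $f_1, \ldots, f_n$ be functions on words. We write $A(w, f_1, \ldots, f_n)$ for the set of all words $u$ with $f_i(u) = f_i(w)$ for all $1 \le i \le n$, and we refer to $|A(w, f_1, \ldots, f_n)|$ as the ambiguity of $w$ with respect to $f_1, \ldots, f_n$. If $|A(w, f_1, \ldots, f_n)| > |A(w, f_1, \ldots, f_m)|$ for $m > n$ and functions $f_{n+1}, \ldots, f_m$, then we say that $f_{n+1}, \ldots, f_m$ reduce the ambiguity of $w$ on $f_1, \ldots, f_n$.

If $|A(w, f_1, \ldots, f_n)| = 1$, then $A(w, f_1, \ldots, f_n)$ is unambiguous and it is not possible to further reduce ambiguity.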

First we introduce the P-Parikh matrix. This matrix is in essence the Parikh matrix of a projection of a word, and represents a particular case of the extension of the Parikh matrix mapping presented in [19]. For $n \in \mathbb{N}$, $w \in \Sigma_n^*$ and $S \subset \Sigma_n$, the P-Parikh matrix of $w$ with respect to $S$ is defined as follows.

To gain some intuition about the above definition, consider an example.

For the index sequence of $S$, since a is the lexicographically smallest letter in $S$, we obtain $k_1 = 1$, $k_2 = 4$ and $k_3 = 5$. Hence $\pi_S(a) = a$, $\pi_S(d) = b$ and $\pi_S(e) = c$.

With the transformation defined, we apply this to the word and calculate the corresponding P-Parikh matrix as the Parikh matrix of the transformed word.
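The following is a minimal sketch of this computation as it can be read off the description above: letters outside $S$ are erased, the letters of $S$ are renamed in order to the first $|S|$ letters of the alphabet, and the Parikh matrix of the resulting word is taken over the smaller alphabet. Since the formal definition is not reproduced here, this reading is an assumption, and the names `project` and `p_parikh_matrix` are illustrative.

```python
def count_subword(word: str, pattern: str) -> int:
    """Count the occurrences of `pattern` as a scattered subword of `word`."""
    ways = [1] + [0] * len(pattern)
    for letter in word:
        for j in range(len(pattern) - 1, -1, -1):
            if pattern[j] == letter:
                ways[j + 1] += ways[j]
    return ways[-1]


def parikh_matrix(word: str, alphabet: str) -> list:
    """Parikh matrix of `word` over the ordered alphabet given as a string."""
    k = len(alphabet)
    return [[1 if i == j else count_subword(word, alphabet[i:j]) if i < j else 0
             for j in range(k + 1)] for i in range(k + 1)]


def project(word: str, alphabet: str, subset: str) -> str:
    """Erase letters outside `subset` and rename the rest in alphabet order."""
    kept = [x for x in alphabet if x in subset]
    rename = {old: alphabet[pos] for pos, old in enumerate(kept)}
    return "".join(rename[x] for x in word if x in rename)


def p_parikh_matrix(word: str, alphabet: str, subset: str) -> list:
    """Parikh matrix of the projected word over the first |S| letters."""
    size = len([x for x in alphabet if x in subset])
    return parikh_matrix(project(word, alphabet, subset), alphabet[:size])


# With S = {a, d, e} the letters a, d, e are renamed to a, b, c.
print(project("adbec", "abcde", "ade"))
# The introductory pair of words is separated by S = {a, c}.
print(p_parikh_matrix("abcaabaac", "abc", "ac"))
print(p_parikh_matrix("abacabcaa", "abc", "ac"))
```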

The Lyndon conjugate of a word is the conjugate that is lexicographically smallest based on the order on the alphabet. The Lyndon conjugate of a word w is denoted L(w). In an attempt to reduce the ambiguity of Parikh matrices, we modify the original Parikh matrix mapping to gain more information about a given word. Next, we introduce the L-Parikh matrix associated to a word. It was shown in [4] that there exist transformations that, when applied to a word, create a new word that is amiable with the original. For non-binary alphabets, a Type 1 transformation is given.

Let $w, w' \in \Sigma_n^*$ with $n \ge 3$. Then $w$ transforms into $w'$ using a Type 1 transformation if $w = u_1 a_i a_j u_2$ and $w' = u_1 a_j a_i u_2$, where $u_1, u_2 \in \Sigma_n^*$, $a_i, a_j \in \Sigma_n$, and $|i - j| \ge 2$.

For binary alphabets, a second type of transformation is described, referred to as a Type 2, that allows us to check if certain words are amiable without constructing their matrices.

Let $w, w' \in \Sigma_2^*$. Then $w$ transforms into $w'$ through a Type 2 transformation if $w = x a_1 a_2 y a_2 a_1 z$ and $w' = x a_2 a_1 y a_1 a_2 z$, or vice-versa, where $x, y, z \in \Sigma_2^*$ and $a_1, a_2 \in \Sigma_2$.
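Both transformations are easy to experiment with. The sketch below lists the words reachable by one Type 1 step (swapping two adjacent letters whose positions in the alphabet differ by at least two) and by one Type 2 step (exchanging a factor ab with a later, non-overlapping factor ba, or the other way round), and confirms on small inputs that the Parikh matrix is unchanged. Function names are illustrative, and the binary alphabet is assumed to be {a < b}.

```python
def count_subword(word: str, pattern: str) -> int:
    """Count the occurrences of `pattern` as a scattered subword of `word`."""
    ways = [1] + [0] * len(pattern)
    for letter in word:
        for j in range(len(pattern) - 1, -1, -1):
            if pattern[j] == letter:
                ways[j + 1] += ways[j]
    return ways[-1]


def parikh_matrix(word: str, alphabet: str) -> list:
    """Parikh matrix of `word` over the ordered alphabet given as a string."""
    k = len(alphabet)
    return [[1 if i == j else count_subword(word, alphabet[i:j]) if i < j else 0
             for j in range(k + 1)] for i in range(k + 1)]


def type1_neighbours(word: str, alphabet: str) -> list:
    """Words obtained by one swap of adjacent letters a_i a_j, |i - j| >= 2."""
    found = []
    for p in range(len(word) - 1):
        i, j = alphabet.index(word[p]), alphabet.index(word[p + 1])
        if abs(i - j) >= 2:
            found.append(word[:p] + word[p + 1] + word[p] + word[p + 2:])
    return found


def type2_neighbours(word: str) -> set:
    """Binary words obtained by exchanging a factor ab with a later ba (or vice versa)."""
    found = set()
    for p in range(len(word) - 1):
        for q in range(p + 2, len(word) - 1):
            first, second = word[p:p + 2], word[q:q + 2]
            if {first, second} == {"ab", "ba"}:
                found.add(word[:p] + second + word[p + 2:q] + first + word[q + 2:])
    return found


print(type1_neighbours("bacb", "abc"))
print(type2_neighbours("aabbaa"))
```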

In this section, we examine when and how much P-Parikh matrices reduce the ambiguity of a given word. When we refer to a reduction in ambiguity using P-Parikh matrices, we mean that the number of words described by the original Parikh matrix and their respective P-Parikh matrices is strictly less than the total number of words described by the original Parikh matrix alone, i.e., $|A(w, \Psi_n, \Psi^S_n)| < |A(w, \Psi_n)|$, for some $S \subset \Sigma_n$. First we present an example of P-Parikh matrices removing the ambiguity of a Parikh matrix entirely.

$(w') = \Psi_2(aab) =$ 2 2 1. Thus $w$ and $w'$ have different P-Parikh matrices and we can uniquely describe them.

We first introduce some terms that are useful when describing how effective a given P-Parikh matrix is at reducing ambiguity.

Definition 5. Given a word $w \in \Sigma_n^*$, we call $\Psi(w)$ P-distinguishable if either $|A(w, \Psi)| = 1$ or there exists a word $u \in \Sigma_n^*$ and a set $S \subset \Sigma_n$ such that $\Psi(w) = \Psi(u)$ and $\Psi^S_n(w) \neq \Psi^S_n(u)$. In the latter case we say that $w$ and $u$ are P-distinct. Furthermore, we call $w$ P-unique if there exist sets $S_1, S_2, \ldots, S_m \subset \Sigma_n$ such that $|A(w, \Psi, \Psi^{S_1}_n, \Psi^{S_2}_n, \ldots, \Psi^{S_m}_n)| = 1$.

Now we use these terms to examine words whose ambiguity can be reduced using P-Parikh matrices, namely those that contain any length two factor where those two letters are not equal or consecutive in the alphabet.

n, then $w' = u_1 a_j a_i u_2$ is also associated to $\Psi(w)$, following Lemma 1. Without loss of generality, take $S = \{a_i < a_j\}$. Then $\Psi^S_n(w) \neq \Psi^S_n(w')$, since $|w|_{a_i a_j}$ and $|w'|_{a_i a_j}$ are elements in $\Psi^S_n(w)$ and $\Psi^S_n(w')$, respectively, and $|w|_{a_i a_j} \neq |w'|_{a_i a_j}$.

It is simple to identify words that have such factors by comparing adjacent positions in the word. We can use this to find a lower bound for the proportion of words that are uniquely identified for a given alphabet and word length. Notice especially that as n and m get larger, the proportion of words which are reduced in ambiguity by P-Parikh matrices also gets larger. We therefore conclude that the use of P-Parikh matrices reduces ambiguity for a larger ratio of words for bigger alphabets rather than smaller.
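To get a feeling for how common such factors are, the following sketch computes the exact fraction of words of length $m$ over an $n$-letter ordered alphabet that contain at least one factor $a_i a_j$ with $|i - j| \ge 2$. This is a direct count by dynamic programming, not the closed-form bound of the paper, and the function name is illustrative. Since every such word has its ambiguity reduced, the fraction is a lower bound on the proportion of words whose ambiguity is reduced.

```python
from fractions import Fraction


def proportion_with_distant_factor(n: int, m: int) -> Fraction:
    """Fraction of length-m words over n letters containing a_i a_j, |i - j| >= 2."""
    if n < 1 or m < 1:
        return Fraction(0)
    # avoid[x] = number of words of the current length ending in letter x in
    # which all adjacent letters are equal or consecutive in the alphabet.
    avoid = [1] * n
    for _ in range(m - 1):
        avoid = [sum(avoid[y] for y in range(max(0, x - 1), min(n, x + 2)))
                 for x in range(n)]
    return 1 - Fraction(sum(avoid), n ** m)


for n in (2, 3, 4, 5):
    print(n, [float(proportion_with_distant_factor(n, m)) for m in (2, 4, 8)])
```

For $n = 2$ the fraction is always 0, in line with the fact that binary words cannot contain such a factor.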

There also exist words for which P-Parikh matrices do not reduce ambiguity. Our following result says that if our choice of a subset consists of only consecutive letters of the initial alphabet, the P-Parikh matrices are not P-distinguishable.

If all elements of the set $S \subset \Sigma_n$ are consecutive in the alphabet $\Sigma_n$, then $|A(w, \Psi_n, \Psi^S_n)| = |A(w, \Psi_n)|$.

The result of Remark 6 strengthens the one of Proposition 1 by telling us that the ambiguity of words defined over binary alphabets is not reducible by P-Parikh matrices.

There does not exist a Parikh matrix that describes binary words whose ambiguity can be reduced by P-Parikh matrices.

Furthermore, there exist non-binary words for which P-Parikh matrices do not remove ambiguity, namely those that are not P-unique. Finally, we end this section by giving two classes of words which are not uniquely described by P-Parikh matrices, no matter how we choose the set $S$.

Proposition 3. Take two words $w, w' \in \Sigma_n^*$ with the form $w = u_1 a_i a_j v a_j a_i u_2$ and $w' = u_1 a_j a_i v a_i a_j u_2$, where $a_i \le a_j \in \Sigma_n$ and $u_1$,

Proof. Firstly, if $a_i = a_k = a_j$, equivalence follows, as $w = w'$. Now, let $a_i < a_j$.

In the case where $S$ contains either $a_i$ or $a_j$, then $\pi_S(w) = \pi_S(w')$ since $a_i$ and $a_j$ are the only letters that swap places in $w'$ compared to $w$. Since $\pi_S(w) = \pi_S(w')$, clearly $\Psi^S_n(w) = \Psi^S_n(w')$ follows. If $S = \{a_i, a_j\}$, then $\pi_S(w)$ is a binary word and can be transformed via a Type 2 transformation, from Lemma 2, into $\pi_S(w')$, so $\Psi^S_n(w) = \Psi^S_n(w')$. Next consider that $\{a_i, a_j\} \subset S$, $|S| > 2$, and $S$ has no elements between $a_i$ and $a_j$. Then $\pi_S(w) = \pi_S(u_1) a_i a_j a_j a_i \pi_S(u_2)$ and $\pi_S(w') = \pi_S(u_1) a_j a_i a_i a_j \pi_S(u_2)$. Using an extension from [3] of the Type 2 transformations we can transform $\pi_S(w)$ into $\pi_S(w')$, and get that $\Psi^S_n(w) = \Psi^S_n(w')$. Finally, consider the case where $S$ contains $a_i$, $a_j$, and at least one letter that comes lexicographically between $a_i$ and $a_j$. Then, $\pi_S(w)$ can be transformed into $\pi_S(w')$ via two Type 1 transformations on $a_i$ and $a_j$, since $a_i$ and $a_j$ are not lexicographically adjacent in $S$ (see Lemma 1).

The ideas from Proposition 3 give rise to another class of words that are not P-unique, by loosening the condition on $v$ and extending the length of the word.

$w = u_1 a_i a_j v_1 a_j a_i a_j a_i v_2 a_i a_j u_2$, and $w' = u_1 a_j a_i v_1 a_i a_j a_i a_j v_2 a_j a_i u_2$, where $a_i < a_j \in \Sigma_n$ and $u_1$,

Then, $w$ and $w'$ are not P-distinct if and only if $|v_1|_a = |v_2|_a$ for all $a \notin \{a_k \mid a_i \le a_k \le a_j\}$, and at least one of the following conditions is true:

In other words, the above statement says that two words are not P-distinct if both $v_1$ and $v_2$ are defined on the subset of the alphabet which is either lexicographically bigger than $a_i$ or smaller than $a_j$, and they share the same Parikh vector for the subset of letters which are not in between $a_i$ and $a_j$. Furthermore, if $v_1 \in \{a_{i+1}, \ldots, a_n\}^*$, then all the letters which are lexicographically greater than $a_j$ must occur in $v_1$ in decreasing lexicographical order and in $v_2$ in increasing order. On the other hand, if $v_1 \in \{a_1, \ldots, a_{j-1}\}^*$, then all the letters which are lexicographically smaller than $a_i$ must occur in $v_1$ in increasing lexicographical order and in $v_2$ in decreasing lexicographical order.

Proposition 2 shows that in many cases, the set of words that share both a Parikh matrix and a P-Parikh matrix is smaller than the set of those that share only a Parikh matrix. However, following Corollary 1 we also know that this never happens for binary alphabets. Hence we now study L-Parikh matrices as an alternative method of ambiguity reduction. While they can be effective for any non-unary alphabet, we focus on binary alphabets specifically. We begin this section by explaining the motivation for choosing the Lyndon conjugate of a word and then build to our main result where we characterise words whose ambiguity is reduced by the use of L-Parikh matrices.

As indicated by Definition 4, the concept of L-Parikh matrices is based on a modification to a word that results in a change in the order of letters. The following theorem implies that the strategy of altering a word is not always a successful method of ambiguity reduction. Note that $\Psi_{\mathrm{rev}}$ refers to the Parikh matrix of the reversal of a word.

Unlike Theorem 1, L-Parikh matrices use the conjugate of a word. The next proposition implies that such conjugates need to be chosen wisely.

Proposition 5. Given words $v, w \in \Sigma^*$ with $\Psi(v) = \Psi(w)$, for any factorisations $v = v_1 v_2$ and $w = w_1 w_2$ such that $|v_2| = |w_2|$, we have that

Proof Outline. We can prove the statement that holds for every size alphabet by contradiction, by assuming that $\Psi(v) = \Psi(w)$, $\Psi(v_2 v_1) = \Psi(w_2 w_1)$ and $\varphi(v_2) \neq \varphi(w_2)$. We examine the total number of ab subwords in $v$, $w$, $v_2 v_1$ and $w_2 w_1$ to obtain a set of equations. We then consider the total number of b's in $v_2$ and $w_2$ to find a contradiction within these equations.

For the statement that holds just for the binary alphabet we examine the total number of ab subwords in $v_2 v_1$, $w_2 w_1$, $v_1$, $v_2$, $w_1$ and $w_2$ and get a contradiction in the equations we obtain by initially assuming that $\varphi(v_2) = \varphi(w_2)$.

The example below shows that the condition $|v_2| = |w_2|$ is necessary for Proposition 5.

Example 5. Consider the words $v = aabaabbb$ with $v_2 = aabbb$ and $w = aaabbabb$ with $w_2 = abb$. One can easily find that $\Psi(v_2 v_1) = \Psi(w_2 w_1) =$ 4 10 4. However, $\varphi(v_2) = [2, 3]$ and $\varphi(w_2) = [1, 2]$, and therefore $|v_2| = |w_2|$ is a necessary condition in the context of Proposition 5.

An example for the ternary alphabet where $\Psi(v_2 v_1) \neq \Psi(w_2 w_1)$ even though we have that $\Psi(v) = \Psi(w)$ and $\varphi(v_2) = \varphi(w_2)$ is given below. Note that if $\varphi(v_2) = \varphi(w_2)$, then we must also have $|w_2| = |v_2|$. Since any alphabet of size greater than 3 would rely on the result of the ternary alphabet always being true, we can deduce that the backwards direction from Proposition 5 only holds for the binary alphabet.

Let $v = cbbaaabb$ and $w = cabbbaab$. We have that $\Psi(v) = \Psi(w)$. Now let $v_2 = aabb$ and $w_2 = baab$. Then we have that $|w_2| = |v_2|$ and $\varphi(v_2) = \varphi(w_2)$. Note that $\Psi(v_2) \neq \Psi(w_2)$, since $|v_2|_{ab} = 4$ and $|w_2|_{ab} = 2$. But this gives us $\Psi(v_2 v_1) = \Psi(aabbcbba) \neq \Psi(baabcabb) = \Psi(w_2 w_1)$, as the two conjugates contain 4 and 2 occurrences of the subword abc, respectively.
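The two examples above can be re-checked mechanically. The sketch below recomputes the relevant Parikh matrices for the binary pair of Example 5 and for the ternary pair; helper names are illustrative and the alphabet is passed as a string of letters in increasing order.

```python
def count_subword(word: str, pattern: str) -> int:
    """Count the occurrences of `pattern` as a scattered subword of `word`."""
    ways = [1] + [0] * len(pattern)
    for letter in word:
        for j in range(len(pattern) - 1, -1, -1):
            if pattern[j] == letter:
                ways[j + 1] += ways[j]
    return ways[-1]


def parikh_matrix(word: str, alphabet: str) -> list:
    """Parikh matrix of `word` over the ordered alphabet given as a string."""
    k = len(alphabet)
    return [[1 if i == j else count_subword(word, alphabet[i:j]) if i < j else 0
             for j in range(k + 1)] for i in range(k + 1)]


# Example 5: right factors of different lengths, equal matrices after rotation.
v1, v2 = "aab", "aabbb"
w1, w2 = "aaabb", "abb"
print(parikh_matrix(v1 + v2, "ab") == parikh_matrix(w1 + w2, "ab"))
print(parikh_matrix(v2 + v1, "ab") == parikh_matrix(w2 + w1, "ab"))

# Ternary example: equal Parikh vectors of the right factors, yet the
# rotated words differ in the number of abc subwords.
x1, x2 = "cbba", "aabb"
y1, y2 = "cabb", "baab"
print(parikh_matrix(x1 + x2, "abc") == parikh_matrix(y1 + y2, "abc"))
print(count_subword(x2 + x1, "abc"), count_subword(y2 + y1, "abc"))
```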

Proposition 5 shows that when looking for a modification that we can apply to a word to find a new and different Parikh matrix, we need to consider conjugates of amiable words where it is less likely that the Parikh vectors of their right factors are the same, i.e., conjugates obtained by shifting the original words a different number of times, respectively.

Let us now consider how using L-Parikh matrices reduces ambiguity. The rest of this section ignores any word $w$ where $|A(w, \Psi)| = 1$, since there is no ambiguity to be reduced here. For a word $w$, we calculate $\Psi(w)$ and $\Psi_L(w)$ and use both of these matrices to describe the original word. The ambiguity of a word $w$, with respect to its Parikh and L-Parikh matrices, according to Definition 2, is the total number of words that share a Parikh matrix and an L-Parikh matrix with $w$, namely $|A(w, \Psi, \Psi_L)|$. We use the next definitions and propositions to build to our main result where we characterise binary words whose ambiguity is reduced using L-Parikh matrices. In line with Definition 5 we introduce the following definitions.

Given a word $w \in \Sigma^*$, we call $\Psi(w)$ L-distinguishable if either $|A(w, \Psi)| = 1$ or there exists a word $u \in \Sigma^*$ with $\Psi(w) = \Psi(u)$, such that $\Psi_L(w) \neq \Psi_L(u)$. In the latter case we say that $w$ and $u$ are L-distinct. A word $w$ is L-unique if $|A(w, \Psi, \Psi_L)| = 1$.

Note that if $w$ and $v$ are L-distinct, then $A(w, \Psi) = A(v, \Psi)$ and $A(w, \Psi, \Psi_L) \neq A(v, \Psi, \Psi_L)$. The example below demonstrates the effectiveness of L-Parikh matrices for ambiguity reduction.

Example 7. Consider the words $w = babbbaa$, $u = bbababa$ and $v = bbbaaab$ with $\Psi(w) = \Psi(u) = \Psi(v)$. However, for the conjugates $L(w) = aababbb$, $L(u) = abababb$ and $L(v) = aaabbbb$ we have that $\Psi_L(w) =$ 3 11 4, $\Psi_L(u) =$ 3 9 4, and $\Psi_L(v) =$ 3 12 4. Thus their L-Parikh matrices are all different and we can uniquely describe each of the words by using L-Parikh matrices.
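Example 7 can be reproduced with a few lines. In this sketch the L-Parikh matrix of a binary word is represented by the three entries $(|L(w)|_a, |L(w)|_{ab}, |L(w)|_b)$, and the Lyndon conjugate is found naively as the smallest rotation; the function names are illustrative.

```python
def triple(word: str) -> tuple:
    """(|w|_a, |w|_ab, |w|_b) for a binary word over {a < b}."""
    a_seen = ab = 0
    for letter in word:
        if letter == "a":
            a_seen += 1
        else:
            ab += a_seen  # every earlier a pairs with this b
    return (a_seen, ab, len(word) - a_seen)


def lyndon_conjugate(word: str) -> str:
    """Lexicographically smallest rotation of `word` (naive search)."""
    return min(word[i:] + word[:i] for i in range(max(1, len(word))))


def l_parikh(word: str) -> tuple:
    """Entries of the Parikh matrix of the Lyndon conjugate of `word`."""
    return triple(lyndon_conjugate(word))


for word in ("babbbaa", "bbababa", "bbbaaab"):
    print(word, triple(word), lyndon_conjugate(word), l_parikh(word))
```

The naive rotation search takes quadratic time in the word length, which is sufficient for examples of this size.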

L-distinguishability is necessary for ambiguity reduction in this case.

The above characterisation of ambiguity reduction leads us to investigate sufficient conditions for a matrix to be ambiguous, and therefore for any pair of words it describes not to be L-distinct. Our next results consider the situations when the Parikh matrix of a word is not L-distinguishable. We show, either directly after the next proposition or later in the paper, that binary words meeting the criteria of each proposition are rare.

Proposition 6. For a word $w \in \Sigma^*$, if all words in $A(w, \Psi)$ belong to the same conjugacy class, then $\Psi(w)$ is not L-distinguishable.

Example 8. Let $w = aababa$ and $w' = abaaab$. These two words are amiable with each other and nothing else. Furthermore, $L(w) = aaabab = L(w')$, and since both words share a Lyndon conjugate, both words also share an L-Parikh matrix. Therefore $\Psi(w)$ is not L-distinguishable.

Now we move on to explore, for binary alphabets, the case where all words in $A(w, \Psi)$ belong to the same conjugacy class in more detail. Recall that $C(w)$ refers to the conjugacy class of $w$.

Proof Outline. The forwards direction is proven by examining every element of the conjugacy class of $w$. We can first prove that if $L(u) = L(w)$, for all $u \in A(w, \Psi)$, then words in the conjugacy class of $w$ are only amiable with other conjugates of $w$. We then show that this is only true when $L(w)$ is in the set {aabb, ababbb, aababb, aabbab, aaabab}. For this we define a block of a letter to be a unary factor of a word which is not extendable to the right or left and argue that applying a Type 2 transformation to any Lyndon conjugate that is not in the above set either alters the size of the block of a's at the start of the word, or changes the total number of blocks of a's in the word altogether. This therefore gives us a word that is amiable to, but not a conjugate of, the original.

The backwards direction is proven by finding the Parikh matrices of all conjugates of words in the set {aabb, ababbb, aababb, aabbab, aaabab}. We then find that the only words described by these matrices are these conjugates.

We now look at the case where all words associated to a Parikh matrix are the Lyndon representatives of their respective conjugacy classes, which again makes this matrix not L-distinguishable. For binary alphabets, we examine in greater detail when all words in A(w, Ψ ) are the Lyndon representatives of their conjugacy classes. The next result provides a necessary and sufficient condition, and therefore the complete characterisation, for this case to occur for the binary alphabet.

Then the following statements are equivalent.

- For all $u \in A(w, \Psi)$, we have that $u = L(u)$.

- $w = a^* v b^*$ and for $n = |v|_{ba}$ we have that $|v|_a = 2n$ and $|v|_b = n + 1$.

Proof Outline. To show that these two statements are equivalent, we begin by showing that the second statement implies the former. We do this by first showing that if a word is of the form $w = a^* v b^*$ and, for $n = |v|_{ba}$, we have that $|v|_a = 2n$ and $|v|_b = n + 1$, then $w = L(w)$, and next move on to prove that only words of this form are described by $\Psi(w)$. We prove that $w = L(w)$ by observing that $v = L(v)$. Adding more a's to the start of $v$ and more b's to the end means that the Lyndon conjugate is still the word itself, and hence we obtain $w = L(w)$. We prove that words of the form described in the second point are only amiable with each other by calculating the total number of ab subwords in $v$ and extrapolating this to $w$.

To prove that the first statement implies the second, we use the fact that our words share a Parikh matrix and that they must begin with the largest number of consecutive a's in the word and end with at least one b. We also rewrite $w = a^+ w_i b^+$ where $w_i$ begins with the first occurrence of a b and ends with the last occurrence of an a in $w$, and examine the form that this must take given the fixed number of ab subwords we must have in $w$. This gives us the total number of a's and b's in a word relative to the total number of ba subwords.

The next example shows how the above result can be used to identify the form of the words that always share a Parikh matrix with other Lyndon conjugates.

Example 10. Following Proposition 9, Lyndon representatives of different conjugacy classes share a Parikh matrix only if they are of the form $a^* v b^*$, where for $n = |v|_{ba}$ we have that $|v|_a = 2n$ and $|v|_b = n + 1$. Let us find all words of this form where $n = 3$. We begin by finding all binary words that contain 3 subwords ba. These are baaa, baba and bbba. Next add a's to the front and b's to the end of each word, respectively, so that we have a total of 6 a's and 4 b's per word: aaabaaabbb, aaaabababb, aaaaabbbab. Finally, any number of a's and b's can be added to the front and end of each word, respectively: $a^*\,aaabaaabbb\,b^*$, $a^*\,aaaabababb\,b^*$, $a^*\,aaaaabbbab\,b^*$. Hence we know that any word of this form is the Lyndon representative of its conjugacy class and shares a Parikh matrix with the two other words stated above. For example, $\Psi(a^2\,aaabaaabbb\,b^3) = \Psi(a^2\,aaaabababb\,b^3) = \Psi(a^2\,aaaaabbbab\,b^3) =$ 8 53 7.
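The final claim of Example 10 can be verified by exhaustive search over all binary words of length 15. The sketch below is a brute-force check with illustrative function names; it confirms that exactly the three stated words have the entries 8, 53, 7 and that each of them is its own Lyndon conjugate.

```python
from itertools import product


def triple(word: str) -> tuple:
    """(|w|_a, |w|_ab, |w|_b) for a binary word over {a < b}."""
    a_seen = ab = 0
    for letter in word:
        if letter == "a":
            a_seen += 1
        else:
            ab += a_seen
    return (a_seen, ab, len(word) - a_seen)


def lyndon_conjugate(word: str) -> str:
    """Lexicographically smallest rotation of `word` (naive search)."""
    return min(word[i:] + word[:i] for i in range(max(1, len(word))))


cores = ("aaabaaabbb", "aaaabababb", "aaaaabbbab")
padded = ["aa" + core + "bbb" for core in cores]

# Exhaustive search over all 2**15 binary words of the same length.
amiable = ["".join(p) for p in product("ab", repeat=15)
           if triple("".join(p)) == (8, 53, 7)]

print(sorted(amiable) == sorted(padded))
print(all(lyndon_conjugate(w) == w for w in amiable))
```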

Thus far, we presented sufficient conditions for two amiable words not to be L-distinct. Our main result shows that these conditions are in fact also the necessary ones. The following lemmas are used in the proof of the final result, but are included here as they are also interesting results on their own. The first lemma tells us that if the Parikh vectors of the proper right factors of two amiable words are different, then the size of these factors must also be unequal.

Furthermore, if two amiable binary words are not the Lyndon representatives of their conjugacy classes, then to either of them we can apply a Type 2 transformation to obtain an amiable word whose Lyndon conjugate begins in a different position from the original one.

Proof Outline. The statement can be proven by contradiction, by first assuming that the Lyndon conjugate of every word associated to Ψ (w) begins in the same position within those words. We then show that for the Lyndon conjugate to begin at any position within a given binary word, it is possible to apply a Type 2 transformation to obtain a new word whose Lyndon conjugate begins in a different position.

Next we show that all words that are conjugates of any word $w$ such that $A(w, \Psi) = A(w, \Psi_L)$ are also amiable with a word that is not a conjugate of any of the words in $A(w, \Psi)$.

Proof Outline. This statement can be proven by considering every form that a word $w$ can take, such that $A(w, \Psi) = A(w, \Psi_L)$, from Proposition 9 and then examining all conjugates of these words. We show that a Type 2 transformation can be applied to every conjugate to obtain a word that is not a conjugate of any word in our original set $A(w, \Psi)$.

We end this section by giving our main result that characterises all binary words whose Parikh matrix is not L-distinguishable.

For $\Sigma_2$, a Parikh matrix is not L-distinguishable if and only if any of the words it describes meet at least one of the following criteria:

- $w \in \{aabb, ababbb, aababb, aabbab, aaabab, bbabbaaa, bbbaabaa\}$

- $w = a^* v b^*$ and for $n = |v|_{ba}$ we have that $|v|_a = 2n$ and $|v|_b = n + 1$
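For short binary words, L-distinguishability can be tested directly from the definition. The following sketch enumerates all words amiable with a given word by brute force and compares the Parikh matrices of their Lyndon conjugates; it is only a small-scale experiment with illustrative function names, not part of the proof.

```python
from itertools import product


def triple(word: str) -> tuple:
    """(|w|_a, |w|_ab, |w|_b) for a binary word over {a < b}."""
    a_seen = ab = 0
    for letter in word:
        if letter == "a":
            a_seen += 1
        else:
            ab += a_seen
    return (a_seen, ab, len(word) - a_seen)


def lyndon_conjugate(word: str) -> str:
    """Lexicographically smallest rotation of `word` (naive search)."""
    return min(word[i:] + word[:i] for i in range(max(1, len(word))))


def amiable_words(word: str) -> list:
    """All binary words sharing the Parikh matrix of `word` (exponential search)."""
    target = triple(word)
    candidates = ("".join(p) for p in product("ab", repeat=len(word)))
    return [u for u in candidates if triple(u) == target]


def is_l_distinguishable(word: str) -> bool:
    """True if the word is unambiguous or some amiable word is L-distinct from it."""
    family = amiable_words(word)
    l_matrices = {triple(lyndon_conjugate(u)) for u in family}
    return len(family) == 1 or len(l_matrices) > 1


for word in ("babbbaa", "aababa", "abba", "aabaabb", "bbabbaaa"):
    print(word, amiable_words(word), is_l_distinguishable(word))
```

Of the five sample words, only babbbaa (from Example 7) is reported as L-distinguishable; the others fall under the criteria listed above.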

Proof Outline. For the set of words $B = \{bbabbaaa, bbbaabaa\}$, the forward direction is easily proven by finding these words' Parikh and L-Parikh matrices, respectively. The backward direction is proved using the fact that for words $w, w' \in \Sigma_2^*$ such that $w'$ is the reverse of $w$ and $A(w', \Psi) = A(w', \Psi_L)$, we have $w \in B$ if and only if $A(w, \Psi) = A(w, \Psi, \Psi_L)$.

For the rest of the words, the 'if' direction was mostly proven earlier when Propositions 6, 7, 8 and 9, describing these situations, were introduced.

The 'only if' direction is proven by first examining the consequences of Proposition 5, which tells us that two words are L-distinct if their Lyndon conjugates begin in different positions, respectively. We use Lemmas 3 and 4 to conclude that no set of amiable binary words exists where the Lyndon conjugates of all words in the set begin in the same position of each word, respectively. Hence all Parikh matrices would be L-distinguishable if it were not for some cases that arise as a result of us using the Lyndon conjugate. These cases are namely the ones where the set of amiable words are all Lyndon conjugates, are all members of the same conjugacy class, or are all conjugates of words whose Lyndon conjugates share a Parikh matrix. We showed in Propositions 9 and 7 that the first two cases are characterised by words of the form $w = a^* v b^*$ where for $n = |v|_{ba}$ we have that $|v|_a = 2n$ and $|v|_b = n + 1$, and by words where their Lyndon conjugate is in the set {aabb, ababbb, aababb, aabbab, aaabab}, respectively. We use Lemma 5 to conclude that no words exist such that the third case is true.

In this paper, we have shown that using P-Parikh matrices and L-Parikh matrices reduces the ambiguity of a word in most cases. From Corollary 1, we learn that P-Parikh matrices cannot reduce the ambiguity of a Parikh matrix that describes words in a binary alphabet, but are very powerful when it comes to reducing the ambiguity of words in larger alphabets (Proposition 2). On the other hand, we find that L-Parikh matrices reduce the ambiguity of most binary words, with the few exceptions from Theorem 2, which have all been shown to be rare occurrences within the binary alphabet. Thus, using both tools together leads to a reduction in ambiguity in most cases.

Going forward, we wish to characterise words that are described uniquely by both types of matrices, respectively, as well as quantifying the ambiguity reduction permitted by both notions. Theorem 2 tells us that there are very few binary words whose Parikh matrix ambiguity cannot be reduced by L-Parikh matrices. Future research on L-Parikh matrices could also include an analysis similar to the one done in Proposition 2.

Finally we present a conjecture on the types of words that might be described by a Parikh matrix that is P-distinguishable. We know that the presence of a certain type of factor, described in Proposition 1, in a word means that its Parikh matrix is P-distinguishable. This conjecture implies that the presence of this factor is the only way that the ambiguity of a word could be reduced by P-Parikh matrices.

For any word $w \in \Sigma_n^*$, if $\Psi(w)$ is P-distinguishable, then there exists a word amiable with $w$ which contains a factor $a_i a_j$, where $|i - j| > 1$.

1. Counting subwords using a trie automaton

2. Several extensions of the Parikh matrix L-morphism

3. Parikh matrix mapping and amiability over a ternary alphabet

4. Parikh matrices and amiable words

5. Codifiable languages and the Parikh matrix mapping

6. On the injectivity of the Parikh matrix mapping

7. A new operator over Parikh languages

8. Some algebraic aspects of Parikh q-matrices

9. A q-matrix encoding extending the Parikh matrix mapping

10. A matrix q-analogue of the Parikh map

11. A q-analogue of the Parikh matrix mapping

12. Combinatorics on Words

13. Product of Parikh matrices and commutativity

14. On an extension of the Parikh mapping

15. On context-free languages

16. Strong (2·t) and strong (3·t) transformations for strong M-equivalence

17. Subword occurrences, Parikh matrices and Lyndon images

18. Extending Parikh matrices

19. On Parikh matrices, ambiguity, and prints

20. Subalgebras of free Lie algebras