key: cord-0043808-me3rq9rz
authors: Kari, Lila; Ng, Timothy
title: Descriptional Complexity of Semi-simple Splicing Systems
date: 2020-05-26
journal: Developments in Language Theory
DOI: 10.1007/978-3-030-48516-0_12
sha: a24355c8d96a41d75e622047c09e0aeaae16248d
doc_id: 43808
cord_uid: me3rq9rz

Splicing systems are generative mechanisms introduced by Tom Head in 1987 to model the biological process of DNA recombination. The computational engine of a splicing system is the “splicing operation”, a cut-and-paste binary string operation defined by a set of “splicing rules”, quadruples $r = (u_1, u_2; u_3, u_4)$ where $u_1, u_2, u_3, u_4$ are words over an alphabet $\Sigma$. For two strings $x = x_1 u_1 u_2 x_2$ and $y = y_1 u_3 u_4 y_2$, applying the splicing rule $r$ produces the string $z = x_1 u_1 u_4 y_2$. In this paper we focus on a particular type of splicing systems, called $(i, j)$ semi-simple splicing systems, $i \in \{1, 2\}$ and $j \in \{3, 4\}$, wherein all splicing rules $r$ have the property that the two strings in positions $i$ and $j$ in $r$ are singleton letters, while the other two strings are empty. The language generated by such a system consists of the set of words that are obtained, starting from an initial set called the “axiom set”, by iteratively applying the splicing rules to strings in the axiom set as well as to intermediately produced strings. We consider semi-simple splicing systems where the axiom set is a regular language, and investigate the descriptional complexity of such systems in terms of the size of the minimal deterministic finite automata that recognize the languages they generate.

Splicing systems are generative mechanisms introduced by Tom Head [7] to model the biological process of DNA recombination. A splicing system consists of an initial language called an axiom set, and a set of so-called splicing rules. The result of applying a splicing rule to a pair of operand strings is a new "recombinant" string, and the language generated by a splicing system consists of all the words that can be obtained by successively applying splicing rules to axioms and the intermediately produced words. The most natural variant of splicing systems, often referred to as finite splicing systems, is to consider a finite set of axioms and a finite set of rules. Several different types of splicing systems have been proposed in the literature, and Bonizzoni et al. [1] showed that the classes of languages they generate are related: the class of languages generated by finite Head splicing systems [7] is strictly contained in the class of languages generated by finite Pȃun splicing systems [13] , which is strictly contained in the class of languages generated by finite Pixton splicing systems [12] .

In this paper we will use the Pȃun definition [13], which defines a splicing rule as a quadruplet of words $r = (u_1, u_2; u_3, u_4)$. This rule splices two words $x_1 u_1 u_2 y_1$ and $x_2 u_3 u_4 y_2$ as follows: the words are cut between the factors $u_1, u_2$, respectively $u_3, u_4$, and the prefix of the first word (ending in $u_1$) is recombined by catenation with the suffix of the second word (starting with $u_4$), resulting in the word $x_1 u_1 u_4 y_2$.

Culik II and Harju [3] proved that finite Head splicing systems can only generate regular languages, while [8] and [12] proved a similar result for Pȃun, respectively Pixton splicing systems. Gatterdam [5] gave (aa) * as an example of a regular language which cannot be generated by a finite Head splicing system, which proved that this is a strict inclusion.

As the classes of languages generated by finite splicing systems are subclasses of the family of regular languages, their descriptional complexity can be considered in terms of the finite automata that recognize them. For example, Loos et al. [10] gave a bound on the number of states required for a nondeterministic finite automaton to recognize the language generated by an equivalent Pȃun finite splicing system. Other descriptional complexity measures for finite splicing systems that have been investigated in the literature include the number of rules, the number of words in the initial language, the maximum length of a word in the initial axiom set, and the sum of the lengths of all words in the axiom set. Pȃun [13] also proposed the radius, defined to be the size of the largest u i in a rule, as another possible measure.

In the original definition, simple splicing systems are finite splicing systems where all the words in the splicing rules are singleton letters. The descriptional complexity of simple splicing systems was considered by Mateescu et al. [11] in terms of the size of a right linear grammar that generates a simple splicing language. Semi-simple splicing systems were introduced in Goode and Pixton [6] as having a finite axiom set, and splicing rules of the form (a, ε; b, ε) where a, b are singleton letters, and ε denotes the empty word.

In this paper we focus our study on some variants of semi-simple splicing systems called $(i, j)$-semi-simple splicing systems, $i = 1, 2$ and $j = 3, 4$, wherein all splicing rules have the property that the two strings in positions $i$ and $j$ are singleton letters, while the other two strings are empty. (Note that Ceterchi et al. [2] showed that all classes of languages generated by semi-simple splicing systems are pairwise incomparable.) In addition, in a departure from the original definition of semi-simple splicing systems [6], in this paper the axiom set is allowed to be a (potentially infinite) regular set.

More precisely, we investigate the descriptional complexity of $(i, j)$-semi-simple splicing systems with regular axiom sets, in terms of the size of the minimal deterministic finite automaton that recognizes the language generated by the system. The paper is organized as follows: Sect. 2 introduces definitions and notations, Sect. 3 defines splicing systems and outlines some basic results on simple splicing systems, Sects. 4, 5 and 6 investigate the state complexity of (2,4)-, (2,3)-, and (1,4)-semi-simple splicing systems, respectively, and Sect. 7 summarizes our results (Table 1).

Let $\Sigma$ be a finite alphabet. We denote by $\Sigma^*$ the set of all finite words over $\Sigma$, including the empty word, which we denote by $\varepsilon$. We denote the length of a word $w$ by $|w|$. If $w = xyz$ for $x, y, z \in \Sigma^*$, we say that $x$ is a prefix of $w$, $y$ is a factor of $w$, and $z$ is a suffix of $w$. A deterministic finite automaton (DFA) is a tuple $A = (Q, \Sigma, \delta, q_0, F)$ where $Q$ is a finite set of states, $\Sigma$ is an alphabet, $\delta$ is a function $\delta : Q \times \Sigma \to Q$, $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is a set of final states. We extend the transition function $\delta$ to a function $Q \times \Sigma^* \to Q$ in the usual way. A DFA $A$ is complete if $\delta$ is defined for all $q \in Q$ and $a \in \Sigma$. In this paper, all DFAs are defined to be complete. We will also make use of the notation $q \xrightarrow{w} q'$ for $\delta(q, w) = q'$, where $w \in \Sigma^*$ and $q, q' \in Q$. The language recognized or accepted by $A$ is $L(A) = \{w \in \Sigma^* \mid \delta(q_0, w) \in F\}$.

Each letter $a \in \Sigma$ defines a transformation of the state set $Q$. Let $\delta_a : Q \to Q$ be the transformation on $Q$ induced by $a$, defined by $\delta_a(q) = \delta(q, a)$. We extend this definition to words by composing the transformations $\delta_w = \delta_{a_1} \circ \delta_{a_2} \circ \cdots \circ \delta_{a_n}$ for $w = a_1 a_2 \cdots a_n$. We denote by $\operatorname{im} \delta_a$ the image of $\delta_a$, defined by $\operatorname{im} \delta_a = \{\delta(p, a) \mid p \in Q\}$.

A state $q$ is called reachable if there exists a string $w \in \Sigma^*$ such that $\delta(q_0, w) = q$. A state $q$ is called useful if there exists a string $w \in \Sigma^*$ such that $\delta(q, w) \in F$. A state that is not useful is called useless. A complete DFA with multiple useless states can be easily transformed into an equivalent DFA with at most one useless state, which we refer to as the sink state.

Two states $p$ and $q$ of $A$ are said to be equivalent or indistinguishable in the case that $\delta(p, w) \in F$ if and only if $\delta(q, w) \in F$ for every word $w \in \Sigma^*$. States that are not equivalent are distinguishable. A DFA $A$ is minimal if each state $q \in Q$ is reachable from the initial state and no two states are equivalent. The state complexity of a regular language $L$ is the number of states of the minimal complete DFA recognizing $L$ [4].
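The definitions of reachability and distinguishability above can be turned into a small computation of state complexity. The following is a minimal sketch, assuming a complete DFA encoded as a dictionary `delta[(state, symbol)]`; the function name `state_complexity` and this encoding are illustrative choices, not notation from the paper. It discards unreachable states and then refines the partition into final and non-final states until it stabilizes (Moore-style refinement).

```python
def state_complexity(alphabet, delta, start, finals):
    """Number of states of the minimal complete DFA equivalent to the input.

    Assumes `delta` is total: delta[(q, a)] exists for every state q and symbol a.
    """
    symbols = sorted(alphabet)

    # Step 1: keep only states reachable from the initial state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in symbols:
            p = delta[(q, a)]
            if p not in reachable:
                reachable.add(p)
                stack.append(p)

    # Step 2: start from the partition {final, non-final} and refine it.
    # Two states stay in the same block only if they agree on their own
    # block and on the blocks of all their successors.
    block = {q: (q in finals) for q in reachable}
    while True:
        refined = {
            q: (block[q], tuple(block[delta[(q, a)]] for a in symbols))
            for q in reachable
        }
        if len(set(refined.values())) == len(set(block.values())):
            return len(set(block.values()))
        block = refined


# Example: the language (aa)* over the one-letter alphabet {a}.
even_a = {(0, "a"): 1, (1, "a"): 0}
print(state_complexity("a", even_a, 0, {0}))
```

Because each refinement step only splits blocks, the loop stops as soon as the number of blocks no longer grows.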

A nondeterministic finite automaton (NFA) is a tuple $A = (Q, \Sigma, \delta, I, F)$ where $Q$ is a finite set of states, $\Sigma$ is an alphabet, $\delta$ is a function $\delta : Q \times \Sigma \to 2^Q$, $I \subseteq Q$ is a set of initial states, and $F \subseteq Q$ is a set of final states. The language recognized by an NFA $A$ is $L(A) = \{w \in \Sigma^* \mid \bigcup_{q \in I} \delta(q, w) \cap F \neq \emptyset\}$. As with DFAs, transitions of $A$ can be viewed as transformations on the state set. Let $\delta_a : Q \to 2^Q$ be the transformation on $Q$ induced by $a$, defined by $\delta_a(q) = \delta(q, a)$. We define $\operatorname{im} \delta_a = \bigcup_{q \in Q} \delta_a(q)$. We make use of the notation $P \xrightarrow{w} P'$ for $P' = \bigcup_{q \in P} \delta(q, w)$, where $w \in \Sigma^*$ and $P, P' \subseteq Q$.
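Since the bounds in this paper are obtained by determinizing NFAs, it may help to see the subset construction spelled out. The sketch below assumes an NFA without ε-transitions, encoded as `delta[(state, symbol)] -> set of states` with missing entries meaning no transition; the name `determinize` and the example automaton are hypothetical.

```python
from collections import deque


def determinize(alphabet, delta, initials):
    """Subset construction: return the reachable subsets and the DFA transitions."""
    start = frozenset(initials)
    seen = {start}
    queue = deque([start])
    dfa_delta = {}
    while queue:
        subset = queue.popleft()
        for a in alphabet:
            # P --a--> P' where P' is the union of delta(q, a) over q in P.
            target = frozenset(p for q in subset for p in delta.get((q, a), ()))
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, dfa_delta


# Example NFA: words over {a, b} whose second-to-last symbol is a.
nfa = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}
subsets, _ = determinize("ab", nfa, {0})
print(len(subsets))
```

Only subsets reachable from the initial set are generated, which is why the resulting count can be far below the trivial bound of $2^n$.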

In this paper we will use the notation of Pȃun [13]. The splicing operation is defined via sets of quadruples $r = (u_1, u_2; u_3, u_4)$ with $u_1, u_2, u_3, u_4 \in \Sigma^*$ called splicing rules. For two strings $x = x_1 u_1 u_2 x_2$ and $y = y_1 u_3 u_4 y_2$, applying the rule $r = (u_1, u_2; u_3, u_4)$ produces a string $z = x_1 u_1 u_4 y_2$, which we denote by $(x, y) \vdash^r z$.
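The splicing operation just defined can be sketched directly on strings. The function below is an illustration only: the name `splice` and the tuple encoding of a rule are assumptions made for this example. It returns every $z$ obtainable from one pair $(x, y)$ and one rule, since the factors $u_1 u_2$ and $u_3 u_4$ may occur at several positions.

```python
from typing import Set, Tuple

Rule = Tuple[str, str, str, str]  # (u1, u2, u3, u4)


def splice(x: str, y: str, rule: Rule) -> Set[str]:
    """All words z = x1 u1 u4 y2 with x = x1 u1 u2 x2 and y = y1 u3 u4 y2."""
    u1, u2, u3, u4 = rule
    left, right = u1 + u2, u3 + u4
    results = set()
    # Each occurrence of u1 u2 in x gives a cut point just after u1.
    for i in range(len(x) - len(left) + 1):
        if not x.startswith(left, i):
            continue
        prefix = x[: i + len(u1)]  # x1 u1
        # Each occurrence of u3 u4 in y gives a cut point just before u4.
        for j in range(len(y) - len(right) + 1):
            if y.startswith(right, j):
                suffix = y[j + len(u3):]  # u4 y2
                results.add(prefix + suffix)
    return results


print(splice("xaby", "pcdq", ("a", "b", "c", "d")))
```

An empty component simply matches at every position, which is the situation that arises for semi-simple rules.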

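A splicing scheme is a pair $\sigma = (\Sigma, R)$ where $\Sigma$ is an alphabet and $R$ is a set of splicing rules. For a splicing scheme $\sigma = (\Sigma, R)$ and a language $L \subseteq \Sigma^*$, we denote by $\sigma(L)$ the language $\sigma(L) = L \cup \{z \in \Sigma^* \mid (x, y) \vdash^r z \text{ for some } x, y \in L \text{ and } r \in R\}$, that is, $L$ together with every word obtained by applying a rule of $R$ to a pair of words of $L$.

Then we define $\sigma^0(L) = L$ and $\sigma^{i+1}(L) = \sigma(\sigma^i(L))$ for $i \ge 0$ and $\sigma^*(L) = \bigcup_{i \ge 0} \sigma^i(L)$.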

For a splicing scheme $\sigma = (\Sigma, R)$ and an initial language $L \subseteq \Sigma^*$, we say the triple $H = (\Sigma, R, L)$ is a splicing system. The language generated by $H$ is defined by $L(H) = \sigma^*(L)$.
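For a finite axiom set, the iteration that defines $L(H)$ can be explored by brute force. The sketch below is an assumption-laden illustration: the names `splice`, `sigma` and `sigma_star_bounded` are hypothetical, and the length cap `max_len` is introduced only to guarantee termination. Because words longer than the cap are discarded, the result is in general only a subset of the words of $L(H)$ of length at most `max_len`.

```python
def splice(x, y, rule):
    """All z = x1 u1 u4 y2 for x = x1 u1 u2 x2, y = y1 u3 u4 y2, rule = (u1, u2, u3, u4)."""
    u1, u2, u3, u4 = rule
    left, right = u1 + u2, u3 + u4
    return {
        x[: i + len(u1)] + y[j + len(u3):]
        for i in range(len(x) - len(left) + 1) if x.startswith(left, i)
        for j in range(len(y) - len(right) + 1) if y.startswith(right, j)
    }


def sigma(language, rules, max_len):
    """One round: the language plus all splicing results of length <= max_len."""
    result = set(language)
    for x in language:
        for y in language:
            for rule in rules:
                result |= {z for z in splice(x, y, rule) if len(z) <= max_len}
    return result


def sigma_star_bounded(axioms, rules, max_len):
    """Iterate sigma until nothing new of length <= max_len appears."""
    current = {w for w in axioms if len(w) <= max_len}
    while True:
        following = sigma(current, rules, max_len)
        if following == current:
            return current
        current = following


# Example: axiom abab with the single rule (epsilon, a; b, epsilon).
print(sorted(sigma_star_bounded({"abab"}, [("", "a", "b", "")], max_len=10)))
```

In this particular example every splicing result is no longer than the axiom, so the cap has no effect and the computed set is the whole generated language.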

Goode and Pixton [6] define a restricted class of splicing systems called semi-simple splicing systems. A semi-simple splicing system is a triple $H = (\Sigma, M, I)$, where $\Sigma$ is an alphabet, $M \subseteq \Sigma \times \Sigma$ is a set of markers, and $I$ is a finite initial language over $\Sigma$. We have $(x, y) \vdash^{(a,b)} z$ if and only if $x = x_1 a x_2$, $y = y_1 b y_2$, and $z = x_1 a y_2$ for some $x_1, x_2, y_1, y_2 \in \Sigma^*$. That is, a semi-simple splicing system is a splicing system in which the set of rules is $\mathcal{M} = \{(a, \varepsilon; b, \varepsilon) \mid (a, b) \in M\}$.

Since the rules are determined solely by our choice of $M \subseteq \Sigma \times \Sigma$, the set $M$ is used in the definition of the semi-simple splicing system rather than the set of rules $\mathcal{M}$.

It is shown in [6] that the class of languages generated by semi-simple splicing systems is a subclass of the regular languages. Semi-simple splicing systems are a generalization of the class of simple splicing systems, defined by Mateescu et al. [11]. A splicing system is a simple splicing system if it is a semi-simple splicing system and all markers are of the form $(a, a)$ for $a \in \Sigma$. It is shown in [11] that the class of languages generated by simple splicing systems is a subclass of the extended star-free languages.

Observe that the set of rules $\mathcal{M} = \{(a, \varepsilon; b, \varepsilon) \mid (a, b) \in M\}$ of a semi-simple splicing system consists of 4-tuples with symbols from $\Sigma$ in positions 1 and 3 and $\varepsilon$ in positions 2 and 4. We can call such splicing rules (1,3)-splicing rules. Then a (1,3)-splicing system is a splicing system with only (1,3)-splicing rules, and ordinary semi-simple splicing systems can be considered (1,3)-semi-simple splicing systems. The state complexity of (1,3)-simple and (1,3)-semi-simple splicing systems was studied previously by the authors in [9].
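To make the role of the positions concrete, the following sketch builds the rule quadruple for a marker pair $(a, b)$ with the letters placed in positions $i \in \{1, 2\}$ and $j \in \{3, 4\}$, and applies it once. The helper names `rules_from_markers` and `splice`, and the choice of the word abab with marker $(a, b)$ as test input, are assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str, str, str]  # (u1, u2, u3, u4)


def rules_from_markers(markers: Iterable[Tuple[str, str]], i: int, j: int) -> List[Rule]:
    """Put a in position i (1 or 2) and b in position j (3 or 4); other slots are empty."""
    if i not in (1, 2) or j not in (3, 4):
        raise ValueError("expected i in {1, 2} and j in {3, 4}")
    rules = []
    for a, b in markers:
        slots = ["", "", "", ""]
        slots[i - 1] = a
        slots[j - 1] = b
        rules.append(tuple(slots))
    return rules


def splice(x: str, y: str, rule: Rule) -> Set[str]:
    """All z = x1 u1 u4 y2 with x = x1 u1 u2 x2 and y = y1 u3 u4 y2."""
    u1, u2, u3, u4 = rule
    left, right = u1 + u2, u3 + u4
    return {
        x[: k + len(u1)] + y[m + len(u3):]
        for k in range(len(x) - len(left) + 1) if x.startswith(left, k)
        for m in range(len(y) - len(right) + 1) if y.startswith(right, m)
    }


word = "abab"
for i, j in [(1, 3), (1, 4), (2, 3), (2, 4)]:
    (rule,) = rules_from_markers([("a", "b")], i, j)
    print((i, j), rule, sorted(splice(word, word, rule)))
```

The same marker pair thus yields four different one-step results, depending only on which positions of the rule carry the letters.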

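We can consider variants of semi-simple splicing systems in this way by defining semi-simple $(i, j)$-splicing systems, for $i = 1, 2$ and $j = 3, 4$. A semi-simple (2,4)-splicing system is a splicing system $(\Sigma, M, I)$ with rules of the form $(\varepsilon, a; \varepsilon, b)$ for $(a, b) \in M$. The (1,4)- and (2,3)-variants are defined analogously, with rules of the form $(a, \varepsilon; \varepsilon, b)$ and $(\varepsilon, a; b, \varepsilon)$, respectively.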

The classes of languages generated by simple and semi-simple splicing systems and their variants are related to each other in different ways. Mateescu et al. [11] show that the classes of languages generated by (1,3)-simple splicing systems (i.e. ordinary simple splicing systems) and (2,4)-simple splicing systems are equivalent, while the classes of languages generated by (1,3)-, (1,4)-, and (2,3)-simple splicing systems are all incomparable and subregular.

The situation is different for semi-simple splicing systems. Ceterchi et al. [2] show that the classes of languages generated by (1,3)-, (1,4)-, (2,3)-, and (2,4)-semi-simple splicing systems are pairwise incomparable. So unlike simple splicing systems, the (1,3)- and (2,4)-variants are not equivalent. They prove this by showing that the language $a^+ \cup a^+ab \cup aba^+ \cup aba^+b$ is generated by the (1,3)-semi-simple splicing system $(\{a, b\}, \{(a, \varepsilon; b, \varepsilon)\}, \{abab\})$ but cannot be generated by a (2,4)-semi-simple splicing system, while the language $b^+ \cup abb^+ \cup b^+ab \cup ab^+ab$ can be generated by the (2,4)

In this paper, we will relax the condition that the initial language of a semi-simple splicing system must be a finite language, and we will also consider semi-simple splicing systems with regular initial languages. By [13], it is clear that such a splicing system will also produce a regular language. In the following, we will use the convention that $I$ denotes a finite language and $L$ denotes an infinite language.

In this section, we will consider the state complexity of (2,4)-semi-simple splicing systems. Recall that a (2,4)-semi-simple splicing system is a splicing system with rules of the form $(\varepsilon, a; \varepsilon, b)$ for $a, b \in \Sigma$. As mentioned previously, the classes of languages generated by (1,3)- and (2,4)-simple splicing systems were shown to be equivalent by Mateescu et al. [11], while the classes of languages generated by (1,3)- and (2,4)-semi-simple splicing systems were shown to be incomparable by Ceterchi et al. [2].

First, we define an NFA that recognizes the language of a given (2,4)-semi-simple splicing system. This construction is based on the construction of Head and Pixton [8] for Pȃun splicing rules, which is based on the construction for Pixton splicing rules by Pixton [12]. The original proof of regularity of finite splicing is due to Culik and Harju [3]. We follow the Head and Pixton construction and apply ε-transition removal on the resulting NFA to obtain an NFA for the semi-simple splicing system with the same number of states as the DFA for the initial language of the splicing system. The result of this construction is an NFA that "guesses" when a splicing operation occurs. Since each component of a semi-simple splicing rule is of length at most 1, the construction of the NFA need only consider the outgoing and incoming transitions of states. In the case of (2,4)-semi-simple splicing systems, for a rule $(a, b)$, any state with an outgoing transition on $a$ has added transitions on $a$ to every state with an incoming transition on $b$.

From this NFA construction, we can obtain a DFA via the subset construction. This gives an upper bound of $2^n - 1$ reachable states. This upper bound is the same for (1,3)-simple and (1,3)-semi-simple splicing systems and was shown to be tight [9]. Since (1,3)-simple splicing systems and (2,4)-simple splicing systems are equivalent, we state without proof that the same result holds for (2,4)-simple splicing systems via the same lower bound witness. Therefore, this bound is also reachable for (2,4)-semi-simple splicing systems.

Proposition 2 [9]. For $|\Sigma| \ge 3$ and $n \ge 3$, there exists a (2,4)-simple splicing system $H = (\Sigma, M, L)$ with $|M| = 1$, where $L$ is a regular language with state complexity $n$, such that the minimal DFA for $L(H)$ requires at least $2^n - 1$ states.

It was also shown in [9] that if the initial language is finite, this upper bound is not reachable for (1,3)-simple and (1,3)-semi-simple splicing systems. This result holds for all variants of semi-simple splicing systems and the proof is exactly the same as in [9]. We state the result for semi-simple splicing systems for completeness.

Proposition 3 [9]. Let $H = (\Sigma, M, I)$ be a semi-simple splicing system with a finite initial language, where $I$ is a finite language recognized by a DFA $A$ with $n$ states. Then a DFA recognizing $L(H)$ requires at most $2^{n-2} + 1$ states.

This upper bound is witnessed by a (2,4)-semi-simple splicing system which requires both an alphabet and a ruleset that grow exponentially with the number of states of the initial language. This is in contrast to the lower bound witness for (1,3)-semi-simple systems from [9], which requires only three letters. We also note that the initial language used for this witness is the same as that for (1,3)-simple splicing systems from [9]. From this, we observe that the choice of the visible sites for the splicing rules (i.e. (1,3) vs. (2,4)) makes a difference in the state complexity. We will see other examples of this later as we consider semi-simple splicing systems with other rule variants.

We will now consider the state complexity of (2,3)-semi-simple splicing systems. Recall that a (2,3)-semi-simple splicing system is a splicing system with rules of the form $(\varepsilon, a; b, \varepsilon)$ for $a, b \in \Sigma$. We can follow the same construction from Proposition 1 with slight modifications to account for (2,3)-semi-simple splicing rules to obtain an NFA for a language generated by a (2,3)-semi-simple splicing system with the same number of states as the DFA for the initial language of the splicing system. Note that in this NFA construction, for each (2,3)-semi-simple splicing rule $(a, b)$, any state with an outgoing transition on $a$ has additional ε-transitions to every state with an incoming transition on $b$. This differs from the NFA construction for (2,4)-semi-simple splicing systems, where the new transitions were on the symbol $a$. From this NFA, we then get an upper bound of $2^n - 1$ reachable states via the subset construction. However, we will show that because of the ε-transitions, this bound cannot be reached.

Consider $a \in \Sigma$ with $(a, b) \in M$ such that $\delta(q, a) = q'$ is defined for some $q' \in Q$. In other words, $q$ has an outgoing transition on $a$. Assuming that $(a, b)$ is non-trivial and $\operatorname{im} \delta_b$ contains useful states, for any reachable set $P \subseteq Q$, we must have $\operatorname{im} \delta_b \subseteq P$ if $q \in P$. This is because for each symbol $a \in \Sigma$ for which there is a pair $(a, b) \in M$, if the NFA $B_H$ enters a state $q \in Q$ with an outgoing transition on $a$, the NFA $B_H$ also simultaneously, via ε-transitions, enters any state with an incoming transition on $b$. This implies that not all $2^n - 1$ non-empty subsets of $Q$ are reachable in $A_H$, since the singleton set $\{q\}$ is unreachable.
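The ε-transition construction described above can be sketched as follows. This is an illustration under stated assumptions rather than the construction of Proposition 1 itself: the DFA is complete and encoded as `delta[(state, symbol)]`, "has an outgoing transition on $a$" is read as "the $a$-transition leads to a useful state", and only useful states with an incoming $b$-transition from a reachable state are used as targets. All function names are hypothetical.

```python
from itertools import product


def reachable_states(alphabet, delta, start):
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            p = delta[(q, a)]
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def useful_states(states, alphabet, delta, finals):
    useful, changed = set(finals), True
    while changed:
        changed = False
        for q in states:
            if q not in useful and any(delta[(q, a)] in useful for a in alphabet):
                useful.add(q)
                changed = True
    return useful


def epsilon_moves_23(states, alphabet, delta, start, finals, markers):
    """For each marker (a, b): epsilon-moves from states leaving on a to states entered on b."""
    reach = reachable_states(alphabet, delta, start)
    useful = useful_states(states, alphabet, delta, finals)
    eps = {q: set() for q in states}
    for a, b in markers:
        sources = {q for q in reach if delta[(q, a)] in useful}
        targets = {delta[(p, b)] for p in reach} & useful
        for q in sources:
            eps[q] |= targets
    return eps


def accepts(word, delta, start, finals, eps):
    """Simulate the epsilon-NFA on `word` by tracking the current set of states."""
    def closure(subset):
        seen, stack = set(subset), list(subset)
        while stack:
            q = stack.pop()
            for t in eps[q]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for symbol in word:
        current = closure({delta[(q, symbol)] for q in current})
    return bool(current & set(finals))


# Example: initial language {abab} (state 5 is the sink), marker (a, b).
states, alphabet = range(6), "ab"
delta = {(q, c): 5 for q in states for c in alphabet}
delta.update({(0, "a"): 1, (1, "b"): 2, (2, "a"): 3, (3, "b"): 4})
eps = epsilon_moves_23(states, alphabet, delta, 0, {4}, {("a", "b")})
accepted = {"".join(t) for n in range(7) for t in product(alphabet, repeat=n)
            if accepts("".join(t), delta, 0, {4}, eps)}
print(sorted(accepted))
```

In the example, states 0 and 2 each receive ε-moves to states 2 and 4, so any subset containing state 0 or state 2 also contains both targets, which mirrors the observation that such singletons are unreachable.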

Because of this construction, the number of distinct sets that contain $q$ decreases as the size of $\operatorname{im} \delta_b$ grows. Thus, to maximize the number of sets that can be reached, the number of states with incoming transitions on any symbol $b$ with $(a, b) \in M$ must be minimized. Therefore, for $(a, b) \in M$, there can be only one useful state with incoming transitions on $b$. Let us call this state $q_b \in Q$.

We claim that to maximize the number of states, $A$ must contain no useless states and therefore $A$ contains no sink state. First, suppose otherwise and that $A$ contains a sink state $q_\emptyset$. To maximize the number of states, we minimize the number of states of $A$ with outgoing transitions on $a$, so there is only one state of $A$, say $q'$, with an outgoing transition on $a$. We observe that $q' \neq q_b$, since otherwise $|\operatorname{im} \delta_b| = 1$ and, if the only state with an outgoing transition on $a$ is $q_b$ itself, then the only reachable subset that contains $q_b$ is the singleton set $\{q_b\}$. Now, recall that for all subsets $P \subseteq Q \setminus \{q_\emptyset\}$, the two sets $P$ and $P \cup \{q_\emptyset\}$ are indistinguishable. Then there are at most $2^{n-2}$ distinguishable subsets containing $q_b$ and at most $2^{n-3} - 1$ non-empty subsets of $Q \setminus \{q_b, q', q_\emptyset\}$. Together with the sink state, this gives a total of at most $2^{n-2} + 2^{n-3}$ states in $A_H$. Now, we consider the case where $A$ contains no sink state. In this case, since $A$ must be a complete DFA, in order to satisfy the condition that $|\operatorname{im} \delta_b|$ is minimal, we must have $\delta(q, a) = q_b$ for all $q \in Q$. But this means that for any state $q \in Q$ and subset $P \subseteq Q$, if $q \in P$, then $q_b \in P$. Therefore, every reachable subset of $Q$ must contain $q_b$. This gives an upper bound of $2^{n-1}$ states in $A_H$.

Since $2^{n-1} > 2^{n-2} + 2^{n-3}$ for $n \ge 3$, the DFA $A_H$ can have at most $2^{n-1}$ states in the worst case.
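The comparison between the two cases can be checked in one line by writing both bounds as multiples of $2^{n-3}$:

```latex
\[
  2^{n-2} + 2^{n-3} \;=\; 2 \cdot 2^{n-3} + 2^{n-3} \;=\; 3 \cdot 2^{n-3}
  \;<\; 4 \cdot 2^{n-3} \;=\; 2^{n-1}.
\]
```

So the case without a sink state always gives the larger count.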

The bound of Proposition 6 is reachable when the initial language is a regular language, even when restricted to simple splicing rules defined over an alphabet of size 3. This upper bound is met by the (2,3)-simple splicing system $H = (\Sigma, \{(c, c)\}, L(A_n))$, where $\Sigma = \{a, b, c\}$ and $A_n$ is the DFA shown in Fig. 1. This gives us the following result.

The bound of Proposition 6 depends on whether or not the DFA for the initial language contains a sink state. Since a DFA recognizing a finite language must have a sink state, the upper bound stated in the proposition is clearly not reachable when the initial language is finite.

Proof. Let $A = (Q, \Sigma, \delta, q_0, F)$ be the DFA for $I$ and let $A_H$ be the DFA obtained via the construction of Proposition 6, given the (2,3)-semi-simple splicing system $H$. We will consider the number of reachable and pairwise distinguishable states of $A_H$.

Recall from the proof of Proposition 6 that to maximize the number of sets that can be reached in $A_H$, the number of states with incoming transitions on any symbol $b$ with $(a, b) \in M$ must be minimized. Then for $(a, b) \in M$, there can be only one useful state with incoming transitions on $b$. Let us call this state $q_b \in Q$.

Since $I$ is a finite language, we know that $q_0$, the initial state of $A$, is contained in exactly one reachable state in $A_H$. Similarly, $A$ must contain a sink state $q_\emptyset$ and for all subsets $P \subseteq Q$, we have that $P$ and $P \cup \{q_\emptyset\}$ are indistinguishable. Finally, we observe that there must exist at least one state $q_1 \in Q$ that is directly reachable from $q_0$ and is not reachable by any word of length greater than 1. Therefore, in order to maximize the number of reachable subsets, we must have that $q_b = q_1$.

Let $Q_a$ denote the set of states for which there is an outgoing transition on the symbol $a$. That is, if $q \in Q_a$, we have $\delta(q, a) \le n - 2$. Let $k_a = |Q_a|$. It is clear that $k_a \ge 1$. Now, consider a reachable subset $P \subseteq Q \setminus \{q_0, q_\emptyset\}$. We claim that if $|P| \ge 2$ and $q_b \in P$, then we must have $q \in P$ for some $q \in Q_a$.

Suppose otherwise and that $Q_a \cap P = \emptyset$. Recall that $q_b = q_1$ and the only incoming transitions to $q_1$ are from the initial state $q_0$. Then this means that $P = \{q_1\}$ and $|P| = 1$, a contradiction. Therefore, we have $Q_a \cap P \neq \emptyset$ whenever $q_b \in P$ with $|P| \ge 2$. Now, we can count the number of reachable subsets of $Q \setminus \{q_0, q_\emptyset\}$. There are $2^{n-3-k_a}(2^{k_a} - 1)$ non-empty subsets of size greater than 1 which contain $q_b$ and there are $2^{n-3-k_a} - 1$ non-empty subsets which do not contain $q_b$. Together with the initial and sink states and the set $\{q_b\}$, we have $2^{n-3-k_a}(2^{k_a} - 1) + 2^{n-3-k_a} - 1 + 3$ reachable subsets.

Thus, the DFA $A_H$ has at most $2^{n-3} + 2$ reachable states.
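For completeness, the count from the previous paragraph simplifies to this bound independently of $k_a$, because the two terms in $2^{n-3-k_a}$ cancel:

```latex
\begin{align*}
  2^{n-3-k_a}\bigl(2^{k_a} - 1\bigr) + \bigl(2^{n-3-k_a} - 1\bigr) + 3
    &= 2^{n-3} - 2^{n-3-k_a} + 2^{n-3-k_a} + 2 \\
    &= 2^{n-3} + 2 .
\end{align*}
```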

Let $H = (\Sigma, \{(a, c)\}, L(B_n))$ be a (2,3)-semi-simple splicing system, where $\Sigma = \{a, b, c\}$ and $B_n$ is a DFA for a finite language with $n$ states. The DFA $B_n$ is shown in Fig. 2. Then $H$ is a (2,3)-semi-simple splicing system with a finite initial language, defined over a fixed alphabet, that can reach the upper bound of Proposition 8. This then gives us the following theorem.

Unlike the situation with (2,3)-semi-simple splicing systems with regular initial languages, when we restrict (2,3)-semi-simple splicing systems with finite initial languages to allow only (2,3)-simple splicing rules, the bound of Theorem 9 is not reachable.

In this section, we consider the state complexity of (1,4)-semi-simple splicing systems. Recall that a (1,4)-semi-simple splicing system is a splicing system with rules of the form $(a, \varepsilon; \varepsilon, b)$ for $a, b \in \Sigma$. As with (2,3)-semi-simple splicing systems, we can easily modify the construction of Proposition 1 to obtain an NFA for (1,4)-semi-simple splicing systems. This construction immediately gives an upper bound of $2^{n+m}$ states necessary for an equivalent DFA via the subset construction, where $m$ is the number of symbols on the left side of each pair of rules in $M$. However, we will show via the following DFA construction that the upper bound is much lower than this.

This construction gives an immediate upper bound of $(2^n - 1)(|M_1| + 1)$ states; however, not all of these states are distinguishable. Consider the two states $\langle Q, \varepsilon \rangle$ and $\langle Q, a \rangle$ for some $a \in M_1$. We claim that these two states are indistinguishable. This arises from the observation that $\bigcup_{q \in Q} \delta(q, a) = \operatorname{im} \delta_a$ for all $a \in \Sigma$. Then one of the following occurs:

Note that in either case, it does not matter whether or not $(a, b) \in M$ and the two cases are distinguished solely by whether or not $b$ is in $M_1$. Thus, all states $\langle Q, a \rangle$ with $a \in M_1 \cup \{\varepsilon\}$ are indistinguishable.

Thus, $A_H$ has at most $(2^n - 2)(|M_1| + 1) + 1$ states.
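The arithmetic behind this bound can be made explicit. Starting from the immediate bound of $(2^n - 1)(|M_1| + 1)$ states, the $|M_1| + 1$ mutually indistinguishable states $\langle Q, c \rangle$ with $c \in M_1 \cup \{\varepsilon\}$ collapse into a single state, which removes $|M_1|$ states:

```latex
\begin{align*}
  (2^n - 1)\bigl(|M_1| + 1\bigr) - |M_1|
    &= (2^n - 2)\bigl(|M_1| + 1\bigr) + \bigl(|M_1| + 1\bigr) - |M_1| \\
    &= (2^n - 2)\bigl(|M_1| + 1\bigr) + 1 .
\end{align*}
```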

When the initial language is a regular language, the upper bound is easily reached, even when we are restricted to simple splicing rules. We consider the (1,4)-simple splicing system $H = (\Sigma, \{(c, c)\}, L(D_n))$, where $\Sigma = \{a, b, c\}$ and $D_n$ is the DFA shown in Fig. 4. We note that the witness $H$ has $|M| = 1$ and therefore $|M_1| = 1$. We observe that we can set $|M_1|$ to be arbitrarily large by adding symbols and transitions appropriately and adding the corresponding markers to $M$ for each such new symbol. We then have the following result.

We will show that this bound cannot be reached by any (1,4)-semi-simple splicing system when the initial language is finite.

Proof. Let $A = (Q, \Sigma, \delta, q_0, F)$ be a DFA for $I$ with $n$ states and let $A_H$ be the DFA recognizing $L(H)$ obtained via the construction of Proposition 13. Since $I$ is finite, the initial state of $A$ has no incoming transitions and $A$ must have a sink state. Therefore, for any state $\langle S, c \rangle$, we have $S \subseteq Q \setminus \{q_0, q_\emptyset\}$ and $c \in M_1 \cup \{\varepsilon\}$, where $q_\emptyset$ is the sink state. This gives us up to $(2^{n-2} - 1)(|M_1| + 1) + 2$ states.

We can reduce the number of reachable states further by noting that since $I$ is finite, $A$ must contain at least one useful state $q_1$ that is directly reachable only from the initial state $q_0$. Then there are only two ways to reach a state $\langle P, c \rangle$ in $A_H$ with $q_1 \in P$. Either $P = \{q_1\}$ and is reached directly via a transition from $\{q_0\}$, or $|P| \ge 2$ and $P = \operatorname{im} \delta_b$ for some $(a, b) \in M$. For each $c \in M_1$, this gives a total of 2 reachable states $\langle P, c \rangle$. Therefore, we can enumerate the reachable states of $A_H$ as follows:

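- the initial state $\langle \{q_0\}, \varepsilon \rangle$ and the sink state $\langle \{q_\emptyset\}, \varepsilon \rangle$,
- at most $2^{n-2} - 1$ states of the form $\langle P, \varepsilon \rangle$, where $P \subseteq Q \setminus \{q_0, q_\emptyset\}$,
- at most $|M_1|$ states of the form $\langle \{q_1\}, c \rangle$ with $c \in M_1$,
- at most $|M_1|$ states of the form $\langle P, c \rangle$ such that $P \subseteq Q \setminus \{q_0, q_\emptyset\}$, $|P| \ge 2$, and $q_1 \in P$ with $c \in M_1$,
- at most $|M_1|(2^{n-3} - 1)$ states of the form $\langle P, c \rangle$ such that $P \subseteq Q \setminus \{q_0, q_1, q_\emptyset\}$ with $c \in M_1$.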
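As a check, the five groups in the enumeration above add up as follows, taking the groups in the order listed:

```latex
\begin{align*}
  2 + \bigl(2^{n-2} - 1\bigr) + |M_1| + |M_1| + |M_1|\bigl(2^{n-3} - 1\bigr)
    &= 2^{n-2} + 1 + |M_1|\bigl(1 + 1 + 2^{n-3} - 1\bigr) \\
    &= 2^{n-2} + |M_1|\bigl(2^{n-3} + 1\bigr) + 1 .
\end{align*}
```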

This gives a total of at most $2^{n-2} + |M_1| \cdot (2^{n-3} + 1) + 1$ reachable states in $A_H$. This bound is witnessed by a (1,4)-semi-simple splicing system that is defined over an alphabet and ruleset that grow exponentially in the number of states of the initial language. This is similar to the (2,4)-semi-simple case. We note also that one can arbitrarily increase the size of $M$ by adding symbols and corresponding pairs of rules appropriately. We then get the following result.

We have studied the state complexity of several variants of semi-simple splicing systems. Our results are summarized in Table 1 and we include the state complexity of (1,3)-semi-simple and (1,3)-simple splicing systems from [9] for comparison. Observe that for all variants of semi-simple splicing systems, the state complexity bounds for splicing systems with regular initial languages are reached with simple splicing witnesses defined over a three-letter alphabet. For semi-simple splicing systems with finite initial languages, we note that the state complexity bounds for the (2,3) and (1,3) variants are reached by witnesses defined over a three-letter alphabet, while both the (1,4) and (2,4) variants require an alphabet size that is exponential in the size of the DFA for the initial language.

We note that the witness for (2,3)-simple splicing systems with a finite initial language is defined over a fixed alphabet of size 7, while the problem remains open for (1,4)-simple splicing systems. Another problem that remains open is the state complexity of (1,4)- and (2,4)-simple and semi-simple splicing systems with finite initial languages defined over alphabets of size $k$ for $3 < k < 2^{n-3}$. A similar question can be asked of (2,3)-simple splicing systems with a finite initial language for alphabets of size less than 7.

[1] Separating some splicing models

[2] On some classes of splicing languages

[3] Splicing semigroups of dominoes and DNA

[4] A survey on operational state complexity

[5] Splicing systems and regularity

[6] Semi-simple splicing systems

[7] Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviors

[8] Splicing and regularity

[9] State complexity of simple splicing

[10] Descriptional complexity of splicing systems

[11] Simple splicing systems

[12] Regularity of splicing languages

[13] On the splicing operation