1 Introduction

Program Synthesis (PS) is the task of automatically generating a computer program, written in a programming language of choice, from some form of specification [6]. This can be framed as a search problem that explores the space of possible programs under a set of constraints, usually expressed as a grammar and a maximum program size, that remove undesirable solutions from the search space.

A convenient format for the specification is a set of input-output examples that demonstrate the expected outputs for several different inputs. In this case, the task is called Inductive Synthesis or Programming-by-Example (PBE). The main advantage of this approach is that sets of examples are easy to create and often do not require deep knowledge of the problem in question. However, if the set lacks corner or special cases, the search may produce programs that satisfy the examples without following the original intent of the user.

Among the vast selection of possible search algorithms, we highlight Genetic Programming (GP) [10]. GP is a search algorithm that tries to balance exploration and exploitation of the search space through recombination and perturbation of a population of solutions. Inspired by the evolution of species, it applies selective pressure at every step to favor the fittest solutions when replicating and applying these operators. Some benchmark problems extracted from common programming tasks have been successfully solved by recent variations of the original GP algorithm, such as PushGP [8], CBGP [15], GE [13], G3P [3], and HOTGP [1].

The representation of the solutions and the imposed constraints play a major role in search algorithms, as they determine the navigability of the search space and the coverage of possible programs. For example, the Push language [17] is a stack-based programming language in which operations are executed sequentially, storing and retrieving values from the stacks corresponding to the types of the operation. This provides additional flexibility during the execution of a program, as any invalid operation can be skipped and, thus, the operation at any given step is not constrained by the output type of the previous step. This widens the ways in which the search space can be navigated. On the other hand, in Grammatical Evolution [13] the search space is defined by the grammar being evolved. This can limit the size of the search space and ensure that every step of the generated program is valid. Likewise, type-safe representations, such as those in CBGP [15] and HOTGP [1], carry at each node the information of the input and output types, constraining the space to valid programs (i.e., programs that execute without errors) and to solutions that follow the type specification.

A common challenge for GP algorithms is how to handle recursion or loops, since it is possible to generate an unnecessarily long or even unbounded recursion or loop. This is alleviated by the use of programming patterns that hide the loop or recursion behind a declarative construct. Popular examples of such patterns are the map, filter, and fold functions.

In particular, fold describes a recursion pattern capable of expressing algorithms that traverse a structure while aggregating partial results. Examples of this pattern include sum, product, and even insertion sort. A complementary pattern, which describes recursion that generates or builds a structure, is known as unfold. For example, unfold can be used to generate the list of Fibonacci numbers or to reverse the elements of a list.
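
For illustration, both patterns are available in Haskell's standard library as foldr and Data.List.unfoldr; the two small functions below (our own examples, not listings from this paper) show a sum expressed as a fold and the Fibonacci numbers expressed as an unfold:

```haskell
import Data.List (unfoldr)

-- fold: traverse a list, aggregating partial results (here, a sum)
sumList :: [Int] -> Int
sumList = foldr (+) 0

-- unfold: build a structure from a seed (here, the first n Fibonacci numbers,
-- unfolding from the seed pair (0, 1))
fibs :: Int -> [Integer]
fibs n = take n (unfoldr (\(a, b) -> Just (a, (b, a + b))) (0, 1))
```

In both cases the recursion is hidden inside the library function; the programmer supplies only the non-recursive combining or generating step.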

Although many algorithms can be described using folds or unfolds, this approach has some limitations, such as working only on lists or the inability to store partial results. A more general set of recursive patterns that involves folding and unfolding is known as Recursion Schemes [11]. Recursion Schemes extend the common folding and unfolding operations to work on any inductive type and include additional mechanisms to handle a wider variety of recursive function patterns. These Recursion Schemes can be used to guide the synthesis of recursive programs, as their structure is well defined and can be used as scaffolding, with variations only in certain parts of the program.
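
To illustrate this generalization, the sketch below uses the standard definitions from the recursion-schemes literature (not code from this paper): a catamorphism is defined once for the fixed point of any base functor, and the ordinary list fold becomes a special case:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Fixed point of a functor: ties the recursive knot for any inductive type
newtype Fix f = Fix (f (Fix f))

-- A catamorphism (generalized fold), defined once for every base functor f
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix t) = alg (fmap (cata alg) t)

-- Base functor for lists of Int, and a sum expressed as a catamorphism
data ListF a = NilF | ConsF Int a deriving Functor

fromList :: [Int] -> Fix ListF
fromList = foldr (\x acc -> Fix (ConsF x acc)) (Fix NilF)

sumAlg :: ListF Int -> Int
sumAlg NilF        = 0
sumAlg (ConsF x s) = x + s
```

The same `cata` works unchanged for trees, Peano naturals, or any other inductive type, which is what makes the scheme usable as a fixed scaffold.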

Origami [2] is an algorithm capable of synthesizing typed, pure, functional programs that support different Recursion Schemes. Origami’s authors presented a proof-of-concept implementation capable of partially synthesizing one single scheme, whose evaluation yielded promising preliminary results. In this paper, we provide and evaluate the first complete implementation of Origami, following the description given in the original paper. This represents a meaningful step in assessing the effectiveness of integrating Recursion Schemes into Program Synthesis. We evaluate the performance of Origami on the problems described in the General Program Synthesis Benchmark Suite 1 (PSB1) [9], comparing the obtained results to well-known GP-based program synthesis algorithms. When comparing success rates per problem, this new version of Origami achieves the best results in \(25\%\) more problems than its predecessor (HOTGP), and an even higher increase when compared to other approaches. Additionally, Origami achieves a high success rate (above \(70\%\)) in problems for which most algorithms achieve less than \(50\%\).

This text is organized as follows. Section 2 presents a brief literature review of Functional Program Synthesis and Recursion Schemes. Section 3 presents Origami and describes the details of our implementation. An analysis of the results and comparisons of Origami to HOTGP and other well-known methods are shown in Sect. 4. Finally, Sect. 5 concludes our work.

2 Related Work

There have been several attempts to use functional programming as the basis of program synthesis. One of the earliest attempts employed types to guide the search [12]. More recently, the use of recursion schemes [18] and higher-order functions [1, 5, 14] to represent recursion has been proposed.

Out of these approaches, the one that is more closely related to this work is HOTGP [1], which is a GP algorithm that synthesizes pure, typed, and functional programs. Its approach to recursion includes support for higher-order functions, \(\lambda \)-functions, and parametric polymorphism.

Notably, in [18] the authors presented the first algorithm for synthesizing programs that exploit Recursion Schemes. Their work focuses only on catamorphisms over natural numbers in the Peano representation (i.e., the inductive type of natural numbers). The authors evaluate their approach on variations of the Fibonacci sequence, successfully obtaining the correct programs.

Origami was originally proposed in [2], where the authors evaluated the feasibility of using Recursion Schemes to synthesize recursive programs. In that work, they showed that the entire PSB1 benchmark can be solved by one of four different Recursion Schemes: catamorphism, accumulation, anamorphism, and hylomorphism. The paper also described preliminary experiments with catamorphisms showing that, for the problems solvable with this scheme, the use of scaffolding improved the success rate when compared to HOTGP. These results provided evidence that the synthesis process is simplified once the correct Recursion Scheme is determined, as it only needs to evolve non-recursive expressions.

Large Language Models acting as code assistants have also been employed for Program Synthesis. In [16], the authors use GitHub Copilot to synthesize programs from a textual description of the problem, obtaining correct programs more often than a selection of GP-based synthesizers in 50% of the evaluated problems.

3 Origami Program Synthesis

Origami’s implementation follows Koza-style Genetic Programming [10] (tree representation). The main distinctions from traditional approaches are the introduction of immutable nodes (ensuring a certain Recursion Scheme) and the type safety of the genetic operators (the same approach taken by HOTGP [1]).

The implementation is based on patterns, which are used to represent different Recursion Schemes. A pattern is composed of immutable nodes and a set of evolvable slots that, when replaced with expressions, can be evaluated. The immutable nodes describe the main definition of the Recursion Scheme (see Sect. 3.1) and are fixed once we choose the pattern, while the evolvable slots represent the inner mechanisms that need to be synthesized to correspond to the expected behavior described by the dataset. These slots have a well-defined output type (inferred from the problem description), and a well-defined set of bindings to which the expression has access.

In this work, we focus on the six patterns that comprise the minimal set required to solve PSB1. Naturally, Origami is not limited to these patterns, and more could be included as needed. Section 3.1 details these six patterns. Due to space constraints, we assume the reader has a basic understanding of Recursion Schemes [11], of how they can be used to solve PSB1 [2], and of Haskell notation, which is similar to ML notation.

3.1 Patterns

NoScheme. This is the simplest pattern in Origami, as it does not employ any recursion at all. It is represented by the following code:

figure a

This pattern has just a single slot, which has all the arguments in scope and returns a value of the same type as the output of the program. Its main use is to accommodate problems that do not require any recursion.
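
As an illustration (our own instantiation, not a listing from the paper), the NoScheme scaffold for the PSB1 median problem consists of a single slot with all three arguments in scope; the slot body below is a hand-written example of the kind of expression the GP would have to synthesize:

```haskell
-- NoScheme scaffold: no recursion at all. The whole body is one evolvable
-- slot; the arguments arg0, arg1, arg2 are the only bindings in scope.
median :: Int -> Int -> Int -> Int
median arg0 arg1 arg2 = slot
  where
    -- example slot instantiation: the median of three integers
    slot = max (min arg0 arg1) (min (max arg0 arg1) arg2)
```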

Catamorphism over Indexed List. This pattern captures the most common Recursion Scheme observed in PSB1, and arguably in practical scenarios as well, i.e., folding a list from the right. In Meijer-notation [11], this would be represented by the banana brackets \((\!|b, \oplus |\!)\), where b is the initial value and \(\oplus \) is the combining function. In the context of Origami, it can be represented as:

figure b

In a problem with arguments of type i\(_\texttt {0}\) \(\ldots \) i\(_\texttt {n}\) and of output type o, where i\(_\texttt {0}\) \(\equiv \) [e], this pattern’s slots are typed as follows:

  •  : : o, with nothing in scope;

  •  : : o, with scope { i : : Int; x  : :  e; acc  : :  o; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\ldots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)}.

This pattern will be referred to simply as Cata in the remainder of this paper.
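
A hedged sketch of this scaffold (our own reconstruction from the slot descriptions above; for brevity, the extra problem arguments are omitted from the slot signatures):

```haskell
-- Cata scaffold: fold an indexed list from the right. The fold itself is
-- immutable; `base` and `step` stand for the two evolvable slots.
cataIndexed :: o -> (Int -> e -> o -> o) -> [e] -> o
cataIndexed base step xs = go (zip [0 ..] xs)
  where
    go []               = base
    go ((i, x) : rest)  = step i x (go rest)

-- Example slot instantiation: sum of each element weighted by its index
weightedSum :: [Int] -> Int
weightedSum = cataIndexed 0 (\i x acc -> i * x + acc)
```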

Curried Catamorphism over Indexed List. This pattern captures a common variation of the Catamorphism, and can be represented by the following code:

figure e

As a problem of type i\(_\texttt {0}\) \(\texttt {->}\) i\(_\texttt {1}\) \(\texttt {->}\) o can also be seen in its curried form as i\(_\texttt {0}\) \(\texttt {->}\) (i\(_\texttt {1}\)  \(\texttt {->}\) o), we can employ Catamorphism to accumulate a function over the first argument, and then apply this function to the second argument. This is useful when we need to apply a Catamorphism over the zip of two lists [2].

In a problem with arguments of type i\(_\texttt {0}\) and i\(_\texttt {1}\) and of output type o, where i\(_\texttt {0}\) \(\equiv \) [e], this pattern’s slots are typed as follows:

  •  : : o, with scope { ys  : :  i\(_\texttt {1}\) };

  •  : : o, with scope { i : : Int; x  : :  e; f  : :  i\(_\texttt {1}\)-> o; ys : : i\(_\texttt {1}\) }.

For brevity, this will be referred to simply as CurriedCata. Note that both this and the previous pattern use an indexed linked list as the data structure, allowing the program to access each element’s index and value. For the remaining patterns we employ a regular list, since it is enough to solve their problems (as shown by the canonical solutions presented in [2]).
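
A hedged sketch of this scaffold (our own reconstruction): the fold produces a function over the second argument, illustrated here with an element-wise sum of two lists, i.e., a Catamorphism over their zip:

```haskell
-- CurriedCata scaffold: fold the first (indexed) list into a function of
-- type (b -> o), then apply it to the second argument. `base` and `step`
-- stand for the two evolvable slots.
curriedCata :: (b -> o) -> (Int -> e -> (b -> o) -> b -> o) -> [e] -> b -> o
curriedCata base step xs = go (zip [0 ..] xs)
  where
    go []               = base
    go ((i, x) : rest)  = step i x (go rest)

-- Example slot instantiation: element-wise sum of two lists
addVecs :: [Int] -> [Int] -> [Int]
addVecs = curriedCata (const [])
                      (\_ x f ys -> case ys of
                                      []        -> []
                                      (y : ys') -> (x + y) : f ys')
```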

Anamorphism to a List. This pattern is commonly known in Haskell as unfold, which generates a list. In Meijer-notation [11], this would be represented by the concave lenses \([\!(g, p)\!]\), where g is the generator function and p is the predicate. In the context of Origami, it can be represented by the following code:

figure i

In a problem with arguments of type i\(_{\texttt {0}}\) \(\ldots \) i\(_\texttt {n}\) and of output type o, where o \(\equiv \) [e], this pattern’s slots are typed as follows:

  •  : : i\(_\texttt {0}\), with scope { arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\ldots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)};

  •  : : Bool, with scope { seed  : :  i\(_\texttt {0}\) ; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\ldots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)};

  •  : : e, with scope { seed  : :  i\(_\texttt {0}\) ; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\ldots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)};

  •  : : i\(_\texttt {0}\), with scope { seed  : :  i\(_\texttt {0}\) }.

Note that while we do not enforce arg\(_\texttt {0}\) to be used in the seed slot, that slot must have the same type as arg\(_\texttt {0}\), as all of the solutions for PSB1 respected this constraint. For brevity, this pattern will be referred to simply as Ana in the rest of this paper.
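
A hedged sketch of this scaffold (our own reconstruction; the extra problem arguments are omitted from the slot signatures):

```haskell
-- Ana scaffold: unfold a list from a seed. `stop`, `emit`, and `next`
-- stand for the evolvable slots: the stopping predicate, the emitted
-- element, and the next seed.
anaList :: (s -> Bool) -> (s -> e) -> (s -> s) -> s -> [e]
anaList stop emit next seed
  | stop seed = []
  | otherwise = emit seed : anaList stop emit next (next seed)

-- Example slot instantiation: count down from n to 1
countdown :: Int -> [Int]
countdown = anaList (<= 0) id (subtract 1)
```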

Accumulation over a List. This pattern captures the use of an accumulation strategy before a foldr, and can be represented by the following code:

figure o

In a problem with arguments of type i\(_\texttt {0}\) \(\ldots \) i\(_\texttt {n}\) and of output type o, where i\(_\texttt {0}\) \(\equiv \) [e], and given a type a, this pattern’s slots are typed as follows:

  •  : : a, with scope { arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\ldots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)};

  •  : : a, with scope { x  : :  e; xs  : :  [e]; s  : :  a; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\dots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)};

  •  : : o, with scope { s  : :  a; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\dots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)};

  •  : : o, with scope { x  : :  e; acc  : :  o; s  : :  a; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\dots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)}.

This is the first pattern whose types are not fully determined by the types of the arguments and the expected output: the accumulator type a. Types such as this will be referred to as unbound types. To keep the implementation simple, we assume unbound types are known and provided by the user; the automatic exploration of different types is an interesting challenge that warrants dedicated research. This pattern will be referred to simply as Accu in the rest of this paper.
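
A hedged sketch of this scaffold (our own reconstruction, again omitting the extra problem arguments), instantiated for an average-like computation in which the accumulated state is a (sum, count) pair:

```haskell
-- Accu scaffold: a forward pass accumulates a state of type a, which is
-- then visible to the base case and the algebra of a foldr. `st0`, `st`,
-- `base`, and `alg` stand for the four evolvable slots.
accu :: a -> (e -> [e] -> a -> a) -> (a -> o) -> (e -> o -> a -> o) -> [e] -> o
accu st0 st base alg xs = foldr (\x acc -> alg x acc sFinal) (base sFinal) xs
  where
    sFinal = run xs st0
    run []       s = s
    run (y : ys) s = run ys (st y ys s)

-- Example slot instantiation: the average of a list, with state (sum, count)
average :: [Double] -> Double
average = accu (0, 0)
               (\x _ (s, c) -> (s + x, c + 1))
               (\(s, c) -> s / c)
               (\_ acc _ -> acc)
```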

Hylomorphism Through a List. This pattern captures an Anamorphism followed by a Catamorphism, such as applying foldr to the result of unfoldr in Haskell. In Meijer-notation [11], this would be represented by the envelopes \([\![(b, \oplus ), (g, p)]\!]\). In Origami, it is represented by the following code:

figure u

In a problem with arguments of type i\(_\texttt {0}\) \(\ldots \) i\(_\texttt {n}\) and of output type o, and given a type a, this pattern’s slots are typed as follows:

  •  : : Bool, with scope { seed  : :  i\(_\texttt {0}\) ; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\dots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)};

  •  : : a, with scope { seed  : :  i\(_\texttt {0}\) ; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\dots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)};

  •  : : i\(_\texttt {0}\), with scope { seed  : :  i\(_\texttt {0}\) ; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\dots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)};

  •  : : o, with nothing in scope;

  •  : : o, with scope { x  : :  a; acc  : :  o; arg\(_\texttt {0}\)  : :  i\(_\texttt {0}\) \(\dots \) arg\(_\texttt {n}\)  : :  i\(_\texttt {n}\)}.

This pattern also contains an unbound type: the intermediate list has elements of type a. This pattern will be referred to as Hylo.
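
A hedged sketch of this scaffold (our own reconstruction), instantiated for a sum-of-squares computation in which the coalgebra generates the numbers and the algebra squares and accumulates them:

```haskell
-- Hylo scaffold: unfold an intermediate list of elements of type a from a
-- seed, folding it away as it is produced. `stop`, `emit`, `next`, `base`,
-- and `alg` stand for the five evolvable slots.
hylo :: (s -> Bool) -> (s -> a) -> (s -> s) -> o -> (a -> o -> o) -> s -> o
hylo stop emit next base alg seed
  | stop seed = base
  | otherwise = alg (emit seed) (hylo stop emit next base alg (next seed))

-- Example slot instantiation: generate n, n-1, ..., 1, square each number,
-- and accumulate the sum
sumOfSquares :: Int -> Int
sumOfSquares = hylo (<= 0) id (subtract 1) 0 (\x acc -> x * x + acc)
```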

3.2 Genetic Programming

Origami synthesizes the evolvable slots using a Genetic Programming (GP) [10] algorithm. Since the patterns require more than a single slot, we represent each solution as a collection of expression trees, each corresponding to one of the slots.

The GP starts with an initial random population of \(1\,000\) individuals and iterates by applying either crossover to a pair of parents or mutation to a single parent, generating \(1\,000\) new individuals in total. The entire population is then replaced by the offspring.

The initial population is generated using the ramped half-and-half method, where half of the individuals are generated using the full method and half using the grow method, with the maximum depth for each method varying between 1 and 5. Parent selection is performed using tournament selection of size 10.
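
A hedged sketch of tournament selection (our own minimal version; the paper does not give implementation details):

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)
import System.Random (RandomGen, mkStdGen, randomR)

-- Tournament selection: sample k individuals uniformly at random (with
-- replacement) and return the one with the best (here, lowest) fitness,
-- threading the random generator explicitly.
tournament :: RandomGen g => Int -> [(ind, Double)] -> g -> ((ind, Double), g)
tournament k pop g0 = (minimumBy (comparing snd) picks, gk)
  where
    (picks, gk) = draw k g0
    draw 0 g = ([], g)
    draw n g =
      let (i, g')      = randomR (0, length pop - 1) g
          (rest, g'')  = draw (n - 1) g'
      in (pop !! i : rest, g'')
```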

Following a simple GP algorithm, in Origami the mutation randomly selects one of the evolvable slots and then picks a random point in that tree to be replaced by a new subtree generated at random with the grow method, with a maximum depth of \(5 - d_{\text{current}}\), where \(d_{\text{current}}\) is the depth of the selected point. Crossover also starts by picking one of the slots at random, then performs one of these two actions with equal probability: i) swap the entire slot of one parent with the same slot of the other parent; ii) swap two subtrees of the same output type between the parents.

3.3 Grammar

In HOTGP [1], the selected grammar focused on providing a minimal set of operations, including higher-order functions, that would enable the synthesis of programs under the functional programming paradigm.

With Origami, however, the main focus is assessing Recursion Schemes as the only means of synthesizing recursive programs. Therefore, we designed our grammar to avoid implicitly recursive functions such as map, filter, sum, and product. We acknowledge that this removes shortcuts and can make the synthesis of certain problems harder. Notice that the recursion happens in the immutable nodes that describe the Recursion Schemes, so the recursion is provided rather than evolved. Additionally, the set of operations includes functions equivalent to those used by other methods, in particular those implemented by PushGP [8]. As a result, Origami has a larger set of operations than HOTGP. The full grammar is presented in Table 1.

Once an execution is finished, the champion’s slots are refined using the same procedure as HOTGP [1]. To refine a tree, we pick the root node and check whether replacing it with any of its children leads to a correctly typed solution with an equal or better fitness. If so, we replace it with the best such child; otherwise, we keep the original node. This process continues recursively, traversing the tree and greedily replacing nodes with their children where possible. This procedure applies Occam’s razor to choose a simpler solution [7], ensuring that the fitness on the training set never gets worse.

4 Experimental Results

To evaluate our approach, we conducted experiments performing an automatic search over the different patterns on PSB1 [9]. For each of the 29 datasets, we sequentially tried each pattern in increasing order of complexity: NoScheme; Cata, if arg\(_\texttt {0}\) is a list; CurriedCata, if the problem has two arguments and arg\(_\texttt {0}\) is a list; Ana, if the return type is a list; Accu, if arg\(_\texttt {0}\) is a list; and Hylo.

For each dataset, we executed 30 seeds of each pattern, starting from the simplest and moving to the next pattern only if no seed succeeded in finding a solution (i.e., the success rate was \(0\%\)). Each seed followed the instructions provided by PSB1, using the recommended number of training and test instances, including the fixed edge cases in the training data, and using the fitness functions described in [9]. We also made the same adaptations to the benchmarks as in [2], similar to [1] and [16]. Specifically, we changed the input of the grade benchmark from (arg\(_\texttt {0}\) , arg\(_\texttt {1}\) , arg\(_\texttt {2}\) , arg\(_\texttt {3}\) , arg\(_\texttt {4}\) ) to ([(arg\(_\texttt {0}\) , ’A’), (arg\(_\texttt {1}\) , ’B’), (arg\(_\texttt {2}\) , ’C’), (arg\(_\texttt {3}\) , ’D’)], arg\(_\texttt {4}\) ); and, since we only generate pure programs, we adapted the following benchmarks to return their results instead of printing them: checksum, digits, even-squares, for-loop-index, grade, pig-latin, replace-space-w-nl., string-differences, syllables, and word-stats.

Note that we deliberately placed the patterns with unbound types at the end of the sequence. Therefore, the unbound type in both Accu and Hylo is only decided after all other patterns have failed. For the benchmarks for which Origami failed to find a solution with the other patterns, we applied one of these two patterns, choosing the type known to be correct according to the canonical solutions [2]. For the cases in which the canonical solutions did not use Accu or Hylo, we chose a reasonable type as needed (see Table 2).

Table 1. The complete set of operations available for Origami. Each dataset only had access to the operations that involved its allowed types according to [9].
Table 2. The chosen types for the unbound types in Accu and Hylo. The type is colored in blue when the decision was guided by the canonical solution.

The maximum tree depth was set to 5 for each slot. As Origami is based on HOTGP, which was empirically shown to be robust to changes in the crossover rate, we set it to the same value as HOTGP (\(50\%\)). We allowed a maximum of \(300\,000\) evaluations, with an early stop whenever the algorithm finds a perfectly accurate solution on the training data. For patterns in which termination is not guaranteed, namely Ana and Hylo, a maximum number of iterations was imposed (empirically set to \(10\,000\)). Non-termination can also occur with CurriedCata: Origami was synthesizing slot expressions that, in effect, created a “fork bomb”. To tackle this, a maximum execution budget is enforced: when the evaluation of a single iteration of a slot executes more than \(10\,000\) operations, the program is assigned an infinitely bad fitness. This limit was reached by less than \(0.5\%\) of the individuals.

Table 3 shows the percentage of executions in which Origami was able to synthesize a solution that completely solved the test set (i.e., the success rate). Origami found a solution for all of the problems canonically solved by NoScheme or Cata. Surprisingly, it was also able to synthesize a solution for for-loop-index using NoScheme, even though the canonical solution used Ana, and for grade using Cata, when the canonical solution used CurriedCata. Nonetheless, we also ran these problems with their canonical patterns and found that Origami was able to synthesize solutions there as well, albeit less often. Moreover, Origami found solutions for 3 out of the 4 canonical CurriedCata problems and 2 out of the 3 Ana problems. Accu and Hylo, however, appear to be the most difficult patterns to synthesize, as no solution was found for problems that canonically involve these patterns.

Table 3. Success rates obtained by Origami for each pattern in each benchmark. The “Best” column shows the highest success rate for that benchmark across all patterns, which is also underlined. We also show in blue the pattern of the canonical solution.

Considering the 4 canonical Accu problems, checksum and word-stats are historically hard, with few methods ever finding a solution. The same can be said for Hylo in the wallis-pi and collatz-numbers problems.

In vector-average, the canonical solution uses Accu to compute both the sum and the count as a pair in the st slot, and the alg slot to perform the division as a post-processing step, finally obtaining the average. The solution that came closest to the intended result was the following:

figure ab

Origami took a different approach from the canonical solution, storing the length of the input in the second element of the tuple while making no use of the first element. The st section had no purpose other than to transmit this pre-processing step to the alg section. This solution achieved a perfect score during training but failed on certain test cases. If we were to replace min 0 (last arg\(_\texttt {0}\) ) with 0 and max (x - acc) x with x, this solution would be correct.

The Hylo solution for sum-of-squares employed coalg to generate the list of all numbers from 0 to arg \(_{\texttt {0}}\), and then used alg to square each number and accumulate the sum. Even though this was the simplest Hylo solution, Hylo has 5 different slots and thus a larger search space than the other patterns, which seems to be a major challenge for the algorithm.

Table 4 compares Origami’s results to HOTGP’s. There was a substantial increase (more than 30 percentage points) in the success rate of 6 problems. Among the 17 problems where the absolute difference is below 30 percentage points, we highlight syllables, double-letters, and even-squares, as HOTGP was not able to synthesize a solution for these problems, whereas Origami succeeded at least once. The two problems with a more noticeable decrease are replace-space-w-nl. and vector-average. These can be explained by the change in grammar between the two algorithms, as HOTGP’s solutions were arguably simpler due to having map and filter available for replace-space-w-nl. and sum for vector-average. In a practical scenario, the inclusion of these functions would likely lead to a correct solution but, as previously noted, removing them was a conscious decision to enable a proper assessment of the impact of Recursion Schemes in PS. Their inclusion would also allow composite solutions, such as using Ana with a map inside instead of relying on Hylo to find the entire pattern, which might be easier to synthesize.

Table 4. Origami’s success rates compared to HOTGP’s on solved problems. The \(\varDelta \) column shows the relative success rate of Origami with respect to HOTGP.
Table 5. Success rate with the best values underlined. The last row displays the ratio of victories of each algorithm against Origami by the amount of tested problems.

To assess how Origami fares in relation to the best methods in the literature, we compare its results to those obtained by PushGP [9], Grammar-Guided Genetic Programming (G3P) [3], the extended-grammar version of G3P (here called G3P+) [4], Code Building Genetic Programming (CBGP) [14], G3P with Haskell and Python grammars (G3Phs and G3Ppy) [5], and GitHub Copilot [16]. In some of those works, only a subset of the problems was chosen, often avoiding the most difficult ones (i.e., those not previously solved by any other method) or problems not solvable by the proposed method itself. The results are reported in Table 5 (“–” indicates the authors did not test their method on that specific problem) and summarized in Table 6.

Table 6. Number of problems with a success rate \(\ge \) a certain % for each method.

Among the other GP algorithms, the best performer (DSLS) achieves a higher success rate than Origami in only 10 problems. Considering that both score 0 on pig-latin, Origami outperforms DSLS in 14 problems. Notably, Origami frequently outperforms CBGP and the G3P variants. It also has the highest number of problems solved with \(100\%\), \(\ge 75\%\), and \(\ge 50\%\) success rates, and is in second place for \(\ge 25\%\). When we consider the problems for which Origami found at least one solution, it outperforms HOTGP, CBGP, and all the G3P variations, placing Origami in fourth place overall. It is also worth noting that Origami outperforms HOTGP in both the number of best results and the number of problems above every threshold, demonstrating a substantial improvement over HOTGP.

We also compare with the results obtained using Copilot on PSB1, as reported in [16]. In that paper, the authors tested Copilot with a different formulation of the program synthesis problem: instead of receiving input-output examples, Copilot was given the problem description and the function signature. This input format can be more difficult to process, as it requires extracting useful information from a textual description, but it contains additional information that may only be implicit in the input-output format. Out of the 29 problems, Copilot had better results in 14 of them, and Origami equals or outperforms Copilot in 15. While Copilot solves more problems (at least once) than Origami, it struggles with consistency and does not achieve a \(100\%\) success rate on any of the tested problems.

5 Conclusion

This work presents the first full implementation of Origami, a GP algorithm proposed in [2] that builds on the authors’ previous work, HOTGP [1]. Origami’s main distinction is the use of Recursion Schemes, well-known constructs in functional programming that enable recursive algorithms to be defined in a unified manner. The main motivation for using them in the PS context is to enable recursive programs to be synthesized in a controlled manner without sacrificing expressiveness.

We evaluated our approach on the 29 problems of the PSB1 benchmark, which is known to be solvable by just a handful of Recursion Schemes. In general, Origami performs better than other similar methods, synthesizing the correct solution more often on most problems. It also obtained the highest count of problems with success rates \(=100\%\), \(\ge 75\%\), and \(\ge 50\%\) among the GP methods. Furthermore, Origami achieved results comparable to GitHub Copilot, solving some problems on which the LLM achieved a \(0\%\) success rate. We should stress that the problem formulation differs between the two approaches, suggesting that combining LLMs with GP and Recursion Schemes could further improve the results. These experimental results suggest that using Recursion Schemes to guide the search is a promising research avenue. Currently, the main challenge for Origami appears to be the harder Recursion Schemes, such as Accumulation and Hylomorphism. Different evolutionary mechanisms, such as other selection methods and mutation/crossover operators, should be evaluated in this context to understand whether they can positively impact the search process.