title: Abstract Cores in Implicit Hitting Set MaxSat Solving
authors: Berg, Jeremias; Bacchus, Fahiem; Poole, Alex
date: 2020-06-26
journal: Theory and Applications of Satisfiability Testing - SAT 2020
DOI: 10.1007/978-3-030-51825-7_20

Maximum Satisfiability (MaxSat) solving is an active area of research motivated by numerous successful applications to solving NP-hard combinatorial optimization problems. One of the most successful approaches to solving MaxSat instances arising from real world applications is the Implicit Hitting Set (IHS) approach. IHS solvers are complete MaxSat solvers that harness the strengths of both Boolean Satisfiability (SAT) and Integer Linear Programming (IP) solvers by decoupling core-extraction and optimization. While such solvers show state-of-the-art performance on many instances, it is known that there exist MaxSat instances on which IHS solvers need to extract an exponential number of cores before terminating. Motivated by the structure of the simplest of these problematic instances, we propose a technique we call abstract cores that provides a compact representation for a potentially exponential number of regular cores. We demonstrate how to incorporate abstract core reasoning into the IHS algorithm and report on an empirical evaluation demonstrating that including abstract cores in a state-of-the-art IHS solver improves its performance enough to surpass the best performing solvers of the most recent 2019 MaxSat Evaluation.

Maximum Satisfiability (MaxSat), the optimisation extension of the Boolean Satisfiability (SAT) problem, has in recent years matured into a competitive and thriving constraint optimisation paradigm with several successful applications in a variety of domains [7, 8, 11, 16, 18, 19, 31]. As a consequence, the development of MaxSat solvers is an active area of research, with state-of-the-art solvers evaluated annually in the MaxSat Evaluations [4, 5]. In this work, we focus on improving the Implicit Hitting Set (IHS) approach to complete MaxSat solving [4, 14, 29]. As witnessed by the results of the annual evaluations, IHS solvers are, together with core-guided [2, 20, 24-26] and model improving [22] algorithms, among the most successful approaches to solving MaxSat instances encountered in practical applications.

IHS solvers decouple MaxSat solving into separate core extraction and optimisation steps. By using a Boolean Satisfiability (SAT) solver for core extraction and an Integer Linear Programming (IP) optimizer, the IHS approach is able to exploit the disparate strengths of these different technologies. Through this separation IHS solvers avoid increasing the complexity of the underlying SAT instance by deferring all numerical reasoning to the optimizer [13]. One drawback of the approach, however, is that on some problems an exponential number of cores need to be extracted by the SAT solver and given to the optimizer.

In this paper we identify a seemingly common pattern that appears in the simplest problems exhibiting this exponential worst case. We propose a technique, which we call abstract cores, for addressing problems with this pattern. Abstract cores provide a compact representation for a potentially exponential number of ordinary cores.
Hence, by extracting abstract cores and giving them to the optimizer we can in principle achieve an exponential reduction in the number of constraints the SAT solver has to extract and supply to the optimizer. The net effect can be significant performance improvements. In the rest of the paper we formalize the concept of abstract cores and explain how to incorporate them into the IHS algorithm, both in theory and practice. Finally, we demonstrate empirically that adding abstract cores to a state-of-the-art IHS solver improves its performance enough to surpass the best performing solvers of the 2019 MaxSat Evaluation.

MaxSat problems are expressed as cnf formulas F with weight annotations. A cnf formula consists of a conjunction (∧) of clauses, each of which is a disjunction (∨) of literals; a literal is either a variable v of F (a positive literal) or its negation ¬v (a negative literal). We will often regard F and clauses C as being sets of clauses and literals, respectively. For example, l ∈ C indicates that literal l is in the clause C, and (x, ¬y, z) denotes the clause (x ∨ ¬y ∨ z).

A truth assignment τ maps Boolean variables to 1 (true) or 0 (false). It is extended to assign 1 or 0 to literals, clauses and formulas in the following standard way: τ(¬l) = 1 − τ(l), τ(C) = max{τ(l) | l ∈ C}, and τ(F) = min{τ(C) | C ∈ F}, for literals l, clauses C, and cnf formulas F, respectively. We say that τ satisfies a clause C (formula F) if τ(C) = 1 (τ(F) = 1), and that the formula F is satisfiable if there exists a truth assignment τ such that τ(F) = 1.

A MaxSat instance I = (F, wt) is a cnf formula F along with a weight function wt that maps every clause C ∈ F to a weight wt(C) > 0. Clauses whose weight is infinite (wt(C) = ∞) are called hard clauses, while those with a finite weight are called soft clauses. I is said to be unweighted if all soft clauses have weight 1. We denote the set of hard and soft clauses of F by F_H and F_S, respectively. An assignment τ is a solution to I if it satisfies F_H (τ(F_H) = 1). The cost of a solution τ, cost(I, τ), is the sum of the weights of the soft clauses it falsifies, i.e., cost(I, τ) = Σ_{C∈F_S} (1 − τ(C)) · wt(C). When the instance is clear from context we shorten this to cost(τ). A solution τ is optimal if it has minimum cost among all solutions, i.e., if cost(τ) ≤ cost(τ′) holds for all solutions τ′. The task in MaxSat solving is to find an (any) optimal solution. We will assume that at least one solution exists, i.e., that F_H is satisfiable.

To simplify our notation it will be useful to transform all of the soft clauses in F so that they become unit clauses containing a single negative literal. If C ∈ F_S is not in the right form we replace it by the soft clause (¬b) and the hard clause (C ∨ b), where b is a brand new variable and wt((¬b)) = wt(C). This transformation preserves the set of solutions and their costs. We call the variables in the resulting set of unit soft clauses blocking variables, or b-variables for short. Note that assigning a b-variable b the value true is equivalent to falsifying its corresponding soft clause (¬b). We denote the set of b-variables of the transformed formula by F_B, and write wt(b) for a b-variable b to denote the weight of its underlying soft clause, wt((¬b)). With this convention we can write the cost of a solution τ more simply as cost(τ) = Σ_{b∈F_B} wt(b) · τ(b). For any set B of b-variables we write cost(B) to denote the sum of their weights.
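To make these definitions concrete, the following is a small Python sketch (ours, not from the paper) of the soft-clause transformation and the resulting cost function. Clauses are represented as tuples of non-zero integers in the usual DIMACS style; all names are our own.

```python
from itertools import count

def to_unit_softs(hard, softs, wt):
    """Rewrite each non-unit soft clause C as a unit soft (-b) plus a hard
    clause (C v b) for a fresh blocking variable b.  Clauses are tuples of
    non-zero ints; wt maps each soft clause to its (positive) weight."""
    top = max((abs(l) for C in list(hard) + list(softs) for l in C), default=0)
    fresh = count(top + 1)
    new_hard, b_wts = list(hard), {}
    for C in softs:
        if len(C) == 1 and C[0] < 0:          # already of the form (-b)
            b_wts[-C[0]] = wt[C]
        else:
            b = next(fresh)                    # fresh b-variable for C
            new_hard.append(tuple(C) + (b,))   # hard clause (C v b)
            b_wts[b] = wt[C]                   # soft (-b) inherits C's weight
    return new_hard, b_wts

def cost(tau, b_wts):
    """cost(tau) = sum of wt(b) over the b-variables tau sets to true."""
    return sum(w for b, w in b_wts.items() if tau.get(b, False))
```

A solution is represented here as a dict from b-variables to truth values; only the b-variables matter for the cost, mirroring the simplified cost expression above.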
In the MaxSat context a core κ is defined to be a set of soft clauses κ ⊆ F_S that are unsatisfiable given the hard clauses, i.e., κ ∪ F_H is unsatisfiable. This means that every solution τ, which by definition must satisfy F_H, must falsify at least one soft clause in κ. Given that the soft clauses are of the form (¬b) for some b-variable b, we can express every core as a clause (∨_{(¬b)∈κ} b) containing only positive b-variables: one of these variables must be true. This clause is entailed by F_H. We can also express κ as a linear inequality Σ_{b : (¬b)∈κ} b ≥ 1 that is also entailed by F_H. A MaxSat correction set hs is dually defined to be a set of soft clauses hs ⊆ F_S whose removal renders the remaining soft clauses satisfiable with the hard clauses, i.e., (F_S − hs) ∪ F_H is satisfiable.

Algorithm 1 shows the implicit hitting set (IHS) approach to MaxSat solving. Our specification generalizes the original specification of [13]. In particular, we use upper and lower bounds, terminating when these bounds meet, rather than waiting until the optimizer returns a correction set as in [13]. We use this reformulation as it makes it easier to understand our extension to abstract cores.

Starting from a lower bound of zero, an upper bound of infinity, and an empty set of cores C (line 3), the algorithm computes a minimum cost hitting set of its current set of cores C. This is accomplished by expressing each core κ ∈ C as its equivalent linear inequality Σ_{b∈κ} b ≥ 1 and using the optimizer to find a solution hs with the smallest weight of true b-variables (Fig. 1a). This corresponds to computing the minimum weight of soft clauses that need to be falsified in order to satisfy the constraints imposed by the cores found so far. Hence, cost(hs) must be a lower bound on the cost of any optimal solution: every solution must satisfy these constraints. This allows us to update the lower bound (line 6) and exit the while loop if the lower bound now meets the upper bound. (Note that since new cores are continually added to the optimizer's model the lower bound will never decrease.) The optimizer's solution is then used to extract more cores that can be added to the optimizer's constraints for the next iteration. Core extraction is done by the ex-cores procedure shown in Algorithm 2. ex-cores extracts cores until it finds a solution τ. If the solution has lower cost than any previous solution, the upper bound UB will be updated and this best solution stored in τ_best. The set of cores K extracted is returned and added to the optimizer's model, potentially increasing the lower bound.

[Fig. 1a (the optimizer's model Min-Hs): minimize Σ_{b∈F_B} wt(b) · b subject to Σ_{b∈κ} b ≥ 1 for each κ ∈ C, with every b ∈ {0, 1}.]
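The optimizer's task in Fig. 1a can be made concrete with a deliberately naive sketch. Real IHS solvers delegate this minimum cost hitting set computation to an IP solver [13]; the exhaustive search below is exponential and only illustrates what Min-Hs computes. All names are our own.

```python
from itertools import chain, combinations

def min_cost_hitting_set(cores, b_wts):
    """Exhaustive stand-in for Min-Hs (Fig. 1a): among all sets hs of
    b-variables that intersect every core, return one of minimum total
    weight.  Cores are sets of (positive) b-variables; b_wts maps each
    b-variable to its weight."""
    universe = sorted(set().union(*cores)) if cores else []
    subsets = chain.from_iterable(combinations(universe, k)
                                  for k in range(len(universe) + 1))
    best, best_cost = None, None
    for hs in subsets:
        s = set(hs)
        if all(s & core for core in cores):        # s hits every core
            c = sum(b_wts[b] for b in s)
            if best_cost is None or c < best_cost:
                best, best_cost = s, c
    return best, best_cost

# Two cores {b1,b2} and {b3,b4}, unit weights: an optimal hitting set
# must pick one variable from each core, so its cost is 2.
print(min_cost_hitting_set([{1, 2}, {3, 4}], {1: 1, 2: 1, 3: 1, 4: 1}))
```

With no cores the empty set is returned at cost 0, matching the first iteration of Algorithm 1.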
The original IHS formulation [13] extracted only one core from each optimizer solution, but this was shown to be a significant detriment to performance [15], requiring too many calls to the optimizer. The procedure ex-cores gives one simple way of extracting more than one core from the optimizer's solution hs. It can be extended in a variety of ways to allow extracting large numbers (hundreds) of cores from each optimizer solution [12, 15, 28]. In our implementation we used such techniques.

ex-cores (Algorithm 2) uses a SAT solver and its assumption mechanism to extract cores. It first initializes the assumptions to force the SAT solver to satisfy every soft clause not in hs. More specifically, for every soft clause (¬b) not in hs, ¬b is assumed, forcing the solver to satisfy this soft clause. Then it invokes ex-cores-sub, which iteratively calls the SAT solver to find a solution satisfying F_H along with the current set of assumptions. After each core is found its b-variables are removed from the assumptions (line 11), so that on each iteration we require the SAT solver to satisfy fewer soft clauses. Since F_H is satisfiable, eventually the SAT solver will be asked to satisfy so few soft clauses that it will find a solution τ, terminating the loop.

[Algorithm 2 (Extracting multiple cores from a single optimizer solution), core loop: if cost(τ) < UB then τ_best ← τ; UB ← cost(τ); return K (lines 9-10); else K ← K ∪ {κ}; assumps ← assumps − {¬b | b ∈ κ} (line 11).]

In the original IHS specification [13], IHS terminates with an optimal solution when the optimal hitting set hs is a correction set. This condition will also cause termination in our specification. In particular, before calling ex-cores the lower bound LB is set to cost(hs) (Algorithm 1, line 6). If hs is a correction set, a solution τ will be found by the SAT solver in the first iteration of Algorithm 2 (line 7). That τ will have cost(τ) = cost(hs), as it cannot falsify any soft clause not in hs and cannot have cost less than the lower bound. Hence, on ex-cores's return Algorithm 1 will terminate with UB = LB. As shown in [13], the optimizer must eventually return a correction set. This means that the original proof that IHS terminates returning an optimal solution, given in [13], continues to apply to our reformulated Algorithm 1.

Algorithm 1 can also terminate before the optimizer returns a correction set. In particular, τ_best can be set to an optimal solution (Algorithm 2, line 9) well before we can verify its optimality. In this case termination can occur as soon as the optimizer has been given a sufficient number of cores to drive its lower bound up to cost(τ_best), even if the optimizer's solution is not a correction set. In fact, termination in the IHS approach always requires that the optimizer be given enough constraints to drive the cost of its optimal solution up to the cost of the MaxSat optimal solution.

Example 1. Consider a MaxSat instance with soft clauses F_S = {(¬b_1), (¬b_2), (¬b_3), (¬b_4)}, all having weight 1. Algorithm 1 will first obtain hs = ∅ from Min-Hs as there initially are no cores to hit. ex-cores will then SAT solve F_H under the assumptions ¬b_1, ¬b_2, ¬b_3, ¬b_4, trying to satisfy all softs not in hs. This is unsat and any of a number of different cores could be returned. Say that the core (b_1, b_2) is returned. ex-cores then attempts another SAT solve, this time with the assumptions ¬b_3 and ¬b_4. Now the SAT solver returns the core (b_3, b_4). Finally, the SAT solver will be called to solve F_H under the empty set of assumptions. Say that the solver finds the satisfying assignment τ = {¬b_1, b_2, ¬b_3, b_4}, setting UB to 2 and τ_best to τ. After returning to the main IHS routine, Min-Hs will be asked to compute an optimal solution to the set of cores {(b_1, b_2), (b_3, b_4)}. It might return hs = {b_1, b_4} and set LB = cost(hs) = 2. Now LB is equal to UB and τ_best can be returned since it is an optimal solution. Note that in this example hs is not a correction set.
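A sketch of ex-cores-sub in Python may help fix the trace above. It assumes an assumption-based incremental SAT interface of the kind modern solvers expose (solve under assumptions, returning either a model or a conflict subset of the assumptions); the sat_solve callable, the integer literal encoding (positive for b, negative for ¬b), and all names are our own scaffolding, not the paper's code.

```python
def ex_cores_sub(sat_solve, assumps, b_wts, ub, tau_best):
    """Sketch of ex-cores-sub (Algorithm 2).  sat_solve(assumps) is an
    assumed oracle: a SAT call on F_H returning ('SAT', model) with model
    the set of true b-variables, or ('UNSAT', core) with core a set of
    b-variables whose soft clauses cannot all be satisfied together."""
    K = []
    while True:
        status, result = sat_solve(assumps)
        if status == 'SAT':
            c = sum(b_wts[b] for b in result)   # cost of the solution found
            if c < ub:                          # Algorithm 2, line 9
                ub, tau_best = c, result
            return K, ub, tau_best
        K.append(result)                        # record the core (line 11)
        # Drop the assumptions refuted by this core: the next call must
        # satisfy fewer soft clauses, so with F_H satisfiable the loop
        # eventually reaches a SAT answer and terminates.
        assumps = [a for a in assumps if -a not in result]
```

In a concrete implementation the oracle role could be played by an incremental SAT solver's solve-with-assumptions call together with its final conflict clause.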
As mentioned above, IHS cannot terminate until its optimizer has been given enough constraints to drive the cost of its optimal solution up to the cost of an optimal MaxSat solution. As shown in [12], in the worst case this can require giving the optimizer an exponential number of constraints.

Example 2. Let n and r be integers with 0 < r < n. Consider the MaxSat instance F_{n,r} with n unit soft clauses (¬b_1), …, (¬b_n), each of weight 1, and with F_H^{n,r} being a cnf encoding of the cardinality constraint Σ_{i=1}^{n} b_i ≥ r, stating that at least r soft clauses must be falsified. The cost of every optimal solution is thus r; the maximum number of soft clauses that can be satisfied is n − r; and every subset of n − r + 1 soft clauses must be a core. Let C be the set of all such cores. From the results of [12] we have that if the optimizer is given all cores in C it will yield solutions hs with cost(hs) = r; furthermore, if even one core of C is missing from the optimizer's model, its solutions hs will have cost(hs) < r. This means that Algorithm 1 will have to extract all C(n, n−r+1) cores (the binomial coefficient "n choose n−r+1", one core per subset of n−r+1 soft clauses) for the optimizer before it can reach the cost of an optimal MaxSat solution and terminate. When r is close to n/2 the number of cores required for termination is exponential in n.

The results of the 2019 MaxSat Evaluation [4, 5] witness this drawback in practice. The drmx-atmostk set of instances in the evaluation contains 11 instances with the same underlying structure as Example 2. Out of these, the IHS solver MaxHS [13, 14] failed to solve 8 of the 11 when given an hour for each instance, while the best performing solvers were able to solve all 11 instances in under 10 s.

Example 2 shows that a significant bottleneck for the IHS approach on some instances is the large number of cores that have to be given to the optimizer. Thus, a natural question to ask is whether or not there exists a more compact representation of this large number of cores that can still be efficiently reasoned with by the IHS algorithm. In this section we propose abstract cores as one such representation. As we will demonstrate, each abstract core compactly represents a large number of regular cores. By extracting abstract cores with the SAT solver and then giving them to the optimizer, we can communicate constraints to the optimizer that would have otherwise potentially required an exponential number of ordinary core constraints.

The structure of the instances F_{n,r} discussed in Example 2 provides some intuition for abstract cores. In these instances the identity of the variables does not matter; all that matters is how many are set to true and how many are set to false. For example, in any core κ of F_{n,r} we can exchange any soft clause C ∈ κ for any other soft clause C′ ∉ κ and the result will still be a core of F_{n,r}. In other words, every soft clause is exchangeable with every other soft clause in these instances. While it seems unlikely that complete exchangeability would hold for other instances, it is plausible that many instances might contain subsets of soft clauses that are exchangeable or nearly exchangeable. In particular, in any MaxSat instance the cost of a solution depends only on the number of soft clauses of each weight that it falsifies. The identity of the falsified soft clauses does not matter except to the extent that F_H might place logical constraints on the set of soft clauses that can be satisfied together.
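The growth of the required core count C(n, n−r+1) from Example 2 is easy to tabulate; this two-line script (ours) just evaluates the binomial coefficient derived above:

```python
from math import comb

# Number of ordinary cores the optimizer needs for F(n,r): every subset
# of n-r+1 soft clauses is a core, so C(n, n-r+1) cores are required.
for n in (10, 20, 30, 40):
    r = n // 2
    print(f"n={n:2d} r={r:2d}  cores needed: {comb(n, n - r + 1):,}")
```

Already at n = 30 this yields over 145 million cores, and at n = 40 over 131 billion, which is why the drmx-atmostk instances defeat plain IHS.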
Abstraction Sets. Suppose we have a set of b-variables all with the same weight and we want to exploit any exchangeability that might exist between their corresponding soft clauses. This can be accomplished by forming an abstraction set. An abstraction set, ab, is a set of b-variables that has been annotated by adding |ab| new variables, called ab's count variables, used to indicate the number of true b-variables in ab (i.e., the number of corresponding falsified soft clauses). The count variables allow us to abstract away from the identity of the particular b-variables that have been made false. We let ab.c denote the sequence of ab's count variables, and let the individual count variables be denoted by ab.c[1], …, ab.c[|ab|]. Every count variable has a corresponding definition, with the i'th count variable being defined by the constraint ab.c[i] ↔ (Σ_{b∈ab} b ≥ i). Note that these definitions can be encoded into cnf and added to the SAT solver using various known encodings for cardinality constraints [3, 6, 27, 30].

Let AB be a collection of abstraction sets. We require (1) that the sets in AB are disjoint (so no b-variable is part of two different abstraction sets) and (2) that all of the b-variables in a specific abstraction set ab ∈ AB have the same weight (variables in different abstraction sets can have different weights). Let AB.c = ∪_{ab∈AB} ab.c be the set of all count variables.

An abstract core is a clause C such that (1) all literals in C are either positive b-variables or count variables, i.e., ∀l ∈ C (l ∈ F_B ∨ l ∈ AB.c); and (2) C is entailed by F_H and the conjunction of the count variable definitions, i.e., F_H ∧ ⋀_{ab∈AB} ⋀_{i=1}^{|ab|} (ab.c[i] ↔ Σ_{b∈ab} b ≥ i) ⊨ C. As pointed out in Sect. 2, every ordinary core is equivalent to a clause containing only positive b-variables that is entailed by F_H. Abstract cores can be ordinary cores containing only b-variables, but they can also contain positive count variables. Like ordinary cores, they must be entailed by F_H (and the count variable definitions that are required to give meaning to the count variables they contain).

Example 3. Consider an instance F_{n,r} as defined in Example 2. Say we form a single abstraction set, ab, from the full set of blocking variables of F_{n,r}. Then F_{n,r} will have among its abstract cores the unit clause (ab.c[r]) asserting that Σ_{b∈F_B} b ≥ r. This single abstract core is equivalent to the conjunction of C(n, n−r+1) non-abstract cores. In particular, with n b-variables, asserting that at least r must be true entails that every set of n − r + 1 b-variables must contain at least one true b-variable. That is, (ab.c[r]) entails C(n, n−r+1) different clauses, each of which is equivalent to a non-abstract core. It is not difficult to show that entailment in the other direction also holds, giving equivalence.

More generally, consider an abstract core C = (ab_1.c[c_1] ∨ … ∨ ab_k.c[c_k]). As above, each count variable ab_i.c[c_i] is equivalent to the conjunction of the C(|ab_i|, |ab_i|−c_i+1) clauses asserting, for every subset of |ab_i|−c_i+1 of ab_i's b-variables, that at least one member of the subset is true. Distributing C's disjunction over these conjunctions yields a conjunction of clauses over positive b-variables. Hence, C is equivalent to the conjunction of Π_{i=1}^{k} C(|ab_i|, |ab_i|−c_i+1) non-abstract cores. In other words, abstract cores achieve the desideratum of providing a compact representation of a large number of cores. We address the second desideratum, being able to reason efficiently with abstract cores in the IHS algorithm, in the next section. It can also be noted that core-guided solvers use cardinality constraints and thus are able to generate abstract cores, although they use these cores in a different way than our proposed approach.
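A small helper (ours, not from the paper) evaluates this product, e.g. to gauge how many ordinary cores a candidate abstract core stands for:

```python
from math import comb, prod

def ordinary_core_count(abstraction_sizes, counts):
    """Number of non-abstract cores represented by an abstract core whose
    literals are the count variables ab_i.c[c_i], where |ab_i| is given by
    abstraction_sizes[i] and c_i by counts[i]."""
    return prod(comb(n, n - c + 1) for n, c in zip(abstraction_sizes, counts))

# Example 3: a single abstraction set over all 20 b-variables of F(20,10)
# with the unit abstract core (ab.c[10]):
print(ordinary_core_count([20], [10]))        # C(20, 11) = 167,960
# An abstract core (ab1.c[2] v ab2.c[3]) over two abstraction sets:
print(ordinary_core_count([8, 10], [2, 3]))   # C(8,7) * C(10,8) = 8 * 45 = 360
```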
Algorithm 3 shows the IHS algorithm extended with abstract cores. Its processing follows the same steps as used earlier in the non-abstract IHS algorithm (Algorithm 1). There are, however, three changes: (1) the optimizer must now solve a slightly different problem, (2) the abstraction sets are used in ex-abs-cores when extracting new constraints for the optimizer, and (3) the collection of abstraction sets AB is updated as the computation proceeds (update-abs).

New Optimization Problem: The optimization problem shown in Fig. 2a is very similar to the previous minimum cost hitting set optimization (Fig. 1a). It continues to minimize the cost of the set of b-variables that have to be set to true in order to satisfy the constraints. Each abstract core κ ∈ C is a clause and thus is equivalent to the linear constraint Σ_{x∈κ} x ≥ 1, just like the non-abstract cores. The abstract cores can, however, contain count variables ab.c[k], each of which has a specific definition. These definitions need to be given to the optimizer as linear constraints. For each count variable ab.c[k] the constraints added are (a) Σ_{b∈ab} b ≥ k · ab.c[k] and (b) Σ_{b∈ab} b ≤ (k − 1) + (|ab| − k + 1) · ab.c[k]. That is, when ab.c[k] is 1 (true) constraint (a) ensures that the sum of ab's b-variables is ≥ k and constraint (b) becomes trivial; and when ab.c[k] is 0 (false) constraint (a) becomes trivial and constraint (b) ensures that the sum of ab's b-variables is < k. These definitions ensure the intended interaction between abstract cores and count variables. For example, if the optimizer has the abstract core constraint b_1 + ab.c[5] + b_2 ≥ 1, it must be able to reason that if it chooses to satisfy this constraint by setting ab.c[5] = 1 then it must also set 5 of the b-variables in ab to 1. The definitions allow this inference.
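As a sketch, the two linear constraints can be produced as follows. The (coeffs, sense, rhs) representation is our own stand-in for whatever constraint interface the IP solver exposes, and the coefficients mirror the indicator-style linearization given above:

```python
def count_var_defs(ab, k, c):
    """Linear constraints defining the 0/1 count variable c = ab.c[k] for
    the optimizer, as (coeffs, sense, rhs) triples over 0/1 variables.
    (a)  sum_{b in ab} b - k*c >= 0              (c=1 forces the sum >= k)
    (b)  sum_{b in ab} b - (|ab|-k+1)*c <= k-1   (c=0 forces the sum <  k)"""
    a_coeffs = {b: 1 for b in ab}
    a_coeffs[c] = -k
    b_coeffs = {b: 1 for b in ab}
    b_coeffs[c] = -(len(ab) - k + 1)
    return [(a_coeffs, '>=', 0), (b_coeffs, '<=', k - 1)]
```

Setting c = 1 in (a) recovers Σ b ≥ k while (b) relaxes to Σ b ≤ |ab|; setting c = 0 relaxes (a) to Σ b ≥ 0 while (b) becomes Σ b ≤ k − 1, exactly the behaviour described in the text.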
Extracting Abstract Cores: As before, the optimizer's solution is used to create a set of assumptions for the SAT solver. Cores arise from the conflicts the SAT solver finds when using these assumptions. For ordinary cores, ex-cores (Algorithm 2) used a set of negated b-variables as assumptions (ensuring that the corresponding set of soft clauses must be satisfied). If the SAT solver finds a conflict over these assumptions, the conflict will be a clause containing only negated assumptions, i.e., a clause containing only positive b-variables. Such clauses are ordinary cores. Hence, if we wish to extract abstract cores, we must give the SAT solver assumptions that consist of negated b-variables and negated count variables. Any conflicts that arise will then contain positive b-variables and positive count variables and will thus be abstract cores.

In the non-abstract case, the optimizer's solution hs specifies a set of b-variables that can be set to true to obtain an optimal solution to the current set of constraints. That is, hs provides a set of clauses that, if falsified, will most cost effectively block the cores found so far. In the abstract case, the optimizer's solution is also a set of b-variables with the same properties. All that has changed is the type of constraints the optimizer has optimized over.

Consider an abstraction set in the current set of abstractions, ab ∈ AB. Say that ab is the set of b-variables {b_1, b_2, b_3, b_4}. Further, suppose that the optimizer returns the set hs = {b_1, b_4, b_5} as its solution, and that the full set of b-variables is F_B = {b_1, …, b_6}. In the non-abstract case, the SAT solver will be allowed to make b_1, b_4 and b_5 true, while being forced to make b_2, b_3, and b_6 false. In particular, the SAT solver will be called with the set of assumptions ¬b_2, ¬b_3 and ¬b_6, i.e., the set {¬b | b ∈ (F_B − hs)} (line 7, Algorithm 2). Notice that the SAT solver is being allowed to make specific b-variables in ab ∩ hs true (namely b_1 and b_4), while being forced to make specific b-variables in ab − (ab ∩ hs) false (namely b_2 and b_3). Given that we believe the b-variables in ab to be exchangeable, we can achieve abstraction by removing these specific choices. In particular, instead of assuming that b_2 and b_3 are false and forcing the SAT solver to satisfy these specific soft clauses, we can instead assume ¬ab.c[3]. This means allowing at most two b-variables in ab to be true, forcing the remaining |ab| − 2 (= 2) b-variables to be false. Hence, the SAT solver must satisfy at least two soft clauses from the set {(¬b_1), (¬b_2), (¬b_3), (¬b_4)} corresponding to ab, but it is no longer forced to try to satisfy the specific clauses (¬b_2) and (¬b_3). Hence, we can use {¬ab.c[3], ¬b_6} as the SAT solver's assumptions and thus be able to extract an abstract core. Note also that since the weight of every b-variable in ab is the same, the SAT solver is still being asked to find a solution of cost equal to cost(hs). Using this insight we can specify the procedure ex-abs-cores.

[Algorithm 4 (Extracting abstract cores from the optimizer solution), final steps: K ← ex-cores-sub(assumps, UB, τ_best); optionally K ← K ∪ ex-cores(hs, UB, τ_best) (line 8); return K.]

Algorithm 4 shows the procedure ex-abs-cores. Once it has set up its assumptions this procedure operates exactly like ex-cores, calling the same subroutine ex-cores-sub to iteratively extract some number of cores. It first adds the negation of all b-variables not in hs: the optimizer wants to satisfy all of these soft clauses. Then it performs abstraction. It removes the b-variables of each abstraction set ab from the assumptions, and adds instead a single count variable from ab. The optimizer's solution has made k = |hs ∩ ab| of ab's b-variables true. So we permit the SAT solver to make this number of ab's b-variables true, but no more. This is accomplished by giving it the assumption ¬ab.c[k + 1]. Note that ¬ab.c[k + 1] ↔ (Σ_{b∈ab} b ≤ k) by the definition of the count variables. Finally, if every b-variable of ab is in hs we need not add anything to the set of assumptions (line 5): the SAT solver can freely make all of ab's b-variables true.

ex-abs-cores also has the option of additionally extracting a set of non-abstract cores by invoking its non-abstract version (line 8). Abstract and non-abstract cores can be freely mixed in Abstract-IHS. Due to the nondeterminism in the conflicts the SAT solver returns, the non-abstract cores need not be subsumed by the abstract cores. Hence, in practice it is often beneficial to extract both.
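The assumption set-up of Algorithm 4 can be sketched as follows. Here count_lit is a hypothetical callable mapping an abstraction set and an index k to the literal of ab.c[k] (in an implementation these literals would typically be cardinality-encoding outputs); the rest of the representation is our own.

```python
def abs_assumptions(hs, all_bvars, abstraction_sets, count_lit):
    """Assumption set-up of ex-abs-cores (Algorithm 4, sketch).
    hs: set of b-variables the optimizer set true; abstraction_sets:
    disjoint sets of b-variables; count_lit(ab, k): (hypothetical) literal
    for the count variable ab.c[k]."""
    abstracted = set().union(*abstraction_sets) if abstraction_sets else set()
    # assume -b for every b-variable outside hs not covered by an abstraction
    assumps = [-b for b in all_bvars if b not in hs and b not in abstracted]
    for ab in abstraction_sets:
        k = len(hs & ab)              # optimizer made k of ab's b-vars true
        if k < len(ab):               # allow k true b-vars in ab, but no more:
            assumps.append(-count_lit(ab, k + 1))   # assume -ab.c[k+1]
        # if hs contains all of ab, add nothing (Algorithm 4, line 5)
    return assumps
```

On the running example (hs = {b_1, b_4, b_5}, ab = {b_1, b_2, b_3, b_4}, F_B = {b_1, …, b_6}) this produces exactly {¬b_6, ¬ab.c[3]}.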
The correctness of the IHS algorithm with abstract cores is easily proved.

Theorem. Abstract-IHS, when called on (F, wt), must eventually terminate, returning an optimal solution.

Proof. First observe that the extra clauses E used to define the count variables in AB.c do not change the set of solutions (models of F_H) nor their costs, as they are definitions. In particular, any model τ of F_H can be extended to a model of F_H ∪ E by appropriately setting the value of each count variable, and any model τ_E of F_H ∪ E becomes a model of F_H once we remove its assignments to the count variables. In both cases the cost of the model is preserved. Therefore, we will prove that Abstract-IHS eventually terminates returning an optimal solution to (F ∪ E, wt) (with every clause in E being hard): this optimal solution provides us with an optimal solution to (F, wt).

From the definitions of the count variables in E and the soundness of the abstract cores computed as assumption conflicts by the SAT solver, we see that every constraint in the optimizer's model is entailed by F_H ∪ E. That is, every solution of F_H ∪ E is also a solution of the optimizer's constraints. Therefore, the cost of the optimizer's optimal solutions, LB, is always a lower bound on the cost of an optimal solution of F_H ∪ E. Furthermore, τ_best is always a solution of F_H ∪ E, as it is found by the SAT solver. Therefore, when UB = cost(τ_best) = LB, τ_best must be an optimal solution. Hence we have that when Abstract-IHS returns a solution, that solution must be optimal. Furthermore, when the optimizer returns a solution hs to its model and hs does not cause termination, then Abstract-IHS will compute a new abstract core κ that hs does not satisfy. This follows from the fact that κ is falsified by all solutions that make false exactly the same set of un-abstracted b-variables and exactly the same count of b-variables from each abstraction set as hs. Hence, once we add κ to the optimizer we block the solution hs. There are only a finite number of solutions to the optimizer's constraints since the variables are all 0/1 variables, and every optimal MaxSat solution of F_H ∪ E always satisfies the optimizer's model. Therefore, as more constraints are added to the optimizer it must eventually return one of these optimal MaxSat solutions, causing Abstract-IHS to terminate.

Consider running Abstract-IHS on the formula used in Example 1: F_S = {(¬b_1), (¬b_2), (¬b_3), (¬b_4)}, with all weights equal to 1. First Min-Abs is called on an empty set of constraints, and it returns hs = ∅. Say that update-abs creates a single abstraction set, AB = {ab}, with ab = {b_2, b_3}, and that it is unchanged during the rest of the run. Using hs, ex-abs-cores will initialize its assumptions to {¬b_1, ¬ab.c[1], ¬b_4} and call the SAT solver. These assumptions are unsat. Let the conflict found be the unit clause (ab.c[1]). In ex-cores-sub the next SAT call will be with the assumptions {¬b_1, ¬b_4}. These assumptions are satisfiable and the solution τ = {¬b_1, b_2, b_3, ¬b_4} is returned. The upper bound UB will be set to cost(τ) = 2 and τ_best will be set to τ. ex-abs-cores now returns and the optimizer is called with the set of abstract cores C = {(ab.c[1])}. The optimizer can return two different optimal solutions, {b_2} or {b_3}; say that it returns the first one, hs = {b_2}. This will set the lower bound LB = cost(hs) = 1. Then ex-abs-cores will be called again and from hs it will initialize its assumptions to {¬b_1, ¬ab.c[2], ¬b_4}, which is unsat with the unique conflict (b_1, ab.c[2], b_4). Hence, the next SAT call will be with an empty set of assumptions and a solution will be found. Suppose that this solution is the same as before, so that neither UB nor τ_best is changed. ex-abs-cores will then return and the optimizer is called with the accumulated cores {(ab.c[1]), (b_1, ab.c[2], b_4)}. There are different choices for the optimal solution, but say that it returns {b_2, b_3} as its optimal solution. This will raise the lower bound LB to 2; the lower bound now meets the upper bound, and the MaxSat optimal solution τ_best = {¬b_1, b_2, b_3, ¬b_4} will be found.

Abstract cores can decrease the worst-case number of cores the IHS algorithm needs to extract. Consider the instances F_{n,r} from Examples 2 and 3. As discussed in Example 2, when r is close to n/2 these instances have an exponential number of non-abstract cores, all of which must be extracted by the IHS algorithm. If, on the other hand, all b-variables are placed into a single abstraction set ab as in Example 3, Abstract-IHS will generate the sequence of abstract cores (ab.c[1]), …, (ab.c[r]), after which the optimizer will return a solution of cost r that will be a correction set, allowing Abstract-IHS to terminate.
More generally, this strategy can be applied to any unweighted MaxSat instance. The solving strategy of Proposition 1 in fact mimics the Linear UNSAT-SAT algorithm [9, 17], where the SAT solver solves the sequence of queries "can a solution of cost 1 be found", "can a solution of cost 2 be found", etc. As interesting future work, we note that the framework of abstract cores presented here can be used to mimic the behaviour of several of the recently proposed core-guided algorithms [23, 25, 26].

Computing Abstraction Sets: When computing abstraction sets, there is an inherent trade-off between the overhead and the potential benefits of abstraction: too large sets can lead to large cnf encodings for the count variable definitions, making SAT solving very inefficient, while with too small sets the algorithm reverts back to non-abstract IHS with hardly any gain from abstraction. Although the notion of exchangeability has intuitive appeal, it seems likely to be computationally hard to identify exchangeable b-variables that can be grouped into abstraction sets. In our implementation we used a heuristic approach to finding abstraction sets motivated by the F_{n,r} instances (Example 2). In those instances, each b-variable appears in many cores with each of the other b-variables. We decided to build abstraction sets from sets of b-variables that often appear in cores together. This technique worked in practice (see Sect. 6), but a deeper understanding of how best to construct abstraction sets remains future work.

To find b-variables that appear in many cores with each other, we used the set of cores found so far to construct a graph G, sketched in the code after this paragraph. The graph has b-variables as nodes and weighted edges, with the weight between two b-variables representing the number of times these two b-variables appeared together in a core. We then applied the Louvain clustering algorithm [10] to G to obtain clusters of nodes such that the nodes in a cluster have a higher weight of edges between each other (i.e., appear in cores together more often) than to nodes in other clusters; each cluster was then taken to define an abstraction set.

We also monitored how effective the cores found were in increasing the lower bound generated by the optimizer. If the cores were failing to drive the optimizer's lower bound higher, we computed new abstraction sets by clustering the graph G, and updated AB with these new abstraction sets. If clustering had already been performed and the extracted cores were still not effective, the nodes of the b-variables in each abstraction set were merged into one new node and G was reclustered. (The Louvain algorithm can compute hierarchical clusters.) Any new clusters so generated will either be new abstraction sets or supersets of existing abstraction sets. New abstraction sets are formed from these new clusters and added to AB. All subsets are removed from AB so that future abstractions will be generated using the larger abstraction sets.

We also found that abstraction was not cost effective on instances where the average core size was in the hundreds. The generated abstraction sets were so large that the cnf encoding of their count variable definitions slowed the SAT solver down too much. Finally, we add the cnf encoding of the count variable definitions ab.c[k] ↔ (Σ_{b∈ab} b ≥ k) to the SAT solver only when ¬ab.c[k] first appears in the set of assumptions. Furthermore, we only add the encoding in the direction ab.c[k] ← (Σ_{b∈ab} b ≥ k).
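A sketch of this clustering step, assuming networkx (whose louvain_communities function, available in networkx >= 2.8, implements the Louvain method standing in for [10]); resolution settings, reclustering triggers, and the merge-and-recluster step are omitted, and note that a faithful implementation must also keep b-variables of different weights in different abstraction sets:

```python
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import louvain_communities

def abstraction_sets_from_cores(cores):
    """Build the b-variable co-occurrence graph (edge weight = number of
    cores in which the two b-variables appear together) and take each
    Louvain community with more than one member as a candidate
    abstraction set.  Assumes all soft clauses have equal weight."""
    G = nx.Graph()
    for core in cores:
        for u, v in combinations(sorted(core), 2):
            if G.has_edge(u, v):
                G[u][v]["weight"] += 1
            else:
                G.add_edge(u, v, weight=1)
    clusters = louvain_communities(G, weight="weight", seed=0)
    return [set(c) for c in clusters if len(c) > 1]
```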
We have implemented two versions of abstract cores on top of the MaxHS solver [12, 14], using the version that had been submitted to the MaxSat 2019 evaluation (MSE 2019) [5]. The two new solvers are called maxhs-abs and maxhs-abs-ex. maxhs-abs implements the abstraction method described above, using the Louvain algorithm to dynamically decide on the abstraction sets and extracting both abstract and non-abstract cores in ex-abs-cores. We used the well known totalizer encoding [6] to encode the count variable definitions into cnf. In particular, each totalizer takes as input the b-variables of an abstraction set ab, and the totalizer outputs become the count variables ab.c[k]. The maxhs-abs-ex solver additionally exploits the totalizer encodings by using the technique of core exhaustion [20]. This technique uses SAT calls to determine a lower bound on the number of totalizer outputs forced to be true. It can sometimes force many of the abstraction set count variables. We impose a resource bound of 60 s on this process, so exhaustion is not necessarily complete.

We compare the new solvers to the base maxhs (MSE 2019 version) as well as to two other solvers: the MSE 2019 version of RC2 (rc2) [4, 20], the best performing solver in both the weighted and unweighted tracks, and a new solver in MSE 2019 called UWrMaxSat (UWr) [4, 21]. Both implement the OLL algorithm [1, 25] and differ mainly in how the cardinality constraints are encoded into cnf. As benchmarks, we used all 599 weighted and 586 unweighted instances from the complete track of the 2019 MaxSat Evaluation, drawn from a variety of different problem families. All experiments were run on a cluster of 2.4 GHz Intel machines using a per-instance time limit of 3600 s and memory limit of 5 GB.

Figure 1 shows cactus plots comparing the solvers on the unweighted (left) and weighted (right) instances. Comparing maxhs and maxhs-abs we observe that abstract core reasoning is very effective, increasing the number of unweighted instances solved from 397 to 433 and weighted instances from 361 to 379, surpassing both rc2 and UWr in both categories. maxhs-abs-ex improves even further, with 438 unweighted and 387 weighted instances solved. Table 1 gives a pair-wise solver comparison of the number of instances that could be solved by one solver but not by the other. We observe that even though the solvers can be ranked by number of instances solved, every solver was able to beat every other solver on some instances (except that maxhs-abs did not solve any weighted instances that maxhs-abs-ex could not). This speaks to the diversity of the instances, and indicates that truly robust solvers might have to employ a variety of different techniques.

Figures 2 and 3 show a breakdown by family for the three solvers maxhs-abs-ex, rc2 and maxhs. The plots show only those families where the solvers exhibited different performance. We observe that rc2 and maxhs achieve quite disparate performance, with each one dominating the other on different families. maxhs-abs-ex, on the other hand, is often able to achieve the same performance as the better of the two other solvers on these different families.

We proposed abstract cores for improving the Implicit Hitting Set (IHS) based approach to complete MaxSat solving. More specifically, we address the large worst-case number of cores that IHS needs to extract before terminating. An abstract core is a compact representation of a (potentially large) set of (regular) cores.
We incorporate abstract core reasoning into the IHS algorithm, prove correctness of the resulting algorithm, and report on an experimental evaluation comparing IHS with abstract cores to the best performing solvers of the latest MaxSat Evaluation. The results indicate that abstract cores indeed improve the empirical performance of the IHS algorithm, resulting in state-of-the-art performance on the instances of the Evaluation.

References.
[1] Unsatisfiability-based optimization in clasp
[2] SAT-based MaxSAT algorithms
[3] Cardinality networks and their applications
[4] MaxSAT evaluation 2018: new developments and detailed results
[5] MaxSAT Evaluation
[6] Efficient CNF encoding of boolean cardinality constraints
[7] Applications of MaxSAT in data analysis
[8] Cost-optimal constrained correlation clustering via weighted partial maximum satisfiability
[9] The SAT4J library, release 2.2
[10] Fast unfolding of communities in large networks
[11] Automated design debugging with maximum satisfiability
[12] Solving MAXSAT by decoupling optimization and satisfaction
[13] Solving MAXSAT by solving a sequence of simpler SAT instances
[14] Exploiting the power of mip solvers in maxsat
[15] Postponing optimization to speed up MAXSAT solving
[16] Modeling and solving staff scheduling with partial weighted MaxSAT
[17] Translating pseudo-boolean constraints into SAT
[18] IMLI: an incremental framework for MaxSAT-based learning of interpretable classification rules
[19] A low capture power oriented X-filling method using partial MaxSAT iteratively
[20] RC2: an efficient MaxSAT solver
[21] Encoding cardinality constraints using multiway merge selection networks
[22] QMaxSAT: a partial Max-SAT solver
[23] On using unsatisfiability for solving maximum satisfiability
[24] Open-WBO: a modular MaxSAT solver
[25] Core-Guided MaxSAT with Soft Cardinality Constraints
[26] Maximum satisfiability using core-guided MaxSAT resolution
[27] Modulo based CNF encoding of cardinality constraints and its application to MaxSAT solvers
[28] Re-implementing and extending a hybrid SAT-IP approach to maximum satisfiability
[29] LMHS: a SAT-IP hybrid MaxSAT solver
[30] Towards an optimal CNF encoding of boolean cardinality constraints
[31] MAXSAT heuristics for cost optimal planning