1 Introduction

Automated Planning [9] is the subarea of Artificial Intelligence that studies the process of automatically generating a plan of actions that, when executed, enables an agent to accomplish a task. A state refers to a particular configuration of the agent’s environment and can be described by a set of properties. The initial state is the environment configuration at the beginning of the plan execution. A planning problem is defined by means of a formal description of the environment (called the planning domain) plus an initial state and a goal description. The planning domain models the environment based on the set of actions the agent can perform. Thus, an action can be seen as a manifestation of the agent’s intent to make the environment change from one state to another. Finally, the goal description can vary from a simple reachability goal formula (e.g. a propositional formula that must be satisfied by a set of goal states) to complex temporal goal formulae [22, 26], capable of representing the agent’s preferences over the plan solution, as well as its quality.

Fig. 1.

Examples of state transition systems representing planning domains.

Figure 1a shows a planning domain represented by a state transition system: a graph whose vertices represent states (labeled by propositional atoms) and whose directed edges represent actions (labeled by action names). To represent states we make the closed world assumption, i.e. a state is described only by the propositions that are true, and any proposition that is false is not included in the state label. For example, in the state \(s_0\) (Fig. 1a) p is true, q is false and r is false. Finding a solution to a planning problem corresponds to selecting the best action at each state, which is a computationally hard problem [11]. Therefore, much research has been devoted to finding efficient solutions for planning problems, mainly by making different assumptions about the environment and about the expressiveness of goal formulae.

Deterministic Planning. Research in this area assumes the world is fully observable and there are no uncertainties about the effects of the actions [9]. Thus, when an action is executed in a state, there is only one possible successor state. The solution of a deterministic planning problem is a sequence of actions that takes the environment from the initial state to a state that satisfies a simple reachability goal description. Examples of efficient algorithms for deterministic planning are based on heuristic search [10] and Boolean satisfiability [13].

Example 1

(Deterministic Planning). Suppose the environment (system) depicted in Fig. 1a is currently in the initial state \(s_0\) and the agent goal is to reach the state where property \(\varphi = \lnot p \wedge q \wedge r\) holds. The sequence of actions \(\langle a_1,a_4\rangle \) is a plan solution since it deterministically takes the agent from state \(s_{0}\) to state \(s_{5}\), where \(\varphi \) is satisfied. Other possible plans are \(\langle a_2,a_6\rangle \) and \(\langle a_3,a_7\rangle \).
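To make the search concrete, the deterministic domain of Fig. 1a can be encoded as an explicit transition table and solved by breadth-first search. This is a minimal sketch, not one of the planners cited above; the transitions are transcribed from Example 1, and everything else (variable names, the dictionary encoding) is an assumption for illustration:

```python
from collections import deque

# Deterministic domain of Fig. 1a, transcribed from Example 1;
# only the goal label matters for this sketch.
SUCC = {
    ("s0", "a1"): "s1", ("s0", "a2"): "s2", ("s0", "a3"): "s3",
    ("s1", "a4"): "s5", ("s2", "a6"): "s5", ("s3", "a7"): "s5",
}
GOAL_STATES = {"s5"}  # states where the goal ¬p ∧ q ∧ r holds

def bfs_plan(start):
    """Breadth-first search: returns a shortest action sequence to a goal state."""
    frontier, visited = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if state in GOAL_STATES:
            return plan
        for (s, a), t in SUCC.items():
            if s == state and t not in visited:
                visited.add(t)
                frontier.append((t, plan + [a]))
    return None  # no plan exists

plan = bfs_plan("s0")  # a shortest plan, e.g. ['a1', 'a4']
```

Any of the three two-step plans of Example 1 is a valid answer; BFS returns one of them.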

Non-deterministic Planning. Also called FOND planning (Fully Observable Non-Deterministic planning), where we assume the environment evolves in a non-deterministic way, due to the occurrence of natural events outside the agent’s control (called exogenous events). In this case, the execution of an action can lead to a set of possible future states [22]. A solution for a non-deterministic planning problem is a policy: a mapping from states to actions [9]. A policy can be classified into three quality categories: (i) a weak policy is a solution that may achieve the goal but offers no guarantee, due to the non-determinism of the actions; (ii) a strong policy is a solution that guarantees goal achievement, regardless of the non-determinism of the actions; and (iii) a strong-cyclic policy is a solution that ensures goal achievement, but whose execution may involve cycles (assuming that the agent will eventually exit all cycles) [6]. Efficient algorithms for non-deterministic planning are based on Model Checking [6, 18, 21, 28] and on determinization of the actions to obtain relevant policies [19].

Example 2

(Non-deterministic planning). Consider the non-deterministic planning domain depicted in Fig. 1b, which is similar to the domain shown in Fig. 1a except for the fact that the actions \(a_2\) and \(a_4\) are non-deterministic. These actions can take the agent, respectively, to one of the states in {\(s_2\), \(s_3\)} and {\(s_4\), \(s_5\)}. Suppose the agent is in state \(s_{0}\) and its goal is to reach a state where the goal formula \(\varphi = \lnot p \wedge q \wedge r\) holds. A solution is a policy: \(\{(s_{0},a_1), (s_{1},a_4)\}\). This policy indicates that if the agent starts at \(s_{0}\) and performs the deterministic action \(a_1\), it will reach state \(s_{1}\). Subsequently, if it takes the non-deterministic action \(a_4\) in \(s_{1}\), the agent will end up in either \(s_{4}\) or \(s_{5}\) (a weak policy). Other solutions are the strong policies \(\{(s_0,a_2),(s_2,a_6),(s_3,a_7)\}\) and \(\{(s_0,a_3),(s_3,a_7)\}\).
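The weak/strong distinction of Example 2 can be checked mechanically by enumerating every outcome of every prescribed action. The sketch below is an assumption-laden illustration (domain transcribed from Fig. 1b; it also assumes the induced execution structure is acyclic, which holds in this domain):

```python
# Non-deterministic domain of Fig. 1b: a2 and a4 have two possible outcomes.
T = {
    ("s0", "a1"): {"s1"}, ("s0", "a2"): {"s2", "s3"}, ("s0", "a3"): {"s3"},
    ("s1", "a4"): {"s4", "s5"}, ("s2", "a6"): {"s5"}, ("s3", "a7"): {"s5"},
}
GOAL = {"s5"}  # states satisfying ¬p ∧ q ∧ r

def classify(policy, start="s0"):
    """'strong' if every execution reaches the goal, 'weak' if only some do."""
    def outcomes(state):
        if state in GOAL:
            return ["goal"]
        if state not in policy:              # no action prescribed: dead end
            return ["stuck"]
        results = []
        for nxt in T[(state, policy[state])]:
            results += outcomes(nxt)         # follow every possible effect
        return results
    results = outcomes(start)
    if all(r == "goal" for r in results):
        return "strong"
    return "weak" if any(r == "goal" for r in results) else "not a solution"
```

Usage: `classify({"s0": "a1", "s1": "a4"})` yields `"weak"`, while the two strong policies of Example 2 yield `"strong"`.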

Planning with Preferences. This form of planning allows the specification of complex goal formulae to represent properties that should be preserved or avoided along the path to finally reach a goal state (also called extended reachability goal). There are two types of preferences: quantitative preferences and qualitative preferences. In the first type, preferences represent a subset of desirable goals when it is not possible to achieve all goals [25]. In the second type, preferences deal with additional properties that must hold along the path to achieve a goal state. Qualitative preferences can be used to express safety and liveness properties [16]. Informally, a safety property indicates that “something (bad) should not happen” during the execution of a plan, while a liveness property expresses that eventually “something (good) should happen” during the plan execution [15]. Qualitative preferences can be expressed by operators in PDDL such as: always, sometime, sometime-before and at-most-once [3, 8].

Example 3

(Planning with Qualitative Preferences in Deterministic Environments). Consider the planning domain depicted in Fig. 1a and a problem where the initial state is \(s_{0}\) and the goal is to reach a state where the propositional formula \(\varphi = \lnot p \wedge q \wedge r\) holds. Suppose that the agent prefers to visit, on the path to reach the goal, only states where p is true (a safety property). In this case, possible solutions are \(\langle a_{1}, a_{4} \rangle \), \(\langle a_{2}, a_{6} \rangle \) and \(\langle a_{3}, a_{7} \rangle \). This work focuses on the qualitative preference “always (p)”, which aims to find a plan where the property p is true in all states along the path to reach the goal.

Example 4

(Planning with Qualitative Preferences in Non-Deterministic Environments). Consider the planning domain depicted in Fig. 1b and suppose the agent is at state \(s_{0}\) and its goal is to reach a state where the property \(\varphi = \lnot p \wedge q \wedge r\) is satisfied, with the additional requirement that the agent must pass only through states where property p is true before reaching the goal. The solution \(\pi \) = \(\{(s_{0},a_1), (s_{1},a_4)\}\) is a weak policy that also satisfies the user preference, since it corresponds to a path that possibly reaches the goal \(\varphi \) and passes only through states where p is satisfied. The solutions \(\pi \) = \(\{(s_{0},a_2), (s_{2},a_6), (s_3, a_7) \}\) and \(\pi \) = \(\{(s_{0},a_3), (s_3, a_7) \}\) are strong policies that satisfy the user preference, since they reach the goal \(\varphi \) with certainty and pass only through states where p is satisfied.

Most algorithms for planning with qualitative preferences work on deterministic domains [2, 12, 14, 20], and the few works on FOND planning do not include preferences over the policy quality. This is because they are based on LTL (Linear Time Logic) [23], which has no path quantifiers. To the best of our knowledge, this is the first step toward extending the notion of qualitative preferences in non-deterministic environments to also include preferences over policy quality. In order to do so, we apply the notion of extended reachability goals in Model Checking based on the branching time temporal logic \(\alpha \)-CTL [27] to express qualitative preferences as extended reachability goals, and we reuse existing algorithms for \(\alpha \)-CTL model checking to solve them.

In this paper, we propose solutions for planning problems that address two aspects present in real-world environments: (i) we consider that the actions can have non-deterministic effects; and (ii) we consider that the user wants to express their preferences over the policy paths, as well as over the policy quality. This paper is organized as follows: Sect. 2 presents the related work; Sect. 3 describes the foundations; Sect. 4 details how policy preferences can be expressed using the \(\alpha \)-CTL temporal logic [22]; Sect. 5 presents our experimental analysis; and Sect. 6 presents the conclusions and future work.

2 Related Work

The work in [5] presents an approach to non-deterministic planning in which the qualitative preferences are expressed as extended (reachability) goals in Linear Time Logic (LTL) [23]. Call this problem P; to solve it, the authors proposed the following extra-logical approach: (i) build a Büchi automaton for the extended goal expressed as an LTL formula; (ii) perform the determinization of the non-deterministic actions; (iii) build a new problem \(P'\) without extended goals; (iv) use a non-deterministic planner to produce a policy for \(P'\); and (v) convert the resulting policy into a solution for P. In our work, we rely on a solution based on CTL (a branching time temporal logic), able to solve non-deterministic planning without the need to perform determinizations.

The solution for planning with qualitative preferences in deterministic domains presented in [20] involves transforming a problem with preferences P into a problem without preferences \(P'\). For each preference in P, a new goal, which is false in the initial state, is added to \(P'\). The new goal can be achieved either by a zero-cost action that requires the preference to be satisfied, or by an action, with cost equal to the utility of the preference, that can only be performed if the preference is false. Due to this transformation, any classical planner can be used to solve problems with preferences.

The work in [4] introduces a method for solving generalized planning problems, where the same plan can be used for multiple problem instances. It generates policies by transforming multiple concrete problems into a single abstract (lifted) problem that captures their common structure. The global structure of the problems can be captured through qualitative preferences expressed as formulas in LTL. In addition, the authors demonstrate that, for a wide class of problems, path constraints can be compiled away, reducing generalized planning to non-deterministic planning.

The work in [27] presents a planning algorithm that aims to solve non-deterministic planning problems with temporally extended goals (complex goals), while also considering the quality of the policy (weak, strong, or strong-cyclic). The proposed planner uses the \(\alpha -\)CTL model checking framework to tackle this kind of problem. Our work is an application of that planner to problems with complex goals involving qualitative preferences.

3 Foundations

3.1 Non-deterministic Planning

In real-world situations, nature can cause unpredictability in the effects of the actions. As a result, when an agent is in a state \(s_i\) and chooses an action \(a_i\) with the intention of reaching a particular state \(s_j\), the interference of nature can cause the agent to end up in a different state \(s_k\) instead.

A non-deterministic planning domain can be characterized as a state transition system, as shown in Definition 1. In this kind of domain, when an action is performed in a state, it can lead to more than one successor state.

Definition 1

(Non-Deterministic Planning Domain). Given a set of propositional atoms \(\mathbb {P}\) and a set of actions \(\mathbb {A}\), a non-deterministic planning domain is defined by a tuple \(\mathcal {D} = \langle S, L, T\rangle \) where states are labeled by elements of \(\mathbb {P}\) and actions are labeled by elements of \(\mathbb {A}\) [9]:

  • S is a finite set of states;

  • \( L : S \rightarrow 2^{\mathbb {P}}\) is a state label function; and

  • \( T : S \times \mathbb {A} \rightarrow 2^{S}\) is a non-deterministic state transition function.

We assume that the set \(\mathbb {A}\) contains the trivial action \(\tau \) and that \(T(s,\tau ) = \{s\}\), for every final state \(s \in S\). Intuitively, this action represents that the agent may choose to do nothing. As the number of propositions increases, it becomes infeasible to represent planning domains explicitly. Therefore, the planning community uses action languages (such as the Planning Domain Definition Language - PDDL) to describe domains concisely. In this representation, actions are defined by their preconditions and effects. Preconditions are propositions that must be true in a state for an action to be executed, while effects are the literals that are modified in a state to produce a set of possible next states. After defining the domain, the next step is to formalize the planning problem.
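The precondition/effect encoding above can be sketched as follows. The `Action` type and the toy `toss` action are illustrative assumptions (not PDDL syntax), but `successors` computes exactly the transition function \(T : S \times \mathbb {A} \rightarrow 2^{S}\):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    precond: frozenset  # propositions that must hold for the action to apply
    effects: tuple      # one (add-set, delete-set) pair per possible outcome

def successors(state: frozenset, action: Action) -> set:
    """All possible next states of applying `action` in `state` (T: S×A → 2^S)."""
    if not action.precond <= state:
        return set()    # precondition violated: the action is not applicable
    return {frozenset((state - delete) | add) for add, delete in action.effects}

# A hypothetical non-deterministic action: from a state where p holds,
# it either adds q or adds r.
toss = Action("toss", frozenset({"p"}),
              ((frozenset({"q"}), frozenset()),
               (frozenset({"r"}), frozenset())))

next_states = successors(frozenset({"p"}), toss)
# next_states == {frozenset({'p', 'q'}), frozenset({'p', 'r'})}
```

A single (add-set, delete-set) pair in `effects` recovers the deterministic case.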

Definition 2

(Non-Deterministic Planning Problem). Given a non-deterministic planning domain \(\mathcal {D}\), a non-deterministic planning problem is defined by a tuple \(P = (\mathcal {D}, s_0, \varphi )\) where: \(\mathcal {D}\) is a non-deterministic planning domain; \(s_0\) is the initial state; and \(\varphi \) is a propositional formula representing the goal.

Due to the uncertainties about the effects of the actions, a solution to a non-deterministic planning problem is a mapping from states to actions, called a policy [9].

Definition 3

(Policy). Let P = \(\langle \mathcal {D}, s_{0}, \varphi \rangle \) be a planning problem in the domain \(\mathcal {D} = \langle S, L, T\rangle \) with non-deterministic actions. A policy \(\pi \) is a partial function \(\pi : S \rightarrow \mathbb {A}\) which maps states to actions, such that, for every state \(s \in S\), if \(\pi \) is defined at s, then \(\pi (s)\) is an action applicable in s [21].

The quality of a policy can be: weak, for policies that can reach a goal state, but with no guarantee, due to the non-determinism of actions; strong, for policies that reach a goal state independently of the non-determinism; and strong-cyclic, for policies that are guaranteed to reach a goal state under the assumption that the agent will eventually exit all cycles [6].

Definition 4

(Set of states reachable by a policy). Let \(P = \langle \mathcal {D},s_{0},\varphi \rangle \) be a non-deterministic planning problem and \(\pi \) a policy for P. The set of states reachable by the policy \(\pi \), denoted by \(S_{reach[\pi ]}\), is defined by \( \{ s:(s,a) \in \pi \} \cup \{ s{'} : (s,a) \in \pi \text { and } s{'} \in \textrm{T}(s,a) \}\) [21].

Definition 5

(Execution structure of a policy). Let \(P = \langle \mathcal {D},s_{0},\varphi \rangle \) be a non-deterministic planning problem and \(\pi \) be a policy for a planning domain \(\mathcal {D}\). The execution structure induced by \(\pi \) from \(s_0 \in S\) is a tuple \(\langle S_{reach[\pi ]} ,T \rangle \) (\(S_{reach[\pi ]} \subseteq S\) and \(T \subseteq S \times \mathbb {A} \times S\)) which contains all the states and transitions that can be reached when executing policy \(\pi \).

3.2 Planning as Model Checking

Model checking [7] is a formal technique that explores all possible states of a transition system to verify whether a given property holds. Applying model checking involves: modeling the system; specifying the property to be verified via a logical formula; and verifying the property automatically in the model.

Definition 6

(Action labelled transition system). Let \(\mathbb {P}\) be a non-empty set of atomic propositions and \(\mathbb {A}\) a set of action names. An action labelled transition system is a tuple \( \mathcal {M}= \langle S,L,T \rangle \) where:

  • S is a finite nonempty set of states;

  • \(L : S \rightarrow 2^\mathbb {P}\) is the state labeling function; and

  • \(T \subseteq S \times \mathbb {A} \times S\) is the state transition relation.

Definition 7

(Path in an action labelled transition system). A path \(\rho \) in an action labelled transition system \(\mathcal {M}\) is a sequence of states \(s_{0}, s_{1}, s_{2}, s_{3}, \ldots \) such that, for all \( i \ge 0 \), \(s_i \in S\) and there is an action \(a \in \mathbb {A}\) with \((s_{i}, a, s_{i+1}) \in T\).

We can combine the model checking and planning approaches to find solutions to planning problems. In this context, the domain represents the model to be checked, and the temporal logic formula expresses the planning goal to be satisfied.

3.3 The Temporal Logic \(\alpha \)-CTL

The branching time temporal logic \(\alpha \)-CTL [21] is an extension of CTL able to take actions into account by labeling state transitions. With such an extension it is possible to solve FOND planning problems without appealing to extra-logical procedures. In this logic, temporal operators are represented by “dotted” symbols (Definition 8). The formulae of \(\alpha \)-CTL are composed of atomic propositions, logical connectives (\(\lnot , \wedge , \vee \)), path quantifiers (\(\exists \) and \(\forall \)) and the dotted temporal operators \(\dot{X}\) (next), \(\dot{F}\) (eventually), \(\dot{G}\) (globally) and \(\dot{U}\) (until).

Definition 8

(\(\alpha \) -CTL’s syntax). Let \(p \in \mathbb {P}\) be an atomic proposition. The syntax of \(\alpha \)-CTL [21] is inductively defined as:

\(\varphi \;{::=}\; p \mid \lnot p \mid \varphi _1 \wedge \varphi _2 \mid \varphi _1 \vee \varphi _2 \mid \exists \dot{X} \varphi \mid \forall \dot{X} \varphi \mid \exists (\varphi _1 \,\dot{U}\, \varphi _2) \mid \forall (\varphi _1 \,\dot{U}\, \varphi _2)\)

The temporal operators \(\dot{F}\) and \(\dot{G}\) are derived from \(\dot{U}\). In \(\alpha \)-CTL, a temporal model \(\mathcal {M}\) with signature (\(\mathbb {P},\mathbb {A}\)) is a labeled transition system, i.e., a state transition system whose states are labeled by propositions and whose transitions are labeled by actions (Definition 9):

Definition 9

(\(\alpha \) -CTL’s model). A temporal model \(\mathcal {M}\) with signature (\(\mathbb {P},\mathbb {A}\)) in the logic \(\alpha \)-CTL is a state transition system \(\mathcal {D} = \langle S,L,T \rangle \), where:

  • S is a non-empty finite set of states;

  • L : S \(\rightarrow 2^{\mathbb {P}}\) is a state labeling function;

  • \(T:S \,\times \,\mathbb {A} \rightarrow 2^{S}\) is a state transition function labeled by actions.

The semantics of the local temporal operators (\(\exists \dot{X}\) and \(\forall \dot{X}\)) is given by preimage functions, while the semantics of the global temporal operators (\(\dot{U}\), \(\dot{F}\), \(\dot{G}\)) is derived from the semantics of the local temporal operators, by using least (\(\mu \)) and greatest (\(\nu \)) fixpoint operations.

Definition 10

(Weak Preimage in \(\alpha \) -CTL). Let \(Y \subseteq S \) be a set of states. The weak preimage of Y, denoted by \(\mathcal {T}^{-}_\exists (Y)\), is the set \(\{(s,a) : T(s,a) \cap Y \ne \emptyset \}\), i.e., the state-action pairs for which some outcome of the action reaches Y.

Definition 11

(Strong Preimage in \(\alpha \) -CTL). Let \(Y \subseteq S \) be a set of states. The strong preimage of Y, denoted by \(\mathcal {T}^{-}_\forall (Y)\), is the set \(\{(s,a) : \emptyset \ne T(s,a) \subseteq Y \}\), i.e., the state-action pairs for which every outcome of the action reaches Y.
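Over an explicit transition function, the two preimages translate directly into set comprehensions. The sketch below uses the domain of Fig. 1b (transitions transcribed from Example 2; the trivial action \(\tau \) is omitted here):

```python
# Non-deterministic domain of Fig. 1b: T maps (state, action) to its outcomes.
T = {
    ("s0", "a1"): {"s1"}, ("s0", "a2"): {"s2", "s3"}, ("s0", "a3"): {"s3"},
    ("s1", "a4"): {"s4", "s5"}, ("s2", "a6"): {"s5"}, ("s3", "a7"): {"s5"},
}

def weak_preimage(Y):
    """{(s, a) : T(s, a) ∩ Y ≠ ∅} — some outcome of a lands inside Y."""
    return {(s, a) for (s, a), succ in T.items() if succ & Y}

def strong_preimage(Y):
    """{(s, a) : ∅ ≠ T(s, a) ⊆ Y} — every outcome of a lands inside Y."""
    return {(s, a) for (s, a), succ in T.items() if succ and succ <= Y}
```

Note how \((s_1, a_4)\) belongs to the weak preimage of \(\{s_5\}\) (one of its outcomes is \(s_5\)) but not to the strong preimage (the other outcome is \(s_4\)).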

Definition 12

(Intension of an \(\alpha \) -CTL formula). Let \(\mathcal {D} = \langle S, L, T\rangle \) be a temporal model (or a non-deterministic planning domain) with signature (\(\mathbb {P}, \mathbb {A}\)). The intension of an \(\alpha \)-CTL formula \(\varphi \) in \(\mathcal {D}\) (or the set of states satisfying \(\varphi \) in \(\mathcal {D}\)), denoted by \(\llbracket \varphi \rrbracket _{\mathcal {D}}\), is defined as:

  • \(\llbracket p \rrbracket _{\mathcal {D}} = \{ s \, : \, p \, \in \, L(s)\}\) (by definition, \(\llbracket \top \rrbracket _{\mathcal {D}} = S\) and \(\llbracket \bot \rrbracket _{\mathcal {D}} = \emptyset \))

  • \(\llbracket \lnot p \rrbracket _{\mathcal {D}} = S \smallsetminus \llbracket p \rrbracket _{\mathcal {D}} \)

  • \(\llbracket \varphi _1 \wedge \varphi _2 \rrbracket _{\mathcal {D}} = \llbracket \varphi _1 \rrbracket _{\mathcal {D}} \cap \llbracket \varphi _2 \rrbracket _{\mathcal {D}}\)

  • \(\llbracket \varphi _1 \vee \varphi _2 \rrbracket _{\mathcal {D}} = \llbracket \varphi _1 \rrbracket _{\mathcal {D}} \cup \llbracket \varphi _2 \rrbracket _{\mathcal {D}}\)

Figure 2 shows the semantics of the temporal operators used in this work: (a) considering each effect of action a when applied in \(s_0\), p is globally true (\(\forall \dot{G}\)); (b) considering each effect of action a when applied in \(s_0\), p is true until q is true (\(\forall \dot{U}\)); and (c) considering some effect of action a when applied in \(s_0\), p is true until q is true (\(\exists \dot{U}\)).

Fig. 2.

Semantics of the temporal operators of the logic \(\alpha \)-CTL.

3.4 Reasoning About Non-deterministic Actions

Representing a planning domain through an action language such as PDDL allows a compact representation of the state space. We can then start from the initial state and progressively advance, determining a sequence of actions and states that leads to the goal state (progressive search [10]); or start from the goal states and regressively determine the sequence of actions and states that leads back to the initial state (regressive search [24]).

From the representation of actions through preconditions and effects, it is possible to compute the set of states Y that precedes a set of states X through regression operations. The regression of a set of states X by an action a leads to a set Y of predecessor states. However, due to the non-determinism of actions, by applying an action a in Y, the states in X can be necessarily (strong regression) or possibly (weak regression) achieved [18].

Using the representation of states and actions as propositional formulas [18], it is possible to compute the set of predecessor states (weak and strong regression operations). These operations were implemented using Binary Decision Diagrams [1] and incorporated into a planner capable of performing symbolic model checking directly over the action representation with preconditions and effects.

3.5 Planning as \(\alpha -\)CTL Model Checking

In this section, we briefly describe the planning as \(\alpha -\)CTL model checking algorithms used in this work. The algorithms receive the planning problem, whose domain is given by actions with preconditions and effects; an initial state \(s_0\); a goal formula \(\varphi \); and a propositional atom p, representing the qualitative preference to be satisfied in all states along the path to achieve the goal.

The algorithm SAT-AU computes the submodel of the domain that satisfies the \(\alpha \)-CTL formula \(\forall (\varphi _1 \,\dot{U}\, \varphi _2)\). It performs strong regression [17], computing a path that reaches a goal satisfying \(\varphi _2\) while each state along the path from \(s_0\) also satisfies the formula \(\varphi _{1}\). This operation computes the predecessor states directly using actions defined in terms of preconditions and effects (Fig. 3).

Fig. 3.

SAT-AU computes the set of states satisfying \(\forall (\varphi _1 \,\dot{U}\, \varphi _2)\).

Figure 4 provides an example of how to compute the set of states that satisfies \(\forall (p \,\dot{U}\, \varphi )\), where \(\varphi \) is the goal formula \(\lnot p \wedge q \wedge r\). The algorithm starts by computing the set of state-action pairs whose states satisfy \(\varphi \) (\(Y = \{(s_5, \tau )\}\)), as shown in Fig. 4(a). In the first iteration (Fig. 4b), the algorithm computes the strong regression of the set of states satisfying \(\varphi \) and performs the intersection with the set of states satisfying p, obtaining the set of state-action pairs \(Y = \{(s_5\), \(\tau \)), \((s_2,a_6)\), \((s_3,a_7)\}\). In the second iteration (Fig. 4c), the strong regression of the set of states in Y is computed and the intersection with the set of states satisfying p is done, resulting in the set of state-action pairs \(Y = \{(s_5\), \(\tau \)), \((s_2,a_6), (s_3, a_7), (s_0,a_2), (s_0, a_3) \}\). The fixed point is reached after the second iteration, since no new state-action pair is obtained by the strong regression operation on the set Y.
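The iterations of Fig. 4 can be reproduced with a least-fixpoint loop over the strong preimage. This is a minimal sketch of the idea, not the planner's implementation: the domain is transcribed from Fig. 1b, and the assumption that p holds in \(s_0\)–\(s_3\) is consistent with Example 3:

```python
# Fig. 1b domain, including the trivial action tau on the goal state.
T = {
    ("s0", "a1"): {"s1"}, ("s0", "a2"): {"s2", "s3"}, ("s0", "a3"): {"s3"},
    ("s1", "a4"): {"s4", "s5"}, ("s2", "a6"): {"s5"}, ("s3", "a7"): {"s5"},
    ("s5", "tau"): {"s5"},
}
P_STATES = {"s0", "s1", "s2", "s3"}  # states where p holds (assumed labels)
GOAL = {"s5"}                        # states satisfying ¬p ∧ q ∧ r

def strong_preimage(Y):
    """State-action pairs whose every outcome lands inside the state set Y."""
    return {(s, a) for (s, a), succ in T.items() if succ and succ <= Y}

def sat_au(pref, goal):
    """Least fixpoint for A(pref U goal) via strong regression (SAT-AU sketch)."""
    Y = {(s, "tau") for s in goal}   # seed: goal states with the trivial action
    while True:
        states = {s for s, _ in Y}
        new = {(s, a) for (s, a) in strong_preimage(states)
               if s in pref or s in goal}
        if new <= Y:                 # fixed point: nothing new was added
            return Y
        Y |= new

policy = sat_au(P_STATES, GOAL)
# policy == {('s5','tau'), ('s2','a6'), ('s3','a7'), ('s0','a2'), ('s0','a3')}
```

The two loop iterations produce exactly the sets Y of Fig. 4(b) and 4(c); \((s_0, a_1)\) is never added, because \(a_4\) cannot guarantee reaching \(s_5\).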

Similarly, the algorithm SAT-EU computes the set of states that satisfies the formula \(\exists (\varphi _1 \,\dot{U}\, \varphi _2)\), but using weak regression operations [17]. In addition, the algorithm SAT-AG computes a submodel satisfying the \(\alpha \)-CTL formula \(\forall \dot{G}\, \varphi _1\). It performs a regressive search from the goal states (states that satisfy \(\varphi _1\)) towards the initial state \(s_0\) by using strong regression operations [17].
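Swapping strong regression for weak regression gives a SAT-EU sketch under the same assumptions (domain of Fig. 1b; p assumed to hold in \(s_0\)–\(s_3\), consistent with Examples 2–4):

```python
# Fig. 1b domain, including the trivial action tau on the goal state.
T = {
    ("s0", "a1"): {"s1"}, ("s0", "a2"): {"s2", "s3"}, ("s0", "a3"): {"s3"},
    ("s1", "a4"): {"s4", "s5"}, ("s2", "a6"): {"s5"}, ("s3", "a7"): {"s5"},
    ("s5", "tau"): {"s5"},
}
P_STATES = {"s0", "s1", "s2", "s3"}  # states where p holds (assumed labels)
GOAL = {"s5"}

def weak_preimage(Y):
    """State-action pairs with at least one outcome inside the state set Y."""
    return {(s, a) for (s, a), succ in T.items() if succ & Y}

def sat_eu(pref, goal):
    """Least fixpoint for E(pref U goal) via weak regression (SAT-EU sketch)."""
    Y = {(s, "tau") for s in goal}
    while True:
        states = {s for s, _ in Y}
        new = {(s, a) for (s, a) in weak_preimage(states)
               if s in pref or s in goal}
        if new <= Y:
            return Y
        Y |= new

weak_policy = sat_eu(P_STATES, GOAL)
```

Here `weak_policy` additionally contains \((s_0, a_1)\) and \((s_1, a_4)\): the weak solution of Example 4, which only possibly reaches the goal.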

Fig. 4.

Computing the always preference for strong solutions.

4 Specifying Preferences over Policies in \(\alpha \)-CTL

In this section we show how to specify preferences over policies in the \(\alpha -\)CTL temporal logic. To the best of our knowledge, this work is the first step toward extending the notion of qualitative preferences in non-deterministic environments to also include the quality of the policy (weak, strong or strong-cyclic). In order to do so, we apply the notion of extended goals in \(\alpha \)-CTL [27] to express qualitative preferences.

In this work, we specify the qualitative preference always(p), which expresses that the property p must hold in all states on the path to achieve the goal. Consider the planning problem \(P = \langle \mathcal {D}, s_0, \varphi \rangle \) where \(\mathcal {D}\) is the domain in Fig. 4a, \(s_0\) is the initial state and the goal \(\varphi = \lnot p \wedge q \wedge r\) is satisfied in state \(s_5\). The non-deterministic actions are \(a_2\) and \(a_4\). Notice that:

  • The solution \(\{(s_0, a_1), (s_1, a_4)\}\) is a weak policy that satisfies the preference always(p) on the path to reach the goal state;

  • The solution \(\{(s_0, a_2), (s_2, a_6), (s_3,a_7)\}\) is a strong solution that satisfies the preference always(p) on the path to reach the goal state;

  • The solution \(\{(s_0, a_3), (s_3, a_7)\}\) is also a strong solution that satisfies the preference always(p).

4.1 Specifying the always Preference for Weak Policies

Weak policies can reach a goal state, but there is no guarantee due to the non-determinism of actions. In this section, we define the qualitative preference \(\texttt {always}\) for weak policies and present an \(\alpha \)-CTL formula for it.

Definition 13

(Preference always for weak policies). Let \(P=\langle \mathcal {D}, s_{0}, \varphi \rangle \) be a non-deterministic planning problem, \(\pi \) a weak policy for P, and \(S_{\mathcal {D}[\pi ]}\) the execution structure induced by the weak policy \(\pi \). The preference always(p) (\(p \in \mathbb {P}\)) for a weak policy is satisfied in \(S_{\mathcal {D}[\pi ]}\) iff there is some execution path \(\mathcal {P}_{\pi } = s_0, s_1, \ldots , s_i\) such that \(s_i \models \varphi \) and, \(\forall k, 0 \le k \le i-1\), \(s_{k} \in \mathcal {P}_\pi \) and \(p \in L(s_k)\).

Definition 14

(Specifying the always preference for weak policies in \(\alpha \) -CTL). Let \(P=\langle \mathcal {D}, s_{0}, \varphi \rangle \) be a non-deterministic planning problem with signature \((\mathbb {P}, \mathbb {A})\) and always(p) \((p\in \mathbb {P})\) a preference over P. This preference for a weak policy can be expressed using the following \(\alpha -\)CTL formula:

                                           \(\exists (p \,\dot{U}\, \varphi )\).

To prove that the \(\alpha \)-CTL formula \(\exists (p \,\dot{U}\, \varphi )\) specifies the always(p) preference for a weak solution \(\pi \) for a planning problem \(P = \langle \mathcal {D}, s_0, \varphi \rangle \), we must show that it is possible to obtain a submodel \(S_{\mathcal {D}[\pi ]} \subseteq \mathcal {D}\) induced by \(\pi \), where: (1) \(s_0 \in S_{\mathcal {D}[\pi ]}\); (2) there is an execution path \(\mathcal {P}_{\pi } \in S_{\mathcal {D}[\pi ]}\) that reaches a goal state \(s_i \models \varphi \) from the initial state \(s_0\), and \(\forall k, 0 \le k \le i-1\), \( s_{k} \in {\mathcal {P}_\pi }\) and \(p \in L(s_k)\).

Proof

Consider the non-deterministic planning problem \(P = \langle \mathcal {D}, s_0, \varphi \rangle \). Assume that \(s_0 \models \exists (p \,\dot{U}\, \varphi )\). According to the \(\alpha \)-CTL semantics, there is an action \(a \in \mathbb {A}\) such that, for some path \(\mathcal {P}_\pi \) starting at \(s_0\), there is a state \(s_i\) (\(i \ge 0\)) along this path such that \(s_i \models \varphi \) and, for every \(0 \le k < i\), we have \(s_k \models p\). Then there is an execution path for a policy \(\pi \) that leads to the goal \(\varphi \). Consequently, there is an execution structure \(S_{\mathcal {D}[\pi ]}\) induced by the policy \(\pi \) (Definition 5) such that \(\mathcal {P}_\pi \in S_{\mathcal {D}[\pi ]}\). Thus, we have \(s_0 \in S_{\mathcal {D}[\pi ]}\), satisfying condition 1. Since there is a \(\mathcal {P}_\pi \in S_{\mathcal {D}[\pi ]}\) which reaches a goal state \(s_i \models \varphi \) from the initial state \(s_0\) and, \(\forall k, 0 \le k < i\), \( s_{k} \in {\mathcal {P}_\pi }\) with \(s_k \models p\), condition 2 is satisfied.

4.2 Specifying the Always Preference for Strong Policies

Strong policies guarantee that all sequences of states obtained from the execution of the policy reach the goal. In this section we define how one can obtain a strong solution for a non-deterministic planning problem with the \(\texttt {always}\) preference.

Definition 15

(Preference always for strong policies). Let \(P=\langle \mathcal {D}, s_{0}, \varphi \rangle \) be a non-deterministic planning problem with signature \((\mathbb {P}\), \(\mathbb {A})\), \(\pi \) a strong policy for P, and \(S_{\mathcal {D}[\pi ]}\) the execution structure induced by \(\pi \). The always(p) \((p \in \mathbb {P})\) preference for a strong policy is satisfied in \(S_{\mathcal {D}[\pi ]}\) iff, for every execution path \(\mathcal {P}_{\pi } = s_0, s_1, \ldots , s_i\), we have \(s_i \models \varphi \) and, \( \forall k, 0 \, \le k\le \,i-1\), \(s_{k} \in {\mathcal {P}_\pi }\) with \(p \in L(s_k)\).

Definition 16

(Specifying the always preference for strong policies in \(\alpha \) -CTL). Let \(P=\langle \mathcal {D}, s_{0}, \varphi \rangle \) be a non-deterministic planning problem with signature \((\mathbb {P}, \mathbb {A})\) and always(p) \((p\in \mathbb {P})\) a qualitative preference over P. This preference for a strong policy can be expressed by the following \(\alpha -\)CTL formula:                                            \(\forall (p \,\dot{U}\, \varphi )\).

In order to prove that the \(\alpha \)-CTL formula \(\forall (p \,\dot{U}\, \varphi )\) specifies the always(p) preference for a strong policy for a non-deterministic planning problem \(P = \langle \mathcal {D}, s_0, \varphi \rangle \), we have to show that it is possible to obtain a submodel \(S_{\mathcal {D}[\pi ]} \subseteq \mathcal {D}\) induced by the strong policy \(\pi \) that satisfies the following conditions: (1) \(s_0 \in S_{\mathcal {D}[\pi ]}\); (2) for all execution paths \(\mathcal {P}_{\pi } \in S_{\mathcal {D}[\pi ]}\), we have \(s_i \models \varphi \) and, \(\forall k, 0 \le k \le i-1\), \( s_{k} \in {\mathcal {P}_\pi }\) and \(p \in L(s_k)\).

Proof

Consider a non-deterministic planning problem \(P = \langle \mathcal {D}, s_0, \varphi \rangle \). Assume that \(s_0 \models \forall (p \,\dot{U}\, \varphi )\). According to the \(\alpha \)-CTL semantics, there is an action \(a \in \mathbb {A}\) such that, for every execution path \(\mathcal {P}_\pi \) starting in \(s_0\), there is a state \(s_i\) (\(i \ge 0\)) in the path such that \(s_i \models \varphi \) and, for each \(0 \le k < i\), we have \(s_k \models p\). Thus, we can affirm that every \(\mathcal {P}_\pi \) is an execution path. Consequently, there is an execution structure \(S_{\mathcal {D}[\pi ]}\) induced by the policy \(\pi \) (Definition 5) such that every \(\mathcal {P}_\pi \in S_{\mathcal {D}[\pi ]}\). Thus, \(s_0 \in S_{\mathcal {D}[\pi ]}\), satisfying condition 1. Since every \(\mathcal {P}_\pi \in S_{\mathcal {D}[\pi ]}\) reaches a goal state \(s_i \models \varphi \) from the initial state \(s_0\) and, \(\forall k, 0 \le k < i\), \( s_{k} \in {\mathcal {P}_\pi }\) with \(s_k \models p\), condition 2 is satisfied.

4.3 Specifying the Always Preference for Strong-Cyclic Policies

Strong-cyclic policies guarantee that every sequence of states obtained from the policy execution reaches the goal, under the assumption that the execution will eventually exit all existing cycles. In this section, we define how to specify the \(\texttt {always}\) preference for a strong-cyclic policy.

Definition 17

(Preference always for strong-cyclic policies). Let \(P=\langle \mathcal {D}, s_{0}, \varphi \rangle \) be a non-deterministic planning problem, where \(\mathcal {D}\) has signature \((\mathbb {P}\), \(\mathbb {A})\), \(\pi \) a strong-cyclic policy for P, and \(S_{\mathcal {D}[\pi ]}\) the execution structure induced by \(\pi \). The always(p) preference \((p \in \mathbb {P})\) for a strong-cyclic policy is satisfied in \(S_{\mathcal {D}[\pi ]}\) iff, for every execution path \(\mathcal {P}_{\pi }\) that reaches a goal state \(s_i \models \varphi \), we have \( \forall k, 0 \, \le k\le \,i-1\), \(s_{k} \in {\mathcal {P}_{\pi }}\) and \(p \in L(s_k)\). Furthermore, it is possible to reach a goal state from each \(s_k \in \mathcal {P}_{\pi }\).

Definition 18

(Specifying the always preference for strong-cyclic policies in \(\alpha \) -CTL). Let \(P=\langle \mathcal {D}, s_{0}, \varphi \rangle \) be a non-deterministic planning problem, where \(\mathcal {D}\) has signature \((\mathbb {P}, \mathbb {A})\), and always(p) \((p\in \mathbb {P})\) a qualitative preference over P. This preference for a strong-cyclic policy can be expressed by the following \(\alpha \)-CTL formula:

                                           .

In order to show that the \(\alpha \)-CTL formula specifies the always(p) preference for a strong-cyclic policy for a non-deterministic planning problem \(P =\langle \mathcal {D},s_0,\varphi \rangle \), we have to show that it is possible to obtain a submodel \(S_{\mathcal {D}[\pi ]} \subseteq \mathcal {D}\), such that: (1) \(s_0 \in S_{\mathcal {D}[\pi ]} \); (2) for all execution paths \(\mathcal {P}_{\pi } \in S_{\mathcal {D}[\pi ]}\), we have \(\forall k, 0 \le k \le i-1\), \( s_{k} \in {\mathcal {P}_\pi }\) and \(s_k \models p\); and (3) it is possible to exit all cycles and reach a goal state from any state in the execution path.

Proof

Consider a non-deterministic planning problem \(P = \langle \mathcal {D}, s_0, \varphi \rangle \). Assume that \(s_0\) satisfies this formula. According to the \(\alpha \)-CTL semantics, there is an action \(a \in \mathbb {A}\) such that, for some path \(\mathcal {P}_\pi \) starting in \(s_0\), there is a state \(s_i\) \((i \ge 0)\) in this path such that \(s_i \models \varphi \) and, for each \(0 \le k < i\), we have \(s_k \models p\). Thus, each such path \(\mathcal {P}_\pi \) is an execution path. Consequently, there is an execution structure \(S_{\mathcal {D}[\pi ]}\), induced by \(\pi \), such that \(\mathcal {P}_\pi \in S_{\mathcal {D}[\pi ]} \). Thus, we have \(s_0 \in S_{\mathcal {D}[\pi ]}\), satisfying condition 1. Since there is a \(\mathcal {P}_\pi \in S_{\mathcal {D}[\pi ]}\) reaching a goal state \(s_i\), with \(s_i \models \varphi \), from \(s_0\), and \(\forall k, 0 \le k < i\), \( s_{k} \in {\mathcal {P}_\pi }\) and \(s_k \models p\), condition 2 is satisfied. Furthermore, according to the \(\alpha \)-CTL semantics, there is an action \(a \in \mathbb {A}\) such that, from every state \(s_i\) (\(i \ge 0\)) in every path starting in \(s_0\), we can reach a goal state by visiting only states where property p is true. Thus, condition 3 is also satisfied.
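Condition (3) is what distinguishes the strong-cyclic case: cycles are permitted in the execution structure, provided a goal state remains reachable from every visited state through states satisfying p. The following Python sketch is a hedged illustration of this check under the same hypothetical encoding as before (`transitions`, `policy`, `label`, and `goals` are names assumed here for the example; this is not the paper's PACTL implementation):

```python
from collections import deque

# Hypothetical encoding (assumption, not from the paper):
#   transitions[(state, action)] -> set of successor states
#   policy[state] -> action; label[state] -> set of atoms; goals: set of states
def satisfies_always_strong_cyclic(s0, policy, transitions, label, goals, p):
    """Check conditions (1)-(3): every reachable non-goal state satisfies p,
    and a goal state is reachable from every reachable state (cycles allowed,
    assuming execution eventually exits them)."""
    # Collect all states reachable from s0 under the policy (condition 1/2).
    reachable, frontier = {s0}, deque([s0])
    while frontier:
        s = frontier.popleft()
        if s in goals:                       # policy is undefined on goal states
            continue
        if p not in label[s]:                # always(p) violated
            return False
        for t in transitions[(s, policy[s])]:
            if t not in reachable:
                reachable.add(t)
                frontier.append(t)
    # Condition 3: a goal state must still be reachable from every state.
    for s in reachable:
        seen, queue, goal_found = {s}, deque([s]), False
        while queue:
            u = queue.popleft()
            if u in goals:
                goal_found = True
                break
            for t in transitions[(u, policy[u])]:
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
        if not goal_found:
            return False
    return True
```

Note the difference from the strong case: a self-loop such as "retry the action until it succeeds" passes this check, because the goal remains reachable from inside the cycle, whereas a strong-policy verifier would reject the cycle outright.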

5 Experimental Analysis

Our preliminary experimental analysis aims to verify the feasibility of finding policies that satisfy the qualitative preference specification. For this, we use the deterministic Rover domain with preferences from the \(5^{th}\) International Planning Competition, modified to include non-deterministic actions (to the best of our knowledge, there is currently no benchmark domain with these characteristics). Rovers models a planetary exploration mission on Mars, where rovers are equipped with devices including cameras, sample collectors, and data transmitters. The goal of the rovers is to explore the planet surface and send the collected data to a space station. The qualitative preference always is used to prevent the rover from sampling soil and rock at a specific waypoint. We use the planner PACTL [28] to obtain the set of states satisfying the \(\alpha \)-CTL formulas that specify the qualitative preference \(\texttt {always}\) as well as the quality of the policy. The experiments were performed on a Xeon Quad Core server with 16 GB of RAM. A timeout of 45 min was set for solving each instance.

Table 1 presents, for each instance and qualitative preference specified, the execution time and number of steps to obtain a policy. The symbol “-” indicates that the timeout was reached. The instances have different levels of difficulty, which are characterized by factors such as the number of regions that need to be traversed, the number of goals to be achieved, and the number of rovers involved. The planner found strong, weak, and strong-cyclic policies for all instances from 1 to 8 (except for instance 6). From instance 9 onwards, the planner was only able to find solutions for instances 14 and 19. In cases where the planner fails to find a solution, it is difficult to determine whether the problem has no solution or the planner was simply unable to find one without exceeding the timeout.

Table 1. Time and number of steps to compute policies with preferences.

6 Conclusion and Future Work

In this work, we propose using the \(\alpha \)-CTL temporal logic to specify the always qualitative preference in non-deterministic planning problems. In order to obtain weak, strong, or strong-cyclic policies for non-deterministic problems with qualitative preferences, we employ planning as \(\alpha \)-CTL model checking algorithms. To demonstrate the suitability of these algorithms for solving non-deterministic problems with qualitative preferences, we modified the existing deterministic Rover domain with preferences by incorporating non-deterministic effects into the actions. The instances of this modified domain were used in our experiments, which aimed to show that planning as model checking algorithms can effectively provide solutions for non-deterministic problems with qualitative preferences while considering the policy quality. To the best of our knowledge, this is the first work that addresses non-deterministic problems with preferences while incorporating a preference over the policy quality.

As future work, we aim to investigate the suitability of the \(\alpha \)-CTL logic for expressing other types of qualitative preferences and, consequently, whether planning as model checking algorithms are appropriate for solving non-deterministic planning problems with these other kinds of preferences. Furthermore, we aim to conduct experiments using different benchmark domains.