1 Introduction

The allocation of human resources to solve tasks is part of the daily routine of the most diverse institutions, from the assignment of employees in a company to complex tasks [23] to the assignment of public defenders to defendants by a court [1]. In these examples, if the decision-maker who allocates resources to the tasks knows the ability of each resource to solve them, the Hungarian Algorithm yields the optimal solution [15]. However, this is not always the case: in numerous scenarios the abilities of the resources are not known beforehand, for example, when a non-governmental organization allocates tasks to sporadic volunteers [5, 14]. Another example is crowdsourcing, the recent tendency of certain companies to allocate tasks to non-specialized outsourced workers [26, 28].

Some works use prior information about the resources to estimate their ability to solve the tasks [8, 11]. To guarantee the quality of the solutions given by unknown resources, their ability to accomplish the tasks must be estimated. This can be done by conducting a test before the allocation of the tasks [25], or by estimating the abilities dynamically during the allocation process, based on the results of previous assignments [31]. Here, the resources' abilities are estimated dynamically during the process and updated after each response. Many works impose a threshold so that weaker resources do not receive any task [9, 25, 31]; instead, we consider the resources scarce and try to use as much of their availability as possible. We allocate the dynamic and finite resources that arrive sequentially to try to solve a set of known tasks.

We consider a finite number of human resources arriving sequentially; tasks are assigned to resources dynamically, and the resources either complete them correctly or fail. Each task must be completed a given number of times, and each resource can receive an independent number of tasks. No task is submitted twice to the same resource, but each task can be submitted as many times as needed to different resources. Resources, tasks, and the probability of a correct response are modeled using Item Response Theory [6]. The task parameters are known, while the ability of the resources must be learned through the interaction between resources and tasks. We propose an algorithm that uses shadow test replanning [17] to replan under uncertain knowledge, with the goal of optimally allocating resources to tasks while maximizing the number of completed tasks. Each replanning consists of solving an optimization problem through linear programming. The results are compared with two baselines: random selection, and easier-first selection, which assigns the easiest available task as the resources arrive sequentially. In summary, the contributions of this paper are: (i) a novel problem, the allocation of dynamic, finite resources to a set of finite binary tasks, is formalized; and (ii) algorithms are proposed to solve the problem in different scenarios while using most of the resources' availability.

The rest of this paper is structured as follows: the proposed allocation problem is defined in Sect. 2. The proposed framework, with the definition of three algorithms, is presented in Sect. 3. Section 4 gives an overview of related works on the state of the art of task allocation. We present and discuss the results in Sect. 5, before concluding in Sect. 6.

2 The Allocation of Dynamic and Finite Resources to a Set of Binary Tasks Problem

We consider the problem of repeatedly allocating resources to fulfill one of a set of tasks when finite resources become available dynamically. First, the decision-maker knows a set of tasks, each of which must be solved an arbitrary number of times. Second, resources become available dynamically to receive a number of tasks to try to solve. Third, the decision-maker allocates the current resource to tasks, one after the other, until the resource's availability ends. Fourth, for each allocated task, the resource may either fulfill it or fail. Finally, the decision-maker must allocate tasks so as to fulfill the largest number of tasks. We consider the following process:

  1. While there is time, wait for a resource:

     (a) A resource arrives. While the resource is available:

         i. choose and present a task to the resource under two constraints: (i) the task has been solved successfully fewer times than the desired level; and (ii) the task has not been presented to the current resource;

         ii. the resource tries to solve the task, and it is observed whether the task was solved or not.

This process is formalized mathematically by the following definition:

Definition 1

(Dynamic Allocation of Resources to solve Tasks Multiple Times - DART-MT). Let \(\mathcal {T}\) be a set of tasks and \(\mathcal {R}\) a set of human resources. Consider the function \(n: \mathcal {T} \rightarrow \mathbb {N}\), where n(t) indicates the number of times task t must be solved, and the function \(m: \mathcal {R} \rightarrow \mathbb {N}\), where m(r) indicates the number of tasks that can be assigned to resource r. \(\zeta _{t}\) is a vector of parameters for task \(t \in \mathcal {T}\), and \(\theta _{r}\) is a vector of parameters for resource \(r \in \mathcal {R}\). \(f(\zeta _{t}, \theta _{r})\) gives the probability of task t being fulfilled by resource r. Resources arrive in an order \(Q = (r_{1}, r_{2}, \ldots , r_{|\mathcal {R}|})\), where \(r_{i} \in \mathcal {R}\) and \(r_{j} \ne r_{i}\) for all \(i \ne j\). The problem of dynamic resource allocation to solve tasks multiple times is defined by the tuple \((\mathcal {T}, \mathcal {R}, n, m, \{\mathbf {\zeta }_{t}\}, \{\mathbf {\theta }_{r}\}, f, Q)\), and the objective is to maximize the expected number of solved tasks.

Because the result of a resource r trying to solve a task t is stochastic, an optimal solution is clearly contingent on past results (in Markov Decision Process jargon, a policy). Here, we consider different scenarios of knowledge about the DART-MT problem, where optimal solutions are contingent on information collected during the process. We consider three different scenarios. In every scenario, the probability function f and the task parameters and demands (\(\mathcal {T}\), n, and \(\{\mathbf {\zeta }_{t}\}\)) are known beforehand; they may be obtained from calibration on another set of resources. Information about the resources (\(\mathcal {R}\), m, \(\{\mathbf {\theta }_{r}\}\), and Q) may be revealed directly or indirectly during the process. We differentiate knowledge of the current resource from knowledge of future resources to define the following scenarios:

  • (KK) known-known: fully revealed, i.e., the resources \(\mathcal {R}\), availability function m, parameters \(\{\mathbf {\theta }_{r}\}\), and arrival order Q are known beforehand;

  • (KU) known-unknown: the number of resources \(|\mathcal {R}|\) and the current resource r are fully revealed (availability m(r) and parameters \(\mathbf {\theta }_{r}\)), while future resources are known to be drawn from a priori distributions on parameters \(\varTheta \) and availability M; and

  • (UU) unknown-unknown: the number of resources \(|\mathcal {R}|\) and the a priori distributions on parameters \(\varTheta \) and availability M are revealed; the current resource r reveals its availability m(r), while its parameters must be learnt from the results of submitting tasks to it.

We note that the objective of solving tasks gets more difficult as knowledge is hidden from the decision-maker, i.e., a decision-maker should solve the largest number of tasks in scenario KK and the smallest in scenario UU. To obtain an upper bound for the DART-MT problem, we relax the problem with respect to the allocation and the results of submitting a task to a resource. First, we allow a resource r to be partially allocated to a task t (decision \(s_{r,t}\)). Second, a resource r solves a task deterministically and partially, in the amount given by \(P_{r, t} = f(\zeta _{t},\theta _{r})\). Third, \(x_{r,t}=s_{r, t} P_{r, t}\) indicates the amount of task t solved by resource r. Then, the following linear program (LP) gives an upper bound for the DART-MT problem:

$$\begin{aligned} \max \quad & \sum _{t \in \mathcal {T}} \sum _{r \in \mathcal {R}} x_{r, t} & \end{aligned}$$
(1)
$$\begin{aligned} \text {subject to} \quad & s_{r, t} \in [0,1] & \forall r \in \mathcal {R}, t \in \mathcal {T} \end{aligned}$$
(2)
$$\begin{aligned} & x_{r, t} = s_{r, t} P_{r, t} & \forall r \in \mathcal {R}, t \in \mathcal {T} \end{aligned}$$
(3)
$$\begin{aligned} & \sum _{t \in \mathcal {T}} s_{r, t} \le m(r) & \forall r \in \mathcal {R} \end{aligned}$$
(4)
$$\begin{aligned} & \sum _{r \in \mathcal {R}} x_{r, t} \le n(t) & \forall t \in \mathcal {T} \end{aligned}$$
(5)
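The paper solves its LPs with PuLP and the CBC solver; as a minimal sketch of the upper-bound LP in Eqs. 1–5 (the function name and the use of SciPy's `linprog` are our illustration, not the paper's implementation), the relaxation can be written as:

```python
import numpy as np
from scipy.optimize import linprog

def upper_bound_lp(P, m, n):
    """Solve the relaxed LP of Eqs. (1)-(5).

    P : (R, T) array with P[r, t] = f(zeta_t, theta_r)
    m : length-R availabilities m(r); n : length-T demands n(t)
    Returns the optimal fractional allocation s (R, T) and the
    upper bound on the expected number of solved tasks.
    """
    R, T = P.shape
    c = -P.ravel()                       # maximize sum_t sum_r x = s * P
    A, b = [], []
    for r in range(R):                   # Eq. (4): sum_t s[r, t] <= m(r)
        row = np.zeros(R * T)
        row[r * T:(r + 1) * T] = 1.0
        A.append(row); b.append(m[r])
    for t in range(T):                   # Eq. (5): sum_r P[r, t] s[r, t] <= n(t)
        row = np.zeros(R * T)
        row[t::T] = P[:, t]
        A.append(row); b.append(n[t])
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=(0.0, 1.0), method="highs")  # Eq. (2): s in [0, 1]
    return res.x.reshape(R, T), -res.fun
```

A solver that handles Eq. 3 implicitly, as above (substituting \(x = sP\) into the objective and demand constraints), keeps only the \(s_{r,t}\) as decision variables.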

3 \(\mathbf {[STA]^2O}\): A Shadow Test Approach to Skill-Based Task Allocation Optimization

To solve the DART-MT problem in each of the three scenarios (KK, KU, and UU), we consider three levels of time abstraction: episodes, rounds, and steps. An episode comprises the whole DART-MT process, in which each resource \(r\in \mathcal {R}\) tries to solve m(r) tasks. A round comprises the whole interaction with a resource r, in which that resource tries to solve m(r) tasks. A step comprises the interaction of a resource r with a task t. For each of the three scenarios, we define a different algorithm; all of them are based on the solution of an LP similar to the one in Eqs. 1–5. At any step j of a round i, consider the current resource \(r_{i}\) and the set of tasks \(T_{i,j}\) that were presented to resource \(r_{i}\) before step j or have already been completely solved as demanded by the function n(t); then, at step j a task t is drawn with probability:

$$\begin{aligned} P_{t} = \frac{s_{r_{i},t}}{\sum _{t'\in \mathcal {T}\setminus T_{i,j}}s_{r_{i},t'}}, \end{aligned}$$
(6)

where \(s_{r,t}\) for any \(r\in \mathcal {R},t\in \mathcal {T}\) is the solution of the LP.
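The stochastic task selection of Eq. 6 can be sketched as follows (function name and fallback behavior when all remaining LP mass is excluded are our assumptions):

```python
import random

def draw_task(s_row, excluded, rng=random):
    """Draw a task index with probability proportional to the LP
    solution s[r_i, t], restricted to tasks not in T_{i,j} (Eq. 6).

    s_row    : list of s[r_i, t] values for the current resource
    excluded : set of task indices already presented or fully solved
    """
    candidates = [t for t in range(len(s_row)) if t not in excluded]
    weights = [s_row[t] for t in candidates]
    total = sum(weights)
    if total == 0:                  # all remaining mass excluded: uniform fallback
        return rng.choice(candidates)
    u, acc = rng.random() * total, 0.0
    for t, w in zip(candidates, weights):
        acc += w
        if u < acc:                 # inverse-CDF sampling over the candidates
            return t
    return candidates[-1]
```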

To solve the DART-MT problem in scenario KK, an LP is solved once per episode. In scenario KU, an LP is solved once per round. In scenario UU, an LP is solved at every step. In scenarios KU and UU, where knowledge is not fully revealed a priori, LPs are defined based on population information, and whenever new information is revealed, per round or per step, a new LP is solved. This strategy of fully planning based on population information and replanning when information is renewed is used in the Shadow Test Approach (STA) [18, 19, 30]. Algorithm 1 shows a general solution to DART-MT problems conditioned on each of the three scenarios.

Algorithm 1

The general solution to DART-MT, conditioned on the scenario
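Since Algorithm 1 is rendered as a figure, a minimal runnable sketch of the control flow described in the text may help; all names, signatures, and the callable placeholders here are illustrative, not the paper's:

```python
def run_episode(scenario, resources, solve_lp, draw_task, submit):
    """Sketch of the general [STA]^2O loop: replan per episode (KK),
    per round (KU), or per step (UU), then sample tasks via Eq. (6).

    solve_lp()               -> LP solution s (a plan)
    draw_task(s, r, seen)    -> next task index, or None if none remains
    submit(r, t)             -> submit task t and observe success/failure
    """
    s = solve_lp()                         # KK: plan once per episode
    for r in resources:                    # one round per arriving resource
        if scenario in ("KU", "UU"):
            s = solve_lp()                 # KU/UU: replan when r is revealed
        presented = set()
        for _ in range(r.availability):    # steps within the round
            t = draw_task(s, r, presented)
            if t is None:
                break
            submit(r, t)                   # observe the binary outcome
            presented.add(t)
            if scenario == "UU":
                s = solve_lp()             # UU: replan after every step
```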

KK Scenario and per Episode Algorithm - LPPE. Because in scenario KK knowledge is fully revealed a priori, the LP to be solved is the one described in Eqs. 1–5. The solution to the LP provides a set of (partial) task allocations for each resource. Note that tasks are selected stochastically, but with the guarantee that a task t is never submitted twice to the same resource nor beyond its demand n(t). Since planning occurs only once, the only adaptation occurs through the set \(T_{i,j}\), which avoids violating the constraints of the DART-MT problem.

KU Scenario and the per Round Algorithm - LPPR. Let \(\hat{\mathcal {R}}\) be the set of resources that have already been selected from \(\mathcal {R}\), \(\tilde{\mathcal {R}}\) be the set of \(|\mathcal {R}| - |\hat{\mathcal {R}}| - 1\) resources drawn from distributions \(\varTheta \) and M, and \(r_{i}\) be the current resource. An adaptation of the previous LP problem is modeled as follows:

$$\begin{aligned} \max \quad & \sum _{t\in \mathcal {T}} \sum _{r \in \tilde{\mathcal {R}} \cup \{r_{i}\}} x_{r, t} & \end{aligned}$$
(7)
$$\begin{aligned} \text {subject to}\quad & s_{r, t} \in [0,1] & \forall r \in \tilde{\mathcal {R}} \cup \{r_{i}\}, t \in \mathcal {T} \end{aligned}$$
(8)
$$\begin{aligned} & x_{r, t} = s_{r, t} P_{r,t} & \forall r \in \tilde{\mathcal {R}} \cup \{r_{i}\}, t \in \mathcal {T} \end{aligned}$$
(9)
$$\begin{aligned} & \sum _{t \in \mathcal {T}} s_{r, t} \le m(r) & \forall r \in \tilde{\mathcal {R}} \cup \{r_{i}\} \end{aligned}$$
(10)
$$\begin{aligned} & \sum _{r \in \tilde{\mathcal {R}} \cup \{r_{i}\}} x_{r, t} + \sum _{r \in \hat{\mathcal {R}}} \hat{ x}_{r,t} \le n(t) & \forall t \in \mathcal {T} \end{aligned}$$
(11)

with \(\hat{x}_{r, t} = 1\) if resource r succeeded on the already-presented task t, and 0 otherwise. Instead of solving an LP problem only once per episode, the LPPR algorithm solves an LP problem each time the next resource arrives and is revealed. Replanning considers the previous results on tasks submitted to previous resources and the knowledge about the current resource, whereas future resources are drawn from the a priori distributions.

UU Scenario and the Per Step Algorithm - LPPS. Let \(\hat{\mathcal {R}}\) be the set of resources that have already been selected from \(\mathcal {R}\), \(\tilde{\mathcal {R}}\) be the set of \(|\mathcal {R}| - |\hat{\mathcal {R}}| - 1\) resources drawn from distributions \(\varTheta \) and M, \(r_{i}\) be the current resource, \(T_{i}\) the set of tasks already submitted to resource \(r_{i}\), and \(\hat{P}_{r_{i},t}\) the estimated probability of success of the current resource \(r_{i}\) on task t.

An adaptation of the previous LP problem is modeled as follows:

$$\begin{aligned} \max \quad & \sum _{t\in \mathcal {T}} \sum _{r \in \tilde{\mathcal {R}}} x_{r, t} + \sum _{t\in \mathcal {T}\setminus T_{i}} x_{r_{i}, t} & \end{aligned}$$
(12)
$$\begin{aligned} \text {subject to}\quad & s_{r, t} \in [0,1] & \forall r \in \tilde{\mathcal {R}}, t \in \mathcal {T} \end{aligned}$$
(13)
$$\begin{aligned} & s_{r_{i}, t} \in [0,1] & \forall t \in \mathcal {T}\setminus T_{i} \end{aligned}$$
(14)
$$\begin{aligned} & x_{r, t} = s_{r, t} P_{r,t} & \forall r \in \tilde{\mathcal {R}}, t \in \mathcal {T} \end{aligned}$$
(15)
$$\begin{aligned} & x_{r_{i}, t} = s_{r_{i}, t} \hat{P}_{r_{i},t} & \forall t \in \mathcal {T}\setminus T_{i} \end{aligned}$$
(16)
$$\begin{aligned} & \sum _{t \in \mathcal {T}} s_{r, t} \le m(r) & \forall r \in \tilde{\mathcal {R}} \end{aligned}$$
(17)
$$\begin{aligned} & |T_{i}| + \sum _{t \in \mathcal {T} \setminus T_{i}} s_{r_{i}, t}\le m(r_{i})& \end{aligned}$$
(18)
$$\begin{aligned} & x_{r_{i},t} + \sum _{r \in \tilde{\mathcal {R}}} x_{r, t} + \sum _{r \in \hat{\mathcal {R}}} \hat{ x}_{r,t} \le n(t) & \forall t \in \mathcal {T}\setminus T_{i} \end{aligned}$$
(19)
$$\begin{aligned} & \hat{x}_{r_{i},t} + \sum _{r \in \tilde{\mathcal {R}}} x_{r, t} + \sum _{r \in \hat{\mathcal {R}}} \hat{ x}_{r,t} \le n(t) & \forall t \in T_{i} \end{aligned}$$
(20)

with \(\hat{x}_{r, t} = 1\) if resource r succeeded on the already-presented task t, and 0 otherwise.

Here, the LPPS algorithm solves an LP problem at every step (after each task submission). Again, replanning considers the previous results on tasks submitted to previous resources and the knowledge about the current resource, whereas future resources are drawn from the a priori distributions. However, knowledge about the current resource is updated after every attempt to solve a task, depending on the parameters of the tasks already submitted and the observed results, success or failure. We postpone the discussion of estimating the parameters of the current resource to Sect. 5.
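The paper leaves its estimator unspecified here; one common choice for such an update, shown purely as an illustrative assumption, is a grid posterior over the ability \(\theta \) under the 3PL model, refined after each observed outcome:

```python
import math

def update_ability(prior_grid, prior_probs, responses):
    """Illustrative grid-posterior update of a resource's ability theta.

    prior_grid  : candidate theta values
    prior_probs : prior weights over prior_grid
    responses   : list of (a, b, c, success) per submitted task, where
                  a, b, c are 3PL discrimination, difficulty, guessing
    Returns the posterior-mean theta estimate and the posterior weights.
    """
    def p3pl(theta, a, b, c):
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    post = list(prior_probs)
    for a, b, c, success in responses:
        for i, theta in enumerate(prior_grid):
            p = p3pl(theta, a, b, c)
            post[i] *= p if success else (1.0 - p)   # Bayes update per outcome
    z = sum(post)
    post = [w / z for w in post]
    est = sum(t * w for t, w in zip(prior_grid, post))
    return est, post
```

The resulting estimate would feed \(\hat{P}_{r_{i},t}\) in Eq. 16 before each per-step replanning.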

4 Related Work

In this work, a new problem (DART-MT) is defined with specific settings; therefore, there is no obvious state-of-the-art method to compare ours to. We use random selection and the easier-first heuristic as baselines, and methods applicable to similar problems are briefly compared to our approach in the following paragraphs.

Many works make use of resource skills to improve task allocation [4, 10, 14, 20, 27, 29]. Some use this approach to improve allocation in project management or in a crowdsourcing context, where each task is allocated to more than one resource in order to receive a number of solutions. However, these works assume that the abilities of the resources are given or can be estimated with a content-based approach, which is not always the case. Our work starts from a cold start, i.e., we have no previous information about the resources. In recent works, social search engines have become very useful, as they use information about the resources to improve task allocation [2, 3, 8, 11]. These studies use techniques such as ranking functions and information retrieval in the context of crowdsourcing or the Q&A problem. Although similar to our work in the sense of optimizing task allocation, we do not use previous information about the resources; we estimate their skills from their interaction with the tasks. Also, we do not use a threshold to reject weaker resources.

The STA is largely discussed in the context of adaptive tests [18, 19, 30]. In this context, many works address the problem of task allocation by learning information about the resources [12, 24]. They repeatedly administer tasks to resources; however, their objective is restricted to better estimating the resources' abilities. We use the same approach in a different context and with a different purpose: we use resource characterization to optimize the allocation of tasks by maximizing the number of solutions, rather than treating resource characterization as an end in itself. In some works [9, 25, 31], resources and tasks are dynamically characterized, and a task is allocated to a resource only if the probability of a correct answer is above a threshold. Our study does not limit the allocation of tasks by a threshold. The purpose is also different: in their case, each resource receives only one task, and each task needs only one solution.

5 Experiments and Results

To evaluate the proposed algorithms, we adapt real data from a mathematics exam to emulate tasks (questions) and resources (students). Within these data, we construct different DART-MT problems to identify differences in the performance of our algorithms across the three scenarios KK, KU, and UU. Finally, we present the improvement of our algorithms over two baselines: uniform random and easier first.

5.1 Real World Database

Evaluating in Real World Data. Real data can be obtained in basically two ways: online or offline. Online evaluation consists of presenting tasks to human resources while the process is running, whereas offline evaluation uses a previously collected database. The advantage of online evaluation is that the method is applied in the real world, with resources interacting directly with the allocation process; the downside is that access to such resources is very costly, so gathering a reasonable number of human resources available to receive tasks is complicated. The advantage of offline evaluation is precisely that it can exploit an existing database, but the interaction of the process with the real world is lost.

To use an offline test, the database needs to meet certain criteria: (i) task solutions are independent conditioned on the resource; and (ii) all tasks have been submitted to all resources. The success or failure of each task submission is then obtained from the history in the database: we can choose any task for a resource and check in the history whether the resource solved it correctly. One can arbitrarily choose the resources, their arrival order, and their availability without loss of generality; availability must be at most the total number of tasks in the database. It is equally possible to choose the tasks and their demanded numbers of solutions arbitrarily; the number of solutions demanded by each task must be at most the total number of resources in the database. With the historical data in hand, it remains to model the parameters \(\zeta \) and \(\theta \) and the success probability function f.
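In practice, criterion (ii) makes the database a complete response matrix, so any allocation policy can be replayed offline; a minimal sketch of such a replay oracle (names are ours) is:

```python
def make_oracle(history):
    """Build an offline outcome oracle from a complete response matrix.

    history : dict mapping resource id -> list of 0/1 outcomes per task.
    Requires every (resource, task) pair to be present (criterion ii),
    so any allocation policy can look up the outcome of any submission.
    """
    n_tasks = len(next(iter(history.values())))
    assert all(len(row) == n_tasks for row in history.values()), \
        "every task must have been submitted to every resource"

    def outcome(resource, task):
        return bool(history[resource][task])   # True if solved in the data
    return outcome
```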

ENEM Database. A database of the 2012 Brazilian baccalaureate examination (ENEM), containing responses from ten thousand people (resources) to 185 questions (tasks), was used. The data are public and can be downloaded from the transparency portal [13]. ENEM uses Item Response Theory to model tasks (\(\zeta \)), resources (\(\theta \)), and the success probability function (f). The exam assesses students' knowledge (skills) in four areas: human sciences, natural sciences, languages, and mathematics. In this work, only the mathematics exam was considered.

We calibrate the task parameters on data from 10,000 students using the three-parameter logistic model, in which parameter b indicates the difficulty of a question. We use these parameters to construct a baseline policy in which the current resource always receives the easiest allowed task: the easier-first heuristic.
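For reference, the standard three-parameter logistic (3PL) success probability, which plays the role of \(f(\zeta _{t}, \theta _{r})\) with \(\zeta _{t} = (a, b, c)\) (the function name is ours), is:

```python
import math

def p_correct(theta, a, b, c):
    """Three-parameter logistic (3PL) success probability.

    theta : resource ability
    a     : item discrimination
    b     : item difficulty (the parameter used by easier-first)
    c     : pseudo-guessing lower asymptote
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

The easier-first baseline only needs b: it sorts the allowed tasks by difficulty and presents the smallest.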

5.2 Defining DART-MT Problems

From the database with the answers of ten thousand ENEM students, the average chance of task t being correctly solved is estimated by:

$$\begin{aligned} k(t) = \frac{\text {number of resources that solved the task } t \text { correctly}}{\text {number of resources that tried to solve the task } t} \end{aligned}$$
(21)
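Computed over a complete response matrix, Eq. 21 and the derived \(k_{mean}\) reduce to column averages (the function name is our illustration):

```python
def empirical_k(responses):
    """k(t) of Eq. (21) and k_mean, from a complete response matrix.

    responses : list of per-resource rows, responses[r][t] in {0, 1}
                (all resources attempted all tasks).
    """
    n_res = len(responses)
    n_tasks = len(responses[0])
    # fraction of resources that solved each task correctly
    k = [sum(row[t] for row in responses) / n_res for t in range(n_tasks)]
    k_mean = sum(k) / n_tasks
    return k, k_mean
```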

The responses of all students in the database were used to calculate k(t) and \(k_{mean} = \frac{\sum _{t\in \mathcal {T}}k(t)}{|\mathcal {T}|}\). However, only one hundred of them are sampled to evaluate the method in each episode. So, from now on we consider \(|\mathcal {R}| = 100\) and take the 45 items of ENEM's mathematics test as the tasks, so \(|\mathcal {T}| = 45\). Depending on the number of solutions required for each task, n(t), and the availability of each resource, m(r), solving all the required tasks can be more or less difficult. Accordingly, we define the following difficulty levels:

  • Level 1:

$$\begin{aligned} n(t) = \lceil {|\mathcal {R}| \times k_{mean}}\rceil \quad \forall t \in {\mathcal {T}}\end{aligned}$$
(22)
$$\begin{aligned} m(r) = \lceil |\mathcal {T}| \times k_{mean} \rceil \quad \forall r \in \mathcal {R} \end{aligned}$$
(23)
  • Level 2:

$$\begin{aligned} n(t) = \lceil {|\mathcal {R}| \times k(t)}\rceil \quad \forall t \in \mathcal {T}\end{aligned}$$
(24)
$$\begin{aligned} m(r) = \lceil |\mathcal {T}| \times k_{mean} \rceil \quad \forall r \in \mathcal {R} \end{aligned}$$
(25)
  • Level 3: n(t) is defined as the number of correct solutions for the task t and m(r) as the number of tasks the resource r solved correctly.

In levels 1, 2, and 3, we have \(\sum _{t\in \mathcal {T}} n(t) \approx \sum _{r\in \mathcal {R}} m(r)\). Note that all the desired solutions at difficulty levels 1 and 2 can only be obtained if all resources have the same ability and/or all tasks have the same difficulty. To obtain all the solutions at level 3, each resource would have to receive exactly the tasks it can solve correctly.

One way to relax the levels is to decrease the number of solutions desired for each task, for example, \(n'(t) = n(t) \times 0.5\). We apply this relaxation to level 3 to obtain Level 4.

Another, more relaxed level, which we call Level 0, is defined as follows: given \(k_{mean}\), we compute n(t) and m(r) as constant values such that

$$\begin{aligned} |\mathcal {R}| \times m(r) \times k_{mean} = |\mathcal {T}| \times n(t) \end{aligned}$$
(26)

For this level, we choose \(m(r) = 16 \approx \sum _{t \in \mathcal {T}} k(t)\) and obtain \(n(t) = 12\) from Eq. 26 with \(k_{mean} = 0.34\).
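As a quick sanity check (our own arithmetic, not from the paper), Eq. 26 with the reported values indeed rounds to the demand used for Level 0:

```python
# Level 0 balance of Eq. (26): |R| * m(r) * k_mean = |T| * n(t),
# with |R| = 100 resources, |T| = 45 tasks, k_mean = 0.34, m(r) = 16.
R, T, k_mean, m = 100, 45, 0.34, 16
n = R * m * k_mean / T      # expected solved work spread over the tasks
# n is about 12.09, which rounds to the n(t) = 12 used in the paper
```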

Empirical Experiment. For each difficulty level and algorithm, we run 100 episodes. For each episode, one hundred resources are randomly sampled from the ten thousand available students. All simulations were performed in a Google Colaboratory environment, and all code was written in Python 3. To solve the LP problems, we used the PuLP module and the open-source solver COIN-OR Branch and Cut (CBC) [22].

5.3 Results Using Real World Data

We use the five levels defined above to evaluate our proposed algorithms. The results obtained for each level and each scenario are shown in Table 1. The proposed approaches are compared against two baselines, uniform random selection and the easier-first heuristic, by the number of tasks solved correctly. Besides solving each scenario with the corresponding time abstraction, we also apply the per-round algorithm to scenario KK so as to be contingent on trial results.

Table 1. Results for all scenarios and algorithms.

Discussion. Regardless of the knowledge of the resources' abilities, the algorithms LPPE, LPPR, and LPPS obtained more solutions than the baselines (random and easier first) at all levels. When the resources' abilities are well known, the algorithms LPPE and LPPR beat the baselines and can be computed in a feasible time (less than three minutes per episode). When the abilities of the resources are unknown, we use the LPPS algorithm, which furnished a better allocation than the baselines at all five levels. The LPPS algorithm takes about forty minutes per episode, and resources need to wait no more than 1.7 s to receive a task (this was the case at level 0, with the results shown in Table 1).

Note that the algorithms only make sense in certain scenarios. For example, in scenario UU, it does not make sense to use the per-episode algorithm, since tasks would be allocated based on the solution of an LP problem in which the resources' abilities were unknown. Likewise, in scenario KK it is not necessary to solve an LP problem at each step, since the resources' abilities are well known from the beginning. To better visualize the results, box plots are shown for all scenarios, levels, and algorithms in Fig. 1. The algorithm names are abbreviated: random (rand), easier first (E-F), per episode (Ep), per round (Res), and per step (Sub).

Fig. 1. Box plot results for all scenarios and levels

To better analyze the results, we used statistical tests to verify which algorithm produced the highest number of solutions. The first step was the Shapiro-Wilk test for normality: at the \(95\%\) confidence level, the data do not deviate from a normal distribution, except for the results of some algorithms at level 0. As we have many samples (100 episodes), we used the Analysis of Variance (ANOVA) test to compare the algorithms of each scenario at all levels, including level 0. The ANOVA tests determined that, for all scenarios at all levels, there are statistically significant differences at the \(95\%\) confidence level; that is, the number of solutions is not statistically equivalent among the algorithms used in each scenario and level. We then used Tukey's pairwise multiple-comparison (post hoc) test to verify which means differ, at the \(95\%\) confidence level. The algorithms compared in each scenario are:

  • KK: random; easier first; per episode; and per round.

  • KU: random; easier first; and per round.

  • UU: random; easier first; and per step.
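The first two stages of this pipeline can be sketched with SciPy (a minimal illustration of the procedure, not the paper's code; Tukey's post hoc step is omitted here):

```python
from scipy import stats

def compare_algorithms(samples, alpha=0.05):
    """Shapiro-Wilk per algorithm, then one-way ANOVA across algorithms.

    samples : dict mapping algorithm name -> list of per-episode
              solution counts for one scenario and level.
    Returns (normality per algorithm, whether ANOVA rejects equal means).
    """
    normal = {name: stats.shapiro(x).pvalue > alpha
              for name, x in samples.items()}
    f_stat, p_anova = stats.f_oneway(*samples.values())
    return normal, p_anova < alpha
```

When the ANOVA rejects, a pairwise post hoc test (Tukey's HSD) identifies which algorithm means actually differ.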

From the results of the Tukey post hoc test, the proposed algorithms per episode, per round, and per step are better than the baselines (random and easier first) for all scenarios at all levels, except for the comparison between the LPPS algorithm and the easier-first baseline in scenario UU at level 3, where the Tukey test found no statistical difference in the number of solutions. Figure 2 shows (for the first episode) the number of submissions and solutions, the mean difficulty of the allocated tasks, and the abilities of the resources for these two algorithms in scenario UU at level 3. From this figure, resources with higher abilities are available to receive more tasks at level 3. This is a very favorable case for the easier-first heuristic: the more tasks a resource receives under easier first, the more difficult the tasks it goes on to receive.

Fig. 2. Algorithms at level 3, on the first episode.

The per-episode and per-round algorithms are statistically equivalent in scenario KK. This is expected: since in scenario KK all the resources' abilities are well known beforehand, replanning after each round is not significantly better than planning once at the beginning of the episode.

The easier-first heuristic is better than random selection at most levels, except at level 0. Figure 3 shows (averaged over all episodes) the number of submissions and solutions, the mean difficulty of the allocated tasks, and the abilities of the resources for these two algorithms at level 0. Averaging over all episodes diminishes the influence of the resources' abilities; we can see that the mean of the abilities goes to zero. Under easier-first allocation, the difficulty of the tasks clearly increases over the steps while the number of solutions decreases, whereas under random allocation the difficulty of the tasks is approximately uniform. The last resources receive slightly more difficult tasks because the easier ones have already received the required number of solutions. These characteristics of both algorithms explain why random made more allocations (approximately \(4\%\)) than easier first, which led to its receiving more solutions (approximately \(1\%\)).

Fig. 3. Algorithms at level 0, mean of all episodes.

6 Conclusion and Future Work

We can compare how far the LPPS algorithm is from the baselines and from the best-informed case, that is, scenario KK with the LP solution updated per round. We obtain an increase in the number of solutions compared with the baselines at all levels: up to almost \(9\%\) more solutions than the easier-first baseline, and up to \(49\%\) more than random. Compared with the best-informed case, the LPPS algorithm in the UU scenario reached up to \(96\%\) of its number of solutions.

These processes could be used to submit fewer tasks to resources without losing important information, for example, when asking people to answer polls or to generate databases for applications such as Q&A, intelligent tutoring systems, or natural language processing. In addition to the resources' abilities being unknown, there is also the case in which the complexity of solving the tasks is unknown; estimating the parameters that characterize such tasks then becomes necessary [7, 21]. The task parameters can be obtained from the solutions of the tasks or from a description of them [16]. We plan to consider scenarios where a resource is not continuously available to receive tasks but arrives and leaves. We also plan to develop a real experiment with human resources receiving tasks, rather than only using real data from other applications.