key: cord-0576034-816k63yl
authors: Wu, Chen-Xin; Liao, Min-Hui; Karatas, Mumtaz; Chen, Sheng-Yong; Zheng, Yu-Jun
title: Real-Time Neural Network Scheduling of Emergency Medical Mask Production during COVID-19
date: 2020-07-28
journal: nan
DOI: nan
sha: b1ecf502a01b68df89ccfeabdea524959e835c93
doc_id: 576034
cord_uid: 816k63yl

During the outbreak of the novel coronavirus pneumonia (COVID-19), there is a huge demand for medical masks. A mask manufacturer often receives a large number of orders that are beyond its capacity. Therefore, it is of critical importance for the manufacturer to schedule mask production tasks as efficiently as possible. However, existing scheduling methods typically require a considerable amount of computational resources and, therefore, cannot effectively cope with the surge of orders. In this paper, we propose an end-to-end neural network for scheduling real-time production tasks. The neural network takes a sequence of production tasks as inputs to predict a distribution over different schedules, employs reinforcement learning to optimize network parameters using the negative total tardiness as the reward signal, and finally produces a high-quality solution to the scheduling problem. We applied the proposed approach to schedule emergency production tasks for a medical mask manufacturer during the peak of COVID-19 in China. Computational results show that the neural network scheduler can solve problem instances with hundreds of tasks within seconds. The objective function value (i.e., the total weighted tardiness) produced by the neural network scheduler is significantly better than those of existing constructive heuristics, and is very close to those of the state-of-the-art metaheuristics whose computational time is unaffordable in practice.

ZHENDE is a medical apparatus manufacturer in Zhejiang Province, China.
It has a mask production line that can produce different types of masks, such as disposable medical masks, surgical masks, medical protective masks, and respiratory masks. The daily output is nearly one hundred thousand masks. However, on each day during the outbreak of the novel coronavirus pneumonia (COVID-19), the manufacturer often receives tens to hundreds of mask orders, whose total demand ranges from hundreds of thousands to a million masks. Almost all orders have tight delivery deadlines. Therefore, it is of critical importance for the manufacturer to schedule the mask production tasks efficiently so as to satisfy the orders as much as possible. The manufacturer asked our research team to develop a production scheduler that can schedule hundreds of tasks within seconds. In fact, many manufacturers of medical supplies have similar requirements during the pandemic.

Scheduling production tasks on a production line can be formulated as a machine scheduling problem, which is known to be NP-hard [1]. Exact optimization algorithms (e.g., [2, 3, 4, 5]) have very large computation times that are infeasible even on moderate-size problem instances. Since optimal solutions to moderate- and large-size instances are rarely needed in practice, heuristic approximation algorithms, in particular evolutionary algorithms (e.g., [6, 7, 8, 9, 10, 11, 12]), are a more practical means of achieving a trade-off between optimality and computational cost. However, the repeated generations and objective function evaluations needed to solve large-size instances still take a relatively long time and, therefore, cannot satisfy the requirement of real-time scheduling.

Using end-to-end neural networks to directly map a problem input to an optimal or near-optimal solution is another research direction that has received increasing attention. The earliest work dates back to Hopfield and Tank [13], who applied a Hopfield network to solve the traveling salesman problem (TSP).
Simon and Takefuji [14] modified the Hopfield network to solve the job-shop scheduling problem. However, the Hopfield network is only suitable for very small problem instances. Based on the premise that optimal solutions to a scheduling problem share common features that can be implicitly captured by machine learning, Weckman et al. [15] proposed a neural network for scheduling job-shops by capturing predictive knowledge regarding each operation's position in a sequence. They used solutions obtained by a genetic algorithm (GA) as samples for training the network. To solve the flow shop scheduling problem, Ramanan et al. [16] used a neural network trained with optimal solutions of known instances to produce quality solutions for new instances, which are then given as initial solutions to other heuristics such as GA. Recently, deep learning has been applied to optimization algorithm design by learning algorithmic decisions from the distribution of problem instances. Vinyals et al. [17] introduced the pointer network, a sequence-to-sequence model that consists of an encoder to parse the input nodes and a decoder to produce a probability distribution over these nodes based on a pointer (attention) mechanism over the encoded nodes. They applied the pointer network to solve TSP instances with up to 100 nodes. However, the pointer network is trained in a supervised manner, which relies heavily on expensive optimal solutions of sample instances. Nazari et al. [18] addressed this difficulty by introducing reinforcement learning to calculate the rewards of output solutions, and applied the model to solve the vehicle routing problem (VRP). Kool et al. [19] used a different decoder based on a context vector and improved the training algorithm with a greedy rollout baseline. They applied the model to several combinatorial optimization problems including TSP and VRP. Peng et al.
[20] presented a dynamic attention model with a dynamic encoder-decoder architecture to exploit hidden structure information at different construction steps, so as to construct better solutions.

In this paper, we propose a deep reinforcement learning approach for scheduling real-time production tasks. The neural network takes a sequence of production tasks as inputs to predict a distribution over different schedules, employs reinforcement learning to optimize the network parameters using the negative total tardiness as the reward signal, and finally produces a high-quality task scheduling solution. We applied the proposed neural network scheduler to a medical mask manufacturer during the peak of COVID-19 in China. Computational results show that the neural network scheduler can solve problem instances with hundreds of tasks within seconds. The objective function value (i.e., the total weighted tardiness) produced by the neural network scheduler is significantly better than those of existing constructive heuristics such as the Nawaz-Enscore-Ham (NEH) heuristic [21] and the Suliman heuristic [22], and is very close to those of the state-of-the-art metaheuristics, whose computational time is clearly unaffordable in practice.

The remainder of this paper is organized as follows. Section 2 describes the emergency production scheduling problem. Section 3 presents the architecture of the neural network, Section 4 describes the reinforcement learning algorithm, Section 5 presents the experimental results, and Section 6 concludes with a discussion.

In this section, we formulate the scheduling problem as follows (the variables are listed in Table 1). The manufacturer receives K orders, denoted by {O_1, O_2, ..., O_K}. Each order O_k is associated with a set Φ_k of production tasks (jobs), which is related to the number of mask types in the order. Each order O_k has an expected delivery time d_k and an importance weight w_k according to its value and urgency.
In our practice, the manager assigns each order a score between 1 and 10, and all weights are then normalized such that Σ_{k=1}^{K} w_k = 1. Let J = {J_1, J_2, ..., J_n} be the set of all tasks. These tasks need to be scheduled on a production line with m machines, denoted by M = {M_1, M_2, ..., M_m}. Each task J_j has exactly m operations, where the i-th operation must be processed on machine M_i with a processing time t_{ij} (1 ≤ i ≤ m; 1 ≤ j ≤ n). Each machine can process at most one task at a time, and no operation can be interrupted. The operations of mask production typically include cloth cutting, fabric lamination, belt welding, disinfection, and packaging.

The problem is to decide a processing sequence π = {π_1, π_2, ..., π_n} of the n tasks. Let C(π_j, i) denote the completion time of task π_j on machine M_i. On the first machine M_1, the tasks are processed sequentially one after another:

C(π_1, 1) = t_{π_1,1}    (1)
C(π_j, 1) = C(π_{j-1}, 1) + t_{π_j,1},  j = 2, ..., n    (2)

The first task π_1 can be processed on each subsequent machine M_i immediately after it is completed on the previous machine M_{i-1}:

C(π_1, i) = C(π_1, i-1) + t_{π_1,i},  i = 2, ..., m    (3)

Each subsequent task π_j can be processed on machine M_i only when (1) the task π_j has been completed on the previous machine M_{i-1}, and (2) the previous task π_{j-1} has been completed on machine M_i:

C(π_j, i) = max(C(π_j, i-1), C(π_{j-1}, i)) + t_{π_j,i},  i = 2, ..., m; j = 2, ..., n    (4)

Therefore, the completion time of each order O_k is the completion time of the last task of the order on machine M_m:

C_k = max_{π_j ∈ Φ_k} C(π_j, m)    (5)

The objective of the problem is to minimize the total weighted tardiness of the orders:

min f(π) = Σ_{k=1}^{K} w_k · max(C_k − d_k, 0)    (6)

If all tasks are available for processing at time zero, the above formulation can be regarded as a variant of the permutation flow shop scheduling problem, which is known to be NP-hard [1].
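The completion-time recurrence and the total weighted tardiness objective above can be sketched in plain Python. This is an illustrative sketch, not the paper's implementation; the function and variable names are ours, and we store processing times task-major as t[j][i]:

```python
def schedule_objective(pi, t, orders):
    """Total weighted tardiness of a task permutation on an m-machine flow line.

    pi     : list of task indices (the processing sequence)
    t      : t[j][i] = processing time of task j on machine i (0-based)
    orders : list of (tasks, d_k, w_k) tuples, where `tasks` is the set of
             task indices belonging to order O_k
    """
    m = len(t[0])
    n = len(pi)
    # C[j][i] = completion time of the j-th sequenced task on machine i:
    # a task starts on machine i once it has left machine i-1 and machine i
    # has finished the previously sequenced task.
    C = [[0.0] * m for _ in range(n)]
    for j, task in enumerate(pi):
        for i in range(m):
            ready = C[j][i - 1] if i > 0 else 0.0   # done on previous machine
            free = C[j - 1][i] if j > 0 else 0.0    # machine released
            C[j][i] = max(ready, free) + t[task][i]
    finish = {task: C[j][m - 1] for j, task in enumerate(pi)}
    total = 0.0
    for tasks, d_k, w_k in orders:
        # completion time of the order = last of its tasks off the final machine
        C_k = max(finish[task] for task in tasks)
        total += w_k * max(C_k - d_k, 0.0)          # weighted tardiness
    return total
```

Evaluating this function for a candidate permutation is exactly the reward computation (negated) used later during training.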
When there are hundreds of tasks to be scheduled, the problem instances are computationally intractable for exact optimization algorithms, and search-based heuristics also typically take tens of minutes to hours to obtain a satisfactory solution. Moreover, in a public health emergency such as the COVID-19 pandemic, new orders may continually arrive during emergency production, so it is frequently necessary to reschedule the production tasks to incorporate new tasks into the schedules. The allowable computational time for rescheduling is even shorter, typically only a few seconds. Hence, real-time or near-real-time rescheduling methods are required for the problem.

We propose a neural network scheduler based on the encoder-decoder architecture [23] to efficiently solve the above production task scheduling problem. Fig. 1 illustrates the architecture of the network. The input to the network is a problem instance represented by a sequence of n tasks, each of which is described by an (m+2)-dimensional vector x_j = {p_{j,1}, p_{j,2}, ..., p_{j,m}, d_k, w_k} that consists of the processing times on the m machines and the expected delivery time and importance weight of the corresponding order. To facilitate the processing of the neural network, all inputs are normalized into [0, 1]; e.g., each d_k is transformed to (d_k − d_min)/(d_max − d_min), where d_min = min_{1≤k≤K} d_k and d_max = max_{1≤k≤K} d_k.

The encoder is a recurrent neural network (RNN) with long short-term memory (LSTM) [24] cells. The LSTM takes one task x_j at a time and transforms it into a hidden state h_j by incrementally computing the embeddings of the inputs (where att denotes the transformation performed by the LSTM):

h_j = att(h_{j-1}, x_j),  j = 1, 2, ..., n    (7)

As a result, the encoder produces an aggregated embedding h̄ of all inputs as the mean of the n hidden states:

h̄ = (1/n) Σ_{j=1}^{n} h_j    (8)

The decoder also performs n decoding steps, each making a decision on which task should be processed next.
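The input construction can be sketched as follows. The paper only states that all inputs are scaled into [0, 1] and gives the d_k transformation explicitly; extending the same min-max scaling to the processing times, and the helper names, are our assumptions:

```python
def build_inputs(tasks, orders):
    """Min-max normalise raw task features into [0, 1] input vectors.

    tasks  : list of (times, k) pairs: per-machine processing times and the
             index k of the owning order
    orders : list of (d_k, w_k) pairs (delivery time, importance weight)
    Each task becomes an (m+2)-vector: m normalised processing times, the
    normalised delivery time of its order, and the order's weight.
    """
    def minmax(vals):
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0          # guard against a degenerate range
        return [(v - lo) / span for v in vals]

    all_t = [p for times, _ in tasks for p in times]
    t_lo, t_hi = min(all_t), max(all_t)
    t_span = (t_hi - t_lo) or 1.0
    d_norm = minmax([d for d, _ in orders])
    x = []
    for times, k in tasks:
        feat = [(p - t_lo) / t_span for p in times]
        feat += [d_norm[k], orders[k][1]]  # weights already sum to 1
        x.append(feat)
    return x
```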
At the j-th decoding step, the decoder constructs a context vector h_c by concatenating h̄ and the hidden state h_{j-1} of the previous LSTM step. We use a five-layer deep neural network (DNN) to implement the decoder. The first layer takes h̄ as input and transforms it into an n_1-dimensional hidden vector u_1 (n_1 < n):

u_1 = ReLU(W_1 h̄ + b_1)    (9)

where W_1 is an n_1 × n weight matrix and b_1 is an n_1-dimensional bias vector. The second layer takes the concatenation of u_1 and the context vector h_c as input and transforms it into an n_2-dimensional hidden vector u_2:

u_2 = ReLU(W_2 [u_1; h_c] + b_2)    (10)

where [ ; ] denotes the horizontal concatenation of vectors, W_2 is an n_2 × (n_1 + 2n) weight matrix, and b_2 is an n_2-dimensional bias vector. Each of the remaining layers takes the hidden vector of the previous layer and transforms it into a lower-dimensional hidden vector using ReLU activation. Finally, the probability that each task x_j is selected at the t-th step is calculated via a softmax over the state u of the top layer of the DNN:

p(x_j) = exp(u_j) / Σ_{j'=1}^{n} exp(u_{j'})    (11)

A solution to the scheduling problem can be viewed as a sequence of decisions, and the decision process can be regarded as a Markov decision process [25]. According to the objective function (6), training the network amounts to minimizing the expected loss L(θ|x) = E_{π∼p_θ(·|x)} f(π). We employ the policy-gradient REINFORCE algorithm [26] with the Adam optimizer [27] to train the network. The gradient of the network parameters θ is defined with respect to a baseline base(x) as:

∇_θ L(θ|x) = E_{π∼p_θ(·|x)} [(f(π) − base(x)) ∇_θ log p_θ(π|x)]    (12)

A good baseline reduces the gradient variance and increases the learning speed [19]. Here, we solve each instance x with both the NEH heuristic [21] and the Suliman heuristic [22], and use the better of the two results as base(x). During training, we approximate the gradient via Monte-Carlo sampling, where B problem instances are drawn from the same distribution:

g(θ) = (1/B) Σ_{i=1}^{B} (f(π_i) − base(x_i)) ∇_θ log p_θ(π_i|x_i)    (13)

Algorithm 1 presents the pseudocode of the REINFORCE algorithm. In the training phase, according to the production tasks of the manufacturer during the peak of COVID-19 in China, we randomly generate 20,000 instances.
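Stripping away the autodiff machinery, the Monte-Carlo REINFORCE estimate reduces to averaging an advantage-weighted log-probability over the batch. The toy helper below (our naming, not the paper's code) shows the scalar surrogate whose gradient a framework such as PyTorch would backpropagate:

```python
def reinforce_loss(tardiness, baselines, log_probs):
    """Monte-Carlo REINFORCE surrogate over a batch of B instances.

    tardiness : f(pi_i), total weighted tardiness of the sampled schedule
    baselines : base(x_i), e.g. the better of the NEH and Suliman objectives
    log_probs : sum of log p(pi | x_i) over the n decoding steps
    Minimising the returned value pushes probability mass toward schedules
    that beat the heuristic baseline (negative advantage).
    """
    B = len(tardiness)
    return sum((f - b) * lp
               for f, b, lp in zip(tardiness, baselines, log_probs)) / B
```

When a sampled schedule beats the baseline (f < b), increasing its log-probability decreases the loss, which is exactly the variance-reduction role of base(x).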
The basic features of the instance distribution are as follows: m = 5, n follows a normal distribution N(124, 33), t_{ij} follows a normal distribution N(2.4, 1.6) (in hours), and d_k follows a uniform discrete distribution over {24, 36, 48, 60, 72, 96, 120} (in hours). The maximum number of epochs for training the network is set to 100. [Algorithm 1 listing; only fragments survive extraction: compute the baseline base(x_i); θ ← Adam(θ, g(θ)); return θ.]

For comparison, we also use three different baselines: the first is a greedy heuristic that sorts the tasks in decreasing order of w_k / (Σ_{i=1}^{m} t_{ij}), the second is the NEH heuristic [21], and the third is the Suliman heuristic [22]. The neural network is implemented in Python 3.4, while the heuristics are implemented in Microsoft Visual C# 2018. The experiments are conducted on a computer with an Intel Xeon 3430 CPU and a GeForce GTX 1080Ti GPU.

Fig. 2 presents the convergence curves of the four methods during the training process. On average, training with our hybrid baseline converges after 55∼65 epochs and training with the individual NEH and Suliman baselines converges after 80∼85 epochs, while the greedy baseline converges to local optima that are significantly worse than the results of the first three methods. The results demonstrate that using the hybrid NEH/Suliman baseline significantly improves the training performance compared to the existing individual heuristic baselines.

Next, we test the performance of the trained neural network scheduler on the mask production task scheduling problem. We select 146 real-world instances of the manufacturer from Feb 8 to Feb 14, 2020, the peak of COVID-19 in China. For each day, we first solve an instance with about 50∼200 tasks; then, during the daytime, with the arrival of new orders, we reschedule the production 20∼40 times.
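The greedy comparison baseline is a one-line dispatch rule; a minimal sketch (the function name and data layout are ours):

```python
def greedy_order(tasks):
    """Greedy dispatch rule used as a comparison baseline: sort tasks in
    decreasing order of w_k / (sum of processing times), i.e. tasks from
    heavy orders with short processing come first.

    tasks: list of (times, w) pairs, where `times` holds the per-machine
    processing times and `w` is the owning order's weight.
    Returns a permutation of task indices.
    """
    return sorted(range(len(tasks)),
                  key=lambda j: tasks[j][1] / sum(tasks[j][0]),
                  reverse=True)
```

Because it never looks at delivery deadlines or machine interactions, this rule is cheap but easily trapped in poor schedules, consistent with its weaker convergence in Fig. 2.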
To validate the performance of the neural network scheduler, we also run the following five state-of-the-art metaheuristic algorithms on each instance, and use the best result among them as the benchmark:
• A shuffled complex evolution algorithm (SCEA) [28];
• An algebraic differential evolution (ADE) algorithm [29];
• A teaching-learning based optimization (TLBO) algorithm [30];
• A biogeography-based optimization (BBO) algorithm [31];
• A discrete water wave optimization (WWO) algorithm [12].

Table 2 presents the average CPU time required to obtain the solutions. The results show that the solutions of the neural network scheduler are significantly better than those of the NEH and Suliman heuristics. The NEH heuristic and the neural network scheduler consume similar computational time, but the objective function value of NEH is about 2∼3 times that of the neural network scheduler. The Suliman heuristic consumes more computational time and obtains an even worse objective function value than the neural network scheduler. The benchmark solutions are obtained by the best of the five state-of-the-art metaheuristics, using significantly longer computational time (600∼1500 seconds) than the neural network scheduler (only 1∼2 seconds). Nevertheless, the objective function values produced by the neural network scheduler are very close to (approximately 6%∼7% larger than) those of the benchmark solutions. In emergency conditions, the computational time of the state-of-the-art metaheuristics is clearly unaffordable, whereas the proposed neural network scheduler can produce high-quality solutions within seconds and, therefore, satisfies the requirements of emergency medical mask production.

In this paper, we propose a deep neural network with reinforcement learning for scheduling emergency production tasks within seconds. The neural network consists of an encoder and a decoder.
The encoder employs an LSTM-based RNN to sequentially parse the input production tasks, and the decoder employs a deep neural network to learn the probability distribution over these tasks. The network is trained by reinforcement learning using the negative total tardiness as the reward signal. We applied the proposed neural network scheduler to a medical mask manufacturer during the peak of COVID-19 in China. The results show that the proposed approach achieves high-quality solutions within a much shorter computational time, satisfying the requirements of emergency production.

The baseline plays a key role in reinforcement learning. The baseline used in this paper is based on two constructive heuristics, which leaves much room for improvement. However, better heuristics and metaheuristics often require large computational resources and are inefficient for training over a large number of instances. In our ongoing work, we are incorporating other neural network schedulers to improve the baseline. Another direction for future work is to use evolutionary metaheuristics to optimize the parameters of the deep neural network [32].

References
[1] Scheduling: Theory, Algorithms, and Systems.
[2] Flow-shop scheduling with the branch-and-bound method.
[3] Bilevel programming applied to the flow shop scheduling problem.
[4] Mixed binary integer programming formulations for the flow shop scheduling problems. A case study: ISD projects scheduling.
[5] A discrete time exact solution approach for a complex hybrid flow-shop scheduling problem with limited-wait constraints.
[6] A genetic algorithm for flow shop scheduling problems.
[7] Scheduling flow shops using differential evolution algorithm.
[8] A discrete version of particle swarm optimization for flowshop scheduling problems.
[9] An efficient flow-shop scheduling algorithm based on a hybrid particle swarm optimization model.
[10] A hybrid discrete biogeography-based optimization for the permutation flow-shop scheduling problem.
[11] A discrete water wave optimization algorithm for no-wait flow shop scheduling problem.
[12] Water wave optimization for combinatorial optimization: Design strategies and applications.
[13] "Neural" computation of decisions in optimization problems.
[14] Integer linear programming neural networks for job-shop scheduling.
[15] A neural network job-shop scheduler.
[16] An artificial neural network based heuristic for flow shop scheduling problems.
[17] Pointer networks. Advances in Neural Information Processing Systems.
[18] Reinforcement learning for solving the vehicle routing problem.
[19] Attention, learn to solve routing problems!
[20] A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems.
[21] A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem.
[22] A two-phase heuristic approach to the permutation flow-shop scheduling problem.
[23] Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems.
[24] Learning to forget: Continual prediction with LSTM.
[25] Neural combinatorial optimization with reinforcement learning.
[26] Simple statistical gradient-following algorithms for connectionist reinforcement learning.
[27] Adam: A method for stochastic optimization.
[28] A shuffled complex evolution algorithm with opposition-based learning for a permutation flow shop scheduling problem.
[29] Algebraic differential evolution algorithm for the permutation flowshop scheduling problem with total flowtime criterion.
[30] An extended teaching-learning based optimization algorithm for solving no-wait flow shop scheduling problem.
[31] Enhanced biogeography-based optimization for flow-shop scheduling.
[32] Shallow and deep neural network training by water wave optimization.