key: cord-0666341-36vxp27f
authors: Niu, Yi-Shuai; Ding, Wentao; Hu, Junpeng; Xu, Wenxu; Canu, Stephane
title: Spatio-Temporal Neural Network for Fitting and Forecasting COVID-19
date: 2021-03-22
sha: 25a15423e7db36cae2741dd5898cc8082fa03144
doc_id: 666341
cord_uid: 36vxp27f

We established a Spatio-Temporal Neural Network, namely STNN, to forecast the spread of the coronavirus COVID-19 outbreak worldwide in 2020. The basic structure of STNN is similar to that of the Recurrent Neural Network (RNN), but it incorporates spatial features in addition to temporal data. Two improved STNN architectures, namely the STNN with Augmented Spatial States (STNN-A) and the STNN with Input Gate (STNN-I), are proposed, which provide more predictability and flexibility. STNN and its variants can be trained using the Stochastic Gradient Descent (SGD) algorithm and its improved variants (e.g., Adam, AdaGrad and RMSProp). Our STNN models are compared with several classical epidemic prediction models, including the fully-connected neural network (BPNN), the recurrent neural network (RNN), the classical curve fitting models, and the SEIR dynamical system model. Numerical simulations demonstrate that STNN models outperform many others by providing more accurate fitting and prediction, and by handling both spatial and temporal data.

The novel coronavirus COVID-19 was first reported in Wuhan in December 2019. The Chinese government reacted very quickly and made great efforts to limit the spread of the disease. Numerous effective control measures were implemented, including traffic control, travel limitation, mask wearing, social distancing, cancellation of gatherings, targeted isolation, environmental disinfection, delayed resumption of work, and online working. As a result, the epidemic situation was effectively controlled in China.
By mid-February, the number of daily active cases had peaked and then dropped rapidly, and the spread of the disease in China had almost stopped by mid-April. Meanwhile, the worldwide situation was only beginning. WHO reported on April 4 that over 1 million cases of COVID-19 had been confirmed worldwide, a more than tenfold increase in less than a month. By January 2021, there were 95 million cumulative cases and 2 million deaths worldwide. Several variants of the virus appeared, and some of them have a stronger transmission capacity; e.g., the SARS-CoV-2 variant behind the outbreak in the UK was estimated to be up to 70% more transmissible than the previously circulating forms (Kirby, 2021).

Scientists rushed to understand the new illness. Hundreds of preprints are shared every day on all aspects of the disease, including biomedicine, epidemiology, economics, sociology, mathematics and computer science. Several reports, e.g., (Organization et al., 2019; Wu et al., 2020; Leung et al., 2020; Xu et al., 2020; Li et al., 2020a), show that COVID-19 has a higher basic reproductive number $R_0$ and a lower death rate than the two other well-known coronaviruses SARS-CoV and MERS-CoV (the SARS-CoV outbreak in 2002 caused more than 8000 infections and 800 deaths, and the MERS-CoV outbreak in 2012 caused 2494 infections and 858 deaths). Zhong's team was the first to analyze the symptoms, latency and mortality of COVID-19, and emphasized the significance of early isolation (Guan et al., 2020). Later, the impact of different factors was studied, such as temperature, human mobility and control measures, see e.g., (Lin et al., 2020; Chinazzi et al., 2020; Xie & Zhu, 2020; Kraemer et al., 2020). With the spread of the pandemic to the whole world, transmission dynamical models for different countries have been investigated, see e.g., (Phan et al., 2020; Mizumoto & Chowell, 2020; Fanelli & Piazza, 2020; Rockett et al., 2020; Giordano et al., 2020; Sarkar et al., 2020).
Our work focuses on establishing a spatio-temporal deep neural network model to predict the spread of the disease over time and space. There are many works on machine learning approaches to forecasting the trend of COVID-19. Traditional models such as linear regression, SVM for regression and the fast decision tree learner are studied in (James Fong et al., 2020; Chinazzi et al., 2020; Ayyoubzadeh et al., 2020), which showed that the traditional methods performed well enough on small-scale datasets. The use of deep learning models such as LSTM and GRU to predict COVID-19 is reported in (Chimmula & Zhang, 2020; Shahid et al., 2020; Li et al., 2020b), where they performed well in some tasks using time series datasets. However, an advanced model that takes both temporal and spatial data into account to predict the spread of an infectious disease rarely appears in the literature.

In this manuscript, we establish in Section 2 a spatio-temporal deep neural network (namely, STNN) to fit spatial and temporal pandemic data and predict the trend of the disease over time and space. Its training algorithm based on SGD is also proposed. Two improved variants of STNN, namely STNN with Augmented Spatial States (STNN-A) and STNN with Input Gate (STNN-I), are developed, which could provide more flexibility and improve prediction accuracy. Some classical prediction models used for comparison are briefly introduced in Section 3, including: the fully-connected neural network model (BPNN), the recurrent neural network models (LSTM and GRU), the curve fitting (Gaussian, exponential and polynomial) models, and the SEIR dynamical system model. Our proposed STNN model and its variants are implemented in Python, and the compared classical prediction models are developed in MATLAB. These codes are tested on a COVID-19 dataset (a collection of various spatial and temporal data for China, the United States and Italy).
Numerical results are reported in Section 4, which demonstrate that our STNN models perform well by providing accurate predictions and are able to handle both spatial and temporal data.

The Spatio-Temporal Neural Network, namely STNN, is a neural network architecture for predicting phenomena that evolve in time and in space. Thus, it is suitable for epidemic prediction. The classical STNN model is based on the Recurrent Neural Network (RNN) with additional spatial data. The idea of feeding the neural network with spatial data comes from (Ziat et al., 2017). The spatial and temporal data required by STNN are described as follows.

Model Data: Suppose that we observe a time series of size $m$. The temporal data at time $t \in [m]$ is denoted by $x_t \in \mathbb{R}^{n \times d}$, where $n$ is the number of observed locations (e.g., cities, countries) and $d$ is the number of observed targets (e.g., infections, deaths). Here the notation $[m]$ with $m \in \mathbb{N}$ stands for $\{1, 2, \ldots, m\}$. The spatial data, denoted by $W_i$, $i \in [p]$, is a collection of $p$ matrices of spatial features (e.g., the distance between cities, the transport flow between cities).

Model Architecture: The architecture of the classical STNN is illustrated in Figure 1. It consists of hidden states $s_t$, an observation network $a$ and a state network $b$. The hidden state can be understood as a high-dimensional representation of the observation, and the observation is the projection of the hidden state onto a low-dimensional space through the observation network. The state network performs the hidden state transition, that is, the hidden state $s_{t+1}$ is the output of the state network $b$ whose input is the hidden state $s_t$ coupled with the spatial data $W_i$, $\forall i \in [p]$. Due to the functionality of the observation network and the state network, we use either tanh or sigmoid as activation functions for all hidden layers and the output layer.

Unlike the traditional RNN model, the hidden states in STNN are also parameters to be trained. The loss function $L(\theta)$ of the STNN model is therefore a sum of two terms measured in the matrix Frobenius norm $\|\cdot\|_F$, with training parameters $\theta = (\theta_a, \theta_b, s_1, \ldots, s_m)$, in which $\theta_a$ and $\theta_b$ are the parameters of the neural networks $a$ and $b$. The first summand of $L$ measures the quality of the observation network and the hidden states, and the second summand measures the quality of the state network and the hidden states. Training STNN amounts to finding an observation network $a$, a state network $b$ and all hidden states $s_t$, $t \in [m]$, that minimize the loss function $L$, which is the following unconstrained optimization problem:

$$\min_{\theta} \; L(\theta). \qquad (P)$$

Once an "optimal" network is obtained, the prediction is made by updating the last hidden state from $s_m$ to $s_{m+1}$ through the state network $b$, and the predicted output $x_{m+1}$ is obtained through the observation network as $a(s_{m+1})$.

The classical STNN model has several drawbacks. First, the spatial data $W_i$, $i \in [p]$, are introduced through a linear mapping $U$ that superposes the elements $s_t, W_1 s_t, \ldots, W_p s_t$. Such a simple superposition can hardly reflect the real (probably nonlinear) contribution of each element, which eventually limits the prediction accuracy and flexibility of the model. Second, there are no input data in STNN. Introducing input data $x_t$ at time $t$ may help to correct the hidden states and thus enhance the prediction accuracy. To overcome these drawbacks, we propose two improved architectures, namely STNN with Augmented Spatial States (STNN-A) and STNN with Input Gate (STNN-I).

STNN with Augmented Spatial States
We introduce augmented spatial states, using a separate spatial state for each spatial feature:

$$W(s_t) = (s_t, \; W_1 s_t, \; \ldots, \; W_p s_t).$$

The output of $W$ is used as the input of the state network $b$, which provides more flexibility for improving the accuracy of the state transition. From an algebraic point of view, both mappings $U$ and $W$ are linear. However, the mapping $U$ has the same input and output dimension, while the mapping $W$ has an augmented output dimension ($p + 1$ times larger than the input one).
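To make the difference between the two linear mappings concrete, the following NumPy sketch contrasts them on a toy hidden state. It is only an illustration under stated assumptions: the superposition $U$ is taken to be a plain unweighted sum, the hidden state is stored as an $n \times h$ array (the latent width `h` is hypothetical), each spatial matrix is $n \times n$, and the function names `superpose` and `augment` are not from the paper.

```python
import numpy as np

def superpose(s, spatial_mats):
    """U-type mapping (assumed unweighted): s + sum_i W_i s.

    The output has the same shape as s, so the individual
    contributions of s, W_1 s, ..., W_p s can no longer be separated.
    """
    return s + sum(W @ s for W in spatial_mats)

def augment(s, spatial_mats):
    """W-type mapping: concatenate [s, W_1 s, ..., W_p s] column-wise.

    The output is (p + 1) times wider than s, so the state network
    receives each spatial state separately.
    """
    return np.concatenate([s] + [W @ s for W in spatial_mats], axis=1)

# Toy example: n = 4 locations, latent width h = 3, p = 2 spatial features.
n, h = 4, 3
s_t = np.ones((n, h))
W_list = [np.eye(n), 2.0 * np.eye(n)]

u_out = superpose(s_t, W_list)   # shape (4, 3)
w_out = augment(s_t, W_list)     # shape (4, 9)
```

With the augmented form, any weighting or nonlinear mixing of the $p + 1$ blocks is left to the state network instead of being fixed in advance.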
This particular structure can accommodate richer spatial features, thereby improving the capacity of STNN to handle spatial data in a more flexible way. The improved STNN with Augmented Spatial States, namely the STNN-A model, is described in Figure 3. The loss function $L_A$ of the STNN-A model has a structure similar to that of $L$, and can thus be minimized in a similar way. The prediction of STNN-A proceeds as described for STNN.

STNN with Input Gate
In the STNN and STNN-A structures, the prediction for the next time $t + 1$ depends on the observation network, the state network and the current hidden state $s_t$. If these are not accurate, the error will be propagated and accumulated through the network, and will yield worse predictions over time. Therefore, we propose to introduce input data $x_{t-1}$ at time $t$ to adjust the hidden state $s_t$. To this end, the input observation $x_{t-1}$ is introduced through a fully-connected neural network, namely the input network $c$, whose output $c(x_{t-1})$ is coupled with the hidden state $s_t$ at each time $t$. The next hidden state $s_{t+1}$ is then adjusted by $c(x_{t-1})$. The improved STNN with Input Gate, namely the STNN-I model, is described in Figure 4. The loss function of the STNN-I model is defined similarly to those of the STNN and STNN-A models.

Figure 4. Structure of the STNN-I model.

Once the optimal network is obtained, the future hidden states $s_{m+t}$ are predicted recursively in the same way, with the input $x_{m+t-2}$. Note that for $t \geq 3$ there is no observation data yet, so we have to use the predicted result at time $m + t - 2$, i.e., $a(s_{m+t-2})$, as input data.

Training STNN models amounts to solving a large-scale nonlinear and nonconvex optimization problem of type (P), which is a challenging problem and NP-hard in general.
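The multi-step forecasting rule for STNN-I, including the switch from observed to predicted inputs at $t \geq 3$, can be sketched as below. This is a schematic rollout only: the callables `a`, `b` and `c` stand in for trained observation, state and input networks, and the two-argument call `b(s, gate)` is an assumed interface, since the exact way the gate output is coupled with the hidden state is not reproduced here.

```python
def rollout_stnn_i(a, b, c, s_m, x_hist, horizon):
    """Predict x_{m+1}, ..., x_{m+horizon} with an input-gated state update.

    a, b, c : callables (hypothetical stand-ins for trained networks)
    s_m     : last trained hidden state
    x_hist  : observations [x_1, ..., x_m], with m >= 2
    """
    if len(x_hist) < 2:
        raise ValueError("need at least two observations")
    xs = list(x_hist)        # xs[k - 1] holds x_k (observed or predicted)
    m = len(x_hist)
    s = s_m
    preds = []
    for t in range(1, horizon + 1):
        # s_{m+t} uses x_{m+t-2}: observed for t = 1, 2, predicted for t >= 3.
        x_in = xs[m + t - 3]
        s = b(s, c(x_in))    # assumed coupling of state and gate output
        x_pred = a(s)        # observation network maps state to prediction
        preds.append(x_pred)
        xs.append(x_pred)    # becomes the input two steps later
    return preds

# Scalar toy networks, only to exercise the indexing.
toy_preds = rollout_stnn_i(
    a=lambda s: s,
    b=lambda s, g: s + g,
    c=lambda x: 0.1 * x,
    s_m=2.0,
    x_hist=[1.0, 2.0],
    horizon=3,
)
```

Because predicted values are fed back as inputs from the third step on, errors can still accumulate over long horizons; the gate only anchors the first two steps to real observations.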
The most popular method in deep learning is the Stochastic Gradient Descent (SGD) algorithm and its improved variants (based on Nesterov acceleration, Polyak momentum, adaptive learning rates, sampling techniques, noise reduction etc.; see e.g. (Ruder, 2016; Bottou, 1998; Bottou & Bousquet, 2008; Goodfellow et al., 2016) for excellent presentations and theoretical analysis). Note that, due to the lack of necessary and sufficient optimality conditions for nonconvex optimization and the introduction of stochasticity, SGD and its variants can only be expected to find an $\varepsilon$-stationary point, i.e., a random vector $\theta^*$ for which

$$\mathbb{E}\left[\|\nabla L(\theta^*)\|\right] \leq \varepsilon.$$

The term "stochastic" in SGD refers to sampling one sample or a minibatch of samples and computing the gradient of the loss function defined on them, which gives a gradient estimator $\hat{g}$ of the true gradient $g$. The gradient estimator $\hat{g}$ is supposed to be unbiased, i.e., $\mathbb{E}[\hat{g}] = g$.

Next, we present SGD for training the classical STNN model. The variant STNN models can be trained in a similar way. Suppose that we have $m$ samples $\{x_1, \ldots, x_m\}$ in total, and let us choose a subset of $\tilde{m}$ samples $\{x_i\}_{i \in S}$, where $S$ is the index set of the samples with $S \subset [m - 1]$ and $|S| = \tilde{m}$. The loss function defined on $S$ is denoted by $L(\theta; S)$. The classical SGD algorithm for problem (P) is described as follows:

Algorithm 1 SGD for training STNN model
Input: Learning rates $\{\eta_k\}$; initial $\theta$ of STNN.
Output: Optimal $\theta$ of STNN.
Initialize $k \leftarrow 1$;
while stopping criterion not met do
    Take a minibatch of $\tilde{m}$ samples with indices in $S$;
    Compute gradient estimator: $\hat{g} \leftarrow \nabla_\theta L(\theta; S)$;
    Apply update: $\theta \leftarrow \theta - \eta_k \hat{g}$;
    $k \leftarrow k + 1$;
end while

The hyper-parameter $\eta_k$ in SGD is called the learning rate. It is necessary to gradually decrease the learning rate over time, since the SGD gradient estimator introduces a source of noise (the random sampling) that does not vanish even at a local minimum.
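The loop of Algorithm 1 can be illustrated on a toy problem. The sketch below is not the STNN training code: it applies the same three steps (sample a minibatch, estimate the gradient, take a step with learning rate $\eta_k$) to a one-dimensional least-squares objective chosen here purely for illustration, and the names `sgd`, `grad_fn` and `lr_schedule` are hypothetical.

```python
import numpy as np

def sgd(grad_fn, theta, n_samples, batch_size, lr_schedule, n_iters, seed=0):
    """Minibatch SGD: theta <- theta - eta_k * g_hat for k = 1, ..., n_iters."""
    rng = np.random.default_rng(seed)
    for k in range(1, n_iters + 1):
        # Draw a minibatch of indices S without replacement.
        S = rng.choice(n_samples, size=batch_size, replace=False)
        g_hat = grad_fn(theta, S)             # gradient estimator on S
        theta = theta - lr_schedule(k) * g_hat
    return theta

# Toy objective (an assumption for this demo): L(theta) = mean_i (theta - x_i)^2,
# whose minimizer is the sample mean of x.
x = np.linspace(0.0, 1.0, 50)

def toy_grad(theta, S):
    return 2.0 * (theta - x[S].mean())

theta_hat = sgd(toy_grad, theta=5.0, n_samples=len(x), batch_size=10,
                lr_schedule=lambda k: 0.5 / k, n_iters=200)
```

With the decreasing schedule $\eta_k = 0.5/k$ used in this toy run, each iterate is the running average of the minibatch means, so the sampling noise is averaged out over iterations.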
To guarantee the convergence of SGD, one may choose $\{\eta_k\}$ such that

$$\sum_{k=1}^{\infty} \eta_k = \infty \quad \text{and} \quad \sum_{k=1}^{\infty} \eta_k^2 < \infty.$$

In practice, we often use the following formulation to generate the sequence $\{\eta_k\}$:

$$\eta_k = (1 - \alpha_k)\, \eta_0 + \alpha_k\, \eta_\tau,$$

i.e., $\eta_k$ decays linearly from $\eta_0$ until iteration $\tau$ with $\alpha_k = k/\tau$, and is fixed to $\eta_k = \eta_\tau$ when $k \geq \tau$. Concerning the choice of the parameters $\tau$, $\eta_0$ and $\eta_\tau$, we usually set $\tau$ to the number of iterations required to make a few hundred passes through the training set; the final learning rate $\eta_\tau$ should be set to roughly 1% of the initial learning rate $\eta_0$. The choice of $\eta_0$, however, is delicate: it should be neither too large nor too small. If it is too large, the learning curve will show violent oscillations and cause severe instability; if it is too small, learning proceeds slowly and may become stuck at a high loss value. Typically, we set $\eta_0$ slightly greater than the learning rate that yields the best performance after the first 100 iterations.

The convergence rate of SGD depends on the optimization problem to be solved. It is often measured by the excess error, defined as $L(\theta^{(k)}) - \min_\theta L(\theta)$, where $\theta^{(k)}$ denotes the parameter $\theta$ at iteration $k$. The convergence rate of SGD is of order $O(1/\sqrt{k})$ for convex optimization problems, and $O(1/k)$ for strongly convex optimization problems. These bounds cannot be improved unless extra conditions are assumed. However, training an STNN model is highly nonconvex, and the study of the convergence rate of SGD for nonconvex optimization problems is still far from mature. Some recent works (Khaled & Richtárik, 2020; Lei et al., 2019; Gower et al., 2019) show that the optimal convergence rate for SGD to find $\varepsilon$-stationary points of nonconvex optimization problems under the expected smoothness assumption is $O(\varepsilon^{-4})$, and that the optimal rate $O(\varepsilon^{-1})$ is recovered if the Polyak-Łojasiewicz condition is satisfied.
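The linear decay schedule described in this section is short enough to write out directly. The sketch below is a plain-Python illustration; the function name and the example values of $\eta_0$, $\eta_\tau$ and $\tau$ are chosen for the demo only (with $\eta_\tau$ at 1% of $\eta_0$, following the rule of thumb in the text).

```python
def linear_decay_lr(k, eta0, eta_tau, tau):
    """Learning rate at iteration k: linear decay until tau, then constant."""
    if k >= tau:
        return eta_tau
    alpha = k / tau
    return (1.0 - alpha) * eta0 + alpha * eta_tau

# Example values (assumptions for the demo).
eta0, eta_tau, tau = 0.1, 0.001, 100
schedule = [linear_decay_lr(k, eta0, eta_tau, tau) for k in range(0, 201)]
```

Note that a schedule held constant after $\tau$ no longer satisfies the square-summability condition above; it is the practical compromise the text describes, not a convergence guarantee.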
Note that a training algorithm with a super-linear convergence rate is in general not expected in machine learning, as (Bottou & Bousquet, 2008) argue that too fast a convergence presumably corresponds to overfitting.

We have also tried some improved variants of SGD for training STNN models, including Adam (Kingma & Ba, 2014), AdaGrad (Duchi et al., 2011) and RMSProp (Tieleman & Hinton, 2012), among which Adam seems to outperform the others. Adam is a variant of SGD based on adaptive estimates of lower-order moments and an adaptive learning rate. It includes bias corrections to the estimates of both the first and second order moments. The method is claimed to be straightforward to implement, computationally efficient, light on memory, well suited for problems that are large in terms of data and parameters, and appropriate for non-stationary objectives and problems with very noisy or sparse gradients. The Adam algorithm for training the STNN model is described in Algorithm 2, where $\circ$ denotes the Hadamard product (i.e., element-wise product); the square root and the division in the update are also taken element-wise. The convergence analysis of Adam can be found in (Reddi et al., 2019).

Algorithm 2 Adam for training STNN model
Input: Learning rate $\eta$ (= 0.001), exponential decay rates for moment estimates $\rho_1$ (= 0.9) and $\rho_2$ (= 0.999), $\delta$ (= $10^{-8}$), and initial $\theta$ of STNN.
Output: Optimal $\theta$ of STNN.
Initialize $k \leftarrow 1$;
Initialize 1st and 2nd moments $u = 0$, $v = 0$;
while stopping criterion not met do
    Take a minibatch of $\tilde{m}$ samples with indices in $S$;
    Compute gradient estimator: $\hat{g} \leftarrow \nabla_\theta L(\theta; S)$;
    Update biased 1st moment: $u \leftarrow \rho_1 u + (1 - \rho_1)\, \hat{g}$;
    Update biased 2nd moment: $v \leftarrow \rho_2 v + (1 - \rho_2)\, \hat{g} \circ \hat{g}$;
    Correct bias in 1st moment: $\hat{u} \leftarrow u / (1 - \rho_1^k)$;
    Correct bias in 2nd moment: $\hat{v} \leftarrow v / (1 - \rho_2^k)$;
    Apply update: $\theta \leftarrow \theta - \eta\, \hat{u} / (\sqrt{\hat{v}} + \delta)$;
    $k \leftarrow k + 1$;
end while

We are going to compare STNN models with several classical prediction models, including the fully-connected neural network, the recurrent neural network, curve fitting models and the SEIR dynamical system model. Note that these models take temporal data only. We briefly introduce these models and explain how to use them for epidemic prediction.
The Fully-connected Neural Network, also called Back Propagation Neural Network (BPNN), is a popular supervised learning model, which is widely used due to its simple structure and strong fitting ability. BPNN is often an important component of other complex neural network architectures such as CNN, ResNet and GAN, as well as our STNN. A classical BPNN contains three parts: an input layer, hidden layers and an output layer. Every layer consists of neurons associated with weights, biases and activation functions. The structure of the classical BPNN was previously illustrated in Figure 2. We can use SGD and its variants to train a BPNN model for epidemic prediction. The training data is a set of temporal input-output pairs $\{(x_t, y_t)\}_t$, where $x_t$ is a vector of inputs at time $t$ (e.g., the time step $t$) and $y_t$ is a vector of outputs at time $t$ (e.g., the numbers of infections, deaths and recoveries at $t$). A well-trained BPNN receives a future input $x_t$ and outputs a prediction $y_t$.

There are two commonly used Recurrent Neural Network (RNN) architectures, namely LSTM and GRU. LSTM (Hochreiter & Schmidhuber, 1997) is suitable for processing and predicting events with long intervals and delays in time series. The network can choose whether to memorize or delete relevant information through a structure called a "gate" (there are 3 gates: forget gate, input gate and output gate). The spread of an epidemic is strongly related to the situation in a certain period, while its relation with other periods is much weaker, so LSTM should be appropriate for predicting the spread of COVID-19. GRU (Cho et al., 2014) is a simplified LSTM with only two gates (called update gate and reset gate), whose structure is shown in Figure 5. It is thus claimed to be easier to train than LSTM while achieving the same function. For these reasons, GRU is now a very popular and widely used RNN.

Figure 5. Structure of GRU Cell.
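For readers without access to the figure, the two-gate structure of a GRU cell can be written out in a few lines of NumPy. This is a generic textbook-style sketch, not the Keras implementation used in the experiments: the weight names, the small random initialization, and the convention $h' = z \circ h + (1 - z) \circ \tilde{h}$ for mixing the old state with the candidate are assumptions of this example (some references swap the roles of $z$ and $1 - z$).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h, params):
    """One GRU step for an input x of shape (d_in,) and a state h of shape (d_h,)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
    return z * h + (1.0 - z) * h_tilde              # gated mix of old and new

def init_params(d_in, d_h, seed=0):
    """Small random weights and zero biases for the two gates and the candidate."""
    rng = np.random.default_rng(seed)
    params = []
    for _ in range(3):
        params += [rng.normal(scale=0.1, size=(d_h, d_in)),
                   rng.normal(scale=0.1, size=(d_h, d_h)),
                   np.zeros(d_h)]
    return params

# Run the cell over a short toy sequence (5 time steps, 3 features).
params = init_params(d_in=3, d_h=4)
h = np.zeros(4)
for x_t in np.linspace(0.0, 1.0, 15).reshape(5, 3):
    h = gru_cell(x_t, h, params)
```

The update gate decides how much of the previous state is kept, and the reset gate decides how much of it enters the candidate state, which is the sense in which GRU merges LSTM's three gates into two.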
We select a sequence of data from the past $k$ time steps as the input of the RNN, whose output is also a sequence of the same length. The output sequence is then passed into a two-layer fully-connected network (with only input and output layers) to obtain the prediction for the next time step. This procedure is shown in Figure 6. Both LSTM and GRU can be trained using SGD and its variants.

Curve fitting techniques are classical prediction approaches, which aim at constructing curves, or mathematical functions, that best fit a series of data points. Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a smooth function is constructed to approximate the data. Fitted curves can be used for data visualization, and to infer values of a function where no data are available, i.e., extrapolation. Therefore, we can use curves fitted to the observed data for epidemic prediction. Strictly speaking, the supervised deep learning models BPNN and RNN are also curve fitting models, which use neural network structures (as parametric composite functions) to fit observation data. In this section, we focus on some other types of classical curve fitting models for epidemic prediction, which can be found in many textbooks of numerical analysis (e.g., (Arlinghaus, 1994)) and curve fitting packages (e.g., MATLAB Curve Fitting Toolbox).

Exponential Model
The multi-term exponential model is described by

$$y = \sum_{i} a_i \, e^{b_i x},$$

where $a_i, b_i \in \mathbb{R}$ are model parameters, $x \in \mathbb{R}$ is the input, and $y \in \mathbb{R}$ is the output. The exponential model is often used when the rate of change of a quantity is proportional to the current amount of the quantity. If the coefficient $b_i$ is negative, the corresponding term represents exponential decay; otherwise, it represents exponential growth. Thus, the exponential model is particularly useful for prediction in periods where the number of infections is monotonically increasing or decreasing.
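As a small worked example of exponential fitting, the single-term case $y = a e^{bx}$ with positive data can be fitted by ordinary linear least squares on $\log y$, since $\log y = \log a + b x$. The sketch below uses NumPy rather than the MATLAB toolbox mentioned in the text, handles one term only, and the function name is hypothetical; multi-term fits generally need a nonlinear least-squares solver.

```python
import numpy as np

def fit_single_exponential(x, y):
    """Fit y ~ a * exp(b * x) via a straight-line fit to log(y).

    Requires y > 0. Returns (a, b).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log-linear fit requires strictly positive y")
    b, log_a = np.polyfit(x, np.log(y), 1)   # slope, intercept
    return float(np.exp(log_a)), float(b)

# Synthetic, noise-free growth data (illustrative values only).
days = np.arange(10)
cases = 2.0 * np.exp(0.3 * days)
a_hat, b_hat = fit_single_exponential(days, cases)
next_day_forecast = a_hat * np.exp(b_hat * 10)
```

A positive fitted `b_hat` corresponds to growth and a negative one to decay; fitting in log space weights relative rather than absolute errors, which is a modeling choice of this sketch.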
Gaussian Model
The Gaussian model is often used to fit peaks, and is defined as a sum of Gaussian functions:

$$y = \sum_{i=1}^{k} a_i \exp\left(-\left(\frac{x - b_i}{c_i}\right)^2\right),$$

where $a_i$ is the amplitude, $b_i$ is the centroid, $c_i$ is related to the peak width, and $k$ is the number of peaks to fit. Gaussian fitting is particularly useful for predicting peaks of the infections.

Polynomial Model
The polynomial model for curve fitting is given by

$$y = \sum_{i=1}^{n+1} p_i \, x^{\,n+1-i},$$

where $n$ is the degree of the polynomial. The polynomial model is often used when a simple empirical model is required. It can be used for interpolation or extrapolation, or to characterize data using a global fit. In pandemic prediction, the polynomial model can be used at any period of the epidemic. The main advantages of polynomial fits are reasonable flexibility for data that is not too complicated, and linearity in the coefficients, which means that the fitting process is simple. The main disadvantage is that high-degree fits can become unstable. Additionally, polynomials of any degree can provide a good fit within the data range, but can diverge wildly outside that range. Therefore, caution is needed when extrapolating with polynomials.

SEIR Model
Differential equations are classical approaches to studying epidemic dynamics evolving over time. R. Ross seems to have been the first to establish a mathematical differential equation to study the dynamical transmission of disease (Ross, 1911); Kermack and McKendrick then established the compartmental SIR and SIS models (Kermack & McKendrick, 1927; 1932), based on which various improved dynamical transmission models of infectious diseases have been developed, such as the SIRS model, the SEIR model and the SEIRS model, see e.g., (Bartlett, 1949; Bailey et al., 1975; Anderson & May, 1979; Beretta et al., 2001). According to the transmission characteristics of COVID-19, patients have an incubation period, and acquire antibodies for a period after recovery. Therefore, among these dynamical transmission models, the SEIR model seems to be the most suitable one for COVID-19 prediction.
The SEIR model divides persons into four categories: susceptible (S), exposed (E), infectious (I) and removed (R). The dynamics between them can be simply described as follows:

$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dE}{dt} = \frac{\beta S I}{N} - \delta E, \qquad \frac{dI}{dt} = \delta E - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$

where $S(t)$, $E(t)$, $I(t)$, $R(t)$ are the numbers of susceptible, exposed, infectious and removed persons at time $t$; $N$ is the total population of the area; the parameter $\beta$ is the effective contact rate, $\delta$ is the rate at which exposed persons become confirmed infections, and $\gamma$ is the removal rate of infected persons. These parameters can be estimated using the least squares method, and updated for different periods to improve the fitting accuracy and obtain better estimations.

In this section, we report our numerical results for forecasting the trend of COVID-19 in several countries, including China, the USA and Italy. Our STNN models are implemented in Python, the compared RNN (typically the GRU model is chosen instead of the LSTM model) comes from Keras, and the codes for the other compared prediction models (curve fitting models, BPNN, and SEIR) are all developed in MATLAB. Numerical simulations are performed on the supercomputer π 2.0 at the HPC center of Shanghai Jiao Tong University, equipped with 4 GPUs (Tesla V100-SXM3) and 20 CPUs (Intel Xeon Gold 6248 CPU@2.50GHz) for neural network training.

Data Sources
We collect the Chinese provincial data from the National Health Commission of the People's Republic of China, the USA data from Johns Hopkins University, and the Italian data from the Health Ministry. These data are all time series, consisting of the cumulative confirmed cases, deaths and recoveries. In addition to the reported numbers of patients, we collect some temporal and spatial data related to the spread of the disease. The daily migration data between provinces of China are collected from Baidu Map. Some hospital information (e.g., the number of hospitals and fever clinics) is obtained from CSMAR (short for China Stock Market & Accounting Research Database).
The population of each province is obtained from the National Bureau of Statistics. Some meteorological data, including temperature, humidity and air quality index (AQI), are obtained from the data service platform Nowapi. A correlation analysis using provincial data (except Hubei) from January to March 2020 is reported in Table 1. It was found that the population had the greatest impact on the epidemic situation, followed by the number of hospitals, humidity, immigration scale index, temperature, emigration scale index and then air quality index. The factors with major impact are considered in the STNN models.

Fitting and predicting active cases for COVID-19
First, we report fitting and prediction results for daily active cases in China, because the data cover a complete period from the early outbreak to the rapid control. The provincial data from January to March 2020 are used to test the different models. The reason for choosing this period is that the pandemic situation varied greatly within it: the epidemic broke out in January, peaked in February, and declined quickly in March. The spatial data for the STNN model include the adjacency matrix of geographic locations, the average population migrations, and the distance matrix among 33 provinces. The temporal data is divided into 2 categories: a training set (90%) and a validation set (10%). The compared BPNN, GRU, curve fitting and SEIR models are fed with temporal data only. The STNN and GRU models are trained for up to 10,000 epochs using the Adam algorithm, where the observation, state and input networks have 1-3 layers with 10-30 neurons in each. $\ell_2$ regularization can be used to avoid overfitting. The BPNN model is a 3-layer network with 5 neurons in the hidden layer and tanh as activation function; it is trained for up to 1000 epochs, 10 times, using SGD with random initialization, and the run with the best performance is used. We apply the root mean square error (namely, RMSE) to measure the training and prediction errors for all tested models.
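For reference, the RMSE used as the error measure, together with a 90%/10% split of a time series, can be sketched as follows. The split shown here is chronological (the first 90% of time steps for training, the rest held out); that ordering is an assumption of this example, since the text specifies only the proportions, and both function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two arrays of the same shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def chronological_split(series, train_frac=0.9):
    """Split a time-ordered sequence into (training part, held-out part)."""
    cut = int(round(len(series) * train_frac))
    return series[:cut], series[cut:]

# Toy check with made-up numbers.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # sqrt(4 / 3)
train, held_out = chronological_split(list(range(10)))
```

Because RMSE is expressed in the same unit as the data (numbers of cases), values are comparable across models on the same series but not across regions with very different case counts.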
The fitting and prediction results using COVID-19 data in China are summarized in Table 2 and illustrated in Figure 7. Concerning global outbreaks outside of China, we choose two typical countries: the USA and Italy. The fitting and prediction results are reported in Figure 8; the training and testing errors are summarized in Table 3. The first 300 days of spatial and temporal data from 22 January to 31 December of 2020 are used to predict the peak of the next 15 days in Italy and the trend of continued growth over the next 30 days in the USA.

Comments on numerical results
Numerical results demonstrate that the STNN-A, STNN-I and BPNN models often provide the best fitting and prediction results (as shown in Figures 7 and 8 and in Tables 2 and 3, with boldface for the smallest RMSE and underline for the largest RMSE). The classical STNN, GRU, SEIR and polynomial models often perform well for both fitting and prediction of data in China and other countries, and are only slightly weaker in RMSE than the improved STNN models and the BPNN model. The exponential model seems to perform poorly for both fitting and prediction, even for monotone cases, and is not suitable for fitting peaks, while the BPNN, GRU, STNN and Gaussian models perform quite well in predicting peaks. The major drawback of the SEIR model is its high dependence on the estimation of the model parameters; the Gaussian model performs well for monotone fitting and short-period monotone prediction, but is not very satisfactory for long-period monotone prediction; the polynomial model works well for fitting within the data range, but diverges wildly for prediction beyond that range. As for our STNN models, the main advantage is the compatibility and flexibility of using both temporal and spatial data to increase the accuracy (with small RMSE) of fitting and prediction, while the main drawback is precisely the requirement of many types of (temporal and spatial) data, whose collection could be difficult and cumbersome.
In addition, training deep neural networks (e.g., STNN and RNN) consumes much more time and computing resources than the other models. Many deep learning models also need large amounts of training data; in the current era of big data, however, collecting such data is not too difficult. The increasing GPU computing power also makes it possible to train deep neural networks. Therefore, we believe that STNN should be a promising deep learning model for many potential applications involving spatial and temporal data.

References
Population biology of infectious diseases: Part I
Practical handbook of curve fitting
Predicting COVID-19 incidence through analysis of Google Trends data in Iran: data mining and deep learning pilot study
The mathematical theory of infectious diseases and its applications. Charles Griffin & Company Ltd, 5a Crendon Street, High Wycombe, Bucks HP13 6LE
Some evolutionary stochastic processes
Global asymptotic stability of an SIR epidemic model with distributed time delay
Online algorithms and stochastic approximations
The tradeoffs of large scale learning
Time series forecasting of COVID-19 transmission in Canada using LSTM networks
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak
Learning phrase representations using RNN encoder-decoder for statistical machine translation
Adaptive subgradient methods for online learning and stochastic optimization
Analysis and forecast of COVID-19 spreading in China, Italy and France
Modelling the COVID-19 epidemic and implementation of populationwide interventions in Italy
General analysis and improved rates
Clinical characteristics of coronavirus disease 2019 in China
Long short-term memory
Finding an accurate early forecasting model from small dataset: A case of 2019-nCoV novel coronavirus outbreak
Containing papers of a mathematical and physical character
Contributions to the mathematical theory of epidemics. II. The problem of endemicity
Better theory for SGD in the nonconvex world
A method for stochastic optimization
New variant of SARS-CoV-2 in UK causes surge of COVID-19. The Lancet Respiratory Medicine
The effect of human mobility and control measures on the COVID-19 epidemic in China
Stochastic gradient descent for nonconvex learning without bounded gradient assumptions
Nowcasting and forecasting the Wuhan 2019-nCoV outbreak. Preprint published by the School of Public
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
A recurrent neural network and differential equation based spatiotemporal infectious disease model with application to COVID-19
A conceptual model for the outbreak of coronavirus disease 2019 (COVID-19) in Wuhan, China with individual reaction and governmental action
Transmission potential of the novel coronavirus (COVID-19) onboard the Diamond Princess cruises ship
WHO MERS global summary and assessment of risk
Importation and human-to-human transmission of a novel coronavirus in Vietnam
On the convergence of Adam and beyond
Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling
The prevention of malaria
An overview of gradient descent optimization algorithms
Modeling and forecasting the COVID-19 pandemic in India
Predictions for COVID-19 with deep learning models of LSTM, GRU and BiLSTM
Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Association between ambient temperature and COVID-19 infection in 122 cities from China
Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
Spatiotemporal neural networks for space-time series forecasting and relations discovery

This project is partially supported by the special grant for Science and Technology Innovation at Shanghai Jiao Tong University "Spatio-Temporal Deep Neural Network for Trend Prediction of the New Coronavirus COVID-19 and Countermeasures Researches" (Grant 2020RK10), 2020, and by the National Natural Science Foundation of China (Grant 11601327).