key: cord-0739857-0866gbd1 authors: Pham, Phu; Pedrycz, Witold; Vo, Bay title: Dual attention-based sequential auto-encoder for Covid-19 outbreak forecasting: A case study in Vietnam date: 2022-10-01 journal: Expert Syst Appl DOI: 10.1016/j.eswa.2022.117514 sha: 12d00f7ed223944db71131df262d9e273f479f39 doc_id: 739857 cord_uid: 0866gbd1

To prevent Covid-19 outbreaks, organizations and governments in many countries have extensively studied and applied various quarantine and isolation policies and medical treatments, and have organized mass, rapid vaccination campaigns for citizens over 18. Several valuable lessons have been learned in different countries during this Covid-19 battle. These studies have shown the usefulness of prompt action in testing and isolating confirmed infectious cases from the community, as well as in planning and optimizing social resources through data-driven anticipation. Recently, many studies have demonstrated the effectiveness of short- and long-term forecasting of the number of new Covid-19 cases from time-series data. Such predictions directly support the optimization of available healthcare resources and the imposition of suitable policies for slowing down the spread of Covid-19, especially in highly populated cities, regions and nations. Advances in deep neural architectures, such as the recurrent neural network (RNN), have brought significant improvements in analyzing and learning time-series data for better prediction. However, most recent RNN-based techniques are considered unable to handle chaotic, non-smooth sequential datasets. The consecutive disturbances and lagged observations in chaotic time series, such as daily confirmed Covid-19 cases, degrade the temporal feature learning of recent RNN-based models. To meet this challenge, this paper proposes a novel dual attention-based sequential auto-encoding architecture, called DAttAE. The proposed model effectively learns and predicts new Covid-19 cases from chaotic, non-smooth time-series data. Specifically, the integration of a dual self-attention mechanism into a Bi-LSTM based auto-encoder lets the model focus directly on specific time ranges of a sequence in order to achieve better predictions. We evaluated the performance of the proposed DAttAE model by comparing it with multiple traditional and state-of-the-art deep learning based techniques for the time-series prediction task on different real-world datasets. The experimental results demonstrate the effectiveness of the proposed attention-based deep neural approach in comparison with state-of-the-art RNN architectures for the time-series based Covid-19 outbreak prediction task.

At the end of 2019, the rapid worldwide spread of the novel coronavirus pandemic, Covid-19, put tremendous pressure on many aspects of society. The pandemic has also posed major challenges for researchers in various disciplines (Khan et al., 2021; Kucharski et al., 2020; Liu, Magal, Seydi, & Webb, 2020; Wu, Leung, & Leung, 2020). Recently, the appearance of Covid-19's delta variant (Li, Lou, & Fan, 2021) has dramatically increased the number of infected patients who need intensive medical treatment.
These public-health disasters have placed a severe burden on hospitals, even in countries with highly developed public healthcare systems and infrastructure (Micah et al., 2021). In fact, this variant of Covid-19 is considered the most dangerous one to date. It has triggered a new pandemic wave across the world at an unimaginable speed. Case studies in many countries have shown a tremendous growth in healthcare requirements for Covid-19 patients. This is the result of the fast person-to-person transmission (Fidan & Yuksel, 2021; Pitchaimani & Devi, 2021) of the Covid-19 delta variant, together with the shortage of suitable strategies for preventing the virus from spreading from groups of confirmed Covid-19 cases to healthy individuals. In addition, the shortage of proper data modelling and short/long-term Covid-19 outbreak forecasting solutions has made it difficult for governments to effectively manage and plan for social resource optimization (Nascimento et al., 2021). An accurate pandemic forecasting mechanism also helps governments impose suitable policies that simultaneously deal with the expansion of Covid-19 and ensure social and economic stability (Miao, Last, & Litvak, 2022). Given the severe influence of this pandemic on multiple aspects of society, it is necessary to build data analysis systems that capture its spreading temporal patterns. Such systems can explicitly support governments in planning both economic recovery and the renormalization of daily life. Therefore, it is needless to say that precise forecasting of the number of new Covid-19 cases is an important problem. The prediction results can be used to facilitate the planning of available resources in the public healthcare system. Moreover, in the long run, such forecasting helps to efficiently optimize the social management strategies and policies for preventing the spread of Covid-19 and for treating infected patients. To deal with the pandemic outbreak prediction problem, researchers have proposed various statistical/mathematical and machine learning (ML) based approaches as data-driven predictive solutions (Khan et al., 2021) for fighting the rapid escalation of newly confirmed Covid-19 cases. Since the medical reports of confirmed infectious cases are collected and stored as time-series data, most predictive models are designed to learn and capture the temporal and sequential patterns of Covid-19 outbreaks. In most countries, Covid-19 outbreaks frequently occurred at different spreading levels that are tightly tied to specific time-dependent periods. Moreover, recent studies also show that the spreading level of this pandemic depends on changes in natural factors such as temperature, seasonal weather, etc. (Chin et al., 2020; Zhou, Gao, Xie, & Xu, 2020). The fluctuation patterns of these outbreaks are therefore naturally non-linear and dynamic, depending mainly on multiple natural and non-natural factors. Moreover, traditional linear and statistical prediction approaches are considered unable to capture non-linear temporal information from the reported daily confirmed Covid-19 cases. This type of dataset may contain a high level of chaotic observations, noise and disturbances due to the influence of multiple internal and external factors.
In general, most classical linear time-series prediction models rely heavily on the regression paradigm and cannot consider non-linear data patterns. Therefore, for the Covid-19 outbreak prediction problem, they may fail entirely to capture the dynamics of Covid-19 transmission through the community over different time-dependent periods. Moreover, from the perspective of real-world applications, a Covid-19 pandemic prediction system should be able to continuously learn from historical data in order to obtain long-range temporal features and achieve better, more accurate predictions. In the past, multiple statistical models, such as Auto Regressive (AR), Moving Average (MA), Auto Regressive Integrated Moving Average (ARIMA), the Nonlinear Auto-regression Neural Network (NARNN), etc., were widely applied to capture the linear patterns of daily confirmed infectious case data in order to conduct forecasting. However, these models are still far from sufficient for preserving the real-time and temporal information in reported daily Covid-19 data. Recently, several studies have presented the efficiency of applying the ARIMA model to the Covid-19 pandemic outbreak prediction problem (Katris, 2021). In this approach, the reported Covid-19 data is modelled as a time-series pattern in order to extract useful information for predicting new Covid-19 infections. In very recent years, the integrated ARIMA-based approach has been widely applied as a predictive model on different time-series datasets. ARIMA and its variants are widely used because of their flexibility in analyzing temporal information from a given time series through different statistical order parameters. Recent works have demonstrated the success of ARIMA-integrated predictive systems in handling the prediction task for multiple types of infectious diseases (Cao et al., 2020; He & Tao, 2018; Liu, Liu, Jiang, & Yang, 2011), including the Covid-19 pandemic (Benvenuto, Giovanetti, Vassallo, Angeletti, & Ciccozzi, 2020; Dehesh, Mardani-Fard, & Dehesh, 2020). However, most ARIMA-based predictive techniques are still unable to preserve temporal information or to perform the non-linear regression needed for the complex and highly noisy daily reported Covid-19 data. This type of infectious-case data is heavily influenced by various external factors. To deal with non-linear and complex sequential pattern learning, several studies (Ceylan, 2021; Wu et al., 2015; Yu et al., 2014) have applied the NARNN-based approach to enable predictive systems to perform non-linear, time-series based data analysis. In this approach, the temporal information of the input time-dependent observations is efficiently preserved through a neural network learning paradigm. In general, NARNN is a neural network based approach that obtains temporal information from time-series data by using different sequences of a given dataset and training them with corresponding parameter weights through the back-propagation learning procedure. Recent enhancements (Ceylan, 2021; Hansun et al., 2021) integrating NARNN into predictive systems have demonstrated remarkable performance on the short-term Covid-19 outbreak forecasting problem.
The success of NARNN on complex time-series prediction tasks clearly indicates that deep neural networks are the key factor for achieving better performance in temporal/sequential data representation learning and prediction. Recently, there have been notable achievements of deep learning in multiple disciplines of computer science, such as natural language processing (NLP), computer vision (CV), etc. These advances have shed light on reaching a higher level of sequential/temporal data representation learning as well as time-series prediction (Lim & Zohren, 2021). Among advanced deep neural architectures, recurrent neural networks (RNN), such as the Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), etc., have played an indispensable role in time-series data modelling and representation learning. Among RNN-based architectures, GRU and LSTM are the most popular. These architectures have been widely used in different types of time-series modelling and fine-tuning processes for effectively handling prediction problems. Specifically, GRU and LSTM can effectively learn and capture long-range temporal information in the form of hidden states generated by sequence-connected neural state cells. Several recent studies (Chimmula & Zhang, 2020; Zeroual, Harrou, Dairi, & Sun, 2020) have shown that LSTM-based neural architectures outperform other methods on the time-series based prediction of new Covid-19 cases. The LSTM model enables temporal information representation learning and time-dependent training for data prediction. However, most recent RNN-based time-series predictive models still suffer from several major limitations. These limitations mainly concern handling highly noisy, non-smooth sequential data and long-range dependencies. The reported Covid-19 cases sometimes form a cumulative sequence with multiple upward/downward trends, producing a non-smooth sequential data pattern. This challenges the capability of the LSTM to preserve these disturbances coherently and make correct predictions. Moreover, most RNN-based architectures such as GRU/LSTM are unable to learn and properly interpret long input sequences due to vanishing/exploding gradients during the back-propagation learning process. In other words, LSTM-based predictive models tend to focus on and remember their last processed sequences rather than preserve the long-range dependencies across all input sequences. Therefore, they may fail to sufficiently capture the trends of Covid-19 outbreaks over long-range periods. Inspired by recent literature and proposed Covid-19 predictive models (Chimmula & Zhang, 2020; Hu, Ge, Li, & Xiongd, 2020; Zeroual et al., 2020), in this paper we propose an enhanced sequential auto-encoding architecture with a dual self-attention mechanism, called DAttAE. The proposed DAttAE efficiently improves the performance of the temporal information representation learning process. 1.4.1. The utilization of a sequential auto-encoding (AE) mechanism. In this study, to achieve better performance on the Covid-19 case prediction problem, we utilize a sequential (Bi-LSTM based) auto-encoder to encode and generate rich-semantic temporal representations of historical observations.
The learnt rich temporal information from the sequences is later used to assist the fine-tuning process for the Covid-19 trend forecasting problem. In general, the application of an AE enables learning a latent representation of the data that is split into different neural components (encoding/decoding). This allows us to efficiently disentangle the latent time-dependent features from the given set of historical observations under evaluation. Specifically, our proposed DAttAE model is designed upon the well-known sequence-to-sequence (seq2seq) architecture (Sutskever, Vinyals, & Le, 2014). To deal with noise and lagged observations in a time-series dataset, it is integrated with the attention mechanism (Bahdanau, Cho, & Bengio, 2015; Vaswani, 2017), which is mainly inspired by the NLP domain. In general, the historical confirmed Covid-19 case data, in the form of a time series, is passed through an auto-encoding architecture. The sequential AE is composed of two separate Bi-LSTM based encoder and decoder components, forming a sequential self-learning approach (as illustrated in the main block of Fig. 1-A). Within the proposed AE architecture, we first obtain, through the encoding component, the concatenated (backward/forward) hidden states of the input sequences, which represent the reported numbers of Covid-19 cases within a specific time-dependent range. Then, the encoder's outputs are fed into another Bi-LSTM architecture to simultaneously interpret and produce the transformed hidden states of the original input sequences. Next, both the learnt hidden states of the encoding and decoding components of the AE are passed through separate attention layers to generate combined attention weights (as illustrated in Fig. 1-B). The ultimate purpose of applying the dual self-attention filtering mechanism to the outputs of the Bi-LSTM based AE architecture is to estimate the importance levels of all data entries in each input sequence, which directly enhances the prediction fine-tuning process in the end. Finally, the concatenated attention weights of both encoder and decoder are fed into a fully-connected layer to predict the number of Covid-19 cases. In general, the Covid-19 case forecasting task can be formulated as a non-smooth, highly noisy time-series data analysis problem in which the daily reported Covid-19 cases fluctuate abnormally over time. The rapid and abnormal changes in the number of infectious cases come from different sources, including the impacts of environmental/geographical factors, the inconsistency of the data reported by different organizations and governments, etc. Thus, recent regression-based as well as modern RNN-based predictive models may not be powerful enough to effectively analyze and reduce the noise and temporal disturbances during the representation learning and prediction processes. To deal with this problem, our approach adapts the neural attention mechanism to achieve a noise-reduced sequential representation of a given input sequence. By doing this, the importance levels of all daily reported Covid-19 data entries in the given sequence are taken into consideration during the embedding process through the Bi-LSTM based auto-encoder, which ultimately delivers better prediction results.
To demonstrate the effectiveness of our findings, we tested our proposed method against other baselines on real-world Covid-19 datasets and achieved superior results for both the Covid-19 case prediction and the risk-zone classification problems (as later described in sub-section 2.1). In general, the research objectives, contributions and novelty of our proposed DAttAE model can be briefly summarized as follows: • First, we present the utilization of a Bi-LSTM based auto-encoding architecture for self-supervised representation learning. It effectively preserves the dynamic and temporal information of a given time-series dataset. The ultimate purpose of using a sequential auto-encoding mechanism is to simultaneously facilitate the data embedding and interpretation processes so as to deal with the challenges of non-smooth, chaotic sequences that may contain many lagged observations, noise and temporal disturbances, such as the daily reported numbers of Covid-19 infectious cases. • Next, we integrate the proposed Bi-LSTM based auto-encoding mechanism with a custom dual self-attention mechanism (as illustrated in Fig. 1-B). It softly produces smooth, fluent representations of the input sequences. These rich-semantic representations of the historical observations are later fine-tuned to achieve better prediction results through a task-driven fully-connected layer at the end. The ultimate purpose of integrating an attention mechanism into the sequential AE architecture is to automatically evaluate the importance of all consecutive data entries in the non-smooth input sequences, which in turn directly improves the overall accuracy of the temporal representation learning process. For many years, researchers have tried different approaches for effectively handling and preserving the temporal dynamic features of time-series datasets in order to produce better predictions. However, most traditional approaches, such as the popular auto-regression based techniques (ARMA, ARIMA, etc.), naïve Bayesian probabilistic additive decomposition techniques for time series, etc., have failed to sufficiently capture the long-range, time-dependent relationships between consecutive observation entries. Moreover, within the short/long-term deep learning based time-series prediction approach, multiple complex recurrent neural network architectures have been utilized to handle representation learning over long-range historical observations. These RNN-based models have helped capture the rich-semantic, time-dependent features of time-series datasets and yielded significant improvements. However, these deep learning based predictive models usually produce inaccurate predictions when dealing with highly noisy, chaotic sequential datasets with unpredictable growth patterns, such as the daily reported Covid-19 case dataset. Recently, a few works have concentrated on utilizing the attention mechanism within the sequential auto-encoding paradigm. Neural attention based mechanisms combined with fully-connected components have effectively helped capture the short-term growth patterns of disturbed, chaotic time-series datasets.
Largely inspired by previous works on the case study of Covid-19 outbreak prediction, we dedicated our efforts to discovering how the integration of dual self-attention neural architectures can efficiently support a sequential AE-based architecture. Our proposed AE with a dual self-attention mechanism can efficiently handle non-smooth sequences and improve the performance of the time-series prediction problem, specifically for the daily reported confirmed Covid-19 cases in Vietnam. In recent years, the Covid-19 pandemic and its associated aspects have become a top-priority research topic for researchers worldwide. To prevent the rapid spread of this pandemic, early infectious case forecasting is a crucial task that can facilitate management activities in different social sectors, especially public administration and the healthcare system. Accurate Covid-19 case forecasts can directly support the optimization of multi-sector social resources for preventing the spread of this pandemic as well as planning quarantine and isolation strategies. Moreover, due to the severe impact of Covid-19 on human health, effective anticipation and distribution of public healthcare resources are extremely important to provide fast and proper treatment for patients with severe Covid-19 symptoms. Therefore, within the computer science domain, various studies in the literature have used different mathematical models and deep learning based paradigms to predict the spread of Covid-19. Typically, daily Covid-19 infectious cases are reported in the form of complex, highly fluctuating time-series datasets. Therefore, to effectively analyze and learn the dynamic feature representations of these datasets, different RNN-based models have been applied in this research direction; however, they are still considered insufficient for coping with abnormal or lagged observations and conveying better prediction results. In our work, we propose a novel Bi-LSTM based auto-encoding architecture integrated with a dual attention mechanism to effectively deal with the complex, non-smooth and abnormal fluctuations of daily reported Covid-19 infectious cases, and thus deliver more accurate predictions. Accurate forecasts of the number of new Covid-19 cases and their growth pattern are considered significant for management implications. To sum up, the remaining content of this paper is organized into four sections. In the second section, we briefly review recent works related to our study; we also discuss the pros and cons of each technique, which serve as the motivation for the proposed DAttAE model. Next, in the third section, we formally present the associated background concepts, the methodology and the detailed implementation of the proposed DAttAE model. In the following section, we present extensive comparative studies on the Covid-19 outbreak prediction task using different deep neural architectures on real-world datasets, and discuss the experimental results. Finally, we conclude the findings and highlight some promising directions for future improvement. For further reference, we list all common abbreviations and mathematical notations used in the remainder of the paper in Table 1. For many decades, our world has witnessed and suffered from severe infectious disease outbreaks, such as the Asian Flu, HIV/AIDS, SARS and now Covid-19.
Since then, many researchers, organizations and governments in different countries have thoroughly studied, proposed and applied different management strategies to prevent the tremendous explosion of Covid-19 infectious cases. According to recent reports of the World Health Organization (WHO) [4], over 248 million confirmed cases and 4 million deaths had been reported worldwide as of the start of November 2021. In Vietnam, according to official information from the Vietnam Ministry of Health (VMH) [5], approximately 880 thousand infectious cases and 21 thousand deaths had been reported up to this time. In fact, the spread of the Covid-19 pandemic varies between countries and seems unpredictable. The pandemic is influenced by multiple factors, both natural (seasonal weather, regional temperature, etc.) and non-natural (population, immigration, etc.). Due to its severe effects on multiple social and economic aspects, the pandemic has become a multidisciplinary issue that requires the involvement of many organizations and governments. It demands greater efforts from governments in imposing new social management strategies, from pharmaceutical and epidemiological organizations in proposing medical treatments, and from data modeling/analysis solutions for early forecasting and planning. In Vietnam, our government and the VMH have proposed effective management strategies that enable different quarantine policies to be flexibly imposed on different regions. Different policies are imposed depending on the level of Covid-19 infection risk in each of Vietnam's regions and cities. The recently proposed management strategies jointly optimize social resources for preventing Covid-19 outbreaks in high-risk regions while ensuring social and economic stability in low-risk/safe regions. According to the VMH, the level of risk in each region is identified by two criteria: the daily/weekly number of confirmed Covid-19 cases and the vaccination rate (%) in that region. Table 2 shows the detailed assessment criteria provided by the VMH [6] for Covid-19 risk zone classification in Vietnam. Our work in this paper mainly focuses on modelling Covid-19 data as time series for prediction, specifically on learning the evolution pattern of confirmed Covid-19 cases. We then use the predicted case counts to conduct Covid-19 risk zone classification following the assessment criteria listed in Table 2. Fig. 2 illustrates the Covid-19 risk zone prediction and classification problem for the 62 provinces of Vietnam. Recently, several works have focused on applying statistical/mathematical and traditional predictive approaches, such as ARIMA (Benvenuto et al., 2020; Dehesh et al., 2020). These traditional time-series techniques efficiently model and predict the transmission trends of Covid-19. These works have demonstrated success in modelling and correctly predicting the linear growth of new Covid-19 infectious cases. However, the real-world reported Covid-19 data of different nations is much more complex and dynamic, with non-linear growth patterns.

Table 1. List of common abbreviations/notations used in this paper:
• W, b: the trainable weighting matrix and bias parameters, respectively.
• ReLU(.): the rectified linear units function, formulated as: ReLU(x) = max(0, x).
• FC(.): the linear fully-connected neural layer.
• σ(.): the sigmoid function, formulated as: σ(x) = 1 / (1 + e^(−x)).
Table 2. Covid-19 risk zone categorization in Vietnam following the VMH's criteria (the listed zones begin with the red zone, indicating very high risk).

To effectively deal with the non-linear patterns of daily reported infectious cases, recent attempts by Ceylan (Ceylan, 2021) and Hansun et al. (Hansun et al., 2021) have demonstrated the effectiveness of integrating the NARNN architecture. These NARNN-based models (Ceylan, 2021; Hansun et al., 2021) can model the flexible time-dependent growth patterns of the Covid-19 disease on different real-world datasets. In general, the NARNN approach is an early artificial neural network based technique. It mainly relies on a neuron-varied learning paradigm to deal with the time-series modelling and prediction problem. In the same vein of applying multi-layered neural architectures to the Covid-19 spread prediction problem, the recent work of Wieczorek, M. et al. proposed a stacked multi-layered fully-connected neural architecture to preserve the temporal information of daily reported Covid-19 datasets (Wieczorek, Siłka, & Woźniak, 2020). However, the simplicity of the NARNN architecture makes it unable to preserve the long-range temporal information of complex time-series datasets and thus to perform short-term prediction effectively. In recent years, the tremendous rise of deep learning in multiple disciplines has shed light on enhancing the performance of the Covid-19 outbreak forecasting problem (Shorten, Khoshgoftaar, & Furht, 2021). For time-series data modelling and representation learning tasks, popular RNN-based architectures (e.g., GRU, LSTM, Bi-LSTM, etc.) are widely applied and achieve significant performance in multiple data analysis tasks, including time-series prediction. Recent attempts by Chimmula et al. (Chimmula & Zhang, 2020) and Hu et al. (Hu et al., 2020) applied advanced deep neural architectures to capture complex non-linear temporal information from time-series data. More specifically, Chimmula et al. (Chimmula & Zhang, 2020) presented the advantages of LSTM-based neural architectures in modelling reported Covid-19 time-series data for predicting the short-term number of infection cases in Canada, the US and Italy. For a general literature review of potential applications of different deep neural architectures to sequential data representation learning and the Covid-19 outbreak forecasting problem, Zeroual et al. (Zeroual et al., 2020) conducted extensive comparative studies between different RNN-based and auto-encoding (AE) based architectures, presenting the potential of these advanced neural architectures for handling real-world time-series Covid-19 datasets. Similarly, Chatterjee, A. et al. (Chatterjee, Gerdes, & Martinez, 2020) studied the use of multiple LSTM-based architectures to efficiently preserve the dynamic temporal information of reported Covid-19 spreading data and conduct accurate predictions. Also related to time-series Covid-19 data evaluation and learning, Nascimento et al. (Nascimento et al., 2021) recently proposed a novel dynamic graph-based analysis technique with a multi-regression dynamic model (MDM) approach. It finds relationships between daily reported Covid-19 time-series data and financial market trends. These well-known studies have laid strong foundations for building efficient Covid-19 prediction systems.
However, as mentioned above, the limitations of RNN-based architectures in remembering long-range temporal information from input sequences, along with the problems associated with highly chaotic, noisy and non-smooth time-series datasets, present challenges that still require further research effort. Mainly motivated by recent studies on future Covid-19 case forecasting, our work in this paper concentrates on integrating the Bi-LSTM based auto-encoding architecture with a dual self-attention mechanism to improve the performance of the Covid-19 case prediction task. In this section, we formally present the methodology and detailed implementation of our proposed DAttAE model. In general, the proposed DAttAE model is designed as a Bi-LSTM based auto-encoder that softly captures the temporal information of the input time-series sequences in a self-supervised representation learning manner. Then, the concatenated hidden states produced by the encoder and decoder parts are fed into a dual self-attention mechanism to produce combined weighted attention vectors, which estimate the importance of all data entries in the given input sequences and assist the temporal forecasting process. To do this, the concatenated output attention weights of both encoder and decoder are passed through a linear fully-connected layer to conduct the prediction. In general, a large number of real-world datasets in multiple disciplines can be treated as time-series data, in which data entries are collected at specific time intervals. With (N) data entries, such a series is denoted as X = {x_1, x_2, ⋯, x_N}, with x_t being a single data entry at a specific t-th time step. Due to the popularity of time-series datasets, numerous studies have presented notable achievements as well as remaining challenges for the time-series prediction problem. A time-series prediction model is formally designed to forecast the upcoming sequential trends/patterns of a given time-series dataset by analyzing the latent temporal features of the historical data entries. For the short-term time-series prediction problem, the given time-series dataset (X) is normally split into (n) smaller observation sequences, denoted as X_i = {x_i, x_(i+1), ⋯, x_(i+L−1)}, which are used to learn and predict the upcoming trend/pattern of the consecutive data entry, x_(i+L), where (L) is the pre-defined observation length or look-back parameter. In general, a given time-series predictive method, denoted as f_TSP(.), is designed to optimize the following learning objective: f_TSP(X_i) ≈ x_(i+L). In the past, stacked fully-connected linear neural network architectures were widely applied to model and learn latent context features from different types of datasets. However, these stacked multi-layered neural architectures are unable to model and retain the consecutive information of sequential/time-series datasets. To deal with the continuous and temporal information present in different sequential data forms, a novel deep neural architecture is required. The RNN is the most popular neural architecture designed for the sequence/time-dependent latent feature representation learning problem. In general, the RNN is built on the principle of considering the influence of consecutive data entries when generating the corresponding outputs. To do this, the neural state cells are organized in a sequence-ordered structure.
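To make the observation-window formulation concrete, the following minimal Python sketch (ours, not the authors' released code) builds the (X_i, x_(i+L)) training pairs from a univariate series with look-back L; the function name and the toy series are illustrative assumptions.

```python
import numpy as np

def make_windows(series: np.ndarray, L: int = 5):
    """Split a 1-D series into (observation, target) pairs:
    X_i = (x_i, ..., x_{i+L-1}) with target x_{i+L}."""
    X, y = [], []
    for i in range(len(series) - L):
        X.append(series[i:i + L])
        y.append(series[i + L])
    return np.asarray(X, dtype=np.float32), np.asarray(y, dtype=np.float32)

# Toy daily-case series; real inputs would be the reported Covid-19 counts.
daily_cases = np.array([10, 12, 15, 20, 18, 25, 30, 28, 35, 40], dtype=np.float32)
X, y = make_windows(daily_cases, L=5)
print(X.shape, y.shape)  # (5, 5) (5,)
```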
Each neural cell in a given RNN-based architecture contains multiple logic gates that control the influence of the historical observations of previous data entries on the current input entry x, and generates the corresponding output in the form of a hidden state, denoted as h. For a specific time step (t-th), the generated hidden state h_t for a specific input x_t is generally obtained as follows:

h_t = f(U·x_t + W·h_(t−1) + b) (1)

with {U, W, b} standing for the set of trainable parameters (weights, bias) and f(.) the corresponding activation functions of the different gates in a given RNN architecture. Among RNN-based architectures, GRU and LSTM are the most commonly and widely applied to model and analyze different types of sequence/time-series datasets in multiple disciplines, such as NLP, short-term recommendation, multimedia information retrieval, etc. To effectively model input sequences coming from a non-smooth, highly noisy time-series dataset, such as the daily reported Covid-19 infectious cases, we draw mainly on the well-known seq2seq-based approach (Sutskever et al., 2014) from the NLP domain, in which the proposed DAttAE model is designed as a sequential auto-encoding architecture with two components: an encoder and a decoder. The encoding part is a Bi-LSTM architecture that effectively encodes all data entries of each input sequence in both the forward and backward directions. In more detail, for a specific data entry x_t in a given observation sequence X, the LSTM architecture in each direction produces the corresponding hidden state h_t as shown in equation (1). In this equation, U, W and b represent the weighting and bias parameter matrices of the different gates of the LSTM architecture in each direction, identified by the operators [+, −] (+ and − denoting the forward and backward directions, respectively). Then, the last (k-th) generated hidden states of the Bi-LSTM architecture are combined to form the final sequential embedding vector of the given input sequence (X). These learnt temporal representations are formulated as the output hidden-state embedding vectors of the Bi-LSTM's last layers, denoted as: e^enc_X = h^k_X ∈ R^(|X|×2k), with |X| being the length of the input sequence (X). In general, the utilization of an LSTM/Bi-LSTM based sequential embedding mechanism in our work serves to sufficiently capture range-varied, time-dependent features from a given time-series dataset. Equations (1) and (2) generally describe the basic LSTM-based neural cell implemented in our proposed sequential encoding mechanism. In our encoding and decoding parts, a dual hierarchical LSTM based architecture (Smagulova & James, 2019) is applied to generate the bidirectional temporal embeddings of each input entry in the form of the last output hidden states of each LSTM. These rich structural temporal representations are combined to produce the unified representations of consecutive observation entries. The overall hidden-state combination process is illustrated in equation (2), with [., .] denoting the vector concatenation operation:

h_t = [h^+_t, h^−_t] (2)

Then, the generated latent embedding vectors of each input sequence in the encoding part are passed to the decoding part. The decoding component is also designed as another Bi-LSTM based architecture that transforms them into another sequential representation form, denoted as (e^dec_X).
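As an illustration of how a Bi-LSTM encoder yields the concatenated forward/backward hidden states e^enc_X of equations (1)-(2), here is a minimal PyTorch sketch; it is our own reading of the mechanism, not the authors' implementation, and the variable names are assumptions.

```python
import torch
import torch.nn as nn

k_lstm = 32                      # hidden units per direction, matching the paper's setup
L = 5                            # look-back / sequence length
encoder = nn.LSTM(input_size=1, hidden_size=k_lstm,
                  batch_first=True, bidirectional=True)

x = torch.randn(8, L, 1)         # a batch of 8 observation sequences
out, (h_n, c_n) = encoder(x)
# `out` concatenates the forward and backward hidden states at every
# time step, i.e. e^enc with shape (batch, |X|, 2k) in the paper's notation.
print(out.shape)                 # torch.Size([8, 5, 64])
```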
The ultimate purpose of using different Bi-LSTM based embedding mechanisms to learn the sequential representation of the input sequence (X) is to create a longer consecutive neural learning architecture. The Bi-LSTM based auto-encoding mechanism helps prevent problems related to gradient explosion and allows efficient computation of the trainable weighting parameters. Moreover, the Bi-LSTM architecture also preserves temporal information better for chaotic time-series data, in which the latent features generated in each neural cell can be shared and remembered across sets of long, non-smooth input sequences. Next, in order to enhance the capability of the Bi-LSTM based auto-encoder to achieve a better sequential representation of a given input sequence, in which the importance levels of all data entries in the sequence are taken into consideration during the embedding process, we implement a custom hierarchical dual self-attention mechanism in our proposed DAttAE model. Two self-supervised attention layers are placed at the ends of the encoder and decoder components; they take the output embedding vectors of each component and help the overall model selectively concentrate on different parts of the given sequence for a better prediction-driven fine-tuning process at the end. Specifically, the self-supervised attention layer located at the output of the encoder and decoder components is designed as follows:

score_X,[i] = ReLU(W_Att · e_X,[i]) (3)

c_X = Σ_i α_X,[i] · e_X,[i] (4)

In equation (3), the attention score of each data entry at a specific i-th index, denoted as score_X,[i], is computed as a linear transformation following the attention alignment paradigm, mainly inspired by previous works (Bahdanau et al., 2015; Vaswani, 2017). Differently from previously proposed attention mechanisms, which were mainly applied to the machine translation task, in our dual self-supervised attention mechanism we use the ReLU activation function to perform the linear transformation of the given output sequential embedding e_X with the trainable weighting parameter matrix W_Att. Then, the computed attention score of a specific i-th data entry in the sequence is used to calculate the corresponding importance score, denoted as α_X,[i], by normalizing the attention scores over all entries of the sequence. Finally, we compute the context attention vector of the given input sequence X, denoted as c_X, as the weighted sum of the embeddings of all data entries of X, as illustrated in equation (4). In general, the application of this self-supervised attention mechanism in our DAttAE model directly supports the soft alignment between the generated sequential embedding vectors of both the encoding (e^enc_X) and decoding (e^dec_X) parts, which are simultaneously learned and controlled by the corresponding context vectors c^enc_X and c^dec_X, respectively. Then, we concatenate the calculated context vectors of both the encoding and decoding parts to produce the final dual attention-based weighting embedding vector of the input sequence X, denoted as: e^Att_X = [c^enc_X, c^dec_X]. Finally, to let the architecture predict the value of the consecutive data entry of the given sequence, x_(i+L), we feed the achieved attention-based embedding vector e^Att_X to a linear fully-connected layer. This process is formulated as:

x̂_(i+L) = FC(e^Att_X) (5)

As shown in (5), a linear fully-connected layer linearly transforms the previously obtained attention-based embedding vector of the input sequence X into the distribution of all possible entry values of the given time-series dataset X.
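The following PyTorch sketch shows one plausible realization of the pipeline of equations (1)-(5): a Bi-LSTM encoder and decoder, one self-attention layer per component implementing the ReLU-scored weighted sum of equations (3)-(4), concatenation of the two context vectors, and the fully-connected prediction head of equation (5). It is a reconstruction under stated assumptions (the exact score-to-importance normalization and internal layer sizes are not fully specified in the paper), not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Scores every entry of a sequence embedding, normalizes the scores
    with a softmax, and returns the weighted-sum context vector c_X.
    One plausible reading of Eqs. (3)-(4); layer sizes are assumptions."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.W_att = nn.Linear(dim, hidden)    # trainable W_Att
        self.v = nn.Linear(hidden, 1)          # maps to a scalar score per entry

    def forward(self, e):                      # e: (batch, |X|, dim)
        score = self.v(F.relu(self.W_att(e)))  # (batch, |X|, 1)
        alpha = torch.softmax(score, dim=1)    # importance of each entry
        return (alpha * e).sum(dim=1)          # context vector: (batch, dim)

class DAttAEHead(nn.Module):
    """Encoder/decoder Bi-LSTMs, one attention layer per component,
    concatenated contexts fed to a fully-connected prediction layer."""
    def __init__(self, k: int = 32):
        super().__init__()
        self.encoder = nn.LSTM(1, k, batch_first=True, bidirectional=True)
        self.decoder = nn.LSTM(2 * k, k, batch_first=True, bidirectional=True)
        self.att_enc = SelfAttention(2 * k, k)
        self.att_dec = SelfAttention(2 * k, k)
        self.fc = nn.Linear(4 * k, 1)          # Eq. (5): linear FC head

    def forward(self, x):                      # x: (batch, L, 1)
        e_enc, _ = self.encoder(x)             # e^enc_X
        e_dec, _ = self.decoder(e_enc)         # e^dec_X
        c = torch.cat([self.att_enc(e_enc), self.att_dec(e_dec)], dim=-1)
        return self.fc(c).squeeze(-1)          # predicted x_(i+L)

model = DAttAEHead()
print(model(torch.randn(8, 5, 1)).shape)       # torch.Size([8])
```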
Then, we optimize the DAttAE model's parameters through the mean squared error (MSE) loss with the learning objective L_DAttAE defined in equation (6):

L_DAttAE = (1/n) · Σ_(i=1..n) (x̂_(i+L) − x_(i+L))² (6)

In this section, we present extensive experiments to evaluate the performance of the proposed DAttAE model on the new infection case prediction task using two real-world datasets of reported Covid-19 cases in Vietnam. Moreover, we also provide thorough comparative studies of the performance of our model against other state-of-the-art RNN-based baselines, assessed under the standard evaluation metrics used for the time-series forecasting problem. In order to ensure the applicability of the proposed model to the real world, we mainly utilize realistic daily reports of confirmed Covid-19 cases, in the form of time-series data resources, within Vietnam. The experimental datasets in this paper were collected from the official source of the VMH. Two main types of datasets are used for all experiments in this paper: the VN-62P dataset (daily confirmed cases reported across the 62 provinces of Vietnam) and the HCMC-12D dataset (daily confirmed cases reported across 12 districts of Ho Chi Minh City). For evaluating the performance of the long-range Covid-19 case prediction task on the two datasets, we applied a time-dependent splitting strategy in which the training set covers the period from April 27, 2021 to the end of the 35th week (September 5, 2021) for the VN-62P dataset, and from June 22, 2021 to September 5, 2021 for the HCMC-12D dataset. For the experiments with the different traditional and deep learning based techniques applied to the Covid-19 prediction task, we applied the same train/test split ratio, in which the training set is used to learn and extract temporal features from the historical observations, which are later applied to predict the future number of Covid-19 cases against the test set. Detailed information and statistics on these two datasets, and how they are used in all the experiments of our paper, can be found in Table 3. Experimental environment & configurations. To implement the DAttAE model, we mainly used the Python programming language with the PyTorch [8] machine learning framework. We set up our DAttAE model and the other traditional deep learning based comparative baselines for all experiments on the same computer, with an Intel Xeon E5-2620 v4 2.10 GHz CPU (8 cores, 16 threads) and 64 GB of memory. Setup of our proposed DAttAE model for data training and evaluation. For the Bi-LSTM based auto-encoding mechanism (as described in sub-section 3.2.1), we set the number of LSTM-based cells (k_LSTM) of each Bi-LSTM architecture to 32. For the experiments with the LSTM/Bi-LSTM related techniques described later in sub-section 4.1.3, we used the same configured number of hidden states for all datasets. For all predictive techniques related to the RNN-based approach for the long-range Covid-19 prediction task, the default observation/look-back length (L) is set to 5 for all datasets. For the setup of the dual self-supervised attention mechanism (as described in sub-section 3.2.2), the default weighting parameter values of all attention layers are initialized using Xavier initialization. The hidden layer size of these attention-based architectures is set to the same value as the configured (k_LSTM) parameter of the encoder and decoder components. More specifically, Table 4 lists the detailed parameters configured for our DAttAE model in all experiments. Experimental evaluation criteria.
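A minimal training sketch tying the pieces together, assuming the make_windows helper and the DAttAEHead module from the sketches above; the optimizer choice (Adam) and learning rate are our assumptions, while k_LSTM = 32, L = 5 and the 400-epoch budget follow values reported in the paper.

```python
import torch
import torch.nn as nn

# Assumes make_windows(), daily_cases and DAttAEHead from the sketches above.
X, y = make_windows(daily_cases, L=5)
X_t = torch.from_numpy(X).unsqueeze(-1)        # (n, L, 1)
y_t = torch.from_numpy(y)                      # (n,)

model = DAttAEHead(k=32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer is an assumption
loss_fn = nn.MSELoss()                         # Eq. (6): mean squared error

for epoch in range(400):                       # paper reports stability at 400-500 epochs
    optimizer.zero_grad()
    loss = loss_fn(model(X_t), y_t)
    loss.backward()
    optimizer.step()
```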
Similar to recent works (Chimmula & Zhang, 2020; Hu et al., 2020; Zeroual et al., 2020), to evaluate the performance of the different deep learning based time-series predictive models, we use standard evaluation metrics, namely the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). These metrics assess how far a given predictive model's predictions deviate from the actual values in the context of a time-series dataset. They are calculated on a given test set, denoted as (X_Test), as shown in equations (7) and (8):

MAE = (1/|X_Test|) · Σ_(x ∈ X_Test) |x̂ − x| (7)

RMSE = sqrt( (1/|X_Test|) · Σ_(x ∈ X_Test) (x̂ − x)² ) (8)

For each time-series predictive model on each dataset, we ran the experiments 10 times and report the average output of each model as the final result.

Accuracy = (number of correctly classified regions) / (total number of regions) (9)

F-1 = 2·P·R / (P + R) (10)

In our experiments, we also conducted the extended Covid-19 zone categorization problem for each assessed geographical region (provinces in the VN-62P dataset and districts in the HCMC-12D dataset), which is treated as a classification problem. To evaluate the accuracy of this Covid-19 risk zone categorization task, we mainly used two metrics: Accuracy and the F-1 measure (as shown in equations (9) and (10)). Here, the precision (P) and recall (R) are calculated by identifying the TP (true positive), FP (false positive) and FN (false negative) counts. Specifically, TP is the number of regions correctly classified into their ground-truth Covid-19 risk zone labels. FP is the number of regions incorrectly assigned to specific Covid-19 risk zone classes, and FN is the number of regions not classified into their actual ground-truth Covid-19 risk zone labels. To identify the Covid-19 risk zone labels of each region (district/province) in each dataset, we mainly relied on the standard assessment criteria of the VMH mentioned in Table 2. Due to the lack of information on the vaccination rate of each region, we only considered the daily/weekly number of newly reported Covid-19 infection cases to identify the ground-truth/predicted labels of each region. To demonstrate the effectiveness of our proposed DAttAE model in comparison with recent state-of-the-art deep learning based predictive models for time-series datasets, we implemented several techniques to predict Covid-19 infection cases on the two VN-62P and HCMC-12D datasets: • Naïve Bayes (NB): for further evaluation of how a traditional probabilistic model can facilitate the time-series based Covid-19 prediction problem, we implemented a simple NB-based regression model which produces predictions through the additive decomposition approach. • ARIMA: considered the most well-known traditional approach for the time-series forecasting problem. In this study, we implemented the ARIMA model to predict the number of Covid-19 cases on the two VN-62P and HCMC-12D datasets. In our experiments, we used the ARIMA model to conduct the long-range, time-dependent prediction task. • GRU (Zeroual et al., 2020): a classical gated RNN-based neural architecture, in which neural state cells are organized in a sequence-ordered structure to enable modelling and representation learning for sequence/time-series data.
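For reference, the evaluation metrics of equations (7)-(10) can be computed as in the short sketch below; this is a straightforward rendering of the standard formulas, and the function names are ours.

```python
import numpy as np

def mae(y_true, y_pred):
    """Eq. (7): mean absolute error over the test set."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    """Eq. (8): root mean squared error over the test set."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def f1(tp, fp, fn):
    """Eqs. (9)-(10): F-1 from true/false positives and false negatives."""
    p = tp / (tp + fp)   # precision
    r = tp / (tp + fn)   # recall
    return 2 * p * r / (p + r)

print(mae([10, 20], [12, 18]), rmse([10, 20], [12, 18]), f1(8, 2, 2))
```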
Each GRU neural cell contains two types of logic gates, reset and update, which control the sequential information preserved for each data entry of a sequence as it passes through the cell. For the implementation of the GRU based predictive model in our experiments, the number of neural state cells (k_GRU) is set to the same value as in our DAttAE model. • LSTM (Zeroual et al., 2020): a more sophisticated sequence-ordered neural network architecture designed to prevent problems related to vanishing/exploding gradients when handling long-dependency sequence/time-series datasets. Unlike the GRU, the LSTM contains three types of logic gates: input, forget and output. The advanced design of the LSTM extends its capability to preserve long-term dependencies between input sequences, giving it great ability in sequence/time-series data analysis and representation learning problems. • Bi-LSTM (Zeroual et al., 2020): the enhanced version of the LSTM, in which two different LSTM-based architectures are aligned to learn the sequential representation of all data entries of each input sequence in both directions (forward/backward). Then, the last concatenated hidden-state vector of each sequence is used to predict the next data entry value. For all experiments, the Bi-LSTM architecture is set up with the same configuration as the encoder part of our proposed DAttAE model. • COVID-ANN (Wieczorek, Siłka, & Woźniak, 2020): a stacked multi-layered fully-connected neural architecture proposed for preserving the temporal information of daily reported Covid-19 datasets (cf. Section 2). For the other similar configurations of these comparative techniques, we set them to the same DAttAE model configurations shown in Table 4. In this experimental section, we present extensive comparative studies between different deep learning based predictive models for the daily Covid-19 infection case forecasting problem on the VN-62P and HCMC-12D datasets. Daily Covid-19 infectious case forecasting is a challenging task on non-smooth time-series databases such as VN-62P and HCMC-12D. Fig. 3 and Fig. 4 show the average experimental results for the daily Covid-19 case prediction problem obtained by the different traditional and deep learning based techniques on the VN-62P and HCMC-12D datasets, respectively. The experimental outputs shown in these figures are evaluated under the MAE and RMSE standard metrics. In general, a brief look at the output charts shows that our proposed DAttAE model outperforms most recent baselines on the Covid-19 case forecasting problem. As shown by the experimental outputs (Fig. 3 and Fig. 4), our proposed DAttAE model achieved significantly better performance than traditional time-series and probabilistic/regression models such as NB and ARIMA. It achieved better MAE/RMSE-based predictive performance than NB and ARIMA by about 116.21 %/104.51 % and 135.64 %/120.77 % on the VN-62P dataset (as shown in Fig. 3), and similarly on the HCMC-12D dataset (as shown in Fig. 4) by 79.33 %/66.52 % and 53.19 %/50.7 %, respectively. Generally, these experimental outputs demonstrate the superiority of deep learning based approaches over classical probabilistic and regression methods, as deep neural architectures capture long-range, time-dependent features better.
In comparison with the other common sequential deep neural baselines, including GRU, LSTM and Bi-LSTM, our proposed DAttAE model also demonstrates significant improvements in terms of the MAE and RMSE evaluation metrics. Specifically, for the Covid-19 case prediction task on the VN-62P dataset, our proposed DAttAE model remarkably outperforms the previous RNN-based architectures (GRU, LSTM and Bi-LSTM), by 116.67 % and 17.97 % in terms of MAE and by 117.76 %, 99.62 % and 14.26 % in terms of RMSE, respectively. Similarly, on the HCMC-12D dataset, our approach also notably improves the accuracy in terms of the MAE and RMSE evaluations, by about 19.72 %/20.95 %, 80.21 %/67.84 % and 64.23 %/59.16 % in comparison with Bi-LSTM, LSTM and GRU, respectively. In addition, compared with recent deep learning based methods for the Covid-19 case prediction problem, such as COVID-ANN and COVID-RNN, our proposed DAttAE model also improves the accuracy in terms of the MAE/RMSE metrics by approximately 86.85 %/75.23 % and 4.67 %/5.98 % on the VN-62P dataset, and by 24.7 %/29.53 % and 10.21 %/12.85 % on the HCMC-12D dataset, respectively. Moreover, as shown by some forecasting outputs (Fig. 5 and Fig. 6) on the VN-62P and HCMC-12D datasets, our proposed DAttAE model performs better predictions than the traditional Bi-LSTM architecture. In general, the ground-truth daily reported Covid-19 cases are extremely undulating, which illustrates how challenging the daily infectious case prediction problem is. The experimental outputs show that the daily predicted values are closer to the actual values than the Bi-LSTM based predictions. By utilizing the custom dual attention mechanism within the Covid-19 data representation learning and prediction tasks, our DAttAE model achieves significantly better sequential representations of the Covid-19 case observation entries on both VN-62P and HCMC-12D and delivers near-fitting predictions. Through the experiments on Covid-19 data in Vietnam, the experimental outputs have demonstrated the effectiveness of our proposed DAttAE model in handling the Covid-19 outbreak forecasting problem as a time-series prediction approach. For the Covid-19 risk zone identification problem (as mentioned in sub-section 2.1), in this experimental section we present extensive comparative studies on using deep neural time-series predictive techniques for Covid-19 risk zone classification. For the VN-62P dataset, this is the zone classification of the 62 provinces of Vietnam; similarly, for the HCMC-12D dataset, it is the zone classification of 12 highly populated, large districts of HCMC, Vietnam. To conduct the Covid-19 zone identification according to the criteria described in the previous section, and similarly to the daily Covid-19 case prediction problem, we used the trained deep learning based models, including our proposed DAttAE model, to predict the daily number of infection cases in each region (provinces for the VN-62P dataset and districts for the HCMC-12D dataset). Then, we relied on the assessment criteria for Covid-19 risk zone categorization in Table 2 to identify the corresponding classes (red, orange, yellow and green zones) for all regions at specific time steps. Finally, the Covid-19 risk zone categorization results are evaluated under the Accuracy and F-1 metrics for comparison between the different baselines.
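A sketch of the zone-assignment step follows: predicted case counts are mapped to one of the four zones. Note that the numeric cut-offs below are placeholders, not the concrete VMH thresholds of Table 2 (which are not reproduced here); only the four-way mapping structure is taken from the paper.

```python
def classify_zone(weekly_cases: float, thresholds=(20, 50, 150)):
    """Map a predicted weekly case count to a risk zone.
    The cut-offs are hypothetical placeholders, NOT the VMH criteria."""
    low, mid, high = thresholds
    if weekly_cases < low:
        return "green"
    if weekly_cases < mid:
        return "yellow"
    if weekly_cases < high:
        return "orange"
    return "red"

print(classify_zone(42))  # yellow (under the placeholder thresholds)
```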
We evaluated the performance of each model on the Covid-19 risk zone classification task at two time intervals: daily and weekly. Fig. 7 and Fig. 8 show the experimental outputs for both the daily and weekly Covid-19 risk zone classification tasks obtained by the different techniques on the VN-62P and HCMC-12D datasets, respectively. Specifically, as shown by the experimental outputs, our proposed DAttAE model also slightly improves the accuracy of the Covid-19 risk-zone classification task on both the VN-62P and HCMC-12D datasets. More specifically, our proposed DAttAE model achieves better F-1 based accuracy than the GRU, LSTM and Bi-LSTM by approximately 3.92 %, 2.79 % and 1.57 % on the VN-62P dataset, and by 12.33 %, 14.15 % and 1.53 % on the HCMC-12D dataset, respectively. Compared with recent deep learning based studies, such as COVID-ANN and COVID-RNN, our proposed DAttAE model also achieves better performance on this task, by about 5.41 % and 1.79 % on VN-62P, and by 5.06 % and 8.01 % on the HCMC-12D dataset. Similar to the previous empirical studies on the Covid-19 case forecasting problem (sub-section 4.2.1), the experimental results in this section demonstrate the effectiveness of integrating an attention mechanism into long, varied sequential data representation learning for the daily/weekly Covid-19 risk-zone classification problem. For a further performance evaluation of our proposed DAttAE model on a different time-series based Covid-19 dataset, we conducted extensive experiments on a popular dataset of reported confirmed Covid-19 cases in different states of the United States (US), named US-Covid19. This dataset contains different types of information related to the reported deaths and infectious cases in time-series form; it is routinely updated with data collected by the well-known Johns Hopkins University Center for Systems Science and Engineering (CSSE) [9]. For the experiments on this dataset, we extracted the number of confirmed Covid-19 infectious cases in the top-5 most populated states of the US: California, Texas, Florida, New York and Pennsylvania. The extracted confirmed-case data for these states covers the period from April 16, 2021 to January 15, 2022. For the infectious case prediction problem on this dataset, we utilized different deep learning based methods, including LSTM, Bi-LSTM, COVID-ANN, COVID-RNN and our proposed DAttAE model. We applied the same train/test data splitting strategy as for the previous datasets, as well as the general configurations (described in Table 4), for all deep learning techniques studied in the experiments with the US-Covid19 dataset. The average prediction performances of all states, in terms of the MAE and RMSE evaluation metrics, are reported in Fig. 9. As shown by the final experimental results in Fig. 9, our proposed DAttAE model demonstrates its effectiveness and outperforms the other deep learning based time-series predictive models. More specifically, our proposed DAttAE model achieves better MAE/RMSE performance on the Covid-19 confirmed case prediction, by about 34.6 %/36.3 %, 20.84 %/18.61 %, 29.01 %/27.63 % and 14.5 %/14.89 % in comparison with LSTM, Bi-LSTM, COVID-ANN and COVID-RNN, respectively.
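For readers who want to reproduce the data extraction, the sketch below pulls the confirmed-case series for the five states from the public JHU CSSE repository and differences the cumulative counts into daily new cases; the URL, column layout and date format reflect our understanding of that repository's published CSV format rather than anything specified in the paper.

```python
import pandas as pd

# JHU CSSE confirmed-case time series for US counties/states (public GitHub CSV);
# format assumption: 11 metadata columns, then one column per date (m/d/yy).
URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
       "csse_covid_19_data/csse_covid_19_time_series/"
       "time_series_covid19_confirmed_US.csv")
states = ["California", "Texas", "Florida", "New York", "Pennsylvania"]

df = pd.read_csv(URL)
date_cols = df.columns[11:]                   # daily cumulative counts
by_state = df.groupby("Province_State")[date_cols].sum().loc[states]
daily = by_state.diff(axis=1).clip(lower=0)   # cumulative -> daily new cases
window = daily.loc[:, "4/16/21":"1/15/22"]    # study period used in the paper
```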
This extensive comparative result confirms the efficiency of utilizing the dual attention mechanism within an RNN-based architecture for dealing with non-smooth and chaotic time series datasets, such as the routinely reported Covid-19 infection cases.

To further study the influence of different model parameters on the accuracy of our proposed DAttAE model, in this section we present extensive ablation studies on its important fine-tuning parameters, namely the number of training epochs, the number of LSTM cells used in the Bi-LSTM based auto-encoding architecture (k_LSTM) and the length of the observation sequence (L), for the Covid-19 case forecasting problem. As shown by the experimental outputs in Fig. 10, our proposed DAttAE model is quite insensitive to these parameters. Specifically, for the experiment on the number of training epochs, we trained our model with different numbers of epochs in the range [10, 500] and reported the changes in accuracy on the Covid-19 case prediction task in terms of the RMSLE assessment metric. The experimental outputs in Fig. 10-A show that the DAttAE model achieves stable accuracy within 400-500 training epochs on both the VN-62P and HCMC-12D datasets. For the k_LSTM parameter, the experiments in Fig. 10-B show that the ideal number of LSTM cells for our model is about ≥ 32. Similarly, for the default length of the observation sequence, our proposed DAttAE model reaches its highest accuracy when the L parameter lies within a moderate range (as reported in Fig. 10) on both the VN-62P and HCMC-12D datasets.

In this paper, we studied the problems of routine Covid-19 infection case forecasting and risk zone classification in the form of deep learning based analysis and representation learning for time series datasets. Accurate daily Covid-19 infection case forecasting and risk zone categorization provide crucial information for governments to effectively plan social resources and impose suitable policies for preventing the escalation of the Covid-19 pandemic in different regions. To effectively model and retain the temporal and growth patterns of the time series of reported confirmed Covid-19 infection cases, we proposed a novel Bi-LSTM based auto-encoding mechanism with a custom dual self-supervised attention mechanism, called DAttAE. The proposed DAttAE model not only sufficiently preserves the temporal information of complex/non-smooth time series datasets, but also ensures the readiness of the learnt sequential embedding vectors for the short-term prediction problem under chaotic noise/disturbance, through the application of the self-supervised attention mechanism. Applying the attention mechanism to the sequential embedding vectors produced by the encoder and decoder supports estimating the importance level of all data entries in each input sequence. The calculated attention weights of the input sequences are then used to explicitly facilitate the subsequent forecasting-driven fine-tuning process. Combining an attention mechanism with an RNN-based architecture such as Bi-LSTM better preserves longer chaotic time series, in which the temporal latent features are sufficiently captured within deeper neural network architectures (an illustrative sketch of this architecture is given below). Extensive experiments on real-world reported Covid-19 datasets in Vietnam demonstrated the effectiveness of the proposed ideas.
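To make the above description more concrete, the following PyTorch sketch shows one minimal way such a dual attention Bi-LSTM auto-encoder could be wired: attention is applied separately to the encoder and decoder output sequences, and the two attended context vectors jointly drive a one-step forecast head while the decoder also reconstructs the input. All layer sizes, the additive attention form and the loss combination are our assumptions for illustration only, not the exact specification of DAttAE.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Additive self-attention: scores each time step, then pools the sequence."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                       # h: (batch, seq_len, dim)
        w = F.softmax(self.score(h), dim=1)     # attention weight per time step
        return (w * h).sum(dim=1), w            # pooled context vector, weights

class DualAttBiLSTMAE(nn.Module):
    """Hypothetical dual attention Bi-LSTM auto-encoder with a forecast head."""
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden,
                               batch_first=True, bidirectional=True)
        self.decoder = nn.LSTM(2 * hidden, hidden,
                               batch_first=True, bidirectional=True)
        self.enc_att = SelfAttention(2 * hidden)
        self.dec_att = SelfAttention(2 * hidden)
        self.reconstruct = nn.Linear(2 * hidden, n_features)  # auto-encoding path
        self.forecast = nn.Linear(4 * hidden, 1)              # next-day cases

    def forward(self, x):                       # x: (batch, L, n_features)
        enc_out, _ = self.encoder(x)            # (batch, L, 2*hidden)
        dec_out, _ = self.decoder(enc_out)      # (batch, L, 2*hidden)
        recon = self.reconstruct(dec_out)       # reconstruction of the input
        enc_ctx, _ = self.enc_att(enc_out)      # attention over encoder states
        dec_ctx, _ = self.dec_att(dec_out)      # attention over decoder states
        y_hat = self.forecast(torch.cat([enc_ctx, dec_ctx], dim=-1))
        return y_hat, recon

# Toy usage: a batch of 8 sequences of L = 14 scaled daily case counts.
model = DualAttBiLSTMAE()
x = torch.randn(8, 14, 1)
y_hat, recon = model(x)
loss = F.mse_loss(y_hat, torch.randn(8, 1)) + F.mse_loss(recon, x)

In this sketch, the per-time-step attention weights w play the role of the importance estimates mentioned above, letting the forecast head focus on the most informative part of each observation window.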
Our studies in this paper mainly focused on the application of deep learning approaches to the Covid-19 pandemic outbreak prediction problem. The proposed model might be useful for governments to effectively forecast and optimize multi-sector social resources for preventing the spread of the Covid-19 pandemic. To effectively deal with the complex and non-smooth fluctuations of routinely reported infection cases and deliver more accurate prediction results, we applied the custom dual attention mechanism within our Bi-LSTM based sequential auto-encoding model to efficiently reduce the noise and lagged observations in the reported Covid-19 data. The effectiveness of our proposed ideas has been demonstrated through extensive experiments on real-world Covid-19 datasets. However, our proposed DAttAE model is still unable to incorporate the routinely reported Covid-19 data with other associated information resources, such as external environmental/geographical aspects, in order to achieve better prediction results. Environmental/geographical aspects such as temperature, contamination, geographical locations, etc. (Bashir et al., 2020; Rasheed, Rizwan, Javed, Sharif, & Zaidi, 2021) are considered important direct/indirect factors which might lead to a rise in the number of Covid-19 infection cases. Therefore, the capability of Covid-19 predictive models to integrate exogenous/external information resources can also be considered a potential improvement direction for further studies in this domain. For future work, we intend to expand our studies to the GIS-based spatial clustering problem in the context of time series, for identifying the spreading patterns of Covid-19 hotspots within specific geographical regions. These expansions require extra research effort on integrating geographical spatial clustering techniques with temporal information representation learning to deal with the clustering problem in the context of temporal dynamics. In addition, in order to improve the performance of our proposed DAttAE model in handling the chaotic time series prediction task more accurately, we intend to extend the current model with a fuzzy-neural inference mechanism (Han, Zhong, Qiu, & Han, 2018; Soto, Castillo, Melin, & Pedrycz, 2019) within the RNN-based architecture.
The utilization of fuzzy-neural inference might help to improve performance on such chaotic time series datasets as the routinely reported Covid-19 infection cases.

References

Towards providing effective data-driven responses to predict the Covid-19 in São Paulo and Brazil
Neural machine translation by jointly learning to align and translate
Correlation between environmental pollution indicators and COVID-19 pandemic: A brief study in Californian context
Application of the ARIMA model on the COVID-2019 epidemic dataset
Relationship of meteorological factors and human brucellosis in Hebei province
Short-term prediction of COVID-19 spread using grey rolling model optimized by particle swarm optimization
Statistical explorations and univariate timeseries analysis on COVID-19 datasets to understand the trend of disease spreading and death
Time series forecasting of COVID-19 transmission in Canada using LSTM networks
Stability of SARS-CoV-2 in different environmental conditions
A SIR model assumption for the spread of COVID-19 in different communities
Forecasting of covid-19 confirmed cases in different countries with arima models
A comparative study for determining Covid-19 risk levels by unsupervised machine learning methods
Interval type-2 fuzzy neural networks for chaotic time series prediction: A concise overview
A tuned Holt-Winters white-box model for COVID-19 prediction
Epidemiology and ARIMA model of positive-rate of influenza viruses among children in Wuhan, China: A nine-year retrospective study
Artificial Intelligence Forecasting of Covid-19 in China
A time series-based statistical approach for outbreak spread forecasting: Application of COVID-19 in Greece
Applications of artificial intelligence in COVID-19 pandemic: A comprehensive review
Early dynamics of transmission and control of COVID-19: A mathematical modelling study. The Lancet Infectious Diseases
SARS-CoV-2 Variants of Concern Delta: A great challenge to prevention and control of COVID-19
Time-series forecasting with deep learning: A survey
Forecasting incidence of hemorrhagic fever with renal syndrome in China using ARIMA model
Understanding unreported cases in the COVID-19 epidemic outbreak in Wuhan, China, and the importance of major public health interventions
Tracking social media during the COVID-19 pandemic: The case study of lockdown in New York State
Tracking development assistance for health and for COVID-19: A review of development assistance, government, out-of-pocket, and other private spending on health for 204 countries and territories
Dynamic graph in a symbolic data framework: An account of the causal relation using COVID-19 reports and some reflections on the financial world
Stochastic probical strategies in a delay virus infection model to combat COVID-19
Socio-economic and environmental impacts of COVID-19 pandemic in Pakistan-an integrated analysis
Deep Learning applications for COVID-19
A survey on LSTM memristive neural network architectures and applications
A new approach to multiple time series prediction using MIMO fuzzy aggregation models with modular neural networks
Sequence to Sequence Learning with Neural Networks
A new model for the spread of COVID-19 and the improvement of safety
Attention is all you need
Neural network powered COVID-19 spread forecasting model
Comparison of two hybrid models for forecasting the incidence of hemorrhagic fever with renal syndrome in Jiangsu Province
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study
Application of a new hybrid model with seasonal auto-regressive integrated moving average (ARIMA) and nonlinear auto-regressive neural network (NARNN) in forecasting incidence cases of HFMD in Shenzhen
Deep learning methods for forecasting COVID-19 time-Series data: A Comparative study
Estimation of the reproductive number of novel coronavirus (COVID-19) and the probable outbreak size on the Diamond Princess cruise ship: A data-driven analysis
COVID-19 with spontaneous pneumomediastinum. The Lancet Infectious Diseases

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.