key: cord-0230896-1l9xmh9b
authors: Cao, Longbing; Liu, Qing
title: COVID-19 Modeling: A Review
date: 2021-04-16
journal: nan
DOI: nan
sha: fd156d6818b065fea945f26ac47f2ead6f0c816e
doc_id: 230896
cord_uid: 1l9xmh9b

The SARS-CoV-2 virus and COVID-19 disease have posed unprecedented and overwhelming demand, challenges and opportunities to domain, model and data driven modeling. This paper provides a comprehensive review of the challenges, tasks, methods, progress, gaps and opportunities in relation to modeling COVID-19 problems, data and objectives. It constructs a research landscape of COVID-19 modeling tasks and methods, and further categorizes, summarizes, compares and discusses the related methods and progress of modeling COVID-19 epidemic transmission processes and dynamics, case identification and tracing, infection diagnosis and medical treatments, non-pharmaceutical interventions and their effects, drug and vaccine development, psychological, economic and social influence and impact, and misinformation, etc. The modeling methods involve mathematical and statistical models, domain-driven modeling by epidemiological compartmental models, medical and biomedical analysis, AI and data science in particular shallow and deep machine learning, simulation modeling, social science methods, and hybrid modeling.

Here, we give a brief overview of the COVID-19 pandemic, the global effort on modeling COVID-19, and the scope, motivation and contributions of this comprehensive review.

The coronavirus disease 2019, designated as COVID-19, is a new epidemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus. SARS-CoV-2 and COVID-19 have overwhelmingly shocked and shaken the entire world. After the first outbreak in Wuhan, China, in December 2019, the disease spread rapidly across the world in only two months due to its strong human-to-human transmission ability. The World Health Organization (WHO) declared COVID-19 a pandemic on 11 March 2020. To date, COVID-19 has infected more than 194M people, with 4M having lost their lives. The continuous, iterative and mutating infections are even more seriously troubling 215 countries and territories, with increasingly unexpected resurgences and virus mutations continuously challenging pandemic containment, vaccinations and treatments. With these widespread and continuous result of COVID-19 mitigation policies, and the rapid and mysterious mutation and spread of coronavirus. Lastly, the modeled problems and areas are fragmented, and although the modeling techniques and results are highly comprehensive, they are divided and evolving.

These brief observations indicate the critical need to model COVID-19 and the urgency of forming a comprehensive understanding of the progress being made in COVID-19 modeling, the research gaps and the open issues. This overview is crucial for not only furthering COVID-19 modeling research but also for informing insights on scientific and public strategies and actions to better battle this pandemic and future pandemics.

In this review, we seek answers and indications to the following major questions:

• What is the research landscape of COVID-19 modeling, i.e., what COVID-19 problems can be modeled and what modeling techniques can address these COVID-19 issues?

• How well do AI and data science, specifically machine learning and deep learning, deepen and broaden the understanding and management of the COVID-19 pandemic?

• How do varied techniques perform differently in modeling COVID-19?

• What are the gaps in modeling COVID-19?

• Where can AI and data science make new, more or better difference in containing COVID-19?

This comprehensive review obtains a relatively full spectrum of the virus challenges, data issues, techniques, gaps and opportunities in relation to modeling COVID-19. In addition to many specific observations obtained through this review, as discussed in the following sections, here we highlight the following high-to-low level observations and quantitative indications of results in modeling different COVID-19 problems and data.

• COVID-19 problems and complexities: as summarized in Section 3.1, the spectrum of problems covers typical aspects of epidemic dynamics and transmission, virus and disease diagnosis, infection identification, contact tracing, virus mutation and resurgence, medical diagnosis and treatment, pharmaceutical interventions, pathological and biomedical analysis, drug and vaccine development, non-pharmaceutical interventions, and socioeconomic influence and impact. We further summarize the COVID-19 characteristics and complexities in Section 2.1, including complex hidden epidemic attributes, high contagion, high mutation, high proportion of asymptomatic to mild symptomatic infections, varied and long incubation periods, ethnic sensitivity, and other high uncertainties, which shows significant differences between SARS-CoV-2 and other existing viruses and epidemics.

• COVID-19 data and challenges: the core data is related to the daily reported number of asymptomatic infections, the number of confirmed, recovered and deceased cases, patients' demographics, pathological, clinical and genomic results of the virus and disease tests, and patients' activities and hospitalized information, etc.; external data comprises NPI policies and events, the resident's responses and behaviors, public activities, texts from online and health services, weather, and environment, etc. Since the data spectrum is indeed comprehensive, almost all data complexities widely explored in general modeling have also been involved in modeling COVID, including data uncertainty, dynamics and nonstationarity, various data quality issues such as incompleteness, inconsistency, inequality and incomparability, lack of ground truth information, and limited size of daily reports. For further discussion, see Section 2.2.

these control measures; and resulting in over 40% transmission reduction by restricting human mobility and interactions.

• Emotional, social and economic impact: the COVID-19 pandemic has generated an overwhelming negative impact on the public mental health (e.g., significantly increasing anxiety, stress, depression and suicide), economic growth and workforce (e.g., over 20% estimated annual GDP loss in 2020), public health systems, global supply chain, sociopolitical systems, and information disorder, as discussed in Section 7.1.

• Modeling gaps: as commented in Section 10.1, the review also finds various issues and limitations of existing research, e.g., an insufficient, biased and partial understanding of COVID-19 complexities and data challenges; a simple and direct application of modeling techniques on often simple data; lack of robust, generalizable and tailored designs and insights into the virus and disease nature and complexities.

• Future opportunities: the discussion in Section 10.2 indicates significant new opportunities, e.g., studying rarely to poorly addressed problems such as epidemiologically modeling mutated virus attributes, complex interactions between core and external factors, and the influence of external factors on epidemic dynamics and NPI effect; new directions and methods such as hybridizing multiple sources of data or methods to characterize the complex COVID systems; and novel AI, data science and machine learning research on large-scale simulation of the intricate evolutionary mechanisms in COVID, discovering robust and actionable evidence to dynamically personalize the control of potential resurgence and balance the economic and mental recovery and the virus containment.

Note that the above-quoted numerical results are illustrative and do not represent state-of-the-art performance. Interested readers may refer to [38] and specific references for more comprehensive information about how scientists globally have responded to the need to model COVID-19, and to [37] to understand what quantitative results COVID-19 modeling has identified in both the areas questioned above and other areas.

Several surveys have been conducted on COVID-19 modeling, which review the progress from specific perspectives, e.g., COVID-19 characteristics [70], epidemiology [173], general applications of AI and machine learning [157, 160] such as for epidemic and transmission forecasting and prediction [30, 47, 192], virus detection, spread prevention, and medical assistance [207], policy effectiveness and contact tracing [143], infection detection and disease diagnosis [28, 120], virology and pathogenesis [132], drug and vaccine development [111], and mental health [258]. The methods which have been reviewed include epidemiological modeling [173], general AI and machine learning methods [47, 106, 111, 151, 157, 160, 194], data science [122], computational intelligence [224], computer vision and image processing [209, 227], statistical models [151], and deep learning [103, 265]. These reviews paint a partial picture of what happened in their selected areas based on several references and specific techniques. However, there are currently no comprehensive surveys or critical analyses of the intricate challenges posed by the virus, the disease, the data and the modeling.

This review is the first attempt to provide a comprehensive picture of the problems by modeling coronavirus and COVID-19 data. We start by categorizing the characteristics and challenges of the COVID-19 disease, the data and the modeling in Section 2. A transdisciplinary landscape is formed to categorize and match both COVID-19 modeling tasks and objectives and categorize the corresponding methods and general frameworks in Section 3. The review then focuses on structuring, analyzing and comparing the work on mathematical, data-driven (shallow and deep machine learning), domain-driven (epidemic, medical and biomedical analyses) modeling in Sections 4, 5 and 6, respectively. Section 7 further discusses the modeling on the influence and impact of COVID-19, Section 8 reviews the work on COVID-19 simulations, and the related work on COVID-19 hybrid modeling is reviewed in Section 9. Lastly, Section 10 further discusses the significant gaps and opportunities in modeling COVID-19. This review aims to be specific to COVID-19 modeling so pure domain-specific research on its medicine, vaccine, biology and pathology is excluded; more comprehensive than the other references to cover problems and techniques from classic to present AI, data science and beyond; unique in summarizing the challenges of the COVID-19 disease, data and modeling; structural and critical by categorizing, comparing, criticizing and generalizing typical modeling methods tailored for COVID-19 modeling from various disciplines and areas; and insightful by extracting conclusive and contrastive (to other epidemics) findings about the virus and disease from the references. The review incorporates much discussion on the topics, opportunities and directions to tackle those issues which are rarely or poorly addressed or areas which remain open in the broad research landscape of modeling COVID-19. However, this review also presents the various limitations and opportunities for further work.
(1) As the scope and capacity of the review is limited, we do not cover the domain-specific literature on pure medical, biomedical and social science-oriented topics and methods without involving modeling methods. (2) There are over 10k references closely relevant to modeling COVID-19 and numerous specific modeling techniques from various disciplines and areas [38], which could not be fully covered or highlighted in detail in this review. (3) As discussed above, unlike the narrowly focused review papers in the area, which highlight specific techniques and their relevant references, we only present the most used (useful) modeling techniques by summarizing their generalizable formulations. (4) This review does not answer many important questions concerning modelers, governments, policy-makers and domain experts, e.g., what has the modeling told us about the nature of COVID-19, which could be further highlighted in more purposeful reviews and analyses. (5) There are many challenging problems yet to be informed or addressed by the modeling progress, as discussed in Section 10. (6) More and newer references, including preprints, emerge online every day, which makes it significantly challenging for us to cover all up-to-date important references on modeling COVID-19.

In this section, we summarize the main characteristics and challenges of the COVID-19 disease, the data and the modeling, which are connected to the various modeling tasks and methods reviewed in this paper.

Modeling COVID-19 is highly challenging because its sophisticated epidemiological, clinical and pathological characteristics are poorly understood [93, 96, 173]. Despite common epidemic clinical symptoms like fever and cough [105], SARS-CoV-2 and COVID-19 have many other sophisticated characteristics [167] that make them more mysterious, contagious and challenging for quantification, modeling and containment. We highlight a few of these below.

High contagiousness and rapid spread. The high contagiousness of SARS-CoV-2 is one of the most important factors driving the COVID-19 pandemic. In epidemiology, the reproduction number $R_0$ denotes the transmission ability of an epidemic or endemic. It is the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection [79]. If $R_0 > 1$, the epidemic will begin to transmit rapidly in the population, while $R_0 < 1$ indicates that the epidemic will gradually vanish and will not lead to a large-scale outbreak. Different computational methods have resulted in varying reproduction values of COVID-19 in different regions. For example, Sanche et al. [201] report a median $R_0$ value of 5.7 with a 95% confidence interval (CI) [3.8, 8.9] during the early stages of the epidemic in Wuhan, China. Gatto et al. [80] estimate a generalized reproduction value of 3.60 (95% CI: 3.49 to 3.84) using the susceptible-exposed-infected-recovered (SEIR)-like transmission model in Italy. de Souza et al. [62] report a value of 3.1 (95% CI: 2.4 to 5.5) in Brazil. The review finds that the $R_0$ of COVID-19 may be larger than 3.0 in the initial stage, higher than that of SARS (1.7-1.9) and MERS (< 1) [180]. It is generally agreed that SARS-CoV-2 is more transmissible than severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) although SARS-CoV-2 shares 79% of the genomic sequence identity with SARS-CoV and 50% with MERS-CoV [70, 96, 138, 179].
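To make the threshold role of the reproduction number concrete, the following minimal sketch simulates a textbook SIR model with forward-Euler steps. It is an illustration under stated assumptions, not a method taken from the cited studies: the function name `simulate_sir`, the recovery rate `gamma = 0.2` per day, the initial infected fraction `i0` and the step size `dt` are all hypothetical choices, and the transmission rate is set to `beta = r0 * gamma`.

```python
def simulate_sir(r0, gamma=0.2, i0=1e-4, days=365, dt=0.1):
    """Forward-Euler simulation of a basic SIR model on population fractions.

    All default parameter values are illustrative assumptions.
    """
    beta = r0 * gamma              # transmission rate implied by r0 and gamma
    s, i = 1.0 - i0, i0            # susceptible and infected fractions
    peak = i
    for _ in range(round(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    # attack rate: fraction of the population ever infected (incl. initial cases)
    return {"peak_infected": peak, "attack_rate": 1.0 - s}


# Compare a value below the threshold with values in the range quoted above.
for r0 in (0.8, 3.1, 5.7):
    result = simulate_sir(r0)
    print(r0, round(result["peak_infected"], 4), round(result["attack_rate"], 4))
```

Under these assumptions, a reproduction number above 1 produces a large outbreak that infects most of the population, while a value below 1 lets the initial infections die out without the infected fraction ever rising.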

A varying incubation period. The incubation period of COVID-19, also known as the presymptomatic period, refers to the time between becoming infected by exposure to the virus and symptom onset. A median incubation period of approximately 5 days was reported in [124] for COVID-19, which is similar to SARS. In [173], the mean incubation period was found to range from 4 to 6 days, which is comparable to SARS (4.4 days) and MERS (5.5 days). Although an average incubation period of 5-6 days is reported in the literature, the actual incubation period may be as long as 14 days [124, 167, 264]. The widely varying COVID-19 incubation period and its uncertain value in a specific hotspot make case identification and infection control very difficult. Unlike SARS and MERS, COVID-19 infected individuals are already contagious during their incubation periods. As it is likely that they are unaware that they are infected and have no-to-mild symptoms during this period, they may easily be the unknown sources of widespread transmission. This has informed screening and control policies, e.g., mandatory 14-day quarantine and isolation, corresponding to the longest predicted incubation time.

A large number of asymptomatic and undocumented infections. It is clear that COVID-19 has a broad clinical spectrum which includes asymptomatic and mild illness [42, 128, 179]. However, the accurate number of asymptomatic and mild-symptomatic infections of both original and new generations of viruses remains unknown. Asymptomatic infections may not be screened and diagnosed before symptom onset, leading to a large number of undocumented infections and the potential risk of contact with infected individuals [120]. The review in [30] reports that of those who tested positive in studies which were conducted in seven countries, the proportion who were asymptomatic ranged from 6% to 41%, while the study in [249] reports that 23% of those infected by COVID were asymptomatic. Buitrago-Garcia et al. [28] found that most people who are infected with COVID-19 do not remain asymptomatic throughout the course of infection, and only 20% of infections remain asymptomatic during follow-up; however, this estimate requires further verification and study. Ravindra et al. [195] analyzed the possibility of different levels of asymptomatic transmission in the community and concluded that asymptomatic human transmission is relevant to the varying incubation periods between people and about 31% of all populations are asymptomatic, including familial clusters, adults, children, health care workers, and travelers. The study in [130] shows that a large percentage (86%) of infections are undocumented, about 80% of documented cases are due to transmission from undocumented cases, and the transmission rate of undocumented infections is about 55% of that of documented cases.
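The figures quoted from [130] can be related to each other with a back-of-the-envelope calculation. The sketch below is a simplification and not the fitted model of that study: it assumes, hypothetically, that documented and undocumented infections differ only in per-case transmissibility, and the function name `share_from_undocumented` is introduced here for illustration.

```python
def share_from_undocumented(frac_undocumented, relative_transmissibility):
    """Share of onward transmission attributable to undocumented infections.

    Simplifying assumption: both groups are identical except that an
    undocumented case transmits at `relative_transmissibility` times the
    rate of a documented case.
    """
    undocumented = frac_undocumented * relative_transmissibility
    documented = (1.0 - frac_undocumented) * 1.0
    return undocumented / (undocumented + documented)


# 86% undocumented, each transmitting at 55% of the documented rate.
share = share_from_undocumented(0.86, 0.55)
print(f"{share:.0%}")  # prints 77%
```

Even with a lower per-case transmission rate, the sheer number of undocumented infections makes them the source of roughly three quarters of transmission in this simplified calculation, which is of the same order as the approximately 80% reported above.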

High mutation with mysterious strains and high contagion. The four major SARS-CoV-2 variants of concern, such as B.1.1.7 (labeled Alpha by WHO) and B.1.351 (Beta), have higher transmissibility (B.1.1.7 has approximately 50% increased transmission) [186] and reproduction rate (B.1.1.7 has an increased reproduction rate of 1-1.4) [235], challenging existing vaccines, containment and mitigation methods. The recently identified variant B.1.617.2 (Delta) in India has even more sophisticated transmissibility and infectious characteristics. The identified variants of concern generally have increased transmissibility (20-50%), increased detrimental change of epidemiology, more severe virulence and disease presentation (e.g., increased hospitalizations or deaths), and result in the decreased effectiveness of public health and social measures, reduced effectiveness of available diagnostics, vaccines and therapeutics, increased diagnostic detection failures, and reduced neutralization by antibodies generated during previous infection or vaccination [229, 251].

Discussion. While the above summarizes the most recent understanding of SARS-CoV-2 and COVID-19 complexities, it is also noted that knowledge on the nature of the virus and its mutation is limited. Without knowing its origins, there is much misinformation about the virus, its contagion and the interventions required [199]. There is weak to no ground truth about the reality of its infection, symptoms shown in medical imaging, and mitigation and treatment measures. There have been no joint global pathological, epidemiological, biomedical and socioeconomic studies which provide a deep and systematic understanding of the COVID-19 virus and disease complexities, common knowledge, and ground truth.

COVID-19 involves multisource, small, sparse and quality-inconsistent data [122]. Typical data sources and factors include (1) epidemiological factors (e.g., origin, incubation period, transmission rate, mortality, morbidity, and most to least vulnerable populations, etc.); (2) daily new-infected-recovered-death case numbers, their reporting time and region of occurrence; (3) quarantine and mitigation measures and policies (e.g., social distancing and border control) relating to communities and individuals; (4) clinical, pathological and genomic data (e.g., symptoms, medical facilities, hospitalization records, medical history, medical imaging, pharmaceutical treatments, gene and protein sequences); (5) infective demographics (e.g., age, gender, race, cultural background, and habit); (6) social activities and mobility; (7) domain knowledge and precautionary guidance from authorities on the virus and disease; (8) seasonal and environmental factors (e.g., season, geographical location, temperature, humidity, and wind speed); (9) news, reports and social media discussions on coronavirus and COVID-19; and (10) fake news, rumor and misinformation. Such COVID-19 data are heterogeneously coded in character, text, number or image; in unordered, temporal/sequential or spatial modes; in static and dynamic forms; and with the characteristics as follows.

Despite the large volume of existing research, modeling COVID-19 is still in an early stage with many open issues, partially because of the significant complexities of COVID-19 data. The main characteristics of COVID-19 data are summarized below and impose a computational burden on the modeling of COVID-19.

Acyclic and short-range case numbers which are small in size. The publicly available data for COVID-19 modeling is limited. Except in rare scenarios such as in the US, most countries and regions report a short-range (2-3 months or even shorter, such as local hotspot-based outbreaks), low-granularity (typically daily), and small-size (daily case numbers for a short period of time and a small cluster of the population) record of COVID-19 data. Such data is typically acyclic, without the obvious seasonal or periodic patterns seen in influenza [56] and recurrent dengue epidemics in tropical countries [228].

Inaccurate statistics on COVID-19 cases. The reported new-infected-recovered-death case numbers are estimated to be much lower than the real numbers in most countries and regions. This may be due to many reasons, such as pre-symptomatic and asymptomatic infections, limited testing capability, nonstandard manual recording, different confirmation standards, an evolving understanding of the disease nature, and other subjective factors. The method of calculating case statistics may vary significantly from country to country; the actual figures in some countries and regions may even be unknown; and no clear differentiation is made between hotspot and country/region-based case reporting. The gaps between the infection reality and what has been documented may be more apparent in the first wave, in the early stage of outbreaks, and in some countries and regions [141]. As a result, the actual number of infections and the regions affected by the COVID-19 pandemic may be much larger than those publicly reported.

Lack of reliable data particularly in an initial outbreak. The spread of an epidemic in its initial phase can be regarded as transmission under perfect conditions. In its initial phase, the intrinsic epidemiological characteristics of COVID-19, such as reproduction rate, transmission rate, recovery rate and mortality are closer to their true values. For example, the modeling results in [198] show a wide range of variations due to the lack of reliable data, especially at the beginning of an outbreak.

Lack of high-quality microlevel data. Data on COVID-19 cases, including daily infected cases, daily new cases, daily recovery cases, and daily death cases, is collected on a daily basis, while daily susceptible case numbers were also reported in Wuhan. However, macrolevel and low-dimensional data is far from comprehensive for inferring the complex transmission processes accurately, and more fine-grained data with various aspects of features and high dimensions are needed. For example, during the initial phase of the Wuhan outbreak, the dissemination of SARS-CoV-2 was primarily determined by human mobility in Wuhan; however, no empirical evidence on the effect of key geographic factors on local epidemic transmission was available [191]. The risk of COVID-19 death varies across various sociodemographic characteristics [66], including age, sex, civil status, individual disposable income, region of residence, and country of birth. More specific data is required to address the sociodemographic inequalities related to contracting the COVID-19 virus. To contain the spread of COVID-19, governments have proposed and initiated a series of NPIs, some similar and some different. No quantitative evidence or systematic evaluation analyzes how these measures affect epidemic transmission, leading to challenges in inferring NPI-based COVID-19 transmission and mitigation.

Data incompleteness, inconsistencies, inequality and incomparability. Typically, it is difficult to find all-round information about a COVID-19 patient's infection source, demographics, behaviors, social activities (including mobility and in social media), clinical history, diagnoses and treatments, and resurgence if any. COVID-19 public data also presents strong inconsistencies and inequalities across reporting hotspots, countries and regions, updating frequencies and timeliness, case confirmation standards, collection methods, and stages [198]. Data from different countries and areas may be unequal and incomparable due to their non-unified statistical criteria, confirmation standards, sampling and coverage methods, and health and medical conditions and protocols. These are also related to or affected by a person's race, living habits, and their applied mitigation policies, etc.

Other issues. Comparing public data available from different sources also reveals other issues like potential noise, bias and manipulation in some of the reported case numbers (e.g., due to their nonuniform statistic standards or manual statistical mistakes), missing values (e.g., unreported on weekends and in the early stage of outbreaks), different categorization of cases and stages (e.g., some with susceptible and asymptomatic case numbers), misinformation, and lack of information and knowledge about resurgence and mutation.
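One of the issues above, missing values on days without reporting, is often handled before modeling. The sketch below shows one simple option, linear interpolation of the cumulative case series followed by differencing. It is a hedged illustration only: the helper names `fill_cumulative_gaps` and `daily_new` and the sample numbers are hypothetical, and interpolation is just one of several possible choices, none of which is prescribed by the reviewed literature.

```python
def fill_cumulative_gaps(cumulative):
    """Linearly interpolate None entries that lie between two known counts.

    Leading or trailing None values are left untouched because there is
    no second anchor point to interpolate towards.
    """
    filled = list(cumulative)
    known = [k for k, value in enumerate(filled) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        step_total = filled[right] - filled[left]
        for k in range(left + 1, right):
            filled[k] = filled[left] + step_total * (k - left) / span
    return filled


def daily_new(cumulative):
    """Daily new cases as first differences of a gap-free cumulative series."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative counts with an unreported weekend (two None values).
reported = [100, 120, None, None, 180, 200]
filled = fill_cumulative_gaps(reported)
print(filled)
print(daily_new(filled))
```

Interpolating the cumulative series rather than the daily counts keeps the total number of cases over the gap consistent with what was eventually reported, although it cannot recover the true day-to-day variation.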

Discussion. While increasing amounts of COVID-19 data are publicly available, they are in fact poor and limited in terms of quality, quantity, capability and capacity to discover deep insights about the nature of COVID-19, its interaction with external factors, and its effects. It is fundamental and urgent to acquire substantially larger and better-quality multisource COVID-19 data. This is crucial so that meaningful modeling can be robustly conducted and evaluated to reveal intrinsic knowledge and insights about the disease and to assist effective pandemic control.

The COVID-19 pandemic is essentially an open complex system with significant system complexities [35, 247]. Examples are the hidden nature and strong uncertainty, self-organization, dynamics and evolution of the virus, disease and their developments and transmissions; their sophisticated interactions and relations to environments and context; the differentiated virus infections of individuals and communities; and the significant emergence of consequences and impacts on society in almost every part of the world. However, the publicly available small and limited COVID-19 data does not explicitly display a complete picture and a sufficient indication of the above complexities and intrinsic epidemiological attributes, transmission process, and cause-effect relations. It is thus challenging to undertake sound, robust, benchmarkable and generally useful modeling on such potential-limited data.

Achieving ambitious modeling objectives on low-quality small COVID-19 data. As discussed in Section 3.1, many business problems and objectives are expected to be addressed by modeling COVID-19. However, the strong constraints in COVID-19 public data discussed in Section 2.2 significantly limit this potential. Modelers have to carefully define learnable objectives, i.e., what can be learned from the data, acquire the essential and feasible data, or leverage data poverty by more powerful modeling approaches. For example, when a model is trained on a country's case numbers, its application to other countries may produce unfair results owing to their data inequalities. Another example is how to combine multisource but weakly connected data for meaningful high-potential analysis and results.

Undertaking complex modeling with limited to no domain knowledge and ground truth. The weak or absent firm knowledge and ground truth about COVID-19 and its medical confirmation and annotations, together with poor-quality data, limit the capacity and richness of the hypotheses to be tested and modeled on the data. It is not surprising that rather simple and classic analytical and learning models are predominantly applied by medical and biological scientists to verify specific hypotheses, e.g., various SIR models, time-series regression, and traditional machine learning methods [49, 81, 207], which occupy the top-80 keywords-based methods in the 200k WHO-collected references. In contrast, statisticians and computer scientists tend to enforce overparameterized models, over-complicated hypotheses, or over-manipulated data, resulting in highly specific results and over- or under-fitting issues.

Challenges in addressing the COVID-19 disease and data complexities. The unique characteristics and complexities of the COVID-19 disease and data discussed in Sections 2.1 and 2.2 challenge the existing modeling methods including deep neural learning. Examples are generalized modeling of quality and quantity-limited COVID-19 data from different countries and regions and over evolving time periods [232], robust modeling of short-range, small-size and incomplete-cycle data, and high-capacity modeling of mixture distributions with exponential growth [63], sub-exponential growth [141], discontinuous phase transition [232] and instant changes in case developments.
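The contrast between exponential and sub-exponential growth mentioned above can be illustrated with the generalized-growth form dC/dt = r C^p, a common phenomenological model in which p = 1 gives exponential growth and 0 <= p < 1 gives sub-exponential growth. This is a sketch under stated assumptions: it is not claimed to be the model used in [63] or [141], and the function names and the parameter values (`c0`, `r`, `p`) are hypothetical.

```python
import math


def cumulative_cases(t, c0, r, p):
    """Closed-form solution of dC/dt = r * C**p with C(0) = c0, for 0 <= p <= 1."""
    if p == 1.0:
        return c0 * math.exp(r * t)           # exponential growth
    m = 1.0 - p
    return (c0 ** m + r * m * t) ** (1.0 / m)  # polynomial (sub-exponential) growth


def log_growth_rates(series):
    """Log ratio between consecutive observations of a positive series."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


days = range(0, 31, 5)
exponential = [cumulative_cases(t, 10.0, 0.2, 1.0) for t in days]
sub_exponential = [cumulative_cases(t, 10.0, 0.2, 0.5) for t in days]

# Constant log growth rate signals exponential growth; a declining one
# signals sub-exponential growth.
print([round(x, 3) for x in log_growth_rates(exponential)])
print([round(x, 3) for x in log_growth_rates(sub_exponential)])
```

A model family that fixes p = 1 cannot reproduce the declining growth rate of the second curve, which is one reason why a mixture of growth regimes in the same data calls for higher-capacity models.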

Disclosing complicated relations and interactions in weakly-coupled multisource data. COVID-19 is affiliated with many personal, social, health/medical, political and other factors, dispersedly reflected in explicitly or implicitly related multisource systems. The COVID-19 pandemic is formed and evolves as a dynamic social-technical process and the co-effects of multi-factor interplay. These multi-aspect factors are coupled strongly or weakly, locally or globally, explicitly or implicitly, subjectively or objectively, statically or dynamically, and essentially or accidentally in the virus and disease formation, development, influence, and evolution. Disclosing such sophisticated factor couplings and interactions is significantly challenging as they are not obvious or easily verifiable in observations. Therefore, modeling COVID-19 requires in-depth transdisciplinary cooperation between computer science, bioinformatics, virology, sociology and many other disciplines. A single factor alone cannot disclose the intrinsic and intricate nature of COVID-19 or explain the variability or shape the dynamics of this epidemic.

Discussion. The COVID-19 complexities result in significant modeling challenges, which stem from the data, the unclear epidemic transmission mechanisms and processes, and the entanglement between epidemic factors/observations and external objective (e.g., countermeasures) and subjective (e.g., changes in people's behavior) factors. COVID-19 modeling goes beyond the transformation and applications of powerful models such as overparameterized deep neural networks, SIR variants and hierarchical Bayesian networks on the highly limited and poorly coupled small COVID-19 data. Careful designs are necessary to address the specific COVID-19 characteristics and complexities of its data and disease, avoid under-/over-fitting, and focus on modeling the complexities in relation to their underlying nature and insight. Complicating models does not necessarily contribute to better or more actionable knowledge and intelligence about the COVID-19 disease and data [22, 34, 223].

To address the aforementioned COVID-19 disease, problems, data and modeling challenges, we present a high-level landscape to categorize and connect the comprehensive objectives and techniques for modeling COVID-19.

Here, we summarize the main business problems and objectives in modeling COVID-19. The analysis of the WHO-collected literature [38] gives us a clear indication of the top business terms in over 200k references and 22k modeling-focused references. The top-ranked keywords include COVID-19 and coronavirus pandemic outbreak, spread, infection, transmission, factors, symptoms, characteristics, treatment, diagnosis, mortality, their risk and effects, as well as major data analysis and domain-specific research areas and methods. Below, we consolidate the main concerns and objectives of modeling COVID-19. Table 1 summarizes the associated modeling factors, modeling methods, and references.

Characterizing and predicting the COVID-19 epidemic dynamics and transmission. An imperative challenge is to understand the COVID-19 epidemic mechanisms, transmission process and dynamics, infer its epidemiological attributes, and understand how the virus spreads spatially and socially [1]. The majority of COVID-19 modeling tasks focus on exploring the source and spectrum of the COVID-19 infection, clinical and epidemiological characteristics, tracking its transmission routes, and forecasting case development trends and the peak number of infected cases and disease transmission [42, 254, 255]. They aim for findings to understand the nature of the virus and disease and inform disease precaution, virus containment, mitigation campaigns, and medical resource planning, etc.

Table 1 (excerpt). Modeling objectives with their associated factors, methods and references.

Methods: Classifiers, outlier detectors, genome analysis, protein analysis, DNNs, etc. References: [7, 18, 111, 139, 169, 193, 205, 237, 256, 270].

Objective: Resurgence and mutation. Factors: Daily new-infected-recovered-death case numbers and reporting time, quarantine and mitigation measures and policies on communities and individuals, social activities and mobility, seasonal and environmental factors (e.g., season, geographical location, temperature, humidity, and wind speed). Methods: Compartmental models, simulation models, compartmental models combined with regression, epidemic renormalisation group, etc. References: [12, 31, 127, 137, 174].

Objective: Influence and impact. Factors: Quarantine and mitigation measures and policies on communities and individuals, domain knowledge and precaution guidance from authorities, social activities and mobility, demographics (e.g., age, gender, race, cultural background, habit), related news, reports, social media discussions, and misinformation. Methods: analysis, questionnaire methods, age-structured SIR/SEIR models, deep neural networks (e.g., BERT and LSTM), simulation models, etc. References: [40, 119, 131, 208, 236].

Modeling the resurgence and mutation. As SARS-CoV-2 mutations and COVID-19 resurgences are highly uncertain, and the mutated strains are much more transmissible and infectious, we highlight their relevant research here. However, as our current understanding of the resurgence and mutation is very limited, COVID-19 may become another epidemic disease which stays with humans for a long time. The WHO has identified four variants of concern, which have higher transmissibility, contagion and complexities [85, 86, 229, 251]. Imperative research is expected to quantify the resurgence conditions, control potential resurgences after lifting certain restrictions and reactivating businesses and activities [137, 174], distinguish the characteristics and containment measures between waves [8, 72, 84], and prepare for and predict resurgence, mutation and their responsive countermeasures [12].

Disease diagnosis, infection identification and contact tracing. Given the strong transmission and reproduction rates, high contagion, and sophisticated transmission routes and the unexpected resurgence of COVID-19 and its virus mutation, it is crucial to immediately identify and confirm exposed cases and trace their origins and contacts to proactively implement quarantine measures and contain their potential spread and outbreak [159]. This is particularly important during the varying incubation periods (usually 2-14 days) which are asymptomatic to mildly symptomatic yet highly contagious. In addition to chemical and clinical approaches, identifying COVID-19 by analyzing biomedical images, genomic sequences, symptoms, social activities, mobility and media communications is also essential [226].

Modeling the efficacy of medical treatment and pharmaceutical interventions. The general practices of timely and proper COVID-19 medical treatments, drug selection and pharmaceutical measures, and ICU and ventilation etc. play fundamental roles in fast recovery, mitigating severe symptoms and reducing the mortality rate of both the original and increasingly-mutated virus strains. However, the lack of best practices and standardized protocols and specifications of medical and pharmaceutical treatments on the respective virus variants in terms of patient's demographic and ethnic context and the wide dispersal of online misinformation of drug use may also contribute to global imbalance in containing COVID-19. Research is required to select and discover suitable drugs, which best match the patient's diagnosis and ethnic contexts with suitable medical treatments to mitigate critical conditions and mortality in a timely manner, etc. [18, 254, 260, 274] .

Modeling the efficacy of non-pharmaceutical intervention and policies. Various NPIs, such as travel bans, border control, business and school shutdowns, public and private gathering restrictions, mask-wearing, and social distancing are often implemented to control the outbreak of COVID-19. Different governments tend to enforce them in varied combinations and levels and ease them within different timeframes and following different procedures, resulting in different outcomes. Limited research results have been reported to verify the effects of these measures and their combinations on containing the virus spread and case number development, the balance between enforcement levels and containment results, and the response sensitivity of the restrictions in relation to the population's ethnic context [24, 63, 221] . Limited results are available on the threshold and effects of COVID-19 vaccinations and herd immunity. More robust results will inform medical and public health policy-making on medication, business and society.

Understanding pathology and biomedical attributes for drug and vaccine development. By involving domain knowledge and techniques such as virology, pathogenesis, genomics and proteomics, pathological and biomedical analyses can be conducted on pathological test results, gene sequences, protein sequences, physical and chemical properties of SARS-CoV-2, drug and vaccine information and their effects. Accordingly, it is necessary to conduct domain-driven analysis and model correlated drugs and vaccines with genomic and protein structures to select and develop COVID-19 drugs and vaccines, to understand the drug-target interactions, and to diagnose and identify infection, understand virus mutation, etc. [18, 97, 193]. More research is needed on COVID-19 immunity responses, drug and vaccine development, and mutation intervention.

Modeling COVID-19 influence and impact. While the COVID-19 pandemic has changed the world and has had a significant and overwhelming influence on almost all aspects of life, society and the economy, quantifying its influence and impact has rarely been studied. COVID-19 negative impact modeling may include (1) economic impact on growth and restructuring [252] ; (2) social impact on people's stress, psychology, emotions, behavior and mobility [175, 258] ; and (3) transforming business processes and organizations, manufacturing, transport, logistics, and globalization [204, 234] . In contrast, it would also be interesting to model its 'opportunity' and influence on (1) enhancing the wellbeing and resilience of individuals, families, society and work-life balance [187] ; (2) digitizing and transforming work, study, entertainment and shopping [215] ; (3) restructuring supply-demand relations and supply chains for better immediate availability and to satisfy demand [64] ; (4) promoting research and innovation on intervening in global black-swan disasters like COVID-19 and its impacts [269] ; and (5) enhancing trust and development in science, medicine, vaccination and hygiene [182] . Other impact modeling tasks include analyzing the relations between the COVID-19 containment effect and socioeconomic level (e.g., income level particularly in relation to lower-income and disadvantaged groups), healthcare capacity and quality, government crisis management capabilities, citizen-government-cooperation, and public health and hygiene habits.

The flow of COVID-19 modeling has strong features such as: (1) multi-disciplinary techniques of mathematics and statistics, epidemiology, broad AI and data science including shallow and deep learning, and social science; (2) epidemiological methods to explore business problems and research areas; (3) domain, model and data-driven approaches consisting of various families of domain knowledge, models and methods which are widely applied in all business problems; (4) case studies and hypothesis tests to highlight the results of particular methods or modeling specific settings, scenarios or data.

The keyword-based analysis of the 22k WHO-collected modeling references in [38] shows that epidemiological modeling, mathematical and statistical modeling, artificial intelligence and data science, and simulation modeling play predominant roles in understanding, characterizing, simulating, analyzing and predicting COVID-19 issues. In this review, we thus categorize the research landscape of COVID-19 modeling into six categories: domain-driven modeling, mathematical/statistical modeling, data-driven learning, influence/impact modeling, simulation modeling, and hybrid methods. Fig. 1 summarizes the transdisciplinary research landscape connecting these six categories of modeling techniques and their respective modeling methods to the major COVID-19 business problems and their modeling objectives in Section 3.1.

• COVID-19 mathematical/statistical modeling: developing and applying mathematical and statistical models such as time-series analysis (e.g., regression models and hazard and survival functions) and statistical models (e.g., descriptive analytics, statistical processes, latent factor models, temporal hierarchical Bayesian models, and stochastic compartmental models) to estimate COVID-19 transmission processes, symptom identification, disease diagnosis and treatment, sentiment analysis, misinformation analysis, and resurgence and mutation.
• COVID-19 data-driven learning: developing and applying data-driven classic (e.g., tree models such as random forests and decision trees, kernel methods such as support vector machines (SVMs), NLP and text analysis, and classic reinforcement learning) and deep (e.g., deep neural networks, transfer learning, deep reinforcement learning, and variational deep neural models) analytics and learning methods on COVID-19 data to characterize, represent, classify, and predict COVID-19 problems, such as case development, mortality and survival forecasting, medical imaging analysis, NPI effect estimation, and genomic analysis.
• COVID-19 domain-driven modeling: developing and applying domain knowledge and domain-specific models for COVID-19; examples are epidemiological compartmental models to characterize the COVID-19 epidemic transmission processes, dynamics, transmission and risk, and the influence of external factors on COVID-19 epidemics, resurgence and mutation; and medical, pathological and biomedical analysis for infection diagnosis, case identification, patient risk and prognosis analysis, medical imaging-based diagnosis, pathological and treatment analysis, and drug development.
• COVID-19 influence/impact modeling: developing and applying methods to estimate and forecast the influence and impact of SARS-CoV-2 variations and COVID-19 diseases and their interventions, treatments and vaccination on epidemic transmission dynamics, virus containment, disease treatment, public resources including healthcare systems, social systems, economy, and human psychological health and behaviors.
• COVID-19 simulation modeling: developing and applying simulation models such as theories of complex systems, agent-based simulation, discrete event analysis, evolutionary learning, game theories, and Monte-Carlo simulation to simulate the COVID-19 epidemic evolution and the effect of interventions and policies on the COVID-19 epidemic.
• COVID-19 hybrid modeling: hybridizing and ensembling multiple models to tackle multiple business problems and objectives, multiple tasks, and multisource data and those individual objectives, tasks and data sources that cannot be better understood by single approaches.

It is worth mentioning that each of the above modeling techniques and their specific methods may be applicable to address different business problems and modeling objectives, as shown in Fig. 1 . Below, we review the progress of the above six categories of COVID-19 modeling by (1) summarizing the typical modeling techniques and (2) categorizing their typical applications in modeling diverse COVID-19 issues.

Mathematical and statistical models are overwhelmingly used to estimate and predict the transmission dynamics and reveal the truth of the epidemic in a formalized and quantitative way. Accurate COVID-19 mathematical models are indispensable for COVID-19 epidemic forecasting and decision making, and mathematical modeling accounts for 13k of the 22k references on modeling COVID-19. Here, we review two sets of main mathematical methods: time-series analysis and statistical modeling, and their applications in COVID-19 modeling.

We here focus on two typical methods that are predominantly customized for modeling COVID-19: regression models and hazard and survival functions.

Regression models. Typical regression models such as logistic regression and auto-regressive integrated moving average (ARIMA) variants are widely used in epidemic and COVID-19 modeling. Logistic growth models can model the number of COVID-19 infected cases. For a population of $P$ with the infection rate $r$, the growth scale of infection number $I$ can be modeled by

$$\frac{dI}{dt} = rI\left(1 - \frac{I}{P}\right) \tag{1}$$

over time $t$. Accordingly, with historical COVID-19 cases of a place at a time period, an S-shaped curve can be derived to describe and forecast the growth distributions of the COVID-19 infections and the peak infected number by adjusting the constant rate $r$. Logistic models are weak to incapable of modeling other states of COVID-19 cases, and there are many challenges, such as nonstationary characteristics discussed in Section 2.2, while some may be better modeled by more sophisticated regression models. ARIMA and its variants also model the temporal movement of COVID-19 case numbers with more flexibility than logistic ones. For example, the number of infected cases $I_t$ can be modeled by $\mathrm{ARIMA}(p, d, q)$, which factorizes the number into consecutive past numbers with errors:

$$I_t = c + \sum_{i=1}^{p} \phi_i I_{t-i} + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} + \epsilon_t \tag{2}$$

over the number of time lags (order) of autoregression $p$, the order of moving average $q$, the degree of differencing $d$, and time $t$ with a constant $c$. $I_{t-i}$ refers to the infected cases at time $t-i$ with weight $\phi_i$, and $\epsilon_{t-j}$ refers to the error between $I_{t-j}$ and $I_{t-j-1}$ with weight $\theta_j$. Adjusting parameters like $p$, $d$ and $q$ can simulate/capture some of the time series characteristics (e.g., the process and trend of the infection series, seasonality, and volatile movement by the distribution of error terms). Similarly, ARIMA models can be used to simulate and forecast the number of recoveries and deaths in relation to COVID-19.
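As a minimal sketch of the logistic growth model described above, the following Python code integrates the growth equation dI/dt = rI(1 - I/P) with small Euler steps and compares the result with its closed-form solution. The symbols I, r and P, the function names, and all parameter values are illustrative assumptions and are not taken from any of the reviewed studies.

```python
import math


def logistic_closed_form(t, i0, r, p):
    """Closed-form solution of dI/dt = r * I * (1 - I / P) with I(0) = i0."""
    return p / (1.0 + (p - i0) / i0 * math.exp(-r * t))


def logistic_euler(i0, r, p, days, dt=0.01):
    """Integrate the same equation with explicit Euler steps.

    Returns the infection number at the end of each day, starting with day 0.
    """
    infected = i0
    steps_per_day = int(round(1.0 / dt))
    daily = [infected]
    for _ in range(days):
        for _ in range(steps_per_day):
            infected += dt * r * infected * (1.0 - infected / p)
        daily.append(infected)
    return daily


if __name__ == "__main__":
    # Hypothetical setting: 10 initial cases, growth rate 0.2/day, cap of 1000.
    curve = logistic_euler(i0=10.0, r=0.2, p=1000.0, days=60)
    for day in (0, 20, 40, 60):
        exact = logistic_closed_form(day, 10.0, 0.2, 1000.0)
        print(day, round(curve[day], 1), round(exact, 1))
```

The S-shaped curve saturates at the assumed cap P, which is why this model alone cannot represent recoveries, deaths or multiple waves.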

In addition, ARIMA and its variants can be integrated with other modeling methods to characterize other aspects of COVID-19 time series. For example, the wavelet decomposition of frequency-based nonstationary factors can model the oscillatory error terms of ARIMA-based modeling of COVID-19 infected cases [40]. Another example is to combine the decision tree method with regression to form a regression tree and identify mortality-sensitive COVID-19 factors [41].
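To make the ARIMA idea concrete, the sketch below fits the simplest differenced autoregression, an ARIMA(1, 1, 0)-style model, by ordinary least squares: the case series is differenced once, the daily change is regressed on the previous daily change, and forecasts are integrated back to case levels. This is a simplified illustration under stated assumptions (no moving-average term, no seasonal term, hypothetical function names), not the estimation procedure used in the cited studies.

```python
def difference(series):
    """First-order differencing (d = 1): daily changes of the series."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least-squares fit of y_t = c + phi * y_{t-1} + error; returns (c, phi)."""
    prev, curr = values[:-1], values[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((x - mean_prev) ** 2 for x in prev)
    sxy = sum((x - mean_prev) * (y - mean_curr) for x, y in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_arima_110(series, horizon):
    """Forecast `horizon` steps ahead and integrate the changes back to levels."""
    changes = difference(series)
    c, phi = fit_ar1(changes)
    level, change = series[-1], changes[-1]
    forecasts = []
    for _ in range(horizon):
        change = c + phi * change   # predicted next daily change
        level += change             # undo the differencing
        forecasts.append(level)
    return forecasts


if __name__ == "__main__":
    # Hypothetical cumulative case counts.
    cases = [100, 110, 117, 123, 128, 132, 137, 141, 145, 149]
    print(forecast_arima_110(cases, horizon=3))
```

In practice a library implementation with moving-average terms and diagnostics would be preferable; the point here is only how differencing and autoregression combine.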

Hazard and survival functions. Hazard functions and survival functions are often used to model the mortality and survival (recovery) rates of patients using time-to-event analysis. A hazard function models the mortality probability $h(t|\mathbf{x})$ of a COVID-19 patient with the factor vector $\mathbf{x}$ ($\in \mathbb{R}^{n}$) of dying at discrete time $t$:

$$h(t|\mathbf{x}) = P(T = t \mid T \geq t; \mathbf{x}) \tag{3}$$

On the contrary, the survival function models the probability $S(t|\mathbf{x})$ of surviving until time $t$:

$$S(t|\mathbf{x}) = P(T > t \mid \mathbf{x}) \tag{4}$$

In discrete time, $S(t) = \prod_{k=1}^{t} (1 - h(k))$ where $h(k)$ is the mortality probability at time $k$. In continuous time, $S(t|\mathbf{x}) = 1 - F(t|\mathbf{x})$ where $F(\cdot)$ is the cumulative distribution function until time $t$. For covariates $\mathbf{x}$ with their relations represented by an either linear or nonlinear function $f(\cdot; \theta)$ with parameters $\theta$, the mortality rate of COVID-19 can be modeled by a Cox proportional hazard model [190]:

$$h(t|\mathbf{x}) = h_0(t) \exp\left(f(\mathbf{x}; \theta)\right) \tag{5}$$

where $h_0(t)$ is the baseline hazard function, and the function $f(\cdot; \theta)$ can be implemented by a linear function such as a linear transform or a nonlinear function such as a deep convolutional network. For example, in [203], $f(\cdot; \theta)$ is implemented by a shallow neural network with a leaky rectified linear unit-based activation of the input and then another tangent transformation.

In the case of time-varying covariates x , the above hazard, survival and transform functions should be time sensitive as well.
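The relation between hazard and survival in discrete time, and the proportional scaling in a Cox-type model, can be sketched as follows. The linear score, the cap of the scaled hazard at 1 (needed because a discrete-time hazard is a probability), the function names and the numbers are all illustrative assumptions.

```python
import math


def survival_from_hazards(hazards):
    """Discrete-time survival curve: S(t) = product over k <= t of (1 - h(k))."""
    curve, surviving = [], 1.0
    for h in hazards:
        surviving *= 1.0 - h
        curve.append(surviving)
    return curve


def cox_hazard(baseline_hazards, covariates, weights):
    """Scale a baseline hazard by exp(f(x)) with an assumed linear f(x) = w . x.

    The result is capped at 1.0 so that it remains a valid probability.
    """
    score = sum(w * x for w, x in zip(weights, covariates))
    return [min(1.0, h0 * math.exp(score)) for h0 in baseline_hazards]


if __name__ == "__main__":
    # Hypothetical daily baseline mortality probabilities for one week.
    baseline = [0.010, 0.012, 0.015, 0.015, 0.012, 0.010, 0.008]
    # Hypothetical covariates (age over 65, comorbidity) and weights.
    patient = [1.0, 1.0]
    weights = [0.7, 0.4]
    print(survival_from_hazards(baseline)[-1])
    print(survival_from_hazards(cox_hazard(baseline, patient, weights))[-1])
```

A positive weighted score raises the hazard at every time point by the same factor, which is the proportional-hazards assumption.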

Typically, the performance of mathematical modeling is measured by metrics such as mean absolute error (MAE), root mean square error (RMSE), the improvement percentage index (IP), and symmetric mean absolute percentage error (sMAPE) in terms of certain levels of confidence intervals.
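The error metrics named above can be computed as in the following sketch. Several definitions of sMAPE and of the improvement percentage are in use; the variants below (sMAPE with the mean of absolute actual and forecast values in the denominator, and improvement measured relative to a baseline error) are assumptions chosen for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    terms = [
        2.0 * abs(a - p) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted)
        if abs(a) + abs(p) > 0  # skip points where both values are zero
    ]
    return 100.0 * sum(terms) / len(terms)


def improvement_percentage(baseline_error, model_error):
    """Relative error reduction of a model over a baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical daily case counts and forecasts.
    observed = [100, 200, 150]
    forecast = [110, 190, 160]
    print(mae(observed, forecast), rmse(observed, forecast))
    print(smape(observed, forecast))
```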

Time-series analysis contributes the most (about 3k of the 22k WHO-listed references) to COVID-19 modeling. As shown in [38], regression models, linear regression, and logistic regression are the most commonly applied in COVID-19 modeling. Many linear and nonlinear, univariate, bivariate and multivariate analysis methods have been intensively applied for the regression and trend forecasting of new, susceptible, infectious, recovered and death case numbers. Popular methods include linear regression models such as ARIMA and GARCH [213, 217], logistic growth regression [243], Cox regression [203], multivariate and polynomial regression [6, 46, 89], generalized linear model and visual analysis [153], support vector regression (SVR) [87, 196], regression trees [41], hazard and survival functions [203], and more modern LSTM networks. In addition, temporal interpolation methods such as best fit cubic, exponential decay and Lagrange interpolation, spatial interpolation methods such as inverse distance weighting, smoothing methods such as moving average, and spatio-temporal interpolation [33] are applicable to fit and forecast COVID-19 case time series. We illustrate a few tasks below: COVID-19 epidemic distributions, case number and trend forecasting, and COVID-19 factor and risk analysis.

COVID-19 case number and trend forecasting and epidemic distributions. Regression-centered time series analysis has been widely applied to forecast case number developments and trends. For COVID-19 prediction, Singh et al. [213] apply ARIMA to predict the COVID-19 spread trajectories for the top 15 countries with confirmed cases and conclude that ARIMA, with weights to adjust the past case numbers and the errors, has the ability to correct model prediction and is better than regression and exponential models for prediction. However, ARIMA lacks flexible support for volatility and in-between changes during the prediction periods [213]. Gupal et al. [89] adopt polynomial regression to predict the number of confirmed cases in India. Almeshal et al. [9] utilize logistic growth regression to fit the actual infected cases and the growth of infections per day. Wang et al. [243] model the cap value of the epidemic trend of COVID-19 case data using a logistic model. With the cap value, they derive the epidemic curve by adapting time series prediction. To find the best regression model for case forecasting, Ribeiro et al. [196] explore and compare the predictive capacity of the most widely-used regression models including ARIMA, cubist regression (CUBIST), random forest, ridge regression, SVR, and stacking-ensemble learning models. They conclude that SVR and stacking ensemble are the most suitable for short-term COVID-19 case forecasting in Brazil. In addition, linear regression with the Shannon diversity index and Lloyd's index is applied to analyze the relations between the meta-population crowdedness in city and rural areas and the epidemic length and attack rate [191].

COVID-19-specific factor and risk analysis. Time-series analysis may be used to (1) analyze the influence of specific and contextual factors on COVID-19 infections and COVID-19 epidemic developments including infection, transmission, outbreak, hospitalization, and recovery, e.g., on COVID-19 survival, mortality and recovery; and (2) analyze the influence and impact of external and contextual factors of the COVID-19 outbreak on the population, health, society and the economy, case developments and containment. For example, to investigate the potential risk factors associated with fatal outcomes from COVID-19, Schwab et al. [203] present an early warning system assessing COVID-19 related mortality risk with a variation of the Cox proportional hazard regression model. Chen et al. [48] adapt the Cox regression model to analyze the clinical features and laboratory findings of hospitalized patients. Chakraborty et al. [41] design the wavelet transform optimal regression tree (RT) model, which combines various factors including case estimates, epidemiological characteristics and healthcare facilities to assess the risk of COVID-19. The advantage of RT is that it has a built-in variable selection mechanism from high dimensional variable space and can model arbitrary decision boundaries.

Correlation analysis between COVID-19 epidemic dynamics and external factors. Much research has been conducted on analyzing the relationships between COVID-19 transmission and dynamics and external and contextual factors. For example, Cox proportional hazard regression models are used to analyze high risk sociodemographic factors such as gender, individual income, education level and marital status that may be associated with a patient's death [66] , and logistic regression models are applied to analyze the relations between COVID-19 (or SARI with unknown aetiology) and socioeconomic status (per-capita income) [62] . To reveal the impact of meteorological factors, Chen et al. [46] examine the relationships between meteorological variables (i.e., temperature, humidity, wind speed and visibility) and the severity of the outbreak indicated by the confirmed case numbers using the polynomial regression method; while Liu et al. [134] fit the generalized linear models (GLM) with negative binomial distribution to estimate the city-specific effects of meteorological factors on confirmed case counts. In [183] , Loess regression does not show an obvious relation between the COVID-19 reproduction number, weather factors (humidity and temperature) and human mobility. Lastly, linear models including linear regression, Lasso regression, ridge regression, elastic net, least angle regression, Lasso least angle regression, orthogonal matching pursuit, Bayesian ridge, automatic relevance determination, passive aggressive regressor, random sample consensus, TheilSen regressor and Huber regressor are applied to analyze the potential influence of weather conditions on the spread of coronavirus [142] .

Discussion. Time-series methods excel at characterizing sequential transmission processes and temporal case movements and trends. They lack the capability to involve other multisource factors and disclose deep insights into why case numbers evolve in a certain way and how to intervene in the infection, treatment and recovery.

Statistical learning, in particular Bayesian modeling, plays a critical role in stochastic epidemic and infectious disease modeling [23]. It uses generative stochastic processes to model epidemic contagion [10, 166]. In contrast to compartmental models, statistical models involve prior knowledge about an epidemic disease, and their results have confidence levels corresponding to distinct assumptions (i.e., possible mitigation strategies), which better interpret and more flexibly model COVID-19 complexities. Below, we summarize typical statistical models and their applications in COVID-19 statistical modeling.

Statistical models are widely applied to COVID-19 modeling tasks including (1) simulating and validating the state distributions and transitions of COVID-19 infected individuals over time, (2) modeling latent and random factors affiliated with the COVID-19 epidemic processes, movements and interactions, (3) forecasting short-to-long-term transmission dynamics, (4) evaluating the effect of non-pharmaceutical interventions (NPI), and (5) estimating the impact of COVID-19 such as on socioeconomic aspects. Typical methods include descriptive analytics, Bayesian hierarchical models, probabilistic compartmental models, and probabilistic deep learning. Below, we introduce some common statistical settings and corresponding statistical models in both frequentist and Bayesian families that are often applied in COVID-19 statistical modeling.

Taking a stochastic (vs. deterministic) state transition assumption, various statistical processes can be assumed to simulate and estimate the state-specific counts (case numbers) and the probability of state transitions (e.g., between infections and deaths) during the COVID-19 spread. The stochastic processes and states (e.g., its infection and mortality) of a COVID-19 outbreak are influenced by various explicit and latent factors. Examples of explicit (observable) factors include a person's demographics (e.g., age and race), health conditions (e.g., disease history and hygienic conditions), social activities (e.g., working environment and social contacts), and the containment actions (e.g., quarantined or not) taken by the person. Latent factors may include the person's psychological attitude toward cooperation (or conflict) with containment, health resilience strength to coronavirus and the containment influence on the outcome (e.g., infected or deceased). Fig. 2 (a) illustrates a general graphical model of the temporal hierarchical Bayesian modeling of COVID-19 case numbers for estimation and forecasting. The reported case number (e.g., death toll or infected cases) at time $t$ can be estimated by an inferred value, which is inferred from the documented (declared) infections and the removed (e.g., recovered and deceased) rate. The documented infection number is inferred from the infected population and the test rate. The infected population is inferred from the exposed population and the infection rate, and the exposed population is in turn determined by its exposed rate. Further, we assume the removed rate is influenced by various medical treatments, which are determined by auxiliary variables including the socioeconomic condition, the treatment effectiveness, and the public health quality. The infection rate is determined by NPIs, which are further influenced by the NPI execution rate and a socioeconomic factor. Each of the corresponding parameters has a prior, which may follow specific assumptions.

For the statistical settings and hypotheses, typical statistical distributions of COVID-19 state-specific counts are applied to (1) infection modeling, e.g., by assuming a Bernoulli process (with the probability of exposure to infections over contacts) and then a Poisson process at points of infections with exponentially-distributed infectious periods (with the rate referring to the infection rate within the infectious period); (2) mortality modeling, e.g., by assuming a negative binomial distribution or a Poisson distribution (with the rate parameterized on the mortality rate and population). Further, the basic reproduction number $R_0$ may be estimated, and the infections will be under control if $R_0$ is less than a given threshold (e.g., 1).

The above hierarchical statistical model in Fig. 2 (a) can be customized to estimate and forecast COVID-19 case numbers in terms of specific hypotheses, settings and conditions. For example, Fig. 2 (b) shows the graphical model for the hierarchical model proposed in [77] to estimate the death number from its inferred variable, which is in turn inferred from an auxiliary variable. The inferred death number is sampled from the basic reproduction number $R_0$, which has a normally-distributed prior centered at 2.4 and parameterized by its variance variable, and from the probability of infected death determined by two Gamma priors. In addition, the inferred death number is also influenced by the number of new infections with two latent variables, the distribution rate of the Exponential distribution and the daily serial interval, and by a variable that acts as a parameter of the reproduction rate. Fig. 2 (b) also shows the prior distributions of the auxiliary variables, for example, assuming that the variable describing the time from infection to death follows an exponential prior with parameter 0.03. The hierarchical statistical model in Fig. 2 (b) to estimate the death number is described by a set of equations numbered from (6a). Equation (6a) specifies a $\mathrm{Normal}(0, 0.5)$ prior, and the subsequent equations specify, in order: the initial reproduction number; the intervention impact; the time-varying reproduction number; the distribution rate; the 6 sequential days of infections; the daily serial interval; the number of infections; the time from infection to death; the variance latent variable; the expected number of deaths; and the observed daily deaths.
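The generative core of such a hierarchical model can be sketched deterministically: a time-varying reproduction number is reduced by an intervention, new infections follow a renewal equation weighted by a serial interval distribution, and expected deaths are a delayed, scaled convolution of past infections. The sketch below is a simplified illustration under stated assumptions. It fixes all quantities instead of sampling them from priors, and the function names, the exponential form of the intervention effect, and every numeric value other than the reproduction number of 2.4 mentioned in the text are hypothetical.

```python
import math


def reproduction_number(r0, alpha, intervention_active):
    """Assumed form: R_t = R_0 * exp(-alpha * indicator of an active intervention)."""
    return r0 * math.exp(-alpha * intervention_active)


def convolve_past(history, weights, t):
    """Sum of history[t - s] * weights[s - 1] over lags s = 1..min(t, len(weights))."""
    return sum(
        history[t - s] * weights[s - 1]
        for s in range(1, min(t, len(weights)) + 1)
    )


def simulate_renewal(r0, alpha, interventions, seeds,
                     serial_interval, infection_to_death, ifr):
    """Return (daily infections, expected daily deaths).

    interventions: 0/1 indicator per day; seeds: infections on the first days.
    serial_interval, infection_to_death: discrete lag weights, each summing to 1.
    ifr: assumed infection fatality ratio.
    """
    infections = list(seeds)
    for t in range(len(seeds), len(interventions)):
        r_t = reproduction_number(r0, alpha, interventions[t])
        infections.append(r_t * convolve_past(infections, serial_interval, t))
    deaths = [
        ifr * convolve_past(infections, infection_to_death, t)
        for t in range(len(infections))
    ]
    return infections, deaths


if __name__ == "__main__":
    # Hypothetical scenario: an intervention starts on day 10 of 20.
    days = [0] * 10 + [1] * 10
    infections, deaths = simulate_renewal(
        r0=2.4, alpha=2.0, interventions=days, seeds=[10.0, 10.0, 10.0],
        serial_interval=[0.2, 0.5, 0.3], infection_to_death=[0.5, 0.5], ifr=0.01,
    )
    print([round(v, 1) for v in infections])
```

In a full Bayesian treatment these quantities would be given priors and inferred from observed deaths, for example with a probabilistic programming framework.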

Another major set of COVID-19 statistical modeling incorporates statistical hypotheses and settings into other epidemic models such as compartmental models to approximate some state distributions or estimate some parameters. A typical application reformulates SIR-based models as a system of stochastic differential equations, e.g., by assuming Gamma-distributed probability density of the exposed and infected states in Section 6.1. Lastly, modeling the influence of mitigation strategies on the COVID-19 case numbers is also a typical statistical modeling problem.

Modeling. The contagion of an epidemic like COVID-19 is complex and uncertain. Statistical or probabilistic modeling naturally captures this uncertainty around epidemics better than other models. In COVID-19, for example, hierarchical Bayesian distributions with hidden states and parameters are used to model the causal relationships in their transmission [77, 162], and probabilistic compartmental models [91, 168, 275] integrate the transmission mechanisms of epidemics with the statistics of observed case data. Below, we summarize the relevant applications of descriptive analytics, Bayesian statistical modeling and stochastic compartmental modeling of the COVID-19 epidemic statistics, epidemic processes, and the influence of external factors such as NPIs on the epidemic. Table 2 further summarizes various applications of COVID-19 statistical modeling.

COVID-19 descriptive analytics. Descriptive analytics are the starting point of COVID-19 statistical analysis, which are typically seen in non-modeling-focused references and communities. Typically, simple statistics such as the mean, deviation, trend and change of COVID-19 case numbers are calculated and compared. For example, the statistics of asymptomatic infectives are reported in [120]. In [19], change point analysis detects a change in the exponential rise of infected cases, and Pearson's correlation is calculated between the change and the lockdown implemented across risky zones. In addition, case statistics may be calculated in terms of specific scenarios, e.g., a population's mobility [100] or workplace [15].

Bayesian statistical modeling of COVID-19 epidemic processes. Bayesian statistical modeling can model stochastic COVID-19 epidemic processes, specific factors that may influence the COVID-19 epidemic process, causality, partially-observed data (e.g., under-reported infections or deaths), and other uncertainties. For example, stochastic processes are adopted to model conventional epidemic contagion [10, 166] . Niehus et al. [162] use a Bayesian statistical model to estimate the relative capacity of detecting imported cases of COVID-19 by assuming the observed case count to follow a Poisson distribution and the expected case count to be linearly proportional to daily air travel volume. To capture the complex relations in the COVID-19 pandemic, the causal relationship in the transmission process can be modeled by hierarchical Bayesian distributions [77, 162] . In [71] , a special case of the continuous-time Markov population process, i.e., a partially-observable pure birth process, assumes a binomial distribution of partial observations of infected cases and estimates the future actual values of infections and the unreported percentage of infections in the population.
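A count model of the kind described above, in which the observed case count is Poisson-distributed with an expected value proportional to travel volume, admits a compact Bayesian treatment when a Gamma prior is placed on the proportionality rate, because the Gamma prior is conjugate to the Poisson likelihood. The sketch below illustrates this under stated assumptions: the Gamma prior, its hyperparameters, the function names and the data are hypothetical and are not taken from [162].

```python
import math


def poisson_log_likelihood(rate, counts, volumes):
    """Log-likelihood of counts[i] ~ Poisson(rate * volumes[i])."""
    total = 0.0
    for c, v in zip(counts, volumes):
        mean = rate * v
        total += c * math.log(mean) - mean - math.lgamma(c + 1)
    return total


def poisson_rate_mle(counts, volumes):
    """Maximum-likelihood estimate of the rate: total cases / total volume."""
    return sum(counts) / sum(volumes)


def gamma_poisson_posterior(counts, volumes, prior_shape, prior_rate):
    """Posterior Gamma(shape, rate) of the rate under a Gamma(shape, rate) prior."""
    return prior_shape + sum(counts), prior_rate + sum(volumes)


if __name__ == "__main__":
    # Hypothetical imported case counts and daily air travel volumes (thousands).
    counts = [2, 0, 5]
    volumes = [10.0, 5.0, 25.0]
    shape, rate = gamma_poisson_posterior(counts, volumes, 1.0, 1.0)
    print("MLE:", poisson_rate_mle(counts, volumes))
    print("posterior mean:", shape / rate)
```

Comparing the rates estimated for different locations would then indicate their relative detection capacity, which is the type of quantity the cited study targets.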

Bayesian statistical modeling of external factors on COVID-19 epidemic. Another important application is to model the influence of external factors on COVID-19 epidemic dynamics. For example, Flaxman et al. [77] infer the impact of NPIs, including case isolation, educational institution closure, banning mass gatherings and/or public events, and social distancing (including local and national lockdowns), in 11 European countries. They estimate the course of COVID-19 by back-calculating infections from observed deaths, fitting a semi-mechanistic Bayesian hierarchical model with an infection-to-onset distribution and an onset-to-death distribution. In addition to case numbers, especially deaths, their model also jointly estimates the effect sizes of interventions.
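The link between infections and later deaths that underlies such back-calculation can be sketched as a discrete convolution. This is a simplified forward model only: it merges the infection-to-onset and onset-to-death delays into a single hypothetical infection-to-death distribution, and the infection fatality ratio `ifr` and all numbers are illustrative assumptions, not values from [77].

```python
def expected_deaths(infections, infection_to_death_pmf, ifr):
    """Expected daily deaths implied by daily infections.

    infection_to_death_pmf[d] is the assumed probability that a fatal
    infection leads to death exactly d days after infection.
    """
    deaths = []
    for day in range(len(infections)):
        total = 0.0
        for delay, prob in enumerate(infection_to_death_pmf):
            if day - delay >= 0:
                total += infections[day - delay] * prob
        deaths.append(ifr * total)
    return deaths


# Hypothetical inputs for illustration only.
infections = [100, 200, 400, 0, 0]
delay_pmf = [0.0, 0.5, 0.5]   # deaths occur one or two days after infection
deaths = expected_deaths(infections, delay_pmf, ifr=0.01)
```

In a Bayesian treatment, the infections would be latent quantities inferred so that these expected deaths match the observed death counts.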

Stochastic compartmental modeling of COVID-19 epidemic. Stochastic compartmental models can simulate stochastic hypotheses of specific aspects (e.g., the probability of a state-based population or of a state transition) of the COVID-19 epidemiological process and the stochastic influence of external interventions on the COVID-19 epidemic process. Such probabilistic compartmental models integrate the transmission mechanisms of epidemics with the characteristics of observed case data [63, 91, 168, 275]. For example, in [239], a COVID-19 transmission tree is sampled from the genomic data with Markov chain Monte Carlo (MCMC)-based Bayesian inference under an epidemiological model; the parameters of the offspring distribution in this transmission tree are then inferred, and the model infers the person-to-person transmission in an early outbreak. Based on probabilistic compartmental modeling, Zhou et al. [275] develop a semiparametric Bayesian probabilistic extension of the classical SIR model, called BaySIR, with time-varying epidemiological parameters. BaySIR infers the COVID-19 transmission dynamics by considering both undocumented and documented infections, and estimates the disease transmission rate with a Gaussian process prior and the removal rate with a gamma prior. To estimate the all-cause mortality effect of the pandemic, Kontis et al. [117] apply an ensemble of 16 statistical models (autoregressive with holiday and seasonal terms) to vital statistics data for a comparable quantification of the weekly mortality effects of the first wave of COVID-19 and an estimation of the expected deaths in the absence of the pandemic. Other similar stochastic SIR models assume, for example, a Poisson time-dependent process on infection and reproduction [94], a beta distribution of infected and removed cases [241], or a Poisson distribution of susceptible, exposed, documented infected and undocumented infected populations in a city [130].
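To make the idea of stochastic state transitions concrete, the following is a minimal sketch of a discrete-time chain-binomial SIR simulation. It is a generic textbook-style construction, not the model of any cited work; the function names, parameter values and the per-step transition probabilities are assumptions chosen for illustration.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_stochastic_sir(n, i0, beta, gamma, steps, seed=0):
    """Simulate (S, I, R) counts with random infections and recoveries per step."""
    rng = random.Random(seed)
    s, i, r = n - i0, i0, 0
    path = [(s, i, r)]
    for _ in range(steps):
        p_inf = 1.0 - math.exp(-beta * i / n)   # per-susceptible infection probability
        p_rec = 1.0 - math.exp(-gamma)          # per-infective removal probability
        new_inf = binomial(rng, s, p_inf)
        new_rec = binomial(rng, i, p_rec)
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        path.append((s, i, r))
    return path


# Hypothetical settings: 1000 people, 5 initial infectives, 60 daily steps.
path = simulate_stochastic_sir(n=1000, i0=5, beta=0.4, gamma=0.1, steps=60, seed=42)
```

Repeating the simulation with different seeds gives a distribution of epidemic trajectories, which is the kind of uncertainty that the Bayesian variants above quantify through priors and posteriors.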

Statistical influence modeling of COVID-19 interventions and policies. Apart from modeling the transmission dynamics or forecasting case counts, Bayesian statistical models are also applied in some other areas, e.g., to estimate the state transition distributions under certain assumptions, such as for the susceptible-to-infected (i.e., the infection rate) or infected-to-death (mortality rate) transition. For example, Cheng et al. [50] use a Bayesian dynamic item-response theory model to produce a statistically valid index for tracking the government response to COVID-19 policies. Dehning et al. [63] combine the established SIR model with Bayesian parameter inference using MCMC sampling to analyze the time dependence of the effective growth rate of new infections and to reveal the effectiveness of interventions. With the inferred central epidemiological parameters, they sample from the parameter distribution to evolve the SIR model equations and thus forecast future disease development. In [241], a basic SIR model is modified by adding different types of time-varying quarantine strategies, such as government-imposed mass isolation policies and micro-inspection measures at the community level, to establish a method of calibrating cases of under-reported infections.

Discussion. Statistical modeling and a Bayesian statistical framework allow us to elicit informative priors for parameters that are difficult to estimate due to the lack of data reflecting the clinical characteristics of COVID-19, offer coherent uncertainty quantification of the parameter estimates, and capture nonlinear and non-monotonic relationships without the need for specific parametric assumptions [275] . Compared with compartmental models, statistical models usually converge at different confidence levels for different assumptions (i.e., possible mitigation strategies), providing better interpretability and flexibility for characterizing the COVID-19 characteristics and complexities discussed in Section 2. However, the related work on COVID-19 statistical modeling is limited in terms of addressing COVID-19-specific characteristics and complexities, e.g., asymptomatic effect, the couplings between mitigation measures and case numbers, and the time-evolving and nonstationary case movement.

This section reviews the related work on data-driven discovery, i.e., applying classic (shallow) and deep machine learning methods, AI and data science techniques on COVID-19 data, to discover interesting knowledge and insights through characterizing, representing, analyzing, classifying and predicting COVID-19 problems.

Table 2 (partial):
Data: NPI policies, case numbers, external data (e.g., social activities), etc.
Approaches: Descriptive analytics, latent models for sentiment/topic modeling, time-series analysis like regression variants [175, 187, 230, 258]. Data: Questionnaire data, social media data, external factors like wellbeing, etc.
Objective: Social, economic and workforce influence. Approaches: Descriptive analytics, time-series analysis, numerical methods, stochastic compartmental models [15, 64, 100, 109, 118, 129, 154, 215, 234, 236]. Data: Case numbers, data related to economy, trade, supply chain, logistics, social activities, workforce, technology, transport, mobility, sustainability and public resources, etc.
Objective: Misinformation. Approaches: Descriptive analytics, time-series models, numerical methods, statistical language models [4, 126, 199]. Data: Fact data, online texts, social media, case numbers, etc.

Classic shallow machine learning methods have been predominantly applied to COVID-19 classification, prediction and simulation, as shown by the WHO-based literature statistics in [38]. Typical shallow learning methods include artificial neural networks (ANN), SVM, decision trees, Markov chain models, random forest, reinforcement learning, and transfer learning. These tools are easy to understand and implement, and they are more applicable than other sophisticated methods (e.g., deep models and complex compartmental models) for the often small COVID-19 data. They are well explained in the relevant literature (e.g., [47, 157, 160]) and interested readers can refer to them and other textbooks for technical details. Though different machine learning methods may be built on their respective learning paradigms [36], their main learning tasks and processes for COVID-19 modeling are similar, including (1) selecting discriminative features x, (2) designing a model $f$ (e.g., a random forest classifier) to predict the target $\tilde{y}$: $\tilde{y}(x) = f(\theta, x, b)$ with parameters $\theta$ and bias term b, and (3) optimizing the model to fit the COVID-19 data by defining and optimizing an objective function $\mathcal{L}_{\theta} = \arg\min_{\theta} (y - \tilde{y}(x))$ for the goodness of fit between the expected target $\tilde{y}$ and the actual target $y$ (e.g., infective or deceased case numbers).

Deep learning as represented by deep neural networks is a more advanced COVID-19 modeling approach typically favored by computing researchers. Typical models applied in COVID-19 modeling include (1) convolutional neural networks (CNN) and their extensions in particular for images such as ImageNet and ResNet; (2) sequential networks such as LSTM, recurrent neural networks (RNN), memory networks and their variants; (3) textual neural networks such as BERT, Transformer and their variants; (4) unsupervised neural networks such as autoencoders and generative adversarial networks (GAN); and (5) other neural learning mechanisms such as attention networks.
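The three shallow learning steps described above (selecting features, designing a parameterized model, and optimizing a goodness-of-fit objective) can be sketched with the simplest possible instance. The linear model, the squared-error objective, the gradient descent optimizer and the toy data are all assumptions made for illustration; they stand in for whatever learner and objective a given study actually uses.

```python
def predict(theta, x, b):
    """Model f(theta, x, b): a linear predictor used here as a stand-in."""
    return sum(t * xi for t, xi in zip(theta, x)) + b


def fit(features, targets, lr=0.05, epochs=2000):
    """Minimize the mean squared error between predictions and targets."""
    theta = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_theta = [0.0] * len(theta)
        grad_b = 0.0
        for x, y in zip(features, targets):
            err = predict(theta, x, b) - y
            for j, xj in enumerate(x):
                grad_theta[j] += 2.0 * err * xj / n
            grad_b += 2.0 * err / n
        theta = [t - lr * g for t, g in zip(theta, grad_theta)]
        b -= lr * grad_b
    return theta, b


# Hypothetical feature (e.g., a scaled day index) and target (case counts).
features = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.0, 3.0, 5.0, 7.0]
theta, b = fit(features, targets)
```

Replacing `predict` with a tree ensemble or kernel machine changes the model family but not the overall pattern of features, model and objective.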

Typical approaches for COVID-19 deep modeling can be represented by a general deep interaction and prediction framework as follows. It models (1) temporal dependencies over the sequential evolution of cases (x, which may consist of several categories of case numbers or their rates), (2) interactions and influence between external containment actions (a, which may consist of various control measures such as masking and social distancing) and case developments, and (3) the influence of personal context (c, which may consist of demographic and health circumstances and symptomatic features of COVID-19 infections) over time.

As COVID-19 case developments are sequential, stochastic and influenced by many external factors, the framework combines autoencoders for the influence of unknown and stochastic asymptomatic and unreported case dynamics on reported numbers x, an RNN for the sequential evolution of case numbers, control measures and personal context, and contextual attention for the exterior containment strategies applied to case control, in order to model the complex interactions between the various underlying and control factors in COVID-19 sequential developments. The mapping of x to a representation e by one network can be treated as an encoder, while the estimation (reconstruction) of x from e by another network is a decoding or prediction process.

The interaction and prediction network in Fig. 3 can be implemented in terms of an autoencoder framework (with separate encoding and decoding networks, e.g., [101]) or an LSTM/RNN-based prediction framework (with one network for representation and another for estimating the next input, e.g., [206]). Accordingly, the objective function can be defined in terms of the discrepancy between x and its estimate $\hat{x}$ (i.e., $\arg\min \mathcal{L}(x, \hat{x})$) or a KL-divergence-based loss $\mathcal{L}$ over $h_t$ and $h_{t-1}$ (where $h_t$ and $h_{t-1}$ refer to the representations of input x interacting with actions a under the context c through gating integration).

Here, COVID-19 shallow learning refers to the application of general shallow or classic machine learning methods to the analytics and modeling of COVID-19 problems and data. It forms the second most popular set of modeling methods (about 4k of 22k WHO-listed references) that model COVID-19 outbreak, risk, transmission, uncertainty, anomalies, complexities, classification, variation, and prediction, and more specifically case forecasting, medical diagnostics, contact tracing, and drug development [106]. General machine learning methods including ANN, tree models such as decision trees and random forest, kernel methods like SVM, transfer learning, NLP and text mining methods, evolutionary computing like genetic algorithms and fuzzy sets, and reinforcement learning are mostly applied in addressing the above COVID-19 tasks by medical, biomedical, computing and social scientists [151, 194, 207], as discussed below.

Machine learning for COVID-19 outbreak prediction and risk assessment. Typical classifiers like ANN, SVM, decision trees, random forest, regression trees, least absolute shrinkage and selection operator (LASSO), and self-organizing maps are applied to forecast COVID-19 spread and outbreak and their coverage, patterns, growth and trends; estimate and forecast the confirmed, recovered and death case numbers or the transmission and mortality rates; and cluster infected cases and groups, etc. For example, in [108], logistic regression, decision trees, random forest and SVM are applied to estimate the growth trend and containment sign on data consisting of factors about health infrastructure, environment, intervention policies and infection cases, with accuracy between 76.2% and 92.9%. Evolutionary computing methods such as genetic algorithms, particle swarm optimization, and the gray wolf optimizer are also used to forecast COVID-19 infections [161, 200, 224].

Machine learning for COVID-19 diagnosis on clinical attributes. The machine learning of COVID-19 clinical reports such as blood test results can assist in diagnosis. For example, in [114], clinical attributes and patient demographic data are extracted by term frequency/inverse document frequency (TF/IDF), bag of words (BOW) and report length from textual clinical reports. The extracted features are then classified in terms of COVID, acute respiratory distress syndrome (ARDS), SARS and both COVID and ARDS by SVM, multinomial naive Bayes, logistic regression, decision tree, random forest, bagging, Adaboost, and stochastic gradient boosting, reporting an accuracy of 96.2% using multinomial naive Bayes and logistic regression. In [25], hematochemical values are extracted from routine blood exam-based clinical attributes, which are then classified into positive or negative COVID-19 infections by decision trees, extremely randomized trees, KNN, logistic regression, naive Bayes, random forest, and SVM. It reports an accuracy of 82% to 86%. The work in [253] applies random forest to identify COVID-19 infections.
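As an illustration of the TF/IDF feature extraction mentioned above, the following minimal sketch computes term weights for a few short texts. The whitespace tokenization, the unsmoothed logarithmic IDF and the example reports are assumptions for illustration; they are not the preprocessing of [114], and documents are assumed to be non-empty.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document.

    Weight = (term count / document length) * log(number of documents / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))   # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, heavily simplified clinical report snippets.
reports = ["fever cough fever", "cough fatigue", "fever headache"]
weights = tf_idf(reports)
```

The resulting sparse vectors can then be passed to any of the classifiers listed above; terms that appear in every report receive zero weight under this IDF definition.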

Machine learning for COVID-19 diagnosis on respiratory data. Machine learning can be conducted on COVID-19 patients' respiratory data, such as lung ultrasound waves and breathing and coughing signals, to extract respiratory behavioral patterns and anomalies. For example, logistic regression, gradient boosting trees and SVMs distinguish COVID-19 infections from asthmatic or healthy people on the Android app-based collection of coughs and breathing sounds and symptoms, with an AUC of 80% [26].

Machine learning for COVID-19 diagnosis on medical imaging. A very intensive application of classic machine learning methods is to screen COVID-19 infections on CT, chest X-ray (CXR) or PET images. For example, in [43] , the majority voting-based ensemble of SVM, decision tree, KNN, naive Bayes and ANN is applied to classify normal, pneumonia and COVID-19-infected patients on CXR images with an accuracy of 98% and AUC of 97.7%. In [74] , the simple applications of SVM, naive Bayes, random forest and JRip on CT images screen COVID-19 diseases with a reported accuracy of 96.07% by naive Bayes combined with random forest and JRip, in comparison with 94.11% by CNN.
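The majority voting used to combine several classifiers can be sketched as follows. The label lists stand in for the outputs of already-trained classifiers and are hypothetical; the tie-breaking behavior (the label that reaches the top count first in classifier order) is a property of this sketch, not of the ensemble in [43].

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    n_samples = len(predictions[0])
    voted = []
    for k in range(n_samples):
        votes = Counter(clf_preds[k] for clf_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical outputs of five classifiers on three chest X-ray images.
predictions = [
    ["normal", "covid", "pneumonia"],
    ["normal", "covid", "covid"],
    ["pneumonia", "covid", "pneumonia"],
    ["normal", "pneumonia", "pneumonia"],
    ["normal", "covid", "normal"],
]
ensemble_labels = majority_vote(predictions)
```

Using an odd number of voters reduces ties in two-class problems, but with three classes ties remain possible and a deliberate tie-breaking rule is advisable.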

Machine learning for COVID-19 diagnosis on latent features. Further, shallow learners are applied to detect and diagnose COVID-19 infections on latent features learned by shallow to deep representation models on COVID-19 medical images. For example, in [107] , ANN-based latent representation learning captures latent features from gray, texture, histogram, number, intensity, surface and volume features in CT images, then classifiers including SVM, logistic regression, Gaussian naive Bayes, KNN and ANN are applied to differentiate COVID-19 infections from community-acquired pneumonia with 95.5% accuracy reported. In [170] , latent features are extracted from CXR and CT images to form a gray level co-occurrence matrix (GLCM), local binary gray level co-occurrence matrix (LBGLCM), gray level-run length matrix (GLRLM) and segmentation-based fractal texture analysis (SFTA)-based features, which are then oversampled by the synthetic minority over-sampling technique (SMOTE) and further selected by a stacked autoencoder (sAE) and principal component analysis (PCA), before SVM is applied to achieve 94.23% accuracy. In [222] , MobileNetV2 and SqueezeNet extract features from CXR images, which are then processed by social mimic optimization to classify coronavirus, pneumonia, and normal images with 99.27% accuracy by SVM. Lastly, in [225] , a residual exemplar local binary pattern (ResExLBP)-based method extracts features from CXR images, which are then selected by an iterative relief-based method before decision trees, linear discriminant, SVM, KNN and subspace discriminant are applied on the selected features to detect COVID-19 infection with an accuracy of 99.69% to 100.0%.
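Among the texture descriptors named above, the gray level co-occurrence matrix (GLCM) is simple enough to sketch directly. The example counts how often a pixel of gray level i has a right-hand neighbor of gray level j and derives a contrast statistic; the tiny quantized image, the single offset and the choice of contrast as the feature are assumptions for illustration, not the pipeline of [170].

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels for the pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Contrast feature: mean squared gray-level difference of neighbor pairs."""
    total = sum(sum(row) for row in matrix)
    weighted = sum(
        ((i - j) ** 2) * value
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    )
    return weighted / total


# Hypothetical 3x3 image already quantized to gray levels 0, 1 and 2.
image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
cooccurrence = glcm(image, levels=3)
texture_contrast = contrast(cooccurrence)
```

Features such as contrast, computed over several offsets, form the kind of hand-crafted vector that is then oversampled, reduced and classified in the studies above.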

Modeling the influence of external factors on COVID-19. Various machine learning tasks are undertaken to analyze the relation and influence of external and contextual factors on COVID-19 epidemic attributes. For example, ensemble methods including random forest, extra trees regressor, AdaBoost, gradient boosting regressor, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), CatBoost regressor, kernel ridge, SVM, KNN, MLP and decision trees indicate potential association between COVID-19 mortality and weather data [142] .

Machine learning-driven drug and vaccine development for COVID-19. Machine learning methods are applied to analyze the drug-target interactions, drug selection, and the effectiveness of drugs and vaccines on containing COVID-19. For example, machine learning methods including XGBoost, random forest, MLP, SVM and logistic regression are used to screen thousands of hypothetical antibody sequences and select nine stable antibodies that potentially inhibit SARS-CoV-2 [111, 139] .

Deep learning has been intensively applied to modeling COVID-19 as discussed in Section 5.1, with about 2k of 22k references on modeling COVID-19. Typical applications involve COVID-19 data on daily infection case numbers, health and clinic records, hospital transactions, medical imaging, respiratory signals, genomic and protein sequences, and exterior data such as infective demographics, social media communications, news and textual information, etc. Below, we first discuss a common application of deep learning for COVID-19 epidemic description and forecasting, and then briefly review other applications.

Deep learning of the COVID-19 epidemic. Deep neural networks are intensively applied to characterize and forecast COVID-19 epidemic outbreak, dynamics and transmission. Examples are predicting the peak confirmed numbers and peak occurrence dates, forecasting daily confirmed, deceased and recovered case numbers, and forecasting multi-day-ahead (e.g., 7, 10, 14, 30 or 60 days) infected/confirmed, recovered and death case numbers (or their transmission/mortality rates) through modeling short-range temporal dependencies in case numbers by applying LSTM, stacked LSTM, Bi-LSTM, convolutional LSTM-like RNNs, and GRU [65, 206]. Other work models the transmission dynamics and predicts daily infections of COVID-19 using a variational autoencoder (VAE), an encoder-decoder LSTM or an LSTM with encoder and Transformer [116], a modified autoencoder [178], and GAN and their variants; tracks its outbreak [99]; predicts the outbreak size by encoding quarantine policies as the strength function in a deep neural network [60]; estimates global transmission dynamics using a modified autoencoder [98]; predicts epidemic size and lasting time; and combines medical information with local weather data to predict the risk level of a country by a shallow LSTM model [172]. In [265], a comparative analysis shows that VAE outperforms simple RNN, LSTM, BiLSTM and GRU in forecasting COVID-19 new and recovered cases.
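Sequence models such as LSTM or GRU are usually trained on supervised samples cut from the case time series. The sketch below shows one common way to build such samples for multi-day-ahead forecasting; the function name, the window sizes and the case numbers are hypothetical and do not reproduce the setup of any cited study.

```python
def make_windows(series, lookback, horizon):
    """Pair each window of `lookback` values with the value `horizon` steps after it."""
    samples = []
    for start in range(len(series) - lookback - horizon + 1):
        inputs = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((inputs, target))
    return samples


# Hypothetical daily confirmed case counts.
cases = [10, 12, 15, 20, 26, 33, 41, 50]

# Use three past days to predict the count two days ahead.
samples = make_windows(cases, lookback=3, horizon=2)
```

With short COVID-19 series, longer lookbacks and horizons quickly reduce the number of training samples, which is one reason why simpler forecasters can remain competitive.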

Broad deep COVID-19 learning. In addition, we highlight several other typical application areas of COVID-19 deep learning:

(1) Characterizing symptoms of coronavirus infections, e.g., by pretrained neural networks (e.g., [202]), with more discussion in Section 6.2.1;
(2) Analyzing health and medical records, blood sample-based test reports, and respiratory sounds and signals for diagnosis and treatment, e.g., by CNN, LSTM and GRU [194], with more discussion in Section 6.2.2;
(3) Analyzing medical imaging for diagnosis, quarantine and treatment by convolutional neural networks (CNN, e.g., ImageNet and ResNet), GAN and their mutations [103, 194], with more discussion in Section 6.2.3;
(4) Analyzing COVID-19 genomic and protein sequences and interactions by RNN, CNN and their variants for drug and vaccine development, tracing infection sources, and analyzing virus structures and evolution, with more details in Section 6.2.4;
(5) Repurposing and developing drugs and vaccines by generative autoencoders, generative tensorial reinforcement learning and generative adversarial networks [270] for generative chemistry discovery;
(6) Analyzing COVID-19 impact on sentiment and emotion by RNN, Transformer-based NLP neural models and their derivatives [67, 131];
(7) Characterizing the COVID-19 infodemic by NLP and text mining, including misinformation identification [208], enhancing epidemic modeling using social media data [115], and analyzing the COVID-19 research progress and topic evolution [269];
(8) Other topics such as analyzing the influence and effect of countermeasures, e.g., the effect of quarantine policies on outbreak using DNNs [60], with more discussion in Section 7.

Table 3 illustrates some typical applications of shallow and deep learning methods for modeling COVID-19. More discussion on COVID-19 deep learning can be found in Section 6.2.

Discussions. Most of the existing studies on shallow and deep COVID-19 modeling directly apply the existing shallow machine learning methods and deep neural networks on COVID-19 data, as shown in reviews like [133, 205, 240]. Our literature review also shows that deep neural models are widely applicable to COVID-19 modeling tasks; they are overwhelmingly, and sometimes unnecessarily, applied to all possibilities and significantly outperform time-series forecasters and shallow machine learners. In fact, deep models may sometimes even lose their advantage over traditional modelers such as ensembles, as shown in Tables 3 and 5.

As a complex socio-technical issue, COVID-19 modeling brings many specific challenges and research questions from the relevant domains and for domain-specific research communities. In this section, we focus on the two major and most relevant domains of COVID-19: epidemic modeling, and medical and biomedical analysis.

6.1.1 Epidemiological compartmental models. Epidemiological modeling portrays the state-space, interaction processes and dynamics of an epidemic in terms of its macroscopic population, states and behaviors. Compartmental models are widely used in characterizing COVID-19 epidemiology by incorporating epidemic knowledge and compartmental hypotheses into imitating the multi-state COVID-19 population transitions. An individual in the COVID-19 epidemic sits at one state (compartment) at a time-point and may move from this state to another at a state transition rate. The individuals of the closed population are labeled by their compartments and migrate across compartments during the COVID-19 epidemic process; these migrations are modeled by (ordinary) differential equations.

By consolidating various COVID-19 epidemiological characteristics, hypotheses and compartmental models, Fig. 4 illustrates a typical COVID-19 state-space and evolution system with major (thick) and minor (thin) states and state transition paths that can be sequentially categorized into four phases:

• Susceptible (S): Individuals are susceptible to infection under free (uncontained or unrestrained) or contained (restrained) conditions, at the respective uncontained or containment rate and the respective transmission rate;

• Exposed (E): Free or contained susceptibles are exposed to infection from those who are infected but in the incubation period (which could be as long as 14 days), and may be noninfectious and free or infectious and contained, at a respective exposure rate;

• Infective (I): Those exposed become infectious and may be detected (registered/documented and known to medical authorities) or undetected (unreported/undocumented and unknown to medical management); some initially undetected infectives may later be detected and converted to detected infectives; some documented infectives may be symptomatic and quarantined at a quarantine rate, while others may be asymptomatic and unquarantined at an unquarantined rate; there may be some rare cases who carry the virus and infection for a long time with or without symptoms, called lasting carriers; in addition, some initially asymptomatic cases may become symptomatic and quarantined;

• Removed (R): Unquarantined infectives may recover at a recovery rate or die at a mortality rate, and the same applies to quarantined infectives and to unknown/undetected infectives at their own rates; some quarantined infectives may present acute, even life-threatening, symptoms and are then hospitalized (H) or even further ventilated (V); hospitalized and ventilated infectives may in turn recover or die at their own rates.

Table 3 (partial):
Approaches: Shallow learners like SVM, ANN, decision tree, random forest and ensemble methods [108], evolutionary computing methods such as particle swarm optimization [161, 200, 224], DNN variants such as LSTM, GRU, VAE, GAN and BiLSTM, etc. [60, 65, 99, 116, 178, 206, 265]. Data: Epidemic case numbers and external data such as meteorological data, environmental data (e.g., humidity), social activity and mobility data, etc.
Objective: Infection diagnosis. Approaches: Shallow learners [25, 26, 43, 253], CNN and RNN variants like LSTM and GRU, and pretrained CNN-based image nets like ResNet, MobileNetV2 and SqueezeNet, etc. [74, 104, 222, 225], text analysis models [114]. Data: Pathological and clinical records, respiratory signals (e.g., coughing and breathing signals and patterns in ultrasound or thermal video), computed tomography (CT) and CXR images, etc.
Objective: Mortality and survival analysis. Approaches: Shallow learners like SVM, ANN, decision tree, regression tree, random forest and ensemble methods like XGBoost [41, 203], CNNs, pretrained CNN-based image nets, RNN variants like LSTM and GRU [65, 206, 266]. Data: Medical imaging including CT and CXR images, clinical records, patient demographics, case numbers, external data, etc.
Objective: Medical treatment. Approaches: Shallow machine learning methods and DNNs, etc. [18, 254, 260, 274]. Data: Health/medical records, pharmaceutical treatments, ICU data, etc.
Objective: Genomic and protein analysis, drug/vaccine development. Approaches: Shallow classifiers like SVM and ensembles, frequent pattern and sequence analysis methods, CNN variants, RNN variants, attention networks, GAN, autoencoders, reinforcement learning, NLP models like Transformer, etc. [7, 18, 97, 111, 145, 146, 156, 270, 271]. Data: Genomic data, proteomic data, drug-target interactions, molecular reactions, etc.
Approaches: Shallow learners like linear discriminant, SVM, KNN and subspace discriminant, combining classifiers with compartmental models, DNN variants, sequence analysis, etc. [12, 193]. Data: Resurgence case numbers, virus strain genome and protein sequences, NPI data, external data, etc.
Objective: NPI evaluation. Approaches: Various Bayesian models, combining compartmental models with classifiers or estimators, DNNs, etc. [73, 80]. Data: Case numbers, NPI policies, external data, etc.
Approaches: NLP models like LDA and topic models and DNN variants like BERT and Transformer variants, etc. [14, 67, 131, 150, 158, 245]. Data: Social media data, news feeds, Q/A data, external factors, etc.
Objective: Socioeconomic influence. Approaches: Relation (e.g., correlation and causality) analysis, topic modeling by NLP models [269]. Data: Social, economic and workforce activities, case numbers, etc.
Approaches: Classic NLP models, correlation analysis, shallow learners, outlier detectors, DNN variants like BERT and Transformer mutations [126, 147, 208]. Data: Social media, online texts, Q/A data, news feeds, etc.
Notes: See Table 5 for deep COVID-19 medical imaging analysis; such methods are applicable but not much work is reported in the literature; see Table 2 for NPI effect modeling.

In practice, the above COVID-19 state-space may be too complicated to model, and not all states are characterizable by the available data. Accordingly, the focus is on the main states and their transitions for which the corresponding data is available, for example, susceptible, exposed, infectious (which consists of both detected and undetected), recovered, and deceased. Below, we illustrate the differential equations of the states S, E, I, Q, $\bar{Q}$, R and D, which are the main states of a closed COVID-19 population. Here, (1) the state variables represent the fraction of the population at each state; (2) some of the states are impacted by containment measures at the containment rate; and (3) the state transitions take place at the rates shown in Fig. 4.
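A reduced compartmental system over population fractions can be sketched as follows. This is a minimal illustration with the states S, E, I, Q, R and D only; the equations, the rate names (`beta`, `sigma`, `q`, `gamma`, `mu`), their values and the forward Euler integration are assumptions made for this sketch and are not the system of Fig. 4.

```python
def derivatives(state, beta, sigma, q, gamma, mu):
    """Right-hand side of a simplified S, E, I, Q, R, D model (fractions)."""
    s, e, i, quarantined, r, d = state
    new_exposed = beta * s * i
    return (
        -new_exposed,                                   # dS/dt
        new_exposed - sigma * e,                        # dE/dt
        sigma * e - (q + gamma + mu) * i,               # dI/dt
        q * i - (gamma + mu) * quarantined,             # dQ/dt
        gamma * (i + quarantined),                      # dR/dt
        mu * (i + quarantined),                         # dD/dt
    )


def integrate(state, days, dt=0.1, **rates):
    """Advance the state with forward Euler steps of size dt."""
    for _ in range(int(round(days / dt))):
        deriv = derivatives(state, **rates)
        state = tuple(x + dt * dx for x, dx in zip(state, deriv))
    return state


# Hypothetical initial fractions and daily rates.
initial = (0.999, 0.0, 0.001, 0.0, 0.0, 0.0)
final = integrate(initial, days=120, beta=0.5, sigma=0.2, q=0.1, gamma=0.1, mu=0.005)
```

Because the derivatives sum to zero, the fractions keep summing to one, which reflects the closed-population assumption; only infectives outside quarantine transmit in this sketch.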

We summarize and discuss the related work on the major tasks of COVID-19 epidemiological modeling, and highlight the work on modeling COVID-19 epidemic transmission processes, dynamics, external factor influence, and resurgence and mutation.

COVID-19 epidemiological modeling tasks. Epidemiological models dominate COVID-19 modeling (about 3.5k publications of the 22k reported in the WHO literature) by epidemic researchers and computing scientists through the expansion or hybridization with other models such as statistical models and machine learning methods. COVID-19 compartmental modeling aims to answer several epidemiological problems: (1) the growth (spread) of COVID-19 and its case number movements at different epidemiological states to forecast case numbers in the next days or periods; (2) the basic reproduction rate $R_0$ that informs the contagion and transmission level and control strategies; (3) the sensitivity and effect of control measures on infection containment and case movements; and (4) the sensitivity and effect of strategies for herd immunity and mass vaccination. Accordingly, various compartmental models are customized to cater for specific assumptions, settings and conditions of modeling COVID-19, as discussed in Section 6.1.2. For (1), with historical case numbers of a country or region and the initial settings of hyperparameters, we can estimate the parameters and further predict the number over time, e.g., the number of infections and deaths in a country or city. Regarding (2), with the state-space shown in Fig. 4, to resolve these differential equations, we first obtain the population projection matrix $\mathbf{A}$ corresponding to all states and their transition probabilities. The projection matrix can be converted to a state transition matrix $\mathbf{T}$ (where each element $T_{ij}$ is the probability of an individual transferring from state $j$ at time $t$ to state $i$ at time $t+1$) and a fertility (reproductive) matrix $\mathbf{F}$ (where an element $F_{ij}$ refers to the reproduced number of $i$-state offspring of an individual at state $j$), i.e., $\mathbf{A} = \mathbf{T} + \mathbf{F}$. Further, we can calculate the fundamental matrix $\mathbf{N}$: $\mathbf{N} = (\mathbf{I} - \mathbf{T})^{-1}$ with identity matrix $\mathbf{I}$ to represent the expected time spent in each state and that to death. Then, we can obtain another matrix $\mathbf{R}$: $\mathbf{R} = \mathbf{F}\mathbf{N}$ with each entry $R_{ij}$ referring to the expected lifetime production number of $i$-state offspring by an individual at stage $j$ [39, 211]. Its dominant eigenvalue is the net reproduction rate $R_0$. With regard to (3), since control measures such as social distancing and lockdown may influence the growth of case numbers and reproduction and transmission rates, we can analyze the sensitivity of adjusting related parameters on the case numbers and rates. To explore the opportunities for herd immunity and mass vaccination in (4), the herd immunity rate and vaccination rate are expected to be greater than $1 - 1/R_0$ to eradicate the disease.
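The matrix route to the reproduction rate described above (a transition matrix, a fertility matrix, the fundamental matrix, and the dominant eigenvalue of their product) can be worked through on a two-state example. The state choice (exposed, infectious), the matrix entries and the variable names are hypothetical, and the helper functions handle only the 2x2 case used here.

```python
import math


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dominant_eigenvalue_2x2(m):
    """Largest eigenvalue; assumes real eigenvalues (true for non-negative matrices)."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    return (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0


# Column j gives where an individual in state j (0: exposed, 1: infectious) goes per day.
T = [[0.8, 0.0],
     [0.2, 0.9]]
# Each infectious individual produces 0.3 newly exposed individuals per day.
F = [[0.0, 0.3],
     [0.0, 0.0]]

identity_minus_T = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(2)]
                    for i in range(2)]
N = inverse_2x2(identity_minus_T)      # expected days spent in each state
R = matmul_2x2(F, N)                   # expected lifetime offspring by state
r0 = dominant_eigenvalue_2x2(R)
herd_immunity_threshold = 1.0 - 1.0 / r0
```

In this toy setting an infectious period of about ten days with 0.3 new exposures per day gives a reproduction rate of about 3, so roughly two thirds of the population would need immunity under the threshold formula.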

In addition to these major problems, below, we further discuss two applications of epidemiological modeling in COVID-19: modeling its transmission (which is also the most explored area) and resurgence and mutation (which is a recent challenge). More discussion on modeling the NPI effect on COVID-19 transmission and epidemic is in Section 7.1.

Modeling COVID-19 epidemic transmission process. Studies on modeling COVID-19 epidemic transmission mainly focus on evaluating the epidemiological attributes (e.g., infection rate, recovery rate, mortality, reproduction number, etc.), predicting the infection and death counts, and revealing the transmission, spread and outbreak trends under experimental or real-world scenarios. As illustrated in Table 4, various compartmental models are available to characterize COVID-19. For example, the SIDARTHE compartmental model considers eight stages of infection: susceptible (S), infected (I), diagnosed (D), ailing (A), recognized (R), threatened (T), healed (H) and extinct (E) to predict the course of the epidemic and to plan an effective control strategy [81]. A new compartment is introduced to the classic SIR model to quantify those who are symptomatic, quarantined infectives [141]. Further, a stochastic SHARUCD model framework contains seven compartments: susceptible (S), severe cases prone to hospitalization (H), mild, sub-clinical or asymptomatic (A), recovered (R), patients admitted to the intensive care units (U), the recorded cumulative positive cases (C), which include all new positive cases for each class of H, A, U, R, and deceased (D) [5]. In addition, several models involve new compartments to represent asymptomatic features to mild symptoms [5, 250] and undocumented cases [130].

Modeling COVID-19 epidemic dynamics, transmission and risk. Classic compartmental models assume constant transmission and recovery rates between state transitions. Many SIR variants tailored for COVID-19 retain this assumption and therefore cannot capture the disease characteristics in Section 2.1. To cater for COVID-19-specific characteristics, especially when mitigation measures are involved, the classic susceptible-infectious-recovered (SIR) [110] and susceptible-exposed-infectious-recovered (SEIR) models [13], which were applied to modeling other epidemics like measles and Ebola, are tailored for COVID-19. Since COVID-19 transmission contains more states, especially with interventions, SIR/SEIR models are extended by adding customized compartments like quarantine, protected, asymptomatic and immune [5, 57, 81, 141, 250]. Accordingly, to capture the evolving COVID-19 epidemiological attributes including time-variant infection, mortality and recovery rates, time-dependent compartmental models are proposed. For example, a time-dependent SIR model adapts to changes in infectious disease control and prevention laws, such as the imposition of city lockdowns and traffic halts, by modeling the control parameters (infection rate and recovery rate) as time-variant variables [49]. Dynamical modeling is also considered in temporal SIR models with temporal susceptible, insusceptible, exposed, infectious, quarantined, recovered and closed (or death) cases in [177]. An early-stage study of a dynamic SEIR model estimates the epidemic peak and size, and an LSTM further forecasts its trend after taking into account public monitoring and detection policies [261].
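To make the idea of time-variant infection and recovery rates concrete, the following is a minimal sketch of a time-dependent SIR model integrated with a forward Euler scheme. It is not the formulation of [49] or [177]: the step change in the infection rate at a lockdown date, the parameter values, the population size and all function names are hypothetical choices made for illustration.

```python
def simulate_time_dependent_sir(beta, gamma, s0, i0, rec0, days, dt=0.1):
    """Integrate an SIR model whose rates beta(t) and gamma(t) vary with time.

    beta and gamma are callables of time (in days). Returns one (S, I, R)
    tuple per day, starting with day 0.
    """
    n = s0 + i0 + rec0
    s, i, r = float(s0), float(i0), float(rec0)
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for day in range(days):
        for step in range(steps_per_day):
            t = day + step * dt
            new_infections = beta(t) * s * i / n * dt
            new_recoveries = gamma(t) * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


def constant(value):
    """Wrap a constant rate as a function of time."""
    return lambda t: value


def lockdown_beta(t, before=0.4, after=0.15, start_day=20):
    """Hypothetical step change in the infection rate when a lockdown starts."""
    return before if t < start_day else after


POPULATION = 1_000_000
baseline = simulate_time_dependent_sir(
    constant(0.4), constant(0.1), POPULATION - 10, 10, 0, days=200)
lockdown = simulate_time_dependent_sir(
    lockdown_beta, constant(0.1), POPULATION - 10, 10, 0, days=200)

peak_baseline = max(i for _, i, _ in baseline)
peak_lockdown = max(i for _, i, _ in lockdown)
print(f"peak infections with constant rates: {peak_baseline:,.0f}")
print(f"peak infections with lockdown:       {peak_lockdown:,.0f}")
```

Passing the rates as functions of time keeps the integrator unchanged when a different intervention schedule, or a rate estimated from data, is substituted for the step function.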

Modeling the influence of external factors on the COVID-19 epidemics. COVID-19 epidemic dynamics reflect the time-varying states, state transition rates, and their vulnerability to contextual and external factors such as a person's ethnicity, public health conditions, and social contacts and networking [135]. To depict the influence of external factors, more complex compartmental models incorporate the relevant side information (e.g., NPIs, demographic features such as age stratification and heterogeneity, and social activities such as population mobility) into their state transitions. Examples include an age-sensitive SIR model [51], which integrates known age-interaction contact patterns to examine the potential effects of age-heterogeneous mitigations on an epidemic in a COVID-19-like parameter regime; an age-structured SIR model with social contact matrices and Bayesian imputation [212]; and an age-structured susceptible-exposed-infectious-recovered-dead (SEIRD) model that identifies no significant susceptibility difference between age groups [165]. More about the NPI influence on the COVID-19 epidemic is in Section 7.1. In addition, environmental factors, especially humidity and temperature, may affect COVID-19 virus survival and the epidemic's transmission [27, 164, 242, 257], although the reported conclusions are inconsistent. In [59], variational mode decomposition decomposes COVID-19 case time series into multiple components, and then a Bayesian regression neural network, cubist regression, KNN, quantile random forest and support vector regression (SVR) are combined to forecast six-day-ahead case movements by involving climatic exogenous variables.

Modeling COVID-19 resurgence and mutation. Our current understanding of COVID-19 resurgence and mutation is very limited, while the British, South African, Indian and other newly emergent mutations show higher contagion and complexities [85, 86]. COVID-19 may indeed become another epidemic disease which remains with humans for a long time. Research is urgently needed to: quantify the virus mutation and disease resurgence conditions; forecast and control potential resurgences and future waves after lifting certain mitigation restrictions and reactivating businesses and social activities [137, 174]; distinguish the epidemiological characteristics, age sensitivity, and intervention and containment measures between waves [8, 84]; compare the epidemiological wave patterns between countries experiencing mutations and resurgences, and compare COVID-19 wave patterns with influenza wave patterns [72]; predict resurgences and mutations (e.g., by estimating the daily confirmed case growth when relaxing interstate movement, mobility and contact restrictions and social distancing by SEIR-expanded modeling); and prepare for countermeasures on future waves [12]. Limited research results are available in the literature on the above broad issues. For example, a comparative analysis in [21] shows how the second COVID-19 wave in Italy differs from that elsewhere in Europe and relates the differences to the strategies taken in implementing facemasks, social distancing, business closures and reopenings. In [32], building on fitting the first wave data, an epidemic renormalisation group approach further simulates the dynamics of disease transmission and spreading across European countries over weeks by modeling the European border control effects and social distancing in each country. In [127], an SIR model estimates the scenarios of incurring a potential second wave in China and the potential case fatality rate if containment measures such as travel bans and controls on viral reintroduction from overseas importation are relaxed for certain durations in a population with a certain epidemic effect size and cumulative count after the first wave. In [174], an SEIR model incorporates social distancing to model the mechanism (closure releasing) of forming the second wave, the epidemiological conditions (ranges of transmission rate and the inverse of the average infectious duration) for triggering the second and third waves, and the influence of socioeconomic (economic loss due to lockdown) and intervention (novel social behavior spread) factors on case numbers. In [137], a revised stochastic SEIR model estimates different resurgence scenarios reflected in infections when applying time-decaying immunity, lockdown release, or increasing implementation of social distancing and other individual NPIs.
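The mechanism by which releasing a closure can form a second wave can be illustrated with a small deterministic simulation. The sketch below is an assumption-laden toy, not the stochastic model of [137] or the SEIR model of [174]: it uses an SEIR model extended with a waning-immunity flow from recovered back to susceptible, a hypothetical lockdown window during which the transmission rate is reduced, and illustrative parameter values.

```python
def simulate_seirs(beta, sigma, gamma, omega, s0, e0, i0, rec0, days, dt=0.1):
    """SEIR model with waning immunity (R -> S at rate omega), forward Euler.

    beta is a callable of time (in days); sigma, gamma and omega are constant
    rates. Returns the number of infectious individuals at the start of each day.
    """
    n = s0 + e0 + i0 + rec0
    s, e, i, r = float(s0), float(e0), float(i0), float(rec0)
    steps_per_day = round(1 / dt)
    infectious = [i]
    for day in range(days):
        for step in range(steps_per_day):
            t = day + step * dt
            exposures = beta(t) * s * i / n * dt
            onsets = sigma * e * dt
            recoveries = gamma * i * dt
            waned = omega * r * dt
            s += waned - exposures
            e += exposures - onsets
            i += onsets - recoveries
            r += recoveries - waned
        infectious.append(i)
    return infectious


def release_schedule(t, open_beta=0.5, lockdown_beta=0.1,
                     lockdown_start=30, lockdown_end=90):
    """Hypothetical transmission rate: reduced during the lockdown, restored after release."""
    return lockdown_beta if lockdown_start <= t < lockdown_end else open_beta


infectious = simulate_seirs(
    release_schedule, sigma=0.2, gamma=0.2, omega=1 / 180,
    s0=999_900, e0=0, i0=100, rec0=0, days=300)

first_wave_peak = max(infectious[:90])    # peak before the release on day 90
level_at_release = infectious[90]         # infections remaining when the lockdown ends
second_wave_peak = max(infectious[90:])   # peak after the release
print(f"first wave peak:  {first_wave_peak:,.0f}")
print(f"at release:       {level_at_release:,.0f}")
print(f"second wave peak: {second_wave_peak:,.0f}")
```

In this toy setting the lockdown suppresses the first wave while most of the population is still susceptible, so restoring the original transmission rate produces a larger second wave; the size and timing depend entirely on the assumed parameters.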

Discussion. COVID-19 compartmental models excel at modeling epidemiological hypotheses, processes and factors with domain knowledge and interpretation. Such models often assume constant state-space transitions, capture average behaviors and the contagion of a closed population, and are sensitive to initial states and parameters. Challenges and opportunities exist in expanding their traditional frameworks to address the specific COVID-19 complexities and challenges in Section 2. Examples are time-varying, non-IID dynamics and complex couplings between interior and exterior factors related to COVID-19 populations, management groups and contexts. Other important issues include understanding how vaccination and specific vaccines affect coronavirus mutation and discovering the relationships between interior and exterior factors and the resurgence and mutations.

Table 4. Compartmental models for characterizing COVID-19.
[49, 177]. Data: case numbers, reporting time information, etc.
Asymptomatic transmission: asymptomatic to mild symptoms [250], undocumented cases [130], SHARUCD differentiating mild and asymptomatic from severe infections [5], epidemiological interventions with serological tests, age-dependent and asymptomatic settings [250], etc. Data: case numbers, reporting information, symptoms, demographics, etc.
Age-sensitive SIR model [51], age-structured SIR model with social contacts [212], age-structured SEIRD [165], public monitoring and detection policies [261], ethnicity, public health conditions and social contacts [135], environmental factors [27, 164, 242, 257]. Data: case numbers, demographics, health conditions, social activities, environmental factors, etc.
Lockdown and social distancing [49], lockdown [5], quarantine [57], symptomatic and quarantined infecteds [141], self-protection and quarantine [177], etc. Data: case numbers, NPIs, health conditions, test results, demographics, etc.
Second waves [12], wave difference [72, 84], reopening business and social activities [174], time-decaying immunity and easing lockdown and social distancing [137], age sensitivity [8], NPI influence on future waves [12], travel ban and virus importation [127]. Data: case numbers, multiwave data, NPI and external data, etc.
Compartmental model for simulating 'shield immunity' in a population [250]. Data: case numbers, serological tests, etc.

A wide range of research issues may benefit from such analyses, including but not limited to: COVID-19 infection diagnosis, prognosis and treatment, virology and pathogenesis analysis, potential therapeutics development (e.g., drug repurposing and vaccine development), genomic similarity analysis and sourcing, and contact tracing. In this section, we summarize the medical and biomedical modeling of COVID-19 infection diagnosis and case identification, risk and prognosis analysis, medical imaging analysis, pathological and treatment analysis and drug development.

Given the high transmission and reproduction rates, high contagion, and sophisticated and unclear transmission routes of COVID-19 and its virus strains such as the Delta strain, it is crucial to immediately identify and confirm exposed cases, test positive or negative infections, identify the infecting virus variant types, and trace their origins and contacts so as to implement appropriate quarantine measures in a timely and proactive manner and contain their potential spread and outbreak [159]. This is particularly important during the varying incubation periods, which are often asymptomatic to mildly symptomatic yet highly contagious, particularly for the virus variants. The SARS-CoV-2 diagnosis and test methods include (1) chemical and clinical methods, typically nucleic acid-based molecular diagnosis and antibody-based serological detection; (2) medical imaging-driven analysis, such as symptom inspection from CXR and CT images; (3) clinical diagnoses and tests like respiratory signal analysis, such as on the abnormal patterns of the lung's ultrasound waves and coughing and breathing signals; and (4) other noninvasive methods such as those involving SARS-CoV-2 and its disease data and external data [45, 47]. Data-driven discovery also plays an increasingly important role in improving COVID-19 diagnosis. Due to the virus and disease complexities, as an alternative and complement to the chemical and clinical diagnosis approaches, COVID-19 identification [226] can benefit from data-driven discovery [36] on biomedical images, genomic data, symptom identification and discrimination, and external data including social contacts, social activities, mobility and media communications.

• Nucleic acid-based diagnosis test (NAT) [3, 76, 263] refers to various molecular diagnosis test methods, including non-isothermal amplification (e.g., the real-time reverse transcription polymerase chain reaction (RT-PCR) test, which is the gold standard of COVID-19 diagnosis), isothermal amplification (e.g., CRISPR-based), and sequencing-based tests. Such methods may benefit from modeling techniques including gene and protein sequence analysis and drug-target and virus-host interaction analysis. NAT is highly sensitive and usable for large-scale operations, but it is expensive, as it typically requires specific test materials and lab processing, and its accuracy is subject to the varied quality and quantity of specimen collections. The challenges are to reduce its false-negative and false-positive rates, supplemented by other diagnosis tools, and to develop scalable fast test tools.
• Antibody-based serological diagnosis [125, 132, 176] detects anti-SARS-CoV-2 immunoglobulins, i.e., the antibodies produced in response to COVID-19 infections, by validating the specificity and sensitivity of chemiluminescent immunoassays, enzyme-linked immunosorbent assays and lateral flow immunoassays against SARS-CoV-2. It is an alternative or complement to NATs for acute infection diagnosis with easier and cheaper operations at any time. However, it may produce poor-performing results which are unreliable for decision-making, it may take time to get the results, and it might be difficult to use for early large-scale diagnosis. There is an urgent need to develop more accurate serological test methods and tools. Machine learning methods such as CNNs could improve test performance, e.g., by analyzing the test results, involving external data on patient demographics and clinical results, and integrating various test results [144].
• Clinical diagnosis and analysis involves clinical reports, domain knowledge and clinicians in identifying COVID-19-specific symptoms, indications and infections, differentiating them from other similar diseases such as influenza, and confirming positive, negative, severe or fatal conditions. Such diagnoses draw on blood tests, cough sound judgment, breathing pattern detection, and external factors captured in external data, etc. AI, machine learning and analytics methods are increasingly being used to classify COVID-19 from other diseases and to predict infections, recovery and mortality rates, numbers or timing, etc. [25, 26, 114]. For further discussion, see Section 5.2.

• Clinical medical imaging analysis inspects COVID-19-sensitive medical images, typically by DNN-based image analysis, and can complement the aforementioned chemical and medical methods by detecting abnormal and discriminative symptoms and patterns sensitive to COVID-19 in patients' CXR and CT images. Both typical deep and shallow learning methods are widely applied, although they also present inconsistencies and biases in their applications, experiments, results and actionability [197]. For further discussion, see Section 6.2.3.
• Data-driven prediction operates on COVID-19 related data such as blood test results, respiratory signals, and external data that may indicate symptoms, patterns or anomalies of COVID-19 infections. Shallow and deep learning and mathematical modeling methods are applied to classify the symptom types, differentiate COVID-19 infections from other diseases, or detect outliers that may indicate COVID-19 infections. For example, in [253], a random forest algorithm-driven assistant discrimination tool extracts 11 top-ranking clinically available blood indices from 49 blood test samples to identify COVID-19 infectives from suspected patients. In [202], computer audition is used to recognize COVID-19 patients under different semantics such as breathing, dry/wet coughing or sneezing, and speech during colds, etc. AI4COVID-19 [102] combines the deep domain knowledge of medical experts with smart phones to record cough/sound signals as the input data to identify suspect COVID-19 infections, with 92.8% accuracy reported. In [152], a shallow LSTM model combines medical information and local weather data to predict the risk level of the country.

The above diagnosis and test methods are compared in Table 6. As commented in various reviews [20, 132, 148, 218, 231], COVID-19 diagnosis and tests still suffer from various limitations and challenges. The issues include concerns about result quality, implementation scalability, actionability for determining isolation and quarantine strategies, and the trustworthiness required to accept medical findings as general clinical specifications. An increasing number of studies appear promising in incorporating advanced data science and AI techniques to complement medical and chemical test approaches and tools, to integratively enhance preanalytical and postanalytical test results, and to strengthen the interpretability and actionability of the results for clinicians, microbiological staff and public health authorities.

COVID-19 patient risk assessment identifies the risk factors and parameters associated with patient infections, disease severity, and recovery or fatality to support accurate and efficient prognosis, resource planning, treatment planning, and intensive care prediction. This is crucial for early interventions before patients progress to more severe illness stages. Moreover, risk and prognosis prediction for patients can help with effective health and medical resource allocation when intense monitoring (such as that involving ICU and ventilation) and more urgent medical interventions are needed and prioritized. Machine learning models and data-driven discovery can also play a vital role in such risk factor analysis and scoring, prediction, prioritization and planning of prognostic and hospitalization resources and facilities, treatment and discharge planning, and the influence and relation analysis between COVID-19 infection and disease conditions and the external environment and context (e.g., weather conditions and socioeconomic status).

Techniques including mathematical models and shallow and deep learners are applicable to health records, medical images, and external data. For example, LightGBM and Cox proportional-hazard (CoxPH) regression models incorporate quantitative lung-lesion features and clinical parameters (e.g., age, albumin, blood oxygen saturation, CRP) for prognosis prediction [268]; their results show that lesion features are the most significant contributors in clinical prognosis estimation. Supervised classifiers like XGBoost are applied on electronic health records to predict the survival and mortality rates of severe COVID-19 infectious patients [203, 259] for the detection, early intervention and potential reduction of mortality of high-risk patients. In [189], logistic regression and random forest are used to model CT radiomics on features extracted from pneumonia lesions to predict the hospital stay of COVID-19 patients, which can be treated as one of the prognostic indicators. Further, shallow and deep machine learning methods are applied to screen COVID-19 infections on respiratory data including lung ultrasound waves, coughing and breathing signals. For example, in [104], a bidirectional GRU network with attention differentiates COVID-19 infections from normal cases on face-based videos captured by RGB-infrared sensors with 83.69% accuracy. Lastly, external data can be involved for risk analysis; e.g., the work in [142] analyzes the association between weather conditions and COVID-19 confirmed cases and mortality.

Table 6. COVID-19 diagnosis and test methods.
Nucleic acid-based diagnosis [263]. Advantages: high sensitivity, suitable for large-scale operation. Limitations: preliminary assessment by technicians, professional data analysis, expensive, less accurate, false-negative or false-positive results. Data: nasal, nasopharyngeal or oropharyngeal swab, aspiration, saliva or wash specimens.
Serological diagnosis [132]. Advantages: easy and cheap to implement, no requirement of experts.
Medical imaging inspection [197]. Advantages: fast and automated detection, data-driven analysis. Limitations: need trained experts, costly in labeling and early detection, training data scarcity. Data: CT and CXR images.
Data-driven prediction [47]. Algorithmic prediction by data-driven analytics and learning on data relevant to the COVID-19 diagnosis. Data: any relevant data including clinical test results and genomic/protein sequences.

A rapidly growing body of research literature on COVID-19 medical image processing is available, which involves both shallow and deep learning methods, especially pretrained CNN-based image nets, in learning tasks such as feature extraction, region of interest (ROI) segmentation, infection region/object detection, and disease/symptom diagnosis and classification, etc. Typical COVID-19 medical imaging data includes CXR and CT images of the lung (lobes or segments), lesions, trachea and bronchus. The most commonly used DNNs are pretrained or customized CNN, GAN, VGG, Inception, Xception, ResNet, DenseNet and their variants [103, 209].

Further, CNN-based transfer learning models, deep transfer learning and GAN are applied on CXR images to detect COVID-19 pneumonia and its segmentation and severity [54, 112, 149] . On chest CT images, CNNs like ResNet, DenseNet and VGG16 and the inception transfer model are applied to classify COVID-19 infected patients and detect and localize COVID-19 pneumonia and infection regions [11, 184, 214, 244, 272] .

The application of DNNs in COVID-19 medical imaging analysis shows significant performance advantages. For example, several references report close-to-perfect prediction performance of pretrained DNNs on CXR images (e.g., achieving accuracy and F-score of 100% [136], AUC of 100% [188] and 99.97% [140], and accuracy and F-score of 98% [155]), in contrast to the lower performance of customized networks on CT images (e.g., with accuracy of 99.68% [90], AUC of 99.4% [74] and F-score of 94% [74, 214]). The highly promising medical imaging analysis results provide strong evidence and support for further case confirmation, medical treatment, hospitalization resource planning, and quarantine, etc. Table 5 illustrates various DNNs applied on medical imaging for COVID-19 screening and abnormal infection region segmentation, etc. For example, various CNNs such as shallow CNN, truncated InceptionNet, VGG19, MobileNet v2, Xception, ResNet18, ResNet50, SqueezeNet, DenseNet-121, COVIDX-Net with seven different architectures of deep CNN models, GoogleNet, AlexNet and capsule networks [2, 17, 61, 92, 152, 240] are applied to analyze CXR images for screening COVID-19 patients, assisting in their diagnosis, quarantine and treatments, and differentiating COVID-19 infections from normal, pneumonia-bacterial and pneumonia-viral infections.
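Since the reported results are stated in terms of accuracy, F-score and AUC, a minimal sketch of how these three measures are computed for a binary screening task may help when comparing them. The labels and classifier scores below are made up for illustration and do not come from any cited study; the function names are likewise hypothetical. The AUC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-score for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = COVID-19 positive image, 0 = other.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.35, 0.45, 0.6]
predictions = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(labels, predictions)
area = auc(labels, scores)
print(metrics, f"AUC = {area:.3f}")
```

Accuracy and F-score depend on the decision threshold (0.5 here), whereas AUC depends only on how the scores rank the cases, which is one reason the three figures need not move together across studies.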

The modeling of COVID-19 pathology and treatment aims to characterize virus origin and spread, infection sources, pathological findings, immune responses, and drug and vaccine development, etc. The formulation of molecular mechanisms and pathological characteristics underlying viral infection can inform the development of specific anti-coronavirus therapeutics and prophylactics, for example by disclosing the structures, functions and antigenicity of the SARS-CoV-2 spike glycoprotein [237]. The pathological findings pave the way to design vaccines against the coronavirus and its mutations. For example, the higher capacity of membrane fusion of SARS-CoV-2 compared with SARS-CoV is shown in [256], suggesting the fusion machinery of SARS-CoV-2 as an important target for developing coronavirus fusion inhibitors. Further, human angiotensin converting enzyme 2 (hACE2) may be the receptor for SARS-CoV-2 [169], informing drug and vaccine development for SARS-CoV-2. In [238], a structural framework for understanding coronavirus neutralization by human antibodies can help understand the human immune response upon coronavirus infection and the activation of coronavirus membrane fusion. The kinetics of immune responses to mild-to-moderate COVID-19 discloses clinical and virological features [220]. Data-driven analytics are applied in COVID-19 virology, pathogenesis, genomics and proteomics by collecting pathological testing results, gene sequences, protein sequences, physical and chemical properties of SARS-CoV-2, and drug information and its effects, together with the related domain knowledge. This plays an important role in discovering and exploring feasible drugs and treatments, drug repurposing, and correlating drugs with protein structures for COVID-19 drug selection and development. For example, a pre-trained MT-DTI (molecule transformer-drug target interaction) deep learning model based on the self-attention mechanism identifies commercially available antiviral drugs by finding useful information in drug-target interaction tasks [18]. A GAN-based drug discovery pipeline generates novel potential compounds targeting the SARS-CoV-2 main protease in [271]. In [270], 28 machine learning methods including generative autoencoders, generative adversarial networks, genetic algorithms, and language models generate molecular structures and representations on top of generative chemistry pipelines and optimize them with reinforcement learning to design novel drug-like inhibitors of SARS-CoV-2. Further, a multitask DNN screens candidate biological products [97]. In [145, 146], CNN-enabled CRISPR-based surveillance supports a rapid design of nucleic acid detection assays.

For genome and protein analysis, frequent sequential pattern mining identifies frequent patterns of nucleotide bases, predicts nucleotide base(s) from their previous ones, and identifies the genome sequence locations where nucleotide bases are changed [156]. In [7], a bidirectional RNN classifies and predicts the interactions between COVID-19 non-structural proteins and between the SARS-CoV-2 virus proteins and other human proteins with an accuracy of 97.76%.
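The three tasks named for sequential pattern mining (finding frequent base patterns, predicting the next base from preceding ones, and locating changed bases) can be sketched with simple counting. This is a simplified stand-in rather than the algorithm of [156]: it counts contiguous patterns of a fixed length, predicts the most frequent follower of a context, and compares two aligned sequences of equal length. The sequences are toy strings, not real SARS-CoV-2 genomes, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def frequent_patterns(sequence, k, min_count):
    """Contiguous patterns of length k that occur at least min_count times."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


def next_base_model(sequence, k):
    """For every context of k bases, count which base follows it."""
    followers = defaultdict(Counter)
    for i in range(len(sequence) - k):
        followers[sequence[i:i + k]][sequence[i + k]] += 1
    return followers


def predict_next_base(model, context):
    """Most frequent base observed after the context, or None if unseen."""
    if context not in model:
        return None
    return model[context].most_common(1)[0][0]


def changed_positions(reference, variant):
    """Zero-based positions where two aligned, equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(reference, variant)) if a != b]


# Toy sequences for illustration only.
reference = "ATGATGATGCCC"
variant = "ATGATGTTGCCC"

patterns = frequent_patterns(reference, k=3, min_count=2)
model = next_base_model(reference, k=2)
print(patterns)
print(predict_next_base(model, "AT"), predict_next_base(model, "TG"))
print(changed_positions(reference, variant))
```

Real genome comparisons would first require sequence alignment to handle insertions and deletions, which this sketch deliberately leaves out.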

Classic and deep machine learning methods such as SVM and XGBoost classifiers, sequence analysis, multi-task learning, deep RNNs, reinforcement learning such as deep Q-learning networks, and NLP models are applied to SARS-CoV-2 therapy discovery, drug discovery, and vaccine discovery [111]. Examples are the rule-based filtering and selection of COVID-19 molecular mechanisms and targets; virtual screening of protein-based repurposed drug combinations; identifying the links between human proteins and SARS-CoV-2 proteins; developing new broad-spectrum antivirals and molecular docking; identifying functional RNA structural elements; discovering vaccines, such as predicting potential epitopes for SARS-CoV-2 and vaccine peptides by LSTM and RNNs; and analyzing protein interactions and molecular reactions by neural NLP models such as Transformer variants. Table 7 briefly illustrates the applications of modeling in supporting COVID-19 treatments and drug and vaccine development.

Discussion. Most of the literature on COVID-19 medical and biomedical analytics directly applies existing mathematical models and shallow and pretrained deep models. There are gaps and opportunities in incorporating COVID-19-specific characteristics and domain knowledge into tailored modeling, in training deep neural networks on COVID-19 data that is usually small and of limited quality, and in involving multimodal COVID-19 data to discover more informative medical and biomedical insights.

Table 7. Modeling in support of COVID-19 treatments and drug and vaccine development.
Treatment: data-driven diagnosis-informed treatment, e.g., pathological analysis, medical imaging analysis, immune reaction, genomic and proteomic analysis [103, 209, 268]. Data: pathological, clinical, virological, genomic, proteomic data.
Drug development: correlating drugs with protein structures and molecule transformer for drug-target interactions [18], DNNs like GANs and multitask DNNs for drug discovery [97, 271], machine learning and language models to generate molecular structures and drug-like inhibitors [270]. Data: virological, genomic, proteomic data.
Vaccine development: sequence analysis and sequential modeling like LSTM and RNN variants and NLP models like Transformer variants for functional RNA structures, vaccine epitopes and peptides, protein interactions and molecular reactions [111]. Data: genomic and proteomic data.

COVID-19 has had an unprecedented and overwhelming influence and impact on all aspects of our life, society and economy, posing significant health, economic, environmental and social challenges to the entire world and human population [40]. Over 3k of the 22k references in the literature involve the topic of influence and impact modeling. In this section, we review and summarize the modeling and analysis methods and results on many broad areas affected by SARS-CoV-2 and COVID-19. These include the modeling of the effect of COVID-19-sensitive NPIs and of the healthcare, psychological, economic and social influence and impact of COVID-19.

On the one hand, pharmaceutical measures and drug and vaccine development play fundamental roles [274]. On the other, to control the outbreaks of COVID-19 and its further influence on various aspects of life, governments adopt various NPIs such as travel restrictions, border control, business and school shutdown, public and private gathering restrictions, mask-wearing, and social distancing. For example, travel bans and lockdowns are issued to decrease cross-border population movement; social distancing and shutdowns minimize contacts and community spread; and school closures and teleworking reduce indoor gatherings and workplace infections. Although these control measures flatten the curve, they also undoubtedly change the regular mobility and activities of the population, normal business and economic operations, and the usual practices of our daily businesses.

A critical modeling issue is to characterize, estimate and predict how such NPIs influence COVID-19 epidemic dynamics, infection spread, case development, population structure (including the deceased), medical resource and treatment allocation, and human, economic and business activities. Accordingly, various modeling tasks involve epidemiological, statistical and social science modeling methods and their hybridization (typically stochastic compartmental models) to evaluate and estimate the effects, typically by aligning the NPIs with case numbers for correlation and dependency modeling. Below, we summarize a few aspects of NPI influence.

Modeling the effect of NPIs on COVID-19 epidemic dynamics. This typically models the correlations between COVID-19 cases and NPIs, the NPI influence on COVID-19 epidemic factors including transmission rate and case numbers, and the NPI influence on improving recovery rates and lowering death rates. Various SIR and statistical modeling variants evaluate the effects of such control measures and their combinations on containing the virus spread and controlling infection transmission (e.g., per transmission rate) and estimate the corresponding scenarios (distributions) of case number development [24, 63, 221]. For example, in [177], a generalized SEIR model includes the self-protection and quarantine measures to interpret the publicly released case numbers and forecast their trend in China. In [221], an SEIR model estimates infection case numbers before and after the control measures, including city lockdowns and travel bans, implemented in the first 50 days in Wuhan, to describe their effect on controlling the outbreak across China.

Often, various NPIs are jointly implemented to contain a COVID-19 epidemic. It is reasonable to expect that multiple NPIs cooperatively reduce the effective reproduction number of the epidemic [24, 77, 121, 185]. In [24], a temporal Bayesian hierarchical model incorporates auxiliary variables describing the temporal implementation of NPIs. It infers the effectiveness of implementing NPIs such as staying-at-home, business closures, shutting down educational institutions and limiting gathering sizes, both individually (an estimated 13% to 42% reduction of the reproduction number) and jointly (a 77% reduction of the reproduction number). In [77], a hierarchical Bayesian model infers the impact and effectiveness of NPIs (including case isolation, school closure, mass gathering ban and social distancing) on the infections, reproduction number $R_0$, effect sizes of population, and death tolls in 11 European countries and suggests continued interventions to keep the epidemic under control.
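How individual NPI effects might combine into a joint reduction of the reproduction number can be illustrated under one simple and explicitly assumed form: each active NPI multiplies the reproduction number by a fixed factor, i.e., $R_t = R_0 \exp(-\sum_k \alpha_k I_k(t))$ with $I_k(t) \in \{0, 1\}$ indicating whether NPI $k$ is active. This independence assumption is ours for illustration and is not claimed to be the exact form used in [24] or [77]. The individual reductions and the value of $R_0$ below are hypothetical.

```python
import math


def alpha_from_reduction(reduction):
    """Convert a fractional reduction (e.g. 0.13 for 13%) into a log-scale effect."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return -math.log(1.0 - reduction)


def effective_reproduction_number(r0, alphas, active):
    """R_t = R_0 * exp(-sum of the effects of the NPIs that are active)."""
    return r0 * math.exp(-sum(a for a, on in zip(alphas, active) if on))


def combined_reduction(alphas):
    """Joint fractional reduction when all listed NPIs act together."""
    return 1.0 - math.exp(-sum(alphas))


# Hypothetical individual reductions for four NPIs (assumed values).
individual_reductions = [0.13, 0.20, 0.30, 0.42]
alphas = [alpha_from_reduction(r) for r in individual_reductions]

joint = combined_reduction(alphas)
r_t = effective_reproduction_number(3.0, alphas, [True, True, True, True])
print(f"joint reduction: {joint:.1%}, R_t with all NPIs active: {r_t:.2f}")
```

Under this assumed form the joint effect is always smaller than the sum of the individual reductions, and no interaction between NPIs is represented; models that allow NPIs to be coupled would need additional terms.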

Modeling NPI influence on public resources including healthcare systems. The implementation of NPIs affects the demand, priority and effectiveness of anti-pandemic public health resources and the planning and operations of healthcare systems. For example, in [73], an SEIR model and a polynomial regressor simulate the effect of early detection, isolation, treatment, adequate medical supplies, hospitalization and therapeutic strategy on COVID-19 transmission, in addition to estimating the reproductive number and confirmed case dynamics. The SIDARTHE model [81] simulates possible scenarios and the necessity of implementing countermeasures such as lockdowns and social distancing together with population-wide testing and contact tracing to rapidly control the pandemic. The SHARUCD model [5] predicts the COVID-19 transmission response (in terms of infection cases, growth rate and reproduction number) to the control measures including partial lockdown, social distancing and home quarantining and differentiates asymptomatic and mild-symptomatic from severe infections, which could inform the prioritization of healthcare supplies and resources.

Modeling NPI influence on human activities. This explores the relations between COVID-19 NPIs and human mobility, travel, and social and online activities. For example, in [118], the alignment between human mobility and case number development in Wuhan and China shows the effect of travel restrictions on case reduction and COVID-19 spread. In [109], a simple SEIR model analyzes contact tracing in UK social network data, estimates scenarios of COVID-19 infection control and the resulting untraced cases and infections, and shows the efficacy of close contact tracing in identifying secondary infections. In [80], MCMC parameter estimation and a metacommunity Susceptible-Exposed-Infected-Recovered (SEIR)-like disease transmission model show the need for planning emergency containment measures such as restrictions on human mobility and interactions to control COVID-19 outbreaks (by 42% to 49% transmission reduction). In [83], mobile phone data is collected and analyzed to inform COVID-19 epidemiologically relevant behaviors and responses to interventions. Weitz et al. [250] develop and analyze an epidemiological intervention model that leverages serological tests to identify and deploy recovered individuals as focal points for sustaining safer interactions by interaction substitution, developing the so-called 'shield immunity' at the population scale. It is shown that changes in contact patterns could dramatically decrease the probability of infection and reduce the transmission rate of COVID-19 [75, 123, 267].

Discussion. The many diverse applications of SIR-based modeling of COVID-19 intervention and policy effects enable an epidemiological explanation. Such methods typically assume that each NPI acts independently on case movement. This leaves open issues including characterizing the effectiveness of individual NPIs when they are coupled with each other and cooperatively contribute to flattening the curves; and exploring the interactions between NPIs, case development, and external factors including people's behaviors and environmental factors without disentangling them (in contrast to DNN-based decoupled, homogeneous and independent representations and learning).

A common concern is the influence of COVID-19 on individual and public psychological and mental health [258]. Typical tasks are to characterize, classify and predict social-media-based individual and public emotion and sentiment and their mental health. These may be sensitive to the COVID-19 outbreak, health and medical mitigation, NPI measures, government governance, public healthcare system performance, vaccines, resurgence and coronavirus mutations, and the 'new normal' including working from home and online education. The data involved are from social media and networks such as Twitter, Facebook, WeChat, Weibo, YouTube, Instagram and Reddit; online news feeds, discussion boards, blogs and Q/A; and instant messaging such as mobile messaging and apps.

Negative sentiments [158, 245] , opinion and topic trends, online hate speech [233] , psychological stress, men's and women's worries [230] , responsive emotions [95] and behaviors and events [14] can be characterized, clustered or classified on short and long texts by simply applying NLP processing techniques. Examples are extracting TF-IDF and part-of-speech features, shallow NLP and text analysis models including BOW and latent Dirichlet allocation (LDA), and neural text modelers including DNN variants such as BioBERT, SciBERT and Transformer variants on the word, sentence or corpus level. For example, in [258] , the preferred reporting items for systematic reviews and meta-analyses guidelines are used to review the COVID-19 impact on public mental health, disclosing the extent of symptoms and risk factors associated with anxiety (6.33% to 50.9%), depression (14.6% to 48.3%), posttraumatic stress disorder (7% to 53.8%), psychological distress (34.43% to 38%) and stress (8.1% to 81.9%) in the surveyed population of 8 countries.
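The TF-IDF features mentioned above can be sketched in a few lines. The example below is a minimal pure-Python version, assuming whitespace tokenization, term frequency normalized by document length, and the unsmoothed inverse document frequency $\log(N / \mathrm{df})$; the toy posts are hypothetical and not drawn from any cited dataset.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical short posts for illustration.
posts = [
    "lockdown anxiety rising",
    "lockdown vaccine hope",
    "lockdown school closure worry",
]
weights = tf_idf(posts)
print(weights[0])
```

A term that occurs in every post (here "lockdown") receives zero weight, while terms specific to one post are weighted up, which is why such features are a common input to the clustering and classification models listed above.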

Discussion. The existing modeling of COVID-sensitive psychological influence often misses psychological knowledge because it is purely driven by data; the analytical results are based on a cohort of infected people owing to the anonymous nature of the data; no work is reported on fusing various sources of data including online misinformation to infer the predominant drivers of specific mental stress such as vaccination hesitation; and targeted analysis of specific mental issues in vulnerable groups, such as COVID-driven teenage suicide and racism, is lacking.

The COVID-19 pandemic has incurred overwhelming and devastating impact on regional and global economy and business activities including trade, tourism, education exchange, logistics, supply chain, workforce and employment. It seems that no economy on the interconnected globe is immune from the negative consequences of COVID-19 [52] . A critical modeling task is to quantify how COVID-19 influences various aspects of the economy and businesses, how to manage and balance COVID-19 control measures (including NPIs and vaccination rollouts) and government relief and recovery programs, and how to sustain and recover business and economic activities without seriously suffering from uncontrollable outbreaks and resurgences for better sustainability in the COVID new normal.

Modeling the COVID-19 impact on economic growth. A rapidly growing body of research investigates the heterogeneous, non-linear and uncertain macroeconomic effects of COVID-19 across regions and sectors in individual countries, as well as on a global scale. It is estimated that COVID-19 and SARS-CoV-2 may cause over 2% monthly GDP loss and a 50% to 70% decline in tourism [40]. In [181], a sectoral macroeconomic model analyzes the short-term effects of intervention measures such as lockdown, social distancing and business reopening on economic outcomes such as production network, supply and demand, inventory dynamics, unemployment and consumption and estimates their influence on the relations between reproduction number and GDP. The study in [236] illustrates the relations between a country's income levels, public healthcare availability and capacity and the COVID-19 infected patient's demography and social patterns in low- to middle-income countries.

Modeling the COVID-19 impact on workforce and sustainability. COVID-19 drives the new normal of working, including a hybrid work mode, cloud-based enterprise operations, the shift from centralized infrastructures (including IT) to cloud-based ICT and home-based workplaces, and new ways of ensuring sustainability including engaging and supporting clients through online operations and services and AI-enabled cost-effective planning, production, logistics and services. In [15] , the descriptive statistics of the daily activities of Baidu developers show the positive and negative impacts of working from home on developer productivity, particularly on large and collaborative projects. The survey conducted in [154] shows the various impacts of COVID-19-driven work from home on the scientific workforce, including the time spent on work, parenting distraction, and impact on laboratory-based projects. The analysis in [129] in Australia shows the impact of government welfare support responses to COVID-19-infected people and businesses on mitigating potential unemployment, poverty and income inequality and the sustainability of such support measures.

Discussion. The existing modeling objectives, tasks and methods are highly preliminary, specific and limited. Expectations include macro-, meso- and micro-level modeling of economic impact by involving their economic-financial variables and activities, contrastive analysis with similar historical events and periods, and data-driven discovery of insights for a sustainable tradeoff between mitigation and economic growth in the new normal, to name a few.

The COVID-19 pandemic has had significant impact on public health, welfare, social, political and cultural systems, including restricting human activities, affecting people's well-being, causing an overwhelming burden on public health systems, reshaping sociopolitical systems, and disturbing social regularity such as incurring online information disorder. This section reviews the relevant modeling work on such social impacts.

Modeling the COVID-19 influence on human behaviors. In addition to the COVID-19-sensitive NPI influence on human activities as discussed in Section 7.1, SARS-CoV-2 and COVID-19 have fundamentally reshaped people's social activities and habits. For example, Baidu-based daily transportation behaviors and simple statistics were collected which show high-level mobility patterns such as visiting venues, origins, destinations, distances, and transport time during the COVID-19 epidemic in China [100]. In [83], large-scale mobile phone data such as call detail records, GPS locations, Bluetooth data and contact tracing apps are collected and analyzed by off-the-shelf tools to extract statistical metrics and patterns of behaviors, mobility and interactions. The results may inform population behaviors, individual contacts, movement paths and mobility patterns, and networking, in addition to evaluating the effectiveness of NPIs and informing COVID-19 responses such as contact tracing. In [95], social media data from Sina Weibo, the Baidu search engine, and 29 Ali ecommerce marketplaces were collected and analyzed using keyword-based linguistic inquiries and statistics such as word frequencies and Spearman's rank correlation coefficient analysis. Keywords are extracted to show people's behavioral responses to COVID-19 outbreaks, public awareness and attention to COVID-19 protection measures, concerns about misinformation and rumors about ineffective treatments, and the correlation between risk perception and negative emotions.
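For readers unfamiliar with the rank correlation used in such keyword analyses, the sketch below computes Spearman's coefficient as the Pearson correlation of average ranks, which handles ties. It is a generic implementation; the input series are hypothetical and unrelated to the data in [95].

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical daily keyword frequency versus a hypothetical emotion score.
keyword_freq = [1, 4, 9, 16, 25]
emotion_score = [0.1, 0.2, 0.4, 0.5, 0.9]
print(spearman(keyword_freq, emotion_score))
```

Because only ranks enter the computation, any monotone relation between the two series yields a coefficient of 1 or -1, regardless of whether the relation is linear.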

Modeling the COVID-19 influence on public health systems. The sudden COVID-19 endemic or pandemic and its mysterious resurgence has resulted in the imperative, nonscheduled and overwhelming rationing demand on healthcare and medical professionals, public health and medical resources and supplies including oxygen, hospital beds and facilities, ICU facilities and ventilators, medical waste processing equipment, hygiene protection equipment such as medical masks and sanitization chemicals, and intervention materials and devices. How to plan, prioritize, ration and manage these resources, assess their supply/demand and effects to prioritized hotspots and regions and optimize their reorganization per local and global needs and population-wide well-being are some challenging issues to model and optimize. In [69] , recommendations are made to allocate medical resources to both COVID-19 and non-COVID-19 patients to maximize benefits, prioritize health workers, avoid a first-come, first-served approach, in a way that is evidence-based and involves science and research.

Modeling the COVID-19 influence on sociopolitical systems. The COVID-19 influence on social and political systems is unprecedented. This influence extends to the confidence and trust in existing sociopolitical systems such as public and moral values, national interests, social welfare systems, human services, political relations, globalization, scientific exchange and collaborations, science-driven epidemic mitigation policies and strategies, and the impact on social governance and disaster management. For example, in [29], an identity fusion theory-based online sampling and a moral foundations theory-based computer simulation show the correlations between nationalism, religiosity, and anti-immigrant sentiment from a socio-cognitive perspective during the COVID-19 pandemic in Europe. The surveys undertaken in [119] show that the scientific uncertainty of COVID-19-oriented modeling and findings affects public and political trust in science-based policy making in the US and suggest more careful science communication. The work in [210] evaluates the impact of COVID-19 on globalization and global health, in particular on mobility, trade, travel, event management, food and agriculture; a pandemic vulnerability index quantitatively measures the potential impact on global health and the countries most impacted.

Modeling misinformation and disorder in the COVID-19 infodemic. The COVID-19 infodemic has been accompanied by a large volume of misinformation (partially or entirely inaccurate or misleading information), biased (polarized), questionable or unverified information, rumor and propaganda. Such information is harmful for correctly understanding, recognizing, intervening in, and preventing the COVID-19 pandemic. Its diffusion is usually fast, its spread is often wide, and its impact is typically devastating. Modeling the COVID-19 misinformation and information disorder involves tasks such as detecting, ranking and classifying misinformation, undertaking fact checks and cross-references, tracing its sources and transmission paths, discovering its diffusion and propagation networks and paths, and estimating its effects on the COVID-19 epidemic spread and control. For example, in [53], skip-gram is used to represent the words collected from Twitter, Instagram, YouTube, Reddit and Gab; the converted vector representations are then clustered by partitioning them around medoids and cosine distance-based similarity analysis to extract the topics of concern. An SIR model is then applied to estimate the basic reproduction number of the social media-based COVID-19 infodemic. A comparative analysis then estimates and compares the platform-dependent interaction patterns, information spread (w.r.t. reproduction rate), questionable and reliable information sourcing and differentiation, and rumor amplification across the above platforms. In [248], SVM classifies credible information and misinformation from Twitter texts, and a correlation analysis shows that the predominant credible information on wearing masks and social distancing can lead the corresponding misinformation with a time lag. In [4], bivariate (ANOVA) and multivariate logistic regression identify similar belief profiles of political orientation, religious commitment, and trust in science in survey-based narratives and compare the profiles of those who are disinformed or conspiratorial with scientific narratives. Further, the statistics on Weibo tweets show the COVID-19 misinformation evolution related to topics and events such as city lockdowns, cures, preventive measures, school reopening and foreign countries, the bias involving cures and preventive methods, and sentiment evolution such as fear of specific topics [126]. The work in [147] applies SVM, logistic regression and BERT to classify COVID-19 misinformation and counter-misinformation tweets, characterizes the type, spread and textual properties of counter-misinformation, and extracts the user characteristics of the citizens involved.
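One simple way to obtain a basic reproduction number from an SIR view of information spread, as in the infodemic analysis described above, is to estimate the early exponential growth rate $r$ of the number of posts and use the textbook SIR relation $R_0 = 1 + r/\gamma$, where $1/\gamma$ is the mean time a post stays "infectious". The sketch below follows this route; the fitting procedure actually used in [53] is not described here, so the approach, the synthetic counts and the value of `gamma` are all assumptions.

```python
import math

def growth_rate(counts):
    """Least-squares slope of log(counts) against the time index."""
    n = len(counts)
    times = range(n)
    logs = [math.log(c) for c in counts]
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

def basic_reproduction_number(counts, gamma):
    """Early-phase SIR relation: r = beta - gamma, hence R0 = beta / gamma = 1 + r / gamma."""
    return 1.0 + growth_rate(counts) / gamma

# Synthetic post counts growing at rate 0.2 per day (illustrative only).
counts = [math.exp(0.2 * t) for t in range(10)]
r0 = basic_reproduction_number(counts, gamma=0.1)
print(round(r0, 3))
```

The relation holds only while the susceptible pool is barely depleted, so it should be applied to the early growth phase of a topic rather than to its full lifetime.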

Discussion. Typical research on COVID-19 influence and impact modeling only involves local and regional COVID-19 data and their affected objects, hence the resultant conclusions are limited in their applicability to general practice and broad pandemic control. More robust results are expected to inform medical and public health policy-making on medication, business and society. Few to no results are available on how NPIs influence the threshold and effects of COVID-19 vaccinations and herd immunity and on how to balance NPIs and economic and social revivification. It is difficult to find actionable evidence and guidelines on what policies should be implemented and what tradeoff is appropriate in balancing the containment of COVID-19 outbreaks and resurgence with economic and social business recovery.

[75, 80, 83, 109, 118, 123, 250, 267] Case data, NPIs, human activities (incl. mobile phone data and mobility), etc.

| Aspect | Objective | Methods | References | Data |
|---|---|---|---|---|
| Psychological influence | on individual mental health | Psychology, systematic reviews, classic and neural NLP models, e.g., BOW, LDA, SciBERT, Transformer variants, etc. | [14, 158, 230, 245, 258] | Identity, social media data, news feeds, Q/A, surveys, instant messaging, behaviors, NPIs, etc. |
| Psychological influence | on public mental health | Psychology, systematic reviews, classic and neural NLP models, e.g., BOW, LDA, SciBERT, Transformer variants, etc. | [95, 158, 233, 245, 258] | Social media data, news feeds, Q/A, surveys, instant messaging, public emotion, activities and events, NPIs, etc. |
| Psychological influence | on mental health | Psychology, systematic reviews and meta-analyses, classic and neural NLP models, statistics, etc. | [258] | Social media data, questionnaires, instant messaging, behavior and events, NPIs, etc. |
| Economic impact | on economic growth | Time series analysis, descriptive analytics, macroeconomic models, relational models, etc. | [40, 181, 236] | Economic data, case data, NPIs, etc. |
| Economic impact | on workforce and sustainability | Descriptive analytics, time-series analysis, relational models, etc. | [15, 129, 154] | Work and sustainability-related data, performance, employment, surveys, social welfare data, etc. |
| Social impact | on human behaviors | Descriptive analytics, pattern analysis, social media/network analysis, NLP models, etc. | [83, 95, 100] | Public, online and household activities, gathering, mobility data, mobile phone data, social media data, etc. |
| Social impact | on public health systems | Descriptive analytics, relational models, etc. | [69] | Public health and medical data, public hygiene data, case data, etc. |
| Social impact | on misinformation | Classifiers, classic and neural NLP models, social media/network analysis, sentiment/topic modeling, time-series analysis, outlier detection, etc. | [4, 53, 126, 147, 248] | Social media data, news feeds, Q/A, cross/fact-check, etc. |
| Social impact | on sociopolitical systems | Descriptive analytics, sociopolitical methods, survey analysis, etc. | [29, 119, 210] | Social and political data, case data, surveys, questionnaires, sociopolitical events, etc. |

Despite being a small focus (over 1.5k of the 22k publications on modeling), simulation is an essential means to understand, imitate, replicate and test the working mechanisms, the epidemic transmission processes, the evolution and mutation of COVID-19 and its virus SARS-CoV-2, the interactions and self-organization between factors, the effect of mitigation measures and various interior and contextual factors, and resource planning and optimization such as healthcare resource allocation. Typical simulation methods include dynamic systems, state-space modeling, discrete event simulation, agent-based modeling, reinforcement learning, Monte-Carlo simulation, and hybrid simulation [58] . Below, we summarize the relevant work on simulating the COVID-19 epidemic evolution and the effect of interventions and policies on COVID-19 epidemic development.

Simulating the COVID-19 epidemic evolution. One important but unclear question is how COVID-19 evolves over time in the community. What-if analyses can be applied to estimate infection case numbers and their evolution under various hypotheses [275]. Typical methods include SIR variants and statistical and mathematical models, e.g., introducing control measure-sensitive variables into such models to estimate their effects on infections, reproduction number, transmission rate, and outbreak control after implementing or relaxing certain interventions. For example, in [78], composite Monte Carlo simulation conducts the what-if analysis of future COVID-19 epidemic development possibilities on top of the estimation made by a polynomial neural network on COVID-19 cases, then fuzzy rule induction outputs decision rules to inform epidemic growth and control. In [82], an agent-based simulation system simulates COVID-19 patients' demographic, mobility and infectious disease state (susceptible, exposed, seriously-infected, critically-infected, recovered, immune and dead) information and the dynamic interactions between agents (i.e., people in epidemiology) in certain environments (home, public transport stations, and other places of interest), and evaluates the effect of adjusting individual and social distancing (separation) on epidemics (e.g., numbers in each state).
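A what-if analysis by Monte Carlo simulation can be sketched as follows: an uncertain daily growth rate is sampled repeatedly, cases are projected forward, and the spread of outcomes is summarized by percentiles. This is a deliberately simplified illustration; it does not reproduce the composite Monte Carlo method, the polynomial neural network or the fuzzy rule induction of [78], and the growth-rate distributions are hypothetical.

```python
import random

def what_if_projection(current_cases, mean_growth, sd_growth, days,
                       runs=2000, seed=0):
    """Project case counts under a normally distributed daily growth rate."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(runs):
        cases = float(current_cases)
        for _ in range(days):
            growth = rng.gauss(mean_growth, sd_growth)
            cases *= max(0.0, 1.0 + growth)  # case counts cannot become negative
        outcomes.append(cases)
    outcomes.sort()
    return {
        "p05": outcomes[int(0.05 * runs)],
        "median": outcomes[runs // 2],
        "p95": outcomes[int(0.95 * runs)],
    }

# Two hypothetical scenarios: interventions relaxed versus kept strict.
relaxed = what_if_projection(1000, mean_growth=0.05, sd_growth=0.02, days=14)
strict = what_if_projection(1000, mean_growth=0.01, sd_growth=0.02, days=14)
print(relaxed["median"], strict["median"])
```

Reporting an interval rather than a single trajectory makes the uncertainty of each scenario explicit, which is the main reason to prefer simulation over a point forecast for such comparisons.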

Simulating the policy effects on the COVID-19 epidemic. Another important task is to simulate how interventions, interior and external factors, and other policies and control measures of interest influence the dynamics of the COVID-19 epidemic. For example, a discrete-time and stochastic agent-based simulation system (Australian Census-based Epidemic Model) [44] incorporates 24 million software agents, where each agent mimics an Australian individual in terms of their demographics, occupation, immunity and susceptibility to COVID-19, contact rates in their social contexts, interactions, commuting and mobility patterns, and other aspects, which are informed by census data from the Australian government. The system evaluates various scenarios by adjusting the level of restrictions on case isolation, home quarantine, international air travel, social distancing and school closures and their effects on COVID-19 pandemic consequences in terms of the reproductive number, the generation period, the growth rate of cumulative cases, and the infection rate for children. The simulation provides evidence to help the Government understand how COVID-19 is transmitted and what policies should be implemented to control COVID-19 in Australia. In [141] , an SIR model is extended by adding variables reflecting symptomatic infections and the quarantine of susceptibles, which then estimates the case development distribution as subexponential after implementing the quarantine. In [262] , an attributed heterogeneous information network incorporates the representations of external information about the COVID-19 disease features, the population's demographic features, mobility and public perception of sentiment into a GAN model, which then assesses the hierarchical community-level risks of COVID-19 to inform interventions and minimize disruptions.
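The agent-based systems above are large and data-driven, but their core loop can be illustrated with a toy model. In the sketch below, each agent is susceptible (S), infectious (I) or recovered (R), infectious agents meet randomly chosen agents each day, and a `distancing` level reduces the number of daily contacts. All parameters are hypothetical, and the model omits the demographics, households, mobility and census calibration that systems such as [44] and [82] rely on.

```python
import random

def run_abm(n_agents=500, contacts_per_day=8, p_transmit=0.05,
            days_infectious=7, days=60, distancing=0.0, n_seed=5, seed=1):
    """Toy stochastic agent-based S-I-R simulation; returns final state counts."""
    rng = random.Random(seed)
    state = ["S"] * n_agents
    timer = [0] * n_agents
    for k in range(n_seed):
        state[k], timer[k] = "I", days_infectious
    # Distancing in [0, 1] removes a share of each agent's daily contacts.
    contacts = round(contacts_per_day * (1.0 - distancing))
    for _day in range(days):
        infectious = [a for a in range(n_agents) if state[a] == "I"]
        newly_infected = set()
        for a in infectious:
            for _contact in range(contacts):
                other = rng.randrange(n_agents)
                if state[other] == "S" and rng.random() < p_transmit:
                    newly_infected.add(other)
        # Infectious agents recover once their infectious period has elapsed.
        for a in infectious:
            timer[a] -= 1
            if timer[a] == 0:
                state[a] = "R"
        for a in newly_infected:
            state[a], timer[a] = "I", days_infectious
    return {label: state.count(label) for label in "SIR"}

no_policy = run_abm(distancing=0.0)
full_isolation = run_abm(distancing=1.0)
print(no_policy, full_isolation)
```

Scenario analysis then amounts to rerunning the simulation over a grid of policy settings (and several random seeds) and comparing the resulting outcome distributions.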

Discussion. Although we mention many aspects and questions that could be (better) addressed by simulation, very limited research is available in this direction. In addition to the above two aspects closely relevant to COVID-19 epidemic dynamics, other important topics include simulating the mutation and resurgence of the coronavirus and COVID-19 in communities with different social, ethnic and economic conditions; the influence of individual and compound COVID-19-sensitive policies on social, economic and psychological aspects; and the tradeoff between the strength and width of mitigation strategies and their impact on society and the economy.

Hybrid COVID-19 modeling can be categorized into the following families: (1) multi-objective modeling: to address multiple problems and multiple business and modeling objectives at the same time, such as jointly understanding COVID-19 epidemic dynamics and the corresponding effective NPI policies; (2) multi-task modeling: to handle multiple modeling tasks, e.g., simultaneously forecasting daily confirmed, death and recovered case numbers; (3) multisource (multimodal etc.) COVID-19 data modeling: to involve multiple sources of internal and external data for modeling, e.g., supplementing environmental and demographic data with case numbers and complementing case numbers with medical imaging and social mobility data; (4) hybrid methods for COVID-19 modeling: typically by sequentializing (i.e., multi-phase) or parallelizing multiple tasks or methods from different disciplines and areas, e.g., integrating statistical methods, shallow or deep learning methods, and evolutionary computing methods into compartmental models; and (5) hybrid modeling with multi-methods from various disciplines on multisource COVID-19 data for multi-objective or multi-task modeling.

COVID-19 multi-objective modeling is commonly seen in COVID-19 modeling, as shown in Sections 4-8, where multiple business problems and learning objectives are involved in one research or case study. Examples are forecasting COVID-19 transmission and its sensitivity to external factors such as the patients' age groups, hygiene habits and environmental factors; modeling the influence of NPIs and people's ethnic conditions on case movements; modeling the influence of NPIs on both case trends and public psychological health; and survival/mortality rate estimation and the influence analysis of dependent factors such as the patients' health conditions. Typical methods include multivariate analysis, probabilistic compartmental models, simulation systems, multi-objective evolutionary learning methods, and DNN variants. For example, in [183], a regression model estimates the relations between reproduction number and environmental factors and human movements. In [63], Bayesian inference of an SIR model infers the effect of various interventions on new infections. In [174], an SEIR model captures the relations between case trends and epidemic conditions, socioeconomic effect, and interventions. In [258], systematic reviews and meta-analyses review the work on the relations between COVID-19 symptom severity, risk factors and public emotions.

COVID-19 multisource data modeling serves various purposes such as predicting COVID-19 epidemic spread and transmission, medical diagnosis and treatment, and government and community interventions by combining data from respective modalities, sources or views. Examples of multisource data are combining COVID-19 case numbers with NPI data; people's demographics, health conditions, mobility, social and business activities, social networking and media information; health and medical records, diagnosis information, treatments, pharmaceutical interventions, and pathological tests; social and public activities and events, economic data, and sociopolitical data; and online, social media and mobile apps-based messaging, news, Q/A, and discussion groups. Typical methods include data fusion-based learning, mixed representations-based learning, clustering and classification on mixed data types, DNN variants, etc. [107]. For example, a novel variational-LSTM autoencoder model in [101] predicts the coronavirus spread in various countries by integrating historical confirmed case numbers with urban factors (about location, urban population, population density, and fertility rate) and governmental measures and responses (school, workplace and public transport closures, public events cancellation, contact tracing, public information campaigns, international travel controls, fiscal measures, and investment in health care and vaccines). In [142], COVID-19 case numbers and weather data are combined to analyze the correlation between COVID-19 confirmed cases and mortality and weather factors. NLP methods can extract and analyze related news, which is then input to LSTM networks to update the infection rate in a susceptible-infected epidemic model [273]; this combination is shown to outperform both the plain susceptible-infected epidemic model and its combination with LSTM alone. In [216], coupling LSTM with an epidemic model forecasts COVID-19 spread on case data, population density and mobility.
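The coupling of an external data source with an epidemic model can be sketched by letting a second signal drive the infection rate of a susceptible-infected (SI) model. In the sketch below, a hypothetical `news_adjusted_rate` function stands in for the learned news-to-rate mapping; it is a hand-written placeholder, not an LSTM and not the method of [273], and the signal values are invented for illustration.

```python
def simulate_si(population, infected0, days, infection_rate):
    """Discrete-time SI model; infection_rate is a callable day -> beta_t."""
    infected = float(infected0)
    path = [infected]
    for day in range(days):
        beta_t = infection_rate(day)
        infected += beta_t * infected * (population - infected) / population
        infected = min(infected, float(population))
        path.append(infected)
    return path

def news_adjusted_rate(base_rate, news_signal):
    """Hypothetical stand-in for a learned model: a news signal in [0, 1]
    (e.g., intensity of reported control measures) scales the rate down."""
    def rate(day):
        signal = news_signal[min(day, len(news_signal) - 1)]
        return base_rate * (1.0 - 0.5 * signal)
    return rate

# Constant infection rate versus a rate updated by an invented news signal.
fixed = simulate_si(1_000_000, 10, 30, lambda day: 0.3)
adjusted = simulate_si(1_000_000, 10, 30,
                       news_adjusted_rate(0.3, [0.0, 0.2, 0.5, 1.0]))
print(fixed[-1], adjusted[-1])
```

Keeping the epidemic model and the rate function separate means the placeholder can be swapped for a trained sequence model without touching the compartmental part.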

COVID-19 hybrid methods integrate various methods for single or multiple-objective/task/source learning. In addition to ensemble learning by integrating the results from multiple learners such as ensemble trees and XGBoost, often multiple methods are sequentially involved to learn specific tasks or data over phases; other common tasks are to integrate compartmental models with other methods such as statistical models, classifiers and DNNs for the improved forecasting of COVID-19 epidemic dynamics and attributes. For example, in [266] , a hybrid model predicts the infected and death cases by integrating a genetic algorithm to optimize infection rates and integrating LSTM for parameter optimization into a modified susceptible-infected-quarantined-recovered (SIQR) epidemic model. In [41] , a regression tree combined with wavelet transform predicts COVID-19 outbreak and assesses its risk on case numbers. In [16] , a baseline method generates a granular ranking (discrimination) of severe respiratory infection or sepsis on the medical records of the general population, then a decision-tree-based gradient boosting model adjusts the former predicted results in subpopulations by aligning it with the published aggregate fatality rates. In addition to the aforementioned methods, other methods and tasks e.g. for innovative pandemic responses are available in the literature. Examples are automated primary care tools to alleviate the shortage of healthcare workers [219] , expert systems and chatbots for symptom detection and lessening the mental health burden [150] , IoT and smart connecting tools to prevent outbreaks, remotely monitoring patients, and prompting enforcement of guidelines and administrative orders to contain future outbreaks [88] .
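To illustrate how an evolutionary optimizer can be combined with a compartmental model, the sketch below fits the infection rate of a simple discrete-time SIR model to an observed infection curve using an elitist, mutation-only evolutionary search. This is a simplification: it uses neither crossover (so it is not a full genetic algorithm) nor the SIQR model and LSTM component of [266], and the synthetic "observed" data and all settings are assumptions.

```python
import random

def sir_infected_curve(beta, gamma=0.1, population=1000.0, infected0=1.0, days=40):
    """Daily number of infectious individuals in a discrete-time SIR model."""
    s, i = population - infected0, infected0
    curve = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        curve.append(i)
    return curve

def loss(beta, observed):
    """Sum of squared errors between simulated and observed infection curves."""
    simulated = sir_infected_curve(beta, days=len(observed))
    return sum((a - b) ** 2 for a, b in zip(simulated, observed))

def evolve_beta(observed, generations=40, pop_size=20, sigma=0.05, seed=0):
    """Elitist evolutionary search over beta; returns (best, initial_loss, final_loss)."""
    rng = random.Random(seed)
    population = [rng.uniform(0.05, 1.0) for _ in range(pop_size)]
    initial_loss = min(loss(b, observed) for b in population)
    for _ in range(generations):
        ranked = sorted(population, key=lambda b: loss(b, observed))
        parents = ranked[: pop_size // 2]
        # Children are Gaussian mutations of the parents, clipped to [0, 1].
        children = [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in parents]
        population = parents + children  # parents survive unchanged (elitism)
    best = min(population, key=lambda b: loss(b, observed))
    return best, initial_loss, loss(best, observed)

observed = sir_infected_curve(0.3)  # synthetic data generated with beta = 0.3
best_beta, initial_loss, final_loss = evolve_beta(observed)
print(best_beta, initial_loss, final_loss)
```

Because the best candidates are carried over unchanged, the fitting error can never increase from one generation to the next, which makes the search easy to monitor.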

Discussion. Though various methods of hybridization have been summarized in this section, the relevant research is not systematic, comprehensive, or substantial. This observation applies to hybrid data, hybrid tasks, and hybrid methods. The complex characteristics and challenges of both the virus and disease and of modeling their problems and data, as discussed in Sections 2.1, 2.2 and 2.3, are substantial. Though overwhelming efforts have been made in modeling COVID-19, the above complexities require significant novel developments through synergizing problems, data, and modeling techniques.

In the above review of each category of COVID-19 modeling techniques, a brief discussion has been provided on the main limitations, gaps and opportunities in those areas. Here, we expand this specific discussion to broad major gaps in the research on modeling COVID-19. Further, we discuss various open issues and opportunities for future research.

Two major aspects of modeling gaps include: the gaps in understanding the virus and disease nature, and the gaps in modeling their complexities.

10.1.1 Gaps in understanding the problem nature. Since the virus is new and unique, we have limited knowledge of all aspects of the SARS-CoV-2 virus and COVID-19, such as virus characteristics, epidemiological attributes and dynamics, socioeconomic influence, and virus mutations. Specifically, our poor understanding of the intrinsic and intricate pathological, biomedical and epidemiological attributes of the evolving SARS-CoV-2 and COVID-19 systems limits the modeling attempts and contributions. As a result, our understanding of the virus and disease is still insufficient, lacking substantial knowledge and comprehensive evidence on the system complexities; it is biased to specific data, conditions or settings; it is shallow, without deep insights into the virus and disease nature; and it is partial, without a full picture of the SARS-CoV-2 and COVID-19 complexities, in understanding the COVID complex systems and their data complexities [35, 247].

To address these issues, the modeling has to start with building a comprehensive understanding of the virus nature and the fundamental complexities of the COVID-19 complex systems. Of the many questions to explore, we highlight the following important unknowns, which require cross-disciplinary scientific explorations by integrating medical science, virology, bio-medicine, and data-driven discovery.

• The hidden nature of SARS-CoV-2 and COVID-19

10.1.2 Gaps in modeling the system complexities. The modeling gaps come from both a poor understanding of the virus and disease nature and the limitations in modeling their characteristics and complexities. On one hand, even though massive efforts have been made in modeling COVID-19, the existing modeling work is still in its early stage. The weaknesses and limitations of the existing work lie in

• an average description of the population-wise coronavirus and the disease's epidemiological characteristics and observations after applying mitigation and control measures, with no fine-grained and microlevel analysis and findings available;
• a direct application of existing (even very simple and classic) modeling methods without COVID-specific and optimal modeling mechanisms, typically by applying overparameterized or independently pretrained deep neural models or complex statistical and compartment models on low-quality and often small COVID data;
• simple data-driven modeling purely motivated by applying advanced models (typically deep models) on COVID-19 data without a deep incorporation of domain and external knowledge and factors; and
• a purposeful design without a comprehensive design or exploration of the multi-faceted COVID-19 characteristics and challenges in one framework or system.

On the other hand, existing methods applied in a general way are often unsuited to, or incapable of, tackling the complexities of the virus and disease. Table 9 compares the major modeling methods and their pros and cons in modeling COVID-19. Consequently, it is common that the existing models and their modeling results

• often only reflect a specific population or cohort-based average estimation or hypothesis of epidemic transmission, losing applicability to individual cases or scenarios and making it difficult to undertake personalized treatment;

• are too specific to expand to other countries and scenarios, and are hard to reproduce and transfer to other regions without (significant) changes, making them unsuitable for broad applications;

• over- or under-fit the given data and hypothesis settings, are difficult to validate in a fine-grained way, and have weak robustness or generalization for a general but deep understanding of the problems; and

• lack the ability and capacity to disclose the intrinsic nature and general insights about the SARS-CoV-2 virus, COVID-19 disease, and their interventions.
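One common way to probe the over- or under-fitting noted above is rolling-origin (walk-forward) evaluation, in which a forecaster is repeatedly refit on an expanding window and scored on the next unseen horizon. The sketch below is illustrative only: the series is synthetic, and `naive_forecast` and `rolling_origin_mae` are hypothetical names standing in for any COVID-19 case forecaster and its evaluation routine.

```python
from typing import Callable, List, Sequence


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical baseline forecaster: repeat the last observed value."""
    return [history[-1]] * horizon


def rolling_origin_mae(
    series: Sequence[float],
    forecaster: Callable[[Sequence[float], int], List[float]],
    min_train: int,
    horizon: int,
) -> float:
    """Mean absolute error pooled over all expanding-window splits."""
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]                      # data seen so far
        actual = series[origin:origin + horizon]     # held-out future values
        predicted = forecaster(train, horizon)
        errors.extend(abs(a - p) for a, p in zip(actual, predicted))
    if not errors:
        raise ValueError("series too short for the requested split")
    return sum(errors) / len(errors)


# Synthetic daily counts, for illustration only (not real data).
cases = [10, 12, 15, 19, 24, 30, 37, 45, 54, 64]
print(rolling_origin_mae(cases, naive_forecast, min_train=5, horizon=2))
```

Comparing this out-of-sample error across regions or time periods, rather than reporting a single in-sample fit, gives a rough indication of how well a model transfers beyond the data it was tuned on.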

There are enormous opportunities and future directions in modeling COVID-19, including (1) fundamentally characterizing the system complexities, (2) addressing the aforementioned limitations of existing work, and (3) exploring new directions and alternatives. These are particularly valid for AI, data science and machine learning, which play a dominating role in the data-driven COVID-19 modeling. 

Table 9: Major modeling methods and their pros and cons in modeling COVID-19.

| Method | Pros | Cons |
| --- | --- | --- |
| [Time-series analysis] | Temporal representations and interaction modelings of periodic and aperiodic components, relations and trends of COVID-19 cases at different states (e.g., new, susceptible, infected, recovered and death) and external temporal factors | Weak modeling power involving other rich factors (e.g., demographics and clinical attributes) and complex data characteristics (e.g., nonstationarity) and discovering the insight of COVID-19 driving forces and interventions |
| General machine learning | Multifaceted factor and relation analysis, outlier detection, profiling, classification, prediction and impact analysis for disease diagnosis and case detection on small and poor-quality COVID-19 data, etc. | Poor modeling of weak but complex interactions, couplings, high-dimensional dependencies, heterogeneity, nonstationarity and other data challenges in multisource COVID-19 data |
| [Statistical modeling] | Modeling distributional dynamics, uncertainty and dependency with analytical explanation and parameter settings | Requires informative prior knowledge, high modeling and computation complexity on poor-quality COVID data |
| Epidemiological modeling | Built on epidemic knowledge, straightforward but domain-friendly and explainable hypothesis test, strong characterization of infection processes, state transitions, and parameter selection | Captures complex epidemic transmission characteristics, factors, causal relations and processes in COVID-19 developments; hypothesis of homogeneous disease transmissions |
| Deep learning | Performs well with large and complex COVID-19 data (e.g., medical imaging)-based case and disease prediction and identification with annotated samples; pretrained model easily adaptable to new tasks | Requires annotated ground-truth of COVID-19 learning targets, easy to overfit small COVID-19 data, vulnerable results, poor interpretability, high computational cost |
| [Simulation modeling] | Imitates and replicates complex COVID-19 mechanisms and processes, cost-effective, reproducible and risk-averse, manually controllable for purposeful test and optimization | Proper knowledge and hypotheses about COVID-19 transmission and factor interactions, high experimental complexity, inactionable for evolving and random real-life scenarios |
| Hybrid methods | Flexible and powerful in selecting and combining small COVID-19 multisource data and relevant multimethods on demand for combined COVID-19 learning tasks and data | Understands constituents for their best ensemble to address specific COVID-19 challenges with appropriate design complexity, less flexible in combination optimization and explainability |

10.2.1 Characterizing the system complexities. To discover the mysteries of the COVID virus and disease, the most important opportunities come from understanding their nature and system characteristics and complexities, as discussed in Sections 2.1 and 10.1.1. Combining the domain-driven and data-driven thinking and techniques, there are various directions in characterizing the problem nature and system complexities:

• extracting, representing and distinguishing observable and latent factors and metrics to describe the epidemiological, biological (genomic), medical (clinical and pathological) and social attributes, liveliness and dynamic processes of the virus, virus mutations, the disease and its variants from other similar viruses and diseases;

• identifying and characterizing external entities and factors (e.g., drugs, vaccines, ethnicity, environment) and how they interact with the virus and disease and influence their evolution;

• characterizing and simulating the diversified (e.g., explicit vs. implicit, global vs. local, domain-specific vs. general) interactions and relations between the above-extracted explicit and implicit internal and external factors and their dynamics;

• quantifying and simulating the virus and disease's system dynamics and genetic mechanisms (e.g., self-organization, genomic expression, genetic crossover and mutation, interaction and adaptation with external environment) in terms of temporal, dependent variables and major transformations;

• simulating and quantifying the virus parasitism, interactions, adaptation and evolution with human, animal and living hosts at a large scale.

To address the modeling gaps in Section 10.1.2 and those rarely and poorly explored areas and challenges in Section 2.2, we here highlight the following major directions.

Rarely to poorly addressed areas. First, opportunities to focus on the areas rarely or poorly addressed in the existing COVID-19 modeling include: (1) characterizing which NPIs are effective against the variants of the SARS-CoV-2 virus and comparing them with those effective against the original strains; (2) quantifying the effects of COVID-19 vaccines, pharmaceutical interventions and NPIs on infection control, mobility, mental health, society and the economy, e.g., the effect of the vaccinated population percentage on herd immunity, and the effect of variable close-contact interactions and individual actions on epidemic de-escalation; (3) balancing the NPI strength and the socioeconomic recovery, e.g., modeling the effect of full vs. partial business close-downs and border control on virus confinement at different stages and for different sectors, and characterizing the effect of increasing daily commuting and workforce movement vs. working-from-home and telecommuting on the virus confinement; (4) capturing the temporally evolving interplay and interactions between virus propagation and external interventions; and (5) modeling target problems by systemically coupling relevant multisource data and multiple modeling techniques, e.g., by involving pathogen-related, societal, environmental and racial factors and the disparities between developing and developed countries, age groups, and races.
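As a point of reference for the herd immunity example in item (2), the textbook relation between vaccination coverage and the herd immunity threshold can be sketched as follows. This is a simplification that assumes homogeneous mixing and a vaccine that fully protects a fraction `efficacy` of recipients; the function name and the example values of the basic reproduction number are illustrative assumptions, not estimates from the reviewed studies.

```python
def critical_vaccination_coverage(r0: float, efficacy: float) -> float:
    """Fraction of the population to vaccinate so that R_eff drops below 1.

    Textbook relation under homogeneous mixing: p_c = (1 - 1/R0) / efficacy.
    A return value above 1.0 means vaccination alone cannot reach the threshold.
    """
    if r0 <= 0:
        raise ValueError("r0 must be positive")
    if not 0 < efficacy <= 1:
        raise ValueError("efficacy must be in (0, 1]")
    if r0 <= 1:
        return 0.0  # the epidemic already declines without vaccination
    return (1.0 - 1.0 / r0) / efficacy


if __name__ == "__main__":
    # Illustrative R0 values only.
    for r0 in (2.0, 3.0, 5.0):
        print(r0, round(critical_vaccination_coverage(r0, efficacy=0.9), 3))
```

Heterogeneous contact patterns, waning immunity and variant-specific efficacy all break the assumptions behind this closed form, which is why the text calls for dedicated modeling of these effects.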

Hybrid modeling. Second, the hybridization of relevant data and techniques offers significant opportunities to improve and expand the existing modeling capacity and results. Examples include integrating (1) coarse-grained and fine-grained modeling, e.g., epidemic modeling by SIR variants to inform further specific NPI's effect analysis; (2) static and dynamic modeling, e.g., from population-based static epidemic modeling to specific NPI-varying and time-varying case forecasting; (3) observable and hidden factors and relations, e.g., multisource-based attributed modeling with deep abstraction and representation of interactions between the multisource factors; (4) local-to-micro-level and global-to-macro-level factors, e.g., involving patient clinical and demographic records with their environmental and socioeconomic context in survival and mortality prediction and medical resource planning; and (5) domain, data and models for domain-specific, interpretable, evidence-based and actionable findings. These typically involve compound modeling objectives, multisource data, and multi-method ensembles.

Enhanced COVID-19 modeling. Third, another set of new opportunities is to undertake sequential or multi-phase modeling, such as (1) from coarse-grained to fine-grained modeling: e.g., applying epidemic models like SIR and SEIR on COVID-19 in the initial stage and then modeling the impact of NPIs and of the mobility and behaviour change of a population on epidemic dynamics; (2) from static to dynamic modeling: e.g., testing constant epidemic parameters and then time-varying settings such as NPI-sensitive varying parameters; and (3) from core to contextual factors: e.g., modeling epidemic processes on case data and then involving pathogen-related, societal and environmental (like temperature and humidity) variables to model their influence on epidemic movements.
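The static-to-dynamic step in item (2) can be illustrated with a minimal SIR sketch in which the transmission rate is first held constant and then made NPI-sensitive. Everything numeric here is an assumption for illustration (the rates, the day the hypothetical NPI starts, and the 60% reduction), and forward-Euler integration is used only to keep the example dependency-free.

```python
from typing import Callable, List, Tuple


def simulate_sir(
    beta_fn: Callable[[float], float],
    gamma: float,
    s0: float,
    i0: float,
    days: int,
    dt: float = 0.1,
) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of the SIR model on population fractions.

    dS/dt = -beta(t) S I,  dI/dt = beta(t) S I - gamma I,  dR/dt = gamma I
    Returns one (S, I, R) tuple per day, including day 0.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    steps_per_day = int(round(1.0 / dt))
    trajectory = [(s, i, r)]
    for day in range(days):
        for k in range(steps_per_day):
            t = day + k * dt
            new_inf = beta_fn(t) * s * i * dt   # S -> I flow in this step
            new_rec = gamma * i * dt            # I -> R flow in this step
            s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        trajectory.append((s, i, r))
    return trajectory


def constant_beta(t: float) -> float:
    return 0.3  # static setting (illustrative value)


def npi_beta(t: float) -> float:
    # dynamic setting: a hypothetical NPI from day 20 cuts transmission by 60%
    return 0.3 if t < 20 else 0.12


baseline = simulate_sir(constant_beta, gamma=0.1, s0=0.999, i0=0.001, days=200)
with_npi = simulate_sir(npi_beta, gamma=0.1, s0=0.999, i0=0.001, days=200)
peak_baseline = max(i for _, i, _ in baseline)
peak_with_npi = max(i for _, i, _ in with_npi)
print(f"peak infected fraction: static={peak_baseline:.3f}, NPI-varying={peak_with_npi:.3f}")
```

Passing the transmission rate as a function of time keeps the same integrator usable for both phases; a further step toward item (3) would be to let `beta_fn` depend on contextual covariates such as mobility or temperature.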

Lastly, alternative opportunities exist by (1) developing COVID-19-specific modeling methods, benchmarks and evaluation measures to address the virus and disease's challenges and their data challenges for an intrinsic interpretation of the virus and disease nature and dynamics; (2) trans-disciplinarily integrating the relevant domain knowledge and hypotheses from biomedical science, pathology, epidemiology, statistics and computing science to address multifaceted challenges in the virus, disease, data and modeling and to form a comprehensive understanding of the virus and disease; (3) defining multifaceted modeling objectives and tasks to directly address comprehensive epidemiological, clinical, social, economic or political concerns and their challenges in one framework; and (4) pursuing ethical and explainable COVID-19 modeling with privacy-preserving and distributed heterogeneous information integration, augmentation, representation and learning by utilizing personal computing devices (e.g., smart phones) and cloud analytics.

In addition to many specific perspectives, such as hybridizing modeling objectives, data and methods in Section 10.2.2 and addressing the shortcomings in Section 10.1.2, we here highlight some other opportunities that may particularly benefit (from) AI, data science and machine learning advances.

Quantifying the virus nature and complexities. An imperative yet challenging task for the AI, data science and machine learning communities is to 'quantify' the nature and complexities of the virus and disease and address the fundamental questions on the virus nature and complexities raised in Section 10.1.1. Building on multi-disciplinary knowledge such as epidemiology, genetic computing and theories of complex systems, large-scale agent-based epidemic simulation systems are needed to test and improve genetic, clinical and epidemiological hypotheses and knowledge about the virus and to characterize the virus's genetic evolution mechanisms.

Data-driven discovery of COVID mysteries. Increasingly comprehensive sources of COVID-19 data are available publicly and through private providers. Data-driven discovery on these COVID-19 data can substantially leverage other domain-specific research on COVID to disclose the mysteries of COVID.

• COVID data genomics: forming the data genomics of COVID for a person, country, community or task by automatically extracting and fusing all possibly relevant data, e.g., contacts, personal health, mobility, clinical reports, exposure to infected people, and household and public activities in a privacy-preserving manner.

• COVID data augmentation: developing new techniques to address the various data quality issues embedded in the data, as discussed in Section 2.2, and novel augmented analytics and learning methods to directly learn from poor-quality COVID data.

• All-purpose representation of COVID attributes: learning the representations on all relevant COVID data that can be used to describe the full profile of COVID and support diverse learning objectives and tasks in an ethical and privacy-preserving manner.

• Automated COVID screening and diagnosis: developing techniques and systems to automatically detect, screen, predict and alert potential infection of the virus and disease on the COVID data genomics.

• Virus detection and interaction modeling: developing personal IoT assistants and sensors to detect the virus, trace its movement and its origin and visualize the 'COVID net' showing its propagation paths, interactions and networking with other viruses and hosts.

• COVID knowledge graph: generating a knowledge graph showing the ontology about the virus; ontological connections between concepts on the virus; relations between knowledge on the virus and its protection, intervention, treatment and influence; and important highlights such as new knowledge discovered and misinformation detected.

• COVID safety and risk management: developing systems and tools (including mobile apps) for personal and organizational daily management of their COVID safety and risk, e.g., COVID-safe physical and emotional health management, mobility planning, risk estimation and alerting, infection tests, immunity estimation, and compliance management.

• Metasynthetic COVID decision-support systems: developing evidence-based decision support systems to fuse real-time and relevant big data, simulate and replay the outbreaks, estimate NPI effects, discover evidence from data and modeling, engage domain experts in the modeling and optimization processes, generate recommendations for decision-making, and support the data-driven analytics and management of severe disasters and emergencies.

The COVID-19 pandemic's short-to-long-term influence and impact on public health (both physical and mental health), human daily life, global society, economy and politics is unprecedented, lasting and evolving, yet still to be quantified and verified. This review paints a comprehensive picture of the field of COVID-19 modeling. The multidisciplinary methods, including mathematical modeling, AI, data science and deep learning on COVID-19 data, have deepened our understanding of the complexities and nature of the SARS-CoV-2 virus and its COVID-19 disease, and have contributed to characterizing their propagation, evaluating the effect of preventive and control measures and assisting in their design, detecting COVID-19 infections, predicting next outbreaks, and estimating the COVID-19 influence and impact on psychological, economic and social aspects. The review also highlights the important demands and significant gaps in deeply and systemically characterizing COVID-19-related problems and complexities, and in developing effective, interpretable and actionable models to characterize, measure, imitate, evaluate and predict broad-based challenges and problems and to proactively and effectively intervene in them. Such COVID-19 modeling research poses many significant challenges and opportunities to the multidisciplinary modeling communities in the next decade. These include not only immediately gaining intrinsic knowledge and proactive insight about the evolving coronavirus and its disease outbreak, infection, transmission, influence and intervention, but also preparing to tackle future global health, financial, economic, security-related and other black-swan events and disasters.

Srinivasan Venkatramanan, and Anil Vullikanti. 2020. Data-driven modeling for different stages of pandemic response

Covid-caps: A capsule network-based framework for identification of covid-19 cases from x-ray images

Molecular diagnostic technologies for COVID-19: Limitations and challenges

Misinformation about COVID-19: evidence for differential latent profiles and a strong association with trust in science

Modelling COVID 19 in the Basque Country from introduction to control measure response

Sharaf Jameel Malebary, and Omar Mohammed Barukab. 2020. The number of confirmed cases of covid-19 by using machine learning: Methods and challenges

A Novel Protein Mapping Method for Predicting the Protein Interactions in COVID-19 Disease by Deep Learning

Age differential analysis of COVID-19 second wave in Europe reveals highest incidence among young adults

Forecasting the spread of COVID-19 in Kuwait using compartmental and logistic regression models

Stochastic epidemic models and their statistical analysis

Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks

Preparing for a future COVID-19 wave: insights and limitations from a data-driven evaluation of non-pharmaceutical interventions in Germany

Seasonality and period-doubling bifurcations in an epidemic model

Narjes Nikzad-Khasmakhi, and Shervin Minaee. 2020. Covid-Transformer: Detecting COVID-19 Trending Topics on Twitter Using Universal Sentence Encoder

How does Working from Home Affect Developer Productivity? -A Case Study of Baidu During

Doron Netzer, Ran Balicer, and Noa Dagan. 2020. Developing a COVID-19 mortality risk prediction model when individual-level data are not available

Deep learning for screening covid-19 using chest x-ray images

Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target interaction deep learning model

Understanding COVID-19 transmission through Bayesian probabilistic modeling and GIS-based Voronoi approach: a policy perspective

Challenges and Controversies to Testing for COVID-19

The Europe second wave of COVID-19 infection and the Italy "strange" situation

Development of a prognostic model for mortality in COVID-19 infection using machine learning

Mathematical Models in Epidemiology

Inferring the effectiveness of government interventions against COVID-19

Detection of COVID-19 Infection from Routine Blood Exams with Machine Learning: A Feasibility Study

Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data

Analysis of meteorological conditions and prediction of epidemic trend of 2019-nCoV infection in 2020

Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: A living systematic review and meta-analysis

Modelling Threat Causation for Religiosity and Nationalism in Europe

Estimating the Extent of True Asymptomatic COVID-19 and Its Potential for Community Transmission: Systematic Review and Meta-Analysis

Second wave COVID-19 pandemics in Europe: a temporal playbook

Second wave COVID-19 pandemics in Europe: A temporal playbook

A novel spatio-temporal interpolation algorithm and its application to the COVID-19 pandemic

Combined mining: Analyzing object and pattern relations for discovering and constructing complex yet actionable patterns

Metasynthetic Computing and Engineering of Complex Systems

Data Science Thinking: The Next Scientific, Technological and Economic Revolution

What does COVID-19 modeling tell us about the pandemic?

How have global scientists responded to modelling COVID-19?

Matrix population models

COVID-19 outbreak: Migration, effects on society, global environment and prevention

Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: A data-driven analysis

A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: A study of a family cluster

Coronavirus disease (COVID-19) detection in Chest X-Ray images using majority voting based classifier ensemble

Modelling transmission and control of the COVID-19 pandemic in Australia

COVID-19 Clinical Diagnostics and Testing Technology

Fang Tian, and Xuejun Zhu. 2020. Roles of meteorological conditions in COVID-19 transmission on a worldwide scale

2020. A survey on applications of artificial intelligence in fighting against covid-19

Risk factors of fatal outcome in hospitalized subjects with coronavirus disease 2019 from a nationwide analysis in China

A time-dependent SIR model for COVID-19 with undetectable infected persons

COVID-19 government response event dataset (CoronaNet v. 1.0)

Modeling strict age-targeted mitigation strategies for COVID-19

Economic consequences of Covid-19: A counterfactual multi-country analysis

Fabiana Zollo, and Antonio Scala. 2020. The COVID-19 Social Media Infodemic

Predicting covid-19 pneumonia severity on chest x-ray with deep learning

Covid-19 image data collection: Prospective predictions are the future

Shifting patterns of seasonal influenza epidemics

Modeling the early evolution of the COVID-19 in Brazil: Results from a Susceptible-Infectious-Quarantined-Recovered (SIQR) model

How simulation modelling can help reduce the impact of COVID-19

Forecasting Brazilian and American COVID-19 cases based on artificial intelligence coupled with climatic exogenous variables

Neural Network aided quarantine control model estimation of COVID spread in Wuhan

Truncated inception net: COVID-19 outbreak screening using chest X-rays. Physical and engineering sciences in medicine

Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions

Supply and demand shocks in the COVID-19 pandemic: An industry and occupation perspective

Ajay Kaarthic Jeysree, Irfan Ahmad Khan, and Eklas Hossaine. 2021. Forecasting of COVID-19 cases using deep learning models: Is it reliable and practically significant?

A population-based cohort study of socio-demographic risk factors for COVID-19 deaths in Sweden

The ivory tower lost: How college students respond differently than the general public to the COVID-19 pandemic

Infection forecasts powered by big data

Fair Allocation of Scarce Medical Resources in the Time of Covid-19

Seyed-Mohsen Miresmaeili, and Elham Bahreini. 2020. A comprehensive review of COVID-19 characteristics

Modeling the dynamics of the COVID-19 population in Australia: A probabilistic analysis

Decreased case fatality rate of COVID-19 in the second wave: a study in 53 countries or regions

Transmission dynamics of the COVID-19 outbreak and effectiveness of government interventions: A data-driven analysis

A Novel Approach of CT Images Feature Analysis and Prediction to Screen for Corona Virus Disease (COVID-19)

Quantifying population contact patterns in the United States during the COVID-19 pandemic

Molecular Diagnosis of COVID-19: Challenges and Research Needs

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction

Pandemic potential of a strain of influenza A (H1N1): early findings. Science

Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures

Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy

INFEKTA: An agent-based model for transmission of infectious diseases: The COVID-19 case in Bogota, Colombia

The use of mobile phone data to inform analysis of COVID-19 pandemic epidemiology

COVID-19: A global and continental overview of the second wave and its (relatively) attenuated case fatality ratio

Making sense of mutation: what D614G means for the COVID-19 pandemic remains unclear

Public health actions to control new SARS-CoV-2 variants

Prediction of COVID-19 pandemic measuring criteria using support vector machine, prophet and linear regression models in Indian scenario

Future smart connected communities to fight covid-19 outbreak

Poonam Chaudhary, and Saibal Pal. 2020. SEIR and Regression Model based COVID-19 outbreak predictions in India

Classification of Covid-19 Coronavirus, Pneumonia and Healthy Lungs in CT Scans Using Q-Deformed Entropy and Deep Learning Features

Macroscopic patterns of interacting contagions are indistinguishable from social reinforcement

Covidx-net: A framework of deep learning classifiers to diagnose covid-19 in x-ray images

Wrong but useful-what covid-19 epidemiologic models can and cannot tell us

Estimation of time-varying reproduction numbers underlying epidemiological processes: A new statistical tool for the COVID-19 pandemic

Assessment of public attention, risk perception, emotional and behavioural responses to the COVID-19 outbreak: social media surveillance in China. medRxiv

Characteristics of SARS-CoV-2 and COVID-19

Prediction of potential commercially inhibitors against SARS-CoV-2 by multi-task deep model

Forecasting and evaluating multiple interventions for COVID-19 worldwide

Artificial intelligence forecasting of covid-19 in china

Understanding the Impact of the COVID-19 Pandemic on Transportation-related Behaviors with Human Mobility Data

Variational-LSTM autoencoder to forecast the spread of coronavirus across the globe

AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app

A Review on Deep Learning Techniques for the Diagnosis of Novel Coronavirus (COVID-19)

Detection of Respiratory Infections Using RGB-Infrared Sensors on Portable Device

Clinical characteristics of coronavirus disease 2019 in China

Machine learning applications for COVID-19: A state-of-the-art review

Diagnosis of Coronavirus Disease 2019 (COVID-19) With Structured Latent Multi-View Representation Learning

Exploring the growth of COVID-19 cases using exponential modelling across 42 countries and predicting signs of early containment using machine learning

Efficacy of contact tracing for the containment of the 2019 novel coronavirus (COVID-19)

A contribution to the mathematical theory of epidemics

Artificial Intelligence for COVID-19 Drug Discovery and Vaccine Development

Detection of coronavirus (covid-19) associated pneumonia based on generative adversarial networks and a fine-tuned deep transfer learning model using chest x-ray dataset

CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images

Machine learning based approaches for detecting COVID-19 using clinical text data

Incorporating media data into a model of infectious disease transmission

Hi-COVIDNet: Deep Learning Approach to Predict Inbound COVID-19 Patients and Case Study in South Korea

2020. Magnitude, demographics and dynamics of the effect of the first wave of the COVID-19 pandemic on all-cause mortality in 21 industrialized countries

Open COVID-19 Data Working Group

The effect of human mobility and control measures on the COVID-19 epidemic in China

Model uncertainty, political contestation, and public trust in science: Evidence from the COVID-19 pandemic

Asymptomatic patients as a source of COVID-19 infections: A systematic review and meta-analysis

Effect of non-pharmaceutical interventions to contain COVID-19 in China

Leveraging Data Science to Combat COVID-19: A Comprehensive Review

Evolving social contact patterns during the COVID-19 crisis in Luxembourg

The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application

Serological Approaches for COVID-19: Epidemiologic Perspective on Surveillance and Control

Misinformation During the COVID-19 Outbreak in China: Cultural, Social and Political Entanglements

First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment

Xiaojin He, and Yunxia Cao. 2020. Asymptomatic and Presymptomatic Infectors: Hidden Sources of Coronavirus Disease 2019 (COVID-19)

The Impact of COVID-19 and Policy Responses on Australian Income Distribution and Poverty

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)

Analyzing Covid-19 on online social media: Trends, sentiments and emotions

Diagnostic accuracy of serological tests for covid-19: systematic review and meta-analysis

Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. 2017. A survey on deep learning in medical image analysis

Impact of meteorological factors on the COVID-19 transmission: A multi-city study in China

Measurability of the epidemic reproduction number in data-driven contact networks

Within the Lack of Chest COVID-19 X-ray Dataset: A Novel Detection Model Based on

The end of social confinement and COVID-19 re-emergence risk

Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding

Potential neutralizing antibodies discovered for novel corona virus using machine learning

A Critic Evaluation of Methods for COVID-19 Automatic Detection from X-Ray Images

Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China

Association between weather data and COVID-19 pandemic predicting mortality rate: Machine learning approaches

Data-driven Analytical Models of COVID-2019 for Epidemic Prediction, Clinical Diagnosis, Policy Effectiveness and Contact Tracing: A Survey

Using artificial intelligence to improve COVID-19 rapid diagnostic test result interpretation

CRISPR-based surveillance for COVID-19 using genomically-comprehensive machine learning design

Designing viral diagnostics with model-based optimization

The Role of the Crowd in Countering Misinformation: A Case Study of the COVID-19 Infodemic

Rethinking Covid-19 Test Sensitivity -A Strategy for Containment

Deep-covid: Predicting covid-19 from chest x-ray images using deep transfer learning

Chatbots in the fight against the COVID-19 pandemic

A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of COVID-19

Shallow convolutional neural network for COVID-19 outbreak screening using chest X-rays

Statistical analysis and visualization of the potential cases of pandemic coronavirus

Quantifying the Immediate Effects of the COVID-19 Pandemic on Scientists

This work is partially sponsored by the Australian Research Council Discovery grant DP190101079 and the ARC Future Fellowship grant FT190100734. We thank Wenfeng Hou, Siyuan Ren, Yawen Zheng, Qinfeng Wang and Yang Yang for their assistance in the literature collection. More information about COVID-19 modeling is available at https://datasciences.org/covid19-modeling/.