key: cord-0994274-r68spezb
authors: Chen, Ray-Ming
title: Analysing deaths and confirmed cases of COVID-19 pandemic by analytical approaches
date: 2022-03-21
journal: Eur Phys J Spec Top
DOI: 10.1140/epjs/s11734-022-00535-4
sha: 36e80240c184d62a105db650e45e5ac6e5f606ac
doc_id: 994274
cord_uid: r68spezb

In this work, the time series of growth rates of confirmed cases and deaths of COVID-19 for several sampled countries are investigated by introducing an orthonormal basis. This basis, which serves as the feature benchmark, reveals the hidden features of COVID-19 via the magnitudes of Fourier coefficients. These coefficients are ranked in the form of ranking vectors for all the sampled countries. Based on these and the Manhattan metric, we then perform spectral clustering to categorise the countries. Unlike the classical cosine similarity analysis, which is, relatively speaking, a composite index that makes it hard to identify the features of the categorised countries, spectral analysis delves into the internal structures or dynamical trends of the time series. This research shows that no single feature dominates the trend of the growth rates. It also reveals that the results from the spectral analysis differ from those of cosine similarity. Finally, some approximated values of the confirmed cases and deaths are also calculated by the spectral analysis.

At the time of writing, there are over 180 million confirmed cases and around 4 million deaths of COVID-19 [1]. The variants of COVID-19 are still in full swing and spreading across the continents [2, 3]. The spread of this pandemic has been studied by many researchers [4-7]. There are many ways to look into the behaviour of the viruses or the pandemic itself [8, 9], including the efficacy of travel bans or lockdowns [10]. More and more papers now focus on the efficacy and effectiveness of vaccines against all sorts of variants of SARS-CoV-2 [11, 12]. Some studies even shed light on resolving the pandemic via herd immunity [13] or map out the dynamical trajectory of the viruses [14, 15]. This body of research employs plenty of qualitative and quantitative methods, in particular statistical methods: regression-related models [16]; Mann-Whitney U tests, Mann-Kendall tests, Spearman's rho, etc. [17, 18]; factorial design [19]; artificial intelligence (AI) based methods [20, 21]; and some deep learning techniques in COVID-19 diagnosis [22, 23]. Coupled with AI, automatic reasoning for searching the hidden features of the trend of COVID-19 is also vital [24]. Some studies have even established the relations between cases and deaths of COVID-19 from demographic, economic and social perspectives [25].

An orthonormal basis [26] $B_N$ (precisely $B_{36}$), which is motivated by Fourier analysis [27], is employed. The underlying frequencies of the data are taken into consideration. The COVID-19 database [28], which recorded the weekly COVID-19 cases and deaths from Week 15 to Week 51 (37 weeks in total), is utilised. By filtering out some non-essential data (countries), we obtain 43 countries as the research targets. By calculating the 36 growth rates (from Week 15 to Week 50) of the cases and deaths for each sampled country, we obtain a national case vector and a death vector. We transform these two vectors into sets of coefficients via inner products with $B_{36}$, then rank the coefficients by their face values to form ranking vectors for each country. The ranks indicate the strength of the relation between the growth rates and the underlying frequencies: a larger coefficient is assigned a larger rank. Then the Manhattan metric [29] is applied to measure the distances between all the ranking vectors, yielding a Manhattan distance matrix. Based on this matrix, we associate each country with its closer and further neighbours via the minimal pairing and maximal pairing. Then we re-analyse the collected growth rates of confirmed cases and deaths with another typical approach: cosine similarity. Finally, an approximation method based on spectral analysis is devised to further predict the development of confirmed cases and deaths.

e-mail: raymingchen@bsuc.cn (corresponding author)

In sum, the current work shows that:

• the patterned evolutionary correlation between countries is not random, i.e., there are some fundamental factors that contribute to such a relation;

• the correlated patterns for cases and deaths between countries bear no similarity at all;

• there is a strong discrepancy between the evolution of cases and that of deaths;

• the development of confirmed cases and deaths among countries is monotonic: some might increase the total cases, while some would decrease them, if the detected features are preserved.

The clustering technique in this study should offer policymakers some knowledge for adopting sensible measures to control the pandemic.

By extending the concept of Fourier analysis, we introduce an orthogonal basis, which serves as the feature benchmark for the current study. Let $\mathbb{N}$ denote the set of positive integers and $\mathbb{R}$ the set of real numbers. For any $\vec{v} \in \mathbb{R}^n$, $\vec{v}_i$ is used to denote its $i$-th element, $|\vec{v}|$ its length and $\|\vec{v}\|$ its Euclidean norm. Assume $|\vec{v}| = N + 1$, where $N$ stands for a natural number. $\Delta\vec{v}$ is used to denote its growth vector, i.e., the vector of growth rates between consecutive elements of $\vec{v}$.

Observe that $|\Delta\vec{v}| = N$. This growth vector is the main research target, since we study the (weekly) growth rates of cases and deaths regarding COVID-19. For any two vectors $\vec{v}, \vec{w} \in \mathbb{R}^n$, $\vec{v} \cdot \vec{w}$ is used to denote their inner product.

By some manipulation of mathematical operations, $B_N$ is proved to be an orthogonal basis for every natural number $N$.

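Let $\vec{v}, \vec{w} \in \mathbb{R}^n$ be arbitrary. The Manhattan metric $d$ over $\mathbb{R}^n$ is defined by [29]

$$d(\vec{v}, \vec{w}) = \sum_{i=1}^{n} |\vec{v}_i - \vec{w}_i|.$$

It faithfully adds up the distances projected onto each individual dimension. The cosine similarity between them is defined by

$$\cos(\vec{v}, \vec{w}) = \frac{\vec{v} \cdot \vec{w}}{\|\vec{v}\| \, \|\vec{w}\|}. \tag{2}$$

This is a classical approach to measuring the relation or similarity between two vectors.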
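As a small worked illustration of the two measures, take the Manhattan metric as the sum of absolute coordinate differences and the cosine similarity as the inner product divided by the product of the Euclidean norms. The two vectors below are chosen purely for illustration and do not come from the data.

```latex
\vec{v} = (1, 2, 3), \qquad \vec{w} = (3, 2, 1)

d(\vec{v}, \vec{w}) = |1 - 3| + |2 - 2| + |3 - 1| = 4

\frac{\vec{v} \cdot \vec{w}}{\|\vec{v}\| \, \|\vec{w}\|}
  = \frac{1 \cdot 3 + 2 \cdot 2 + 3 \cdot 1}{\sqrt{14} \, \sqrt{14}}
  = \frac{10}{14} = \frac{5}{7} \approx 0.714
```

The Manhattan distance keeps track of every coordinate separately, whereas the cosine similarity collapses the comparison into a single angle.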

The weekly data (up to Week 51, 2020) of the reported COVID-19 total confirmed cases and deaths worldwide [28] are downloaded. The samples are the countries. To reduce the sampling bias that derives from insufficient population, inadequate healthcare support and missing data, only those countries satisfying the following three criteria simultaneously qualify as sampled countries:

1. countries whose population exceeds ten million;

2. countries whose healthcare systems are ranked [30] among the top 100;

3. countries whose data from Week 15 to Week 51, Year 2020 are available.

After filtering out the non-essential samples by the above criteria, we obtain the 43 countries shown in Table 1, each of which contains 37 weekly data points (from Week 15 to Week 51) of total confirmed cases and deaths.

Procedures

The spectral analysis in the current work is conducted by the following procedures:

1. Prepare and compile the weekly accumulated confirmed cases and deaths from Week 15 to Week 51 in Year 2020 for the 43 sampled countries. The sampled data are shown in Table 6.

2. Calculate the $t$-th week's growth rate for confirmed cases of COVID-19 by the formula

$$g_i^t = \frac{Week_i^{t+1} - Week_i^t}{Week_i^t + 1},$$

where $Week_i^t$ denotes the total number of confirmed cases at Week $t$ for country $i$. For analytical purposes, 1 is deliberately added to the denominator to avoid division by 0. Similarly, we calculate the $t$-th week's growth rate of deaths for country $i$, denoted by $h_i^t$. Then for each country $i$, the growth vectors $g_i = (g_i^t)_{t=15}^{50}$ and $h_i = (h_i^t)_{t=15}^{50}$ are formed. The calculated results are tabulated in Table 7.

3. Choose an orthonormal basis $B_{36}$ and rename its elements.

4. Calculate the magnitude vectors $[g_i]$ and $[h_i]$ for the growth vectors $g_i$ and $h_i$. The results are presented in Table 8.

5. Rank each element in $[g_i]$ by a natural number according to its face value and form a ranking vector. An element with a higher face value is assigned a higher rank. Follow the same method for ranking each $[h_i]$. The results are presented in Table 9.

6. Calculate the Manhattan distance between all the ranking vectors. The resulting distance matrices are presented in Table 2.

7. Find the minimal pairs (nearest neighbours, with the least distance) and maximal pairs (furthest neighbours, with the greatest distance) for all the countries via the above Manhattan distance matrices. The visualised results are presented in Fig. 4.

Table 1 These 43 labelled countries are sampled based on three criteria: over 10 million population, top 100 healthcare system, and available data for the set periods.

Fig. 1 Trend of weekly total cases of COVID-19. These plots correspond directly to Table 6. They depict the weekly total cases of COVID-19 for the labelled countries 1 to 4 and 40 to 43.
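The growth-rate step (Step 2) can be sketched in a few lines of Python. This is a minimal illustration, not the R programs used for the study: the function name `growth_rates` is hypothetical, the counts are made up, and the forward difference in the numerator is an assumption. Only the added 1 in the denominator is taken directly from the text.

```python
from typing import List, Sequence


def growth_rates(totals: Sequence[float]) -> List[float]:
    """Weekly growth rates of a cumulative series.

    The +1 in the denominator follows the text (it avoids division by zero);
    using the change from one week to the next as numerator is an assumption.
    """
    return [(nxt - cur) / (cur + 1) for cur, nxt in zip(totals, totals[1:])]


# Made-up cumulative counts for four consecutive weeks (not real data).
example_totals = [0, 9, 19, 39]
example_rates = growth_rates(example_totals)
print(example_rates)
```

A series of 37 weekly totals (Week 15 to Week 51) yields 36 growth rates under this sketch, which matches the length of the growth vectors described above.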

Following the procedures described in Sect. 2, we embark on the data analysis and produce the results in this section with the help of some R 4.1.0 programs [31]. The raw data for weekly total confirmed cases and deaths are further processed and graphed in Figs. 1 and 6. Due to limited space, we choose labelled countries 1 to 4 and 40 to 43 throughout this study for demonstrative purposes. As expected, the trends for confirmed cases and deaths are very similar, but differ in scale. However, this synchrony falls apart as more features and factors are taken into consideration. To further reveal the characteristics of the trends, we delve into the growth rates of the total numbers. The visualised growth rates for confirmed cases and deaths are graphed in Fig. 2.

This visualisation is not sufficient to look into the underlying features of the changes. To investigate the hidden features of the trend, we use spectral analysis to decompose the national trends into the magnitudes of different frequencies. The growth rates, rather than the total cases, are adopted for decomposition to allow further comparison and analysis between countries: the growth rate in essence already scales the data, which makes the decompositions comparable. The visualised results are graphed in Fig. 3.
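The decomposition step can be sketched as follows. The definition of the basis used in the study is not reproduced here, so the sketch substitutes a standard orthonormal cosine (DCT-II) basis as a stand-in. The function names, the use of absolute values as "magnitudes" and the sample values are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine_basis(n: int) -> List[List[float]]:
    """Orthonormal DCT-II basis of R^n (a stand-in, not the paper's B_N)."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append(
            [scale * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n)]
        )
    return basis


def dot(v: Sequence[float], w: Sequence[float]) -> float:
    """Standard inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(v, w))


def coefficient_magnitudes(x: Sequence[float], basis) -> List[float]:
    """Absolute inner products of x with every basis vector."""
    return [abs(dot(x, b)) for b in basis]


basis = cosine_basis(4)
signal = [0.5, 0.2, -0.1, 0.3]  # made-up growth rates, not real data
mags = coefficient_magnitudes(signal, basis)

# For an orthonormal basis the squared magnitudes sum to ||signal||^2,
# so this gap should vanish up to rounding error.
energy_gap = abs(sum(m * m for m in mags) - dot(signal, signal))
```

Because the basis is orthonormal, the coefficients carry the whole growth vector without loss; each magnitude measures how strongly one frequency is present in the trend.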

Since the collected data might still contain some noise, we process the resulting Fourier coefficients by ranking. An alternative is to compare the coefficients directly by some similarity measure, but this approach might carry the noise over and distort the results. The ranks are assigned directly by comparing the values of the coefficients: higher values are assigned higher ranks. After this ranking, we measure the distances between the ranking vectors of the countries via the Manhattan metric, which is a straightforward metric reflecting the difference between two vectors. The calculated results are shown in Table 2.
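Ranking followed by the Manhattan metric can be sketched as below. The country labels and coefficient values are invented for illustration, the function names are hypothetical, and breaking ties by position is an assumption, since the text does not say how equal coefficients are ranked.

```python
from typing import List, Sequence


def rank_vector(values: Sequence[float]) -> List[int]:
    """Rank 1 for the smallest value, len(values) for the largest.

    Ties are broken by position (an assumption; the text does not specify).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def manhattan(v: Sequence[int], w: Sequence[int]) -> int:
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(v, w))


def distance_matrix(rank_vectors: List[List[int]]) -> List[List[int]]:
    """Pairwise Manhattan distances between all ranking vectors."""
    return [[manhattan(r, s) for s in rank_vectors] for r in rank_vectors]


# Made-up coefficient magnitudes for three hypothetical countries.
coeffs = {"A": [0.9, 0.1, 0.4], "B": [0.8, 0.2, 0.5], "C": [0.1, 0.7, 0.3]}
ranks = {name: rank_vector(c) for name, c in coeffs.items()}
dist = distance_matrix(list(ranks.values()))
```

Here A and B have different coefficients but the same ranking, so their distance is 0. Ranking deliberately discards small differences in magnitude and keeps only the ordering of the frequencies.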

Clustering analysis

Based on Table 2, we analyse the distance matrix by categorising the countries with similar features using two contrasting methods: minimal pairing and maximal pairing. The minimal one reveals the closer neighbours, which share similar features, whereas the maximal one discloses the further neighbours, which bear the most dissimilarities. The results are collectively presented in Fig. 4.
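One plausible reading of minimal and maximal pairing is that each country is linked to the other country at the least and at the greatest distance, respectively. The sketch below follows that reading. The function name `pairings`, the toy matrix and the rule that ties go to the lowest index are assumptions, not details taken from the study.

```python
from typing import Dict, List, Tuple


def pairings(dist: List[List[float]]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """For each index, return its nearest and its furthest other index.

    Ties go to the lowest index (an assumption).
    """
    nearest: Dict[int, int] = {}
    furthest: Dict[int, int] = {}
    for i, row in enumerate(dist):
        others = [j for j in range(len(row)) if j != i]
        nearest[i] = min(others, key=lambda j: row[j])
        furthest[i] = max(others, key=lambda j: row[j])
    return nearest, furthest


# Toy symmetric distance matrix for four hypothetical countries.
toy = [
    [0, 2, 6, 8],
    [2, 0, 4, 6],
    [6, 4, 0, 2],
    [8, 6, 2, 0],
]
nearest, furthest = pairings(toy)
```

In this toy case the nearest-neighbour links form two separate groups, {0, 1} and {2, 3}, which is the kind of disconnected graph that the pairing plots are meant to expose.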

Cosine similarity

For further comparison, we contrast the spectral analysis with typical cosine similarity analysis. The targets for cosine similarity are the growth rates of confirmed cases and deaths. In other words, it calculates the similarities between the graphs in Fig. 2. Cosine similarity, which is defined in Equation 2, is explicitly or implicitly used in many fields, since it is a composite index which yields a much more intuitive interpretation via geometrical notions. Since it is a static indicator, the analysis per se does not really reveal the internal difference between the structures or trends. The results are presented in Table 3.

Fig. 3 Inner product of growth rates of weekly total cases and deaths of COVID-19. The solid line concerns the confirmed cases, while the dashed line concerns the deaths in each plot. These plots correspond directly to Table 8.
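The cosine similarity comparison can be sketched as a pairwise matrix over growth vectors. The growth values and country labels below are invented, and the function name is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(v: Sequence[float], w: Sequence[float]) -> float:
    """Inner product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    if norm_v == 0 or norm_w == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_v * norm_w)


# Made-up growth vectors for three hypothetical countries.
growth = {"A": [0.5, 0.2, 0.1], "B": [1.0, 0.4, 0.2], "C": [0.1, 0.2, 0.5]}
names = list(growth)
sim = [[cosine_similarity(growth[p], growth[q]) for q in names] for p in names]
```

B is exactly twice A, so their similarity is 1 even though the growth rates differ in size. The single number summarises direction only, which illustrates why the index is described as composite.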

1. From Figs. 1 and 6, we observe that the trends of confirmed cases and deaths are pretty much the same.

2. From Fig. 2, the trends of growth rates for confirmed cases and deaths are not similar to each other. Furthermore, the growth rates for deaths are much more stable than those for the confirmed cases.

3. From Fig. 3, one can observe that the Fourier coefficients for confirmed cases and deaths are in line with each other; there are no obvious leading frequencies that decide the trend of growth rates.

4. From Fig. 4, we observe that the graphs are disconnected from the point of view of spectral features. The sub-graphs indicate the closeness between their features. Each group shall share some hidden properties which are not necessarily linked to geographical location.

5. From Fig. 4, we find that the maximal clusters do not form a connected graph. This indicates that the characteristic of the grouping is distinct and clear-cut, i.e., there are some separable features among the countries.

6. Comparing Fig. 5 with Fig. 4, we find that cosine similarities lead to fewer representative countries than the spectral-featured counterparts. In addition, the confirmed cases have even fewer representative countries than the deaths. This suggests that the deaths might depend on the circumstances of each individual country, for example, the healthcare systems.

7. From Table 4, the closest pair is the minimal pairing for confirmed cases via spectral analysis (fq_mc) and the maximal pairing for confirmed cases via cosine similarity (cs_Mc). This indicates that these two methods indeed produce substantially different results (if the two methods were similar to each other, we would expect a closer relation between fq_mc and cs_mc than the one between fq_mc and cs_Mc).

Prediction

Now we show how to further approximate the weekly COVID-19 confirmed cases and deaths based on the obtained features. In this section, for any vector $z_i$, we use $z_{i,j}$ to denote its $j$-th element. Let $v_i^t$ and $w_i^t$ denote the total confirmed cases and total deaths by Week $t$ for Country $i$, respectively. Define the growth

Table 2. They reveal the distances between ranked Fourier coefficients of the 43 countries. The lower the distance is, the closer the features are.

Table 3. They reveal the similarities between weekly growth rates of COVID-19 for the 43 countries.

Furthermore, by the first order derivative

The similarities are based on Table 7 directly. They reveal the relations between the trends of Fig. 2.

Table 4

         fq_mc  fq_Mc  fq_md  fq_Md  cs_mc  cs_Mc  cs_md  cs_Md
fq_mc        0     84     86     86     86     36     86     80
fq_Mc       84      0     82     82     76     82     84     84
fq_md       86     82      0     80     84     86     84     82
fq_Md       86     82     80      0     84     86     70     86
cs_mc       86     76     84     84      0     86     84     86
cs_Mc       36     82     86     86     86      0     86     80
cs_md       86     84     84     70     84     86      0     86
cs_Md       80     84     82     86     86     80     86      0

The distances for the eight minimal/maximal pairings for confirmed cases and deaths via spectral analysis and cosine similarities are computed. fq_mc, fq_Mc, fq_md and fq_Md are the minimal pairings of the confirmed cases, the maximal pairings of the confirmed cases, the minimal pairings of the deaths, and the maximal pairings of the deaths by frequency, respectively. Similarly, cs_mc, cs_Mc, cs_md and cs_Md are the minimal pairings of the confirmed cases, the maximal pairings of the confirmed cases, the minimal pairings of the deaths, and the maximal pairings of the deaths by cosine similarity, respectively.

and second order derivative 2· 

Proof The result could be deduced by the same inference as above.

Based on these alternating inductive steps, we predict the next four weeks' confirmed cases and deaths for the 43 sampled countries. The results are partially presented in Table 5 and fully shown in Figs. 7 and 8. From these results, we find that if the features are preserved, then the development of confirmed cases and deaths shall not be monotonic among these countries. This shall offer an explanation of why the pandemic is not synchronous among the countries.

Table 5 This table shows the prediction of weekly accumulated confirmed cases (upper block) and deaths (lower block) from Week 52 to Week 55 for the 43 sampled countries. The prediction is based on the spectral analysis.
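The sketch below is not the spectral predictor of this study. It only illustrates how a sequence of growth rates, however obtained, maps back to cumulative totals once the growth rate is taken as the weekly change divided by the current total plus 1. That forward-difference form, the function name `roll_forward` and all numbers are assumptions.

```python
from typing import List, Sequence


def roll_forward(last_total: float, growth_rates: Sequence[float]) -> List[float]:
    """Turn successive growth rates back into cumulative totals.

    Inverts g = (next - current) / (current + 1), i.e.
    next = current + g * (current + 1). The growth rates themselves would
    come from a separate predictor, which is not reproduced here.
    """
    totals = []
    current = last_total
    for g in growth_rates:
        current = current + g * (current + 1)
        totals.append(current)
    return totals


# Made-up last observed total and four hypothetical predicted growth rates.
projected = roll_forward(99.0, [0.1, 0.1, -0.05, 0.0])
```

Under this mapping a positive growth rate raises the projected total and a negative one lowers it, so the direction of each step is decided entirely by the sign of the predicted growth rate.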

The main purpose of this study is to extract the patterns of evolution of COVID-19 regarding confirmed cases and deaths across the globe and to predict the future trend via spectral analysis. The results are presented in Sect. 3. The characteristics of the approach in this study are as follows:

• Spectral analysis is applied to detecting and tracking the hidden features of growth rates of confirmed cases and deaths of COVID-19. This method offers some advantages over statistical approaches: there is no predetermined association of independent variables and no need to consider or interpret the interactions between chosen factors. This characteristic provides a much more efficient way for automatic reasoning, though the features extracted are somewhat mechanical.

• By the Manhattan metric and ranking techniques, we can then perform spectral clustering, which groups the countries with similar features. Since this is a feature-based clustering, the groups can be easily identified by their representative frequencies. Unlike the classical cosine similarity analysis, which is a composite and descriptive index that makes it hard to identify the representative properties of the clusters, spectral analysis delves into the internal structures or dynamical properties of the trend.

• Spectral analysis can also be applied to approximation problems, and this is done in this study for the prediction of confirmed cases and deaths of COVID-19.

Based on these characteristics, there are a couple of points I would like to address:

• Relatively speaking, the main advantage of the statistical approach, in particular regression-related methods, over spectral analysis lies in the interpretability of its causal variables. But this might also be a disadvantage, since one needs to specify the independent variables, which is not the case for spectral analysis. Hence, for automatic reasoning, spectral analysis shall turn out to be more effective, but for meaningful interpretation, the statistical approach is preferable. The choice between them depends on one's purpose.

• When one is interested in finding out the fundamental features, or the benchmarks, of time series, then spectral analysis is the candidate, i.e., the internal structures of data are revealed via the magnitudes of the frequencies. Cosine similarity, on the other hand, is a composite and static indicator of the relation between vectors: it is more intuitive in geometrical interpretation, but ambiguous in revealing or comparing the internal structures.

As for the study itself, there are some points worth noting and improving.

• Some of the results about causal relations in this study might not agree with other studies [25]. This is reasonable, since the approach we adopt focuses more on feature detection, not solely on finding causal relations.

• One could also delve into the shift of phases of the frequencies by lifting the constraint on weekly growth rates. This might yield an even more dynamical picture of the evolution.

• The samples are filtered based on some criteria. One could loosen or strengthen the criteria to compare the results generated.

• During the reviewing process of this manuscript, a new variant, Omicron, emerged [32, 33], whose dynamical behaviour is worth further investigation [34].

Table 6 Partial raw data for weekly confirmed cases (upper block) and deaths (lower block) of COVID-19 from the 15th week of 2020 to the 51st week of 2020 for the 43 sampled countries. The sampling is based on the size of population in a country, the availability of COVID-19 data and the healthcare system in a country.

See Fig. 6 .

Fig. 6 Trend of weekly total deaths of COVID-19. These plots correspond directly to Table 6.

Weekly growth rates for cases and deaths

See Table 7.

Fourier coefficients

See Table 8.

See Table 9 .

Table 9 Ranking the Fourier coefficients for cases (upper block) and deaths (lower block) calculated in Table 8. The higher the coefficients are, the higher the ranks are. The higher ranks indicate the main features of weekly growth rates of COVID-19 in terms of the chosen 36 frequencies.

Covid-19: What have we learnt about the new variant in the UK?

Asymptomatic transmission of covid-19

Analysis of outbreak and global impacts of the COVID-19

Variation in US hospital mortality rates for patients admitted with COVID-19 during the first 6 months of the pandemic

Decreased COVID-19 mortality-A cause for optimism

Randomness for nucleotide sequences of SARS-CoV-2 and its related subfamilies

Quantifying collective intelligence and behaviours of SARS-CoV-2 via environmental resources from virus' perspectives

Track the dynamical features for mutant variants of COVID-19 in the UK

On COVID-19 country containment metrics: a new approach

Efficacy and safety of the mRNA-1273 SARS-CoV-2 Vaccine

Safety and efficacy of the BNT162b2 mRNA COVID-19 Vaccine

Can India develop herd immunity against COVID-19?

An exploration of fractal-based prognostic model and comparative analysis for second wave of COVID-19 diffusion

The second and third waves in India: when will the pandemic be culminated?

A statistical analysis of the novel coronavirus (COVID-19) in Italy and Spain

Statistical procedures for evaluating trends in coronavirus disease-19 cases in the United States

Statistical analysis and visualization of the potential cases of pandemic coronavirus

Implementing clinical research using factorial designs: a primer

Artificial intelligence in the diagnosis of COVID-19: challenges and perspectives

Artificial Intelligence (AI) applications for COVID-19 pandemic

Generalizability of deep learning tuberculosis classifier to COVID-19 chest radiographs: new tricks for an old algorithm?

ai-corona: Radiologist-assistant deep learning framework for COVID-19 diagnosis in chest CT scans

Fourier analysis using the number of COVID-19 daily deaths in the US

Relationships of total COVID-19 cases and deaths with ten demographic, economic and social indicators. medRxiv

Advanced Linear Algebra

An Introduction to Fourier Analysis

The Humanitarian Data Exchange, Geographic Distribution of COVID-19 Worldwide

Dictionary of Algorithms and Data Structures

Measuring overall health system performance for 191 countries

countrycode: An R package to convert country names and country codes

Omicron: a mysterious variant of concern

Covid-19: Early studies give hope omicron is milder than other variants

Modeling the dynamics of COVID-19 pandemic with implementation of intervention strategies

Acknowledgements This work is supported by the Humanities and Social Science Research Planning Fund Project under the Ministry of Education of China (No. 20XJA-GAT001).

All data used in this article are included in the manuscript.

Refined raw data

See Table 6.