key: cord-0710040-r2g0gqk6 authors: Soukhovolsky, Vladislav; Kovalev, Anton; Pitt, Anne; Shulman, Katerina; Tarasova, Olga; Kessel, Boris title: The Cyclicity of coronavirus cases: “Waves” and the "weekend effect" date: 2021-01-28 journal: Chaos Solitons Fractals DOI: 10.1016/j.chaos.2021.110718 sha: 3517e506e896bd1f956f0412930afa75e1420275 doc_id: 710040 cord_uid: r2g0gqk6
INTRODUCTION: Medical statistics is one of the "milestones" of current medical systems. It is the foundation for many protocols, including medical care systems, government recommendations, epidemic planning, etc. At this time of global COVID-19, credible data on epidemic spread can help governments make better decisions. This study's aim is to evaluate the cyclicity in the number of daily diagnosed coronavirus patients, thus allowing governments to plan how to allocate their resources more effectively.
METHODS: To assess this cycle, we consider the time series of the first and second differences in the number of registered patients in different countries. The spectral densities of the time series are calculated, and the frequencies and amplitudes of the maximum spectral peaks are estimated.
RESULTS: It is shown that two types of cycles can be distinguished in the time series of the case numbers. Cyclical fluctuations of the first type are characterized by periods from 100 to 300 days. Cyclical fluctuations of the second type are characterized by a period of about seven days. For different countries, the phases of the seven-day fluctuations coincide. It is assumed that cyclical fluctuations of the second type are associated with the weekly cycle of population activity.
CONCLUSIONS: These characteristics of cyclical fluctuations in cases can be used to predict the incidence rate.
The possible cyclical nature of the new disease, with incidence "waves", was predicted at the onset of the coronavirus pandemic, by analogy with the influenza epidemic [1, 9].
However, the absence of long data series, the occurrence of previous single-wave epidemics [12, 13], and reliance on the classical mathematical model of an epidemic, which considers a single "wave" of the disease [11], did not encourage quantifying the risk of epidemic "waves". In addition, the biological factors causing the cyclical nature of epidemics remained unclear. For influenza, weather conditions are considered a modifying factor in the epidemic's development; cases increase in autumn and winter. By analogy, it was assumed that the coronavirus epidemic would slow or even end by the summer of 2020. However, this did not happen, and since March 2020, sufficiently long series of incidence rates have been gathered in different countries to allow the cyclicality of coronavirus epidemics to be assessed and the factors responsible for the disease's recurrence to be identified.
In this work, we studied the cyclical nature of coronavirus incidence in different countries using the methods of spectral analysis of time series. The database of Johns Hopkins University (USA) was used for the analysis [7]. For each country, the database contains the cumulative number n(t) of cases from the date t_0 of the epidemic's start in that country to the day t_max (here, 12.08.2020). Using these data, we calculated the first relative differences, p(t)/N = [n(t) - n(t-1)]/N, which characterize the relative number of new cases on a specific day t (N is the country's population). By analyzing these series (Fig. 1), one can characterize their general properties, such as the presence or absence of a time trend and cyclicality. We will use spectral analysis methods to assess periodic processes during the epidemic; in these methods, a time series is represented as a sum of sinusoids, each with its own frequency and amplitude [2, 3, 4].
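As a minimal illustration of the first relative differences p(t)/N defined above, the following sketch computes them from a cumulative case series. The function name and the numbers are hypothetical and serve only as an example; they are not taken from the database used in the study.

```python
def relative_daily_cases(cumulative, population):
    """Return p(t)/N = [n(t) - n(t-1)] / N for t = 1 .. len(cumulative) - 1."""
    if population <= 0:
        raise ValueError("population must be positive")
    return [
        (cumulative[t] - cumulative[t - 1]) / population
        for t in range(1, len(cumulative))
    ]


# Hypothetical cumulative counts n(t) for five consecutive days.
n = [100, 130, 170, 170, 230]
N = 1_000_000  # hypothetical population size

p_rel = relative_daily_cases(n, N)
print(p_rel)
```

A negative value in the result would indicate that the cumulative count decreased from one day to the next, which points to a correction or an error in the source data.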
In the spectrum of a time series, the X-axis gives the oscillation frequency f, and the Y-axis gives the spectral density, that is, the amplitude A(f) of the oscillations at frequency f. The spectra are calculated using the Fourier transform [8, 14, 15]. For its spectrum to be estimated, a time series must be stationary, that is, its mean value must not change over time [18]. If the series has a monotonic trend (as can be seen in the US data), the trend must first be removed from the series, and the spectrum is then calculated for the detrended series. The ADF test was used to assess the stationarity of the series [5, 6, 17]. If a series turned out to be non-stationary, the trend was isolated by fitting the linear regression q(t) = a + bt to the values p(t), and the series w(t) = p(t) - q(t) was formed. If the coefficient of determination R^2 of this regression was of low significance, the series p(t) was log-transformed, the log-linear regression q1(t) was fitted to the series ln p(t), and the series w1(t) = ln p(t) - q1(t) was formed. If the time series w(t) or w1(t) was stationary according to the ADF test, its spectrum was calculated. If the series w(t) still remained non-stationary, the detrending procedure could be repeated (however, this was never needed, and the calculations were limited to a single detrending step).
In addition, the series of second differences Δw(t + 1) = w(t + 1) - w(t) was calculated. If the series of first differences can be interpreted as the rate of the epidemic's growth or decline, then the series of second differences can be interpreted as the acceleration of the epidemic's development. If the mean of the series Δw is zero and the sum of the terms of its autocorrelation function is finite, then the series is stationary.
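The detrending and differencing steps described above can be sketched as follows. This is a simplified illustration in plain Python on a synthetic series, not the authors' code: the function names are hypothetical, and the ADF stationarity test is omitted (in practice a library routine such as `adfuller` from statsmodels could be applied to the result).

```python
import math


def linear_fit(y):
    """Least-squares fit q(t) = a + b*t for t = 0 .. len(y)-1; returns (a, b, R^2)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    sty = sum((t - t_mean) * (y[t] - y_mean) for t in range(n))
    b = sty / stt
    a = y_mean - b * t_mean
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    ss_res = sum((y[t] - (a + b * t)) ** 2 for t in range(n))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return a, b, r2


def detrend(y):
    """Return w(t) = y(t) - q(t), where q is the fitted linear trend."""
    a, b, _ = linear_fit(y)
    return [y[t] - (a + b * t) for t in range(len(y))]


def differences(w):
    """Return the series w(t + 1) - w(t)."""
    return [w[t + 1] - w[t] for t in range(len(w) - 1)]


# Synthetic daily series (an assumption for illustration only):
# exponential growth modulated by a weekly oscillation.
p = [math.exp(0.02 * t) * (1.0 + 0.2 * math.sin(2 * math.pi * t / 7))
     for t in range(140)]

log_p = [math.log(v) for v in p]        # ln p(t)
a1, b1, r2_1 = linear_fit(log_p)        # log-linear trend q1(t)
w1 = detrend(log_p)                     # w1(t) = ln p(t) - q1(t)
dw = differences(w1)                    # differences of the detrended series
```

Because the regression includes an intercept, the detrended series w1(t) has zero mean by construction, which is one of the stationarity conditions mentioned in the text.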
After the time series were transformed, their spectra were calculated. Below are the results of calculating the spectra for a number of countries. The log-transformed and detrended daily incidence series for the US is shown in Fig. 2. Three low-frequency "waves" of the disease can be seen for the USA during the period from the epidemic's beginning to early December 2020 (Fig. 2). After the Fourier transform was performed, the spectrum s(f) of the time series was calculated (Fig. 3). The peaks of the spectrum at f1, f2, ... determine the characteristic frequencies or, as their reciprocals, the periods of the fluctuations in the level of daily cases. As can be seen from Fig. 3, the wavelength L1 = 1/f1 corresponding to the highest spectral peak for the US is 133 days. In addition to the peak at the frequency f1 = 0.0075 (in cycles per day), the spectrum for the US has a peak at the frequency f2 = 0.143 (wavelength L2 = 7 days). Spectra with similar peak frequencies are typical for the time series of the relative number of cases per day for other countries. Table 1 shows the wavelengths L1 and L2 of the incidence of coronavirus for a number of countries.
The duration (1/frequency) of epidemic waves differs significantly between countries. Two groups of countries can be distinguished: countries with a short duration of low-frequency fluctuations in incidence (from 94 to 160 days), and countries with a long duration of such fluctuations (from 274 to 320 days). The twofold differences in the duration of the low-frequency fluctuations and the complete absence of waves in the incidence data for China indicate that the cause of the cyclical fluctuations is probably not related to the intrinsic properties of the virus itself, but is most likely determined by a country's characteristics. It is difficult, however, to find commonalities in the climate, culture, government response, and population behavior of Belgium, Japan, the USA, Israel and Russia.
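To illustrate how peak frequencies are converted into wavelengths L = 1/f, the sketch below computes a simple periodogram with a direct discrete Fourier transform and locates the strongest peak within a frequency band. It is a minimal example on a synthetic series containing a 7-day and a 140-day oscillation; the function names, the band limits, and the series are assumptions for illustration, and the spectral estimator used in the study may differ.

```python
import cmath
import math


def periodogram(x):
    """Return (freqs, power) at Fourier frequencies k/n, k = 1 .. n//2.

    Frequencies are in cycles per sample (per day for daily data).
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]  # remove the mean before transforming
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        s = sum(xc[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k / n)
        power.append(abs(s) ** 2 / n)
    return freqs, power


def peak_period_in_band(x, f_lo, f_hi):
    """Return the period 1/f of the strongest peak with f_lo <= f <= f_hi."""
    freqs, power = periodogram(x)
    candidates = [(pw, f) for f, pw in zip(freqs, power) if f_lo <= f <= f_hi]
    if not candidates:
        raise ValueError("no Fourier frequency falls inside the band")
    return 1.0 / max(candidates)[1]


# Synthetic detrended series: a weekly cycle plus a slower 140-day cycle.
series = [math.sin(2 * math.pi * t / 7) + 0.5 * math.sin(2 * math.pi * t / 140)
          for t in range(280)]

L1 = peak_period_in_band(series, 0.0, 0.05)  # low-frequency "wave"
L2 = peak_period_in_band(series, 0.1, 0.2)   # high-frequency cycle
print(round(L1, 3), round(L2, 3))
```

The direct transform costs O(n^2) operations, which is acceptable for series of a few hundred days; for longer series a fast Fourier transform would be the usual choice.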
Most countries in Tab... The L2 wavelengths in the series of second differences in this range are also given in Table 1, and the wavelength is close to 7 days for most of the analyzed data. The exceptions are France, Saudi Arabia, Belarus and China (the period of the fluctuations is close to 3 days) and Egypt, with a period of 11 days. However, the incidence statistics for France turn out to be incorrect: the database contains errors, and very often the cumulative number of cases n(t) on day t is less than the value n(t-1) on the previous day (t-1). The analysis showed that the power of the peaks in the spectrum of the second differences was associated with an indicator of case intensity, namely the total relative number of cases from the epidemic's beginning to 12.08.2020 (Fig. 5).
The reason for the peaks in the spectra of the second-difference time series of the number of patients remains unclear. There are no obvious biological causes for the disease's cyclicity, so the seven-day cyclicity may be associated with social effects. As a hypothesis, it could result from the so-called "weekend effect", in which patients with initial disease symptoms go to medical care providers only after the weekend, at the beginning of the next week. Fig. 6 shows detrended and smoothed data on the distribution of the number of registered patients in the United States (weekends on Saturdays and Sundays) and Israel (weekends on Fridays and Saturdays). In the US, at the beginning of the week, an increased number of patients need medical care because those who became ill on Saturday or Sunday are registered as ill on Monday or Tuesday. In Israel, the number of registered patients is also at a minimum on weekends, that is, on Friday and Saturday. A similar incidence distribution by days of the week is observed in other countries (Table 2).
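The distribution of registered cases by day of the week discussed above can be obtained by averaging a detrended daily series separately for each weekday. The following is a minimal sketch with hypothetical values; the function name, the start date, and the weekly pattern are assumptions for illustration and do not reproduce the data in Fig. 6 or Table 2.

```python
from datetime import date, timedelta


def mean_by_weekday(start, values):
    """Average daily values by weekday; index 0 is Monday, 6 is Sunday."""
    sums = [0.0] * 7
    counts = [0] * 7
    for i, v in enumerate(values):
        d = (start + timedelta(days=i)).weekday()
        sums[d] += v
        counts[d] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Hypothetical detrended series: four identical weeks starting on a Monday,
# with fewer registered cases on Saturday and Sunday.
start = date(2020, 6, 1)
pattern = [1.2, 1.1, 1.0, 1.0, 1.0, 0.7, 0.6]
values = pattern * 4

means = mean_by_weekday(start, values)
lowest_day = min(range(7), key=means.__getitem__)
print(means, lowest_day)
```

For a country whose weekend falls on other days, the minimum of the weekday means would be expected to shift accordingly if the weekend effect hypothesis holds.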
However, there is no pronounced peak at the frequency f2 ≈ 0.143 in the spectrum of the second differences for a number of countries: Belarus, Saudi Arabia and China (period of about 3 days) and Egypt (11 days). The lack of a weekend effect may be due to the lack of a complete weekend (in Egypt, only Friday is a day off) or to problems with the reliability of disease case registration in Belarus. If the spectral peak in the transformed time series of cases is caused by a weekend effect, then there should be a phase shift among the transformed time series for countries whose weekends fall on Saturday and Sunday, those for countries that have only Friday as a day off (i.e. Muslim countries), and that for Israel, which has weekends on Friday and Saturday. However, the calculations showed that the spectral peaks do not appear for a number of countries. In particular, such peaks are not typical for the epidemic in China, and the spectrum for China can, to a first approximation, be regarded as flat, with no significant peaks and no apparent characteristic frequency. Perhaps the absence of significant peaks in the spectrum of the second-difference time series of cases in China is associated with more organized control of the disease, the state of the population, and rapid testing for coronavirus. In China, a case of the disease is recorded as soon as it appears, and there is no seven-day cyclicality.
Earlier, an AR model was proposed to describe the dynamics of the disease [16], where k is the autoregression order and a_0, ..., a_k are the coefficients. To estimate the order of autoregression from the time series {p(i)}, the partial autocorrelation function (PACF) was calculated, and the maximum lag k with a significant value of PACF(k) was determined.
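The display equation of the AR model is not present in this text. Assuming the standard autoregressive form, which is consistent with the symbols defined above, model (2) would read p(i) = a_0 + a_1 p(i-1) + ... + a_k p(i-k) + e(i), where e(i) is a noise term (the noise term is an assumption). Under that assumption, the order selection by PACF can be sketched as follows. The PACF is computed with the Durbin-Levinson recursion, and the significance bound 1.96/sqrt(n) is a common large-sample approximation chosen here for illustration; the test actually used by the authors is not stated. All function names are hypothetical.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Sample autocorrelation r(0..max_lag) of the series x."""
    n = len(x)
    m = sum(x) / n
    xc = [v - m for v in x]
    c0 = sum(v * v for v in xc)
    return [sum(xc[t] * xc[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations PACF(0..max_lag) via Durbin-Levinson."""
    r = autocorrelation(x, max_lag)
    out = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit, lag 1 first
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def select_order(x, max_lag):
    """Largest lag whose PACF exceeds the approximate 95% bound, else 0."""
    bound = 1.96 / math.sqrt(len(x))
    values = pacf(x, max_lag)
    significant = [k for k in range(1, max_lag + 1) if abs(values[k]) > bound]
    return max(significant) if significant else 0


def simulate_ar(coeffs, n, seed=0, burn_in=200):
    """Simulate a zero-mean AR process with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0] * len(coeffs)
    for _ in range(n + burn_in):
        x.append(sum(a * x[-1 - j] for j, a in enumerate(coeffs))
                 + rng.gauss(0.0, 1.0))
    return x[-n:]


# Synthetic AR(1) series with coefficient 0.7 (illustration only).
x = simulate_ar([0.7], 2000)
pacf_values = pacf(x, 5)
order = select_order(x, 5)
print([round(v, 3) for v in pacf_values], order)
```

For an AR(1) process, PACF(1) should be close to the autoregression coefficient and the higher lags close to zero, although an occasional higher lag can exceed the bound by chance.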
Further, equation (2)... The spectrum I(f) of model (2) can be represented as the Fourier cosine transform of the autocorrelation function, calculated from the values of the coefficients a_j [2]. The maximum of I(f), reached at a certain frequency f_max, characterizes the wavelength L1 = 1/f_max of the incidence time series spectrum for a particular country. From the values of the autoregression order k and the coefficients a_j of equation (2), one can understand the reasons for the differences in the wavelength L1 between countries with relatively small values of L1 ≈ 140 days (Israel, Russia) and countries with large values of L1 ≈ 280 days (Brazil, Sweden). The parameters of model (2) for the weekly series of Israel, Russia, Brazil and Sweden are given in Table 3. As shown in Table 3, all coefficients for the variables p(j) are significant at the 0.05 level. Thus, we can conclude that the cyclicity of the coronavirus and, accordingly, the wavelength L1 of the incidence time series are determined by the inertia k of the system. Perhaps the inertia depends on the rate at which sick patients are removed from the population and on the number of asymptomatic patients.
The weekly periodicity, however, is not very surprising; it exists for other medical conditions as well. Multiple epidemic waves were described for the 1918 influenza pandemic. Yu D. et al., in their classic work based on previous and their own improved modelling, demonstrated that reactive social distancing, temperature, and school terms could explain the observed multiple waves and the final epidemic size [19]. Further evidence of multiple-wave outbreaks was recently published by Bo Xu. Possible explanations for L1 could be seasonal climate effects, different peak times in the Northern and Southern Hemispheres, and the public's behavioral reaction to the pandemic [20]. The effect of global transportation, the limitation of which remains unclear, may also play an important role in this phenomenon.
What can be expected in the more distant future if mass vaccinations are carried out? Apparently, despite a decrease in disease intensity, the seven-day cycle associated with the weekly rhythm of population activity may persist. The low-frequency cycle may also continue, especially if vaccinating most of the population takes a long time and if quarantine is not maintained after vaccination begins.
Conclusions: The presented analysis demonstrates that, at present and worldwide, the morbidity dynamics are characterized by periodic low-frequency oscillations with wavelengths from 100 to 300 days and by high-frequency oscillations with a period of about 7 days. An autoregressive model can be used to describe such fluctuations with a high level of accuracy.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References:
A review of influenza detection and prediction through social networking sites
Statistical Analysis of Time Series
Time Series Analysis: Forecasting and Control
Introduction to Time Series and Forecasting
Digital filters. Courier Corporation
Spectral analysis and its applications
Modeling Infectious Diseases in Humans and Animals
The advanced theory of statistics: design and analysis, and time series
A Contribution to the Mathematical Theory of Epidemics // Proceedings of the
Understanding the dynamics of Ebola epidemics // Epidemiology and Infection
Mathematical Model of SARS Prediction and Its Research Progress
Digital spectral analysis: with applications
Time Series Analysis and Its Applications with R Examples
A new modelling of the COVID 19 pandemic // Chaos, Solitons & Fractals
Introduction to Econometrics
Time series analysis: univariate and multivariate methods
Effects of reactive social distancing on the 1918 influenza pandemic
Mechanistic modelling of multiple waves in an influenza epidemic or pandemic