key: cord-0961080-mh40n7ou
authors: Cascon, A.; Shadwick, W. F.
title: Predicting the course of Covid-19 and other epidemic and endemic disease
date: 2021-12-29
journal: nan
DOI: 10.1101/2021.12.26.21268419
sha: bf9bc1ebba25f0b6e627abbb4ee2c5e67d2097c3
doc_id: 961080
cord_uid: mh40n7ou

The Gompertz Function is an accurate model for epidemics from Cholera in 1853 to Spanish Flu in 1918 and Ebola in 2014. It also describes the acute phase of annual outbreaks of endemic influenza and in all of these instances it has significant predictive power. For Covid-19, we show that the Gompertz Function provides accurate forecasts not just for cases and deaths but, independently, for hospitalisations, intensive care admissions and other medical requirements. In particular Gompertz Function projections of healthcare requirements have been reliable enough to allow planning for: hospital admissions, intensive care admissions, ventilator usage, peak loads and duration. Analysis of data from the Spanish Flu pandemic and the endemic influenza cycle reveals alternating periods of Gompertz Function growth and linear growth in cumulative cases or deaths. Linear growth means the Reproduction Number is equal to 1 which in turn indicates endemicity. The same pattern has been observed with Covid-19. All the initial outbreaks ended in linear growth. Each new outbreak has been preceded by a period of linear growth and has ended with a transition from Gompertz Function growth to linear growth. This suggests that each of these outbreak cycles ended with a transition to endemicity for the current dominant strain and that the normal seasonal respiratory virus periods will continue to see new outbreaks. It remains to be seen if widespread vaccination will disrupt this cyclicality. Because both Gompertz Function Growth and linear growth are accurately predictable, the forecasting problem is reduced to identifying the transition between these modes and to improving the performance in the early Gompertz Function growth phase where its predictive power is lowest. The dynamics of the Gompertz Function are determined by the Gumbel probability distribution.
This is an exceptional distribution with respect to the geometry determined by the affine group on the line which is the key to the Gumbel distribution's role as an Extreme Value Theory attractor. We show that this, together with the empirically observed asymmetry in epidemic data, makes the Gompertz Function growth essentially inevitable in epidemic models which agree with observations.

1 Overview and Outline

Galileo showed by experimental observations that, in the absence of friction, projectiles follow a parabolic trajectory. Only some decades later did Newton's laws of motion explain why that was the case.

In the interim, the parabolic trajectory was what would now be called a 'phenomenological model'-a mathematical formulation of a process that describes it without explaining why, or by what mechanism, it takes place. If we can make reasonable observations of points on the projectile's trajectory, we can approximate the parameters that determine the parabola. The model can then make accurate predictions about the future on the basis of our observations of the past.

We show that major epidemics in the past from Cholera in 1853 to Spanish Flu in 1918 and Ebola in 2014 are all described very well by a 3-parameter phenomenological model-the Gompertz Function.

So it is not surprising that the Covid-19 epidemic also followed the Gompertz Function Model.

We demonstrate through numerous examples that this model has significant short to medium term predictive power. The error between the predictions made by the Gompertz Function fit at a given time and the subsequent out of sample data remains small, first only for a few days and then for progressively longer and longer periods.

Most importantly, this means that the model can be used to make increasingly accurate forecasts of healthcare requirements as a Covid-19 outbreak (or other epidemic) progresses. Such forecasts are essential to address the fear that hospitals will be overwhelmed-which has been a major preoccupation almost everywhere.

In every location for which we have obtained data, the predictions of the Gompertz Function Model could have been used in planning for: hospital admissions and intensive care admissions as well as such things as ventilator requirements, timing and size of peak loads and duration of an outbreak.

The Gompertz Function Model is based on observed data and the model parameters can be calculated very rapidly by a simple transparent process (nonlinear regression) for which there are numerous 'off the shelf' implementations.

Fortunately, given how little we know about how coronaviruses are transmitted, [1] using the Gompertz Function Model to forecast hospital demand doesn't require models of infections, cases or the proportion of cases that will require hospitalisation. It is based simply on counting hospital and intensive care admissions.

By analysing a multi-year record of daily influenza cases in Portugal, we have shown the annual influenza cycle there is described to very good accuracy by alternating periods of Gompertz Function growth and periods of linear growth (where the Reproduction Number must be approximately equal to 1).

These cycles of Gompertz Function and linear growth provide a natural way of separating epidemic waves (as in the 'Spanish Flu') and of observing the transition from epidemic to endemic phases of disease.

Alternating Gompertz Function growth and linear growth is exactly what has been observed in Covid-19. The initial outbreaks began with Gompertz Function growth and then switched to linear growth. Subsequent outbreaks in 2020 followed the normal seasonal pattern of influenza [9] and other common respiratory illnesses. As yet, none of the seasonal 'slots' have been missed out-so we should be alert for repetitions of this pattern. After large scale vaccination programs there have been some outbreaks in periods other than the normal seasonal ones. In each case they followed the same alternating Gompertz Function growth-linear growth regime.

This Extended Gompertz Function Model of alternating Gompertz Function and linear growth phases provides an important mechanism for the early observation of new outbreaks and the ability to accurately forecast resulting demands on the healthcare system: identify transition from a linear growth regime to Gompertz Function growth, then use the Gompertz Function Model to provide forecasts.

For a Gompertz Function, the speed of infection peaks with $1/e$, or approximately 37%, of the susceptible population infected and declines steadily thereafter. This marked asymmetry with rapid growth and slow decay is observed to very good agreement in all of our data. It is also consistent with recent generalisations of compartmental models that remove the unrealistic assumption of a perfectly homogeneous susceptible population.

The dynamics of the Gompertz Function are determined by the Gumbel probability distribution. This is an exceptional distribution with respect to the geometry determined by the affine group on the line which is the key to the Gumbel distribution's role as an Extreme Value Theory attractor. We show that this, together with the empirically observed asymmetry in epidemic data, makes Gompertz Function growth an essentially inevitable feature in epidemic models which agree with observations.

In Section 2 we discuss the purpose and predictive power of models and review the performance of the Gompertz Function model in a wide variety of infectious disease outbreaks prior to 2020 and the Covid-19 epidemic.

Next we illustrate the Gompertz Function Model in the initial Covid-19 outbreaks. Our focus is on healthcare and the predictive power to make forecasts of practical use for planning.

In Section 3 we discuss the mathematical properties of the model. We show how these can be used to infer features of disease dynamics, including the reproduction number and herd immunity thresholds.

In Section 4 we use a multi-year data set of Influenza cases in Portugal as a guide to extend the epidemic model to include epidemic-endemic cycles where Gompertz Function growth alternates with linear growth. We also show that linear growth periods separated the waves of the 'Spanish Flu' in England.

Finally we demonstrate the same sequence of Gompertz Function Growth and Linear Growth in Covid-19 outbreaks.

In Section 5 we discuss a number of important aspects of the fits of Gompertz Functions to data. Section 6 is a brief note on the relation between the Gompertz Function Model and compartmental models.

In Section 7 we discuss the use of the Gompertz Function Model in planning for new outbreaks of Covid-19 as well as for seasonal Influenza and future pandemics.

Appendix A provides a brief account of the geometry behind the properties of the Gumbel probability distribution that underlies the Gompertz function and the way the Gumbel distribution's role as an Extreme Value Theory attractor helps to explain the ubiquity of Gompertz Functions in observations of epidemic data.

Appendix B contains links to our Covid-19 data sources.

The purpose of mathematical models is to allow us to see into the future: to predict things such as the number of people who will be admitted to hospital with Covid-19 in the next two weeks or the trajectory of a Mars lander, for example. The distance we can see and the accuracy of our vision may vary greatly, as it does in these two cases, but predicting the future is the fundamental goal of mathematical models of processes in the real world. In our case, we will focus on making short term predictions (a week to several weeks) with a 'reasonable' degree of accuracy (say within 10-20%), which is a very different goal from that of the models which are used to create scenarios outlining a wide range of possible futures contingent on different events or policy decisions.

Early in 2020, people feared that the Covid-19 epidemic was growing exponentially. This was a model that predicted a catastrophic future with explosive growth in the numbers of cases and deaths.

But Michael Levitt observed [12] that cumulative cases in the Covid-19 outbreak seemed instead to be growing according to a Gompertz Function. This is a simple 3-parameter phenomenological model for epidemic growth in which the number of cases at time t is:

where $N$, $a$ and $b$ are parameters to be calculated from the reports of daily case numbers. By contrast with exponential growth, where the doubling time is constant, in Gompertz Function growth the time required for cases to double increases rapidly, damping the epidemic more and more as it progresses.
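The lengthening doubling time can be illustrated with a minimal sketch. It assumes the Gompertz Function has the form $X(t) = N e^{-e^{-(at-b)}}$; this parameterisation, the function names and the parameter values are assumptions chosen for illustration only.

```python
import math

def gompertz(t, N, a, b):
    """Cumulative count at time t, ASSUMING X(t) = N*exp(-exp(-(a*t - b)))."""
    return N * math.exp(-math.exp(-(a * t - b)))

def doubling_time(t, a, b):
    """Days needed for X to double from its value at time t.

    Returns None once X(t) >= N/2, because the count can then never double.
    """
    u = math.exp(-(a * t - b))          # X(t) = N * exp(-u)
    if u <= math.log(2):
        return None
    # Doubling needs u to fall to u - ln 2, and u decays like exp(-a*t)
    return math.log(u / (u - math.log(2))) / a

# Hypothetical parameters, for illustration only
N, a, b = 30000.0, 0.08, 2.5
for day in (0, 5, 10, 15):
    print(day, round(gompertz(day, N, a, b)), doubling_time(day, a, b))
```

Under these assumptions the printed doubling times grow from day to day, in contrast with the constant doubling time of exponential growth.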

And this is what was subsequently seen as the Covid-19 epidemic spread: a very asymmetric growth pattern with a rapid rise and a much slower fall off after the peak. This is apparent in the Covid-19 deaths reported daily in Italy in the spring of 2020, shown in Figure 1. Because of the large degree of variability in the daily data, and to illustrate the trend in the data, we have included a rolling 7-day average. This shows the high degree of asymmetry in the daily deaths, which rise much more quickly than they fall. The cumulative deaths for the same period are shown in Figure 2. Figure 3 shows how well the Gompertz Function Model fits the cumulative Covid-19 deaths data.

The daily differences between Gompertz Function values are shown together with the 7-Day average of daily deaths in Figure 4 .

Levitt and his co-authors [13] showed subsequently that in hundreds of countries and states worldwide the Gompertz Function Models obtained by fitting observed data could be used to predict final case and death levels as the epidemic continued and more data became available.

This should not have come as a surprise. It is precisely what the historical record told us to expect, as we show in the next section.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. This version posted December 29, 2021.

In this section we show that the Gompertz Function Model is a generically good description of epidemic data from a wide variety of epidemics over the past 170 years.

In each case we display the 'best fit' of a Gompertz Function Model to the cumulative epidemic data at the end of the outbreak, where the best fit is determined by non-linear regression using a standard Python routine. It is important to note that the quality of the Gompertz Function fits shown in all the examples here implies that the strong asymmetry we noted in Italy's Covid-19 deaths data is, to a good approximation, a regular feature of epidemic growth. In addition, Figures fig:PortugalInfluenza17-13 show that the following exam-

From Denmark in 1853, a time when even the bacterial origin of Cholera was still to be discovered, through the 'Spanish Flu', the greatest pandemic of modern times, to major bacterial and viral epidemics prior to Covid-19, the data show us that the Gompertz Function has been an accurate description of the outcome of all of these outbreaks.

Examples Prior to 2020 about what we will see in the next days or weeks.

To assess the predictive power of the Gompertz Function model we need to know if the best Gompertz Function fit at time t allows us to predict numbers for some time t + τ into the future with a reasonable degree of accuracy. For concreteness we'll use an error of ±10% as our desired level of accuracy and measure the time τ for which a fit remains within that error.

We illustrate this process in Figure 14 for Cholera in Denmark in 1853, in Figure 15 for Ebola in Sierra Leone in 2014-15 and in Figure 16 for the Portuguese 2017-18 Influenza season.

To simplify the figures, only the out of sample part of the Gompertz Function is graphed. Error bands of ±10% around the data show how quickly a high level of accuracy is achieved.

To judge the quality of successive predictions, we want to measure the gap between the predicted value $X(t)$ and the observed value $X_{obs}(t)$ as a percentage of $X(t)$. In other words we want to judge the extent to which the observation differed from the prediction, as a percentage of the prediction (see footnote 10). We will use the absolute value of the error to simplify the graphs. So we display the error in % defined as

$$\text{error (\%)} = 100 \times \frac{|X(t) - X_{obs}(t)|}{X(t)}.$$

10 This corresponds to putting a band around the prediction rather than the data.
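The error measure and the horizon $\tau$ over which a fit stays inside a chosen band can be sketched as follows. The function names and the sample numbers are hypothetical and serve only to illustrate the calculation.

```python
def percent_error(predicted, observed):
    """Absolute gap between observation and prediction, as a % of the prediction."""
    return 100.0 * abs(predicted - observed) / predicted

def horizon_within(predictions, observations, tolerance=10.0):
    """Count consecutive out-of-sample steps whose error stays below tolerance."""
    steps = 0
    for p, o in zip(predictions, observations):
        if percent_error(p, o) >= tolerance:
            break
        steps += 1
    return steps

# Hypothetical out-of-sample values, for illustration only
predicted = [100.0, 110.0, 120.0, 130.0, 140.0]
observed = [98.0, 105.0, 112.0, 115.0, 118.0]
print(horizon_within(predicted, observed))          # steps within 10%
print(horizon_within(predicted, observed, 20.0))    # steps within 20%
```

Dividing by the prediction rather than the observation matches the choice of putting the band around the prediction.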

Figure 14: Sequence of weekly updates of Gompertz Function fits to Danish Cholera Cases

The Danish Cholera epidemic evolved very rapidly. The case data begin on 6 August and by 6 October 1853, less than 9 weeks later, the epidemic was over.

The first Gompertz Function fit on 26 August, as Figure 14 shows, soon proved over-pessimistic. Its out of sample error was less than 10% for only 3 days. Nevertheless even this early fit would have been useful: the error remained less than 20% for 12 days. Figure 17 shows how quickly the predictive power of subsequent fits increases. The fit made one week later, on 2 September, was never out by more than 6%. The remainder of the weekly update fits had errors of less than 1% for the duration of the epidemic.

We have WHO Situation Reports with weekly updates on Ebola cases in Sierra Leone for 52 weeks from 21 September 2014 to 13 September 2015.

The first fit (not shown) was done with six weeks of data (i.e. with only 6 data points) on 26 October. Its out of sample error was less than 10% for only 1 week but remained below 15% for 2 weeks. Subsequent fits gained predictive power rapidly. For the fit made two weeks later the error was less than 10% for 2 weeks and below 15% for 3 weeks.

The error from the first fit shown in Figure 18 (16 November 2014), with only 9 data points, was never more than 11%. Subsequent updates to the Gompertz Function fits were done every 4 weeks. Their out of sample errors all remained below 10% for the duration of the outbreak.

Figure 17: Evolution of the in and out of sample errors in the Gompertz Function fits to Danish Cholera Cases.

Figure 19: Evolution of the in and out of sample errors in the Gompertz Function fits to Portuguese Influenza Cases.

The Portuguese Influenza Case data begins Gompertz Function growth on 1 December 2017 and continues until 30 April 2018. The out of sample error of the 11 January fit remained below 20% for eleven days but was below 10% for only 5 days. For the fit made two weeks later, the out of sample error remained below 10% for more than a week. From 1 February's fit this level was maintained for almost 3 weeks out of sample.

These examples illustrate the predictive power that precedes the final excellent Gompertz Function fits to the observed data. The differences in the time taken to reduce out of sample errors to our 10% target depend on the parameters a and b in a way that we will return to after outlining the dynamics of the Gompertz Function.

First we examine the performance of the model in the 2020 Covid-19 epidemic across a variety of locations.

Covid-19 has produced an unprecedented volume of data from countries all over the world.

In addition to case and death data, many locations published data on the medical consequences of the outbreak: hospital and intensive care admissions for Covid-19, and/or daily updates of total hospital and intensive care patients, numbers on ventilators etc.

Any process driven by the epidemic-such as hospitalisations-that also follows Gompertz Function growth will be predictable.

We should expect that cumulative hospitalisations will evolve as a percentage of cases. Likewise a certain percentage of Covid-19 patients will be admitted to Intensive Care Units or other specialised treatment. But these percentages cannot be known in advance and may vary over time.

The Gompertz Function Model doesn't require that we know, estimate or approximate these percentages in advance. As data accumulate (typically two weeks of daily observations is enough to start) Gompertz Function fits can be obtained and tested for their predictive power. Based on previous epidemics, we can expect predictive power to increase quickly as time goes on.
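The fit-then-test workflow can be sketched in plain Python. This is not the regression routine used for the figures: it is a simple stand-in that scans candidate values of $N$ and, for each, obtains $a$ and $b$ from a straight-line fit. It assumes the form $X(t) = N e^{-e^{-(at-b)}}$, strictly positive cumulative counts, and synthetic data generated from hypothetical parameters.

```python
import math

def gompertz(t, N, a, b):
    """ASSUMED form of the model: X(t) = N * exp(-exp(-(a*t - b)))."""
    return N * math.exp(-math.exp(-(a * t - b)))

def line_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def fit_gompertz(ts, xs, n_grid=400, max_ratio=20.0):
    """Scan candidate final levels N; for each, get a and b from a straight line.

    For fixed N, log(-log(X/N)) = b - a*t is linear in t, so only N is searched.
    The candidate with the smallest squared error in the counts is returned.
    """
    top = max(xs)
    best = None
    for k in range(n_grid + 1):
        N = 1.001 * top * max_ratio ** (k / n_grid)   # N must exceed every count
        zs = [math.log(-math.log(x / N)) for x in xs]
        slope, intercept = line_fit(ts, zs)
        a, b = -slope, intercept
        sse = sum((gompertz(t, N, a, b) - x) ** 2 for t, x in zip(ts, xs))
        if best is None or sse < best[0]:
            best = (sse, N, a, b)
    return best[1:]

# Synthetic "observations" from hypothetical parameters, for illustration only
true = (30000.0, 0.08, 2.5)
days = list(range(40))
counts = [gompertz(t, *true) for t in days]

N_fit, a_fit, b_fit = fit_gompertz(days, counts)
forecast = gompertz(53, N_fit, a_fit, b_fit)          # two weeks beyond the sample
actual = gompertz(53, *true)
print(round(N_fit), round(a_fit, 4), round(b_fit, 3))
print(round(100 * abs(forecast - actual) / forecast, 2), "% out-of-sample error")
```

In practice a library non-linear least-squares routine would normally replace the grid scan; the point of the sketch is only that no estimate of hospitalisation percentages is needed, just the cumulative counts themselves.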

We have used hospital and death data as our primary illustrations rather than cases. Hospital admissions generally involve a diagnosis of illness rather than a simple test for the presence of the virus so they should be expected to provide a good proxy for the progress of serious cases of infection. By contrast, case data are subject to considerable uncertainty (regardless of testing regime) and are frequently revised retrospectively. Hospital data are available in many locations with a short lag. This means that historic data for hospitalisations and intensive care admissions will be a very good approximation of the data and model performance we would have seen in real time, as long as the lags are taken into account.

For the Northern Hemisphere, the locations we use to illustrate the Gompertz Function fits are, in order of decreasing latitude, Sweden, London, Île-de-France, the Province of Ontario and Portugal. In the tropical Southern Hemisphere we use the State of Rio de Janeiro. These are all of roughly the same order of magnitude in population (approximately 10 to 15 million people).

For an equatorial example we use the Brazilian State of Amazonas (approximately 4 million people).

We have been able to obtain data for Hospital admissions in all of these locations and for ICU admissions in all but London and Ontario. We have data for the ICU admissions for England, Wales and Northern Ireland combined, and have used that as a proxy comparison for London.

We also have daily records of the number of Covid-19 patients in Intensive Care and in Ventilator beds for London. These data are well approximated by the derivative $\dot X$ of a Gompertz Function and hence predictable. We illustrate this in Figure 26 where we have made a fit to the cumulative.

Across all of these examples, the Gompertz Function fits are excellent. As a result, the predictive power was also good, exactly as illustrated in the previous epidemics. This is particularly important for making accurate projections of health care loads.

These examples show that, as expected, the good fits observed in Figures 20 to 41 mean that as the epidemic progressed, the errors in the out of sample predictions of the Gompertz Function model improved to remain below the 10% level we specified.

This convergence is close to monotone for periods long enough to be practical for use in planning for health infrastructure demands. Knowledge that realistic short to medium term estimates of these demands can be made is extremely important given the universal fear of overwhelming healthcare systems that the Covid-19 epidemic has generated.

While the errors always converge to a level below 10%, the time this takes varies significantly across our examples. The errors at the beginning of the in-sample data also remain large even as they converge to very low levels out of sample as the epidemic progresses. In order to discuss these and other regularities and to examine the extension of the Gompertz Function Model to subsequent Covid-19 outbreaks, we will make use of some observations about the Gompertz Function dynamics to which we devote the next section.


If $X(t)$ is the cumulative number of cases at day $t$ and $N$ is the final (as yet unknown) level, then, writing

$$X(t) = N\,G(t, a, b),$$

$G(t, a, b) = X(t)/N$ is the fractional number of cases at time $t$.

Therefore $G$ is a non-decreasing function whose values lie in $[0, 1]$, that is, a probability distribution (see footnote 13). The distribution $G(t, a, b)$ has (observable) scale and location parameters $a$ and $b$ which vary over time but converge as the epidemic tends to its conclusion.

For the Gompertz Function, G is the Gumbel distribution

The constant $N$ determines the final scale of the epidemic, but the dynamical properties of the Gompertz function are completely determined by the Gumbel distribution (see footnote 14). In addition to the daily rate of change of $X$, denoted by $\dot X$, a critical quantity for the impact on a population is the relative rate of change $\dot X / X$: the new infections or deaths each day as a fraction of the total population affected so far.

Since $X = N\,G$, $\dot X / X = \frac{d}{dt}\log(G(t, a, b))$. It's immediate from the definition of the Gumbel distribution that

It follows from the equation for $q$ that, unlike the case of exponential growth where the relative rate of change is constant, in Gompertz Function growth the relative rate of change of $X$ is decreasing exponentially as time increases. Using the first difference approximation to $\dot X / X$, the derivative of $\log(X)$, we have $\dot X / X \approx \log(X(t+1)) - \log(X(t))$.

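Since $\dot X / X = q$, this means that $\log(\log(X(t+1)) - \log(X(t))) \approx \log(q(t))$, which is a linear function of $t$ because $q$ decays exponentially.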
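As a worked sketch of this straight-line relation, suppose the Gumbel distribution is written with $at - b$ in the inner exponent. This placement and sign of $b$ is an assumption made for illustration; other conventions change only the intercept of the line, not its slope.

```latex
\begin{align*}
G(t,a,b) &= e^{-e^{-(at-b)}} && \text{(assumed parameterisation)}\\
q(t) = \frac{\dot X}{X} &= \frac{d}{dt}\log G(t,a,b) = a\,e^{-(at-b)}\\
\dot q(t) &= -a\,q(t)\\
\log q(t) &= (\log a + b) - a\,t
\end{align*}
```

Under this assumption the line has slope $-a$ and intercept $\log a + b$, so both parameters can be read off from it.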

We can observe $a_i$ and $b_i$ from the slope and intercept of the line at time $t_i$. Of course in practice the plot of $\log(\log(X(t_{i+1})) - \log(X(t_i)))$ is only approximately a straight line and the observations of the parameters will be volatile, especially early in the epidemic. Nevertheless, it's easy to see in epidemic data that, as time increases, this series trends downward steadily and tends to a straight line with negative slope (see Figure 49).

While the Gompertz function never increases exponentially, it does grow rapidly at first with both its velocity $\dot X$ and its acceleration $\ddot X$ increasing (see footnote 15). The parameters $aN$ and $a^2N$ control these rates of growth. Note that $a$ has units of $t^{-1}$ so that $aN$ and $a^2N$ have the correct units for the velocity and acceleration of $X$.

13 For example, a common 'S-curve' from population dynamics is the Logistic function.

14 As we explain in the Appendix, the Gumbel distribution is not just another 'S-curve' but is exceptional from the point of view of the geometry of the affine group on the line (the 'Location-Scale' transformations). This geometry is what explains the Gumbel distribution's role as an 'attractor' in Extreme Value Theory and contributes to its apparent inevitability in epidemic behaviour.
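The straight-line behaviour of $\log(\log(X(t+1)) - \log(X(t)))$ noted in this section can be turned into a quick parameter estimate. The sketch below assumes daily spacing, strictly increasing positive counts and the form $X(t) = N e^{-e^{-(at-b)}}$; the function names and the synthetic data are hypothetical.

```python
import math

def line_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def estimate_from_log_increments(ts, xs):
    """Estimate (N, a, b) from the line log(log X(t+1) - log X(t)) against t."""
    ys = [math.log(math.log(x1) - math.log(x0)) for x0, x1 in zip(xs, xs[1:])]
    slope, intercept = line_fit(ts[:-1], ys)
    a = -slope
    # For the assumed form, log X(t+1) - log X(t) = exp(-(a*t - b)) * (1 - exp(-a))
    b = intercept - math.log(1 - math.exp(-a))
    # log N = log X(t) + exp(-(a*t - b)); evaluated at the latest observation
    N = xs[-1] * math.exp(math.exp(-(a * ts[-1] - b)))
    return N, a, b

# Synthetic cumulative counts from hypothetical parameters, for illustration only
N0, a0, b0 = 30000.0, 0.08, 2.5
days = list(range(60))
counts = [N0 * math.exp(-math.exp(-(a0 * t - b0))) for t in days]
print(estimate_from_log_increments(days, counts))
```

With real data the series is noisy, so the estimates would be volatile early on, as the text describes; the noiseless example only shows that the slope and intercept carry the information.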

Suppose that the daily increments for one process $Y$ (for example ICU admissions) are a multiple $\lambda$ of those for another process $X$ (for example Hospital Admissions) but with a shift of $l$ days. If $X$ is a Gompertz function then

$$\dot Y(t) = \lambda\,\dot X(t - l).$$

It follows that Y is a Gompertz function whose parameters are

and

These relations hold approximately among some of the Gompertz function groups for hospital admissions, ICU admissions and deaths presented in the previous section. For example, although the hospital and ICU admissions data for Île-de-France both begin on the same day, Equation 9 says that the ICU process lags hospitalisations by about 4.2 days. This is also the lag between the maxima of their Gompertz function derivatives. For this case the values of $N$ for ICU and Hospital admissions predict that 17% of those hospitalised will be admitted to ICU. At the end of the data sample a comparison of ICU admissions with Hospital admissions 4 days earlier gives a ratio of 17.8%.
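A short derivation of the lag relations, offered as a sketch. It assumes the Gompertz Function is written $X(t) = N_X e^{-e^{-(a_X t - b_X)}}$; the subscripts and this placement of $b$ are assumptions made for illustration. Integrating the relation between the daily increments introduces no constant because both $X$ and $Y$ vanish as $t \to -\infty$.

```latex
\begin{align*}
\dot Y(t) &= \lambda\,\dot X(t-l) \;\Longrightarrow\; Y(t) = \lambda\,X(t-l)\\
Y(t) &= \lambda N_X \exp\!\left(-e^{-(a_X (t-l) - b_X)}\right)
      = \lambda N_X \exp\!\left(-e^{-(a_X t - (b_X + a_X l))}\right)\\
N_Y &= \lambda N_X, \qquad a_Y = a_X, \qquad b_Y = b_X + a_X\,l\\
l &= \frac{b_Y - b_X}{a_X}, \qquad \lambda = \frac{N_Y}{N_X}
\end{align*}
```

With these conventions the ratio of the fitted $N$ values gives the share $\lambda$, and the difference of the fitted $b$ values divided by $a$ gives the lag in days.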

The Gumbel distribution's features produce a number of characteristic times for Gompertz Function growth. We denote by $T_1(a, b)$ the time at which the acceleration $\ddot X$ reaches its peak. It's easy to check, by finding the zeros of the third $t$-derivative of $X$, that $T_1$ is given by

15 The Gompertz function can fit growth with arbitrarily many positive derivatives. See Section 5 for a brief discussion of the 'exponential growth phase'.

Thus, by the time about 7.3% of the susceptible population $N$ has been infected, the acceleration has reached its peak of approximately $0.309\,a^2N$, and begins to decline quickly.

The velocity, which is approximately $0.191\,aN$ at $T_1(a, b)$, continues to increase until time $T_2(a, b)$

The parameter $a$ is the value of $q$ at $T_2$. The dimensionless parameter $b$ determines the fraction of the final level $N$ observed on day 0:

At time $T_2$, $X(T_2) = N/e \approx 0.368N$, so the velocity $\dot X$ begins to decline once about 37% of the susceptible population is infected. It is easy to verify that the peak velocity is $\dot X(T_2) = aN/e \approx 0.368\,aN$, so between peak acceleration and peak velocity the velocity has almost doubled:

$$\dot X(T_2) = \frac{2e^{\frac{1+\sqrt5}{2}}}{3+\sqrt5}\,\dot X(T_1) \approx 1.926\,\dot X(T_1).$$

The acceleration $\ddot X$ remains negative for all time $t > T_2$ and reaches its minimum at time $T_3(a, b)$ given by

$$T_3(a, b) = T_2(a, b) + s(a), \qquad s(a) = \frac{1}{a}\log\left(\frac{3+\sqrt5}{2}\right).$$

$T_2$ is midway between $T_1$ and $T_3$ and $s(a)$ is the common distance between them. The larger the value of $a$, the shorter the interval between the critical times.

As an indication of how quickly $\dot X$ is growing in the early phase of the epidemic, between time $T_1 - s(a)$, where $X = N e^{-\left(\frac{3+\sqrt5}{2}\right)^2} \approx 0.001N$, and time $T_1$, $\dot X$ increases by a factor of more than 26.

36 All rights reserved. No reuse allowed without permission.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. 0# 0 9 ! D e c ! 2 0 2 0 # 2 3 ! D e c ! 2 0 2 0 # 0 6 ! J a n ! 2 0 2 1 # 2 0 ! J a n ! 2 0 2 1 # 0 3 ! F e b ! 2 0 2 1 # 1 7 ! F e b ! 2 0 2 1 # 0 3 ! M a r ! 2 0 2 1 # 1 7 ! M a r ! 2 0 2 1 # Log#(Log(X(t+1))!Log(X(t)))# Log(Log(Gompertz(t+1))!Log(Gompertz(t)))# Linear#(Log#(Log(X(t+1))!Log(X(t))))# Likewise,Ẍ increases by a factor of approximately 7.3 in the same period. But even over this period of rapid increase, the growth is never exponential. In fact the third derivative of X decreases steadily to 0 over the interval from T 1 (a, b) − s(a) 2 to T 1 (a, b), then remains negative until T 3 (a, b). The decline inẊ after it reaches its peak is slower than its initial growth producing the characteristic asymmetry seen in Figure 50 .

It is not until time $T_3(a,b) + \frac{s(a)}{2}$ that $\dot X$ drops back to about the value it had at $T_1$: $\dot X\left(T_3(a,b) + \frac{s(a)}{2}\right) = a(\sqrt 5 - 2)\,e^{-(\sqrt 5 - 2)}N \approx 0.186\,aN$.

The Gompertz Function Model fits the cumulative observations, so by taking the differences $X(t_{i+1}) - X(t_i)$ we have a model for the daily observations. The Gompertz Function time derivative $\dot X(t_{i+1})$ is a very good approximation to this first difference, as can be seen in Figure 50. The maximum and minimum of the acceleration are located symmetrically around the peak of the Gompertz Function velocity but, as we have seen before, $X$ itself is asymmetric, as Figure 50 shows. For a symmetric 'S-curve', peak velocity comes at the point where $X = \frac{1}{2}N$, while for the Gompertz Function it comes at $X = \frac{1}{e}N \approx 0.368N$. As a result, any growth function $Y$ for which $\dot Y$ is symmetric will underestimate [...]

The natural timescale $s(a)$ can be used to estimate the time remaining in the epidemic if it continues to follow the Gompertz Function.¹⁷ As soon as we have stable values for the parameters $a$, $b$ and $N$, we can calculate the remaining time, in days, to reach any desired fraction of $N$. These times are conveniently described in terms of $T_2$ and $s(a)$.

For example, we have already observed that $X(T_3) = X(T_2(a,b) + s(a)) \approx 0.68N$. We also have $X(T_2(a,b) + 2s(a)) \approx 0.86N$, $X(T_2(a,b) + 3s(a)) \approx 0.95N$ and $X(T_2(a,b) + 4s(a)) \approx 0.98N$.
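The remaining-time calculation can be sketched in a few lines. This assumes the Gompertz form $X(t) = N e^{-e^{-(at+b)}}$, so that $T_2 = -b/a$ and $s(a) = \frac{1}{a}\ln\frac{3+\sqrt 5}{2}$. The helper names and the parameter values are hypothetical and serve only to illustrate the arithmetic.

```python
import math

W3 = (3 - math.sqrt(5)) / 2   # exp(-(a*t + b)) at T3 = T2 + s(a)

def s_of_a(a):
    """Natural timescale s(a): the distance from T2 to T1 and to T3."""
    return math.log((3 + math.sqrt(5)) / 2) / a

def fraction_after_peak(k):
    """X(T2 + k*s(a)) / N; the result does not depend on a or b."""
    return math.exp(-(W3 ** k))

def day_reaching(p, a, b):
    """Day t on which X(t) = p * N, for 0 < p < 1."""
    return (-math.log(-math.log(p)) - b) / a

a, b = 0.05, -3.0          # illustrative values only
t2 = -b / a                # day of peak velocity

for k in range(1, 5):
    # fraction of N reached k timescales after the peak, and the day it happens
    print(k, round(fraction_after_peak(k), 3), round(t2 + k * s_of_a(a), 1))

# Days remaining after the peak until 95% of N has been reached
remaining_to_95 = day_reaching(0.95, a, b) - t2
print(round(remaining_to_95, 1))
```

In practice `day_reaching` would be called with the most recent stable fit, and the current day subtracted from the result to give the time remaining.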

The effective Reproduction Number $R_t$ for an epidemic at time $t$ is defined as the average number of secondary infections that are produced by each primary infection [5]. There is generally no way to observe infections, and this number must be inferred from other quantities such as cases, so the value of $R_t$ is typically subject to considerable uncertainty. Nevertheless, we can draw some conclusions about $R_t$ for an epidemic which is following a Gompertz Function, at least if the growth of cases is a reasonable proxy for the growth of infections.

For any time $t > T_2$, the acceleration $\ddot X < 0$, so $\dot X(t)$ is strictly decreasing. Because the derivative at time $t + 1$ is almost identical to the difference $X(t + 1) - X(t)$, the number of new cases must also be smaller at time $t + 1$ than it was at time $t$. Independent of the number of days $\tau$ that it takes for an infected individual to become infectious, the number of new infections on day $t + \tau$ is less than on day $t$ for all $t > T_2$.

As a result, when the cases in an epidemic follow a Gompertz Function, $R_t$ is less than 1 for all times $t > T_2$, so $X(T_2)$ is the part of the population that must be infected for the epidemic to begin to decline. This is the Herd Immunity Threshold.¹⁸ The generically good fits of the Gompertz Function Model to epidemic data make it clear that the empirically observed Herd Immunity Threshold across all of the outbreaks illustrated above is always close to $X(T_2) = \frac{1}{e}N \approx 0.368N$.
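The monotonicity argument can be illustrated numerically. The sketch below assumes the Gompertz form $X(t) = N e^{-e^{-(at+b)}}$ and uses the ratio of new cases on day $t+\tau$ to new cases on day $t$ as a crude stand-in for $R_t$. That ratio is an assumption made for illustration only and is not an estimator of $R_t$. The parameter values are hypothetical.

```python
import math

def gompertz(t, a, b, n):
    """Cumulative level X(t) = N * exp(-exp(-(a*t + b))) (assumed form)."""
    return n * math.exp(-math.exp(-(a * t + b)))

def daily_new(t, a, b, n):
    """New cases recorded between day t and day t + 1."""
    return gompertz(t + 1, a, b, n) - gompertz(t, a, b, n)

def growth_ratio(t, tau, a, b, n):
    """New cases on day t + tau divided by new cases on day t (crude proxy)."""
    return daily_new(t + tau, a, b, n) / daily_new(t, a, b, n)

a, b, n = 0.05, -3.0, 100000.0   # illustrative values only
t2 = -b / a                      # day of peak velocity

# After T2 the ratio stays below 1 whatever delay tau is chosen
after_peak = [growth_ratio(t, tau, a, b, n)
              for t in range(math.ceil(t2), 200)
              for tau in (3, 5, 7)]

# Well before T2 the same ratio is above 1
before_peak = growth_ratio(20, 5, a, b, n)

print(max(after_peak), before_peak)
```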

For an epidemic which passes the Herd Immunity Threshold and continues to follow a Gompertz Function Model, $R_t$ will tend to 0 and the epidemic will simply die out (as in the case of the Danish Cholera epidemic of 1853). But rather than dying out, the disease may become endemic with periodic outbreaks. These may be sporadic, as is the case with Ebola. Or the outbreaks may follow an annual cycle, as they do in influenza.

In endemic disease, $R_t$ must be very close to 1. If it were exactly 1 then the growth in cumulative cases would be linear. Conversely, if the cumulative is exactly linear, then the number of cases between $t$ and $t + t_1$ is the same for every $t$, whatever the value of $t_1$, and this means that $R_t = 1$.

If $R_t$ is very close to 1, there will be fluctuations in the number of daily cases but the graph of cumulative cases from time $t$ to $t + t_1$ will be nearly a straight line whose slope is the average number of cases in that period.
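One way to test whether a stretch of cumulative data is 'nearly a straight line' is to fit a line by ordinary least squares and report $r^2$ together with the maximum and average absolute percentage errors. The sketch below is one possible implementation using only the standard library. The synthetic series (daily counts alternating between 45 and 55) is invented for the example and is not data from the paper.

```python
def linear_fit_report(cum):
    """Least-squares line through (day, cumulative count) with fit diagnostics."""
    n = len(cum)
    ts = range(n)
    mt = sum(ts) / n
    mc = sum(cum) / n
    stt = sum((t - mt) ** 2 for t in ts)
    stc = sum((t - mt) * (c - mc) for t, c in zip(ts, cum))
    slope = stc / stt                      # average daily cases over the window
    intercept = mc - slope * mt
    fitted = [intercept + slope * t for t in ts]
    ss_res = sum((c - f) ** 2 for c, f in zip(cum, fitted))
    ss_tot = sum((c - mc) ** 2 for c in cum)
    pct_err = [100.0 * abs(c - f) / c for c, f in zip(cum, fitted)]
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_res / ss_tot,
        "max_abs_pct_err": max(pct_err),
        "mean_abs_pct_err": sum(pct_err) / n,
    }

# Synthetic example: fluctuating daily counts with a constant average of 50
daily = [45 if i % 2 == 0 else 55 for i in range(60)]
cumulative = [1000]
for d in daily:
    cumulative.append(cumulative[-1] + d)

report = linear_fit_report(cumulative)
print(report)
```

The fitted slope recovers the average daily count even though the daily series itself fluctuates, which is the sense in which the cumulative graph is smoother than the daily one.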

So if an epidemic ends with a transition from Gompertz Function growth to linear growth, this is an indication that the disease has become endemic.

¹⁸ We are using the proportion of the population that needs to be immune in order for new infections to decline as our definition of the Herd Immunity Threshold [14].

In our analysis of the initial Covid-19 outbreaks we saw exactly this process repeated again and again: Gompertz Function growth ended with a transition to linear growth. For example, Figure 51 shows the best line fit to London Covid-19 Hospital Admissions after the Gompertz Function growth shown in Figure 23. The linear fit has $r^2 = 0.997$, its maximum absolute error is less than 0.6% and its average absolute error is only 0.24%. Linear growth is not just a good approximation. It is an extremely good approximation.¹⁹

Influenza is an endemic disease which has regular outbreaks of epidemic infection. Thus it provides a natural 'observatory' for the endemic-epidemic cycle. In the next section we show that transition from Gompertz Function growth to linear growth (twice) is precisely what happens in influenza cases in Portugal in a regular annual pattern.

The Portuguese government has published daily influenza case data beginning 1 November 2016.²⁰ The main 'Flu Season' runs from approximately November to April.²¹ Figure 52 shows the annual cycle for recorded Influenza Cases. Each year there is a winter peak followed by a dramatic reduction in cases to a much lower level through the late spring and summer. But cases never die out completely and in early autumn they begin to rise again. The graph of cumulative cases (Figure 53) reveals further regularities with consistent annual patterns. The cumulative cases through each of the annual peaks follow Gompertz Function growth (shown in red).²² We have already shown (Figure 12) that the 2017-18 outbreak from 1 December 2017 until 30 April 2018 followed the Gompertz Function Model. Figures 54 to 56 show the same thing for the remaining years in Figure 53. Figure 57 shows that even though Influenza cases have become much less prevalent, the same Gompertz Function growth marks the main seasonal outbreak in 2020-2021.

Another important regularity which is visible in the cumulative cases is that the periods of Gompertz Function growth end in a transition to linear growth (just as we have already observed was the case with the initial Covid-19 outbreaks). But each of the main influenza outbreaks is also preceded by a period of linear growth. Figures 58 and 59 show the linear entry to and exit from the 2016-17 Gompertz Function growth phase. The linear fits to the cumulative case graphs shown in these entry and exit figures have $r^2$ values of over 0.99 and very small absolute errors between the data and the linear fit, so this is not simply 'close' to linear, it is almost exactly linear.

²² The data from the 2020-2021 season, which also follows Gompertz Function growth as we show below, is invisible on the scale of this graph.

There is a further regularity in the 'off season' portions of the cumulative case data. The growth in cumulative cases continues to be piecewise linear, with drops in the slope through the summer minimum. For example, in 2017 the exit from Gompertz Function growth in April had a slope of 102 cases per day which dropped in two stages to only 21 cases per day by August.

But the entry to the Gompertz Function growth phase in November 2017 ( Figure 60 ) had a slope of 125 cases per day. How did the case rate increase to this level?

Our conjecture when we observed this was that there was an intermediate phase of rapid Gompertz Function growth whose exit is to linear growth at the higher rate. This is consistent with the apparent universality of Gompertz Function growth in epidemic outbreaks that we have seen in all of our other examples.

And this is precisely what we found in the Portuguese Influenza case data, not just in 2017 but each year, including 2020. Figures 61 and 62 show the cycle from January to December 2017 and the detail of the piecewise linear spring and summer decline followed by a Gompertz Function growth outbreak in September and October. This is followed (as Figure 60 shows) by linear growth at the higher rate (see also Figures 63 to 66). Table 1 shows the dates of the Gompertz Function growth and linear growth cycles from December 2016 to March 2021.

It is clear that, at least in Portugal, the influenza cycle has, to a very good approximation, two modes: linear growth and Gompertz Function growth. Moreover, the Gompertz Function growth phases agree with Edgar Hope-Simpson's cycle in the Northern Temperate Zone [9], so we expect to see this pattern repeated in daily influenza cases in other countries as well.

If this conjecture is borne out, it would have very important implications for the prediction of the influenza cycle and the resulting demands on healthcare systems, which are often severely stressed in annual influenza outbreaks. During the Gompertz Function growth periods the Gompertz Function Model has good predictive power. In the piecewise linear growth periods linear extrapolation will suffice.

The prediction problem is therefore reduced to identifying the transitions, especially from linear to Gompertz Function growth.

We now show that the same alternation of growth modes separated the waves of the 'Spanish Flu' epidemic.

The worst pandemic of the last century, the 'Spanish Flu', was felt in waves in various locations around the world [6]. In England and Wales, it came in three distinct waves, the mechanisms behind which are still uncertain [20], [7]. Weekly records of Spanish Flu deaths from the Registry for England and Wales²³ record deaths parish by parish across England and Wales. The cumulative deaths show periods of linear growth (Figures 67 to 70). In the complement to the linear growth periods we see excellent fits by Gompertz Functions. (Figures 71, 72 and 73 show the fits to the cumulative cases from the beginning to the end of each period.) The calculation of $T_2$ in each wave gives peaks which agree with the epidemic wave peaks described in the report of the Registrar for England and Wales [11].

²³ We are grateful to D. Earn for providing this data.

The cumulative deaths over all three waves are shown in Figure 74 with the Gompertz Function growth periods in red.

In linear growth periods, as we have already noted, the Reproduction Number must be very close to 1, which indicates that at the end of Waves 1 and 2 the dominant variant had become endemic. There appears to be no generally accepted mechanism that would explain the waves observed in the Spanish Flu [20], [7]; however, the linear growth periods suggest that Waves 2 and 3 may have been driven by variants of the original virus. This is also consistent with the increased virulence of the second wave relative to either Wave 1 or Wave 3.

In Figure 8 we illustrated the fit of the Gompertz Function to Spanish Flu deaths in Prussia from the beginning of the outbreak until 15 December 1918. From that date, cumulative deaths grew linearly, as Figure 75 shows. The linear fit to the data shown is almost exact, with a maximum absolute error of only 0.23% and $r^2 > 0.99$. We conclude from this that by mid-December of 1918 the Prussian Spanish Flu epidemic phase was over.

[Figure: cumulative deaths in England and Wales, 4 January to 8 February 1919, with linear fit and extrapolated linear fit.]


We have already noted that the initial Covid-19 outbreaks in our examples all ended with a transition from Gompertz Function growth to linear growth. Figure 51 showed this for Covid-19 Hospitalisations in London. After the linear exit from the Gompertz Function growth phase in London Hospitalisations, there was piecewise linear growth with progressively lower slopes through the summer of 2020.

But, exactly as in Hope-Simpson's influenza cycle, in September 2020 Covid-19 hospitalisations started rising again in Gompertz Function growth (Figure 76). This was followed by linear growth from 1 November to 7 December, as Figure 77 illustrates.

Then the main outbreak came at the height of the annual respiratory virus season ( Figure 78 ). It ended with a transition to linear growth in March 2021.

Covid-19 hospitalisations, ICU admissions and deaths in London repeated exactly the seasonal cycle of alternating Gompertz Function growth and linear growth we observed in Portuguese influenza cases.

We have seen the same phenomenon repeated in multiple locations. We illustrate this next for Portugal's ICU admissions. Like London, Portugal repeated the same pattern of linear and Gompertz Function growth that we observed in the Portuguese influenza case records. Figure 79 shows the late summer outbreak of Gompertz Function growth, followed by the main outbreak from January to March 2021 (Figure 80). This ended, as usual, with a transition to linear growth.

In data from Sweden, Île-de-France and Ontario, once again we find this influenza-like alternation of linear and Gompertz Function growth in the late summer and the autumn of 2020.

In each of these three locations there was an outbreak of Gompertz Function growth in the usual winter period, ending as usual with a period of linear growth.

But then, unlike London and Portugal, there was a further 'wave' beginning in March (late, but still within the seasonal period identified by Hope-Simpson). The two waves were separated by linear growth, which indicates that a different variant may have been responsible for the second one.

In Rio de Janeiro and Amazonas we have also seen the alternation of Gompertz Function growth and linear growth. The tropical region influenza cycle is more complicated than in the temperate zone, so a comparison of the timing of Covid-19 outbreaks and the annual influenza cycle will be explored elsewhere. Figures 90-93 for hospitalisations in the State of Rio de Janeiro and deaths in Amazonas show how well the Gompertz Function fits the second round of Covid-19 outbreaks. Table 2 shows a sample of dates for alternating Gompertz Function and linear growth.


Early in the Covid-19 epidemic the doubling time was a matter of daily speculation and concern. Then, and subsequently, the phrase 'exponential growth' has been used to describe Covid-19 outbreaks by many people who clearly have only a tenuous grasp of its meaning.

Epidemics do indeed grow rapidly, even extremely rapidly, at first. Epidemiologists often make use of the idea that epidemics have an initial exponential growth phase, but this is really an approximate feature of compartmental models,²⁴ not an empirical observation. In a study using a range of data sets from real epidemics, Chowell et al. [5] show that subexponential growth is the rule rather than the exception in the early phase of an epidemic.

The utility of the 'exponential growth' approximation is that it allows modellers to estimate model inputs. While there is often a reasonably good fit to an exponential function in the early phase in epidemic data, such fits rapidly lose any predictive power. Exponential growth is simply too explosive to be sustainable. Figures 94 and 95 show the Gompertz Function fit and the best exponential function fit (using the same nonlinear regression algorithm) for Covid-19 hospitalisations in Portugal from 5 March to 9 April 2020. It's (just) possible to see from the graphs that the Gompertz Function fit is closer to the data. But in fact the Gompertz Function fit is much better: the mean squared error of the exponential fit is 1.8 times as large as it is for the Gompertz Function over the sample period.

There is also a very big difference in the loss of predictive power out of sample. Both fits produced over-estimates of future hospitalisations. By 9 April, 14,600 people had been hospitalised for Covid-19 in Portugal. By 15 May a total of just over 50,000 hospitalisations were recorded. The exponential fit would have reached 1.3 million and the Gompertz Function fit 274,000 if they were projected out to 15 May.

But it's not the long term predictions that matter on 9 April. We've already seen that by 16 April the Gompertz Function fit (Figure 46) had absolute errors below 10% for more than a week and below 15% for three weeks. The important question is how useful the predictions made on 9 April would have been in the short term. Figure 96 shows the out of sample errors for the two fits.

Remember that hospitalisation data is typically delayed by two days, so that while the first out of sample day is 10 April, the calculations could not have been made until 11 April. Only on 12 April is the out of sample period in the future of the calculation. By this time the error in the exponential growth fit is already almost -18%, compared with -12% for the Gompertz Function. By 16 April the exponential fit error is -38% compared to -25% for the Gompertz Function.

²⁴ For example, in an SIR model, if we assume that the entire population is susceptible at the outset, so $S/N \approx 1$, then while that condition holds, $I$ grows exponentially. But the SIR equations also show that the evolution of $S$ will rapidly violate the condition $S/N \approx 1$.

We have seen that out of sample error levels for Gompertz Function fits drop rapidly to 10% or less for extended periods as the epidemic progresses. But in the early stages errors over a period of a week or more will generally exceed this level. This is because there are two distinct phases of Gompertz function growth.

We have shown evidence for the Extended Gompertz Function Model of alternating periods of Gompertz Function Growth and linear growth. Within the former we generally observe two phases. While there are excellent Gompertz Function fits all through an epidemic, the Gompertz Function that fits the data for, say, the first 3 weeks is different from the one that fits them 3 weeks later.

We illustrate this again with Portuguese Hospitalisations from the initial Covid-19 epidemic. Figure 33 shows how well the Gompertz Function fits the hospitalisation data from 5 March to 15 May 2020. But we can also see from Figure 97 that the Gompertz Function fitted on 9 April is a much better fit to the data from 5 March to 9 April. The mean squared error over this part of the sample is almost four times as large for the 15 May fit as it is for the one on 9 April. This behaviour is generic. It is the reason that there are still large in-sample errors in the initial weeks for fits whose out of sample errors fall dramatically, as in Figures 42 to 48.

Note that this is inconsistent with the idea that early epidemic observations can be viewed as being a 'noisy' version of the values from the final Gompertz Function. The noise would have to be systematically biased to be much larger in the initial weeks of the epidemic than it is in the later stage. There is certainly no reason to imagine that 'measurement error' in the count of hospitalisations could behave in such a way.

We will return to the question of improving the predictive power of early stage fits and ways of using advanced statistical methods to produce accurate bounds on early stage surge levels in a separate publication.

The division between the 'initial phase' and the rapid convergence phase can be described in terms of the critical time $T_1$ when the acceleration reaches its peak. Each successive fit produces a different pair $(a_i, b_i)$ and hence a different value for $T_{1i}$. During the initial phase $T_{1i}$ will be in the future of the calculation date $D_i$. The threshold for the end of the initial phase is the time from which all $T_{1i}$ are in the past of date $D_i$. In the case of the Portuguese Hospitalisations, $T_{1i}$ for the 15 May Gompertz Function fit is 29 March, the date it had converged to by 30 April. $T_{1i}$ was consistently in the past of the calculation date by 16 April, which was effectively the end of the 'initial phase'.
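The threshold rule can be written as a short check over a sequence of fits. The sketch below assumes the Gompertz form $X(t) = N e^{-e^{-(at+b)}}$, for which $T_1 = \left(-b - \ln\frac{3+\sqrt 5}{2}\right)/a$, and measures both $D_i$ and $T_{1i}$ in days from day 0. The list of fits is hypothetical: the numbers were made up to mimic the qualitative behaviour described above and are not the parameters obtained in the paper.

```python
import math

LOG_W1 = math.log((3 + math.sqrt(5)) / 2)

def t1_of(a, b):
    """Day of peak acceleration, T1 = T2 - s(a), for a fit with parameters (a, b)."""
    return (-b - LOG_W1) / a

def end_of_initial_phase(fits):
    """fits: list of (D_i, a_i, b_i) ordered by calculation day D_i.

    Returns the first D_i from which every later fit in the list has T1_i < D_i,
    or None if the most recent fit still places T1 in the future.
    """
    end = None
    for d, a, b in fits:
        if t1_of(a, b) < d:
            if end is None:
                end = d          # candidate start of the convergence phase
        else:
            end = None           # T1 moved back into the future: reset
    return end

# Hypothetical sequence of daily refits (day of calculation, a, b)
fits = [
    (35, 0.0173, -2.434),   # T1 far in the future
    (38, 0.0300, -2.600),   # T1 still in the future
    (42, 0.0600, -2.580),   # T1 now in the past
    (45, 0.0620, -2.550),   # T1 stays in the past
]

print([round(t1_of(a, b), 1) for _, a, b in fits])
print(end_of_initial_phase(fits))
```

The result is only ever provisional: a later fit that pushes $T_1$ back into the future resets the threshold, which reflects the point that the length of the initial phase cannot be known in advance.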

Of course we can't know how long the initial phase will last. We just know that it will end. We know that the evolution will continue to follow the Gompertz Function Model or switch to linear growth-as happens every year in the initial influenza outbreak in Portugal and has happened in all of the Covid-19 outbreaks reported above.

It is often the case in a mathematical process that we know a certain event will occur if an iteration continues long enough-but we can't know in advance how many iterations will be needed. In the meantime, we only know that we should continue the iteration.

We should continue to update the Gompertz Function Fits and monitor their out of sample predictive power. This can very easily be done every day.
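A daily refit-and-monitor loop can be sketched with the standard library alone. The fits in this paper use nonlinear regression. The sketch below swaps in a simpler linearisation, which is an assumption made here for the sake of a self-contained example: for $X(t) = N e^{-e^{-(at+b)}}$ the quantity $\log\left(\log X(t+1) - \log X(t)\right)$ is linear in $t$ with slope $-a$. The linearisation needs strictly increasing cumulative data and is more sensitive to noise than a direct nonlinear fit. The error sign convention (negative when the fit over-estimates) is also an assumption.

```python
import math

def gompertz(t, a, b, n):
    """Cumulative level X(t) = N * exp(-exp(-(a*t + b))) (assumed form)."""
    return n * math.exp(-math.exp(-(a * t + b)))

def ols(xs, ys):
    """Slope and intercept of the least-squares line through (xs, ys)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_gompertz_linearised(cum):
    """Estimate (a, b, N) from strictly increasing cumulative counts for days 0..D."""
    ts = list(range(len(cum) - 1))
    ys = [math.log(math.log(cum[t + 1]) - math.log(cum[t])) for t in ts]
    slope, intercept = ols(ts, ys)
    a = -slope
    b = math.log(1.0 - math.exp(-a)) - intercept
    # log N = log X(t) + exp(-(a*t + b)), averaged over the sample
    log_n = sum(math.log(x) + math.exp(-(a * t + b))
                for t, x in enumerate(cum)) / len(cum)
    return a, b, math.exp(log_n)

def out_of_sample_errors(cum, fit_days, horizon):
    """Percentage errors (actual - predicted) / actual for the days after fit_days."""
    a, b, n = fit_gompertz_linearised(cum[:fit_days + 1])
    last = min(fit_days + 1 + horizon, len(cum))
    return [100.0 * (cum[t] - gompertz(t, a, b, n)) / cum[t]
            for t in range(fit_days + 1, last)]

# Synthetic noise-free data, used only to check that the sketch is self-consistent
data = [gompertz(t, 0.05, -2.0, 50000.0) for t in range(80)]
a_hat, b_hat, n_hat = fit_gompertz_linearised(data[:41])
errors = out_of_sample_errors(data, 40, 7)
print(a_hat, b_hat, n_hat, errors)
```

On real data the same two functions would be called once a day with the latest cumulative series, and the list of out of sample errors tracked over time.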

During the initial phase it is common to find that the best fit has a very large value of the parameter $N$. In the Portuguese hospitalisation example on 9 April, $N$ was over 7 million (which is 70% of the population of Portugal and not a believable long term prediction). The date of $T_1$ given by the 9 April fit was 29 May. A week later the 16 April fit gave a $T_1$ of 1 April, well in the past, and $N$ had dropped below 82,000.

It's easy to see from the Gompertz Function dynamics why very large values of $N$ will occur in the initial phase of very rapid growth. These dynamics are, as we've already discussed, completely determined by those of the Gumbel distribution. We can think of the parameters $a$ and $b$ obtained in a fit of data up to day $D$ as defining a map from the time interval of days $[0, D]$ to the interval $[b, aD + b]$ in the domain of the Gumbel distribution function $G$. Given any positive integer $k$, there's a number $B(k) < 0$ such that whenever $x < B(k)$ the first $k$ derivatives of $G$ are positive. When epidemic data is increasing very rapidly (say with growth like a polynomial of degree $k$), the nonlinear fit routine will find a Gompertz Function such that $b + aD < B(k)$. The more negative the value of $B(k)$, the larger the value of $N$.

In our example, on 9 April, $D = 35$ and the right hand endpoint of the image interval is $35a + b = -1.8285$. In the image interval the first four derivatives are positive and the first three are monotone increasing. Near the end of the interval the fourth derivative peaks and starts to decrease. This dampens the Gompertz Function fit's growth relative to the exponential case, where derivatives of all orders are always positive.

The Gumbel distribution's value at $-1.8285$ is approximately 0.00198. In order for the Gompertz Function value to approximate 14,586, the number of cases on 9 April, we need to have a value of $N$ such that $0.00198N \approx 14{,}586$, or $N \approx 7.4$ million.
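This arithmetic is easy to reproduce. The sketch below assumes the standard Gumbel distribution function $G(x) = e^{-e^{-x}}$ and uses the two numbers quoted in the text, the endpoint $-1.8285$ and the count of 14,586. The variable names are illustrative.

```python
import math

def gumbel_cdf(x):
    """Standard Gumbel distribution function G(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

endpoint = -1.8285        # value of 35*a + b for the 9 April fit, from the text
observed = 14586          # cumulative count on 9 April, from the text

g_at_endpoint = gumbel_cdf(endpoint)
implied_n = observed / g_at_endpoint   # N needed so that N * G(endpoint) matches

print(round(g_at_endpoint, 5), round(implied_n))
```

A small change in the endpoint has a large effect on the implied $N$, because $G$ is very small and changes rapidly in this part of its domain. This is one way to see why $N$ is so unstable in early fits.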

In all of our examples we have shown fits to cumulative data. This is more directly aligned with our focus of predicting health care demands. The relevant issue is how many new patients will be admitted over some planning horizon rather than how many each day within that period.

But there is also a technical issue. We could have fit the daily data with the function $Y(t) = N g(t, a, b)$, where $g$ is the probability density function of the Gumbel distribution. But the passage from daily data to the cumulative is equivalent to integration and has a significant smoothing effect. It is more efficient to first smooth the data, then fit the curve and take first differences, than it is to fit the daily data and then integrate.

The linear growth phases are also far more obvious in the cumulative than they are in the daily data.

Nevertheless, fitting the daily data to the derivative of the Gompertz Function gives very similar results because of the very close approximation of $\dot X(t)$ to the daily difference $X(t + 1) - X(t)$. Note that this is a feature of the small values of $a$, because a one-day grid on the interval $[0, D]$ is equivalent to a grid of step size $a$ on $[b, aD + b]$.
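The dependence on $a$ can be illustrated by comparing the daily difference with the derivative for a small and a large value of $a$. The sketch assumes the Gompertz form $X(t) = N e^{-e^{-(at+b)}}$. The gap is measured relative to the peak velocity $aN/e$, and the two values of $a$ are arbitrary choices made for the comparison.

```python
import math

def gompertz(t, a, b, n):
    """Cumulative level X(t) = N * exp(-exp(-(a*t + b))) (assumed form)."""
    return n * math.exp(-math.exp(-(a * t + b)))

def velocity(t, a, b, n):
    """First time derivative of X."""
    w = math.exp(-(a * t + b))
    return a * n * w * math.exp(-w)

def max_relative_gap(a, b, n, days):
    """Largest |X(t+1) - X(t) - X'(t+1)| over the days, relative to peak velocity."""
    peak = a * n / math.e
    gaps = (abs(gompertz(t + 1, a, b, n) - gompertz(t, a, b, n)
                - velocity(t + 1, a, b, n))
            for t in range(days))
    return max(gaps) / peak

gap_small_a = max_relative_gap(0.05, -3.0, 1.0, 200)   # one-day step is fine on [b, aD + b]
gap_large_a = max_relative_gap(0.50, -3.0, 1.0, 200)   # one-day step is coarse

print(round(gap_small_a, 4), round(gap_large_a, 4))
```

With the smaller value of $a$ the derivative tracks the daily difference to within a few percent of the peak daily count, while with the larger value the one-day grid is too coarse for the approximation to hold.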

Compartmental models such as the SEIR model, which links the dynamics of the portions of a population that are susceptible (S) to an infectious disease, exposed to it but not yet able to transmit it (E), infectious (I) and removed (R), form the basis for much of the modelling of epidemics. The simpler SIR model (where exposure equates to infectiousness) was introduced by Kermack and McKendrick almost a century ago as an approximation (whose assumption of homogeneity in the population they acknowledged was unrealistic).

Like all mathematical models, these rely on a series of assumptions which are only approximately satisfied in reality. The hope is that they can still provide useful insight into the real world process.

One of the most restrictive assumptions is that the populations within the compartments are homogeneous. As Graham Medley, Chairman of the UK's SPI-M pandemic modelling group, points out:²⁵

It is intriguing that, currently, most models of infectious disease transmission dynamics assume that all hosts are identical, when we know that they are not. For some infections, such as measles, it is probably adequate to consider that everybody is average when predicting the impact of immunisation. However, such models result in policy decisions that have assuming that "everybody is equal" as an unwritten assumption. For other infections, such as HIV, assuming that everybody is average is known to be inadequate; the commonest model structures assume that the population is divided into discrete groups, where everybody within the group is average for that group. But how should the groups be chosen, and how do they interact?

In the case of Covid-19, removing this simplifying assumption can produce outcomes which are greatly at odds with those produced by the standard models. In particular this can significantly lower the Herd Immunity Threshold, in either the SIR model [17] or in the SEIR model [8] .

In [8] , Gabriela Gomes et al show this with heterogeneity introduced through individual variation in either susceptibility or exposure in an SEIR model of deaths where the Herd Immunity Threshold is lowered substantially.

The cumulative simulated deaths for England reported in [8] are very well approximated by Gompertz functions. This may simply reflect the fact that they are constructed to be in reasonable agreement with the observed deaths data. It is also possible that heterogeneity could be 'calibrated' by targeting Gompertz Function growth in the outputs.

Then another Gompertz Function growth period began, this one 'out of season'. This caused the Government to delay the planned 'reopening' for several weeks at tremendous economic and social cost. But by mid July the hospitalisation growth was linear once again.

In the autumn there was another seasonal Gompertz Function growth phase which, as in 2020, was replaced by linear growth in November. It seems likely that another period of Gompertz Function growth should have been expected in December. It appears to have begun in the first week of December with the Omicron variant.

We cannot know in advance when a transition from linear growth to Gompertz Function growth will occur. We should be ready for these at the usual respiratory virus intervals, but it is now clear that Covid-19 can appear 'out of season' as well.

To benefit most from the use of this model, it is essential to be able to detect the transitions, especially from linear to Gompertz Function Growth.
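To make the detection problem concrete, the sketch below applies a crude trailing-window slope check to daily counts. It is not the probabilistic method referred to in this paper; the function names, the 14-day window and the 2% threshold are all hypothetical choices made for illustration. The idea is only that linear growth in cumulative counts means roughly constant daily counts, whereas the onset of Gompertz Function growth shows up as a sustained rise in daily counts.

```python
import numpy as np

def relative_slope(daily, window=14):
    """Least-squares slope of the last `window` daily counts, divided by their mean."""
    recent = np.asarray(daily[-window:], dtype=float)
    days = np.arange(recent.size)
    slope = np.polyfit(days, recent, 1)[0]
    return slope / recent.mean()

def growth_mode(daily, window=14, threshold=0.02):
    """Label the trailing window 'linear' or 'accelerating' (hypothetical labels)."""
    return "accelerating" if relative_slope(daily, window) > threshold else "linear"

# Synthetic illustration only: flat admissions, then the rising limb of a Gompertz wave.
flat = np.full(30, 100.0)
t = np.arange(0, 25)
cumulative = 20000.0 * np.exp(-np.exp(-(t - 30.0) / 8.0))
wave = flat[-1] + np.diff(cumulative)          # new wave on top of the endemic level
series = np.concatenate([flat, wave])

mode_before = growth_mode(flat)     # trailing window lies in the flat period
mode_after = growth_mode(series)    # trailing window lies in the rising limb
```

With real data the daily counts are noisy and subject to reporting artefacts, so a fixed threshold of this kind would need smoothing and calibration before it could be relied upon.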

While we know that the latter will fairly rapidly become highly predictable, we must also deal with the initial period where the predictive power is much lower, and we cannot know in advance how long that phase will persist.

There are many ways of approaching these problems. Ours is to use a probabilistic approach to estimate, based on recently observed daily admissions, what future peaks will look like on average. This has proved very effective both in detecting the onset of the Gompertz Function growth phase and in providing useful bounds for growth of admissions until the predictive power of the Gompertz Function fits becomes sufficient. We will discuss this approach in a separate publication.

We have shown that the Extended Gompertz Function Model describes the Influenza cycle in Portugal and have shown indirect evidence that this is also the case elsewhere. It is important to test this conjecture in as many other countries as possible. If it is confirmed, the model can be an important addition to influenza surveillance systems and alleviate the frequent crises that occur when the main seasonal outbreaks are larger than usual.

Finally, the Extended Gompertz Function Model has immediate applications to preparation for future pandemics. When such an event occurs we should expect cases, hospitalisations and so on to follow Gompertz Function Growth. During these periods the Gompertz Function Model has excellent predictive power.
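As an illustration of how a forecast can be extracted during a Gompertz Function growth phase, the following sketch fits a Gompertz curve to cumulative counts observed before the peak. It assumes the parametrisation Y(t) = total · exp(−exp(−(t − loc)/scale)), where the names `total`, `loc` and `scale` are ours, and it uses the fact that for such a curve log(log(Y[t+1]/Y[t])) is a straight line in t. This is a minimal sketch on noise-free synthetic data, not the fitting procedure used in this paper.

```python
import numpy as np

def gompertz(t, total, loc, scale):
    """Cumulative count: total * exp(-exp(-(t - loc) / scale))."""
    return total * np.exp(-np.exp(-(t - loc) / scale))

def fit_gompertz_log_ratio(t, cumulative):
    """Closed-form fit for unit-spaced t and strictly increasing counts.

    For a Gompertz curve, log(Y[t+1] / Y[t]) = exp(-(t - loc)/scale) * (1 - exp(-1/scale)),
    so its logarithm is linear in t with slope -1/scale.
    """
    t = np.asarray(t, dtype=float)
    log_y = np.log(np.asarray(cumulative, dtype=float))
    log_ratio = np.diff(log_y)                      # log(Y[t+1] / Y[t]) > 0
    slope, intercept = np.polyfit(t[:-1], np.log(log_ratio), 1)
    scale = -1.0 / slope
    loc = scale * (intercept - np.log(1.0 - np.exp(-1.0 / scale)))
    total = np.exp(np.mean(log_y + np.exp(-(t - loc) / scale)))
    return total, loc, scale

# Synthetic data that stop five days before the daily peak (which occurs at t = loc).
t = np.arange(0, 36)
observed = gompertz(t, total=50000.0, loc=40.0, scale=9.0)
total_hat, loc_hat, scale_hat = fit_gompertz_log_ratio(t, observed)
```

Here `total_hat` estimates the final size of the wave and `loc_hat` the date of the daily peak. With real, noisy data the log-ratio line would typically be fitted over a smoothed series, and any cumulative baseline from earlier periods would have to be subtracted first.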

We should expect waves to be separated by periods of linear growth indicating endemic status for the initial virus and the likelihood of a variant driving a subsequent wave.

It is likely that the next epidemic (of any sort) will also begin with Gompertz Function growth. We are hopeful that, when that happens, rather than panicked visions of exponential growth, this model will be applied to make accurate predictions.


A The Geometry Behind the Gompertz Function

This appendix provides a quick sketch of the geometry behind the Gumbel distribution and the Gompertz Function. An introduction to this topic is contained in [4], where we described a new solution to the problem of characterising the domains of attraction of the Gumbel distribution and the other Extreme Value distributions.

[Figure panels: Gompertz Function Fit 15 May 2020; Gompertz Function Fit 9 Apr 2020]

The 'location-scale' parameters in univariate probability distributions arise from the action of the affine group t → at + b. As Elie Cartan taught us [2], group actions produce geometry and geometry produces differential invariants such as curvature. The Gumbel distribution is part of an exceptional maximally symmetric constant 'curvature' family, the Extreme Value Theory attractors, and is the one with zero curvature.

Cartan's methods [19] show that for a smooth univariate probability distribution F the equivalence class of F under the affine group is determined by the functional relation between the invariant $J(q) = \ddot{q}/\dot{q}^{2}$ and $q = \log(F)$.

If we denote that function by h so that the relation is J(q) = h(q), then we may also view the equivalence class as determined by the second order ordinary differential equation

$$\ddot{q} = h(q)\,\dot{q}^{2} \qquad (15)$$

and identify the location scale parameters of F as the initial conditions for this differential equation.
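As a quick check of this relation, take the standard Gumbel form $F(t) = \exp(-e^{-t})$ (a normalisation assumed here for illustration) and write $J = \ddot{q}/\dot{q}^{2}$ with $q = \log F$. Differentiating directly:

```latex
\begin{aligned}
q &= \log F = -e^{-t}, \qquad \dot{q} = e^{-t} = -q, \qquad \ddot{q} = -e^{-t} = q, \\
J(q) &= \frac{\ddot{q}}{\dot{q}^{2}} = \frac{q}{q^{2}} = \frac{1}{q}.
\end{aligned}
```

So for the Gumbel distribution the function h is $h(q) = 1/q$. Conversely, $\ddot{q} = \dot{q}^{2}/q$ integrates to $\dot{q} = kq$ and hence $q = C e^{kt}$. Requiring q to be negative and increasing to 0 lets us write $q = -e^{-(t-b)/a}$ with $a > 0$, so the two constants of integration play the role of the location and scale parameters.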

For each F this equation is the Euler-Lagrange equation (equivalently Hamilton's equations) for the variational problem of minimising the integral of $E(q,\dot{q}) = \dot{q}^{2}\, e^{-2\int h(q)\,dq}$. E is the conserved quantity that Noether's theorem associates to the time translation symmetry subgroup of the affine group, so we can think of it as an 'energy'.
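The conservation of E can be verified in one line. Writing $E = \dot{q}^{2}\, e^{-2\int h(q)\,dq}$ and differentiating along a solution of $\ddot{q} = h(q)\,\dot{q}^{2}$:

```latex
\frac{dE}{dt}
  = 2\dot{q}\,\ddot{q}\; e^{-2\int h\,dq} \;-\; 2h(q)\,\dot{q}^{3}\; e^{-2\int h\,dq}
  = 2\dot{q}\; e^{-2\int h\,dq}\,\bigl(\ddot{q} - h(q)\,\dot{q}^{2}\bigr)
  = 0 .
```

For example, with $h(q) = 1/q$ we have $\int h\,dq = \log|q|$, which gives $E = \dot{q}^{2}/q^{2}$.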

The 1-parameter family of distributions with $h(q) = (1+c)/q$ corresponds to the Extreme Value distributions. The Gumbel distribution is given by c = 0, the values c < 0 give the Weibull distributions and c > 0 the Fréchet distributions. If we parametrise these with $\xi = 1/c$ then as $|\xi| \to \infty$ (that is, as c → 0) both the Weibull and Fréchet families tend to the Gumbel distribution.
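The classification can be checked numerically. The sketch below reads the relation as h(q) = (1+c)/q and evaluates J = q̈/q̇² by finite differences for the standard forms F = exp(−e^(−t)) (Gumbel), F = exp(−t^(−α)) for t > 0 (Fréchet) and F = exp(−(−t)^α) for t < 0 (Weibull). The identification c = 1/α and c = −1/α for the last two is our assumption, derived by differentiating these standard forms, and the value α = 3 is arbitrary.

```python
import numpy as np

def invariant_J(log_cdf, t, step=1e-4):
    """J = q'' / q'^2 for q = log F, by central finite differences."""
    q_minus, q_mid, q_plus = log_cdf(t - step), log_cdf(t), log_cdf(t + step)
    first = (q_plus - q_minus) / (2.0 * step)
    second = (q_plus - 2.0 * q_mid + q_minus) / step**2
    return second / first**2

alpha = 3.0  # shape parameter, arbitrary illustrative value

families = {
    # name: (log F, assumed value of c, sample points inside the support)
    "Gumbel": (lambda t: -np.exp(-t), 0.0, np.linspace(-1.0, 2.0, 7)),
    "Frechet": (lambda t: -t ** (-alpha), 1.0 / alpha, np.linspace(0.8, 3.0, 7)),
    "Weibull": (lambda t: -((-t) ** alpha), -1.0 / alpha, np.linspace(-2.0, -0.5, 7)),
}

# If h(q) = (1 + c) / q, then J * q should equal 1 + c at every sample point.
max_error = {}
for name, (log_cdf, c, t) in families.items():
    q = log_cdf(t)
    max_error[name] = float(np.max(np.abs(invariant_J(log_cdf, t) * q - (1.0 + c))))
```

Each entry of `max_error` is the largest deviation of J·q from 1 + c over the sample points, so values near zero indicate agreement up to finite-difference error.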

Only for the Extreme Value distributions is there a second symmetry of the differential equation. It is obtained by simultaneously scaling t and q by the same factor. (This is not a Noether symmetry so there is no associated conservation law.)

This maximally symmetric case is diffeomorphic to the affine group itself, identifying the differential system for equation (15) with the Maurer-Cartan structure equations for the affine group [3].

A distribution F on the interval [A, B] (where either of A or B may be infinite) is in the domain of attraction of an Extreme Value distribution corresponding to the constant c if and only if its invariant $h_F$ tends to $(1+c)/q$ as t → B. The Gumbel distribution's domain of attraction is the thin-tailed distributions: the distributions defined on intervals of the form [A, ∞) which have finite moments of all orders (such as the normal, Laplace or logistic distributions). So if F is any thin-tailed distribution, the defining equation for the equivalence class of F must approach that of the Gumbel distribution as the percentile level tends to 100%.

This means that any probability distribution F in the Gumbel domain of attraction will have a good fit by a Gumbel distribution beyond some quantile Q, and hence a good fit by the Gompertz Function to a model Y = N F(t, a, b) above the value of t corresponding to Q.

In a preliminary investigation we have seen that the quality of any fit to epidemic data seems to indicate that, for practical purposes, Q ≈ 0.5. So the portion of the epidemic curve that occurs after about half of the susceptible population N is affected will be essentially indistinguishable from a Gompertz Function. If we add to this the empirical observation that the daily peak comes at approximately 0.37N, the epidemic curve is bound to resemble a Gompertz function very closely over its entirety.
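For comparison with the 0.37N figure, a Gompertz curve itself has its daily peak (the inflection point of the cumulative curve) at exactly N/e ≈ 0.368N. The short numerical sketch below illustrates this, assuming the parametrisation Y(t) = total · exp(−exp(−(t − loc)/scale)); the parameter names and values are ours and the resulting fraction does not depend on them.

```python
import numpy as np

total, loc, scale = 1.0, 0.0, 1.0   # illustrative values only
t = np.linspace(-5.0, 10.0, 150001)
cumulative = total * np.exp(-np.exp(-(t - loc) / scale))
daily = np.gradient(cumulative, t)   # numerical derivative of the cumulative curve

peak_index = int(np.argmax(daily))
fraction_at_peak = cumulative[peak_index] / total   # compare with 1/e
```

Analytically, the second derivative of Y vanishes where exp(−(t − loc)/scale) = 1, that is at t = loc, where Y = total/e.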

The long evolutionary path to a stable equilibrium between virus and host seems to have led to the emergence of the Gompertz function as a stable equilibrium for epidemic evolution.

The Gumbel energy is $\dot{q}^{2}/q^{2}$, the square of the relative velocity, so it would be interesting to see if there are known biological processes that minimise this. The 'simpler' Lagrangian with energy $\dot{q}^{2}$ yields an exponential distribution (the c = −1 case of the Weibull family) for F. This could also be a stable equilibrium but presumably one with no people and no human viruses.

There is an intriguing bit of evidence for the idea that viruses themselves are following a Gompertz Function in an example of observations of bacteriophage viruses (A. G. McKendrick 1939). McKendrick's model produces a good fit but a Gompertz density fits the data with only about one tenth of the mean squared error of the McKendrick model.

The Covid-19 data presented in our analysis were taken from: 

[1] An outbreak of common colds at an Antarctic base after seventeen weeks of complete isolation

[2] Oeuvres Complètes

[3] Leçons sur les invariants intégraux

[4] From the Geometry of Extreme Value Distributions to Improved Tail Fitting in Market Data

[5] Characterizing the reproduction number of epidemics with early subexponential growth dynamics

[6] The need for interdisciplinary studies of historic pandemics

[7] Inferring the causes of the three waves of the 1918 influenza pandemic in England and Wales

[8] Aguas (2021) Individual variation in susceptibility or exposure to SARS-CoV-2 lowers the herd immunity threshold

[9] The role of season in the epidemiology of influenza

[10] McKendrick (1927) A contribution to the mathematical theory of epidemics

[11] The Age Pattern of Mortality in the 1918-19 Influenza Pandemic

[12] Epidemiological monitoring and measurement of infectivity rates in key countries

[13] Predicting the Trajectory of Any COVID19 Epidemic From the Best Straight Line

[14] Herd Immunity is an important - and often misunderstood - public health phenomenon

[15] McKendrick (1939) The Dynamics of Crowd Infection

[16] What the media gets wrong about SAGE's models. The Spectator

[17] Power-law population heterogeneity governs epidemic waves

[18] Using phenomenological models for forecasting the 2015 Ebola challenge

[19] Elie Cartan, differential systems, transformation groups and differential invariants

[20] Influenza: the Mother of All Pandemics