key: cord-224211-6lsvjfqb
authors: Schaback, Robert
title: Modelling Recovered Cases and Death Probabilities for the COVID-19 Outbreak
date: 2020-03-26
journal: nan
DOI: nan
sha: 
doc_id: 224211
cord_uid: 6lsvjfqb

From March 23rd, the data for the recovered cases of COVID-19 are missing from the standard repository maintained by the Johns Hopkins University in collaboration with the WHO. But since data concerning recovered patients are extremely important for modelling the COVID-19 outbreak, a method for estimating the missing data is provided and tested. As a byproduct, it produces estimates for the probabilities to die $k$ days after confirmation, or to survive after $d$ days.

The COVID-19 repository [1] at Johns Hopkins University provides cumulative data on

• Confirmed infected persons (C) and
• Deaths (D)

per territory, as integer-valued time series beginning on Jan. 22nd, 2020. Up to March 23rd, there was a count of Recovered (R) as well, but this was stopped for various reasons. One reason that applies to Germany is that there simply is no defined registration procedure for the Recovered. Authorities only report confirmed cases and deaths, but they say [2] that a person with mild or no symptoms is considered to be recovered after 14 days. However, the Recovered are extremely important for the analysis of the COVID-19 outbreak. They are the R part of standard models like SIR and SEIR [3], and they define the Infectious I = C − D − R, which hopefully reach a peak at some time after the exponential outbreak. Prediction of this peak is the central problem for any analysis of the COVID-19 pandemic.

To illustrate the problem, let us assume a form of the 14-day-rule:

If a confirmed person survives for d days after confirmation, the person is counted as recovered.

To analyze this, let $D_{i,j}$ be the count of deaths on day $j$ among the persons newly registered on day $i$, where $i < j \le i + d$. Then the new casualties on day $j$ are

$$D_j - D_{j-1} = \sum_{i=j-d}^{j-1} D_{i,j}.$$

Summing over $j$ up to some $n$ yields

$$R_n = C_{n-d} - D_n,$$

which is quite a simple formula, but the count should be cumulative, i.e. nonnegative and nondecreasing. This fails for various countries, unless $d$ is chosen to be intolerably small. For example, the data for Italy at day $n = 52$ (March 14th) have $C_{n-14} - D_n = 888 - 1266 = -378$. Furthermore, while the general assumption makes sense for large $d$, the above solution will then necessarily fail.
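The failure of such a shift rule can be illustrated with a minimal sketch, assuming the rule takes the form $R_n = C_{n-d} - D_n$ implied by the Italian figures quoted above. The function names and the toy series are hypothetical and are not taken from the repository.

```python
def naive_recovered(confirmed, deaths, d):
    """Assumed naive d-day rule on cumulative series: R_n = C_{n-d} - D_n."""
    return [
        (confirmed[n - d] if n >= d else 0) - deaths[n]
        for n in range(len(confirmed))
    ]


def is_cumulative(series):
    """A cumulative count must be nonnegative and never decrease."""
    nonnegative = all(x >= 0 for x in series)
    nondecreasing = all(a <= b for a, b in zip(series, series[1:]))
    return nonnegative and nondecreasing


# Hypothetical toy series (not real data) for a fast-growing outbreak.
confirmed = [10, 30, 80, 200, 500, 1200]
deaths = [0, 1, 5, 20, 60, 150]

recovered = naive_recovered(confirmed, deaths, d=3)
print(recovered)                 # [0, -1, -5, -10, -30, -70]
print(is_cumulative(recovered))  # False

# The Italian figures quoted in the text: C_{n-14} = 888, D_n = 1266.
print(888 - 1266)                # -378
```

During exponential growth the deaths of today easily exceed the confirmed cases of $d$ days ago, so the shifted difference becomes negative.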

Assume that $p_k$ is the probability to die on day $k$ after confirmation, given that the person was still alive on day $k - 1$. These events are assumed to be stochastically independent. The probability to die on day $k$ after confirmation then is

$$q_k = p_k \prod_{j=1}^{k-1} (1 - p_j). \tag{1}$$

Now the deaths at day $n$ can be expected to be

$$D_n = \sum_{k=1}^{d} q_k \, C_{n-k},$$

and it is clear how to set up a nonlinear constrained least-squares fit to estimate the probabilities. The above equations are set up for various $n$, using the newest available data, leading to more than $d$ equations for the $d$ unknowns $p_1, \ldots, p_d$ that enter nonlinearly into the $q_k$ by (1). Then the sum of squares of the residuals is minimized.
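The fit can be sketched as follows. The sketch assumes that the expected cumulative deaths are $D_n = \sum_{k=1}^{d} q_k C_{n-k}$ with $q_k = p_k \prod_{j<k} (1 - p_j)$, and it keeps the $p_k$ in $[0, 1]$ by clipping. The solver is a plain coordinate pattern search chosen only to keep the example dependency-free; it is not claimed to be the solver used for the results reported here. All function names and the synthetic data are hypothetical.

```python
def death_day_probs(p):
    """q_k = p_k * prod_{j<k} (1 - p_j): probability to die on day k."""
    q, alive = [], 1.0
    for pk in p:
        q.append(pk * alive)
        alive *= 1.0 - pk
    return q


def expected_deaths(confirmed, q, n):
    """Assumed model: D_n = sum_{k=1..d} q_k * C_{n-k}, with C_m = 0 for m < 0."""
    return sum(qk * confirmed[n - k]
               for k, qk in enumerate(q, start=1) if n - k >= 0)


def sum_of_squares(p, confirmed, deaths, days):
    """Sum of squared residuals over the chosen days n."""
    q = death_day_probs(p)
    return sum((deaths[n] - expected_deaths(confirmed, q, n)) ** 2
               for n in days)


def fit_death_probs(confirmed, deaths, d, start=0.05, step=0.1, tol=1e-6):
    """Coordinate pattern search with each p_k clipped to [0, 1]."""
    days = range(d, len(deaths))          # more than d equations if long enough
    p = [start] * d
    best = sum_of_squares(p, confirmed, deaths, days)
    while step > tol:
        improved = False
        for k in range(d):
            for delta in (step, -step):
                trial = list(p)
                trial[k] = min(1.0, max(0.0, p[k] + delta))
                loss = sum_of_squares(trial, confirmed, deaths, days)
                if loss < best:           # accept only strict improvements
                    p, best, improved = trial, loss, True
        if not improved:
            step /= 2.0                   # refine the search grid
    return p, best


# Synthetic, noise-free data generated from known probabilities (not real data).
confirmed = [100, 250, 470, 800, 1300, 2040, 3140, 4740]
p_true = [0.02, 0.05, 0.03]
q_true = death_day_probs(p_true)
deaths = [expected_deaths(confirmed, q_true, n) for n in range(len(confirmed))]

d = len(p_true)
loss_start = sum_of_squares([0.05] * d, confirmed, deaths, range(d, len(deaths)))
p_fit, loss_fit = fit_death_probs(confirmed, deaths, d)
print(p_fit, loss_fit, loss_start)
```

With real data, a bounded least-squares routine from a numerical library would be the more natural choice; the pattern search above only demonstrates the structure of the problem.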

The probability to be still alive on day $d$ after confirmation is

$$1 - \sum_{k=1}^{d} q_k = \prod_{k=1}^{d} (1 - p_k),$$

and therefore the Recovered can be modelled by

$$R_n = C_{n-d} \prod_{k=1}^{d} (1 - p_k).$$

This is automatically cumulative and maintains the integrity of person counts between the Confirmed, Deaths, and Recovered. All confirmed persons end up some time later as recovered or dead.
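A minimal sketch of this model, assuming the Recovered are obtained as $R_n = C_{n-d} \prod_{k=1}^{d} (1 - p_k)$. The probabilities and the confirmed series below are hypothetical illustration values, not estimates from the data.

```python
def death_day_probs(p):
    """q_k = p_k * prod_{j<k} (1 - p_j): probability to die on day k."""
    q, alive = [], 1.0
    for pk in p:
        q.append(pk * alive)
        alive *= 1.0 - pk
    return q


def survival_prob(p):
    """Probability to be still alive on day d = len(p) after confirmation."""
    s = 1.0
    for pk in p:
        s *= 1.0 - pk
    return s


def modelled_recovered(confirmed, p):
    """Assumed model: R_n = survival * C_{n-d}, with R_n = 0 for n < d."""
    d, s = len(p), survival_prob(p)
    return [s * confirmed[n - d] if n >= d else 0.0
            for n in range(len(confirmed))]


# Hypothetical values (not estimates from real data).
p = [0.02, 0.05, 0.03]
confirmed = [100, 250, 470, 800, 1300]

s = survival_prob(p)
q = death_day_probs(p)
recovered = modelled_recovered(confirmed, p)

# Every confirmed person ends up dead on some day k <= d or recovered.
print(abs(s + sum(q) - 1.0) < 1e-12)
# The modelled Recovered inherit monotonicity from the Confirmed.
print(all(a <= b for a, b in zip(recovered, recovered[1:])))
```

Because the survival probability and the $q_k$ sum to one, no confirmed person is lost or counted twice, and the modelled Recovered never decrease as long as the Confirmed do not.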

As a byproduct, the above approach provides estimates for the probability to die on day k after confirmation, or to survive after d days. Of course, this is strongly dependent on how and when confirmations are done in a certain territory, and it only applies to persons with confirmed COVID-19 by the rules applied there.

The dotted green values are the R data from the old data set, as far as available. The deviations between the dotted and the continuous R values are not necessarily failures of the method, because the old R data were questionable anyway and might not maintain data consistency. France has not reported any Recovered for quite some time, and the data from Germany are doubtful for the procedural reasons mentioned above.

The survival probabilities in the last row of Table 1 are low for countries before and after the full outbreak, while they are large for countries in a critical phase. When testing different decision times d, in particular larger d, the estimated survival probabilities do not change much, while the death probabilities continue their tendency to have peaks around day 5 and day 14. There is no attempt to interpret the findings in more detail, in particular with respect to medical, social, or political implications. In further work, the obtained data for the Recovered should enter into predictions of a possible peak of the Infectious.

[1] COVID-19 repository at GitHub

[3] The mathematics of infectious diseases