key: cord-266930-a1mzxmsb authors: Rigatti, S. J.; Stout, R. title: SARS-CoV-2 Antibody Prevalence and Association with Routine Laboratory Values in a Life Insurance Applicant Population date: 2020-09-11 journal: nan DOI: 10.1101/2020.09.09.20191296 sha: doc_id: 266930 cord_uid: a1mzxmsb

Objectives: The prevalence of SARS-CoV-2 antibodies in the general population is largely unknown. Since many infections, even among the elderly and other vulnerable populations, are asymptomatic, the prevalence of antibodies could help determine how far along the path to herd immunity the general population has progressed. Also, in order to clarify the clinical manifestations of current or recent past COVID-19 illness, it may be useful to determine if there are any common alterations in routine clinical laboratory values. Methods: We performed SARS-CoV-2 antibody tests on 50,130 consecutive life insurance applicants who were having blood drawn for the purpose of underwriting (life risk assessment). Subjects also had lipid, liver function, renal function, and serum protein tests. Other variables included height, weight, blood pressure at the time of the blood draw, and history of common chronic diseases (hypertension, heart disease, diabetes, and cancer). Results: The overall prevalence of SARS-CoV-2 antibodies was 3.0%, and was fairly consistent across the age range and similar in males and females. Several of the routine laboratory tests obtained were significantly different in antibody-positive vs. antibody-negative subjects, including albumin, globulins, bilirubin, and the urine albumin:creatinine ratio. The BMI was also significantly higher in the antibody-positive group. Geographical distribution revealed a very high level of positivity (17.1%) in the state of New York compared with all other areas.
Using state population data from the US Census, it is estimated that this level of seropositivity would correspond to 6.98 million (99% CI: 6.56-7.38 million) SARS-CoV-2 infections in the US, which is 3.8 times the cumulative number of cases in the US reported to the CDC as of June 1, 2020. Conclusions: The estimated number of total SARS-CoV-2 infections based on positive serology is substantially higher than the total number of cases reported to the CDC. Certain laboratory values, particularly serum protein levels, are associated with positive serology, though these associations are not likely to be clinically meaningful.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted September 11, 2020.

In early 2020 a novel coronavirus emerged in Hubei Province, China 1. The causative agent was a betacoronavirus most closely related genetically to zoonotic viruses found in bats, and clinically similar to the recently emerged epidemic coronaviruses that caused Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS) 2.
Since then, the virus has caused a worldwide pandemic, infecting over 17 million persons and causing more than 660,000 deaths as of this writing 3. The first case in the United States occurred on January 20, 2020 4, and since then the Centers for Disease Control and Prevention (CDC) has recommended that all states report laboratory-confirmed cases 5. Case counts have been closely tracked by the CDC, the press, and academic institutions. However, because the illness caused by SARS-CoV-2 may be asymptomatic or minimally symptomatic 6, these case counts may underestimate the number of persons who have been infected. Various studies of seroprevalence in the United States 7,8 have shown different results based on timing and locality, but have been consistent in showing that seroprevalence is higher than would be implied by simple case counts based on viral antigen testing. Because SARS-CoV-2 is novel, the presence of antibodies in the blood likely indicates a history of infection since the pandemic began, and serologic testing can be used to estimate the overall rate of infection, even in those who had minimal symptoms or who were never tested despite symptoms. In this study, a convenience sample of blood specimens submitted to a commercial laboratory was used to conduct a survey of seroprevalence. The goal was both to estimate the overall number of cases in the general population and to determine whether any common clinical laboratory tests were significantly associated with seropositivity.
In the United States, the process of purchasing life insurance often involves a brief physical examination by paramedical professionals, the collection of height, weight and blood pressure measurements, and the testing of blood and urine specimens for common analytes related to overall health. Such tests are seldom, if ever, performed on individuals below age 17 years or above 85 years. Also, blood tests are generally reserved for individuals applying for higher dollar amounts of life insurance or for those applying for permanent types of insurance (rather than term insurance). Thus, individuals applying for life insurance are a self-selected group primarily from higher socio-economic strata. Those who have a history of chronic illness may be less likely to apply because more serious conditions can be associated with higher life [...]

Other information available on test subjects included age, sex, smoking status (tobacco use within one year), height, weight, blood pressure, and routine laboratory measures which included some combination of glucose, blood urea nitrogen, creatinine, aspartate aminotransferase, alanine aminotransferase, gamma glutamyltransferase, alkaline phosphatase, total bilirubin, total cholesterol, high density lipoprotein (HDL) cholesterol, triglycerides, lactate dehydrogenase, hemoglobin A1c, and NT-pro B-type natriuretic peptide. Limited medical history was available in the form of responses to simple yes/no questions regarding a history of cardiovascular disease, diabetes, hypertension, and cancer.
The differences in continuous variables between the antibody-positive and antibody-negative groups were tested for significance with the Mann-Whitney U test, while differences in categorical variables were tested using the chi-square test. To estimate the total burden of SARS-CoV-2 infections in the US, census data were obtained. For each state and the District of Columbia, the total 2018 estimated census population was multiplied by the proportion of the US population between the ages of 20 and 80 (71.1%). Then the state-specific proportion of positive tests from our sample was applied. Confidence limits were estimated by generating 5000 bootstrap samples (with replacement) of our data and recalculating the total number of US cases. Under- and over-representation of states was determined by the ratio of the proportion of individuals living in a given state to the proportion of tests performed in that state. All statistical analyses were performed using R (version 3.6.1) 9 and RStudio (version 1.2.1335) 10.

The overall sample included 50,025 individuals with a median age of 42 years (IQR: 34-54), 56% of whom were male. Geographical distribution deviated somewhat from the overall population distribution of the US, with some under-representation from Maine, West Virginia, Vermont and Oklahoma, and over-representation from Nebraska, Hawaii, and Utah. Characteristics of the study population are displayed in Table 1. The antibody-positive group tended to be slightly younger (median age 41) than the antibody-negative group (median age 42). The proportion of subjects reporting a history of heart disease, hypertension and/or diabetes was similar between the positive and negative groups. Laboratory tests which were statistically different (p < 0.01) between the positive and negative groups included creatinine, BUN, bilirubin, total protein, albumin, globulin, the albumin:globulin ratio, total cholesterol, hemoglobin A1c, and the urine
protein:creatinine ratio. Also, BMI tended to be higher in the positive group than the negative group. While these differences were statistically significant, the numeric differences are quite small with overlapping distributions (see Figure 1) and unlikely to be clinically relevant. Rates of positive serology varied by age and sex (see Table 2), with lower rates among individuals over age 60. Geographically, at least 50 samples were obtained from each state and the District of Columbia. Fewer than 100 tests were obtained from Vermont (59), Wyoming [...] Table 3. The estimate of the total number of SARS-CoV-2 infections in the US was 6.98 million (99% CI: 6.56-7.38 million). Because of the small numbers of samples in certain states, the estimation of the overall rate of infection in the US could not utilize a more precisely weighted approach, but relied solely on state population-based weights. Because antibodies to SARS-CoV-2 may take some time to develop, the total number of COVID-19 cases reported to the CDC as of June 1, 2020 was used as the baseline comparator. This date was chosen because it is in the earlier portion of the date range of the study. Compared to the 1.816 million cases reported to the CDC, our estimate of 6.98 million infections is 3.8 times the number of reported cases. An attempt was made to develop a prediction model (results not shown) using logistic regression and the laboratory values which demonstrated statistically significant differences between the positive and negative groups. These models did not achieve reasonable performance.
Even if performance had been better, it is likely that, over time, as the acuity of the pandemic wanes and antibody levels persist, these lab values will become less predictive of serological status.

This study estimated the seroprevalence of SARS-CoV-2 antibodies in a geographically diverse sample of adults in the US within a 6-week collection period ending in late June 2020. The rate of positivity ranged from 0% to 17% by state and from 1% to 3% across age and sex categories. The choropleth map of seropositivity roughly corresponds to the areas where the most COVID-19 cases were reported during that period. Our results suggest that many more infections occurred than were reported. This is likely due to asymptomatic or minimally symptomatic infections for which care was not sought or symptomatic infection for which testing was not [...]

The collection period was from April 19 to 28, 2020, and the estimate was 22.7%. The higher estimate than in the current study, despite being obtained in an earlier time period, is likely due to a geographical distribution that is more localized to the highest prevalence metro region, rather than the entire state of New York. The present study is different in that it evaluates laboratory findings in the setting of positive or negative SARS-CoV-2 serology. The differences in lab test results were very modest and insufficient to identify who should or should not be tested for SARS-CoV-2. The study implies that, around the time it was conducted, the number of infections in the US was nearly 4 times higher than reported, suggesting a much more widespread pandemic, but with a lower rate of hospitalization, complications and deaths.
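The state-weighted estimate and bootstrap interval described in the methods can be illustrated with a minimal Python sketch. This is not the authors' code (the analysis was performed in R), and several details are assumptions: the function names `weighted_infections` and `bootstrap_interval` are hypothetical, the two states and their test results are made-up toy data, and resampling within each state is one possible reading of "bootstrap samples (with replacement) of our data". Only the 71.1% adult fraction, the 5000 resamples, the 99% level, and the 6.98 million and 1.816 million figures come from the text.

```python
import random

ADULT_FRACTION = 0.711  # share of the US population aged 20 to 80, as stated in the text


def weighted_infections(samples, populations, adult_fraction=ADULT_FRACTION):
    """Sum over states of population * adult fraction * state positivity rate."""
    total = 0.0
    for state, results in samples.items():
        rate = sum(results) / len(results)  # state-specific proportion positive
        total += populations[state] * adult_fraction * rate
    return total


def bootstrap_interval(samples, populations, n_boot=5000, level=0.99, seed=0):
    """Percentile bootstrap interval for the weighted total.

    Assumption: results are resampled with replacement within each state.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = {s: rng.choices(r, k=len(r)) for s, r in samples.items()}
        estimates.append(weighted_infections(resampled, populations))
    estimates.sort()
    lo_idx = max(0, int(n_boot * (1 - level) / 2))
    hi_idx = min(n_boot - 1, int(n_boot * (1 + level) / 2))
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical toy data: two made-up states, 1 = antibody positive, 0 = negative.
toy_samples = {"A": [1] * 3 + [0] * 97, "B": [1] * 10 + [0] * 90}
toy_populations = {"A": 1_000_000, "B": 2_000_000}

point = weighted_infections(toy_samples, toy_populations)
low, high = bootstrap_interval(toy_samples, toy_populations)

# Ratio of the study's estimate to CDC-reported cases (both figures from the text).
ratio = 6.98e6 / 1.816e6

print(f"toy estimate: {point:,.0f} (99% interval {low:,.0f} to {high:,.0f})")
print(f"estimated / reported infections: {ratio:.1f}")
```

With only state-level weights, a state with few samples contributes a noisy rate that is then multiplied by its full adult population, which is consistent with the caution expressed about small state samples.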
Weaknesses of the study include the imbalanced representation of the US states, as well as the lack of samples from those under age 20 or over age 80. The age distribution is also more heavily weighted to the young adult years, which is not representative of the US population. Although the sample size was large, it was not large enough to stratify by both age and geography when estimating population seroprevalence. Finally, the life insurance-buying population tends to be both healthier and wealthier than average, and this could also bias the results in an indeterminate direction.

The rate of SARS-CoV-2 seropositivity in this population of insurance applicants implies a burden of infection approximately 3.8 times higher than the number of reported cases. While some differences in laboratory values reached statistical significance, these differences were [...]

Table 3: Prevalence of SARS-CoV-2 Antibodies by Location
1. Clinical features of patients infected with 2019 novel coronavirus in Wuhan
2. SARS and MERS: recent insights into emerging coronaviruses
3. Coronavirus Pandemic
4. Severe acute respiratory syndrome coronavirus 2 from a patient with coronavirus disease, United States
5. Nationally Notifiable Diseases Surveillance System. Coronavirus disease 2019 (COVID-19). Accessed
6. The Novel Coronavirus Pneumonia Emergency Response Epidemiology Team. The Epidemiological Characteristics of an Outbreak of 2019 Novel Coronavirus Diseases (COVID-19)-China
7. Population Point Prevalence of SARS-CoV-2 Infection Based on a Statewide Random Sample - Indiana
8. Seroprevalence of SARS-CoV-2-specific antibodies among adults in
9. R: A language and environment for statistical computing. R Foundation for Statistical Computing
10. RStudio: Integrated Development for
11. Seroprevalence of Antibodies to SARS-CoV-2 in 10 Sites in the United States
12. Seroconversion of a city: longitudinal monitoring of SARS-CoV-2 seroprevalence in New York City. medRxiv. Preprint posted online
13. Cumulative incidence and diagnosis of SARS-CoV-2 infection in New York
14. A predictive tool for identification of SARS-CoV-2 PCR-negative emergency department patients using routine test results
15. Rapid identification of SARS-CoV-2-infected patients at the emergency department using routine testing
16. Towards an Artificial Intelligence Framework for Data-Driven Prediction of