key: cord-213187-f1ic63o5
authors: Rahman, Akond; Farhana, Effat
title: An Exploratory Characterization of Bugs in COVID-19 Software Projects
date: 2020-05-31
journal: nan
DOI: nan
sha: 
doc_id: 213187
cord_uid: f1ic63o5

Context: The dire consequences of the COVID-19 pandemic have influenced development of COVID-19 software, i.e., software used for analysis and mitigation of COVID-19. Bugs in COVID-19 software can be consequential, as COVID-19 software projects can impact public health policy and user data privacy. Objective: The goal of this paper is to help practitioners and researchers improve the quality of COVID-19 software through an empirical study of open source software projects related to COVID-19. Methodology: We use 129 open source COVID-19 software projects hosted on GitHub to conduct our empirical study. Next, we apply qualitative analysis on 550 bug reports from the collected projects to identify bug categories. Findings: We identify 8 bug categories, which include data bugs, i.e., bugs that occur during mining and storage of COVID-19 data. The identified bug categories appear for 7 categories of software projects including (i) projects that use statistical modeling to perform predictions related to COVID-19, and (ii) medical equipment software that is used to design and implement medical equipment, such as ventilators. Conclusion: Based on our findings, we advocate for robust statistical model construction through better synergies between data science practitioners and public health experts. The existence of security bugs in user tracking software necessitates development of tools that will detect data privacy violations and security weaknesses.

The novel Coronavirus disease (COVID-19) is a worldwide pandemic that spreads through droplets generated from coughs or sneezes and by touching contaminated surfaces (John Hopkins University, 2020). As of May 31, 2020, COVID-19 has caused 370,247 deaths across the world (John Hopkins University, 2020). Apart from causing thousands of deaths and creating long-term health repercussions for vulnerable populations, COVID-19 has severely impacted the economic sector. According to a recent study (Erin Duffin, 2020), due to COVID-19 gross domestic product (GDP) will decrease from 3.0% to 2.4% worldwide. As of May 28, 2020, nearly 41 million citizens reported unemployment in the USA alone (Mitchell Hartman, 2020). More than 3.9 billion people around the world were under some form of stay-at-home order due to COVID-19 (Alasdair Sandford, 2020).

Health care professionals are at the frontline of combating COVID-19. Practitioners from other domains, such as software engineering, have also joined forces to analyze and mitigate the negative consequences of COVID-19. For example, statistical modeling was used to build software that identifies pneumonia caused by COVID-19 from lung scan images (Tom Simonite, 2020). The software was used in 34 Chinese hospitals (Tom Simonite, 2020). In response to the food insecurity caused by COVID-19, practitioners have created interactive visualization software that displays free meal sites across the USA (Why Hunger, 2020). The creators of the software envision building a social movement to eradicate hunger and address economic inequalities. As another example, Apple and Google have jointly announced the creation of a software framework that will help practitioners build tools to trace the COVID-19 infection status of mobile app users (Apple, 2020). The above-mentioned examples show that COVID-19 software, i.e., software used for analysis and mitigation of COVID-19, has near-term and long-term effects on public health and society.

Despite the above-mentioned advancements, COVID-19 software projects are susceptible to bugs. Let us consider Figure 1 in this regard. Figure 1 provides a snapshot of a bug report related to statistical modeling (Begley, 2020a). We observe that when implementing a statistical model, the practitioners did not consider the correlation between intensive care unit (ICU) bed availability and death rate prediction. Furthermore, the number of ICU beds is incorrectly assumed to be 40,000 instead of 1,000.

We hypothesize that systematic analysis can reveal bug categories, including statistical modeling bugs similar to the one in Figure 1. In prior work, researchers (Garcia et al., 2020; Rahman et al., 2020; Linares-Vásquez et al., 2017; Catolino et al., 2019; Thung et al., 2012; Wan et al., 2017) have documented the importance of bug categorization. For example, for autonomous vehicle software, Garcia et al. (Garcia et al., 2020) stated that categorization of bugs can help to construct bug detection and testing tools. Linares-Vásquez et al. (Linares-Vásquez et al., 2017) stated that categorizing vulnerabilities can help Android practitioners "in focusing their verification and validation activities". According to Catolino et al. (Catolino et al., 2019), "understanding the bug type represents the first and most time-consuming step to perform in the process of bug triage". Categorization of bugs in COVID-19 software can help practitioners and researchers to (i) understand the nature of COVID-19 software bugs, (ii) construct bug detection and repair tools, and (iii) measure COVID-19 software quality by using reported frequency of bug categories as a benchmark.

Fig. 1: An example of a bug report related to statistical modeling in a software project called 'neherlab/covid19_scenarios'.

In prior work, researchers have categorized bugs for infrastructure as code (IaC), autonomous vehicle (Garcia et al., 2020), machine learning (Thung et al., 2012; Islam et al., 2019), and blockchain (Wan et al., 2017) software. However, COVID-19 software differs from previously studied software in the following aspects: (i) development context: unlike previously studied software projects, COVID-19 software is developed in response to a pandemic that infected 6.1 million individuals in five months (John Hopkins University, 2020), and (ii) public health: unlike previously studied software projects, COVID-19 software has direct implications for public health and relevant policy making for inhabitants of 188 countries.

In response to the pandemic, researchers have conducted studies related to modeling (Dehning et al., 2020; Yang and Wang, 2020; Tamm, 2020), biological science (Jin et al., 2020; Wang et al., 2020; De Clercq, 2006; Helms et al., 2020), social science (Van Bavel et al., 2020; Pulido et al., 2020; Evans et al., 2020; Jarynowski et al., 2020), and policy making (Corey et al., 2020; Mello and Wang, 2020; Rourke et al., 2020; Kraemer et al., 2020). However, characterization of bugs in COVID-19 software remains an unexplored area.

The goal of this paper is to help practitioners and researchers improve the quality of COVID-19 software through an empirical study of open source software projects related to COVID-19.

We answer the following research questions:

-RQ1: What categories of open source COVID-19 software projects exist?
-RQ2: What categories of bugs appear in COVID-19 software projects? How frequently do the identified bug categories appear? What is the resolution time for each bug category?

We conduct an empirical study with 129 open source COVID-19 software projects hosted on GitHub. First, we apply qualitative analysis (Saldana, 2015) on the README files of the collected open source software (OSS) projects to identify what categories of OSS projects exist related to COVID-19. Next, we apply qualitative analysis on 550 bug reports from the collected OSS projects to identify bug categories. We also quantify the frequency and resolution time of each bug category across the identified project categories. An overview of our paper is available in Figure 2.

Contributions: We list our contributions as follows:

-A categorization of bugs that appear in COVID-19 software projects;
-A categorization of OSS projects related to COVID-19;
-An empirical study that identifies what category of bugs appear for what category of COVID-19 software projects; and
-A curated dataset which maps each identified bug report to the identified bug categories.

We organize the rest of the paper as follows: we provide background and discuss related work in Section 2. We provide the methodology and results for RQ1 and RQ2 in Sections 3 and 4, respectively. We discuss our results with a summary of our findings in Section 5. We provide the limitations of our paper in Section 6. Finally, we conclude the paper in Section 7.

In this section, we first provide background on COVID-19 in Section 2.1 and briefly describe related research in Section 2.2.

COVID-19 stands for 'Coronavirus disease 2019' (John Hopkins University, 2020). COVID-19 is an infectious disease that causes severe respiratory problems for infected human beings. The first COVID-19 patient was reported in December 2019 in Wuhan, China (John Hopkins University, 2020). Since then the disease has spread rapidly. To date, 6,108,525 cases have been reported across 188 countries, resulting in 370,247 deaths (John Hopkins University, 2020).

COVID-19 is highly contagious (Sanche et al., 2020) and is caused by a strain of Coronavirus called severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) (Gorbalenya, 2020). Disagreements exist amongst researchers on how SARS-CoV-2 was transmitted to human beings. By using genome sequence similarity, a group of researchers speculate that the virus was transmitted to human beings from pangolins (Wong et al., 2020). Researchers have found the SARS-CoV-2 virus to have 92% resemblance with the Coronavirus found in pangolins. Pangolins are nocturnal mammals found in Asia and Sub-Saharan Africa. Other researchers speculate that SARS-CoV-2 was transmitted from horseshoe bats and civets. Researchers from McMaster University found the SARS-CoV-2 virus to share 99.8% of its genome with a civet Coronavirus (Cyranoski, 2020). From another phylogenetic analysis, researchers observed that a virus from horseshoe bats has a 96% resemblance to the SARS-CoV-2 virus (Zhou et al., 2020).

COVID-19 is a human transmissible disease. Transmission occurs via respiratory droplets from coughs and sneezes within a range of ∼6 feet. The virus can also be transmitted to a human being via contaminated surfaces. What makes COVID-19 easier to transmit is that infected human beings may not exhibit symptoms until 2 to 14 days after infection (John Hopkins University, 2020). Symptoms of COVID-19 include cough, shortness of breath, fever, sore throat, and loss of taste or smell.

As of May 31, 2020, no vaccine exists for COVID-19. To prevent the spread, health experts have strongly recommended maintaining personal hygiene, e.g., frequent hand washing, social distancing, and mandatory lockdown when necessary (John Hopkins University, 2020).

Researchers (Kissler et al., 2020) have provided evidence that supports the recurring nature of COVID-19. About the recurrence of COVID-19, Kissler et al. (Kissler et al., 2020) stated that "a resurgence in contagion could be possible as late as 2024".

Our paper is related to prior research that has focused on computing research related to COVID-19 and on characterization of bugs in OSS projects. We briefly describe each of these areas in the following subsections:

Our paper is related to recent computing research on COVID-19. Since the outbreak of COVID-19 in December 2019, researchers have conducted extensive research on understanding the spread of the disease through the lens of computer science. We describe related work as follows:

-Visualization techniques: Dey et al. (Dey et al., 2020) constructed a visualization tool called Visual Exploratory Data Analysis (VEDA) to understand the spread of COVID-19.
-Machine learning and statistical modeling: Naudé (Naudé, 2020) identified research areas where machine learning can be applied to combat COVID-19: (i) early notifications, (ii) tracking and prediction, (iii) data dashboards, (iv) diagnosis and prognosis, (v) treatments and cures, and (vi) social control.

Researchers (Yang and Wang, 2020; Tamm, 2020) in separate studies have used statistical models to understand the COVID-19 outbreak. Yang and Wang (Yang and Wang, 2020) proposed a mathematical model to understand the COVID-19 outbreak in Wuhan, where the virus originated. They observed the disease to be endemic and advocated for long-term disease prevention and intervention public health programs. Tamm (Tamm, 2020) constructed a mathematical model to understand the outbreak in Moscow using five scenarios based on control measures. Tamm (Tamm, 2020) reported that fatality due to COVID-19 would remain extremely high and that healthcare institutions would be overloaded. Randhawa et al. (Randhawa et al., 2020) applied machine learning to classify pathogens, and observed evidence that supports the hypothesis that COVID-19 originated from a bat, as their model classified the COVID-19 virus as 'Sarbecovirus', a sub-category within 'Betacoronavirus'. Rao and Vazquez (Rao and Vazquez, 2020) used machine learning algorithms to identify potential COVID-19 patients using mobile web-based survey data. Currie et al. (Currie et al., 2020) identified challenges in modeling the COVID-19 pandemic, which included quarantine strategies and case isolation, social distancing measures and applications, lockdown management, and testing for COVID-19. Pandey et al. (Pandey et al., 2020) used machine learning to construct a contact tracing mobile application for COVID-19 that uses the smartphone's sensor data and the self-assessment of the smartphone user. Santosh (Santosh, 2020) advocated for the usage of active learning and multi-modal data to identify COVID-19 outbreaks, as the pandemic is worldwide and differences in location can impact the performance of models used in forecasting.

-Robotics: Yang et al. advocated for the usage of robotics to help combat COVID-19, as robots can be deployed to deliver food and medicine, as well as to disinfect infrastructure, such as medical centers and schools.

The above-mentioned discussion shows a lack of research that has characterized bugs in COVID-19 software projects. We address this research gap in our paper.

Our paper is also related to prior research that has characterized bugs in OSS. Mockus et al. (Mockus et al., 2002) studied the nature of contributions in the OSS projects Apache and Mozilla. They (Mockus et al., 2002) observed that contributors who submit bug reports are approximately 8.2 times higher in number than contributors who address bugs in bug reports. Ma et al. (Ma et al., 2017) investigated Python GitHub projects that are used in the scientific domain, and observed developers to use stack traces, as well as communicate with upstream developers, to identify root causes of bugs. Zhang et al. (Zhang et al., 2019) examined bug reports for mobile and desktop software hosted on GitHub, and identified differences in how the reports are constructed. Ray et al. (Ray et al., 2014) studied the correlations between bugs and the language in which the software is developed, and reported a modest correlation using an empirical study of 729 GitHub projects. Categorization of domain-specific OSS bugs has also been investigated: Thung et al. (Thung et al., 2012), Garcia et al. (Garcia et al., 2020), Wan et al. (Wan et al., 2017), Islam et al. (Islam et al., 2019), and Rahman et al. in separate research papers used OSS projects to classify bug categories for, respectively, machine learning, autonomous vehicle, blockchain, deep learning, and IaC.

We take motivation from the above-mentioned research and study the following aspects of COVID-19 software bugs: (i) categories of bugs; (ii) frequency of identified bug categories; (iii) resolution time of identified bug categories; and (iv) categories of software projects.

In this section, we answer "RQ1: What categories of open source COVID-19 software projects exist? ". We define COVID-19 software projects as software projects used for analysis and mitigation of COVID-19. We hypothesize multiple categories of COVID-19 software projects to exist in the OSS domain. We validate our hypothesis by systematically categorizing COVID-19 software projects. Our categorization will provide insights on how the software development community has responded to the COVID-19 pandemic.

We answer RQ1 by completing the following steps:

We conduct our empirical analysis by collecting COVID-19 software projects hosted on GitHub. To collect these projects we use GitHub's search utility (GitHub, 2020c), where we first identify projects tagged as 'covid-19'. We use the search string 'covid-19', as it is a topic designated for COVID-19 by GitHub (GitHub, 2020a). Upon collection of the projects we apply a set of filtering criteria so that we can identify projects that contain sufficient data for analysis. We describe the filtering criteria below:

-Criterion-1: The project must have at least 2 developers. Our assumption is that this criterion will filter out projects used for personal purposes.
-Criterion-2: The project has at least 5 open issues. We use this filtering criterion to identify projects that are actively maintained.
-Criterion-3: The project must have at least two commits per month. Munaiah et al. (Munaiah et al., 2017) used the threshold of at least two commits per month to determine which projects have enough development activity for software organizations. We use this threshold to filter out projects with short development activity.
-Criterion-4: The README of the project is written in English. README files of projects related to COVID-19 can be non-English. We do not include non-English projects, as the raters who perform the categorization are not familiar with non-English languages, such as Spanish and Cantonese.
-Criterion-5: The project is actually related to COVID-19. Practitioners can mislabel projects using the 'topic' feature of GitHub. For example, from manual inspection we observe the 'RehanSaeed/Schema.NET' project to be tagged as 'covid-19', even though it is used to convert blob objects into C# classes.
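The five criteria above can be read as a conjunction of checks over per-project metadata. The following is a minimal sketch of that reading, not the authors' tooling: the `Project` record, its field names, and the example projects are hypothetical, Criterion-3 is assumed to be measured as an average commit rate, and Criterion-4 and Criterion-5 are represented as flags because the paper describes them as judged by manual inspection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    """Hypothetical per-project metadata; all field names are assumptions."""
    name: str
    developers: int
    open_issues: int
    commits_per_month: float   # assumed to be an average over the project's lifetime
    readme_is_english: bool    # Criterion-4, decided by manual inspection
    related_to_covid19: bool   # Criterion-5, decided by manual inspection


def passes_filters(p: Project) -> bool:
    """Return True only if the project satisfies all five criteria."""
    return (
        p.developers >= 2             # Criterion-1
        and p.open_issues >= 5        # Criterion-2
        and p.commits_per_month >= 2  # Criterion-3
        and p.readme_is_english       # Criterion-4
        and p.related_to_covid19      # Criterion-5
    )


def filter_projects(projects: List[Project]) -> List[Project]:
    """Keep the projects that pass every criterion, preserving input order."""
    return [p for p in projects if passes_filters(p)]


# Made-up candidates, used only to exercise the filter.
candidates = [
    Project("example/dashboard", 4, 12, 9.0, True, True),
    Project("example/solo-notebook", 1, 7, 3.0, True, True),      # fails Criterion-1
    Project("example/mislabeled-tool", 6, 20, 15.0, True, False),  # fails Criterion-5
]
selected = filter_projects(candidates)
print([p.name for p in selected])
```

Keeping the manually judged criteria as explicit flags makes it visible which parts of the filtering could be automated and which depend on a human rater.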

We apply a qualitative analysis technique called open coding (Saldana, 2015) on the content of the README files of each of the projects downloaded in Section 3.1.1. README files describe the content of the project and give GitHub users an overview of the software project (Prana et al., 2019). We hypothesize that by systematically analyzing the content of the README files we can derive what types of software projects are developed in relation to COVID-19. In open coding, a rater identifies and synthesizes patterns within unstructured text (Saldana, 2015). We select open coding because we can obtain detailed information on the software project categories. We use a hypothetical example to demonstrate our process of open coding in Figure 3.

First, we collect text from the README files of each of the collected projects from Section 3.1.1. Next, we extract text snippets that describe the purpose of the software project. For example, from the raw text 'The COVID-19 Vulnerability Index (CV19 Index) is a predictive model that identifies people who are likely to have a heightened vulnerability to severe complications from COVID-19' we extract the text snippet 'a predictive model', as the extracted text snippet describes the purpose of the software project. Next, from the text snippets 'a predictive model' and 'modelling estimated deaths' we generate an initial category called 'Models to predict'. Two initial categories, 'Models to predict' and 'Models to understand', are combined into one category, 'Statistical modeling', as they both indicate the descriptions of the software projects to be related to statistical modeling. The first and second author conduct the open coding process separately. The first and second author have, respectively, 10 and 6 years of experience in software engineering and have experience in conducting open coding upon software project artifacts, such as commit messages and Stack Overflow posts (Farhana et al., 2019). Upon completion of the open coding process, the first and second author identify agreements and disagreements. Disagreements are resolved upon discussion. Agreement rate is calculated using Cohen's Kappa (Cohen, 1960). During the discussion phase both authors present their justification, and recheck the category derivation based on the discussion and by revisiting the content. The mapping determined upon discussion is considered final. One project can map to multiple categories.

We apply closed coding (Crabtree and Miller, 1999) to identify which project maps to the identified categories from Section 3.1.2. Closed coding is a qualitative analysis technique where a rater maps an artifact to a pre-defined category by inspecting the artifact (Crabtree and Miller, 1999). The first and second author separately conduct closed coding on the collected README files. After completing the closed coding process the first and second authors identify agreements and disagreements. Agreement rate is recorded using Cohen's Kappa (Cohen, 1960). Disagreements are resolved through discussion. During the discussion phase both authors present their justification for disagreements. Next, the authors recheck the labeling based on the justification and content analysis. The categorization determined upon discussion is considered final.

The derived categories are susceptible to the bias of the first and second author. We mitigate this limitation by allocating an additional rater, who applied closed coding to a subset of the README files. The additional rater, who is not an author of the paper, is a fourth year PhD student in the Department of Computer Science at Tennessee Technological University. The rater has 2 years of professional experience in software engineering and has conducted qualitative analysis on software artifacts, such as bug reports. We randomly allocate a set of 100 README files mined from 100 projects to the rater. The rater applies closed coding on the content of the README files to identify the mapping between each project and the identified categories. Upon completion of closed coding we calculate Cohen's Kappa (Cohen, 1960) between the rater and the first author, as well as between the rater and the second author, separately.
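For readers unfamiliar with Cohen's Kappa, the statistic compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. The sketch below is an illustration only, under the simplifying assumption that each rater assigns exactly one category per item (the paper allows a project to map to multiple categories, which this sketch does not model). The function name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's Kappa for two raters who each assign one label per item."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(labels_a)
    # Observed agreement: fraction of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over categories of the product of the
    # two raters' marginal proportions for that category.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four README files.
rater_1 = ["Aggregation", "Mining", "Aggregation", "Education"]
rater_2 = ["Aggregation", "Mining", "Mining", "Education"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

In this example the raters agree on 3 of 4 items (observed agreement 0.75), the chance agreement is 5/16, and the resulting Kappa is 7/11.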

We answer RQ1 by first providing summary statistics of our dataset in Section 3.2.1. Next, we report categories of the projects in Section 3.2.2.

Altogether we download 129 projects for analysis. Using the search feature we identify 3,276 public projects, upon which we apply our filtering criteria. A complete breakdown of our filtering criteria is available in Table 1. Summary statistics of the projects are available in Table 2. 'Languages' in Table 2 corresponds to the count of main programming languages of the collected projects as determined by GitHub's linguist tool (GitHub, 2020b). Example languages include JavaScript, Python and R.

A temporal evolution of the 129 COVID-19 software projects based on creation date is available in Figure 4. We observe a sharp increase in project creation after Feb 29, 2020.

We identify 7 categories of COVID-19 software projects. We describe each of the categories below in an alphabetic order:

I: Aggregation: This category includes software projects that curate data related to COVID-19 and present collected COVID-19 data in an aggregated format using visualizations. The purpose of these projects is to help users understand the spread of the COVID-19 disease over time and location. Software projects that belong to this category can be country specific, as done in 'juanmnl/covid19-monitor' (juanmnl, 2020) and 'dsfsi/covid19za' (dsfsi, 2020b) for Ecuador and South Africa, respectively. Aggregation of COVID-19 data can also be at a global level, for example, 'boogheta/coronaviruscountries' (boogheta, 2020) is software that aggregates COVID-19 data across the world and allows software users to compare the reported cases on a country-by-country basis.

II: Education: This category includes projects that provide utilities for educating people about COVID-19. Lack of knowledge related to infections and symptoms can contribute to rapid spreading of COVID-19. The purpose of these projects is to build software where users can ask questions and obtain answers. We observe two categories of software: first, question and answer websites similar to Stack Overflow, such as 'nthopinion/covid19' (nthopinion, 2020), where users can ask questions about COVID-19, and other users answer such questions. Second, we observe bot-specific software, such as 'deepsetai/COVID-QA' (deepset ai, 2020), that automatically provides answers to questions related to COVID-19.

III: Medical equipment: This category includes projects that curate and maintain source code for the design and implementation of medical equipment used to treat COVID-19. The purpose of these projects is to create designs of COVID-19 related medical equipment, such as ventilators, at scale, so that the growing need for medical equipment in hospitals is satisfied. One example of such a repository is 'makers-for-life/makair' (makers-for life, 2020), which states the following in its README page: "Aims at helping hospitals cope with a possible shortage of professional ventilators during the outbreak. Worldwide. ... We target a per-unit cost well under 500 EUR, which could easily be shrunk down to 200 EUR or even 100 EUR per ventilator given proper economies of scale, as well as choices of cheaper on-the-shelf components". The project includes the design of the proposed ventilators as CAD files, as well as relevant firmware available as C++ code files.

Another example is 'popsolutions/openventilator' (popsolutions, 2020), which aims to provide cheap but reliable ventilators to treat COVID-19 in economically under-developed regions of the world. The software project originated from a Facebook group called 'Open Source COVID-19 Medical Supplies', where members discussed the scarcity of ventilators and the importance of creating cheap ventilators through efficient design. In the project we notice that developers create, build, and share designs using OpenSCAD scripts. OpenSCAD is an open source tool to build computer-aided design (CAD) objects.

IV: Mining: This category includes projects that provide APIs to mine COVID-19 data from data sources, such as the US Centers for Disease Control and Prevention (CDC) (CDC, 2020), the World Health Organization (WHO) (WHO, 2020), and data reported from local institutions. The purpose of this category of software is to provide utilities for software developers so that they can get real-time access to COVID-19 data to build the aggregation software discussed above. Because of the nature of the pandemic, access to real-time data is pivotal for accurate aggregation and analysis. The mining tools help developers to get such access. Mining software can be location specific, for example 'dsfsi/covid19africa' (dsfsi, 2020a) is dedicated to curating and collating COVID-19 related data for African countries.

V: User tracking: This category includes software projects that collect information from users regarding their COVID-19 infection status. Tracking of user information can happen voluntarily, where the user self-reports COVID-19 infection status. The 'enigmampc/SafeTrace' (enigmampc, 2020) software is an example where users self-report their infection status as well as location history. Tracking of user information can also be done using inference, as done in 'OpenMined/covid-alert' (OpenMined, 2020), where the software collects the user's location information to predict if the user is in a location with high infection density. One utility of these projects is to identify high-risk locations so that users can understand which nearby locations to avoid. Self-reporting software has yielded benefits for China and South Korea.

VI: Statistical modeling: This category includes software that uses statistical models to predict attributes related to COVID-19. The purpose of the projects is to make predictions for the future based on existing data. Example usages of statistical models include (i) predicting death rate as done in 'ImperialCollegeLondon/covid19model' (ImperialCollegeLondon, 2020), (ii) automating the process of lung segmentation with computerized tomography (CT) scans, as done in 'JoHof/lungmask' (JoHof, 2020), (iii) predicting the impact of the COVID-19 pandemic on hospital demands as done in 'neherlab/covid19_scenarios' (neherlab, 2020), and (iv) predicting the presence of COVID-19 with X-ray images using deep learning as done in 'elcronos/COVID-19' (elcronos, 2020).

VII: Volunteer management: This category includes software used to efficiently manage volunteering effort. The purpose of this software is to build platforms so that users can volunteer and participate in activities to help distressed families and communities. One example is the 'covid-volunteers' (helpwithcovid, 2020) software, which provides a web portal where users can sign up for 650 projects that include donation of masks, personal protective equipment (PPEs), and testing for COVID-19. Platforms can be global, such as 'covid-volunteers', and also regional, for example 'Applifting/pomuzeme.si' (Applifting, 2020) creates a web portal so that people inside the Czech Republic can volunteer.

Based on project count, aggregation is the most frequent category. Along with project count, we provide summary statistics of projects that belong to each category in Table 3. We also observe that, on average, user tracking projects are released more frequently than other project types.

We identify four software projects that belong to multiple categories. As an example, the 'soroushchehresa/awesome-coronavirus' (soroushchehresa, 2020) project belongs to the categories: aggregation, mining, and statistical modeling.

We report agreement rate for three steps: open coding, closed coding, and rater verification.

Open coding: After completing open coding, the first and second author identified 7 and 10 categories, respectively. The agreement rate is 70.0%, and the Cohen's Kappa is 0.7, indicating 'substantial' agreement (Landis and Koch, 1977). The authors disagreed on 'Volunteering software related to local communities', 'Education bots', and 'Aggregated visualizations', the additional categories identified by the second author. Upon discussion both authors agree that the category 'Education bots' can be merged with 'Education', as it encompasses all categories of knowledge software, such as bots and web applications. The authors also agreed that 'Volunteering software related to local communities' can be merged with 'Volunteer management' and 'Aggregated visualizations' can be merged with 'Aggregation', as 'Aggregation' includes software that aggregates COVID-19 data and displays aggregated data with visualizations.

Closed coding: During closed coding the first and second author mapped each of the 129 projects to an existing category. The agreement rate is 93.8%. The Cohen's Kappa is 0.92. The authors disagreed on the labeling of 8 projects; these disagreements were resolved through discussion.

Rater verification: We also measured the agreement rate between an additional rater and the authors for categorizing README files of projects. Cohen's Kappa between the additional rater and the first author for a randomly selected set of 50 README files is 0.73, indicating 'substantial' agreement (Landis and Koch, 1977) . Cohen's Kappa between the additional rater and the second author for a randomly selected set of 50 README files is 0.73, indicating 'substantial' agreement (Landis and Koch, 1977) . The agreement rate between the additional rater and the first and second author is respectively, 78.0% and 76.0%.
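The reported figures can be sanity-checked with simple arithmetic: 8 disagreements out of 129 projects corresponds to (129 - 8) / 129 ≈ 93.8% agreement, matching the closed coding result. The sketch below reproduces that calculation and maps Kappa values to the descriptive bands of Landis and Koch (1977). The function names are hypothetical, and the band boundaries follow the commonly cited Landis and Koch scale rather than anything stated in this paper.

```python
def percent_agreement(total_items: int, disagreements: int) -> float:
    """Percentage of items on which two raters assigned the same label."""
    if total_items <= 0 or not 0 <= disagreements <= total_items:
        raise ValueError("invalid counts")
    return 100.0 * (total_items - disagreements) / total_items


def landis_koch_label(kappa: float) -> str:
    """Descriptive strength of agreement for a Kappa value (Landis and Koch scale)."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1.0")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    raise AssertionError("unreachable")


# Closed coding: 129 projects, 8 disagreements.
closed_coding_agreement = round(percent_agreement(129, 8), 1)
print(closed_coding_agreement)

# Kappa values reported in this section.
for value in (0.70, 0.73, 0.92):
    print(value, landis_koch_label(value))
```

Under this scale, the Kappa values of 0.7 and 0.73 fall in the 'substantial' band, consistent with the text, and the closed coding Kappa of 0.92 falls in the 'almost perfect' band.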

In this section, we answer "RQ2: What categories of bugs appear in COVID-19 software projects? How frequently do the identified bug categories appear? What is the resolution time for each bug category?" A categorization of bugs for COVID-19 software projects can inform practitioners and researchers about how software related to COVID-19 is developed and in which areas they can help. Furthermore, educators can learn about the software bugs that occur in software related to a pandemic and disseminate these findings in the classroom. Frequency of the identified bug categories can help us understand what categories of software tend to contain what types of software bugs and provide quality improvement suggestions accordingly. Quantifying the resolution time for bugs in software projects can help software engineering researchers provide actionable guidelines to practitioners. For example, Wan et al. (Wan et al., 2017) observed that for blockchain software projects security bugs can take longer to fix compared to other bug categories. Based on their findings, Wan et al. (Wan et al., 2017) recommended that blockchain project maintainers adopt security analysis and repair tools to fix security bugs quickly.

In this section we provide the methodology to identify bug categories and to quantify bug category frequency and bug resolution time.

Methodology to Identify Bug Categories: We identify bug categories using the following steps:

-Step#1-Filtering: We collect the 4,405 issue reports from the 129 projects and manually inspect each issue report. We do not rely on an automated approach, such as keyword search or bug labels, as automated approaches tend to generate false positives, which may bias research results (Herzig et al., 2013) . While inspecting each issue report we use the following IEEE definition for bugs: "an imperfection that needs to be replaced or repaired " (IEEE, 2010), similar to prior work. By completing this step we obtain a set of closed issue reports that correspond to bugs. We use closed reports because open bug reports are often incomplete and may not help in identifying bugs (Wan et al., 2017) . The first and second author individually inspect the issue reports to identify which of them correspond to bugs. We record agreement rate and Cohen's Kappa (Cohen, 1960) between the first and second author. Disagreements between the first and second author are resolved through discussion. The process is subjective and susceptible to the bias of the first and second author. We mitigate the bias by using an additional rater, who inspected 100 randomly selected issue reports and classified them as bug reports and non-bug reports. The additional rater is a fourth year PhD student at Tennessee Technological University who is also involved in rater verification for RQ1.

-Step#2-Open coding: We apply open coding (Saldana, 2015) on the content of the collected bug reports from Step#1. Our open coding process for deriving bug categories is similar to our process of deriving project categories illustrated in Figure 3 . First, we extract raw text from bug report titles and descriptions, from which we generate initial categories. Next, we merge initial categories based on their commonalities and generate categories. Similar to deriving project categories, the first and second author separately apply the process of open coding to generate bug categories. Upon completion of the process we quantify agreement rate and measure Cohen's Kappa (Cohen, 1960) . For disagreements we conduct discussion. The categories generated upon discussion are considered final.

Methodology to Quantify Bug Category Frequency: We apply the following steps to quantify the frequency of identified bug categories:

-Step#1-Closed coding: We apply closed coding (Crabtree and Miller, 1999) to map each identified category to the bug reports that we study. The first and second author separately apply closed coding for the collected bugs from Step#1. Upon completion, we calculate the agreement rate and Cohen's Kappa (Cohen, 1960) . Disagreements are resolved through discussion.

-Step#2-Metric calculation: We quantify the frequency of the identified bug categories using two metrics: 'Proportion of Bugs Across All Projects (BugPropAll)' and 'Proportion of Bugs For a Certain Project Category (BugPropCateg)'. We use Equations 1 and 2 to calculate 'BugPropAll' and 'BugPropCateg', respectively. The 'BugPropAll' metric using Equation 1 provides a holistic overview of the frequency of identified bug categories. The 'BugPropCateg' metric using Equation 2 provides a granular overview of bug category frequency for each software project type identified in Section 3.2.2.

-Step#3-Rater verification: The use of the first and second author as raters to conduct closed coding is susceptible to rater bias. We mitigate this limitation by allocating an additional rater. We assign 250 randomly selected bug reports to the additional rater, who applies closed coding. We provide the additional rater with a document that provides definitions of each identified category with examples. Similar to our process of rater verification for project categorization, the additional rater is a fourth year PhD student in the Department of Computer Science at Tennessee Technological University. The same PhD student is involved in the rater verification process for identifying project categories and labeling issue reports as bug reports.

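$$\mathrm{BugPropAll}(x) = \frac{\text{\# of bug reports labeled as category } x}{\text{total \# of bug reports}} \times 100\% \qquad (1)$$

$$\mathrm{BugPropCateg}(x, y) = \frac{\text{\# of bug reports labeled as } x \text{, belonging to project type } y}{\text{total \# of bug reports for project type } y} \times 100\% \qquad (2)$$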
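Equations 1 and 2 can be sketched in code as follows. This is a minimal illustration under stated assumptions: the record layout (a `bug_category` string and a `project_types` set per bug report), the function names, and the four sample records are hypothetical and are not taken from the paper's dataset.

```python
def bug_prop_all(bug_reports, category):
    """Equation 1: share (in %) of all bug reports labeled with `category`."""
    if not bug_reports:
        raise ValueError("no bug reports")
    labeled = sum(1 for r in bug_reports if r["bug_category"] == category)
    return 100.0 * labeled / len(bug_reports)


def bug_prop_categ(bug_reports, category, project_type):
    """Equation 2: share (in %) of `category` among reports of one project type."""
    # A project may belong to several project types, so membership is a set test.
    in_type = [r for r in bug_reports if project_type in r["project_types"]]
    if not in_type:
        raise ValueError(f"no bug reports for project type {project_type!r}")
    labeled = sum(1 for r in in_type if r["bug_category"] == category)
    return 100.0 * labeled / len(in_type)


# Hypothetical records (illustration only).
reports = [
    {"bug_category": "UI", "project_types": {"aggregation"}},
    {"bug_category": "data", "project_types": {"mining", "aggregation"}},
    {"bug_category": "UI", "project_types": {"education"}},
    {"bug_category": "algorithm", "project_types": {"statistical modeling"}},
]

print(bug_prop_all(reports, "UI"))                   # 50.0
print(bug_prop_categ(reports, "UI", "aggregation"))  # 50.0
print(bug_prop_categ(reports, "data", "mining"))     # 100.0
```

Because one bug report can count toward several project types, the per-type denominators in Equation 2 need not sum to the total number of bug reports used in Equation 1.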

We use the opening and closing timestamps for each closed bug report in our dataset to quantify the resolution time for each bug category, similar to Wan et al. (Wan et al., 2017) . We calculate bug resolution time as the number of hours that have elapsed between when the bug report was opened and when it was closed and not re-opened again, as per our dataset, which was downloaded on April 04, 2020. We report bug resolution time for all bug categories, as well as for bug reports that belong to certain categories of software projects.
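The resolution time computation described above can be sketched as follows. This is an illustrative sketch, not the authors' script: the field names, the sample records, and the timestamp format (ISO 8601 in UTC, e.g. `2020-03-20T08:00:00Z`) are assumptions, and `closed_at` is assumed to already hold the final closing time of a report that was not re-opened.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format


def resolution_hours(opened_at, closed_at):
    """Hours elapsed between opening and final closing of a bug report."""
    opened = datetime.strptime(opened_at, TIMESTAMP_FORMAT)
    closed = datetime.strptime(closed_at, TIMESTAMP_FORMAT)
    return (closed - opened).total_seconds() / 3600.0


def summarize_by_category(bug_reports):
    """Map each bug category to (minimum, median, maximum) resolution hours."""
    hours = defaultdict(list)
    for report in bug_reports:
        hours[report["bug_category"]].append(
            resolution_hours(report["opened_at"], report["closed_at"])
        )
    return {cat: (min(v), median(v), max(v)) for cat, v in hours.items()}


# Hypothetical closed bug reports (illustration only).
reports = [
    {"bug_category": "security", "opened_at": "2020-03-20T08:00:00Z",
     "closed_at": "2020-03-22T08:00:00Z"},
    {"bug_category": "UI", "opened_at": "2020-03-21T10:00:00Z",
     "closed_at": "2020-03-21T10:03:00Z"},
    {"bug_category": "UI", "opened_at": "2020-03-21T12:00:00Z",
     "closed_at": "2020-03-21T14:00:00Z"},
    {"bug_category": "UI", "opened_at": "2020-03-22T00:00:00Z",
     "closed_at": "2020-03-22T06:00:00Z"},
]

summary = summarize_by_category(reports)
print(summary["security"])  # (48.0, 48.0, 48.0)
```

Grouping the same hours by project type instead of bug category would give the per-project-category breakdown.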

We answer RQ2 by first providing a breakdown of how we obtained our bug reports in Tables 4 and 5. As shown in Table 5 , the categories with the most and least bug reports are, respectively, aggregation and medical equipment. One project can belong to multiple categories, which is why the per-category counts of bug reports do not sum to 550. On average, the number of bug reports per project varies from 1.3 to 6.4, as shown in the 'BugReport/Project' column.

Next, we describe the identified bug categories in Section 4.2.1 by applying open coding on the collected 550 bug reports. The frequency of the identified bug categories is provided in Section 4.2.2. We provide details of rater verification in Section 4.2.3. Finally, we provide the bug resolution time in Section 4.2.4. 

We identify 8 bug categories, which we describe below alphabetically: I: Algorithm:: This category corresponds to bugs when implementation of an algorithm does not follow expected behavior. An algorithm is a sequence of computational steps that transform input into output (Cormen et al., 2009) . We observe algorithm bugs to include two sub-categories: (i) bugs related to statistical modeling algorithms, where statistical modeling results are incorrect due to incorrect assumptions and/or implementations, and (ii) bugs related to incorrect logic implemented in the software. Algorithm bugs have been previously observed in autonomous vehicle software (Garcia et al., 2020) and machine learning software (Thung et al., 2012) .

Example:

We provide examples for the two sub-categories:

-Statistical modeling: In a bug report titled "Death rates should increase when ICU's are overwhelmed " (Begley, 2020a) , a practitioner describes how an incorrect assumption can result in incorrect modeling behavior. The practitioner discusses that bed space is correlated with estimation of fatality rate. When the bed space of hospitals is exhausted, hospitals will not be able to treat new COVID-19 patients, which could potentially increase the fatality rate. The bug report provides evidence that if the context of COVID-19 is not correctly incorporated in statistical models, the models will provide incorrect results. Incorrect statistical models can be consequential, as countries are adopting public health policies specific to COVID-19. For example, researchers have critiqued the statistical models derived by the Institute for Health Metrics and Evaluation at the University of Washington (IHME), and advised USA policy makers to use the modeling results with caution (Begley, 2020b) .

-Incorrect logic: In a bug report titled "Fix Prefecture Sorting" (reustle, 2020), a practitioner describes a sorting bug that occurs when trying to visualize COVID-19 cases based on prefectures in Japan. A prefecture is an administrative jurisdiction in a country similar to a state or province (Hu and Qian, 2017) . The bug occurred due to incorrect logic that did not perform sorting by prefectures.

II: Data:: This category corresponds to bugs that occur during mining and storage of COVID-19 data. As discussed in Section 3.2.2 we observed our dataset to include projects that mine and aggregate COVID-19 data. We observe four sub-categories of data bugs: (i) storage: bugs that occur while storing data in a database, (ii) mining: bugs that occur while retrieving data from data APIs, (iii) location: bugs where location information in stored data is incorrect, and (iv) time series: bugs that correspond to missing data for a certain time period. Data bugs have been previously reported for deep learning software (Islam et al., 2019) .

Example:

We provide examples for each of these sub-categories below:

-Storage: In a bug report titled "Temperature data not saved in the backend " (pavel ilin, 2020), a practitioner describes a bug where patient temperature data is entered in the front-end but not stored in the database.

-Mining: Bugs can occur when mining data from data sources. A practitioner describes a mining bug in a bug report titled "CDC Children scraper is outdated " (Timoeller, 2020) . The mining tool mines data related to children affected by COVID-19.

-Location: In a bug report titled "Rajasthan District names are wrong", a practitioner describes that the inserted location data for an Indian state called 'Rajasthan' is wrong (SinghRajenM, 2020).

-Time series: Missing data for a project was reported in a bug report titled "Data has a gap between 2020-3-11 and 2020-3-24 " (zbraniecki, 2020) .
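A time series bug of this kind, i.e., missing data for a certain time period, could be surfaced with a simple check such as the sketch below. The sketch assumes a daily reporting cadence and uses a hypothetical `find_gaps` helper with made-up dates; it is not code from any of the studied projects.

```python
from datetime import date, timedelta


def find_gaps(observed_days):
    """Return (first_missing_day, last_missing_day) pairs in a daily series."""
    days = sorted(set(observed_days))
    gaps = []
    for previous, current in zip(days, days[1:]):
        # More than one day between consecutive observations means a gap.
        if (current - previous).days > 1:
            gaps.append((previous + timedelta(days=1),
                         current - timedelta(days=1)))
    return gaps


# Hypothetical observation dates (illustration only).
observed = [
    date(2020, 3, 9), date(2020, 3, 10), date(2020, 3, 11),
    date(2020, 3, 24), date(2020, 3, 25),
]

for start, end in find_gaps(observed):
    print(f"missing data from {start} to {end}")
```

A check like this only detects missing days; it cannot tell whether the gap stems from the upstream data source or from the mining tool.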

III: Dependency :: This category corresponds to bugs that occur when execution of the software is dependent on a software artifact that is either missing or incorrectly specified. For COVID-19 projects an artifact can be an API or a build artifact. Dependency bugs were previously reported in IaC , machine learning (Thung et al., 2012) , and audio processing software (Cotroneo et al., 2013) .

Example: In a bug report titled "Missing PostGIS " (vaclavpavlicek, 2020), a practitioner describes that installation and execution of the software fail because a software package called 'PostGIS' is missing. PostGIS is used to store spatial and geographic measurements, such as area, distance, polygon, and perimeter, in PostgreSQL databases.

IV: Documentation:: This category corresponds to bugs that occur when incorrect and/or incomplete information is specified in release notes, maintenance notes, and documentation files, such as README files. Documentation bugs have been reported in prior research related to autonomous vehicle (Garcia et al., 2020) and IaC .

Example: In a bug report titled "Missing code of conduct", a practitioner describes a 'CODE OF CONDUCT.md' file to be missing in a Markdown file that describes how practitioners can contribute to the project (mdeous, 2020).

V: Performance:: This category corresponds to bugs that cause performance discrepancies for the software. Performance bugs are manifested in slow response of the web or mobile app. Performance bugs have been previously reported in prior research related to blockchain software (Wan et al., 2017) .

Example: In a bug report titled "Cluster animation slowing down the browser. It also takes much time", a practitioner describes how a performance bug related to an animation feature is slowing down a Firefox browser on Windows 10 (Subratappt, 2020) . The performance bug was reported for a website called 'covid19india.org', which aggregates COVID-19 data for India and displays them.

VI: Security :: This category corresponds to bugs that violate confidentiality, integrity, or availability for the software. Prior research on bug categorization has also observed security bugs to appear for blockchain software (Wan et al., 2017) , video game software (Pascarella et al., 2018) , and IaC .

Example: In a bug report titled "Fix password reset procedure" (landovsky, 2020), a practitioner describes a password reset bug, where the password reset procedure ends arbitrarily after 500 login attempts.

VII: Syntax :: This category corresponds to bugs related to the syntax of the programming languages used to develop the software. Syntax-related bugs have been reported in prior research related to IaC , deep learning (Islam et al., 2019) , and OSS GitHub software (Ray et al., 2014) .

Example: We notice bugs related to data types in 'neherlab/covid19 scenarios'. In the bug report titled "Fix types and linting errors" (ivan aksamentov, 2020), a practitioner describes how linting and type checking were disabled for the project, which led to bugs related to linting and type checking.

VIII: UI :: This category corresponds to bugs that involve the user interface (UI) of the software. UI bugs include navigation-related bugs on web pages, bugs related to accessibility, displaying incorrect images, links, and color, and responsiveness. UI bugs have been previously reported for blockchain software (Wan et al., 2017) .

Example: In a bug report titled "accessibility fixes" (abquirarte, 2020), a practitioner describes a UI bug related to accessibility. According to the bug report, a screen reader incorrectly renders the check marks and crosses in front of the "Do's and Don'ts" as M's and N's.

Based on the 'Proportion of Bugs Across All Projects (BugPropAll)' metric we observe UI bugs to be the most frequent category, whereas documentation is the least frequent category. We provide a complete breakdown of the metric in Table 6 . Data bugs have four sub-categories: storage, mining, location, and time series. The frequency for storage, mining, location, and time series is, respectively, 4.7%, 5.8%, 87.2%, and 2.3%. Algorithm bugs have two sub-categories: statistical modeling and wrong logic. The frequency for statistical modeling and wrong logic is, respectively, 42.3% and 57.7%.

We observe bug category frequency to vary for different categories of projects. We provide the 'Proportion of Bugs For a Certain Project Category (BugPropCateg)' values for each project category in Table 7 . 'AGG', 'MINE', 'STA', 'EDU', 'TRAK', 'VOL' and 'EQU', respectively, correspond to the seven project categories: aggregation, mining, statistical modeling, education, user tracking, volunteer management system, and medical equipment.

According to Table 7 , except for mining and medical equipment software, the dominant bug category is UI. One possible explanation is that the analyzed software projects have UIs, which may have contributed to the frequency of UI bugs. For mining software the dominant bug category is data bugs, i.e., bugs that occur due to storing and processing of COVID-19 data. For medical equipment software the dominant bug category is dependency. We also notice algorithm bugs to be the second most frequent bug category for statistical modeling software. Based on prior work on machine learning (Thung et al., 2012) , we expected algorithm bugs to be the most dominant category for statistical modeling. Statistical modeling software also has UIs for user interaction, and the count of UI bugs may have overshadowed the count of algorithm bugs.

We report agreement rate for four steps: issue labeling, open coding, closed coding, and rater verification. Labeling issues as bugs: While labeling collected issue reports as bug reports and non-bug reports the agreement rate is 96.5% and the Cohen's Kappa is 0.9. Open coding to identify bug categories: The first and second author identified 9 and 10 categories, respectively. The agreement rate is 72.7%, and the Cohen's Kappa is 0.70, indicating 'substantial' agreement (Landis and Koch, 1977) . The first author identified 'database' as a category not identified by the second author. Upon discussion both authors agreed that 'database' is related to data storage and belongs to the data category. The second author identified two additional categories, 'Public health data' and 'Type errors'. After discussing the definitions of all categories both authors agreed that 'Public health data' and 'Type errors' can be merged with data and syntax, respectively. Closed coding to quantify bug category frequency: During closed coding the first and second author mapped each bug report to an existing category. The agreement rate is 95.1% and the Cohen's Kappa is 0.93. The authors disagreed on the labeling of 27 bug reports, which were resolved through discussion. Rater verification: For the randomly selected 250 issue reports we allocate an additional rater who manually identified which of the issue reports are bug reports and non-bug reports. The Cohen's Kappa between the additional rater and the first author is 0.80, indicating 'substantial' agreement (Landis and Koch, 1977) . The Cohen's Kappa between the additional rater and the second author is 0.84, indicating 'almost perfect' agreement (Landis and Koch, 1977) . The agreement rate between the additional rater and the first and second author is, respectively, 89.0% and 93.0%.

We have also measured the agreement rate between an additional rater and the authors for categorizing bug reports. Cohen's Kappa between the additional rater and the first author for a randomly selected set of 250 bug reports is 0.65, indicating 'substantial' agreement (Landis and Koch, 1977) . Cohen's Kappa between the additional rater and the second author for a randomly selected set of 250 bug reports is 0.68, indicating 'substantial' agreement (Landis and Koch, 1977) . The agreement rate between the additional rater and the first and second author is respectively, 78.0% and 81.6%.
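The qualitative labels attached to the Kappa values above follow the bands proposed by Landis and Koch (1977). As an illustration, the mapping can be sketched as below; the function name is hypothetical, and the band boundaries are the commonly cited ones (below 0 'poor', 0.00-0.20 'slight', 0.21-0.40 'fair', 0.41-0.60 'moderate', 0.61-0.80 'substantial', 0.81-1.00 'almost perfect').

```python
def landis_koch_label(kappa):
    """Map a Cohen's Kappa value to its Landis and Koch (1977) agreement band."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    # Return the first band whose upper bound is not exceeded.
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label


for value in (0.65, 0.68, 0.73, 0.80, 0.84):
    print(value, landis_koch_label(value))
```

Under these bands, 0.65, 0.68, 0.73, and 0.80 fall in the 'substantial' band, while 0.84 falls in the 'almost perfect' band.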

We provide bug resolution time as measured in hours for all bug categories in Table 8 . From Table 8 we observe that, based on minimum and median bug resolution times, security bugs take the longest to resolve, followed by algorithm bugs. We also observe data bugs to take as long as 548 hours to resolve. A breakdown of bug resolution time across the six categories of projects is provided in Table 9 . The 'All' row in Table 9 shows the minimum, median, and maximum bug resolution time for all bug categories measured in hours.

In Table 9 we observe four instances where the minimum bug resolution time is less than 6 minutes (< 0.1 hours). One possible explanation can be practitioners' habit of opening a bug report after they have developed the fix for a bug (Wan et al., 2017; Thung et al., 2012) . In such cases, practitioners notice the bug early, construct the fix for the bug, and then submit the bug report by opening and closing the bug report promptly.

Median bug resolution duration for each project type and bug category is provided in Table 10 . 'AGG', 'MINE', 'STA', 'EDU', 'TRAK', 'VOL' and 'EQU', respectively, correspond to the seven project categories: aggregation, mining, statistical modeling, education, user tracking, volunteer management system, and medical equipment. We observe median bug resolution time to vary across bug categories as well as across project categories.

In this section, we first provide a summary of our findings in Section 5.1. Next, we provide a discussion on the implications of our findings in Section 5.2. 

We discuss the implications of our findings below: Security and privacy implications of user tracking software: From Table 3 we observe 9 projects to be related to user tracking. While the benefits of user tracking software have been documented for countries such as Russia and South Korea (Morning, 2020) , this category of software can have negative impacts on the privacy of end-users. Data generated from user tracking software can be leveraged for marketing purposes. To preserve privacy of user data in user tracking software we make the following recommendations:

-Policy makers should construct policies specific to COVID-19 software that collects user data.

-Practitioners who develop user tracking software should leverage existing privacy policy frameworks, such as the 'National Institute of Standards and Technology (NIST) Privacy Framework' (National Institute of Standard and Technology, 2020).

-Privacy researchers can build tools that will automatically detect and report privacy policy violations.

Evidence from Table 7 shows that security bugs exist for user tracking software. We advocate that security researchers systematically investigate whether user tracking software includes security bugs. Recent news articles suggest that user tracking software, such as contact tracing apps, may become more and more prevalent, as Apple and Google are already providing frameworks to build software that tracks user data (Apple, 2020) . Our hypothesis is that availability of these frameworks will facilitate rapid development and deployment of mobile apps that collect user data. Security weaknesses in these apps can provide malicious users the opportunity to conduct large-scale data breaches. We notice anecdotal evidence in this regard: a researcher has identified vulnerabilities in a user tracking app that could leak user location data (Greenberg, 2020) . Panelists at EuroCrypt 2020, a cryptography research conference, discussed limitations of user tracking mobile apps for COVID-19 with respect to API design, indoor location tracking, and informing users about privacy risks (EuroCrypt, 2020a; EuroCrypt, 2020b).

Towards constructing correct statistical models: From Section 4.2.1 we have observed statistical modeling bugs to exist. Bugs related to statistical modeling can be consequential because policy makers enforce public health policies based on the predictions generated by statistical models. One possible explanation for buggy statistical models is the quality of the datasets from which statistical models are built (Koerth et al., 2020) . For example, fatality prediction models that are built using the 'Diamond Princess Cruise Ship Dataset' may not be applicable for a specific geographic region with low population density. Another possible explanation is a lack of context and knowledge specific to public health, which hinders model builders from identifying appropriate independent variables to construct the models. Incorrect estimation of hospital beds from our discussion in Section 4.2.1 is one example. Other examples of independent variables related to public health include staff availability, count of known cases, and hospitalization rate (Attia, 2020) . According to a health expert (Attia, 2020) , statistical models that predicted 2.4 million deaths among US residents assumed a hospitalization rate of 15-20%, which in reality was 5%.

Based on our findings and above-mentioned explanations we make two recommendations:

-Automated testing for COVID-19 modeling: We hope to see novel research in the domain of COVID-19 that will test, in an automated manner, the correctness of statistical models used in forecasting. In recent years, we have seen research efforts that test deep learning models (Tian et al., 2018; Pei et al., 2017; Ma et al., 2018) . We expect similar research pursuits for COVID-19 statistical modeling.

-Better synergies between data science and public health practitioners: Construction and verification of COVID-19 statistical models should involve practitioners from public health and data science. Public health practitioners within a specific locality can provide necessary context that data scientists can incorporate in their statistical models.

Implications for Educators: Our findings have implications for educators involved in teaching the following topics:

-Data science: Educators who teach data science can use the examples of statistical modeling bugs to highlight the value of considering the full context and related limitations that accompany statistical modeling.

-Information security and privacy: User tracking software can be discussed in information security and privacy courses to demonstrate the value of protecting user data. Such discussion can also include privacy policy frameworks that are already in place, such as the NIST Privacy Framework (National Institute of Standard and Technology, 2020).

-Software engineering: Our categorization of bugs related to COVID-19 software development can be discussed to demonstrate that understanding and repair of bugs requires contextualization.

Benchmark for practitioners and researchers: Tables 6-10 can be used as a measuring stick by practitioners and researchers who are involved with COVID-19 software projects. Practitioners can estimate their bug resolution efforts by comparing median resolution times for bugs in their COVID-19 software projects to that of Tables 8, 9, and 10.

Compared to prior work related to blockchain and machine learning (Thung et al., 2012; Wan et al., 2017) , median bug resolution time is lower for COVID-19 software projects. We provide two possible explanations. One possible explanation is a sense of urgency: practitioners may have realized that bugs in COVID-19 software projects could hamper the analysis or mitigation of COVID-19 and therefore need immediate attention. Another possible explanation is the limitations of our dataset. The age of our software projects does not exceed four months, which may have biased median bug resolution time. We advocate for future research that will confirm or refute our explanations.

Recurrence-related implications: In Section 2.1, we have discussed evidence related to recurrence of COVID-19. We hypothesize that COVID-19's recurrence will lead to more COVID-19 software being built. Whether or not our findings hold for this newly constructed COVID-19 software can be validated through a replication of our paper. We expect to observe more categories of COVID-19 software projects as well as more bug categories that could occur with varying distributions.

Mitigation strategies: We suggest the following mitigation strategies for each of the identified bug categories:

-Algorithm: Algorithm bugs can be mitigated through peer reviews. One sub-category of algorithm bugs is bugs related to statistical models built for COVID-19. Mitigation of bugs in statistical models may require synergies between practitioners from data science and public health.

-Data: Data bugs can be mitigated by adequate accumulation of data, a strategy which requires a non-trivial amount of effort. We advocate for community effort so that COVID-19 data sources are curated.

-Dependency: Practitioners can mitigate dependency bugs by using automated tools, such as Dependabot.

-Documentation: Practitioners can mitigate documentation bugs through peer review and documentation-specific linters that can alert practitioners when incorrect information is specified in README files.

-Performance: Researchers can use research tools, such as Toddler (Nistor et al., 2013) and Clarity (Olivo et al., 2015) , to detect performance bugs in software source code. Furthermore, performance bugs in UIs can be detected by record and replay tools, such as Pounder, as suggested by researchers (Adamoli et al., 2011) .

-Security: We advocate that practitioners use automated security scanning tools applicable for the language used in the COVID-19 software repository. Security bugs can be detected and mitigated by applying security scanning tools.

-Syntax : We advocate that practitioners use existing static analysis tools to mitigate syntax bugs.

-UI : Practitioners can mitigate UI bugs by using automated UI testing tools, such as Selenium. We have noticed UI bugs related to accessibility issues, which can be mitigated by automated tools, such as Applause.

We describe the limitations of our paper as follows:

Conclusion validity: We have used raters who derived the software and bug categories. Both raters are authors of the paper. Our derived categories are susceptible to the authors' bias. We mitigate this limitation by allocating another rater, who is not an author of the paper, to verify our ratings.

Our categories might not be comprehensive because our categorization for projects and bugs is limited to the dataset that we collected. The bug resolution time could be limiting as our dataset includes projects that have a duration of four months.

We use the topic 'covid-19' to identify and filter COVID-19 software projects from GitHub. Any software project that is not labeled as 'covid-19' will not be included in our dataset.

Our datasets have a limited lifetime, as COVID-19 was discovered in December 2019, and the lack of maturity in our datasets may influence our analysis. We mitigate this limitation by using filtering criteria so that we identify projects with sufficient development activity.

Internal validity: For RQ1 and RQ2 we use ourselves, the authors of the paper, as raters who conduct open and closed coding on README files and bug reports. Our research is susceptible to mono-method bias, as our categorization and labeling may be influenced by the authors' implicit expectations and hypotheses about the study.

External validity: Our findings are not comprehensive. We have not analyzed projects hosted outside GitHub and private projects hosted on GitHub. We mitigate this limitation by analyzing 129 software projects that belong to 7 categories.

The COVID-19 pandemic has impacted people all over the world causing thousands of deaths. Software practitioners have joined the fight in combating the spread and mitigating the dire consequences of COVID-19. An understanding of COVID-19 software categories and software bugs can give us clues on how the software engineering community can help even further in combating COVID-19.

We conduct an empirical study with 129 COVID-19 software projects hosted on GitHub. We identify 7 categories of software projects: aggregation, mining, statistical models, education, volunteer management, user tracking, and medical equipment. By applying open coding on 550 bug reports, we identify 8 categories of bugs: algorithm, data, dependency, documentation, performance, security, syntax, and UI. We observe bug category frequency to vary with project categories, e.g., for mining projects data-related bugs are the most frequently occurring category.

Our findings have implications for educators, practitioners, and researchers. Educators can use our categorization of COVID-19 software projects and related bugs to educate students about the security and privacy implications of COVID-19 software. Privacy researchers can build tools that will check whether user tracking software related to COVID-19 is leaking user data. Practitioners in the data science domain can learn from our categorization of statistical modeling bugs to understand limitations of constructed statistical models and verify underlying assumptions that accompany constructed statistical models. Based on our findings we also advocate for better synergies between data scientists and public health experts so that statistical modeling bugs can be mitigated. We hope our paper will advance further research in the domain of COVID-19 software.

Acknowledgements We thank the PASER research group members at Tennessee Technological University for their useful feedback. We also thank Farzana Ahamed Bhuiyan of Tennessee Technological University for her help as an additional rater.