key: cord-147853-h9t7sp4z authors: Stephany, Fabian; Stoehr, Niklas; Darius, Philipp; Neuhauser, Leonie; Teutloff, Ole; Braesemann, Fabian title: The CoRisk-Index: A data-mining approach to identify industry-specific risk assessments related to COVID-19 in real-time date: 2020-03-27 journal: nan DOI: nan sha: doc_id: 147853 cord_uid: h9t7sp4z While the coronavirus spreads, governments are attempting to reduce contagion rates at the expense of negative economic effects. Market expectations plummeted, foreshadowing the risk of a global economic crisis and mass unemployment. Governments provide huge financial aid programmes to mitigate the economic shocks. To achieve higher effectiveness with such policy measures, it is key to identify the industries that are most in need of support. In this study, we introduce a data-mining approach to measure industry-specific risks related to COVID-19. We examine company risk reports filed to the U.S. Securities and Exchange Commission (SEC). This alternative data set can complement more traditional economic indicators in times of the fast-evolving crisis as it allows for a real-time analysis of risk assessments. Preliminary findings suggest that the companies' awareness towards corona-related business risks is ahead of the overall stock market developments. Our approach allows to distinguish the industries by their risk awareness towards COVID-19. Based on natural language processing, we identify corona-related risk topics and their perceived relevance for different industries. The preliminary findings are summarised as an up-to-date online index. The CoRisk-Index tracks the industry-specific risk assessments related to the crisis, as it spreads through the economy. The tracking tool is updated weekly. It could provide relevant empirical data to inform models on the economic effects of the crisis. Such complementary empirical information could ultimately help policymakers to effectively target financial support in order to mitigate the economic shocks of the crisis. With COVID-19 ("coronavirus") reaching the level of a pandemic (see Fig. 1A ), governments and companies around the world are also exposed to the resulting risks for the highly interconnected global economy. To slow down the spread of COVID-19, governments in China, Europe, the US, and beyond are taking drastic measures, such as travel warnings, border and store closures, regional lockdowns, and curfews. These measures are having drastic consequences for personal freedom and the economy. Some sectors, such as airlines or hotels, are facing a nearly complete breakdown of demand. As a first reaction to the expected economic shocks, the global stock markets have collapsed (see Fig. 1B ). In an attempt to mitigate the general economic downturn, governments all over the world are providing considerable financial support. The US government, for example, is preparing an aid package of 2.2$ trillion in response to the virus [1] . The German government plans to take up to 156 billion Euros of additional debt (equivalent to half of the federal budget for 2020) to support the economy [2] . Many of these immediate aid packages are not targeted to specific industries, but are meant to support businesses in all parts of the economy. While such general programmes can help to stabilise financial markets in the short term, it is paramount for their long-term effectiveness to concentrate the support to those areas of the economy that are most in need. For this reason, governments have to identify the industries that are most severely affected by the coronavirus pandemic. However, in the current situation, policy-makers lack reliable and up-to-date empirical data, which would allow to assess industry-specific economic risks in real-time. Such information would be crucial to effectively target financial support as the crisis hits the economy and to mitigate the economic shocks. The study presented here investigates a potential data source that could provide an empirical basis to identify industry-specific economic risks related to COVID-19 and to inform models on the economic effects of the current crisi. We examine company risk reports (10-K reports) filed to the U.S. Securities and Exchange Commission (SEC) and introduce a data-mining approach to measure firms' risk assessments. 1 In collecting all 1,750 reports published from 30th January 2020 -the day the term coronavirus first appeared in a 10-K report -to 29th March, we can assess and track the reported risk perceptions related to COVID-19 for different industries. Preliminary findings suggest that the company risk reports' show a forward-looking awareness of potential economic risks associated with the corona-crisis, which is leading stock market developments by weeks. Moreover, the awareness towards COVID-19 differs substantially between industries. For example, while 48 % of the firms in retail have mentioned the coronavirus as a potential economic risk, only 22 % of the businesses in financial services have done so. Thirdly, based on natural language processing, we can identify specific corona-related risk topics and their perceived relevance for the different industries. Lastly, our approach allows us distinguish the industries by their reported risk awareness towards COVID-19. While not all sectors of the economy mention severe short-term risk factors due to the corona-crisis, the dense integration of business activities between all parts of the economy implies that adverse effects of the crisis could likely spread from currently affected industries to others over time. The empirical information provided could help to inform macro-economic models on the effects of the corona-crisis and, thus, help to inform policy-makers to better target current economic support programmes to industries that report most severe risks at the current phase of the crisis. The preliminary findings presented here are summarised in one compound index. The CoRisk-Index 2 tracks industry-1 SEC filings represent financial statements of publicly listed companies including a risk assessment. SEC filings are imperative to comply with legal and insurance requirements and therefore should contain the most relevant risks. As a result, companies have a strong incentive to neither under-nor overestimate risks. Moreover, analysing the most recent two-month period of SEC filings represents a random sample of all companies since companies are obliged to report to the SEC at a fixed, but randomly assigned date, independent of their industry. 2 http://oxford.berlin/CoRisk specific risk assessments related to COVID-19 in real-time. It is available on an interactive online dashboard. This tool is constantly updated and the methodology will be refined. It allows researchers, policymakers, and the public to estimate potential risk factors of COVID-19 in individual sectors of the economy. As the risk filings database is updated on a daily basis, the online tool will allow tracking the potential impact of the crisis as it unfolds and spreads through the economy. With more firms providing risk reports that describe the potential or real impacts of the crisis, the tool will be refined to allow for a more granular perspective on individual industries and sub-sectors. Not all sectors of the economy are substantially affected in the early phase of the crisis, but dense inter-connections between all parts of the economy imply that the adverse business effects might spread to other industries over time. In this section, we review related work on the economic consequences of COVID-19 and studies on previous epidemics. Moreover, we discuss the assessment of economic risks via reports such as the 10-K reports required by the U. S. Securities and Exchange Commission (SEC). Global pandemics of infectious diseases are not a new phenomenon. Throughout the last century, the world experienced several global and regional outbreaks: Most severely, millions died during the spread of the Spanish flu in 1918-1920. Regarding the economic impact by sector, Gopinath [11] identifies manufacturing and the services sector as disproportionately affected in China, based on the Purchasing Managers' Index. Ramelli and Wagner [12] uses Google search intensity to measure attention paid to COVID-19 and stock market data to reveal the economic impact by sector. Their analysis shows that the energy, retail, and transportation sector experienced the largest losses in the United States and China, whereas health care gained considerably in both countries. The analysis by Huang et al. [13] (also based on stock market data) confirms that the services sectors seem to be the most severely affected in China. In sum, Baldwin and Mauro [10] provides valuable context, insights and timely policy recommendations. However, none of the recent contributions provide an empirical database to analyse the economic risks of different industries based on up-to-date empirical evidence (beyond highly aggregated stock market signals). 3 At this point we want to highlight that it is not the aim of this study to present a model of risk forecasting that is competing with established macro-economic approaches. In contrast, the data-driven methodology presented here aims to explore an alternative data source that could help to inform such economic models. The advantage of the data we provide here is the high time resolution. More traditional sources of empirical information used to calibrate macro-economic model usually include, for example, unemployment rates. While the value of such statistics is undisputed, they are reported with a time-lag. We perceive the purpose of the index provided here to be a complementary data source that could be compared with official statistics on the economic effects of the crisis over time. While research on recent epidemics are limited to simulations [14, 15] , or specific sectors [16] , the study of historical pandemics might provide informative empirical assessments of pandemic-related economic effects. Studies on the 1918 Spanish Flu confirm the primordial effectiveness of non-pharmaceutical (such as, for example, social distancing) interventions even if these come at the cost of economic slowdowns [17, 18] . Moreover, based on the historical data, researchers find a correlation between mortality rates and declines in GDP, consumption and returns on stocks [19] and an increase in poverty [20] . However due to the global scope of the crisis for the highly inter-connected world economy of 2020, insights derived from past epidemics are of limited use in identifying the various industry-related risks during the COVID-19 pandemic. Both, the research on COVID-19 and the historical pandemics rely on stock market information to quantify the economic effects of infectious diseases. However, stock market information comes with several drawbacks. Most importantly, stock markets are prone to irrational herd behaviour and prices capture a variety of information signals into one aggregated index. Examining current stock market dynamics reveals a general economic downturn, but it does not allow to isolate the sector-specific COVID-19 risks. Therefore, we propose to use SEC reports which include risk statements. We argue that these reports represent a promising real-time measure of industry-specific business risks. Furthermore, the analysis of report statements discussing 'coronavirus' allows to isolate the business risks exclusively associated with the COVID-19 outbreak. Since the great recession hit the world economy in 2008, risk has been a crucial topic in governance and finance. While risk assessments of the financial system led to diverse measures to make the world economy less vulnerable to economic shocks originating in the financial sector, a health crisis, such as the current pandemic, poses different risks to the economy. While government measures against the spread of the disease hinder the population from working and consuming, which results in businesses interrupting production, many economies face demand and supply shocks at the same time. In particular, as different industries rely on distinct input factor compositions and supply chains, the sectors of the economy react differently to shocks [21] . Regarding the COVID-19 crisis, we expect sectors whose operations are more connected to supply from manufacturers in China to publicly report corona-related risk earlier than others. These sectors are also highly connected and interdependent within the national economy and risk might spread between sectors [22, 23] . Most risk assessment approaches focus on quantitative probability-based methods and financial data [24, 25] . The data published in such quantified risk assessments is often made available retrospectively, which makes a real-time evaluation of risks difficult. In contrast to such assessments, we investigate the annual 10-K reports filed to the U. S. Securities and Exchange Commission (SEC), which provide verbal corporate risk disclosure and financial statements. Besides, SEC filings are imperative for legal and insurance requirements and they need to contain the most relevant risks to protect the company from legal liabilities. Depending on the volume of publicly traded stocks, companies with a public float of over 700 million USD are obliged to report within 60, while smaller companies have 75 or 90 days after the end of the fiscal year. Furthermore, the reports inform investment decisions and risk governance at the same time and, thus, companies, as rational agents, are likely to communicate moderate risk assessments [26] . In fact, prior work has underlined the forward-looking nature of the reports, since they allowed a more effective prediction of volatility on stock market returns than the compared approaches [27] . Correspondingly, we expect the 10-K reports to also provide forward-looking information on risk assessments during the observed time period and in particular on rising business risk in relation to the spread of COVID-19. Moreover, the filing companies are categorised into sectors using the "Standard Industry Classifier", which allows a real-time analysis of industry-specific risk developments. In this study we apply natural language processing to extract corona-related risk information from the reports and analyse sector-specific differences in risk awareness and disclosure (details in section 3 and in the appendix A). Based on the literature on the potential economic effects of pandemics and on the information available in business risks reports, we derive the working hypothesis 4 of this study: Working hypothesis: SEC 10-K reports contain corona-related information, which allow to track the industryspecific economic risk assessments in near real-time as the economic crisis unfolds. This working hypothesis is split into five operationalised hypotheses. A core assumption of the investigation of the risk reports is that they contain economic meaningful information. Accordingly, the sentiment of the reports should reflect overall market conditions: H 1: 10-K risk reports contain economic relevant information on short-term market expectations, i. e. they are correlated with overall stock market trends in the current crisis. In contrast to stock indices, which are essentially a highly aggregated signal of all market expectations, risk reports provide a detailed outlook of individual risk factors and their potential influence on business outcomes. 5 As some of the potential economic effects of the corona-crisis could be already foreseen when the disease was still largely concentrated in China (in particular risks with respect to supply chain interruptions), we hypothesise that: Industries are affected differently during the crisis, depending on their business model. As introduced in [28] , the economic crisis will unfold in several phases with different characteristics. Those sectors of the economy that are more vulnerable to supply chain interruptions and short-term collapses of consumer demand should be more affected in the early phase of the crisis than other sectors: Corona-related risk factors in 10-K reports differ between industries. 4 While the results provided in this working paper are largely descriptive, we present the research hypotheses, which have guided our data exploration. 5 This should not imply that the firms have a correct risk forecasting model in place or that they actually use all available information to provide a best-possible risk prediction. Instead, we assume that the firms report risks that they perceive as relevant at the time of reporting (see also the methodological limitations section below). A presumption of our approach is that firms try to 'honestly' report relevant risks, not that they actually predict them. With more data becoming available over the next weeks (in particular first statistics on the macro-economic effects, such as unemployment rates), we will incorporate predictive models to get a better understanding in how far the reported risk factors indeed correlate with advers effects experienced. In the early phases of the crisis, as the market expectations adjust, all sectors suffer from restrictions to business activity, which manifest in collapsing stock market prices. However, one would assume that those firms that show most severe corona-related risks in the early phase of the crisis will see most stock market losses: H 4: Stock prices of firms in industries that report high risks lose more value than other firms in the short term. While the crisis unfolds, adverse economic effects radiate through the highly inter-connected economic system. As some sectors that are severely affected already in the early phase of the crisis face increasing challenges, their problems transmit to other sectors, in form of less demand, reduced purchasing power or supply interruptions. Consequently, we hypothesise that 6 : H 5: As the crisis unfolds, those sectors that are densely connected with industries that report high risks in the early phase of the crisis, are likely to report increasing risks sooner than other industries. To test the hypotheses, we use web-mining techniques to collect data from SEC 10-K reports and conduct text mining to extract information relevant for the individual hypotheses, as outlined in the next section. Section 4 presents the results of the analysis. The 10-K SEC filings, as legal reports to publicly communicate corporate risks and financial statements, provide a valuable and innovative text data source for risk assessment. 7 We use a web scraper to collect all 10-K filings published between 30/01/2020 and 20/03/2020 from the SEC's "Electronic Data Gathering, Analysis and Retrieval System" (EDGAR) database. The filing documents contain the company names and central index keys (CIK) as unique firm identifiers and the Standard Industry Classifier (SIC) that allow linking and comparing the filings data with individual business and industry data. Each 10-K filing contains a Risk Factor section (Item 1A) under which the reporting company is obliged to disclose all types of risks their business might be facing to adequately warn investors. Companies are required to use "plain English" in describing these risk factors, avoiding overly technical jargon that would be difficult for a layperson to follow. The text of each risk factor section builds the document by which a company is represented in our analysis Smaller reporting companies are hence not included in our analysis. The entire risk factor text is set to lower case for further analysis, before the occurrence of the main two keyword tokens, 'corona' and 'covid', is counted. Any word containing one of the two main tokens is likewise counted. Hereby the measure 'corona-keyword count per report' is created. Similarly, each company that reports one of the keywords at least once is included in the 'share of corona-mentioning firms per industry' . After text pre-processing, we apply different Natural Language Processing tools to analyse the risk factor section of the reports. Different sectors are facing different challenges, therefore companies are reporting about different corona-related risks. We aim to capture these risk topics via a keyword search on predefined topics. In order to explore possible topics, we use Latent Dirichlet Allocation (LDA) for unsupervised topic modelling, similar to [29] . We only apply the topic model to corona-related paragraphs in the risk sections. We additionally examine the most frequent words 6 Testing this hypothesis requires more time-series data and data on potential propagation mechanisms through the economy. We plan to include an analysis on this in a revised of the working paper. 7 The data source explored here is not meant to replace any established macro-economic statistics, but rather to provide a complementary alternative source of data to identify immediate industry-specific risk perceptions. and bi-grams in the documents. Using this exploratory analysis, we define a set of topics by hand, which are specified by keywords. We then conduct a keyword search to count how much these terms are mentioned in the different industries in order to estimate the topic prevalence. Moreover, we measure the sentiment of corona-related paragraphs via sentiment analysis. Additionally, we build a sentiment-based risk indicator of the sentiment in sentences that mentioned the coronavirus and compare it to stock market movements during the observation period ( Fig. 1B and 1D) . A more detailed description of the different methods can be found in the appendix A. The main results of our analysis are displayed in Figure 1C and D as well as in Figure 2 . 8 From the end of January 2020, an increasing number of 10-K reports refers to the coronavirus (Fig. 1C) , indicating an increasing awareness of COVID-19 as a potential economic risk. The overall sentiment of the 10-K reports decreased sharply (red line in Fig. 1D ), following the collapsed stock markets after 15th February (red line in Fig. 1B) . This timely correlation provides evidence in favour of research hypothesis H 1 -the 10-K reports provide economically relevant information. Moreover, the sentiment of the text sections that discuss the corona-related business risk factors (blue line in Fig.1D) decreases already a few weeks before the overall negative business outlooks manifested in falling stock prices. This observation supports hypothesis H 2: Corona-related risk sentiments show a time lead compared with overall stock market dynamics. From this comparison, we conclude that the 10-K reports contain business relevant information that can be isolated -in contrast to the aggregated signal covered in stock indices alone. Investigating the reports can, thus, provide additional information about the firm-and sector-specific risks associated with the pandemic. Differences in corona-related risk assessment by industries are displayed in Figure 2 . Not all sectors of the economy show a similar awareness of the potential business impacts of the pandemic ( Fig. 2A) , which provides evidence in favour The firms, moreover, differ substantially with regards to the intensity with which they discuss the potential impacts of the coronavirus to their businesses in the 10-K reports. As Fig. 2B shows, some firms, in particular in retail and manufacturing, mention terms related to the pandemic ('corona', 'coronavirus', 'COVID') substantially more often than other firms. The 10-K reports moreover provide information about specific types of risk perceptions. Fig. 2C provides a heatmap with important topics per sector. The rows represent five relevant topics (demand, finance, production, supply, and travel), which have been derived from unsupervised topic modelling and subsequent uni-and bi-gram search. 10 The cells of the heatmap are coloured according to the topic relevance, i. e. the number of topic-specific keywords per industry. Additionally, we have applied a hierarchical clustering algorithm on the data to identify related topics and sectors, which report about similar topics (tree-like dendrogram plots at the side of the heatmap). 8 At this point we want to highlight, again, that the results reported here are preliminary. With more data becoming available over the next weeks and, hence, with a refined methodology, we expect to be able to report more in-depth analyses and conclusive findings. These will be constantly updated on the online dashboard. Moreover, we want to emphasise that all quantitative findings are build on data of U. S. firms. It remains to be seen in how far the findings could be extrapolated to other economies. 9 In some of the industries, only very few firms have provided risk reports in the current observation period. In particular in these groups, the results are likely to change as more data become available. 10 Details of the text mining approach and the final keywords that define the topics are provided in the appendix. The figure reveals that most firms either report demand and financial risk factors (left branch of the top-dendrogram) or production and supply chain risks (middle branch of the top-dendrogram). For example, supply chain problems represent the biggest reproted risk component for firms in the retail industry, while manufacturing firms report both supply and production risks, as well as finance and potential demand risks. While reports of the mining industry consider demand-related issues as the biggest risk factor, other sectors, such as real estate or education, do not report extensively about any of the specific risk types. These findings support hypothesis H 3 -corona-related risk factors reported in 10-K reports differ between industries with regard to occurrence and topical context. The reports are, thus, a valid data source to identify sectors that face particular risk factors in the current early phase of the crisis. Based on the quantifiable corona-related risk factors inferred from the 10-K reports (share of reports mentioning corona-risks per industry, average count of corona-keywords per report, sentiment of corona-related text sections), we could categorise the industries into three risk groups (details of the calculation of the risk ranking and the categorisation are provided in the appendix). 11 The clustering of sectors into the groups is displayed in Fig. 2D . The lowest risk group 1 is dominated by infrastructure sectors, such as construction and power supply, but also includes financial services. Several intermediate risks (group 2) are reported by different industries from the services sector, for example, admin services or information and communication. The highest risk group 3 contains manufacturing, retail, education, water supply, and hospitality. In contrast to hypothesis H 4, the different risk groups do not show substantial differences with regards to short-term stock losses. While this observation is not aligned with our initial hypothesis, it underlines that stock markets are less suited to differentiate industries by corona-related economic risk perceptions, as the deteriorating overall market expectation dominate in the stock price signal. Other data sources, like the 10-K reports investigated here, provide a more granular perspective on COVID-19 related perceived risks. While a one-dimensional categorisation of risk assessments tends to over-simplify the crisis firms are facing, it allows to compare the different industries and to identify those parts of the economy, which currently report more or less severe effecs due to the immediate economic consequences of the pandemic. The data-driven assessment reveals, in particular, that manufacturing and retail are among the industries that report to be most vulnerable to the changing economic environment. Not only decreasing consumer demand but particularly problems along the supply chain mark substantial risk factors for those industries. On the other hand, it is mostly firms from the information and communication services sector, which are less dependent on the physical transport of goods, that are currently reporting fewer risks. Moreover, with only 4 % of the 10-K reports mentioning the coronavirus as a potential risk factor, the construction sector seems largely unaffected by the short term economic shocks. The extent to which the industries are affected by corona-related business risks is likely to change over time. Accordingly, the risk categorisation presented here is a static measure that needs constant updating and refinement as the crisis unfolds. To do this, the study is supplemented by an online dashboard, which tracks the main findings 11 Please note that the ordinal categorisation into different risk groups will be replaced by a continuous risk index for all industries, as more reports are released over time. The simplyfing categorisation of industries into risk groups would most likely need considerable adjustments over time, as more reports will include corona-related risks. with substantially lower values. With more reports being released over time, we will update and refine the CoRisk-Index to allow for more fine-grained industry-specific analyses, including breakdown by topics and sentiment. The index starts with zero, indicating no corona-related business risks in the reports. We will track the CoRisk-Index over the course of the corona crisis until the index reaches zero again. As discussed in section 2, economies are characterised by dense flows of demand, supply, and capital between the industries. Accordingly, we highlight this inter-connectedness in Fig. 3B , displaying the top-two cross-industry capital flows per sector within the U. S. economy (based on 2015 the national Input-Output table provided by the OECD). As the figure reveals, specific parts of the economy are currently mentioning the most substantial risks, in particular manufacturing, retail, and hospitality. Industries in the primary sector, infrastructure, and services are currently reporting fewer risks. While the short term nature of the investigation presented here does not allow for industry-specific time-series investigations, which would be necessary to provide final evidence in favour of hypothesis H 5, the dense interconnection between all parts of the economy indicates that it is only a matter of time until the risk factors from decreasing consumer demand and supply chain interruptions will transfer to the other parts of the economy. 12 As the COVID-19 pandemic unfolds, governments are attempting to reduce contagion rates at the expense of personal freedom and negative economic effects. Sizeable cyclical and fiscal policy packages are prepared in order to counterbalance the dooming global economic downturn. In order to ensure an effective usage of public crisis spending, it is paramount to understand in detail which industries are most affected by the pandemic and currently most in need of support. This study introduces a data-mining approach to measure the reported business risks induced by the current COVID-19 pandemic. We examine company risk reports filed to the U.S. Securities and Exchange Commission (SEC). Harnessing this data set enables a real-time analysis of potential risk factors. Preliminary findings show that the companies' risk awareness is preempting stock market developments by weeks. While stock prices typically condense the market's multiple signals, the 10-K reports allow to isolates the company risk perceptions associated to the COVID-19 outbreak. Moreover, this risk awareness differs substantially between industries, both in magnitude as well as in nature. Based on natural language processing techniques, we can identify specific corona-related risk-topics and their relevance for the different industries: supply chain-and production related issues seem to be mostly relevant for retail and manufacturing, while several industries have reported demand-or finance related risk factors. Additionally, our approach allows to cluster the industries into three groups of reported risk factors, giving a better understanding of which industries are considering the most severe impacts at the current stage. We summarise the corona-related business risk perceptions per industry in a compound index published online. 13 The CoRisk-Index, which will be updated constantly over the course of the crisis, provides an up-to-date database to identify the industries that report most substantial risk factors in the different phases of the unfolding economic crisis. The online dashboard can provide a data source to inform economic models and to provide empirical information that could help to assess potential policies, which aim to effectively target financial support and to mitigate the economic shocks of the current crisis. 12 In the current version of the working paper, Fig. 2E represents a simple visualisation of some relevant economic flows within the U. S. economy, which imply likely propagation channels of adverse economic effects. In a revised version, we will include a more detaile network analysis to investigate potential risk propagation mechanisms. 13 http://oxford.berlin/CoRisk Governments are eager to counterbalance the dooming global economic crisis induced by the COVID-19 pandemic with cyclical and fiscal policy packages of enormous volumes. Democratic accountability demands this public crisis stimulus to be spend as effectively as possible. Our data-driven analysis of the 10-K SEC filings provides an alternative data source, which could be used to help identifying industries that are most severely affected by COVID-19. In particular the industries in risk group 3 are currently reporting most severe effects from the sharply decreasing consumer demand and interruptions of the supply chain, and might face substantial problems through the unfolding of the crisis. 14 Other parts of the economy, in particular information-processing service industries and construction, are currently reporting less severe corona-related risks. However, as the shock will transmit throughout the tightly inter-connected economy over time, these industries are likely to face larger challenges at a later stage in time. Nonetheless, at the current stage, their core businesses seem to be less directly affected by supply chain interruptions and collapsing consumer demand than other industries. Our approach is based on the risk assessments in the U. S. Securities and Exchange Commission (SEC) filing reports. Thus, the value of the approach relies crucially on company self-reporting. While the firms are unlikely to provide a risk prediction with a high forecasting accuracy, it might still be worth exploring the reports as alternative data source to measure risk perceptions. As the reports serve as legal and insurance requirements against financial risks, but also as a basis for investment decisions of investors, companies are implicitly encouraged to neither over-nor understate the risks they are facing. Nevertheless, our results are limited to this self-assessment. As many of the implications of the corona crisis are still uncertain, our approach thus reflects a way to approximate potential implications on current estimations of experts in the different sectors, represented by the companies, and does not include risks that are unforeseeable for themselves at a given point in time. Moreover, the data to assess the precision of these estimations does not exist yet, as there has not been a pandemic with comparable global economic consequences in recent time. In the future, we will evaluate our results by looking at employment data in different sectors. Moreover, the pandemic continues to spread and will soon affect all industry sectors and all countries of the world. As more and more companies will report on related risks, the count of corona mentions alone will lose its information-value as a measure to differentiate between endangered industry sectors. Then, more granular measures, such as the identified topic categories, will become more important to distinguish between different natures of risks. The exploration of alternative data sources that are meant to complement or now-cast established economic statistics always comes with uncertainty. For example, it is not yet clear how the short-term risks described by the different industries will translate into long-term economic outcomes, such as bankruptcies. Nonetheless, we believe that the reports could be a reliable source of empirical information about the issues faced by different industries in the current situation, and they might be used to inform forecasting models on industry-specific economic effects of the crisis, as they help fill a data gap. Models that incorporate alternative data sources such as the one presented here could then be beneficial for developing economic support packages that are currently provided by governments. All technical methods serve the higher purpose of providing timely and comprehensible insights into the industryspecific effects caused by the global outbreak of the coronavirus. To mitigate susceptibility to errors and increase 14 At this point, we want to emphasise again that the findings are preliminary and might change over time, as more data becomes available and as the research methodology is refined. Moreover, the risk categorisation presented in this study is not meant as a direct policy recommendation. Instead, the tool is meant to provide an empirical source of information for real-time tracking of industry-specific risk perceptions, which can help to inform policy makers in times of a fast-evolving economic crisis. reproducibility, we mostly draw from more basic technical methods. This can be seen in the discovery of Corona-relevant keywords which is based on the matching of regular expressions to avoid error-prone text pre-processing. Reduction of technical complexity, however, comes at the cost of diminished modelling fidelity and potential accuracy of results. For instance, the sentiment classifier has not been fine-tuned on text snippets discussing financial risks in particular. Further, it may be questioned whether a generalist sentiment score yields a reliable measure for the assessment of risks. Additionally, the LDA-based topic modelling approach lacks in interpretability and robustness, and we therefore only use it for exploration until now. The topics and keywords we use for estimating topic prevalence are therefore hand-coded which limits the detection of topics to predefined terms. In general, the findings presented here should be considered as preliminary. Sensitivity checks are needed to validate their robustness, and numerous extensions will help to better assess the potential value of the exploration of 10-K reports as a complementary data source to measure industry-specific risk factors. We expect some of these limitations to be mitigated with more reports to be analysed in the near future. Adjusted results will be published on the online dashboard and in a refined version of the working paper. The following impetus for future work builds directly on the identified methodological limitations and caveats mentioned throughout the paper. Rather than harnessing generalist sentiment analysis, future efforts should explore risk classifiers trained on labelled financial risk assessments and entity sentiment analysis [30] . The robustness of the results needs to be checked in more detail. In particular, we will compare historical stock market and unemployment data with risk measures extracted from 10-K reports to investigate the correlation between risk reports and overall economic trends. This also holds for the natural language processing methods applied. Based on more data that is to be captured in the next weeks, we will explore in how far different text mining methods lead to similar results with respect to the topics that can be extracted, and what the effect of different keyword choices is on the results presented in the heatmap. To investigate the potential propagation of risk factors through the economy, we will include a network analysis based on input-output data to identify the most contagious areas of the economy. Another promising direction is the extension of the inter-sectoral U. S.-centred trade network to more comprehensive data on international inter-and intra-sectoral trade. Analysing inter-national cooperation networks [31] , mining industry-specific key technologies [32] and drawing from more advanced network analysis techniques such as dynamicity [33] or sentiment-based network clustering [34] will be highly beneficial. Likewise, the content of other online platforms, such as Wikipedia, as a source for risk assessment could be explored [35, 36] . This could allow an estimation of operational risk channels and risk spillovers propagating between industries and countries due to global supply chains and peer-tier co-operations. Considering more financial data could include credit risks (financial exposures) of companies. Besides the comparison with additional financial signals, we plan to include unemployment rates in our analysis as an exogenous variable to the SEC risk assessment. We are grateful for comments made by Andrew Stephen, Felix Reed-Tsochas, Felipe Thomasz, and Slava Jankin. Their comments have helped to substantially improve the working paper. Sentiment analysis on the SEC filings serves the purpose of a more contextualised semantic understanding of the risk assessment concerning COVID-19. We selected the paragraph of the risk report, with the highest number of coronavirus mentions and calculated the sentiment based on the TextBlob API [39] using the code developed in [30] , We use unsupervised learning techniques to identify the main topics that companies mention when describing coronavirus related risks. Latent Dirichlet Allocation (LDA) is a Bayesian computational linguistic technique that identifies the latent topics in a corpus of documents [40] . This statistical model falls into the category of generative probabilistic modelling: a generative process which defines a joint probability distribution over the observed random variable, i.e. the words of the documents, and the hidden random variables, which is the topic structure. In other words, LDA uses the probability of words that co-occur within documents to identify sets of topics and their associated words [29] . The number of topics has to be defined in advance. LDA is a frequently used technique to identify main topics in a corpus. Nevertheless, the interpretation of these topics can sometimes be difficult. We thus perform LDA for explorative purposes in our research only. By additionally exploring the most common words and bi-grams we then define topics and the defining keywords by hand. We then detect how often companies mention these keywords in the corona related risk sections to estimate how important the topics are in different sectors. In particular, we perform the following steps: Referring to section A.3, we filter all sentences from the risk sections that mention either "corona" and "covid", thereby also accounting for "coronavirus" and "covid-19". Before we train the LDA model we prepare the documents to achieve better performance of the method. We remove all common English stopwords, which are frequent words such as "is, " "the, " and "and" as well as those words which appear in at least 80% of the documents. These words are not useful in classifying topics as they are too frequent and therefore decrease performance. Moreover, we delete all words that do not occur in at least 2 documents. We turn the documents into numerical feature vectors ("Bag of words" representation). These vectors disregard word order and simply contain the number of times a word appears. We then use LDA to extract the topic structure. Like any unsupervised topic model, this requires setting the number of topics a priori. We selected this key parameter based on semantic coherence, evaluating a range of 2 to 8 topics, which leads to a final model of 4 topics. The top 10 terms of each topic are displayed in Table 2 . The derived topics give a good insight into the general narratives of risks used in the documents. Nevertheless, they are hard to interpret, as the early corona-related risk reports are still quite generic such that most of the companies cover a lot of risk factors. Thus we make use of our insights from the topic modelling to define 5 main topics defined by keywords, displayed in Table 1 . The choice of keywords is additionally informed by clustering the most frequently used words and bi-grams in the documents. We then measure the frequency of these keywords per topic per industry in all of the documents to get more insights into the nature of risks different sectors are facing due to the pandemic. Production business operation, business disruption, product, work stoppage, labor disruption, labor, work, manufacturing operation, labor shortage, employee productivity, product development, business activity Supply manufacturing facility, manufacture facility, contract manufacturer, service provider, logistic provider, supply disruption, party manufacturer, supply disruption, facility, supply, transportation delay, delivery delay, supplier, business partner, supply chain, material shortage House leaders look to expedite $2.2 trillion relief package but face possibility that one GOP lawmaker may delay passage -The Washington Post Was das alles kostet Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China Potential for global spread of a novel coronavirus from China Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. London: Imperial College COVID-19 Response Team Nonpharmaceutical Measures for Pandemic Influenza in Nonhealthcare Settings-Social Distancing Measures Hand-Hygiene Mitigation Strategies Against Global Disease Spreading through the Air Transportation Network Economics in the Time of COVID-19 Mitigating the COVID Economic Crisis: Act Fast and Do Whatever It Takes Limiting the economic fallout of the coronavirus with large targeted policies What the stock market tells us about the consequences of COVID-19 Saving China from the coronavirus and economic meltdown: Experiences and lessons. In: Mitigating the COVID Economic Crisis: Act Fast and Do Whatever It Takes CEPR Press The macroeconomic impact of pandemic influenza: estimates from models of the United Kingdom Avian Influenza: Potential Economic Impact of a Pandemic on Australia The economic impact of H1N1 on Mexico's tourist and pork sectors Public health interventions and epidemic intensity during the 1918 influenza pandemic Pandemic influenza: studying the lessons of history The Coronavirus and the Great Influenza Epidemic -Lessons from the 'Spanish Flu' for the Coronavirus's Potential Effects on Mortality and Economic Activity The impact of the 1918 Spanish flu epidemic on economic performance in Sweden: An investigation into the consequences of an extraordinary mortality shock Measuring the supply chain risk and vulnerability in frequency space Network structure of production The network origins of aggregate fluctuations The risk concept-historical and recent development trends. Reliability Engineering & System Safety Some foundational issues related to risk governance and different types of risks SEC adopts rules to modernize and simplify disclosure Predicting risk from financial reports with regression COVID-19: Europe needs a catastrophe relief plan The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation Measuring Proximity Between Newspapers and Political Parties: The Sentiment Political Compass Global networks in collaborative programming Mining the Automotive Industry: A Network Analysis of Corporate Positioning and Technological Trends Disentangling Interpretable Generative Parameters of Random and Real-World Graphs Hashjacking" the Debate: Polarisation Strategies of Germany's Political Far-Right on Twitter An exploration of wikipedia data as a measure of regional knowledge distribution Coding together-coding alone: The role of trust in collaborative programming Rank and Filed, SEC filings for humans Yahoo! Finance market data downloader Simplified Text Processing -TextBlob 0.15.2 documentation Probabilistic topic models We use web crawlers and scrapers written in the general programming language Python for finding and extracting the relevant risk section in the 10k reports. Via the SEC online interface 15 past SEC filings are queried since the first emergence of 'coronavirus' in a report on January 30th 2020. A web crawler allows to find the relevant risk factor section in the 10-K reports and stores each paragraph for later text processing and analysis. The collection of stock market data follows the incentive to provide reference data for the company SEC filings. In both, SEC filings and financial markets, each company is assigned a unique Central Index Key (CIK) and a ticker number. Based on the CIK, we identified a list of 13,737 companies using the data registry provided by Rank and Filed [37] . This list was then filtered by the list of companies of which we obtained SEC filings. Using the Yahoo Finance API, we finally retrieved stock prices for all remaining companies in the list [38] . Since this study examines the attention attributed to COVID-19 in the SEC filings, the discovery mechanism of relevant COVID-19 mentions is of central importance. To mitigate susceptibility to errors due to word splitting, stemming and other text preprocessing, we decided for the most simple approach based on the matching of regular expressions. We scanned the reports for the two relatively unambiguous terms "corona" and "covid", also accounting for "coronavirus" and "covid-19" without duplication.