key: cord-0111994-7r461iff authors: Duenas, Marco; Ortiz, Víctor; Riccaboni, Massimo; Serti, Francesco title: Assessing the Impact of COVID-19 on Trade: a Machine Learning Counterfactual Analysis date: 2021-04-09 journal: nan DOI: nan sha: eaffcdb0011dc1168d8c55b0df388aa40b0b946c doc_id: 111994 cord_uid: 7r461iff

By interpreting exporters' dynamics as a complex learning process, this paper constitutes the first attempt to investigate the effectiveness of different Machine Learning (ML) techniques in predicting firms' trade status. We focus on the probability of Colombian firms surviving in the export market under two different scenarios: a COVID-19 setting and a non-COVID-19 counterfactual situation. By comparing the resulting predictions, we estimate the individual treatment effect of the COVID-19 shock on firms' outcomes. Finally, we use recursive partitioning methods to identify subgroups with differential treatment effects. We find that, besides the temporal dimension, the main factors predicting treatment heterogeneity are interactions between firm size and industry.

The COVID-19 outbreak has affected the world economy, generating unprecedented health, human, and economic crises. To face the health crisis, governments implemented social distancing and lockdown policies, exacerbating supply and demand shocks. Given the uncertainty about how long the crisis will last, the recovery will depend on the effectiveness of the measures adopted to reactivate production and consumption worldwide (World Bank, 2020). In a highly interconnected world, the impact of the pandemic on international trade has attracted considerable attention (Felbermayr and Görg, 2020; Antràs et al., 2020). International trade is being affected by national lockdowns, trade and trade-related measures adopted by countries, and by the temporary disruption of global value chains (Bonadio et al., 2020; Evenett, 2020).
Global trade, which is typically more volatile than output and tends to fall particularly sharply during a crisis, has seen its largest fall since the 2009 global financial crisis. From the beginning of the COVID-19 epidemic, scholars underlined that, though its impact on international trade could be comparable to the Great Trade Collapse of 2008-2009, this time the demand-side shock is accompanied by a supply-side shock (Baldwin and Tomiura, 2020). Moreover, this supply-side effect could be reinforced by a supply-side contagion via importing/supply chains, which have grown in relevance during the last decade. In other words, supply disruptions in the countries providing intermediate inputs to a given country are also likely to hurt its export performance.

This paper aims to estimate the causal effect of the COVID-19 shock on a firm's probability of survival in the export markets, and to study the heterogeneity of this effect. The main hurdles for this evaluation task are related to the pervasiveness of the COVID-19 shock. Indeed, the fact that all firms are directly and/or indirectly exposed to the effects of the COVID-19 crisis makes it nearly impossible to find a control group of firms with which to build a counterfactual non-COVID-19 scenario. Moreover, identifying the main patterns through which the COVID-19 shock has affected firm-level trade is a demanding task because the economy-wide impact of the shock is coupled with complex interdependencies between firms and products belonging to different sectors and countries, as underlined above. By interpreting exporters' dynamics as a complex learning process (footnote 1), this paper's first contribution is exploring and comparing the effectiveness of different Machine Learning (ML) techniques in predicting firms' trade status in two different scenarios, a COVID-19 and a non-COVID-19 setting.
ML techniques have been successfully applied to predict firm performance and help companies (and public agencies) in their decision-making in complex environments. The accumulated literature shows that ML techniques' ability to classify companies is high and reliable in such high-dimensional contexts. To the best of our knowledge, this is the first time that ML techniques have been used to predict firm-level international trade performance.

Footnote 1: Firms have heterogeneous and incomplete information about trade opportunities. This is true both on the exporting and the importing side of firm activities. For example, in Albornoz et al. (2012) and Eslava et al. (2015) exporting firms are uncertain and learn about the appeal of their products and, more generally, about the profitability of exporting their products on the international markets. By searching for clients and observing their realized profitability, firms update their beliefs about their capabilities in international markets.

This paper's second contribution is to use these predictions to estimate the causal effect of the COVID-19 shock at the individual firm level. We use the estimated ML model with the best performance in predicting the 2019 export status of firms exporting in 2018 to build a 2020 non-COVID-19 counterfactual outcome for firms exporting in 2019. Then, we compare these counterfactual non-COVID-19 firm-level export probabilities with the predicted probabilities of the best performing ML model using the characteristics of 2019 exporters to predict their export status in 2020. These estimated probabilities summarize the information on the observed COVID-19 scenario and express it in a metric that is comparable with the estimated counterfactual non-COVID-19 outcomes. Finally, we employ ML techniques to study the heterogeneity of the estimated COVID-19 effects according to firms' characteristics.
ML has proved helpful in such high-dimensional settings for identifying subgroups that are particularly responsive to the treatment and, therefore, the most relevant dimensions of treatment heterogeneity. Different ML tools have been used in the literature, with a trade-off between precision and interpretability: decision-tree based algorithms, ensembles of trees, Bayesian ensembles of trees, doubly robust approaches, LASSO-based approaches, or meta-learners (Athey and Imbens, 2017; Dominici et al., 2020).

We focus on Colombian exporters because of the availability of Colombian Customs data for 2020 and previous years. Similar to many other countries, in 2020 Colombia witnessed domestic supply and demand shocks related to factory closures, cessation of some public services, and disruptions in the supply chain at home and abroad. de Lucio et al. (2020) found that Spanish exports decreased more in destinations that introduced strict policies to contain COVID-19, particularly in March and May 2020, showing that Spanish export performance during the pandemic depended on COVID-19 induced demand shocks in export markets. Using a sector-level gravity model, Espitia et al. (2021) show that, during the COVID-19 crisis, sectors that tend to be relatively less internationally integrated suffered less from foreign shocks but were more vulnerable to domestic shocks.

The paper is organized as follows. Section 2 briefly describes the Colombian context. Section 3 presents the firm-level data, the variables employed in the analysis, and descriptive statistics. Section 4 explains the empirical strategy. Section 5 reports the main estimation results, and Section 6 summarizes the findings and discusses both the interpretation and the limitations of the analysis.

Colombia is a country that exports little compared to other countries in Latin America with similar development levels.
In recent years, the share of total exports in Colombian GDP has oscillated around 15%, well below other countries in the region, such as Chile and Mexico, where the share is practically double. Although the Colombian economy was relatively closed during most of the twentieth century (Ocampo and Tovar, 2000), it has been strongly affected by international crises, such as the global financial crisis of 2008-2009 (Zuluaga et al., 2009). Colombia's trade opening started in the 1990s with several market-oriented reforms aimed at liberalizing financial and capital markets. Nowadays, Colombia has 16 bilateral trade agreements in force. Even though Colombia has increased the number of trade partners and the value and volume of trade, its integration into world trade markets is still modest (Cepeda-López et al., 2019). An essential reason behind Colombia's poor performance is that its export basket exhibits a low level of diversification, with a prevalence of primary products, because of the relative abundance of natural resources and low-skilled labor. Besides, raw products derived from mining have gained a larger share in total exports, reducing the importance of other products that have been successful, such as coffee, bananas, flowers, some labor-intensive manufactures, and petrochemicals. Bruno et al. (2018) analyzed the export diversification patterns of Colombian manufacturing firms using a product-firm approach (bipartite network analysis). They show that manufacturing firms can be grouped in clusters with a modular structure, meaning that the groups of firms reveal specialization in products that require similar capabilities. Interestingly, these clusters are characterized by a hierarchical structure, so that some firms can export a wide range of products, exploiting their economies of scope. On the other hand, most firms are more specialized, exporting a limited number of products.
After the outbreak of the COVID-19 pandemic, Colombia implemented early measures to contain the spread of the virus, prepare the health system, and mitigate the economic and social impact. The Colombian government issued non-compulsory requests for remote working to private companies on February 24; schools and universities were closed on March 16. On March 25, when there were fewer than a dozen deaths, the government implemented a complete and mandatory lockdown until April 13. During this period, only a few essential activities (such as health services, public services, communications, banking and financial services, food production, pharmaceuticals, and cleaning and disinfection products) were excluded. The partial lockdown, in force between April 27 and May 11, allowed a gradual restoration of mobility, enabling a set of non-essential activities under security guidelines and protocols to guarantee social distancing. Most manufacturing activities were gradually allowed at this stage, while non-authorized activities were restricted to marketing their products through electronic commerce platforms. Finally, from May 28, restrictions on the services sector were lifted, and on September 1 the government announced the end of confinement and airports were reopened.

To better cope with the emergency, Colombian authorities introduced transitory provisions to secure international trade in essential products. Along with the lockdown measures, medicines, supplies, and equipment in the health sector were subject to a zero tariff for six months. Besides, the export and re-export of these products were forbidden. There was a zero tariff from April 7 to June 30 for raw materials such as maize, sorghum, soybeans, and soybean cake. The impact of lockdown policies on individuals' behavior and firms' activities is likely to be affected by their endogenous responses to the legal restrictions and to be highly heterogeneous, depending on workers' and firms' characteristics.
For instance, Dueñas et al. (2021) find that the responses to lockdown policies largely depend on socio-economic conditions, with the part of the population in worse socio-economic conditions showing smaller decreases in mobility flows. Regarding business activities, for instance, the lockdown could have had a larger impact on formal activities than on informal ones, and some industries could have adapted better than others to remote working. More generally, as mentioned in the introduction, the firm-specific exposure to the COVID-19 shock might depend on multiple factors such as the nature of its final products (de Lucio et al., 2020), its size, the importance of economies of scale and scope, the identity of the destination countries of its shipments, and the origins of its intermediate inputs.

To investigate the impact of the COVID-19 pandemic on Colombian firms, we use monthly export transactions data reported to the Colombian Customs Office (Dirección de Impuestos y Aduanas Nacionales, DIAN) for 2018, 2019, and 2020. For each transaction, we consider the exporter ID as the firm identifier; the date; a 10-digit Harmonized System (HS) code characterizing the product; the product origin within Colombia (department level); the means of transportation of the shipment; the country of destination; and the free-on-board value of the transaction in US dollars. We removed all transactions related to re-exports of products elaborated in other countries. As a result, we end up with 386,132 customs reports in 2018, 402,140 in 2019, and 365,626 in 2020. In our analysis, we consider products classified at the six-digit level of the HS code.
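As an illustration of how transaction records of this kind might be collapsed to the firm-month level, the following minimal sketch truncates 10-digit HS codes to six digits and computes, for each firm and month, the total value, the number of distinct products and destinations, and Herfindahl-Hirschman concentration indexes. The record layout, field names, firm IDs, codes, and values are all hypothetical, and the index is assumed to be the sum of squared value shares expressed as fractions; the paper does not spell out its exact implementation.

```python
from collections import defaultdict


def hs6(hs10_code):
    """Truncate a 10-digit HS code (kept as a string) to its six-digit level."""
    return hs10_code[:6]


def hhi(values):
    """Herfindahl-Hirschman index: sum of squared shares (assumed scale 0-1)."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)


def firm_month_features(transactions):
    """Aggregate transaction records into one feature dict per (firm, month)."""
    by_product = defaultdict(lambda: defaultdict(float))
    by_dest = defaultdict(lambda: defaultdict(float))
    for t in transactions:
        key = (t["firm"], t["month"])
        by_product[key][hs6(t["hs10"])] += t["fob_usd"]
        by_dest[key][t["dest"]] += t["fob_usd"]
    features = {}
    for key in by_product:
        product_values = list(by_product[key].values())
        dest_values = list(by_dest[key].values())
        features[key] = {
            "total_value": sum(product_values),
            "n_products": len(product_values),
            "n_destinations": len(dest_values),
            "hh_products": hhi(product_values),
            "hh_destinations": hhi(dest_values),
        }
    return features


# Hypothetical records; firm IDs, HS codes, and values are made up.
transactions = [
    {"firm": "firm_A", "month": "2019-04", "hs10": "0901119000", "dest": "US", "fob_usd": 300.0},
    {"firm": "firm_A", "month": "2019-04", "hs10": "0901119010", "dest": "DE", "fob_usd": 100.0},
    {"firm": "firm_A", "month": "2019-04", "hs10": "0603110000", "dest": "US", "fob_usd": 100.0},
    {"firm": "firm_B", "month": "2019-04", "hs10": "6403990000", "dest": "EC", "fob_usd": 50.0},
]
features = firm_month_features(transactions)
row = features[("firm_A", "2019-04")]
print(row)
```

In this toy example the first two records share the same six-digit code, so firm_A counts two products rather than three, which is the practical effect of working at the six-digit level.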
We consider different features of exporters, according to their monthly exports: the total export value, the number of products ($N_P$), the number of export destinations ($N_D$), the Herfindahl-Hirschman indexes at the product level ($HH_p$) and the destination level ($HH_d$), and sets of dummies for the destination countries and continents, the Colombian department from which the product originates, the means of transportation, the product sector (HS chapter), and the product industry (HS section). Moreover, we build two sets of dummy variables indicating whether a firm has experience exporting to specific destinations and in specific product sectors, and create four size dummies classifying firms according to the quartiles of the firm-level distribution of the total yearly log-value of exports. All in all, we end up with 615 features to be used in the machine learning setup.

To measure the COVID-19 demand shock, we use the information on government containment measures from Hale et al. (2021), which consists of four indexes (ranging from 0 to 100) representing the strength of the measures taken by countries to contain the COVID-19 outbreak. The authors provide an economic index summarizing economic policies, a health index summarizing health policies, a government index describing the strictness of 'lockdown style' policies, and an overall government response index (stringency index). Additional features are the number of COVID-19 cases and COVID-19 related deaths in each destination country (per 100,000 inhabitants). The information on the number of cases and deaths and on the government measures is released daily. We average this information to integrate it into our monthly data set, and we include it as a set of variables defined at the destination level. Our final data set is composed of 1,533 features. For a summary of all features see Table Appx.1 in Appendix A.

The left panel in Figure 1 shows the evolution of total monthly exports during 2019 and 2020.
The total monthly value of exports in 2020 is significantly lower than the one observed for the corresponding month in 2019, except for January and February. The lockdown measures implemented to contain the COVID-19 outbreak in Colombia and abroad had a severe impact between April and June: the value in April 2020 is about half of that observed in April 2019 (47%). In a typical month, large firms account for the lion's share of total exports. A regular pattern in customs data is that more prominent exporters trade in many months and ship more frequently than smaller firms, which make only a few shipments. The right panel in Figure 1 shows the proportion of surviving exporting firms at year t among those exporting at year t − 1, by size classes defined at t − 1. Comparing the figures for 2020 with those for 2019, it seems that the COVID-19 outbreak affected all firms regardless of their size. However, the effect looks proportionally stronger for small firms (Q1 and Q2 of the distribution). In contrast, larger firms are less affected and recover faster toward the values of trade observed in 2019.

Figure 1: The evolution of total exports (left) and the proportion of surviving exporting firms at year t among those exporting at year t − 1 within size class at t − 1 (right). Firm size classes are the quartiles of the distribution of firms' exports (in ln) in a given year.

Figure 2 shows, separately for the first and second quarter of a year, the percentage of firms that survive, enter, or exit the export market and their corresponding shares of total exports. Thus, for a given quarter in 2019 and the corresponding quarter in 2020, we label each firm as exiting when it is present in 2019 and absent in 2020, entrant when it is absent in 2019 and present in 2020, and surviving when it is present in both years. We average the total value exported by each firm during the same quarter of two different years.
Then, we sum the individual average value exported according to the firms' status. It turns out that surviving firms play an essential role in explaining total exports: they are around half of the total number of firms in both quarters and account for about 90% of the total export value. The volume lost due to exiting firms is around 5% (assuming they would have exported in 2020 volumes similar to those observed in 2019). Entrant firms almost made up for this 5% loss. Despite this, the composition of the firms participating in exports is very different. The share of exiting firms in the second quarter of 2020 is much higher than in the first quarter of 2020 and than in the same period of 2019.

[Figure: bar charts by HS chapter (top 80% of exports), with panels "Number of Firms Growth (%), Jan-Mar 2020", "Total Export Growth (%), Jan-Mar 2020", and "Number of Firms Growth (%), Apr-Jun 2020".]

Figure 3 shows that the second quarter of 2020 is characterized by a severe and pervasive drop in the number of exporting firms and the volume of exports. Note that, compared to the second quarter, the first quarter export growth exhibits a similar heterogeneity. However, growth rates tend to be less extreme and, on average, more stable in the number of exporters and trade volumes (footnote 2). Exports by product sector in the second quarter of 2020 (see Figure 4) reveal a generalized decrease in the number of exporting firms and trade values, while the first quarter exhibits very heterogeneous patterns. The sectors that appear to be more severely affected in the second quarter are Footwear (HS64), Leather Articles (HS42), Furniture (HS94), Books (HS49), Articles of Metal (HS83), Knitted and Not-Knitted Accessories (HS61-62), Vehicles (HS87), and Articles of Iron or Steel (HS73).
Interestingly, these sectors are relatively more labor-intensive in Colombia, and therefore they could be susceptible to disruptions connected to social distancing. Finally, only for Coffee and Tea (HS09), Other Textiles (HS63), and Jewelry (HS71) did exports grow significantly in value in the second quarter. Instead, in terms of the number of exporting firms, no product sector exhibits notable positive dynamics. Figure Appx.2 in the Appendix shows the growth for 2019, indicating that in periods without strict quarantines such as those of the second quarter of 2020, the changes in exports are also very heterogeneous, but not as extreme. In summary, this preliminary evidence suggests that the impact of the COVID-19 shock on Colombian firms' exports has been extremely heterogeneous across sectors and destinations.

This section illustrates our empirical strategy to estimate the effect of the COVID-19 shock on firms' probability of surviving in the export markets, and to study its heterogeneity by firms' observable characteristics. As in any other evaluation study, the primary identification task is to build a counterfactual outcome, which is not observed, for the treated units. Unfortunately, in considering the effect of the COVID-19 shock, one cannot select any subset of untreated Colombian firms (or, if they were available, firms from other countries) as a control group, because this treatment is affecting, at least indirectly, all firms during 2020. Furthermore, even an identification strategy based on comparing individual firms subject to different intensities of the treatment appears infeasible due to the complex and ex-ante unknown paths through which firms are potentially exposed to the treatment (footnote 3). In other words, the intensity of treatment might depend on the firm's characteristics, such as the identity of suppliers and clients and the characteristics of the traded final product, among many others.
Therefore, as is standard in the literature studying the effect of COVID-19, we must resort to using the information on firms' exporting behavior available for periods before the crisis. Following the intuition of Varian (2016), and similarly to the applications of Cerqua and Letta (2020) and Fabra et al. (2020), we use the prediction capabilities of ML techniques to build the counterfactual scenario for the 2020 firm-level outcomes by using pre-pandemic information on firms' export behavior and firms' characteristics. In particular, the outcome (success) that we want to predict is whether a company that was exporting in a given month in 2019 will export again in the same month of 2020.

We build two different machines for each month to make predictions about individual exporters' success in 2020. One machine is the counterfactual machine, which could be defined as a "naive" machine because it does not consider the COVID-19 information (i.e., variables related to the firm in 2020 and the pandemic) to make predictions. We call this machine the "Shock Unaware Machine" (SUM). The other machine is fully aware of all the available information related to the COVID-19 scenario. We call this second machine the "Shock Aware Machine" (SAM). The SAM holds the information on the observed COVID scenario and expresses it in a metric comparable with the estimated non-COVID counterfactual outcomes deriving from the SUM.

In the case of the SUM, we train a model (for each month) by using the set of exporters observed in 2018 (with the exporting success during the same month in 2019 as the outcome) and test it with the firms operating in the international markets in 2019. We then apply the selected SUM to predict the 2020 outcome for firms exporting in 2019 (footnote 4). We use SUM predictions as the firm-level counterfactual outcome. The SAM considers the exporters operating in the market in 2019.
As mentioned in Section 3, besides the customs data for these companies in 2019 and 2020, we also include the number of COVID-19 cases and deaths in the destination countries in 2020. Moreover, we also use the information related to governments' stringency measures in each destination country (Hale et al., 2021). In order to obtain one 2020 prediction for every firm that exported in 2019, we rely on cross-validation techniques (i.e., the K-fold method) to validate the predictions out of sample. This approach trains the model on a random 80% of the data and tests it on the remaining 20%. It then repeats this process five times (K = 5), each time holding out a different 20%, until we have one 2020 prediction for each 2019 exporter.

We construct the counterfactuals of Colombian exporters by taking the predictions of the SUM, and we compare the counterfactual predictions with those obtained by the SAM. The differences between the two predictions represent our estimated firm-level COVID-19 effects, denoted $\hat{\alpha}_i$ for firm $i$ (Equation 1). Unlike Cerqua and Letta (2020), we compare the counterfactual predictions (SUM) with the SAM predictions, instead of comparing the former with the observed outcome (whether an exporter is successful or not), so as not to lose accuracy in the comparison. Indeed, the SUM (and the SAM) prediction outcome is a probability, but the observed outcome takes a binary value. The implicit untestable assumption is that the prediction errors of the SUM and the SAM are similar. The facts that the best performing SUM and SAM have the same structure (e.g., a logit-LASSO model, see the next section) and that we do not find any significant COVID effect for the first quarter of 2020 are reassuring. The average of the estimated firm-level effects constitutes the estimated COVID-19 average treatment effect. Finally, we use the estimated individual effects to identify subgroups with differential treatment effects based on their exogenous firm-product characteristics.
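The K-fold step described above, in which every 2019 exporter receives exactly one out-of-sample prediction, can be sketched as follows. This is a minimal illustration and not the authors' code: the learner is a deliberately simple stand-in (the success rate within a size class), and the data, variable names, and random seed are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[f::k] for f in range(k)]


def fit_rate_by_group(groups, outcomes):
    """Toy learner: success rate per group, plus the overall rate as fallback."""
    totals, hits = {}, {}
    for g, y in zip(groups, outcomes):
        totals[g] = totals.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + y
    overall = sum(outcomes) / len(outcomes)
    return {g: hits[g] / totals[g] for g in totals}, overall


def out_of_fold_predictions(groups, outcomes, k=5, seed=0):
    """Predict each unit with a model trained only on the other k-1 folds."""
    preds = [None] * len(outcomes)
    for held_out in k_fold_indices(len(outcomes), k, seed):
        held = set(held_out)
        train = [i for i in range(len(outcomes)) if i not in held]
        rates, overall = fit_rate_by_group(
            [groups[i] for i in train], [outcomes[i] for i in train]
        )
        for i in held_out:
            # Fall back to the overall rate if the group is unseen in training.
            preds[i] = rates.get(groups[i], overall)
    return preds


# Hypothetical data: size class in 2019 and export success in 2020 (1 = exports).
size_class = ["Q1", "Q4", "Q1", "Q4", "Q2", "Q3", "Q2", "Q3", "Q1", "Q4"]
exported_next_year = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
folds = k_fold_indices(len(size_class))
preds = out_of_fold_predictions(size_class, exported_next_year)
print(preds)
```

With ten units and K = 5, each fold holds out 20% of the data and the model is fitted on the other 80%, matching the split described in the text.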
To favor the interpretability of the results, we use a Regression Tree, by which we recursively partition the variable space to identify subgroups with differential treatment effects.

Once we have defined the methodology to build the SUM and the SAM, we select the best model in terms of prediction performance among a set of standard ML techniques and compare them with a benchmark logistic regression. We compare six different models: Logit, Logit-LASSO, Classification Tree, Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting. This comparison is made both for January and April, representing a clear non-COVID-19 month and a sharply COVID-19 impacted month, respectively. Tables 1 and 2 compare the predictive power of the models by presenting five commonly used performance measures for classification problems: $R^2$, Area Under the Receiver Operating Characteristic Curve (AUC), Precision-Recall (PR), Balanced Accuracy (BACC), and F1-Score for the positive class (success). These statistics range between 0 (when the model completely misclassifies the observations) and 1 (when the model predicts the outcome perfectly). All of them are general measures of the predictive power of a model. However, PR describes the performance particularly well in a zero-inflated context like ours (Saito and Rehmsmeier, 2015), where the number of Colombian exporters not succeeding in exporting in the same month of the next year exceeds the number of successful ones. More generally, in the presence of unbalanced data (which includes our zero-inflated empirical setting), both BACC (Brodersen et al., 2010) and F1-scores (Van Rijsbergen, 1979) are particularly informative. Table 1 (footnote 5) shows that Logit-LASSO outperforms the other models in January when predicting with the SUM, and performs almost as well as the Classification Tree in April. Table 2 (footnote 6) shows that Logit-LASSO outperforms the other models both in January and April for the SAM.
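To make concrete why balanced accuracy and the F1-score are informative with unbalanced outcomes, the sketch below computes both from a confusion matrix using their standard definitions. The labels are hypothetical toy data (two successes among ten firms), not values from Tables 1 and 2.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(y_true, y_pred):
    """Average of the recall on the positive and on the negative class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def f1_positive(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical outcomes: only 2 of 10 firms export again (unbalanced data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_fail = [0] * 10                          # predicts "no export" for everyone
some_signal = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]    # finds one of the two successes

for name, y_pred in [("always_fail", always_fail), ("some_signal", some_signal)]:
    print(name, accuracy(y_true, y_pred),
          balanced_accuracy(y_true, y_pred), f1_positive(y_true, y_pred))
```

Both toy classifiers reach the same plain accuracy, but only the second one identifies any successful exporter, which balanced accuracy and F1 reflect.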
Therefore, the Logit-LASSO model is the best performing model across all settings.

To corroborate the selection of Logit-LASSO as the best performing algorithm, we use an ML ensemble technique called "Super Learner" (Van der Laan et al., 2007; Polley and Van der Laan, 2013; Van der Laan and Rose, 2011). The Super Learner is a prediction algorithm that assigns weights to find the optimal combination among a collection of prediction algorithms (footnote 7). This method allows us to create an ensemble model combining all six methods we want to test, in such a way that we can observe the contribution of each model to the selected final Super Learner model. Each model's weight in the final selected Super Learner is given by the parameter Coef. The performance or accuracy of each model is estimated with a statistic called empirical risk, which is the mean-squared error obtained in a cross-validation setting, so as to account for possible overfitting. This ensemble method weights the six models to minimize the cross-validated empirical risk (i.e., the average empirical risk across five folds).

As discussed in Section 4, we are interested in predicting export status in 2020 for firms exporting in 2019. For the SUM predictions (footnote 8), Table 3 reveals that Logit-LASSO and Random Forest are the models achieving the highest performance in January. However, in April, the Logit-LASSO is undoubtedly the best performing model. For the SAM predictions (when we train the model on the sample of firms exporting in 2019, using their 2019 features and their outcomes in 2020), the weight of the Logit-LASSO model is more pronounced in January. All in all, Table 3 confirms that, both in January and April, the model with the highest performance is the Logit-LASSO.
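The weighting idea behind the Super Learner can be illustrated with a small sketch: given out-of-fold predictions from several candidate learners, choose non-negative weights summing to one that minimize the cross-validated mean-squared error. This is only an illustration under stated assumptions. A coarse grid search over the weight simplex stands in for the constrained optimization that an actual Super Learner implementation would use, and the predictions below are hypothetical.

```python
from itertools import product


def cv_risk(weights, oof_preds, y):
    """Mean-squared error of the weighted combination of out-of-fold predictions."""
    n = len(y)
    total = 0.0
    for i in range(n):
        blended = sum(w * preds[i] for w, preds in zip(weights, oof_preds))
        total += (blended - y[i]) ** 2
    return total / n


def simplex_grid(n_models, steps):
    """Yield weight tuples on a regular grid: non-negative and summing to one."""
    for combo in product(range(steps + 1), repeat=n_models - 1):
        if sum(combo) <= steps:
            last = steps - sum(combo)
            yield tuple(c / steps for c in combo) + (last / steps,)


def ensemble_weights(oof_preds, y, steps=20):
    """Pick the grid point with the lowest cross-validated risk."""
    return min(simplex_grid(len(oof_preds), steps),
               key=lambda w: cv_risk(w, oof_preds, y))


# Hypothetical out-of-fold predicted probabilities from three learners.
y = [1, 0, 1, 0]
learner_a = [0.9, 0.1, 0.8, 0.2]   # informative
learner_b = [0.5, 0.5, 0.5, 0.5]   # uninformative
learner_c = [0.2, 0.8, 0.3, 0.7]   # systematically wrong

weights = ensemble_weights([learner_a, learner_b, learner_c], y)
print(weights, cv_risk(weights, [learner_a, learner_b, learner_c], y))
```

In this toy case all the weight goes to the informative learner, which is the analogue of the "Coef." column concentrating on one model.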
We chose the Logit-LASSO instead of the ensemble provided by the Super Learner because the performance achieved by the two models is very similar (footnote 9), and because, in the following evaluation exercise, we prefer to compare predictions obtained with the same model across different months and COVID-19 scenarios (footnote 10). Moreover, predicting with a single model is considerably faster than using the Super Learner.

Coef.               SUM, January   SUM, April   SAM, January   SAM, April
Logit               0%             3%           1%             2%
Logit-LASSO         38%            61%          59%            60%
Tree                11%            5%           7%             1%
RF                  40%            21%          23%            18%
SVM                 0%             0%           0%             0%
Gradient Boosting   11%            10%          10%            19%

Footnote 8: We obtain these predictions by training the SUM on the sample of firms exporting in 2018, using their 2018 features and their outcomes in 2019.
Footnote 9: In terms of empirical risk, we obtain practically indistinguishable values for the Logit-LASSO and the Super Learner.
Footnote 10: Indeed, as shown in Table 3, the Super Learner adapts the weights associated with each ML routine to the different months and scenarios.

Given the above results on the prediction accuracy of the considered models, in the following analysis we rely on the Logit-LASSO model. In the Logit-LASSO model we include interactions between size and industry, sector, and means of transportation, as well as with destination country dummies (footnote 11). Logit-LASSO is used for model selection, i.e., for reducing the dimensionality of the matrix of predictors (Ahrens et al., 2020). To select the most relevant predictors, the model shrinks the coefficients of some variables to zero. The prediction analysis is repeated for all months from January to July 2020. During this period, Logit-LASSO selects 39 variables (out of 975) for the SUM and 55 variables (out of 1927) for the SAM in at least one month. Table Appx.3 in Appendix C compares the most important variables for each machine.
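The way an L1 penalty sets some coefficients exactly to zero can be seen in a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent (soft-thresholding after each gradient step). This is a generic textbook procedure, not necessarily the solver used for the paper's Logit-LASSO, and the data, penalty values, step size, and iteration count are hypothetical choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Shrink toward zero; values within the threshold become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_logit_lasso(X, y, lam, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is not penalized; `lam` is the L1 penalty on the slopes.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        probs = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row)))
                 for row in X]
        resid = [pr - yi for pr, yi in zip(probs, y)]
        grad_intercept = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= lr * grad_intercept
        beta = [soft_threshold(b - lr * g, lr * lam) for b, g in zip(beta, grad)]
    return intercept, beta


# Hypothetical data: the first feature is related to success, the second is not.
X = [[1, 1], [1, -1], [1, 1], [1, -1], [-1, 1], [-1, -1], [-1, 1], [-1, -1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

_, beta_mild = fit_logit_lasso(X, y, lam=0.05)    # keeps the informative feature
_, beta_strong = fit_logit_lasso(X, y, lam=0.30)  # penalty removes every feature
print(beta_mild, beta_strong)
```

With the mild penalty only the uninformative feature is dropped; with the strong penalty both coefficients are zero, which illustrates how the penalty level controls the number of selected variables.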
We use the Logit-LASSO predicted probabilities to estimate the average monthly effect of the COVID-19 shock as the monthly average of α_i (the difference between the firm-level predicted probabilities of success in the SUM and the SAM scenarios); the results are presented in Figure 5. If we assume that, in the first months of 2020, firms are not affected by the COVID-19 shock, we can treat the comparison of the SAM and SUM predictions for those months as a falsification test, similar to the in-time placebo test routinely used in Synthetic Control Methods (SCM; Abadie et al., 2015). Estimating an economically significant effect of the COVID-19 treatment in the months before the actual economic shock happened would indicate that our model is mechanically predicting a COVID-19 effect even when none is expected. We will also apply this placebo study conditioning on exogenous firm characteristics observed in 2019, by estimating COVID-19 effects for subsamples of firms selected according to such characteristics. We interpret these placebo studies as a robustness check on our results on treatment heterogeneity. As shown in Figure 5, the probabilities obtained from the SUM and the SAM are almost identical on average for January, February, and March. This result is reassuring, since the Colombian government implemented a complete and mandatory lockdown only on March 25, 2020. More generally, we can conclude that our identification strategy is not mechanically recovering COVID-19 effects for a period with low incidence in Colombia and in the rest of the world. The COVID-19 effect peaks in April 2020, when the average difference between the predicted probabilities of exporting is nearly 20 percentage points. In the following months, the estimated average effect declines. In what follows, we first explore the heterogeneity of the COVID-19 effect by focusing separately on each of the firms' characteristics observed in 2019.
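The monthly average effect can be computed as in the short sketch below. It assumes the sign convention α_i = p_SUM − p_SAM (counterfactual minus COVID-19 scenario), which the text does not state explicitly, and the firm-level probabilities are made-up numbers used only for illustration. The function name `monthly_average_effect` is hypothetical.

```python
from collections import defaultdict

def monthly_average_effect(records):
    """records: iterable of (month, p_sum, p_sam) tuples, one per firm-month.
    Returns {month: mean alpha_i}, with alpha_i = p_sum - p_sam (assumed sign)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, p_sum, p_sam in records:
        totals[month] += p_sum - p_sam
        counts[month] += 1
    return {m: totals[m] / counts[m] for m in totals}

# Illustrative firm-level predicted probabilities (not from the paper).
records = [
    ("2020-01", 0.62, 0.61),
    ("2020-01", 0.40, 0.41),
    ("2020-04", 0.60, 0.38),
    ("2020-04", 0.45, 0.27),
]
effects = monthly_average_effect(records)
```

In this toy input the January differences cancel out, which is the pattern an in-time placebo test hopes to see before the shock, while the April average is about 0.20.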
To carry out the conditional in-time placebo tests explained above, we focus separately on two temporal windows: January to March and April to July. As suggested above, we check whether our SUM is able to represent the "business as usual" situation for the first months, not yet affected by COVID-19, even within subgroups of firms defined by their characteristics. If we assume that, during these first months of the year, no subgroup-specific shock changed the "business as usual" situation, finding significant heterogeneity of COVID-19 effects in the first time window would undermine the credibility of our results. Indeed, it would indicate that the SUM counterfactual scenario is likely biased (at least for that specific subgroup of firms or along that specific firm characteristic). In Figure 6, we concentrate on firms' diversification (i.e., selling a wide range of products and/or selling to many countries), and we study whether, as suggested by the literature on risk and diversification, this dimension is a relevant determinant of firms' resilience to COVID-19. In the upper panel of Figure 6, the lines represent the mean COVID-19 effect as a function of the number of destinations and of the number of exported products (the dots represent the single observations, and the shaded area the confidence interval). In the bottom panel of Figure 6, the lines represent the mean COVID-19 effect as a function of the Herfindahl-Hirschman Index of Products and of the Herfindahl-Hirschman Index of Destinations (the dots represent the single observations, and the shaded area the nonparametric smooth fit with its confidence interval). We find very weak evidence that in April, May, June, and July (see lines and dots in blue), firms that are more diversified in terms of destinations or products fare better.
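As a reminder of how the concentration measures on the horizontal axes of the bottom panel are typically built, the sketch below computes a Herfindahl-Hirschman Index as the sum of squared shares. It assumes the plain (non-normalized) definition and uses hypothetical export values; whether the paper applies any normalization is not stated in the text.

```python
def herfindahl(values):
    """Herfindahl-Hirschman Index: sum of squared shares.
    Equals 1 for full concentration and 1/n for n equal shares."""
    total = sum(values)
    if total <= 0:
        raise ValueError("total export value must be positive")
    return sum((v / total) ** 2 for v in values)

# Hypothetical export values by product for two firms in one month.
single_product = herfindahl([100.0])                   # fully concentrated
four_equal = herfindahl([25.0, 25.0, 25.0, 25.0])      # evenly diversified
```

The same function applies to destinations by passing export values by country instead of by product.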
Reassuringly, in the four panels of the figure, the lines and the dots in gray show that our identification strategy finds no change in the importance of diversification for the first three months of 2020. We also report disparities in the effect of COVID-19 along other economic dimensions. The top-left panel of Figure 7 shows that the pandemic shock affects exports transported by land more than those transported by sea. It is also remarkable that exports shipped by air are heavily negatively impacted by the pandemic. The top-right panel of Figure 7 shows how the COVID-19 shock has affected companies depending on their size. The smallest firms (Q1) are severely affected by the COVID-19 shock. Companies in the second quartile (Q2) of the size distribution are the most affected. As firm size increases, the effect of the COVID-19 shock becomes smaller. As expected, the biggest firms (Q4) are more resilient. The bottom panel of Figure 7 looks into the effect of the COVID-19 shock on each industry (HS section). 13 We find that all export industries except "Prepared Foodstuffs, Beverages, Spirits, Tobacco (04)" are negatively affected by the consequences of the pandemic. However, we still find much heterogeneity in the size of the impact on the probability of continuing to export successfully. The "Vegetable Products (02)" industry seems to be well prepared to face the COVID-19 shock. Nonetheless, in other industries such as "Textile (11)", "Jewelries (14)", "Leather (08)", "Vehicles (17)", "Miscellaneous Manufacturing (20)" and "Footwear (12)", our model estimates that exporters' probabilities of success are dramatically reduced.
[Figure 7, bottom panel, HS section labels: Animal (01), Vegetable (02), Fats/oils (03), Prep. food (04), Mineral (05), Chemical (06), Plastics (07), Leather (08), Wood (09), Paper (10), Textile (11), Footwear (12), Cement (13), Jewel (14), Metals (15), Machinery (16), Vehicles (17), Precis. inst.]
The vehicles industry is a representative example of an industry affected by the pandemic, although it accounts for a limited share of Colombian exports. Due to mobility restrictions imposed by the majority of countries, people stopped using transport, and this directly affected the industry's sales. The "Prepared Food" industry seems to have benefited from the COVID-19 shock. Finally, it is important to notice that we do not find that COVID-19 significantly impacts any subgroup of firms during the first three months of the year. Having explored the expected impact of COVID-19 by month and firm characteristics, we develop a heterogeneity analysis investigating our model's predictions for each Colombian exporter depending on its leading export destination. Figure 8 shows that in the first quarter of 2020 our machines do not detect any meaningful effect of COVID-19 (top panel). The heterogeneity of the COVID-19 effect in the subsequent period (bottom panel) appears to be very weakly associated with a firm's main destination. This suggests that in Colombia, at least in the analyzed period of 2020, exposure to export markets differently hit by COVID-19 is not among the main sources of treatment heterogeneity. 14

13 ... (HS), sectors into 22 sections.
14 This finding is also confirmed by the results presented in Table Appx.4.

The heterogeneity analysis documented in the previous figures is broadly confirmed by the linear regression analysis presented in Table Appx.5. We regress the estimated treatment effect in percent (using the logarithmic difference approximation) on firms' characteristics, separately for each time window. It is important to notice that the R^2 of the regression for the first three months is equal to 0.056, while that for the subsequent months is about 0.42. Once again, following the reasoning of the in-time placebo test, this last piece of evidence suggests that we were able to build a credible counterfactual.
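The logarithmic difference approximation mentioned above can be checked numerically. The sketch below compares the log difference of two probabilities with the exact relative change; the function names and the probability values are hypothetical and chosen only to show where the approximation is tight and where it is not.

```python
import math

def log_difference(p_counterfactual, p_covid):
    """Log-difference measure of the effect: ln(p_covid) - ln(p_counterfactual)."""
    return math.log(p_covid) - math.log(p_counterfactual)

def relative_change(p_counterfactual, p_covid):
    """Exact relative change: (p_covid - p_counterfactual) / p_counterfactual."""
    return (p_covid - p_counterfactual) / p_counterfactual

# Small effect: the two measures nearly coincide.
small_log = log_difference(0.60, 0.57)
small_pct = relative_change(0.60, 0.57)

# Large effect: halving the probability is -50% exactly,
# but about -0.69 in log points.
large_log = log_difference(0.60, 0.30)
large_pct = relative_change(0.60, 0.30)
```

The gap between the two measures widens as the effect grows, so large effects expressed in log points should be read with this in mind.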
To systematize our findings on treatment heterogeneity and to check for interaction effects between firm characteristics, we estimate a Regression Tree using the logarithm of the firm-level COVID-19 effect as the dependent variable and the firm characteristics used above as explanatory variables. A decision tree is among the simplest and most interpretable Machine Learning tools for capturing non-linearities and interactions in our estimated effects. As shown in Figure 9, Colombian exporters operating in April and May are predicted to be severely affected by the pandemic shock. In particular, the firms suffering the most belong to the Footwear, Jewelry, Leather, Manufacturing, Paper, Textile, or Vehicles industries. Within this subgroup, firms in the first (Q1) and second (Q2) quartiles of the size distribution are predicted to bear the largest impact of COVID-19. For this small subgroup (5% of total firms in the sample), the probability of succeeding in the international market is reduced by 100% under the COVID-19 pandemic (for the mentioned months). Firms in the third (Q3) and fourth (Q4) quartiles that belong to the same industries see a reduction in the probability of survival of about 55%. For the same months of April and May, exporters belonging to the industries of Vegetable Products (e.g., coffee, tea, live trees, cereals, etc.), Prepared Foodstuffs (e.g., sugars, cocoa, etc.), and/or Beverages, Spirits and Tobacco are predicted to be much less affected by COVID-19 (the probability of success is reduced by just 6.9%; this subgroup comprises 8% of the total sample). Finally, it is important to underline that the Tree classifies all firms in the first three months of the year as not affected by the treatment.
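The recursive partitioning step can be sketched for a single numeric feature as follows. The code looks for the threshold that most reduces the sum of squared residuals and accepts the split only if the reduction is at least 1% of the parent node's sum of squared residuals, which mirrors the stopping rule the text describes for the Tree. The function names and the data are hypothetical, and a full tree would repeat this search over all features and over each resulting node.

```python
def sse(ys):
    """Sum of squared residuals around the node mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys, min_improvement=0.01):
    """Return (threshold, gain) for the best split 'x <= threshold',
    or None if no split improves the parent SSE by min_improvement."""
    parent = sse(ys)
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value cannot split the node
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = parent - sse(left) - sse(right)
        if best is None or gain > best[1]:
            best = (t, gain)
    if best is None or parent == 0 or best[1] / parent < min_improvement:
        return None
    return best

# Illustrative effects by month (1 = January, ..., 7 = July); not real data.
months = [1, 2, 3, 4, 5, 6, 7]
effects = [0.0, 0.0, 0.0, -0.6, -0.6, -0.15, -0.15]
first_split = best_split(months, effects)

# Within the January-March node the outcome is constant, so no split is accepted.
placebo_split = best_split([1, 2, 3], [0.0, 0.0, 0.0])
```

With these toy numbers the first accepted split separates the first three months from the rest, and the January to March node is left unsplit.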
Indeed, for this group of firms, the Tree is unable to find any split on the explanatory factors that improves the sum of squared residuals by 1% with respect to the previous node (which is the rule that we set to allow a split and to limit overfitting). 15 This confirms that we do not find any treatment heterogeneity for the first three months of the year.

Figure 9: Regression Tree to identify the COVID-19 effect heterogeneity by subgroups. The following explanatory variables are included: month (a factor with 7 levels), modal Colombian region (a factor with 29 levels), modal means of transportation (a factor with 3 levels), company size (a factor with 4 levels, the quartiles), modal industry (a factor with 19 levels), HH p, HH d, N P, N D, distance to the capital of the destination country, continent, total value of exports (ln), and company size. Other factor variables are included with a reduced number of levels. The loss in terms of information is more than compensated by the gains in terms of interpretability of the results. The variable destination includes only the first 15 countries by number of transactions. The sector variable uses only those sectors selected by Logit-LASSO.

Our study contributes to the strand of the literature concentrating on the effects of COVID-19 on international trade. Earlier studies provided estimates of the expected trade impact of COVID-19 using simulations from computable general equilibrium models (World Trade Organization (WTO), 2020; Bonadio et al., 2020). Instead, like de Lucio et al. (2020) and Espitia et al. (2021), we carry out an econometric analysis employing actual data on international trade flows during the pandemic.
Unlike the latter papers, our identification strategy also exploits pre-2020 information and machine learning methods to reconstruct the counterfactual 2020 firm-level outcomes in the absence of the pandemic shock, and investigates the heterogeneity of the COVID-19 effect. On average, we find that the COVID-19 shock decreased a firm's probability of surviving in the export market by about 44% in April and May and by approximately 13% in June and July (see the first leaves of the Tree in Figure 9). Our heterogeneity analysis suggests that, besides the temporal dimension, the main factors predicting treatment heterogeneity are interactions between firm size and industry. We use in-time placebo tests to check the credibility of our counterfactual estimates. This analysis is the first step towards a more exhaustive study of the COVID-19 effect on international trade using ML counterfactuals. In a future revision of this paper, we aim to consider the intensive margins of trade and firms' import behavior and to enlarge the time window under analysis. From a methodological perspective, we will experiment with alternative methodologies to identify the heterogeneity of the shock's effect. More generally, this paper shows how machine learning methods can be applied successfully to predict firms' trade potential. We consider this method and its application promising avenues of research to assist firms and public agencies in their decision-making processes. Most countries have export promotion agencies whose objective is to sustain firms' internationalization activities by lowering the costs of information acquisition (Broocks and Van Biesebroeck, 2017; Munch and Schaur, 2018). Indeed, forming a new trade relation typically requires substantial effort to gather information that is not freely available but is acquired through search and learning efforts.
To start to trade (a new product or to a new destination), firms first need to be aware of the existence of a trading opportunity. Once the potential trading partner has been identified, there are additional obstacles to establish a successful trade relationship, including learning how to do business in the presence of non-tariff barriers (safety regulations, formal trade procedures, customs and infrastructures efficiency, etc.) and issues related to incomplete information or limited capability to process information (Rauch and Watson, 2003; Allen, 2014; Dasgupta and Mondria, 2018) . Moreover, the literature has also stressed the existence of complex interdependencies (complementarity or substitutability) between products and destination markets (from the perspective of technology/knowledge, local tastes, legal requirements, and marketing and distribution costs). 16 To stimulate economic recovery in the post-COVID-19 period, governments worldwide have planned export promotion programs to help firms reestablish their pre-crisis export level. 17 Our future research goal is to use ML techniques to help firms and public agencies to predict a firm's diversification/differentiation potential (to start exporting and to be a successful exporter at the firm, destination, and product-level) by taking into account both the relatedness among products and markets (i.e., by looking at the histories of the export baskets of firms) and the similarity of firms in terms of their fundamentals (size, productivity, types of intermediate inputs used, innovativeness, location, and employment structure). 
Given that exporter dynamics can be understood as a complex learning process dense with interdependencies, and that ML techniques have been successfully applied to predict firm performance in such settings, we plan to use these techniques and firm-level data to build a recommendation system that helps firms learn their latent comparative advantages by providing export diversification/differentiation recommendations.

17 On the effectiveness of this kind of policy in stimulating firms' competitiveness and on the importance of export markets for growth after the 2008 crisis, see Van Biesebroeck et al. (2016) and Almunia et al. (2018), respectively.

Variables with one level for each destination. Register the number of reported deaths with COVID-19 as causal link, presented as a daily average per 100,000 inhabitants. Source: Coronavirus data*.

Models: SUM and SAM

Size*Industry: Factor variables with 5 levels for each industry. Takes value 1 when the company size is Q1, value 2 when the size is Q2, value 3 when the size is Q3, and value 4 when the size is Q4 while operating in a given industry. It takes value 0 if a company is not operating in this industry (for any size level).
Size*Sector: Factor variables with 5 levels for each sector. Takes value 1 when the company size is Q1, value 2 when the size is Q2, value 3 when the size is Q3, and value 4 when the size is Q4 while operating in a given sector. It takes value 0 if a company is not operating in this sector (for any size level).
Size*Means of Transportation: Factor variables with 5 levels for each means of transportation. Takes value 1 when the company size is Q1, value 2 when the size is Q2, value 3 when the size is Q3, and value 4 when the size is Q4 while operating using a given means of transportation. It takes value 0 if a company is not operating using this means of transportation (for any size level).
Size*Destination: Factor variables with 5 levels for each destination. Takes value 1 when the company size is Q1, value 2 when the size is Q2, value 3 when the size is Q3, and value 4 when the size is Q4 while operating in a given destination. It takes value 0 if a company is not operating in this destination (for any size level). Source: authors' own elaboration.

* https://data.europa.eu/euodp/en/data/dataset/covid-19-coronavirus-data

[Figure: bar charts by HS chapter (top 80% of exports), Jan-Mar 2019. Panels: Number of Firms Growth (%); Total Export Growth (%).]
D Appendix - Heterogeneity Correlations

Table Appx.4 shows the significance of the correlation between the COVID-19 effect by month and the level of stringency imposed by each country during the same period. This correlation is significant only in June, when we observe a negative correlation between the two variables (lower levels of stringency are correlated with a higher COVID-19 effect in June).

Table Appx.5: [...] effect. The first column shows the results for January to March. Note that this model explains only 6% of the variance of the shock effect (R^2). The second column corresponds to the months from April to July. Note that in this model the R^2 increases notably to 42%.

References

Comparative politics and the synthetic control method
lassopack: Model selection and prediction with regularized regression in Stata
Sequential exporting
Information frictions in trade
Venting out: exports during a domestic slump
Globalization and Pandemics. (No. w27840)
The state of applied econometrics: Causality and policy evaluation
Thinking ahead about the trade impact of COVID-19
Supervised learning for the prediction of firm dynamics
Global supply chains in the pandemic
The balanced accuracy and its posterior distribution
The impact of export promotion on export market entry
Colombian export capabilities: building the firms-products network
Colombian liberalization and integration to world trade markets: Much ado about nothing
Local economies amidst the COVID-19 crisis in Italy: a tale of diverging trajectories
Inattentive importers
Impact of COVID-19 containment measures on trade. Working Papers 2101
From controlled to undisciplined data: estimating causal effects in the era of data science using a potential outcome framework
Changes in mobility and socioeconomic conditions during the COVID-19 outbreak. Humanities and Social Sciences Communications
Multi-product firms and flexible manufacturing in the global economy
A search and learning model of export dynamics
Pandemic trade: COVID-19, remote work and global value chains
Sicken thy neighbour: The initial trade policy response to COVID-19
Degrowth versus Decoupling: Competing strategies for carbon abatement?
Implications of COVID-19 for globalization
A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker)
The product space conditions the development of nations
Bilateral relatedness: knowledge diffusion and the evolution of bilateral trade
Innovation, trade and multi-product firms
Extended gravity
The effect of export promotion on firm-level performance
Colombia in the classical era of 'inward-looking development'
A concordance between ten-digit US Harmonized System Codes and SIC/NAICS product classes and industries
SuperLearner: Super Learner Prediction. Available at cran.r-project
Starting small in an unfamiliar environment
The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets
Did export promotion help firms weather the crisis?
Super learner
Targeted learning: causal inference for observational and experimental data
Information retrieval: theory and practice
Causal inference in economics and marketing
Global Economic Prospects
Trade set to plunge as COVID-19 pandemic upends global economy
Informe de la Junta Directiva al Congreso de la República -Marzo de

Product-Herfindahl Index: Measures the concentration of products at the 6-digit HS level by company-month. Source: authors' own elaboration.
Factor variable with one level (dummy variable) for each sector. Takes value 1 in all periods after a company exports for the first time in a given sector (reflecting past experience in a sector). Source: authors' own elaboration.
Factor variable with one level (dummy variable) for each destination. Takes value 1 in all periods after a company exports for the first time to a given destination (reflecting past experience in a destination). Source: authors' own elaboration.
Variables with one level for each destination. Records measures such as income support and debt relief. Ranges from 0 to 100. Source: Hale et al. (2021).
Stringency Index: Variables with one level for each destination. Records the strictness of 'lockdown' style policies that primarily restrict people's behaviour. Ranges from 0 to 100. Source: Hale et al. (2021).
Variables with one level for each destination. Combines 'lockdown' restrictions and closures with measures such as testing policy and contact tracing, short-term investment in healthcare, as well as investments in vaccines. Ranges from 0 to 100. Source: Hale et al. (2021).