key: cord-0796235-3hetle1i authors: Marcílio-Jr, Wilson E.; Eler, Danilo M.; Garcia, Rogério E.; Correia, Ronaldo C.M.; Rodrigues, Rafael M.B. title: Visual analytics of COVID-19 dissemination in São Paulo state, Brazil date: 2021-03-24 journal: J Biomed Inform DOI: 10.1016/j.jbi.2021.103753 sha: 9634804cc74dd2ab06e81a56432954542e36c887 doc_id: 796235 cord_uid: 3hetle1i

Visual analytics techniques are useful tools to support decision-making and cope with increasing data, particularly when monitoring natural or artificial phenomena. When monitoring disease progression, visual analytics approaches help decision-makers understand or even prevent dissemination paths. In this paper, we propose a new visual analytics tool for monitoring COVID-19 dissemination. We use the k-nearest neighbors of cities to approximate each city's neighbors, and we analyze COVID-19 dissemination by comparing a city under consideration with its neighborhood. Moreover, this analysis is performed within periods, which facilitates the assessment of isolation policies. We validate our tool by analyzing the progression of COVID-19 in neighboring cities of São Paulo state, Brazil.

The novel coronavirus (SARS-CoV-2), which causes COVID-19, has already infected more than 116 million people worldwide and caused over one million deaths by March 2021. While understanding the biological aspects of the virus is essential [1, 2, 3], it is also necessary to monitor the evolution of the number of cases in cities and their neighborhoods, so that decision-makers can devise isolation policies according to the risk of dissemination. Moreover, citizens must be aware of how isolation policies affect the dissemination of COVID-19 and the number of cases after isolation. To monitor COVID-19 dissemination, we propose a visual analytics tool to help analyze the growth in the number of confirmed cases in the cities of São Paulo state, Brazil.
Our approach's main contribution is its ability to perform local and regional analysis of the cities by inspecting periods of analysis defined in days. We argue that analyzing a period allows one to follow the evolution of the number of confirmed cases more easily, because one can observe the number of cases accumulated within the period rather than since the first day of notification. Beyond defining a period, our perspective on regional analysis (a city's neighborhood) could be fundamental for decision-making, because disease dissemination usually follows a hierarchy, spreading from larger cities to their neighborhoods. So, besides presenting the number of confirmed cases in a city, we also present the number of cases in its neighborhood. Comparing a city with its neighborhood allows verifying whether the neighborhood has a main focus of COVID-19 dissemination or several dissemination points. For example, if the city under consideration has more cases than its neighborhood, it stands out as an influence on the neighborhood; on the other hand, if the neighborhood stands out over the city under consideration, the neighborhood may contain one or more cities that could influence cities with fewer confirmed cases. Finally, if both the city under analysis and its neighborhood have a high number of confirmed cases, the neighborhood as a whole has high COVID-19 dissemination.

To validate our methodology, we provide several case studies by analyzing cities in São Paulo state, Brazil. Besides highlighting the cities on the map, our tool also summarizes the risk of dissemination using a radial visualization. We use the slope of the case curve to interpret the risk of dissemination; note that we focus on the dissemination risk rather than on the number of cases itself. In the radial visualization, the circle encodes the city under analysis, while a donut chart maps the neighboring cities.
We use color saturation to indicate the risk of dissemination: darker colors represent cities with a higher risk of dissemination.

This paper is organized as follows: in Section 2, we briefly delineate some related works; Section 3 presents the hierarchical spreading of COVID-19, on which our methodology is based; Section 4 shows the proposed visual analytics tool; analyses using the tool are presented in Section 5; in Section 6, we discuss some aspects of the technique; we conclude our work in Section 7.

Using data to detect and quantify health events is a useful strategy to understand disease outbreaks. Usually, such strategies use data mining or visualization techniques to monitor events related to a disease of interest. Visualization-based strategies for monitoring the dissemination of diseases rely on the fact that graphical representations can enhance the ability to identify data patterns and tendencies. In this sense, it is easier to identify growth tendencies and many other patterns by looking at visual variables, such as position, color, or area, than at tables or reports. The literature presents many examples of systems using visualization techniques to enhance the analysis of disease dissemination, such as the work of Hafen et al. [4], who delineate a strategy to detect outbreaks based on monitoring pre-diagnostic data of emergency department chief complaints. Although they use simple line charts, the visualization components helped identify patterns in the data. HealthMap [5], on the other hand, uses the geolocation of media reports to integrate textual outbreak data in a single resource. The system helps extract useful information and summarize unstructured disease reports, facilitating decision-makers' analysis. Another interesting approach is to use what-if strategies and visualize the outcomes of the decision alternatives applied when dealing with disease outbreaks [6].
Other strategies employing visualization tools use heatmaps to analyze patterns of hand-foot-mouth disease [7], employ intelligent graph visualizations and reordered matrices to understand influenza dissemination paths [8], or visualize the effect of decision measures implemented during a simulated pandemic influenza scenario [9]. From the data mining perspective, it is usually interesting to contrast social network posts related to diseases with officially reported cases. These approaches rely on the strength of the relationship between officially reported cases and web searches or social media posts containing words related to the diseases [10, 11, 12, 13]. For instance, most of these works use the web and social media to detect Influenza-like Illness events [10, 14, 15]. An excellent example of using data-mining techniques to detect disease outbreaks is the Dutch system Coosto [16], which uses Google Trends and social media data to detect outbreaks using a cut-off criterion. Other works have also shown that Twitter data is highly correlated with disease activity [17, 18], with applications such as predicting dengue cases [19] or automatically monitoring avian influenza outbreaks, where one-third of outbreak notifications were reported on Twitter earlier than in official reports [20].

In this work, we provide a visual analytics approach to monitor the evolution of COVID-19 dissemination and to facilitate its analysis during and after isolation. We use visualization techniques and analysis based on time windows to help analysts monitor how the situation of neighboring cities affects the dissemination of COVID-19 in a city under analysis.

This section explains the hierarchical spreading behavior of COVID-19, which we use to define each city's neighborhood. The basic idea of the hierarchical spreading of COVID-19 is that cities with confirmed cases disseminate infection to their neighboring cities.
In this case, the neighboring cities are the cities with confirmed cases among the k nearest cities to the city under analysis. Note that regional (larger) cities are more likely to disseminate COVID-19 to their neighborhoods due to their larger number of inhabitants, more job opportunities, greater cultural access, and other social aspects that could attract people from neighboring cities. Figure 1 illustrates the hierarchical spreading, where an orange circle represents a city with a confirmed case, and the arrows indicate potential dissemination paths due to location proximity. To formalize the definition of a neighborhood for city A, Algorithm 1 shows the computation of each city's k nearest cities in São Paulo state based on latitude and longitude coordinates.

Algorithm 1 Computing k nearest cities.
1: procedure k_nearest_cities(cities, k)
2:     latlong_coords ← get_latlong(cities);
3:     knn_sets ← KNN(latlong_coords, k);
4:     return knn_sets;

To augment city A's neighborhood, besides the k nearest cities of A, we also add to the neighborhood the cities that have A in their own k-nearest-cities sets. In this way, we better simulate the interaction between a city and its neighborhood. Finally, a few cities may have no confirmed cases; these cities do not appear in the visualization tool (see Section 4).

Our visual analytics approach has the main objective of helping decision-makers analyze a city's situation based on disease dissemination. Besides information about the city of interest, our tool provides information about the situation of its neighborhood. We delineate the following requirements for our visual analytics approach to monitor the dissemination curves and support analysis based on the number of COVID-19 infections:

First, it is necessary to define the neighborhood of a city.
Our strategy follows the hierarchical spreading scheme of COVID-19 explained in Section 3, which states that a city with confirmed cases influences (i.e., can disseminate the virus to) its neighboring cities. These neighboring cities are retrieved using the k-nearest-neighbors algorithm. In our case, city A's neighborhood consists of its k nearest cities and the cities that have A in their k-nearest-cities sets. Given that, we can analyze a city based on its own dissemination as well as that of its neighborhood. In the following, we present how we accomplish each requirement.

[…] is shown at the bottom of the visualization. We also provide a map of São Paulo state (g) to visualize the neighborhood under analysis. To facilitate a comparison between a city and its neighborhood, users can inspect the infection curves of the city itself and of its neighborhood. With this visualization, it is possible to understand whether the city is being influenced by its neighborhood (when the neighborhood has more confirmed cases) or whether the city influences its neighborhood (when the city has more confirmed cases). Figure 3 illustrates a city influencing its neighborhood, both within the time window and in the total number of cases: the number of cases in Presidente Prudente is far greater than in its neighborhood.

R2: Visualize the evolution of the number of cases in a time window. Using a time window, i.e., focusing the analysis on a specified number of days, helps monitor the evolution of dissemination in chunks of time and facilitates the comparison among cities. In this way, questions such as which city responds better to isolation policies, and how the isolation policy in a certain period affected the dissemination in a subsequent period, can be answered. Figure 4 illustrates how using the time window to analyze only the confirmed cases inside the period helps us visualize the flattening of Birigui's curve.

R3: Visualize the increase and flattening of the curves.
Although curves with the total number of notifications suffice to communicate how a city or neighborhood performed in a period, curves consisting only of the cases reported within the period of analysis help users focus on the increase or flattening of the curve, as shown for requirement R2 and in the example of Figure 4. Such an approach is also useful when contrasting the curve slope with isolation indices in previous periods.

R4: Quickly understand the situation of a neighborhood. To promptly visualize a neighborhood's situation, we designed a glyph inspired by Somarakis et al.'s work [21] to encode the slope of the dissemination curves in a period. A donut chart represents a city's neighborhood in our visualization, while a concentric circle encodes the analyzed city, as shown in Figure 5. We use color to represent the angle formed by the slope of the dissemination curves: the bigger the increase in the number of cases, the darker the color. Notice that the colors encode the increase, not the number of cases.

To monitor the evolution of confirmed cases, we use the São Paulo government's data at SEADE¹. The data consist of daily updated cases for each city with a confirmed case. To create the visualization tool, we use only the city name, daily confirmed cases, and date attributes; the latitude and longitude coordinates of each city are retrieved using the geopy² library. The São Paulo government³ also provides the isolation index, but only for a few cities. The isolation index is the percentage of inhabitants in isolation.

[…] the next few days in São Paulo's neighborhood. So, selecting cities on the hypothesis that they will disseminate the virus is not our tool's concern. Instead, investigators select a city of interest and then use the time window to visualize and understand the first notification and the progression of confirmed cases compared with its neighbors.
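The neighborhood construction of Algorithm 1 (including the augmentation step described after it), the time-window curves of R2 and R3, the slope-based glyph color of R4, and the city-versus-neighborhood comparison can be sketched together in a few lines of Python. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation, and every function name in it is hypothetical. It assumes (i) haversine distance for the k-nearest search, since the text only says the search is based on latitude and longitude; (ii) daily new confirmed cases stored as one list per city, indexed by day; (iii) a least-squares slope of the windowed cumulative curve; and (iv) a hypothetical `cases_per_day_unit` parameter, because the angle of a slope depends on how the axes are scaled, which the text does not specify.

```python
import math
from itertools import accumulate

# Hypothetical inputs (assumptions, not the SEADE schema):
#   coords:    city -> (latitude, longitude) in degrees, e.g. geocoded with geopy
#   daily_new: city -> list of new confirmed cases per day (index = day)
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def k_nearest_cities(coords, k):
    """Algorithm 1, brute force: the k nearest other cities of every city."""
    knn = {}
    for city, here in coords.items():
        others = sorted((c for c in coords if c != city),
                        key=lambda c: haversine_km(here, coords[c]))
        knn[city] = set(others[:k])
    return knn


def neighborhoods(knn):
    """Augmented neighborhood of A: kNN(A) plus every B that has A in kNN(B)."""
    hood = {city: set(near) for city, near in knn.items()}
    for city, near in knn.items():
        for other in near:
            hood[other].add(city)  # `city` lists `other` among its k nearest
    return hood


def window_curve(daily, start, length):
    """Cumulative cases reported inside the window only, not since day 0."""
    return list(accumulate(daily[start:start + length]))


def slope(curve):
    """Least-squares slope of a curve in cases per day (needs >= 2 points)."""
    n = len(curve)
    x_mean, y_mean = (n - 1) / 2, sum(curve) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(curve))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def risk_saturation(curve, cases_per_day_unit=1.0):
    """Map the slope angle (negative angles clipped to 0) to a saturation in [0, 1]."""
    angle = math.degrees(math.atan(slope(curve) / cases_per_day_unit))
    return max(0.0, angle) / 90.0


def compare_with_neighborhood(city, hood, daily_new, start, length):
    """Report which side had more cases in the window: city or neighborhood."""
    def window_total(c):
        return sum(daily_new[c][start:start + length])

    city_total = window_total(city)
    # Cities without confirmed cases are absent from daily_new and are skipped.
    hood_total = sum(window_total(c) for c in hood[city] if c in daily_new)
    if city_total > hood_total:
        return "city"
    if hood_total > city_total:
        return "neighborhood"
    return "tie"


if __name__ == "__main__":
    # Toy data: hypothetical cities on the equator and made-up case counts.
    coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 3.0), "D": (0.0, 7.0)}
    daily_new = {"A": [1, 1, 1, 1], "B": [10, 10, 10, 10], "C": [0, 1, 0, 1]}

    hood = neighborhoods(k_nearest_cities(coords, k=1))
    print({c: sorted(n) for c, n in hood.items()})
    # {'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C']}
    print(compare_with_neighborhood("B", hood, daily_new, start=0, length=4))  # city
    print(compare_with_neighborhood("A", hood, daily_new, start=0, length=4))  # neighborhood
    for city, daily in daily_new.items():
        curve = window_curve(daily, start=0, length=4)
        print(city, curve, round(risk_saturation(curve), 2))
```

With k = 1, the toy city D has only C as its nearest city, yet C's neighborhood also contains D because of the augmentation step, which is the asymmetry the augmented definition is meant to capture. The brute-force search is quadratic in the number of cities, which is unproblematic for several hundred municipalities; a spatial index would be the usual choice for much larger sets.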
In this section, we inspect the dissemination evolution of various cities in São Paulo state, Brazil. We used a time window of 20 days to assess the following periods: from April 19th to May 9th, from April 21st to May 11th, and from April 26th to May 16th.

5.1. Regional city presents risk of dissemination to its neighbors

Presidente Prudente. Fig. 6 shows the evolution of the confirmed cases of Presidente Prudente. Up to May 16th, Presidente Prudente had 89 confirmed cases, as shown on the left of the figure. However, it is interesting to note the acceleration of new confirmed cases in the city in the periods from April 19th to May 9th, from April 21st to May 11th, and from April 26th to May 16th. This city, in particular, has presented the lowest isolation indices in São Paulo state (we highlight in red the mean of the isolation indices in the period). It is worth noticing that, besides presenting a critical situation itself, Presidente Prudente greatly influences its neighborhood. Fig. 7 shows how the number of cases in the […]

Martinópolis. After inspecting Presidente Prudente, we noticed that such a regional city greatly influences its neighborhood in terms of risk of dissemination. In this case, given the low isolation index of Presidente Prudente, it could be useful to understand how neighboring cities respond to such risk. Taking the city of Martinópolis as an example, we can see […]

From the second to the third period, the curve of cases starts to plateau. Fig. 11 shows the dissemination curves and the isolation indices for Araçatuba. In this case, low isolation indices might be the reason for the increase in the number of cases. Araçatuba has a more critical dissemination curve, which could be explained, together with other factors, by the low isolation index of 44% ± 0.05 from April 17th to May 7th.
Finally, due to the increase to 48% ± 0.055 from April 19th to May 9th, it is possible to see a slight flattening of the curve from April 21st to May 11th.

This section aims to analyze cities in neighborhoods with a rapidly increasing number of cases, while relating them to isolation indices to generate hypotheses. For this purpose, we analyze three periods: from March 23rd to April 12th, from April 13th to May 3rd, and from May 1st to May 21st.

[…] i.e., the number of cities influencing Americana is larger, and the darker colors stress the risk of contamination of these cities.

Returning to Santa Gertrudes's neighborhood, we finish by analyzing the last period, from May 1st to May 21st. Fig. 16 shows the situation of the neighborhood up to May 21st. The first thing to notice is the steep slope of the case curve in this period, which shows the exponential growth pattern of COVID-19. Then, we can see that two other cities (Araras and Cordeirópolis) reported almost all of their cases in this period. The risk posed by the neighborhood to these cities can explain such a pattern. Finally, the higher isolation index of Americana in both earlier periods of analysis made it possible to present a lower number of infections in this critical period, as shown in Fig. 17.

This section aims to analyze cities closer to the São Paulo state capital (São Paulo) and other regional cities. For earlier periods, readers will see that the curves seem flat. However, this is due to the scale used to convey the number of cases.
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 analysts can recall the donut charts to visually investigate the evolution of the number of cases. Santos. By March 29th, Santos's city did not present any confirmed cases. Besides that, its neighborhood was not presenting a critical situation if we look at the number of confirmed cases in Fig. 18 . Although the situation was not critical at such a moment, the isolation indices in Table 1 could indicate difficult periods ahead according to these cities' geolocation, the low isolation indices are even more severe, which are very close to the capital, São Paulo. Moving the time window seven days further, i.e., analyzing 9 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 The situation was different by May 3rd, in which the neighboring cities with confirmed cases jumped from four to eight. However, even with many neighboring cities with confirmed cases, Rio Preto still presents a higher number of infections than the cities in its neighborhood combined. Additionally, the increase in the number of cases seems to have a slow pace until May 3rd, which cannot be said for further periods, as we will see in the following. 11 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 Finally, Fig. 24 shows that the COVID-19 dissemination for Ribeirao Preto has not presented a decrease. 
Instead, the number of cases seems to be increasing rapidly, which can be dangerous due to low isolation indices presented by the cities with more critical curves -see Table 2 the isolation indices for Ribeirao Preto and Sertãozinho. Ribeirão Preto 47% ± 0.026 Sertãozinho 47% ± 0.017 As for Ribeirao Preto, the city of São José do Rio Preto is the most influential in its neighborhood, as shown in Fig. 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 borhood start to present confirmed cases with the rapid increase (see Table 3 ), as shown in the donut chart. From this period until June 10th, São Paulo's and its neighborhood do not change in the increase in the number of reported cases. Fig. 30a and 30b show how the dissemination of COVID-19 continues at a rapid pace in São Paulo's neighborhood. The cities in the donut chart of Fig. 30a are very populous, besides interacting with themselves. On top of that, the isolation indexes are not useful, as seen in Fig. 31 , which could aggregate even more. Throughout the results section, we could demonstrate the usefulness of our visual analytics tool to understand the dissemination of COVID-19 in cities of interest by analyzing cities' influence on their neighborhood and vice-versa. It is important to emphasize that our tool helps analyze the evolution while it is not mainly focused on the number of cases a city or a neighborhood may present. In this case, our tool draws attention to cities or neighborhoods that present an increasing number of confirmed cases so that we believe that it could be employed even after infection by COVID-19 is controlled and employed to monitor the dissemination of other diseases. While we defined the neighborhood of a city as cities with spatial proximity, it is important to stress that this may not reflect the reality in some cases. 
For example, for a regional city, a neighborhood may be defined as the set of cities influenced by or influence cities according to some aspects, such as citizens that commute from smaller cities to greater ones to work. We hope that decision-makers can use our methodology to monitor the evaluation of the number of cases in cities and neighborhoods to respond to dissemination risks quickly. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 The novel coronavirus (SARS-CoV-2), or simply COVID- To validate our methodology, we provide several case stud-ies by analyzing cities in the Sao Paulo state, Brazil. Besides highlighting the cities on the map, our tool also summarizes the risk of dissemination using a radial visualization. We use the slope of the number of cases to interpret the risk of dissemination. Note that we are focusing on the dissemination risk rather than the number of cases itself. In the radial visualization, the circle encodes the city in the analysis, while a donut chart maps the neighboring cities. We use color saturation to indicate the risk of dissemination. That is, darker colors will represent cities with a higher risk of dissemination. This paper is organized as follows: in Section 2, we briefly delineate some related works; Section 3 presents the hierarchical spreading of COVID-19, from which our methodology is based; Section 4 shows the proposed visual analytics tool; analyses using the tool are presented in Section 5; in Section 6, we discuss some aspects of the technique; we conclude our work in Section 7. Using data to detect and quantify health events is a useful strategy to understand disease outbreaks. 
Usually, the strategies use data mining or visualization techniques to monitor events related to a disease of interest. Visualization-based strategies for monitoring the dissemination of diseases account for the fact that graphical representations can enhance the ability to identify data patterns and tendencies. In this case, it is better to look at visual variables, such as position, color, or area, than tables or reports to identify tendencies of growth and many other patterns. [6] . Other strategies employing visualization tools are using heatmaps to analyze patterns of handfoot-mouth disease [7] , employing intelligent graph visualizations and reordered matrices to understand influenza dissemination paths [8] , or visualizing the effect of decision measures implemented during a simulated pandemic influenza scenario [9] . From the data mining perspective, it is usually interesting to contrast social network posts related to diseases with officially reported cases. These approaches are based on the strength of the relationship between officially reported cases and the searches on the web or posts on social media using words related to the diseases [10, 11, 12, 13] . For instance, most of the works use the web and social media to detect Influenzalike Illness events [10, 14, 15] . An excellent example of using data-mining techniques to detect disease outbreaks in the dutch system Coosto [16] , which uses Google Trends and social media data to detect outbreaks using a cut-off criterion. Other works also have shown that Twitter data is highly correlated to disease activity [17, 18] , such as predicting Dengue cases [19] or using Twitter-based data to automatically monitor avian influenza outbreaks, showing that one-third of outbreak notifications were reported on Twitter earlier than official reports [20] . 
In this work, we provide a visual analytics approach to monitor the evolution of dissemination of COVID-19 to facilitate analysis of dissemination during and post isolation. We use visualization techniques and analysis based on time windows to help analysts monitor how neighboring cities' situation affects the dissemination of COVID-19 to a city in the analysis. This section explains the hierarchical spreading behavior of COVID-19, which we use to define each city's neighborhoods. 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 The basic idea of the hierarchical spreading of COVID-19 is that cities with confirmed cases disseminate infection to their neighboring cities. In this case, the neighboring cities are the cities with confirmed cases inside the k nearest cities to the city in the analysis. Note that regional cities (bigger cities) are most likely to disseminate COVID-19 to their neighborhoods due to the bigger number of inhabitants, more job opportunities, more cultural access, and other social aspects that could attract people from neighboring cities. Figure 1 illustrates the hierarchical spreading, where an orange circle represents a city with a confirmed case, and the arrows indicate potential dissemination paths due to location proximity. latlong coords ← get latlong(cities); 3: knn sets ← KNN(latlong coords, k); return knn sets; To augment the city A's neighborhood, besides the k nearest cities of A, we also add to the neighborhood the cities with A in their k nearest cities set. In this way, we simulate better the interaction between a city and its neighborhood. Finally, it is essential to mention that a few cities may not present confirmed cases. Thus, these cities do not appear in the visualization tool (see Section 4). 
Our visual analytics approach has the main objective of helping decision-makers analyze a city's situation based on disease dissemination. Besides information of a city in interest, our tool provides information about the situation of its neighborhood. We delineate the following requirements for our visual analytics approach to monitor the dissemination curves and help analysis based on the number of infections by COVID-19:: First, it is necessary to define the neighborhood of a city. Our strategy follows the hierarchical spreading scheme of COVID-19, as explained in Section 3, which states that a city with confirmed cases influence (i.e., can disseminate) its neighboring cities. These neighboring cities are retrieved using the k nearest neighbors algorithm. In our case, city A's neighborhood consists of its the k nearest cities and the cities with A in their k nearest cities set. Given that, we can analyze a city based on its dissemination as well as its neighborhood. In the following, we present how we accomplish each requirement. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 To monitor confirmed cases' evolution, we use the São Paulo government's data at SEADE 1 . The data consists of daily updated cases in each city with a confirmed case. To create the visualization tool, we only use the city name, daily confirmed cases, and the date attributes -the latitude and longitude coordinates of each city are retrieved using the geopy 2 library. Further, the isolation index is also provided by the São Paulo government 3 for only a few cities. The isolation index consists of the percentage of inhabitants in isolation. 
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 the next few days in São Paulo's neighborhood. So, it is not our tool's concern to select cities hypothesizing that they will disseminate the virus. Instead, investigators select a city of interest and then use the time window to visualize and understand the first notification and confirmed cases' progression compared to its neighbors. In this section, we inspect the dissemination evolution of various cities in the Sao Paulo state, Brazil. We used a time window of 20 days to assess the following periods: from April 19th to May 9th, from April 21st to May 11th, and from April 26th to May 16th. 5.1. Regional city presents risk of dissemination to its neighbors Presidente Prudente. Fig. 6 shows the evolution of the con- This city, in particular, has been giving the lowest isolation indices in the Sao Paulo state -we highlight in red the mean of the isolation indices in the period. It is worth noticing that besides presenting a critical situation in the city itself, Presidente Prudente greatly influences its neighborhood. Fig. 7 shows how the number of cases in the 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 Martinópolis. After inspecting Presidente Prudente, we could notice that such a regional city greatly influences its neighbor- From the second to the third period, the curve of cases starts to present a plateau. Fig. 11 shows the dissemination curves and the isolation indices for Araçatuba. 
In this case, low isolation indices might be the reason for the increase in the number of cases. Araçatuba has a more critical dissemination curve, which could be explained -together with other reasons -by the low isolation index, which was 44% +/-0.05 from April 17th to May 7th. Finally, due to the augment to 48% +/-0.055 from April 19th to May 9th, it is possible to see a little flattening in the curve from April 21st to May 11th. This section aims to analyze cities in neighborhoods with a rapidly increasing number of cases, besides making relations to isolation indices for hypothesis generation. For this purpose, we analyze three periods: from March 23th to April 12th, from April 13th to May 3rd, and from May 1st to May 21st. Backing to the Santa Gertrudes's neighborhood, we finish by analyzing the last period, from May 1st to May 21st. Fig. 16 shows the situation of the neighborhood up to May 21st. The first thing to notice is the curve inclination of the cases in the period, where it is possible to see the exponential pattern of COVID-19. Then, we can see that two other cities (Araras and Cordenopolis) notified almost the total number of cases in this period. The influence that the risk of the neighborhood plays in these cities can explain such a pattern. Finally, the higher isolation index for both earlier periods of analysis for Americana made it possible to present a lower number of infections in this critical period, as shown in Fig. 17 . This section aims to analyze cities closer to the São Paulo state capital (São Paulo) and other regional cities. For earlier periods, readers will see that the curves seem flattened. However, this is due to the scale used to convey the number of cases. Earlier periods are influenced by the number of cases reported by the analyzed neighborhoods in this section. 
When curves look flattened by the scale, analysts can turn to the donut charts to visually investigate the evolution of the number of cases.

Santos. By March 29th, the city of Santos had no confirmed cases. Moreover, judging by the number of confirmed cases in Fig. 18, its neighborhood was not in a critical situation either. Although the situation was not critical at that moment, the isolation indices in Table 1 could indicate difficult periods ahead, and these low indices are even more concerning because these cities are very close to the capital, São Paulo. Moving the time window seven days further, i.e., analyzing

The situation was different by May 3rd, when the number of neighboring cities with confirmed cases jumped from four to eight. However, even with many neighboring cities reporting confirmed cases, Rio Preto still has a higher number of infections than all the cities in its neighborhood combined. Additionally, the number of cases seems to increase slowly until May 3rd, which cannot be said for later periods, as we show next.

Finally, Fig. 24 shows that COVID-19 dissemination in Ribeirão Preto has not decreased.
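Whether growth slows (as until May 3rd) or keeps accelerating (as in Ribeirão Preto) can be read from how steeply the windowed case curve rises, that is, its slope. Below is a minimal sketch that estimates the slope as an ordinary least-squares fit in cases per day and compares two consecutive windows. The least-squares estimator, the tolerance parameter, and the function names are assumptions, not the tool's documented method.

```python
def ols_slope(values: list[float]) -> float:
    """Least-squares slope of `values` against day indices 0..n-1 (cases per day)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def growth_trend(previous: list[float], current: list[float], tolerance: float = 0.0) -> str:
    """Compare the slopes of two windowed case curves, e.g. consecutive periods."""
    before, after = ols_slope(previous), ols_slope(current)
    if after > before + tolerance:
        return "accelerating"
    if after < before - tolerance:
        return "slowing"
    return "steady"
```

Because the curves are re-based at the start of each window, an offset between windows does not affect the comparison; only the rate of increase does.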
Rather than decreasing, the number of cases in Ribeirão Preto seems to be increasing rapidly. This can be dangerous given the low isolation indices of the cities with the most critical curves; Table 2 lists the isolation indices for Ribeirão Preto and Sertãozinho.

Table 2
City | Isolation index
Ribeirão Preto | 47% ± 0.026
Sertãozinho | 47% ± 0.017

As with Ribeirão Preto, São José do Rio Preto is the most influential city in its neighborhood, as Fig. 26 shows for the period from April 16th to May 5th. The figure shows that the number of cases in São José do Rio Preto is three times the accumulated number of cases of all the neighboring cities with confirmed cases.

borhood start to present confirmed cases with a rapid increase (see Table 3), as shown in the donut chart. From this period until June 10th, the increase in the number of reported cases in São Paulo and its neighborhood does not change. Fig. 30a and 30b show how the dissemination of COVID-19 continues at a rapid pace in São Paulo's neighborhood. The cities in the donut chart of Fig. 30a are very populous and interact with one another. Moreover, the isolation indices are not satisfactory, as seen in Fig. 31, which could aggravate the situation even further.

Throughout the results section, we demonstrated the usefulness of our visual analytics tool for understanding the dissemination of COVID-19 in cities of interest by analyzing cities' influence on their neighborhood and vice versa. It is important to emphasize that our tool helps analyze the evolution of cases rather than focusing mainly on the number of cases that a city or a neighborhood presents.
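Comparisons such as São José do Rio Preto reporting three times the accumulated cases of its neighbors contrast a city with the total over its neighborhood. The tool builds this neighborhood from the k-nearest neighbors of each city. Below is a minimal sketch that assumes neighborhoods are the k nearest cities by great-circle (haversine) distance between city coordinates. The distance measure, the value of k, and all names, coordinates, and case counts are hypothetical.

```python
import math


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def knn_neighborhood(city: str, coords: dict[str, tuple[float, float]], k: int) -> list[str]:
    """The k cities closest to `city`, nearest first (the city itself is excluded)."""
    others = sorted(
        (haversine_km(coords[city], point), name)
        for name, point in coords.items()
        if name != city
    )
    return [name for _, name in others[:k]]


def compare_with_neighborhood(city: str, cases: dict[str, int], neighbors: list[str]) -> dict[str, float]:
    """Contrast a city's cases in a window with the total over its neighborhood."""
    total = sum(cases[n] for n in neighbors)
    reporting = sum(1 for n in neighbors if cases[n] > 0)  # neighbors with confirmed cases
    ratio = cases[city] / total if total else math.inf
    return {"city": cases[city], "neighborhood_total": total,
            "neighbors_with_cases": reporting, "city_to_neighborhood": ratio}


# Hypothetical cities on the equator, 1 degree of longitude apart (not real data).
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 2.0), "D": (0.0, 5.0)}
neighbors = knn_neighborhood("A", coords, k=2)  # ["B", "C"]
summary = compare_with_neighborhood("A", {"A": 300, "B": 60, "C": 40, "D": 0}, neighbors)
# summary["city_to_neighborhood"] == 3.0: the city has three times its neighborhood's cases
```

A different notion of neighborhood, such as one based on commuting flows, would only change how `knn_neighborhood` ranks the candidate cities. The comparison step would stay the same.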
By focusing on evolution, our tool draws attention to cities or neighborhoods whose number of confirmed cases is increasing. Therefore, we believe it could remain useful after COVID-19 infection is controlled and could also be employed to monitor the dissemination of other diseases. While we defined the neighborhood of a city as the cities in its spatial proximity, it is important to stress that this may not reflect reality in some cases. For example, the neighborhood of a regional city may be better defined as the set of cities that influence it or are influenced by it according to other aspects, such as residents who commute from smaller cities to larger ones for work.

Other indicators. Although we use the absolute number of confirmed cases to monitor the progression of COVID-19 dissemination, other indicators could be employed, such as the ratio of confirmed cases to the number of tests performed or to the total population. We chose the absolute number of cases in this work for two reasons: it is the most straightforward way to visualize the progression, and it is a well-known metric that is easy to understand. Note that a more accurate picture of the epidemiological status would require relating the number of tests to the neighborhood population. Nevertheless, the absolute number of cases keeps our tool easy to interpret and accessible to a broader audience. Finally, other types of indicators (which could also be useful for monitoring other diseases) can be incorporated in future work.

Visualization tool. The visualization tool is available at RADAR (https://covid19.fct.unesp.br/analise-regional/en/).

Visual analytics techniques help discover patterns that would be difficult to perceive by looking only at raw data.
In this work, we employed visualization metaphors to analyze the evolution of the number of COVID-19 cases in São Paulo state, Brazil. Our methodology consists of visualizing the dissemination based on time windows and contrasting the number of cases in the periods of analysis with the cities' isolation indices. Through several analyses, we showed how our visualization design helps analyze a city's situation according to the number of cases in a time window and the situation of its neighborhood. We also showed that our methodology highlights how the isolation index may benefit cities with respect to dissemination, even when these cities are part of neighborhoods with a critical number of cases. We hope that decision-makers can use our methodology to monitor the evolution of the number of cases in cities and neighborhoods and respond quickly to dissemination risks.
Figure panels: (a) Donut chart showing a rapid increase in the number of cases. (b) Curves of confirmed cases for two periods of time.

This work was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) grants 18/17881-3 and 18/25755-8. We also thank the anonymous reviewers for their thoughtful suggestions on how to improve our manuscript.