key: cord-102776-2upbx2lp authors: Niu, Zhibin; Cheng, Dawei; Zhang, Liqing; Zhang, Jiawan title: Visual analytics for networked-guarantee loans risk management date: 2017-04-06 journal: nan DOI: 10.1109/pacificvis.2018.00028 sha: doc_id: 102776 cord_uid: 2upbx2lp
Groups of enterprises guarantee each other and form complex guarantee networks when they try to obtain loans from banks. Such secured loans can enhance solvency and promote rapid growth during economic upturns. However, systemic risk may arise within the risk-bound community. In particular, during economic downturns a crisis may spread through the guarantee network like falling dominoes. Monitoring financial status and preventing or reducing systemic risk when a crisis happens are therefore major concerns of regulatory commissions and banks. We propose a visual analytics approach for loan guarantee network risk management and consolidate five analysis tasks with financial experts: i) visual analytics for enterprise default risk, whereby a hybrid representation is devised to predict the default risk and an interface is developed to visualize key indicators; ii) visual analytics for high default groups, whereby a community detection based interactive approach is presented; iii) visual analytics for high default patterns, whereby a motif detection based interactive approach is described and a Shneiderman Mantra strategy is adopted to reduce the computational complexity; iv) visual analytics for the evolving guarantee network, whereby animation is used to help understand the guarantee dynamics; and v) a visual analytics approach and interface for the default diffusion path. The temporal diffusion path analysis can help the government and banks monitor how defaults spread. It also provides insight for taking precautionary measures to prevent and dissolve systemic financial risk. We implement the system and present case studies on a real-world guarantee network.
Two financial experts were consulted and endorsed the developed tool. To the best of our knowledge, this is the first visual analytics tool to explore guarantee network risks in a systematic manner.
Financial safety is a main concern of the government and banks. The majority of small and medium enterprises (SMEs) find it difficult to obtain loans from banks because of their limited credit qualification, so they often need to seek loan guarantees. In fact, guaranteed loans are already an important way to raise money, in addition to seeking a public listing. In some developed economies such as the US and the UK, special government-backed banks are established to provide guarantee credit [22, 27, 30, 40, 55], while in emerging economies such as Korea [19] and China [31] it is more common for corporations to guarantee each other when they try to secure loans from lending institutions. It is reported that a quarter of the $13 trillion in total outstanding loans in China were guaranteed loans in 2014 [40], with an 18% year-to-year increase [36]. This has led to a noticeable new phenomenon: a large number of corporations back each other and form complex guarantee networks. An appropriate guarantee union may reduce the default risk, but in practice contagious damage may spread over the networked enterprises. During an economic downturn, large-scale breaches of contract can seriously degrade the quality of banking assets and cause a systemic crisis. The loan guarantee network appeared less than twenty years ago and is still not well understood. The financial academic community has published some qualitative analyses of small guarantee networks, but there is little quantitative research. In the banking industry, credit assessors evaluate an enterprise mainly with the classic credit rating approach. Such an approach is not well suited to complex benefit relationships.
Risk management for the loan guarantee network is challenging. Firstly, a loan guarantee network may consist of thousands of enterprises with complex guarantee relationships and intertwined risk factors, making it very difficult to analyze. Fig. 1 illustrates a real guarantee network that we constructed from ten years of bank loan records; it consists of more than 1000 enterprises, each of which has more than 3000 financial entries. Monitoring financial status is so difficult that regulators can usually study a case in depth only after the capital chain has ruptured. Secondly, the business operations of small and medium enterprises lack transparency (for example, loan officers do not have access to enterprise net asset information), which makes loan risk evaluation more difficult. Some borrowers fraudulently obtain loans by exploiting flaws in bank lending risk management. The understanding of risky loan guarantees, especially malicious guarantees, is still relatively limited. Thirdly, thousands of guarantee networks of different complexity coexist for long periods and evolve over time, which requires adaptive strategies to prevent, identify, and dismantle systemic crises. Against the complex background of a growth period, the pain of structural adjustment, and the early stage of a stimulus period, structural and deep-seated contradictions have emerged in economic development. Risk factors of all kinds accelerate the transmission and amplification of risk along the guarantee network, and the network may degenerate from a "mutual aid group" into a "breach of contract" group.
In this paper, we propose a visual analytics approach for loan guarantee network risk management. It includes visual analytics for i) enterprise default risk; ii) high default groups; iii) high default patterns; iv) the evolving guarantee network; and v) the default diffusion path. In a nutshell, the main contributions are: 1.
We consolidate with financial experts and identify five key research problems for loan guarantee network risk management. The work is driven by emerging finance industry demands, and we believe it poses an important research problem for the visual analytics science and technology community;
2. We propose intuitive visual analytics approaches for the tasks of i) enterprise default risk; ii) high default groups; iii) high default patterns; iv) the evolving guarantee network; and v) the default diffusion path.
3. We construct a real loan guarantee network and perform an empirical study on ten years of bank loan records. We highlight three high default patterns that are difficult to discover without a visual analytics approach. We interviewed two banking loan experts, who endorsed the tool.
The rest of the paper is organized as follows: Section 2 describes work on different aspects related to our problem; Section 3 details the five visual analytics tasks and our approaches; Section 4 describes the data and the case study; and Section 5 reports the user study results. Conclusions and future work are described in Section 6.
To the best of our knowledge, this is the first work on visual analysis for loan guarantee network risk management. We therefore introduce relevant work on network analytics in the financial domain, on anomalous and significant subgraph detection in attributed networks, and on financial security visualization.
Credit risk evaluation
Consumer credit risk evaluation is often addressed in a data-driven fashion and has been extensively investigated [5, 24]. Since the seminal "Partial Credit" model [39], numerous statistical approaches have been introduced for credit scoring, including logistic regression [60], k-NN [26], neural networks [18], and support vector machines [28]. More recently, [4] presents an in-depth analysis of how to interpret and visualize the knowledge learned by neural networks using explanatory rules.
The authors of [32] combine the debt-to-income ratio with consumer banking transactions and use a linear regression model on a time-windowed data set to predict default rates in the near future. They claim an 85% default prediction accuracy and cost savings between 6% and 25%.
Financial network analytics
Financial crises and systemic risk have always been a major concern [9, 21]. Networks, or graphs, are a natural representation of financial systems because these systems bear complex internal interdependence and connections [2]. The relationship between network structure and financial system risk has been carefully studied, and several insights have been drawn: network structure has little impact on system welfare but plays an important role in determining systemic risk and welfare in short-term debt [3]. After the 2008 global financial crisis, network theory attracted more attention: the crisis triggered by Lehman Brothers spread across connected corporations in a way similar to the epidemic of Severe Acute Respiratory Syndrome (SARS) in 2002. In both cases, small damage hit a networked system and caused serious events [8, 13]. The journal Nature Physics organized a special issue on how to understand some fundamental economic issues using network theory [1]. These publications suggest the applicability of network-based financial models. For example, the dynamic network produced by overnight fund loans between banks may serve as an early alert of a crisis [13]. Contrary to the conventional stereotype that large institutions are "too big to fail", the position of an institution in the network is equally, and sometimes more, important than its size [6]. The more central a vertex is in the graph, the more influential it is on the whole economic network when a default occurs [13]. Moreover, research that aims to understand individual behavior and interactions in social networks has also attracted extensive attention [7, 20, 46, 47, 61, 62, 67].
Although preliminary efforts have been made to understand fundamental problems in financial systems using network theory [12, 17, 64], there is little work on systemic risk analysis in the loan guarantee network apart from the preliminary work in [41]. Perhaps its most important finding is that K-shell decomposition can be used to predict the default rate: a positive correlation between the K-shell decomposition value of the network and default rates was reported [41].
Anomalous and significant subgraph detection in networks
Anomalous and significant subgraph detection has been applied in many domains, such as societal events in social media, new business discovery, auction fraud, fake reviews, email spam, and false advertising [42, 54]. Classically, anomalous and significant subgraphs are subgraphs in which the behaviors (attributes) of the nodes or edges differ significantly from those outside the subgraphs [48]. In social networks, such subgraphs can be used for early detection of emerging events, for example civil unrest prediction, rare disease outbreak detection, and early detection of human rights events. The heterogeneous social network is modeled as a sensor network in which each node senses its local neighborhood, computes multiple features, and reports an overall degree of anomalousness. P-values of the subgraphs are used to represent their significance, and iterative subgraph expansion is used to address the scaling problem [15]. Emerging events such as crimes or disease cases are detected from spatial networks [34, 44]. A common challenge for subgraph detection is complexity. Because many of the algorithms reduce to the subgraph isomorphism problem, which is NP-complete, naive search is computationally infeasible, and specialized algorithms have been designed to improve performance. Readers are referred to [43, 58, 59, 68] for more details.
Visualization in financial systems
Financial risk is a major concern of the government and banks. Visual analysis can enhance the understanding and communication of risk, help to analyze risks, and help to prevent systemic risks. This is done by developing interpretable models and coupling them with visual, interactive interfaces. As business in the modern banking industry becomes more and more complex, risk assessment and risky loan pattern detection have attracted major attention. Animation is used to visually analyze large amounts of time-dependent data [63]. In [29], 3D treemaps are introduced to monitor real-time stock market performance and to identify a particular stock that produces unusual trading patterns. An interactive exploratory tool is designed to help the casual decision-maker quickly choose between various financial portfolios [50]. Coordinated visualization of specific keywords within wire transactions is used to detect suspicious behaviors [14]. The Self-Organizing Map (SOM), a neural network based visualization tool, is often used in financial risk visualization: for monitoring the occurrence of sovereign defaults in less developed countries [52], for visual analysis of the evolution of currency crises by comparing clusters of crises between decades [51], and for discovering imbalances in financial networks [53]. The Self-Organizing Time Map (SOTM) is used to decompose and identify temporal structural changes in macro-financial data around the global financial crisis in 2007 and 2009. Readers are referred to [37] for more references on financial visualization.
We consulted financial experts and consolidated five analysis tasks. In this section, we first give a brief introduction to the tasks before describing the detailed algorithms, strategies, and interactions. Fig. 2 gives an overview of the system and tasks.
We first construct the real loan guarantee networks from bank records, perform statistical analysis, and employ a machine learning based approach to predict enterprise default risk. All of these data are fed into the interface to accomplish the tasks proposed by the financial experts. Specifically, the tasks include:
T1: Visual Analytics for Enterprise Default Risk. The current internal loan credit rating system is based purely on the financial status of the individual borrower. A credit assessor can usually access only the first layer of the guarantee chain and cannot reliably evaluate the entire guarantee network. To avoid inadequate risk assessment, it is necessary to carry out a systematic analysis of the enterprise.
T2: Visual Analytics for High Default Group. Identifying the high default groups helps banking experts single out and tackle the principal default problem. Visual analytics tools should be developed to analyze the network thoroughly and recognize high-default enterprises.
T3: Visual Analytics for High Default Pattern. Some known guarantee patterns may lead to default and diffusion, but there may exist more complex patterns that are difficult to discover. This task requires visualizing the known risky guarantee patterns and supporting the exploration of other, more complex risky guarantee patterns.
T4: Visual Analytics for Evolving Guarantee Network. As in many other real networks, competitive decision making takes place in the guarantee network. Understanding the network dynamics helps financial experts understand how the firms become connected over time. This task requires visualizing the evolution of the guarantee network based on historical data.
T5: Visual Analytics for Default Propagation Path.
Before a crisis, forecasting the default diffusion path and monitoring how defaults spread help the government and banks take precautionary measures, conduct research, and take effective measures to prevent and dissolve risks, so that no regional or systemic financial risk occurs.
Default risk prediction
The loan records reveal that the guarantee network and the default rates are both growing, and that the network structure is strongly correlated with defaults. We construct a feature vector consisting of hybrid information and employ a supervised learning approach to train the prediction model. In what follows, we discuss the hybrid features used in our model. To build a highly representative feature set that reliably reflects the statistical relationship between customer information and repayment ability, we clean the data and construct the following features:
Basic Profile: the essential company registration information, which reflects the character, capital, collateral, capability, condition, and stability [41]. We use the business nature, registered capital, enterprise scale, employee number, and other fields as the corporation's basic profile. Most banks require a company to update its basic information when it applies for a loan, and we use the latest information as the basic profile features of the loan.
Credit Behavior: historical behavior such as credit history, default records, default amount, total loan amount and loan count, total loan frequency (if any), and total default rates. These are calculated from all the loan records before the active loan contract.
Active Loan: the loan contract in its execution period. It contains the active loan amount, active loan times, type of capital return and interest return, etc.
Network Structure: network features such as centralities are extracted as NS.
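As a rough illustration of how the four feature groups above could be concatenated into one vector, the following minimal sketch may help. It is not the authors' implementation: the field names (`registered_capital`, `employee_number`, `amount`, `defaulted`) are hypothetical, and plain in/out degree stands in for the richer centrality features that the paper extracts as NS.

```python
from collections import defaultdict


def degree_features(guarantee_edges):
    """Return {node: (out_degree, in_degree)} for a directed guarantee graph.

    Each edge (guarantor, borrower) means the guarantor backs a loan of the
    borrower. Degree is only a stand-in for the paper's centrality features.
    """
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for guarantor, borrower in guarantee_edges:
        out_deg[guarantor] += 1
        in_deg[borrower] += 1
    nodes = set(out_deg) | set(in_deg)
    return {n: (out_deg[n], in_deg[n]) for n in nodes}


def build_feature_vector(profile, history, active, network):
    """Concatenate the four feature groups (all field names are hypothetical)."""
    # Basic Profile: latest registration information.
    bp = [profile["registered_capital"], profile["employee_number"]]
    # Credit Behavior: aggregated over loans that precede the active contract.
    n_loans = len(history)
    n_defaults = sum(1 for loan in history if loan["defaulted"])
    total_amount = sum(loan["amount"] for loan in history)
    default_rate = n_defaults / n_loans if n_loans else 0.0
    cb = [n_loans, total_amount, default_rate]
    # Active Loan: the contract currently in its execution period.
    al = [active["amount"]]
    # Network Structure: here just (out_degree, in_degree).
    ns = list(network)
    return bp + cb + al + ns


edges = [("A", "B"), ("C", "B"), ("B", "A")]
deg = degree_features(edges)
vector = build_feature_vector(
    {"registered_capital": 500.0, "employee_number": 40},
    [{"amount": 100.0, "defaulted": False}, {"amount": 50.0, "defaulted": True}],
    {"amount": 80.0},
    deg["B"],
)
```

In a real pipeline, categorical fields such as business nature would need encoding before they can join a numeric vector; that step is omitted here.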
Note that, as discussed above, the basic profile may not be completely trustworthy, because SMEs may provide outdated or even fake information to the bank. The guarantee network, however, is trustworthy information, because the bank can build it from its own record systems.
The prediction of default for a customer's loan guarantee can be modeled as a supervised learning problem. We use logistic regression based on gradient boosting trees [23] for the prediction. The tree ensemble model, which uses K additive functions to predict the output, can be represented as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(X_i) \qquad (1)$$
In Eq. 1, $f_k$ is the $k$-th decision tree, $X_i$ is the training feature, and $\hat{y}_i$ is the prediction result. Finding the parameters of the tree model is turned into the problem of minimizing an objective function, and the model can be trained in an additive manner [16]:
$$\mathrm{Obj} = \sum_i l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k) \qquad (2)$$
where $\sum_i l(\hat{y}_i, y_i)$ is a training loss function that measures the difference between the prediction and the target, and $\Omega(f)$ is a smoothing regularization term that avoids overfitting. Specifically, we use a three-month window for training, observation, prediction, and evaluation. As Fig. 3 shows, in the training stage, for all customers who obtain bank loans from 2013 Q1 (first quarter
1. Prediction should be adapted to a dynamic setting with regularly updated forecasting results. In fact, using a sliding window is a typical way to perform rolling prediction, as commonly adopted in event prediction practice such as [65, 66].
2. The business often runs on a quarterly basis. Thus, from a business demand perspective, it is helpful to know which borrowers may default in each quarter.
Default risk visualization. We design and implement a visual interface that lets users view the network with multiple measurements. Fig.
4 gives the interface, through which users can scale the node size by the predicted default risk or by one of the following network centrality measures: hub score and authority score, K-shell decomposition score, PageRank, eigenvector centrality, betweenness centrality, and closeness centrality. Fig. 5 shows part of a real guarantee network. In the graph, all defaulted enterprises are highlighted by red circles, and the node size is proportional to the predicted risk (a), the K-shell value (b), and the authority score (c). Through the interface, users can also observe the rolling predicted risk of an enterprise over the months and highlight the enterprise on the whole network by selecting it on the heatmap.
Recognizing high default groups narrows the search scope for risky guarantee relationships and lets financial experts focus on the firms in high-default clusters. Community detection divides the guarantee network into groups (communities) based on how the nodes are connected. Theoretically, a community structure in a graph is defined as a node set whose members interact with each other more frequently than with nodes outside it. Identifying such substructures provides insight into the structure of complex networks, in which function and topology affect each other [57]. Based on the conjecture that defaults occur in clusters, we first divide the whole network into several disjoint sets by community detection. Fig. 6 (a) shows the results on a typical independent subgraph that we constructed from the bank loan records. The communities are marked with separate background colors, and the average default rates are labeled. There are 30 communities, but defaults occur in only four of them, with average default rates from 38% down to 8.6%; the other 9 communities have no default during the existence of the guarantee network. Similar phenomena are observed with random walk, edge betweenness, and spinglass community detection.
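The per-community default rates labeled in the community view can be computed directly once a partition is available. The sketch below assumes the partition has already been produced by some community detection algorithm (the paper uses random walk based detection, which is not reimplemented here); the function names and the toy data are illustrative only.

```python
from collections import defaultdict


def community_default_rates(membership, defaulted):
    """Compute the fraction of defaulted nodes in each community.

    membership: {node: community_label}, e.g. the output of any community
                detection algorithm.
    defaulted:  set of nodes that have defaulted.
    """
    size = defaultdict(int)
    bad = defaultdict(int)
    for node, label in membership.items():
        size[label] += 1
        if node in defaulted:
            bad[label] += 1
    return {label: bad[label] / size[label] for label in size}


def rank_communities(rates):
    """Sort communities by default rate, highest first (ties broken by label)."""
    return sorted(rates.items(), key=lambda kv: (-kv[1], str(kv[0])))


membership = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 2}
rates = community_default_rates(membership, {"a", "e"})
ranking = rank_communities(rates)
```

Ranking communities this way gives a simple ordering that a navigation view could use to surface the high-default groups first.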
In practice, we first use a random walk algorithm [45, 49] to divide the whole guarantee network into groups. We use a revised treemap interface to visualize the community detection results. The community label and default rate are displayed on flat colored blocks. The treemap is used here for navigation, so the areas do not necessarily sum to one. The larger blocks make the high default communities salient. However, the evaluation of community detection is still an open question [35], and because the community detection algorithm considers only link information and neglects node attributes, the partition may not be consistent with actual conditions. The basic rule of community detection is to minimize the number of links between communities, which uses pure network structure information. In financial practice, each node in the network comes with rich information such as enterprise sector, changes in deposits, assets, loan amount, and so on. Discarding such attributes when dividing the network would be unreliable. Through interaction, we enable users to edit the communities into coherent ones by referring to the relevant financial metrics. We allow users to interactively perform the following manipulation actions.
Interactive community editing. We enable users to explore the financial information and interactively edit the communities by merging strongly associated communities, reassigning the community labels of structural hole spanners, which play a key role in information diffusion [11], or splitting a community into several disjoint smaller groups. The resulting subgraphs are called groups of interest (GOI); high-risk guarantee patterns are often hidden in the GOI.
Reassign. The reassign operation changes the community label of a structural hole spanner. A structural hole spanner is a bridge node that connects different communities in a network. Fig.
7 is reproduced from [25] and illustrates a network with three communities and six structural hole spanners. Empirical studies suggest that individuals benefit from filling the "holes" between people or groups that are otherwise disconnected; those who do so are called structural hole spanners [10]. A principled methodology for detecting structural hole spanners in a given social network is still not clear [38]. In fact, we observed high default rates on structural hole spanners and their neighbouring internal nodes. We enable users to investigate the financial metrics and reassign the community labels of the structural hole spanners. Specifically, when the user wishes to merge two adjacent communities, he or she first double-clicks one block on the treemap, and all the other connected communities are highlighted. Single-clicking a structural hole spanner node reassigns it to the opposite community. For example, when communities C1 and C2 are chosen, single-clicking node a merges both communities into C1, and vice versa.
Merge. Neighbouring communities can be merged. Because community detection divides a graph purely on the basis of its links, the algorithm may generate too many communities, some of which share a common sector category or similar network structures. Merging communities with reference to the financial metrics can produce medium-sized and more tractable subgraphs. Specifically, when the user wishes to merge two adjacent communities, he or she first double-clicks one "tile" on the treemap, and all the other connected communities are highlighted. Double-clicking the structural hole spanner node merges the two communities, and the result is labeled as the clicked community.
Split. Sometimes a community needs to be split into several parts. This happens when the defaults are unevenly distributed. We can cut off the stable parts, which may reduce the MOI computation complexity.
Specifically, when the user wishes to split a community, he or she first double-clicks one "tile" on the treemap, and all the other connected communities are highlighted. Double-clicking an edge splits the two opposite parts of the subgraph into two communities.
Financial information is useful during editing. We use a financial radar chart under the treemap to encode the key financial status. Specifically, the key indices include:
Defaults: historical default behavior.
LA/RC: the ratio of loan amount to registered capital. The ratio of loan amount to enterprise net assets would be more insightful, but net assets are not always available, so we use registered capital instead.
Deposit loss: the percentage of deposits lost. A shortage of money and a rapid decrease in deposits should not be ignored.
Sector: the enterprise sector is also an important clue when editing communities.
GA/RC: the ratio of guarantee amount to registered capital. Because a loan guarantee becomes an obligation of the guarantor if the borrower defaults, the ratio of guarantee amount to enterprise net assets is a crucial factor for systemic financial stability. Again, we use registered capital instead.
Credit rating: the review rating given by the bank expert, which is also a key clue when editing communities.
High default pattern discovery is usually not possible by observation, because a practical loan guarantee network may consist of several tens of thousands of nodes. Nor is it feasible by algorithms alone, because naive subgraph mining from the network leads to the subgraph isomorphism problem, which is proved to be NP-complete. We adopt a Shneiderman Mantra strategy to reduce the computational complexity.
Guarantee circle visualization. Small and medium firms improve their borrowing capacity through a third-party guarantor. Empirical studies by bank risk control specialists suggest that the guarantee circle is a source of default risk.
The most frequently used guarantee circle patterns include the mutual guarantee, joint liability guarantee, star-shaped guarantee loan, and revolving guarantee (see Fig. 8). Such interactions are currently legal in China. They can enhance the solvency level to some extent, but, as financial regulatory documents point out, they may also induce the occurrence and transmission of risk. The specialists in a bank's risk control department often have only SQL query capability, with which they can find relatively simple guarantee patterns. In this work, we enable automatic guarantee circle detection and visualization: the commonly recognized risky loan patterns, including the mutual guarantee, co-guarantee, and revolving guarantee, are highlighted on the network. Fig. 9 gives an example of revolving loan guarantees detected in a real-world loan guarantee network. Users can focus on the relevant firms and explore more details. Moreover, five of the eleven firms in the three revolving structures defaulted, which tells banking experts to pay more attention to the firms involved in such patterns.
New risk pattern discovery. As mentioned above, guarantee circles are relatively well understood by banking experts. However, the experts still do not know whether there exist more complicated guarantee patterns that are implicitly connected with high default rates. We develop a visual analytics tool to help the experts discover and understand what has happened. The task is challenging: an arbitrary guarantee pattern with a high default rate can be hidden beneath the complex network structure, and it is impossible to exhaustively compare all network patterns to determine whether each has a high default rate. Based on the conjecture that defaults occur in clusters, we propose an interactive Shneiderman Mantra strategy [56] to narrow down the search space for risky guarantee patterns. Fig. 2 gives the processing flow.
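To illustrate how the guarantee circle patterns described earlier in this section might be found automatically, here is a minimal sketch. It rests on assumptions that the paper does not spell out: a mutual guarantee is taken to be a pair of firms that guarantee each other, and a revolving guarantee is taken to be a directed cycle of at least three firms. The function names and the `max_len` bound are illustrative, and this is not the detection method used in the system.

```python
def mutual_guarantees(edges):
    """Find unordered pairs (u, v) where u guarantees v and v guarantees u."""
    edge_set = set(edges)
    pairs = {
        tuple(sorted((u, v)))
        for u, v in edge_set
        if u != v and (v, u) in edge_set
    }
    return sorted(pairs)


def revolving_guarantees(edges, max_len=5):
    """Find simple directed cycles with 3..max_len nodes.

    Each cycle is reported once, starting at its smallest node, so node
    identifiers must be mutually comparable (e.g. all strings).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    cycles = set()

    def dfs(start, node, path):
        for nxt in adj.get(node, ()):
            if nxt == start and len(path) >= 3:
                cycles.add(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only visit nodes larger than the start so that every cycle
                # is discovered exactly once, from its smallest member.
                dfs(start, nxt, path + [nxt])

    for start in sorted(adj):
        dfs(start, start, [start])
    return sorted(cycles)


# Edge (guarantor, borrower): the guarantor backs the borrower's loan.
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D"), ("D", "B"), ("E", "A")]
mutual = mutual_guarantees(edges)
revolving = revolving_guarantees(edges)
```

The length bound keeps the search tractable on large networks; longer circles would need a dedicated cycle enumeration algorithm.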
Because the GOI are groups with high default rates, they may contain guarantee patterns that are prone to default. Motifs are the most basic building blocks of a network, and the number of motif structures is limited. Motifs may reflect functional properties and provide deep insight into networks. A complex guarantee network is always composed of several smaller subgraphs bridged by structural hole spanners, and the subgraphs inside the communities may reveal certain risk patterns or even fraud patterns. In this work, we obtain a set of motifs by first detecting motifs in the GOI. The motifs are ranked by their default rates (Eq. (4) ). Among them, motifs with high default rates are called patterns of interest (POI), and banking experts may need to investigate them with priority. where m is a motif. All motifs are possible risky loan guarantee patterns. However, it is still computationally challenging to obtain all POIs with the approach above, for the following reasons. Firstly, the number of motif structures increases rapidly with the number of nodes; for example, a 4-node motif has over 3000 possibilities. It is impossible to enumerate all motif structures. Secondly, motif matching searches exhaustively for the query graph in the large network, which is in essence a subgraph isomorphism problem. Matching motifs with more nodes on the network still takes too much time. In this work, we propose an interactive motif editing approach. Users can further explore the financial information of adjacent nodes, add these nodes to the motifs, and generate POIs.
Network evolution over time is observed in the guarantee network. The topology of the network keeps changing: some nodes join the network or are removed from it, and some communities become connected through the guarantee of a structural hole spanner.
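The changing topology described above can be made concrete by rebuilding the network at a chosen date from contract validity periods, which is roughly what dragging a time bar would require. The sketch below is only an assumption about how such snapshots could be derived: the tuple layout `(guarantor, borrower, start_date, end_date)` and the function names are hypothetical, not fields confirmed by the bank data.

```python
from datetime import date


def snapshot(contracts, when):
    """Return (nodes, edges) of the guarantee network active on `when`.

    contracts: iterable of (guarantor, borrower, start_date, end_date);
               this layout is an assumption made for illustration.
    """
    edges = {(g, b) for g, b, start, end in contracts if start <= when <= end}
    nodes = {n for edge in edges for n in edge}
    return nodes, edges


def edge_changes(prev_edges, cur_edges):
    """Return (added, removed) guarantee edges between two snapshots."""
    return cur_edges - prev_edges, prev_edges - cur_edges


contracts = [
    ("A", "B", date(2013, 7, 1), date(2014, 6, 30)),
    ("C", "B", date(2013, 10, 1), date(2014, 3, 31)),
    ("B", "D", date(2014, 1, 1), date(2014, 12, 31)),
]
nodes_early, edges_early = snapshot(contracts, date(2013, 8, 1))
nodes_late, edges_late = snapshot(contracts, date(2014, 4, 1))
added, removed = edge_changes(edges_early, edges_late)
```

Comparing consecutive snapshots yields the added and removed edges that an animation would need to draw between two frames.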
As in many other real networks, competitive decision making takes place in the guarantee network: when a firm lacks the security to obtain a loan from the bank, it may resort to a guarantee corporation or to third-party firms. To some extent, the new guarantors may improve the overall rationality of the system, but they may also introduce instability as the network becomes even more complex. Understanding the network dynamics helps financial experts understand how the firms become connected over time. In this work, we use animation to visualize the evolution of the guarantee network. Users can drag the time bar to backtrack how the network evolves over time, and they can hover the mouse cursor over a node to view the company's financial information. This helps financial experts understand what has happened historically. Fig. 10 gives an example of how a real network evolved from July 2013 to April 2014. By combining the enterprises' financial status at different times, financial experts are able to perform their analysis.
Systemic financial risk is a top concern for the government and banks; however, because the loan guarantee network is a new phenomenon, its systemic risk is still not sufficiently understood. Sophisticated guarantee relationships tend to result in credit being granted by multiple lenders and in excessive credit. In a loan guarantee, the guarantor bears the debt obligation if the borrower defaults; if the guarantor cannot pay back the bank, the bank may in turn resort to the guarantor's own guarantors. In this case, the default may propagate like a virus. Default contagion increases the possibility that risks occur and are transmitted. Especially in an economic downturn, some enterprises face operational difficulties and the financial crisis has a domino effect: defaults may spread rapidly through the network and leave a large number of enterprises in an unfavorable situation.
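The guarantor-to-guarantor contagion described above can be explored with a small path enumeration. The following is a minimal sketch, not the system's algorithm: it walks from a defaulting borrower to its guarantors, then to their guarantors, and counts how often each node appears across the resulting paths, which is one plausible way to flag nodes that sit on many propagation routes. The toy edge list is hypothetical.

```python
from collections import Counter


def propagation_paths(edges, source):
    """Enumerate maximal simple paths along which a default could spread.

    edges:  (guarantor, borrower) pairs; a default travels from a borrower
            to its guarantors, i.e. against the edge direction.
    source: the enterprise assumed to default first.
    """
    guarantors = {}
    for g, b in edges:
        guarantors.setdefault(b, []).append(g)
    paths = []

    def walk(node, path):
        nxt = [g for g in guarantors.get(node, []) if g not in path]
        if not nxt:
            paths.append(path)  # no further guarantor: the path ends here
            return
        for g in nxt:
            walk(g, path + [g])

    walk(source, [source])
    return paths


def node_path_counts(paths):
    """Count on how many propagation paths each non-source node lies."""
    counts = Counter()
    for p in paths:
        counts.update(p[1:])  # skip the initially defaulting node
    return counts


# Hypothetical toy network: B guarantees A; C and D guarantee B;
# E guarantees C, D, F, G and H.
edges = [("B", "A"), ("C", "B"), ("D", "B"), ("E", "C"),
         ("E", "D"), ("E", "F"), ("E", "G"), ("E", "H")]
paths = propagation_paths(edges, "A")
counts = node_path_counts(paths)
```

In this toy network the nodes that E guarantees but that do not guarantee anyone on the route (F, G, H) never appear in a path, because a default only travels from borrower to guarantor. Exhaustive path enumeration grows quickly with network size, so a practical tool would likely bound the path length.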
The government and the banks wish to monitor how defaults spread and to understand the complexity of the current risks before they take precautionary and effective measures to prevent and dissolve those risks, so that no regional or systemic financial risk occurs. Based on the relevant knowledge and experience, we develop a visual analytics tool to aid the discovery of default paths through visualization. The principle of default diffusion is that the vulnerable nodes are the guarantors. Fig. 11 gives an illustration of a diffusion path: (a) is a guarantee network with eight nodes, where node E provides guarantees to five adjacent nodes and C and D provide guarantees to B, which in turn guarantees A; (b) is the possible diffusion path, in which the default of node A may lead to the default of B, C, D, and even E. Note that nodes G, F, and H are not connected with node E in (b), as the default of E will not affect the repayment status of G, F, and H. In practice, there may be multiple possible propagation paths, as each node can serve as a guarantor or be guaranteed. It is difficult to outline the main propagation paths in the entire network. We make the following assumption: a node that lies on multiple propagation paths is key to preventing large-scale default diffusion and thus should be highlighted. We compute all the propagation paths, count the occurrences of each node, and highlight the nodes on the network, using color to indicate their importance to propagation risk.

We design the visual analytics tool so that financial experts can take several factors into account when judging defaults. The factors include the financial information of the corporation and the guarantee contract amounts. The former is listed in plain form when the user hovers the mouse pointer over a node, while a Sankey diagram represents the guarantee flow. The widths of the bands in the Sankey diagram are directly proportional to the guarantee amounts. Fig. 12 (a) gives results on a real guarantee network: when we choose one node, for example node 32, the whole potential propagation path is highlighted in (b), and (c) is the corresponding Sankey diagram. It can be seen that upstream companies usually provide more guarantees than they receive. For example, node 18 provides much more guarantee than it receives. The imbalance between guarantee amount and collateral amount provides a clue for credit line assessment.

The real situation is even more complex. A default may diffuse like a viral infection, in which the virion must identify and bind to its receptor (the guarantor). As mentioned earlier, each enterprise has more than 3,000 financial entries, so it is difficult to quantify the anti-infective ability of each enterprise. We enable users to look up multiple financial statuses and cut off the propagation path. We also note that the propagation model provides more insight to end users, and we plan to study the topic in depth and provide a simulation interface in the future.

We collect loan records spanning ten years from a major commercial bank in China. The names of the customers in the records are encrypted and replaced by an ID; we can access the basic profile, such as the enterprise scale, and the loan information, such as the guarantee ID and loan credit. We first introduce the loan process and then explain how the information is extracted and cleaned.

The banks need to collect as much fine-grained information as possible concerning the repayment ability of the enterprise. The information falls into four categories: transaction information, customer information, asset information such as mortgage status, and the bank-side record of historical loan approvals. The tables most relevant to loan guarantees are the following nine: customer profile, loan account information, repayment status, guarantee profile, customer credit, loan contract, guarantee relationship, guarantee contract, and default status.
There is often more than one guarantor for a loan transaction, and a single guarantor may back several loan transactions in a period. Once the loan is approved, the SME can usually obtain the full loan amount immediately and starts to repay the bank regularly under an installment plan until the end of the loan contract. In the record preprocessing phase, by joining the nine tables, we obtain the records related to the corporation IDs and loan contracts. We then construct the guarantee network and compute the network-related measurements. We now report the observations derived from the data.

Overall statistics. There are 11,000 loan customers, linked by 60,948 mutual guarantee relationships derived from 36,618 loan contracts. There are 5,911 defaults during the past ten years, out of a total of 87,307 repayments. The overall default rate, measured against the number of repayments, is 6.77%.

Centrality indicators help to identify the relative importance of nodes in the network. Fig. 13 gives histograms, for several of the most complex subgraphs, of how defaults are distributed over different centrality indicator values. Defaults occur more often on nodes with large authority values and small hub values. This is consistent with intuition: an enterprise acting as a hub backs a large number of other corporations, so it is expected to be relatively stable and in good operating condition. In contrast, an enterprise acting as an authority accepts guarantees from many other corporations, which suggests that it lacks financial security and has a higher risk of running into trouble. The statistics suggest that lenders should watch the status of nodes with high authority scores in the guarantee network. Although the underlying assumption of PageRank is quite similar to that of the authority score, we did not observe a similar correlation between PageRank values and default rates (see Fig. 13). It is observed that the larger the centrality, the higher the default rate.
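Hub and authority scores of this kind are commonly obtained with the HITS power iteration. The sketch below is a minimal illustration under stated assumptions, not the computation used for Fig. 13: edges are assumed to point from guarantor to borrower (so that hubs back many firms and authorities are backed by many firms, as described above), and the function name `hits`, the iteration count, and the toy firms are hypothetical.

```python
def hits(edges, iterations=50):
    """Plain HITS power iteration on (guarantor, borrower) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        # Authority: sum of the hub scores of the firms that guarantee the node.
        auth = {node: 0.0 for node in nodes}
        for guarantor, borrower in edges:
            auth[borrower] += hub[guarantor]

        # Hub: sum of the authority scores of the firms the node guarantees.
        hub = {node: 0.0 for node in nodes}
        for guarantor, borrower in edges:
            hub[guarantor] += auth[borrower]

        # Normalise so that each score vector sums to one.
        auth_total = sum(auth.values()) or 1.0
        hub_total = sum(hub.values()) or 1.0
        auth = {node: value / auth_total for node, value in auth.items()}
        hub = {node: value / hub_total for node, value in hub.items()}

    return hub, auth


# Hypothetical toy network: X is backed by three firms, G1 backs two firms.
toy_edges = [("G1", "X"), ("G2", "X"), ("G3", "X"), ("G1", "Y")]
hub, auth = hits(toy_edges)

print(max(auth, key=auth.get))  # X
print(max(hub, key=hub.get))    # G1
```

In the toy network, X receives the highest authority score because it depends on three guarantors, which corresponds to the kind of node the statistics above flag for closer monitoring.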
The tasks were as follows: (1) visual analytics for high default groups; and (2) visual analytics for high default patterns.

The first case study is to find high default groups. The random walk community detection algorithm divides the guarantee network into 36 communities. The statistics are given in Table 1. We edit the communities following basic guidelines: (1) consider default status, loan amount, and other financial statistics comprehensively; (2) small communities can be either merged with their neighbouring large communities or pruned. For example, communities 35 and 34 both have 4 nodes and these firms have never defaulted, so there is a low possibility that they will become high default groups in the future, while community 23 can be merged with its neighbouring communities; (3) structural hole spanner nodes should be paid special attention. Usually, if defaults happen on the structural hole spanners, the adjacent communities can be merged. Finally, we obtain ten communities, seven of which have relatively high default rates, as shown in Table 2. These seven medium-sized groups of subgraphs can be processed efficiently in further tasks. Note that the merge and reassign operations are based on the user's expertise. As users may choose various criteria, the final tree map can show different combinations and default rates.

In this subsection, we explore high default patterns beyond guarantee circles. The process includes: (1) automatic motif detection from high default groups, for which we employ the gtrieScanner (http://www.dcc.fc.up.pt/gtries/) approach; (2) matching the motifs against the entire network and calculating the ratio for default firms; (3) ranking the motifs in descending order of defaults, which yields the high default patterns; and (4) interactive editing, in which the user adds more nodes to the high default patterns and the system automatically matches the new subgraph against the entire network and produces the ratio for default firms.
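Steps (2) and (3) above can be illustrated on a toy network. The following Python sketch is a brute-force stand-in for illustration only: it is not gtrieScanner, it is exponential in the network size, and it is suitable only for very small graphs. The function names, the `(guarantor, borrower)` edge format, the 3-node feed-forward motif, and the reading of the "ratio for default firms" as the share of distinct defaulted firms among all firms in the matched instances are assumptions.

```python
from itertools import combinations, permutations


def motif_instances(edges, motif_edges, k):
    """Return node sets whose induced subgraph is isomorphic to the motif.

    Motif nodes are labelled 0..k-1; network edges are (guarantor, borrower).
    """
    edge_set = set(edges)
    nodes = sorted({node for edge in edges for node in edge})
    motif = set(motif_edges)
    found = []
    for subset in combinations(nodes, k):
        # Edges of the network that run between the chosen nodes.
        induced = {(u, v) for u in subset for v in subset if (u, v) in edge_set}
        if len(induced) != len(motif):
            continue
        # Try every assignment of motif labels to the chosen nodes.
        for perm in permutations(subset):
            if {(perm[a], perm[b]) for a, b in motif} == induced:
                found.append(frozenset(subset))
                break
    return found


def default_firm_ratio(instances, defaulted):
    """Share of distinct firms in the matched instances that defaulted."""
    firms = set().union(*instances) if instances else set()
    return len(firms & defaulted) / len(firms) if firms else 0.0


# Hypothetical toy network and a 3-node feed-forward motif (0->1, 1->2, 0->2).
toy_network = [("a", "b"), ("b", "c"), ("a", "c"),
               ("c", "d"), ("d", "e"), ("c", "e"), ("e", "f")]
feed_forward = [(0, 1), (1, 2), (0, 2)]

instances = motif_instances(toy_network, feed_forward, 3)
ratio = default_firm_ratio(instances, defaulted={"c", "d", "e"})
print(len(instances), ratio)  # 2 0.6
```

Computing this ratio for each detected motif and sorting in descending order would give a ranking in the spirit of step (3); editing a motif as in step (4) amounts to calling `motif_instances` again with the extended edge list and a larger `k`.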
Theoretically, there are 199 and 9,364 possible 4- and 5-vertex motifs for a directed network, respectively [33]. Matching all those motifs on the whole network would be time-consuming. Letting the user edit motifs interactively makes exploring new patterns more efficient. In practice, we choose to analyze community 3, which consists of 103 enterprises; 36% of them defaulted, on 85% of the loans from the bank, as Table 2 shows. Fig. 15 shows the twenty 4-vertex motifs that the automatic algorithm detected in community 3, and Table 3 shows the statistical information. Although there are nearly 200 possible 4-vertex motif shapes, only 20 exist in the high default group. We thus analyze these 20 motifs instead of every shape. The detailed motif shapes are given in Fig. 15. Most of them have rather complex structures; however, some are known to banking experts, for example, motif 6 is a joint liability loan. Some others can be understood as combinations of smaller guarantee patterns. For example, motif 5 is a combination of joint liability with a single guarantee. Three of the motifs, motifs 15, 16, and 17, attracted our attention for the following reasons: (1) the patterns have high default rates (ranging from 61% to 90% in ratio for default firm and 55% to 100% in ratio for default amount); (2) a relatively small number of instances (4 or 5) are detected in the whole network; and (3) the top five risk motifs show single-input, single-output, and feed-forward structures. Fig. 16 shows all instances of pattern 15 detected in the entire network. Some of the motif instances overlap. These three patterns are interesting: for example, pattern 15 recurs five times in a group, and the bank lost all the money lent to the enterprises with such guarantee structures (see Table 3). It is highly possible that fraudulent loan guarantees occurred several times and that the local bank failed to recognize the fraud pattern.
Similar analysis implies that patterns 16 and 17 may also be guarantee patterns with high default rates.

We then conduct interviews with two banking loan experts. The first comes from the financial regulator. This expert has more than five years of experience in guarantee network research and has published several important investigation reports and books on the status of Chinese loan guarantee networks. The second comes from the credit department of a major commercial bank and has ten years of loan approval experience. Both experts were attracted by the visualization of guarantee relationships and understood it immediately. The first expert is particularly interested in community editing. He said that when they try to resolve the financial risks in a guarantee network, a major operation is to split the network into smaller ones with the risks isolated, so that healthy enterprises will not be affected by financially risky enterprises. The editing function of our tool gives them a powerful means to achieve this goal. The expert is also interested in the risk guarantee pattern discovery module and agrees that finding such risk patterns is of significant value. Illegal conveyance of benefits might exist under the suggested high default patterns. The expert will also dive into the financial disclosures of the risk guarantee enterprises and examine whether fraudulent guarantees are happening. The second expert said that he had never grasped the whole interrelations between enterprises so clearly when assessing a loan. The expert states that the tree map gives an intuitive understanding of the guarantee groups.

We present a visual analytics approach for loan guarantee network risk management in this paper. To the best of our knowledge, this is the first work that uses visual analysis approaches to address the default risk of guarantee networks.
We design and implement an interactive interface to analyze individual enterprises' default risk, high default groups, patterns in the groups, network evolution, and default diffusion paths. The analysis can help the government and banks monitor the default spread status and provides insight for taking precautionary measures to prevent and dissolve systemic financial risk. Future work will include computational modeling of default diffusion and visual analytics for taking precautionary measures.

Net gains
Networks in finance
Financial connections and systemic risk
Using neural network rule extraction and decision tables for credit-risk evaluation
Benchmarking state-of-the-art classification algorithms for credit scoring
Debtrank: Too central to fail? financial networks, the fed and systemic risk
Network analysis in the social sciences
Complex financial networks and systemic risk: A review
Bubbles, financial crises, and systemic risk
Structural holes and good ideas
Secondhand brokerage: Evidence on the importance of local structure for managers, bankers, and analysts
The Making of a Transnational Capitalist Class: Corporate power in the twenty-first century. Zed books
Network opportunity
Wirevis: Visualization of categorical, time-varying data from financial transactions
Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs
Xgboost: A scalable tree boosting system
Social network, social trust and shared goals in organizational knowledge sharing. Information & Management
A comparison of neural networks and linear scoring models in the credit union environment
Analysis of loan guarantees among the korean chaebol affiliates
Social network sites: Definition, history, and scholarship
Rasch models: Foundations, recent developments, and applications
The effect of credit scoring on small-business lending
Greedy function approximation: a gradient boosting machine. Annals of statistics
Statistical classification methods in consumer credit scoring: a review
Joint community and structural hole spanner detection via harmonic modularity
A k-nearest-neighbour classifier for assessing consumer credit risk. The Statistician
HMRC, Department for Business Innovation & Skills. 2010 to 2015 government policy: business enterprise
Credit scoring with a data mining approach based on support vector machines. Expert systems with applications
A visualization approach for frauds detection in financial market
Determinants of the guarantee circles: The case of chinese listed firms
Determinants of the guarantee circles: The case of chinese listed firms
Consumer credit-risk models via machine-learning algorithms
Network motif detection: Algorithms, parallel and cloud computing, and related tools
A spatial scan statistic
Benchmark graphs for testing community detection algorithms
China faces default chain reaction as credit guarantees backfire
Modelling dependence with copulas and applications to risk management
Mining structural hole spanners through information diffusion in social networks
A rasch model for partial credit scoring
Loan 'guarantee chains' in china prove flimsy
Credit risk evaluation for loan guarantee chain in china
Efficient anomaly detection in dynamic, attributed graphs: Emerging phenomena and big data
Fast subset scan for spatial pattern detection
Detection of emerging space-time clusters
Finding and evaluating community structure in networks
Complex networks in the study of financial and social systems
The lifecycle and cascade of wechat social messaging groups
Anomaly detection in dynamic networks: a survey
Maps of random walks on complex networks reveal community structure
Finvis: Applied visual analytics for personal financial planning
Clustering the changing nature of currency crises in emerging markets: an exploration with self-organising maps
Sovereign debt monitor: A visual self-organizing maps approach
Chance discovery with self-organizing maps: Discovering imbalances in financial networks
Anomaly detection in online social networks
The eyes have it: A task by data type taxonomy for information visualizations
General optimization technique for high-quality community detection in complex networks
Scalable detection of anomalous patterns with connectivity constraints
Penalized fast subset scanning
A credit scoring model for personal loans
Relational learning via latent social dimensions
Community detection and mining in social media
Applying animation to the visual analysis of financial time-dependent data
Using social network knowledge for detecting spider constructions in social security fraud
Sales pipeline win propensity prediction: a regression approach
Towards effective prioritizing water pipe replacement and rehabilitation
Evaluation without ground truth in social media research
Graph-structured sparse optimization for connected subgraph detection

Figure 14: High default groups after interactive editing.