key: cord-0584332-18phiybi authors: Zhang, Chenhan title: Complicating the Social Networks for Better Storytelling: An Empirical Study of Chinese Historical Text and Novel date: 2020-08-25 journal: nan DOI: nan sha: 30a0b04f23b8540d505962a35488247bccb07431 doc_id: 584332 cord_uid: 18phiybi

Digital humanities is an important subject because it enables developments in history, literature, and films. In this paper, we perform an empirical study of a Chinese historical text, Records of the Three Kingdoms (Records), and a historical novel of the same story, Romance of the Three Kingdoms (Romance). We employ natural language processing techniques to extract characters and their relationships. Then, we characterize the social networks and sentiments of the main characters in the historical text and the historical novel. We find that the social network in Romance is more complex and dynamic than that of Records, and the influence of the main characters differs. These findings shed light on the different styles of storytelling in the two literary genres and on how the historical novel complicates the social networks of characters to enrich the literariness of the story.

among others. Third, network study is essential for the social network of a story and for any network that possesses a topological structure, which can help gain insight into the story’s characters based on its grand narration [30]. To fill the gap, this paper introduces a social network and sentiment analysis of two different texts of one of the most famous Chinese stories, The Three Kingdoms. In particular, we leverage a state-of-the-art natural language processing (NLP)-based model to extract the social networks in the narratives of the two books. A series of descriptive statistical analyses is conducted on the extracted networks, and we discover the homogeneity and heterogeneity of these networks in terms of topological features.
Additionally, we adopt sentiment analysis to compare the evaluations of some of the main characters. The results reveal that the social network is more complicated in the narrative of the novel (Romance) than in that of the historical text (Records). Consequently, it can be concluded that the literariness of stories has a tight relationship with the complexity of the social networks they entail. The main contributions of this paper are as follows:
• We integrate the latest NLP and network science techniques to extract and analyze the social networks of a historical text and a novel.
• We depict the difference in the dynamic social networks of Records and Romance, the classic historical text and the novel of the same story.
• A series of comprehensive case studies is performed, and we find that the historical novel complicates the social networks of characters to enrich the literariness of the story.

The remainder of this paper is organized as follows. In Section 2, the background of text mining and social network analysis research is presented. Section 3 elaborates on the network extraction approach. We perform a series of empirical studies in Section 4 to demonstrate the thesis of this work. Finally, this paper is concluded in Section 5 with a summary of potential future studies.

Previous studies have demonstrated the importance of network analysis in different domains, such as complex networks in supply chains [5] and risk identification in the electronics industry [3]. For networks that possess a social structure, social network analysis (SNA) can be used to study social structures by analyzing relationships, communities, and activities through topology and graph theory [17, 26]. Initially, the study of SNA focused on networks that actually exist, such as mobile social networks [18] (Table 1 categorizes the metrics of SNA according to various features of social networks).
The development of NLP enables the extraction of the latent social network in narratives, such as literary text and news text (narrative network analysis). Recently, studies have focused on narrative networks in literary works such as novels. For example, studies on Harry Potter find salient small-world and scale-free features in its social network [38, 41], and these features reveal that the story is permeated by compact character relationships.

Table 1. Metrics of SNA grouped by network feature.
Connections: Homophily [22], Multiplexity [10], Reciprocity [14], Network Closure [16], Propinquity
Distributions: Bridge, Centrality [6], Density, Distance [28], Structural holes [9], Tie Strength
Segmentation: Cliques, Clustering Coefficient, Cohesion

2.2 Text Mining and Natural Language Processing

2.2.1 Named Entity Recognition. Named Entity Recognition (NER) is among the core tasks in NLP. In story-oriented text mining, the NER task requires that the characters and sentiment representatives are treated as entities that can be identified in the texts [7]. A large number of computational-linguistics-based NER methods have been developed, and they play vital roles in NER tasks, especially token-level tasks [21, 34].

2.2.2 Part-of-speech tagging. Part-of-speech (POS) tagging is the process of tagging a token (a word) with a particular part of speech according to its context [20]. Table 2 shows each type of tag with its corresponding meaning. POS tagging helps form grammatical rules for different language patterns.

To extract the social network of a story, the characters and the connections among them must be identified. The distribution of characters in a story is scattered and sometimes implicit. Natural language processing (NLP) technologies automate the identification of this specific information in texts, which makes them a useful tool [24].

Manuscript submitted to ACM

The popularity of deep learning has facilitated the design of a number of models that handle NLP subtasks such as NER.
Google proposed BERT [13], which substantially overcomes the limitations of existing models. Like OpenAI GPT, BERT builds on the Transformer architecture and relies on the attention mechanism [31, 37]. It predicts the correct token ID from its entire context, without being restricted to a single direction. In practice, BERT clearly outperforms earlier models on various metrics. For the reader's reference, Table 3 compares the capacities of the most widely adopted NLP models.

To train deep learning-based NLP models in a supervised or semi-supervised manner, a text corpus is required as the training dataset. In this work, we employ the Stanford Question Answering Dataset (SQuAD) as the text corpus. The design of SQuAD is inspired by answering questions in reading comprehension [32]. Unlike previous datasets, the mechanism of SQuAD requires machines to select the answer from all possible candidates in the context rather than from a list of possible answers for each question. The answer is sometimes not a single word but a phrase, which makes it difficult to predict. Therefore, the robustness of models can be improved through such rigorous learning. In this paper, data from literary texts are limited. Therefore, we use a BERT + SQuAD method, which substantially addresses the problem because it can considerably improve prediction accuracy despite limited data [13]. Furthermore, some traditional methods are still adopted in such situations.

In this work, we propose a text mining algorithm to extract the social networks in the narratives. We first pre-process the raw text to remove noise and extract the usable text from the narrative as the corpus. Then, we identify the characters from the corpus. Meanwhile, we also perform sentiment extraction. Finally, the extracted characters are used to construct the social network. The schematic view of our text mining algorithm is illustrated as

3.3.1 Pre-processing.
Raw text must be cleaned and normalized to a specific format (i.e., a corpus) before the algorithm can process it. Pre-processing is relatively simple in this work because the deep learning-based tool tolerates loosely formatted text in the corpus.

Regular expression in data cleaning. Noisy content needs to be adjusted or eliminated because it is mixed with useful data and may mislead the results. Such content usually includes tables of contents, titles, and headers. Fortunately, most of this noise follows a specific format. For example, as a translation of historical records, the “Records Pagination, headings, and comments

Corpus Extraction. Independently building a character-oriented corpus, instead of relying on an existing corpus, is essential for the character extraction task in this work. We assume that in a narrative, characters usually act through conversations; hence, their identification focuses on such conversations. Each conversation consists of several dialogs, which usually follow a specific double quotation mark format, that is, a paragraph starts with the double opening quote (“) and ends with the double closing quote (”). Following this rule allows all dialogs to be extracted from where conversations are located. In addition, the context in which a conversation occurs commonly contains the characters (a.k.a. speakers). The context usually follows the dialog, which makes it easy to identify. Therefore, each item of the corpus consists of two parts, “context” and “talk,” which map to their corresponding content (see Figure 3).

Labelling the speakers. The portrayal of conversations varies across the storytelling. Similarly, the location of the speaker in a dialogical context differs considerably, which makes it difficult to identify automatically. Therefore, a manual labeling process is required to locate the speaker in each context accurately.
Given that this process is time-consuming, a GUI labeling tool based on Jupyter Notebook is developed, and the visual operation substantially facilitates manual work (see Fig. 4). In this work, a total of 1,702 items from Romance can be labeled within just three hours.

Data augmentation. The amount of data extracted from books is usually insufficient to provide an adequate number of training samples, and as a result the deep learning-based models may fail to achieve satisfactory predictions [36]. In this work, the speaker corpus of Records contains only 1,248 items, of which a mere 806 samples (64.5%) are labeled after transcribing the entire text. To address this issue, a data augmentation approach is introduced to generate a sufficient number of new annotated data. How to generate new data and how much data should be generated are essential questions. We assume that every speaker can appear in every context. Supposing a total of S labelled speakers and M contexts, we can generate D_A = S × M new data samples. In this work, we use this data augmentation method and obtain over a million new data samples, as shown in Table 4.

Speakers Identification. A BERT + SQuAD algorithm is used to build a speaker prediction model in this work. SQuAD provides a structure for answering a question (prediction) by comprehending the context. Following the structure of SQuAD, we build a ternary dataset (i.e., context, answer, question); see Figure 5 for an example of the dataset. BERT provides a contextual prediction algorithm, and we use this model to predict the speakers from the text. Note that we omit the training procedure of BERT since it is not among the main focuses; interested readers can refer to our code, available at https://github.com/GreatWizard9519/Social-network-extraction-and-analysis-of-Three-kingdoms. No related baseline is used to evaluate the training results because this study is an individual project.
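The augmentation rule D_A = S × M and the ternary (context, answer, question) structure described above can be illustrated with a minimal sketch. It assumes that each labelled context is stored as a template with a `{speaker}` placeholder at the labelled speaker position; the placeholder, the question wording, and the field names (borrowed from the SQuAD convention of an answer text plus a character offset) are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# Hypothetical placeholder marking the labelled speaker position in a context.
SLOT = "{speaker}"
# Assumed question wording; the paper does not state the exact question used.
QUESTION = "Who is the speaker?"

def augment(speakers, context_templates):
    """Pair every speaker with every context template, giving S * M samples."""
    samples = []
    for speaker, template in product(speakers, context_templates):
        samples.append({
            "context": template.replace(SLOT, speaker),
            "question": QUESTION,
            "answer": speaker,
            # SQuAD-style character offset of the answer inside the context
            # (valid when the template contains exactly one slot).
            "answer_start": template.index(SLOT),
        })
    return samples

# Toy inputs for illustration only.
speakers = ["Liu Bei", "Cao Cao", "Sun Quan"]
templates = ["{speaker} laughed and said,", "Then {speaker} replied,"]

data = augment(speakers, templates)
print(len(data))           # S * M = 3 * 2 samples
print(data[0]["context"])  # first speaker substituted into the first template
```

Because the number of samples grows as the product of S and M, a few hundred labelled speakers and a thousand contexts are already enough to reach the order of magnitude reported in the text.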
The predicted results, after manual proofreading, cover the vast majority of characters in the books (approximately 93% for Romance and 91% for Records). In this way, the characters that appear can be obtained from the prediction results.

Aliases association. Many characters in the two books possess one or more aliases. For example, “Xuan-de”, “Lord Liu”, and “The First Ruler” all refer to the character “Liu Bei”. To overcome this problem, an alias-matching mechanism is built to map the aliases of the characters. A flaw of this mechanism in practical use is that a shared family name or title may be mapped to multiple characters. For example, “Sima” can be mapped to “Sima Yi” and “Sima Zhongxiang”. We develop two solutions to this problem. First, the alias mapping is classified according to the chapters of the story. For instance, “Sima Zhongxiang” is a character who appears only at the beginning of Records; hence, the mapping from “Sima” to “Sima Zhongxiang” should be applied solely in the first few chapters. Second, the context is considered when mapping aliases. For example, when “Liu Bei” appears, the closest “Lord” is, with high probability, “Lord Liu” (i.e., “Liu Bei”).

While the extraction and analysis of sentiment is not a main focus of this paper, we still conduct simple related studies on some key characters to give the audience a deeper understanding of the story. Sentiments toward a character can be described in different ways. In this work, our sentiment analysis focuses on evaluative words. Comments that other characters make about a certain character are a good entry point for extracting evaluations. Figure 6 shows one of “Chen Gong”'s evaluations of “Cao Cao” in Romance. Therefore, the extraction of such evaluative words is applied.
First, all conversations involving a specific character are extracted and tokenized, and each token is tagged with the corresponding part of speech (i.e., POS tag). Subsequently, following the example shown in Figure 6, words that possess an adjective POS (tagged with “JJ”) are collected, since we consider them "evaluative words" about characters that can be utilized in sentiment analysis.

Fig. 7. Network extracted from Romance (only nodes whose degree is larger than 6 are shown).
Fig. 8. Network extracted from Records (only nodes whose degree is larger than 6 are shown).

can construct social networks. The essential representations for our extracted social networks are defined below:
• Nodes: For each character coming on stage, a node is built. As mentioned above, all characters are taken from the identified speakers; hence, the social network describes only the relationships between characters who have monologues or dialogs. Two phenomena are worth noting when using this node representation. First, the number of nodes is smaller than the actual number of characters that appear in the books. Second, some characters appear isolated, without any interactions with other characters (i.e., nodes whose degree is 0), in our networks. To ensure the completeness of the social network, we manually append some of the missing characters and also include the isolated nodes when constructing the networks.
• Edge: To correlate the nodes, namely, to construct edges, we assume that the adjacent appearance of characters serves as the basis for creating interactions. Such an assumption is admittedly a coarse-grained solution. However, the outcome matches the actual situation to a high degree when the amount of data involved is large enough. Based on this assumption, the algorithm builds an interaction (edge) when two adjacent characters are detected in the same context.
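The adjacency rule for edges described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each context has already been reduced to an ordered list of detected character names (a hypothetical input format), treats each edge as reciprocal by using an unordered pair as the key, and counts repeated interactions as an edge weight.

```python
from collections import Counter

def build_edges(contexts):
    """Count interactions between characters that appear adjacently.

    `contexts` is a list of lists: the characters detected in each context,
    in order of appearance (an assumed input format).
    """
    weights = Counter()
    for mentions in contexts:
        # Walk over consecutive pairs of mentions within one context.
        for a, b in zip(mentions, mentions[1:]):
            if a != b:  # ignore self-loops caused by repeated mentions
                weights[frozenset((a, b))] += 1
    return weights

# Toy contexts for illustration only.
contexts = [
    ["Liu Bei", "Cao Cao", "Chen Gong"],
    ["Cao Cao", "Liu Bei"],
    ["Sun Quan", "Liu Bei"],
]

edges = build_edges(contexts)
for pair, weight in edges.items():
    print(sorted(pair), weight)
```

Keying edges by `frozenset` makes ("Liu Bei", "Cao Cao") and ("Cao Cao", "Liu Bei") the same edge, and the accumulated count can later serve as the weight used for a weighted degree.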
Furthermore, because an edge describes a reciprocal relationship, the network is treated as a bidirectional graph, wherein the in-degree and out-degree of every node are equal.

Network. Unlike other networks, the social network extracted from a narrative grows as the story carries on. Investigating the network dynamics can help us gain better insight into the story. To this end, the texts of the two books are chronologically split into five stages, and their corresponding networks are extracted through the same method introduced above. Because the chapter settings of the two books differ, some key events, for instance, “the death of Dong Zhuo” and “the death of Liu Bei”, are used as separate markers to align the stages. Moreover, these five stages represent the five most prominent periods in the story of the Three Kingdoms. By joining the five separate networks, a dynamic network that grows across the five stages is obtained.

In this work, we use Gephi [4] to visualize the extracted social networks. For visual clarity, the two demonstrated networks (see Figure 7 for Romance and Figure 8 for Records) have been filtered to include only the characters (nodes) whose degree is greater than 6. Nodes are distinguished by color and size, wherein the size of a node is ranked by its degree. Moreover, the color of a node is determined by its community, as categorized by a modularity algorithm, for aesthetic purposes.

This study focuses on exploring the discrepancies and similarities between the two books of the Three Kingdoms along multiple dimensions by employing social network analysis, and on gaining further insight into the storytelling of the two books. Our analysis incorporates two dimensions.
First, a holistic analysis of the social networks extracted from the two books is introduced, with emphasis on global properties. Subsequently, observations on some protagonists are discussed. To present the research logic, in the following investigations we first raise some interesting questions and then approach them with rational explanations drawn from the analysis results.

The framework of a great story is grand: it generally involves numerous characters and intricate relationships, and thus entails a vast social network. In this work, we measure the "grandness" of the two social networks by using the three metrics below:
• N: The number of characters who appear in the story (i.e., the number of nodes).
• E: The number of interactions that occur in the story (i.e., the number of edges).
• d: The shortest distance between the two most distant nodes in the network (i.e., the diameter of the network).

The results of the comparison are shown in Table 5, where the number of nodes of Romance is four times that of Records. Additionally, the diameter of Romance is 9, while that of Records is only 3. Consequently, there are more interactions (i.e., edges) in Romance. This implies that the social network of Romance is grander than that of Records. Since the two books narrate the same story, we assume that their casts (the characters that appear) are similar. According to the statistical results, while the number of characters involved in Romance is far larger than that in Records, the former covers approximately 71.4% of the characters of the latter. This indicates that the similarity of the casts of the two books is very high. In addition, we rank the top 50 most frequently appearing characters in the two books (see Figure 9) and find a 50% overlap, as shown in Figure 10.
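The three "grandness" metrics and the cast coverage discussed above can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the procedure used in the paper (which relies on Gephi): the function names and the toy data are assumptions, and the diameter is taken over reachable pairs only, so that isolated nodes do not make it infinite.

```python
from collections import deque

def build_adjacency(edges, isolated=()):
    """Undirected adjacency sets; isolated characters are kept as nodes."""
    adj = {node: set() for node in isolated}
    for a, b in edges:
        if a == b:
            continue  # skip self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def eccentricity(adj, source):
    """Longest shortest-path distance from `source` to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def grandness(adj):
    """Return (N, E, d): node count, edge count, diameter over reachable pairs."""
    n = len(adj)
    e = sum(len(nbrs) for nbrs in adj.values()) // 2
    d = max((eccentricity(adj, s) for s in adj), default=0)
    return n, e, d

def cast_coverage(cast_a, cast_b):
    """Share of the characters in cast_b that also appear in cast_a."""
    return len(set(cast_a) & set(cast_b)) / len(set(cast_b))

# Toy data for illustration only; not taken from the extracted networks.
toy_edges = [("Liu Bei", "Cao Cao"), ("Cao Cao", "Chen Gong"),
             ("Cao Cao", "Dong Zhuo"), ("Liu Bei", "Sun Quan")]
adj = build_adjacency(toy_edges, isolated=["Sima Yi"])
print(grandness(adj))

toy_records_cast = {"Liu Bei", "Cao Cao", "Sun Quan", "Sima Zhongxiang"}
print(cast_coverage(set(adj), toy_records_cast))
```

Calling `cast_coverage(romance_cast, records_cast)` on the two real casts corresponds to the statement that Romance covers a given share of the characters of Records.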
Previous studies indicate that novelistic literature usually involves a more complex social network, which can greatly enrich its literariness and dramatic quality [15, 38]. In this paper, two main topological features of complex networks are considered, and related investigations are conducted on the two extracted social networks.

Small-world. Small-world is a complex network feature that describes a random network with a highly clustered structure. In a small-world network, most nodes are not neighbors of one another, yet the neighbors of a given node are likely to be neighbors of each other, and most nodes can be reached from every other node in a few steps. A social network that possesses the small-world feature indicates more homogeneity in the social structure. To measure the small-world feature in our extracted social networks, we introduce two key metrics, namely, the average clustering coefficient and the average path length. Small-world networks are usually recognized as having a small average path length and a high average clustering coefficient. Moreover, an advanced metric, the Small-World Index (SWI) [25], is introduced. SWI quantifies the small-world feature and thus provides a more straightforward recognition. The calculation of SWI is

$$\mathrm{SWI} = \frac{L - L_l}{L_r - L_l} \times \frac{C - C_r}{C_l - C_r}$$

where $C$ and $L$ are the clustering coefficient and average path length, respectively, derived from the observed network (note that we compute them with Gephi in this work); $C_l$ and $L_l$ refer to the clustering coefficient and mean path length in a lattice reference network characterized by a high $C$ and $L$; similarly, $C_r$ and $L_r$ refer to the clustering coefficient and mean path length in a random reference graph characterized by a low $C$ and $L$.
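As a rough illustration of the quantities just defined, the sketch below computes the average clustering coefficient and average path length of a small undirected graph and combines them into an SWI value. It assumes the form of SWI given in [25] (stated in the code comment); the lattice and random reference values must be supplied by the caller, and all numbers and names are illustrative rather than taken from the paper, which computes C and L with Gephi.

```python
from collections import deque
from itertools import combinations

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Number of edges that actually exist among the neighbors.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def small_world_index(C, L, C_l, L_l, C_r, L_r):
    # Assumed form: SWI = (L - L_l)/(L_r - L_l) * (C - C_r)/(C_l - C_r)
    return ((L - L_l) / (L_r - L_l)) * ((C - C_r) / (C_l - C_r))

# Toy graph: a triangle A-B-C with a pendant node D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
C_obs = average_clustering(toy)
L_obs = average_path_length(toy)

# Made-up reference values, used only to show the call.
print(small_world_index(C_obs, L_obs, C_l=0.6, L_l=10.0, C_r=0.05, L_r=1.2))
```

In this form, a network whose clustering matches the lattice reference and whose path length matches the random reference scores exactly 1.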
From the results shown in Table 6, we can observe that Records has a significantly higher average path length and a smaller average clustering coefficient than Romance. In particular, the calculated SWI of Records (1.562) is higher than that of Romance (0.862), thereby quantifiably confirming our assumption. Literature that focuses on a single character or a group of characters presents a higher SWI than literature focused on a mass of characters. Romance, which focuses on a few protagonists, features a much higher SWI than Records, where the story follows several protagonists. This implies that Romance focuses more on storytelling around several characters than on epic depiction.

Scale-free. Scale-free describes a network whose degree distribution follows a power law. It reflects the Pareto principle that 20% of individuals commonly hold 80% of the total resources in a society, a.k.a. "the rich get richer" [11]. To investigate the scale-free feature of the two networks, we plot their node degree distributions, as shown in Figure 11, where x is the degree of a node and y is the number of nodes that possess this degree. From the results, we can observe a salient power-law distribution in the diagrams of both networks. The fit to the power law indicates that the networks of the two books are both scale-free. However, the distribution of Romance has a higher coefficient of determination ($R^2$: 0.8162 > 0.7749) than that of Records, which means that Romance is relatively more in line with this law.

Rich-club Coefficient. A number of scale-free networks exhibit a “rich-club” feature, indicating that a small number of nodes possessing a large number of edges also connect well to one another [1, 42].
The Rich-Club Coefficient is used to measure this feature, which can be computed by

$$\phi(k) = \frac{2E_{>k}}{N_{>k}(N_{>k} - 1)}$$

where $N_{>k}$ is the number of nodes whose degree is larger than $k$, and $E_{>k}$ is the actual number of edges among those nodes; $\phi(k)$ is thus the ratio between the number of edges that exist among the nodes with a degree larger than $k$ and the total possible number of edges among them. Considering the different sizes of the two networks, we compare the ratio of nodes that can form a fully connected network ($\phi(k) = 100\%$), deduced from the observed cut-off degree, which can be calculated by

$$r_{fc} = \frac{N_{>k_{\phi(k)=100\%}}}{N}$$

where $k_{\phi(k)=100\%}$ is the minimum $k$ that makes $\phi(k) = 100\%$, and $N$ is the number of nodes in the network. The calculated $r_{fc}$ values are 5.09% (Romance) and 20.31% (Records), respectively. This reveals that the top 5% richest nodes in Romance can approximately form a fully connected network, whereas the corresponding share is approximately the top 20% in Records. It can be concluded that both networks have a rich-club feature, which is more significant in Romance. These results reveal that despite more characters appearing in Romance than in Records, the story in Romance always revolves around a few protagonists.

Which is More Dramatic? Dramatic changes make stories splendid. The rise and fall of warlords constantly change the social structure of the story of the Three Kingdoms across all stages. To study the growth of the social structure, we investigate the social network according to the idea introduced in Section 3.4.2. As shown in Figure 13, five metrics are adopted to observe the dynamic change of the networks. Interesting phenomena are found in the results, as described below. The average node growth rates in Romance and Records are 147% and 63%, respectively. This suggests that Romance undergoes dramatic changes in the number of characters, and the appearance of new characters at each stage is overwhelming.
The density comparison indicates that Romance has a larger network size through all the stages, yet its density is lower, and the gap in the last three stages is especially notable. The average degrees of the two networks follow a similar pattern: both increase at the beginning and then reach a plateau. Records has a considerably larger average path length and a lower average clustering coefficient in most of the five stages, except the first two, which matches its better performance in small-worldliness. Overall, we can observe that the growth of the social network in Romance is more rapid. Comparatively, Records has a tight and gradually clustered network.

In this subsection, we assess the network features of specific characters. While the story of the Three Kingdoms involves numerous forces, the main focus is on the three force blocs, i.e., Wei, Shu, and Wu. Therefore, their respective sovereigns, namely, Cao Cao (Wei), Liu Bei (Shu), and Sun Quan (Wu), are chosen as the targets of our character-centric investigation.

In the story of the Three Kingdoms, the personal influence of each sovereign largely represents the influence of the forces they command. In a social network, the sovereigns’ interactions with other characters reflect this influence. Three related metrics are introduced to compare their influence:
• Degree: Degree, or degree centrality, is a basic measure that counts the number of neighbors that a node (character) has. The weighted degree is additionally considered, which takes into account the number of interactions that occur between two characters.
• Closeness centrality: Closeness centrality measures how close a node is to the rest of the network. It is calculated as the reciprocal of the sum of the lengths of the shortest paths between the node and all other nodes in the graph.
Its formula is expressed as

$$C(i) = \frac{1}{\sum_{j} d(i, j)}$$

where $C(i)$ is the closeness centrality of node $i$, and $d(i, j)$ denotes the distance between node $i$ and node $j$.
• Betweenness centrality: For each pair of nodes in a network, at least one shortest path exists between the nodes, along which either the number of edges that the path passes through (for unweighted networks) or the sum of the weights of the edges (for weighted networks) is minimized. Betweenness centrality measures the number of shortest paths that pass through a node. Denoted by $g(v)$, the betweenness centrality of node $v$ can be calculated by

$$g(v) = \sum_{i \neq v \neq j} \frac{\sigma_{ij}(v)}{\sigma_{ij}}$$

where $\sigma_{ij}$ is the total number of shortest paths from node $i$ to node $j$, and $\sigma_{ij}(v)$ is the number of those paths that pass through node $v$. A character’s role as a "bridge" can be measured by betweenness centrality.

Table 7 presents the results for Romance and Table 8 shows the results for Records. In Romance, Cao Cao exhibits the highest values on the four metrics, followed by Liu Bei, whereas Sun Quan has the lowest values. In Records, Liu Bei leads instead of Cao Cao, and Sun Quan is far behind them. This supports the conclusion that Cao Cao is the most influential of the three sovereigns in Romance, that Liu Bei is the most influential in Records, and that the influence of Sun Quan is lower than that of the other two lords in both books.

Generally, historical records tend to lean toward objectivity, whereas fictional novels contain subjective emotions. The creation of Romance began approximately toward the end of the Yuan Dynasty, which was a dark era for common lords of Shu and Wei, respectively. Our investigation focuses on these two characters from the perspective of the words used to evaluate them.

"Taste" of the Character Sentiment. We commence by collecting and ranking the evaluative words about Liu Bei and Cao Cao. Specifically, we present the results as word clouds, as shown in Figure 14.
From the word clouds shown in Figure 14, a rough picture of the sentiment toward Cao Cao and Liu Bei can be obtained. For example, in Romance, evaluative words such as “great” and “able” are used for both lords. However, we also find words such as “crafty” and “evil” for Cao Cao and “humble” for Liu Bei, which reveals the difference. Moreover, noticeably more negative words are found about Cao Cao in Romance than in Records. While this observation does not allow us to conclude that the authors of the two books have an evident preference for a character, we can at least find that the depiction of the same character differs between the two books.

For a better understanding of the evaluative words, we conduct a quantitative comparison. In particular, we introduce a sentiment scoring metric based on SentiWordNet [2]. The SentiWordNet score is obtained by subtracting the negative polarity from the positive polarity of each token and then aggregating the differences over all tokens, where $n$ denotes the number of evaluative words involved, and $posScore_i$ and $negScore_i$ are the positive and negative scores of word $i$ provided by SentiWordNet. The criterion of SentiWordNet gives Negative (i.e., −1), Neutral (i.e., 0), and Positive (i.e., 1) for users to classify a word. Figure 15 shows that Cao Cao’s score is lower than that of Liu Bei in Romance. Nonetheless, the score of Cao Cao is higher in Records. This finding is consistent with the subjective perception obtained in Section 4.3.1. In addition, the scores of the two characters are both higher in Romance than in Records. This possibly reveals the different sentimental tones of the authors’ wording in the narratives.

Centering on the story of the Three Kingdoms, this paper revisits research on digital humanities, which seeks to digitize the working procedures of sociologists and historians in the humanities by using state-of-the-art data science technologies.
In particular, the advanced NLP model BERT is employed in our character identification work, and a satisfactory outcome is obtained. Subsequently, we conduct a series of topological analyses to quantify and characterize the extracted social networks, and we additionally present a quantitative comparison between the two books. Specifically, network topological features, such as small-world and scale-free properties and the centrality of specific characters, are measured. The results reveal that the social network is more entangled, and in particular more protagonist-oriented, in the narrative of Romance than in that of Records. Moreover, this provides a quantitative reference at the macro level (e.g., structural features of a story) and the micro level (e.g., the influence or sentiment of a specific character), so that the grandness and vividness of a story can be expressed scientifically. This work can help both researchers and non-expert readers gain insight into the story of the Three Kingdoms and the procedure of its digital analysis.

Moreover, numerous sub-tasks involved can be refined in the future. First, the definition of interactions between characters is coarse-grained. Second, only a five-slice dynamic network is built in this project; a large-scale dynamic network incorporating hundreds or even thousands of slices could be obtained if the story were subdivided at a finer granularity, for instance, year by year or day by day.
[1] A unifying framework for measuring weighted rich clubs
[2] Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining
[3] Visual analysis of supply network risks: Insights from the electronics industry
[4] Gephi: an open source software for exploring and manipulating networks
[5] Network analysis of supply chain systems: A systematic review and future research
[6] Centrality measures in networks
[7] What do we mean when we speak about Named Entities
[8] Romance of the Three Kingdom
[9] Structural holes: The social structure of competition
[10] Emergence of network features from multiplexity
[11] Variational principle for the Pareto power law
[12] Analysis of adapted films and stories based on social network
[13] Bert: Pre-training of deep bidirectional transformers for language understanding
[14] A review of developmental networks: Incorporating a mutuality perspective
[15] Extracting social networks from literary fiction
[16] Do you two know each other? Transitivity, homophily, and the need for (network) closure
[17] Quantitative narrative analysis. Number 162. Sage
[18] Applications, architectures, and protocol design issues for mobile social networks: A survey
[19] What is digital humanities and what’s it doing in English departments?
[20] Part-of-speech tagging using decision trees
[21] Named entity recognition: fallacies, challenges and opportunities
[22] Birds of a feather: Homophily in social networks
[23] Efficient estimation of word representations in vector space
[24] Story Analysis Using Natural Language Processing and Interactive Dashboards
[25] How small is it? Comparing indices of small worldliness
[26] Social network analysis: a powerful strategy, also for the information sciences
[27] Resource Building and Parts-of-Speech (POS) Tagging for the Mizo Language
[28] Rethinking syntax: A commentary on E. Kako’s “Elements of syntax in the systems of three language-trained animals”
[29] Deep contextualized word representations
[30] A social network analysis approach to a digital interactive storytelling in mathematics
[31] Improving language understanding with unsupervised learning
[32] Squad: 100,000+ questions for machine comprehension of text
[33] Google Books' Library Project is fair use
[34] Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition
[35] Text as Ride: Electronic Literature and New Media Art. Center for Literary Computing
[36] Leveraging Additional Resources for Improving Statistical Machine Translation on Asian Low-Resource Languages
[37] Attention is all you need
[38] Topology analysis of social networks extracted from literature
[39] Storytelling and narrative knowing: An examination of the epistemic benefits of well-told stories
[40] Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
[41] Small-world and Scale-free Features in Harry Potter
[42] The rich-club phenomenon in the Internet topology