1 Introduction

With the evolution of digital technologies and of the profile of internet users, the generation of unstructured data has increased enormously. According to [22], unstructured data are data that do not exhibit a clear syntactic and semantic, machine-processable structure.

Consequently, methods have been developed to identify and structure the information present in textual documents. These methods rely on Natural Language Processing, an area that uses concepts from Linguistics and Artificial Intelligence to process data and automate language-related tasks. Among the Natural Language Processing tasks are Machine Translation, Speech Recognition, Automatic Text Summarization, and Information Extraction, among others.

Within Information Extraction there is the Named Entity Recognition (NER) task, which consists of identifying Named Entities, such as Persons and Places, in texts written in natural language and categorizing them according to their nature. A Named Entity is a concept, formed by one or more words, that belongs to a previously defined semantic group [9,10,11,25,28,33].

Traditionally, NER approaches have relied on techniques from Linguistics, such as syntactic labels, word lemmas, and affixes, to extract information present in texts. These traditional techniques are quite laborious, as they involve several stages of data preparation. To ease this work, Machine Learning methods have been developed. Recently, numeric techniques that capture syntactic and semantic aspects of words have been gaining ground, as they achieve results similar to, and sometimes better than, classical techniques without requiring an extensive feature-engineering process.

As examples of Machine Learning approaches to the Named Entity Recognition task we can cite [5,6,12,14,15,26], among many others. Distributed representations of words are one of the reasons for the popularization of Machine Learning methods in NLP; examples of such representations include word2vec [23,24], GloVe [27], fastText [3], Flair [1], and BERT [7], among many others.

As stated by Ratinov and Roth [29], NER is a knowledge-intensive task. Therefore, some approaches still perform feature engineering alongside Machine Learning methods in order to include features that cannot be captured from word vectors or texts. Even though feature engineering solves part of the problem, there remains a semantic gap that cannot be bridged with lexical and syntactic attributes alone. In this sense, semantic repositories have been adopted.

In [5], external information is added to the CNN-LSTM model through the use of a lexicon. This lexicon is built from Wikipedia data and is used to add new features, in the shape of Inside-Outside-Beginning tags, to the input vectors.

A work using external knowledge from an ontology is presented in [19], where the authors perform Named Entity Recognition on bridge inspection reports. To do so, they use a semi-supervised Conditional Random Fields (CRF) model whose inputs consist of syntactic features, such as stems and Part-of-Speech tags, and semantic features from a domain ontology named BridgeOnto.

Liu et al. [20] add external information from gazetteers, but instead of using tags indicating whether a word belongs to one or more gazetteers, they use a Hybrid Semi-Markov CRF to generate a numeric value that expresses the degree to which the word belongs to a gazetteer. The goal is to find a better alternative to the hard-match representation commonly used when gazetteers are adopted.

In [18], the authors present an approach that uses external knowledge in the form of lexicons together with syntactic features such as Part-of-Speech tags and n-grams.

Ratinov and Roth [29] enrich the Named Entity Recognition task with external knowledge in the form of gazetteers on CoNLL03, MUC7, and a smaller dataset of webpages assembled and annotated by them. They also use other features, such as context aggregation and extended prediction history, to boost the NER performance of their model.

Seyler et al. [30] divide external features into four categories to quantify the impact of each one on the NER task, using a Linear-Chain CRF on the CoNLL03 and MUC7 datasets. The four categories are: i) Knowledge-Agnostic, using only local features; ii) Name-Based Knowledge, using a list of named entities; iii) Knowledge-Base-Based Knowledge, using features extracted from a Knowledge Base or an annotated corpus; and iv) Entity-Based Knowledge, using the results of Named Entity Disambiguation. The authors show that incrementally adding more categories of knowledge yields better effectiveness, but sometimes at the cost of efficiency, stating that there is a trade-off between them.

In order to add external knowledge to a neural model, Ding et al. [8] propose the use of an adapted Gated Graph Sequence Neural Network to capture the information contained in multiple gazetteers. The Gated Graph Sequence Neural Network serves as an embedding layer that learns how to combine the knowledge present in more than one gazetteer of the same or different types. The resulting embeddings are then fed to a standard BiLSTM-CRF to fulfill the NER task on Weibo-NER and OntoNotes 4, both in Chinese.

Gazetteer knowledge is aggregated into an Attentive Neural Network by Lin et al. [17] for the Nested Named Entity Recognition task. They leverage the knowledge contained in gazetteers by finding a representation for entity candidates through what they call a gazetteer network, which is concatenated with the representation learned by a region encoder. The experiments show that this strategy improves the model's performance on the ACE2005 dataset.

Xiaofeng et al. [32] propose a method to incorporate dictionary features into a BiLSTM-CRF model in order to evaluate their impact. Their differential is that the dictionaries used during the training phase are obtained from the train split, whereas the dictionaries used in the testing phase come from SENNA. Their experiments, conducted on the CoNLL2003 dataset, show that the size of the dictionaries (partial or oversized) may lead to inferior results in some cases.

The purpose of this work is to evaluate the aggregation of external knowledge, in the form of gazetteers built from YAGO and Knowledge Graph embeddings from Freebase, to the Named Entity Recognition task using BiLSTM, BiGRU, and CRF.

The paper is organized as follows: Sect. 2 presents our approach, details of the models, the input vectors and the external knowledge used. In Sect. 3 the experimental protocol and the datasets are explained, and the results are discussed. Section 4 presents conclusions and future work.

2 Approach

This work aims to investigate the use of external knowledge in some commonly used neural models for NER. The first step of our approach is to generate the embeddings for the datasets used by the methods and to add the sources of external knowledge. The next step is to define the neural networks, with an architecture inspired by [5].

This section introduces the neural models used, as well as the sources of the external knowledge and how these knowledge sources are added to our model.

2.1 Neural Models

In this work we used two neural models: Bidirectional Long Short-Term Memory (BiLSTM) combined with Conditional Random Fields (CRF) [6], and Bidirectional Gated Recurrent Units (BiGRU) combined with CRF.

The aim of the models is to find a sequence of labels \(\mathcal {Y}=\{\mathcal {Y}_1,\mathcal {Y}_2,...,\mathcal {Y}_n\}\) for a given sequence of inputs \(\mathcal {X}=\{\mathcal {X}_1,\mathcal {X}_2,...,\mathcal {X}_n\}\) of length n. In this work, each \(\mathcal {X}_i\) is a vector representation of a word and its features in a sentence. These vectors are used as input to the BiLSTM/BiGRU layers, whose purpose is to extract features and create a new feature vector for each word represented by \(\mathcal {X}_i\) while considering the surrounding words in the same sentence. The idea of bidirectional layers is to feed the same input vectors to two LSTM/GRU layers: one processes the word sequence from left to right, generating \(\overrightarrow{h_i}\), and the other from right to left, generating \(\overleftarrow{h_i}\). The two outputs are concatenated into one feature vector, \(h_i = [\overrightarrow{h_i}: \overleftarrow{h_i}]\), which is used as input by the classification layer.

Following [14] and other works, we chose CRF as the classification layer of our models because it takes into account the previously assigned labels. This way, by combining BiLSTM/BiGRU with CRF, the models make good use of the sequential nature of texts.
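The bidirectional pass described above can be sketched with a toy NumPy implementation of a GRU run in both directions. All dimensions, the parameter initialization, and the use of a GRU cell here are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 3  # toy input and hidden sizes (hypothetical)

def make_params():
    # One set of GRU parameters per direction; shapes follow the standard GRU.
    return {k: rng.normal(scale=0.1, size=s) for k, s in {
        "Wz": (d_h, d_in), "Uz": (d_h, d_h), "bz": (d_h,),
        "Wr": (d_h, d_in), "Ur": (d_h, d_h), "br": (d_h,),
        "Wh": (d_h, d_in), "Uh": (d_h, d_h), "bh": (d_h,),
    }.items()}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_pass(xs, p):
    """Run a GRU over the sequence xs, returning one hidden state per step."""
    h = np.zeros(d_h)
    out = []
    for x in xs:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])   # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])   # reset gate
        h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
        h = (1 - z) * h + z * h_tilde
        out.append(h)
    return out

sentence = [rng.normal(size=d_in) for _ in range(5)]    # 5 word vectors X_i
fwd = gru_pass(sentence, make_params())                 # left-to-right pass
bwd = gru_pass(sentence[::-1], make_params())[::-1]     # right-to-left, re-aligned
H = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # h_i = [h_fwd : h_bwd]
```

Each concatenated vector in `H` is what the CRF layer would receive for one word; in the actual models these parameters are learned jointly with the classifier.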

2.2 Embeddings

Inspired by [5], this work makes use of three common embeddings: Character embeddings, Casing embeddings, and Word embeddings. In addition, we aggregate the embeddings whose impact we aim to validate: External Knowledge embeddings. All of these embedding techniques are explained below, except the External Knowledge embeddings, which are presented in the next subsection.

Character Embeddings: To generate the Character embeddings, we first create a vector, randomly initialized with \(U(-0.5,0.5)\), for each character in a set of 135 characters. These vectors are then retrieved for each character of each word and used as input to a Convolutional Neural Network with a max-pooling layer, which generates a single vector, named the Character embedding, containing information from all the characters of a word. To better capture character features, the character vectors are trained with the rest of the model, and we use a Dropout layer to prevent the Character embeddings from overfitting.
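The convolution-plus-max-pooling step can be sketched as follows. The alphabet, vector dimensions, filter count, window size, and padding scheme are all illustrative assumptions (the paper only fixes the character set size at 135 and the \(U(-0.5,0.5)\) initialization):

```python
import numpy as np

rng = np.random.default_rng(1)
CHARS = "abcdefghijklmnopqrstuvwxyz"   # toy alphabet (the paper uses 135 chars)
d_char, window, n_filters = 8, 3, 10   # hypothetical sizes

char_vecs = {c: rng.uniform(-0.5, 0.5, d_char) for c in CHARS}   # U(-0.5, 0.5)
W = rng.normal(scale=0.1, size=(n_filters, window * d_char))     # conv filters
b = np.zeros(n_filters)

def char_embedding(word):
    """Convolve over character windows, then max-pool into one vector per word."""
    mat = np.stack([char_vecs[c] for c in word])
    # Pad so every word yields at least one full window.
    pad = np.zeros((window - 1, d_char))
    mat = np.vstack([pad, mat, pad])
    convs = [np.tanh(W @ mat[i:i + window].ravel() + b)
             for i in range(len(mat) - window + 1)]
    return np.max(convs, axis=0)   # max-pooling over window positions

emb = char_embedding("word")
```

The result is a single fixed-size vector per word regardless of word length, which is the property that lets it be concatenated with the other embeddings.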

Casing Embeddings: To generate the Casing embeddings, each input word is assigned to one of eight possible categories according to its composition, such as its capitalisation and the presence of numbers and special characters. Each category is then initialized as a one-hot vector that is further trained with the model.
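A plausible categorization function is sketched below. The paper only states that there are eight composition-based categories; this particular set of category names and rules is an assumption:

```python
CATEGORIES = ["numeric", "mainly_numeric", "all_lower", "all_upper",
              "initial_upper", "contains_digit", "has_special", "other"]

def casing_category(word):
    """Map a word to one of eight casing categories (hypothetical set)."""
    n_digits = sum(c.isdigit() for c in word)
    if word.isdigit():
        return "numeric"
    if n_digits / len(word) > 0.5:
        return "mainly_numeric"
    if word.islower():
        return "all_lower"
    if word.isupper():
        return "all_upper"
    if word[0].isupper():
        return "initial_upper"
    if n_digits > 0:
        return "contains_digit"
    if not word.isalnum():
        return "has_special"
    return "other"

def casing_one_hot(word):
    """One-hot vector over the eight categories, later trained with the model."""
    vec = [0.0] * len(CATEGORIES)
    vec[CATEGORIES.index(casing_category(word))] = 1.0
    return vec
```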

Word Embeddings: As Word embeddings, we decided to use pre-trained GloVe [27] embeddings with 50 dimensions. Other candidates were word2vec, fastText, Flair, and BERT. Some of these might yield better final results, but as the objective of this work is to analyze the impact of external knowledge on the chosen neural architecture, results below the state of the art do not invalidate it.

2.3 External Knowledge

To aggregate external information into the input vectors, we chose two distinct sources: gazetteers built from version 3.1 of Yet Another Great Ontology (YAGO) [21], and Knowledge embeddings from Freebase, generated by the TransE method [4] using the OpenKE framework [13] with the latest Freebase dump.

To create the gazetteers, we picked all entities belonging to four types in YAGO (Person, Organization, Foundation, Place), which correspond to three of the CONLL2003 types (Person, Organization, Place). Besides the chosen types, we also picked their sub-types (e.g., Abstract painters and Presidents are sub-types of Person). By the end of this process, we had three gazetteer lists, whose numbers of entities are shown in Table 1.

Table 1. Number of entities in each Gazetteer list.

To add gazetteer information to the input vectors, we used a strategy similar to that of the Casing embeddings, but instead of the eight categories related to the composition of the word, we used another eight categories that express the membership of the word in one or more gazetteers. To do this, we performed a tagging stage in which each word of the dataset was tagged according to its gazetteer membership (e.g., Washington received the PER/LOC tag, while Kilimanjaro received only the LOC tag). One-hot vectors are then generated for each category and trained with the model, just as with the Casing embeddings.
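With three entity types, the possible membership combinations number exactly \(2^3 = 8\), which matches the eight categories. The tagging stage can be sketched as below; the gazetteer contents here are toy stand-ins for the YAGO-derived lists:

```python
# Toy gazetteer lists standing in for the YAGO-derived ones (contents hypothetical).
GAZETTEERS = {
    "PER": {"washington", "lincoln"},
    "ORG": {"google", "unicef"},
    "LOC": {"washington", "kilimanjaro"},
}
TYPES = ["PER", "ORG", "LOC"]

# Eight categories: one per subset of {PER, ORG, LOC}, "O" for no membership.
CATS = ["O", "PER", "ORG", "LOC", "PER/ORG", "PER/LOC", "ORG/LOC", "PER/ORG/LOC"]

def gazetteer_category(word):
    """Return the combination of gazetteer lists that contain the word."""
    w = word.lower()
    return "/".join(t for t in TYPES if w in GAZETTEERS[t]) or "O"

def gazetteer_one_hot(word):
    """One-hot vector over the eight membership categories."""
    vec = [0.0] * len(CATS)
    vec[CATS.index(gazetteer_category(word))] = 1.0
    return vec
```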

As for the Knowledge embeddings from Freebase, we chose to use only single-word topics, but to compensate we did not filter the topics by type. This way, we used the Knowledge embeddings of all words in the text, and if a word does not have a Knowledge embedding we used a vector of zeros.
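The lookup-with-zero-fallback can be sketched as follows; the embedding table and its dimension are illustrative placeholders for the actual TransE vectors:

```python
import numpy as np

d_ke = 5  # toy embedding dimension (hypothetical; real TransE vectors are larger)

# Stand-in for the Freebase/TransE single-word topic embeddings.
knowledge_embeddings = {
    "washington": np.full(d_ke, 0.1),
    "google": np.full(d_ke, 0.2),
}

def knowledge_vector(word):
    """Look up the knowledge embedding; fall back to a zero vector if absent."""
    return knowledge_embeddings.get(word.lower(), np.zeros(d_ke))
```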

Regardless of the source, the external knowledge is added by concatenating the semantic feature vector, generated by one of the methods described above, to the other vectors (Word, Casing, Character), and the resulting vector is presented as input to the neural models.
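The concatenation itself is straightforward; the dimensions below are illustrative (only the 50-dimensional GloVe vectors are stated in the paper):

```python
import numpy as np

# Toy per-word feature vectors (dimensions other than 50 are assumptions).
word_vec = np.zeros(50)       # pre-trained GloVe word embedding
casing_vec = np.zeros(8)      # one-hot casing category
char_vec = np.zeros(10)       # CNN character embedding
external_vec = np.zeros(8)    # gazetteer one-hot or knowledge embedding

# X_i, the full input vector for one word, fed to the BiLSTM/BiGRU layer.
x_i = np.concatenate([word_vec, casing_vec, char_vec, external_vec])
```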

3 Experiments

We conducted experiments on four datasets in order to assess the impact of external knowledge on the chosen neural models for the NER task. We executed each experimental setting a total of 10 times due to the stochastic elements present in each model's initialization. For evaluation we adopted the F1-score, shown in Eq. 1, which is the harmonic mean of Precision (P) and Recall (R), shown in Eq. 2 and Eq. 3, where TP stands for True Positives, FP for False Positives, and FN for False Negatives.

$$\begin{aligned} F_1\text {-}score = \frac{2 \cdot P \cdot R}{P + R} \end{aligned}$$
(1)
$$\begin{aligned} P = \frac{TP}{TP + FP} \end{aligned}$$
(2)
$$\begin{aligned} R = \frac{TP}{TP + FN} \end{aligned}$$
(3)

Thus, the F1-score results presented in this section are the mean of 10 executions for each model setting.
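Equations 1-3 translate directly into code; a minimal sketch from entity-level counts:

```python
def f1_score(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts, as in Eqs. 1-3."""
    p = tp / (tp + fp) if tp + fp else 0.0   # Eq. 2
    r = tp / (tp + fn) if tp + fn else 0.0   # Eq. 3
    return (2 * p * r / (p + r)) if p + r else 0.0   # Eq. 1

# e.g. 80 correctly recognized entities, 20 spurious, 10 missed:
# P = 0.8, R = 80/90, F1 is their harmonic mean.
score = f1_score(80, 20, 10)
```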

3.1 Datasets

In this work we chose four distinct datasets, all in English, with different sizes and types of entities, in order to validate our approach in different scenarios. The chosen datasets are CONLL2003 [31], OntoNotes5, MIT Movies [18], and MIT Restaurants [18]. We only used the train/test splits, leaving validation sets out of our experiments and thus not conducting hyper-parameter optimization. All of the datasets use the Inside-Outside-Beginning entity annotation scheme. Table 2 details the number of sentences and entities contained in the datasets.

Table 2. Quantification of datasets.

3.2 Experimental Settings

The experiments consisted of using the external embeddings, described in the previous sections, and checking their impact on the results of our models. To verify this, we conducted experiments with the same model with and without the external knowledge. The model without external knowledge was used as the baseline for our modifications; that is, we executed both network models, BiLSTM-CRF and BiGRU-CRF, with and without the addition of the external embeddings and compared their F1-scores.

The parameters used are shown in Table 3. The parameters are the same for all datasets, except for the OntoNotes 5 dataset, for which we decided to use a lower learning rate and fewer epochs due to its large amount of data.

Table 3. Parameters values. Values with \(*\) symbol refer to parameters used only for OntoNotes5.

3.3 Experimental Results

Table 4 shows the results of the experiments carried out and the state-of-the-art F1-score (as stated by the authors in their papers) for each dataset. The results show an increase in all cases when we added gazetteer information to the BiLSTM-CRF. However, the addition of Knowledge embeddings to the BiLSTM-CRF decreased the F1-score in every situation. In the experiments using BiGRU, there are two scenarios with an increase in F1-score from the addition of gazetteer information, and the use of Knowledge embeddings also increased the F1-score in two scenarios. Regardless of whether the external knowledge increased or decreased the F1-score, the difference was not significant, averaging 0.4%, with the exception of the OntoNotes 5 dataset, where the differences were very pronounced, averaging 6.76%.

Even though other works show an increase in F1-score when adding external knowledge to the models, our addition of external knowledge does not always bring positive impacts on the F1-score, as shown in Table 4, and even when it does, it does not necessarily achieve the best result among our models. Furthermore, even our best results are far from the state of the art on some datasets.

Table 4. Comparison of F1-score between our models and state-of-art models, where Gaz stands for Gazetteer, and KE stands for Knowledge Embeddings. Each column represents the F1-score for that dataset, and the bold values represent the best result for that dataset.

When compared to the results of other works, such as [29,30], our strategy does not seem to bring gains; however, it is worth noting that the models we chose as baselines achieve good F1-scores (88.57% and 88.15% on CoNLL2003 using BiLSTM and BiGRU, respectively), which are very close, respectively, to their best and second-best results. This choice of a strong baseline may be one of the reasons behind the small gains.

4 Conclusion

This work aimed to evaluate the use of external knowledge in machine learning methods for Named Entity Recognition. The methodology was composed of two steps: i) generation of embeddings, and ii) definition and training of the Machine Learning methods.

The defined models were trained and tested on four English datasets of different sizes and with different types of entities in order to evaluate our methodology.

As our experiments show, in spite of an increase in F1-score in 17 of the 32 cases, the way external knowledge was integrated into the model did not bring much gain, most improvements being smaller than 0.5%, and on some datasets the results were well below the state-of-the-art methods (for the values stated by the authors in their papers). This may be explained by the fact that we did not perform any hyper-parameter optimization, which may have led the model to suffer from overfitting, and from underfitting in the case of OntoNotes5, where there was an increase of 11.8% in one scenario.

Another point to consider regarding the gap between our results and the state of the art is the choice of Word embeddings: while we chose GloVe Word embeddings, most recent works use embeddings that better capture the context of words.

It is important to note that we did not reproduce the state-of-the-art methods. Although we did not achieve state-of-the-art results, we were still able to assess the impact of adding external knowledge to the neural models used.

The results and discussion show that the outcome of adding external knowledge is strongly linked to what information is used, as well as how it is used. We conjecture that in some cases the information present in external bases may already be integrated into the word representations, especially when the embedding training set and the knowledge base share common data.

Thus, adding external knowledge to the models does not always improve the results, and can even lead to performance decreases. Therefore, in order to integrate external knowledge, a deep analysis is needed to capture all the semantics present in external knowledge bases.