1 Introduction

Automatic Text Simplification (ATS) aims to transform complex sentences into simpler ones, which supports second language learners and improves communication with people with low literacy, among other benefits [4]. ATS can be applied in multiple domains. For instance, it could simplify legal documents, transforming jargon-heavy juridical terms into more accessible vocabulary and making the documents more understandable to the general public [9]. Hence, there are many opportunities for ATS techniques in public-sector institutions, which need to make their public documents more accessible and thereby reinforce their relevance and transparency.

ATS has been an active research topic over the years, with several applications [28]. In this context, José and Finatto noted that the demand for text simplification in Brazil has increased over the years [16]. One of the main reasons is the growing need to make specialised concepts accessible to a wider range of people. The authors also investigated the language used in documents provided by the Ministry of Health of Brazil. They found two kinds of documents describing the same disease. The first is directed at health professionals and therefore uses domain-specific vocabulary. The other class of documents, aimed at the general public, provides a “simplified version”. Nevertheless, it still contains terms that are unusual among less educated readers, such as “mucus”. Moreover, the manual simplification of complex text is not scalable; thus, ATS alternatives should be investigated thoroughly [16].

In the past decade, Portuguese ATS expanded significantly, with systems using lexical and syntactic simplification and Statistical Machine Translation (SMT) methods [3, 4, 30]. For instance, Specia [30] proposed an SMT model to simplify Portuguese text using only a few examples, achieving acceptable adequacy and fluency [30]. However, these studies were developed more than ten years ago, before the advent of neural network architectures for NLP tasks. Therefore, even though deep learning algorithms are a trend in the field, they still have limited application in Portuguese ATS. For instance, Neural Machine Translation (NMT) is a recent method for text simplification that directly transforms complex sentences into simpler ones without any need for syntactic or lexical analysis [2]. NMT has gained popularity due to its successful simplification results in a variety of domains [2]. Moreover, different companies have used NMT methods in their services, such as Google Translate, Microsoft Translator, and IBM Watson Language Translator [13, 27, 37].

MT-based ATS methods, such as NMT, usually use a parallel corpus to map hard-to-read sentences to simpler ones. Moreover, they are domain-independent, i.e., one can train an ATS model on a large parallel corpus and then apply the same model to texts from a different domain with comparable performance [8, 12, 32]. It is important to highlight that NMT methods have presented successful results and outperformed consolidated statistical methods [2]. To the best of our knowledge, no research has investigated the application of NMT methods to simplify Brazilian Portuguese texts.

Based on this scenario, this study investigates and assesses the use of state-of-the-art NMT methods to automatically simplify documents written in Brazilian Portuguese. To reach this goal, the paper presents an empirical evaluation of NMT models on a parallel corpus extracted from complex and simplified translations of the Bible. The results demonstrate that the use of NMT for Portuguese text simplification is promising, with a wide range of practical applications. These findings can improve text accessibility for more people, fostering the democratisation of information.

This paper is organised as follows. Section 2 introduces basic concepts about NMT and the methods adopted in this work. Section 3 presents works related to the ATS problem. In Sect. 4, the materials and methods are detailed. Section 5 presents the results and discussion. Finally, Sect. 6 states the conclusions and future work.

2 Background

This paper explores the use of NMT models based on monolingual translation to simplify texts in Portuguese. Herein, we briefly introduce Recurrent Neural Networks (RNNs), NMT, and the methods considered in this work, i.e., an attention-based model and a Bidirectional Recurrent Neural Network (B-RNN) with an attention layer.

The main advantage of an RNN is its ability to learn temporal information, even though it can also be used in non-temporal contexts [20]. For machine translation, RNNs are usually combined in an encoder-decoder architecture in which both components are RNNs [20, 34]. The encoder is responsible for transforming the input into a context vector that summarises its information after T recursive updates [25]. The decoder takes a dummy input and recursively generates the output, feeding each step with the previously generated token [25]. In this paper, we consider a specific RNN type called the Bidirectional Recurrent Neural Network (B-RNN), in which the encoder conditions not only on past inputs but also on future ones [20]. It produces a forward sequence \((f_1, f_2, \dots , f_n)\) and a backward sequence \((b_1, b_2, \dots , b_n)\) such that h = [f, b] is their concatenation [20].
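For illustration only (this is not the paper's implementation), the bidirectional encoding h = [f, b] can be sketched in NumPy with a minimal vanilla RNN; the weight matrices below are random placeholders:

```python
import numpy as np

def rnn_pass(inputs, W_x, W_h, h0):
    """Run a vanilla RNN over a sequence, returning all hidden states."""
    h, states = h0, []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return states

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 3, 5                                # input dim, hidden dim, sequence length
xs = [rng.normal(size=d_in) for _ in range(T)]
W_x, W_h = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))

f = rnn_pass(xs, W_x, W_h, np.zeros(d_h))             # forward states f_1..f_T
b = rnn_pass(xs[::-1], W_x, W_h, np.zeros(d_h))[::-1]  # backward states b_1..b_T

# Each position t is represented by the concatenation h_t = [f_t, b_t]
h = [np.concatenate([ft, bt]) for ft, bt in zip(f, b)]
```

In practice, the forward and backward passes use separate weight matrices; a single pair is shared here only to keep the sketch short.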

A significant advancement in RNNs was the proposal of the attention mechanism [6]. This mechanism allows a sequence-to-sequence model to focus on key parts of the input sequence at each decoding step. Consequently, it permits the model to learn the correct alignment between the sentences [6]. Studies have shown that attention mechanisms significantly improve model performance on long sentences and improve the model's soft alignment [6]. Consequently, attention had a considerable impact on improving machine translation results [6]. In this work, we used a B-RNN with an attention layer as one of the algorithms to be analysed for the Portuguese ATS problem.

More recently, a method called the Transformer was proposed, based solely on attention layers and dispensing with recurrence and convolutions entirely [33]. The Transformer follows a general sequence-to-sequence encoder-decoder architecture [33]. The encoder is a stack of N layers, each composed of two sub-layers: a multi-head self-attention mechanism and a position-wise feed-forward network; each sub-layer is wrapped with a residual connection followed by layer normalisation [33]. The decoder follows a similar design, using a stack of N layers with an additional multi-head attention sub-layer over the encoder output, also with normalisation and residual connections [33]. The attention implementation proposed by [33] is a scaled dot-product attention, where the queries (Q) and keys (K) are vectors of dimension \(d_k\) and the values (V) are vectors of dimension \(d_v\) [33]. The attention for each output is calculated as given by Eq. 1, where \(\frac{1}{\sqrt{d_k}}\) is the scaling factor.

$$\begin{aligned} Attention(Q,K,V) = softmax(\frac{QK^{T}}{\sqrt{d_k}})V. \end{aligned}$$
(1)

In practice, the model uses multi-head attention to learn different parts of the representation at different positions [33]. Attention is used in both the encoder and the decoder, including in a self-attention manner [33]. In addition to attention, the model also uses fully connected position-wise feed-forward layers and positional encodings [33].
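Equation 1 can be sketched directly in NumPy; the random matrices below are placeholders, and a real implementation would batch this over heads:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Eq. 1)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(2, 4))   # 2 queries of dimension d_k = 4
K = rng.normal(size=(3, 4))   # 3 keys of dimension d_k = 4
V = rng.normal(size=(3, 5))   # 3 values of dimension d_v = 5
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value vectors, with weights summing to one per query.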

The Transformer's experimental results showed that attention-only models outperformed RNNs in quality while requiring significantly less training time on two machine translation tasks (English-to-German and English-to-French) [33]. The Transformer also enabled new promising models such as Bidirectional Encoder Representations from Transformers (BERT) and many others [33]. Due to these relevant results, the Transformer was considered in our experiments as one of the algorithms to be analysed for the Portuguese ATS problem.

3 Related Works

ATS is a relevant task that has attracted growing interest in the Natural Language Processing field in recent years. This section presents relevant methods developed over the last decade for the ATS problem in Brazilian Portuguese.

Recently, the ATS problem has been addressed as a monolingual machine translation problem, in which a given text is translated into a simpler one. There are two relevant machine translation approaches: Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). In SMT, the translation of an original sentence f into a sentence e is modelled via Bayes' theorem, combining a translation model and a language model, as detailed in [1]. The research carried out by [30] treated the ATS problem for Portuguese texts as a translation task. The authors adopted the SMT approach to learn how to translate complex sentences into simple ones. The SMT system was trained on a parallel corpus of original and simplified texts, aligned at the sentence level. The translations produced were evaluated using the Bilingual Evaluation Understudy (BLEU) metric and manual inspection. According to both evaluations, the results were promising, and the overall sentence quality was not harmed. It was observed that some types of simplification operations, mainly lexical, were correctly captured.
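Concretely, the standard noisy-channel decomposition underlying this SMT formulation (stated here from the literature, not reproduced verbatim from [1]) selects the output sentence \(\hat{e}\) as

$$\begin{aligned} \hat{e} = \mathop {\arg \max }\limits _{e} P(e \mid f) = \mathop {\arg \max }\limits _{e} P(f \mid e)\,P(e), \end{aligned}$$

where \(P(f \mid e)\) is the translation model and \(P(e)\) is the language model.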

In summary, despite all these advancements, there is still a gap in studies on the application of NMT to Portuguese. NMT is a recently developed deep learning technique that has achieved significant results on several complex tasks [2, 6, 23, 34, 39]. According to [2], NMT-based methods have proven a better alternative to SMT techniques for translation problems. Although several works have employed NMT for text simplification [2, 10, 23, 26, 32], no work has applied it to Portuguese texts. To the best of our knowledge, the last works on automatic text simplification in Portuguese were conducted more than ten years ago [3, 4, 30] and used SMT. Thus, this paper aims to explore state-of-the-art NMT methods for text simplification in Portuguese.

4 Materials and Methods

This section details the dataset, methods, experimental methodology and evaluation metrics of this work.

4.1 Data Description

This work adopted a parallel corpus based on different versions of the Bible to evaluate the NMT methods. The first one is a traditional version called Almeida Revista e Corrigida (ARC), published in 1997, with a complex text style. The newer version, called Nova Almeida Atualizada (NAA), was launched in 2017 as a simplification of the traditional version. In this paper, we also evaluated other versions of the Bible: the Nova Tradução Linguagem de Hoje (NLTH), the Nova Bíblia Viva (NBV), and the Nova Versão Internacional (NVI). Considering a diverse range of versions may provide different simplification types, as explored in the Porsimples project [4].

Each dataset has 29070 aligned verses, which were used to create the proposed sequence-to-sequence models. Table 1 provides more information about the corpora used in this study, including the number of tokens, number of sentences, readability, and lexical diversity (extracted using the pylinguistics library [7]). In that library, the Flesch Reading Ease (FRE) score was proposed by [17] and adapted to Portuguese by [22] (see Eq. 2).

$$\begin{aligned} FRE = 206.835 - (1.015 * ASL) - (84.6 * ASW) \end{aligned}$$
(2)

The adaptation by [22] adds 42 points to the equation proposed by [17] because Portuguese words have more syllables than English ones; otherwise, the score would be overly penalised. ASL denotes the average sentence length (words per sentence), and ASW denotes the average number of syllables per word. Complex and simplified Portuguese texts tend to exhibit key differences. One of them is the length of each sentence and the number of tokens per sentence, as pointed out by a previous work, the Porsimples project, which manually simplified a newspaper corpus. That study found that, in most cases, the simplified versions had fewer words per sentence [4]. The same was observed in the Bible versions considered here.
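The adapted score can be sketched as a small function; the word, sentence, and syllable counts below are hypothetical, not values from the corpus:

```python
def flesch_reading_ease_pt(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease (Eq. 2) plus the 42-point adjustment for
    Portuguese described above. Higher scores mean easier text."""
    asl = n_words / n_sentences   # average sentence length (words per sentence)
    asw = n_syllables / n_words   # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw + 42

# Hypothetical text: 100 words, 8 sentences, 180 syllables
score = flesch_reading_ease_pt(100, 8, 180)   # ASL = 12.5, ASW = 1.8
```

Shorter sentences (lower ASL) and shorter words (lower ASW) both push the score upward, which is why the simplified versions score higher.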

In a random sample of 292 pairs of verses (0.01% of the original dataset), we analysed different aspects of the versions of the Bible. Table 1 and Fig. 1 show the differences between the traditional ARC version and the other ones.

Table 1. Descriptive statistics on the texts, computed using the pylinguistics library on a random sample of 303 pairs of verses (0.01% of the original dataset). In general, the simplified texts have fewer tokens per sentence and more sentences.

Figure 1 shows histograms of the number of tokens per sentence for each Bible version. The simplified versions of the Bible have fewer tokens per sentence, with more sentences under the median. The distribution of tokens per sentence in the ARC is smoother and shows a prevalence of longer sentences (i.e., more tokens per sentence). It is considerably different from the other versions, especially the NLTH and NBV. The difference between the ARC and the NLTH version is even greater in almost all aspects, such as average tokens per sentence and median sentence length. Thus, both features indicate that the NLTH and the other versions are easier to read because they have fewer tokens per sentence and more sentences.

Fig. 1.

The histograms show the distribution of tokens per sentence in the versions of the Bible, in a random sample of 303 pairs of verses (0.01% of the original dataset). The split into sentences and tokens was made using the Portuguese sentencizer and tokenizer from spaCy [15].
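As a rough stand-in for the spaCy pipeline used in the paper, the per-sentence token counts behind such a histogram can be sketched with a naive punctuation-based split (the verse text is an illustrative example, not a corpus excerpt):

```python
import re
import statistics

def tokens_per_sentence(text):
    """Naive sentence split on .!? followed by whitespace, then
    whitespace tokenisation (a crude stand-in for spaCy)."""
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    return [len(s.split()) for s in sentences]

verse = ("No princípio, criou Deus os céus e a terra. "
         "A terra era sem forma e vazia.")
counts = tokens_per_sentence(verse)            # tokens in each sentence
median_len = statistics.median(counts)         # median sentence length
```

A real pipeline would use proper tokenisation (clitics, punctuation, abbreviations), which is why the paper relies on spaCy's Portuguese models.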

Table 2 exemplifies the aligned ARC and NLTH Bible versions used in the experiments. It is important to note that ARC is considered the complex version to be simplified. The other versions are considered targets in separate experiments and are combined into a single dataset afterwards.

Table 2. The table exemplifies the aligned versions of the Bible used in the experiments. The passage is from Genesis Chapter 1.1-4.

4.2 Automatic Text Simplification

This section presents the details of the automatic text simplification architectures adopted in this study. We applied the Transformer and the Bidirectional Recurrent Neural Network (B-RNN), which are state-of-the-art architectures, to simplify texts in Portuguese.

Table 3. The table shows the hyperparameters used in both models.

To perform the evaluation, we used the algorithms implemented in the OpenNMT framework [18]. OpenNMT makes it possible to train a model using different datasets. We considered each SOURCE-TARGET pair a distinct corpus and assigned different weights based on the difference between the median sentence length of the ARC version and that of the target version, as presented in Table 1. Further, the validation set received target examples in the same proportion as the weights. The main objective is to avoid over-fitting by increasing the training data. It may also allow the model to learn different simplification styles, which may improve its generalisation. The combined corpus is identified in the following tables as “multi-corpus”.
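One plausible reading of this weighting scheme can be sketched as follows; the median sentence lengths are hypothetical placeholders, not the values from Table 1:

```python
# Hypothetical median sentence lengths (tokens per sentence); ARC is the
# complex source version, the others are simplification targets.
medians = {"ARC": 22, "NAA": 18, "NLTH": 14, "NBV": 15, "NVI": 17}

# Weight each SOURCE-TARGET corpus by how much shorter its target's
# median sentence is relative to the ARC source: targets that simplify
# more aggressively contribute more examples per training batch.
weights = {target: medians["ARC"] - m
           for target, m in medians.items() if target != "ARC"}
```

With these placeholder values, NLTH (the most aggressively simplified target) would receive the largest weight, and NAA the smallest.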

Finally, the dataset was split into 17441 parallel verses for training, 8139 for validation, and 3489 for testing. The experiments considered different target corpora, with the ARC Bible version always as the input. Different encoder-decoder architectures were also considered (see Table 5). Two experiments were performed for each model: with and without pre-trained word embeddings (Portuguese GloVe embeddings with 300 dimensions [14]). In total, 20 experiments were performed: one for each of the five corpora under each of the four model configurations. All experiments used a shared embedding and vocabulary and allowed the execution of 10000 training epochs. Detailed information on the experimental setup is given in Tables 3 and 4.

Table 4. The table shows the hyperparameters specific to each model.

4.3 Evaluation

The evaluation used two metrics for translation and text simplification assessment. The first is the Bilingual Evaluation Understudy (BLEU) score [24]. The BLEU score is a widely used metric for evaluating machine translation between two languages against a reference corpus. It has also been extensively used to assess automatic text simplification, especially for models based on monolingual translation [5, 35]. The other metric is the System output Against References and against the Input (SARI) score [35]. Unlike the BLEU score, the original purpose of the SARI score is to evaluate text simplification, considering the system output, the references, and the source sentence. In summary, the SARI score measures how well words are maintained or changed by the system [35]. Herein, we used the SARI and BLEU score implementations proposed by [5].

5 Results

This section presents the results obtained in the experiments. First, we discuss quantitative aspects of the supervised metrics; then, a more in-depth discussion of the quality of the predictions is given. Table 6 synthesises the results from Table 5. Table 5 shows the detailed results of the text simplification for each architecture and pair of datasets analysed. Given that it was not possible to find a massive parallel corpus of simplified Portuguese texts, and given the training time constraints of the experiments, the B-RNN and B-RNN+Embedding models achieved the best results. Despite its poorer performance compared with the B-RNN model, the Transformer might improve when trained for more epochs and on a larger corpus [11, 21, 29].

Table 5. Results of the text simplification for the different experiments. The B-RNN model outperforms all the other models when both metrics are considered.
Table 6. Summary of the metrics in Table 5.
Table 7. Multi-corpus predictions produced by the B-RNN model with and without pre-trained embeddings. Both models removed a specific part of the sentence (in bold) to make it shorter.

5.1 Simplification Quality

One particular insight is that the simplification using multiple targets achieves a much higher BLEU score but a lower SARI score in almost all experiments. This difference is due to the distinct nature of the two metrics: the BLEU score measures how many n-grams of the system prediction appear in the references. In other words, it calculates a “modified precision”, which removes the incentive to over-generate a particular word to obtain a high score [24]. Therefore, a high BLEU score might mean that the model's predictions share a significant overlap with the references. On the other hand, the SARI score rewards words that are kept in both the reference and the source sentence [5, 36]. It also rewards the addition of new words as long as they belong to at least one reference [5, 36]. Further, the metric proved intuitive in how the simplification gain is calculated [5, 36].
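BLEU's clipping of over-generated words can be illustrated with its unigram modified precision (a simplified sketch of one component of the full metric, which also combines higher-order n-grams and a brevity penalty; the sentences are made-up examples):

```python
from collections import Counter

def modified_unigram_precision(prediction, references):
    """Clip each predicted unigram's count by its maximum count in any
    reference, so repeating a word cannot inflate the score."""
    pred_counts = Counter(prediction.split())
    max_ref = Counter()
    for ref in references:
        for tok, n in Counter(ref.split()).items():
            max_ref[tok] = max(max_ref[tok], n)
    clipped = sum(min(n, max_ref[tok]) for tok, n in pred_counts.items())
    return clipped / sum(pred_counts.values())

refs = ["o gato está no tapete", "há um gato no tapete"]
p = modified_unigram_precision("o o o o", refs)   # "o" over-generated
```

Even though every predicted token appears in a reference, the clipped count keeps the degenerate prediction from scoring well, which is the behaviour [24] designed the metric to have.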

Fig. 2.

Text readability scores, calculated by the pylinguistics library, of the simplifications produced by the B-RNN model on different corpora.

Table 8. Scores of the simplifications produced over different corpora by the best-performing methods, i.e., the B-RNN with and without pre-trained embeddings.

Figure 2 shows that, for the models trained on a single corpus, the readability metric improves and is even higher than the readability of the reference corpus. This means that the model was able to learn the style of the target corpus, as pointed out by previous works [19, 38]. Besides, although the predicted texts have more tokens per sentence on average, the high readability score might mean that the model is predicting short words, as word length is one of the aspects considered in the readability metric [7].

Finally, the model trained with a single corpus achieved a higher SARI score, indicating a better simplification. Nonetheless, in this particular example, it could not produce a sentence with the same level of grammatical correctness and semantic meaning as the one produced by the multi-corpus training approach. It was pointed out by [31] that even though SARI scores can represent the quality of simplified sentences, the BLEU score performs better at scoring their grammaticality.

Table 7 presents an example of the outcome of the best text simplification method, illustrating the practical potential of the proposal. As presented, the text produced by the algorithm contains more general words than the original text. Even though it did not produce an exact translation of the tokens, the model was able to maintain the original meaning and grammatical correctness of the sentence.

6 Conclusion

Neural Machine Translation (NMT) methods have achieved successful results on the text simplification problem in different languages, overcoming traditional statistical approaches. To the best of our knowledge, no previous research had investigated the application of NMT methods to simplify Brazilian Portuguese texts. The main contribution of this paper is the application of NMT methods to the simplification of Portuguese text. Two different state-of-the-art NMT methods were considered: the Transformer and the Bidirectional Recurrent Neural Network (B-RNN). The results demonstrated that the B-RNN obtained the best results on average (BLEU = 21.84 without pre-trained embeddings and SARI = 45.34 with pre-trained embeddings), despite the small corpus size and the limited number of training epochs.

Another significant improvement was the use of multiple corpora presenting different possible simplifications for the same input, which achieved an improvement of over 8 points in the BLEU score. Despite a lower SARI score, the higher BLEU score might indicate the ability to preserve the sentence's meaning and grammatical correctness.

As future work, we intend to: (i) perform an analysis of the parameters of each algorithm evaluated; (ii) use different embedding models, such as BERT [11, 21, 29]; (iii) apply the NMT methods to texts from different domains, such as law and health; (iv) explore the use of other methods, such as lexical and syntactic simplification and pre-trained models for a monolingual translation approach.