Abstract
Methods for document summarization are important in numerous applications, particularly those involving extensive content, such as news and social media monitoring. Nevertheless, summarizing multiple documents remains an open challenge: systems that generate summaries from multiple documents must handle additional difficulties such as redundancy and inconsistency across documents. At the same time, because multi-document collections contain multiple descriptions of the same content, they offer benefits that go beyond the mere availability of more text, and we believe this benefit has not been explored in the literature. Different authors may write different descriptions of the same event: some more objective, others more detailed, or with different terms and writing styles. To take advantage of this variety of descriptions, we present a new approach for evaluating documents in order to identify those that contribute most effectively to the generation of automatic summaries. For this, we employ a Siamese network trained on the ROUGE scores observed for individual documents. Additionally, we demonstrate how to apply the outcome of document evaluation to different summarization techniques. Our experiments compare against SOTA approaches (LeadSum, TextRank, PacSum, BertSum) on multiple datasets encompassing news, events, Wikipedia texts, and scientific publications (Multi-News, WCEP, WikiSum, arXiv, Multi-XScience). The results indicate that our approach yields a relevant improvement in the production of summaries, with statistical significance under the Wilcoxon rank-sum test at a 95% confidence level.
1 Introduction
Automatic text summarization is the process that applies computational methods to produce a condensed version of one or more input documents while preserving the most relevant information [7]. Particularly in the case of summarizing multiple documents, in addition to selecting the most relevant content, there is the challenge of avoiding redundancies and inconsistencies. News summarization is an example where an ongoing event can be reported by different channels, resulting in different descriptions with different updates.
According to [22], many of the multi-document summarization models combine inputs as a flat sequence and do not consider relationships between documents. We believe the combination of input documents can help provide a more accurate description of the content, and our goal is to improve summarization by leveraging this variety of input documents.
More specifically, we apply a Siamese network [13] to evaluate different input documents and identify those capable of producing better summaries. Differences between documents arise, for example, from different volumes of information and degrees of objectivity. To emphasize these differences, we consider different extractive summarization techniques. Extractive summarization produces a summary by selecting a representative subset of the sentences or paragraphs from the source text without making any changes to them, as opposed to abstractive summarization, which represents the input content with new words and sentences. Despite the promising results of abstractive methods, it is not possible to guarantee that the summaries produced will be consistent with the facts [32], which can be limiting in some usage scenarios.
The Siamese neural network is a natural choice for our approach, as these networks are efficient in comparison learning. We use datasets composed of multiple input documents and at least one reference summary. Every input document is evaluated with the ROUGE metric, and the documents are then reorganized into pairs labeled with which one obtained the better evaluation. This arrangement is compatible with the Siamese network, which must learn to recognize the qualities that enable one document to be evaluated more highly than another. We used a training dataset and a separate test dataset, demonstrating generalization to unseen data. As an additional benefit, reorganizing documents into pairs produces a larger volume of training data.
Our main contributions are as follows:
- Proposal of a document comparison method, based on Siamese networks, applied to the automatic summarization of multiple documents;
- Description of changes made to a variety of summarization methods to take advantage of document prioritization;
- Experiments validated with statistical tests pointing to relevant differences in our results;
- Comparison of some popular summarization techniques using the same datasets and testing methodology;
- Our document prioritization code is made publicly available (footnote 1).
The remainder of this article is organized as follows. We review the related work in Sect. 2. Section 3 presents the proposed model description. Section 4 describes the datasets, the baseline methods, the changes made to the methods to include document prioritization and information about the testing methodology. In Sect. 5 the results and discussions are presented. Finally, Sect. 6 brings the conclusion and future works.
2 Related Work
Cross-document relations should be taken into account when summarizing multiple documents. These relationships are typically used by summarization techniques to reduce repetition and identify the most important details. Applying clustering techniques, as in [8, 33], is one way to reduce redundancy.
Graphs are often used to represent relationships between pieces of text, as in [31]. That work used heterogeneous vertices to represent information at larger granularity, such as sentences, or smaller granularity, such as words. The vertices contained embeddings to represent information, which were updated with graph attention networks [29]. In [2], GraphSum is presented, which analyzes text fragments to select sentences. Text fragments are represented as nodes, and a variation of the PageRank method is proposed to favor positive correlations.
The encoder-decoder framework is a widely used method in abstractive summarization approaches, including multi-document scenarios. With this method, the encoder layer aims to create a representation of the input texts, and the decoder generates new sentences using this representation. The methodology has shown good results, although there are still some challenges, especially with long documents. To mitigate this difficulty, [15] proposed the use of graphs to guide abstractive summarization.
A popular dataset for evaluating multi-document approaches is presented in [9], which also proposes a summarization technique. The suggested approach is a modification of the Pointer-Generator summarizer [26], applying an attention layer that includes the importance of sentences. To determine sentence importance, a bidirectional LSTM network is applied to produce embeddings. Using the sentence representations as input, a similar process is applied to produce the document representation. These outputs are used to identify the most important sentences via a variation of the MMR [3] technique.
In the works retrieved, the main motivation for extracting cross-document relations is the reduction of redundancy and identification of the most important sentences. In contrast to these approaches, our goal is to evaluate entire documents. Beyond just finding documents with the most pertinent information, our method seeks to identify those that are capable of producing the best summaries. A possible application, which will be demonstrated in our experiments, is the use of an additional layer in the evaluation of sentences, prioritizing the sentences that are present in the best documents.
3 Model Description
The suggested approach seeks to determine which articles produce the best summaries, allowing documents to be ranked in order of importance for information extraction. The hypothesis is that better summaries can be produced when more sentences are extracted from the best documents. In the datasets we consider, which are presented in Sect. 4, we observe that each sample is composed of a variable number of input documents. For simplicity, we suggest a technique for evaluating pairs of documents, which allows us to rank documents by applying multiple comparisons.
In this way, the model needs to be able to determine which of the two input texts has the highest likelihood of yielding a good summary, and Siamese networks are useful for learning by comparison [16]. In its simplest configuration, a Siamese network is composed of two attribute extraction networks, one for each input, and a comparison head. This architecture is flexible enough to allow deep-learning techniques to be applied to the feature extraction layer, which is a configuration that has achieved state-of-the-art results in different tasks [1, 14, 30]. Additionally, the two feature extraction networks have identical connection weights, ensuring a similar processing method for both input data.
In our model, as depicted in Fig. 1, each input document is processed by a neural network and produces a representation as document embeddings. These embeddings are used in a classifier to identify the best document. During training, the error observed in the classification is used to adjust the entire network, including the two networks that produce the document embeddings. With this approach, the reduction of errors in the classifier occurs through the optimization of document representations, which requires the network to learn how to identify the most useful attributes for evaluating document quality.
Figure 2 shows the network in more detail, up to the point where the document representation is produced. In this figure, the input documents are used to produce word embeddings that are processed by a convolutional neural network (CNN) [17]. CNNs provide adjustable configurations that make it possible to trade precision against computing cost, in addition to supporting parallel processing. Furthermore, this type of network has been successfully applied to a variety of problems, such as news classification [10], medical image classification [5], and energy demand prediction [25].
Our network is composed of one-dimensional filters, with multiple planes and different window sizes. This means that the filters consider each word embedding as a multidimensional point, that the filters are moved along the text, and that at each iteration several tokens are evaluated simultaneously, depending on the size of the window. Filter results are compiled in a max-pooling layer. During training, we included a dropout procedure, which in our experiments helped to increase generalization. At the end, a fully connected layer is included to perform dimensionality reduction and produce the document representation. The step after producing the document embeddings, not represented in this figure, is the combination of the two document representations in a fully connected layer followed by a classifier activated with a sigmoid function.
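The architecture described above can be sketched as follows. This is a minimal illustration assuming PyTorch; the 128 filters and 32-dimensional document embeddings match the settings given in Sect. 4.3, while the remaining sizes (window widths, hidden layer of the comparison head) are assumptions:

```python
import torch
import torch.nn as nn

class DocEncoder(nn.Module):
    """1D CNN over word embeddings: parallel filters with different
    window sizes, max-over-time pooling, dropout, and a fully
    connected layer that produces the document embedding."""
    def __init__(self, emb_dim=300, n_filters=128, windows=(3, 4, 5),
                 doc_dim=32, dropout=0.5):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, kernel_size=w) for w in windows)
        self.drop = nn.Dropout(dropout)
        self.fc = nn.Linear(n_filters * len(windows), doc_dim)

    def forward(self, x):            # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)        # Conv1d expects (batch, emb_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(self.drop(torch.cat(pooled, dim=1)))

class SiameseComparator(nn.Module):
    """The same encoder (shared weights) is applied to both documents;
    the concatenated embeddings feed a sigmoid classifier predicting
    whether the first document is the better one."""
    def __init__(self, encoder, doc_dim=32):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Linear(2 * doc_dim, 32), nn.ReLU(),
                                  nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, d1, d2):
        e1, e2 = self.encoder(d1), self.encoder(d2)
        return self.head(torch.cat([e1, e2], dim=1)).squeeze(1)
```

Because the encoder instance is shared, the gradient of the classification loss flows into a single set of feature-extraction weights, which is what forces the encoder to learn quality-relevant document attributes.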
To train the network, we need labeled pairs of documents that we construct using the training data. We start by evaluating all documents with the ROUGE metric. Using this evaluation, we choose two documents, \(d_1\) and \(d_2\), and assign the label 1 if the ROUGE evaluation of \(d_1\) is better than \(d_2\) and 0 otherwise. We take care during this process to maintain class balance and avoid duplicating documents in order to prevent overfitting. Furthermore, as pointed out by [28], differences in the lengths can interfere with the evaluation with ROUGE, so we are using summaries of the input documents truncated with the same number of tokens. For simplicity, summaries are produced by extracting the initial tokens from documents and we leave other approaches for future work.
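The pairing step might look like the sketch below (a hypothetical `build_pairs` helper; the per-document ROUGE scoring is assumed to have been done beforehand). Pairing disjointly keeps each document from appearing in more than one training pair, and random swapping balances the two label classes:

```python
import random

def build_pairs(doc_scores, seed=0):
    """Build labeled document pairs for Siamese training.

    doc_scores: list of (document, rouge_score) tuples for one sample.
    Returns (d1, d2, label) triples, with label 1 if d1 scored higher
    than d2 and 0 otherwise. Documents are shuffled and paired
    disjointly so no document is duplicated across pairs; random
    swaps balance the label classes.
    """
    rng = random.Random(seed)
    docs = list(doc_scores)
    rng.shuffle(docs)
    pairs = []
    for i in range(0, len(docs) - 1, 2):
        (d1, s1), (d2, s2) = docs[i], docs[i + 1]
        if s1 == s2:                 # ties carry no preference signal
            continue
        if rng.random() < 0.5:       # swap order to balance classes
            d1, s1, d2, s2 = d2, s2, d1, s1
        pairs.append((d1, d2, 1 if s1 > s2 else 0))
    return pairs
```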
To apply the trained network to rank sets of documents, we need a strategy based on pairwise comparisons. Since each sample consists of only a few input documents, for simplicity we apply the comparison to all possible pairs. Considering all possible pairs of documents, we count the number of times that a document \(d_x\) obtained a score higher than \(d_y\), according to Eq. 1. In this equation, D is the set of input documents, and \(S(d_i)\) is the output of the Siamese network.
Equation 1 is applied to all documents, allowing the best ones to be identified. To simplify integration with the other methods presented in Sect. 4, we normalize the document scores to sum to 1.
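The counting and normalization steps can be sketched as below; `compare` is a stand-in for the trained Siamese network, returning a value above 0.5 when its first argument is judged the better document:

```python
def rank_documents(docs, compare):
    """Score each document by the number of pairwise comparisons it
    wins, then normalize the scores to sum to 1."""
    n = len(docs)
    wins = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and compare(docs[i], docs[j]) > 0.5:
                wins[i] += 1
    total = sum(wins) or 1          # guard against all-zero scores
    return {d: w / total for d, w in zip(docs, wins)}
```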
4 Materials and Methods
This section presents the materials used in the experiments, including the datasets and baseline methods. It also describes the changes we made to include document prioritization and some implementation details.
4.1 Datasets
Few datasets are available for multi-document summarization, mainly because producing reference summaries is labor-intensive. Some of the datasets we use are composed of news, where it is possible to find different sources describing the same event.
Multi-News [9]: Contains articles extracted from the news aggregation site Newser (footnote 2). The site hosts human-written news summaries that cite external sources of information, gathering content from over 1,500 news channels. To compose this dataset, the summaries are used as references and the cited news articles as sources. There are a few versions of this data, and we are using the pre-processed version without truncation.
WCEP [11]: Contains documents extracted from the Wikipedia Current Events Portal (footnote 3). Following Wikipedia guidelines, summaries are short (approximately 35 words), written in the present tense, and avoid opinions and sensationalism. Each summary cites only 1.2 sources on average, but this number was increased with documents extracted from the Common Crawl News dataset (footnote 4). We are using the version distributed by the authors, which was truncated at a maximum of 100 documents per sample.
WikiSum [18]: Wikipedia was used to create this dataset by providing reference summaries. The task is to create the lead (the first part) of a Wikipedia page using the references listed in the article and ten Google search results. Due to its enormous size, some works use only part of this data. In this work, we use the partition of [20], which contains the first 40 paragraphs, selected with a logistic regression model trained on ROUGE-2 recall from the comparison of paragraphs with the reference summary. Since the input data consists of separate paragraphs that are not linked to the document from which they were taken, we treat each paragraph as a separate document.
arXiv [4]: Contains scientific publications taken from arXiv (footnote 5). Since many datasets in use are composed of news articles or other shorter content, the objective was to build a dataset of longer documents. The extracted content includes the abstracts, which are used as reference summaries, and the article sections, which are used as input. Since we are interested in multi-document datasets, we consider each section as an input document.
Multi-XScience [21]: A dataset of scientific articles, where the objective is to produce the related-work section using the abstract of the article itself and a collection of reference documents. The articles were extracted from arXiv, and related documents were identified by applying a set of heuristics to the Microsoft Academic Graph [27].
Table 1 presents a summary of the evaluated datasets. The "Train Test Val" columns correspond to the number of samples in each data partition. "Doc" is the average number of input documents per sample. "Doc/Ref Len" is the average number of tokens per document and reference, calculated using the nltk (footnote 6) library.
4.2 Baseline and Integration Methods
In this section, we describe the methods we used to compare the results and the procedure we applied to integrate with document prioritization.
LeadSum: Since part of the datasets consists of news, we consider extracting content from the beginning of the articles. This strategy is justified by the fact that in journalistic content the first sentences are usually the most important [12]. In this way, all content is merged into a single large document, and the initial content is extracted until the length defined for the summary is reached.
DocScore+LeadSum: To use document prioritization, we are reordering the input documents, moving the most relevant ones to the beginning. This way, if the most relevant document is larger than the size of the summary, all content will be extracted from this source. Otherwise, content from other documents will be used following the same order of relevance.
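Both baselines reduce to a few lines; whitespace tokenization here is a simplification, and `doc_scores` is assumed to be the normalized output of our prioritization model:

```python
def lead_sum(documents, max_tokens):
    """LeadSum: merge all documents and keep the initial tokens."""
    tokens = " ".join(documents).split()
    return " ".join(tokens[:max_tokens])

def docscore_lead_sum(documents, doc_scores, max_tokens):
    """DocScore+LeadSum: same extraction, but documents are first
    reordered so the highest-scoring ones come first."""
    ordered = sorted(documents, key=lambda d: -doc_scores[d])
    return lead_sum(ordered, max_tokens)
```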
TextRank [23]: An unsupervised extractive summarization method based on graphs. In this method, sentences are the vertices, and the similarity between sentences defines the connection weights. Using this graph, PageRank is applied to assign a score to each sentence, which is used to rank the sentences. Summaries are produced by selecting the sentences with the highest scores. In our implementation, the similarity between sentences was calculated with Tf-Idf (term frequency-inverse document frequency).
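A self-contained sketch of this pipeline, with whitespace tokenization and a plain power iteration standing in for a production implementation:

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """One Tf-Idf bag-of-words vector per sentence."""
    docs = [Counter(s.lower().split()) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in d)   # document frequency
    return [{w: tf * math.log(n / df[w]) for w, tf in d.items()}
            for d in docs]

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def textrank(sentences, damping=0.85, iters=50):
    """PageRank over the sentence-similarity graph; returns one score
    per sentence (higher = more central)."""
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_sum = [sum(row) or 1.0 for row in sim]   # avoid division by zero
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n + damping *
                  sum(sim[j][i] / out_sum[j] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores
```

A summary is then produced by keeping the sentences with the highest scores until the target length is reached.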
DocScore+TextRank: The result of processing with TextRank is a score for each sentence. To integrate document prioritization, we add the document score to the sentence score, increasing the probability of selecting sentences from the best documents. In this sum, we apply a weighting factor, according to Eq. 2, where \(T(s_{d, i})\) is the score obtained with TextRank for sentence i of document d, S(d) is the document score obtained with our model, and \(\alpha \) is an adjustable parameter.
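One plausible reading of Eq. 2 is a combined score of the form \(T(s_{d,i}) + \alpha \, S(d)\); the exact functional form and the container shapes below are assumptions for illustration:

```python
def docscore_textrank(sent_scores, doc_scores, alpha=0.5):
    """Combine TextRank sentence scores with document scores,
    assuming Eq. 2 has the form T(s_{d,i}) + alpha * S(d).

    sent_scores: {doc_id: [TextRank scores of its sentences]}
    doc_scores:  {doc_id: normalized document score S(d)}
    Returns (doc_id, sent_index, combined_score) triples, sorted
    best-first for sentence selection.
    """
    ranked = [(d, i, t + alpha * doc_scores[d])
              for d, scores in sent_scores.items()
              for i, t in enumerate(scores)]
    return sorted(ranked, key=lambda x: -x[2])
```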
PacSum [34]: A summarization method that represents the input contents as directed graphs. The authors argue that, when selecting sentences to construct summaries, there are sentences that contain the most relevant information and sentences that are complementary. A simple approach is proposed to identify the most relevant sentences considering their position of occurrence in the text. In the proposed digraph, the vertices are sentence embeddings, and the direction of the connections is defined by the order of occurrence of the sentences in the text. The most central sentences are selected, with centrality defined by Eq. 3. In this equation, i and j are positions of sentences in the input documents, \(s_i\) is a sentence, \(e_{i,j}\) is a measure of similarity between sentences \(s_i\) and \(s_j\), and \(\lambda _1\) and \(\lambda _2\) are adjustable parameters that determine the importance of the initial and final sentences, respectively.
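Given a precomputed sentence-similarity matrix, the centrality of Eq. 3 can be sketched as below. This follows the published PacSum formulation, in which \(\lambda _1\) weights similarity to preceding sentences and \(\lambda _2\) to following ones:

```python
def pacsum_centrality(sim, lam1, lam2):
    """Directed centrality in the spirit of PacSum's Eq. 3: for
    sentence i, similarity to preceding sentences (j < i) is weighted
    by lam1 and to following sentences (j > i) by lam2.
    sim is an n x n sentence-similarity matrix."""
    n = len(sim)
    return [lam1 * sum(sim[i][j] for j in range(i)) +
            lam2 * sum(sim[i][j] for j in range(i + 1, n))
            for i in range(n)]
```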
DocScore+PacSum: PacSum’s adjustable parameters are calibrated through a supervised process. In particular, the parameters \(\lambda _1\) and \(\lambda _2\) are configured to prioritize the initial and final sentences. This way, the method already has a mechanism in addition to sentence similarity that we will reuse to prioritize the most important documents. To achieve this, our approach was to reorder the input documents, keeping the most relevant ones first, which allows the parameters \(\lambda _1\) and \(\lambda _2\) to be adjusted more effectively.
BertSum [19]: A supervised method for producing extractive summaries using the BERT [6] network. The network receives the content to be summarized, modified to include sentence separators so that contextualized sentence embeddings are produced. To select sentences, several approaches are evaluated, including a simple classifier, an inter-sentence Transformer, and an LSTM network. As an additional resource to avoid redundancy, summaries are produced without inserting sentences that contain repeated trigrams.
DocScore+BertSum: It is challenging to use BertSum with a large amount of input data because it processes all input sentences at once. In the BERT network, input documents are truncated to 512 tokens, which in our test case can lead to the deletion of entire documents. To minimize this limitation, we reorder the input documents in a manner similar to our approach with PacSum, allowing the best documents to participate in the summarization.
Oracle: Inserted as an upper bound for extractive methods. We use a greedy approach similar to the one presented in [19]. Starting with an empty set, in each iteration we include the sentence that most increases the ROUGE score among all sentences in all documents. Furthermore, repeated sentences are not included. The process stops when the length defined for the summary is reached or no sentence increases the summary score.
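The greedy loop above can be sketched as follows; a simple unigram-recall score stands in for the ROUGE implementation used in the paper:

```python
def greedy_oracle(all_sentences, reference, max_tokens):
    """Greedy oracle sketch: repeatedly add the sentence that most
    improves the summary score against the reference summary."""
    ref = set(reference.lower().split())

    def score(sents):
        # unigram recall against the reference (ROUGE stand-in)
        covered = set(" ".join(sents).lower().split()) & ref
        return len(covered) / len(ref) if ref else 0.0

    summary, best = [], 0.0
    while sum(len(s.split()) for s in summary) < max_tokens:
        gains = [(score(summary + [s]), s) for s in all_sentences
                 if s not in summary]          # skip repeated sentences
        if not gains:
            break
        new_score, pick = max(gains)
        if new_score <= best:                  # no sentence improves the score
            break
        summary.append(pick)
        best = new_score
    return summary
```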
Oracle Lead: An additional upper bound that exclusively takes document ordering into account. Input documents were evaluated with ROUGE and ordered best first. Using the data in this order, we apply LeadSum, the simplest summarizer.
4.3 Implementation Details
Our goal when running the experiments is to find out whether adding document prioritization improves the quality of summaries. To achieve this, we conducted experiments on the datasets and evaluated how well the baseline and modified versions of the algorithms performed. All training of the document prioritization model was carried out on the training data and monitored on the validation data. We terminated training when no reduction in classification error was observed on the validation set for three consecutive iterations.
To determine the model settings, we performed tests on the Multi-News dataset, applying the same configuration to all datasets. In our evaluation, we only considered the quality of the results, disregarding the computational cost, although in our studies the larger network did not always produce better results. In this way, 128 convolutional kernels were used, and we produced document embeddings with a length of 32 values. The input data was converted to lowercase and we are using a vocabulary with the most frequent 10,000 lemmatized tokens. The word embeddings were generated with fastText [24], which has a sub-word mechanism capable of avoiding out-of-vocabulary occurrences. We used the distribution trained with Common Crawl and allowed these representations to be optimized in the training. As mentioned in Sect. 3, the input documents are truncated to the same size as expected for the summary.
Some of the baseline methods also require parameter tuning. The modified version of TextRank includes the \(\alpha \) parameter, which needs to be defined. To do so, we generated summaries for various values of \(\alpha \) using the validation data: we used the first 1,000 samples from the validation set to evaluate ten \(\alpha \) values between 0 and 1.0, using the best-performing configuration to produce the summaries on the test set. For PacSum, we use the optimizer provided by the authors, which selects values with grid search. In this process, as in the original article, we use the validation set. We also use the version of BERT that was fine-tuned by the method's authors.
We configured BertSum to select sentences with a classifier in the output layer and maintained the default settings, which include the block-trigram strategy. We train the model with the training dataset, evaluating with the validation set every 1,000 training iterations. The training was stopped after three consecutive iterations with no decrease in error on the validation set. Because it is a very large dataset, when training BertSum with WikiSum, we reduced the validation set to the initial 3,200 samples. We also reduced the training set to 20% of the total, which is enough to complete training in the first epoch, since in our experiments training ended approximately after using 5% of the data.
According to [28], evaluation with ROUGE can be influenced by the length of the summaries. Therefore, when generating summaries, enough sentences are selected and the final sentence is truncated so that the summary reaches the specified number of tokens, producing summaries of precisely the same length across all methods. Summary lengths were chosen through a literature search: we use 300 tokens for Multi-News, 40 for WCEP, 60 for WikiSum, 220 for arXiv, and 120 for Multi-XScience.
All experiments were repeated 10 times, always with different seeds of random numbers. Using Wilcoxon rank-sum statistics, we compared the statistical significance of the results from the prioritized version with the results from the original methods. For statistical tests, it is also necessary to repeat the tests with the original versions of the methods. Since in the LeadSum, PacSum, and BertSum methods, the order of documents is important, we are randomly changing the order of the input documents in each experiment. With this approach, we are comparing the random reordering of input documents with the reordering using our prioritization method.
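The significance test can be sketched as below, assuming SciPy is available; the helper name and the median-based direction check are illustrative:

```python
from statistics import median
from scipy.stats import ranksums

def significantly_better(prioritized, baseline, alpha=0.05):
    """Compare two sets of repeated ROUGE results (one value per run)
    with the Wilcoxon rank-sum test at a 95% confidence level, and
    check that the prioritized version actually scores higher."""
    stat, p = ranksums(prioritized, baseline)
    return p < alpha and median(prioritized) > median(baseline)
```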
The original version of TextRank does not use random numbers and is not affected by the order of documents, so we use another comparison approach. In its original implementation, the ROUGE metric is presented along with confidence intervals, which are calculated with bootstrap for a 95% confidence factor. In this way, we performed ten repetitions of our modified version and compared the median with this confidence interval.
5 Results and Discussion
5.1 Classification Scores
In this section, the classification results using the Siamese network are presented. The input data are the document pairs produced as indicated in Sect. 3, so these results refer to binary classification to identify the best document. Document pairs were produced for the training, testing, and validation data, and we are presenting the results with the testing data after the training is complete. The results are presented in Table 2.
We conclude from the results that the proposed network helps identify the best documents. Despite this, accuracy was not very high, especially on the Multi-XScience dataset. We therefore conducted experiments to determine the situations in which the model made the most mistakes. In particular, we were interested in the relationship between the difference in ROUGE values and classification performance.
Since each sample is composed of a pair of documents, we consider the difference in ROUGE evaluations between the two documents, reusing the evaluation applied to construct the data. Ordering the pairs by this difference, we divide the test data into groups of 500 samples. Thus, the first group corresponds to documents with similar evaluations, and the last group contains documents with greater differences in quality.
For each group, we calculated the F1-score and the mean ROUGE difference. We then computed the Spearman correlation between these values; the results are in the last column of Table 2, where a very strong correlation can be verified. This indicates that the proposed Siamese network works better when the input documents differ widely in quality, and not as well when their evaluations are very similar.
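The grouping-and-correlation analysis can be sketched as follows, assuming SciPy is available; each test pair is represented here as a hypothetical (rouge_diff, y_true, y_pred) triple:

```python
from scipy.stats import spearmanr

def f1(y_true, y_pred):
    """Binary F1-score from true and predicted labels."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def difficulty_correlation(pairs, group_size=500):
    """pairs: (rouge_diff, y_true, y_pred) per test pair. Sort by the
    ROUGE difference, split into fixed-size groups, and correlate each
    group's mean difference with its F1-score (Spearman)."""
    pairs = sorted(pairs)
    groups = [pairs[i:i + group_size]
              for i in range(0, len(pairs), group_size)]
    diffs = [sum(d for d, _, _ in g) / len(g) for g in groups]
    f1s = [f1([t for _, t, _ in g], [p for _, _, p in g]) for g in groups]
    rho, pval = spearmanr(diffs, f1s)
    return rho, pval
```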
These results are also represented in Fig. 3. To simplify visualization, this figure shows a simplified version of the correlation experiment, applying the same ordering by ROUGE difference but separating the results into just 10 groups. The figure shows that classifier performance is low when the ROUGE difference between documents is close to zero, and improves substantially as the difference increases.
5.2 Summarization Scores
We evaluate the results with the ROUGE-1, ROUGE-2, and ROUGE-L metrics. Table 3 presents the ROUGE-1 results, corresponding to the medians over 10 repetitions of the experiments. The results with the other metrics were equivalent and are available with our code. We highlight in bold the best results that achieve statistical significance under the Wilcoxon rank-sum test at a 95% confidence level. For TextRank, we highlight the best results that exceed the confidence margins calculated with bootstrap at a 95% confidence level. When the results are not different enough to achieve statistical significance, we highlight both values in bold.
Evaluating these results, we can see that document prioritization had a significant impact, which is most evident in Oracle-Lead: even with the simplest summarizer, perfect prioritization of input documents produced the best results. Although it does not achieve perfect ordering, the proposed document prioritization model enabled important improvements. In the results presented, the median ROUGE-1 increase was 2.7.
Document prioritization brought benefits in most results and, in the worst case, obtained results equivalent to the non-prioritized version. Equivalent results were obtained with TextRank on the WCEP and Multi-XScience datasets. The original version of TextRank is not influenced by document order, and prioritization gives an advantage to the best documents. In the case of WCEP, the dataset with the largest number of input documents, it might be necessary to increase the advantage of the best documents, which could be achieved with higher values of \(\alpha \) in Eq. 2. The Multi-XScience dataset showed the smallest increase in the evaluation and was also the dataset with the lowest precision in Sect. 5.1.
6 Conclusions
We present a document comparison approach using a Siamese network that is useful for identifying the best documents in multi-document summarization tasks. As shown in the results, document prioritization can help increase the accuracy of summaries produced with different summarization techniques.
The integration process with several methods from the literature was presented, and a possible future work is integration with other methods.
Since each experiment was conducted with identical tools and procedures, in addition to evaluating document prioritization, it is possible to compare different summarization techniques.
The original and prioritized versions were compared pairwise with statistical tests, indicating a significant difference in the results.
References
Bandara, W.G.C., Patel, V.M.: A transformer-based Siamese network for change detection. In: IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, pp. 207–210 (2022)
Baralis, E., Cagliero, L., Mahoto, N., Fiori, A.: GraphSum: discovering correlations among multiple terms for graph-based summarization. Inf. Sci. 249, 96–109 (2013)
Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1998, New York, NY, USA, pp. 335–336. Association for Computing Machinery (1998)
Cohan, A., et al.: A discourse-aware attention model for abstractive summarization of long documents. In: Walker, M., Ji, H., Stent, A. (eds.) Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, Louisiana, pp. 615–621. Association for Computational Linguistics, June 2018
De Castro Santos, M.A., Berton, L.: An enhanced framework for overcoming pitfalls and enabling model interpretation in pneumonia and COVID-19 classification. IEEE Access 11, 115330–115347 (2023)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171–4186. Association for Computational Linguistics, June 2019
El-Kassas, W.S., Salama, C.R., Rafea, A.A., Mohamed, H.K.: Automatic text summarization: a comprehensive survey. Expert Syst. Appl. 165, 113679 (2021)
Ernst, O., et al.: Proposition-level clustering for multi-document summarization. arXiv e-prints arXiv:2112.08770, December 2021
Fabbri, A.R., Li, I., She, T., Li, S., Radev, D.R.: Multi-news: a large-scale multi-document summarization dataset and abstractive hierarchical model. In: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, pp. 1074–1084 (2020)
Garcia, K., Shiguihara, P., Berton, L.: Breaking news: unveiling a new dataset for Portuguese news classification and comparative analysis of approaches. PLOS ONE 19(1), 1–15 (2024)
Ghalandari, D.G., Hokamp, C., Pham, N.T., Glover, J., Ifrim, G.: A large-scale multi-document summarization dataset from the wikipedia current events portal. arXiv preprint arXiv:2005.10070 (2020)
Grenander, M., Dong, Y., Cheung, J.C.K., Louis, A.: Countering the effects of lead bias in news summarization via multi-stage training and auxiliary losses. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 6019–6024. Association for Computational Linguistics, November 2019
Koch, G., Zemel, R., Salakhutdinov, R., et al.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, Lille, vol. 2 (2015)
Lan, M., Zhang, J., He, F., Zhang, L.: Siamese network with interactive transformer for video object segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, pp. 1228–1236 (2022)
Li, W., Xiao, X., Liu, J., Wu, H., Wang, H., Du, J.: Leveraging graph to improve abstractive multi-document summarization. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6232–6243. Association for Computational Linguistics, Online, July 2020
Li, Y., Chen, C.L.P., Zhang, T.: A survey on Siamese network: methodologies, applications, and opportunities. IEEE Trans. Artif. Intell. 3(6), 994–1014 (2022)
Li, Z., Liu, F., Yang, W., Peng, S., Zhou, J.: A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 33(12), 6999–7019 (2022)
Liu, P.J., et al.: Generating Wikipedia by summarizing long sequences. arXiv e-prints arXiv:1801.10198, January 2018
Liu, Y.: Fine-tune BERT for extractive summarization. arXiv e-prints arXiv:1903.10318, March 2019
Liu, Y., Lapata, M.: Hierarchical transformers for multi-document summarization. In: Korhonen, A., Traum, D., Màrquez, L. (eds.) Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 5070–5081. Association for Computational Linguistics, July 2019
Lu, Y., Dong, Y., Charlin, L.: Multi-XScience: a large-scale dataset for extreme multi-document summarization of scientific articles. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 8068–8074. Association for Computational Linguistics, Online, November 2020
Ma, C., Zhang, W.E., Guo, M., Wang, H., Sheng, Q.Z.: Multi-document summarization via deep learning techniques: a survey. ACM Comput. Surv. 55(5) (2022)
Mihalcea, R.: Graph-based ranking algorithms for sentence extraction, applied to text summarization. In: Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions. ACLdemo 2004, USA, p. 20–es. Association for Computational Linguistics (2004)
Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A.: Advances in pre-training distributed word representations. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018) (2018)
Rick, R., Berton, L.: Energy forecasting model based on CNN-LSTM-AE for many time series with unequal lengths. Eng. Appl. Artif. Intell. 113, 104998 (2022)
See, A., Liu, P.J., Manning, C.D.: Get to the point: summarization with pointer-generator networks. In: Barzilay, R., Kan, M.Y. (eds.) Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 1073–1083. Association for Computational Linguistics, July 2017
Sinha, A., et al.: An overview of microsoft academic service (MAS) and applications. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015 Companion, New York, NY, USA, pp. 243–246. Association for Computing Machinery (2015)
Sun, S., Shapira, O., Dagan, I., Nenkova, A.: How to compare summarizers without target length? Pitfalls, solutions and re-examination of the neural summarization literature. In: Bosselut, A., et al. (eds.) Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation, Minneapolis, Minnesota, pp. 21–29. Association for Computational Linguistics, June 2019
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. arXiv e-prints arXiv:1710.10903, October 2017
Viji, D., Revathy, S.: A hybrid approach of weighted fine-tuned BERT extraction with deep Siamese Bi - LSTM model for semantic text similarity identification. Multimedia Tools Appl. 81(5), 6131–6157 (2022)
Wang, D., Liu, P., Zheng, Y., Qiu, X., Huang, X.: Heterogeneous graph neural networks for extractive document summarization. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6209–6219. Association for Computational Linguistics, Online, July 2020
Xu, S., Zhang, X., Wu, Y., Wei, F., Zhou, M.: Unsupervised extractive summarization by pre-training hierarchical transformers. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pp. 1784–1795. Association for Computational Linguistics, Online, November 2020
Zhao, J., et al.: SummPip: unsupervised multi-document summarization with sentence graph compression. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, New York, NY, USA, pp. 1949–1952. Association for Computing Machinery (2020)
Zheng, H., Lapata, M.: Sentence centrality revisited for unsupervised summarization. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 6236–6247. Association for Computational Linguistics, July 2019
Acknowledgments
This work has been supported by the Brazilian research agencies Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Garcia, K., Berton, L. (2025). Siamese Network-Based Prioritization for Enhanced Multi-document Summarization. In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science(), vol 15413. Springer, Cham. https://doi.org/10.1007/978-3-031-79032-4_28