title: A quantitative and qualitative citation analysis of retracted articles in the humanities
authors: Heibi, Ivan; Peroni, Silvio
date: 2021-11-09

In this article, we show and discuss the results of a quantitative and qualitative analysis of citations to retracted publications in the humanities domain. Our study was conducted by selecting retracted papers in the humanities domain and marking their main characteristics (e.g., retraction reason). Then, we gathered the citing entities and annotated their basic metadata (e.g., title, venue, subject, etc.) and the characteristics of their in-text citations (e.g., intent, sentiment, etc.). Using these data, we performed a quantitative and qualitative study of retractions in the humanities, presenting descriptive statistics and a topic modeling analysis of the citing entities' abstracts and the in-text citation contexts. As part of our main findings, we noticed a continuous increment in the overall number of citations after the retraction year, with few entities that either mentioned the retraction or expressed a negative sentiment toward the cited entities. In addition, on several occasions we noticed a higher concern and awareness when citing a retracted article among citing entities belonging to the health sciences domain, compared to those from the humanities and the social sciences domains. Philosophy, arts, and history are the humanities areas that showed the highest concern toward retraction.

Retraction is a way of correcting the literature and alerting readers to erroneous material in the published record. A retraction should be formally accompanied by a retraction notice, a document that justifies such a retraction. Reasons for retraction include plagiarism, peer review manipulation, unethical research, etc. (Barbour et al., 2009). Several works in the past studied and uncovered important aspects of this phenomenon, such as the reasons for retraction (Casadevall et al., 2014; Corbyn, 2012), the temporal characteristics of retracted articles (Bar-Ilan & Halevi, 2018), their authors' countries of origin (Ataie-Ashtiani, 2018), and the impact factor of the journals publishing them (Fang & Casadevall, 2011). Other works have analyzed authors with a higher number of retractions (Brainard, 2018), and the scientific, technological, funding, and Altmetric impact of retractions (Feng et al., 2020). Other studies focused on retraction in the medical and biomedical domain (Gaudino et al., 2021; Campos-Varela et al., 2020; Gasparyan et al., 2014), and some recent works have also addressed the retraction of several recent COVID-19-related articles (Yeo-Teh & Tang, 2021). Scientometricians have also proposed several works on retraction based on quantitative data. For instance, several works (Lu et al., 2013; Azoulay et al., 2017; Mongeon & Larivière, 2016; Shuai et al., 2017) focused on showing how a single retraction could trigger citation losses throughout an author's prior body of work. The work by Bordignon (2020) investigated the different impacts that negative citations in articles and comments posted on post-publication peer review platforms have on the correction of science, while Dinh et al. (2019) applied descriptive statistics and ego-network methods to examine 4,871 retracted articles and their citations before and after retraction.
Other authors focused on the analysis of the citations made before the retraction (Bolland et al., 2021) and on a specific reason for retraction, such as misconduct (Candal-Pedreira et al., 2020). The studies that considered only one retraction case usually also examined the in-text citations and the related citation context in the articles citing the retracted publications (van der Vet & Nijveen, 2016; Bornemann-Cimenti et al., 2016; Luwel et al., 2019; Schneider et al., 2020). Although citation analyses of retracted articles have been performed several times in the STEM (Science, Technology, Engineering, and Mathematics) disciplines, less attention has been given to the humanities domain. One of these rare analyses in the humanities domain has recently been presented by Halevi (2020), who considered two examples of retracted articles and showed their continuous post-retraction citations. Our study expands the body of work concerning the analysis of citations to retracted articles in the humanities domain. By combining quantitative and qualitative analyses, we aimed at understanding this phenomenon in the humanities, which has gained little attention in the past literature. In particular, the research questions (RQ1-RQ3) we aimed to address are:
1. How did scholarly research cite retracted humanities articles before and after their retraction?
2. Were all the humanities areas behaving similarly with respect to the retraction phenomenon?
3. What were the main differences in citing retracted articles between the STEM disciplines and the humanities?
In this paper, we used a methodology developed to gather, characterize, and analyze incoming citations of retracted articles (Heibi & Peroni, 2021b), and we adapted it to the case of the humanities. The workflow followed to gather and analyze the data in this study is based on the methodology introduced in (Heibi & Peroni, 2021b), briefly summarized in Figure 1. The first two phases of the methodology are dedicated to the collection and characterization of the entities that have cited the retracted articles. The third phase focuses on analyzing the information annotated in the first two phases to summarize the collected data quantitatively. The fourth and final phase applies a topic modeling analysis (Barde & Bainwad, 2017) on the textual information (extracted from the full text of the citing entities) and builds a set of dynamic visualizations to enable an overview and investigation of the generated topics. The data gathering of our study is detailed in the following sections.

Figure 1. The four phases of the methodology: (1) identifying, retrieving, and characterizing the citing entities, (2) defining additional features based on the citing entities' contents, (3) building a descriptive statistical summary, and (4) applying a topic modeling analysis.

First, we wanted to have a descriptive statistical overview of the retractions in the humanities as a function of crucial features (e.g., reasons for retraction) to help us define the set of retractions to use as input in the next phases. Thus, we queried the Retraction Watch database (http://retractiondatabase.org) (Collier, 2011), searching for all the retracted articles labelled as humanities (marked with "HUM" in the database). Then, we classified the results as a function of three parameters: (a) the year of the retraction, (b) the subject area of the retracted articles (architecture, arts, etc.), and (c) the reason(s) for the retraction.
We collected an overall number of 474 articles; the earliest retraction occurred in 2002, while the latest year of retraction we obtained was 2020. As shown in Figure 2, we noticed an increasing trend throughout the years, with some exceptions. In particular, we observed that the highest number of retractions per year was 119 in 2010, probably due to an investigation and a massive retraction of several articles belonging to one author, i.e., Joachim Boldt (Brainard, 2018). When looking at the subject areas, we noticed that most of the retractions are related to arts and history, while plagiarism motives were by far the most representative ones, confirming the observation in (Halevi, 2020).

Figure 2. Retractions in the humanities domain as a function of three features: the year of retraction (line chart), the subject areas of the retracted articles (ring chart), and the reasons for retraction (horizontal bar chart). Based on the data retrieved from the Retraction Watch database in June 2021.

Since the focus of our study is on the analysis of citations of retracted articles, we excluded all the retracted articles collected in the previous step that did not receive at least one citation according to two open citation databases: Microsoft Academic Graph (MAG, https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/) (Wang et al., 2020) and OpenCitations' COCI (https://opencitations.net/index/coci) (Heibi et al., 2019). MAG is a knowledge graph that contains scientific publication records, citations, authors, institutions, journals, conferences, and fields of study. It also provides a free REST API service to search, filter, and retrieve its data. COCI is a citation index that contains details of all the DOI-to-DOI citation links retrieved by processing the open bibliographic references available in Crossref (Hendricks et al., 2020), and it can be queried using open and free REST APIs (a minimal sketch of such a query is shown at the end of this subsection). We decided not to use other proprietary and non-open databases since we aimed at making our workflow and our results as reproducible as possible. After querying COCI and MAG, we found that 85 retracted items (out of 474) had at least one citation (2,054 citations overall). We manually checked the dataset for possible mistakes introduced by the two collections. Indeed, some of the citing entities identified in MAG either did not include a bibliographic reference to any of the retracted articles, or the retracted article in consideration was not cited in the body of the citing article (although present in its reference list), or the citing entity's type did not refer to a scholarly article (e.g., bibliography, retraction notice, presentation, data repository). There was also one retracted article that received 1,050 citations, which we decided to exclude from the study to reduce the bias in the results. Following these considerations, the final number of retracted articles considered was 84, involving a total number of 935 unique citing entities. As shown in the bubble chart in Figure 3, most of the citing entities (i.e., 891) were included in MAG, 388 were included in COCI, and the two databases shared 344 entities. Although the retracted items identified so far were all in the humanities domain according to the categories specified in Retraction Watch, an item might have other non-humanities subjects associated with it. Sometimes, these non-humanities subjects might be more representative of the content of the retracted document and, thus, they might generate an unwanted bias for the rest of the analysis.
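As an illustration of the citation gathering mentioned above, the following minimal sketch (not the authors' actual harvesting code) shows how the incoming citations of a single DOI can be retrieved from COCI's public REST API; the example DOI at the bottom is a hypothetical placeholder. An analogous query could be issued against MAG's own API.

```python
import requests

# COCI's public REST API: the "citations" operation returns all citations
# pointing to the given DOI (one JSON record per citation).
COCI_CITATIONS = "https://opencitations.net/index/coci/api/v1/citations/"

def get_citing_dois(cited_doi):
    """Return the DOIs of the entities citing `cited_doi` according to COCI."""
    response = requests.get(COCI_CITATIONS + cited_doi, timeout=30)
    response.raise_for_status()
    # Each record contains, among other fields, the "citing" and "cited"
    # DOIs and the "creation" date of the citing entity.
    return [record["citing"] for record in response.json()]

# Hypothetical usage (the DOI below is a placeholder, not a real article):
# citing = get_citing_dois("10.1234/example-retracted-article")
```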
Coming back to the issue of non-humanities subjects, consider, for instance, the retracted article "The Good, the Bad, and the Ugly: Should We Completely Banish Human Albumin from Our Intensive Care Units?" (Boldt, 2000). In Retraction Watch, the subjects associated with it were Medicine and Journalism. Yet, when we checked the full text of the article, we noticed that the materials related to Journalism are very few and, as such, the article should not be considered as belonging to humanities research. To avoid considering these peculiar articles in our analysis, we devised a mechanism to help us evaluate the affinity of each retracted item to the humanities domain. We assigned to each of the 84 retracted items an initial score of 1, named hum_affinity; this value ranges from 0 (i.e., very low) to 5 (i.e., very high). The final value of hum_affinity for each retracted item is calculated as follows (a sketch of this computation is given at the end of this subsection):
1. We assigned to each retracted item additional subject categories obtained by searching the venue where it was published in external databases; we used the Scimago classification (https://www.scimagojr.com/) for journals and the Library of Congress Classification (LCC, https://www.loc.gov/catdir/cpso/lcco/) for books/book chapters.
2. If both the Retraction Watch subjects and those gathered in step (1) included at least one subject identifying a discipline in the humanities, we added 1 to the hum_affinity of that item.
3. If all the Retraction Watch subjects referred to disciplines in the humanities, we added another 1 to the hum_affinity of that item.
4. If the title of the retracted item had a clear affinity to the humanities (e.g., "The Origins of Probabilism in Late Scholastic Moral Thought"), we added another 1 to the hum_affinity of that item.
5. Finally, we provided a subjective evaluation ranging from -1 to 1 based on the abstract of the item.
The pie chart in Figure 3 shows how we classified the retracted articles and those citing them according to their hum_affinity score. To narrow our analysis and reduce the bias, we decided to consider only the retracted articles (and their corresponding citing entities) having a medium or high hum_affinity score (i.e., ≥ 2). At the end of this phase, the final number of retracted items we considered was 72, with a total of 678 citing entities. Once we had collected the 72 retracted items and their related 678 citing entities, we wanted to characterize such citing entities with respect to (a) their basic metadata and (b) their full-text content. We retrieved the basic metadata of each citing entity via the COCI/MAG REST APIs, i.e., DOI (if any), year of publication, title, venue ID (ISSN/ISBN), and venue title. Then, using the Retraction Watch database, we annotated whether the citing entity had been fully retracted as well. We also classified the citing entities into areas of study and specific subjects, following the Scimago Journal Classification (https://www.scimagojr.com/), which uses 27 main subject areas (medicine, social sciences, etc.) and 313 subject categories (psychiatry, anatomy, etc.). We searched for the titles and IDs (ISSN/ISBN) of the venues of publication of all the citing entities and classified them into specific subject areas and subject categories. For books/book chapters, we used the ISBNDB service (https://isbndb.com/) to look up the related Library of Congress Classification (LCC, https://www.loc.gov/catdir/cpso/lcco/), and then we mapped the LCC categories into a corresponding Scimago subject area using an established set of rules detailed in (Heibi & Peroni, 2021b).
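To make the hum_affinity scoring described above concrete, here is a minimal sketch of the computation. The record fields (e.g., rw_has_hum_subject, abstract_judgment) are hypothetical names introduced for illustration, not the authors' actual implementation; the subjective step 5 is represented as a pre-assigned value.

```python
def hum_affinity(item):
    """Compute the humanities affinity score (0 = very low, 5 = very high).

    `item` is a hypothetical record with boolean/numeric fields derived
    from Retraction Watch, Scimago/LCC, and our manual checks.
    """
    score = 1  # every retracted item starts from an initial score of 1
    # Step 2: both the Retraction Watch subjects and the venue-based ones
    # (Scimago for journals, LCC for books) include a humanities discipline.
    if item["rw_has_hum_subject"] and item["venue_has_hum_subject"]:
        score += 1
    # Step 3: all the Retraction Watch subjects are humanities disciplines.
    if item["rw_all_subjects_hum"]:
        score += 1
    # Step 4: the title has a clear affinity to the humanities.
    if item["title_clearly_hum"]:
        score += 1
    # Step 5: subjective evaluation of the abstract, in [-1, 1].
    score += item["abstract_judgment"]
    return max(0, min(5, score))  # clamp to the declared 0-5 range
```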
We extracted the abstract of each citing entity and all its in-text citations to the retracted articles in our set, marking the reference pointers to them (i.e., the in-line textual devices, e.g., "[3]", used to refer to bibliographic references), the section where they appear, and their citation context. The citation context is based on the sentence that contains the in-text reference (i.e., the anchor sentence), plus the preceding and following sentences. We annotated the first-level sections containing the in-text citations with their type, using the categories "introduction", "method", "abstract", "results", "conclusions", "background", and "discussion" listed in (Suppe, 1998) if the section's rhetoric was clear by looking at its title; otherwise, we used three residual categories, i.e., "first section", "middle section", and "final section", depending on their position in the citing article. Then, we manually annotated each in-text citation with three main features: the citation sentiment conveyed by the citation context, whether the citation context mentioned the retraction of the cited article, and the citation intent. The annotation of the citation sentiment is inspired by the classification proposed in (Bar-Ilan & Halevi, 2017), and we marked each in-text citation with one of the following values:
• positive, when the retracted article was cited as sharing valid conclusions, and its findings could have been also used in the citing study;
• negative, if the citing study cited the retracted article and addressed its findings as inappropriate and/or invalid;
• neutral, when the author of the citing article referred to the retracted article without including any judgment or opinion regarding its validity.
Then, we annotated each citing entity with yes/no depending on whether any in-text citation context we gathered from it did/did not explicitly mention the fact that the cited entity was retracted. Finally, we annotated the intent of each in-text citation. The citation intent (or citation function) is defined as the author's reason for citing a specific article (e.g., the citing entity uses a method defined in the cited entity). For labelling such citation functions, we used those specified in the Citation Typing Ontology (CiTO, http://purl.org/spar/cito) (Peroni & Shotton, 2012), an ontology for the characterization of factual and rhetorical bibliographic citations. We used the decision model summarized in Figure 4, already adopted in (Heibi & Peroni, 2021a), to decide which citation function to select for labelling an in-text citation. We do not introduce the full details of the labelling process due to space constraints; an extensive introduction and explanation can be found in (Heibi & Peroni, 2021b).

Figure 4. The decision model for the selection of a CiTO citation function for annotating the citation intent of an examined in-text citation based on its citation context. The first large row contains the three macro-categories: (1) "Reviewing", (2) "Affecting", and (3) "Referring". Each macro-category has at least two subcategories, and each subcategory refers to a set of citation functions. The first row defines which citation functions are suitable through a guiding sentence that needs to be completed according to the chosen subcategory and citation function.

We produced an annotated dataset containing a total of 678 citing entities and 1,020 in-text citations to 72 retracted articles.
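The extraction of a citation context as defined above (anchor sentence plus its immediate neighbors) can be sketched as follows. We assume NLTK's sentence tokenizer here, which is an illustrative choice on our part rather than the tool actually used in the study.

```python
import nltk  # requires a one-off nltk.download("punkt")

def citation_context(full_text, pointer):
    """Return (preceding, anchor, following) sentences for an in-text
    reference pointer such as "[3]"; None if the pointer is not found."""
    sentences = nltk.sent_tokenize(full_text)
    for i, sentence in enumerate(sentences):
        if pointer in sentence:
            preceding = sentences[i - 1] if i > 0 else ""
            following = sentences[i + 1] if i + 1 < len(sentences) else ""
            return preceding, sentence, following
    return None
```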
We published a dedicated webpage (https://ivanhb.github.io/ret-analysis-hum-results/) embedding visualizations that enable readers to view and interact with the results, also available in (Heibi & Peroni, 2021c). In the next sections, we introduce some important concepts adopted in the description and organization of our results; then, we show the results of a quantitative and qualitative analysis of all the data we collected. We defined three periods over which to distribute the citations to retracted articles:
• period P-Pre: from the year of publication of the retracted article to the year before its retraction (the year of the retraction is not part of this period);
• period P-Ret: the year of the full retraction;
• period P-Post: from the year after the retraction to the year of the last citation received.
A detailed explanation regarding the calculation of the periods is discussed in (Heibi & Peroni, 2021b); a minimal sketch of the period assignment is given at the end of this section. We classified the distribution of the citing entities in the three periods (i.e., P-Pre, P-Ret, and P-Post) as a function of the humanities disciplines used in Retraction Watch, as shown in Figure 5. Religion was the discipline that received the highest number of citations (375), while history had the highest number of retracted items (20). In Figure 6, we classified the entities citing a retracted article in each discipline according to their subject areas. Arts and Humanities and Social Sciences (AH&SS) were highly represented in both the P-Pre and P-Post periods of almost all the retracted articles' disciplines. However, we noticed some exceptions to this rule: in P-Pre in Journalism (10% of citing entities were AH&SS publications), in P-Post in Arts (13% AH&SS publications), and in P-Pre and P-Post of Architecture (no AH&SS publications in both periods). Since we expected, as also highlighted in previous studies (Ngah & Goi, 1997), that a good part of the citations to humanities articles should come from AH&SS publications, we decided to look more deeply into the obtained results before moving on to the next stage. One of the retracted articles classified under Journalism had Public Health and Safety and Sociology as its other subject areas, making Journalism its only humanities subject. A further investigation of the full text of the paper revealed that this article is highly related to health sciences, and Journalism has a marginal (almost absent) relevance in it. Considering these facts, we felt that this article could represent a significant bias for our analysis. Therefore, to limit its impact on the results, we decided to exclude it. As a further check, we investigated all the retracted articles of all the humanities disciplines in Figure 6 having less than 20% of citations from Arts and Humanities publications in either P-Pre or P-Post. Arts and Architecture are the two disciplines falling in this category. After a manual check, we detected the article "A systematic review on post-implementation evaluation models of enterprise architecture artefacts" (Nikpay et al., 2020), classified under Architecture; yet, while reading its full text, we found little evidence supporting the proposed labelling, since it was a computer science study. Therefore, we decided to exclude also this article from our analysis. After this data refinement, our final dataset was reduced to a total of 546 citing entities and 786 in-text citations to 70 retracted articles. We treated the citing entities and the in-text citations they contain as two different classes, and we present descriptive statistics of these two classes in the following sub-sections.
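The sketch mentioned above, mapping the year of a citation to one of the three periods, could look like this. It is a minimal reading of the period definitions; the full calculation, including the handling of border years, is detailed in (Heibi & Peroni, 2021b).

```python
def citation_period(citation_year, retraction_year):
    """Map the publication year of a citing entity to one of the three
    periods defined above."""
    if citation_year < retraction_year:
        return "P-Pre"   # up to the year before the retraction
    if citation_year == retraction_year:
        return "P-Ret"   # the year of the full retraction
    return "P-Post"      # from the year after the retraction onward
```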
We examined the distribution of the citing entities to retracted articles as a function of two features: (1) the periods (i.e., P-Pre, P-Ret, and P-Post), further classified into those that mentioned the retraction or for which we could not access the full text, and (2) their subject areas. The results are shown in Figure 7.

Figure 7. A descriptive statistical summary of the distribution of the citing entities to retracted articles in the three periods (i.e., P-Pre, P-Ret, and P-Post), also considering their subject areas. The bar charts on top highlight the citing entities that either did/did not mention the retraction and those for which we could not retrieve the full text.

The number of citing entities before the retraction (192, period P-Pre) was lower than the number of citing entities after the retraction (260, period P-Post). Along P-Pre and P-Ret, we noticed a continuous increment in the overall number of citing entities, which suddenly started decreasing after the first fifth of P-Post; yet, the numbers were in line with the ones observed in the third and fourth fifths of P-Pre. The last fifth of P-Post is an exception to the declining trend, with an unexpectedly high peak. This result was due to the fact that 27 retracted items received only one citation in P-Post and, in these cases, that citation always represented the last citation received, which defines the final border of P-Post. The full text of 8.42% of the citing entities was not accessible. For those for which we successfully retrieved the full text, our results showed that a relatively low percentage mentioned the retraction of the cited entity: 2.25% of the total number of citing entities in P-Ret and P-Post. Looking at their subject areas, we noticed that the citing entities started to spread into a higher number of subject areas (i.e., 9 additional ones) in P-Post compared to P-Pre, where the residual category Others contained 16% of the citing entities. The Arts and Humanities subject area had a similar percentage throughout all three periods (22.94%, 18.42%, and 18.14%), and it represents, together with Social Sciences, the two most representative subject areas in P-Ret and P-Post. We also noticed an important drop in Psychology, from 15.41% in P-Pre to 4.42% in P-Post. We then focused on the distribution of the in-text citations as a function of three features: (1) the periods (i.e., P-Pre, P-Ret, and P-Post), (2) the citation intent, and (3) the section containing the in-text citation. The results of the three distributions have been further classified according to the in-text citation sentiment (i.e., negative/neutral/positive), as shown in Figure 8. The overall trend in the number of in-text citations along the three periods was close to the one we observed for the citing entities (shown in the previous section), although the differences between P-Pre and P-Post were even more marked. As introduced in the previous section, the peak in the last fifth of P-Post was due to the retracted items receiving only one citation in P-Post. Even though the overall percentage of negative citations was low, they had a higher presence in P-Pre (4.5%). Generally, most in-text citations were tagged as neutral, and very few were positive (0.75%). The citation intents "obtains background from" and "cites for information" were the two most dominant ones in the three periods, representing 31.29% and 22.64% of the total number of in-text citations, respectively.
The citation intent "cites for information" increased its presence, moving from 17.8% in P-Pre to 27.20% in P-Post. Considering the citation sections, we can clearly see that the in-text citations were mostly located in the "Introduction" section in all three periods. The in-text citations in the "Introduction" section decreased markedly after P-Ret, moving from 30.15% in P-Pre to 22.13% in P-Post. Instead, the in-text citations contained in the "Discussion" section showed an increasing trend, from 6.87% in P-Pre to 15.20% in P-Post.

Figure 8. A descriptive statistical summary of the distribution of the in-text citations contained in the citing entities to the retracted articles in the three periods (i.e., P-Pre, P-Ret, and P-Post), according to their intent and section. The sentiment of the in-text citations is also highlighted.

Topic modeling is a statistical modeling approach for automatically discovering the topics (represented as sets of words) that occur in a collection of documents. We used it on our data to understand what the evolution of the topics over time was and whether it was dependent, in some way, on the retraction received by the articles considered. A standard workflow for building a topic model is based on three main steps: tokenization, vectorization, and topic model (TM) creation. The topic model we built is based on the Latent Dirichlet Allocation (LDA) model (Jelodar et al., 2019). In the tokenization process, we converted the text into a list of words by removing punctuation, unnecessary characters, and stop words, and we also decided to lemmatize and stem the extracted tokens. In the second step, we created vectors for each of the generated tokens using a Bag of Words (BoW) model (Brownlee, 2019), which we considered appropriate for our study considering our direct experience in previous findings (Heibi & Peroni, 2021a) and the suggestions by Bengfort et al. (2018) on the same issue. Finally, to build the LDA topic model, we determined in advance the number of topics to retrieve from the examined corpus using a popular method based on the topic coherence score, as suggested in (Schmiedel et al., 2019), which measures the degree of semantic similarity between the high-scoring words in a topic. We built and executed two LDA topic models: one using the abstracts of the entities citing the retracted articles (with 16 topics), named TM-Abs, and another using the citation contexts containing the in-text reference pointers to the retracted articles (with 20 topics), named TM-Cits. For creating the topic models, we used MITAO (Ferri et al., 2020). The total number of available abstracts in our dataset was 509. We extended the list of MITAO's default English stop words (e.g., "the", "is", etc.) with ad-hoc stop words devised for our study, such as "method", "results", "conclusions", etc., which represent the typical words that might be part of a structured abstract. Figure 9 shows the topics represented in the two-dimensional space of LDAvis. Using the LDAvis interface, we set the parameter λ to 0.3 to determine the weight given to the probability of a term under a specific topic relative to its lift (Sievert & Shirley, 2014), and retrieved the 30 most relevant terms of each topic. We gave an interpretation and a title to each topic by analyzing its related terms; we do not introduce them here due to space limitations, but they are available in (Heibi & Peroni, 2021c).
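As an aside, for readers who want to reproduce a comparable pipeline outside MITAO, the following sketch assumes gensim and shows the three steps described above, with the number of topics chosen by maximizing the coherence score; the parameter values and candidate range are illustrative, not those used in the study.

```python
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

def best_lda(token_lists, candidate_ks=range(5, 31)):
    """Train one LDA model per candidate number of topics and keep the one
    with the highest coherence score. `token_lists` is the already
    tokenized (stop-word-free, lemmatized, stemmed) corpus."""
    dictionary = Dictionary(token_lists)
    corpus = [dictionary.doc2bow(tokens) for tokens in token_lists]  # BoW step
    best = None
    for k in candidate_ks:
        lda = LdaModel(corpus=corpus, id2word=dictionary,
                       num_topics=k, random_state=0, passes=10)
        coherence = CoherenceModel(model=lda, texts=token_lists,
                                   dictionary=dictionary,
                                   coherence="c_v").get_coherence()
        if best is None or coherence > best[1]:
            best = (lda, coherence, k)
    return best  # (model, coherence score, number of topics)
```

Applied to the 509 abstracts, such a procedure would yield a model analogous to TM-Abs; applied to the citation contexts, one analogous to TM-Cits.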
Topic 6 ("Leadership, organization, and management") was the dominant topic. The topics were distributed in four main clusters, as shown in Figure 9:
• one composed of topics 2 ("Socio-political issues related to leadership") and 6, concerning issues related to leadership, work organization, and management from a socio-political point of view;
• a large one composed of topics 1 ("Socio-political issues possibly related to Vietnam"), 4 ("History of the Jewish culture"), 5 ("Music and psychological diseases"), 11 ("Family and religion"), etc., treating several subjects from domains close to the social sciences, political sciences, and psychology;
• two other clusters composed of one topic each, i.e., topic 16 ("Geography and climatic issues") and topic 3 ("Colonial history").
Figure 10 shows the chart generated using MTMvis. We plotted the topic distribution as a function of the three periods. At a first analysis, we noticed that topics 6 and 16 increased their share along the three periods. On the other hand, topics 1 and 11 decreased their percentage throughout the three periods. The total number of in-text citation contexts in our dataset, used as input to produce the second topic model, was 786. As we did with the abstracts, we defined and used a list of ad-hoc stop words, which included all the given and family names of the authors of the cited articles. Figure 11 shows the topics represented in the two-dimensional space of LDAvis. As we did for the abstracts' topic modeling, we set λ to 0.3 and interpreted each topic by analyzing its 30 most relevant terms (Heibi & Peroni, 2021c). In this case, we noticed that the topics were less overlapping and more distributed along all the axes of the visualization. Topic 12 ("Leadership, organization, and management") was the most representative (11.7%) and was very distant from the other topics. The bottom-right part of the graphic, with topics 2 ("Countries in conflict"), 15 ("War and terrorism"), 17 ("War and history"), 18 ("History of Europe"), and 20 ("War and army conflicts"), is mostly close to history studies, especially discussions of armed conflicts. The top part of the graphic contains several single-topic clusters, such as topics 5 ("Gender social issues") and 9 ("Geography and climatic issues"). Figure 12 shows the chart generated using MTMvis, where we plotted the topic distribution as a function of the three periods. We noticed a continuous decrement in topics 7 ("Family and religion") and 18 along the three periods. Topic 3 ("Drugs/Alcohol and psychological diseases") had a strong decrement right after P-Ret. On the other hand, we noticed an increment in topics 5, 9, and 11 ("Music and psychological diseases"), although the latter topic had a higher percentage in P-Ret than in P-Post.

Figure 11. The 20 topics of TM-Cits. The visualization is taken from LDAvis; it shows the topic distribution in a two-dimensional space.

In this section, we address separately each of our research questions RQ1-RQ3 presented in the Introduction. We conclude the section by discussing the limits of our work and by sketching out some future works that might help us overcome these issues.

Answering RQ1: citing retracted articles in the humanities
As shown in Figure 7, it seems that, on average, articles in the humanities do not suffer a drop in citations after their retraction. Indeed, citing retracted articles is not prohibited and is reasonable when the retracted article is the subject of a study, as in this article.
However, only 2.25% of the citing entities from P-Ret and P-Post (five from arts and humanities publications and three from health sciences subject areas, e.g., medicine, psychology, and nursing) mentioned the retraction in the citation context. Still, we noticed that the negative perception of a retracted work, although limited in our data, happened before the retraction if the cited entity had a low affinity (i.e., hum_affinity = 2) with the humanities domain. The fact that we reported few negative citations in P-Post is in line with other studies (Bordignon, 2020; Schneider et al., 2020; Luwel et al., 2019). The results shown in Figure 15 suggest that the citing entities talking about the retraction usually discussed the cited entity rather than obtaining background material from it or citing it for generic informative claims. In addition, most of the in-text citations marked as discusses occurred in the Discussion section, which explains the higher distribution of this section in P-Ret and P-Post (as shown in Figure 14). From TM-Cits, we noticed the emergence of topic 6 ("The retraction phenomenon") in Discussion sections only in P-Post; in other words, the retraction was not mentioned in the Discussion section before the retraction, and the retraction event might have been the trigger of a wider discussion by the citing entities. From the distribution of the subject areas of the citing entities over the three periods (Figure 7), we noticed that social sciences and arts and humanities had almost the same percentages in the P-Ret and P-Post periods, which are lower than their percentages in P-Pre, suggesting that the retraction event did have an impact on these subject areas. However, other subject areas, such as psychology, decreased in P-Ret and P-Post, which may be an indicator of a higher concern by these subject areas toward the citation of retracted articles. In addition, from the topic distribution of TM-Abs over the three periods for the citing articles assigned to the subject area psychology (shown in Figure 13), there was a clear decrement in the topics related to health sciences, such as topics 10 and 11, while others, such as topics 6 and 9 (close to socio-historical discussions with no relation to health sciences), increased their presence in P-Ret and P-Post. In other words, not only did the overall number of citing entities from the health sciences domain decrease after the retraction, but their subject areas also moved from the health sciences domain to subjects that are closer to the social sciences and arts and humanities domains.

Answering RQ2: citation behaviors in the humanities
As shown in Figure 6, religion and history had a very similar distribution pattern: in both, the citing entities from the subject area social sciences had an important decrement in P-Post. After P-Ret, the TM-Cits topics of these entities no longer include topic 3 ("Drugs/Alcohol and psychological diseases") for religion and topic 7 ("Family and religion") for history. We can speculate that social sciences studies significantly reduced their percentage due to a higher concern toward sensitive social subjects such as health care, family, and religion. Arts had the highest number of citations in P-Post, although we reported an important drop in the arts and humanities citing entities, in favor of subject areas such as medicine, nursing, and engineering, as shown in Figure 6.
On the other hand, for philosophy we had a completely different situation: citing entities labeled as arts and humanities increased considerably in P-Post at the expense of citing entities from psychology. For the arts discipline, topic 11 ("Music and psychological diseases") of TM-Cits is the reason for the positive trend of P-Post. In other words, arts (and especially music) had been discussed in relation to psychological and medical diseases. In Figure 16, we show the distribution of topic 6 ("The retraction phenomenon") as a function of the three periods, considering the four humanities disciplines with the highest number of citing entities. Topic 6 increased considerably in P-Post in philosophy; in religion it had a steady trend, while in history and arts it peaked in P-Ret and had a lower, yet relatively high, percentage in P-Post. These results might suggest that the entities citing retracted articles in philosophy, arts, and history were those showing major concerns toward the retraction; in the case of history and arts, starting from the year of the retraction.

Answering RQ3: comparing STEM and the humanities
As shown in Figure 7, the retraction of humanities articles had a positive impact on the citation trend, since citations increased after P-Ret. The opposite trend was observed in other disciplines, according to prior studies, such as biomedicine (Dinh et al., 2019) and psychology. However, prior studies, such as (Heibi & Peroni, 2021a) and (Schneider et al., 2020), also observed that in the health sciences domain there were cases where either a single or a few popular cases of retraction were characterized by an increment of citations after the retraction. This might suggest that the discipline of the retracted article is not the only central factor to consider for predicting the citation trend after the retraction, and that other factors might play a crucial role, such as the popularity of and the media attention to the retraction case, as discussed in the studies by Mott et al. (2019) and Bar-Ilan and Halevi (2017). Another work by Bar-Ilan and Halevi (2018) analyzed the citations of 995 retracted articles and found the same growing trend in the citations in the post-retraction period. However, they did not analyze the retractions according to different and separate disciplines. As such, we might consider such results as a representation of a general trend of retracted articles, which confirms the general observations we derived from our data. Our findings showed that few citations in P-Ret and P-Post (2.45%) mentioned the retraction of the cited entity in the in-text citation context, which is close to the findings in (Schneider et al., 2020) and (Bar-Ilan & Halevi, 2017). Our findings are, however, far from the results we obtained when analyzing a popular case of a retracted article in medicine (Heibi & Peroni, 2021a), where 38% of the articles citing the retracted one explicitly mentioned the retraction. However, we think that, in the case analyzed in (Heibi & Peroni, 2021a), such a percentage was largely influenced by the popularity of the retracted article. There are, indeed, some limitations in our study that may have introduced some biases. First, compared to other fields of study, bibliographic metadata in the humanities have a limited coverage in well-known citation databases (Hammarfelt, 2016). This fact led to some limitations when applying a citation analysis in the humanities domain (Archambault & Larivière, 2010).
Pragmatically, as far as our study was concerned, we surely collected fewer citing entities than those that had in fact cited the retracted articles. The availability of a larger amount of data could have strengthened and improved the quality of our results. The selection of the retracted articles was another crucial issue, since we faced two major problems: (1) some inconsistencies in the data provided by Retraction Watch, and (2) the presence of retracted articles labeled as humanities that, at a closer analysis, actually belonged to a different discipline. The first descriptive statistical results, our manual check, and the definition of the humanities affinity score helped us limit the biases of these two issues. However, we could improve the approach adopted by using additional services, such as Elsevier's ScienceDirect, as done in (Bar-Ilan & Halevi, 2018), and by raising the threshold of the humanities affinity score to exclude borderline cases. A citation analysis concerning retraction in the humanities domain has rarely been discussed in the past; therefore, the discussion of our results included a comparison with similar works that also considered different domains or retraction cases. Such works have not addressed the humanities domain, or were based either on a single or on a limited set of retraction cases. Works that considered other domains did not include most of the features that we analyzed in this work, e.g., the citation intent, which made the comparison with them difficult. We hope that this study and others to be done in this field can favor a comparison and an improvement in the understanding of the retraction phenomenon in the humanities domain.

References
The limits of bibliometrics for the analysis of the social sciences and humanities literature
On Finding the Natural Number of Topics with Latent Dirichlet Allocation: Some Observations. Advances in Knowledge Discovery and Data Mining
World Map of Scientific Misconduct
The career effects of scandal: Evidence from scientific retractions
Guidelines for retracting articles
An overview of topic modeling methods and tools
Post retraction citations in context: A case study
Temporal characteristics of retracted articles
Applied text analysis with Python: Enabling language-aware data products with machine learning
The Good, the Bad, and the Ugly: Should We Completely Banish Human Albumin from Our Intensive Care Units?
Citation of retracted publications: A challenging problem
Self-correction of science: A comparative study of negative citations and post-publication peer review
Perpetuation of Retracted Publications Using the Example of the Scott S. Reuben Case: Incidences, Reasons and Possible Improvements
Characterizing in-text citations in scientific articles: A large-scale analysis
What a massive database of retracted papers reveals about science publishing's 'death penalty'
A Gentle Introduction to the Bag-of-Words Model
Retraction of publications: A study of biomedical journals retracting publications based on impact factor and journal category
Does retraction after misconduct have an impact on citations? A pre-post study
Sources of error in the retracted scientific literature
Termite: Visualization techniques for assessing textual topic models
Shedding light on retractions
Misconduct is the main cause of life-sciences retractions
Systematic examination of pre- and post-retraction citations
Retracted Science and the Retraction Index
An observation framework for retracted publications in multiple dimensions
MITAO: A User Friendly and Modular Software for Topic Modelling
Self-correction in biomedical publications and the scientific impact
Trends and Characteristics of Retracted Articles in the Biomedical Literature
Personality, Stress and Disease: Description and Validation of a New Inventory
Why Articles in Arts and Humanities Are Being Retracted? Publishing Research Quarterly
Beyond Coverage: Toward a Bibliometrics for the Humanities. Research Assessment in the Humanities
A qualitative and quantitative analysis of open citations to retracted articles: The Wakefield 1998 et al.'s case
A protocol to gather, characterize and analyze incoming citations of retracted articles
Inputs and results of "A quantitative and qualitative citation analysis to retracted articles in the humanities domain"
Software review: COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations
Crossref: The sustainable source of community-owned scholarly metadata
Latent Dirichlet allocation (LDA) and topic modeling: Models, applications, a survey
The Retraction Penalty: Evidence from the Web of Science
The Schön case: Analyzing in-text citations to papers before and after retraction
Costly collaborations: The impact of scientific fraud on co-authors' careers
Assessing the impact of retraction on the citation of randomized controlled trial reports: An interrupted time-series analysis
Characteristics of citations used by humanities researchers
RETRACTED ARTICLE: A systematic review on post-implementation evaluation models of enterprise architecture artefacts
COCI CSV dataset of all the citation data [Data set]
FaBiO and CiTO: Ontologies for describing bibliographic resources and citations
Using TF-IDF to Determine Word Relevance in Document Queries
Topic Modeling as a Strategy of Inquiry in Organizational Research: A Tutorial With an Application Example on Organizational Culture
Continued post-retraction citation of a fraudulent clinical trial report, 11 years after it was retracted for falsifying data
A Multidimensional Investigation of the Effects of Publication Retraction on Scholarly Impact
LDAvis: A method for visualizing and interpreting topics
The structure of a scientific paper
Propagation of errors in citation networks: A study involving the entire citation network of a widely cited paper published in, and later retracted from, the journal Nature
Microsoft Academic Graph: When experts are not enough
How Do Retractions Influence the Citations of Retracted Articles
An alarming retraction rate for scientific publications on Coronavirus Disease 2019 (COVID-19)
A heuristic approach to determine an appropriate number of topics in topic modeling