There's an App for That
Tree of Science with Scopus: A Shiny Application
Sebastian Robledo
Professor
Universidad Católica Luis Amigó
sebastian.robledogi@amigo.edu.co
Martha Zuluaga
Professor
Universidad Nacional Abierta y a Distancia
martha.zuluaga@unad.edu.co
Luis Alexander Valencia
Researcher
Core of Science
lavalenciah12@gmail.com
Oscar Arbelaez-Echeverri
Researcher
Core of Science
technology@coreofscience.org
Pedro Duque
Professor
Universidad Católica Luis Amigó
pedro.duquehu@amigo.edu.co
Juan David Alzate-Cardona
Software Engineer
Hourly, Inc.
juanda@hourly.io
Abstract
Tree of Science (ToS) is a scientific literature search tool that produces a small, selected list of citations from a larger pool of citations. ToS was initially developed for searches in the Web of Science; this paper shows how to use it with bibliographic data from Scopus. The new Shiny web application processes a dataset from a Scopus search and creates three reports. The first shows a descriptive analysis, the second presents the Tree of Science of the search, and the third presents a clustering analysis of the three main subtopics. The application is accessible from this link: https://coreofscience.shinyapps.io/scientometrics/.
Keywords: Tree of Science, Scientometrics, Scopus
Recommended citation:
Robledo, S., Zuluaga, M., Valencia, L.A., Arbelaez-Echeverri, O., Duque, P., & Alzate-Cardona, J.D. (2022). Tree of Science with Scopus: A shiny application. Issues in Science and Technology Librarianship, 100. https://doi.org/10.29173/istl2698
Introduction
Researchers and librarians can access millions of research papers. However, processing, selecting, and understanding the content of this data is a difficult and time-consuming task. Therefore, it is essential to use technology to identify the most relevant academic literature. Several tools are available, and most of them offer either a point-and-click interface or a code interface. Some examples of point-and-click software are CiteSpace (Chen, 2006), VOSviewer (van Eck & Waltman, 2010), and SciMAT (Cobo et al., 2012). Among code interfaces, the most popular programming languages for scientometric analysis are R and Python. Both have specialized packages; for example, R has bibliometrix (Aria & Cuccurullo, 2017) and litsearchr (Grames et al., 2019). Examples in Python are ScientoPy (Ruiz-Rosero et al., 2019) and metaknowledge (Evans & Foster, 2011).
The ToS algorithm creates a citation network and applies graph metrics to identify papers located in the roots, trunk, and leaves; for a detailed explanation, see Valencia-Hernandez et al. (2020). ToS has been widely applied in research topics such as entrepreneurship (Robledo et al., 2021), chemistry (Durán-Aranguren et al., 2021), management (Duque et al., 2021), and medicine (Gonzalez-Correa et al., 2022).
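The idea of reading roots, trunk, and leaves from a citation network can be illustrated with a minimal Python sketch. It is not the published ToS algorithm (see Valencia-Hernandez et al., 2020, for that): the function name classify_tree, the top parameter, and the toy paper labels are hypothetical, and the in-degree times out-degree score used for the trunk is only a crude stand-in for the graph metrics the real algorithm applies.

```python
from collections import defaultdict


def classify_tree(citations, top=2):
    """Split a citation network into roots, trunk, and leaves.

    citations: iterable of (citing_paper, cited_paper) pairs.
    top: how many papers to keep in each part (hypothetical parameter).
    """
    in_deg = defaultdict(int)   # times a paper is cited inside the network
    out_deg = defaultdict(int)  # times a paper cites others inside the network
    nodes = set()
    for citing, cited in citations:
        out_deg[citing] += 1
        in_deg[cited] += 1
        nodes.update((citing, cited))

    # Roots: cited by others but citing nothing in the network (seminal work).
    roots = sorted(
        (n for n in nodes if in_deg[n] > 0 and out_deg[n] == 0),
        key=lambda n: (-in_deg[n], n),
    )[:top]
    # Leaves: cite others but are not yet cited (current literature).
    leaves = sorted(
        (n for n in nodes if out_deg[n] > 0 and in_deg[n] == 0),
        key=lambda n: (-out_deg[n], n),
    )[:top]
    # Trunk: both cite and are cited; ranked by a simplified connecting score.
    trunk = sorted(
        (n for n in nodes if in_deg[n] > 0 and out_deg[n] > 0),
        key=lambda n: (-(in_deg[n] * out_deg[n]), n),
    )[:top]
    return {"roots": roots, "trunk": trunk, "leaves": leaves}


# Toy network: each pair reads "first paper cites second paper".
example_edges = [("C", "A"), ("C", "B"), ("D", "A"),
                 ("D", "C"), ("E", "C"), ("E", "A")]
print(classify_tree(example_edges))
```

In this toy network, papers A and B are only cited, C both cites and is cited, and D and E only cite, which mirrors the seminal, structural, and current roles described above.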
Scopus Search
The first step in creating the ToS of a research topic is searching the Scopus database. Figure 1a presents an example with the word scientometrics. In this case, there are 589 results from the search (see Figure 1b). This number is vital because ToS works best with between 100 and 600 records. A minimum of 100 records is needed to create a citation network; a lower number generates dispersed networks (Pornprasit et al., 2022). The maximum of about 600 records reflects the limited memory of Shiny apps (1024 MB); larger, less specific searches will hinder the performance of the algorithm. In the last step, the user must select the BibTeX file format and all the parameters shown in Figure 1c. The “include references” item is key for creating the citation network.
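Before uploading, the export can be checked against the 100 to 600 record range and for the presence of reference lists. The Python sketch below is only an illustration and is not part of the app: the function name check_scopus_export is hypothetical, and it assumes that each record starts with a line of the form "@TYPE{" and that the cited references appear in a field named references.

```python
import re

# Record range recommended in the text above.
MIN_RECORDS, MAX_RECORDS = 100, 600


def check_scopus_export(bibtex_text):
    """Count records and reference fields in a BibTeX export (as a string)."""
    flags = re.MULTILINE | re.IGNORECASE
    # Assumption: every record begins with "@TYPE{" at the start of a line.
    records = len(re.findall(r"^\s*@\w+\s*\{", bibtex_text, flags))
    # Assumption: cited references are stored in a "references" field.
    with_refs = len(re.findall(r"^\s*references\s*=", bibtex_text, flags))
    return {
        "records": records,
        "with_references": with_refs,
        "size_ok": MIN_RECORDS <= records <= MAX_RECORDS,
    }


# Tiny made-up sample; a real export would contain hundreds of records.
sample = """@ARTICLE{Doe2020,
  title = {A study},
  references = {Hirsch, J.E., An index (2005) PNAS},
}
@ARTICLE{Roe2021,
  title = {Another study},
}
"""
print(check_scopus_export(sample))
```

A file in which with_references is zero was most likely exported without the “include references” option, so no citation network could be built from it.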
ToS in a Shiny App
Shiny is an open-source framework to create web apps directly from R (Chang et al., 2017), and these apps can be uploaded to shinyapps.io to be accessed through a link. Shiny developers do not need previous knowledge of JavaScript or HTML to create useful and user-friendly apps. Shiny is used by academics to visualize their research, by professors to teach statistical concepts, and by big companies in the tech and pharma industries (Wickham, 2021). Some examples of Shiny apps are PeptCreatR (Arumugaperumal et al., 2022) and DiaThor (Nicolosi Gelis et al., 2022).
Figures 2a-e show the steps for creating the ToS from a Scopus search. Once the user has the BibTeX file from Scopus (the seed of ToS), the user can move forward to the ToS Shiny app at https://coreofscience.shinyapps.io/scientometrics/. The browse button in Figure 2a opens a new window to upload the BibTeX file. Once the blue progress bar is complete (Figure 2b), the user can view a descriptive analysis under the Importance button (Figure 2c). This descriptive analysis covers the scientific production published each year and the most productive authors and journals. This report is created with the bibliometrix package (Aria & Cuccurullo, 2017).
The Evolution - ToS button presents the papers located in the roots, trunk, and leaves (see Figure 2d). Papers in the roots are seminal, papers in the trunk give structure to the research topic, and papers in the leaves are the current literature. The link buttons take the user to a Google search with the preliminary information from the paper. For example, the seminal papers in scientometrics are Egghe (2006), Garfield (1955), and Hirsch (2005). Egghe (2006) proposed a new index, the g-index, to improve on the famous h-index proposed by Hirsch (2005). Garfield (1955) was the creator of the Institute for Scientific Information (ISI), nowadays known as Web of Science.
Finally, Figure 2e shows a clustering analysis of the main subtopics. This cluster analysis applies the Blondel et al. (2008) algorithm to the citation network. The Shiny app presents the three largest clusters (or subtopics) of the seed (research topic), each with a word cloud that helps the user understand the topic of the cluster. The user can adjust the word cloud, for example, by changing the number of words, setting their minimum frequency, and removing unnecessary words.
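The word cloud controls described above (number of words, frequency, and removal of unnecessary words) amount to filtering a table of term counts. The following Python sketch shows that filtering on a few made-up titles; the function name word_cloud_terms, its parameters, and the short stop-word list are all hypothetical, and the sketch does not reproduce how the app itself builds its word clouds.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "on", "to", "with"}


def word_cloud_terms(titles, max_words=5, min_freq=1, remove=()):
    """Return (word, count) pairs to draw, most frequent first."""
    skip = STOP_WORDS | {w.lower() for w in remove}
    counts = Counter(
        word
        for title in titles
        for word in re.findall(r"[a-z]+", title.lower())
        if word not in skip
    )
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, c) for w, c in ranked if c >= min_freq][:max_words]


# Made-up titles standing in for the papers of one cluster.
cluster_titles = [
    "Citation analysis of science",
    "Citation networks in science mapping",
    "Mapping the h-index",
]
print(word_cloud_terms(cluster_titles, min_freq=2))
print(word_cloud_terms(cluster_titles, min_freq=2, remove=["science"]))
```

Raising min_freq or adding words to remove shrinks the cloud to the terms that best characterize the cluster, which is the same adjustment the user makes interactively in the app.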
Discussion and Conclusions
ToS was developed as part of a doctoral thesis, and later the creators decided to start a non-profit organization called Core of Science. The web tool was initially developed with WoS data; however, Scopus is also an important database often available in academic libraries. ToS uses the metaphor of the tree to present the most significant papers from the search results, in this case obtained from Scopus. Creating a web-based tool is expensive, and most of the time, users must pay this cost. The purpose of Core of Science is “connecting people through sharing knowledge”; thus, one of its activities is to create free web-based tools that help librarians and researchers automate some processes. In this vein, this paper presents a new Shiny app that creates a scientometric analysis to give an overall view of a research topic.
One of the big challenges in creating a citation network with Scopus data is creating a unique identifier for each article and its references. Both should match with other papers in the same search. WoS data has a standard identifier for references, which makes this matching easier. Also, the WoS references have their DOIs, which facilitates the match between the references and the primary papers.
A limitation of this study is that the ToS algorithm was designed for WoS data, but Scopus data is spread across a broader range of time, which implies that some old papers will appear in the trunk because of their publication year. A further improvement of the ToS algorithm could take this feature of Scopus into consideration.
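One possible way to build such an identifier is to prefer a DOI when the reference string contains one and to fall back to a cruder key otherwise. The Python sketch below illustrates this under stated assumptions; it is not the method used by the app. The function name reference_key, the DOI pattern, and the surname-plus-year fallback are hypothetical, and the fallback can collide when an author has several papers in the same year.

```python
import re

# Loose DOI pattern (assumption): "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s,;]+", re.IGNORECASE)
YEAR_PATTERN = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")
SURNAME_PATTERN = re.compile(r"[^\W\d_]+")  # first run of letters


def reference_key(reference):
    """Return a matching key for a reference string, or None if none is found."""
    doi = DOI_PATTERN.search(reference)
    if doi:
        # DOIs are case-insensitive, so lowercase them before comparing.
        return "doi:" + doi.group(0).rstrip(".").lower()
    surname = SURNAME_PATTERN.search(reference)
    year = YEAR_PATTERN.search(reference)
    if surname and year:
        # Crude fallback: first author's surname plus publication year.
        return surname.group(0).lower() + ":" + year.group(1)
    return None


with_doi = ("Hirsch, J.E., An index to quantify an individual's scientific "
            "research output (2005) PNAS, 102, pp. 16569-16572. "
            "https://doi.org/10.1073/pnas.0507655102")
without_doi = ("Garfield, E., Citation indexes for science (1955) "
               "Science, 122, pp. 108-111")
print(reference_key(with_doi))
print(reference_key(without_doi))
```

Two records are then treated as the same paper when their keys are equal, which is what allows a cited reference to be linked to a primary paper in the same search.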
More information about Core of Science is found at: https://coreofscience.org/.
References
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
Arumugaperumal, A., Velayudhan Krishna, D., Alaguponniah, S., Nallaperumal, K., & Sivasubramaniam, S. (2022). PeptCreatR: A web app for unique peptides in human. International Journal of Peptide Research and Therapeutics, 28(2), 64. https://doi.org/10.1007/s10989-022-10375-4
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics, 2008(10), P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008
Chang, W., Cheng, J., Allaire, J., Xie, Y., & McPherson, J. (2017). Shiny: Web application framework for R (R Package Version 1.5) [Computer software]. R Studio. https://rdrr.io/cran/shiny/
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377. https://doi.org/10.1002/asi.20317
Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2012). SciMAT: A new science mapping analysis software tool. Journal of the American Society for Information Science and Technology, 63(8), 1609–1630. https://doi.org/10.1002/asi.22688
Duque, P., Meza, O. E., Giraldo, D., & Barreto, K. (2021). Economía social y economía solidaria: Un análisis bibliométrico y revisión de literatura. REVESCO. Revista de Estudios Cooperativos, 138, e75566–e75566. https://doi.org/10.5209/reve.75566
Durán-Aranguren, D. D., Robledo, S., Gomez-Restrepo, E., Arboleda Valencia, J. W., & Tarazona, N. A. (2021). Scientometric overview of coffee by-products and their applications. Molecules, 26(24), 7605. https://doi.org/10.3390/molecules26247605
Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69(1), 131–152. https://doi.org/10.1007/s11192-006-0144-7
Evans, J. A., & Foster, J. G. (2011). Metaknowledge. Science, 331(6018), 721–725. https://doi.org/10.1126/science.1201765
Garfield, E. (1955). Citation indexes for science. Science, 122(3159), 108–111. https://www.jstor.org/stable/1749965
Gonzalez-Correa, C.-A., Tapasco-Tapasco, L.-O., & Gomez-Buitrago, P.-A. (2022). A method for a literature search on microbiota and obesity for PhD biomedical research using the Web of Science (WoS) and the Tree of Science (ToS). Issues in Science and Technology Librarianship, 99. https://doi.org/10.29173/istl2679
Grames, E. M., Stillman, A. N., Tingley, M. W., & Elphick, C. S. (2019). An automated approach to identifying search terms for systematic reviews using keyword co-occurrence networks. Methods in Ecology and Evolution, 10, 1645–1654. https://doi.org/10.1111/2041-210x.13268
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572. https://doi.org/10.1073/pnas.0507655102
Nicolosi Gelis, M. M., Sathicq, M. B., Jupke, J., & Cochero, J. (2022). DiaThor: R package for computing diatom metrics and biotic indices. Ecological Modelling, 465, 109859. https://doi.org/10.1016/j.ecolmodel.2021.109859
Pornprasit, C., Liu, X., Kiattipadungkul, P., Kertkeidkachorn, N., Kim, K.-S., Noraset, T., Hassan, S.-U., & Tuarob, S. (2022). Enhancing citation recommendation using citation network embedding. Scientometrics, 127(1), 233–264. https://doi.org/10.1007/s11192-021-04196-3
Robledo, S., Grisales Aguirre, A. M., Hughes, M., & Eggers, F. (2021). “Hasta la vista, baby” – will machine learning terminate human literature reviews in entrepreneurship? Journal of Small Business Management, 1–30. https://doi.org/10.1080/00472778.2021.1955125
Ruiz-Rosero, J., Ramirez-Gonzalez, G., & Viveros-Delgado, J. (2019). Software survey: ScientoPy, a scientometric tool for topics trend analysis in scientific publications. Scientometrics, 121(2), 1165–1188. https://doi.org/10.1007/s11192-019-03213-w
Valencia-Hernandez, D. S., Robledo, S., Pinilla, R., Duque-Méndez, N. D., & Olivar-Tost, G. (2020). SAP algorithm for citation analysis: An improvement to Tree of Science. Ingeniería E Investigación, 40(1), 45–49. https://doi.org/10.15446/ing.investig.v40n1.77718
van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538. https://doi.org/10.1007/s11192-009-0146-3
Wickham, H. (2021). Mastering shiny: Build interactive apps, reports, and dashboards powered by R. O’Reilly.
This work is licensed under a Creative Commons Attribution 4.0 International License.