key: cord-0057734-xofzc6qi
authors: Karimova, Yulia; Ribeiro, Cristina; David, Gabriel
title: Institutional Support for Data Management Plans: Five Case Studies
date: 2021-02-22
journal: Metadata and Semantic Research
DOI: 10.1007/978-3-030-71903-6_29
sha: f6880af77e06dddffd0cae101a0aeaee5c665203
doc_id: 57734
cord_uid: xofzc6qi

Researchers are being prompted by funders and institutions to expose the variety of results of their projects and to submit a Data Management Plan as part of their funding requests. In this context, institutions are looking for solutions to provide support to research data management activities in general, including DMP creation. We propose a collaborative approach where a researcher and a data steward create a DMP, involving other parties as required. We describe this collaborative method and its implementation, by means of a set of case studies that show the importance of the data steward in the institution. Feedback from researchers shows that the DMP are simple enough to lead people to engage in data management, but present enough challenges to constitute an entry point to the next level, the machine-actionable DMP.

The 17 Sustainable Development Goals (SDG) [24] require the development of information infrastructures, directed to sharing and reusing data [9], that contribute to reproducible research, advance science and foster collaboration, while promoting researchers' work [17]. Along with this, Research Data Management (RDM) activities have also become important in the daily work of researchers. Thus, both the SDG and the management of research projects require an alignment with RDM best practices and recommendations, as proposed by initiatives for Open Data such as the Research Data Alliance (RDA) [18]. In this context, data should be described in as much detail as possible and conform to the FAIR principles [26].
Moreover, with new RDM and funder requirements for grant applications, researchers need adequate, user-friendly tools to help them from the early stages of their projects, namely in the creation of Data Management Plans (DMP), as they are now required for most grant applications [3, 9]. A good DMP must include detailed information about data management and preservation during and after the project. The context of the project, the people in charge of RDM, and possible ethical and legal issues are also part of a good plan [22, 25]. A DMP can be regarded as a living document that is useful for structuring the course of research activities, integrating with other systems and workflows [22], and leading the entire strategy of the project [4]. However, the creation of a DMP requires some effort, specific knowledge, some data publication experience and appropriate tools [20]. This is why many institutions are looking for solutions to help researchers in DMP creation and RDM activities in general [2, 5, 27].

At the Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), under the TAIL project [19], we collaborate with researchers from different scientific domains, analyzing the difficulties they face in RDM activities and DMP creation. This led us to build an RDM workflow, taking into account researchers' needs and existing RDM and funder requirements, while exploring the integration of the available RDM tools and services, and developing our own. The collaborative DMP-building method is part of a workflow with the overall objective of improving DMP quality while reducing the effort and time required from researchers for the creation of a detailed plan [11]. It is described below and illustrated with a set of case studies.
Currently, institutions have different ways to support researchers in RDM activities during the life cycle of their projects, seeking both the engagement of researchers and the establishment of RDM infrastructures according to their needs. Consultation services and training sessions that provide an overview of metadata, data standards, ethical issues and data repositories are the most common initiatives to support researchers with DMP creation. The provision of lists of data repositories, license models and guidelines, the explanation of funding requirements, the creation of DMP templates and the review of DMP written by researchers are also proposed as institutional support. Examples of support materials are the Guide to DMP used at the Digital Repository of Ireland, and the MOOC [16] and DMP template created for the University of Edinburgh.

Libraries are reported as one of the services where researchers can ask for RDM support [1, 5, 6, 23]. Although libraries are still in the early stages of connecting to RDM infrastructures and need staff with RDM skills, this represents an expansion of their traditional mission. Some of them already provide consultation services and include RDM support as a new internal working group [6]. Specific ways in which libraries support researchers include meeting them at their office, lab or other location, receiving them at the library, providing feedback through email forms, and collaborating with departmental grant administrators and project managers [6]. Another source of support is the IT departments at universities and research centres. Partnerships between libraries and other hired support teams have become common, and their goal is to develop new services to inform, train and support researchers [1, 5, 6, 21].
At the University of Melbourne, for example, there is a digital scholarship program [8], while Cornell University implemented the Research Data Management Service [13]. The University of Glasgow [7, 15] developed a system for contacting researchers with approved projects and tracking requests for RDM support, with an automatically generated email if a DMP is required, and the University of Sydney organised eResearchUnit, which sends researchers a pre-filled DMP template based on the abstract of the funded grant application. Although these tools and workflows differ, all of them aim to facilitate the researcher's daily work, to simplify the process of DMP creation, to decrease the time spent on it and to improve the quality of the results. The work developed in this area showed that the diversity of scientific domains and respective plans requires people in charge of RDM support who are able to help with a multitude of requirements [15].

An RDM workflow has to take into account researchers' needs and institutional and funder requirements. The set of tools illustrated in Fig. 1 covers important stages of the data lifecycle. The DMPOnline tool is used to create the plans, LabTablet for data collection, the Dendro platform for data organization and description, and the INESC TEC research data repository [10, 12] for data publishing.

Given the requirement of DMP submission with grant applications, we proposed a collaborative method between data stewards and researchers in the preparation of a DMP [11]. The method has been tested with researchers from different scientific domains and includes several activities (see Fig. 2). First of all, the data steward conducts an interview to understand the data and how they can be managed. Typically, researchers do not know how they can organize their data. Another common issue is the existence of sensitive, private, or personal data.
In one of the cases, researchers were not aware that the data collected through interviews involved personal data, with specific management requirements. This interview also collects information about publications and published data related to the project, if any. This helps to identify the data repositories and metadata standards that are most appropriate for the project. After that, the data steward analyses whether sensitive, private or personal data may be collected during the project, and therefore whether a Data Protection Impact Assessment (DPIA) is necessary. In cases where a DPIA is required, besides the examples of DMP, the data steward also surveys DPIA examples for the corresponding domain.

The next step of the method involves the analysis of prior publications by the team's researchers that relate to the project. This step helps the steward find detailed information about the methodology, the software, and the types and names of the instruments that can be used in the project, and suggest what documents can be created besides datasets, for example agreements between partners, that also need to be preserved or published. The same analysis is applied to research data related to the project and to data description requirements. The data steward proposes an appropriate metadata scheme and assesses the amount of space required in the repository and the formats and file types that will be used. As a result, the necessary information is collected and the first version of the DMP is created by the data steward. In some cases, interaction with the Ethics Committee is promoted, and interaction with the Data Protection Officer (DPO) is proposed when a DPIA is required.

In the next step, the first DMP is presented to the researchers for validation and improvement. This step also includes clarification of the authorship and ownership of the data, and of possible embargo periods, which in turn may involve iterations between project partners.
Neither the DMP nor the DPIA, if there is one, is public at this point. They will be open to the public only after all the project partners authorize the publication of the final version of the DMP. The project leaders and the data steward decide where the DMP will be published, for example through DMPOnline or on Zenodo. After publication, the DMP is added as a formal project document for further monitoring. The data steward recommends that researchers keep the DMP synchronized with changes occurring in the project, regarding it as a "living document".

INESC TEC is a research institute with over 700 researchers, from areas that range from energy to computer science and from manufacturing to communications. For more than 5 years, INESC TEC has been nurturing experimental activities in RDM, partly as a research endeavour, but having in mind the development of new services to support RDM in the context of running projects and to expose datasets in the institutional data repository. The strategy to support DMP creation is part of this commitment, and currently involves 2 part-time people (one data steward and one repository manager) who promote awareness of RDM and process requests for DMP. DMP creation starts with a request sent by email to the INESC TEC data steward. The collaborative method is therefore tested on real cases, allowing us to evaluate it in different scientific domains and identify any specific requirements. The DMP is typically created at the beginning of the project, but sometimes also halfway through the project and very rarely at the end. At the moment, eight plans have been created for projects in the Environmental radioactivity, Biodiversity, Education, Oceanography, Psychology, Environmental engineering, Health and Statistics domains. Some of them are complete and published, some are in preparation or in monitoring.
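The sequence of activities in the collaborative method can be summarised as an ordered checklist with two conditional branches, one for the DPIA and one for the Ethics Committee. The following is a minimal sketch of that reading, not a tool used at INESC TEC; the `Project` class, its field names and the step wording are hypothetical and were introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    """Facts gathered in the initial interview (hypothetical fields)."""
    name: str
    has_sensitive_or_personal_data: bool = False
    needs_ethics_review: bool = False


def plan_dmp_steps(project: Project) -> List[str]:
    """Return the ordered steps of the collaborative method for one project."""
    steps = [
        "interview researchers about the data",
        "check for sensitive, private or personal data",
    ]
    if project.has_sensitive_or_personal_data:
        # A DPIA is needed, so DPIA examples are surveyed alongside DMP examples.
        steps.append("survey DPIA examples for the domain")
    steps += [
        "analyse prior publications and related data",
        "propose metadata scheme, storage estimate and file formats",
        "draft first DMP version",
    ]
    if project.needs_ethics_review:
        steps.append("consult the Ethics Committee")
    if project.has_sensitive_or_personal_data:
        steps.append("consult the Data Protection Officer on the DPIA")
    steps += [
        "validate DMP with researchers and partners",
        "publish DMP after all partners authorize it",
        "monitor DMP and keep it synchronized with the project",
    ]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(plan_dmp_steps(Project("demo", True, True)), 1):
        print(number, step)
```

Keeping the branches explicit makes it visible that the DPO and the Ethics Committee are involved only when the project requires it, while the interview, drafting, validation, publication and monitoring steps apply to every plan.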
In the following we go into the details of the DMP-creation process for 5 case studies, highlighting aspects that may be transferable to other cases. These 5 cases are in an advanced stage of preparation and give us more information for analysis.

Environmental Radioactivity: This project focused on the study of the concentration of the noble gas radon (Rn-222); the aim is to examine how meteorological conditions influence it, how it impacts the local atmospheric electric field, and its association with the atmosphere's ionization and aerosol concentration. The Principal Investigator (PI) of this project already had experience in RDM activities, but not in the creation of a DMP. The data steward held an interview with the PI, to understand the context of the project and to collect data and papers related to the project. After the interview, the analysis of these materials continued, identifying the absence of sensitive data and studying specific requirements for data management and description in this domain. At the same time, several examples of plans in similar domains were analyzed, as well as some DMP templates, and the first version of the DMP was created. A list of specific questions for verification and confirmation with the PI was created at the second meeting. Not all questions from the list were used in the DMP; however, many of them helped to add more specific details about the tools used during the project, their calibration method, the specific software used for processing, data analysis, the measurement method, a detailed description of the data transfer process from the station, and even information about what happens to the project data in case the PI leaves. This DMP was created in the final phase of the project and does not require monitoring.

Biodiversity: The FARSYD project aims to examine the relation between farming systems, biodiversity and ecosystem services in high nature value farmlands.
The PI had little experience in RDM, and for this project she created an Excel file with a detailed description of each experiment. This file and the description of the project helped the data steward understand the context of the project and identify the existence of private and sensitive data that cannot be publicly exposed. Following the collaborative methodology, the data steward analysed all the information obtained, collected examples of DMP in the biodiversity domain, and experimented with the GFBio DMP Tool and the Best Practice Guide [14], promoted by the German Federation for Biological Data. This analysis helped to prepare a new list of specific questions that were validated with the PI. Two existing checklists were consulted: the one prepared for the Environmental Radioactivity project and one specific to the Biodiversity domain. Although the questions from the first list are not immediately applicable to this plan, some of the points were used with adaptation. This led to the inclusion of a description of the specific tools used during the project, the specific software, the training areas for habitat mapping, the several approaches to obtain data, depending on the specific target, and the location. Moreover, this project contained private data provided by the Portuguese Institute of Financing Agriculture and Fisheries, the Integrated Administrative and Control System and the Land Parcel Information Systems, so the data steward helped to add detailed information on the management of this kind of data and the corresponding preservation rules, with restrictions and different access levels. The first version of the DMP was created halfway through the project, and was then improved, detailed and publicly shared through DMPOnline. The DMP has been monitored and improved twice since its publication, and the last monitoring action was scheduled for August 2020.
Education: The project SCReLProg aims to develop a pedagogical approach to overcome programming difficulties and effective strategies for self- and co-regulation of e-learning. The researchers of this project did not have experience with RDM. Together with the request for support, they sent a lot of documentation about the project before the first meeting with the data steward. These elements were valuable to provide an overview of the project, prepare a list of questions, identify the existence of personal data and prepare information related to informed consent, ethics committee approval and the need for a DPIA, according to the General Data Protection Regulation. All meetings were conducted via Skype with the researcher in charge of RDM, not the PI of the project. The version of the DMP for publication was shared, corrected and approved by the researchers in charge of project activities. Six months after the start, it was made public and the first monitoring session was scheduled for March 2020. The specificity of this DMP lay in the need for a DPIA, which was also created in collaboration between the data steward and the institution's DPO. The DMP process also led to the correction of the existing informed consent form, the detailed analysis of all tools, software and data collection methodologies, and the inclusion of recommendations for the processing and preservation of personal data. Due to the small risks and threats estimated by the DPO, the publication of the DPIA was replaced by a signed agreement between the project partners. The data steward suggested replacing Google Drive storage with the institutional Drive at INESC TEC, and Google Forms with UESurvey. Moreover, the DMP creation helped prevent complicated situations related to personal data before data collection, preparation and sharing, and provided the project team with all the required documentation. The next DMP monitoring was scheduled for August 2020.
Oceanography: The DMP of the SAIL project is the most complex and detailed plan created at INESC TEC, due to the diversity of data, tools, software and internal procedures, and to the number of institutions involved. Moreover, this is the first plan that comprises several scientific domains: biodiversity, oceanography and robotics. This DMP is not finished yet; it is under validation by the project team. The first version of the DMP was created faster than the others, because the project was due to start. This DMP is regarded as a project output and will be published on Zenodo. It is possible to publish new versions of the DMP after the monitoring actions, as Zenodo provides version control. The DMP is pending agreement by the members of the project team, and corrections and monitoring will follow.

Psychology: The project "Identification of learning and development conditions at/through work: challenge the paradoxes of technological introduction and lifelong learning" focuses on learning about production processes, involves several types of data and data collection techniques, and will deal with personal and sensitive data. Although the project is currently under evaluation, the first version of the DMP has already been created. The specificity of this case study is the existence of a researcher in charge of the RDM tasks, who will plan, organize and answer the RDM questions of researchers. This person had experience in data management, but not in DMP creation, and therefore requested help with this process. In this case, the DMP creation took an abbreviated path. A first version of the DMP was sent directly to the data steward for evaluation and correction. In two days, the data steward analyzed the plan, added comments, and identified the points that needed more detail.
As the project will deal with sensitive data, the data steward also raised issues regarding the informed consent and the DPIA, the corresponding contact with the institution's DPO, and the approval by the ethics committee. The DMP is in preparation using the DMPOnline tool and is not public yet. Work on it will proceed if the project is approved. This case study is included, although the project is not funded, to illustrate the commitment of researchers to a DMP in the planning stage of the project. The fact that more and more project calls require a DMP in the submission phase is evidence of the importance of the data steward services.

Environmental Engineering, Health, Statistics: These projects are also under preparation and already have first versions of the DMP. The concern of researchers with RDM issues is visible from their contacts. As in the previous case, the PI emailed us and sent a DMP draft for validation. Finalization, publication and monitoring of these plans are expected by October 2020.

The proposal for a collaborative method for DMP creation at INESC TEC is intended as the first approach to establish a DMP workflow for research projects and to ingrain RDM into the project activities. The establishment of this workflow will introduce RDM-related activities in the internal project proposal and execution processes, aiming at a double goal. It will enforce compliance with the project funders' requirements, and it will also improve the quality of the research methods in the project, by making the data life cycle explicit, by assigning appropriate effort to data management activities and by avoiding misunderstandings among the project partners. This is expected to have a positive impact on the institutional research maturity. Although the collaborative method can be improved, the case studies show that the expertise of a data steward and RDM skills are essential in the institution.
Researchers might create a DMP on their own, based on existing examples from their domain, but besides requiring more time, the final version would likely lack an adequate level of detail and might omit the monitoring of the plan. The results of the questionnaire submitted after each DMP process also confirmed the importance of the data steward. Some researchers declared: "It is essential to have specialized staff to assist researchers in these tasks", and all stated that it is very important to have a data steward who responds to RDM issues at the institution and helps prevent errors during the planning stage that might influence the course of the project. The results showed that the identification of sensitive and personal data is one of the main aspects where unexpected difficulties may arise.

The experience from the cases described here led to a better articulation between the data steward, the DPO and the Ethics Committee. The RDM workflow can be seen as the more general one, as every research project is expected to have a DMP. Only projects with personal data demand collaboration with the DPO, and the elaboration of a DPIA requires prior knowledge of the data processing steps. Thus the DMP precedes the DPIA. However, the recommendations from the DPIA may lead to a revision of the DMP, and therefore an iterative approach is suggested. The Ethics Committee deals with more fundamental issues, like the appropriateness of the project purpose and of its research methods. Although the DMP and a possible DPIA could be informative for the Ethics Committee, the level of analysis and the timings of a project proposal suggest that the ethics analysis may be performed in parallel, with the details of the interaction among the three institutional roles being determined by the specificity of each project.
Metadata schemes, the choice of a repository, data organization and preservation rules, and the scheduling of RDM tasks among the project partners are other complex issues where researchers need support. To describe all DMP elements, it is necessary to collaborate with specific teams, such as project managers and IT staff who know the institution's technical settings, for example repository capacity and internal regulations. In our case studies, the data steward is aware of these rules and is expected to follow the practices in specific fields, monitor RDM developments and good practices as well as international and institutional laws and policies, and suggest improvements in the institutional RDM workflow. Moreover, the data steward can help create DMP with more detail, satisfying FAIR requirements, anticipate problems with project partners, and monitor the resulting DMP. One of the researchers confirmed that the collaboration with the data steward was very useful and that, from now on, no project with their group will start without a DMP, to avoid difficulties related to the data, their organization, management and ownership. According to the answers, researchers also consider DMP monitoring very important, and think it needs to be carried out "every 6 months", upon notification by email. The existence of a pre-filled DMP for a specific domain or "a generic template, that can be adapted for specific DMP" was also mentioned by researchers in the questionnaire. User support, RDM tools and good practices were indicated as important for the whole institution. With each new plan created, both the data steward and the researcher acquire new knowledge and skills and engage more with research data management. In other words, the collaboration and the proposed method positively affect all of the institute's stakeholders.

To take the INESC TEC DMP support to the next level, we will continue the collaboration with researchers from different domains on their concrete projects.
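The "every 6 months" monitoring cycle mentioned by the researchers can be turned into a list of reminder dates. The sketch below is only an illustration under stated assumptions: the function names, the publication date and the project end date are hypothetical, and sending the email notification itself is left out.

```python
import calendar
from datetime import date
from typing import List


def add_months(day: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    index = day.month - 1 + months
    year = day.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def monitoring_dates(published: date, project_end: date,
                     interval_months: int = 6) -> List[date]:
    """List the monitoring dates between DMP publication and project end."""
    dates = []
    step = 1
    while True:
        # Always offset from the publication date so clamping does not drift.
        due = add_months(published, step * interval_months)
        if due > project_end:
            break
        dates.append(due)
        step += 1
    return dates


if __name__ == "__main__":
    # Hypothetical dates, chosen only to exercise the function.
    for due in monitoring_dates(date(2019, 8, 31), date(2020, 12, 31)):
        print("Monitoring session due on", due.isoformat())
```

Computing every date as an offset from the publication date, rather than from the previous reminder, keeps the schedule aligned with the original day of the month even when a short month forces clamping.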
The DMP monitoring mechanisms will be detailed and will evolve to conform to the Machine Actionable DMP standard (maDMP) proposed by the RDA DMP Common Standards Working Group, with which we engaged during the maDMP Hackathon, mapping our DMP to the maDMP scheme. The results of the Hackathon are the starting point to incorporate tasks complying with the maDMP standard in our proposal for the research project workflow. To this end, we will analyze the information collected during the Hackathon, improve our DMP scheme and its implementation, and test it with researchers. The development of a DMP system, based on case studies of researchers from different scientific domains and institutions, is planned as part of the improvement of the INESC TEC RDM workflow. The system will simplify DMP creation, help with its monitoring, and link research data with the corresponding projects and the monitoring mechanisms of the DMP, thus keeping the DMP a "live" document during the project. Due to the diversity of scientific domains at INESC TEC, during the creation of the system we will also be able to compare the experiences and requirements of different scientific domains, to identify possible differences and contribute to the Data Domain Protocol proposed by Science Europe, a flexible metadata model for DMP creation in different scientific domains. All in all, the interest and availability of researchers in this collaboration promote an in-depth analysis of DMP issues, the application and testing of existing solutions and the development of our own.

[1] The Tuuli project: accelerating data management planning in Finnish research organisations
[2] The Cookbook, Engaging Researchers with Data Management
[3] European Commission: Annex L. Conditions related to open access to research data
[4] Data management planning
[5] Developing a data management consultation service for faculty researchers: a case study from a large Midwestern public university
[6] SPEC kit 334: research data management services
[7] Managing research data
[8] Managing Data@ Melbourne Working Group: Managing data@ Melbourne: an online research data management training program
[9] Committee on Data of the International Science Council
[10] Data deposit in a CKAN repository: a Dublin core-based simplified workflow
[11] The collaborative method between curators and researchers in the preparation of a Data Management Plan and Privacy Impact Assessment. 5 Forum Gestão de Dados de Investigação
[12] Promoting semantic annotation of research data by their creators: a use case with B2NOTE at the end of the RDM workflow
[13] Research data management: Practical strategies for information professionals
[14] GBIF-ICLEI best practice guide for publishing biodiversity data by local governments
[15] Managing research data JISC
[16] Research data MANTRA project at EDINA, Information Services
[17] On the reuse of scientific data
[18] RDA: RDA for the Sustainable Development Goals. Introduction: Fit with the overall RDA vision and mission
[19] Research data management tools and workflows: experimental work at the University of Porto
[20] Exploring the determinants of scientific data sharing: understanding the motivation to publish research data
[21] Research data management in the French national research center (CNRS)
[22] Next-generation data management plans: global, machine-actionable, FAIR
[23] Research data services in European academic research libraries
[24] Sustainable development goals: a need for relevant indicators
[25] Data management plans: a review
[26] The FAIR Guiding Principles for scientific data management and stewardship
[27] Building a research data management service at the University of California, Berkeley: a tale of collaboration
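As an illustration of what mapping a plan to the maDMP scheme might involve, the following minimal sketch serialises a few DMP facts as JSON. The field names follow our reading of the RDA DMP Common Standard and should be checked against the published schema; the `to_madmp` function, the input dictionary keys and all identifiers and addresses are hypothetical placeholders, not the mapping produced at the Hackathon.

```python
import json
from typing import Dict, List


def yes_no(flag: bool) -> str:
    """Render a boolean as a 'yes'/'no' string."""
    return "yes" if flag else "no"


def to_madmp(title: str, created: str, contact_name: str, contact_mbox: str,
             datasets: List[Dict], ethical_issues: str = "unknown") -> Dict:
    """Build a minimal maDMP-style record (assumed field names, placeholder ids)."""
    return {
        "dmp": {
            "title": title,
            "language": "eng",
            "created": created,
            "modified": created,
            "ethical_issues_exist": ethical_issues,
            "dmp_id": {"identifier": "placeholder-dmp-id", "type": "other"},
            "contact": {
                "name": contact_name,
                "mbox": contact_mbox,
                "contact_id": {"identifier": "placeholder-contact-id",
                               "type": "other"},
            },
            "dataset": [
                {
                    "title": item["title"],
                    "dataset_id": {"identifier": item["identifier"],
                                   "type": "other"},
                    "personal_data": yes_no(item["personal_data"]),
                    "sensitive_data": yes_no(item["sensitive_data"]),
                }
                for item in datasets
            ],
        }
    }


if __name__ == "__main__":
    record = to_madmp(
        title="Example project DMP",                 # hypothetical values
        created="2020-03-01T10:00:00Z",
        contact_name="Data Steward",
        contact_mbox="steward@example.org",
        datasets=[{"title": "Interview transcripts",
                   "identifier": "placeholder-dataset-1",
                   "personal_data": True,
                   "sensitive_data": False}],
    )
    print(json.dumps(record, indent=2))
```

A structured record of this kind makes the personal and sensitive data flags explicit for each dataset, which is the information the case studies identified as the most frequent source of unexpected difficulties.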