Title: Analysis and Correlation of Visual Evidence in Campaigns of Malicious Office Documents
Authors: Fran Casino, Nikolaos Totosis, Theodoros Apostolopoulos, Nikolaos Lykousas, Constantinos Patsakis
Date: 2021-03-30

Many malware campaigns use Microsoft (MS) Office documents as droppers to download and execute their malicious payload. Such campaigns often use these documents because MS Office is installed on billions of devices and because these files allow the execution of arbitrary VBA code. Recent versions of MS Office prevent the automatic execution of VBA macros, so malware authors try to convince users to enable the content via images that, e.g. forge system or technical errors. In this work, we leverage these visual elements to construct lightweight malware signatures that can be applied with minimal effort. We test and validate our approach using an extensive database of malware samples and identify correlations between different campaigns that illustrate that some campaigns are either using the same tools or that there is some collaboration between them.

While cybercrime has always been a threat, it has evolved into a multi-billion underground economy over the past few years. The economic impact of cybercrime [11, 20] is so devastating that the World Economic Forum considers it the second most-concerning risk for global commerce over the next decade [6]. Moreover, the recent COVID-19 pandemic and the spike in usage of digital services have also resulted in an analogous increase in cybercrime activities, as reported by multiple sources 1. As cybercrime evolves in terms of scale and sophistication, artificial intelligence (AI) helps resource-intensive security operations by using technologies such as machine learning, pattern recognition, and natural language processing, which are capable of ingesting terabytes of unstructured data to enhance response times and expand the capacities of security operations. Nevertheless, attackers tend to be a step ahead due to the continuous appearance of novel technologies, industrialisation processes, the difficulty of collecting data from different sources in orchestrated campaigns and detecting them in a timely manner, and the lack of proactive security mechanisms. Undoubtedly, beyond the underground economy exchanges (drugs, trafficking, etc.), a significant share of this impact stems from the exploitation of security issues that allow an adversary to monetise vulnerabilities, e.g. by injecting commands and manipulating a compromised system or network traffic, by performing extortions, etc. Pervasive and sustained cyber-attacks could have a potentially devastating impact on national and international organisations, disrupting the operations of governments and businesses and the lives of private individuals. One of the most common cybercrime-related activities is phishing, as it can be used to launch a series of attacks against victims. The adversary tries to exploit the human factor by presenting an email that looks benign, e.g. one that appears to originate from a trusted source or to have a harmless attachment; in reality, the attachment carries a malicious payload. While attaching an executable may provide the adversary with immediate access to the victim's machine, this method is rarely used. The reason is that executables are most often blocked by mail servers and that users do not usually receive such content via email.
Therefore, the victim is less likely to receive it and open it. On the contrary, an email containing a Microsoft (MS) Office document, e.g. a Word document or an Excel spreadsheet, is more likely to be opened. Motivation and Contribution: Malware packed in MS Office documents was quite common in the past, as the embedded macros were automatically executed when the corresponding trigger fired, e.g. when the document was opened or closed. To address this issue, recent versions of MS Office have macros disabled by default, reducing the success rate of such attacks. Nevertheless, the problem is far from solved, as repeatedly shown by the impact of the associated malicious campaigns. In most malware campaigns that are based on malicious MS Office documents, the modus operandi is quite typical. The adversary tries to trick the user into opening an MS Office document that comes as an attachment or a link. While the exploitation of Dynamic Data Exchange (DDE) may offer automatic code execution, in most campaigns the malware authors opt for macros, as several patches prevent DDE execution. Nonetheless, this choice requires the victim to accept the macro's execution on her device, as in most cases this is disabled by default. Therefore, the adversary embeds an image in the document, which is the only thing the user sees, and tries to mislead the victim into enabling the content. In the bulk of the cases, the image falsely states that either some technical error has occurred or the document's data is not accessible, and that this can only be resolved by enabling the content; see Figure 1. Should the victim be tricked into enabling the content, a macro is activated which executes a malicious payload (either contained in the file or, more often, downloaded) using some Living Off The Land Binaries and Scripts (LOLBAS) 2. Upon infection, the executed malware may proceed to its core operation. The steps described above are depicted in Figure 2. Based on the above, it is clear that one of the key components of the attack is the image that is displayed to the victim. This work aims to investigate the possible correlations between malware campaigns based on this visual piece of evidence. To this end, we have collected a big and broad dataset which consists of more than 11 thousand malicious documents from 16 malicious campaigns. We extract the embedded images and leverage them to cluster the documents per malware and campaign. We argue that this approach may act as an indicator of compromise (IOC) and prevent many attacks at the mail server or the end-point, depending on where the mechanism is deployed. Thus, contrary to state-of-the-art methods that are based on natural language processing (see Section 2.2), we perform more lightweight calculations, e.g. computing the perceptual hash of an image, and only detect the presence of VBA, XLM macros 3, p-code, or DDE to determine whether a file belongs to a known malware campaign. If the image is not known, we analyse it to extract relevant text which is known to be associated with malicious campaigns and flag it appropriately. As a result, we create a two-layer lightweight filter which identifies malicious MS Office documents efficiently and with high accuracy. The rest of the manuscript is structured as follows. In the next section, we present the related work, giving an overview of the structure of MS Office document files and of the methods used to weaponise and detect them.
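To make the two-layer filter concrete, the following is a minimal sketch of the decision logic described above. It operates on data assumed to be extracted beforehand (image perceptual hashes, OCR'd image text, and a flag for the presence of macros/DDE); all names and the toy inputs are illustrative and not part of the authors' implementation.

```python
def classify_document(has_macros_or_dde, image_phashes, image_texts,
                      known_malicious_phashes, suspicious_keywords):
    """Two-layer filter sketch: inputs are assumed to be extracted beforehand."""
    if not has_macros_or_dde:
        return "benign"  # nothing executable for the lure image to trigger
    # Layer 1: lightweight signature check against known campaign images.
    if any(h in known_malicious_phashes for h in image_phashes):
        return "malicious (known campaign image)"
    # Layer 2: keyword analysis of the text extracted from unknown images.
    for text in image_texts:
        if any(k in text.lower() for k in suspicious_keywords):
            return "malicious (suspicious lure text)"
    return "unknown"

# Toy usage example:
print(classify_document(
    has_macros_or_dde=True,
    image_phashes={"c3a1f0e9d2b47c58"},
    image_texts=["This document is protected. Please click Enable Content."],
    known_malicious_phashes=set(),
    suspicious_keywords=["enable content", "enable macros"],
))  # -> malicious (suspicious lure text)
```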
Moreover, we present the core idea of perceptual hashing, which will be used in our work. In Section 3, we discuss the concept behind our approach and detail our methodology. Then, in Section 4, we present our experimental results. In Section 5, we discuss the outcomes of our experiments and our findings. Finally, the manuscript concludes by summarising our contributions. MS Office is considered the default suite for writing documents, working with spreadsheets, and making presentations. To allow interoperability, MS has adopted the Office Open XML format, also known as OpenXML or OOXML. As the name implies, it is an XML-based format for all office documents. The specification was developed by MS, has been adopted by ECMA International as ECMA-376 [5], and became an ISO and IEC standard (ISO/IEC 29500) [10]. In principle, every OOXML file is stored as a compressed ZIP archive. Therefore, each office file contains a set of XML files and stores the necessary files along with the schema. Images, audio, or other multimedia files, as well as scripts stored in the document, are also stored inside the same ZIP file. The typical structure of an OOXML file is illustrated in Figure 3. Notably, several other output formats are supported by MS Office, such as the Compound File Binary Format (CFBF), which consists of structured storage files 4. In this regard, there are XLSB files supported by MS Excel, which, due to their binary format, are much faster than traditional XLS/XLSX files. The latter introduces a very interesting twist, as files can be encrypted. Therefore, the content cannot be directly examined by an intermediate security mechanism. While Emotet has recently used encrypted ZIP files in its campaign [17], in this scenario there is no need for a ZIP file, as the office file is directly encrypted. In fact, this approach was used in several recent malware campaigns, further facilitated by an old Excel bug [23]. This bug allows Excel to automatically decrypt an Excel spreadsheet if the password used is VelvetSweatshop. Thus, when a file is encrypted with this password, security mechanisms cannot scan it, yet it opens seamlessly on the user's device. MS Office documents are weaponised in a very straightforward way. Since most MS Office applications support VBA and XLM macros, the most commonly used method to create a malicious document is to add a macro that is executed automatically, e.g. upon opening or closing a workbook or document via the Workbook_Open(), AutoOpen(), and AutoClose() functions. The code can then use the Shell command to execute a shell command, or even download data from the Internet. Therefore, malicious MS Office documents are often used as droppers; that is, they download malicious binaries from the Internet and execute them, passing control to them. Their authors obfuscate the code by adding unused code; using base64, hex, and octal encoding; breaking strings into smaller ones or into the results of functions; and even abusing MS Office-related functions to hinder analysis. A method to hide the execution of VBA code is to actually destroy the VBA source code in the document but leave the compiled version of the macro code, known as p-code. This method, known as VBA stomping and originally described by V. Bontchev 5, bypasses static analysers which simply try to extract the VBA code from a document; however, Office will still execute the payload from the p-code [7].
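As an illustration of how such files can be examined without opening them in Office, the sketch below uses Python's zipfile module to list embedded media and to check for the vbaProject.bin macro container, and the msoffcrypto library (the package behind msoffcrypto-tool, which is used later in the methodology) to transparently decrypt VelvetSweatshop-protected files. This is a hedged sketch under these assumptions, not the authors' tooling.

```python
import io
import zipfile

import msoffcrypto  # pip install msoffcrypto-tool


def open_decrypted(path: str) -> io.BytesIO:
    """Return a file-like object with the plaintext OOXML content.

    Files encrypted with the default 'VelvetSweatshop' password (see the
    Excel bug discussed above) are transparently decrypted; other files
    are returned as-is.
    """
    with open(path, "rb") as fh:
        office = msoffcrypto.OfficeFile(fh)
        if office.is_encrypted():
            office.load_key(password="VelvetSweatshop")
            out = io.BytesIO()
            office.decrypt(out)
            out.seek(0)
            return out
        fh.seek(0)
        return io.BytesIO(fh.read())


def inspect(path: str):
    data = open_decrypted(path)
    with zipfile.ZipFile(data) as zf:
        names = zf.namelist()
        # Embedded images typically live under word/media/ or xl/media/.
        images = [n for n in names if "/media/" in n]
        # The presence of vbaProject.bin indicates VBA macros.
        has_vba = any(n.endswith("vbaProject.bin") for n in names)
    return images, has_vba
```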
Finally, one may inject a malicious payload using Dynamic Data Exchange (DDE), exploiting several MS Office vulnerabilities linked to it, such as CVE-2017-8759, CVE-2017-11292, and CVE-2017-11826, or use XLM macros. To detect malicious office documents, many researchers use natural language processing methods [12, 22, 13, 15] to detect the presence of obfuscated code in VBA macros, which is linked with the document being malicious. Due to the imbalances in the datasets available for training machine learning algorithms to detect such files, Mimura [14] recently proposed a method using Generative Adversarial Networks to generate fake samples with similar properties. In another research line, researchers try to exploit n-grams [1] of the documents or other similar features such as entropy and other byte-level statistics over fragments of the data stream [18]. For more details on the threats from malicious documents [16] and the detection of malicious documents, the interested reader may refer to [19]. Traditionally, hash functions provide an easy way to deterministically collect a small "sample" from a data stream that can be used to identify it and differentiate it from others with overwhelming probability. Nevertheless, for images we may tolerate small variations as long as the actual content remains the same. In this regard, we want to create a hash that remains the same after small image manipulations, e.g. rotation, cropping, or some light colour distortion. This type of hashing is called perceptual hashing, and it is widely used to facilitate tasks such as image search, retrieval, and authentication. Therefore, perceptual hashing is ideal for finding similar images. Generally, we split the image into a grid and extract features from each segment. These are then transformed into a vector which is translated into the perceptual hash of the image. There are various methods for perceptual hashing, which in general fall into five main categories. According to Du et al. [4], these are invariant feature transform-based methods, local feature point-based methods, dimension reduction-based methods, statistical feature-based methods, and learning-based methods. In what follows, we present the basic concept behind our approach and then detail our methodology and the collected dataset. In many attacks, the adversary tries to create an asymmetry between the effort needed to launch an attack and the victim's effort to detect and mitigate it. For instance, in the scenarios that we are investigating, the attacker tries to create many documents with small variations in the VBA code, the metadata, and the text, which lead to different hashes of the attachment. This way, one cannot simply blacklist a specific hash of a file or some code fragment. While we understand that the latter task can be easily achieved by an adversary who already has many tools for this purpose, we argue that the same does not apply to the visual part of the document. For the VBA code to be executed, the attacker has to convince the victim to enable the content. Therefore, random or unclear images without visual guidance to enable the content would significantly decrease the attack's success. Based on the above, we argue that by detecting the visual part of the attack, we increase the adversary's effort and can create much more efficient filters.
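The paper does not name a specific perceptual-hashing implementation; purely as an illustration (the file names are placeholders), the Python imagehash library computes such hashes, and the Hamming distance between two hashes indicates how visually similar the images are despite minor manipulations.

```python
from PIL import Image
import imagehash  # pip install ImageHash

# Perceptual hashes of two versions of the same lure image: a slight colour
# change or re-encoding alters the SHA-256 digest but barely moves the pHash.
h1 = imagehash.phash(Image.open("lure_original.png"))
h2 = imagehash.phash(Image.open("lure_recoloured.png"))

# Subtracting ImageHash objects yields the Hamming distance between the
# 64-bit hashes; a small threshold tolerates minor distortions.
if h1 - h2 <= 8:  # the threshold is illustrative, not taken from the paper
    print("visually (near-)identical images:", h1, h2)
```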
As a result, we aim to extract the images that are contained in a document or a spreadsheet and correlate the information to i) create more efficient IoCs, and ii) identify the use of common tools as well as cooperation between different malicious actors and campaigns. Therefore, our hypotheses are the following: 1. The images contained in an MS Office document can be a good indicator that a file is malicious. 2. Malicious campaigns and threat actors reuse images in campaigns due to the use of the same tools as well as cooperation. To assess the applicability and efficacy of our hypotheses, we collected a large dataset of malicious files from Triage and Malware Bazaar. The office documents belong to 16 different campaigns, as illustrated in Table 1. After collecting these samples, we exploited the fact that most office files are actually ZIP files, so the stored multimedia can be easily exported without tampering with the files and without opening them in any virtual machine. Since several of the samples were XLSB files encrypted with the VelvetSweatshop password (see Section 2.1), we utilised msoffcrypto-tool 6 to decrypt the content and create a typical Excel file, without distorting its contents. This method is very efficient and lightweight; it guarantees that we extract the artefacts in a forensically secure manner and that no malicious code can be executed from the sample. Once the images have been collected from each sample, we compute each image's hash (using SHA-256) and its perceptual hash, and then proceed with the text of the image. To this end, we detect the text's language, extract the text of the image, and translate it, where applicable. The extracted information is stored in a database and used for correlating, assessing, and clustering, as discussed in the following paragraphs. In the following sections, we provide an exploratory analysis of our dataset as well as a description of the methods used to process the data and the corresponding outcomes. As described in Table 1, we collected samples from different malware families. Moreover, each of these samples contained a set of images. In this regard, Figure 6 depicts the number of images collected from each family. As can be observed, Qbot, Emotet, and Smokeloader were the most populated families. It is worth noting, though, that some families used more images per sample on average. For instance, Emotet has more samples than Qbot in our dataset, yet the latter exhibited a higher number of images. The next experiment focused on how many times the same image was used in the collected samples. In this regard, Figure 7 shows the number of times each image was detected in our dataset, according to its SHA-256 hash. Clearly, there are two hashes, namely 3eb3cd078172... and 49ad87680a..., which appeared approximately 4650 times. These hashes belong mainly to Qbot and Smokeloader, which used them exactly 3638 and 998 times, respectively. Moreover, they were used by ZLoader and Hancitor, yet only in very few samples. Therefore, these families used the same image multiple times in their campaigns. The next most used image appeared 100 times, and the count decreases smoothly after that value. In total, we found 623 images appearing only once among the 1090 unique images collected from our samples. However, the latter does not reflect the actual situation, as malicious actors manipulate the images to introduce minor distortions, most of which produce no observable change for the human eye.
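As a rough sketch of the per-image processing just described (exact hash plus text extraction), the snippet below uses hashlib and pytesseract; the binarisation threshold and the keyword list are simplified placeholders, and the language detection/translation step is omitted. The perceptual hash would be computed as in the earlier imagehash sketch and stored in the same record.

```python
import hashlib

import pytesseract        # pip install pytesseract (requires the Tesseract binary)
from PIL import Image

SUSPICIOUS = ("enable content", "enable macros")  # keywords named in Section 4.3


def image_record(path: str) -> dict:
    """Build a per-image record for the database: exact hash, OCR text, keyword flag."""
    with open(path, "rb") as fh:
        raw = fh.read()
    img = Image.open(path).convert("L")                 # greyscale
    img = img.point(lambda p: 255 if p > 160 else 0)    # naive threshold binarisation
    text = pytesseract.image_to_string(img)
    return {
        "sha256": hashlib.sha256(raw).hexdigest(),
        "text": text,
        "suspicious": any(k in text.lower() for k in SUSPICIOUS),
    }
```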
This way, even if the payload is the same, the resulting file has a different hash. Therefore, to correlate the collected images and bypass these slight modifications, we computed their perceptual hashes and repeated the experiments that we previously performed for the SHA-256 hashes. In this regard, Figure 8 shows the number of images and their appearances in our dataset. As can be observed, many of them appear more than 100 times, contrary to the behaviour depicted in Figure 7. Moreover, the number of perceptual hashes appearing only once is 304 out of a total of 526 unique perceptual hashes, which also showcases the strong similarity of some images and supports the use of perceptual hashes in this context. In addition, Figure 9 depicts the number of images per family according to their perceptual hash. From the comparison between the statistics shown in Figure 6 and those represented in Figure 9, it is evident that several families used the same images across different campaigns, given the dramatic reduction in the number of images when removing duplicates. Yet, the set of unique images differs substantially between families. For instance, a considerable number of Qbot, Smokeloader, and Emotet samples were processed, yet such families tend to use the same subset of images in their corresponding samples, a fact which is evidenced in Figures 8 and 9, and also by the drop in the number of images appearing only once (i.e. from 623 to 304). In the next paragraphs, we explore the connections between the different families and the subsets of images used by more than one family. Figure 10 shows the number of times that a unique image was used by different families. In the case of images identified by their SHA-256, we found 36 unique hashes used by more than one family, and 41 in the case of their perceptual hashes. Moreover, as can be seen in Figure 10, the heat map is denser in the case of perceptual hashes (cf. Figure 10b), denoting that different families are using almost identical images with slight modifications, which are obviously enough to change the SHA-256 digest of the image but not its perceptual hash. Note that some images differ only in a few bits due to, e.g. a minor colour change in some pixels. Moreover, note that in Figure 10 we do not consider the repeated use of an image by different families, which would yield much higher numbers due to the use of the same, or almost identical, images in different campaigns. According to Figure 10a, the families that shared the most unique images were Gozi Isfb and Dridex, closely followed by ZLoader and Trickbot. Moreover, Loki and formbook also use the same subset of images in several campaigns. These correlations are strengthened when using perceptual hashes to represent the images, as observed in Figure 10b. In this regard, images that were slightly different are now captured and correlated between Gozi Isfb and Dridex. In addition, we discovered that very similar images were used by different families, showcasing the possibility that either the same perpetrators are behind different campaigns, or that malicious actors are reusing previously published samples and materials to enhance their malware. To quantify the coincidences between different images and families, we performed another measurement. In this case, we collected the images according to their perceptual hash, searched in how many different samples they were used, and counted the co-occurrences between pairs of families.
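One possible way to implement this counting (an assumption, since the paper does not spell out the exact weighting) is sketched below: for every perceptual hash, each pair of families whose samples contain it is credited accordingly. The example in the next paragraph describes the same measurement.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: perceptual hash -> {family: number of samples containing it}
usage = {
    "c3a1f0e9d2b47c58": {"Qbot": 10, "ZLoader": 7, "Hancitor": 2},
    "9f27b61a44d0e3ac": {"Dridex": 5, "Gozi Isfb": 4},
}

co_occurrences = Counter()
for families in usage.values():
    for fam_a, fam_b in combinations(sorted(families), 2):
        # Credit the pair with the samples of the less frequent family for this
        # image (one plausible weighting; the paper's exact rule is not given).
        co_occurrences[(fam_a, fam_b)] += min(families[fam_a], families[fam_b])

print(co_occurrences.most_common())
```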
For instance, if image a appears in 10 samples of the Qbot family and in 10 samples of another family, this pair of families is credited with the corresponding co-occurrences. The results of this measurement are depicted in Figure 11. As can be observed, some families lead in the number of co-occurrences, namely Qbot, ZLoader, Smokeloader, and Hancitor. Therefore, we can establish a high interconnection between such families considering both the number of unique hashes (cf. Figure 10b) and the number of samples in which they were used (cf. Figure 11). It is worth noting that Dridex and Gozi Isfb did not share a high number of coincidences in this measurement despite exhibiting the highest number of shared unique images (cf. Figure 10b); yet the latter is directly related to the nature and number of samples retrieved in our dataset. Therefore, to have both a global perspective of the collaborations between families and of the distribution of our dataset, the outcomes of both experiments need to be contrasted. It is worth noticing that, after establishing a connection between Qbot, ZLoader, and Smokeloader documents and upon closer inspection, we observed that, apart from the image, they shared more characteristics. More precisely, the name of their first sheet is always DocuSign, and they usually have two more hidden sheets. One of these sheets contains the XLM macros and the other the data used by these macros. Moreover, inspecting more samples revealed that for every different DocuSign image there is a specific set of macros that accompanies it, which enables us to predict the behaviour of the document. The community has assigned the name SilentBuilder to these XLS(S-M) documents. To assess the limits of our hypothesis regarding the malicious intent of images, we opted to perform a blind test on images that are used in benign files. The blind test involves determining whether a file is benign or malicious based on the images that it contains and the fact that it has macros or DDE. While there are millions of MS documents shared online, typical users do not use macros or DDE in their documents. Therefore, public samples of such documents are very sparse. Nevertheless, we collected 890 such documents. In fact, we submitted these files to VirusTotal and Triage to validate that they are not malicious. Clearly, the aforementioned benign files introduce a bias against our methodology, as typical files would not meet these requirements; nonetheless, it is the best approach to stress the efficacy of our hypothesis. Following the methodology described in Section 3, we extracted the images of all the collected malicious Microsoft Office samples and then used their corresponding perceptual hashes to end up with 526 unique perceptual hashes. In addition, from the 890 benign samples we collected 2497 unique images with their corresponding perceptual hashes. Finally, we merged all these images with the malicious ones to end up with a dataset containing 3023 unique images. To automatically detect the malicious intent of these images blindly, that is without knowing any ground truth about the file, we leverage a text detection pipeline based on Tesseract 7. We therefore extract the text of the images and look for specific keywords. The first step of our analysis consisted of manually annotating the images from the malicious samples that ask users to activate or enable content to grant access to a fully functional version of a document. Subsequently, we marked as malicious a total of 159 images. Next, we used our text extraction pipeline, which first applies a transformation to the input image (i.e.
removing the transparent layer and applying a threshold-based binarisation). Subsequently, we used Tesseract to extract the text of the image. Then we translated the text into English, where applicable, to search for specific keywords such as "enable content" or "enable macros". In the event that an image contains one of the keywords tagged as malicious, we label the image as dangerous. Finally, we applied our automated method to our dataset of unique images and compared its detection rate with the manual annotations. The outcomes of this comparison are depicted in Table 2. Our automated pipeline's accuracy is above 0.99, with only three false negatives, which were due to errors in the character recognition (e.g. "enabte content"). One solution for such a problem could be to, e.g. accept strings differing by one character, yet this kind of post-processing is left for future work. Note that, since the total number of malicious images is 159, the number of false positives has a big impact on the precision value (i.e. the malicious images amount to around 5.2% of the dataset). In our experiment, we had only 11 false positives. Such images contained several of the suspicious keywords, and 10 of them directly suggested that users enable macros, yet these images were found in benign samples.

Table 2: Outcomes of our proposed method.
Precision  Recall  Accuracy  F1-score
0.934      0.981   0.995     0.957

One of our method's advantages is that it does not require training, so there is no need to split our dataset, and we used it entirely for testing. Therefore, all the images were directly used in our experiment to compute the accuracy. Note that such an experiment covers the worst-case scenario, and thus our average outcomes could reach better values if we used, e.g. subsampling or n-fold splits. Nevertheless, we wanted to report the complete numbers for the sake of clarity and to stress the efficacy of our method. The predominant images found in our dataset are variants of MS Office-like text boxes asking users to enable content. Another subset of images used several combinations of colours, highlighted text, and text with transparency to hinder the text detection task. A further significant subset is composed of images with blurred backgrounds, which create the illusion that a real document will be unlocked if the user grants the corresponding permission. Finally, we also found other harmless images such as icons, business logos, and images belonging to step-by-step tutorials. It is worth noting that we found different languages corresponding to different international campaigns, written in various scripts including, but not limited to, Latin, Greek, Cyrillic, Bengali, and Japanese. Based on the above, our proposed method can achieve very good results even in the case of new, unknown malicious campaigns. In fact, the benign dataset that we used cannot be considered representative of real-world samples, as benign documents rarely use macros and DDE. Therefore, in a real-world setting, the outcomes of our blind test would be significantly better. The evolution of cybercrime into a huge underground economy has turned it into an actual industry. The above is justified by the collaboration between malware authors and the emergence of Malware-as-a-Service (MaaS) or Access-as-a-Service (AaaS) models where, for instance, malware authors "rent" or pass control of the compromised devices to their peers.
Moreover, for many malware families, the attribution of malware to an actor is not straightforward, and due to malware evolution and code exchange between groups, it becomes a very challenging task. For instance, Gozi has several variations with different capabilities [2], with occasional parallel campaigns of its variants. Emotet, one of the most notorious malware families, is another fine example of these exchanges. It shares the same loader with Gozi Isfb, Dridex, and BitPayme [21]; it has ties with Qbot [8] and Trickbot [3], and more recently with Ryuk [9, 17]. The latter is supported by our experiments, since we found correlations between such families in Section 4.2. More concretely, we found further correlations between families, especially in the case of, e.g. Qbot, ZLoader, Smokeloader, and Hancitor, which shared images across a high number of samples. Moreover, other families such as Dridex and Emotet also shared a relevant number of images between them and with further families such as ZLoader. Therefore, the amount of information shared between families is higher than expected, since other families like Loki and formbook were also correlated with, e.g. Qbot and ZLoader. In summary, macro malware campaigns share more similarities with one another than one would envision when compared with other malware campaigns. In the aforementioned campaigns, as well as the rest included in our dataset, the first step of the attack takes place once the user opens an MS Office document and enables the content. If the user is not convinced to enable the content, the attack does not start. Therefore, the malicious actors try to present a convincing message that such an action is necessary because, e.g. a system error has occurred. Our proposed method is very lightweight: to determine whether a file is malicious, one uses a small signature which consists of the perceptual hashes of the images that it contains. If an image is in the database and the file contains VBA code, the file is automatically flagged as malicious without the need to investigate the code. The latter can be easily checked by, e.g. the presence of vbaProject.bin among the compressed files that comprise the document. Moreover, while the presence of obfuscated code implies that the document must be executed in a sandbox to determine which family it belongs to, our approach can classify it far more easily via the hashes. If the perceptual hash does not exist in our database, we apply the method described in Section 4.3 to determine the threat level of the sample's images. Perhaps the most important contribution of our work is that our approach imposes a significant additional effort on malware authors. The images that they use are easily flagged, and creating new and convincing ones is far from trivial. This way, even if a sample contains images which are not known, one may easily introduce them to the blacklist in the presence of specific keywords. Indeed, the latter exhibited an almost perfect efficacy. In this regard, the effort of the adversary is significantly increased. The automatic reuse and minor tampering of images are prevented, and the generation of convincing images cannot be automated. Finally, our work showcases the commonalities of malicious campaigns from another perspective. The existence of so many common images among campaigns shows that either the same tools are being used or that the same people are behind them, as the manipulation of the images cannot be the same for all of them. It should also be noted that some families, e.g.
Emotet, make each image unique, which signifies that they are aware that these images can serve as a signature; thus, each image has a unique hash. Future work will focus on the forensic analysis of tools used to generate malicious MS documents and on deobfuscation methods. To this end, tools like Evil Clippy 8 and LuckyStrike 9 will be examined to determine their use in malicious campaigns. Moreover, we plan to further examine the attack surface provided by MS Office documents, as, e.g. the use of remote resources from XML has already been exploited to leak information without even opening the file 10.

References
Automated microsoft office macro malware detection using machine learning
The malware with a thousand faces
A one-two punch of emotet, trickbot, & ryuk stealing & ransoming data
Perceptual hashing for image authentication: A survey
Office open xml file formats
Wild wide web consequences of digital fragmentation
Vba stomping: Advanced malicious document techniques
An old bot's nasty new tricks: Exploring qbot's latest attack methods
Understanding the relationship between emotet, ryuk and trickbot
International Organization for Standardization. Information technology - document description and processing languages - office open xml file formats - part 1: Fundamentals and markup language reference
Internet Crime Complaint Center (IC3). 2019 internet crime report
Obfuscated vba macro detection using machine learning
Using sparse composite document vectors to classify vba macros
Using fake text vectors to improve the sensitivity of minority class for macro malware detection
Towards efficient detection of malicious vba macros with lsi
Office document security and privacy
Analysing the fall 2020 emotet campaign. CoRR, abs
Towards a malicious email attachment detection engine
Malware detection in pdf and office documents: A survey
Cybercrime losses: An examination of us manufacturing and the total economy
Detecting malicious windows commands using natural language processing techniques
Velvetsweatshop: Default passwords can still make a difference

This work was supported by the European Commission under the Horizon 2020 Programme (H2020), as part of the projects CyberSec4Europe (Grant Agreement no. 830929) and LOCARD (Grant Agreement no. 832735). The content of this article does not reflect the official opinion of the European Union. Responsibility for the information and views expressed therein lies entirely with the authors.