Cortis, Keith; Davis, Brian. Over a decade of social opinion mining: a systematic review. Artif Intell Rev, 2021-06-25. DOI: 10.1007/s10462-021-10030-2

Social media popularity and importance are on the increase due to people using it for various types of social interaction across multiple channels. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Through Social Opinion Mining, natural language can be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research which totals 485 published studies and spans a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, and other aspects derived. Social Opinion Mining can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management, and in multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. The latest developments in Social Opinion Mining beyond 2018 are also presented together with future research directions, with the aim of leaving a wider academic and societal impact in several real-world applications.

Social media is increasing in popularity and also in its importance. This is principally due to the large number of people who make use of different social media platforms for various types of social interaction. Kaplan and Haenlein define social media as ''a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, which allows the creation and exchange of user generated content'' (Kaplan and Haenlein 2010). This definition fully reflects that social media platforms are essential for online users to submit their views and also read the ones posted by other people about various aspects and/or entities, such as opinions about a political party they are supporting in an upcoming election, recommendations of products to buy, restaurants to eat in and holiday destinations to visit. In particular, people's social opinions as expressed through various social media platforms can be beneficial in several domains, used in several applications and applied in real-life scenarios. Therefore, mining of people's opinions, which are usually expressed in various media formats, such as textual (e.g., online posts, newswires), visual (e.g., images, videos) and audio, is a valuable business asset that can be utilised in many ways ranging from marketing strategies to product or service improvement. However, as indicated in Ravi and Ravi (2015), dealing with unstructured data, such as video, speech, audio and text, creates crucial research challenges.
This research area is evolving due to the rise of social media platforms, on which a considerable body of work already exists for the analysis of sentiment polarity. Moreover, researchers can gauge widespread opinions from user-generated content and better model and understand human beliefs and behaviour. Opinion Mining is regarded as a challenging Natural Language Processing (NLP) problem, in particular for social data obtained from social media platforms, such as Twitter 1 , and also for transcribed text. Standard linguistic processing tools were built and developed on newswires and review-related data, due to such data following stricter grammar rules. These differences should be taken into consideration when performing any kind of analysis (Balazs and Velásquez 2016). Social data is therefore difficult to analyse due to the short length of the text, the non-standard abbreviations used, the highly sparse representation of terms, the difficulty of identifying synonyms and other relations between terms, the emoticons and hashtags used, the lack of punctuation, and the use of informal text, slang, non-standard shortcuts and word concatenations. Hence, typical NLP solutions are not likely to work well for Opinion Mining. Opinion Mining-presently a very popular field of study-is defined by Liu and Zhang as ''the computational study of people's opinions, appraisals, attitudes, and emotions toward entities, individuals, issues, events, topics and their attributes'' (Liu and Zhang 2012). Social is defined by the Merriam-Webster Online dictionary 2 as ''of or relating to human society, the interaction of the individual and the group, or the welfare of human beings as members of society''. In light of this, we define Social Opinion Mining (SOM) as ''the study of user-generated content by a selective portion of society be it an individual or group, specifically those who express their opinion about a particular entity, individual, issue, event and/or topic via social media interaction''. Therefore, the research area of SOM is tasked with the identification of several dimensions of opinion, such as sentiment polarity, emotion, sarcasm, irony and mood, from social data which is represented in structured, semi-structured and/or unstructured data formats. Information fusion is the field tasked with researching efficient methods for automatically or semi-automatically transforming information from different sources into a single coherent representation, which can be used to guide the fusion process. This is important due to the diversity of data in terms of content, format and volume (Balazs and Velásquez 2016). Sections 1.1 and 1.2 provide information about SOM and its challenges. In addition, SOM is generally very personal to the individual responsible for expressing an opinion about an object or set of objects, thus making it user-oriented from an opinion point of view, e.g., a social post about an event on Twitter, a professional post about a job opening on LinkedIn 3 or a review about a hotel on TripAdvisor. 4 Our SOM research focuses on microposts-i.e. information published on the Web that is small in size and requires minimal effort to publish (Cano et al. 2016)-that are expressed by individuals on a microblogging service, such as Sina Weibo 5 or Twitter, and/or a social network service that has its own microblogging feature, such as Facebook 6 and LinkedIn.
In 2008, Pang and Lee had already identified the relevance between the field of ''social media monitoring and analysis'' and the body of work reviewed in Pang and Lee (2008), which deals with the computational treatment of opinion, sentiment and subjectivity in text. This work is nowadays known as opinion mining, sentiment analysis, and/or subjectivity analysis (Pang and Lee 2008). Other phrases, such as review mining and appraisal extraction, have also been used in the same context, whereas some connections have been found to affective computing (one of whose goals is to enable computers to recognise and express emotions) (Pang and Lee 2008). Merriam-Webster's Online Dictionary defines the terms 7 ''opinion'', ''view'', ''belief'', ''conviction'', ''persuasion'' and ''sentiment'' as meaning a judgement one holds as true. This shows that the distinctions in common usage between these terms can be quite subtle. In light of this, the three main research areas-opinion mining, sentiment analysis and subjectivity analysis-are all related and use multiple techniques taken from NLP, information retrieval, and structured and unstructured data mining (Ravi and Ravi 2015). However, even though these three concepts are broadly used as synonyms, and thus interchangeably, it is worth noting that their origins differ. Some authors also consider that each concept reflects a different understanding (Serrano-Guerrero et al. 2015) and a different notion (Tsytsarau and Palpanas 2012). We are in agreement with this, hence we felt that new terminology is required to properly specify what SOM means, as defined in Sect. 1. According to Cambria et al. (2013), sentiment analysis can be considered a very restricted NLP problem, where the polarity (negative/positive) of each sentence and/or of target entities or topics needs to be understood. On the other hand, Liu discusses that ''opinions are usually subjective expressions that describe people's sentiments, appraisals or feelings toward entities, events and their properties'' (Liu 2010). He further identifies two sub-topics of sentiment and subjectivity analysis, namely sentiment classification (or document-level sentiment classification) and subjectivity classification. SOM requires such classification methods to determine an opinion dimension, such as objectivity/subjectivity and sentiment polarity. For example, subjectivity classification is required to classify whether user-generated content, such as a product review, is objective or subjective, whereas sentiment classification is performed on subjective content to find the sentiment polarity (positive/negative) as expressed by the author of the opinionated text. In cases where the user-generated content is made up of multiple sentences, sentence-level classification needs to be performed to determine the respective opinion dimension. However, sentence-level classification is not suitable for compound sentences, i.e., sentences that express more than one opinion. For such cases, aspect-based opinion mining needs to be performed. Pang and Lee (2008) had already identified that the writings of Web users can be very challenging in their own way due to numerous factors, such as the quality of written text, discourse structure and the order in which different opinions are presented. The latter factor can result in a completely opposite overall sentiment polarity, where order effects can completely overwhelm frequency effects.
This is not the case in traditional text classification, where if a document refers to the term ''car'' frequently, the document is probably somewhat related to cars. Therefore, order dependence manifests itself at a more fine-grained level of analysis. Liu (2010) mentions that complete sentences (for reviews) are more complex than short phrases and contain a large amount of noise, thus making it more difficult to extract features for feature-based sentiment analysis. Even though we agree that with more text comes a higher probability of spelling mistakes, we tend to disagree that shorter text, such as microposts, contains less noise. The process of mining user-generated content posted on the Web is very intricate and challenging due to the limits placed on short textual content (e.g., tweets allowed up to 140 characters until October 2017), which at times force a user to resort to using short words, such as acronyms and slang, to make a statement. These often lead to further issues in the text, such as misspellings, incomplete content, jargon, incorrect acronyms and/or abbreviations, emoticons and content misinterpretation (Cortis 2013). Other noteworthy challenges include swear words, irony, sarcasm, negation, conditional statements, grammatical mistakes, use of multiple languages, incorrect language syntax, syntactically inconsistent words, and different discourse structures. In fact, when informal language is used in the user-generated content, the grammar and lexicon vary from the standard language normally used (Dashtipour et al. 2016). Moreover, user-generated text exhibits more language variation and is less grammatical than longer posts, where the aforementioned emoticons and abbreviations, together with hashtags and inconsistent capitalisation, can form an important part of the meaning (Maynard et al. 2012). Maynard et al. (2012) also point out that microposts are in some sense the most challenging type of text for text mining tools, especially for opinion mining, since they do not contain a lot of contextual information and assume much implicit knowledge. Another issue is ambiguity, since microposts such as tweets do not follow a conversation thread. This isolation from other tweets makes it more difficult to make use of coreference information, unlike in blog posts and comments. Due to the short textual content, features can also be sparse and difficult to find and use in terms of text representation. In addition, the majority of microposts usually contain information about a single topic due to the length limitation, which is not the case in traditional blogs, which contain information on more than one topic given that they do not face the same length limitations. Big data challenges, such as handling and processing large volumes of streaming data, are also encountered when analysing social data (Bravo-Marquez et al. 2014). Limited availability of labelled data and the evolving nature of social streams usually result in the target concept changing, which requires the learning models to be constantly updated. In light of the above, social networking services, and the way in which their users generate content, bring several issues and challenges with them. Therefore, several Information Extraction (IE) tasks, such as Named Entity Recognition (NER) and Coreference Resolution, might be required to carry out multi-dimensional SOM.
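To make these preprocessing challenges concrete, the following minimal sketch shows the kind of normalisation step that is typically applied to microposts before any opinion dimension is classified. It is a generic illustration, not the pipeline of any study reviewed here: the emoticon map and abbreviation dictionary are tiny hypothetical stand-ins for the much larger, curated resources that real systems rely on.

```python
import re

# Hypothetical resources: a small emoticon-to-token map and a slang/abbreviation
# dictionary; real systems would use much larger, curated lexicons.
EMOTICONS = {":)": " positive_emoticon ", ":(": " negative_emoticon ", ":D": " positive_emoticon "}
ABBREVIATIONS = {"u": "you", "gr8": "great", "pls": "please"}

def normalise_micropost(text: str) -> str:
    """Apply a few of the normalisation steps discussed above to a noisy micropost."""
    # Map emoticons to placeholder tokens before lower-casing strips their shape.
    for emoticon, token in EMOTICONS.items():
        text = text.replace(emoticon, token)
    text = text.lower()
    # Replace user mentions and URLs with generic tokens to reduce sparsity.
    text = re.sub(r"@\w+", " user_mention ", text)
    text = re.sub(r"https?://\S+", " url ", text)
    # Keep the hashtag word itself, since it often carries sentiment (e.g., #fail).
    text = re.sub(r"#(\w+)", r"\1", text)
    # Collapse character elongations such as "loooove" to at most two repetitions.
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Expand known slang/abbreviations token by token.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

print(normalise_micropost("@alice u r gr8!! loooove this :) #win https://t.co/x"))
```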
In fact, several shared evaluation tasks are being organised to try and reach a standard mechanism for performing IE tasks on noisy text, which is very common in user-generated social media content. As already discussed in detail above, such tasks are much harder to solve when they are applied to micro-text like microposts (Ravi and Ravi 2015). This problem presents serious challenges on several levels, such as performance. An example of such a task is ''Named Entity Recognition in Twitter'' 8 . In terms of content, social media-based studies present only analysis and results from a selective portion of society, since not everyone uses social media. Moreover, several cross-cultural differences and factors determine social media usage in each country and hence the results of such studies. For example, in the political domain, these services are used predominantly by young and politically active individuals or by ones with strong political views. This was easily reflected in the Brexit results, where the majority of the younger generation (age 18-44) voted to remain in the European Union, as opposed to people over the age of 45. Such a result falls in line with the latest United Kingdom social media statistics, such as for Twitter, where 72% of the users are between the ages of 15 and 44, whilst for Facebook the most popular age group is 25-34 (26% of users) (Hürlimann et al. 2016). However, results of similar studies in other cultures and languages might differ due to the different use of social words to reflect a general opinion, sentiment polarity and/or emotion. In light of the above, it is noteworthy that no systematic review within this newly defined domain exists, even though there are several good survey papers (Liu and Zhang 2012; Tsytsarau and Palpanas 2012; Medhat et al. 2014; Ravi and Ravi 2015). The research paper by Bukhari et al. (2016) is closest to a systematic review in this domain, whereby the authors performed a search over the ScienceDirect and SpringerLink electronic libraries for the terms ''sentiment analysis'', ''sentiment analysis models'' and ''sentiment analysis of microblogs''. As a result, we felt that the SOM domain well and truly deserves a thorough systematic review that captures all of the relevant research conducted over the last decade. This review also identifies the current literature gaps within this popular and constantly evolving research domain. The structure of this comprehensive systematic review is as follows: Sect. 2 presents the research method adopted to carry out this review, followed by Sect. 3, which provides a thorough review analysis of the main aspects derived from the analysed studies. This is followed by Sect. 4, which focuses on the different dimensions of social opinions as derived from the analysed studies, and Sect. 5, which presents the application areas where SOM is being used. Lastly, Sect. 6 discusses the latest developments in SOM (beyond the period covered by the systematic review) and future research directions as identified by the authors. This survey paper about SOM adopts a systematic literature review process. This empirical research process was based on the guidelines and procedures proposed by Kitchenham (2004), Brereton et al. (2007), Dyba et al. (2007) and Attard et al. (2015), which are focused on the software engineering domain. The systematic review process, although more time consuming, is reproducible, minimising bias and maximising internal and external validity.
The procedure undertaken was structured as follows and is explained in detail within the sub-sections below:

1. Specification of research questions;
2. Generation of the search strategy, which includes the identification of electronic sources (libraries) and the selection of relevant search terms;
3. Application of the relevant search;
4. Choice of primary studies via the utilisation of inclusion and exclusion criteria on the obtained results;
5. Extraction of required data from primary studies;
6. Synthesis of data.

A systematic literature review is usually characterised by an appropriate generic ''research question, topic area, or phenomenon of interest'' (Kitchenham 2004). This question can be expanded into a set of sub-questions that are more clearly defined, whereby all available research relevant to these sub-questions is identified, evaluated and interpreted. The goal of this systematic review is to identify, analyse and evaluate current opinion mining solutions that make use of social data (data extracted from social media platforms). In light of this, the following generic research question is defined:

• What are the existing opinion mining approaches which make use of user-generated content obtained from social media platforms?

The generic question above can be subdivided into the following specific sub-questions:

1. What are the existing approaches that make use of social data for opinion mining and how can they be classified 9 ?
2. What are the different dimensions/types of social opinion mining?
3. What are the challenges faced when performing opinion mining on social data?
4. What techniques, datasets, tools/technologies and resources are used in the current solutions?
5. What are the application areas of social opinion mining?

The search strategy for this systematic review is primarily directed at published papers, which consist of journal articles, conference/workshop proceedings, or technical reports. The following electronic libraries were identified for use, due to their wide coverage of relevant publications within our domain: ACM Digital Library 10 , IEEE Xplore Digital Library 11 , ScienceDirect 12 , and SpringerLink 13 . The first three electronic libraries listed were used by three out of the four systematic reviews that our research process was based on (and which made use of a digital source), whereas SpringerLink is one of the most popular sources for publishing work in this domain (as will be seen in Sect. 2.4 below). Moreover, three other electronic libraries were considered for use: two of them-Web of Science 14 and Ei Compendex 15-to which the host university did not have access, and Google Scholar 16 , which was not included since its content is obtained from the electronic libraries listed above (and more), thus making the process redundant. The relevant search terms were identified for answering the research questions defined in Sect. 2.1. In addition, these questions were also used to perform some trial searches before the following list of relevant search terms was determined: 12. ''Twitter opinion mining''; 13. ''Social data analysis''. The following are important justifications behind the search terms selected above:

• ''opinion mining'' and ''sentiment analysis'': both are included due to the fact that these key terms are used interchangeably to denote the same field of study (Pang and Lee 2008; Cambria et al. 2013), even though their origins differ and hence do not refer to the same concept (Serrano-Guerrero et al.
2015);
• ''microblog'', ''social network'' and ''Twitter'': the majority of opinion mining and/or sentiment analysis research and development efforts target these two kinds of social media platforms (microblogging and social networking services), in particular the Twitter microblogging service.

The ''OR'' Boolean operator was chosen to formulate the search string. The search terms were all linked using this operator, making the search query simple and easy to use across multiple electronic libraries. Therefore, a publication only had to include any one of the search terms to be retrieved (Attard et al. 2015). In addition, this operator is more suitable for the defined search terms given that this study is not a general one, e.g., about opinion mining in general, but is focused on opinion mining in a social context. Construction of the correct search string (and terms) is very important, since this eliminates noise (i.e., false positives) as much as possible while still retrieving potentially relevant publications, which increases recall. Several other factors had to be taken into consideration during the application of the search terms on the electronic libraries. The following is a list of factors relevant to our study, identified in Brereton et al. (2007) and verified during our search application process:

• Electronic library search engines have different underlying models, and thus do not always provide the required support for systematic searching;
• The same set of search terms cannot be used across multiple engines, e.g., complex logical combinations are not supported by the ACM Digital Library but are by the IEEE Xplore Digital Library;
• The Boolean search string is dependent on the order of terms, independent of brackets;
• There are inconsistencies in the ordering or relevance of search results (e.g., IEEE Xplore Digital Library results are sorted in order of relevance);
• Certain electronic libraries treat multiple words as a Boolean term and look for instances of all the words together (e.g., ''social opinion mining''). In this case, the use of the ''AND'' Boolean operator (e.g., ''social AND opinion AND mining'') looks for all of the words but not necessarily together.

Given the above, it was very important to select a search strategy that is appropriate to the review's research question and that could be applied to the selected electronic libraries. When applying the relevant search on top of the search strategy defined in Sect. 2.2, another important element was to identify appropriate metadata fields upon which the search string can be executed. Table 1 presents the ones applied in our study. Applying the search on the title metadata field alone would result in several missed and/or incorrect results. Therefore, using the abstract and/or keywords in the search is very important to reduce the number of irrelevant results. In addition, this ensures that significant publications that lack any of the relevant search terms within their title are returned. A separate search method was applied for each electronic library, since they all offer different functionalities and have different underlying models. Each method is detailed below:

• ACM: Separate searches for each metadata field were conducted and the results were merged (duplicates removed). The reason is that the metadata field search functionality ''ANDs'' all metadata fields, whereas manually editing the search query does not work well when amended.
• IEEE: Separate searches for each metadata field were conducted and the results were merged (duplicates removed).
• ScienceDirect: One search that takes into consideration all of the chosen metadata fields.
• SpringerLink: By entering a search term or phrase, a search is conducted over the title, abstract and full text (including authors, affiliations and references) of every article and book chapter. This was noted in the large number of returned papers (as will be discussed in the next sub-section), which results in a high number of false positives (and possibly a higher recall).

A manual study selection was performed on the primary studies obtained from the search application defined in Sect. 2.3. This is required to eliminate any studies that might be irrelevant even though the search terms appear in one of the metadata fields defined in Table 1 above. Therefore, inclusion and exclusion criteria (listed below) were defined. Published papers that meet any of the following inclusion criteria are chosen as primary studies:

• I1. A study that targeted at least one social networking service and/or utilised a social dataset besides other social media services, such as blogs, chats and wikis. Please note that only work performed on social data from social networking services is taken into consideration for the purposes of this review;
• I2. A study published from the year 2007 onwards. This year was chosen since the mid-2000s saw the evolution of several social networking services, in particular Facebook's growth (2007), which currently has the highest number of monthly active users;
• I3. A study published in the English language.

Published papers that satisfy any of the exclusion criteria from the following list are removed from the systematic review:

• E1. A study published before 2007;
• E2. A study that does not focus on performing any sort of opinion mining on social media services, even though it mentions some of the search terms;
• E3. A study that focuses on opinion mining or sentiment analysis in general, i.e., with no reference to a social context;
• E4. A study that is only focused on social data sources obtained from online forums, communities, blogs, chats, social news websites (e.g., Slashdot 17) or review websites (e.g., IMDb 18);
• E5. A study that consists of either a paper's front cover and/or title page.

Selection of the primary studies for this systematic review was carried out in 2019. Therefore, studies indexed or published from 2019 onwards are not included in this review. Table 2 shows the results for each electronic library at each step of the procedure used for selecting the final set of primary studies. The results included one conference proceedings volume, which was resolved by including all the published papers within the track relevant to this study 19 , since the other papers were not relevant and thus not included in the initial results. The search application phase resulted in a total of 861 published papers. False positives, which consist of duplicate papers and papers that meet any of the exclusion criteria, were removed. This was done through a manual study selection which was performed on all the metadata fields considered, i.e., the title, abstract and keywords. In cases where we were still unclear on whether a published paper was valid or not, we went through the full text. This study selection operation left us with 460 published papers, where the number of false positives totalled 401. Out of the final selection of published papers, we did not have full access to 9 published papers, thus reducing the total primary studies to 451.
In addition to the primary studies selected from the electronic libraries, we added a set of relevant studies-34 published papers (excluding survey papers)-for completeness' sake, which were either published in reputable venues within the Opinion Mining community or were highly cited. Therefore, the final set of primary studies totals 485 published papers. The main objective of this study is to conduct a systematic analysis of the current literature in the field of SOM. Each published paper in this review was analysed in terms of the following information/parameters: social media platforms, techniques and approaches, social datasets, language, modality, tools and technologies, (other) NLP tasks, application areas and opinion mining dimensions. It is important to note that this information was manually extracted from each published paper. In the sub-sections below we discuss the overall statistics about the relevant primary studies that resulted from the study selection phase of this systematic review. Figure 1 shows that the first three years of this evaluation period, i.e., 2007-2009, did not return any relevant literature. It is important to note that 2006 and 2007 were the period when opinion mining emerged in Web applications and weblogs within multiple domains, such as politics and marketing (Pang and Lee 2008). However, 2010-a year which coincides with the introduction of various social media platforms and the major increase in Facebook and Twitter usage 20-resulted in the first relevant literature, and these figures kept increasing in the following years. Please note that the final year in evaluation, that is 2018, contains literature that was published or indexed up until the 31st December 2018. Of the twelve full years evaluated, 2018 produced the highest number of relevant literature. This shows the importance of opinion mining on social data, and reflects the continuous increase in social media usage and popularity, in particular of social networking services. Moreover, SOM solutions are on the increase for various real-world applications. The additional set of studies included in this systematic review were published in the period between 2009 and 2014. These came from various publishers, namely the four selected for this study (ACM, IEEE Xplore, ScienceDirect and SpringerLink) and other popular ones, such as the Association for the Advancement of Artificial Intelligence (AAAI) 21 , the Association for Computational Linguistics (ACL) 22 and the Wiley Online Library 23 . The data synthesis of this detailed analysis is based on the extracted data mentioned in Sect. 2.5.1 above, which is discussed in the subsequent sections. It must be noted that not all the published papers were considered in the analysis conducted. Therefore, this table is referenced in all of the different aspects of the data synthesised, as presented below. It presents the primary studies returned from each electronic library and the additional ones, together with the ones that do not have full access, survey papers, papers which present work that can be applied/used on social data, and papers originating from organised tasks within the domain. The in-depth analysis, which focused on the social media platforms, techniques, social datasets, language, modality, tools and technologies, NLP tasks and other aspects used across the published papers, is presented in Sects. 3.1-3.7.
Social data refers to online data generated from any type of social media platform be it from microblogging, social networking, blogging, photo/video sharing and crowdsourcing. Given that this systematic survey focuses on opinion mining approaches that make use of social networking and microblogging services, we identify the social media platforms used in the studies within this review. In total, 469 studies were evaluated with 66 from ACM, 155 from IEEE Xplore, 32 from ScienceDirect, 182 from SpringerLink and 34 additional ones. Papers which did not provide full access were excluded. Note that 4 survey papers-2 from ACM Zimbra et al. 2018) , 1 from IEEE Xplore (Wagh and Punde 2018), 1 from SpringerLink (Abdullah and Hadzikadic 2017)-and 2 SpringerLink organised/ shared task papers (Loukachevitch and Rubtsova 2015; Patra et al. 2015) were included, since the former papers focus on Twitter Sentiment Analysis methods whereas the latter papers focus on Sentiment Analysis of tweets (therefore the target social media platform of all evaluated papers is clear in both cases). None of the other 14 survey papers (Rajalakshmi et al. 2017; Yenkar and Sawarkar 2018; Abdelhameed and Muñoz-Hern'andez 2017; Rathan et al. 2017; Liu and Young 2018; Ravi and Ravi 2015; Nassirtoussi et al. 2014; Beigi et al. 2016; Lo et al. 2017; Ji et al. 2016; Batrinca and Treleaven 2015; Lin and He 2014) have been included, since various social media platforms were used in the respective studies evaluated. In addition, 2 papers that presented a general approach which can be applied/used on social data (i.e., not on any source) (Min et al. 2013; El Haddaoui et al. 2018 ) have also not been included. Out of these studies, 429 made use of 1 social media platform, whereas 32 made use of 2-4 social media platforms, as can be seen in Fig. 2 . With respect to social media platforms, in total 504 were used across all of the studies. These span over the following 18 different ones, which are also listed in Table 4 : 1. Twitter : a microblogging platform that allows publishing of short text updates (''microposts''); 2. Sina Weibo : a Chinese microblogging platform that is like a hybrid of Twitter and Facebook; 3. Facebook : a social networking platform that allows users to connect and share content with family and friends online; 4. YouTube 24 : a video sharing platform; 5. Tencent Weibo 25 : a Chinese microblogging platform; 6. TripAdvisor : a travel platform that allows people to post their reviews about hotels, restaurants and other travel-related content, besides offering accommodation bookings; 7. Instagram 26 : a platform for sharing photos and videos from a smartphone; 8. Flickr 27 : an image-and video-hosting platform that is popular for sharing personal photos; 9. Myspace 28 : a social networking platform for musicians and bands to show and share their talent and connect with fans; 10. Digg 29 : a social bookmarking and news aggregation platform that selects stories to the specific audience; 11. Foursquare 30 : formerly a location-based service and nowadays a local search and discovery service mobile application known as Foursquare City Guide; 12. Stocktwits 31 : a social networking platform for investors and traders to connect with each other; 13. LinkedIn 32 : a professional networking platform that allows users to communicate and share updates with colleagues and potential clients, job searching and recruitment; 14. Plurk 33 : a social networking and microblogging platform; 15. 
Weixin 34 : a Chinese multi-purpose messaging and social media app developed by Tencent; 16. PatientsLikeMe 35 : a health information sharing platform for patients; 17. Apontador 36 : a Brazilian platform that allows users to share their opinions and photos on social networks and also book hotels and restaurants; 18. Google+ 37 : formerly a social networking platform (shut down in April 2019) that included features such as posting photos and status updates, grouping different relationship types into Circles, organising events and location tagging. Overall, Twitter was the most popular, with 371 opinion mining studies making use of it, followed by Sina Weibo with 46 and Facebook with 30. Other popular platforms such as YouTube (12), Tencent Weibo (8), TripAdvisor (7), Instagram (6) and Flickr (5) were also used in a few studies. These results show the importance and popularity of microblogging platforms, such as Twitter and Sina Weibo, which are also very frequently used for research and development purposes in this domain. Such microblogging platforms provide researchers with the possibility of using an Application Programming Interface (API) to access social data, which plays a crucial role in their selection for such studies. On the other hand, data retrieval from other social media platforms, such as Facebook, is becoming more challenging due to ethical concerns. For example, Facebook access to the Public Feed API 38 is restricted and users cannot apply for it. For this analysis, 465 studies were evaluated: 65 from ACM, 154 from IEEE Xplore, 32 from ScienceDirect, 180 from SpringerLink and 34 additional ones. The studies excluded are the ones with no full access, surveys, and organised task papers. The main aim was to identify the technique/s used for the opinion mining process on social data. Therefore, they were categorised under the following approaches: Lexicon (Lx), Machine Learning (ML), Deep Learning (DL), Statistical (St), Probabilistic (Pr), Fuzziness (Fz), Rule (Rl), Graph (Gr), Ontology (On), Hybrid (Hy)-a combination of more than one technique-Manual (Mn) and Other (Ot). Table 5 provides the yearly statistics for all the respective approaches adopted. From the studies analysed, 88 developed and used more than 1 technique within their respective studies. These techniques include the ones originally used in their approach and/or ones used for comparison/baseline/experimentation purposes. In particular, from these 88 studies, 65 used 2 techniques each, 17 studies used 3 techniques, 4 studies used 4 techniques, and 2 studies made use of 5 techniques, totalling 584 techniques used across all studies (including the studies that used 1 technique). The results show that a hybrid approach is the most popular one, with over half of the studies adopting such an approach. This is followed by Machine Learning and Lexicon techniques, which are usually chosen to perform any form of opinion mining. These results are explained in more detail in the sub-sections below. In total, 94 unique studies adopted a lexicon-based approach to perform a form of SOM, which produced a total of 96 different techniques 39 . The majority of the lexicons used were specifically related to opinions and are well known in this domain, whereas the others that were not can still be used for conducting opinion mining. Table 6 presents the number of lexicons (first row and columns titled 1-8) used by the lexicon-based studies (second row).
The column titled ''Other/NA'' refers to any other lexicons used by these studies. Among the most frequently used lexicons are: 4. MPQA-Subjectivity 46 (Wilson et al. 2005)-used in 8 studies; 5. HowNet Sentiment Analysis Word Library (HowNetSenti) 47 -used in 6 studies; 6. NRC Word-Emotion Association Lexicon (also known as NRC Emotion Lexicon or EmoLex) 48 (Mohammad and Turney 2010, 2013), WordNet 49 (Miller 1995) and the Wikipedia list of emoticons 50 -used in 5 studies. In addition to the lexicons mentioned above, 19 studies used lexicons that they created as part of their work or that specifically focused on creating SOM lexicons, such as (Årup Nielsen 2011), who created the AFINN word list for sentiment analysis in microblogs, (Javed et al. 2014), who built a bilingual sentiment lexicon for English and Roman Urdu, (Santarcangelo et al. 2015), the creators of the first Italian sentiment thesaurus, one for Chinese sentiment analysis, and (Bandhakavi et al. 2016) for sentiment analysis on Twitter. These lexicons varied from social media focused lexicons (Ghiassi and Lee 2018; Pollacci et al. 2017), to sentiment and/or emoticon lexicons (Molina-González et al. 2014; Khuc et al. 2012; Ranjan et al. 2018; Vo et al. 2017; Feng et al. 2015; Wang and Wu 2015; Zhou et al. 2014) and extensions of existing state-of-the-art lexicons (Pandarachalil et al. 2015; Andriotis et al. 2014), such as the study that extended HowNetSenti with words manually collected from the internet, and (Pandarachalil et al. 2015), who built a sentiment lexicon from SenticNet 51 and SentiWordNet for slang words and acronyms. A total of 121 studies adopted a machine learning-based approach to perform a form of SOM, where several supervised and unsupervised algorithms were used. Table 7 below presents the number of machine learning algorithms (first row and columns titled 1-7) used by the machine learning-based studies (second row). The column titled ''NA'' refers to studies that do not provide any information on the exact algorithms used. In total, 239 machine learning algorithms were used (not distinct) across 117 studies (since 4 studies did not provide any information), with 235 being supervised and 4 unsupervised. It is important to note that this figure does not include any supervised/semi-supervised/unsupervised algorithms proposed by the respective authors, which are discussed below. Table 8 provides a breakdown of the 235 supervised machine learning algorithms (not distinct) that were used within these studies. The NB and SVM algorithms are clearly the most popular in this domain, especially for text classification. With respect to the former, it is important to note that 20 out of the 75 studies used the Multinomial NB (MNB), a model that is usually utilised for discrete counts, i.e., the number of times a given term (word or token) appears in a document. The other 55 studies made use of the Multi-variate Bernoulli NB (MBNB) model, which is based on binary data, where every token in a feature vector of a document is assigned a value of 0 or 1. As for SVM, this method looks at the given data and sorts it into two categories (binary classification).
If multi-class classification is required, the Support Vector Classification (SVC) 52 , NuSVC 53 or LinearSVC 54 algorithms are usually applied, where the ''one-against-one'' approach is implemented for SVC and NuSVC, whereas the ''one-vs-the-rest'' multi-class strategy is implemented for LinearSVC. The LoR statistical technique is also widely used in machine learning for binary classification problems; in total, 16 of the studies analysed made use of this algorithm. DT learning has also been very much in use; this model uses a DT for both classification and regression problems. There are various algorithms for building a DT: 2 studies used C4.5 (Quinlan 1993)-an extension of Quinlan's Iterative Dichotomiser 3 (ID3) algorithm-for classification purposes, 3 studies used J48, a simple C4.5 DT for classification (Weka's implementation 55 ), 2 used the Hoeffding Tree (Hulten et al. 2001) and the other 8 used the basic ID3 algorithm. MaxEnt, used by 12 studies, is a probabilistic classifier that is also used for text classification problems, such as sentiment analysis; more specifically, it is a generalisation of LoR for multi-class scenarios. RF was used in 9 studies; this supervised learning algorithm-which can be used for both classification and regression tasks-creates a forest (an ensemble of DTs) with a degree of randomness. Moreover, 7 studies used the KNN algorithm, one of the simplest classification algorithms, where no learning is required since the model structure is determined from the entire dataset. The SentiStrength algorithm, utilised by 5 studies (Lu et al. 2015; Baecchi et al. 2016; Yan et al. 2017), can be used in both supervised and unsupervised settings. In addition, the PA algorithm was used in 2 studies, the second of which is Filice et al. (2014). In the case of the former, this algorithm was used in a collaborative online learning framework to automatically classify whether a post is emotional or not, thereby overcoming challenges posed by the diversity of microblogging styles, which increases the difficulty of classification. The authors of the latter study (Filice et al. 2014) extend the budgeted PA algorithm to enable robust and efficient natural language learning processes based on semantic kernels. The proposed online learner was applied to two real-world linguistic tasks, one of which was sentiment analysis. Nine other algorithms were used by 7 different studies, namely Bagging, BN, CRB (Raja and Swamynathan 2016), AB and others. (The corresponding fragment of Table 8 gives the original reference for each supervised algorithm, e.g., SVM: Cortes and Vapnik (1995); Logistic Regression (LoR), 16 studies: McCullagh (1984); Decision Tree (DT), 15 studies: Quinlan (1986); Maximum Entropy (MaxEnt), 12 studies: Jaynes (1957); Random Forest (RF), 9 studies: Breiman (2001).) In terms of unsupervised machine learning algorithms, 4 were used in 2 of the 80 studies that used a machine learning-based technique. Suresh and Raj S. used the K-Means (KM) (Lloyd 1982) and Expectation Maximization (Dempster et al. 1977) clustering algorithms in Suresh (2016). Both were used for comparison purposes against an unsupervised modified fuzzy clustering algorithm proposed by the authors. The proposed algorithm produced accurate results without manual processing, linguistic knowledge or training time, which are required for supervised approaches.
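As an illustration of the supervised setup most frequently reported above (NB and SVM classifiers over simple token features), the following minimal sketch trains MNB, MBNB and a linear SVM with scikit-learn. The four labelled microposts are purely illustrative stand-ins for the datasets analysed in this review, and the sketch is a generic example of the technique rather than the pipeline of any particular study.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Purely illustrative labelled microposts; real studies use corpora such as STS or SemEval.
posts = ["love this phone, great battery", "worst service ever, so disappointed",
         "absolutely fantastic experience", "terrible app, keeps crashing"]
labels = ["positive", "negative", "positive", "negative"]

# Multinomial NB works on term counts, Bernoulli NB on binary presence/absence features,
# and LinearSVC implements the one-vs-the-rest strategy for multi-class problems.
models = {
    "MNB": make_pipeline(CountVectorizer(), MultinomialNB()),
    "MBNB": make_pipeline(CountVectorizer(binary=True), BernoulliNB()),
    "SVM": make_pipeline(CountVectorizer(), LinearSVC()),
}

for name, model in models.items():
    model.fit(posts, labels)
    print(name, model.predict(["great phone but terrible battery"]))
```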
The proposed algorithm produced accurate results without manual processing, linguistic knowledge or training time, which concepts are required for supervised approaches. Baecchi et al. (Baecchi et al. 2016 ) used two unsupervised algorithms, namely Continuous Bag-Of-Word (CBOW) (Mikolov et al. 2013) and Denoising Autoencoder (DA) (Vincent et al. 2008 ) (the SGD and backpropagation algorithms were used for the DA learning process), and supervised ones, namely LoR, SVM and SentiStrength, for constructing their method and comparison purposes. They considered both textual and visual information in their work on sentiment analysis of social network multimedia. Their proposed unified model (CBOW-DA-LoR) works in both an unsupervised and semi-supervised manner, whereby learning text and image representation and also the sentiment polarity classifier for tweets containing images. Other studies proposed their own algorithms, with some of the already established algorithms discussed above playing an important role in their implementation and/or comparison. Zimmermann et al. proposed a semi-supervised algorithm, the S*3Learner ) which suits changing opinion stream classification environments, where the vector of words evolves over time, with new words appearing and old words disappearing. Severyn et al. (2016) defined a novel and efficient tree kernel function, the Shallow syntactic Tree Kernel, for multi-class supervised sentiment classification of online comments. This study focused on YouTube which is multilingual, multimodal, multidomain and multicultural, with the aim to find whether the polarity of a comment is directed towards the source video, product described in the video or another product. Furthermore, Ignatov and Ignatov (2017) presented a novel DT-based algorithm, a Decision Stream, where Twitter sentiment analysis was one of several common machine learning problems that it was evaluated on. Lastly, Fatyanosa et al. (2018) enhanced the ability of the NB classifier with an optimisation algorithm, the Variable Length Chromosome Genetic Algorithm (VLCGA), thereby proposing VLCGA-NB for Twitter sentiment analysis. Moreover, the following 13 studies proposed an ensemble method or evaluated ensemble-based classifiers: • Ç eliktug (2018) of classifiers, which is more accurate than a single one for classifying tweets; • Neethu and Rajasree (2013) used an ensemble classifier (and single algorithm classifiers) for sentiment classification. Ensembles created usually result in providing more accurate classification answers when compared to individual classifiers, i.e., classic learning approaches. In addition, ensembles reduce the overall risk of choosing a wrong classifier especially when applying it on a new dataset (Da Silva et al. 2014 ). Deep learning is a subset of machine learning based on Artificial Neural Networks (ANNs) -algorithms inspired by the human brain-where there are connections, layers and neurons for data to propagate. A total of 35 studies adopted a deep learning-based approach to perform a form of SOM, where supervised and unsupervised algorithms were used. Twenty six (26) of the studies made use of 1 deep learning algorithm, with 5 utilising 2 algorithms and 2 studies each using 3 and 4 algorithms, respectively. Table 9 provides breakdown of the 50 deep learning algorithms (not distinct) used within these studies. LSTM, a prominent variation of the RNN which makes it easier to remember past data in memory, was used in 13 studies Sun et al. 2018; Sanyal et al. 
2018; Ameur et al. 2018; Wazery et al. 2018; Chen and Wang 2018; Sun et al. 2017; Hu et al. 2017; Shi et al. 2017; Yan and Tao 2016), thus making it the most popular deep learning algorithm amongst the evaluated studies. Three further studies (Ameur et al. 2018; Balikas et al. 2017) used the BLSTM, an extension of the traditional LSTM which can improve model performance on sequence classification problems. In particular, a BLSTM was used in Balikas et al. (2017) to improve the performance of fine-grained sentiment classification, an approach which can benefit sentiment expressed in different textual types (e.g., tweets and paragraphs), in different languages and at different granularity levels (e.g., binary and ternary). Similarly, another study proposed a language-independent method based on BLSTM models that incorporates preceding microblogs for context-aware Chinese sentiment classification. The Convolutional Neural Network (CNN) algorithm, a variant of the ANN, is made up of neurons that have learnable weights and biases, where each neuron receives an input, performs a dot product and optionally follows it with a non-linearity. In total, 12 studies (Ochoa-Luna and Ari 2018; Ameur et al. 2018; Adibi et al. 2018; Chen and Wang 2018; Shi et al. 2017; Wehrmann et al. 2017; Zhang et al. 2017; Stojanovski et al. 2015; Severyn and Moschitti 2015) made use of this algorithm. Notably, Wehrmann et al. (2017) propose a language-agnostic translation-free method for Twitter sentiment analysis. Recurrent Neural Networks (RNNs), a powerful set of ANNs useful for processing and recognising patterns in sequential data such as natural language, were used in 8 studies (Ochoa-Luna and Ari 2018; Piñeiro-Chousa et al. 2018; Wazery et al. 2018; Pavel et al. 2017; Shi et al. 2017; Yan and Tao 2016). One study in particular (Averchenkov et al. 2015) considered a novel approach to aspect-based sentiment analysis of Russian social networks based on RNNs, where the best results were obtained by using a special network modification, the RNTN. Two further studies, one of which is Sygkounas et al. (2016), also adopted deep learning algorithms. Wazery et al. (2018) and Yan and Tao (2016) used the RNN and LSTM, whereas Sun et al. (2018) and Chen and Wang (2018) proposed new models based on the CNN and LSTM. A total of 9 studies (Kitaoka and Hasuike 2017; Arslan et al. 2017; Raja and Swamynathan 2016; Yang et al. 2014; Bukhari et al. 2016; Zhang et al. 2015; Supriya et al. 2016) adopted a statistical approach to perform a form of SOM. In particular, one of the approaches proposed in Arslan et al. (2017) uses the term frequency-inverse document frequency (tf-idf) (Salton and McGill 1986) numerical statistic to find the important words within a tweet, in order to dynamically enrich Twitter-specific dictionaries created by the authors. The tf-idf is also one of several statistical-based techniques used in one study for comparing its proposed novel feature weighting approach for Twitter sentiment analysis. Moreover, Raja and Swamynathan (2016) focuses on a statistical sentiment score calculation technique based on adjectives, whereas Yang et al. (2014) use a variation of point-wise mutual information to measure the opinion polarity of an entity and its competitors, a method which differs from the traditional opinion mining approach. A total of 6 studies (Bhattacharya and Banerjee 2017; Baecchi et al. 2016; Ragavi and Usharani 2014; Lek and Poo 2013) adopted a probabilistic approach to perform a form of SOM. In particular, Ou et al.
(2014) propose a novel probabilistic model, the Content and Link Unsupervised Sentiment Model, where the focus is on microblog sentiment classification incorporating link information, namely behaviour, same user and friend. Two studies (D'Asaro et al. 2017; Del Bosque and Garza 2014) adopted a fuzzy-based approach to perform a form of SOM. D'Asaro et al. (2017) present a sentiment evaluation and analysis system based on fuzzy linguistic textual analysis. Del Bosque and Garza (2014) assume that aggressive text detection is a sub-task of sentiment analysis, which is closely related to document polarity detection given that aggressive text can be seen as intrinsically negative. This approach considers the document's length and the number of swear words as inputs, with the output being an aggressiveness value between 0 and 1. In total, 4 studies (El Haddaoui et al. 2018; Zhang et al. 2014; Min et al. 2013; Bosco et al. 2013) adopted a rule-based approach to perform a form of SOM. Notably, Bosco et al. (2013) applied an approach for automatic emotion annotation of ironic tweets. This relies on sentiment lexicons (words and expressions) and sentiment grammar expressed by compositional rules. Four studies (Vilarinho and Ruiz 2018; Dritsas et al. 2018; Rabelo et al. 2012) adopted a graph-based approach to perform a form of SOM. The study in Vilarinho and Ruiz (2018) presents a word graph-based method for Twitter sentiment analysis using global centrality metrics over graphs to evaluate sentiment polarity. In Dritsas et al. (2018), a graph-based method is proposed for sentiment classification at a hashtag level. Moreover, the authors of another study compare their proposed multimodal hypergraph-based microblog sentiment prediction approach with a combined hypergraph-based method (Huang et al. 2010). Lastly, Rabelo et al. (2012) used link mining techniques to infer the opinions of users. Two studies, including Kontopoulos et al. (2013), adopted an ontology-based approach to perform a form of SOM. In particular, the technique developed in Kontopoulos et al. (2013) performs more fine-grained sentiment analysis of tweets, where each subject within the tweets is broken down into a set of aspects, with each one being assigned a sentiment score. Hybrid approaches are very much in demand for performing different opinion mining tasks, where 244 unique studies (out of 465) adopted this approach and produced a total of 282 different techniques 56 . Tables 10 and 11 list these studies, together with the type of techniques used for each. In total, there were 38 different hybrid approaches across the analysed studies. The majority of these studies used two different techniques (213 out of 282)-see Table 10-within their hybrid approach, whereas 62 used three and 7 studies used four different techniques-see Table 11. The Lexicon and Machine Learning-based techniques were mostly used, where they accounted for 40% of the hybrid approaches, followed by the Lexicon and Statistical-based (7.8%), Machine Learning and Statistical-based (7.4%), and Lexicon, Machine Learning and Statistical-based (7.4%) techniques. Moreover, out of the 282 hybrid approaches, 232 used lexicons, 205 used Machine Learning and 39 used Deep Learning. These numbers reflect the importance of these three techniques within the SOM research and development domain. In light of this, a list of the lexicons, machine learning and deep learning algorithms used in these studies has been compiled, similar to Sects. 3.2.1, 3.2.2 and 3.2.3 above.
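As a concrete illustration of the dominant Lexicon and Machine Learning combination, the minimal sketch below shows one common way of fusing the two: a lexicon-derived polarity score is appended to bag-of-words features before training a classifier. The toy lexicon and posts are hypothetical stand-ins for resources such as AFINN or SentiWordNet and for the datasets used in the reviewed studies; this is a sketch of the general idea, not the method of any specific study.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy polarity lexicon standing in for resources such as AFINN or SentiWordNet.
LEXICON = {"love": 1, "great": 1, "fantastic": 1, "worst": -1, "terrible": -1, "disappointed": -1}

def lexicon_score(text: str) -> float:
    """Sum the polarity of lexicon words found in the text (a simple lexicon feature)."""
    return float(sum(LEXICON.get(tok, 0) for tok in text.lower().split()))

posts = ["love this phone, great battery", "worst service ever, so disappointed",
         "absolutely fantastic experience", "terrible app, keeps crashing"]
labels = ["positive", "negative", "positive", "negative"]

vectoriser = CountVectorizer()
bow = vectoriser.fit_transform(posts)                  # machine-learning features
lex = csr_matrix([[lexicon_score(p)] for p in posts])  # lexicon-derived feature
features = hstack([bow, lex])                          # hybrid feature space

clf = LinearSVC().fit(features, labels)
test = ["fantastic battery but terrible screen"]
test_features = hstack([vectoriser.transform(test), csr_matrix([[lexicon_score(test[0])]])])
print(clf.predict(test_features))
```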
The lexicons, machine learning and deep learning algorithms quoted below were either used in the proposed method/s and/or for comparison purposes in the respective studies. In terms of state-of-the-art lexicons, these total 403 within the studies adopting a hybrid approach. The top ones align with the results obtained from the lexicon-based approaches in Sect. 3.2.1 above. The following are the lexicons used more than ten times across the hybrid approaches:

1. SentiWordNet-used in 51 studies;
2. MPQA-Subjectivity-used in 28 studies;
3. Hu and Liu-used in 25 studies;
4. WordNet-used in 24 studies;
5. AFINN-used in 22 studies;
6. SentiStrength-used in 21 studies;
7. HowNetSenti-used in 15 studies;
8. NRC Word-Emotion Association Lexicon-used in 13 studies;
9. NRC Hashtag Sentiment Lexicon 57 -used in 12 studies;
10. SenticNet, the Sentiment140 Lexicon (also known as the NRC Emoticon Lexicon) 58 , the National Taiwan University Sentiment Dictionary (NTUSD) (Ku et al. 2006) and the Wikipedia list of emoticons-each used in 11 studies.

Further to the quoted lexicons, 49 studies used lexicons that they created as part of their work. Some studies composed their lexicons from emoticons/emojis that were extracted from a dataset (Li and Fleyeh 2018; Azzouza et al. 2017; Zimbra et al. 2016; You and Tunçer 2016; Cui et al. 2011; Zhang et al. 2012; Vu et al. 2012), combined publicly available emoticon lexicons/lists (Siddiqua et al. 2016) or mapped emoticons to their corresponding polarity (Tellez et al. 2015; Kanakaraj and Guddeti 2015; Jianqiang 2015; Koto and Adriani 2015; Wu et al. 2015; Shukri et al. 2015; Sahu et al. 2015; Lewenberg et al. 2015), and others (Souza et al. 2016; Su et al. 2014; Tang et al. 2013; Cui et al. 2011; Zhang et al. 2012; Li and Xu 2014) used seed/feeling/emotional words to establish a typical microblog emotional dictionary. Additionally, some authors constructed or used sentiment lexicons (Vo et al. 2017; Rout et al. 2017; Jin et al. 2017; Ismail et al. 2018; Yan et al. 2017; Katiyar et al. 2018; Al Shammari 2018; Abdullah and Zolkepli 2017; Liu and Young 2016; Sahu et al. 2015; Cho et al. 2014; Jiang et al. 2013; Cui et al. 2013; Khuc et al. 2012; Montejo-Raez et al. 2014; Rui et al. 2013), some of which are domain or language specific (Konate and Du 2018; Hong and Sinnott 2018; Chen et al. 2017; Zhao et al. 2016; Lu et al. 2016; Zhou et al. 2014), others that extend state-of-the-art lexicons, and some who made them available to the research community (Cotfas et al. 2017), such as the Distributional Polarity Lexicon 59 . Table 12 below presents a list of the machine learning algorithms-in total 381 across 197 studies-that were used within the hybrid approaches. The first column indicates the algorithm, the second lists the type of learning algorithm, in terms of Supervised (Sup), Unsupervised (Unsup) and Semi-supervised (Semi-sup), and the last column lists the total number of studies using each respective algorithm. The SVM and NB algorithms were mostly used in supervised learning, a result which corresponds to that of the machine learning-based approaches in Sect. 3.2.2 above. With respect to the latter, 76 studies used the MBNB algorithm, 19 studies the MNB and 1 study the Discriminative MNB.
Moreover, the LoR, DT (namely the basic ID3 (10 studies), J48 (5 studies), C4.5 (5 studies), Classification And Regression Tree (3 studies), Reduced Error Pruning (1 study), DT with AB (1 study), McDiarmid Tree (McDiarmid 1989) (1 study) and Hoeffding Tree (1 study) algorithms), RF, MaxEnt and SentiStrength (used in both supervised and unsupervised settings) algorithms were also used in various studies. Notably, some algorithms beyond those used in the machine learning-based approaches in Sect. 3.2.2 above were used within a hybrid approach, in particular, SVR (Drucker et al. 1997), Extremely Randomised Trees (Geurts et al. 2006), Least Median of Squares Regression (Rousseeuw 1984), Maximum Likelihood Estimation (Fisher 1925), Hyperpipes (Witten et al. 2016), Extreme Learning Machine (Huang et al. 2006), Domain Adaptation Machine (Duan et al. 2009), RIPPER (Cohen 1995), Affinity Propagation (Frey and Dueck 2007), Multinomial Inverse Regression (Taddy 2013), Apriori (Agrawal et al. 1994), Distant Supervision (Go et al. 2009) and Label Propagation (Zhu and Ghahramani 2002). Given that deep learning is a subset of machine learning, the algorithms used within the hybrid approaches are presented below. In total, 36 studies used the following deep learning algorithms:
• Deep Belief Network (Hinton and Salakhutdinov 2006), a probabilistic generative model composed of multiple layers of stochastic, latent variables - used in 2 studies (Jin et al. 2017; Tang et al. 2013);
• GRU - used in 1 study;
• Generative Adversarial Networks (GAN) (Goodfellow et al. 2014), deep neural network architectures composed of two networks, a generator and a discriminator, pitted one against the other - used in 1 study;
• Conditional GAN (Mirza and Osindero 2014), a conditional version of GAN constructed by feeding the data that needs to be conditioned on to both the generator and the discriminator - used in 1 study;
• Hierarchical Attention Network (Yang et al. 2016), a neural architecture for document classification - used in 1 study.
In total, 23 studies did not adopt any of the previous approaches (discussed in Sects. 3.2.1-3.2.10). This is mainly due to three reasons: no information provided by the authors (13 studies), use of an automated approach (4 studies), or use of a manual approach (6 studies).
Numerous datasets were used across the 465 studies evaluated for this systematic review. These consisted of SOM datasets released online for public use, which have been widely used across the studies, and newly collected datasets, some of which were made available for public use and others kept private within the respective studies. In terms of data collection, the majority of them used the respective platform's API, such as the Twitter Search API 62, either directly or through a third-party library, e.g., Twitter4J 63. Due to the large number of datasets, only the ones mostly used shall be discussed within this section. In addition, only social datasets are mentioned, irrespective of whether other non-social datasets (e.g., news, movies) were used, given that the main focus of this review is on social data. The first sub-section (Sect. 3.3.1) presents an overview of the top social datasets used, whereas the second sub-section (Sect. 3.3.2) presents a comparative analysis of the studies that produced the best performance for each respective social dataset.
The top social datasets used across all studies range from Twitter sentiment corpora, such as STS, Sanders, STS-Gold, HCR, OMD and SS-Twitter, to shared evaluation task datasets, such as the SemEval and NLPCC ones discussed below. The most recent of these, SemEval 2017-Task 4, continues with a re-run of SemEval 2016-Task 4, where two new changes were introduced: inclusion of the Arabic language for all subtasks and provision of profile information of the Twitter users that posted the target tweets. All the datasets above are textual, with the majority of them composed of social data from Twitter. In terms of language, only the SE-Twitter social dataset can be considered as multilingual, with the rest targeting English (majority) or Chinese microblogs, whereas SemEval 2017-Task 4 introduced a new language in Arabic. An additional dataset is the one produced by Mozetič et al., which contains 15 Twitter sentiment corpora for 15 European languages (Mozetič et al. 2016). Some studies, such as Munezero et al. (2015), used one of the English-based datasets above (STS-Gold) for multiple languages, given that they adopted a lexicon-based approach. Moreover, these datasets had different usage within the respective studies, with the most common being use as a training/test set, for the final evaluation of the proposed solution/lexicon, or for comparison purposes. Evaluation challenges like SemEval are important for generating social datasets such as the above and the one in Cortis et al. (2017), since these can be used by the Opinion Mining community for further research and development. A comparative analysis of all the studies that used the social datasets presented in the previous sub-section was carried out. The Precision, Recall, F-measure (F1-score) and Accuracy metrics were selected to evaluate the said studies (when available) and identify the best performance for each respective social dataset. It is important to note that for certain datasets this could not be done, since the experiments conducted were not consistent across all the studies. The top three studies (where possible) obtaining the best results for each of the four evaluation metrics are presented in the tables below. Tables 13 and 14 provide the best results for the STS and Sanders datasets. Tables 15 and 16 provide the best results for the SemEval 2013-Task 2 and SemEval 2014-Task 9 datasets, specifically for sub-task B, which focused on message polarity classification; the results obtained by the participants of this shared task should be reviewed for a more representative comparative evaluation. Tables 17, 18 and 19 provide the best results for the STS-Gold, HCR and OMD datasets. Table 20 provides the best results for the SemEval 2015-Task 10 dataset, specifically for sub-task B, which focused on message polarity classification; here too, the participants' results should be reviewed for a more representative comparative evaluation. Table 21 provides the best results for the SS-Twitter dataset. Table 22 provides the best results for the SemEval 2016-Task 4 dataset, specifically for sub-task A, which focused on message polarity classification; again, the participants' results should be reviewed for a more representative comparative evaluation. Tables 23 and 24 provide the best results for the NLPCC 2012 dataset. Results quoted below are for task 1, which focused on subjectivity classification (see Table 23), and task 2, which focused on sentiment polarity classification (see Table 24); likewise, the participants' results should be reviewed for a more representative comparative evaluation.
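Since the comparative analysis relies on Precision, Recall, F-measure and Accuracy averaged over the classification levels, the following is a minimal sketch of how such macro-averaged scores can be computed with scikit-learn; the gold and predicted labels are invented purely for illustration.

```python
# Macro-averaged Precision, Recall, F1 and Accuracy for a three-level polarity task.
# The gold and predicted labels below are invented for illustration only.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "neutral",  "neutral", "positive", "negative", "positive"]

precision, recall, f1, _ = precision_recall_fscore_support(
    gold, pred, average="macro", zero_division=0
)
print(f"Accuracy:  {accuracy_score(gold, pred):.2f}")
print(f"Precision: {precision:.2f}  Recall: {recall:.2f}  F1: {f1:.2f}")
```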
Tables 25 and 26 provide the best results for the NLPCC 2013 and SE-Twitter datasets. Table 27 provides the best results for the SemEval 2017-Task 4 dataset, specifically for sub-task A, which focused on message polarity classification; the results obtained by the participants of this shared task should also be reviewed for a more representative comparative evaluation. The following are some comments regarding the social dataset results quoted in the tables above:
• In cases where several techniques and/or methods were applied, the highest result obtained in the study for each of the four evaluation metrics was recorded, even if the technique did not produce the best result for all metrics.
• The average Precision, Recall and F-measure results are quoted (if provided by the authors), i.e., the average score of the results for each classified level (e.g., the average score of the results obtained for each sentiment polarity classification level: positive, negative and neutral).
• Results for social datasets that were released as a shared evaluation task, such as SemEval, were not quoted when they were either only provided in the metrics used by the task organisers or reported in other metrics chosen by the authors.
• Certain studies evaluated their techniques on a subset of the actual dataset. Results quoted are the ones where the entire dataset was used (according to the authors and/or our understanding).
• Quoted results are for classification tasks and not aspect-based SOM, which can vary depending on the focus of the study.
• Results presented in a graph visualisation were not considered, due to the exact values not being clear.
Multilingual/bilingual SOM is very challenging, since it deals with multi-cultural social data. For example, analysing Chinese and English online posts can bring mixed sentiment on such posts. Therefore, it is hard for researchers to make a fair judgement in cases where online posts' results from different languages contradict each other. The majority of the studies (354 out of 465) considered for this review analysis support one language in their SOM solutions. A total of 80 studies did not specify whether their proposed solution is language-agnostic or otherwise, or else their modality was not textual-based. Lastly, only 31 studies cater for more than one language, with 18 being bilingual, 1 being trilingual and 12 proposed solutions claiming to be multilingual. Regarding the latter, the majority were tested on a few languages at most. The language combinations identified include:
• English and Italian (D'Avanzo and Pilato 2015; Pupi et al. 2014);
• English and German (Tumasjan et al. 2010);
• English and Spanish (Giachanou et al. 2017; Cotfas et al. 2015; Delcea et al. 2014).
Moreover, Table 28 provides a list of the non-English languages identified from the 354 studies that support one language.
Chou et al. (2017) claim that their method can be easily applied to any ConceptNet 72 supported language, with another study similarly claiming that its method is language independent, whereas the solution by Wang and Wu (2015) is multilingual given that emoticons are used in the majority of languages. The majority of the studies in this systematic review and in the state-of-the-art focus on SOM over the textual modality, with only 15 out of 465 studies applying their work to more than one modality. However, other modalities, such as visual (image, video) and audio information, are often ignored, even though they contribute greatly towards expressing user emotions. Moreover, when two or more modalities are considered together for any form of social opinion, such as emotion recognition, they are often complementary, thus increasing the system's performance (Caschera et al. 2016). Table 29 lists the multimodal studies within the review analysis, among them Rai et al. (2018), Saini et al. (2018), Chen et al. (2017), Baecchi et al. (2016), Flaes et al. (2016), Liu et al. (2015), Zhang et al. (2015), Cai and Xia (2015) and Yuan et al. (2015), with the ones catering for two modalities, text and image, being the most popular. Currently available datasets and resources for SOM are largely restricted to the textual modality. The following are the non-textual social datasets (not listed in Sect. 3.3) used across the mentioned studies:
• YouTube Dataset (Morency et al. 2011), used in Poria et al. (2016): 47 videos targeting various topics, such as politics, electronics and product reviews.
• SentiBank Twitter Dataset 73 (Borth et al. 2013), used in Baecchi et al. (2016) and Cai and Xia (2015): image dataset from Twitter annotated for polarity using Amazon Mechanical Turk. Tweets with images related to 21 hashtags (topics) resulted in 470 being positive and 133 being negative.
• SentiBank Flickr Dataset (Borth et al. 2013), used in Cai and Xia (2015): 500,000 image posts from Flickr labelled by 1553 adjective-noun pairs based on Plutchik's Wheel of Emotions (a psychological theory) (Plutchik 1980).
• You Image Dataset, used in Cai and Xia (2015): image dataset from Twitter consisting of 769 positive and 500 negative tweets with images, annotated using Amazon Mechanical Turk.
The novel methodology by Poria et al. (2016) caters for the textual, visual and audio modalities, where sentiments are extracted from social Web videos. In Caschera et al. (2016), the authors propose a method whereby machine learning techniques need to be trained on different and heterogeneous features when used on different modalities, such as polarity and intensity of lexicons from text, prosodic features from audio, and postures, gestures and expressions from video. The sentiment of video and audio data in Song and Gruzd (2017) was manually coded, a task that is labour intensive and time consuming. The addition of images to the microblogs' textual data reinforces and clarifies certain feelings (Baecchi et al. 2016), thus improving the sentiment classifier with the image features (Zhang et al. 2015; Cai and Xia 2015). Similarly, another study demonstrates the superiority of its multimodal hypergraph method when compared to single-modality (in this case textual) methods.
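As a simplified illustration of how the bimodal (text and image) studies above can combine modalities, the sketch below performs late fusion by averaging the class probabilities of two independently trained classifiers; the random features and labels are toy placeholders and do not reproduce the pipeline of any reviewed study.

```python
# Late-fusion sketch: average the class probabilities of a text model and an
# image model. Features and data are toy placeholders for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy features: e.g., bag-of-words vectors for text, colour histograms for images.
X_text = rng.random((20, 50))
X_image = rng.random((20, 30))
y = rng.integers(0, 2, 20)          # 0 = negative, 1 = positive

text_clf = LogisticRegression(max_iter=1000).fit(X_text, y)
image_clf = LogisticRegression(max_iter=1000).fit(X_image, y)

def fused_prediction(text_feats, image_feats, w_text=0.5):
    # Weighted average of per-class probabilities from both modalities.
    p_text = text_clf.predict_proba(text_feats)
    p_image = image_clf.predict_proba(image_feats)
    return (w_text * p_text + (1 - w_text) * p_image).argmax(axis=1)

print(fused_prediction(X_text[:3], X_image[:3]))
```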
The results of these multimodal studies are further supported by the method in Poria et al. (2016), which caters for more than two modalities (audio, visual and textual) and shows that accuracy improves drastically when such modalities are used together. Flaes et al. (2016) apply their multimodal (text and image) method in a real-world application area; their research shows that several relationships exist between city liveability indicators collected by local government and automatically extracted sentiment. For example, a negative linear association was found between sentiment detected from Flickr data and people living on welfare checks. Results in Rai et al. (2018) show that there is a high correlation between sentiment extracted from text-based social data and image-based landscape preferences by humans. In addition, results in Yuan et al. (2015) show some correlation between image and textual tweets. However, the authors mention that more features and more robust data are required to determine the exact influence of multimedia content in the social domain. The work in Chen et al. (2017) adopts a bimodal approach to solve the problem of cross-domain image sentiment classification by using textual features and visual features from the target domain and measuring the text/image similarity simultaneously. Therefore, multimodality in the SOM domain is one of numerous research gaps identified in this systematic review. This provides researchers with an opportunity for further research, development and innovation in this area. In this systematic review, we also analysed the tools and technologies that were used across all studies for various opinion mining operations conducted on social data, such as NLP, machine learning and big data handling. The subsections below provide respective lists of the ones mostly used across the studies for the various operations required. The following are the top 5 NLP tools used across all studies for various NLP tasks:
• Natural Language Toolkit (NLTK) 75: a platform that provides lexical resources, text processing libraries for classification, tokenisation, stemming, tagging, parsing and semantic reasoning, and wrappers for industrial NLP libraries;
• TweetNLP 76: consists of a tokeniser, Part-of-Speech (POS) tagger, hierarchical word clusters and a dependency parser for tweets, besides annotated corpora and web-based annotation tools;
• Stanford NLP 77: software that provides statistical NLP, deep learning NLP and rule-based NLP tools, such as Stanford CoreNLP, the Stanford Parser and the Stanford POS Tagger;
• NLPIR-ICTCLAS 78: a Chinese word segmentation system that includes keyword extraction, POS tagging, NER and microblog analysis, amongst other features;
• word2vec 79: an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words.
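As a small illustration of the kind of pre-processing these NLP tools provide, the following sketch uses NLTK's tweet-aware tokeniser and POS tagger on a single post; the example tweet is invented, and the tagger resource (whose name varies across NLTK versions) is assumed to be downloadable.

```python
# Tokenisation and POS tagging of a tweet with NLTK (example tweet is invented).
import nltk
from nltk.tokenize import TweetTokenizer

nltk.download("averaged_perceptron_tagger", quiet=True)      # tagger model (older NLTK)
nltk.download("averaged_perceptron_tagger_eng", quiet=True)  # tagger model (newer NLTK)

tweet = "@airline Loving the new app :) but the delays are awful #travel"
tokenizer = TweetTokenizer(strip_handles=True, reduce_len=True)

tokens = tokenizer.tokenize(tweet)   # keeps emoticons and hashtags intact
tagged = nltk.pos_tag(tokens)        # Penn Treebank POS tags

print(tokens)
print(tagged)
```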
The top 5 machine learning tools used across all studies are listed below:
• Weka 80: a collection of machine learning algorithms for data mining tasks, including tools for data preparation, classification, regression, clustering, association rule mining and visualisation;
• scikit-learn 81: consists of a set of tools for data mining and analysis, such as classification, regression, clustering, dimensionality reduction, model selection and preprocessing;
• LIBSVM 82: an integrated software package for support vector classification, regression, distribution estimation and multi-class classification;
• LIBLINEAR 83: a linear classifier for data with millions of instances and features;
• SVM-Light 84: an implementation of SVMs for pattern recognition, classification, regression and ranking problems.
Certain studies used opinion mining tools in their research, either to conduct their main experiments or for comparison purposes against their proposed solution/s. The following are the top 3 opinion mining tools used:
• SentiStrength 85: a sentiment analysis tool that is able to conduct binary (positive/negative), trinary (positive/neutral/negative), single-scale (-4 very negative to +4 very positive), keyword-oriented and domain-oriented classifications;
• Sentiment140 86: a tool that allows one to discover the sentiment of a brand, product or topic on Twitter;
• VADER (Valence Aware Dictionary and sEntiment Reasoner) 87: a lexicon and rule-based sentiment analysis tool that is specifically focused on sentiments expressed in social media.
Several big data technologies were used by the analysed studies. The MySQL relational database management system was the most used technology for storing structured social data, whereas MongoDB was mostly used for processing unstructured social data. On the other hand, distributed processing technologies were used for processing large-scale real-time and/or historical social data. In particular, Hadoop MapReduce was used for parallel processing of large volumes of structured, semi-structured and unstructured social datasets stored in the Hadoop Distributed File System. Spark's ability to process both batch and streaming data was utilised in cases where velocity is more important than volume. This section presents information about other NLP tasks that were conducted to perform SOM. An element of NLP is performed in 283 studies out of the 465 analysed, either for pre-processing (248 studies), feature extraction (Machine Learning) or one of the processing steps within their SOM solution. The most common and important NLP tasks range from Tokenisation, Segmentation and POS tagging, to NER and Language Detection. It is important to mention that the NLP tasks mentioned above, together with Anaphora Resolution, Parsing, Sarcasm and Sparsity, are some of the other challenges faced in the SOM domain. Moreover, online posts with complicated linguistic patterns are challenging to deal with (Li and Xu 2014). However, one study showcases the importance and potential of NLP within this domain, investigating the patterns or word combinations of tweets in terms of subjectivity and polarity by considering their POS sequences. Results reveal that subjective tweets tend to have word combinations consisting of adverbs and adjectives, whereas objective tweets tend to have word combinations of nouns. Moreover, negative tweets tend to have word combinations of affirmation words, which often appear together with a negation word.
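Returning to the opinion mining tools listed above, VADER is distributed as a small Python package; a minimal usage sketch is shown below, with an invented tweet as input (the exact scores depend on the lexicon version installed).

```python
# Scoring a tweet with VADER (the input tweet is invented).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("The battery life is great, but the screen is awful :(")

# 'compound' is a normalised score in [-1, 1]; 'neg'/'neu'/'pos' are proportions.
print(scores)
if scores["compound"] >= 0.05:
    print("positive")
elif scores["compound"] <= -0.05:
    print("negative")
else:
    print("neutral")
```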
The majority (355 out of 465) of the studies performed some sort of pre-processing in their work. Different methods and resources were used for this process, such as NLP tasks (e.g., tokenisation, stemming, lemmatisation, NER) and dictionaries for stop words and acronyms/slang words, amongst others (e.g., noslang.com, noswearing.com, Urban Dictionary, Internet lingo). Negation handling is one of the most challenging issues faced by SOM solutions; 117 studies cater for negations within their approach. Several different methods are used, such as negation replacement, negation transformation, negation dictionaries, textual features based on negation words, and negation models. Social media can be seen as a sub-language that uses emoticons/emojis mixed with text to show emotions (Min et al. 2013). Emoticons/emojis are commonly used in tweets irrespective of the language, and are therefore sometimes considered as being domain and language independent, thus useful for multilingual SOM (Cui et al. 2011). Even though some researchers remove emoticons/emojis as part of their pre-processing stage (depending on what the authors want to achieve), many others have utilised the respective emotional meaning within their SOM process. This has led to emoticons/emojis playing a very important role within 205 of the analysed studies' solutions, especially when the focus is on emotion recognition. Results obtained from the emoticon networks model in Zhang et al. (2013) show that emoticons can help in performing sentiment analysis. This is further supported by Jiang et al. (2015), who found that emoticons are a pure carrier of sentiment, and by the results obtained by an emoticon polarity-aware method, which show that emoticons can significantly improve the precision of sentiment polarity identification. In the case of hybrid (lexicon and machine learning) approaches, emoticon-aided lexicon expansion improves the performance of lexicon-based classifiers. From an emotion classification perspective, one study analysed users' emoticons on Twitter to improve the accuracy of predictions for the Dow Jones Industrial Average and S&P 500 stock market indices. Other researchers (Cvijikj and Michahelles 2011) were interested in analysing how people express emotions, displayed via adjectives or usage of internet slang, i.e., emoticons, interjections and intentional misspelling. Several emoticon lists were used in these studies, with the Wikipedia and DataGenetics 102 ones commonly used. Moreover, emoticon dictionaries consisting of emoticons and their corresponding polarity class, such as those in Agarwal et al. (2011), Aisopos et al. (2012) and Becker et al. (2013), were also used in certain studies. Word embeddings, a type of word representation which allows words with a similar meaning to have a similar representation, were used by several studies (Severyn and Moschitti 2015; Jiang et al. 2015; Cai and Xia 2015; Gao et al. 2015; Stojanovski et al. 2015; Zhao et al. 2016; Rexha et al. 2016; Hao et al. 2017; Kitaoka and Hasuike 2017; Arslan et al. 2018; Baccouche et al. 2018; Ghosal et al. 2018; Hanafy et al. 2018; Jianqiang et al. 2018; Stojanovski et al. 2018; Sun et al. 2018; Wan et al. 2018; Yan et al. 2018) adopting a learning-based (Machine Learning, Deep Learning and Statistical) or hybrid approach. These studies used word embedding algorithms such as word2vec, fastText 103 and/or GloVe.
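The following is a minimal sketch (using the gensim 4.x API) of training skip-gram word embeddings on a handful of tokenised posts; the toy corpus is far too small to be meaningful and is purely illustrative, as the reviewed studies typically rely on large corpora or pre-trained vectors.

```python
# Training toy skip-gram word embeddings with gensim (corpus is illustrative only).
from gensim.models import Word2Vec

corpus = [
    ["love", "this", "phone", "great", "battery"],
    ["hate", "the", "delays", "awful", "service"],
    ["great", "service", "love", "it"],
    ["awful", "update", "hate", "it"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3,
                 min_count=1, sg=1, epochs=100, seed=1)  # sg=1 -> skip-gram

print(model.wv["love"][:5])               # first 5 dimensions of a word vector
print(model.wv.most_similar("love", topn=2))
```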
Such a form of learned representation for text is capable of capturing the context of words within a piece of text, syntactic patterns, semantic similarity and relations with other words. Therefore, word embeddings are used for different NLP problems, with SOM being one of them. Sentence-level SOM approaches tend to fail in discovering an opinion dimension, such as sentiment polarity, about a particular entity and/or its aspects (Cambria et al. 2013). Therefore, an aspect-level approach (also referred to as feature/topic-based) (Hu and Liu 2004), where an opinion is made up of targets and their associated opinion dimension (e.g., sentiment polarity), has been used in some studies to overcome such issues. Certain NLP tasks, such as parsing, POS tagging and NER, are usually required to extract the entities or aspects from the respective social data. From all the studies analysed, 39 performed aspect-based SOM, with 37 (Bansal and Srivastava 2018; Dragoni 2018; Gandhe et al. 2018; Ghiassi and Lee 2018; Katz et al. 2018; Liu et al. 2018; Rathan et al. 2018; Zainuddin et al. 2018; Abdullah and Zolkepli 2017; Dambhare and Karale 2017; Hagge et al. 2017; Ray and Chakrabarti 2017; Rout et al. 2017; Tong et al. 2017; Vo et al. 2017; Zhou et al. 2017; Zimbra et al. 2016; Zainuddin et al. 2016; Kokkinogenis et al. 2015; Lima et al. 2015; Hridoy et al. 2015; Averchenkov et al. 2015; Tan et al. 2014; Lau et al. 2014; Del Bosque and Garza 2014; Varshney and Gupta 2014; Unankard et al. 2014; Lek and Poo 2013; Wang and Ye 2013; Min et al. 2013; Kontopoulos et al. 2013; Jiang et al. 2011; Prabowo and Thelwall 2009) focusing on aspect-based sentiment analysis, 1 (Aoudi and Malik 2018) on aspect-based sentiment and emotion analysis, and 1 on aspect-based affect analysis. In particular, the Twitter aspect-based sentiment classification process in Lek and Poo (2013) consists of the following main steps: aspect-sentiment extraction, aspect ranking and selection, and aspect classification, whereas Lau et al. (2014) use NER to parse product names in order to determine their polarity. The aspect-based sentiment analysis approach in Hagge et al. (2017) leveraged POS tagging and dependency parsing. Moreover, Zainuddin et al. (2016) proposed a hybrid approach to analyse the aspect-based sentiment of tweets. As the authors claim, it is more important to identify the opinions within tweets than to find the overall polarity, which might not be useful to organisations. In Zainuddin et al. (2018), the same authors used association rule mining augmented with a heuristic combination of POS patterns to find single- and multi-word explicit and implicit aspects. Results in Jiang et al. (2011) show that classifiers incorporating target-dependent features significantly outperform target-independent ones. In contrast to the studies discussed, Weichselbraun et al. (2017) introduced an aspect-based analysis approach that integrates affective (including sentiment polarity and emotions) and factual knowledge extraction to capture opinions related to certain aspects of brands and companies. The social data analysed is classified in terms of sentiment polarity and emotions, aligned with the ''Hourglass of Emotions'' (Susanto et al. 2020). In terms of techniques, the majority of the aspect-based studies used a hybrid approach, with only 5 studies using deep learning for such a task.
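To illustrate the non-deep-learning end of this spectrum, the sketch below extracts candidate aspects as nouns via POS tagging and assigns each the polarity of the nearest sentiment word, a distance-based scoring strategy reported by some of the studies above; the mini lexicon and example sentence are invented.

```python
# Naive aspect-based polarity sketch: nouns are candidate aspects, each scored by
# the closest sentiment word. Lexicon and example text are invented placeholders.
import nltk

nltk.download("averaged_perceptron_tagger", quiet=True)      # older NLTK resource name
nltk.download("averaged_perceptron_tagger_eng", quiet=True)  # newer NLTK resource name

LEXICON = {"great": 1, "love": 1, "slow": -1, "awful": -1}

def aspect_polarity(text):
    tokens = text.lower().replace(",", " ").split()
    tagged = nltk.pos_tag(tokens)
    aspects = [i for i, (tok, tag) in enumerate(tagged) if tag.startswith("NN")]
    opinions = [i for i, tok in enumerate(tokens) if tok in LEXICON]
    results = {}
    for a in aspects:
        if not opinions:
            continue
        nearest = min(opinions, key=lambda o: abs(o - a))  # closest sentiment word
        results[tokens[a]] = "positive" if LEXICON[tokens[nearest]] > 0 else "negative"
    return results

print(aspect_polarity("The camera is great but the battery is awful"))
```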
Among the five studies that used deep learning, Averchenkov et al. (2015) adopted an approach based on RNNs for aspect-based sentiment analysis. A comparative review of deep learning for aspect-based sentiment analysis published by Do et al. (2019) discusses current research in this domain. It focuses on deep learning approaches, such as CNN, LSTM and GRU, for extracting both syntactic and semantic features of text without the in-depth feature engineering required by classical NLP. For future research directions on aspect-based SOM, refer to Sect. 6.2. An opinion describes a viewpoint or statement about a subjective matter. In many research problems, authors assume that an opinion is more specific and of a simpler definition. For example, sentiment analysis is considered to be a type of opinion mining even though it is only focused on extracting the sentiment score from a given text. Social data contains a wealth of signals to mine, from which opinions can be extracted over time. Different types of opinions require different modes of analysis. This leads to opinions being multi-dimensional semantic artefacts. In fact, Troussas et al. specify that ''emotions and polarities are mutually influenced by each other, conditioning opinion intensities and emotional strengths''. Moreover, multiple studies applied different approaches: Bravo-Marquez et al. (2013) showed that a composition of polarity, emotion and strength features achieves significant improvements over single approaches, whereas another study focused on finding the correlation between emotion, which can be differentiated by facial expression, voice intonation and also words, and sentiment in social media. Similarly, Buscaldi and Hernandez-Farias (2015) found that finer-grained negative tweets potentially help in differentiating between negative feelings, e.g., fear (emotion). Furthermore, mood, emotions and decision making are closely connected. Research on multi-dimensional sentiment analysis shows that human mood is very rich in social media, where a piece of text may contain multiple moods, such as calm and agreement. On the other hand, there are studies showing that one mood alone is already highly influential in encouraging people to rummage through Twitter feeds for predictive information; for example, in Weiss et al. (2015), ''calmness'' was highly correlated with stock market movement. Different dimensions of opinion are also able to affect different entities, such as events. Results in Zhang et al. (2012) show a strong correlation between emergent events and public moods. In such cases, new events can be identified by monitoring emotional vectors in microblogs. Moreover, work in Thelwall et al. (2011) assessed whether popular events are correlated with an increase in sentiment strength, which is likely the case. All of the above motivates us to pursue further research and development on the identification of different opinion dimensions that are present within social data, such as microblogs, published across heterogeneous social media platforms. A more fine-grained opinion representation and classification of this social data shall lead to a better understanding of the messages conveyed, thus potentially influencing multiple application areas. Section 5 lists the application areas of the analysed studies. The analysed studies focused on different opinion dimensions, namely: objectivity/subjectivity, sentiment polarity, emotion, affect, irony, sarcasm and mood.
These were conducted at different levels, such as document-level, sentence-level and/or feature/aspect-based, depending on the study. As for the techniques presented in Sect. 3.2, 465 studies were evaluated. The majority focused on one social opinion dimension, with 60 studies focusing on more than one: 58 on two dimensions, 1 on three dimensions and 1 on four dimensions. In this regard, Table 30 lists the different dimensions and the respective studies. For studies targeting multiple dimensions, these are:
• subjectivity and sentiment polarity: Jiang et al. (2011), Bravo-Marquez et al. (2013), Zhu et al. (2013), Wang and Ye (2013), Cui et al. (2013), Rui et al. (2013), Bravo-Marquez et al. (2014), Tan et al. (2014), Garg and Chatterjee (2014), Abdul-Mageed et al. (2014), Samoylov (2014), Feng et al. (2015), Mansour et al. (2015), Wu et al. (2016), Zainuddin et al. (2016), Er et al. (2016), Abdullah and Zolkepli (2017), Hao et al. (2017), Ahuja and Dubey (2017), Sahni et al. (2017), Moh et al. (2017), Dritsas et al. (2018), Gandhe et al. (2018) and Nausheen and Begum (2018);
• sentiment polarity and emotion: Cvijikj and Michahelles (2011), Orellana-Rodriguez et al. (2013), Sheth et al. (2014), Yuan et al. (2015), Orellana-Rodriguez et al. (2015), Gallegos et al. (2016), Qaisi and Aljarah (2016), Shukri et al. (2015), Munezero et al. (2015), Barapatre et al. (2016), Karyotis et al. (2017), Bouazizi and Ohtsuki (2017), Radhika and Sankar (2017), Abdullah and Hadzikadic (2017), Zhang et al. (2017), Singh et al. (2018), Aoudi and Malik (2018), Pai and Alathur (2018), Ghosal et al. (2018), Rout et al. (2018) and Stojanovski et al. (2018);
• sentiment polarity and mood: Bollen et al. (2011);
• sentiment polarity and irony: Reyes et al. (2013);
• sentiment polarity and sarcasm: Unankard et al. (2014);
• sentiment polarity and affect: Weichselbraun et al. (2017);
• emotion and anger: Delcea et al. (2014) and Cotfas et al. (2015);
• irony and sarcasm: Fersini et al. (2015);
• subjectivity, sentiment polarity and emotion: Jiang et al. (2015);
• subjectivity, sentiment polarity, emotion and irony: Bosco et al. (2013).
Most of the studies focused on sentiment analysis, specifically polarity classification. The following sections present the different tasks conducted for each form of opinion mentioned above 105. Subjectivity determines whether a sentence expresses an opinion, in terms of personal feelings or beliefs, or not, in which case the sentence expresses objectivity. Objectivity refers to sentences that express some factual information about the world (Liu 2010). In this domain, objective statements are usually classified as being neutral (in terms of polarity), whereas subjective statements are non-neutral. In the latter case, sentiment analysis is performed to determine the polarity classification (more information on this below). However, it is important to clarify that neutrality and objectivity are not the same. Neutrality refers to situations whereby a balanced view is taken, whereas objectivity refers to factual, i.e., true, statements/facts that are quantifiable and measurable. Sentiment determines the polarity (positive/negative/neutral) and strength/intensity (through a numeric rating score, e.g., 1-5 stars, or level of depth, e.g., low/medium/high) of an expressed opinion (Liu 2010). The sentiment-related tasks identified include, amongst others:
7. polarity classification: 12-level
(a) future orientation/past orientation/positive emotions/negative emotions/sadness/anxiety/anger/tentativeness/certainty/work/achievement/money
8. polarity score
(a) negative ranging from 0-0.5 and positive ranging from 0.5-1
(b) negative/neutral/positive ranging from 0 (low) to 0.45 (high)
(c) negative/positive ranging from -1 (low) to 1 (high)
(d) negative/positive ranging from -1.5 (low) to 1.5 (high)
(e) negative/positive ranging from -2 (low) to 2 (high)
(f) negative/positive ranging from 1 (low) to 5 (high)
(g) negative ranging from -1 (low) to -5 (high) and positive ranging from 1 (low) to 5 (high)
(h) strongly negative to strongly positive ranging from -2 (low) to 2 (high)
(i) normalised values from -100 to 100
(j) weighted average of polarity scores of the sentiment aspects/topic segments
(k) score for every aspect/feature of the subject
(l) score per aspect, calculated from the distance between the aspect and the sentiment word
(m) total classification probability close to 1
9. polarity strength
In some studies (Sandoval-Almazan and Valle-Cruz 2018; Bouazizi and Ohtsuki 2017; Chou et al. 2017; Karyotis et al. 2017; Furini and Montangero 2016; Gambino and Calvo 2016; Jiang et al. 2015; Yuan et al. 2015), the sentiment polarity was derived from the emotion classification, such as joy/love/surprise translated to positive, and anger/sadness/fear translated to negative (a minimal mapping of this kind is sketched after the list below). Emotion refers to a person's subjective feelings and thoughts, such as love, joy, surprise, anger, sadness and fear (Liu 2010). The emotion-related tasks and category schemes identified include, amongst others:
6. emotion classification: 7-level
(c) love-heart/quality/happiness-smile/sadness/amused/anger/thumbs up (emotions based on sentiment-carrying words and/or emoticons)
(d) joy/surprise/sadness/fear/anger/disgust/unknown
(e) like/happiness/sadness/disgust/anger/surprise/fear
(f) joy/love/anger/sadness/fear/dislike/surprise
(g) anger/joy/love/fear/surprise/sadness/disgust
(h) joy/sadness/anger/love/fear/thankfulness/surprise
(i) happiness/sadness/anger/disgust/fear/surprise/neutral
(j) joy/surprise/fear/sadness/disgust/anger/neutral
(k) love/happiness/fun/neutral/hate/sadness/anger
(l) happiness/goodness/anger/sorrow/fear/evil/amazement
7. emotion classification: 8-level
(a) anger/embarrassment/empathy/fear/pride/relief/sadness/other
(b) flow/excitement/calm/boredom/stress/confusion/frustration/neutral
(c) joy/sadness/fear/anger/anticipation/surprise/disgust/trust
(d) anger/anxiety/expect/hate/joy/love/sorrow/surprise
(e) happy/loving/calm/energetic/fearful/angry/tired/sad
(f) anger/sadness/love/fear/disgust/shame/joy/surprise
8. emotion classification: 9-level
(a) surprise/affection/anger/bravery/disgust/fear/happiness/neutral/sadness
9. emotion classification: 11-level
(a) neutral/relax/docile/surprise/joy/contempt/hate/fear/sad/anxiety/anger
(b) joy/excitement/wink/happiness/love/playfulness/surprise/scepticism/support/sadness/annoyance (emotions based on emoticons (Cvijikj and Michahelles 2011))
10. emotion classification: 22-level
(a) hope/fear/joy/distress/pride/shame/admiration/reproach/liking/disliking/gratification/remorse/gratitude/anger/satisfaction/fears-confirmed/relief/disappointment/happy-for/resentment/gloating/pity (emotions based on an Emotion-Cause-OCC model that describes the eliciting conditions of emotions)
11. emotion-anger classification: 7-level
(a) frustration/sulking/fury/hostility/indignation/envy/annoyance
12. emotion score
(a) valence/arousal/dominance ranging from 1 (low) to 9 (high)
(b) prediction/valence/arousal/outcome from 0 (low) to 100 (high)
13. emotion intensity
(a) 0 (minimum) to 1 (maximum)
(b) 0 (minimum) to 9 (maximum)
(c) high/medium/low
14. emotion-happiness measurement
(a) average happiness score
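Returning to the emotion-to-polarity derivation mentioned above (joy/love/surprise mapped to positive; anger/sadness/fear mapped to negative), a minimal sketch of such a mapping is shown below; it is only one of the variants used across the cited studies.

```python
# Deriving sentiment polarity from predicted emotion labels (one common mapping).
EMOTION_TO_POLARITY = {
    "joy": "positive", "love": "positive", "surprise": "positive",
    "anger": "negative", "sadness": "negative", "fear": "negative",
}

def polarity_from_emotion(emotion_label):
    # Unlisted emotions (e.g., "neutral") fall back to neutral polarity.
    return EMOTION_TO_POLARITY.get(emotion_label, "neutral")

print([polarity_from_emotion(e) for e in ["joy", "fear", "neutral"]])
```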
A study (Munezero et al. 2015) mapped the observed emotions into two broad categories of enduring sentiments: 'like' and 'dislike'. The former includes emotions that have a positive evaluation of the object, i.e., joy, trust and anticipation, and the latter includes emotions that have a negative evaluation of the object, i.e., anger, fear, disgust and sadness. It is important to note that some of the emotion categories listed above are based on published theories of emotion, with the most popular ones being Paul Ekman's six basic emotions (anger, disgust, fear, happiness, sadness and surprise) (Ekman 1992), and Plutchik's eight primary emotions (anger, fear, sadness, disgust, surprise, anticipation, trust and joy) (Plutchik 1980). Moreover, other studies have used emotion categories that are influenced by emotional state/psychological models, such as Pleasure Arousal Dominance (Mehrabian 1996) and the Ortony, Clore and Collins model (commonly referred to as OCC) (Ortony et al. 1988). Several studies (Xu et al. 2012; Furini and Montangero 2016; Walha et al. 2016; Hubert et al. 2018) that targeted emotion classification incorrectly referred to such a task as sentiment analysis. Even though emotions and sentiment are highly related, the former are seen as enablers of the latter, i.e., an emotion or set of emotions affects the sentiment. Affect refers to a set of observable manifestations of a subjectively experienced emotion. The basic tasks of affective computing are emotion recognition and polarity detection (Cambria 2016). The affect-related task identified is:
1. affect classification: 4-level
(a) aptitude/attention/pleasantness/sensitivity (based on the ''Hourglass of Emotions'', which was inspired by Plutchik's studies on human emotions)
When using the affective model mentioned above, sentiment is based on the four independent dimensions mentioned, namely Pleasantness, Attention, Sensitivity and Aptitude. The different levels of activation of these dimensions constitute the total emotional state of the mind (Hussain and Cambria 2018). The semi-supervised learning model proposed by Hussain and Cambria (2018), based on the merged use of multi-dimensional scaling by means of random projections and a biased SVM, has been exploited for the inference of semantics and sentics (conceptual and affective information) that are linked with concepts in a multi-dimensional vector space, in accordance with this affective model. This is used to carry out sentiment polarity detection and emotion recognition in cases where there is a lack of labelled common-sense data. Irony is usually used to convey the opposite meaning of the actual things you say, but its purpose is not intended to hurt the other person. As noted earlier, Del Bosque and Garza (2014) assume that aggressive text detection is a sub-task of sentiment analysis, closely related to document polarity detection, their reasoning being that aggressive text can be seen as intrinsically negative. The related task identified is:
1. Aggressiveness detection
(a) aggressiveness score ranging from 0 (no aggression) to 10 (strong aggression)
Other tasks identified include:
1. Opinion retrieval
(a) opinion score from 0 (minimum) to 5 (maximum)
Sarcasm and irony are often confused and/or misused. This makes their classification very difficult, even for humans (Unankard et al. 2014; Buscaldi and Hernandez-Farias 2015), with most users holding negative views on such messages (Unankard et al. 2014).
The study by Buscaldi and Hernandez-Farias (2015) is a relevant example, whereby a large number of false positives were identified in the tweets classified as ironic. Moreover, such tasks are also very time consuming and labour intensive, particularly with the rapid growth in the volume of online social data. Therefore, not many studies focused on and/or catered for sarcasm and/or irony detection. The majority of the reviewed approaches are not equipped to cater for traditional limitations, such as negation effects or ironic phenomena in text. Such opinion mining tasks face several challenges, with the main ones being:
• Different languages and cultures result in various ways of expressing an opinion on certain social media platforms. For example, Sina Weibo users prefer to use irony when expressing negative polarity. Future research is required for the development of cross-lingual/multilingual NLP tools that are able to identify irony and sarcasm.
• The presence of sarcasm and irony in social data, such as tweets, may affect the feature values of certain machine learning algorithms. Therefore, further advancement is required in the techniques used for handling sarcastic and ironic tweets (Pandey et al. 2017). The work in Sarsam et al. (2020) addresses the main challenges faced in sarcasm detection on Twitter and the machine learning algorithms that can be used in this regard.
• Classifying/rating a given sentence's sentiment is very difficult and ambiguous, since people often use negative words to be humorous or sarcastic.
• Sarcasm and/or irony annotation is very hard for humans and thus should be presented to multiple persons for accuracy purposes. This makes it very challenging to collect large datasets that can be used for supervised learning, with the only possible way being to hire people to carry out such annotations. Moreover, the difficulty of differentiating between sarcasm and irony for human annotators results in a lack of available datasets and in datasets without enough examples of ironic and/or sarcastic annotations. Such datasets are usually needed for ''data hungry'' computational learning methods (Sykora et al. 2020).
Table 31 lists the studies within the review analysis that focused on sarcasm and/or irony. These account for only 18 out of 465 reviewed papers. One can clearly note the research gap that exists within these research areas. The following is an overview of the studies' main results and observations:
• Bosco et al. (2013): The authors found that irony is normally used together with a positive statement to express a negative statement, but seldom the other way round. Analysis shows that the Senti-TUT 106 corpus can be representative of a wide range of irony phenomena, from bitter sarcasm to genteel irony.
• Reyes et al. (2013): The study describes a number of textual features used to identify irony at a linguistic level. These are mostly applicable to short texts, such as tweets. The developed irony detection model is evaluated in terms of representativeness and relevance. The authors also mention that there are overlaps in occurrences of irony, satire, parody and sarcasm, with their main differentiators being tied to usage, tone and obviousness.
• Mejova et al. (2013): A multi-stage data-driven political sentiment classifier is proposed in this study. The authors found ''that a humorous tweet is 76.7% likely to also be sarcastic'', whereas ''sarcastic tweets are only 26.2% likely to be humorous''.
Future work is required on the connection between sarcasm and humour.
• Fersini et al. (2015): Addresses the automatic detection of sarcasm and irony by introducing an ensemble approach based on Bayesian Model Averaging, which takes into account several classifiers according to their reliabilities and their marginal probability predictions. Results show that not all features are equally able to characterise sarcasm and irony, whereby sarcasm is better characterised by POS tags, and ironic statements by pragmatic particles (such as emoticons and emphatic/onomatopoeic expressions, which represent those linguistic elements typically used in social media to convey a particular message).
• Jiang et al. (2015): The authors' model classifies subjectivity, polarity and emotion in microblogs. Results show that emoticons are a pure carrier of sentiment, whereas sentiment words have more complex senses and contexts, such as negations and irony.
• One study found, through post-facto analysis of user-generated content such as tweets, that political tweets tend to be quite sarcastic.
• Ghiassi and Lee (2018): Certain keywords or hash-tagged words (e.g., ''thanks'', ''#smh'', ''#not'') that follow certain negative or positive sentiment markers in textual social data usually indicate the presence of sarcasm.
Of the studies in Table 31, those focusing on sarcasm include Bouazizi and Ohtsuki (2018), Ghiassi and Lee (2018), Abdullah and Zolkepli (2017), Bouazizi and Ohtsuki (2017), Caschera et al. (2016), Tan et al. (2014), Unankard et al. (2014), Mejova et al. (2013), Bakliwal et al. (2013), Mejova and Srinivasan (2012) and Wang et al. (2012); those focusing on irony are Buscaldi and Hernandez-Farias (2015), Hernandez-Farias et al. (2014), Bosco et al. (2013) and Reyes et al. (2013); and Fersini et al. (2015) and Pandey et al. (2017) address both. Around half of the studies analysed focused their work on a particular real-world application area (or multiple areas), where Fig. 3 shows the ones applicable for this systematic review. Note that each circle represents an application area, where the size reflects the number of studies within the particular application area. The smallest circles represent a minimum of two studies that pertain to the respective application area, whereas the biggest circle reflects the most popular application area. Intersecting circles represent application areas that were identified as being related to each other based on the analysis conducted. The Politics domain is the dominant application area, with 45 studies applying SOM on different events, namely elections (Elouardighi et al. 2017; Bansal and Srivastava 2018; Nugroho et al. 2017; Nausheen and Begum 2018; Abdullah and Hadzikadic 2017; Joyce and Deng 2017; Soni et al. 2017; Salari et al. 2018; Fatyanosa and Bachtiar 2017; Juneja and Ojha 2017; Sandoval-Almazan and Valle-Cruz 2018; Zhou et al. 2017; Le et al. 2017; Yuan et al. 2014; Ramteke et al. 2016; Smailović et al. 2015; Burnap et al. 2016; Rill et al. 2014; Anjaria and Guddeti 2014; Kuo et al. 2016; Batista and Ratté 2014; Mejova et al. 2013; Hoang et al. 2013; Gonçalves et al. 2013; Unankard et al. 2014; Maynard and Funk 2011; Bosco et al. 2013; Bakliwal et al. 2013; Tumasjan et al. 2010), reforms, such as marriage equality (Lai et al. 2015), debates (Tapia and Velásquez 2014), referendums (Pavel et al. 2017; Fang and Ben-Miled 2017), political parties or politicians (Ozer et al. 2017; Javed et al. 2014; Taddy 2013), and political events, such as terrorism, protests, uprisings and riots (Sachdeva et al. 2018; Kamyab et al. 2018; Bouchlaghem et al. 2016; Mejova and Srinivasan 2012; de Souza Carvalho et al. 2016; Sheth et al. 2014; Weiss et al. 2013).
In terms of Marketing, Advertising and Sales, 29 studies focused on brand/product management and/or awareness (Giachanou et al. 2017; Ayoub and Elgammal 2018; Ghiassi and Lee 2018; Li and Fleyeh 2018; Ducange and Fazzolari 2017; Husnain et al. 2017; Teixeira and Laureano 2017; Halibas et al. 2018; Hu et al. 2017; Abdullah and Zolkepli 2017; Zimbra et al. 2016; Cho et al. 2014; Esiyok and Albayrak 2015; Dasgupta et al. 2015; Ghiassi et al. 2013; Mostafa 2013b; Min et al. 2013; Cvijikj and Michahelles 2011), products/services in general (Li and Li 2013), local marketing (Costa et al. 2014) and online advertising (Adibi et al.). In terms of domains, the studies focused on, amongst others: the Technology industry (23 studies), covering company perception, operating systems (Huang et al. 2018), cloud service providers (Qaisi and Aljarah 2016), social media providers (Arslan et al. 2017) and multiple technologies; Finance (21 studies), where SOM was applied to demonetisation (Gupta and Singal 2017), currencies (Pavel et al. 2017) and the stock market, for risk management (Ishikawa and Sakurai 2017) and predictive analytics; the Film industry (13 studies), mainly for recommendations; Healthcare (13 studies), namely epidemics/infectious diseases and health in general, such as health-related tweets (Baccouche et al. 2018) and health applications (Pai and Alathur 2018); Tourism (Yang et al. 2013), including hotel/resort perceptions; Aviation, on specific airline services, e.g., customer relationship management (Liu et al. 2015) or safety; American football (Li et al. 2016); e-Government; urban mobility (Gallegos et al. 2016), wind energy (Politopoulou and Maragoudakis 2013), green initiatives (Rai et al. 2018) and peatland fires; crisis management (Park et al. 2011), decision making (D'Avanzo and Pilato 2015) and policy making; universities (Karyotis et al. 2017); and Transportation, for ride hailing services, logistics (Anastasia and Budi 2016) and traffic conditions (Cao et al.). Moreover, other studies applied SOM to further areas, such as popular events (Thelwall et al. 2011), flooding (Buscaldi and Hernandez-Farias 2015), explosions (Ouyang et al. 2017), cyberbullying (Del Bosque and Garza 2014) and bullying (Xu et al.). This section presents the latest research developments and advancements within the SOM research area (Sect. 6.1) and the overall conclusions of this systematic review in terms of target audience and future research and development (Sect. 6.2). Given that this systematic review covers studies up until 2018, some recent developments and advancements from 2019 till 2021 shall be discussed within this sub-section. This shows the fast research turnaround in SOM, which has kept evolving at an incredibly fast rate, thus reiterating its validity and popularity as a research area. The number of studies using Deep Learning approaches has continued to increase (as reflected in Table 5), especially ones using certain deep learning techniques, such as CNNs, RNNs, LSTM, GRU and Deep Belief Networks (Yadav and Vishwakarma 2020; Wadawadagi and Pagi 2020), and with the introduction of new techniques, such as Transfer Learning. This is supported by numerous studies (Carvalho and Plastino 2021; Eke et al. 2020) which have noted that researchers are shifting from traditional machine learning techniques to deep learning ones. Transfer learning is a deep learning technique where a model is trained for one or more tasks (source tasks), and the learnt knowledge is applied to a related second task (target task) (Pan and Yang 2009). In particular, the Transformer model architecture introduced by Vaswani et al. (2017) is based on attention mechanisms and is designed to handle sequential data, like natural language, for NLP tasks such as sentiment analysis and text summarisation. This has coincided with the advancement of SOM for different opinion dimensions, such as sentiment polarity (Nguyen et al. 2020; Naseem et al. 2020), emotion (Acheampong et al. 2021) and irony (Nguyen et al. 2020), especially in studies focused on adaptation to new domains and/or knowledge transfer from one language to another. The latter application is extremely useful for cross-lingual adaptation, where a labelled dataset available in one language, e.g., English, is applied to another language, such as low-resourced languages (Ruder 2017). With respect to language, more SOM studies supporting languages other than the popular ones (such as English and Chinese) are on the rise. In Rani and Kumar (2019), the authors discuss the growth of research work in the fields of sentiment and emotion analysis for Indian languages. Moreover, Buechel et al. (2020) created emotion lexicons for 91 languages for sentiment and emotion analysis. Other recent studies have focused on languages such as Urdu for sentiment analysis (Mukhtar and Khan 2019), Maltese for sentiment and emotion analysis and sarcasm/irony detection (Cortis and Davis 2019),
Indonesian for sentiment analysis (Koto et al. 2020), Portuguese for sentiment and emotion analysis (Pereira 2021), and Arabic for sentiment and emotion analysis (Alhumoud and Al Wazrah 2021). Studies on code-switched languages are also on the increase, with Bansal et al. (2020) demonstrating how Hindi-English code-switching patterns from tweets can be used to improve sarcasm detection, and Appidi et al. (2020) analysing code-switched Kannada-English tweets for emotion classification. In terms of modality, the visual modality is gaining more interest in the SOM research community. In Akhtar et al. (2019), the authors propose a deep multi-task learning framework that carries out sentiment and emotion analysis from the textual, acoustic and visual frames of video data obtained from YouTube. On the other hand, Kumar and Garg (2019) propose a multi-modal sentiment analysis model for Twitter, where the sentiment polarity and strength are extracted from tweets based on their text and images (typographic and/or infographic). More research has been published on aspect-based SOM, with Jiang et al. (2020) focusing on sentiment polarity in both single-aspect and multi-aspect scenarios, whereas Hyun et al. (2020) focused on sentiment polarity in the automotive domain for the English and Korean languages. In terms of application areas, the ones identified in Sect. 5 are still very popular, with research in new sub-domains emerging. In particular, several studies (Kapočiūtė-Dzikienė et al. 2019; Cresci et al. 2019; Guo and Li 2019; Xing et al. 2020; Chen et al. 2020; Mishev et al. 2020) focus on the Finance domain. Xing et al. (2020) identify common error patterns that cause financial sentiment analysis to fail, namely irrealis mood, rhetoric, dependent opinion, unspecified aspects, unrecognised words and external reference. On the other hand, Mishev et al. (2020) evaluate sentiment analysis studies in the Finance domain, starting from lexicon-based approaches and finishing with the ones that use Transformers, such as the Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al. 2018) and the Robustly optimised BERT approach (RoBERTa) (Liu et al. 2019). The ongoing coronavirus disease (COVID-19) global pandemic has led to a rise in SOM studies analysing social opinions in terms of different dimensions, such as sentiment polarity. The work in Müller et al. (2020) released a COVID-19 Transformer-based model that was pre-trained on multiple datasets of tweets from Twitter. These datasets contained tweets on various topics, such as vaccine sentiment and maternal vaccine stance, and used other well-known datasets, such as SemEval 2016-Task 4, which was previously discussed in Sect. 3.3. This model was pre-trained to carry out sentiment analysis on tweets written in other languages, such as Arabizi, a written form of spoken Arabic that relies on Latin characters and digits (Baert et al. 2020). On the other hand, Kruspe et al. (2020) presented sentiment analysis results for 4.6 million European tweets covering the initial period of COVID-19 (December 2019 till April 2020), with the results aggregated by country (Italy, Spain, France, Germany, United Kingdom) and averaged over time. An ANN was trained to carry out sentiment analysis, and this model was compared with several pre-trained models, such as BERT, which is trained on BookCorpus and English Wikipedia data (Devlin et al. 2018), a multilingual version of BERT trained on COVID-19 tweets (Müller et al. 2020), and the Embeddings from Language Models (ELMo) trained on the 1 Billion Word Benchmark dataset.
In terms of NLP tools, Hugging Face 107 provides a state-of-the-art Transformer library for PyTorch and TensorFlow 2.0 108. It provides general-purpose architectures, such as BERT, GPT-2 (Radford et al. 2019), RoBERTa, the cross-lingual language model (XLM) (Lample and Conneau 2019), DistilBERT (Sanh et al. 2019) and XLNet (Yang et al. 2019) for NLP tasks (like sentiment analysis), where 32+ pre-trained models are available in 100+ languages. Similarly, TensorFlow Hub 109 provides a repository of trained machine learning models, with a variety of them using the Transformer architecture 110, such as BERT.
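As a brief illustration of how such pre-trained Transformer models can be applied to SOM, the following is a minimal sketch using the Hugging Face Transformers pipeline API; the example tweet is invented, and the English sentiment model that the pipeline downloads by default may change between library versions.

```python
# Sentiment analysis of a tweet with a pre-trained Transformer model via the
# Hugging Face Transformers pipeline API. The example tweet is invented and the
# default model downloaded by the pipeline may vary between library versions.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("Queuing for two hours and still no vaccine appointment :(")

print(result)  # e.g., [{'label': 'NEGATIVE', 'score': 0.99}]
```

Fine-tuning such a pre-trained model on a labelled SOM dataset, rather than using it off the shelf, corresponds to the transfer learning setting discussed above.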
The main aim of this systematic review is to provide in-depth analysis and insights on the most prominent technical aspects, dimensions and application areas of SOM. The target audience of this comprehensive review is threefold:

• Early-Stage Researchers who are interested in working within this evolving research field and/or are looking for an overview of it;
• Experienced Researchers already working in SOM who would like to progress further on the technical side of their work and/or are looking for weaknesses in the field of SOM;
• Early-Stage and/or Experienced Researchers who are looking into applying SOM/their SOM work in a real-world application area.

The identification of the current literature gaps within the SOM field of study is one of the main contributions of this systematic review. The overview below provides a pathway to future research and development work:

• Social Media Platforms: Most studies focus on data gathered from one social media platform, with Twitter being the most popular, followed by Sina Weibo for Chinese-targeted studies. Researchers are encouraged to explore multi-source information by using other platforms, i.e., data from multiple data sources, subject to any existing API limitations 111. This shall increase the variety and volume of data (two of the V's of Big Data) used for evaluation purposes, thus ensuring that results provide a more reflective picture of society in terms of opinions. The use of multiple data sources for studies focusing on the same real-world application areas is also beneficial for comparison purposes and for identifying any potential common traits, patterns and/or results. Mining opinions from multiple sources of information also presents several advantages, such as higher authenticity, reduced ambiguity and greater availability (Balazs and Velásquez 2016).

• Techniques: The use of Deep Learning, Statistical, Probabilistic, Ontology and Graph-based approaches should be further explored, both as standalone approaches and/or as part of hybrid techniques, due to their potential and accessibility. In particular, Deep Learning capabilities have made several applications feasible, whereas Ontologies and Graph Mining enable fine-grained opinion mining and the identification of relationships between opinions and their enablers (person, organisation, etc.). Moreover, ensemble Machine Learning and Deep Learning methods and fine-tuned Transformer-based models are still under-explored (a minimal ensemble sketch is given after this list). In such cases, researchers should be attentive to the carbon footprint of training neural network models for NLP.

• Social Datasets: The majority of available datasets are either English or Chinese specific. This domain needs further social datasets published under a common open license for public use. These should target any of the following criteria: bi-lingual/multilingual data, and/or annotations of multiple opinion dimensions within the data, e.g., sentiment polarity, emotion, sarcasm, irony, mood, etc. Both requirements are costly in terms of resources (time, funding and personnel), domain knowledge and expertise.

• Language: The majority of studies support one language, with English and Chinese being the most popular. Supporting two or more languages is one of the major challenges in this domain, due to numerous factors such as cultural differences and the lack of language-specific resources, e.g., lexicons, datasets, tools and technologies. This domain also needs more studies that focus on code-switched and less-resourced languages, which shall enable the development of the language resources needed for these languages.

• Modality: Bi-/Multi-modal SOM is another sub-domain that requires further research. Several studies cater for the text modality only, with the visual-image modality gaining more popularity. However, the visual-video and audio modalities are still in their early research phases, with several aspects still unexplored. This also stems from a lack of available visual, audio and multimodal datasets.

• Aspect-based SOM: Research in this sub-domain is increasing and developing; however, it is far from mature, especially when applied in certain domains. Further aspect-based research is encouraged on opinion dimensions other than sentiment polarity, such as emotion and mood, which are still unexplored. Moreover, more research is required on the use of Deep Learning approaches for this task, which is still at an early stage.

• Opinion Dimensions: Most studies focus on sentiment analysis. The area of emotion analysis is increasing in popularity; however, sarcasm detection, irony detection and mood analysis are still in their early research phases. Moreover, from the analysis of this systematic review it is evident that there is a lack of research on possible correlations between the different opinion dimensions, e.g., emotion and sentiment. Lastly, no studies cater for all the different SOM dimensions within their work.
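As a concrete illustration of the ensemble direction noted under Techniques above, the sketch below, assuming scikit-learn is available and using toy placeholder texts and labels rather than a real annotated social dataset, combines three classical classifiers over TF-IDF features with majority voting; an analogous ensemble could wrap Deep Learning or fine-tuned Transformer-based predictors instead.

```python
# Minimal sketch (toy data, not from the reviewed studies): a majority-vote
# ensemble of three classical sentiment classifiers over TF-IDF features.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labelled tweets; a real study would use an annotated social dataset.
texts = ["great service, thank you!", "worst flight ever", "it was okay, nothing special"]
labels = ["positive", "negative", "neutral"]

ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("svm", LinearSVC()),
        ],
        voting="hard",  # each base classifier casts one vote per tweet
    ),
)

ensemble.fit(texts, labels)
print(ensemble.predict(["the crew were lovely but the delay was awful"]))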
Shared evaluation tasks, such as those organised by the International Workshop on Semantic Evaluation (SemEval), that focus on any of the research gaps identified above are very important; researchers are therefore encouraged to participate in and/or organise such tasks, as these shall contribute to the advancement of the SOM research area.

In conclusion, as identified through this systematic review, a fusion of social opinions represented in multiple sources and in various media formats can potentially influence multiple application areas.

Acknowledgements This work is funded by the ADAPT Centre for Digital Content Technology, which is funded under the SFI Research Centres Programme (Grant No. 13/RC/2106).

Funding Open Access funding provided by the IReL Consortium.

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Emotion and opinion retrieval from social media in Arabic language: survey Sentiment analysis of social media for evaluating universities Effect of training set size on SVM and Naive Bayes for twitter sentiment analysis Sentiment analysis of twitter data: emotions revealed regarding Donald Trump during the 2015-16 primary debates Sentiment analysis on Arabic tweets: challenges to dissecting the language Sentiment analysis of online crowd input towards brand provocation in facebook, twitter, and instagram Samar: subjectivity and sentiment analysis for Arabic social media Optimizing short message text sentiment analysis for mobile device forensics Transformer models for text-based emotion detection: a review of Bert-based approaches Personalized advertisement in the video games using deep social network sentiment analysis Fast algorithms for mining association rules Sentiment analysis of twitter data Big data in online social networks: User interaction analysis to model user behavior in social networks Clustering and sentiment analysis on twitter data Content vs. context for sentiment analysis: a comparative analysis over microblogs Identifying breakpoints in public opinion Multi-task learning for multimodal emotion recognition and sentiment analysis How intense are you? Predicting intensities of emotions and sentiments using stacked ensemble [application notes Real-time twitter sentiment analysis using 3-way classifier Enhance a deep neural network model for twitter sentiment analysis by incorporating user behavioral information Arabic sentiment analysis using recurrent neural networks: a review Big social data as a service: a service composition framework for social information service analysis An introduction to kernel and nearest-neighbor nonparametric regression Development of iot mining machine for twitter sentiment analysis: mining in the cloud and results on the mirror A new method for sentiment analysis using contextual autoencoders Twitter sentiment analysis of online transportation service providers Smartphone message sentiment analysis Social media analysis supporting smart city implementation (practical study in bandung district) A novel sentiment analysis of social networks using supervised learning Lexicon based sentiment comparison of iphone and android tweets during the Iran-Iraq earthquake Creation of corpus and analysis in code-mixed kannada-english twitter data for emotion prediction Real-time lexicon-based sentiment analysis experiments on twitter with a mild (more information, less data) approach Twitter sentiment analysis experiments using word embeddings on datasets of various scales A new ANEW: Evaluation of a word list for sentiment analysis in microblogs Rift: a rule induction framework for twitter sentiment analysis If you are happy and you know it A systematic review of open government data initiatives Stock market prediction: a big data approach Hierarchical deep learning: A promising technique for opinion monitoring and sentiment analysis in Russianlanguage social networks Utilizing twitter data for identifying and
resolving runtime business process disruptions A real-time twitter sentiment analysis using an unsupervised method Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining Annotation technique for health-related tweets sentiment analysis A multimodal feature learning approach for sentiment analysis of social network multimedia Arabizi language models for sentiment analysis Sentiment analysis of political tweets: towards an accurate classifier Social opinion mining and concise rendition Opinion mining and information fusion: a survey Multitask learning for fine-grained twitter sentiment analysis An apache spark implementation for sentiment analysis on twitter data Emotion-corpus guided lexicons for sentiment analysis on twitter On predicting elections with hybrid topic based sentiment analysis of tweets Code-switching patterns can be an effective route to improve performance of downstream NLP applications: a case study of humour, sarcasm and hate speech detection The role of pre-processing in twitter sentiment analysis Twitter data classification using side information Multi-classifier system for sentiment analysis and opinion mining Social media analytics: a survey of techniques, tools and platforms Statistical inference for probabilistic functions of finite state Markov chains Avaya: sentiment analysis on twitter with self-training and polarity lexicon expansion An overview of sentiment analysis in social media and its applications in disaster relief Classifying sentiment in microblogs: is brevity an advantage Towards the exploitation of statistical language models for sentiment analysis of twitter posts Sentiment knowledge discovery in twitter streaming data Moa-tweetreader: real-time analysis in twitter streaming data Context-sensitive sentiment classification of short colloquial text Twitter reciprocal reply networks exhibit assortativity with respect to happiness Detection and visualization of misleading content on twitter Twitter mood predicts the stock market Large-scale visual sentiment ontology and detectors using adjective noun pairs Developing corpora for sentiment analysis: the case of irony and sentitut Large-scale machine learning with stochastic gradient descent A pattern-based approach for multi-class sentiment analysis in twitter Multi-class sentiment analysis in twitter: What if classification is not the answer A machine learning approach for classifying sentiments in Arabic tweets Combining strengths, emotions and polarities for boosting twitter sentiment analysis Meta-level sentiment models for big social data analysis From opinion lexicons to sentiment classification of tweets and vice versa: a transfer learning approach Bagging predictors Random forests Lessons from applying the systematic literature review process within the software engineering domain Collaborative visual analysis of sentiment in twitter events Topic-based sentiment analysis Learning and evaluating emotion lexicons for 91 languages Urwf: user reputation based weightage framework for twitter micropost classification 140 characters to victory?: Using twitter to predict the UK 2015 general election Sentiment analysis on microblogs for natural disasters management: a study on the 2014 genoa floodings Convolutional neural networks for multimedia sentiment analysis Cascading classifiers for twitter sentiment analysis with emotion lexicons Affective computing and sentiment analysis New avenues in opinion mining and sentiment analysis Senticnet 6: Ensemble application of 
symbolic and subsymbolic ai for sentiment analysis # microposts2016: 6th workshop on making sense of microposts: Big things come in small packages Chinese microblog users' sentiment-based traffic condition analysis On the evaluation and combination of state-of-the-art features in twitter sentiment analysis Sentiment analysis from textual to multimodal features in digital environments Acquiring a large scale polarity lexicon through unsupervised distributional methods Bootstrapping large scale polarity lexicons through advanced distributional methods Twitter sentiment analysis, 3-way classification: positive, negative or neutral Tagnet: Toward tag-based sentiment analysis of large social media data Improved big data analytics solution using deep learning model and real-time sentiment data analysis approach Advanced combined LSTM-CNN model for twitter sentiment analysis Extracting diverse sentiment expressions with target-dependent polarity from twitter Deceptive opinion spam detection using deep level linguistic features Crime prediction using twitter sentiment and weather Research on micro-blog sentiment polarity classification based on SVM Multimodal hypergraph learning for microblog sentiment prediction Big data analytics on aviation social media: the case of china southern airlines on Sina Weibo Weighted co-training for cross-domain image sentiment classification Twitter sentiment analysis via bi-sense emoji embedding and attention-based LSTM Issues and perspectives from 10,000 annotated financial social media data Modelling public sentiment in twitter: using linguistic patterns to enhance supervised learning Investigating temporal and spatial trends of brand images using twitter opinion mining Sentiment analysis for tracking breaking events: a case study on twitter Context-aware sentiment propagation using lDA topic modeling on Chinese conceptnet Multilingual irony detection with dependency syntax and neural models The cn2 induction algorithm Fast effective rule induction Detection of influential observation in linear regression Support-vector networks ACE: a concept extraction approach using linked open data A social opinion gold standard for the Malta government Semeval-2017 task 5: fine-grained sentiment analysis on financial microblogs and news Pollution, bad-mouthing, and local marketing: the underground of location-based social networks Grey sentiment analysis using sentiwordnet Twitter ontology-driven sentiment analysis Forecasting stock prices using social media analysis Online passive-aggressive algorithms Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on twitter Emotion tokens: bridging the gap among multilingual twitter sentiment analysis Lexicon-based sentiment analysis on topical Chinese microblog messages Understanding social media marketing: a case study on topics, categories and sentiment on a facebook brand page Tweet sentiment analysis with classifier ensembles Smart map for smart city Computational intelligence and citizen communication in the smart city Role of conformity in opinion dynamics in social networks Sentiment analysis of facebook data using hadoop based open source technologies Multilingual sentiment analysis: state of the art and independent comparison of techniques Mining social network users opinions' to aid buyers' shopping decisions Brazilians divided: political protests as told by twitter Ensemble model for twitter sentiment analysis Aggressive text detection for cyberbullying Understanding online social networks' users-a 
twitter approach Maximum likelihood from incomplete data via the EM algorithm Twitter sentiment analysis using various classification algorithms Bert: pre-training of deep bidirectional transformers for language understanding Characterizing debate performance via aggregated twitter sentiment Deep learning for aspect-based sentiment analysis: a comparative review Annotation of a corpus of tweets for sentiment analysis Computational advertising in social networks: an opinion mining-based approach An apache spark implementation for graph-based hashtag sentiment classification on twitter Support vector regression machines Box office prediction based on microblog Domain adaptation from multiple sources via auxiliary classifiers Social sensing and sentiment analysis: Using social media as useful information source Applying systematic reviews to diverse study types: an experience report A comparison of pre-processing techniques for twitter sentiment analysis Sarcasm identification in textual data: systematic review, research challenges and open directions An argument for basic emotions Toward a sentiment analysis framework for social media Sentiment analysis of twitter data using machine learning techniques and scikitlearn Collecting and processing arabic facebook comments for sentiment analysis Sentiment analysis on twitter data using apache spark framework User-level twitter sentiment analysis with a hybrid approach Feature based sentiment analysis of tweets in multiple languages Twitter sentiment tracking for predicting marketing trends Does bad news spread faster Classification method comparison on Indonesian social media sentiment analysis Feature selection using variable length chromosome genetic algorithm for sentiment analysis A word-emoticon mutual reinforcement ranking model for building sentiment lexicon from massive collection of microblogs Detecting irony and sarcasm in microblogs: The role of expressive signals and ensemble classifiers Effective kernelized online learning in language processing tasks Theory of statistical estimation Sentiment analysis on the level of customer satisfaction to data cellular services using the Naive Bayes classifier algorithm What multimedia sentiment analysis says about city liveability Contextual sentiment analysis A short introduction to boosting Clustering by passing messages between data points Tsentiment: on gamifying twitter sentiment analysis Geography of emotion: Where in a city are people happier A comparison between two spanish sentiment lexicons in the twitter sentiment analysis task Sentiment analysis of twitter data with hybrid learning for recommender applications Toward understanding online sentiment expression: an interdisciplinary approach with subgroup comparison and visualization Chinese micro-blog sentiment analysis based on semantic features and pad model Emotion cause detection for Chinese micro-blogs based on ECOCC model Sentiment analysis of twitter feeds Twitter opinion mining and boosting using sentiment analysis Extremely randomized trees A domain transferable lexicon set for twitter sentiment analysis using a supervised machine learning approach A dynamic architecture for artificial neural networks Twitter brand sentiment analysis: a hybrid system using n-gram analysis and dynamic artificial neural network Deep ensemble model with the fusion of character, word and lexicon level information for emotion and sentiment prediction Like it or not: a survey of twitter sentiment analysis methods Opinion retrieval in twitter: is 
proximity effective Sentiment propagation for predicting reputation polarity Topic-specific stylistic variations for opinion retrieval on twitter Twitter sentiment classification using distant supervision Modeling recommendation system for real time analysis of social media dynamics Comparing and combining sentiment analysis methods Exploiting data of the twitter social network using sentiment analysis Generative adversarial nets Framewise phoneme classification with bidirectional LSTM and other neural network architectures LSTM: a search space odyssey Sentiment analysis on evolving social streams: How self-report imbalances can help A novel twitter sentiment analysis model with baseline correlation for financial market prediction with improved efficiency Twitter sentiment analysis in healthcare using hadoop and r Tweet normalization: a knowledge based approach Sentiment analysis of the demonitization of economy 2016 India, regionwise Twitter sentiment detection via ensemble classification using averaged confidence scores Design and implementation of a toolkit for the aspect-based sentiment analysis of tweets A comparative study of uncertainty based active learning strategies for general purpose twitter sentiment analysis with deep neural networks Application of text classification and clustering of twitter data for business analytics Combining classical and deep learning methods for twitter sentiment analysis On assessing the sentiment of general tweets A dynamic conditional random field based framework for sentence-level sentiment analysis of Chinese microblog Twitter sentiment analysis: a bootstrap ensemble framework Learning Bayesian networks: the combination of knowledge and statistical data Iradabe: Adapting english lexicons to the italian sentiment polarity classification task Improving neural networks by preventing co-adaptation of feature detectors Reducing the dimensionality of data with neural networks Politics, sharing and emotion in microblogs A social media platform for infectious disease analytics Multilayer feedforward networks are universal approximators Localized twitter opinion mining using sentiment analysis Using sentiment analysis to determine users' likes on twitter Extreme learning machine: theory and applications Image retrieval via probabilistic hypergraph ranking Boosting financial trend prediction with twitter mood based on selective hidden Markov models Examining government-citizen interactions on twitter using visual and sentiment analysis Analyzing users' sentiment towards popular consumer industries and brands on twitter Mining and summarizing customer reviews Mining time-changing data streams A twitter sentiment gold standard for the brexit referendum Estimating market trends by clustering social media reviews Semi-supervised learning for big social data analysis Exploiting social relations for sentiment analysis in microblogging Building large-scale English and Korean datasets for aspect-level sentiment analysis in automotive domain Decision stream: Cultivating deep decision trees A proposal of event study methodology with twitter sentimental analysis for risk management Semantic twitter sentiment analysis based on a fuzzy thesaurus Towards creation of linguistic resources for bilingual sentiment analysis of twitter data Information theory and statistical mechanics Twitter sentiment classification for measuring public health concerns Microblog sentiment analysis with emoticon space model Every term has sentiment: learning from emoticon evidences for 
Chinese microblog sentiment analysis Metnet: a mutual enhanced transformation network for aspect-based sentiment analysis Target-dependent twitter sentiment classification Pre-processing boosting twitter sentiment analysis? Combing semantic and prior polarity features for boosting twitter sentiment analysis using ensemble learning Comparison research on text pre-processing methods on twitter sentiment analysis Deep convolution neural networks for twitter sentiment analysis Combining semantic and prior polarity for boosting twitter sentiment analysis Knowledge-based tweet classification for disease sentiment monitoring Incorporating positional information into deep belief networks for sentiment classification Sentiment analysis of tweets for the 2016 us presidential election Casting online votes: to predict offline results using sentiment analysis by machine learning classifiers Twitter sentiment analysis for security-related information gathering How to take a good selfie Sentiment analysis on twitter: a text mining approach to the Afghanistan status reviews Performance analysis of ensemble methods on twitter sentiment analysis using NLP techniques Weighting public mood via microblogging analysis An effective social network sentiment mining model for healthcare product sales analysis Users of the world, unite! the challenges and opportunities of social media Sentiment analysis of lithuanian texts using traditional and deep learning approaches A fuzzy computational model of emotion for cloud based sentiment analysis Twitter sentiment analysis using dynamic vocabulary Image sentiment analysis using latent correlations among visual, textual, and sentiment views Analysing tv audience engagement via twitter: Incremental segment-level opinion mining of second screen tweets Impact of event reputation on the sponsor's sentiment Which configuration works best? an experimental study on supervised arabic twitter sentiment analysis Tom: Twitter opinion mining framework using hybrid classification scheme Towards building large-scale distributed systems for twitter sentiment analysis Where is safe: Analyzing the relationship between the area and emotion using twitter data Procedures for performing systematic reviews Mobility network evaluation in the user perspective: Real-time sensing of traffic information in twitter messages Sentiment analysis of code-mixed Bambara-French social media text using deep learning techniques Ontology-based sentiment analysis of twitter posts Sentiment analysis on microblog utilizing appraisal theory A comparative study on twitter sentiment analysis: Which features are good HBE: Hashtag-based emotion lexicons for twitter sentiment analysis The use of pos sequence for analyzing sentence pattern in twitter sentiment analysis Indolem and indobert: a benchmark dataset and pre-trained language model for Indonesian nlp Twitter sentiment analysis: the good the bad and the omg! 
ICWSM Real-time data analysis in clowdflows The effect of preprocessing techniques on twitter sentiment analysis Cross-language sentiment analysis of European twitter messages during the covid-19 pandemic Opinion extraction, summarization and tracking in news and blog corpora Sentiment analysis of multimodal twitter data Analyzing twitter sentiments through big data Integrated microblog sentiment analysis from users' social interaction patterns and textual opinions Conditional random fields: probabilistic models for segmenting and labeling sequence data Debate on political reforms in twitter: A hashtag-driven analysis of political polarization Cross-lingual language model pretraining Social analytics: learning fuzzy product ontologies for aspect-oriented sentiment analysis Twitter sentiment analysis using multi-class SVM Forests of oblique decision stumps for classifying very large number of tweets Bumps and bruises: mining presidential campaign announcements on twitter Handwritten digit recognition with a back-propagation network Labels and sentiment in social media: On the role of perceived agency in online discussions of the refugee crisis Aspect-based twitter sentiment classification Using emotions to predict user interest areas in online social networks Naive (Bayes) at forty: the independence assumption in information retrieval Opinion diffusion and analysis on social networks Deriving market intelligence from microblogs Text-based emotion classification using emotion cause extraction City digital pulse: a cloud based heterogeneous data analysis platform Twitter microblog sentiment analysis The new eye of smart city: novel citizen sentiment analysis in twitter Twitter sentiment analysis of new ikea stores using machine learning Chinese microblog sentiment analysis based on sentiment features An unsupervised machine learning model for discovering latent infectious diseases using social media data A polarity analysis framework for twitter messages Mining cross-cultural differences and similarities in social media Sentiment analysis in social media Emoticon-aware recurrent neural network model for Chinese sentiment analysis A survey of social media data analysis for physical activity surveillance Topic-based microblog polarity classification based on cascaded model Do photos help express our feelings: incorporating multimodal features into microblog sentiment analysis a robustly optimized Bert pretraining approach Mining text data. In: Ch. A survey of opinion mining and sentiment analysis Least squares quantization in PCM Multilingual sentiment analysis: from formal to informal and scarce resource languages Entity-oriented sentiment analysis of tweets: results and problems Semi-supervised microblog sentiment analysis using social relation and text similarity Context-aware sentiment detection from ratings Visualizing social media sentiment in disaster scenarios Twitter sentiment analysis based on writing style Feature selection for twitter sentiment analysis: An experimental study Challenges in developing opinion mining tools for social media Automatic detection of political opinions in tweets Generalized linear models A logical calculus of the ideas immanent in nervous activity On the method of bounded differences Profile of mood states. 
Educational & Industrial testing service Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament Political speech in social media streams: Youtube comments and twitter posts Gop primary season on twitter: popular political sentiment in social media Real time location based sentiment analysis on twitter: the airsent system Efficient estimation of word representations in vector space Wordnet: a lexical database for English Role of emoticons in sentence-level sentiment classification Conditional generative adversarial nets Evaluation of sentiment analysis in finance: from lexicons to transformers Detecting the correlation between sentiment and user-level as well as text-level meta-data from benchmark corpora On adverse drug event extractions using twitter sentiment analysis Nrc-canada: building the state-of-the-art in sentiment analysis of tweets Emotions evoked by common words and phrases: Using mechanical turk to create an emotion lexicon Crowdsourcing a word-emotion association lexicon Cross-domain sentiment analysis using Spanish opinionated words Crowd explicit sentiment analysis Towards multimodal sentiment analysis: harvesting opinions from the web An emotional polarity analysis of consumers' airline service tweets More than words: social networks' text mining for consumer brand sentiments Twitter sentiment for 15 european languages, slovenian language resource repository CLARIN Collision theory based sentiment detection of twitter using discourse relations Effective lexicon-based approach for Urdu sentiment analysis Fuzzy-set based sentiment analysis of big social data Towards a set theoretical approach to big data analytics Covid-twitter-bert: a natural language processing model to analyse covid-19 content on twitter Discovering community preference influence network by social network opinion posts mining Emotwitter-a fine-grained visualization system for identifying enduring sentiments in tweets Design of self-adjusting algorithm for data-intensive mapreduce applications Semeval-2013 task 2: sentiment analysis in twitter Twitter opinion mining predicts broadband internet's customer churn rate Language-independent twitter sentiment analysis, Knowledge discovery and machine learning (KDML) Transformer based deep intelligent contextual embedding for twitter sentiment analysis Text mining for market prediction: a systematic review Sentiment analysis to predict election results using python Sentiment analysis in twitter using machine learning techniques Bertweet: a pre-trained language model for English tweets Leveraging emotional consistency for semi-supervised sentiment classification Opinion mining from social media using fuzzy inference system (FIS) Twitter sentiment analysis of DKI Jakarta's gubernatorial election 2017 with predictive and descriptive approaches Deep neural network approaches for Spanish sentiment analysis of short texts How Trump won: the role of social media sentiment in political elections The influence of twitter on education policy making Mining emotions in short films: user comments or crowdsourcing? 
Mining affective context in short films for emotionaware recommendation Sentiment analysis in facebook and its application to e-learning Visual sentiment analysis based on on objective text description of images Clusm: an unsupervised model for microblog sentiment analysis incorporating link information Sentistory: multi-grained sentiment analysis and event summarization with crowdsourced social media data Negative link prediction and its applications in online political networks Feasibility analysis of Asterixdb and spark streaming with Cassandra for stream-based processing Assessing mobile health applications with twitter analytics Predicting vehicle sales by sentiment analysis of twitter data and stock market values Twitter as a corpus for sentiment analysis and opinion mining Twitter, myspace, digg: unsupervised sentiment analysis in social media A survey on transfer learning Twitter sentiment analysis for large-scale data: an unsupervised approach Twitter sentiment analysis using hybrid cuckoo search method Opinion mining and sentiment analysis Recurrent neural network based bitcoin price prediction by twitter sentiment analysis Ceo's apology in twitter: a case study of the fake beef labeling incident by e-mart Perisikan: An intelligent framework for social network data analysis Shared task on sentiment analysis in Indian languages (sail) tweets-an overview Using short urls in tweets to improve twitter opinion mining Microblog sentiment analysis model based on emoticons Efficient adverse drug event extraction using twitter sentiment analysis A survey of sentiment analysis in the Portuguese language Twitter sentiment analysis of movie reviews using ensemble features based Naïve Bayes Opinion mining on the web 2.0-characteristics of user generated content and their impacts. In: Human-computer interaction and knowledge discovery in complex, unstructured, big data Twitter sentiment analysis: capturing sentiment from integrated resort tweets Does social network sentiment influence the relationship between the s&p 500 and gold returns? 
Sentiment analysis in twitter for Spanish Chapter 1-a general psychoevolutionary theory of emotion On mining opinions from social media Sentiment spreading: an epidemic model for lexicon-based sentiment analysis on twitter Emotube: A sentiment analysis integrated environment for social web content Intelligence retrieval from a centralized IoT network Emosenticspace: a novel framework for affective common-sense reasoning Fusing audio, visual and textual clues for sentiment analysis from multimodal content Analysis of twitter users' mood for prediction of gold and silver prices in the stock market Modelling movement of stock market indexes with data from emoticons of twitter users Machine learning in prediction of stock market indicators based on historical data and data from twitter sentiment analysis Sentiment analysis: a combined approach Utilizing ensemble, data sampling and feature selection techniques for improving classification performance on tweet sentiment data Ent-it-up A twitter sentiment analysis for cloud providers: a case study of AZURE vs Induction of decision trees Using link structure to infer opinions in social networks Language models are unsupervised multitask learners Personalized language-independent music recommendation system Social data analysis for predicting next event Identification of landscape preferences by using social media analysis A comprehensive survey on sentiment analysis Tweet sentiment analyzer: sentiment score estimation method for assessing the value of opinions in tweets Twitter sentiment analysis using deep learning methods Comparison of Naive Bayes smoothing methods for twitter sentiment analysis Classification and clustering via dictionary learning with structured incoherence and shared features Election result prediction using twitter sentiment analysis A journey of Indian languages over sentiment analysis: a systematic review Twitter sentiment analysis of real-time customer experience feedback for predicting growth of Indian telecom companies Twitter sentiment analysis: how to hedge your bets in the stock markets Consumer insight mining: aspect based twitter opinion mining of mobile phone reviews Every post matters: a survey on applications of sentiment analysis in social media A survey on opinion mining and sentiment analysis: tasks, approaches and applications Twitter sentiment analysis for product review using lexicon method Polarity classification for target phrases in tweets: a word2vec approach. International semantic web conference A multidimensional approach for detecting irony in twitter Sentiment analysis on twitter using mcdiarmid tree algorithm A predictive government decision based on citizen opinions: tools & results Politwi: early detection of emerging political topics on twitter and the impact on concept-level sentiment analysis FVEC-SVM for opinion mining on Indonesian comments of youtube video Proceedings of the 11th international workshop on semantic evaluation Proceedings of the 9th international workshop on semantic evaluation Semeval-2014 task 9: sentiment analysis in twitter Least median of squares regression Deceptive review detection using labeled and unlabeled data A model for sentiment and emotion analysis of unstructured social media text Transfer learning-machine learning's next frontier Whose and what chatter matters? 
The effect of tweets on movie sales Learning internal representations by error propagation Opinion mining using support vector machine with web based diverse data An approach towards identification and prevention of riots by analysis of social media posts in real-time Efficient twitter sentiment classification using subjective distant supervision Twitter sentiment analysis-a more enhanced way of classification and scoring Skyline-based feature selection for polarity classification in social networks Evaluation datasets for twitter sentiment analysis: a survey and a new dataset, the sts-gold Senticircles for contextual and conceptual semantic sentiment analysis of twitter Semantic sentiment analysis of twitter Adapting sentiment lexicons using contextual semantics for sentiment analysis of twitter. European semantic web conference Semantic patterns for sentiment analysis of twitter Emotion recognition using multimodal approach Estimation of 2017 Iran's presidential election using sentiment analysis on social media An ensemble classification system for twitter sentiment analysis Introduction to modern information retrieval Samoylov AB (2014) Evaluation of the delta TF-IDF features for sentiment analysis Assigning geo-relevance of sentiments mined from location-based social media posts Towards the study of sentiment in the public opinion of science in Spanish Facebook impact and sentiment analysis on political campaigns An algorithm for identification of natural disaster affected area Distilbert, a distilled version of Bert: smaller, faster, cheaper and lighter Towards sentiment orientation data set enrichment Social opinion mining: an approach for Italian language Optimizing non-decomposable measures with deep networks Twitter sentiment analysis Sarcasm detection using machine learning algorithms in twitter: a systematic review Phonetic-based microtext normalization for twitter sentiment analysis International conference on computational science and its applications Sentiment analysis: a review and comparative analysis of web services Feature expansion for sentiment analysis in twitter Multi-lingual opinion mining on youtube Twitter sentiment analysis with deep convolutional neural networks Tweet the debates: understanding community annotation of uncollected sources Web-based application for sentiment analysis of live tweets Algorithm for prediction of links using sentiment analysis in social networks Twitris: a system for collective social intelligence Opinion sentence extraction and sentiment analysis for Chinese microblogs A hierarchical LSTM model with multiple features for sentiment analysis of Sina Weibo texts Twitter sentiment analysis: a case study in the automotive industry Twitter sentiment analysis with different feature extractors and dimensionality reduction using supervised learning algorithms Combining a rule-based classifier with weakly supervised learning for twitter sentiment analysis Twitter sentiment analysis of movie reviews using information gain and naïve bayes classifier Using sentiment from twitter optimized by genetic algorithms to predict the stock market Role of text pre-processing in twitter sentiment analysis Analyzing the sentiment of crowd for improving the emergency response services Sentiment leaning of influential communities in social networks Monitoring the twitter sentiment during the Bulgarian elections Recursive deep models for semantic compositionality over a sentiment treebank Examining sentiments and popularity of pro-and anti-vaccination videos on 
youtube Implicit feedback mining for recommendation Political opinion mining using e-social network data Topie: An open-source opinion mining pipeline to analyze consumers' sentiment in brazilian portuguese Sentiment analysis on twitter data for Portuguese language Twitter polarity classification with label propagation over lexical links and the follower graph Deep neural network architecture for sentiment analysis and emotion identification of twitter messages Twitter sentiment analysis using deep convolutional neural network Energy and policy considerations for deep learning in NLP Sentiment analysis of Chinese micro-blog using semantic sentiment space model Detecting anomalous emotion through big data from social networks based on a deep learning method Predicting stock price returns using microblog sentiment for Chinese stock market Multi-strategy based sina microblog data acquisition for opinion mining Twitter sentiment analysis using binary classification technique An unsupervised fuzzy clustering method for twitter sentiment analysis The hourglass model revisited Lingosent-a platform for linguistic aware sentiment analysis for social media messages Analysis on Chinese microblog sentiment based on syntax parsing and support vector machine A replication study of the top performing systems in semeval twitter sentiment analysis. International Semantic Web Conference A qualitative analysis of sarcasm, irony and related# hashtags on twitter A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis Measuring political sentiment on twitter: factor optimal design for multinomial inverse regression Influence analysis of emotional behaviors and user relationships based on twitter data Learning the mapping rules for sentiment analysis Learning sentence representation for emotion classification on microblogs User-level sentiment analysis incorporating social networks Twitter sentiment polarity analysis: a novel approach for improving the automated labeling in a text corpora Real time sentiment change detection of twitter data streams Data extraction and preparation to perform a sentiment analysis using open source tools: The example of a facebook fashion brand page A case study of Spanish text transformations for twitter sentiment analysis Sentiment strength detection in short informal text Sentiment in twitter events Sentiment strength detection for the social web A recommendation mechanism for web publishing based on sentiment analysis of microblog Stock price prediction using data analytics Topic-adaptive sentiment analysis on tweets via learning from multisources data Evaluation of ensemble-based sentiment classifiers for twitter data Understanding effect of sentiment content toward information diffusion pattern in online social networks: a case study on tweetscope An ensemble model for cross-domain polarity classification on twitter Survey on mining subjective data on the web Dynamics of news events and social media reaction Predicting elections with twitter: What 140 characters reveal about political sentiment Predicting elections from social networks based on subevent detection and sentiment analysis Mining churning factors in Indian telecommunication sector using social media analytics Attention is all you need Global centrality measures in word graphs for twitter sentiment analysis Feature selection using sampling with replacement, covering arrays and rule-induction techniques to aid polarity detection in twitter sentiment analysis 
Extracting and composing robust features with denoising autoencoders Using geo-tagged sentiment to better understand social interactions An efficient hybrid model for vietnamese sentiment analysis Sentiment analysis of tweets to identify the correlated factors that influence an issue of interest An experiment in integrating sentiment features for tech stock prediction in twitter Sentiment analysis with deep neural networks: comparative study and performance assessment Survey on sentiment analysis using twitter dataset Ageing-based multinomial Naive Bayes classifiers over opinionated data streams ETL design toward social network opinion analysis Unsupervised opinion targets expansion and modification relation identification for microblog sentiment analysis Word clustering based on POS feature for efficient twitter sentiment analysis A system for real-time twitter sentiment analysis of 2012 us presidential election cycle Microblog sentiment analysis based on cross-media bag-of-words model Harnessing twitter'' big data'' for automatic emotion identification Multi-label Chinese microblog emotion classification via convolutional neural network. In: Asia-Pacific web conference Context-aware Chinese microblog sentiment classification with bidirectional LSTM Sentiment analysis of Chinese microblogs based on layered features Baselines and bigrams: Simple, good sentiment and topic classification Sentiment-bearing new words mining: Exploiting emoticons and latent polarities A novel calibrated label ranking based method for multiple emotions detection in Chinese microblogs Sentiment detection and visualization of Chinese micro-blog Shine: signed heterogeneous information network embedding for sentiment link prediction Vertical and sequential sentiment analysis of micro-blog topic Twitter sentiment analysis using deep neural network A character-based convolutional neural network for language-agnostic twitter sentiment analysis Aspect-based extraction and analysis of affective knowledge from social media streams A comparative study of social media and traditional polling in the Egyptian uprising of Data sources for prediction: databases, hybrid data and the web Ensemble approach for sentiment polarity analysis in user-generated Indonesian text Social media adoption and use by Australian capital city local governments Recognizing contextual polarity in phrase-level sentiment analysis Data Mining: practical machine learning tools and techniques Twitter opinion mining for adverse drug reactions Towards building a high-quality microblog-specific Chinese sentiment lexicon Solving unbalanced data for thai sentiment analysis Distantly supervised lifelong learning for large-scale social media sentiment analysis Microblog sentiment analysis with weak dependency connections Crowdsourcing recommendations from social sentiment Financial sentiment analysis: an investigation into common mistakes and silver bullets Dynamic evolution of collective emotions in social networks: a case study of Sina Weibo Fast learning for sentiment analysis on bullying Semi-automatic construction and refinement of an annotated corpus for a deep learning framework for emotion classification Sentiment analysis using deep learning architectures: a review A bilingual approach for conducting Chinese and English social media sentiment analysis Few-shot learning for short text classification Microblog sentiment analysis algorithm research and implementation based on classification A sentiment-enhanced personalized location recommendation system 
Competition component identification on twitter Enhanced twitter sentiment analysis by using feature selection and combination Hierarchical attention networks for document classification Xlnet: generalized autoregressive pretraining for language understanding Research on Chinese micro-blog sentiment analysis based on deep learning An empirical study and comparison for tweet sentiment analysis Comments-attached Chinese microblog sentiment classification based on machine learning technology Two simple and effective ensemble classifiers for twitter sentiment analysis A survey on social media analytics for smart city Exploring public sentiments for livable places based on a crowd-calibrated sentiment analysis mechanism Robust image sentiment analysis using progressively trained and domain transferred deep networks Dual coordinate descent methods for logistic regression and maximum entropy models The impact of social and conventional media on firm equity value: a sentiment analysis approach Exploiting sentiment homophily for link prediction Sentiment analysis using social multimedia Sentiment analysis of controversial topics on Pakistan's twitter user-base Improving twitter aspect-based sentiment analysis using hybrid approach Twitter feature selection and classification using support vector machine for aspect-based sentiment analysis Hybrid sentiment classification on twitter aspect-based sentiment analysis SES: sentiment elicitation system for social media data Microblogging sentiment analysis using emotional vector Grammatical phrase-level opinion target extraction on Chinese microblog messages Microblog sentiment analysis based on emoticon networks model Incorporating conditional random fields and active learning to improve sentiment identification Sentiment analysis on microblogging by integrating text and image features A word-character convolutional neural network for language-agnostic twitter sentiment analysis Unsupervised sentiment analysis of twitter posts using density matrix representation Opinion mining and sentiment analysis in social media: Challenges and applications Stock market prediction exploiting microblog sentiment analysis Sentiment analysis on twitter through topic-based lexicon expansion Coupling topic modelling in opinion mining for social media analysis Chinese microblog sentiment analysis based on semi-supervised learning Brand-related twitter sentiment analysis using feature engineering and the dynamic architecture for artificial neural networks The state-of-the-art in twitter sentiment analysis: a review and benchmark evaluation A semi-supervised self-adaptive classifier over opinionated streams Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations References of studies Rathan et al. (2018) , Li and Fleyeh (2018) , Geetha et al. (2018) , , Aoudi and Malik (2018) , Poortvliet and Wang (2018)