A survey on sentiment detection of reviews

Huifeng Tang, Songbo Tan *, Xueqi Cheng
Information Security Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, PR China
Expert Systems with Applications 36 (2009) 10760–10773. doi:10.1016/j.eswa.2009.02.063
* Corresponding author. E-mail addresses: tanghuifeng@software.ict.ac.cn (H. Tang), tansongbo@software.ict.ac.cn (S. Tan), cxq@ict.ac.cn (X. Cheng).

Keywords: Sentiment detection; Opinion extraction; Sentiment classification

Abstract

The sentiment detection of texts has witnessed a booming interest in recent years, due to the increased availability of online reviews in digital form and the ensuing need to organize them. To date, four main problems predominate in this research community, namely subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction. In fact, there are inherent relations between them. Subjectivity classification can prevent the sentiment classifier from considering irrelevant or even potentially misleading text. Document sentiment classification and opinion extraction have often involved word sentiment classification techniques. This survey discusses related issues and the main approaches to these problems.

1. Introduction

Today, a very large number of reviews is available on the web, and weblogs are growing fast in the blogosphere. Product reviews exist in a variety of forms on the web: sites dedicated to a specific type of product (such as digital cameras), sites for newspapers and magazines that may feature reviews (like Rolling Stone or Consumer Reports), sites that couple reviews with commerce (like Amazon), and sites that specialize in collecting professional or user reviews in a variety of areas (like Rottentomatoes.com). Less formal reviews are available on discussion boards and mailing list archives, as well as in Usenet via Google Groups. Users also comment on products in their personal web sites and blogs, which are then aggregated by sites such as Blogstreet.com, AllConsuming.net, and onfocus.com.

The information mentioned above is a rich and useful source for marketing intelligence, social psychologists, and others interested in extracting and mining opinions, views, moods, and attitudes: for example, whether a product review is positive or negative, what the moods among bloggers are at a given time, or how the public reacts to a political affair.

To achieve this goal, a core and essential job is to detect the subjective information contained in texts, including viewpoints, fancies, attitudes, sensibilities, etc. This is the so-called sentiment detection.

A challenging aspect of this task, which seems to distinguish it from traditional topic-based detection (classification), is that while topics are often identifiable by keywords alone, sentiment can be expressed in a much more subtle manner. For example, the sentence "What a bad picture quality that digital camera has! ... Oh, this new type of camera has a good picture, long battery life and beautiful appearance!" compares a negative experience of one product with a positive experience of another product.
It is difficult to separate out the core assessment that should actually be correlated with the document. Thus, sentiment seems to require more understanding than the usual topic-based classification.

Sentiment detection dates back to the late 1990s (Argamon, Koppel, & Avneri, 1998; Kessler, Nunberg, & Schütze, 1997; Spertus, 1997), but only in the early 2000s did it become a major subfield of the information management discipline (Chaovalit & Zhou, 2005; Dimitrova, Finn, Kushmerick, & Smyth, 2002; Durbin, Neal Richter, & Warner, 2003; Efron, 2004; Gamon, 2004; Glance, Hurst, & Tomokiyo, 2004; Grefenstette, Qu, Shanahan, & Evans, 2004; Hillard, Ostendorf, & Shriberg, 2003; Inkpen, Feiguina, & Hirst, 2004; Kobayashi, Inui, & Inui, 2001; Liu, Lieberman, & Selker, 2003; Rauber & Müller-Kögler, 2001; Riloff & Wiebe, 2003; Subasic & Huettner, 2001; Tong, 2001; Vegnaduzzo, 2004; Wiebe & Riloff, 2005; Wilson, Wiebe, & Hoffmann, 2005). Until the early 2000s, the two most popular approaches to sentiment detection, especially in real-world applications, were based on machine learning techniques and on semantic analysis techniques. After that, shallow natural language processing techniques came into wide use in this area, especially in document sentiment detection. Current-day sentiment detection is thus a discipline at the crossroads of NLP and IR, and as such it shares a number of characteristics with other tasks such as information extraction and text mining.

Although several international conferences, such as ACL, AAAI, WWW, EMNLP, CIKM, etc., have devoted special issues to this topic, there is no systematic treatment of the subject: there are neither textbooks nor journals entirely devoted to sentiment detection yet.

This paper first introduces the definitions of several problems that pertain to sentiment detection. Then we present some applications of sentiment detection. Section 4 discusses the subjectivity classification problem. Section 5 introduces word sentiment (semantic orientation) classification methods. The sixth section examines the effectiveness of applying machine learning techniques to document sentiment classification. The seventh section discusses the opinion extraction problem. The eighth part discusses the evaluation of sentiment detection. The last section concludes with challenges and a discussion of future work.

2. Sentiment detection

2.1. Subjectivity classification

Subjectivity in natural language refers to aspects of language used to express opinions and evaluations (Wiebe, 1994). Subjectivity classification is stated as follows: let S = {s1, ..., sn} be a set of sentences in document D. The problem of subjectivity classification is to distinguish sentences used to present opinions and other forms of subjectivity (the subjective sentence set Ss) from sentences used to objectively present factual information (the objective sentence set So), where Ss ∪ So = S. This task is especially relevant for news reporting and Internet forums, in which opinions of various agents are expressed.

2.2. Sentiment classification

Sentiment classification includes two kinds of classification forms, i.e., binary sentiment classification and multi-class sentiment classification.
Given a document set D = {d1, ..., dn} and a pre-defined category set C = {positive, negative}, binary sentiment classification is to classify each di in D with a label in C. If we set C* = {strong positive, positive, neutral, negative, strong negative} and classify each di in D with a label in C*, the problem changes to multi-class sentiment classification.

Most prior work on learning to identify sentiment has focused on the binary distinction of positive vs. negative. But it is often helpful to have more information than this binary distinction provides, especially if one is ranking items by recommendation or comparing several reviewers' opinions. Koppel and Schler (2005a, 2005b) show that it is crucial to use neutral examples in learning polarity, for a variety of reasons. Learning from negative and positive examples alone will not permit accurate classification of neutral examples. Moreover, the use of neutral training examples in learning facilitates better distinction between positive and negative examples.

3. Applications of sentiment detection

In this section, we describe some emerging applications of sentiment detection.

3.1. Product comparison

It is a common practice for online merchants to ask their customers to review the products that they have purchased. With more and more people using the Web to express opinions, the number of reviews that a product receives grows rapidly. Most research on these reviews has focused on automatically classifying products into "recommended" or "not recommended" (Pang, Lee, & Vaithyanathan, 2002; Das & Chen, 2001; Terveen, Hill, Amento, McDonald, & Creter, 1997). But every product has several features, of which people may be interested in only some. Moreover, a product that has shortcomings in one aspect may have merits in another (Morinaga, Yamanishi, Tateishi, & Fukushima, 2002; Taboada, Gillies, & McFetridge, 2006).

The goal is thus to analyze online reviews and present a visual means of comparing consumers' opinions of different products, so that with a single glance the user can clearly see the advantages and weaknesses of each product in the minds of consumers. A potential customer can see a visual side-by-side and feature-by-feature comparison of consumer opinions on these products, which helps him/her decide which product to buy. For a product manufacturer, the comparison enables it to easily gather marketing intelligence and product benchmarking information.

Liu, Hu, and Cheng (2005) proposed a novel framework for analyzing and comparing consumer opinions of competing products, implemented in a prototype system called Opinion Observer. To enable the visualization, two tasks were performed: (1) identifying product features that customers have expressed their opinions on, based on language pattern mining techniques (such features form the basis for the comparison); and (2) for each feature, identifying whether the opinion from each reviewer is positive or negative, if any. Different users can visualize and compare opinions on different products using a user interface: the user simply chooses the products that he/she wishes to compare, and the system then retrieves the analyzed results of these products and displays them in the interface.

3.2. Opinion summarization

The number of online reviews that a product receives grows rapidly, especially for some popular products.
Furthermore, many reviews are long and have only a few sentences containing opinions on the product. This makes it hard for a potential customer to read them to make an informed decision on whether to purchase the product. The large number of reviews also makes it hard for product manufacturers to keep track of customer opinions of their products, because many merchant sites may sell their products and the manufacturer may produce many kinds of products.

Opinion summarization (Ku, Lee, Wu, & Chen, 2005; Philip et al., 2004) summarizes the opinions of articles by reporting sentiment polarities, degrees and the correlated events. With opinion summarization, a customer can easily see how existing customers feel about a product, and the product manufacturer can learn why people of different standpoints like the product or what they complain about.

Hu and Liu (2004a, 2004b) conducted such a work: given a set of customer reviews of a particular product, the task involves three subtasks: (1) identifying features of the product that customers have expressed their opinions on (called product features); (2) for each feature, identifying review sentences that give positive or negative opinions; and (3) producing a summary using the discovered information.

Ku, Liang, and Chen (2006) investigated both news and web blog articles. In their research, TREC, NTCIR and articles collected from web blogs serve as the information sources for opinion extraction. Documents related to the issue of animal cloning are selected as the experimental materials. Algorithms for opinion extraction at the word, sentence and document level are proposed. The issue of relevant sentence selection is discussed, and then topical and opinionated information is summarized. Opinion summarizations are visualized by representative sentences. Finally, an opinionated curve showing the supportive and non-supportive degree along the timeline is illustrated by an opinion tracking system.

3.3. Opinion reason mining

In the opinion analysis area, finding the polarity of opinions or aggregating and quantifying degree assessments of opinions scattered throughout web pages is not enough. We can do a more critical part of in-depth opinion assessment, such as finding reasons in opinion-bearing texts. For example, in film reviews, information such as "found 200 positive reviews and 150 negative reviews" may not fully satisfy the information needs of different people. More useful information would be "This film is great for its novel originality" or "Poor acting, which makes the film awful".

Opinion reason mining tries to identify one of the critical elements of online reviews to answer the question, "What are the reasons that the author of this review likes or dislikes the product?" To answer this question, we should extract not only sentences that contain opinion-bearing expressions, but also sentences with the reasons why the author of a review writes the review (Cardie, Wiebe, Wilson, & Litman, 2003; Clarke & Terra, 2003; Li & Yamanishi, 2001; Stoyanov, Cardie, Litman, & Wiebe, 2004).

Kim and Hovy (2005) proposed a method for detecting opinion-bearing expressions. In their subsequent work (Kim & Hovy, 2006), they collected a large set of ⟨review text, pros, cons⟩ triplets from epinions.com, in which pros and cons phrases are explicitly stated in their respective categories by each review's author along with the review text.
Their automatic labeling system first collects phrases in the pro and con fields and then searches the main review text in order to collect sentences corresponding to those phrases. The system then annotates these sentences with the appropriate "pro" or "con" label. All remaining sentences with neither label are marked as "neither". After labeling all the data, they use it to train their pro and con sentence recognition system.

3.4. Other applications

Thomas, Pang, and Lee (2006) try to determine from the transcripts of US Congressional floor debates whether the speeches represent support of or opposition to proposed legislation. Mullen and Malouf (2006) describe a statistical sentiment analysis method on political discussion group postings to judge whether a posting expresses a political viewpoint opposing the original post. Moreover, there are some potential applications of sentiment detection, such as online message sentiment filtering, e-mail sentiment classification, weblog authors' attitude analysis, sentiment web search engines, etc.

4. Subjectivity classification

Subjectivity classification is the task of investigating whether a paragraph presents the opinion of its author or reports facts. In fact, most research has shown a very tight relation between subjectivity classification and document sentiment classification (Pang & Lee, 2004; Wiebe, 2000; Wiebe, Bruce, & O'Hara, 1999; Wiebe, Wilson, Bruce, Bell, & Martin, 2002; Yu & Hatzivassiloglou, 2003). Subjectivity classification can prevent the polarity classifier from considering irrelevant or even potentially misleading text. Pang and Lee (2004) find that subjectivity detection can compress reviews into much shorter extracts that still retain polarity information at a level comparable to that of the full review.

Much of the research in automated opinion detection has been performed and proposed for discriminating between subjective and objective text at the document and sentence levels (Bruce & Wiebe, 1999; Finn, Kushmerick, & Smyth, 2002; Hatzivassiloglou & Wiebe, 2000; Wiebe, 2000; Wiebe et al., 1999; Wiebe et al., 2002; Yu & Hatzivassiloglou, 2003). In this section, we discuss some approaches used to automatically classify a document as objective or subjective.

4.1. Similarity approach

The similarity approach to classifying sentences as opinions or facts explores the hypothesis that, within a given topic, opinion sentences will be more similar to other opinion sentences than to factual sentences (Yu & Hatzivassiloglou, 2003). It measures sentence similarity based on shared words, phrases, and WordNet synsets (Dagan, Shaul, & Markovitch, 1993; Dagan, Pereira, & Lee, 1994; Leacock & Chodorow, 1998; Miller & Charles, 1991; Resnik, 1995; Zhang, Xu, & Callan, 2002).

To measure the overall similarity of a sentence to the opinion or fact documents, we need to go through three steps. First, use an IR method to acquire the documents that are on the same topic as the sentence in question. Second, calculate the sentence's similarity scores with each sentence in those documents and take the average value. Third, assign the sentence to the category (opinion or fact) for which the average value is highest. Alternatively, in the frequency variant, instead of averaging we count, for each category, how many of the similarity scores exceed a predetermined threshold, and assign the sentence to the category with the higher count.
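To make the three steps concrete, here is a minimal sketch of the averaging variant, assuming the same-topic opinion and fact sentences have already been retrieved by the IR step. TF-IDF cosine similarity stands in for the paper's word/phrase/synset-based measures, and all function and variable names are illustrative.

```python
# Sketch of steps 2 and 3 of the averaging variant of the similarity approach.
# opinion_sents / fact_sents are assumed to come from same-topic documents (step 1).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def classify_sentence(sentence, opinion_sents, fact_sents):
    corpus = opinion_sents + fact_sents + [sentence]
    tfidf = TfidfVectorizer().fit_transform(corpus)
    target, n_op = tfidf[-1], len(opinion_sents)
    # Average similarity to each category (step 2).
    opinion_avg = cosine_similarity(target, tfidf[:n_op]).mean()
    fact_avg = cosine_similarity(target, tfidf[n_op:-1]).mean()
    # Assign to the category with the higher average (step 3).
    return "opinion" if opinion_avg >= fact_avg else "fact"
```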
4.2. Naive Bayes classifier

The Naive Bayes classifier is a commonly used supervised machine learning algorithm. This approach treats all sentences in opinion or factual articles as opinion or fact sentences, respectively; Naive Bayes thus uses the sentences in opinion and fact documents as training examples for the two categories.

The features include words, bigrams, and trigrams, as well as the parts of speech in each sentence. In addition, the presence of semantically oriented (positive and negative) words in a sentence is an indicator that the sentence is subjective. Therefore, the feature set can include the counts of positive and negative words in the sentence, as well as counts of the polarities of sequences of semantically oriented words (e.g., "++" for two consecutive positively oriented words). It can also include the counts of parts of speech combined with polarity information (e.g., "JJ+" for positive adjectives), as well as features encoding the polarity (if any) of the head verb, the main subject, and their immediate modifiers.

Generally speaking, Naive Bayes assigns a document dj (represented by a vector d⃗j) to the class ci that maximizes P(ci | d⃗j) by applying Bayes' rule, as follows:

P(c_i \mid \vec{d}_j) = \frac{P(c_i)\, P(\vec{d}_j \mid c_i)}{P(\vec{d}_j)} \qquad (1)

where P(d⃗j) is the probability that a randomly picked document d has the vector d⃗j as its representation, and P(c) is the probability that a randomly picked document belongs to class c. To estimate the term P(d⃗j | c), Naive Bayes decomposes it by assuming that all the features in d⃗j (represented by fi, i = 1 to m) are conditionally independent, i.e.,

P(c_i \mid \vec{d}_j) = \frac{P(c_i) \prod_{i=1}^{m} P(f_i \mid c_i)}{P(\vec{d}_j)} \qquad (2)
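As an illustration, the following sketch implements the scoring of Eqs. (1) and (2) in log space. Unigram features and Laplace smoothing are our assumptions; the richer features described above would simply be appended to each feature list.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: (feature_list, label) pairs; returns class priors, counts, vocabulary."""
    priors, counts, vocab = Counter(), {}, set()
    for feats, label in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(feats)
        vocab.update(feats)
    return priors, counts, vocab

def classify_nb(feats, priors, counts, vocab):
    n_docs = sum(priors.values())
    scores = {}
    for c in priors:
        denom = sum(counts[c].values()) + len(vocab)      # Laplace smoothing
        scores[c] = math.log(priors[c] / n_docs) + sum(
            math.log((counts[c][f] + 1) / denom) for f in feats)
    return max(scores, key=scores.get)                    # argmax_c P(c | d)
```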
4.3. Multiple Naive Bayes classifiers

The hypothesis that all sentences in opinion or factual articles are opinion or fact sentences is only an approximation. To address this, the multiple Naive Bayes classifier approach applies an algorithm using multiple classifiers, each relying on a different subset of features. The goal is to reduce the training set to the sentences that are most likely to be correctly labeled, thus boosting classification accuracy.

Given separate sets of features F1, F2, ..., Fm, the method trains separate Naive Bayes classifiers C1, C2, ..., Cm, corresponding to each feature set. Assuming as ground truth the information provided by the document labels, and that all sentences inherit the status of their document as opinions or facts, it first trains C1 on the entire training set, then uses the resulting classifier to predict labels for the training set. The sentences that receive a label different from the assumed truth are then removed, and C2 is trained on the remaining sentences. This process is repeated iteratively until no more sentences can be removed. Yu and Hatzivassiloglou (2003) report results using five feature sets, starting from words alone and adding in bigrams, trigrams, part-of-speech, and polarity.

4.4. Cut-based classifier

The cut-based classifier approach puts forward the hypothesis that text spans (items) occurring near each other (within discourse boundaries) may share the same subjectivity status (Pang & Lee, 2004). Based on this hypothesis, Pang supplied the algorithm with pair-wise interaction information, e.g., to specify that two particular sentences should ideally receive the same subjectivity label. The algorithm uses an efficient and intuitive graph-based formulation relying on finding minimum cuts.

Suppose there are n items x1, x2, ..., xn to divide into two classes C1 and C2, with access to two types of information:

indj(xi): individual scores, i.e., non-negative estimates of each xi's preference for being in Cj based on just the features of xi alone;

assoc(xi, xk): association scores, i.e., non-negative estimates of how important it is that xi and xk be in the same class.

The problem then becomes one of maximizing each item's net score: its individual score for the class it is assigned to, minus its individual score for the other class, with a penalty for putting associated items into different classes. After some algebra, this leads to the following optimization problem: assign the items to C1 and C2 so as to minimize the partition cost

\sum_{x \in C_1} ind_2(x) + \sum_{x \in C_2} ind_1(x) + \sum_{x_i \in C_1,\, x_k \in C_2} assoc(x_i, x_k) \qquad (3)

This situation can be represented in the following manner. Build an undirected graph G with vertices {v1, ..., vn, s, t}; the last two are, respectively, the source and sink. Add n edges (s, vi), each with weight ind1(xi), and n edges (vi, t), each with weight ind2(xi). Finally, add \binom{n}{2} edges (vi, vk), each with weight assoc(xi, xk). A cut (S, T) of G is a partition of its nodes into sets S = {s} ∪ S′ and T = {t} ∪ T′, where s ∉ S′ and t ∉ T′. Its cost cost(S, T) is the sum of the weights of all edges crossing from S to T. A minimum cut of G is one of minimum cost. Finding a solution to the optimization problem is thus reduced to looking for a minimum cut of G.
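The graph construction translates directly into code. Below is a minimal sketch using networkx (our choice; any max-flow/min-cut routine would do), with ind1, ind2 and assoc assumed to be precomputed non-negative scores:

```python
import networkx as nx

def mincut_partition(items, ind1, ind2, assoc):
    G = nx.Graph()
    for x in items:
        G.add_edge("s", x, capacity=ind1[x])    # source edge, weight ind1(x)
        G.add_edge(x, "t", capacity=ind2[x])    # sink edge, weight ind2(x)
    for (xi, xk), w in assoc.items():
        G.add_edge(xi, xk, capacity=w)          # association edge
    # The cost of the minimum s-t cut equals the partition cost of Eq. (3).
    _, (S, T) = nx.minimum_cut(G, "s", "t")
    return S - {"s"}, T - {"t"}                 # items assigned to C1 and C2
```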
5. Word sentiment classification

Document sentiment classification has usually involved the manual or semi-manual construction of semantic orientation word lexicons (Hatzivassiloglou & McKeown, 1997; Hatzivassiloglou & Wiebe, 2000; Lin, 1998; Pereira, Tishby, & Lee, 1993; Riloff, Wiebe, & Wilson, 2003; Turney & Littman, 2002; Wiebe, 2000), which are built by word sentiment classification techniques. For instance, Das and Chen (2001) used a classifier on investor bulletin boards to see if apparently positive postings were correlated with stock price, in which several scoring methods were employed in conjunction with a manually crafted lexicon. Classifying the semantic orientation of individual words or phrases, such as whether a word is positive or negative, or how intense it is, generally uses a pre-selected set of seed words and sometimes linguistic heuristics (for example, Lin (1998) and Pereira et al. (1993) used linguistic co-locations to group words with similar uses or meanings).

Some studies showed that restricting features to adjectives for word sentiment classification would improve performance (Andreevskaia & Bergler, 2006; Turney & Littman, 2002; Wiebe, 2000). However, more research showed that most adjectives and adverbs, and a small group of nouns and verbs, possess semantic orientation (Andreevskaia & Bergler, 2006; Esuli & Sebastiani, 2005; Gamon & Aue, 2005; Takamura, Inui, & Okumura, 2005; Turney & Littman, 2003).

Automatic methods of sentiment annotation at the word level can be grouped into two major categories: (1) corpus-based approaches and (2) dictionary-based approaches. The first group includes methods that rely on syntactic or co-occurrence patterns of words in large texts to determine their sentiment (e.g., Hatzivassiloglou & McKeown, 1997; Turney & Littman, 2002; Yu & Hatzivassiloglou, 2003, and others). The second group uses WordNet (http://wordnet.princeton.edu/) information, especially synsets and hierarchies, to acquire sentiment-marked words (Hu & Liu, 2004a; Kim & Hovy, 2004) or to measure the similarity between candidate words and sentiment-bearing words such as good and bad (Kamps, Marx, Mokken, & de Rijke, 2004).

5.1. Analysis by conjunctions between adjectives

This method attempts to predict the orientation of subjective adjectives by analyzing pairs of adjectives (conjoined by and, or, but, either-or, or neither-nor) extracted from a large unlabelled document set. The underlying intuition is that the act of conjoining adjectives is subject to linguistic constraints on the orientation of the adjectives involved (e.g., and usually conjoins two adjectives of the same orientation, while but conjoins two adjectives of opposite orientation). This is shown in the following three sentences (where the first two are perceived as correct and the third is perceived as incorrect), taken from Hatzivassiloglou and McKeown (1997):

"The tax proposal was simple and well received by the public."
"The tax proposal was simplistic but well received by the public."
"The tax proposal was simplistic and well received by the public."

To infer the orientation of adjectives from the analysis of conjunctions, a supervised learning algorithm can be performed in the following steps:

1. All conjunctions of adjectives are extracted from a set of documents.
2. A log-linear regression classifier is trained and then classifies pairs of adjectives as having either the same or different orientation. The hypothesized same-orientation or different-orientation links between all pairs form a graph.
3. A clustering algorithm partitions the graph produced in step 2 into two clusters. Using the intuition that positive adjectives tend to be used more frequently than negative ones, the cluster containing the terms of higher average frequency in the document set is deemed to contain the positive terms.

The log-linear model offers an estimate of how good each prediction is, since it produces a value y between 0 and 1, in which 1 corresponds to same orientation, and one minus the produced value y corresponds to dissimilarity. Same- and different-orientation links between adjectives form a graph. To partition the graph nodes into subsets of the same orientation, the clustering algorithm calculates an objective function Φ scoring each possible partition P of the adjectives into two subgroups C1 and C2 as

\Phi(P) = \sum_{i=1}^{2} \frac{1}{|C_i|} \sum_{x, y \in C_i,\, x \neq y} d(x, y) \qquad (4)

where |Ci| is the cardinality of cluster i, and d(x, y) is the dissimilarity between adjectives x and y.

In general, because the model was unsupervised, it required an immense word corpus to function.
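For illustration, the objective of Eq. (4) can be written down directly; a brute-force minimization over partitions (feasible only for small adjective sets, and only a sketch) makes the clustering criterion explicit. Here d is assumed to be the pairwise dissimilarity, i.e., one minus the regression's same-orientation score.

```python
from itertools import combinations

def phi(partition, d):
    """Partition cost of Eq. (4); partition is a pair (C1, C2) of adjective lists."""
    total = 0.0
    for C in partition:
        if C:
            total += sum(d(x, y) for x in C for y in C if x != y) / len(C)
    return total

def best_partition(adjectives, d):
    """Exhaustive search for the partition minimizing phi (small inputs only)."""
    best, best_cost = None, float("inf")
    for r in range(len(adjectives) + 1):
        for group in combinations(adjectives, r):
            C1 = list(group)
            C2 = [a for a in adjectives if a not in group]
            cost = phi((C1, C2), d)
            if cost < best_cost:
                best, best_cost = (C1, C2), cost
    return best
```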
5.2. Analysis by lexical relations

This method presents a strategy for inferring semantic orientation from the semantic association between words and phrases. It follows the hypothesis that two words tend to have the same semantic orientation if they have a strong semantic association. It therefore focuses on the use of lexical relations defined in WordNet to calculate the distance between adjectives.

Generally speaking, we can define a graph on the adjectives contained in the intersection between a term set (for example, the TL term set (Turney & Littman, 2003)) and WordNet, adding a link between two adjectives whenever WordNet indicates the presence of a synonymy relation between them, and defining a distance measure using elementary notions from graph theory. In more detail, this approach can be realized in the following steps:

1. Construct relations at the level of words. The simplest approach here is just to collect all words in WordNet, and relate words that can be synonymous (i.e., words occurring in the same synset).
2. Define a distance measure d(t1, t2) between terms t1 and t2 on this graph, which amounts to the length of the shortest path that connects t1 and t2 (with d(t1, t2) = +∞ if t1 and t2 are not connected).
3. Calculate the orientation of a term by its relative distance (Kamps et al., 2004) from the two seed terms good and bad, i.e.,

SO(t) = \frac{d(t, \mathit{bad}) - d(t, \mathit{good})}{d(\mathit{good}, \mathit{bad})} \qquad (5)

4. Read off the result by the following rule: the adjective t is deemed positive if SO(t) > 0, and the absolute value of SO(t) determines, as usual, the strength of this orientation (the constant denominator d(good, bad) is a normalization factor that constrains all values of SO to the range [−1, 1]).
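A sketch of steps 1 to 3 using NLTK's WordNet interface and networkx follows; the adjective list is assumed to be given, and the seed terms are good and bad as above.

```python
import networkx as nx
from nltk.corpus import wordnet as wn

def build_synonymy_graph(adjectives):
    """Step 1: link adjectives that share a WordNet synset."""
    words = set(adjectives) | {"good", "bad"}
    G = nx.Graph()
    G.add_nodes_from(words)
    for w in words:
        for synset in wn.synsets(w, pos=wn.ADJ):
            for lemma in synset.lemma_names():
                if lemma in words and lemma != w:
                    G.add_edge(w, lemma)
    return G

def so(G, t):
    """Steps 2-3: shortest-path distances plugged into Eq. (5)."""
    try:
        d = nx.shortest_path_length
        return (d(G, t, "bad") - d(G, t, "good")) / d(G, "good", "bad")
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return 0.0   # t not connected to the seeds, i.e., d = +inf
```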
5.3. Analysis by glosses

The characteristic of this method lies in the fact that it exploits the glosses (i.e., textual definitions) that a term has in an online "glossary" or dictionary. Its basic assumption is that if a word is semantically oriented in one direction, then the words in its gloss tend to be oriented in the same direction (Esuli & Sebastiani, 2005; Esuli & Sebastiani, 2006a, 2006b). For instance, the glosses of good and excellent will both contain appreciative expressions, while the glosses of bad and awful will both contain derogative expressions.

Generally, this method determines the orientation of a term based on the classification of its glosses. The process is composed of the following steps:

1. A seed set (Sp, Sn), representative of the two categories positive and negative, is provided as input.
2. Search for new terms to enrich Sp and Sn. Lexical relations (e.g., synonymy) with the terms contained in Sp and Sn, taken from a thesaurus or online dictionary, are used to find these new terms, which are then appended to Sp or Sn, yielding the enriched sets S′p and S′n.
3. For each term ti in S′p ∪ S′n or in the test set (i.e., the set of terms to be classified), a textual representation of ti is generated by collating all the glosses of ti as found in a machine-readable dictionary. Each such representation is converted into a vector by standard text indexing techniques.
4. A binary text classifier is trained on the terms in S′p ∪ S′n and then applied to the terms in the test set.
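A minimal sketch of steps 3 and 4 follows, representing each term by the bag of words of its WordNet glosses and training a linear classifier on the seed sets. The choice of TF-IDF vectors and a linear SVM is our assumption; any standard text classifier would fit the description above.

```python
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def gloss_text(term):
    """Step 3: collate all glosses of a term into one textual representation."""
    return " ".join(s.definition() for s in wn.synsets(term))

def classify_terms(pos_seeds, neg_seeds, test_terms):
    train_terms = pos_seeds + neg_seeds
    labels = [1] * len(pos_seeds) + [0] * len(neg_seeds)
    vec = TfidfVectorizer()
    X = vec.fit_transform([gloss_text(t) for t in train_terms])
    clf = LinearSVC().fit(X, labels)                       # step 4
    preds = clf.predict(vec.transform([gloss_text(t) for t in test_terms]))
    return dict(zip(test_terms, preds))                    # 1 = positive, 0 = negative
```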
5.4. Analysis by both lexical relations and glosses

This method determines the sentiment of words and phrases by relying both on lexical relations (synonymy, antonymy and hyponymy) and on the glosses provided in WordNet. Andreevskaia and Bergler (2006) proposed an algorithm named STEP (Semantic Tag Extraction Program). The algorithm starts with a small set of seed words of known sentiment value (positive or negative) and implements the following steps:

1. Extend the small set of seed words by adding the synonyms, antonyms and hyponyms of the seed words supplied in WordNet. This step brings on average a 5-fold increase in the size of the original list, with the accuracy of the resulting list comparable to manual annotations.
2. Go through all WordNet glosses, identify the entries that contain in their definitions the sentiment-bearing words from the extended seed list, and add these head words to the corresponding category (positive, negative or neutral).
3. Disambiguate the glosses with a part-of-speech tagger, and eliminate errors of some words acquired in step 1 and from the seed list. At this step, the algorithm also filters out all those words that have been assigned contradicting sentiment values.

In this algorithm, for each word we compute a Net Overlap Score by subtracting the total number of runs assigning this word a negative sentiment from the total number of runs that consider it positive. In order to make the Net Overlap Score measure usable in sentiment tagging of texts and phrases, the absolute values of this score should be normalized and mapped onto a standard [0, 1] interval. STEP accomplishes this normalization by using the value of the Net Overlap Score as a parameter in the standard fuzzy membership S-function (Zadeh, 1987). This function maps the absolute values of the Net Overlap Score onto the interval from 0 to 1, where 0 corresponds to the absence of membership in the category of sentiment (in this case, these will be the neutral words) and 1 reflects the highest degree of membership in this category. The function can be defined as follows:

S(u; a, b, c) = \begin{cases} 0 & \text{if } u \le a \\ 2\left(\frac{u-a}{c-a}\right)^2 & \text{if } a \le u \le b \\ 1 - 2\left(\frac{u-c}{c-a}\right)^2 & \text{if } b \le u \le c \\ 1 & \text{if } u \ge c \end{cases} \qquad (6)

where u is the Net Overlap Score for the word and a, b, c are three adjustable parameters: a is set to 1, c is set to 15, and b, which represents a crossover point, is defined as b = (a + c)/2 = 8. Defined this way, the S-function assigns the highest degree of membership (=1) to words that have a Net Overlap Score u ≥ 15.

The Net Overlap Score can be used as a measure of a word's degree of membership in the fuzzy category of sentiment: the core adjectives, which had the highest Net Overlap Scores, were identified most accurately both by STEP and by human annotators, while the words on the periphery of the category had the lowest scores and were associated with low rates of inter-annotator agreement.
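A direct transcription of Eq. (6) with the parameters above (a = 1, b = 8, c = 15); note that the two quadratic branches meet at 1/2 when u = b.

```python
def s_function(u, a=1.0, b=8.0, c=15.0):
    """Zadeh's fuzzy S-function, mapping a Net Overlap Score u onto [0, 1]."""
    if u <= a:
        return 0.0
    if u <= b:
        return 2 * ((u - a) / (c - a)) ** 2
    if u <= c:
        return 1 - 2 * ((u - c) / (c - a)) ** 2
    return 1.0
```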
5.5. Analysis by pointwise mutual information

The general strategy of this method is to infer semantic orientation from semantic association. The underlying assumption is that a phrase has a positive semantic orientation when it has good associations (e.g., "romantic ambience") and a negative semantic orientation when it has bad associations (e.g., "horrific events") (Turney, 2002).

The semantic orientation of a given word is calculated from the strength of its association with a set of positive words, minus the strength of its association with a set of negative words. More concretely, the strength of the semantic association between words can be expressed by calculating their pointwise mutual information (PMI) value; the method thus infers the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Given a term t and seed term sets (Sp for the positive set and Sn for the negative set), t's orientation value O(t) (where a positive value means positive orientation, and a higher absolute value means stronger orientation) is given by:

O(t) = \sum_{t_i \in S_p} PMI(t, t_i) - \sum_{t_i \in S_n} PMI(t, t_i) \qquad (7)

Pointwise mutual information can be computed based on IR techniques. Term frequencies and co-occurrence frequencies are measured by querying a document set by means of a search engine with a "t" query, a "ti" query, and a "t NEAR ti" query, and using the number of matching documents returned by the search engine as estimates of the probabilities needed for the computation of PMI. In the AltaVista search engine (http://www.altavista.com/), the NEAR operator produces a match for a document when its operands appear in the document at a maximum distance of ten terms, in either order. The paradigm words can be selected as follows (Turney & Littman, 2003):

Sp = {good, nice, excellent, positive, fortunate, correct, superior}
Sn = {bad, nasty, poor, negative, unfortunate, wrong, inferior}

In addition, Gamon and Aue (2005) described an extension to the technique for the automatic identification and labeling of sentiment terms described in Turney and Littman (2003). Besides the basic assumption in Turney and Littman (2003) and Turney (2002), Gamon and Aue (2005) add a second assumption, namely that sentiment terms of opposite orientation tend not to co-occur at the sentence level. This additional assumption allows them to identify sentiment-bearing terms more reliably to some extent.
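A sketch of Eq. (7) together with the hit-count estimation follows. Here hits() is a hypothetical wrapper around a search engine supporting a NEAR operator (as AltaVista did), returning the number of matching documents in a collection of n_docs documents.

```python
import math

SP = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
SN = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def pmi(t, ti, hits, n_docs):
    p_joint = hits(f"{t} NEAR {ti}") / n_docs
    if p_joint == 0:
        return 0.0   # treat absent co-occurrence as no association (our choice)
    return math.log2(p_joint / ((hits(t) / n_docs) * (hits(ti) / n_docs)))

def orientation(t, hits, n_docs):
    """O(t) of Eq. (7): association with Sp minus association with Sn."""
    return (sum(pmi(t, s, hits, n_docs) for s in SP)
            - sum(pmi(t, s, hits, n_docs) for s in SN))
```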
5.6. Analysis by General Inquirer

The General Inquirer (GI) is a system which lists terms as well as different senses for each term. For each sense it provides a short definition as well as other information about the term, including tags that label the term as being positive, negative, a negation term, an overstatement, or an understatement. The labels are given for each sense of a word.

For example, there are two senses of the word fun, as seen in Table 1. One sense is a noun or adjective for enjoyment or enjoyable. The second sense is a verb that means to ridicule or tease, to make fun of. The first sense of the word is positive, while the second is negative. The entry also indicates that the first sense is more frequent than the second (estimated to occur 97% of the time, while the second sense occurs only 3% of the time).

Table 1
GI entries for the word fun.

Fun (sense 1) | H4Lvd Positiv Pstv Pleasur Exprsv WlbPsyc WlbTot Noun PFREQ 97% | noun–adj: Enjoyment, enjoyable
Fun (sense 2) | H4Lvd Negativ Ngtv Hostile ComForm SV RspLoss RspTot SUPV 3% | idiom–verb: Make fun (of), to tease, parody

In addition, the GI dictionary includes negations, intensifiers, and diminishers. Table 2 shows the GI entries of the words not, fantastic and barely, which are examples of a negation, an overstatement and an understatement, respectively.

Table 2
GI entries for the words not, fantastic and barely.

Not | H4Lvd negate NotLw LY | adv: expresses negation
Fantastic | H4Lvd Positiv Pstv Virtue Ovrst EVAL PosAff Modif | –
Barely | H4Lvd Undrst Quan If LY | –

The GI dictionary contains 1,915 positive senses and 2,291 negative senses. Kennedy and Inkpen (2006) added more positive and negative senses from Choose the Right Word (Hayakawa, 1994) (hereafter CTRW). CTRW is a dictionary of synonyms, which lists nuances of lexical meaning. After adding them, they obtained 1,955 positive senses and 2,398 negative senses. There are 696 overstatements and 319 understatements in GI; when adding those from CTRW, they obtained 1,269 overstatements and 412 understatements.

Kennedy and Inkpen (2006) used this approach to classify reviews based on the number of positive and negative terms they contain. They examined the effect of three types of valence shifters: negations, intensifiers, and diminishers. (In GI, intensifiers are known as overstatements and diminishers are known as understatements.) Negations are used to reverse the semantic polarity of a particular term, while intensifiers and diminishers are used to increase and decrease, respectively, the degree to which a term is positive or negative.
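A toy sketch of such term counting with valence shifters follows. The miniature lexicons and the two-word shifter window are our assumptions; a real system would load the GI/CTRW senses.

```python
POSITIVE = {"excellent", "good", "fun"}
NEGATIVE = {"awful", "poor", "bad"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very", "extremely", "really"}   # GI: overstatements
DIMINISHERS = {"barely", "slightly", "somewhat"} # GI: understatements

def score_review(tokens):
    total = 0.0
    for i, tok in enumerate(tokens):
        val = 1.0 if tok in POSITIVE else -1.0 if tok in NEGATIVE else 0.0
        if not val:
            continue
        window = tokens[max(0, i - 2):i]          # two preceding words
        if any(w in NEGATIONS for w in window):
            val = -val                            # negation reverses polarity
        if any(w in INTENSIFIERS for w in window):
            val *= 2.0                            # intensifier increases the degree
        if any(w in DIMINISHERS for w in window):
            val *= 0.5                            # diminisher decreases the degree
        total += val
    return total                                  # > 0: positive review; < 0: negative
```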
6. Document sentiment classification based on machine learning methods

6.1. Single domain

There are many possible approaches to identifying the actual polarity of a document. Here we discuss the use of supervised machine learning techniques to identify the likelihood of reviews having "positive" or "negative" polarity with respect to previously hand-classified training data. The key problems of this method are two: extracting features and training the classifier.

6.1.1. Extracting features

Starting with a raw document (a portion of a web page in testing and training, and a complete web page for mining), we strip out HTML tags and separate the document into sentences. These sentences are optionally run through a parser before being split into single-word tokens. A variety of transformations can then be applied to this ordered list of lists. This is called feature extraction. There are several methods for feature extraction:

1. Lexical filtering. Lexical filtering can be applied to reviews, affecting the accuracy of statistical classifiers trained on the filtered data. There are mainly two different kinds of lexical filters (Salvetti, Lewis, & Reichenbach, 2004): one based on hypernymy as provided by WordNet (Budanitsky & Hirst, 2001; Curran, 2002; Devitt & Vogel, 2004; Edmonds, 1999; Edmonds & Hirst, 2002; Fellbaum, 1998; Grefenstette, 1994; Justeson & Slava, 1993; Kamps & Maarten, 2002; Rapp, 2004; Riloff & Shepherd, 1997; Roark & Charniak, 1998; Taboada, 2006; Thelen & Riloff, 2002; Valitutti, Strapparava, & Stock, 2004), the other based on part-of-speech (POS) tags (Ait-Mokhtar, Chanod, & Roux, 2002; Brill, 1992; Brill, 1994; Brill, 1995; Losee, 2001; Ratnaparkhi, 1996; Schmid, 1994; Wiebe, Wilson, & Bell, 2001; Wiebe, Wilson, & Cardie, 2005; Wilks & Stevenson, 1998). The WordNet filter attempts to substitute words in reviews by a set of likely synonyms and hypernymy generalizations from WordNet, because it is uncommon to encounter repetitions of identical words in non-technical written text. The POS filter reflects the observation that individual phrases and words vary in their contribution to opinion polarity; it may even be said that only some part of the meaning of a word contributes to opinion polarity, and any portion that does not contribute to sentiment detection is noise. POS filters were developed to reduce this noise (a small sketch of such a filter appears at the end of this subsection).

2. Appraising adjectives. The appraisal group method (Whitelaw, Argamon, & Garg, 2005; Whitelaw, Garg, & Argamon, 2005) focuses on the extraction and analysis of adjectival appraisal groups headed by an appraising adjective (such as "beautiful" or "boring") and optionally modified by a sequence of modifiers (such as "very", "sort of", or "not"). It makes a more detailed semantic analysis of attitude expressions, in the form of a well-designed taxonomy of attitude types and other semantic properties. Furthermore, it treats the "atomic units" of such expressions not as individual words, but as appraisal groups: coherent groups of words that together express a particular attitude, such as "extremely boring" or "not really very good". This method can be described in the following steps:

1. Build a lexicon using semi-automatic techniques, gathering and classifying adjectives and modifiers into categories in several taxonomies of appraisal attributes.
2. Extract adjectival appraisal groups from texts and compute their attribute values according to this lexicon.
3. Represent documents as vectors of relative frequency features using these groups.
4. Train a support vector machine algorithm to discriminate positively from negatively oriented test documents.

Beineke, Hastie, and Vaithyanathan (2004) extend this procedure by extracting a pair of derived features that are linearly combined to predict sentiment. This perspective allows improving upon previous methods, primarily through two strategies: incorporating additional derived features into the model and, where possible, using labeled data to estimate their relative influence.

Matsumoto, Takamura, and Okumura (2005) used text-mining techniques to extract frequent word sub-sequences and dependency sub-trees from the sentences in a document dataset and used them as features for support vector machines.
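As promised in item 1 above, here is a minimal sketch of a POS filter using NLTK and its Penn Treebank tagger; the retained tag set (adjectives and adverbs) and the added bigram features are illustrative choices, not the filters of any particular paper.

```python
import nltk

KEEP_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}   # adjective and adverb tags

def pos_filter_features(text):
    tokens = nltk.word_tokenize(text.lower())
    tagged = nltk.pos_tag(tokens)                      # Penn Treebank POS tags
    kept = [w for w, tag in tagged if tag in KEEP_TAGS]
    bigrams = [f"{a}_{b}" for a, b in zip(kept, kept[1:])]
    return kept + bigrams                              # unigram + bigram features
```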
6.1.2. Training the classifier

Several classical text classifiers, such as K-Nearest Neighbor, Winnow, Naïve Bayes, Maximum Entropy and the Support Vector Machine (SVM), are used for machine-learning-based document sentiment classification.

Pang et al. (2002) applied Naïve Bayes, Maximum Entropy and Support Vector Machine classification techniques to the identification of the polarity of movie reviews. Their best result (82.9% accuracy) was obtained by using unigrams and a Support Vector Machine. In fact, most of the literature shows that SVM and Naïve Bayes are strong methods for single-domain document sentiment classification (Aue & Gamon, 2005; Beineke et al., 2004; Kennedy & Inkpen, 2006; Lin et al., 2006; Matsumoto et al., 2005; Mullen & Collier, 2004; Pang & Lee, 2004; Pang et al., 2002; Read et al., 2005; Salvetti et al., 2004; Whitelaw, Garg, et al., 2005).

6.2. Multiple domains

Document sentiment classification is a very domain-specific problem: classifiers trained in one domain do not perform well in others (Charlotta, 2004; Nigam, McCallum, & Thrun, 2000; Lafferty, McCallum, & Pereira, 2001; Avrim & Chawla, 2001). The main factor is that the polarity of sentiment words may change with the domain. For instance, the word "small" in a house review is negative (e.g., "The bedroom is very small"), while in a cell-phone review it is positive (e.g., "The Nokia N3100 is so small as to fit in any pocket").

Tan, Wu, Tang, and Cheng (2007) attempted to tackle the domain-transfer problem by combining old-domain labeled examples with new-domain unlabeled ones. Their basic idea is to use the old-domain-trained classifier ("old classifier" for brevity) to label the top n most informative unlabeled examples in the new domain and learn a new classifier based on these selected examples (n is a pre-defined number indicating how many examples in the new domain shall be picked out as informative ones). The detailed algorithm for the proposed scheme is:

1. Train a base classifier using labeled data in the old domain.
2. Label some informative unlabeled examples in the new domain.
3. Train a new classifier based on these selected examples.
4. Classify examples in the new domain using the new classifier.

The experimental results demonstrate that the proposed scheme can boost the accuracy of the base sentiment classifier on the new domain.

Aue and Gamon (2005) surveyed four different approaches to customizing a sentiment classification system to a new target domain with a small amount of labeled data. Read et al. (2005) proposed a novel source of training data based on the language used in conjunction with emoticons in Usenet newsgroups. They used the emoticon-labeled data to train a classifier, which has the potential to reduce dependence on domain, topic and time.
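A sketch of the four steps with scikit-learn follows, using classifier confidence as a stand-in for the informativeness criterion (an assumption on our part; the original scheme defines informativeness in its own terms).

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def domain_transfer(X_old, y_old, X_new, n):
    base = MultinomialNB().fit(X_old, y_old)           # step 1: old-domain classifier
    proba = base.predict_proba(X_new)
    top = np.argsort(proba.max(axis=1))[-n:]           # step 2: pick top-n examples
    pseudo = base.classes_[proba[top].argmax(axis=1)]  # labels assigned by old classifier
    new_clf = MultinomialNB().fit(X_new[top], pseudo)  # step 3: new-domain classifier
    return new_clf                                     # step 4: classify the new domain with it
```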
7. Opinion extraction

Analysis of favorable and unfavorable opinions is a task requiring high intelligence and deep understanding of the textual context, drawing on common sense and domain knowledge as well as linguistic knowledge. The interpretation of opinions can be debatable even for humans. For example, when we tried to determine whether each specific document was on balance favorable or unfavorable toward a subject after reading an entire group of such documents, we often found it difficult to reach a consensus, even for very small groups of evaluators.

Many researchers think it is too coarse to compute a single unit of opinion for a whole document (Bai, Padman, & Airoldi, 2004; Bethard, Yu, Thornton, Hatzivassiloglou, & Jurafsky, 2004; Bunescu & Mooney, 2004; Daille, 1996; Didier, 1995; Etzioni et al., 2004; Freitag & McCallum, 2000; Jacquemin & Bourigault, 2001; Luca & Mazzini, 2002; Riloff & Jones, 1999; Wilson, Wiebe, & Hwa, 2004; Zhai & Liu, 2005). Turney (2002) makes a similar point, noting that for reviews, "the whole is not necessarily the sum of the parts". Even in a single sentence, a holder might express two different opinions. It seems necessary to use more sophisticated techniques to determine the focus of each sentence, so that one can decide whether the author is talking about the topic (Dave, Lawrence, & Pennock, 2003; Fei, Liu, & Wu, 2004; Kim & Hovy, 2004; Ku, Wu, Lee, & Chen, 2005; Mei, Ling, Wondra, Su, & Zhai, 2007).

Consequently, opinion extraction plays a very important role in sentiment detection. It focuses not only on extracting opinion information from reviews, but also on extracting the relation between an opinion and the document topic.

7.1. Opinion information extraction

The opinion information we mainly discuss in this paper includes opinion-bearing words and opinion holders. An opinion-bearing word is a word or a phrase that carries a positive or negative sentiment directly, such as "good", "bad", "foolish", "virtuous", etc. An opinion holder is an entity (person, organization, country, or special group of people) who expresses, explicitly or implicitly, the opinion contained in the sentence. For instance: "According to the review in Internet, the keypad of Ericsson W810i is easy to use". In the above sentence, "the review in Internet" is an opinion holder and "easy" is an opinion-bearing word.

7.1.1. Opinion-bearing word extraction

Opinions can be recognized at various granularities, such as a word, a sentence, a text, or even multiple texts, and each is important. Here we focus on word-level opinion detection, i.e., finding words or phrases that carry a positive or negative sentiment (opinion-bearing words) in subjective sentences or paragraphs. Actually, the opinion-bearing word is the smallest unit of opinion, which can thereafter be used as a clue for sentence-level or text-level opinion detection.

How can opinion-bearing words be extracted from documents? A straightforward way proceeds by the following steps:

1. Collect sentiment words from sentiment lexicons as opinion-bearing seed words. Opinion-bearing seed words can be collected from several sources: the General Inquirer (GI), the Dictionary of Affect of Language (DAL), and WordNet (Kim & Hovy, 2006; Yi, Nasukawa, Bunescu, & Niblack, 2003).
2. Expand the selected opinion-bearing seed words of each sentiment class by collecting synonyms from WordNet. However, one cannot simply assume that all the synonyms of positive words are positive, since most words can have synonym relationships with both positive and negative classes. One should calculate the closeness of a given word to each category and determine the most probable class (Kim & Hovy, 2006).
3. Refine some of the sentiment patterns from the sentiment lexicon and training datasets (Yi et al., 2003). For GI and DAL, sentiment verb extraction is the same as opinion-bearing seed word extraction. For WordNet, sentiment verbs can be extracted from the emotion cluster. The other sentiment patterns can be manually refined from the training datasets. The sentiment pattern database contains sentiment extraction patterns for sentence predicates.
4. Extract sentences and text fragments containing subjectivity information from the input documents (the approach we discussed in Section 4). Apply sentiment analysis with the expanded opinion-bearing seed words and sentiment patterns to those subjective sentences and text fragments. At last, the opinion-bearing words are extracted.

7.1.2. Opinion holder extraction

The goal of opinion holder extraction is to identify direct and indirect sources of opinions, emotions, sentiments, and other private states that are expressed in text. Identifying opinion sources (or opinion holders) is especially critical for opinion-oriented question–answering systems (e.g., systems that answer questions of the form "Who feels about ...?") and opinion-oriented summarization systems, both of which need to distinguish the opinions of one source from those of another.

There are mainly three research problems in opinion holder extraction. The first: given a sentence, identify the opinion sources in it (Choi, Cardie, Riloff, & Patwardhan, 2005). The second: given an opinion expression in a sentence, identify its corresponding opinion source (Kim & Hovy, 2006). The third: given an opinion expression and a source, determine whether the source and the opinion expression correspond (Choi, Breck, & Cardie, 2006).

1. The first research problem. This approach learns patterns of opinion sources using a graphical model and extraction pattern learning. It views the opinion holder extraction problem as an information extraction task and adopts a hybrid approach that combines CRFs (Conditional Random Fields) and a variation of AutoSlog (a supervised extraction pattern learner that takes a training corpus of texts and their associated answer keys as input) (Riloff, 1996). The CRF identifies sources while AutoSlog learns extraction patterns. The method starts with a sequence of words (x1, x2, ..., xn) in a sentence and then: (1) generates a sequence of labels (y1, y2, ..., yn) indicating whether each word is a holder or not; (2) uses a new variation of AutoSlog, AutoSlog-SE, which generates patterns to extract sources. This algorithm is not perfect, however, so the resulting set of patterns needs to be manually reviewed. In order to build a fully automatic system with no need for manual review, Choi et al. (2005) combined AutoSlog's heuristics with statistics from the annotated training data to create a fully automatic supervised learner.

2. The second research problem. This approach uses classification and ranking to model the problem with Maximum Entropy (ME) (Kim & Hovy, 2006). Classification allocates each holder candidate to one of a set of pre-defined classes, while ranking selects a single candidate as the answer. It chooses the most probable candidate via a conditional probability. This method proceeds in the following steps: (1) Generate all possible holder candidates, given a sentence and an opinion expression ⟨E⟩. After parsing the sentence, extract features such as the syntactic path information between each candidate ⟨H⟩ and the expression ⟨E⟩ and the distance between ⟨H⟩ and ⟨E⟩. (2) Rank the holder candidates according to the score obtained by the ME ranking model, and pick the candidate with the highest score. Given a set of holder candidates {h1, h2, ..., hN} and an opinion expression e, the conditional probability P(h | {h1, h2, ..., hN}, e) can be calculated based on K feature functions fk(h | {h1, h2, ..., hN}, e), as follows:

\hat{h} = \arg\max_h P(h \mid \{h_1, h_2, \ldots, h_N\}, e) = \arg\max_h \sum_{k=1}^{K} \lambda_k f_k(h \mid \{h_1, h_2, \ldots, h_N\}, e) \qquad (8)

where each λk is a model parameter indicating the weight of its feature function.

3. The third research problem. This approach (Choi et al., 2006) aims to identify the opinion holder by extracting relations between opinion expression entities and source entities. That is, given an opinion expression Oi and a source Sj, it determines whether Sj is the source of opinion expression Oi. The global inference procedure is implemented via integer linear programming (ILP) to produce an optimal and coherent extraction of entities and relations. The ILP formulation consists of an objective function and a set of equality and inequality constraints among variables. The objective function of the binary ILP is:

f = \sum_i w_{o_i} O_i + \sum_i w_{\bar{o}_i} \bar{O}_i + \sum_j w_{s_j} S_j + \sum_j w_{\bar{s}_j} \bar{S}_j + \sum_{i,j} w_{l_{i,j}} L_{i,j} + \sum_{i,j} w_{\bar{l}_{i,j}} \bar{L}_{i,j} \qquad (9)

\forall i:\ O_i + \bar{O}_i = 1; \qquad \forall j:\ S_j + \bar{S}_j = 1; \qquad \forall i, j:\ L_{i,j} + \bar{L}_{i,j} = 1 \qquad (10)

where f is the objective function of the ILP. Oi and Ōi are two variables for each opinion entity: Oi = 1 means the opinion entity is extracted, and Ōi = 1 means the opinion entity is discarded. The weights w_oi and w_ōi for Oi and Ōi, respectively, are computed based on the labels of the adjacent variables of the CRFs (Choi et al., 2006). Likewise, Sj and S̄j are two variables for each source entity indicating whether it is extracted, and their weights w_sj and w_s̄j are computed in the same way as for opinion entities. Li,j and L̄i,j are two variables for each link relation between Oi and Sj: Li,j = 1 indicates that Oi and Sj are both extracted and linked; otherwise L̄i,j = 1. Their weights w_li,j and w_l̄i,j are based on probabilities from the binary link classifier. The following is the set of equality and inequality constraints among the variables:

\forall i:\ O_i = \sum_j L_{i,j} \qquad (11)

\forall j:\ S_j + A_j = \sum_i L_{i,j} \qquad (12)

\forall j:\ A_j - S_j \le 0 \qquad (13)

\forall i, j,\ i < j:\ X_i + X_j = 1, \quad X \in \{S, O\} \qquad (14)

Formula (11) enforces that only one link can emanate from an opinion entity. Formulas (12) and (13) together allow a source to link to at most two opinions, where Aj is an auxiliary variable between 0 and 1; Aj can be assigned 1 only if Sj is already assigned 1. Formula (14) is used to restrict pairs of entities with overlapping spans.
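A sketch of this ILP with PuLP follows, writing the complement variables of Eq. (10) as one minus their counterparts and omitting the span-overlap constraint of Eq. (14), which requires entity span information. All weight arguments are assumed to come from the CRF and link classifiers, and the names are illustrative.

```python
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum

def extract_links(w_o, w_o_bar, w_s, w_s_bar, w_l, w_l_bar):
    n_o, n_s = len(w_o), len(w_s)
    prob = LpProblem("opinion_source_extraction", LpMaximize)
    O = [LpVariable(f"O_{i}", cat=LpBinary) for i in range(n_o)]
    S = [LpVariable(f"S_{j}", cat=LpBinary) for j in range(n_s)]
    A = [LpVariable(f"A_{j}", cat=LpBinary) for j in range(n_s)]   # auxiliary, Eq. (12)
    L = {(i, j): LpVariable(f"L_{i}_{j}", cat=LpBinary)
         for i in range(n_o) for j in range(n_s)}
    # Objective of Eq. (9); complements are written as 1 - variable via Eq. (10).
    prob += (lpSum(w_o[i] * O[i] + w_o_bar[i] * (1 - O[i]) for i in range(n_o))
             + lpSum(w_s[j] * S[j] + w_s_bar[j] * (1 - S[j]) for j in range(n_s))
             + lpSum(w_l[i][j] * L[i, j] + w_l_bar[i][j] * (1 - L[i, j])
                     for i in range(n_o) for j in range(n_s)))
    for i in range(n_o):                 # Eq. (11): one link per extracted opinion
        prob += O[i] == lpSum(L[i, j] for j in range(n_s))
    for j in range(n_s):                 # Eqs. (12)-(13): at most two links per source
        prob += S[j] + A[j] == lpSum(L[i, j] for i in range(n_o))
        prob += A[j] - S[j] <= 0
    prob.solve()
    return [(i, j) for (i, j), var in L.items() if var.value() == 1]
```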
7.2. Opinion-topic relation extraction

An opinion-topic relation is the relationship between an opinion expression (opinion-bearing words) and the document topic (or a feature of the topic). A feature of a topic is a term that satisfies one of the following relationships:

- a part-of relationship with the given topic;
- an attribute-of relationship with the given topic;
- an attribute-of relationship with a known feature of the given topic.

For instance: "I found the Sony Ericsson W810i is a good mobile phone. The keypad is easy to use and texting is very simple as the buttons are small but well defined. The screen is bright and clear with good resolution. The software on this phone is excellent. I have had absolutely no problems with it what so ever, it never crashes, freezes or otherwise upsets me. The battery life on this phone is excellent. I have had mine for 18 months, with it rarely being switched off in all this time. In my normal use (only about 30–60 min calls a day) it will last for about 3–4 days before needing a charge. Charging is very quick too taking only around an hour to fully charge from about 10%. All in all an excellent little phone with very few faults. Recommended!"

In the above text, "good", "easy to use", "bright", etc., are opinion expressions; "Sony Ericsson W810i" is the topic; and "keypad", "screen", "software", etc., are features of the topic.

7.2.1. Feature term extraction

Yi et al. (2003) put forward the following three candidate classes of feature terms to be extracted:

1. Base Noun Phrases (BNP). BNP restricts the candidate feature terms to one of the following base noun phrase patterns: NN, NN NN, JJ NN, NN NN NN, JJ NN NN, JJ JJ NN, where NN and JJ are the part-of-speech (POS) tags for nouns and adjectives, respectively.
2. Definite Base Noun Phrases (dBNP). dBNP further restricts candidate feature terms to definite base noun phrases, which are noun phrases of the form defined above that are preceded by the definite article "the". Given that a document is focused on a certain topic, the definite noun phrases referring to topic features do not need any additional constructs such as attached prepositional phrases or relative clauses in order for the reader to establish their referent. Thus, the phrase "the battery", instead of "the battery of the digital camera", is sufficient to infer its referent.
3. Beginning Definite Base Noun Phrases (bBNP). bBNP refers to a dBNP at the beginning of a sentence followed by a verb phrase. This heuristic is based on the observation that, when the focus shifts from one feature to another, the new feature is often expressed using a definite noun phrase at the beginning of the next sentence.

They developed and tested two feature term selection algorithms, one based on a mixture language model and one on the likelihood ratio, with the likelihood ratio test method obtaining the better result. The principle of the likelihood ratio test method is as follows. Let D+ be a collection of documents focused on a topic T, D− those not focused on T, and bnp a candidate feature term extracted from D+. The likelihood ratio −2 log λ is then defined as

-2 \log \lambda = -2 \log \frac{\max_{p_1 \le p_2} L(p_1, p_2)}{\max_{p_1, p_2} L(p_1, p_2)}, \qquad p_1 = p(d \in D^+ \mid bnp \in d), \quad p_2 = p(d \in D^+ \mid bnp \notin d) \qquad (15)

where L(p1, p2) is the likelihood of the observed occurrences of bnp in both D+ and D−. The higher the value of −2 log λ, the more likely the bnp is relevant to the topic T. For each bnp, compute the likelihood score −2 log λ as defined in formula (15). Then sort the bnp's in decreasing order of their likelihood scores. Feature terms are all bnp's whose likelihood ratio satisfies a pre-defined confidence level; alternatively, simply the top N bnp's can be selected.
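As an illustration of the scoring, the one-sided binomial likelihood-ratio statistic of Eq. (15) can be computed from four counts; this count-based parameterization of L(p1, p2) is our assumption about how to operationalize the test.

```python
import math

def _ll(k, n, p):
    """Binomial log-likelihood of k successes in n trials, with 0*log(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0 if k == n * p else float("-inf")
    return k * math.log(p) + (n - k) * math.log(1 - p)

def likelihood_score(k1, n1, k2, n2):
    """-2 log lambda of Eq. (15).
    k1/n1: share of D+ among documents containing bnp (estimates p1);
    k2/n2: share of D+ among documents not containing bnp (estimates p2)."""
    p1, p2 = k1 / n1, k2 / n2
    unconstrained = _ll(k1, n1, p1) + _ll(k2, n2, p2)
    if p1 <= p2:                       # null hypothesis p1 <= p2 already satisfied
        constrained = unconstrained
    else:                              # constrained optimum pools the samples: p1 = p2
        p = (k1 + k2) / (n1 + n2)
        constrained = _ll(k1, n1, p) + _ll(k2, n2, p)
    return -2 * (constrained - unconstrained)   # large values: bnp is topic-relevant
```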
7.2.2. Making (topic | feature term, opinion) associations

In order to achieve high precision, we need to focus on identifying the semantic relationships between opinion expressions and topic (or feature-of-topic) terms, i.e., extracting opinion-bearing words associated with positive or negative polarity for a specific topic (or feature of the topic) from a document (Agrawal & Srikant, 1994; Jon & Tardos, 2002; Liu, Hsu, & Ma, 1998; Rosario & Hearst, 2004).

In order to identify opinion expressions and analyze their semantic relationships with the topic (or feature-of-topic) term, the following natural language processing techniques play an important role (Nasukawa & Yi, 2003):

1. POS tagging. POS tagging can disambiguate some polysemous expressions such as "like", which denotes sentiment only when used as a verb rather than as an adjective or preposition.
2. Syntactic parsing. Syntactic parsing is used to identify relationships between sentiment expressions and the subject term. Furthermore, in order to maintain robustness on noisy texts from various sources such as the WWW, it is preferable to use a shallow parsing framework that identifies phrase boundaries and their local dependencies in addition to POS tagging, instead of a full parser that tries to identify the complete dependency structure among all of the terms.

Yi et al. (2003) extract ternary expressions (T-expressions) and binary expressions (B-expressions) in order to make (topic | feature term, opinion) associations. There are two types of T-expressions: 1. positive or negative sentiment verbs: ⟨