key: cord-131093-osukknqr
authors: Suzen, Neslihan; Mirkes, Evgeny M.; Gorban, Alexander N.
title: Informational Space of Meaning for Scientific Texts
date: 2020-04-28

In Natural Language Processing, automatically extracting the meaning of texts is an important problem. Our focus is the computational analysis of the meaning of short scientific texts (abstracts or brief reports). In this paper, a vector space model is developed for quantifying the meaning of words and texts. We introduce the Meaning Space, in which the meaning of a word is represented by a vector of Relative Information Gain (RIG) about the subject categories that the text belongs to, which can be obtained from observing the word in the text. This new approach is applied to construct the Meaning Space based on the Leicester Scientific Corpus (LSC) and the Leicester Scientific Dictionary-Core (LScDC). The LSC is a scientific corpus of 1,673,350 abstracts and the LScDC is a scientific dictionary whose words are extracted from the LSC. Each text in the LSC belongs to at least one of 252 subject categories of Web of Science (WoS). These categories are used in the construction of vectors of information gains. The Meaning Space is described and statistically analysed for the LSC with the LScDC. The usefulness of the proposed representation model is evaluated through the top-ranked words in each category. The n most informative words in each category are ranked. We demonstrate that RIG-based word ranking is much more useful than ranking based on raw word frequency in determining the science-specific meaning and importance of a word. The proposed RIG-based model is shown to be able to highlight topic-specific words in categories. The most informative words are presented for all 252 categories. The new scientific dictionary and the 103,998 x 252 Word-Category RIG Matrix are available online. Analysis of the Meaning Space provides us with a tool to further explore quantifying the meaning of a text using more complex and context-dependent meaning models that use co-occurrence of words and their combinations.

Automatic analysis of text meaning is one of the main problems in Natural Language Processing (NLP). This work is focused on the computational analysis of the meaning of short scientific texts (abstracts or brief reports). The starting point is a combination of a simple Bag of Words (BoW) model with a holistic approach to text meaning: the text is considered as a collection of words, and the meaning of the text is hidden in a situation of use, which is evaluated as a whole. A space of meaning for words is created from the analysis of the situations of their use; then, after a detailed analysis of this space (including dimensionality reduction and clustering), we will return to the texts and introduce more complex models, including word co-occurrence analysis, combination of word meanings, etc.

First of all, we have to consider the "meaning of meaning". This topic has been discussed in great depth from antiquity to modern times (see, for example, [1, 2, 3, 4]), but consensus has not yet been reached. We start from the Wittgenstein formulation: "Meaning is use" or, in more detail, "For a large class of cases though not for all in which we employ the word 'meaning' it can be defined thus: the meaning of a word is its use in the language" [5, §43]. This idea has been widely discussed. This paper aims to propose an approach to the computational analysis of meaning for a large family of texts.
The texts we work with (abstracts or brief reports) have a well-defined dominant communicative function: the representative function. There is a representation of a situation on the sender's 'blackboard of consciousness' (a representation 1 of a situation 1). A text related to this situation is generated by the sender (translation 1). This text is transmitted to the receiver and transformed by him into a representation of a situation (representation 2 of a situation 2). The situation can be a situation in the real world, an imaginary situation in a possible world, an impossible situation in an impossible world, a chimeric situation combined from several possible or imaginary situations, and so on. We do not study the relations of representations to reality, but only consider the chain: Representation 1 → Text → Representation 2.

An elementary basic scheme of the corresponding act of communication is presented in Figure 1.1. In this scheme, we see two representations of the situation on the blackboard of consciousness: the sender's representation of the situation and the receiver's representation of the situation. Moreover, the represented situations can be different. In fact, they are always different, and special tools are invented and used to make them as close as possible when necessary. The situations are not necessarily real. They can be real, possibly real or imaginary, and even impossible. It is necessary to stress that the sender's and the receiver's representations never coincide and:

• Do not represent any situation 'in detail' and, therefore, can represent parts (or projections; recall Plato's Cave allegory) of many different situations at the same time;
• Can include internal contradictions and, therefore, can represent nothing possible in reality;
• Can partially represent different situations, that is, can be 'chimeric' combinations of different possible real or imaginary situations.

There are two 'translation' operations in the scheme of Figure 1.1: (i) from the sender's representation of the situation to the text of the message and (ii) from the text of the message to the receiver's representation. Both these operations depend on a much wider context of the communication, including the experience of the sender and receiver. Of course, standard scientific communication assumes that there may be many receivers and that the sender may not be a single person. This one-to-many or even many-to-many communication adds more situations and representations and may also add some less trivial multi-agent structures with additional communication channels. We consider using the language to transmit information about the represented situations (Figure 1.1) and neglect many other uses of language, from military orders to psychological manipulations. The scheme of the act of communication (Figure 1.1) includes just the very basic elements and can be elaborated in much more detail. Here we should refer to the classical works of G.P. Shchedrovitsky [6, 7] and J. Habermas [8, 9]. For our purposes in this work, the basic scheme (Figure 1.1) is sufficient. According to Shchedrovitsky [6], at the level of 'simple communication' there is no 'meaning' different from the processes of understanding themselves, which correlate and connect the elements of the text message with each other and with the elements of the situation being restored.
Meaning, for our analysis, is hidden in the relationship between the representations of situations on the 'blackboard of consciousness' and the texts of the messages. That is, a formal analysis of meaning requires the formalisation of the translation operations presented in the scheme of a communication act (Figure 1.1). Moreover, we can state that we understand the meaning of meaning if and only if we can produce such a translation. This translation is context-dependent, and the unique experience of the sender and the receiver is involved in this context, so the task of "reproducing the translation" is not fully feasible. Moreover, understanding can be represented as a reflexive game [10] with different levels (the sender prepares a message taking into account the experience of the receiver, his goals and tools, and guesses that the receiver takes into account the experience of the sender, his goals and tools, and so on; analogously, the receiver tries to understand the message taking into account the same considerations, etc.).

The relation between the text and the representation of the situation cannot be considered a bijection (both for the sender's and the receiver's representation). It is a many-to-many correspondence: each text corresponds to many situations and each situation can have many representing texts. Moreover, further consistent formalisation requires the notion of fuzzy many-to-many correspondence elaborated for relational databases [11]. According to Mel'čuk [12, 13, 14, 15], natural language is "the meaning to text and text to meaning transformer". He accepted a very strong hypothesis that we are able to describe meaning in a special semantic language. We prefer to be more flexible at this point and characterise a situation "behind the text" by a set of attributes; the method of this characterisation can be changed and does not give a unique and exhaustive presentation of the situation. Despite the multiplicity of possible translations, creating a plausible translation (one of many possible versions) and describing the cloud of such versions of translation can be challenging. This problem resembles the translation problem for natural languages.

Now, after the impressive progress of machine translation, it seems to be a very attractive idea to apply modern machine learning tools and the encoder-decoder approach [16] to the analysis and simulation of the translation operations Representation 1 → Text → Representation 2 (Figure 1.1). Huge digitized collections of texts exist and are available online. On the contrary, unfortunately, there are no generally accepted common tools for working directly with representations of situations. Various philosophical and logical aspects of this problem were discussed previously by many authors (see, for example, the book 'Representation and Reality' [17]). We do not have a universal toolbox for working with all representations of situations and cannot propose a general solution to this problem. Such a solution, perhaps, is impossible in a finite closed form despite many efforts over decades. Our goal is more modest. We will provide a computational analysis of the relations between texts of messages and representations of situations for a large collection of brief scientific texts. To do this, these representations must be standardised, at least in part, and expressed in the form of diagrams, specially organised texts or other means. The simplest approach is to replace the situation representations with the values of some attributes.
This approach is not only the simplest, but also quite universal. Many forms of more specific descriptions of situations can be transformed into vectors of attributes. The choice of attributes can be very broad. A classical collection of examples is provided by various versions of sentiment analysis. We aim to provide another basic example specific to scientific texts: a list of scientific subject categories that the text belongs to. The list of 252 possible categories is generally accepted and standardised by WoS. Of course, the variety of possible extensions and modifications of the set of attributes characterising the situation is virtually infinite.

Initially, in the act of communication, the situation is not represented by a universally conventional set of attributes. The introduction of attributes is an additional operation external to the communication and is not included in the scheme of simple communication (Figure 1.1). Moreover, an additional operation must be performed for the selected set of attributes: evaluating their values. This operation can be done either on the sender's side (Figure 1.2), the receiver's side (Figure 1.3), or by combinations of these approaches. For example, the categorisation of a brief scientific text is the result of combined efforts: the authors select the categories by their choice of journal, of keywords, or by pointing to the categories directly; then the editors can have their own choice; then WoS can finalise the list of subject categories for this text. For most information services, the choice of subject categories is the result of an understanding of the text by many agents, and conflicts of understanding are possible. Even on the famous and very 'liberal' preprint server, arXiv, moderators can sometimes change the category selected by the authors. For example, an author may decide that his paper belongs to the category 'condensed matter', whereas the moderator may look through the paper and understand that the main category is not 'condensed matter' research but rather 'nonlinear science' (this was a real-life example). This simple example is important because it demonstrates that the content of the text may differ from its meaning: the text contained an explicit reference to 'condensed matter', but this content was questioned by the moderator, since in his understanding the research refers mainly to nonlinear science, and not to condensed matter. There are important differences between the concepts of 'meaning' and 'content' [6], which are often confused (just as understanding the situation behind the text is often confused with recognising the content of the text). In the general case, the agents who are looking for the meaning of the text can be either humans or computer systems. The latter understand the text in the sense that they define the attributes of the situation behind the text. In our analysis below, the starting point is the combination of the text with the list of the subject categories the text belongs to (1,673,350 abstracts and 252 categories).

The core idea of this approach goes back to the lexical approach of Sir Francis Galton. He selected personality-descriptive terms and stated the problem of their interrelations for real persons. This work was continued by Thurstone [18]. He selected sixty adjectives (attributes of a person that are in common use). The respondents (1,300 persons) were asked to imagine a person they knew well and to select the adjectives that could best describe this person.
That is, a person was described by a 60-dimensional Boolean vector. The coordinates correspond to the attributes; the value is 1 if the attribute was selected to characterise the person, and 0 otherwise. Factor analysis gave five factors. After many years of development and discussion, the modern five-factor personality model became one of the common tools in psychodiagnostics [19, 20]. In psycholinguistics, Osgood and co-workers [21] used a similar approach for the creation of a 3D space of meaning by extracting three 'coordinates of meaning' from the evaluation of the 'affective meaning' of words (objects) by people. These three coordinates are three extracted factors: Evaluation, Potency, and Activity. Of course, the researchers started from many different scales, and these three were extracted by factor analysis.

Galton, Thurstone, Osgood and their followers asked respondents to evaluate a single object or person. Nevertheless, we can guess that these evaluations were related to some situations with this single object or person, not just to an isolated abstract object. The people evaluated not the abstract 'terms' but the psychologically meaningful situations behind these terms. These situations were the sources of the 'affective meaning' or the personality evaluations. For example, if we evaluate a person as accurate, reliable, and friendly, then we have in mind some situations where these properties were demonstrated. Similarly, if we evaluate a 'dog' as strong, good, and active (or, say, weak, bad and active), we have in mind a dominant situation which we associate with a dog.

The 'affective meaning' or psychological properties do not seem to be reasonable tools for describing the situations behind scientific texts. In our world of abstracts and brief scientific texts, there is another, scientifically specific description of the situation of use - the categories of the text. There are 252 Web of Science (WoS) categories for the Leicester Scientific Corpus (LSC), to which a text can belong (see Table B.1). These categories can intersect: a text can belong to several categories. We will use these 252 binary attributes (the text belongs to a given category, or does not belong to it) as a basic description of the situation. The categories evaluate the situation (the research area) related to the text as a whole, not as a result of the combination of word meanings. In this holistic approach, we define the general meaning of a word in short scientific texts as the information that the use of this word in texts carries about the categories to which these texts belong. More specifically, that is the Relative Information Gain (RIG) about the subject categories that the text belongs to, which can be obtained from observing the word in the text. This RIG is defined for each word and each category. Thus, the meaning of a word is represented by a 252-dimensional vector of RIGs. We create and study this space of meanings.

(1) We intend to analyse the meaning of scientific texts. (2) We consider the specific world of such texts - the abstracts of research papers. (3) We narrow the whole world of abstracts to a sample: 1,673,350 texts from the Leicester Scientific Corpus (LSC) [22]. (4) We characterise the research situations behind the texts by 252 binary attributes - the scientific WoS categories. Thus, to follow this way, we need a triad: dictionary, texts, and a multidimensional evaluation of the situation of use presented by the categories.
We prepared the first two elements in previous work and the results are available online [23, 24, 25]. Now, we start to create the space of meaning. In our case study, we employed very simple attributes for describing the text usage situation: the research subject categories of the text. This list of attributes can be modified and extended. The level of detail of the Meaning Space can vary greatly within the framework of the proposed approach.

Let us take a quick look at some relevant ideas about quantifying the meaning of words. Quantifying the meanings of words in a metric space might be used to measure the meanings of texts in the same metric as a BoW. A key issue in understanding the meaning of texts is to use a precise metric based on word meanings. In classical psycholinguistic studies it is common to allocate words in a metric space based on their semantic connotations [26, 27]. A semantic space model is a representation technique where each word is assigned to a point in a high-dimensional vector space. The Vector Space Model (VSM) is one of the most attractive models for researchers since it makes semantics computable [28]. Osgood hypothesised a 3-dimensional semantic space to quantify connotative meanings in his theory of the Semantic Differential, concerning psychological and behavioural aspects [21, 29, 30]. The semantic space in his work was built by, in his words, 'three orthogonal bipolar dimensions': Evaluation (E), Potency (P) and Activity (A), on which each word is uniquely located. Following this method of semantic differential, many studies have been undertaken by both psychologists and linguists to identify new dimensions of semantic space and to measure meaning [21, 27, 31, 32].

The structures of semantic spaces are constructed differently by various researchers. From the perspective of distributional linguistics, a semantic space model is a representation technique for the contextual similarity of words based on their co-occurrence counts. The distributional hypothesis was introduced by Harris [33], and Distributional Semantic Models (DSM) were then proposed to represent word semantics by distributional vectors [28, 34, 35, 36]. This idea claims that the similarity of words can be characterised by their distribution of contexts [37, 38]. The model proposes that each word is represented by the distribution of its contexts and that this distribution can be learnt from co-occurrence. The axes in the space are determined by local word co-occurrences, and the similarity of a word to other words is measured by its position, found by counting co-occurrences with other words in this semantic space [39]. This means that a word's distributional context is represented by a vector of co-occurrences with other context words in a window, where a window can be a certain number of words or larger units (e.g. phrases, sentences, paragraphs or documents). Researchers in cognitive studies and information retrieval have noted that the usage of raw co-occurrence counts is problematic, as semantic similarity will have a frequency bias [32]. It has been proposed that degrees of similarity between word occurrences can be assigned. Different approaches avoid this problem by weighting the elements of the vector. Latent Semantic Analysis (LSA) is one of the vector space models in NLP, in particular a DSM, for estimating and representing the meaning of words based on statistical computations [40, 41].
In LSA, word senses (or meanings) are approximated in a high-dimensional space by the word's effect on the meaning of the contexts in which it occurs [42]. The relationships between texts based on their words and the relationships between words based on their appearances in texts are analysed simultaneously in order to extract relations of words in terms of their contexts. LSA has been used towards an adequate theory of word meaning by researchers from a wide range of research areas including psychology, philosophy, linguistics, information retrieval and cognitive science [43, 44, 45]. In cognitive science, the focus is to model human memory by activating the meaning potentials of words through other words in the context, under the assumption that the cognitive components of word meaning are linked in a semantic network and change dynamically [46]. It is assumed that human knowledge acquisition actually follows the same process as LSA: checking events in internal and external environments and deriving knowledge from a high-dimensional semantic space by a procedure like dimension reduction [47, 48]. Here, the semantic space is used as a basis for all cognitive processing. Although LSA supplies a useful simulation of human cognitive processes, it has been argued that the LSA knowledge base does not provide a complete model of cognition [47]. There are limitations in the modification of context and in updating the model of semantic dimensions in this knowledge base, which are characteristic of analytic thinking and the dynamic structure of human cognitive processes [49].

Even if this problem is solved, there are other fundamental semantic problems for LSA, such as polysemous words. In LSA, where each word is represented as a single context-free vector in the semantic space, the different meanings or senses of a word are not taken into account [40]. This problem matches the task of characterising word meaning by its dictionary senses in Word Sense Disambiguation (WSD). In NLP and Machine Learning, WSD is defined as the task of determining the word sense (meaning) from the use of the word in a context. In traditional word sense studies, the meaning of a word is characterised by mutually disjoint senses covered in dictionaries, as the best fit to its dictionary senses [50]. Both linguists and psychologists have argued that clear distinctions of senses can be difficult in certain contexts due to fluctuations of meaning in context [44, 46, 51], especially for polysemous words. Hanks (a lexicographer) pointed out this problem in his paper 'Do word meanings exist?' [46]:

"...words have meaning potentials, rather than just meaning. The meaning potential of each word is made up of a number of components, which may be activated cognitively by other words in the context in which it is used. These cognitive components are linked in a network which provides the whole semantic base of the language, with enormous dynamic potential for saying new things and relating the unknown to the known."

The problem of 'fluctuation of meaning in context' is also important in theories of the mental representation of word senses in Psychology. This was very well discussed by Kintsch [40, 47, 52, 53], who stressed the complexity of representing polysemous words as a single vector in the semantic space of LSA. He questioned 'How is the meaning of words represented in the mind?'
and discussed the problem in terms of the 'mental lexicon' and 'generative lexicon' approaches to the representation of meaning [51]. He concluded that both the mental lexicon and generative lexicon approaches have limitations in representing meanings when word meanings are constructed from their explicit definitions, due to the multiple senses of words and the flexibility of word meanings. He then discussed an implicit way to define word meaning: the relations of the word to other words in the context. According to his research, LSA allows us to modify word meaning by situating the meaning as a vector in a high-dimensional semantic space. In this case, the full meaning of the word is not defined, but it is explained in a relational system by only its semantic relationships with other words. He argued that the standard composition rule for vectors in LSA does not distinguish the different meanings of a word; therefore, word meanings should be modified according to the different contexts in which the word appears, by context-sensitive composition algorithms.

Polysemy is one of the characteristics of words in all natural languages. Psycholinguistic studies approach this phenomenon to answer the questions of how multiple senses are represented in the mental lexicon and how senses are activated during language comprehension [45]. The mental lexicon here can be considered as a mental repertoire containing the list of meanings or senses in the mind. Linguists have proposed several approaches for sense representation in the mental lexicon, basically classified as separate sense representation and single core representation. Even though some polysemy studies argue for the discreteness of sense storage in the mental lexicon [44, 54, 55], the majority of studies suggest that polysemous senses can overlap in their mental representations [50, 56, 57, 58, 59]. Moreover, the polysemy of words is one of the major focuses in distributional semantics and is still being studied [60, 61, 62]. Some researchers in distributional semantics have made it possible to model the differences of meanings between two occurrences of a word in different contexts by developing specialised models for word meaning [63, 64, 65, 66]. Such methods do not approach word meaning by considering disjoint senses. Alternative models were proposed in which word meaning is not just extracted from pre-defined senses, but from the links between words and their window-based context words. To extract the 'contextualised meaning' of a word or a set of words, co-occurrence vectors are constructed and vector operations are used [64, 65, 67, 68]. A probabilistic method in which word meaning is modelled as a probability distribution over latent dimensions (senses) was applied by [65, 67]. Contextualised meaning was built as a change in the original sense distribution. Van de Cruys, Poibeau and Korhonen then proposed a model in which a latent space is used to identify important dimensions for a context and to adapt the vectors of words constructed from dependency relations with window-based context words [69].

In academic disciplines, the notion of the meaning of a word has been analysed in many works, ranging from psychology to linguistics, and from philosophy to pedagogy and computer science [70, 71]. Technical innovations in computerised methods and extensive psycholinguistic and neurolinguistic experiments have made it possible to investigate word meanings from different perspectives and to link language and cognition, and the language in people's minds.
There is no unique way to represent meanings that can be used in all theories of lexical semantics from different perspectives. Semantic studies require different semantic representations depending on the formalism for word meaning [72]. According to Kintsch, philosophers work with the meaning of concepts instead of words, psychologists mostly study concept formation rather than vocabulary acquisition, and linguists work on the meaning of words [47]. But, at this point, precisely representing and approximating the meaning of a concept, or especially of a text such as a sentence, passage or document, is still an active problem in NLP and all other disciplines concerned with 'meaning'.

In this research, we specifically focus on meanings in scientific texts. We are concerned with how meaning can be extracted by analysing a large scientific corpus. Our fundamental assumption is that the meaning of a text can be extracted from the occurrence of its words in texts across the scientific categories. We hypothesise that there is a strong connection between the meaning of a text and the vocabulary used in the text; however, we cannot say that each word has the same importance in all research disciplines. In fact, words have science-specific meanings in texts based on differences of use in subject categories, and these meanings can be estimated from their occurrences in texts within categories. The difference in word meanings between categories correlates with the difference in the distribution of words across categories. As the texts are scientific, we consider that the occurrence of words in the texts of categories can be used to characterise word meaning for science.

Our approach to quantifying the meaning of a word differs from measuring its meaning on the basis of human sensations and feelings, as in psycholinguistic studies. Although measuring the meaning of a word in context by characterisation through its dictionary meanings has many important implications in computational linguistics and psycholinguistic research, we do not focus here on dictionary meanings. Rather, we create a model for word representation that allows us to extract the meaning of a word through its importance in various scientific fields without distinguishing its dictionary meanings. We approach the meaning of a word through the predictive power of a corpus-analytical procedure, under the assumption that the meaning of a word is determined by its use in scientific disciplines. This actually matches the statistical semantics hypothesis that 'statistical patterns of human word usage can be utilised to figure out what people mean' [28]. We can also reword this as 'statistical patterns of word usage in scientific fields can be used to figure out what a text means'.

In these relations, the meaning of a word is defined as a vector of RIGs from the word to the categories. Given such information, meaning can be defined for each word and then for each research text [23]. A natural way to formalise this is to represent words as vectors and texts as sets of vectors in a specially constructed space. Differences in the distributions of vectors reflect differences in the meaning of texts. This technique allows us to represent each word by a distribution of numerical values over categories, and the meaning of a text through a vector space model; that is, to quantify meaning. In many semantic studies, the vector space is obtained from the co-occurrence of words, as discussed above.
There are currently two broad VSMs based on co-occurrence: word-word and word-document, where vectors are (normalised) frequency counts and dimensions are contexts (words or documents) [73]. Vectors are called context vectors in this case, and words are represented by the context vectors. Under the distributional hypothesis, these vectors are used to compute vector similarity. However, co-occurrence models suffer from efficiency problems in real-world applications [73]. There are two main problems in the usage of such approaches: the first is the dimensionality of context vectors and the second is the sparse data problem. In the first problem, the dimension of the co-occurrence matrix tends to be extremely large for large datasets. In the second problem, as the vast majority of words occur in a very small fraction of the set of contexts [74], the majority of the entries of the vectors will be zero. Therefore, the co-occurrence matrix will not give reliable results for large data and brief texts. In addition to these two problems, the usage of co-occurrence is specifically not appropriate for the representation of scientific texts due to the multidisciplinary research in the collection [23]. Therefore, we introduce a new vector space to represent word meaning based on the informational importance of words in the subject categories.

We begin by creating a space to represent word meaning. The Meaning Space is defined as a vector space in which the coordinates correspond to the subject categories. A word is represented by a vector of RIGs about the subject categories that the text belongs to, which can be obtained from observing the word in the text. This approach allows us to identify the importance of the word for the corresponding category in terms of the information gained when separating the corresponding category from its complement (for example, separating texts in the category 'algebra' from the texts that do not belong to this category). To define RIGs, we consider the following two attributes of a text d for a given word w_j and a given category c_k:

c_k(d): The text d is in the category c_k. Attribute values are Yes (c_k(d) = 1) or No (c_k(d) = 0);
w_j(d): The word w_j is in the text d. Attribute values are Yes (w_j(d) = 1) or No (w_j(d) = 0).

The corpus is considered as a probabilistic sample space (the space of equally probable elementary results, each of which is a random selection of a text from the corpus). RIG measures the (normalised) information about the value of c_k(d) which can be extracted from the value w_j(d) (i.e. from observing the word w_j in the text d) for a text d from the corpus.

As we have a number of word vectors, it is convenient to organise the vectors into a matrix. These vectors are used to construct the Word-Category RIG Matrix, in which rows correspond to words and columns correspond to categories. Each entry in the matrix corresponds to a pair (category, word). Its value for the pair (c_k, w_j) shows the RIG on the belonging of a text from the corpus to the category c_k from observing the word w_j in this text. Word-Category RIG vectors estimate the meanings of words as their importance in the research fields. Thus, row vectors in the Word-Category RIG Matrix indicate the scientific meanings of words. This approach computes a distributional representation for a word across all research subjects (RIGs in categories). Following the distributional semantic hypothesis, if words have similar row vectors in the Word-Category RIG Matrix, they tend to have similar meanings.
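The two Boolean attributes c_k(d) and w_j(d) can be stored as document-level indicator matrices, from which all counts used below are derived. A minimal numpy sketch with toy data (the array names are ours, not from the paper):

```python
import numpy as np

# Toy corpus: M = 5 texts, N = 3 dictionary words, K = 2 categories.
# W[i, j] = 1 iff word j occurs in text i: the attribute w_j(d_i).
W = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1]], dtype=bool)

# C[i, k] = 1 iff text i belongs to category k: the attribute c_k(d_i).
# A row may contain several 1s: categories are not mutually exclusive.
C = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 0],
              [0, 1]], dtype=bool)

# The Word-Category RIG Matrix to be filled in below has shape N x K
# (103,998 x 252 for the LScDC words and the WoS categories).
RIG = np.zeros((W.shape[1], C.shape[1]))
print(RIG.shape)  # (3, 2)
```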
The hypothesis is that if texts have similar distributions of word meanings - similar clouds of word meaning vectors - then they tend to have similar meanings. We note that the proposed hypothesis does not require explicitly distinguishing between homonymy and polysemy for words; it only requires linking the meanings of words to their importance in categories. With this approach, vocabulary meanings do not directly affect the representation of the meaning of a word. Rather, the meaning of a word is characterised through its measured information content in the various scientific subject categories.

In this research, we present the first stage of 'quantifying of meaning': the construction of the Meaning Space and the representation of word meaning as a vector of RIGs for categories in this space. Such an understanding of the meaning of words can help analyse the meaning of texts. Having quantified the meaning of words, one can represent all words in a corpus, and then texts, in the Meaning Space. Specifically, each text in the corpus is a cloud of RIG vectors, and the text meaning can later be estimated and constructed from these distributions. Text analysis will be the next stage of the research. The earliest (preparatory) stage of the project was presented in [24, 23].

The empirical analysis of this research is based on the Leicester Scientific Corpus, which includes 1,673,350 texts [22], and the Leicester Scientific Dictionary-Core (LScDC) of 103,998 words [75]. The main hypothesis for the construction of the Meaning Space is: meaning is the vector of information gains from the word to the categories assigned to the text. We used 252 categories of WoS. We evaluated the Meaning Space and the representation of word meaning in this space through the top-ranked words in each category. We constructed the Word-Category RIG Matrix for the LSC [76]. The most informative words in each category are presented. It is shown that the proposed representation technique highlights topic-specific words in categories. We compared this approach with the representation technique where words are represented by vectors of their raw frequencies in categories. Words were ranked by both frequencies and RIGs in categories. We demonstrated that raw frequencies are not very useful for identifying the most informative words in categories.

For each word in the LScDC, the sum and the maximum of RIGs over categories are calculated and appended to the Word-Category RIG Matrix. Words can be ordered by their informativeness in scientific texts by these two criteria. The n most informative words for scientific texts can be extracted by sorting the words by the sum or the maximum of RIGs. We compared these two ordering criteria by counting the number of matches in the top n words, where n ranges from 100 to 50,000. We found that the majority of the first 100 words do not match, with 28% matched words. The intersection of words reaches approximately 50% for the top 1,000 words, and then 99% for the top 50,000 words. Finally, we created a scientific thesaurus in which the most informative words were selected from the LScDC by their average RIGs in categories. The thesaurus is called the Leicester Scientific Thesaurus (LScT). LScT contains the 5,000 most informative words in the corpus LSC. These words are considered as the most meaningful words in science. The full list of words in LScT is available online [76].
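The comparison of the two ranking criteria reported above can be reproduced with a few lines of code. A sketch, assuming the Word-Category RIG Matrix is available as a numpy array (the data here is synthetic, so the percentages will not match the paper's):

```python
import numpy as np

def topn_overlap(rig_matrix: np.ndarray, n: int) -> float:
    """Fraction of words shared by the top-n lists ranked by the sum
    (l1 norm) and by the maximum (l-infinity norm) of RIGs."""
    s = rig_matrix.sum(axis=1)   # S_j: sum of RIGs over categories
    m = rig_matrix.max(axis=1)   # M_j: maximum RIG over categories
    top_s = set(np.argsort(-s)[:n])
    top_m = set(np.argsort(-m)[:n])
    return len(top_s & top_m) / n

# Synthetic stand-in for the real 103,998 x 252 matrix:
rng = np.random.default_rng(42)
rig = rng.random((10_000, 252)) ** 8   # skewed: mostly near-zero values
for n in (100, 1_000, 5_000):
    print(n, round(topn_overlap(rig, n), 2))
```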
This paper is organised as follows. In Section 2, the Meaning Space is constructed and the representation of words by vectors in the Meaning Space is discussed. Given the representation of words by vectors of RIGs, we look at words ordered by their RIGs in each category. In Section 3, we present the first findings of the new representation technique and the anomalies detected in the data by this model. To avoid possible abnormal appearances of words in the categories, we apply a further cleaning procedure to the LSC. The latest versions of the LSC and of the dictionaries, the Leicester Scientific Dictionary (LScD) and the LScDC, are described [22, 75, 77]. Finally, we construct the Word-Category RIG Matrix for the LSC [76] and discuss the experimental results in this section. In Section 4, we introduce the Leicester Scientific Thesaurus (LScT), which contains 5,000 of the LScDC words selected by their average RIGs in categories. In Section 5, the conclusion and outlook are summarised.

In this section, we discuss the architecture of our approach to estimating word meaning in a collection of documents. We assume that the dataset is a large corpus of natural-language scientific texts and that each text in the corpus belongs to at least one subject category. We hypothesise that words have science-specific meanings in categories and that the meaning can be estimated by information gains from the word to the category. Before inquiring into the measurement of meaning, we describe how to represent each word as a vector of frequencies in categories. We then introduce a new approach to word meaning, in which each word is represented by a vector of RIGs in the Meaning Space.

First, we review how to represent a word in a vector space model by using the appearances of this word in texts belonging to subject categories. A word representation method is defined in order to indicate term absence/presence in the texts of categories. Each word is represented by a vector of frequencies in categories. That is, the frequency of a word in a category is the number of texts of the category in which this word is observed. Each entry of the vector is the number of texts containing the word in the corresponding category. It is noteworthy that texts in a corpus do not necessarily belong to a single category, as they are likely to correspond to multidisciplinary studies, specifically in a corpus of scientific research. In other words, categories may not be mutually exclusive.

For every word w_j from the dictionary (j = 1, ..., N) and every text d_i from the corpus (i = 1, ..., M) the indicator w_j(d_i) is defined. If the word w_j occurs in the text d_i (once or more), then w_j(d_i) = 1; otherwise, w_j(d_i) = 0. Let D_k be the set of texts in the category c_k. The frequency of the word w_j in the category c_k is

$$w_{jk} = \sum_{d \in D_k} w_j(d).$$

This w_jk is the number of texts containing the word w_j in the category c_k. The vector of frequencies is defined for each word w_j from the dictionary. Let us use the notation $\vec{w}_j$ for it. The coordinates of this vector are w_jk, where the index k = 1, ..., K corresponds to the subject categories. Thus, each word w_j in the corpus is represented by a vector of frequencies $\vec{w}_j = (w_{j1}, w_{j2}, \ldots, w_{jK})$, where K is the number of categories in the corpus.
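With indicator matrices like those in the earlier sketch, the whole table of frequencies w_jk is a single matrix product. A minimal sketch (array names are ours):

```python
import numpy as np

# W: M x N Boolean, W[i, j] = w_j(d_i); C: M x K Boolean, C[i, k] = c_k(d_i).
rng = np.random.default_rng(0)
M, N, K = 6, 4, 3
W = rng.random((M, N)) < 0.4
C = rng.random((M, K)) < 0.5

# w_jk = sum over texts d in D_k of w_j(d): the number of texts in
# category k that contain word j. One product gives the full N x K table.
freq = W.T.astype(int) @ C.astype(int)
print(freq)   # freq[j, k] == w_jk; row j is (w_j1, ..., w_jK)
```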
The collection of vectors, with all words and categories in the entire corpus, can be shown in a table. Each entry w_jk of Table 2.1 corresponds to a word and a category.

Table 2.1: Frequencies of words in categories.

             c_1    c_2    ...   c_K
    w_1      w_11   w_12   ...   w_1K
    w_2      w_21   w_22   ...   w_2K
    ...      ...    ...    ...   ...
    w_N      w_N1   w_N2   ...   w_NK

The number of documents in the category c_k is |D_k|. Importantly, each text usually contains more than one word, and several different words can belong to the same text. To simplify the notation for further calculations, we now define the set of texts containing the word w_j as D_j. We note that $|D_j| \le \sum_k w_{jk}$, and equality holds in the case when the categories are mutually exclusive. The number of texts in the categories varies widely, so w_jk is expected to increase as the number of texts in a category increases. This does not necessarily mean that a word rarely appearing in a category is less important for this category than for other categories in which the word appears more frequently (see the definition of information gain in the next section). Therefore, the direct usage of frequencies may produce inappropriate findings in the quantification of word meanings.

Given the collection of vectors, various normalisation schemes can be applied to adjust the vectors $\vec{w}_j$ to a common scale. The simplest and most popular approach to normalisation is the transformation to a vector where the sum of the elements is 1, that is, normalisation to unit l_1 norm. For mutually exclusive categories, this normalisation is related to the law of total probability. The objective of this normalisation scheme is to make vectors comparable by rescaling them to the same length in the l_1 norm. For a given vector $\vec{w}_j$, the normalisation can be performed as

$$P_{jk} = \frac{w_{jk}}{\sum_i w_{ji}}, \quad \text{where } \sum_k P_{jk} = 1.$$

It should be stressed that when the categories are not exclusive, $\sum_k w_{jk}$ is not the total number of texts containing the word w_j. In other words, texts containing the word could be counted more than once in the sum. In a similar way, the column vectors can be normalised to unit column sums. However, this representation does not indicate the proportion of the exact number of texts in the category. A reasonable normalisation can also be obtained in two steps: (1) normalise each frequency by the number of texts in the category, $\hat{w}_{jk} = w_{jk}/|D_k|$; (2) normalise the matrix to unit sums in rows. As a result, w_jk is transformed into

$$\tilde{w}_{jk} = \frac{\hat{w}_{jk}}{\sum_i \hat{w}_{ji}}.$$

In the calculation of RIGs below, the estimations of probabilities are based on the table of frequencies. For the ranking of words in categories, the raw frequencies were also used and compared to the RIG-based ranking.
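The normalisation schemes above are straightforward to express in code. A sketch of the l1 row normalisation and the two-step scheme, under our reconstruction of the garbled formulas (division by |D_k| first, then row normalisation):

```python
import numpy as np

def l1_rows(freq: np.ndarray) -> np.ndarray:
    """P_jk = w_jk / sum_i w_ji: each row rescaled to unit l1 norm."""
    return freq / freq.sum(axis=1, keepdims=True)

def two_step(freq: np.ndarray, docs_per_cat: np.ndarray) -> np.ndarray:
    """Step 1: divide column k by |D_k|, the number of texts in the
    category; step 2: renormalise each row to unit sum."""
    rates = freq / docs_per_cat[np.newaxis, :]
    return rates / rates.sum(axis=1, keepdims=True)

freq = np.array([[10.0, 2.0], [3.0, 30.0]])
docs_per_cat = np.array([100.0, 1000.0])
print(l1_rows(freq))
print(two_step(freq, docs_per_cat))
```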
Having a collection of frequency vectors, it is easy to calculate the vectors of information gains (from observing the word in the text to the categories which the text belongs to). These vectors will quantify the meanings of the words. The hypothesis here is that the informational content of a word about each category can be measured by comparing the appearance of the word in texts of a given category and its appearance in texts not related to this category (i.e., how the presence/absence of the word in texts can help to separate the category from its set-theoretical complement). A general concept for computing information is the entropy introduced by Shannon [78]. Information Gain (IG) is a common feature selection criterion in machine learning used, in particular, for the evaluation of word goodness [79, 80]. The information gain is the measure of the information extracted about one random variable if the value of another random variable is known. It is closely related to mutual information, which measures the statistical dependence between two random variables. A larger value of the gain means a stronger relationship between the variables. The information gain of a random variable A with values (or states) a_1, ..., a_n from a random variable B with values (or states) b_1, ..., b_m is defined as:

$$IG(A|B) = -\sum_{i=1}^{n} P(A=a_i)\log_2 P(A=a_i) + \sum_{j=1}^{m} P(B=b_j)\sum_{i=1}^{n} P(A=a_i|B=b_j)\log_2 P(A=a_i|B=b_j), \quad (1)$$

where P(A = a_i) is the probability of observing the value a_i of the random variable A, P(B = b_j) is the probability of observing the value b_j of the random variable B, and P(A = a_i | B = b_j) is the conditional probability of observing the value a_i of the random variable A given the value b_j of the random variable B. IG(A|B) measures the number of bits of information obtained for the prediction of a value of the variable A by knowing the value of the variable B.

In the context of text categorisation, the information gain measures how important a given word is for category prediction. A larger gain indicates that the probability of finding the word in the texts inside the category differs considerably from the probability of finding it in the texts outside this category. If the categories are mutually exclusive, then we can consider them as values of a categorical feature C of the text with values c_i and define the information gain IG(C, w) from observing a word w in the text about the value of C [80] by the textbook formula:

$$IG(C, w) = -\sum_i P(c_i)\log_2 P(c_i) + P(w)\sum_i P(c_i|w)\log_2 P(c_i|w) + P(\bar{w})\sum_i P(c_i|\bar{w})\log_2 P(c_i|\bar{w}), \quad (2)$$

where {c_i} is the set of classes in the target space, P(c_i) is the probability of observing the i-th class, P(w) is the probability that the term w appears, $P(\bar{w})$ is the probability that w does not appear, P(c_i|w) is the conditional probability of observing the i-th class given that the term w appears, and $P(c_i|\bar{w})$ is the conditional probability of observing the i-th class given that the term w does not appear. IG(C, w) measures the number of bits of information obtained for the prediction of classes c_i by knowing the presence or absence of a term w in documents.

The quantity IG(C, w) measures the amount of information provided by a word when splitting the documents into classes, but only in the case of mutually exclusive classes, that is, when each text is assigned to a single class only. On the contrary, scientific texts very often belong to several categories. The research subject categories are not mutually exclusive, and this approach cannot be used directly. Unlike this approach, we start from measuring how informative a word is for a category in terms of its ability to separate the corresponding category from its set-theoretical complement. We hypothesise that the topic-specific words of a category have larger information gains than other words, and such words are expected to have smaller gains in most other categories. Therefore, we approach this problem by considering each category against its complement.

Since words are obviously not mutually exclusive (one text usually contains several different words), we cannot consider the occurrence of different words as values of a random variable to use (1) directly. To evaluate the information gain of the category c_k from the word w_j, it is necessary to introduce for each word w_j a random Boolean variable with two states: w_j denotes the presence of the word w_j in a text and $\bar{w}_j$ denotes its absence. The 2 × 2 contingency table used to calculate the information gain of the category c_k from the word w_j is presented in Table 2.3. It uses the raw frequencies w_jk introduced in the previous subsection.
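Table 2.3 itself did not survive extraction, but its four cells follow from the counts already defined. A sketch of our reconstruction: the cells count texts by (word present or absent) × (in or not in the category):

```python
import numpy as np

def contingency(w_jk: int, n_cat: int, n_word: int, n_total: int) -> np.ndarray:
    """2x2 table of text counts for a (word w_j, category c_k) pair.

    w_jk    -- texts in c_k containing w_j
    n_cat   -- |D_k|, texts in the category
    n_word  -- |D_j|, texts containing the word
    n_total -- M, texts in the corpus
    """
    in_cat_with = w_jk                       # word present, in category
    out_cat_with = n_word - w_jk             # word present, not in category
    in_cat_without = n_cat - w_jk            # word absent,  in category
    out_cat_without = n_total - n_word - in_cat_without
    return np.array([[in_cat_with, out_cat_with],
                     [in_cat_without, out_cat_without]], dtype=float)

print(contingency(w_jk=40, n_cat=100, n_word=70, n_total=1000))
# [[ 40.  30.]
#  [ 60. 870.]]
```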
Table 2.3 can be used to calculate two information gains: of the word w_j from the category c_k, and of the category c_k from the word w_j. Both information gains are meaningful, but for different problems. The goal of this research is to evaluate the informativeness of words for category identification and to use this informativeness for word ranking and text representations. Therefore, we will consider the information gain of the category c_k from the word w_j: IG(c_k, w_j). This information gain evaluates the number of bits extracted from the presence/absence of the word w_j in a text for the prediction of the belonging of this text to the category c_k. One may expect that if a word is very topic-specific for a category, it appears in texts belonging to this category more frequently than in texts which do not belong to this category, and the major part of texts belonging to this category contains the word.

For each category c_k, a function is defined on texts that takes the value 1 if the text belongs to the category c_k, and 0 otherwise. For each word w_j, a function is defined on texts that takes the value 1 if the word w_j belongs to the text, and 0 otherwise. We use for these functions the same notations c_k and w_j. Consider the corpus as a probabilistic sample space (the space of equally probable elementary outcomes). For the Boolean random variables c_k and w_j, the joint probability distribution is defined according to Table 2.3, and the entropy and information gains can be defined as follows.

The information gain about the category c_k from the word w_j, IG(c_k, w_j), is the amount of information on the belonging of a text from the corpus to the category c_k obtained from observing the word w_j in the text. It can be calculated as [78]:

$$IG(c_k, w_j) = H(c_k) - H(c_k|w_j), \quad (3)$$

where H(c_k) is the Shannon entropy of c_k and H(c_k|w_j) is the conditional entropy of c_k given the observation of the word w_j. The entropies H(c_k) and H(c_k|w_j) are

$$H(c_k) = -P(c_k)\log_2 P(c_k) - P(\bar{c}_k)\log_2 P(\bar{c}_k), \quad (4)$$

where P(c_k) is the probability that a text belongs to the category c_k and $P(\bar{c}_k)$ is the probability that a text does not belong to the category c_k, and

$$H(c_k|w_j) = -P(w_j)\left[P(c_k|w_j)\log_2 P(c_k|w_j) + P(\bar{c}_k|w_j)\log_2 P(\bar{c}_k|w_j)\right] - P(\bar{w}_j)\left[P(c_k|\bar{w}_j)\log_2 P(c_k|\bar{w}_j) + P(\bar{c}_k|\bar{w}_j)\log_2 P(\bar{c}_k|\bar{w}_j)\right], \quad (5)$$

where

• P(w_j) is the probability that the word w_j appears in a text from the corpus;
• $P(\bar{w}_j)$ is the probability that the word w_j does not appear in a text from the corpus;
• P(c_k|w_j) is the probability that a text belongs to the category c_k under the condition that it contains the word w_j;
• $P(\bar{c}_k|w_j)$ is the probability that a text does not belong to the category c_k under the condition that it contains the word w_j;
• $P(c_k|\bar{w}_j)$ is the probability that a text belongs to the category c_k under the condition that it does not contain the word w_j;
• $P(\bar{c}_k|\bar{w}_j)$ is the probability that a text does not belong to the category c_k under the condition that it does not contain the word w_j.

The entropy H(c_k) (4) is the amount of information about whether an element belongs to a set (here, whether a text belongs to the category c_k). A high value of the information gain IG(c_k, w_j) (3) does not mean, in general, that a large proportion of this information can be extracted from observing the word w_j in the text. This proportion depends on the value of the entropy H(c_k) (4). The Relative Information Gain (RIG) measures this proportion directly. It provides a normalised measure of the information gain with regard to the entropy of c_k. RIG is defined as

$$RIG(c_k|w_j) = \frac{IG(c_k, w_j)}{H(c_k)}. \quad (6)$$

The value of RIG(c_k|w_j) will be 0 when H(c_k) = H(c_k|w_j), and 1 when H(c_k|w_j) = 0. In the first case, the presence/absence of the given word w_j carries no information about the category c_k; such a word is uninformative. In the second case, observing the word provides exactly H(c_k) bits of information; that is, the presence or absence of the word resolves exactly the question of whether the text belongs to the category. RIG(c_k|w_j) can be equal to 1 in two cases:

• All texts with the word w_j belong to the category c_k and all texts without the word w_j do not belong to the category c_k;
• All texts with the word w_j do not belong to the category c_k and all texts without the word w_j belong to the category c_k.
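Putting equations (3)-(6) together, RIG(c_k|w_j) can be computed directly from the 2×2 table of the earlier sketch. A minimal implementation (ours, not the authors' released code):

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in bits; 0 * log2(0) is treated as 0."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def rig(table: np.ndarray) -> float:
    """RIG(c_k | w_j) from a 2x2 table with rows = word present/absent
    and columns = text in / not in the category."""
    joint = table / table.sum()
    p_c = joint.sum(axis=0)        # (P(c_k), P(not c_k))
    h_c = entropy(p_c)             # H(c_k), equation (4)
    p_w = joint.sum(axis=1)        # (P(w_j), P(not w_j))
    h_c_given_w = sum(             # H(c_k | w_j), equation (5)
        p_w[r] * entropy(joint[r] / p_w[r]) for r in (0, 1) if p_w[r] > 0
    )
    return (h_c - h_c_given_w) / h_c if h_c > 0 else 0.0  # (3) and (6)

table = np.array([[40.0, 30.0], [60.0, 870.0]])  # from the previous sketch
print(rig(table))
```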
We expect higher RIG(c_k|w_j) for the topic-specific words of the category c_k. For simplicity, we denote RIG(c_k|w_j) by RIG_jk. Given the word w_j, the values RIG_jk are used to form the vector $\vec{RIG}_j$, where each component of the vector corresponds to a category. Therefore, each word is represented by a vector of RIGs. The dimension of the vector for each word is the number of categories K (for the WoS subject categories, K = 252). For the word w_j, this vector is

$$\vec{RIG}_j = (RIG_{j1}, RIG_{j2}, \ldots, RIG_{jK}).$$

The set of vectors $\vec{RIG}_j$ can be used to form the Word-Category RIG Matrix, in which each column corresponds to a category c_k and each row corresponds to a word w_j. Each component RIG_jk corresponds to a pair (c_k, w_j), and its value is the RIG from the word w_j to the category c_k. The structure of the Word-Category RIG Matrix is demonstrated in Table 2.4.

In the Word-Category RIG Matrix, a row vector represents the corresponding word as a vector of RIGs for the categories. We defined the Meaning Space as the vector space of such vectors $\vec{RIG}_j$. The dimension of this space is the number of categories, and each coordinate is the RIG from a word to the corresponding category. Note that in the Word-Category RIG Matrix, a column vector represents the RIGs of all words in an individual category. If we choose an arbitrary category, the words can be ordered by their RIGs, from the most informative word to the least informative one. We expect that the topic-specific words will appear at the top of the list.

The words can be ordered by their informativeness in the whole corpus of scientific texts as well as within each category. A norm or a more general proximity measure in the Meaning Space is needed to compare the meaningfulness of words across all categories. Two criteria were tested for measuring the informativeness of words in the corpus of scientific texts: the sum (l_1 norm) and the maximum (l_∞ norm) of RIGs over categories. For a given word w_j, the sum S_j and the maximum M_j of RIGs are calculated from the Word-Category RIG Matrix as:

$$S_j = \sum_{k=1}^{K} RIG_{jk} \quad \text{and} \quad M_j = \max_{k} RIG_{jk}.$$

The sum S_j is a measure of the average informativeness of a word (this word has the informativeness S_j/K on average), whereas the maximum M_j is a measure of the maximal informativeness of the word across the categories (this word is not more informative than M_j in any category). Now, the words in the dictionary can be ordered by their S_j or M_j. For each of these ordered lists of words, the n most informative (meaningful) words for scientific texts can be selected based on one of the two criteria. The higher the value of the criterion (S_j or M_j), the more informative the word is.
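Ranking by these criteria is then just sorting. A toy sketch with illustrative word labels and invented RIG values:

```python
import numpy as np

words = np.array(["acoust", "speech", "acid", "model"])
rig_matrix = np.array([[0.30, 0.01],   # rows: words
                       [0.20, 0.02],   # columns: two categories
                       [0.01, 0.25],
                       [0.05, 0.05]])

S = rig_matrix.sum(axis=1)   # S_j: average informativeness (times K)
M = rig_matrix.max(axis=1)   # M_j: maximal informativeness

print("by S:", words[np.argsort(-S)])
print("by M:", words[np.argsort(-M)])
# Per-category ranking: sort one column of the Word-Category RIG Matrix.
print("category 0:", words[np.argsort(-rig_matrix[:, 0])])
```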
This section describes the experimental details and the analysis done to show the performance of the vector representation method described in Section 2. The dataset used in this study is the Leicester Scientific Corpus (LSC) [22]. The LSC contains a collection of abstracts of research articles and proceedings papers with metadata such as authors, title, categories, research areas and times cited. Each record (text) in the dataset is assigned to at least one of the WoS categories. The Leicester Scientific Dictionary-Core (LScDC) is the collection of unique words appearing in 10 or more documents of the LSC [75].

For each word w_j and category c_k, RIG_jk was calculated, and the Word-Category RIG Matrix for the LSC was formed as described in Section 2. In each category, a list of words sorted in descending order by their RIGs can be created. The higher the relative information gain of a word in a category, the more important the word is in terms of being topic-specific for the category. Therefore, one can look at the top n words in categories in order to get a good grasp of the representation method. The visualisation of the top words in each category is carried out with word clouds. Having calculated the frequencies of words in the categories (Table 2.1), we compare the proposed method with the commonly used approach based on raw frequency.

At first, the procedure of word representation was applied to the LSC version 1 [24] with the dictionary [25]. To visualise the top words in each category in a convenient way, we looked at word clouds. The font size of each word in a word cloud is proportional to its RIG in the category. Intuitively, the more informative the word is, the bigger the word appears in the word cloud. For example, from Figure 3.1, it can be seen that the 6 most informative words for the category 'Acoustics' are 'acoust', 'ultrasound', 'speech', 'nois', 'sound' and 'frequenc'. The majority of papers in Acoustics are expected to include these words, which are absent or at least less frequent in many other categories. These words are inferred to be informative for the category 'Acoustics'.

However, this method detected anomalies in some categories. Anomalies here refer to words that do not conform to the expected set of words appearing in a subject category. Such words can appear in a category frequently regardless of being topic-specific. These words are likely to be potential anomalies generated by inappropriate additions of words, phrases or sentences to the texts of abstracts. As shown in Figure 3.2, for the category 'Chemistry, Applied', the words 'elsevi', 'ltd', 'acid', 'reserv' and 'right' stand out in the word cloud. We see that the majority of words in the word cloud agree with each other in being related to the subject. However, 'elsevi', 'ltd', 'reserv' and 'right' seem more prominent and unusual (non-specific) for Chemistry. The experiments were preliminary, but our representation technique raised alarms indicating anomalies. To understand why these words arose and how they can be avoided, we checked the abstracts containing such words. Our review showed that these words appeared in copyright notices such as 'Published by Elsevier Ltd.' or 'All rights reserved', which were added at the footer of abstracts. In order to have a comprehensive understanding of their appearance as informative for only some categories, for instance in Chemistry, we compared the distributions of 'elsevier', 'right' and 'reserve' in categories. For each word, the categories are ordered by the number of documents containing the word, and the first 20 categories are presented in Figure 3.3. When we consider the list of categories ordered by the number of documents in the entire corpus, we conclude that not all of the top categories of the corpus appear in the charts. This is because the usage of copyright notices is much more noticeable in some categories, such as Chemistry. For instance, the rank of the category 'Engineering, Electrical & Electronic' is 1 in the corpus; however, one can see that this category has rank 15 for the word 'Elsevier'. To show that not all categories have the same or similar distribution of the use of copyright notices, we presented the corresponding distributions for several categories.
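As an aside on the visualisation itself: word clouds like those in Figures 3.1 and 3.2 can be generated by feeding RIGs, instead of raw counts, to any frequency-based cloud generator. A sketch using the third-party wordcloud package; the RIG values are invented for illustration, loosely shaped after Figure 3.1:

```python
# pip install wordcloud matplotlib
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Toy RIGs for top words of 'Acoustics' (values are ours, not the paper's).
rigs = {"acoust": 0.30, "ultrasound": 0.22, "speech": 0.20,
        "nois": 0.15, "sound": 0.12, "frequenc": 0.10}

# Font size is proportional to the supplied weight, here the word's RIG.
cloud = WordCloud(width=600, height=400, background_color="white")
cloud.generate_from_frequencies(rigs)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```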
For each word, categories are ordered by the number of documents containing the word, and the first 20 categories are presented in Figure 3.3. When we consider the list of categories ordered by the number of documents in the entire corpus, we see that not all of the corpus's top categories appear in the charts. This is because the usage of copyright notices is much more noticeable in some categories, such as Chemistry. For instance, the category 'Engineering, Electrical & Electronic' has rank 1 in the corpus; however, it has rank 15 for the word 'Elsevier'. To show that not all categories have the same or a similar distribution of copyright-notice usage, we presented these distributions.

This subsection describes the procedure of additional cleaning and correction for the LSC and LScDC. Many conferences and journals put copyright notices, permission policies or conference names below the abstracts of papers. Such footers were added to abstracts in many records of the Web of Science database, and hence in the LSC, during processing and storage of the original data (see Table 3.1). Finding out by human inspection which notices were added to the texts of the 1,673,824 abstracts in [24] is a huge and practically impossible task. Once a sample of abstracts containing publishing house names had been browsed, we found that there are many more scenarios to consider. Some examples of these scenarios are presented in Table 3.2. As such expressions are more frequent in some categories than in others, a cleaning procedure is needed to avoid possible abnormal appearances of words in categories.

A quick look at the scenarios is sufficient to conclude that clearing such sentences or phrases cannot be fully automated: human intervention is needed to identify and list them, to avoid deleting useful information from the data. Individual notices with different appearances were identified by sampling abstracts based on keyword search. A keyword search here refers to browsing words, phrases or sentences to list their different appearances, so that all identified appearances can be deleted from the abstracts. The position of notices was also taken into account, since they appeared either at the beginning (by mistake) or at the end of the text. We used several specially developed procedures successively to clean them. For instance, when removing notices of the form '(c) Published by Elsevier', we first checked for the appearance of 'Crown Copyright (c) Published by Elsevier'. The notice can also appear in the form 'Published by Elsevier'; thus we considered all cases found in the empirical study. During cleaning, we removed the identified copyright notices, names of conferences, names of journals, authors' rights, licenses and permission policies. To give an insight, Table 3.3 presents the numbers of documents containing some notices before cleaning; these notices were completely removed by the cleaning. We note that names of publishing houses could also appear inside the body of the text; in such cases we did not remove them. More examples of notices that were removed from abstracts can be found in Appendix A.1.

To display the initial result of the cleaning, we present the word cloud and histogram of RIGs for the category 'Chemistry, Applied' in Figure 3.6. One can see that the words 'elsevi', 'ltd', 'reserv' and 'right' no longer appear in the list of top words, as they did in the word cloud before cleaning (see Figure 3.2). Instead, the cloud gives greater prominence to words that are related to specific topics and are likely to be more informative for the category. The word 'acid' has been preserved in the list of the most informative words.
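Once the notice variants have been listed by human inspection, the removal itself can be automated. The sketch below illustrates the longest-first matching order described above; the pattern list is hypothetical and far shorter than the list compiled in the actual empirical study.

```python
import re

# Hypothetical patterns for illustration only. Longer variants come first:
# 'Crown Copyright (c) Published by Elsevier' must be matched before the
# shorter 'Published by Elsevier', otherwise a fragment would be left behind.
NOTICE_PATTERNS = [
    r"Crown Copyright \(c\)(?: \d{4})? Published by Elsevier(?: Ltd| B\.V\.)?\.?",
    r"\(c\)(?: \d{4})? Published by Elsevier(?: Ltd| B\.V\.)?\.?",
    r"Published by Elsevier(?: Ltd| B\.V\.)?\.?",
    r"All rights reserved\.?",
]

def strip_notices(abstract: str) -> str:
    """Remove the listed notice variants and collapse leftover whitespace."""
    for pattern in NOTICE_PATTERNS:
        abstract = re.sub(pattern, " ", abstract, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", abstract).strip()
```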
It is noteworthy that, in both versions of the LSC, the number of subject categories is 252. All categories and the number of documents assigned to each category are presented in Table B.1; the same information for research areas is provided in Table C.1. The distribution of the length of abstracts is displayed in Figure 3.7. There is no noticeable difference between the distributions for the two versions, and the average length of texts is 176 words.

The latest version of the Leicester Scientific Dictionary (LScD) was developed by extracting words from the new version of the LSC [77]. The procedure applied to process the LSC in the creation of the LScD was the same as described in [23]. The new version of the LScD contains 972,060 unique words, each with the number of texts the word appears in. A new version of the core list, the LScDC, was created from the LScD by removing words appearing in fewer than 10 texts of the LSC [75]. All steps applied were the same as for the previous version of the LScDC and can be found in [23].

Having cleaned the copyright notices, we expect that words such as 'Elsevier', 'Reserved', 'Ltd', 'Right' and 'Springer' will not appear in the LSC as frequently as they did before. Indeed, the numbers of appearances of these words decreased after cleaning (see Table 3.4). The results indicate that some words, for instance 'Right' and 'Reserve', are still relatively frequent in the corpus. This is because these words are specific to some categories. To give an insight, we compared the top categories for the three words 'Elsevi', 'Reserv' and 'Right' (see Figure 3.8). The results for the word 'Right' indicate that it is frequently used in medicine-related categories such as 'Neuroscience' and 'Surgery', and in social science categories such as 'Law' and 'Political Science'. This is an expected result, as the word can appear to specify the side of organs in medicine, as in 'right hippocampus' or 'right hemisphere', and the normative rules in disciplines such as law and ethics. 'Elsevier' and 'Reserv' are much more uniformly distributed across the categories when the rank of categories is taken into account. For 'Reserv', one can identify categories related to the biosciences, such as 'Ecology', 'Zoology' and 'Environmental Studies'; specifically, this word occurs in the phrase 'nature reserves'.

Recall that a representation technique for words was introduced in Section 2. Vectors of frequencies in subject categories were obtained for each word; the frequency associated with a category was computed by counting the texts in this category that contain the word. Subject categories are used to categorise papers in the WoS collection; however, documents do not necessarily belong to a unique category, due to interdisciplinary studies. In other words, categories are not exclusive in WoS, and hence in the LSC. In the LSC, texts belong to at least one and at most six categories out of a total of 252 subject categories (see Figure 3.9). We note that we count the number of texts of a category in which a word appears, rather than analysing the exclusivity of categories; at this stage, we just looked at the frequency of texts containing these words in categories. The vectors of frequencies in categories were built for the 103,998 LScDC words and 252 subject categories.
Each row represents a word of the LScDC in a 252-dimensional space; that is, each word is represented by a vector of frequencies in 252 categories. For each category, a frequency distribution can be obtained over the set of words. The distribution indicates the words used in texts of the category, and the most frequently used words can be identified in each category. To illustrate this, the 10 most frequent words for the categories 'Astronomy & Astrophysics', 'Mathematics' and 'Asian Studies' are displayed with their frequencies in Table 3.5. A table containing all words and categories is included in [76]. One cannot expect all words in the table to indicate a topic of the related subject. For example, the words 'use', 'also', 'studi' and 'paper' are frequent words in the LScDC and therefore in the categories. These non-topic-specific words occur many times in abstracts without indicating subject specificity. Therefore, the frequencies of words in categories may not reflect how specific a word is to a category.

On the basis of this exploratory work with the frequency table, we concluded that the frequencies of words in categories do not provide much information about the categories. Specifically, 'use' is not a topic-specific word: it appears in all 252 categories and is likely to be used in almost all texts. This means that the meaning of a word in a text cannot be directly extracted from its frequency. Given this result, we apply a different perspective to measure the importance of words for categories, with special attention to the hypothesis that each word in the LScDC has a science-specific meaning in categories and that this meaning can be extracted from the information the word provides about the 252 subject categories of the LSC. Thus, as described in Section 2, words were represented in the 252-dimensional Meaning Space. RIGs for each word in the 252 categories were calculated, and the vectors of words were formed. We then collected these vectors in the Word-Category RIG Matrix. For each word in the Word-Category RIG Matrix, the sum $S_j$ and the maximum $M_j$ of RIGs over categories were calculated and appended to the matrix. The Word-Category RIG Matrix can be found in [76]. One can extract the most informative $n$ words for scientific texts by sorting the words by their $S_j$ or $M_j$.

The experimental results presented in this section were obtained using abstracts of academic research papers in the LSC [22]. We used words from the core dictionary LScDC [75]. Having calculated RIGs for each word and created the Word-Category RIG Matrix, we evaluate the representation model by checking the words in each category; that is, we consider the list of words with their RIGs in the corresponding category. Words with larger RIG are more informative in the category, and 'more informative' here can be interpreted as 'more specific' to the category's topic. For each category, words are sorted by their RIGs, and the top 100 words are shown in word clouds. The bigger the font size of a word in the cloud, the more informative it is. Word clouds of the 100 most informative words and histograms of RIGs for the 10 most informative words for each of the 252 categories can be found in [81]. The 100 most informative words with their RIGs for each category are presented in Appendix E and [81].
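Per-category rankings of this kind are obtained by sorting a column of the Word-Category RIG Matrix. A minimal sketch (our names; `R` and `words` as in the earlier sketches):

```python
import numpy as np

def top_words(R: np.ndarray, words: list, k: int, n: int = 100):
    """The n most informative words of category k (a column of the
    Word-Category RIG Matrix), sorted by decreasing RIG."""
    order = np.argsort(R[:, k])[::-1][:n]
    return [(words[j], R[j, k]) for j in order]
```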
In general, the RIG-based method proves to be more sensitive than the frequency-based method in identifying the topic-specific words of a category. Representing words in the Meaning Space thus has the advantage of transforming words into efficient vectors of considerably lower dimension than standard word representation schemes. To illustrate this result, we chose the categories 'Biochemistry & Molecular Biology', 'Economics' and 'Mathematics' and compared the word clouds formed by using raw frequencies and RIGs in categories (see Figures 3.10, 3.11 and 3.12). It can be seen from the figures that the majority of the most frequent words in all three categories are frequent words in the entire corpus. These words are not topic-specific for the categories, as they appear in almost all abstracts. The frequent but non-informative words can be considered as generalised service words of science and deserve special analysis. This shows that raw frequency is of little use in identifying the science-specific meanings of words; by representing words as vectors of RIGs, we can avoid such frequency bias. The most informative words under the RIG representation are topic-related in the corresponding category. We interpret these results as evidence for the usefulness of the RIG-based representation.

Words that are expected to be used together have very close values of RIGs. In 'Health Care Sciences & Services', 'health' and 'care' are top words, and the RIGs for these words are very close (see Figure D.1). Another example is 'xrd' and 'diffract' in 'Materials Science, Ceramics': 'XRD' is an abbreviation of 'X-ray diffraction', and the two appear together as 'X-ray diffraction (XRD)' in most cases in the category (see Figure D.2). We can also extract some stylistic properties of the texts of categories; for instance, in computer science-related categories the word 'paper' has the highest RIGs.

A casual observation indicates that, while the most informative words in some categories have similar RIGs, the differences in RIGs of the most informative words are much more noticeable in other categories. To give an insight, we present the categories 'Chemistry, Medicinal' and 'Engineering, Chemical' in Figure 3.13. In 'Chemistry, Medicinal', the word 'compound' can easily be separated from the other words, while in 'Engineering, Chemical' there is only a slight decrease in RIGs across the 10 most informative words. In general, however, we did not observe any explicit rule for this property.

Finally, we formed two lists of words arranged in descending order by the sum and by the maximum of their RIGs in the 252 categories. The top 100 words of the two lists are displayed as word clouds in Figure 3.14 and Figure 3.15; the histograms in the figures show the 10 most informative words in each list. We found that the 10 most informative words in the two lists are completely different, as shown in the figures. From the word clouds, one can see that the majority of the first 100 words do not match. We then compared the two lists by counting the number of matches in the top $n$ words, where $n$ ranges from 100 to 50,000. The numbers of matched words for different $n$ are presented in Table 3.6. As can be seen, 18% of words match for the top 50 most informative words. This proportion increases to approximately 50% for the top 1,000 words and to 58% for the top 2,000 words. The intersection of the lists reaches approximately 99% for the top 50,000 words. From these results, one can conclude that the two lists differ in their top words. When more words are taken into account, the lists become more similar in terms of the words included; however, the ranks of the words remain different. Either of these criteria for selecting the most informative words can be used, depending on the task and the information required.
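Counting the matches between the two rankings reduces to a set intersection. A minimal sketch, assuming `rank_S` and `rank_M` are the word lists sorted in descending order by $S_j$ and $M_j$:

```python
def match_count(rank_S, rank_M, n: int) -> int:
    """Number of words shared by the top-n segments of two ranked lists."""
    return len(set(rank_S[:n]) & set(rank_M[:n]))

# Example: match fractions for several n, as in Table 3.6.
# for n in (100, 1000, 2000):
#     print(n, match_count(rank_S, rank_M, n) / n)
```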
The numbers $S_j$ and $M_j$ are distributed differently over words. We observed from the lists that many words have low $S_j$ and $M_j$. Figure 3.16 and Figure 3.17 show the distributions of $S_j$ and $M_j$ for words in the logarithmic scale. Super-exponential peaks near zero are noticeable for both criteria; beyond the peaks, the trend decreases almost linearly. The 10 least informative words in the two lists are presented in Table 3.7. One may consider words having almost zero $S_j$ or $M_j$ as less meaningful words for scientific texts.

In this section, we introduce a scientific thesaurus of English: the Leicester Scientific Thesaurus (LScT). The LScT is a list of 5,000 words created by ranking the words of the LScDC by their informativeness in the scientific corpus. The procedure for the creation of the thesaurus is described in detail below. Under the assumption that not all words with very low RIGs are informative in categories, we search for a cut-off point for RIG to create a list of words that can be considered relatively meaningful in scientific texts. In other words, we extract meaningful words for science from the LScDC to build a scientific thesaurus.

Before describing how the number of words in the thesaurus was determined, we recall the notion of 'informativeness' and investigate further the criteria $S_j$ and $M_j$ for ranking the words of the LScDC by their informativeness. Examining the top 100 words of the two lists, ordered in descending order by $S_j$ and by $M_j$, we see that the maximum criterion is more likely to single out words that are frequently used in specific categories, such as 'Dance', 'Music', 'Soil Science' and 'Theatre' (see Table 4.1), and are relatively rarely used outside them. Indeed, we expect drastic differences in the RIGs of such words for these categories. For instance, one of the most informative words, 'danc', is used in 154 categories, but the RIG from this word to the category 'Dance' is clearly distinguishable from all the others (see Table 4.2). This is an expected result, since the word 'danc' is likely to be informative for categories related to the performing arts.

To compare the meaningfulness of words across all categories, we tested two norms in the Meaning Space: $\ell_1$ ($S_j$, the sum of RIGs) and $\ell_\infty$ ($M_j$, the maximal RIG). After a series of trials, we decided to use $\ell_1$. This choice cannot be proven formally, but ordering words by $M_j$ leads to some words that are very specific to only one category standing out in the list of the words most informative on average. The sum can be considered a more appropriate measure for a general scientific thesaurus. When creating the LScT, we therefore order the LScDC words by the sum of their RIGs in categories; the meaningfulness of words is evaluated by the average informativeness of a word over the categories. Given the dictionary LScDC, the procedure to create the LScT is:

• Sort the words of the LScDC by their $S_j$ in descending order.

• Take the top 5,000 words.
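A sketch of this selection rule (our names; `R` and `words` as in the earlier sketches):

```python
import numpy as np

def build_lsct(R: np.ndarray, words: list, size: int = 5000) -> list:
    """LScT selection: the `size` words with the largest sum of RIGs."""
    S = R.sum(axis=1)                      # l1 criterion S_j, as in (9)
    return [words[j] for j in np.argsort(S)[::-1][:size]]
```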
To find the number of words to be contained in the thesaurus, we initially followed an empirical procedure: (1) given the list of words arranged in descending order by $S_j$, take a sub-list of the top $m$ words, denoted by $T_m$; (2) create the histogram of $S_j$ for the words in this sub-list; (3) check the trend in the histogram; (4) choose $m$ for which the exponential peak is avoided and the histogram follows a roughly linear trend.

We began by investigating the top 50,000 words of the arranged list, as this is almost half of the 103,998 words of the LScDC. As the trend in the histogram for 50,000 words showed the same behaviour as the histogram for all 103,998 words (see Figure 3.16 and Figure 4.1(a)), there was no point in checking a number between 50,000 and 103,998. We then decreased the number $m$ to 10,000, 5,000, 2,000, 1,500 and finally 1,000. All histograms are presented in Figure 4.1. We see a substantial change in the trend of the histogram when we take the subset of 5,000 words: the trend at that point is almost linear. After that, the first bin of the histogram gradually becomes smaller and finally disappears for 1,000 words. In this step, we also checked the minimum of the sum of RIGs in the lists $T_m$ to make sure that the minimal average informativeness in the list to be selected is not too close to zero. These values are displayed in Table 4.3. We can see from the table that the minimal sum of RIGs decreases by less than half from 1,000 to 2,000 words, while it decreases faster (more than halving) from 5,000 to 10,000 words.

Finally, to support our selection of the number of words for the LScT and to evaluate the result, we considered the following heuristic suggestion: the majority of words in the LScT should appear in the lists of the most informative words of the categories. This does not mean that all informative words in categories should appear in the LScT, but we expect that most of the top $n$ words of the categories will be included in the LScT.

We consider the matches of the list $T_m$ with the most informative words in categories, defined by the sum of RIGs. For the collection $C_{k,n}$ of the $n$ most informative words in the category $k$, we define $X_n = \bigcup_{k=1}^{K} C_{k,n}$. We then test the coverage of the list $T_m$ by $X_n$. For each category in the Word-Category RIG Matrix (a column in Table 2.4), the words are ordered in descending order by their RIGs. This gives a list of words sorted from the most informative to the least informative for this specific category. Then, the individual collections $C_{k,n}$ are formed for each category. The sets $C_{k,n}$ were formed with different numbers of words $n$: we built the collections containing the 100, 200, 300, 400 and 500 most informative words of each category, and $X_n$ was created by uniting them for each $n$. The numbers of words and the minimal RIGs of words in $X_n$ are presented in Table 4.4; the minimal RIGs are checked to avoid zero or near-zero RIGs in the lists. One can see from the table that the word lists of the categories are far from disjoint. For instance, if all the $C_{k,100}$ did not intersect, the list $X_{100}$ would contain 25,200 words, but there are just 6,254 words in this union, almost four times fewer. For the other $n$ the result is similar, and the sizes of $X_n$ follow an almost linear trend (see Figure 4.2). That is, the collections $C_{k,n}$ intersect; the intersections may be pairwise or $q$-wise for different $q$. The coverage is calculated by counting the number of matching words between the list $T_m$ and $X_n$.
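The coverage computation can be sketched as follows (our names; `R` and `words` as in the earlier sketches):

```python
import numpy as np

def coverage(R: np.ndarray, words: list, m: int, n: int) -> int:
    """|T_m ∩ X_n|: how many of the m globally top words (by the sum of
    RIGs) appear among the n most informative words of some category."""
    S = R.sum(axis=1)
    T_m = {words[j] for j in np.argsort(S)[::-1][:m]}
    X_n = set()
    for k in range(R.shape[1]):                  # union over all categories
        X_n.update(words[j] for j in np.argsort(R[:, k])[::-1][:n])
    return len(T_m & X_n)
```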
Table 4.5 illustrates the numbers of matches when $n$ is 100, 200 and 500. Up to the top 2,000 words, the lists $T_m$ and $X_n$ are concordant, suggesting that the most informative words are highly consistent. In fact, the lists are in agreement in the case where 500 words in each category are considered informative for the categories. Given the list $X_{100}$, the majority of the words of $T_{5,000}$ (3,992 words) are covered by $X_{100}$. This trend changes and goes down when we consider the percentage of words found for 10,000 and 50,000 words. However, at this stage we have to consider the total number of words in $X_n$: for instance, $X_{100}$ contains 6,254 words, so the matches for the list $T_{10,000}$ cannot exceed this number. Similar conclusions were obtained by comparing the numbers of matches for 5,000, 10,000 and 50,000 words for $n = 200$.

We examined various heuristic criteria to evaluate how many words are suitable for inclusion in a thesaurus. Since we want to keep the size of the thesaurus reasonable, while taking care not to lose informative words whose RIGs are not very high, we decided to include these 5,000 words ($T_{5,000}$) in the scientific thesaurus. This thesaurus is called the Leicester Scientific Thesaurus (LScT). It is published online [76].

In this work, we have studied the first stage of 'quantifying of meaning' for scientific texts: constructing the space of meaning. We have introduced the Meaning Space for scientific texts based on computational analysis of the situations of words' use. The situation of use of a word is described by the absence or presence of the word in texts across scientific subject categories. The meaning of a text is hidden in the situations of usage and should be extracted by evaluating the situation related to the text as a whole. This research is based on 1,673,350 texts from the LSC and the 103,998 words listed in the LScDC [22, 75]. A text in the LSC belongs to at least one and at most six of the 252 Web of Science categories presented in Table B.1; that is, categories can intersect. The situation of use is described by these 252 binary attributes of the text. These attributes have the form: a text belongs (or does not belong) to a category. The meaning of a word is determined by categorising the texts that contain the word and the texts that do not. It is represented by the 252-dimensional vector of RIGs about the categories that a text belongs to, which can be obtained from observing the word in the text. This representation is demonstrated in Table 2.4. Each text in the LSC can be considered as a cloud of these RIG vectors.

We began by representing each word as a vector of frequencies in categories (Table 2.1); the components of this vector are the numbers of texts containing the word and belonging to the corresponding categories. We then moved on to representing the meaning of a word as a vector of RIGs about the categories. We consider the corpus (LSC) as a probabilistic sample space (the space of equally probable elementary outcomes). For each category $c_k$, a function is defined on texts that takes the value 1 if the text belongs to the category $c_k$, and 0 otherwise. Similarly, for each word $w_j$, a function is defined on texts that takes the value 1 if the word $w_j$ belongs to the text, and 0 otherwise. Both functions can be considered as random Boolean variables. The information gain $IG(c_k, w_j)$ about the category $c_k$ from the word $w_j$ is calculated by (3), (4) and (5).
$IG(c_k, w_j)$ measures the amount of information, extracted from observing the word $w_j$ in a text, about whether this text belongs to the category $c_k$. The RIG, $RIG(c_k, w_j)$, is calculated by (8) and provides a normalised measure of information gain, enabling the comparison of information gains across different categories. The vector of RIGs for a word $w_j$ is denoted by $\overrightarrow{RIG}_j$. The vectors $\overrightarrow{RIG}_j$ for all words are presented in the Word-Category RIG Matrix (see the structure in Table 2.4; available online [76]). A column vector of the matrix contains the RIGs of all words in an individual category, and a row vector represents the corresponding word's meaning as a vector of RIGs for the categories. The Meaning Space has been described as a 252-dimensional vector space whose vectors are the $\overrightarrow{RIG}_j$.

Beyond the representation of words, the Word-Category RIG Matrix can also be used for ordering the words in a category from the most informative to the least informative, as well as for identifying the most informative words in science for different subjects and their combinations. The ranking of words in a scientific corpus is performed on the basis of two criteria: the sum of RIGs ($S_j$) and the maximum of RIGs ($M_j$) in a row vector, calculated by (9) and (10). Given an ordered list of words, the top $n$ words are considered as the $n$ most informative words in the scientific corpus.

The LSC and LScDC were created and are available online [23, 24, 25]. The proposed word representation technique was first applied to this version of the corpus. The evaluation of the model is based on checking the most informative words in each category. Word clouds were generated using the lists of words for each category (see, for example, Figure 3.1 and Figure 3.2); the higher the RIG of a word, the bigger its font size in the cloud. The clouds demonstrated that our methodology is able to identify topic-specific words for categories, and most of the top words are related to the category subjects. We note, however, that some words that were not expected to appear among the most informative words were prominent for some categories (Figure 3.2). We concluded that words occurring in copyright notices, permission policies and the names of journals and organisations had been added at the footers of abstracts in the WoS database (see Table 3.1, Table 3.2 and Table A.1). Such additions result in anomalies in the word clouds, and our representation technique was able to detect them. A further cleaning of the identified phrases, sentences and paragraphs was performed to avoid possible abnormal appearances of words in the lists. This was done by sampling texts based on keyword search and then deleting the identified fragments from the texts. After the cleaning procedure, new versions of the LSC and the LScDC were created by the same pre-processing steps as for the previous versions; they can be found in [22, 75].

The words of the LScDC were represented by vectors of RIGs in the 252-dimensional Meaning Space as described before. The Word-Category RIG Matrix for the LSC was formed for the collection of all words of the LScDC [76]. The sum $S_j$ and the maximum $M_j$ of RIGs over categories were calculated and appended to the matrix. Word clouds with the top 100 words and histograms of the 10 most informative words for each category are presented in [81]. The 100 most informative words for each category, with their RIGs, can be found in Appendix E and [81].
The proposed model of RIG-based word representation was analysed through these top-ranked words in each category. We evaluated the Meaning Space by comparing our approach to the traditional frequency-based model: words in each category were also ranked and ordered by their raw frequencies in the categories. We showed that raw frequencies are of little use for representing the science-specific meanings of words, as the most frequent words are non-topic-related words such as 'use', 'studi' and 'result'. Figures 3.10-3.12 compare the two approaches using word clouds for three categories. The word clouds demonstrated that the information gain-based method is capable of making topic-specific words stand out, whereas frequency is of little help in identifying such words. By representing words in the Meaning Space, we have shown by human inspection that the top words in each category are related to the topic of the corresponding category. This can be viewed as evidence of the usefulness of the Meaning Space and of representing words in this space.

$S_j$ and $M_j$ were calculated for the LScDC words, and two lists of words were created, in descending order of $S_j$ and of $M_j$. The lists enable the selection of the most important $n$ words in science. We compared these lists: the numbers of matches in the top $n$ words of the two lists were counted, where $n$ ranges from 100 to 50,000 (Table 3.6). The top 10 words of the two lists are completely different, and only 28% of words match in the first 100 words. The proportion is approximately 50% for the first 1,000 words and 58% for the first 2,000 words. We conclude that the two lists are not the same for the top words (Figure 3.14 and Figure 3.15); however, either criterion can be used for the selection of the top $n$ words, depending on the task and the information required. Many words in the lists have low $S_j$ and $M_j$ values. The plots of the numbers of words against $S_j$ and $M_j$ indicate super-exponential peaks near zero $S_j$ and $M_j$ (see Figure 3.16 and Figure 3.17); beyond the peaks, the trend decreases almost linearly. Words with near-zero values can be considered as less meaningful words for scientific texts.

Finally, a scientific thesaurus of English, named the Leicester Scientific Thesaurus (LScT), has been introduced. The thesaurus contains the 5,000 most informative words from the LScDC. The words of the LScT were selected by their average RIGs over categories: the top 5,000 words of the LScDC, arranged by their $S_j$, are considered the 5,000 most meaningful words in scientific texts. The full list of words of the LScT with their $S_j$ can be found in [76].

The next focus of the research on 'quantifying of meaning' will be the extraction of the meaning of texts in the scientific corpus from the clouds of words in the Meaning Space, and the study of more complex models that use the co-occurrence of words and the combination of word meanings. Thus, we follow the road: Corpus of texts + categories → Meaning Space for words → Geometric representation of the meaning of texts. The first two technical steps are done: the corpus of texts was collected and cleaned, and the meaning of words was represented and analysed in the Meaning Space. The next step will be the analysis of the meaning of texts. The analysis of dictionaries is not finalised yet. This work focused on the most informative words; they are the main scientific content words.
But, for example, the frequent but non-informative words (like 'use') can be considered as generalised service words of Science and deserve special analysis. It is also very desirable to extend the set of attributes used for the representation of the situation behind the text (Figures 1.2, 1.3). The first choice, the research subject categories, is simple and natural, but it may be useful to enrich this list of attributes.

Appendix D. Word Clouds and Histograms for Categories. Word clouds presenting the top 100 words ordered by their RIGs, and histograms of RIGs for the first 10 words in the word clouds, for the 252 categories in the LSC.
1.6 × 10 −2 71 renal 7 × 10 −3 22 object 1.6 × 10 −2 72 abbott 7 × 10 −3 23 healthi 1.6 × 10 −2 73 spectrometri 6.9 × 10 −3 24 paper 1.6 × 10 −2 74 determin 6.9 × 10 −3 25 carcinoma 1.6 × 10 −2 75 level 6.9 × 10 −3 26 marker 1.5 × 10 −2 76 protein 6.8 × 10 −3 27 detect 1.5 × 10 −2 77 benign 6.8 × 10 −3 28 context 1.5 × 10 −2 78 quantif 6.7 × 10 −3 29 specimen 1.4 × 10 −2 79 negat 6.7 × 10 −3 30 needl 1.4 × 10 −2 80 specif 6.6 × 10 −3 31 cell 1.3 × 10 −2 81 compar 6.6 × 10 −3 32 evalu 1.3 × 10 −2 82 prognost 6.5 × 10 −3 33 routin 1.3 × 10 −2 83 cholesterol 6.5 × 10 −3 34 signific 1.2 × 10 −2 84 kit 6.5 × 10 −3 35 pathologist 1.2 × 10 −2 85 hemoglobin 6.5 × 10 −3 36 stain 1.2 × 10 −2 86 simul 6.4 × 10 −3 37 neoplasm 1.2 × 10 −2 87 subject 6.4 × 10 −3 38 test 1.2 × 10 −2 88 tandem 6.4 × 10 −3 39 diagnos 1.2 × 10 −2 89 medicin 6.4 × 10 −3 40 preanalyt 1.2 × 10 −2 90 surfac 6.4 × 10 −3 41 chromatographi 1.2 × 10 −2 91 tissu 6.3 × 10 −3 42 malign 1.2 × 10 −2 92 antibodi 6.3 × 10 −3 43 pcr 1.1 × 10 −2 93 biobank 6.3 × 10 −3 44 imprecis 1.1 × 10 −2 94 rare 6.2 × 10 −3 45 immunohistochem 1.1 × 10 −2 95 assess 6.2 × 10 −3 46 correl 1.1 × 10 −2 96 valu 6.1 × 10 −3 47 elisa 1.1 × 10 −2 97 autom 6.1 × 10 −3 48 cytomorpholog 9.9 × 10 −3 98 diabet 5.9 × 10 −3 49 sensit 9.9 × 10 −3 99 studi 5.9 × 10 −3 50 biopsi 9.7 × 10 −3 100 interassay 5.9 × 10 −3 2 × 10 −2 77 day 1.1 × 10 −2 28 includ 2 × 10 −2 78 nation 1 × 10 −2 29 assess 1.9 × 10 −2 79 admit 1 × 10 −2 30 diagnos 1.9 × 10 −2 80 prevent 1 × 10 −2 31 studi 1.9 × 10 −2 81 odd 1 × 10 −2 32 symptom 1.8 × 10 −2 82 incid 1 × 10 −2 33 receiv 1.8 × 10 −2 83 death 1 × 10 −2 34 retrospect 1.7 × 10 −2 84 treat 1 × 10 −2 35 women 1.7 × 10 −2 85 advers 9.7 × 10 −3 36 propos 1.7 × 10 −2 86 case 9.6 × 10 −3 37 mortal 1.7 × 10 −2 87 medicin 9.5 × 10 −3 38 among 1.7 × 10 −2 88 section 9.5 × 10 −3 39 signific 1.7 × 10 −2 89 surgeri 9.3 × 10 −3 40 therapi 1.7 × 10 −2 90 prospect 9.3 × 10 −3 41 chronic 1.6 × 10 −2 91 common 9.3 × 10 −3 42 score 1.6 × 10 −2 92 complic 9.3 × 10 −3 43 diabet 1.6 × 10 −2 93 men 9.2 × 10 −3 44 preval 1.6 × 10 −2 94 hypertens 9.1 × 10 −3 45 pain 1.5 × 10 −2 95 interv 9.1 × 10 −3 46 trial 1.5 × 10 −2 96 baselin 9.1 × 10 −3 47 total 1.5 × 10 −2 97 older 9 × 10 −3 48 januari 1.4 × 10 −2 98 mean 9 × 10 −3 49 cohort 1.4 × 10 −2 99 demograph 9 × 10 −3 50 follow 1.4 × 10 −2 100 heart 8.8 × 10 −3 str 3.5 × 10 −2 56 report 7.9 × 10 −3 7 casework 3.4 × 10 −2 57 accident 7.9 × 10 −3 8 toxicolog 2.6 × 10 −2 58 sex 7.8 × 10 −3 9 victim 2.6 × 10 −2 59 profil 7.6 × 10 −3 10 legal 2.5 × 10 −2 60 cadav 7.5 × 10 −3 11 crime 2.3 × 10 −2 61 pmi 7.5 × 10 −3 12 sampl 2.2 × 10 −2 62 detect 7.4 × 10 −3 13 crimin 2 × 10 −2 63 anthropologist 7.4 × 10 −3 14 suicid 1.8 × 10 −2 64 medic 7.3 × 10 −3 15 homicid 1.8 × 10 −2 65 examin 7.3 × 10 −3 16 dna 1.8 × 10 −2 66 evid 7.2 × 10 −3 17 fatal 1.7 × 10 −2 67 toxic 7.1 × 10 −3 18 medico 1.6 × 10 −2 68 caus 6.8 × 10 −3 19 male 1.5 × 10 −2 69 noael 6.8 × 10 −3 20 suspect 1.5 × 10 −2 70 human 6.8 × 10 −3 21 identif 1.4 × 10 −2 71 function 6.8 × 10 −3 22 strs 1.3 × 10 −2 72 mortem 6.6 × 10 −3 23 injuri 1.3 × 10 −2 73 year 6.3 × 10 −3 24 substanc 1.3 × 10 −2 74 murder 6.2 × 10 −3 25 blood 1.2 × 10 −2 75 assault 6.2 × 10 −3 26 scene 1.2 × 10 −2 76 amelogenin 6.2 × 10 −3 27 abus 1.2 × 10 −2 77 hair 6.2 × 10 −3 28 polic 1.2 × 10 −2 78 medicin 6.1 × 10 −3 29 firearm 1.1 × 10 −2 79 pmct 5.9 × 10 −3 30 tandem 1.1 × 10 −2 80 exposur 5.9 × 10 −3 31 femal 1.1 × 10 −2 81 popul 5.9 × 10 −3 32 antemortem 1.1 × 10 −2 82 d12s391 5.9 × 10 −3 33 
anthropolog 1.1 × 10 −2 83 weapon 5.8 × 10 −3 34 pathologist 1.1 × 10 −2 84 autosom 5.8 × 10 −3 35 kit 1.1 × 10 −2 85 repeat 5.8 × 10 −3 36 corps 9.9 × 10 −3 86 seiz 5.8 × 10 −3 37 medicoleg 9.8 × 10 −3 87 structur 5.8 × 10 −3 38 skelet 9.8 × 10 −3 88 skull 5.7 × 10 −3 39 court 9.5 × 10 −3 89 trauma 5.7 × 10 −3 40 intox 9.4 × 10 −3 90 ancestri 5.6 × 10 −3 41 drug 9.3 × 10 −3 91 properti 5.6 × 10 −3 42 allel 9.3 × 10 −3 92 age 5.6 × 10 −3 43 powerplex 9.2 × 10 −3 93 multiplex 5.5 × 10 −3 44 discrimin 9 × 10 −3 94 spectrometri 5.5 × 10 −3 45 sudden 8.8 × 10 −3 95 lethal 5.5 × 10 −3 46 deceas 8.6 × 10 −3 96 poison 5.5 × 10 −3 47 fingermark 8. climat 9.5 × 10 −2 52 land 2 × 10 −2 3 aerosol 6.7 × 10 −2 53 vertic 2 × 10 −2 4 season 5.8 × 10 −2 54 atlant 2 × 10 −2 5 precipit 5.4 × 10 −2 55 patient 2 × 10 −2 6 meteorolog 5.2 × 10 −2 56 scale 2 × 10 −2 7 wind 5.1 × 10 −2 57 resolut 2 × 10 −2 8 region 5 × 10 −2 58 ensembl 2 × 10 −2 9 ocean 5 × 10 −2 59 microphys 1.9 × 10 −2 10 weather 5 × 10 −2 60 wrf 1.8 × 10 −2 11 summer 4.9 × 10 −2 61 radiat 1.8 × 10 −2 12 air 4.6 × 10 −2 62 intercomparison 1.8 × 10 −2 13 tropospher 4.6 × 10 −2 63 sst 1.8 × 10 −2 14 warm 4.6 × 10 −2 64 pollut 1.7 × 10 −2 15 tropic 4.4 × 10 −2 65 event 1.7 × 10 −2 16 sea 4.4 × 10 −2 66 synopt 1.7 × 10 −2 17 forecast 4.2 × 10 −2 67 zonal 1.7 × 10 −2 18 winter 3.9 × 10 −2 68 altitud 1.7 × 10 −2 19 satellit 3.7 × 10 −2 69 anthropogen 1.7 × 10 −2 20 rainfal 3.7 × 10 −2 70 averag 1.7 × 10 −2 21 cloud 3.6 × 10 −2 71 estim 1.7 × 10 −2 22 observ 3.5 × 10 −2 72 rain 1.6 × 10 −2 23 convect 3.4 × 10 −2 73 midlatitud 1.6 × 10 −2 24 climatolog 3.3 × 10 −2 74 eastern 1.6 × 10 −2 25 global 3.1 × 10 −2 75 impact 1.6 × 10 −2 26 north 3.1 × 10 −2 76 diurnal 1.6 × 10 −2 27 reanalysi 3 × 10 −2 77 east 1.6 × 10 −2 28 circul 2.9 × 10 −2 78 simul 1.6 × 10 −2 29 model 2.9 × 10 −2 79 water 1.6 × 10 −2 30 cyclon 2.9 × 10 −2 80 data 1.6 × 10 −2 31 station 2.9 × 10 −2 81 conclus 1.6 × 10 −2 32 pacif 2.7 × 10 −2 82 variat 1.6 × 10 −2 33 period 2.6 × 10 −2 83 humid 1.6 × 10 −2 34 temperatur 2.6 × 10 −2 84 south 1.5 × 10 −2 35 southern 2.5 × 10 −2 85 nino 1.5 × 10 −2 36 variabl 2.5 × 10 −2 86 enso 1.5 × 10 −2 37 monsoon 2.5 × 10 −2 87 mesoscal 1. 
8.8 × 10 −3 77 paper 3.9 × 10 −3 28 confoc 8.7 × 10 −3 78 flim 3.9 × 10 −3 29 tomographi 8.7 × 10 −3 79 nanoscal 3.9 × 10 −3 30 laser 8.6 × 10 −3 80 organell 3.9 × 10 −3 31 tissu 8.5 × 10 −3 81 model 3.9 × 10 −3 32 scatter 8.4 × 10 −3 82 correct 3.9 × 10 −3 33 surfac 8.4 × 10 −3 83 focal 3.8 × 10 −3 34 holographi 7.6 × 10 −3 84 wavefront 3.8 × 10 −3 35 detector 7.5 × 10 −3 85 hologram 3.8 × 10 −3 36 structur 7.5 × 10 −3 86 allow 3.8 × 10 −3 37 section 7.3 × 10 −3 87 vacuol 3.8 × 10 −3 38 eel 7.2 × 10 −3 88 immunoreact 3.7 × 10 −3 39 stem 6.8 × 10 −3 89 tomograph 3.7 × 10 −3 40 thin 6.5 × 10 −3 90 year 3.7 × 10 −3 41 cytoplasm 6.3 × 10 −3 91 dentin 3.7 × 10 −3 42 fib 6.2 × 10 −3 92 subcellular 3.6 × 10 −3 43 contrast 6.2 × 10 −3 93 risk 3.6 × 10 −3 44 dark 6.2 × 10 −3 94 manag 3.6 × 10 −3 45 conclus 6.1 × 10 −3 95 defocus 3.6 × 10 −3 46 stain 6 × 10 −3 96 holder 3.6 × 10 −3 47 patient 5.9 × 10 −3 97 particip 3.6 × 10 −3 48 multiphoton 5.8 × 10 −3 98 axi 3.5 × 10 −3 49 illumin 5.6 × 10 −3 99 dimension 3.5 × 10 −3 50 monochrom 5.6 × 10 −3 100 microstructur 3.4 × 10 −3 .4 × 10 −2 51 recoveri 6 × 10 −3 2 coal 7.2 × 10 −2 52 background 6 × 10 −3 3 ore 6.2 × 10 −2 53 porphyri 5.9 × 10 −3 4 rock 5.5 × 10 −2 54 orebodi 5.9 × 10 −3 5 miner 4.4 × 10 −2 55 depth 5.9 × 10 −3 6 flotat 3.8 × 10 −2 56 sandston 5.8 × 10 −3 7 geolog 2.8 × 10 −2 57 pit 5.7 × 10 −3 8 underground 2.4 × 10 −2 week 2 × 10 −2 85 malnutrit 1.1 × 10 −2 36 studi 2 × 10 −2 86 cardiovascular 1.1 × 10 −2 37 insulin 2 × 10 −2 87 plasma 1 × 10 −2 38 assess 2 × 10 −2 88 content 1 × 10 −2 39 fruit 1.9 × 10 −2 89 lipoprotein 1 × 10 −2 40 antioxid 1.9 × 10 −2 90 trial 1 × 10 −2 41 adipos 1.8 × 10 −2 91 polyunsatur 1 × 10 −2 42 risk 1.8 × 10 −2 92 calori 9.9 × 10 −3 43 subject 1.7 × 10 −2 93 concentr 9.9 × 10 −3 44 serum 1.7 × 10 −2 94 breakfast 9.9 × 10 −3 45 group 1.7 × 10 −2 95 hdl 9.6 × 10 −3 46 intervent 1.7 × 10 −2 96 ffq 9.6 × 10 −3 47 background 1.7 × 10 −2 97 infant 9.4 × 10 −3 48 fed 1.6 × 10 −2 98 effect 9.2 × 10 −3 49 anthropometr 1.6 × 10 −2 99 decreas 8.9 × 10 −3 50 mass 1.6 × 10 −2 100 daili 8.9 × 10 −3 1.5 × 10 −1 51 activ 9.7 × 10 −3 2 laser 7.6 × 10 −2 52 risk 9.7 × 10 −3 3 wavelength 5.4 × 10 −2 53 band 9.5 × 10 −3 4 photon 4.2 × 10 −2 54 telescop 9.4 × 10 −3 5 fiber 3.8 × 10 −2 55 assess 9.3 × 10 −3 6 light 2.9 × 10 −2 56 silicon 9.2 × 10 −3 7 beam 2.8 × 10 −2 57 aim 9.1 × 10 −3 8 conclus 2.5 × 10 −2 nestl 7 × 10 −2 56 owl 1.9 × 10 −2 7 season 5.8 × 10 −2 57 songbird 1.9 × 10 −2 8 popul 5.6 × 10 −2 58 bill 1.9 × 10 −2 9 brood 5.4 × 10 −2 59 tit 1.8 × 10 −2 10 avian 5.2 × 10 −2 60 adult 1.8 × 10 −2 11 forag 5.2 × 10 −2 61 record 1.8 × 10 −2 12 chick 4.3 × 10 −2 62 juvenil 1.8 × 10 −2 13 migratori 4.2 × 10 −2 63 eastern 1.7 × 10 −2 14 winter 3.9 × 10 −2 64 paper 1.7 × 10 −2 15 conserv 3.9 × 10 −2 65 tern 1.7 × 10 −2 16 clutch 3.8 × 10 −2 66 landscap 1.7 × 10 −2 17 prey 3.6 × 10 −2 67 southern 1.7 × 10 −2 18 reproduct 3.6 × 10 −2 68 raptor 1.6 × 10 −2 19 plumag 3.6 × 10 −2 69 recaptur 1.6 × 10 −2 20 fledg 3.6 × 10 −2 70 endem 1.6 × 10 −2 21 egg 3.5 × 10 −2 71 eurasian 1.5 × 10 −2 22 site 3.4 × 10 −2 72 variat 1.5 × 10 −2 23 predat 3.2 × 10 −2 73 eagl 1.5 × 10 −2 24 passerin 3.2 × 10 −2 74 patient 1.5 × 10 −2 25 area 3.2 × 10 −2 75 mate 1.5 × 10 −2 26 territori 3.2 × 10 −2 76 sex 1.5 × 10 −2 27 warbler 3.2 × 10 −2 77 gull 1.5 × 10 −2 28 ecolog 3.2 × 10 −2 78 distanc 1.5 × 10 −2 29 abund 3. 
postop 7.9 × 10 −2 56 studi 2.7 × 10 −2 7 pain 7.8 × 10 −2 57 method 2.7 × 10 −2 8 hip 7.4 × 10 −2 58 later 2.6 × 10 −2 9 conclus 6.6 × 10 −2 59 distal 2.6 × 10 −2 10 surgic 6.6 × 10 −2 60 foot 2.5 × 10 −2 11 fractur 6.2 × 10 −2 61 group 2.5 × 10 −2 12 femor 6.2 × 10 −2 62 signific 2.5 × 10 −2 13 background 6.1 × 10 −2 63 treat 2.5 × 10 −2 14 score 5. 4.4 × 10 −2 77 acl 2 × 10 −2 28 underw 4.3 × 10 −2 78 motion 1.9 × 10 −2 29 tibial 4.2 × 10 −2 79 limb 1.9 × 10 −2 30 purpos 4.2 × 10 −2 80 loosen 1.9 × 10 −2 31 follow 4 × 10 −2 81 valgus 1.9 × 10 −2 32 posterior 4 × 10 −2 82 procedur 1.9 × 10 −2 33 month 4 × 10 −2 83 cadaver 1.8 × 10 −2 34 retrospect 4 × 10 −2 84 elbow 1.8 × 10 −2 35 medial 3.9 × 10 −2 85 reconstruct 1.8 × 10 −2 36 shoulder 3.8 × 10 −2 86 proxim 1.8 × 10 −2 37 complic 3.7 × 10 −2 87 tha 1.8 × 10 −2 38 mean 3.6 × 10 −2 88 heal 1.8 × 10 −2 39 spine 3.6 × 10 −2 89 patellar 1.6 × 10 −2 40 tka 3.6 × 10 −2 90 tibia 1.6 × 10 −2 41 tendon 3.5 × 10 −2 91 summari 1.6 × 10 −2 42 age 3.5 × 10 −2 92 rotat 1.6 × 10 −2 43 orthopaed 3.4 × 10 −2 93 disloc 1.6 × 10 −2 44 ankl 3.4 × 10 −2 94 scoliosi 1.6 × 10 −2 45 arthroscop 3.1 × 10 −2 95 consecut 1.6 × 10 −2 46 lumbar 3.1 × 10 −2 96 anatom 1.6 × 10 −2 47 revis 3.1 × 10 −2 97 tear 1.6 × 10 −2 48 cruciat 3 × 10 −2 98 disabl 1.6 × 10 −2 49 screw 2.9 × 10 −2 99 degen 1.6 × 10 −2 50 paper 2.9 × 10 −2 100 result 1.5 × 10 −2 cue 3.6 × 10 −2 54 cell 9.8 × 10 −3 5 behavior 3.5 × 10 −2 55 evid 9.7 × 10 −3 6 suggest 3 × 10 −2 56 train 9.7 × 10 −3 7 respons 2.9 × 10 −2 57 reinforc 9.6 × 10 −3 8 particip 2.8 × 10 −2 58 elicit 9.6 × 10 −3 9 cognit 2.6 × 10 −2 59 cortisol 9.5 × 10 −3 10 whether 2.3 × 10 −2 60 auditori 9.3 × 10 −3 11 erp 2.1 × 10 −2 61 avers 9.3 × 10 −3 particip 7.4 × 10 −2 52 cbt 1.8 × 10 −2 3 depress 6.8 × 10 −2 53 among 1.7 × 10 −2 4 symptom 6.7 × 10 −2 54 paper 1.7 × 10 −2 5 anxieti 5.7 × 10 −2 55 youth 1.7 × 10 −2 6 cognit 5.2 × 10 −2 56 trauma 1.6 × 10 −2 7 examin 4.9 × 10 −2 57 cell 1.6 × 10 −2 8 intervent 4.4 × 10 −2 58 relat parent 9.7 × 10 −2 53 relat 1.7 × 10 −2 4 age 8.7 × 10 −2 54 hyperact 1.6 × 10 −2 5 child 8.7 × 10 −2 55 anxieti 1.6 × 10 −2 6 autism 8.3 × 10 −2 56 relationship 1.6 × 10 −2 7 examin 5.6 × 10 −2 57 cell 1.6 × 10 −2 8 youth 5.6 × 10 −2 58 languag 1.5 × 10 −2 9 asd 5.4 × 10 −2 59 month 1.5 × 10 −2 10 social 5 × 10 −2 60 suggest 1.4 × 10 −2 11 year 4.9 × 10 −2 61 toddler unconsci 6 × 10 −2 55 idea 1.5 × 10 −2 6 countertransfer 5.9 × 10 −2 56 ferenczi 1.5 × 10 −2 7 psychoanalyst 5.8 × 10 −2 57 enact 1.5 × 10 −2 8 psychic 5.6 × 10 −2 polici 9.7 × 10 −2 53 focus 9.9 × 10 −3 4 articl 5.3 × 10 −2 54 make 9.8 × 10 −3 5 polit 4.7 × 10 −2 55 author 9.7 × 10 −3 6 reform 3.7 × 10 −2 56 issu 9.6 × 10 −3 7 sector 3.7 × 10 −2 57 government 9.6 × 10 −3 8 social 3.6 × 10 −2 propos 4.2 × 10 −2 54 treatment 9.6 × 10 −3 5 actuat 3.4 × 10 −2 55 protein 9.6 × 10 −3 6 humanoid 3.2 × 10 −2 56 wheel articl 5.5 × 10 −2 51 justic 9.8 × 10 −3 2 argu 5.3 × 10 −2 52 religion 9.6 × 10 −3 3 ethic 4.8 × 10 −2 53 issu 9.6 × 10 −3 4 social 4.2 × 10 −2 54 paramet 9.5 × 10 −3 5 polici 3.9 × 10 −2 55 oblig 9.4 × 10 −3 6 polit 2.9 × 10 −2 56 surfac 9.4 × 10 −3 7 moral 2.8 × 10 −2 57 individu 9.3 × 10 −3 8 welfar 2.6 × 10 −2 The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the sea 6.5 × 10 −2 53 popul 1.7 × 10 −2 4 marin 6 × 10 −2 54 gulf 1.7 × 10 −2 5 habitat 5.1 × 10 −2 55 diatom 1.6 × 10 −2 6 water 4.7 × 10 −2 56 pacif 1.6 × 10 −2 7 abund 4.6 × 10 −2 57 larval 1.6 × 10 −2 8 coastal 3.8 × 10 −2 58 reproduct 1.5 × 10 
−2 9 benthic 3.6 × 10 −2 59 mediterranean 1.5 × 10 −2 10 ecosystem 3.6 × 10 −2 60 chlorophyl 1.5 × 10 −2 11 river 3.2 × 10 −2 61 mussel 1.5 × 10 −2 12 atlant 3.2 × 10 −2 62 plankton 1.5 × 10 −2 13 fisheri 3. 2 × 10 −2 76 attach 1 × 10 −2 27 coat 2 × 10 −2 77 exhibit 1 × 10 −2 28 regener 2 × 10 −2 78 biolog 1 × 10 −2 29 bioactiv 2 × 10 −2 79 gelatin 1 × 10 −2 30 engin 2 × 10 −2 80 synthes 1 × 10 −2 31 load 1.9 × 10 −2 81 sem 1 × 10 −2 32 glycol 1.9 × 10 −2 82 dox 9.9 × 10 −3 33 chitosan 1.9 × 10 −2 83 fluoresc 9.7 × 10 −3 34 encapsul 1.8 × 10 −2 84 extracellular 9.6 × 10 −3 35 osteoblast 1.8 × 10 −2 85 titanium 9.5 × 10 −3 36 biodegrad 1.8 × 10 −2 86 gel 9.4 × 10 −3 37 polymer 1.7 × 10 −2 87 methacryl 9.3 × 10 −3 38 materi 1.7 × 10 −2 88 caprolacton 9 × 10 −3 39 osteogen 1.6 × 10 −2 89 doxorubicin 9 × 10 −3 40 phosphat 1.6 × 10 −2 90 spectroscopi 8.8 × 10 −3 41 viabil 1.6 × 10 −2 91 immobil 8.6 × 10 −3 42 promis 1.6 × 10 −2 92 assay 8.6 × 10 −3 43 fabric 1 2.1 × 10 −2 81 densiti 1 × 10 −2 32 densif 2 × 10 −2 82 hydroxyapatit 1 × 10 −2 33 spectroscopi 2 × 10 −2 83 diseas 1 × 10 −2 34 particl 2 × 10 −2 84 associ 1 × 10 −2 35 size 1.9 × 10 −2 85 character 1 × 10 −2 36 oxid 1.9 × 10 −2 86 paper 1 × 10 −2 37 conclus 1.9 × 10 −2 87 zrb2 9.9 × 10 −3 38 poros 1.8 × 10 −2 88 chemic 9.8 × 10 −3 39 crystallin 1.8 × 10 −2 89 exhibit 9.7 × 10 −3 40 calcin 1.8 × 10 −2 90 b2o3 9.7 × 10 −3 41 structur 1.8 × 10 −2 91 investig 9.7 × 10 −3 42 fabric 1.8 × 10 −2 92 level 9.6 × 10 −3 43 mpa 1.8 × 10 −2 93 mullit 9.6 × 10 −3 44 patient 1.7 × 10 −2 94 pure 9.6 × 10 −3 45 coat 1.6 × 10 −2 95 mol 9.6 × 10 −3 46 morpholog 1.6 × 10 −2 96 mechan 9.5 × 10 −3 47 sio2 1.6 × 10 −2 97 dispers 9.5 × 10 −3 48 zirconia 1.5 × 10 −2 98 vol 9.5 × 10 −3 49 precursor 1.5 × 10 −2 99 silic 9.5 × 10 −3 50 zro2 1.4 × 10 −2 100 dens 9.4 × 10 −3 207 3.5 × 10 −2 51 surfac 6.8 × 10 −3 2 crack 3 × 10 −2 52 activ 6.6 × 10 −3 3 materi 2.8 × 10 −2 53 uniaxi 6.6 × 10 −3 4 stress 2.3 × 10 −2 54 asphalt 6.4 × 10 −3 5 load 2.2 × 10 −2 55 diffract 6.1 × 10 −3 6 specimen 2.1 × 10 −2 56 damag 6.1 × 10 −3 7 tensil 2.1 × 10 −2 57 group 6 × 10 −3 8 strain 1.8 × 10 −2 58 ductil 6 × 10 −3 9 test 1.7 × 10 −2 59 displac 5.9 × 10 −3 10 weld 1.7 × 10 −2 60 popul 5.9 × 10 −3 11 strength 1.7 × 10 −2 61 plate 5.9 × 10 −3 12 deform 1.6 × 10 −2 62 harden 5.5 × 10 −3 13 patient 1.6 × 10 −2 63 cell 5.4 × 10 −3 14 microstructur 1 51 trial 9.7 × 10 −3 2 express 4 × 10 −2 52 stem 9.7 × 10 −3 3 patient 3.9 × 10 −2 53 background 9.5 × 10 −3 4 diseas 3.1 × 10 −2 54 mrna 9.4 × 10 −3 5 treatment 2.8 × 10 −2 55 temperatur 9.3 × 10 −3 6 clinic 2.7 × 10 −2 56 target 9.2 × 10 −3 7 paper 2.7 × 10 −2 57 mous 9.1 × 10 −3 8 protein 2.5 × 10 −2 58 upregul 9 × 10 −3 9 conclus 2.5 × 10 −2 59 week 8.9 × 10 −3 10 induc 2.3 × 10 −2 60 chronic 8.8 × 10 −3 11 mice 2.3 × 10 −2 61 mesenchym 8.8 × 10 −3 12 therapeut 2.2 × 10 −2 62 endotheli 8.7 × 10 −3 13 therapi 2.1 × 10 −2 63 drug 8.4 × 10 −3 14 studi 2.1 × 10 −2 64 downregul 8.3 × 10 −3 15 tissu 2 × 10 −2 65 marker 8.1 × 10 −3 16 tumor 2 × 10 −2 66 liver 7.9 × 10 −3 17 gene 1.9 × 10 −2 67 injuri 7.9 × 10 −3 18 signific 1.9 × 10 −2 68 inflamm 7.9 × 10 −3 19 immun 1.8 × 10 −2 69 bone 7.8 × 10 −3 20 cancer 1.8 × 10 −2 70 administr 7.8 × 10 −3 21 inhibit 1.8 × 10 −2 71 diabet 7.7 × 10 −3 22 human 1.7 × 10 −2 72 control 7.7 × 10 −3 23 vaccin 1.7 × 10 −2 73 inhibitor 7.7 × 10 −3 24 blood 1.7 × 10 −2 74 factor 7.7 × 10 −3 25 rat 1.5 × 10 −2 75 increas 7.7 × 10 −3 26 prolifer 1.5 × 10 −2 76 anim 7.6 × 10 −3 27 vitro 1.5 × 10 −2 77 simul 7.4 × 10 
−3 28 receptor 1.5 × 10 −2 78 energi 7.3 × 10 −3 29 associ 1.4 × 10 −2 79 progress 7.1 × 10 −3 30 treat 1.4 × 10 −2 80 vascular 7 × 10 −3 31 vivo 1.3 × 10 −2 81 polymeras 7 × 10 −3 32 level 1.3 × 10 −2 82 surviv 6.9 × 10 −3 33 aim 1.3 × 10 −2 83 interleukin 6.8 × 10 −3 34 may 1.3 × 10 −2 84 inject 6.8 × 10 −3 35 group 1.2 × 10 −2 85 stimul 6.7 × 10 −3 36 apoptosi 1.2 × 10 −2 86 infect 6.7 × 10 −3 37 blot 1.2 × 10 −2 87 normal 6.7 × 10 −3 38 serum 1.2 × 10 −2 88 kinas 6.7 × 10 −3 39 antibodi 1.2 × 10 −2 89 transplant 6.7 × 10 −3 40 dose 1.2 × 10 −2 90 beta 6.7 × 10 −3 41 efficaci 1.2 × 10 −2 91 carcinoma 6.6 × 10 −3 42 assay 1.2 × 10 −2 92 pcr 6.6 × 10 −3 43 activ 1.1 × 10 −2 93 day 6.5 × 10 −3 44 regul 1.1 × 10 −2 94 prevent 6.5 × 10 −3 45 propos 1 × 10 −2 95 cellular 6.4 × 10 −3 46 inflammatori 1 × 10 −2 96 marrow 6.4 × 10 −3 47 antigen 1 × 10 −2 97 follow 6.3 × 10 −3 48 pathway 1 × 10 −2 98 pathogenesi 6.3 × 10 −3 49 mediat 9.9 × 10 −3 99 effect 6.3 × 10 −3 50 cytokin 9.8 × 10 −3 100 stain 6.3 × 10 −3 224 2.1 × 10 −2 72 teach 1 × 10 −2 23 orchestra 2 × 10 −2 73 discours 1 × 10 −2 24 aesthet 2 × 10 −2 74 scholar 1 × 10 −2 25 improvis 1.9 × 10 −2 75 manuscript 1 × 10 −2 26 explor 1.9 × 10 −2 76 way 1 × 10 −2 27 vocal 1.8 × 10 −2 77 theme 9.9 × 10 −3 28 notat 1.8 × 10 −2 78 histor 9.8 × 10 −3 29 singer 1.7 × 10 −2 79 quartet 9.8 × 10 −3 30 audienc 1.7 × 10 −2 80 temperatur 9.7 × 10 −3 31 work 1.7 × 10 −2 81 musicologist 9.6 × 10 −3 32 student 1.6 × 10 −2 82 eighteenth 9.5 × 10 −3 33 musicolog 1.6 × 10 −2 83 narrat 9.5 × 10 −3 34 teacher 1.5 × 10 −2 84 context 9.4 × 10 −3 35 voic 1.5 × 10 −2 85 creat 9.4 × 10 −3 36 school 1.5 × 10 −2 86 increas 9.4 × 10 −3 37 tonal 1.5 × 10 −2 87 musicianship 9.4 × 10 −3 38 contemporari 1 surfac 3.9 × 10 −2 54 high 1 × 10 −2 5 film 2.9 × 10 −2 55 carbon 1 × 10 −2 6 devic 2.6 × 10 −2 56 background 1 × 10 −2 7 nanostructur 2.5 × 10 −2 57 tio2 9.8 × 10 −3 8 layer 2.2 × 10 −2 58 assess 9.8 × 10 −3 9 microscopi 2.2 × 10 −2 59 risk 9.8 × 10 −3 10 metal 2.2 × 10 −2 60 particl 9.6 × 10 −3 11 graphen 2.2 × 10 −2 61 selfassembl 9.6 × 10 −3 12 properti 8.2 × 10 −3 9 accid 3.1 × 10 −2 59 dosimet 8.2 × 10 −3 10 code 3 × 10 −2 60 radon 8.2 × 10 −3 11 mev 2.9 × 10 −2 61 co60 8. 1.8 × 10 −2 89 depress 9.9 × 10 −3 40 team 1.7 × 10 −2 90 compet 9.9 × 10 −3 41 manag 1.7 × 10 −2 91 understand 9.6 × 10 −3 42 perceiv 1.7 × 10 −2 92 distress 9.5 × 10 −3 43 mental 1.7 × 10 −2 93 paramet 9.5 × 10 −3 44 life 1.6 × 10 −2 94 focus 9.3 × 10 −3 45 unit 1.6 × 10 −2 95 servic 9.2 × 10 −3 46 studi 1.6 × 10 −2 96 infant 9.2 × 10 −3 47 experienc 1.5 × 10 −2 97 live 9.2 × 10 −3 48 famili 1.5 × 10 −2 98 psycholog 9.1 × 10 −3 49 midwiv 1.5 × 10 −2 99 assess 9.1 × 10 −3 50 find 1.5 × 10 −2 100 conduct 9 × 10 −3 women 2.1 × 10 −1 51 mother 2.1 × 10 −2 2 pregnanc 1.6 × 10 −1 52 reproduct 2.1 × 10 −2 3 conclus 1.3 × 10 −1 53 prenat 2 × 10 −2 4 gestat 1 × 10 −1 54 placenta 1.9 × 10 −2 5 object 8.9 × 10 −2 55 intervent 1.9 × 10 −2 6 matern 7.9 × 10 −2 56 woman 1.9 × 10 −2 7 birth 7.3 × 10 −2 57 method 1.9 × 10 −2 8 outcom 5.9 × 10 −2 58 ultrasound 1.9 × 10 −2 9 fetal 5.9 × 10 −2 59 care 1.9 × 10 −2 10 vagin 4.8 × 10 −2 60 contracept 1.9 × 10 −2 11 obstetr 4.8 × 10 −2 61 underw 1.9 × 10 −2 12 ovarian 4.7 × 10 −2 62 hormon 1. 
numer 3.7 × 10 −2 54 age 1 × 10 −2 5 turbul 3.7 × 10 −2 55 magnetohydrodynam 1 × 10 −2 6 veloc 3.6 × 10 −2 56 steadi 9.7 × 10 −3 7 reynold 2.8 × 10 −2 57 cyclotron 9.6 × 10 −3 8 equat 2.8 × 10 −2 58 motion 9.6 × 10 −3 9 particl 2.7 × 10 −2 59 energi 9.4 × 10 −3 10 wave 2.6 × 10 −2 60 analyt 9.2 × 10 −3 11 tokamak 2.6 × 10 −2 61 navier 9.2 × 10 −3 12 instabl 2.5 × 10 −2 62 forc 9.2 × 10 −3 13 simul 2.3 × 10 −2 63 transit 9 × 10 −3 14 field 2.1 × 10 −2 64 divertor 9 × 10 −3 15 regim 2 × 10 −2 65 convect 8.9 × 10 −3 16 vortic 2 × 10 −2 66 propag 8.9 × 10 −3 17 discharg 2 × 10 −2 67 activ 8.7 × 10 −3 18 patient 1. collis 4 × 10 −2 59 spin 1 × 10 −2 10 gev 3.9 × 10 −2 scalar 7.3 × 10 −2 51 collis 1.9 × 10 −2 2 quark 6.4 × 10 −2 52 conclus 1.9 × 10 −2 3 higg 5.9 × 10 −2 53 increas 1.9 × 10 −2 4 gaug 5.9 × 10 −2 54 string 1.9 × 10 −2 5 lhc 5.9 × 10 −2 55 black 1.9 × 10 −2 6 cosmolog 5.3 × 10 −2 56 proton 1.8 × 10 −2 7 gev 5.2 × 10 −2 57 nucleon 1.8 × 10 −2 8 boson 5.2 × 10 −2 induc 3.7 × 10 −2 53 pathway 1.1 × 10 −2 4 rat 3.6 × 10 −2 54 studi 1 × 10 −2 5 express 3 × 10 −2 55 base 1 × 10 −2 6 increas 3 × 10 −2 56 inhibitor 1 × 10 −2 7 paper 2.7 × 10 −2 57 signal 1 × 10 −2 8 receptor 2.5 × 10 −2 58 kinas 1 × 10 −2 9 regul 2.4 × 10 −2 59 glucos 9.9 × 10 −3 10 respons 2.4 × 10 −2 60 propos 9.7 × 10 −3 11 physiolog 2.4 × 10 −2 61 mechan 9.5 × 10 −3 12 protein 2.4 × 10 −2 articl 3.4 × 10 −2 56 temperatur 1 × 10 −2 7 social 3.4 × 10 −2 57 trade 1 × 10 −2 8 polit 3.3 × 10 −2 58 labour 9.9 × 10 −3 9 urban 3.1 × 10 −2 59 hous 9.8 × 10 −3 10 economi 2.1 × 10 −2 72 pollin 1 × 10 −2 23 patient 2 × 10 −2 73 auxin 1 × 10 −2 24 breed 1.9 × 10 −2 74 model 1 × 10 −2 25 isol 1.9 × 10 −2 75 biomass 9.9 × 10 −3 26 genet 1.9 × 10 −2 76 oryza 9.9 × 10 −3 27 accumul 1.9 × 10 −2 77 dri 9.7 × 10 −3 28 genus 1.9 × 10 −2 78 morpholog 9.7 × 10 −3 29 rice 1.8 × 10 −2 79 aba 9.6 × 10 −3 30 veget 1.7 × 10 −2 80 taxonom 9.5 × 10 −3 31 grown 1.7 × 10 −2 81 marker 9.5 × 10 −3 32 tree 1.7 × 10 −2 82 weed 9.5 × 10 −3 33 chloroplast 1.6 × 10 −2 83 infloresc 9.4 × 10 −3 34 photosynthesi 1.6 × 10 −2 84 antioxid 9.2 × 10 −3 35 transgen 1.6 × 10 −2 85 forest 9.1 × 10 −3 36 drought 1.6 × 10 −2 86 product 9.1 × 10 −3 37 wheat 1.6 × 10 −2 87 comput 9.1 × 10 −3 38 mutant 1.5 × 10 −2 88 maiz 9 × 10 −3 39 transcript 1.5 × 10 −2 89 plastid 9 × 10 −3 40 wild 1.5 × 10 −2 90 qtl 9 × 10 −3 41 toler 1.5 × 10 −2 91 greenhous 8.9 × 10 −3 42 pollen 1. 
argu 9.3 × 10 −2 57 blur 2 × 10 −2 8 emili 7.9 × 10 −2 58 letter 2 × 10 −2 9 literari 7.5 × 10 −2 59 artist 2 × 10 −2 10 text 7.4 × 10 −2 60 pen 2 × 10 −2 11 lyric 6.3 × 10 −2 61 wife 2 × 10 −2 implic 3.9 × 10 −2 55 perspect 9.6 × 10 −3 6 psycholog 3.4 × 10 −2 56 negat 9.3 × 10 −3 7 examin 3.4 × 10 −2 57 offend 9.1 × 10 −3 8 perceiv 3.2 × 10 −2 58 temperatur 9 × 10 −3 9 relationship conclus 7.5 × 10 −2 54 relat 2 × 10 −2 5 addict 6.7 × 10 −2 55 month 2 × 10 −2 6 drug 6.7 × 10 −2 56 gambler 1.9 × 10 −2 7 smoke 6.5 × 10 −2 57 male 1.9 × 10 −2 8 particip 6.2 × 10 −2 58 youth 1.9 × 10 −2 9 abus 6.1 × 10 −2 59 misus 1.9 × 10 −2 10 abstin 5.8 × 10 −2 60 report 1.9 × 10 −2 11 smoker 5.3 × 10 −2 61 social 1.9 × 10 −2 12 background 1.6 × 10 −1 51 nrtl 9.9 × 10 −3 2 temperatur 8.8 × 10 −2 52 pipe 9.9 × 10 −3 3 thermal 7.3 × 10 −2 53 calcul 9.8 × 10 −3 4 flow 6 × 10 −2 54 mass 9.6 × 10 −3 5 transfer 5.1 × 10 −2 55 age 9.6 × 10 −3 6 cool 4.3 × 10 −2 56 obtain 9.5 × 10 −3 7 fluid 4.1 × 10 −2 57 protein 9.5 × 10 −3 8 pressur 4 × 10 −2 58 isotherm 9.5 × 10 −3 9 experiment 3.7 × 10 −2 59 drop 9.5 × 10 −3 10 convect traffic 8.4 × 10 −2 52 batteri 9.7 × 10 −3 3 paper 6.4 × 10 −2 53 demand 9.6 × 10 −3 4 road 5.8 × 10 −2 54 plan 9.1 × 10 −3 5 travel 4 × 10 −2 55 scheme 9 × 10 −3 6 transport 3.3 × 10 −2 56 asphalt 9 × 10 −3 7 propos 2.9 × 10 −2 57 intersect 8.9 × 10 −3 8 car 2.9 × 10 −2 58 crash 1.6 × 10 −2 97 coinfect 9.9 × 10 −3 48 pathogen 1.5 × 10 −2 98 stool 9.9 × 10 −3 49 countri 1.5 × 10 −2 99 larval 9.9 × 10 −3 50 area 1.5 × 10 −2 100 diagnosi 9.7 × 10 −3 313 social 3.5 × 10 −2 59 draw 1 × 10 −2 10 metropolitan 3.4 × 10 −2 60 suburban 1 × 10 −2 11 polit 3.1 × 10 −2 61 rural 1 × 10 −2 12 articl prostat 7.4 × 10 −2 56 proteinuria 2 × 10 −2 7 bladder 6.6 × 10 −2 57 treat 2 × 10 −2 8 dialysi 5.9 × 10 −2 58 nephropathi 2 × 10 −2 9 outcom 5.3 × 10 −2 59 dysfunct 2 × 10 −2 10 glomerular replic 8.7 × 10 −2 54 entri 1.6 × 10 −2 5 hiv 8.7 × 10 −2 55 nucleotid 1.6 × 10 −2 6 host 6.6 × 10 −2 56 immunodefici 1.6 × 10 −2 7 protein 6.5 × 10 −2 57 titer 1.6 × 10 −2 8 rna 6.3 × 10 −2 58 inhibit 1.6 × 10 −2 9 cell 6.1 × 10 −2 59 mediat 1.6 × 10 −2 10 antivir 5.8 × 10 −2 60 epitop 1.6 × 10 −2 11 genom 5.5 × 10 −2 61 hbv 1.6 × 10 −2 12 immun hydrolog 6.9 × 10 −2 52 drink 1.1 × 10 −2 3 river 6.8 × 10 −2 53 eros 1.1 × 10 −2 4 groundwat 6 × 10 −2 54 coastal 1.1 × 10 −2 5 soil 3.8 × 10 −2 55 storm 1 × 10 −2 6 rainfal 3.8 × 10 −2 56 ecosystem 1 × 10 −2 7 runoff 3.8 × 10 −2 57 estim 1 × 10 −2 8 flow 3.7 × 10 −2 58 impact 1 × 10 −2 9 flood 3.5 × 10 −2 59 resourc 1 × 10 −2 10 catchment habitat 3.9 × 10 −2 54 abund 1 × 10 −2 5 femal 3.9 × 10 −2 55 conserv 1 × 10 −2 6 male 3.8 × 10 −2 56 applic 1 × 10 −2 7 describ 2.6 × 10 −2 57 insect 1 × 10 −2 8 anim 2.5 × 10 −2 58 simul
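For readers who wish to reproduce rankings of this kind, the sketch below computes the RIG of a single word about a single category from binary word-presence and category-membership indicators, which is the quantity tabulated above. It is a minimal illustration rather than the authors' code: the helper names and toy data are invented, and the normalisation IG(c; w)/H(c) is assumed from the standard definition of relative information gain.

import numpy as np

def entropy(p):
    # Shannon entropy (in bits) of a Bernoulli variable with P(1) = p.
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def rig(word_present, in_category):
    # Relative Information Gain of a binary word-presence indicator w
    # about a binary category-membership indicator c:
    #     RIG = (H(c) - H(c | w)) / H(c).
    # Both arguments are boolean NumPy arrays with one entry per text.
    h_c = entropy(in_category.mean())
    if h_c == 0.0:
        return 0.0
    h_c_given_w = 0.0
    for value in (True, False):
        mask = word_present == value
        if mask.any():
            h_c_given_w += mask.mean() * entropy(in_category[mask].mean())
    return (h_c - h_c_given_w) / h_c

# Toy data: six abstracts; does each contain the stem 'folklor', and
# does each belong to a hypothetical folklore category?
word_present = np.array([True, True, True, False, False, False])
in_category = np.array([True, True, False, False, False, False])
print(f"RIG = {rig(word_present, in_category):.3f}")   # -> RIG = 0.541

Applying rig to every (word, category) pair yields a Word-Category RIG matrix; sorting each column in descending order and keeping the first 100 entries produces ranked word lists of the kind shown in the appendix tables.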