key: cord-0677849-gqvk4h15
authors: Ge, Yao; Guo, Yuting; Yang, Yuan-Chi; Al-Garadi, Mohammed Ali; Sarker, Abeed
title: Few-shot learning for medical text: A systematic review
date: 2022-04-21
journal: nan
DOI: nan
sha: 462a00d7e5ab8f28235963bb0833c8f3629cee5b
doc_id: 677849
cord_uid: gqvk4h15

Objective: Few-shot learning (FSL) methods require small numbers of labeled instances for training. As many medical topics have limited annotated textual data in practical settings, FSL-based natural language processing (NLP) methods hold substantial promise. We aimed to conduct a systematic review to explore the state of FSL methods for medical NLP. Materials and Methods: We searched for articles published between January 2016 and August 2021 using PubMed/Medline, Embase, ACL Anthology, and IEEE Xplore Digital Library. To identify the latest relevant methods, we also searched other sources such as preprint servers (e.g., medRxiv) via Google Scholar. We included all articles that involved FSL and any type of medical text. We abstracted articles based on data source(s), aim(s), training set size(s), primary method(s)/approach(es), and evaluation method(s). Results: 31 studies met our inclusion criteria, all published after 2018; 22 (71%) since 2020. Concept extraction/named entity recognition was the most frequently addressed task (13/31; 42%), followed by text classification (10/31; 32%). Twenty-one (68%) studies reconstructed existing datasets to create few-shot scenarios synthetically, and MIMIC-III was the most frequently used dataset (7/31; 23%). Common methods included FSL with attention mechanisms (12/31; 39%), prototypical networks (8/31; 26%), and meta-learning (6/31; 19%). Discussion: Despite the potential for FSL in biomedical NLP, progress has been limited compared to domain-independent FSL. This may be due to the paucity of standardized, public datasets, and the relative underperformance of FSL methods on biomedical topics. Creation and release of specialized datasets for biomedical FSL may aid method development by enabling comparative analyses.

training data is used to optimize a model for a specific task, and a separate set is used to evaluate the performance of the trained model. In the meta-learning framework, a model is trained using a set of training tasks, not data, and model performance is evaluated on a set of test tasks. In the experimental setting, the learner obtains prior knowledge by incorporating generic knowledge across different tasks (i.e., algorithm level prior knowledge). The small number of labeled instances for the target task is then used to fine-tune the model. Figure 1 (a) illustrates the meta-learning framework using a simple example: an entity recognition model is trained using different tasks involving news and music data, and is evaluated on a medical task.

Several additional classes of FSL methods have evolved over the years, some building on meta-learning. Ravi and Larochelle 8 presented a long short-term memory (LSTM) based meta-learner that is trained and customized separately for mini-batches of training data (referred to as episodes), rather than as a single model over all the mini-batches. Separately, matching networks were recently proposed. They use two embedding functions (i.e., functions that project data into vector space while capturing relevant semantics), one for the training sets and one for the test sets, to imitate how humans generalize the knowledge learned from examples. The framework optimizes the two embedding functions from the training examples (support sets) and the validation examples (query sets), and measures how well the trained model can generalize. 9,17 Figure 1 (b) illustrates the functionality of matching networks in a simplified manner. A variant of matching networks utilizes active learning by adding a sample selection step that augments the training data by labeling the most beneficial unlabeled sample (i.e., model level prior knowledge).
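The matching-network idea described above can be illustrated with a minimal sketch. It assumes the commonly described formulation in which a query's label distribution is a softmax-weighted vote over support examples, with cosine similarity between the two embeddings. The function names (`cosine`, `matching_predict`), the identity embeddings used as defaults for f and g, and the toy vectors are illustrative assumptions, not details taken from any reviewed study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matching_predict(query, support, f=lambda x: x, g=lambda x: x):
    """Return a {label: probability} dict for one query.

    f embeds the query and g embeds each support example; both default
    to the identity here (an assumption made to keep the sketch short).
    """
    sims = [cosine(f(query), g(x)) for x, _ in support]
    top = max(sims)
    weights = [math.exp(s - top) for s in sims]  # numerically stable softmax
    total = sum(weights)
    probs = {}
    for (_, label), w in zip(support, weights):
        probs[label] = probs.get(label, 0.0) + w / total
    return probs


# Toy support set: two "drug" examples and one "symptom" example.
support = [([1.0, 0.0], "drug"), ([0.9, 0.1], "drug"), ([0.0, 1.0], "symptom")]
probs = matching_predict([0.8, 0.2], support)
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

In a trained matching network, f and g would be learned neural encoders; the voting step itself stays the same.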

Another related class of FSL approaches known as metric learning employs distance-based metrics (e.g., nearest neighbor). Given a support set, metric learning methods typically produce weighted nearest neighbor classifiers via non-linear transformations in an embedding space, and the examples in the support set close to the query example (based on the metric applied) are used to make classification decisions, imitating how humans use similar examples or analogies to learn. Prototypical networks, 2 yet another similar class of approaches, particularly attempt to address the issue of overfitting due to small training samples by generating prototype representations of classes from the training samples, similar to how humans summarize knowledge learned from examples. Prediction of unknown data samples can be performed by computing distances to the class prototypes (e.g., support set means), and choosing the nearest one as the predicted label. Figure 1 (c) visually illustrates the functionality of a prototypical network. A semi-supervised variant of prototypical networks applies soft assignment on unlabeled samples, and incorporates these as prior knowledge (i.e., data level prior knowledge). Transfer learning, a commonly used approach in FSL, also incorporates prior knowledge at the data level, as knowledge learned from data in prior tasks is transferred to new few-shot tasks. 18 The problems that these and other FSL methods attempt to solve are closely aligned with the practical challenges faced by many medical NLP tasks. While a number of FSL strategies have been explored for medical texts by distinct research communities (e.g., health informatics, computational linguistics), there is currently no review that compares the performances of these strategies or summarizes the current state of the art. There is also no study that has compiled the reported performances of FSL methods on distinct medical NLP data/tasks. We attempt to address these gaps in this systematic review.
Specifically, we review FSL methods for medical NLP tasks, and characterize each reviewed article in terms of type of task (eg., text classification, NER), primary aim(s), dataset(s), evaluation metrics, and other relevant aspects. We summarize our findings about FSL methods for medical NLP, and discuss challenges, limitations, opportunities and necessary future efforts for progressing research on the topic.
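As a concrete illustration of the prototype-based prediction rule described earlier (class prototypes as support set means, nearest prototype as the predicted label), the following minimal sketch may help. It assumes instances are already embedded as fixed-length vectors and uses Euclidean distance. The function names and toy data are hypothetical, and no learned encoder is included.

```python
import math
from collections import defaultdict


def class_prototypes(support):
    """Average the embedded support examples of each class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(query, prototypes):
    """Return the label whose prototype is nearest to the query."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy 2-shot, 2-way support set with 2-dimensional embeddings.
support = [([1.0, 1.0], "a"), ([3.0, 1.0], "a"),
           ([10.0, 10.0], "b"), ([12.0, 10.0], "b")]
prototypes = class_prototypes(support)
print(prototypes)                       # {'a': [2.0, 1.0], 'b': [11.0, 10.0]}
print(predict([2.5, 0.5], prototypes))  # a
```

Because each class is summarized by a single mean vector, the classifier has very few free parameters at prediction time, which is one reason the approach is associated with reduced overfitting on small support sets.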

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol to conduct this review. 20 FSL for NLP is a relatively recent research topic, so we concentrated on a short time range for our literature search: January 2016 to August 2021. We searched the following bibliographic databases to identify relevant papers: (1) PubMed/Medline, (2) Embase, (3) IEEE Xplore Digital Library, (4) ACL Anthology, and (5) Google Scholar, the latter being a meta-search engine, not a database. We included ACL Anthology (the primary source for the latest NLP research) and IEEE Xplore, in addition to Embase and PubMed/Medline, because much of the methodological progress in FSL has been published in non-medical journals and conference proceedings. At the time of searching (September 2021), ACL Anthology hosted 71,290 articles and IEEE Xplore hosted over 5.4 million, although most articles in the latter did not focus on NLP or medicine. Over recent years, preprint servers have emerged as major sources of the latest information regarding research progress in computer science and NLP, and we used Google Scholar primarily as a medium for searching these preprint servers or published papers from other sources. Note that we also searched the ACM Digital Library, but discovered no additional articles. Hence, we do not report it as a data source for our review.

We applied marginally different search strategies depending on the database to account for the differences in their contents. We used three types of queries:

1. Queries focusing on the technical field of research (phrases included: 'natural language processing', 'text mining', 'text classification', 'named entity recognition', and 'concept extraction');
2. Queries focusing on the learning strategy (phrases included: 'few-shot', 'low-shot', 'one-shot', and 'zero-shot'); and
3. Queries focusing on the domain of interest (phrases included: 'medical', 'clinical', 'biomedical', 'health', 'health-related').

All articles on PubMed and Embase fall within the broader biomedical domain, so we used combinations of the phrases in 1 and 2 above for searching these two databases, leaving out the phrases in 3. All articles in the ACL Anthology involve NLP, so we used phrases from 2 and 3 for this source. For IEEE Xplore and Google Scholar, the articles can be from any domain and on any topic, so we used combinations of all three sets of phrases for searching. PubMed, Embase, and IEEE Xplore only returned articles that entirely matched the queries. However, ACL Anthology and Google Scholar retrieved larger sets of articles and ranked them by relevance. For ACL Anthology, the articles retrieved were reviewed sequentially in decreasing order of relevance. For each query combination, we continued reviewing candidate articles until we came across at least two pages (about 20 articles) containing no relevant articles, at which point we decided that no relevant articles would be found in the following pages. Since FSL is a relatively new research area, we anticipated that there would be some relevant research papers that are not yet indexed in PubMed, Embase, IEEE Xplore or ACL Anthology. Specifically, preprint servers such as arXiv, bioRxiv and medRxiv are very popular among machine learning and NLP researchers as they enable the publication of the latest research progress early. We used Google Scholar as an auxiliary search engine to identify potentially relevant articles indexed in such preprint servers or other sources (e.g., Open Review). Google Scholar, like ACL Anthology, sorts returned articles by relevance, but the total number of articles returned is much larger. For this search engine, therefore, we reviewed the top 40 articles returned by each query combination, excluding those that were retrieved from the other databases.
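The per-source query strategy described above can be sketched as follows. The phrase lists and the assignment of phrase groups to sources come from the text. The quoting, the `AND` operator, and the choice of one phrase per group in each query are assumptions made for illustration; the exact query syntax used for each database is not stated here.

```python
from itertools import product

# Phrase groups 1-3 as listed in the search strategy.
FIELD = ["natural language processing", "text mining", "text classification",
         "named entity recognition", "concept extraction"]
STRATEGY = ["few-shot", "low-shot", "one-shot", "zero-shot"]
DOMAIN = ["medical", "clinical", "biomedical", "health", "health-related"]

# Which phrase groups are combined for each source.
GROUPS_BY_SOURCE = {
    "PubMed": [FIELD, STRATEGY],
    "Embase": [FIELD, STRATEGY],
    "ACL Anthology": [STRATEGY, DOMAIN],
    "IEEE Xplore": [FIELD, STRATEGY, DOMAIN],
    "Google Scholar": [FIELD, STRATEGY, DOMAIN],
}


def build_queries(source):
    """Return every one-phrase-per-group combination for a source.

    The '"a" AND "b"' string format is hypothetical, not a documented
    syntax of any particular database.
    """
    groups = GROUPS_BY_SOURCE[source]
    return [" AND ".join(f'"{phrase}"' for phrase in combo)
            for combo in product(*groups)]


for source in GROUPS_BY_SOURCE:
    print(source, len(build_queries(source)))
print(build_queries("PubMed")[0])
```

Under these assumptions, the two-group sources yield 20 combinations each and the three-group sources yield 100 each.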

All articles shortlisted from initial searches were screened for eligibility by two authors of the manuscript (YGe and AS). We removed duplicate articles and those that either did not include at least one dataset from the biomedical domain, or did not involve NLP. While it was always possible to identify the technical field/topic (NLP or not) from the titles and abstracts, to determine domain, we had to review full articles because a subset of papers included multiple datasets, and only some of these datasets were from the medical domain. We excluded papers if none of their datasets were related to medicine/health, or if they did not explicitly focus on few/low-shot settings, and reviewed the remaining articles.

We abstracted the following details from each article, if available: publication year, data source, primary research aim(s), training set size(s), number of entities/classes, entity type for training, entity type for evaluation/testing, primary method(s), and evaluation methodology. For studies including data from multiple sources, we only abstracted those related to health/medicine. In terms of primary aim(s), some studies reported multiple objectives, and we abstracted all the NLP-oriented ones (e.g., text classification, concept extraction). With respect to training set sizes, we abstracted information about the number of instances that were used for training, and, if applicable, how larger datasets were reconstructed to create few-shot samples. We also extracted the number of labels for each study/task; for NER/concept extraction methods, we identified the number of entities/concepts, and for classification, we identified the type of classification (i.e., multi-label or multi-class) along with the number of classes.

Figure 1 (caption fragment): [...] Given a query, the goal is to calculate a value that indicates whether the instance is an example of a given class. For the similarity metric, two embedding functions, f() and g(), are needed to measure similarity in the feature space. The function f() is a neural network, and the embedding function g() is applied to each instance to process the kernel for each support set. (c) Prototypical network: a class's prototype is the mean of its support set in the embedding space. Given a query, its distance to each class's prototype is computed to decide its label. Note: (b) and (c) use the DASH 2020 Drug Data. 19

We also noted down the training domain(s) and test/evaluation domain(s) for each few-shot method, when applicable. Abstracting primary approach(es) and evaluation methodology was more challenging due to the complexities of some of the model implementations, and we reviewed and summarized the descriptions provided in each paper. For evaluation, we abstracted evaluation strategies and reported performances.

31 studies met our inclusion criteria. Initial searches retrieved 1241 articles from PubMed, Embase, IEEE Xplore and ACL Anthology, and an additional 459 from Google Scholar. Figure 2 presents the screening procedures and numbers at each stage. After initial filtering, we reviewed 46 full-text articles for eligibility, and excluded 15 from the final review. The first included study was from 2018, and most articles (22/31; 71%) were from 2020 and 2021, although for the latter year, only studies published prior to August 31 were included.

[Figure 2: flow diagram of the screening procedure. Records identified from database search (n=1241).]

[Table fragments; the original layout is not recoverable. Surviving column entries: Text Classification; One-Shot; N/A; (NER); 15-shot; Sequence Tagging (NER); Abstractive [...]; F1-score. Surviving method summaries follow.]

Leverage the label hierarchy to improve few- and zero-shot learning.

Propose a self-supervised learning algorithm to monitor COVID-19 on Twitter, using an autoencoder to learn the latent representations and then transferring the knowledge to a COVID-19 infection classifier by fine-tuning the multi-layer perceptron (MLP) using few-shot learning.

Present FewJoint, a novel few-shot learning benchmark for NLP. This benchmark introduces few-shot joint dialogue language understanding, which additionally covers the structure prediction and multi-task reliance problems.

The design of the model architecture is based [...] function to model the task of biomedical event trigger identification. In addition, in order to make full use of the external knowledge base to learn the complex biological context, we introduced a self-attention mechanism.

Compare the summarization quality produced [...]

Table 2 provides summaries of the methods proposed and the evaluation strategies. Variants of neural network based (deep learning) algorithms, such as Siamese Convolutional Neural Networks (an artificial neural network that processes two different input vectors simultaneously with the same weights to compute a comparable output vector), 35 were the most common. Only 3/31 (10%) articles proposed new datasets, and 2/31 (6%) presented benchmarks for comparing multiple few-shot methods. Evaluation strategies had considerably less diversity. Almost all evaluation methodologies for classification tasks involved standard metrics such as accuracy, precision, recall, and F1-scores, and NER tasks mainly relied on F1-scores only.
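For reference, the standard classification metrics named above can be computed as in the minimal sketch below. It assumes a single binary label with one positive class. The function name `binary_metrics` and the toy labels are illustrative. The reviewed studies may have used micro- or macro-averaged or entity-level variants, which this sketch does not cover.

```python
def binary_metrics(gold, predicted, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(gold, predicted))
    if not pairs:
        raise ValueError("gold and predicted must be non-empty")
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 2 true positives, 1 false positive, 1 false negative.
scores = binary_metrics(gold=[1, 0, 1, 1, 0], predicted=[1, 1, 1, 0, 0])
print(scores)
```

With only a handful of test instances per class, as is typical in few-shot evaluation, a single misclassification moves these values substantially, which is worth keeping in mind when reading reported scores.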

We grouped the datasets used into three categories: (i) publicly downloadable (de-identified) data; (ii) datasets from shared tasks; and (iii) new datasets specifically created for the target tasks. We found that datasets belonging to (ii) and (iii) were particularly difficult to obtain: shared task data are often unavailable after the tasks are completed, and specialized datasets are often not made public, particularly if they contain protected health information (PHI). Studies using datasets from category (i) often reported performances on multiple datasets, consequently making the evaluations more comparable. Overlap of datasets among different studies was relatively low, making comparative analyses difficult. The MIMIC-III (Medical Information Mart for Intensive Care) dataset 23 was the most frequently used across studies (7/31; 23%), particularly for few-shot classification and NER tasks. This was likely due to the public availability of the dataset and the presence of many labels in it (7000). 24 Six papers used datasets from shared tasks, of which 4 were from BioNLP, 47,55 one from the Social Media Mining for Health Applications (SMM4H), 41 and one from the Medical Document Anonymization (MEDDOCAN) shared task. 45 Only 3 papers created new datasets, reflecting the paucity of corpora built to support FSL for medical NLP.

19/31 (61%) reviewed studies reconstructed existing datasets for conducting experiments in few-shot settings (ie., subsets of labeled instances were extracted from larger datasets). For multi-label text classification tasks, especially when the number of labels is very large, and for few-shot NER tasks, reconstructing datasets can be complex. A popular way to represent data in FSL is K-Shot-N-Way, where "-shot" applies to the number of examples per category, and the suffix "-way" refers to the number of possible categories. Therefore, K-Shot-N-Way means that each of N classes or entities contains K labeled samples, as well as several instances from each class for each test batch. For multi-label classification tasks, each instance may have more than one label, often making it difficult to ensure that the reconstructed datasets include only K labeled samples for each class. Similar challenges exist for NER tasks, as each text segment may have overlapping entities. 39% (12/31) of the studies did not construct special datasets to represent few-shot settings. 16% (5/31) used existing datasets with high class imbalances, and the few-shot algorithms were focused on sparsely-occurring labels.
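The K-Shot-N-Way reconstruction described above can be sketched as an episode sampler. This is a minimal sketch for the single-label case only, where each instance carries exactly one class. The multi-label and overlapping-entity cases discussed in the text need additional bookkeeping that is not shown. The function name, parameters, and toy dataset are hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(dataset, n_way, k_shot, n_query, seed=None):
    """Sample one N-way K-shot episode from (text, label) pairs.

    Returns (support, query): K labeled examples per class for the
    support set and n_query disjoint examples per class for the query set.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append(text)
    # Only classes with enough instances can take part in an episode.
    eligible = sorted(label for label, items in by_label.items()
                      if len(items) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")
    support, query = [], []
    for label in rng.sample(eligible, n_way):
        picked = rng.sample(by_label[label], k_shot + n_query)
        support += [(text, label) for text in picked[:k_shot]]
        query += [(text, label) for text in picked[k_shot:]]
    return support, query


# Toy dataset: 4 classes with 6 instances each.
labels = ["cardiology", "oncology", "neurology", "radiology"]
dataset = [(f"note {label} {i}", label) for label in labels for i in range(6)]
support, query = sample_episode(dataset, n_way=3, k_shot=2, n_query=2, seed=0)
print(len(support), len(query))  # 6 6
```

Fixing and reporting the seed, or releasing the sampled instance identifiers, is one way to let other studies evaluate on the same episodes.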

23/31 (74%) studies addressed text classification or NER/concept extraction tasks, while only 8 (26%) studies focused on other tasks. 24/31 (77%) studies attempted to incorporate prior knowledge to augment the small datasets available for training. 19 of these chose to augment the training data with other available annotated datasets as domain knowledge, or through transfer learning, aggregating and adjusting input-output pairs from other larger datasets. For example, due to the scarcity of samples, Manousogiannis et al. 40 attempted to incorporate prior or domain knowledge into their approach by adding concept codes from MedDRA (Medical Dictionary for Regulatory Activities). 5 papers used pre-trained models learned from other tasks and then refined parameters on the given training data, and 6 studies learned a meta-learner as optimizer or refined meta-learned parameters. It is worth noting that some papers incorporated prior knowledge from more than one source.

Among the classification studies, multi-label classification is a popular task because the associated datasets generally contain some very low-frequency classes. 7/10 (70%) papers incorporated data level prior knowledge. 7/10 (70%) classification papers proposed deep learning algorithms, and 3/10 (30%) were inspired by label-wise attention mechanisms. 2/10 (20%) combined few-shot tasks with graphs, such as similarity or co-occurrence graphs, or hierarchical structures that encode relationships between labels for knowledge aggregation. While convolutional neural networks have been popular for FSL, transformer-based models such as BERT 90 and RoBERTa 91 rarely appeared in these articles. Only 1 paper 59 mentioned applying BERT to generate instance embeddings, and then passed top-level output representations into a label-wise attention mechanism.

Few-shot NER or concept extraction

8 reviewed papers were described as NER; 5 as concept extraction.
Generally, studies described as concept extraction had fewer commonalities in their methods and involved task-specific configurations based on the characteristics of the data and/or extraction objectives. 5 papers attempted to incorporate data level, 2 model level, and another 2 algorithm level prior knowledge. 63% (5/8) of the studies described as NER employed transfer learning, with training and testing data from different domains. Studies commonly used the BIO (beginning, inside, outside) or IO tagging schemes. 2 papers investigated both BIO and IO tagging schemes, concluding that systems trained using IO schemes outperform those trained using BIO schemes. Studies reported that the O (outside) tag was often ill-defined, as specific entities (e.g., time entities such as 'today', 'tomorrow') would be tagged as O if they were not the primary focus of the dataset, while the same entities would be tagged as B or I for other datasets. 5 papers used BIO schemes while 1 considered only the entity names without any tagging schemes. The NLP/machine learning strategies employed varied significantly, and included, for example, the application of fusion layers for combining features, 80 biological semantic and positional features, 84 prototypical representations and nearest neighbor classifiers, 71 transition scorers for modeling transition probabilities between abstract labels, 48,66,71 self-supervised methods, 61,66,83 noise networks for auxiliary training, 54,83 and LSTM cells for encoding multiple entity type sequences. 54
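To make the difference between the two tagging schemes concrete, the sketch below converts BIO tags to IO tags by rewriting every `B-` prefix as `I-`. The tag names (`DRUG`, `SYMPTOM`) and the example sentence are illustrative assumptions and are not drawn from any reviewed dataset.

```python
def bio_to_io(tags):
    """Collapse BIO tags to IO tags: B-X becomes I-X, I-X and O are kept."""
    return ["I-" + tag[2:] if tag.startswith("B-") else tag for tag in tags]


tokens = ["took", "ibuprofen", "for", "severe", "chest", "pain"]
bio = ["O", "B-DRUG", "O", "B-SYMPTOM", "I-SYMPTOM", "I-SYMPTOM"]
io = bio_to_io(bio)
print(list(zip(tokens, io)))

# Two adjacent single-token entities of the same type become one
# unbroken run under IO, so the boundary between them is lost.
adjacent = bio_to_io(["B-DRUG", "B-DRUG"])
print(adjacent)  # ['I-DRUG', 'I-DRUG']
```

IO tagging yields fewer distinct labels per entity type, so each label receives more examples in a few-shot setting, at the cost of losing boundaries between adjacent entities of the same type.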

Overview of other methods

6/31 (19%) studies applied meta-learning strategies, and 12/31 (39%) articles demonstrated the advantages of attention mechanisms in few-shot scenarios, such as handling the difficulty of recognizing multiple unseen labels. Among the latter, 5/12 used self-attention-related methods, and 4/12 used label-wise attention mechanisms. 8/31 (26%) studies reproduced prototypical networks, and/or added enhancements to them. Only 1 article used matching networks, and 2 studies included them as baselines.

9/31 (29%) studies used accuracy, and the reported values on medical datasets or datasets that included medical texts varied between 67.4% and 96%. Two-thirds (6/9) reported accuracies higher than 70%. For the 17/31 (55%) studies that reported F1-score, performance variations were even larger, ranging from 31.8% to 95.7% (median: 68.6%). We were unable to determine in most cases if the performance differences were due to the effectiveness of the FSL methods, or if the dataset characteristics were primarily responsible.

For the vast majority of studies, reported performances on medical datasets were relatively low compared to other datasets. For papers that reported good performances, we investigated their methods as described, and found that in most cases they either did not mention how many training examples they used, or their training sizes were large (e.g., in the hundreds). While these approaches may still be considered few-shot learning, comparing these reported performances with those obtained in low-shot settings (e.g., 5-shot) does not constitute a fair comparison. We also observed that some of the papers reporting high F1-scores actually included datasets from different domains, and only reported aggregated performances rather than dataset-specific performances.

In this review, we systematically collected and compared 31 studies that focus on FSL for biomedical NLP. Although there are many potential applications of FSL for biomedical texts, this research space has received relatively little attention. Similar to its progress in the general domain, FSL research in the medical domain has largely been in computer vision. 92 Over two-thirds of the papers included in our review, however, were published in the last 24 months, which illustrates a fast-growing interest. Despite the relatively small number of studies that met our inclusion criteria, several observations were fairly consistent across studies: (i) under the same experimental parameters, the performances reported on medical data were worse than those reported on data from other domains; 35, 66, 71 (ii) incorporating prior knowledge via transfer learning or using specialized training datasets typically produced better results; and (iii) systems generally reported better performances on datasets with more formal texts compared to those with noisy texts (e.g., from social media). 48, 61, 71 We found it difficult to perform head-to-head comparisons of the few proposed methods due to the use of different or non-standardized evaluation strategies, training/test data, and experimental settings. For example, Chalkidis et al. 59 used 50 or fewer instances in their few-shot setting, while Rios and Kavuluru 21 used 5 or fewer, making it impossible to perform meaningful comparisons of the proposed methods. In the absence of specialized datasets for FSL, K-Shot-N-Way datasets were commonly reported for simulating few-shot scenarios. In such synthetically created datasets, the number of instances for training is predetermined. Such consistency in characteristics is almost never found in real-world text-based medical data.
Though this design attempts to make direct comparison between different methods or tasks easier, only speculative estimates can be made about how the proposed methods may perform if deployed in real-world settings. It was also typically impossible to compare performances of FSL methods with the state-of-the-art systems reported in prior literature, as FSL methods were expected to underperform compared to methods trained using larger training sets.

Few studies reported the creation of new datasets specialized for FSL, or provided benchmarks that future studies could use for comparison. The scarcity of standardized datasets, and the consequent need to reconstruct datasets for simulating few-shot scenarios, is a notable obstacle to progress. Since FSL for biomedical NLP is an underexplored field, such datasets and benchmarks are essential for promoting future development. FSL datasets specialized for biomedical NLP need to contain entities/classes that are naturally sparsely occurring, and the distribution of classes/entities needs to reflect real-life data. These conditions are necessary for ensuring that developed systems can be compared directly, and that the system performances reflect what is expected in practical settings. Reconstructed datasets often use randomly sampled subsets for evaluation, making direct comparisons between systems difficult (since the specific training and test instances may not be known), and increasing the potential for biased performance estimates.

An overarching aim of FSL is to enable systems to learn from few examples, like humans. 93 With text-based data, complexities in semantics, syntax, and structure all make it harder to learn and generalize information, particularly with a low number of examples. Specialized terminologies used in medical texts present additional challenges for FSL. Our review shows that while the utility of FSL for medical NLP is well acknowledged, this area of research is underexplored. Consequently, FSL systems for biomedical NLP tasks are very much in their infancy, and reported performances are typically low with high variance. Importantly, our review enabled us to identify future research activities that will be most impactful in moving this sub-field of research forward. We outline these in the following subsections.

To improve the state of the art in FSL for medical text, the most important activity currently is the creation of specialized, standardized, publicly available datasets. Ideally, such datasets should replicate real-world scenarios and pose practical challenges for FSL. Creation of such datasets will enable the direct comparison of distinct FSL strategies, and of FSL methods with traditional methods (eg., deep neural networks). Public datasets have helped progress NLP and machine learning research over the years, such as through shared tasks. 41 Our review, however, did not find any current shared task that provides specialized datasets for FSL-based biomedical NLP.

FSL methods for NLP comprise a wide variety of approaches. Facilitated by standardized datasets, studies need to focus on comparing distinct categories of FSL for biomedical NLP tasks and identify promising methods that need exploration. In the absence of standardized data, benchmarking studies can customize existing datasets and compare distinct FSL methods on identical evaluation sets. Researchers proposing new FSL methods for biomedical NLP should also take the steps necessary to enable head-to-head comparisons and reproducible research. This includes making the evaluation data explicit. Systems evaluated on reconstructed data need to report the exact instances involved in each performance estimate. In the absence of standardized datasets, reporting performances on multiple datasets/settings is also helpful for subjective assessment.

The paucity of research in this space means there are many potential opportunities. Domain-independent FSL methods have benefited by incorporating prior knowledge via transfer learning to compensate for the low numbers of training instances. 92 FSL methods for biomedical NLP can follow the same path by using models pre-trained on specific medical datasets. Over the years, medical NLP researchers have created many resources to support NLP methods, such as the Unified Medical Language System (UMLS), 94, 95 MedDRA, 96 and others. Effectively incorporating prior knowledge by utilizing these domain-specific knowledge sources is a particularly attractive opportunity.

In a recent review, 92 the authors specified multi-modal data augmentation as a potential opportunity for improving the state of the art in FSL. The same opportunity also exists in the medical domain. To enable FSL systems to achieve performance levels suitable for deployment, future research may focus on augmenting information derived from medical texts with other information, such as images and ontologies. Existing FSL techniques for medical free-text data usually incorporate prior knowledge from one single modality (text), and it is generally not possible to incorporate information from other types of data, such as images. Multimodal strategies that combine knowledge from distinct sources (e.g., texts, images, knowledge bases, ontologies) may enable FSL methods to achieve the performance levels needed to be applicable in real-world medical settings. Intuitively, multi-modal learning models are more akin to human learning. Unsurprisingly, data augmentation methods in NLP have recently seen growing interest. 97 Notwithstanding this recent rise, this space is still comparatively underexplored, perhaps because natural language data in general are difficult to augment, and medical free text is particularly so owing to its domain-specific terminologies.

There is also the opportunity to create novel datasets specialized for FSL-based biomedical NLP. Creation of comprehensive standardized datasets for FSL involving biomedical texts can lead research in this space. Such contributions to this area at such a crucial point in its progress will inevitably have long-term impact. Efforts to benchmark existing methods on medical datasets will also help identify promising methods requiring further investigation.

FSL approaches have substantial promise for NLP in the medical domain as many medical datasets naturally have low numbers of annotated instances. Some promising approaches have been proposed in the recent past, most of which focused on classification or NER. Meta-learning and transfer learning were commonly used strategies, and a number of studies reported on the benefits of incorporating attention mechanisms. Typical performances of FSL based medical NLP systems are not yet good enough to be suitable for real-world application, and further research on improving performance is required. Lack of public datasets specialized for FSL presents an obstacle to progressing research on the topic, and future research should consider creating such datasets and benchmarks for comparative analyses.

YGe and AS conducted the initial searches and filtering. YGuo, YCY, and MAA contributed to reviewing the articles, determining their relevance, and/or summarizing the findings included in the review. All authors contributed to the writing of the final manuscript.

Research reported in this publication was supported by the National Institute on Drug Abuse (NIDA) of the National Institutes of Health (NIH) under award number R01DA046619. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Learning to compare: Relation network for few-shot learning

Prototypical networks for few-shot learning. Adv. neural information processing systems

One-shot learning by inverting a compositional causal process. Adv. neural information processing systems

Few-shot semantic segmentation with prototype learning

Revisiting local descriptor based image-to-class measure for few-shot learning

One shot learning of simple visual concepts

Recent trends in deep learning based natural language processing

Optimization as a model for few-shot learning

Matching networks for one shot learning

Induction networks for few-shot text classification

Fewjoint: A few-shot learning benchmark for joint language understanding

Few-shot learning for named entity recognition in medical text

Prior knowledge in recalling arguments in bioethical dilemmas

Generalizing from a few examples: A survey on few-shot learning

Meta-learning in neural networks: A survey

Evolutionary Principles in Self-Referential Learning. On Learning how to Learn: The Meta-Meta-Meta

Learning algorithms for active learning

A survey on transfer learning

Utilizing social media for identifying drug addiction and recovery intervention

Preferred reporting items for systematic reviews and meta-analyses: The prisma statement

Few-shot and zero-shot multi-label learning for structured label spaces

Automated classification of free-text pathology reports for registration of incident cases of cancer

Mimic-iii, a freely accessible critical care database

Emr coding with semi-parametric multi-head matching networks

Extracting medication information from clinical text

i2b2/va challenge on concepts, assertions, and relations in clinical text

Evaluating temporal relations in clinical text: 2012 i2b2 challenge

Introduction to the conll-2003 shared task: Language-independent named entity recognition

How to train good word embeddings for biomedical nlp

Developing a new model for patient recruitment in mental health services: a cohort study using electronic health records

The south london and maudsley nhs foundation trust biomedical research centre (slam brc) case register: development and descriptive data

Towards one-shot learning for rare-word translation with external experts

Europarl: A parallel corpus for statistical machine translation

Wit3: Web inventory of transcribed and translated talks. In Conference of european association for machine translation

Few-shot learning for short text classification

Learning discriminative sentiment chunk vectors for twitter sentiment analysis

Twitter polarity classification with label propagation over lexical links and the follower graph

Sentiment strength detection for the social web

Developing a successful semeval task in sentiment analysis of twitter and other social media texts

Give it a shot: Few-shot learning to normalize adr mentions in social media posts

Overview of the fourth social media mining for health (smm4h) shared tasks at acl 2019

Fewrel 2.0: Towards more challenging few-shot relation classification

Fewrel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation

Key phrases annotation in medical documents: Meddocan 2019 anonymization task

Automatic de-identification of medical texts in spanish: the meddocan track, corpus, guidelines, methods and evaluation of results

C-norm: a neural approach to few-shot entity normalization

Bacteria biotope at bionlp open shared tasks 2019

Few-shot slot tagging with collapsed dependency transfer and label-enhanced task-adaptive projection network

Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces

Meta-learning for few-shot nmt adaptation

Parallel data, tools and interfaces in opus

Multi-label few/zero-shot learning with knowledge aggregated from multiple label graphs

Large-scale multi-label text classification on eu legislation

Multi-cell compositional lstm for ner domain adaptation

Overview of bionlp shared task 2013

Broad twitter corpus: A diverse named entity recognition resource

Visual attention model for name tagging in multimodal social media

Cross-domain ner using cross-domain language modeling

An empirical study on large-scale multi-label text classification including few and zero-shot labels

Rcv1: A new benchmark collection for text categorization research

Covid-19 surveillance through twitter using self-supervised and few shot learning

Coronavirus (covid-19) tweets dataset

Few-shot nlg with pre-trained language model

Neural text generation from structured data with application to the biography domain

Natural language processing for structuring clinical text data on depression using uk-cris

Few-shot named entity recognition: A comprehensive study

A multimodal diagnosis predictive model of alzheimer's disease with few-shot learning

Knowledge-aware few-shot learning framework for biomedical event trigger identification

Event extraction across multiple levels of biological organization

Flight of the pegasus? comparing transformers on few-shot and zero-shot multi-document abstractive summarization

Simple and effective few-shot named entity recognition with structured nearest neighbor learning

Ontonotes release 5.0. Linguist. Data Consortium

Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/uthealth corpus

Results of the wnut2017 shared task on novel and emerging entity recognition

Multilingual negation scope resolution for clinical text

Annotation of negation in the iula spanish clinical record corpus

Nubes: A corpus of negation and uncertainty in spanish clinical texts

Détection de la négation : corpus français et apprentissage supervisé

Conceptual grounding constraints for truly robust biomedical name representations

A novel few-shot learning based multi-modality fusion model for covid-19 rumor detection from online social media

Learning reporting dynamics during breaking news for rumour detection in social media

Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients

Med7: A transferable clinical natural language processing model for electronic health records

Extracting biomedical entity relations using biological interaction knowledge

Towards few-shot fact-checking via perplexity

Caire-covid: A question answering and query-focused multi-document summarization system for covid-19 scholarly information management

Where is your evidence: Improving fact-checking by justification modeling

Fever: a large-scale dataset for fact extraction and verification

Scalable few-shot learning of robust biomedical name representations

Pre-training of Deep Bidirectional Transformers for Language Understanding

A Robustly Optimized BERT Pretraining Approach

Generalizing from a few examples: A survey on few-shot learning. 1-34

Language models are few-shot learners

The Unified Medical Language System (UMLS): integrating biomedical terminology

The Unified Medical Language System SPECIALIST Lexicon and Lexical Tools: Development and applications

An overview of the medical dictionary for regulatory activities

A survey of data augmentation approaches for nlp

None.