id: work_w7xmeafrznavjlelreuofhde2i
author: Mandar Joshi
title: SpanBERT: Improving Pre-training by Representing and Predicting Spans
date: 2020
pages: 14
extension: .pdf
mime: application/pdf
words: 7798
sentences: 843
flesch: 66
summary: In this paper, we introduce a span-level pre-training approach that consistently outperforms BERT, with the largest gains on span selection tasks. Span-based masking forces the model to use the output representations of the boundary tokens, x4 and x9 (in blue), to predict each token in the masked span. Together, our pre-training process yields models that outperform all BERT baselines on a wide variety of tasks, including tasks that do not explicitly involve span selection; we show that our approach even improves performance on TACRED (Zhang et al., 2017). In summary, SpanBERT pre-trains span representations by (1) masking spans of full words and (2) predicting each token in a masked span from the output representations of the span's boundary tokens. We evaluate on seven question answering tasks, coreference resolution, nine tasks in the GLUE benchmark (Wang et al., 2019), and relation extraction. Span selection tasks, such as question answering and coreference resolution, particularly benefit from our span-based pre-training. Single-sequence training without NSP outperforms bi-sequence training with NSP with BERT's choice of sequence lengths for a wide variety of tasks. The span boundary objective is helpful for models pre-trained with span masking.
cache: ./cache/work_w7xmeafrznavjlelreuofhde2i.pdf
txt: ./txt/work_w7xmeafrznavjlelreuofhde2i.txt
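The summary describes predicting each token in a masked span from the output representations of the span's boundary tokens plus a position within the span. The sketch below is a minimal illustration of that idea, not the paper's released code: the class name SpanBoundaryObjective, the default sizes, and the two-layer feed-forward head with GELU and layer normalization are assumptions made for the example.

```python
import torch
import torch.nn as nn


class SpanBoundaryObjective(nn.Module):
    """Illustrative sketch: predict each masked token from the encodings of
    the two tokens bordering the span plus a relative-position embedding."""

    def __init__(self, hidden_size=768, max_span_len=10, vocab_size=30522):
        super().__init__()
        self.pos_emb = nn.Embedding(max_span_len, hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(3 * hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
        )
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, left_boundary, right_boundary, rel_positions):
        # left_boundary / right_boundary: (num_masked, hidden) encoder outputs for
        # the tokens just outside the span (e.g. x4 and x9 in the paper's figure).
        # rel_positions: (num_masked,) 0-based index of each target inside the span.
        h = torch.cat(
            [left_boundary, right_boundary, self.pos_emb(rel_positions)], dim=-1
        )
        return self.decoder(self.mlp(h))  # (num_masked, vocab_size) logits


# Toy usage with made-up sizes: three masked positions in one span.
sbo = SpanBoundaryObjective(hidden_size=16, max_span_len=10, vocab_size=100)
left = torch.randn(3, 16)       # encoding of the token just before the span
right = torch.randn(3, 16)      # encoding of the token just after the span
pos = torch.tensor([0, 1, 2])   # relative positions of the masked tokens
logits = sbo(left, right, pos)  # shape: (3, 100)
```

The logits would be trained with a cross-entropy loss against the original masked tokens, alongside the usual masked language modeling loss; exact loss weighting and decoder sharing are not specified here.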