id: cord-025523-6ttps1nx
author: Barlas, Georgios
title: Cross-Domain Authorship Attribution Using Pre-trained Language Models
date: 2020-05-06
pages:
extension: .txt
mime: text/plain
words: 3611
sentences: 193
flesch: 56
summary: An especially challenging but very realistic scenario is cross-domain attribution, where the texts of known authorship (training set) differ from the texts of disputed authorship (test set) in topic or genre. Recently, the use of pre-trained language models (e.g., BERT, ELMo, ULMFiT) has been demonstrated to yield significant gains in several text classification tasks, including sentiment analysis, emotion classification, and topic classification [2, 7, 13, 14]. This method is based on a character-level recurrent neural network (RNN) language model and a multi-headed classifier (MHC) [1]. We examine the use of pre-trained language models (e.g., BERT, ELMo, ULMFiT, GPT-2) in authorship attribution (AA) and the potential of the MHC. Based on Bagnall's model [1], originally proposed for authorship verification, we compare performance when using either the original character-level RNN trained from scratch on the small AA corpus or pre-trained token-based language models obtained from general-domain corpora.
cache: ./cache/cord-025523-6ttps1nx.txt
txt: ./txt/cord-025523-6ttps1nx.txt
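
The summary describes Bagnall's architecture as a shared character-level RNN language model with one output head per candidate author (the multi-headed classifier). The sketch below is a hypothetical, minimal PyTorch illustration of that idea, not the authors' code: the vocabulary size, model dimensions, and the attribution rule (assign the text to the author whose head predicts it with the lowest cross-entropy) are assumptions made here for illustration only.

# Minimal sketch (hypothetical, not the paper's implementation) of a
# multi-headed character-level language model for authorship attribution:
# a shared RNN body with one next-character prediction head per author.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadCharLM(nn.Module):
    def __init__(self, vocab_size: int, num_authors: int,
                 emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)  # shared LM body
        # One next-character prediction head per candidate author (the MHC).
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_authors)
        )

    def forward(self, char_ids: torch.Tensor, author: int) -> torch.Tensor:
        # char_ids: (batch, seq_len) integer-encoded characters
        hidden, _ = self.rnn(self.embed(char_ids))
        return self.heads[author](hidden)  # (batch, seq_len, vocab_size)


def attribute(model: MultiHeadCharLM, char_ids: torch.Tensor) -> int:
    """Return the index of the author whose head best predicts the text."""
    model.eval()
    inputs, targets = char_ids[:, :-1], char_ids[:, 1:]
    losses = []
    with torch.no_grad():
        for a in range(len(model.heads)):
            logits = model(inputs, author=a)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
            losses.append(loss.item())
    return min(range(len(losses)), key=losses.__getitem__)


if __name__ == "__main__":
    # Toy usage: 3 candidate authors, one 50-character snippet of random ids.
    model = MultiHeadCharLM(vocab_size=100, num_authors=3)
    snippet = torch.randint(0, 100, (1, 50))
    print("predicted author index:", attribute(model, snippet))

In the paper's cross-domain setting, the same per-author-head scheme is compared against pre-trained token-based language models (e.g., a BERT-style encoder) fine-tuned on the attribution corpus; the sketch above only covers the character-level RNN variant trained from scratch.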