id sid tid token lemma pos 10_1101-2021_02_08_430070 1 1 On on IN 10_1101-2021_02_08_430070 1 2 the the DT 10_1101-2021_02_08_430070 1 3 application application NN 10_1101-2021_02_08_430070 1 4 of of IN 10_1101-2021_02_08_430070 1 5 BERT BERT NNP 10_1101-2021_02_08_430070 1 6 models model NNS 10_1101-2021_02_08_430070 1 7 for for IN 10_1101-2021_02_08_430070 1 8 nanopore nanopore JJ 10_1101-2021_02_08_430070 1 9 methylation methylation NN 10_1101-2021_02_08_430070 1 10 detection detection NN 10_1101-2021_02_08_430070 1 11 ✐ ✐ NNP 10_1101-2021_02_08_430070 1 12 ✐ ✐ NNP 10_1101-2021_02_08_430070 1 13 ✐ ✐ NNP 10_1101-2021_02_08_430070 1 14 ✐ ✐ NNP 10_1101-2021_02_08_430070 1 15 ✐ ✐ NNP 10_1101-2021_02_08_430070 1 16 ✐ ✐ NNP 10_1101-2021_02_08_430070 1 17 ✐ ✐ NNP 10_1101-2021_02_08_430070 1 18 ✐ ✐ NNP 10_1101-2021_02_08_430070 1 19 Genome Genome NNP 10_1101-2021_02_08_430070 1 20 Analysis Analysis NNP 10_1101-2021_02_08_430070 1 21 On on IN 10_1101-2021_02_08_430070 1 22 the the DT 10_1101-2021_02_08_430070 1 23 application application NN 10_1101-2021_02_08_430070 1 24 of of IN 10_1101-2021_02_08_430070 1 25 BERT BERT NNP 10_1101-2021_02_08_430070 1 26 models model NNS 10_1101-2021_02_08_430070 1 27 for for IN 10_1101-2021_02_08_430070 1 28 nanopore nanopore JJ 10_1101-2021_02_08_430070 1 29 methylation methylation NN 10_1101-2021_02_08_430070 1 30 detection detection NN 10_1101-2021_02_08_430070 1 31 Yao Yao NNP 10_1101-2021_02_08_430070 1 32 - - HYPH 10_1101-2021_02_08_430070 1 33 zhong zhong NNP 10_1101-2021_02_08_430070 1 34 Zhang Zhang NNP 10_1101-2021_02_08_430070 1 35 1,∗ 1,∗ CD 10_1101-2021_02_08_430070 1 36 , , , 10_1101-2021_02_08_430070 1 37 Sera Sera NNP 10_1101-2021_02_08_430070 1 38 Hatakeyama Hatakeyama NNP 10_1101-2021_02_08_430070 1 39 1 1 CD 10_1101-2021_02_08_430070 1 40 , , , 10_1101-2021_02_08_430070 1 41 Kiyoshi Kiyoshi NNP 10_1101-2021_02_08_430070 1 42 Yamaguchi Yamaguchi NNP 10_1101-2021_02_08_430070 1 43 1 1 CD 10_1101-2021_02_08_430070 1 44 , 
, , 10_1101-2021_02_08_430070 1 45 Yoichi Yoichi NNP 10_1101-2021_02_08_430070 1 46 Furukawa Furukawa NNP 10_1101-2021_02_08_430070 1 47 1 1 CD 10_1101-2021_02_08_430070 1 48 , , , 10_1101-2021_02_08_430070 1 49 Satoru Satoru NNP 10_1101-2021_02_08_430070 1 50 Miyano Miyano NNP 10_1101-2021_02_08_430070 1 51 2 2 CD 10_1101-2021_02_08_430070 1 52 , , , 10_1101-2021_02_08_430070 1 53 Rui Rui NNP 10_1101-2021_02_08_430070 1 54 Yamaguchi Yamaguchi NNP 10_1101-2021_02_08_430070 1 55 3 3 CD 10_1101-2021_02_08_430070 1 56 , , , 10_1101-2021_02_08_430070 1 57 and and CC 10_1101-2021_02_08_430070 1 58 Seiya Seiya NNP 10_1101-2021_02_08_430070 1 59 Imoto Imoto NNP 10_1101-2021_02_08_430070 1 60 1,∗ 1,∗ CD 10_1101-2021_02_08_430070 1 61 1Institute 1institute CD 10_1101-2021_02_08_430070 1 62 of of IN 10_1101-2021_02_08_430070 1 63 Medical Medical NNP 10_1101-2021_02_08_430070 1 64 Science Science NNP 10_1101-2021_02_08_430070 1 65 , , , 10_1101-2021_02_08_430070 1 66 the the DT 10_1101-2021_02_08_430070 1 67 University University NNP 10_1101-2021_02_08_430070 1 68 of of IN 10_1101-2021_02_08_430070 1 69 Tokyo Tokyo NNP 10_1101-2021_02_08_430070 1 70 , , , 10_1101-2021_02_08_430070 1 71 Tokyo Tokyo NNP 10_1101-2021_02_08_430070 1 72 , , , 10_1101-2021_02_08_430070 1 73 108 108 CD 10_1101-2021_02_08_430070 1 74 - - HYPH 10_1101-2021_02_08_430070 1 75 0071 0071 CD 10_1101-2021_02_08_430070 1 76 , , , 10_1101-2021_02_08_430070 1 77 Japan Japan NNP 10_1101-2021_02_08_430070 1 78 2 2 CD 10_1101-2021_02_08_430070 1 79 M&D M&D NNP 10_1101-2021_02_08_430070 1 80 Data Data NNP 10_1101-2021_02_08_430070 1 81 Science Science NNP 10_1101-2021_02_08_430070 1 82 Center Center NNP 10_1101-2021_02_08_430070 1 83 , , , 10_1101-2021_02_08_430070 1 84 Tokyo Tokyo NNP 10_1101-2021_02_08_430070 1 85 Medical Medical NNP 10_1101-2021_02_08_430070 1 86 and and CC 10_1101-2021_02_08_430070 1 87 Dental Dental NNP 10_1101-2021_02_08_430070 1 88 University University NNP 10_1101-2021_02_08_430070 1 89 , 
, , 10_1101-2021_02_08_430070 1 90 Tokyo Tokyo NNP 10_1101-2021_02_08_430070 1 91 , , , 10_1101-2021_02_08_430070 1 92 101 101 CD 10_1101-2021_02_08_430070 1 93 - - SYM 10_1101-2021_02_08_430070 1 94 0062 0062 CD 10_1101-2021_02_08_430070 1 95 , , , 10_1101-2021_02_08_430070 1 96 Japan Japan NNP 10_1101-2021_02_08_430070 1 97 3Aichi 3Aichi NNP 10_1101-2021_02_08_430070 1 98 Cancer Cancer NNP 10_1101-2021_02_08_430070 1 99 Center Center NNP 10_1101-2021_02_08_430070 1 100 Research Research NNP 10_1101-2021_02_08_430070 1 101 Institute Institute NNP 10_1101-2021_02_08_430070 1 102 , , , 10_1101-2021_02_08_430070 1 103 Nagoya Nagoya NNP 10_1101-2021_02_08_430070 1 104 , , , 10_1101-2021_02_08_430070 1 105 464 464 CD 10_1101-2021_02_08_430070 1 106 - - SYM 10_1101-2021_02_08_430070 1 107 8681 8681 CD 10_1101-2021_02_08_430070 1 108 , , , 10_1101-2021_02_08_430070 1 109 Japan Japan NNP 10_1101-2021_02_08_430070 1 110 ∗To ∗To NNP 10_1101-2021_02_08_430070 1 111 whom whom WP 10_1101-2021_02_08_430070 1 112 correspondence correspondence NN 10_1101-2021_02_08_430070 1 113 should should MD 10_1101-2021_02_08_430070 1 114 be be VB 10_1101-2021_02_08_430070 1 115 addressed address VBN 10_1101-2021_02_08_430070 1 116 . . . 
10_1101-2021_02_08_430070 2 1 Abstract Abstract NNP 10_1101-2021_02_08_430070 2 2 Motivation Motivation NNP 10_1101-2021_02_08_430070 2 3 : : : 10_1101-2021_02_08_430070 2 4 DNA dna NN 10_1101-2021_02_08_430070 2 5 methylation methylation NN 10_1101-2021_02_08_430070 2 6 is be VBZ 10_1101-2021_02_08_430070 2 7 a a DT 10_1101-2021_02_08_430070 2 8 common common JJ 10_1101-2021_02_08_430070 2 9 epigenetic epigenetic JJ 10_1101-2021_02_08_430070 2 10 modification modification NN 10_1101-2021_02_08_430070 2 11 , , , 10_1101-2021_02_08_430070 2 12 which which WDT 10_1101-2021_02_08_430070 2 13 is be VBZ 10_1101-2021_02_08_430070 2 14 widely widely RB 10_1101-2021_02_08_430070 2 15 associated associate VBN 10_1101-2021_02_08_430070 2 16 with with IN 10_1101-2021_02_08_430070 2 17 various various JJ 10_1101-2021_02_08_430070 2 18 biological biological JJ 10_1101-2021_02_08_430070 2 19 processes process NNS 10_1101-2021_02_08_430070 2 20 , , , 10_1101-2021_02_08_430070 2 21 such such JJ 10_1101-2021_02_08_430070 2 22 as as IN 10_1101-2021_02_08_430070 2 23 gene gene NN 10_1101-2021_02_08_430070 2 24 expression expression NN 10_1101-2021_02_08_430070 2 25 , , , 10_1101-2021_02_08_430070 2 26 aging age VBG 10_1101-2021_02_08_430070 2 27 , , , 10_1101-2021_02_08_430070 2 28 and and CC 10_1101-2021_02_08_430070 2 29 disease disease NN 10_1101-2021_02_08_430070 2 30 . . . 
10_1101-2021_02_08_430070 3 1 Nanopore nanopore JJ 10_1101-2021_02_08_430070 3 2 sequencing sequencing NN 10_1101-2021_02_08_430070 3 3 provides provide VBZ 10_1101-2021_02_08_430070 3 4 a a DT 10_1101-2021_02_08_430070 3 5 promising promising JJ 10_1101-2021_02_08_430070 3 6 methylation methylation NN 10_1101-2021_02_08_430070 3 7 detection detection NN 10_1101-2021_02_08_430070 3 8 approach approach NN 10_1101-2021_02_08_430070 3 9 through through IN 10_1101-2021_02_08_430070 3 10 monitoring monitor VBG 10_1101-2021_02_08_430070 3 11 abnormal abnormal JJ 10_1101-2021_02_08_430070 3 12 signal signal NN 10_1101-2021_02_08_430070 3 13 shifts shift NNS 10_1101-2021_02_08_430070 3 14 for for IN 10_1101-2021_02_08_430070 3 15 detecting detect VBG 10_1101-2021_02_08_430070 3 16 modified modify VBN 10_1101-2021_02_08_430070 3 17 bases basis NNS 10_1101-2021_02_08_430070 3 18 in in IN 10_1101-2021_02_08_430070 3 19 target target NN 10_1101-2021_02_08_430070 3 20 motif motif NN 10_1101-2021_02_08_430070 3 21 regions region NNS 10_1101-2021_02_08_430070 3 22 . . . 
10_1101-2021_02_08_430070 4 1 Recently recently RB 10_1101-2021_02_08_430070 4 2 , , , 10_1101-2021_02_08_430070 4 3 model model NN 10_1101-2021_02_08_430070 4 4 - - HYPH 10_1101-2021_02_08_430070 4 5 based base VBN 10_1101-2021_02_08_430070 4 6 approaches approach NNS 10_1101-2021_02_08_430070 4 7 , , , 10_1101-2021_02_08_430070 4 8 especially especially RB 10_1101-2021_02_08_430070 4 9 those those DT 10_1101-2021_02_08_430070 4 10 with with IN 10_1101-2021_02_08_430070 4 11 deep deep JJ 10_1101-2021_02_08_430070 4 12 learning learning NN 10_1101-2021_02_08_430070 4 13 models model NNS 10_1101-2021_02_08_430070 4 14 , , , 10_1101-2021_02_08_430070 4 15 have have VBP 10_1101-2021_02_08_430070 4 16 achieved achieve VBN 10_1101-2021_02_08_430070 4 17 significant significant JJ 10_1101-2021_02_08_430070 4 18 performance performance NN 10_1101-2021_02_08_430070 4 19 improvements improvement NNS 10_1101-2021_02_08_430070 4 20 on on IN 10_1101-2021_02_08_430070 4 21 nanopore nanopore JJ 10_1101-2021_02_08_430070 4 22 methylation methylation NN 10_1101-2021_02_08_430070 4 23 detection detection NN 10_1101-2021_02_08_430070 4 24 . . . 
10_1101-2021_02_08_430070 5 1 In in IN 10_1101-2021_02_08_430070 5 2 this this DT 10_1101-2021_02_08_430070 5 3 work work NN 10_1101-2021_02_08_430070 5 4 , , , 10_1101-2021_02_08_430070 5 5 we -PRON- PRP 10_1101-2021_02_08_430070 5 6 explore explore VBP 10_1101-2021_02_08_430070 5 7 using use VBG 10_1101-2021_02_08_430070 5 8 bidirectional bidirectional JJ 10_1101-2021_02_08_430070 5 9 encoder encoder NN 10_1101-2021_02_08_430070 5 10 representations representation NNS 10_1101-2021_02_08_430070 5 11 from from IN 10_1101-2021_02_08_430070 5 12 transformers transformer NNS 10_1101-2021_02_08_430070 5 13 ( ( -LRB- 10_1101-2021_02_08_430070 5 14 BERT BERT NNP 10_1101-2021_02_08_430070 5 15 ) ) -RRB- 10_1101-2021_02_08_430070 5 16 for for IN 10_1101-2021_02_08_430070 5 17 doing do VBG 10_1101-2021_02_08_430070 5 18 the the DT 10_1101-2021_02_08_430070 5 19 task task NN 10_1101-2021_02_08_430070 5 20 , , , 10_1101-2021_02_08_430070 5 21 which which WDT 10_1101-2021_02_08_430070 5 22 can can MD 10_1101-2021_02_08_430070 5 23 provide provide VB 10_1101-2021_02_08_430070 5 24 non non JJ 10_1101-2021_02_08_430070 5 25 - - JJ 10_1101-2021_02_08_430070 5 26 recurrent recurrent JJ 10_1101-2021_02_08_430070 5 27 neural neural JJ 10_1101-2021_02_08_430070 5 28 structures structure NNS 10_1101-2021_02_08_430070 5 29 for for IN 10_1101-2021_02_08_430070 5 30 fast fast JJ 10_1101-2021_02_08_430070 5 31 parallel parallel JJ 10_1101-2021_02_08_430070 5 32 computation computation NN 10_1101-2021_02_08_430070 5 33 . . . 
10_1101-2021_02_08_430070 6 1 Results result NNS 10_1101-2021_02_08_430070 6 2 : : : 10_1101-2021_02_08_430070 6 3 We -PRON- PRP 10_1101-2021_02_08_430070 6 4 find find VBP 10_1101-2021_02_08_430070 6 5 original original JJ 10_1101-2021_02_08_430070 6 6 BERT BERT NNP 10_1101-2021_02_08_430070 6 7 architecture architecture NN 10_1101-2021_02_08_430070 6 8 does do VBZ 10_1101-2021_02_08_430070 6 9 not not RB 10_1101-2021_02_08_430070 6 10 work work VB 10_1101-2021_02_08_430070 6 11 as as RB 10_1101-2021_02_08_430070 6 12 well well RB 10_1101-2021_02_08_430070 6 13 as as IN 10_1101-2021_02_08_430070 6 14 the the DT 10_1101-2021_02_08_430070 6 15 bidirectional bidirectional JJ 10_1101-2021_02_08_430070 6 16 recurrent recurrent NN 10_1101-2021_02_08_430070 6 17 neural neural JJ 10_1101-2021_02_08_430070 6 18 network network NN 10_1101-2021_02_08_430070 6 19 ( ( -LRB- 10_1101-2021_02_08_430070 6 20 biRNN birnn NN 10_1101-2021_02_08_430070 6 21 ) ) -RRB- 10_1101-2021_02_08_430070 6 22 on on IN 10_1101-2021_02_08_430070 6 23 the the DT 10_1101-2021_02_08_430070 6 24 nanopore nanopore JJ 10_1101-2021_02_08_430070 6 25 methylation methylation NN 10_1101-2021_02_08_430070 6 26 prediction prediction NN 10_1101-2021_02_08_430070 6 27 task task NN 10_1101-2021_02_08_430070 6 28 . . . 
10_1101-2021_02_08_430070 7 1 Through through IN 10_1101-2021_02_08_430070 7 2 further further JJ 10_1101-2021_02_08_430070 7 3 analysis analysis NN 10_1101-2021_02_08_430070 7 4 , , , 10_1101-2021_02_08_430070 7 5 we -PRON- PRP 10_1101-2021_02_08_430070 7 6 observe observe VBP 10_1101-2021_02_08_430070 7 7 recurrent recurrent JJ 10_1101-2021_02_08_430070 7 8 patterns pattern NNS 10_1101-2021_02_08_430070 7 9 of of IN 10_1101-2021_02_08_430070 7 10 positional positional JJ 10_1101-2021_02_08_430070 7 11 - - HYPH 10_1101-2021_02_08_430070 7 12 signal signal NN 10_1101-2021_02_08_430070 7 13 - - HYPH 10_1101-2021_02_08_430070 7 14 shift shift NN 10_1101-2021_02_08_430070 7 15 in in IN 10_1101-2021_02_08_430070 7 16 the the DT 10_1101-2021_02_08_430070 7 17 context context NN 10_1101-2021_02_08_430070 7 18 window window NN 10_1101-2021_02_08_430070 7 19 surrounding surround VBG 10_1101-2021_02_08_430070 7 20 target target NN 10_1101-2021_02_08_430070 7 21 5-methylcytosine 5-methylcytosine CD 10_1101-2021_02_08_430070 7 22 ( ( -LRB- 10_1101-2021_02_08_430070 7 23 5mC 5mc CD 10_1101-2021_02_08_430070 7 24 ) ) -RRB- 10_1101-2021_02_08_430070 7 25 and and CC 10_1101-2021_02_08_430070 7 26 N6-methyladenine N6-methyladenine NNP 10_1101-2021_02_08_430070 7 27 ( ( -LRB- 10_1101-2021_02_08_430070 7 28 6mA 6mA NNP 10_1101-2021_02_08_430070 7 29 ) ) -RRB- 10_1101-2021_02_08_430070 7 30 motifs motif NNS 10_1101-2021_02_08_430070 7 31 . . . 
10_1101-2021_02_08_430070 8 1 We -PRON- PRP 10_1101-2021_02_08_430070 8 2 propose propose VBP 10_1101-2021_02_08_430070 8 3 a a DT 10_1101-2021_02_08_430070 8 4 refined refined JJ 10_1101-2021_02_08_430070 8 5 BERT BERT NNP 10_1101-2021_02_08_430070 8 6 with with IN 10_1101-2021_02_08_430070 8 7 relative relative JJ 10_1101-2021_02_08_430070 8 8 position position NN 10_1101-2021_02_08_430070 8 9 representation representation NN 10_1101-2021_02_08_430070 8 10 and and CC 10_1101-2021_02_08_430070 8 11 center center NN 10_1101-2021_02_08_430070 8 12 hidden hide VBN 10_1101-2021_02_08_430070 8 13 units units NNPS 10_1101-2021_02_08_430070 8 14 concatenation concatenation NNP 10_1101-2021_02_08_430070 8 15 , , , 10_1101-2021_02_08_430070 8 16 which which WDT 10_1101-2021_02_08_430070 8 17 takes take VBZ 10_1101-2021_02_08_430070 8 18 account account NN 10_1101-2021_02_08_430070 8 19 of of IN 10_1101-2021_02_08_430070 8 20 task task NN 10_1101-2021_02_08_430070 8 21 - - HYPH 10_1101-2021_02_08_430070 8 22 specific specific JJ 10_1101-2021_02_08_430070 8 23 characters character NNS 10_1101-2021_02_08_430070 8 24 into into IN 10_1101-2021_02_08_430070 8 25 modeling modeling NN 10_1101-2021_02_08_430070 8 26 . . . 10_1101-2021_02_08_430070 9 1 We -PRON- PRP 10_1101-2021_02_08_430070 9 2 perform perform VBP 10_1101-2021_02_08_430070 9 3 systematic systematic JJ 10_1101-2021_02_08_430070 9 4 evaluations evaluation NNS 10_1101-2021_02_08_430070 9 5 in in IN 10_1101-2021_02_08_430070 9 6 - - HYPH 10_1101-2021_02_08_430070 9 7 sample sample NN 10_1101-2021_02_08_430070 9 8 and and CC 10_1101-2021_02_08_430070 9 9 cross cross NN 10_1101-2021_02_08_430070 9 10 - - NN 10_1101-2021_02_08_430070 9 11 sample sample NN 10_1101-2021_02_08_430070 9 12 . . . 
10_1101-2021_02_08_430070 10 1 The the DT 10_1101-2021_02_08_430070 10 2 experiment experiment NN 10_1101-2021_02_08_430070 10 3 results result NNS 10_1101-2021_02_08_430070 10 4 show show VBP 10_1101-2021_02_08_430070 10 5 that that IN 10_1101-2021_02_08_430070 10 6 the the DT 10_1101-2021_02_08_430070 10 7 refined refined JJ 10_1101-2021_02_08_430070 10 8 BERT BERT NNP 10_1101-2021_02_08_430070 10 9 model model NN 10_1101-2021_02_08_430070 10 10 can can MD 10_1101-2021_02_08_430070 10 11 achieve achieve VB 10_1101-2021_02_08_430070 10 12 competitive competitive JJ 10_1101-2021_02_08_430070 10 13 or or CC 10_1101-2021_02_08_430070 10 14 even even RB 10_1101-2021_02_08_430070 10 15 better well JJR 10_1101-2021_02_08_430070 10 16 results result NNS 10_1101-2021_02_08_430070 10 17 than than IN 10_1101-2021_02_08_430070 10 18 the the DT 10_1101-2021_02_08_430070 10 19 state state NN 10_1101-2021_02_08_430070 10 20 - - HYPH 10_1101-2021_02_08_430070 10 21 of of IN 10_1101-2021_02_08_430070 10 22 - - HYPH 10_1101-2021_02_08_430070 10 23 the the DT 10_1101-2021_02_08_430070 10 24 - - HYPH 10_1101-2021_02_08_430070 10 25 art art NN 10_1101-2021_02_08_430070 10 26 biRNN birnn NN 10_1101-2021_02_08_430070 10 27 model model NN 10_1101-2021_02_08_430070 10 28 , , , 10_1101-2021_02_08_430070 10 29 while while IN 10_1101-2021_02_08_430070 10 30 the the DT 10_1101-2021_02_08_430070 10 31 model model NN 10_1101-2021_02_08_430070 10 32 inference inference NN 10_1101-2021_02_08_430070 10 33 speed speed NN 10_1101-2021_02_08_430070 10 34 is be VBZ 10_1101-2021_02_08_430070 10 35 about about RB 10_1101-2021_02_08_430070 10 36 6x 6x CD 10_1101-2021_02_08_430070 10 37 faster fast RBR 10_1101-2021_02_08_430070 10 38 . . . 
10_1101-2021_02_08_430070 11 1 Besides besides RB 10_1101-2021_02_08_430070 11 2 , , , 10_1101-2021_02_08_430070 11 3 on on IN 10_1101-2021_02_08_430070 11 4 the the DT 10_1101-2021_02_08_430070 11 5 cross cross JJ 10_1101-2021_02_08_430070 11 6 - - JJ 10_1101-2021_02_08_430070 11 7 sample sample JJ 10_1101-2021_02_08_430070 11 8 evaluation evaluation NN 10_1101-2021_02_08_430070 11 9 of of IN 10_1101-2021_02_08_430070 11 10 datasets dataset NNS 10_1101-2021_02_08_430070 11 11 from from IN 10_1101-2021_02_08_430070 11 12 the the DT 10_1101-2021_02_08_430070 11 13 different different JJ 10_1101-2021_02_08_430070 11 14 research research NN 10_1101-2021_02_08_430070 11 15 groups group NNS 10_1101-2021_02_08_430070 11 16 , , , 10_1101-2021_02_08_430070 11 17 BERT BERT NNP 10_1101-2021_02_08_430070 11 18 models model NNS 10_1101-2021_02_08_430070 11 19 demonstrate demonstrate VBP 10_1101-2021_02_08_430070 11 20 a a DT 10_1101-2021_02_08_430070 11 21 good good JJ 10_1101-2021_02_08_430070 11 22 generalization generalization NN 10_1101-2021_02_08_430070 11 23 performance performance NN 10_1101-2021_02_08_430070 11 24 . . . 
10_1101-2021_02_08_430070 12 1 Availability availability NN 10_1101-2021_02_08_430070 12 2 : : : 10_1101-2021_02_08_430070 12 3 The the DT 10_1101-2021_02_08_430070 12 4 source source NN 10_1101-2021_02_08_430070 12 5 code code NN 10_1101-2021_02_08_430070 12 6 and and CC 10_1101-2021_02_08_430070 12 7 data datum NNS 10_1101-2021_02_08_430070 12 8 are be VBP 10_1101-2021_02_08_430070 12 9 available available JJ 10_1101-2021_02_08_430070 12 10 at at IN 10_1101-2021_02_08_430070 12 11 https://github.com/yaozhong/methBERT https://github.com/yaozhong/methBERT NNP 10_1101-2021_02_08_430070 12 12 Contact:yaozhong@ims.u-tokyo.ac.jp Contact:yaozhong@ims.u-tokyo.ac.jp NNP 10_1101-2021_02_08_430070 12 13 1 1 CD 10_1101-2021_02_08_430070 12 14 Introduction introduction NN 10_1101-2021_02_08_430070 12 15 Methylation Methylation NNP 10_1101-2021_02_08_430070 12 16 of of IN 10_1101-2021_02_08_430070 12 17 DNA DNA NNP 10_1101-2021_02_08_430070 12 18 / / SYM 10_1101-2021_02_08_430070 12 19 RNA RNA NNP 10_1101-2021_02_08_430070 12 20 / / SYM 10_1101-2021_02_08_430070 12 21 histone histone NN 10_1101-2021_02_08_430070 12 22 is be VBZ 10_1101-2021_02_08_430070 12 23 commonly commonly RB 10_1101-2021_02_08_430070 12 24 observed observe VBN 10_1101-2021_02_08_430070 12 25 in in IN 10_1101-2021_02_08_430070 12 26 developmental developmental JJ 10_1101-2021_02_08_430070 12 27 disorders disorder NNS 10_1101-2021_02_08_430070 12 28 , , , 10_1101-2021_02_08_430070 12 29 aging age VBG 10_1101-2021_02_08_430070 12 30 , , , 10_1101-2021_02_08_430070 12 31 and and CC 10_1101-2021_02_08_430070 12 32 genomic genomic JJ 10_1101-2021_02_08_430070 12 33 disease disease NN 10_1101-2021_02_08_430070 12 34 , , , 10_1101-2021_02_08_430070 12 35 such such JJ 10_1101-2021_02_08_430070 12 36 as as IN 10_1101-2021_02_08_430070 12 37 cancer cancer NN 10_1101-2021_02_08_430070 12 38 . . . 
10_1101-2021_02_08_430070 13 1 Fast fast RB 10_1101-2021_02_08_430070 13 2 and and CC 10_1101-2021_02_08_430070 13 3 accurately accurately RB 10_1101-2021_02_08_430070 13 4 detecting detect VBG 10_1101-2021_02_08_430070 13 5 methylation methylation NN 10_1101-2021_02_08_430070 13 6 status status NN 10_1101-2021_02_08_430070 13 7 has have VBZ 10_1101-2021_02_08_430070 13 8 a a DT 10_1101-2021_02_08_430070 13 9 fundamental fundamental JJ 10_1101-2021_02_08_430070 13 10 requirement requirement NN 10_1101-2021_02_08_430070 13 11 to to TO 10_1101-2021_02_08_430070 13 12 find find VB 10_1101-2021_02_08_430070 13 13 distinctive distinctive JJ 10_1101-2021_02_08_430070 13 14 biomarkers biomarker NNS 10_1101-2021_02_08_430070 13 15 for for IN 10_1101-2021_02_08_430070 13 16 aging age VBG 10_1101-2021_02_08_430070 13 17 / / SYM 10_1101-2021_02_08_430070 13 18 disease disease NN 10_1101-2021_02_08_430070 13 19 profiling profiling NN 10_1101-2021_02_08_430070 13 20 . . . 10_1101-2021_02_08_430070 14 1 For for IN 10_1101-2021_02_08_430070 14 2 a a DT 10_1101-2021_02_08_430070 14 3 virome virome NN 10_1101-2021_02_08_430070 14 4 / / SYM 10_1101-2021_02_08_430070 14 5 metagenome metagenome NN 10_1101-2021_02_08_430070 14 6 study study NN 10_1101-2021_02_08_430070 14 7 , , , 10_1101-2021_02_08_430070 14 8 quick quick JJ 10_1101-2021_02_08_430070 14 9 and and CC 10_1101-2021_02_08_430070 14 10 accurate accurate JJ 10_1101-2021_02_08_430070 14 11 epi epi JJ 10_1101-2021_02_08_430070 14 12 - - HYPH 10_1101-2021_02_08_430070 14 13 transcriptome transcriptome NNP 10_1101-2021_02_08_430070 14 14 detection detection NN 10_1101-2021_02_08_430070 14 15 also also RB 10_1101-2021_02_08_430070 14 16 plays play VBZ 10_1101-2021_02_08_430070 14 17 an an DT 10_1101-2021_02_08_430070 14 18 important important JJ 10_1101-2021_02_08_430070 14 19 role role NN 10_1101-2021_02_08_430070 14 20 in in IN 10_1101-2021_02_08_430070 14 21 understanding understand VBG 10_1101-2021_02_08_430070 14 22 unseen 
unseen JJ 10_1101-2021_02_08_430070 14 23 strains strain NNS 10_1101-2021_02_08_430070 14 24 ( ( -LRB- 10_1101-2021_02_08_430070 14 25 Kim Kim NNP 10_1101-2021_02_08_430070 14 26 et et FW 10_1101-2021_02_08_430070 14 27 al al NNP 10_1101-2021_02_08_430070 14 28 . . NNP 10_1101-2021_02_08_430070 14 29 , , , 10_1101-2021_02_08_430070 14 30 2020 2020 CD 10_1101-2021_02_08_430070 14 31 ) ) -RRB- 10_1101-2021_02_08_430070 14 32 . . . 10_1101-2021_02_08_430070 15 1 One one CD 10_1101-2021_02_08_430070 15 2 commonly commonly RB 10_1101-2021_02_08_430070 15 3 used use VBN 10_1101-2021_02_08_430070 15 4 DNA dna NN 10_1101-2021_02_08_430070 15 5 methylation methylation NN 10_1101-2021_02_08_430070 15 6 detection detection NN 10_1101-2021_02_08_430070 15 7 approach approach NN 10_1101-2021_02_08_430070 15 8 is be VBZ 10_1101-2021_02_08_430070 15 9 Whole whole JJ 10_1101-2021_02_08_430070 15 10 - - HYPH 10_1101-2021_02_08_430070 15 11 Genome genome JJ 10_1101-2021_02_08_430070 15 12 Bisulfite Bisulfite NNP 10_1101-2021_02_08_430070 15 13 Sequencing Sequencing NNP 10_1101-2021_02_08_430070 15 14 ( ( -LRB- 10_1101-2021_02_08_430070 15 15 WGBS WGBS NNP 10_1101-2021_02_08_430070 15 16 ) ) -RRB- 10_1101-2021_02_08_430070 15 17 . . . 10_1101-2021_02_08_430070 16 1 To to TO 10_1101-2021_02_08_430070 16 2 detect detect VB 10_1101-2021_02_08_430070 16 3 modified modify VBN 10_1101-2021_02_08_430070 16 4 bases basis NNS 10_1101-2021_02_08_430070 16 5 , , , 10_1101-2021_02_08_430070 16 6 WGBS WGBS NNP 10_1101-2021_02_08_430070 16 7 first first RB 10_1101-2021_02_08_430070 16 8 takes take VBZ 10_1101-2021_02_08_430070 16 9 sodium sodium NN 10_1101-2021_02_08_430070 16 10 bisulfite bisulfite NNP 10_1101-2021_02_08_430070 16 11 conversion conversion NN 10_1101-2021_02_08_430070 16 12 before before IN 10_1101-2021_02_08_430070 16 13 sequencing sequencing NN 10_1101-2021_02_08_430070 16 14 . . . 
10_1101-2021_02_08_430070 17 1 As as IN 10_1101-2021_02_08_430070 17 2 the the DT 10_1101-2021_02_08_430070 17 3 pre pre JJ 10_1101-2021_02_08_430070 17 4 - - JJ 10_1101-2021_02_08_430070 17 5 chemical chemical JJ 10_1101-2021_02_08_430070 17 6 bisulfite bisulfite NN 10_1101-2021_02_08_430070 17 7 conversion conversion NN 10_1101-2021_02_08_430070 17 8 is be VBZ 10_1101-2021_02_08_430070 17 9 a a DT 10_1101-2021_02_08_430070 17 10 relatively relatively RB 10_1101-2021_02_08_430070 17 11 harsh harsh JJ 10_1101-2021_02_08_430070 17 12 process process NN 10_1101-2021_02_08_430070 17 13 , , , 10_1101-2021_02_08_430070 17 14 it -PRON- PRP 10_1101-2021_02_08_430070 17 15 makes make VBZ 10_1101-2021_02_08_430070 17 16 DNA dna NN 10_1101-2021_02_08_430070 17 17 sequences sequence NNS 10_1101-2021_02_08_430070 17 18 more more RBR 10_1101-2021_02_08_430070 17 19 fragmental fragmental JJ 10_1101-2021_02_08_430070 17 20 and and CC 10_1101-2021_02_08_430070 17 21 a a DT 10_1101-2021_02_08_430070 17 22 large large JJ 10_1101-2021_02_08_430070 17 23 amount amount NN 10_1101-2021_02_08_430070 17 24 of of IN 10_1101-2021_02_08_430070 17 25 DNA dna NN 10_1101-2021_02_08_430070 17 26 is be VBZ 10_1101-2021_02_08_430070 17 27 usually usually RB 10_1101-2021_02_08_430070 17 28 required require VBN 10_1101-2021_02_08_430070 17 29 . . . 
10_1101-2021_02_08_430070 18 1 Also also RB 10_1101-2021_02_08_430070 18 2 , , , 10_1101-2021_02_08_430070 18 3 limited limit VBN 10_1101-2021_02_08_430070 18 4 to to IN 10_1101-2021_02_08_430070 18 5 the the DT 10_1101-2021_02_08_430070 18 6 read read VBN 10_1101-2021_02_08_430070 18 7 length length NN 10_1101-2021_02_08_430070 18 8 , , , 10_1101-2021_02_08_430070 18 9 it -PRON- PRP 10_1101-2021_02_08_430070 18 10 is be VBZ 10_1101-2021_02_08_430070 18 11 difficult difficult JJ 10_1101-2021_02_08_430070 18 12 to to TO 10_1101-2021_02_08_430070 18 13 align align VB 10_1101-2021_02_08_430070 18 14 short short JJ 10_1101-2021_02_08_430070 18 15 reads read NNS 10_1101-2021_02_08_430070 18 16 in in IN 10_1101-2021_02_08_430070 18 17 low low JJ 10_1101-2021_02_08_430070 18 18 - - HYPH 10_1101-2021_02_08_430070 18 19 complex complex NN 10_1101-2021_02_08_430070 18 20 regions region NNS 10_1101-2021_02_08_430070 18 21 and and CC 10_1101-2021_02_08_430070 18 22 analyze analyze VB 10_1101-2021_02_08_430070 18 23 methylation methylation NN 10_1101-2021_02_08_430070 18 24 patterns pattern NNS 10_1101-2021_02_08_430070 18 25 in in IN 10_1101-2021_02_08_430070 18 26 a a DT 10_1101-2021_02_08_430070 18 27 long- long- JJ 10_1101-2021_02_08_430070 18 28 range range NN 10_1101-2021_02_08_430070 18 29 . . . 10_1101-2021_02_08_430070 19 1 The the DT 10_1101-2021_02_08_430070 19 2 data datum NNS 10_1101-2021_02_08_430070 19 3 processing processing NN 10_1101-2021_02_08_430070 19 4 of of IN 10_1101-2021_02_08_430070 19 5 WGBS WGBS NNP 10_1101-2021_02_08_430070 19 6 is be VBZ 10_1101-2021_02_08_430070 19 7 sophisticated sophisticated JJ 10_1101-2021_02_08_430070 19 8 and and CC 10_1101-2021_02_08_430070 19 9 time time NN 10_1101-2021_02_08_430070 19 10 - - HYPH 10_1101-2021_02_08_430070 19 11 consuming consume VBG 10_1101-2021_02_08_430070 19 12 . . . 
10_1101-2021_02_08_430070 20 1 Various various JJ 10_1101-2021_02_08_430070 20 2 biases bias NNS 10_1101-2021_02_08_430070 20 3 ( ( -LRB- 10_1101-2021_02_08_430070 20 4 e.g. e.g. RB 10_1101-2021_02_08_430070 21 1 GC GC NNP 10_1101-2021_02_08_430070 21 2 and and CC 10_1101-2021_02_08_430070 21 3 fragment fragment NN 10_1101-2021_02_08_430070 21 4 length length NN 10_1101-2021_02_08_430070 21 5 ) ) -RRB- 10_1101-2021_02_08_430070 21 6 including include VBG 10_1101-2021_02_08_430070 21 7 those those DT 10_1101-2021_02_08_430070 21 8 introduced introduce VBN 10_1101-2021_02_08_430070 21 9 by by IN 10_1101-2021_02_08_430070 21 10 bisulfite bisulfite NN 10_1101-2021_02_08_430070 21 11 treatment treatment NN 10_1101-2021_02_08_430070 21 12 are be VBP 10_1101-2021_02_08_430070 21 13 required require VBN 10_1101-2021_02_08_430070 21 14 to to TO 10_1101-2021_02_08_430070 21 15 be be VB 10_1101-2021_02_08_430070 21 16 dealt deal VBN 10_1101-2021_02_08_430070 21 17 with with IN 10_1101-2021_02_08_430070 21 18 in in IN 10_1101-2021_02_08_430070 21 19 the the DT 10_1101-2021_02_08_430070 21 20 data datum NNS 10_1101-2021_02_08_430070 21 21 analysis analysis NN 10_1101-2021_02_08_430070 21 22 . . . 
10_1101-2021_02_08_430070 22 1 WGBS WGBS NNP 10_1101-2021_02_08_430070 22 2 can can MD 10_1101-2021_02_08_430070 22 3 only only RB 10_1101-2021_02_08_430070 22 4 be be VB 10_1101-2021_02_08_430070 22 5 used use VBN 10_1101-2021_02_08_430070 22 6 for for IN 10_1101-2021_02_08_430070 22 7 DNA dna NN 10_1101-2021_02_08_430070 22 8 samples sample NNS 10_1101-2021_02_08_430070 22 9 , , , 10_1101-2021_02_08_430070 22 10 which which WDT 10_1101-2021_02_08_430070 22 11 limits limit VBZ 10_1101-2021_02_08_430070 22 12 its -PRON- PRP$ 10_1101-2021_02_08_430070 22 13 application application NN 10_1101-2021_02_08_430070 22 14 of of IN 10_1101-2021_02_08_430070 22 15 detecting detect VBG 10_1101-2021_02_08_430070 22 16 RNA RNA NNP 10_1101-2021_02_08_430070 22 17 methylation methylation NN 10_1101-2021_02_08_430070 22 18 . . . 10_1101-2021_02_08_430070 23 1 Single single JJ 10_1101-2021_02_08_430070 23 2 - - HYPH 10_1101-2021_02_08_430070 23 3 molecule molecule NN 10_1101-2021_02_08_430070 23 4 sequencing sequencing NN 10_1101-2021_02_08_430070 23 5 ( ( -LRB- 10_1101-2021_02_08_430070 23 6 e.g. e.g. 
RB 10_1101-2021_02_08_430070 23 7 , , , 10_1101-2021_02_08_430070 23 8 PacBio PacBio NNP 10_1101-2021_02_08_430070 23 9 and and CC 10_1101-2021_02_08_430070 23 10 Nanopore Nanopore NNP 10_1101-2021_02_08_430070 23 11 ) ) -RRB- 10_1101-2021_02_08_430070 23 12 provides provide VBZ 10_1101-2021_02_08_430070 23 13 a a DT 10_1101-2021_02_08_430070 23 14 promising promising JJ 10_1101-2021_02_08_430070 23 15 approach approach NN 10_1101-2021_02_08_430070 23 16 through through IN 10_1101-2021_02_08_430070 23 17 detecting detect VBG 10_1101-2021_02_08_430070 23 18 abnormal abnormal JJ 10_1101-2021_02_08_430070 23 19 signals signal NNS 10_1101-2021_02_08_430070 23 20 in in IN 10_1101-2021_02_08_430070 23 21 target target NN 10_1101-2021_02_08_430070 23 22 motif motif NN 10_1101-2021_02_08_430070 23 23 regions region NNS 10_1101-2021_02_08_430070 23 24 , , , 10_1101-2021_02_08_430070 23 25 as as IN 10_1101-2021_02_08_430070 23 26 modified modify VBN 10_1101-2021_02_08_430070 23 27 bases basis NNS 10_1101-2021_02_08_430070 23 28 usually usually RB 10_1101-2021_02_08_430070 23 29 have have VBP 10_1101-2021_02_08_430070 23 30 different different JJ 10_1101-2021_02_08_430070 23 31 current current JJ 10_1101-2021_02_08_430070 23 32 signals signal NNS 10_1101-2021_02_08_430070 23 33 . . . 
10_1101-2021_02_08_430070 24 1 Compared compare VBN 10_1101-2021_02_08_430070 24 2 with with IN 10_1101-2021_02_08_430070 24 3 the the DT 10_1101-2021_02_08_430070 24 4 sodium sodium NN 10_1101-2021_02_08_430070 24 5 bisulfite bisulfite NN 10_1101-2021_02_08_430070 24 6 approach approach NN 10_1101-2021_02_08_430070 24 7 , , , 10_1101-2021_02_08_430070 24 8 no no DT 10_1101-2021_02_08_430070 24 9 extra extra JJ 10_1101-2021_02_08_430070 24 10 chemical chemical NN 10_1101-2021_02_08_430070 24 11 treatment treatment NN 10_1101-2021_02_08_430070 24 12 is be VBZ 10_1101-2021_02_08_430070 24 13 required require VBN 10_1101-2021_02_08_430070 24 14 , , , 10_1101-2021_02_08_430070 24 15 which which WDT 10_1101-2021_02_08_430070 24 16 helps help VBZ 10_1101-2021_02_08_430070 24 17 to to TO 10_1101-2021_02_08_430070 24 18 reduce reduce VB 10_1101-2021_02_08_430070 24 19 potential potential JJ 10_1101-2021_02_08_430070 24 20 biases bias NNS 10_1101-2021_02_08_430070 24 21 . . . 10_1101-2021_02_08_430070 25 1 Currently currently RB 10_1101-2021_02_08_430070 25 2 exist exist VBP 10_1101-2021_02_08_430070 25 3 nanopore nanopore JJ 10_1101-2021_02_08_430070 25 4 methylation methylation NN 10_1101-2021_02_08_430070 25 5 detection detection NN 10_1101-2021_02_08_430070 25 6 methods method NNS 10_1101-2021_02_08_430070 25 7 can can MD 10_1101-2021_02_08_430070 25 8 be be VB 10_1101-2021_02_08_430070 25 9 categorized categorize VBN 10_1101-2021_02_08_430070 25 10 into into IN 10_1101-2021_02_08_430070 25 11 two two CD 10_1101-2021_02_08_430070 25 12 types type NNS 10_1101-2021_02_08_430070 25 13 . . . 10_1101-2021_02_08_430070 26 1 One one CD 10_1101-2021_02_08_430070 26 2 is be VBZ 10_1101-2021_02_08_430070 26 3 testing testing NN 10_1101-2021_02_08_430070 26 4 - - HYPH 10_1101-2021_02_08_430070 26 5 based base VBN 10_1101-2021_02_08_430070 26 6 ( ( -LRB- 10_1101-2021_02_08_430070 26 7 e.g e.g NNP 10_1101-2021_02_08_430070 26 8 . . 
NNP 10_1101-2021_02_08_430070 26 9 ,Tombo ,Tombo , 10_1101-2021_02_08_430070 26 10 ( ( -LRB- 10_1101-2021_02_08_430070 26 11 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 26 12 et et FW 10_1101-2021_02_08_430070 26 13 al al NNP 10_1101-2021_02_08_430070 26 14 . . NNP 10_1101-2021_02_08_430070 26 15 , , , 10_1101-2021_02_08_430070 26 16 2016 2016 CD 10_1101-2021_02_08_430070 26 17 ) ) -RRB- 10_1101-2021_02_08_430070 26 18 ) ) -RRB- 10_1101-2021_02_08_430070 26 19 , , , 10_1101-2021_02_08_430070 26 20 the the DT 10_1101-2021_02_08_430070 26 21 other other JJ 10_1101-2021_02_08_430070 26 22 is be VBZ 10_1101-2021_02_08_430070 26 23 model model NN 10_1101-2021_02_08_430070 26 24 - - HYPH 10_1101-2021_02_08_430070 26 25 based base VBN 10_1101-2021_02_08_430070 26 26 ( ( -LRB- 10_1101-2021_02_08_430070 26 27 e.g. e.g. RB 10_1101-2021_02_08_430070 26 28 , , , 10_1101-2021_02_08_430070 26 29 nanopolish nanopolish JJ 10_1101-2021_02_08_430070 26 30 ( ( -LRB- 10_1101-2021_02_08_430070 26 31 Simpson Simpson NNP 10_1101-2021_02_08_430070 26 32 et et FW 10_1101-2021_02_08_430070 26 33 al al NNP 10_1101-2021_02_08_430070 26 34 . . NNP 10_1101-2021_02_08_430070 26 35 , , , 10_1101-2021_02_08_430070 26 36 2017 2017 CD 10_1101-2021_02_08_430070 26 37 ) ) -RRB- 10_1101-2021_02_08_430070 26 38 , , , 10_1101-2021_02_08_430070 26 39 deepMod(Liu deepMod(Liu NNP 10_1101-2021_02_08_430070 26 40 et et NNP 10_1101-2021_02_08_430070 26 41 al al NNP 10_1101-2021_02_08_430070 26 42 . . NNP 10_1101-2021_02_08_430070 26 43 , , , 10_1101-2021_02_08_430070 26 44 2019 2019 CD 10_1101-2021_02_08_430070 26 45 ) ) -RRB- 10_1101-2021_02_08_430070 26 46 and and CC 10_1101-2021_02_08_430070 26 47 deepSignal deepsignal JJ 10_1101-2021_02_08_430070 26 48 ( ( -LRB- 10_1101-2021_02_08_430070 26 49 Ni Ni NNP 10_1101-2021_02_08_430070 26 50 et et FW 10_1101-2021_02_08_430070 26 51 al al NNP 10_1101-2021_02_08_430070 26 52 . . 
NNP 10_1101-2021_02_08_430070 26 53 , , , 10_1101-2021_02_08_430070 26 54 2019 2019 CD 10_1101-2021_02_08_430070 26 55 ) ) -RRB- 10_1101-2021_02_08_430070 26 56 ) ) -RRB- 10_1101-2021_02_08_430070 26 57 . . . 10_1101-2021_02_08_430070 27 1 A a DT 10_1101-2021_02_08_430070 27 2 testing- testing- NN 10_1101-2021_02_08_430070 27 3 based base VBN 10_1101-2021_02_08_430070 27 4 approach approach NN 10_1101-2021_02_08_430070 27 5 performs perform VBZ 10_1101-2021_02_08_430070 27 6 statistical statistical JJ 10_1101-2021_02_08_430070 27 7 test test NN 10_1101-2021_02_08_430070 27 8 on on IN 10_1101-2021_02_08_430070 27 9 paired pair VBN 10_1101-2021_02_08_430070 27 10 signals signal NNS 10_1101-2021_02_08_430070 27 11 ( ( -LRB- 10_1101-2021_02_08_430070 27 12 candidate candidate NN 10_1101-2021_02_08_430070 27 13 and and CC 10_1101-2021_02_08_430070 27 14 reference reference NN 10_1101-2021_02_08_430070 27 15 ) ) -RRB- 10_1101-2021_02_08_430070 27 16 and and CC 10_1101-2021_02_08_430070 27 17 does do VBZ 10_1101-2021_02_08_430070 27 18 not not RB 10_1101-2021_02_08_430070 27 19 require require VB 10_1101-2021_02_08_430070 27 20 any any DT 10_1101-2021_02_08_430070 27 21 training training NN 10_1101-2021_02_08_430070 27 22 process process NN 10_1101-2021_02_08_430070 27 23 . . . 10_1101-2021_02_08_430070 28 1 Also also RB 10_1101-2021_02_08_430070 28 2 , , , 10_1101-2021_02_08_430070 28 3 it -PRON- PRP 10_1101-2021_02_08_430070 28 4 can can MD 10_1101-2021_02_08_430070 28 5 be be VB 10_1101-2021_02_08_430070 28 6 applied apply VBN 10_1101-2021_02_08_430070 28 7 for for IN 10_1101-2021_02_08_430070 28 8 any any DT 10_1101-2021_02_08_430070 28 9 chemical chemical NN 10_1101-2021_02_08_430070 28 10 modifications modification NNS 10_1101-2021_02_08_430070 28 11 . . . 
10_1101-2021_02_08_430070 29 1 A a DT 10_1101-2021_02_08_430070 29 2 model model NN 10_1101-2021_02_08_430070 29 3 - - HYPH 10_1101-2021_02_08_430070 29 4 based base VBN 10_1101-2021_02_08_430070 29 5 approach approach NN 10_1101-2021_02_08_430070 29 6 trains train VBZ 10_1101-2021_02_08_430070 29 7 a a DT 10_1101-2021_02_08_430070 29 8 model model NN 10_1101-2021_02_08_430070 32 41 ( ( -LRB- 10_1101-2021_02_08_430070 32 42 a a DT 10_1101-2021_02_08_430070 32 43 ) ) -RRB- 10_1101-2021_02_08_430070 32 44 . . . 10_1101-2021_02_08_430070 33 1 Basic basic JJ 10_1101-2021_02_08_430070 33 2 BERT BERT NNP 10_1101-2021_02_08_430070 33 3 for for IN 10_1101-2021_02_08_430070 33 4 methyaltion methyaltion NN 10_1101-2021_02_08_430070 33 5 detection detection NN 10_1101-2021_02_08_430070 33 6 ( ( -LRB- 10_1101-2021_02_08_430070 33 7 b b NN 10_1101-2021_02_08_430070 33 8 ) ) -RRB- 10_1101-2021_02_08_430070 33 9 . . . 10_1101-2021_02_08_430070 34 1 Refined refined JJ 10_1101-2021_02_08_430070 34 2 BERT BERT NNP 10_1101-2021_02_08_430070 34 3 with with IN 10_1101-2021_02_08_430070 34 4 relative relative JJ 10_1101-2021_02_08_430070 34 5 position position NN 10_1101-2021_02_08_430070 34 6 representation representation NN 10_1101-2021_02_08_430070 34 7 Fig Fig NNP 10_1101-2021_02_08_430070 34 8 . . . 10_1101-2021_02_08_430070 35 1 1 1 LS 10_1101-2021_02_08_430070 35 2 : : : 10_1101-2021_02_08_430070 35 3 Basic Basic NNP 10_1101-2021_02_08_430070 35 4 BERT BERT NNP 10_1101-2021_02_08_430070 35 5 ’s ’s POS 10_1101-2021_02_08_430070 35 6 and and CC 10_1101-2021_02_08_430070 35 7 refined refine VBD 10_1101-2021_02_08_430070 35 8 BERT BERT NNP 10_1101-2021_02_08_430070 35 9 ’s ’s POS 10_1101-2021_02_08_430070 35 10 model model NN 10_1101-2021_02_08_430070 35 11 structure structure NN 10_1101-2021_02_08_430070 35 12 used use VBN 10_1101-2021_02_08_430070 35 13 for for IN 10_1101-2021_02_08_430070 35 14 methylation methylation NN 10_1101-2021_02_08_430070 35 15 detection detection NN 10_1101-2021_02_08_430070 35 16 . . . 
10_1101-2021_02_08_430070 36 1 Compared compare VBN 10_1101-2021_02_08_430070 36 2 with with IN 10_1101-2021_02_08_430070 36 3 the the DT 10_1101-2021_02_08_430070 36 4 basic basic JJ 10_1101-2021_02_08_430070 36 5 BERT BERT NNP 10_1101-2021_02_08_430070 36 6 , , , 10_1101-2021_02_08_430070 36 7 enhanced enhanced JJ 10_1101-2021_02_08_430070 36 8 constraints constraint NNS 10_1101-2021_02_08_430070 36 9 and and CC 10_1101-2021_02_08_430070 36 10 additional additional JJ 10_1101-2021_02_08_430070 36 11 edges edge NNS 10_1101-2021_02_08_430070 36 12 are be VBP 10_1101-2021_02_08_430070 36 13 highlighted highlight VBN 10_1101-2021_02_08_430070 36 14 in in IN 10_1101-2021_02_08_430070 36 15 red red JJ 10_1101-2021_02_08_430070 36 16 color color NN 10_1101-2021_02_08_430070 36 17 . . . 10_1101-2021_02_08_430070 37 1 on on IN 10_1101-2021_02_08_430070 37 2 known know VBN 10_1101-2021_02_08_430070 37 3 chemical chemical NN 10_1101-2021_02_08_430070 37 4 modifications modification NNS 10_1101-2021_02_08_430070 37 5 and and CC 10_1101-2021_02_08_430070 37 6 makes make VBZ 10_1101-2021_02_08_430070 37 7 predictions prediction NNS 10_1101-2021_02_08_430070 37 8 whether whether IN 10_1101-2021_02_08_430070 37 9 a a DT 10_1101-2021_02_08_430070 37 10 signal signal JJ 10_1101-2021_02_08_430070 37 11 sequence sequence NN 10_1101-2021_02_08_430070 37 12 contains contain VBZ 10_1101-2021_02_08_430070 37 13 methylation methylation NN 10_1101-2021_02_08_430070 37 14 signals signal NNS 10_1101-2021_02_08_430070 37 15 or or CC 10_1101-2021_02_08_430070 37 16 not not RB 10_1101-2021_02_08_430070 37 17 . . . 
10_1101-2021_02_08_430070 38 1 Sequential sequential JJ 10_1101-2021_02_08_430070 38 2 models model NNS 10_1101-2021_02_08_430070 38 3 , , , 10_1101-2021_02_08_430070 38 4 such such JJ 10_1101-2021_02_08_430070 38 5 as as IN 10_1101-2021_02_08_430070 38 6 hidden hide VBN 10_1101-2021_02_08_430070 38 7 Markov Markov NNP 10_1101-2021_02_08_430070 38 8 model model NN 10_1101-2021_02_08_430070 38 9 ( ( -LRB- 10_1101-2021_02_08_430070 38 10 HMM HMM NNP 10_1101-2021_02_08_430070 38 11 ) ) -RRB- 10_1101-2021_02_08_430070 38 12 and and CC 10_1101-2021_02_08_430070 38 13 bidirectional bidirectional JJ 10_1101-2021_02_08_430070 38 14 recurrent recurrent NN 10_1101-2021_02_08_430070 38 15 neural neural JJ 10_1101-2021_02_08_430070 38 16 network network NN 10_1101-2021_02_08_430070 38 17 ( ( -LRB- 10_1101-2021_02_08_430070 38 18 biRNN biRNN NNP 10_1101-2021_02_08_430070 38 19 ) ) -RRB- 10_1101-2021_02_08_430070 38 20 , , , 10_1101-2021_02_08_430070 38 21 are be VBP 10_1101-2021_02_08_430070 38 22 commonly commonly RB 10_1101-2021_02_08_430070 38 23 used use VBN 10_1101-2021_02_08_430070 38 24 in in IN 10_1101-2021_02_08_430070 38 25 the the DT 10_1101-2021_02_08_430070 38 26 model model NN 10_1101-2021_02_08_430070 38 27 - - HYPH 10_1101-2021_02_08_430070 38 28 based base VBN 10_1101-2021_02_08_430070 38 29 approach approach NN 10_1101-2021_02_08_430070 38 30 . . . 
10_1101-2021_02_08_430070 39 1 Although although IN 10_1101-2021_02_08_430070 39 2 model model NN 10_1101-2021_02_08_430070 39 3 - - HYPH 10_1101-2021_02_08_430070 39 4 based base VBN 10_1101-2021_02_08_430070 39 5 approaches approach NNS 10_1101-2021_02_08_430070 39 6 have have VBP 10_1101-2021_02_08_430070 39 7 already already RB 10_1101-2021_02_08_430070 39 8 achieved achieve VBN 10_1101-2021_02_08_430070 39 9 competitive competitive JJ 10_1101-2021_02_08_430070 39 10 results result NNS 10_1101-2021_02_08_430070 39 11 , , , 10_1101-2021_02_08_430070 39 12 the the DT 10_1101-2021_02_08_430070 39 13 sequential sequential JJ 10_1101-2021_02_08_430070 39 14 computational computational JJ 10_1101-2021_02_08_430070 39 15 order order NN 10_1101-2021_02_08_430070 39 16 makes make VBZ 10_1101-2021_02_08_430070 39 17 them -PRON- PRP 10_1101-2021_02_08_430070 39 18 difficult difficult JJ 10_1101-2021_02_08_430070 39 19 to to TO 10_1101-2021_02_08_430070 39 20 be be VB 10_1101-2021_02_08_430070 39 21 optimized optimize VBN 10_1101-2021_02_08_430070 39 22 in in IN 10_1101-2021_02_08_430070 39 23 parallel parallel NN 10_1101-2021_02_08_430070 39 24 for for IN 10_1101-2021_02_08_430070 39 25 fast fast JJ 10_1101-2021_02_08_430070 39 26 inference inference NN 10_1101-2021_02_08_430070 39 27 . . . 
10_1101-2021_02_08_430070 40 1 Meanwhile meanwhile RB 10_1101-2021_02_08_430070 40 2 , , , 10_1101-2021_02_08_430070 40 3 finding find VBG 10_1101-2021_02_08_430070 40 4 discriminative discriminative JJ 10_1101-2021_02_08_430070 40 5 signal signal NN 10_1101-2021_02_08_430070 40 6 patterns pattern NNS 10_1101-2021_02_08_430070 40 7 for for IN 10_1101-2021_02_08_430070 40 8 identifying identify VBG 10_1101-2021_02_08_430070 40 9 methylated methylate VBN 10_1101-2021_02_08_430070 40 10 signals signal NNS 10_1101-2021_02_08_430070 40 11 is be VBZ 10_1101-2021_02_08_430070 40 12 also also RB 10_1101-2021_02_08_430070 40 13 important important JJ 10_1101-2021_02_08_430070 40 14 for for IN 10_1101-2021_02_08_430070 40 15 developing develop VBG 10_1101-2021_02_08_430070 40 16 novel novel JJ 10_1101-2021_02_08_430070 40 17 detection detection NN 10_1101-2021_02_08_430070 40 18 algorithms algorithm NNS 10_1101-2021_02_08_430070 40 19 . . . 10_1101-2021_02_08_430070 41 1 In in IN 10_1101-2021_02_08_430070 41 2 this this DT 10_1101-2021_02_08_430070 41 3 work work NN 10_1101-2021_02_08_430070 41 4 , , , 10_1101-2021_02_08_430070 41 5 based base VBN 10_1101-2021_02_08_430070 41 6 on on IN 10_1101-2021_02_08_430070 41 7 the the DT 10_1101-2021_02_08_430070 41 8 bidirectional bidirectional JJ 10_1101-2021_02_08_430070 41 9 encoder encoder NN 10_1101-2021_02_08_430070 41 10 representations representation NNS 10_1101-2021_02_08_430070 41 11 from from IN 10_1101-2021_02_08_430070 41 12 transformers transformer NNS 10_1101-2021_02_08_430070 41 13 ( ( -LRB- 10_1101-2021_02_08_430070 41 14 BERT BERT NNP 10_1101-2021_02_08_430070 41 15 ) ) -RRB- 10_1101-2021_02_08_430070 41 16 , , , 10_1101-2021_02_08_430070 41 17 we -PRON- PRP 10_1101-2021_02_08_430070 41 18 explore explore VBP 10_1101-2021_02_08_430070 41 19 the the DT 10_1101-2021_02_08_430070 41 20 non non JJ 10_1101-2021_02_08_430070 41 21 - - JJ 10_1101-2021_02_08_430070 41 22 recurrent recurrent JJ 10_1101-2021_02_08_430070 41 
23 modeling modeling NN 10_1101-2021_02_08_430070 41 24 approach approach NN 10_1101-2021_02_08_430070 41 25 for for IN 10_1101-2021_02_08_430070 41 26 nanopore nanopore JJ 10_1101-2021_02_08_430070 41 27 methylation methylation NN 10_1101-2021_02_08_430070 41 28 detection detection NN 10_1101-2021_02_08_430070 41 29 . . . 10_1101-2021_02_08_430070 42 1 Though though IN 10_1101-2021_02_08_430070 42 2 analyzing analyze VBG 10_1101-2021_02_08_430070 42 3 nucleotide nucleotide JJ 10_1101-2021_02_08_430070 42 4 sequences sequence NNS 10_1101-2021_02_08_430070 42 5 with with IN 10_1101-2021_02_08_430070 42 6 both both CC 10_1101-2021_02_08_430070 42 7 methylated methylated JJ 10_1101-2021_02_08_430070 42 8 and and CC 10_1101-2021_02_08_430070 42 9 unmethylated unmethylated JJ 10_1101-2021_02_08_430070 42 10 signals signal NNS 10_1101-2021_02_08_430070 42 11 , , , 10_1101-2021_02_08_430070 42 12 we -PRON- PRP 10_1101-2021_02_08_430070 42 13 profile profile VBP 10_1101-2021_02_08_430070 42 14 positional positional JJ 10_1101-2021_02_08_430070 42 15 signal signal NN 10_1101-2021_02_08_430070 42 16 - - HYPH 10_1101-2021_02_08_430070 42 17 shift shift NN 10_1101-2021_02_08_430070 42 18 for for IN 10_1101-2021_02_08_430070 42 19 different different JJ 10_1101-2021_02_08_430070 42 20 motifs motif NNS 10_1101-2021_02_08_430070 42 21 and and CC 10_1101-2021_02_08_430070 42 22 methyltransferases methyltransferase NNS 10_1101-2021_02_08_430070 42 23 . . . 
10_1101-2021_02_08_430070 43 1 We -PRON- PRP 10_1101-2021_02_08_430070 43 2 find find VBP 10_1101-2021_02_08_430070 43 3 ±3bp ±3bp IN 10_1101-2021_02_08_430070 43 4 region region NN 10_1101-2021_02_08_430070 43 5 surrounding surround VBG 10_1101-2021_02_08_430070 43 6 the the DT 10_1101-2021_02_08_430070 43 7 center center NN 10_1101-2021_02_08_430070 43 8 methylation methylation NN 10_1101-2021_02_08_430070 43 9 candidate candidate NN 10_1101-2021_02_08_430070 43 10 shows show VBZ 10_1101-2021_02_08_430070 43 11 significant significant JJ 10_1101-2021_02_08_430070 43 12 signal signal NN 10_1101-2021_02_08_430070 43 13 - - HYPH 10_1101-2021_02_08_430070 43 14 shifts shift NNS 10_1101-2021_02_08_430070 43 15 . . . 10_1101-2021_02_08_430070 44 1 Different different JJ 10_1101-2021_02_08_430070 44 2 methylation methylation NN 10_1101-2021_02_08_430070 44 3 types type NNS 10_1101-2021_02_08_430070 44 4 , , , 10_1101-2021_02_08_430070 44 5 such such JJ 10_1101-2021_02_08_430070 44 6 as as IN 10_1101-2021_02_08_430070 44 7 5-methylcytosine 5-methylcytosine CD 10_1101-2021_02_08_430070 44 8 ( ( -LRB- 10_1101-2021_02_08_430070 44 9 5mC 5mc CD 10_1101-2021_02_08_430070 44 10 ) ) -RRB- 10_1101-2021_02_08_430070 44 11 and and CC 10_1101-2021_02_08_430070 44 12 N6-methyladenine N6-methyladenine NNP 10_1101-2021_02_08_430070 44 13 ( ( -LRB- 10_1101-2021_02_08_430070 44 14 6mA 6ma CD 10_1101-2021_02_08_430070 44 15 ) ) -RRB- 10_1101-2021_02_08_430070 44 16 , , , 10_1101-2021_02_08_430070 44 17 also also RB 10_1101-2021_02_08_430070 44 18 demonstrate demonstrate VBP 10_1101-2021_02_08_430070 44 19 different different JJ 10_1101-2021_02_08_430070 44 20 signal signal JJ 10_1101-2021_02_08_430070 44 21 - - HYPH 10_1101-2021_02_08_430070 44 22 shift shift NN 10_1101-2021_02_08_430070 44 23 patterns pattern NNS 10_1101-2021_02_08_430070 44 24 . . . 
10_1101-2021_02_08_430070 45 1 We -PRON- PRP 10_1101-2021_02_08_430070 45 2 hence hence RB 10_1101-2021_02_08_430070 45 3 propose propose VBP 10_1101-2021_02_08_430070 45 4 a a DT 10_1101-2021_02_08_430070 45 5 refined refined JJ 10_1101-2021_02_08_430070 45 6 BERT BERT NNP 10_1101-2021_02_08_430070 45 7 model model NN 10_1101-2021_02_08_430070 45 8 to to TO 10_1101-2021_02_08_430070 45 9 take take VB 10_1101-2021_02_08_430070 45 10 account account NN 10_1101-2021_02_08_430070 45 11 of of IN 10_1101-2021_02_08_430070 45 12 signal signal JJ 10_1101-2021_02_08_430070 45 13 - - HYPH 10_1101-2021_02_08_430070 45 14 shift shift NN 10_1101-2021_02_08_430070 45 15 patterns pattern NNS 10_1101-2021_02_08_430070 45 16 in in IN 10_1101-2021_02_08_430070 45 17 the the DT 10_1101-2021_02_08_430070 45 18 modeling modeling NN 10_1101-2021_02_08_430070 45 19 . . . 10_1101-2021_02_08_430070 46 1 We -PRON- PRP 10_1101-2021_02_08_430070 46 2 evaluate evaluate VBP 10_1101-2021_02_08_430070 46 3 the the DT 10_1101-2021_02_08_430070 46 4 proposed propose VBN 10_1101-2021_02_08_430070 46 5 methods method NNS 10_1101-2021_02_08_430070 46 6 on on IN 10_1101-2021_02_08_430070 46 7 the the DT 10_1101-2021_02_08_430070 46 8 publicly publicly RB 10_1101-2021_02_08_430070 46 9 available available JJ 10_1101-2021_02_08_430070 46 10 benchmark benchmark NN 10_1101-2021_02_08_430070 46 11 dataset dataset NN 10_1101-2021_02_08_430070 46 12 . . . 
10_1101-2021_02_08_430070 47 1 In in IN 10_1101-2021_02_08_430070 47 2 both both DT 10_1101-2021_02_08_430070 47 3 in in IN 10_1101-2021_02_08_430070 47 4 - - HYPH 10_1101-2021_02_08_430070 47 5 sample sample NN 10_1101-2021_02_08_430070 47 6 and and CC 10_1101-2021_02_08_430070 47 7 cross cross JJ 10_1101-2021_02_08_430070 47 8 - - JJ 10_1101-2021_02_08_430070 47 9 sample sample JJ 10_1101-2021_02_08_430070 47 10 evaluation evaluation NN 10_1101-2021_02_08_430070 47 11 , , , 10_1101-2021_02_08_430070 47 12 the the DT 10_1101-2021_02_08_430070 47 13 proposed propose VBN 10_1101-2021_02_08_430070 47 14 refined refine VBN 10_1101-2021_02_08_430070 47 15 BERT BERT NNP 10_1101-2021_02_08_430070 47 16 model model NN 10_1101-2021_02_08_430070 47 17 achieves achieve VBZ 10_1101-2021_02_08_430070 47 18 a a DT 10_1101-2021_02_08_430070 47 19 competitive competitive JJ 10_1101-2021_02_08_430070 47 20 or or CC 10_1101-2021_02_08_430070 47 21 even even RB 10_1101-2021_02_08_430070 47 22 better well JJR 10_1101-2021_02_08_430070 47 23 result result VB 10_1101-2021_02_08_430070 47 24 when when WRB 10_1101-2021_02_08_430070 47 25 compared compare VBN 10_1101-2021_02_08_430070 47 26 with with IN 10_1101-2021_02_08_430070 47 27 the the DT 10_1101-2021_02_08_430070 47 28 state state NN 10_1101-2021_02_08_430070 47 29 - - HYPH 10_1101-2021_02_08_430070 47 30 of of IN 10_1101-2021_02_08_430070 47 31 - - HYPH 10_1101-2021_02_08_430070 47 32 the the DT 10_1101-2021_02_08_430070 47 33 - - HYPH 10_1101-2021_02_08_430070 47 34 art art NN 10_1101-2021_02_08_430070 47 35 biRNN birnn NN 10_1101-2021_02_08_430070 47 36 model model NN 10_1101-2021_02_08_430070 47 37 , , , 10_1101-2021_02_08_430070 47 38 while while IN 10_1101-2021_02_08_430070 47 39 its -PRON- PRP$ 10_1101-2021_02_08_430070 47 40 model model NN 10_1101-2021_02_08_430070 47 41 inference inference NN 10_1101-2021_02_08_430070 47 42 speed speed NN 10_1101-2021_02_08_430070 47 43 is be VBZ 10_1101-2021_02_08_430070 47 44 about 
about RB 10_1101-2021_02_08_430070 47 45 6x 6x CD 10_1101-2021_02_08_430070 47 46 faster fast RBR 10_1101-2021_02_08_430070 47 47 . . . 10_1101-2021_02_08_430070 48 1 In in IN 10_1101-2021_02_08_430070 48 2 the the DT 10_1101-2021_02_08_430070 48 3 cross cross JJ 10_1101-2021_02_08_430070 48 4 - - JJ 10_1101-2021_02_08_430070 48 5 sample sample JJ 10_1101-2021_02_08_430070 48 6 evaluation evaluation NN 10_1101-2021_02_08_430070 48 7 , , , 10_1101-2021_02_08_430070 48 8 BERT BERT NNP 10_1101-2021_02_08_430070 48 9 models model NNS 10_1101-2021_02_08_430070 48 10 also also RB 10_1101-2021_02_08_430070 48 11 demonstrate demonstrate VBP 10_1101-2021_02_08_430070 48 12 their -PRON- PRP$ 10_1101-2021_02_08_430070 48 13 transfer transfer NN 10_1101-2021_02_08_430070 48 14 learning learn VBG 10_1101-2021_02_08_430070 48 15 ability ability NN 10_1101-2021_02_08_430070 48 16 across across IN 10_1101-2021_02_08_430070 48 17 different different JJ 10_1101-2021_02_08_430070 48 18 datasets dataset NNS 10_1101-2021_02_08_430070 48 19 . . . 10_1101-2021_02_08_430070 49 1 2 2 CD 10_1101-2021_02_08_430070 49 2 Methods Methods NNPS 10_1101-2021_02_08_430070 49 3 In in IN 10_1101-2021_02_08_430070 49 4 this this DT 10_1101-2021_02_08_430070 49 5 section section NN 10_1101-2021_02_08_430070 49 6 , , , 10_1101-2021_02_08_430070 49 7 we -PRON- PRP 10_1101-2021_02_08_430070 49 8 introduce introduce VBP 10_1101-2021_02_08_430070 49 9 BERT BERT NNP 10_1101-2021_02_08_430070 49 10 ( ( -LRB- 10_1101-2021_02_08_430070 49 11 Devlin Devlin NNP 10_1101-2021_02_08_430070 49 12 et et NNP 10_1101-2021_02_08_430070 49 13 al al NNP 10_1101-2021_02_08_430070 49 14 . . 
NNP 10_1101-2021_02_08_430070 49 15 , , , 10_1101-2021_02_08_430070 49 16 2018 2018 CD 10_1101-2021_02_08_430070 49 17 ) ) -RRB- 10_1101-2021_02_08_430070 49 18 and and CC 10_1101-2021_02_08_430070 49 19 refined refine VBN 10_1101-2021_02_08_430070 49 20 BERT BERT NNP 10_1101-2021_02_08_430070 49 21 applied apply VBN 10_1101-2021_02_08_430070 49 22 for for IN 10_1101-2021_02_08_430070 49 23 nanopore nanopore JJ 10_1101-2021_02_08_430070 49 24 methylation methylation NN 10_1101-2021_02_08_430070 49 25 detection detection NN 10_1101-2021_02_08_430070 49 26 . . . 10_1101-2021_02_08_430070 50 1 The the DT 10_1101-2021_02_08_430070 50 2 BERT BERT NNP 10_1101-2021_02_08_430070 50 3 is be VBZ 10_1101-2021_02_08_430070 50 4 built build VBN 10_1101-2021_02_08_430070 50 5 on on IN 10_1101-2021_02_08_430070 50 6 the the DT 10_1101-2021_02_08_430070 50 7 base base NN 10_1101-2021_02_08_430070 50 8 of of IN 10_1101-2021_02_08_430070 50 9 Transformer Transformer NNP 10_1101-2021_02_08_430070 50 10 ( ( -LRB- 10_1101-2021_02_08_430070 50 11 Vaswani Vaswani NNP 10_1101-2021_02_08_430070 50 12 et et FW 10_1101-2021_02_08_430070 50 13 al al NNP 10_1101-2021_02_08_430070 50 14 . . 
NNP 10_1101-2021_02_08_430070 50 15 , , , 10_1101-2021_02_08_430070 50 16 2017 2017 CD 10_1101-2021_02_08_430070 50 17 ) ) -RRB- 10_1101-2021_02_08_430070 50 18 , , , 10_1101-2021_02_08_430070 50 19 which which WDT 10_1101-2021_02_08_430070 50 20 employs employ VBZ 10_1101-2021_02_08_430070 50 21 self self NN 10_1101-2021_02_08_430070 50 22 - - HYPH 10_1101-2021_02_08_430070 50 23 attention attention NN 10_1101-2021_02_08_430070 50 24 as as IN 10_1101-2021_02_08_430070 50 25 the the DT 10_1101-2021_02_08_430070 50 26 core core NN 10_1101-2021_02_08_430070 50 27 module module NN 10_1101-2021_02_08_430070 50 28 in in IN 10_1101-2021_02_08_430070 50 29 its -PRON- PRP$ 10_1101-2021_02_08_430070 50 30 stacked stack VBN 10_1101-2021_02_08_430070 50 31 network network NN 10_1101-2021_02_08_430070 50 32 structure structure NN 10_1101-2021_02_08_430070 50 33 . . . 10_1101-2021_02_08_430070 51 1 It -PRON- PRP 10_1101-2021_02_08_430070 51 2 is be VBZ 10_1101-2021_02_08_430070 51 3 proposed propose VBN 10_1101-2021_02_08_430070 51 4 to to TO 10_1101-2021_02_08_430070 51 5 replace replace VB 10_1101-2021_02_08_430070 51 6 recurrent recurrent NN 10_1101-2021_02_08_430070 51 7 and and CC 10_1101-2021_02_08_430070 51 8 convolution convolution NN 10_1101-2021_02_08_430070 51 9 operation operation NN 10_1101-2021_02_08_430070 51 10 with with IN 10_1101-2021_02_08_430070 51 11 purely purely RB 10_1101-2021_02_08_430070 51 12 attention attention NN 10_1101-2021_02_08_430070 51 13 mechanisms mechanism NNS 10_1101-2021_02_08_430070 51 14 . . . 
10_1101-2021_02_08_430070 52 1 A a DT 10_1101-2021_02_08_430070 52 2 typical typical JJ 10_1101-2021_02_08_430070 52 3 transformer transformer NN 10_1101-2021_02_08_430070 52 4 network network NN 10_1101-2021_02_08_430070 52 5 consists consist VBZ 10_1101-2021_02_08_430070 52 6 of of IN 10_1101-2021_02_08_430070 52 7 encoding encode VBG 10_1101-2021_02_08_430070 52 8 and and CC 10_1101-2021_02_08_430070 52 9 decoding decode VBG 10_1101-2021_02_08_430070 52 10 module module NN 10_1101-2021_02_08_430070 52 11 . . . 10_1101-2021_02_08_430070 53 1 BERT BERT NNP 10_1101-2021_02_08_430070 53 2 only only RB 10_1101-2021_02_08_430070 53 3 uses use VBZ 10_1101-2021_02_08_430070 53 4 the the DT 10_1101-2021_02_08_430070 53 5 encoding encode VBG 10_1101-2021_02_08_430070 53 6 module module NN 10_1101-2021_02_08_430070 53 7 of of IN 10_1101-2021_02_08_430070 53 8 a a DT 10_1101-2021_02_08_430070 53 9 typical typical JJ 10_1101-2021_02_08_430070 53 10 transformer transformer NN 10_1101-2021_02_08_430070 53 11 for for IN 10_1101-2021_02_08_430070 53 12 pre- pre- NN 10_1101-2021_02_08_430070 53 13 training training NN 10_1101-2021_02_08_430070 53 14 on on IN 10_1101-2021_02_08_430070 53 15 the the DT 10_1101-2021_02_08_430070 53 16 unsupervised unsupervised JJ 10_1101-2021_02_08_430070 53 17 data datum NNS 10_1101-2021_02_08_430070 53 18 . . . 
10_1101-2021_02_08_430070 54 1 BERT BERT NNP 10_1101-2021_02_08_430070 54 2 has have VBZ 10_1101-2021_02_08_430070 54 3 achieved achieve VBN 10_1101-2021_02_08_430070 54 4 break break NN 10_1101-2021_02_08_430070 54 5 - - HYPH 10_1101-2021_02_08_430070 54 6 through through RP 10_1101-2021_02_08_430070 54 7 results result NNS 10_1101-2021_02_08_430070 54 8 on on IN 10_1101-2021_02_08_430070 54 9 many many JJ 10_1101-2021_02_08_430070 54 10 natural natural JJ 10_1101-2021_02_08_430070 54 11 language language NN 10_1101-2021_02_08_430070 54 12 understanding understanding NN 10_1101-2021_02_08_430070 54 13 tasks task NNS 10_1101-2021_02_08_430070 54 14 . . . 10_1101-2021_02_08_430070 55 1 In in IN 10_1101-2021_02_08_430070 55 2 this this DT 10_1101-2021_02_08_430070 55 3 work work NN 10_1101-2021_02_08_430070 55 4 , , , 10_1101-2021_02_08_430070 55 5 we -PRON- PRP 10_1101-2021_02_08_430070 55 6 explore explore VBP 10_1101-2021_02_08_430070 55 7 applying apply VBG 10_1101-2021_02_08_430070 55 8 the the DT 10_1101-2021_02_08_430070 55 9 BERT BERT NNP 10_1101-2021_02_08_430070 55 10 model model NN 10_1101-2021_02_08_430070 55 11 for for IN 10_1101-2021_02_08_430070 55 12 the the DT 10_1101-2021_02_08_430070 55 13 nanopore nanopore JJ 10_1101-2021_02_08_430070 55 14 methylation methylation NN 10_1101-2021_02_08_430070 55 15 detection detection NN 10_1101-2021_02_08_430070 55 16 task task NN 10_1101-2021_02_08_430070 55 17 to to TO 10_1101-2021_02_08_430070 55 18 leverage leverage VB 10_1101-2021_02_08_430070 55 19 the the DT 10_1101-2021_02_08_430070 55 20 power power NN 10_1101-2021_02_08_430070 55 21 of of IN 10_1101-2021_02_08_430070 55 22 advanced advanced JJ 10_1101-2021_02_08_430070 55 23 deep deep JJ 10_1101-2021_02_08_430070 55 24 learning learning NN 10_1101-2021_02_08_430070 55 25 models model NNS 10_1101-2021_02_08_430070 55 26 . . . 
10_1101-2021_02_08_430070 56 1 2.1 2.1 CD 10_1101-2021_02_08_430070 56 2 BERT BERT NNS 10_1101-2021_02_08_430070 56 3 and and CC 10_1101-2021_02_08_430070 56 4 refined refine VBN 10_1101-2021_02_08_430070 56 5 BERT BERT NNP 10_1101-2021_02_08_430070 56 6 model model NN 10_1101-2021_02_08_430070 56 7 Figure figure NN 10_1101-2021_02_08_430070 56 8 1 1 CD 10_1101-2021_02_08_430070 56 9 shows show VBZ 10_1101-2021_02_08_430070 56 10 the the DT 10_1101-2021_02_08_430070 56 11 model model NN 10_1101-2021_02_08_430070 56 12 structures structure NNS 10_1101-2021_02_08_430070 56 13 of of IN 10_1101-2021_02_08_430070 56 14 BERT BERT NNP 10_1101-2021_02_08_430070 56 15 models model NNS 10_1101-2021_02_08_430070 56 16 used use VBN 10_1101-2021_02_08_430070 56 17 for for IN 10_1101-2021_02_08_430070 56 18 nanopore nanopore JJ 10_1101-2021_02_08_430070 56 19 methylation methylation NN 10_1101-2021_02_08_430070 56 20 detection detection NN 10_1101-2021_02_08_430070 56 21 . . . 10_1101-2021_02_08_430070 57 1 We -PRON- PRP 10_1101-2021_02_08_430070 57 2 explore explore VBP 10_1101-2021_02_08_430070 57 3 two two CD 10_1101-2021_02_08_430070 57 4 types type NNS 10_1101-2021_02_08_430070 57 5 of of IN 10_1101-2021_02_08_430070 57 6 BERT BERT NNP 10_1101-2021_02_08_430070 57 7 models model NNS 10_1101-2021_02_08_430070 57 8 . . . 
10_1101-2021_02_08_430070 58 1 One one CD 10_1101-2021_02_08_430070 58 2 is be VBZ 10_1101-2021_02_08_430070 58 3 the the DT 10_1101-2021_02_08_430070 58 4 most most RBS 10_1101-2021_02_08_430070 58 5 commonly commonly RB 10_1101-2021_02_08_430070 58 6 used use VBN 10_1101-2021_02_08_430070 58 7 BERT BERT NNP 10_1101-2021_02_08_430070 58 8 ( ( -LRB- 10_1101-2021_02_08_430070 58 9 Figure figure NN 10_1101-2021_02_08_430070 58 10 1(a 1(a CD 10_1101-2021_02_08_430070 58 11 ) ) -RRB- 10_1101-2021_02_08_430070 58 12 ) ) -RRB- 10_1101-2021_02_08_430070 58 13 , , , 10_1101-2021_02_08_430070 58 14 the the DT 10_1101-2021_02_08_430070 58 15 other other JJ 10_1101-2021_02_08_430070 58 16 is be VBZ 10_1101-2021_02_08_430070 58 17 the the DT 10_1101-2021_02_08_430070 58 18 refined refined JJ 10_1101-2021_02_08_430070 58 19 BERT BERT NNP 10_1101-2021_02_08_430070 58 20 ( ( -LRB- 10_1101-2021_02_08_430070 58 21 Figure figure NN 10_1101-2021_02_08_430070 58 22 1(b 1(b CD 10_1101-2021_02_08_430070 58 23 ) ) -RRB- 10_1101-2021_02_08_430070 58 24 ) ) -RRB- 10_1101-2021_02_08_430070 58 25 , , , 10_1101-2021_02_08_430070 58 26 which which WDT 10_1101-2021_02_08_430070 58 27 is be VBZ 10_1101-2021_02_08_430070 58 28 optimized optimize VBN 10_1101-2021_02_08_430070 58 29 for for IN 10_1101-2021_02_08_430070 58 30 nanopore nanopore JJ 10_1101-2021_02_08_430070 58 31 methylation methylation NN 10_1101-2021_02_08_430070 58 32 detection detection NN 10_1101-2021_02_08_430070 58 33 . . . 
10_1101-2021_02_08_430070 59 1 2.1.1 2.1.1 CD 10_1101-2021_02_08_430070 59 2 Embedding embed VBG 10_1101-2021_02_08_430070 59 3 module module NN 10_1101-2021_02_08_430070 59 4 Given Given NNP 10_1101-2021_02_08_430070 59 5 extracted extract VBD 10_1101-2021_02_08_430070 59 6 features feature NNS 10_1101-2021_02_08_430070 59 7 for for IN 10_1101-2021_02_08_430070 59 8 each each DT 10_1101-2021_02_08_430070 59 9 position position NN 10_1101-2021_02_08_430070 59 10 in in IN 10_1101-2021_02_08_430070 59 11 a a DT 10_1101-2021_02_08_430070 59 12 sequence sequence NN 10_1101-2021_02_08_430070 59 13 , , , 10_1101-2021_02_08_430070 59 14 the the DT 10_1101-2021_02_08_430070 59 15 embedding embedding NN 10_1101-2021_02_08_430070 59 16 layer layer NN 10_1101-2021_02_08_430070 59 17 maps map VBZ 10_1101-2021_02_08_430070 59 18 input input NN 10_1101-2021_02_08_430070 59 19 vectors vector NNS 10_1101-2021_02_08_430070 59 20 into into IN 10_1101-2021_02_08_430070 59 21 hidden hide VBN 10_1101-2021_02_08_430070 59 22 spaces space NNS 10_1101-2021_02_08_430070 59 23 . . . 10_1101-2021_02_08_430070 60 1 In in IN 10_1101-2021_02_08_430070 60 2 the the DT 10_1101-2021_02_08_430070 60 3 embedding embedding NN 10_1101-2021_02_08_430070 60 4 layer layer NN 10_1101-2021_02_08_430070 60 5 , , , 10_1101-2021_02_08_430070 60 6 besides besides IN 10_1101-2021_02_08_430070 60 7 event event NN 10_1101-2021_02_08_430070 60 8 embedding embedding NN 10_1101-2021_02_08_430070 60 9 , , , 10_1101-2021_02_08_430070 60 10 positional positional JJ 10_1101-2021_02_08_430070 60 11 embedding embedding NN 10_1101-2021_02_08_430070 60 12 ( ( -LRB- 10_1101-2021_02_08_430070 60 13 PE PE NNP 10_1101-2021_02_08_430070 60 14 ) ) -RRB- 10_1101-2021_02_08_430070 60 15 is be VBZ 10_1101-2021_02_08_430070 60 16 also also RB 10_1101-2021_02_08_430070 60 17 included include VBN 10_1101-2021_02_08_430070 60 18 . . . 
10_1101-2021_02_08_430070 61 1 As as IN 10_1101-2021_02_08_430070 61 2 a a DT 10_1101-2021_02_08_430070 61 3 BERT BERT NNP 10_1101-2021_02_08_430070 61 4 is be VBZ 10_1101-2021_02_08_430070 61 5 used use VBN 10_1101-2021_02_08_430070 61 6 to to TO 10_1101-2021_02_08_430070 61 7 learn learn VB 10_1101-2021_02_08_430070 61 8 bidirectional bidirectional JJ 10_1101-2021_02_08_430070 61 9 contextual contextual JJ 10_1101-2021_02_08_430070 61 10 information information NN 10_1101-2021_02_08_430070 61 11 , , , 10_1101-2021_02_08_430070 61 12 positional positional JJ 10_1101-2021_02_08_430070 61 13 information information NN 10_1101-2021_02_08_430070 61 14 is be VBZ 10_1101-2021_02_08_430070 61 15 important important JJ 10_1101-2021_02_08_430070 61 16 in in IN 10_1101-2021_02_08_430070 61 17 the the DT 10_1101-2021_02_08_430070 61 18 modeling modeling NN 10_1101-2021_02_08_430070 61 19 . . . 10_1101-2021_02_08_430070 62 1 The the DT 10_1101-2021_02_08_430070 62 2 original original JJ 10_1101-2021_02_08_430070 62 3 PE PE NNP 10_1101-2021_02_08_430070 62 4 ( ( -LRB- 10_1101-2021_02_08_430070 62 5 Vaswani Vaswani NNP 10_1101-2021_02_08_430070 62 6 et et FW 10_1101-2021_02_08_430070 62 7 al al NNP 10_1101-2021_02_08_430070 62 8 . . NNP 10_1101-2021_02_08_430070 62 9 , , , 10_1101-2021_02_08_430070 62 10 2017 2017 CD 10_1101-2021_02_08_430070 62 11 ) ) -RRB- 10_1101-2021_02_08_430070 62 12 uses use VBZ 10_1101-2021_02_08_430070 62 13 a a DT 10_1101-2021_02_08_430070 62 14 sinusoid sinusoid JJ 10_1101-2021_02_08_430070 62 15 embedding embedding NN 10_1101-2021_02_08_430070 62 16 , , , 10_1101-2021_02_08_430070 62 17 which which WDT 10_1101-2021_02_08_430070 62 18 is be VBZ 10_1101-2021_02_08_430070 62 19 fixed fix VBN 10_1101-2021_02_08_430070 62 20 and and CC 10_1101-2021_02_08_430070 62 21 not not RB 10_1101-2021_02_08_430070 62 22 learnable learnable JJ 10_1101-2021_02_08_430070 62 23 . . . 
10_1101-2021_02_08_430070 63 1 PE(pos pe(pos UH 10_1101-2021_02_08_430070 63 2 , , , 10_1101-2021_02_08_430070 63 3 2i 2i NNP 10_1101-2021_02_08_430070 63 4 ) ) -RRB- 10_1101-2021_02_08_430070 63 5 = = NFP 10_1101-2021_02_08_430070 63 6 sin sin NN 10_1101-2021_02_08_430070 63 7 pos pos NNP 10_1101-2021_02_08_430070 63 8 100002i 100002i NNP 10_1101-2021_02_08_430070 63 9 / / SYM 10_1101-2021_02_08_430070 63 10 dmodel dmodel NNP 10_1101-2021_02_08_430070 63 11 PE(pos pe(pos CD 10_1101-2021_02_08_430070 63 12 , , , 10_1101-2021_02_08_430070 63 13 2i 2i NNP 10_1101-2021_02_08_430070 63 14 + + SYM 10_1101-2021_02_08_430070 63 15 1 1 CD 10_1101-2021_02_08_430070 63 16 ) ) -RRB- 10_1101-2021_02_08_430070 63 17 = = NFP 10_1101-2021_02_08_430070 63 18 cos cos NNP 10_1101-2021_02_08_430070 63 19 pos pos NNP 10_1101-2021_02_08_430070 63 20 100002i 100002i NNP 10_1101-2021_02_08_430070 63 21 / / SYM 10_1101-2021_02_08_430070 63 22 dmodel dmodel NN 10_1101-2021_02_08_430070 63 23 , , , 10_1101-2021_02_08_430070 63 24 where where WRB 10_1101-2021_02_08_430070 63 25 pos pos NNP 10_1101-2021_02_08_430070 63 26 is be VBZ 10_1101-2021_02_08_430070 63 27 the the DT 10_1101-2021_02_08_430070 63 28 position position NN 10_1101-2021_02_08_430070 63 29 and and CC 10_1101-2021_02_08_430070 63 30 i i PRP 10_1101-2021_02_08_430070 63 31 is be VBZ 10_1101-2021_02_08_430070 63 32 the the DT 10_1101-2021_02_08_430070 63 33 embedding embedding NN 10_1101-2021_02_08_430070 63 34 dimension dimension NN 10_1101-2021_02_08_430070 63 35 . . . 
10_1101-2021_02_08_430070 64 1 For for IN 10_1101-2021_02_08_430070 64 2 any any DT 10_1101-2021_02_08_430070 64 3 fixed fix VBN 10_1101-2021_02_08_430070 64 4 offset offset NN 10_1101-2021_02_08_430070 64 5 k k NN 10_1101-2021_02_08_430070 64 6 , , , 10_1101-2021_02_08_430070 64 7 PEpos+k PEpos+k NNP 10_1101-2021_02_08_430070 64 8 can can MD 10_1101-2021_02_08_430070 64 9 be be VB 10_1101-2021_02_08_430070 64 10 represented represent VBN 10_1101-2021_02_08_430070 64 11 as as IN 10_1101-2021_02_08_430070 64 12 a a DT 10_1101-2021_02_08_430070 64 13 linear linear JJ 10_1101-2021_02_08_430070 64 14 function function NN 10_1101-2021_02_08_430070 64 15 of of IN 10_1101-2021_02_08_430070 64 16 PEpos pepos PRP 10_1101-2021_02_08_430070 64 17 . . . 10_1101-2021_02_08_430070 65 1 According accord VBG 10_1101-2021_02_08_430070 65 2 to to IN 10_1101-2021_02_08_430070 65 3 the the DT 10_1101-2021_02_08_430070 65 4 recent recent JJ 10_1101-2021_02_08_430070 65 5 progress progress NN 10_1101-2021_02_08_430070 65 6 ( ( -LRB- 10_1101-2021_02_08_430070 65 7 Huang Huang NNP 10_1101-2021_02_08_430070 65 8 et et NNP 10_1101-2021_02_08_430070 65 9 al al NNP 10_1101-2021_02_08_430070 65 10 . . 
NNP 10_1101-2021_02_08_430070 65 11 , , , 10_1101-2021_02_08_430070 65 12 2020 2020 CD 10_1101-2021_02_08_430070 65 13 ) ) -RRB- 10_1101-2021_02_08_430070 65 14 , , , 10_1101-2021_02_08_430070 65 15 learnable learnable JJ 10_1101-2021_02_08_430070 65 16 PE PE NNP 10_1101-2021_02_08_430070 65 17 and and CC 10_1101-2021_02_08_430070 65 18 relative relative JJ 10_1101-2021_02_08_430070 65 19 position position NN 10_1101-2021_02_08_430070 65 20 embedding embedding NN 10_1101-2021_02_08_430070 65 21 can can MD 10_1101-2021_02_08_430070 65 22 help help VB 10_1101-2021_02_08_430070 65 23 to to TO 10_1101-2021_02_08_430070 65 24 further further RB 10_1101-2021_02_08_430070 65 25 improve improve VB 10_1101-2021_02_08_430070 65 26 BERT BERT NNP 10_1101-2021_02_08_430070 65 27 ’s ’s POS 10_1101-2021_02_08_430070 65 28 performances performance NNS 10_1101-2021_02_08_430070 65 29 . . . 10_1101-2021_02_08_430070 66 1 Therefore therefore RB 10_1101-2021_02_08_430070 66 2 , , , 10_1101-2021_02_08_430070 66 3 in in IN 10_1101-2021_02_08_430070 66 4 the the DT 10_1101-2021_02_08_430070 66 5 refined refined JJ 10_1101-2021_02_08_430070 66 6 BERT BERT NNP 10_1101-2021_02_08_430070 66 7 model model NN 10_1101-2021_02_08_430070 66 8 , , , 10_1101-2021_02_08_430070 66 9 we -PRON- PRP 10_1101-2021_02_08_430070 66 10 use use VBP 10_1101-2021_02_08_430070 66 11 learnable learnable JJ 10_1101-2021_02_08_430070 66 12 PE PE NNP 10_1101-2021_02_08_430070 66 13 and and CC 10_1101-2021_02_08_430070 66 14 relative relative JJ 10_1101-2021_02_08_430070 66 15 position position NN 10_1101-2021_02_08_430070 66 16 representation representation NN 10_1101-2021_02_08_430070 66 17 . . . 
10_1101-2021_02_08_430070 67 1 The the DT 10_1101-2021_02_08_430070 67 2 learnable learnable JJ 10_1101-2021_02_08_430070 67 3 PE PE NNP 10_1101-2021_02_08_430070 67 4 takes take VBZ 10_1101-2021_02_08_430070 67 5 positional positional JJ 10_1101-2021_02_08_430070 67 6 embedding embed VBG 10_1101-2021_02_08_430070 67 7 vectors vector NNS 10_1101-2021_02_08_430070 67 8 as as IN 10_1101-2021_02_08_430070 67 9 parameters parameter NNS 10_1101-2021_02_08_430070 67 10 , , , 10_1101-2021_02_08_430070 67 11 which which WDT 10_1101-2021_02_08_430070 67 12 are be VBP 10_1101-2021_02_08_430070 67 13 updated update VBN 10_1101-2021_02_08_430070 67 14 during during IN 10_1101-2021_02_08_430070 67 15 the the DT 10_1101-2021_02_08_430070 67 16 learning learning NN 10_1101-2021_02_08_430070 67 17 process process NN 10_1101-2021_02_08_430070 67 18 . . . 
10_1101-2021_02_08_430070 69 48 2.1.2 2.1.2 CD 10_1101-2021_02_08_430070 69 49 Self self NN 10_1101-2021_02_08_430070 69 50 - - HYPH 10_1101-2021_02_08_430070 69 51 attention attention NN 10_1101-2021_02_08_430070 69 52 module module NN 10_1101-2021_02_08_430070 69 53 Following follow VBG 10_1101-2021_02_08_430070 69 54 the the DT 10_1101-2021_02_08_430070 69 55 embedding embedding NN 10_1101-2021_02_08_430070 69 56 layer layer NN 10_1101-2021_02_08_430070 69 57 , , , 10_1101-2021_02_08_430070 69 58 there there EX 10_1101-2021_02_08_430070 69 59 are be VBP 10_1101-2021_02_08_430070 69 60 three three CD 10_1101-2021_02_08_430070 69 61 stacked stack VBN 10_1101-2021_02_08_430070 69 
62 transformer transformer NN 10_1101-2021_02_08_430070 69 63 blocks block NNS 10_1101-2021_02_08_430070 69 64 . . . 10_1101-2021_02_08_430070 70 1 Each each DT 10_1101-2021_02_08_430070 70 2 transformer transformer NN 10_1101-2021_02_08_430070 70 3 block block NN 10_1101-2021_02_08_430070 70 4 consists consist VBZ 10_1101-2021_02_08_430070 70 5 of of IN 10_1101-2021_02_08_430070 70 6 a a DT 10_1101-2021_02_08_430070 70 7 multi multi JJ 10_1101-2021_02_08_430070 70 8 - - JJ 10_1101-2021_02_08_430070 70 9 head head JJ 10_1101-2021_02_08_430070 70 10 self self NN 10_1101-2021_02_08_430070 70 11 - - HYPH 10_1101-2021_02_08_430070 70 12 attention attention NN 10_1101-2021_02_08_430070 70 13 layer layer NN 10_1101-2021_02_08_430070 70 14 and and CC 10_1101-2021_02_08_430070 70 15 position position NN 10_1101-2021_02_08_430070 70 16 - - HYPH 10_1101-2021_02_08_430070 70 17 wise wise JJ 10_1101-2021_02_08_430070 70 18 fully fully RB 10_1101-2021_02_08_430070 70 19 connected connect VBN 10_1101-2021_02_08_430070 70 20 feed feed NN 10_1101-2021_02_08_430070 70 21 - - HYPH 10_1101-2021_02_08_430070 70 22 forward forward NN 10_1101-2021_02_08_430070 70 23 network network NN 10_1101-2021_02_08_430070 70 24 . . . 
10_1101-2021_02_08_430070 71 1 The the DT 10_1101-2021_02_08_430070 71 2 self self NN 10_1101-2021_02_08_430070 71 3 - - HYPH 10_1101-2021_02_08_430070 71 4 attention attention NN 10_1101-2021_02_08_430070 71 5 mechanism mechanism NN 10_1101-2021_02_08_430070 71 6 is be VBZ 10_1101-2021_02_08_430070 71 7 a a DT 10_1101-2021_02_08_430070 71 8 modeling modeling JJ 10_1101-2021_02_08_430070 71 9 approach approach NN 10_1101-2021_02_08_430070 71 10 of of IN 10_1101-2021_02_08_430070 71 11 describing describe VBG 10_1101-2021_02_08_430070 71 12 context context NN 10_1101-2021_02_08_430070 71 13 information information NN 10_1101-2021_02_08_430070 71 14 for for IN 10_1101-2021_02_08_430070 71 15 different different JJ 10_1101-2021_02_08_430070 71 16 positions position NNS 10_1101-2021_02_08_430070 71 17 of of IN 10_1101-2021_02_08_430070 71 18 inputs input NNS 10_1101-2021_02_08_430070 71 19 under under IN 10_1101-2021_02_08_430070 71 20 a a DT 10_1101-2021_02_08_430070 71 21 deep deep JJ 10_1101-2021_02_08_430070 71 22 learning learning NN 10_1101-2021_02_08_430070 71 23 framework framework NN 10_1101-2021_02_08_430070 71 24 . . . 
10_1101-2021_02_08_430070 72 1 The the DT 10_1101-2021_02_08_430070 72 2 self- self- NN 10_1101-2021_02_08_430070 72 3 attention attention NN 10_1101-2021_02_08_430070 72 4 mechanism mechanism NN 10_1101-2021_02_08_430070 72 5 imitates imitate VBZ 10_1101-2021_02_08_430070 72 6 the the DT 10_1101-2021_02_08_430070 72 7 human human JJ 10_1101-2021_02_08_430070 72 8 sight sight NN 10_1101-2021_02_08_430070 72 9 mechanism mechanism NN 10_1101-2021_02_08_430070 72 10 and and CC 10_1101-2021_02_08_430070 72 11 provides provide VBZ 10_1101-2021_02_08_430070 72 12 a a DT 10_1101-2021_02_08_430070 72 13 model model NN 10_1101-2021_02_08_430070 72 14 with with IN 10_1101-2021_02_08_430070 72 15 the the DT 10_1101-2021_02_08_430070 72 16 ability ability NN 10_1101-2021_02_08_430070 72 17 to to TO 10_1101-2021_02_08_430070 72 18 zoom zoom VB 10_1101-2021_02_08_430070 72 19 in in RP 10_1101-2021_02_08_430070 72 20 or or CC 10_1101-2021_02_08_430070 72 21 out out RB 10_1101-2021_02_08_430070 72 22 in in IN 10_1101-2021_02_08_430070 72 23 a a DT 10_1101-2021_02_08_430070 72 24 particular particular JJ 10_1101-2021_02_08_430070 72 25 position position NN 10_1101-2021_02_08_430070 72 26 of of IN 10_1101-2021_02_08_430070 72 27 an an DT 10_1101-2021_02_08_430070 72 28 input input NN 10_1101-2021_02_08_430070 72 29 sequence sequence NN 10_1101-2021_02_08_430070 72 30 . . . 
10_1101-2021_02_08_430070 73 1 It -PRON- PRP 10_1101-2021_02_08_430070 73 2 demonstrates demonstrate VBZ 10_1101-2021_02_08_430070 73 3 the the DT 10_1101-2021_02_08_430070 73 4 effectiveness effectiveness NN 10_1101-2021_02_08_430070 73 5 in in IN 10_1101-2021_02_08_430070 73 6 many many JJ 10_1101-2021_02_08_430070 73 7 different different JJ 10_1101-2021_02_08_430070 73 8 tasks task NNS 10_1101-2021_02_08_430070 73 9 including include VBG 10_1101-2021_02_08_430070 73 10 natural natural JJ 10_1101-2021_02_08_430070 73 11 language language NN 10_1101-2021_02_08_430070 73 12 understanding understanding NN 10_1101-2021_02_08_430070 73 13 , , , 10_1101-2021_02_08_430070 73 14 image image NN 10_1101-2021_02_08_430070 73 15 recognition recognition NN 10_1101-2021_02_08_430070 73 16 , , , 10_1101-2021_02_08_430070 73 17 and and CC 10_1101-2021_02_08_430070 73 18 several several JJ 10_1101-2021_02_08_430070 73 19 bioinformatics bioinformatic NNS 10_1101-2021_02_08_430070 73 20 applications application NNS 10_1101-2021_02_08_430070 73 21 . . . 
10_1101-2021_02_08_430070 74 1 Attention attention NN 10_1101-2021_02_08_430070 74 2 function function NN 10_1101-2021_02_08_430070 74 3 is be VBZ 10_1101-2021_02_08_430070 74 4 described describe VBN 10_1101-2021_02_08_430070 74 5 as as IN 10_1101-2021_02_08_430070 74 6 mapping mapping NN 10_1101-2021_02_08_430070 74 7 Q Q NNP 10_1101-2021_02_08_430070 74 8 and and CC 10_1101-2021_02_08_430070 74 9 a a DT 10_1101-2021_02_08_430070 74 10 set set NN 10_1101-2021_02_08_430070 74 11 of of IN 10_1101-2021_02_08_430070 74 12 key key JJ 10_1101-2021_02_08_430070 74 13 - - HYPH 10_1101-2021_02_08_430070 74 14 value value NN 10_1101-2021_02_08_430070 74 15 ( ( -LRB- 10_1101-2021_02_08_430070 74 16 K K NNP 10_1101-2021_02_08_430070 74 17 , , , 10_1101-2021_02_08_430070 74 18 V V NNP 10_1101-2021_02_08_430070 74 19 ) ) -RRB- 10_1101-2021_02_08_430070 74 20 pairs pair NNS 10_1101-2021_02_08_430070 74 21 to to IN 10_1101-2021_02_08_430070 74 22 an an DT 10_1101-2021_02_08_430070 74 23 output output NN 10_1101-2021_02_08_430070 74 24 . . . 10_1101-2021_02_08_430070 75 1 Formally formally RB 10_1101-2021_02_08_430070 75 2 , , , 10_1101-2021_02_08_430070 75 3 for for IN 10_1101-2021_02_08_430070 75 4 an an DT 10_1101-2021_02_08_430070 75 5 input input NN 10_1101-2021_02_08_430070 75 6 x x SYM 10_1101-2021_02_08_430070 75 7 = = NFP 10_1101-2021_02_08_430070 75 8 ( ( -LRB- 10_1101-2021_02_08_430070 75 9 x1 x1 NNP 10_1101-2021_02_08_430070 75 10 , , , 10_1101-2021_02_08_430070 75 11 ... ... 
: 10_1101-2021_02_08_430070 75 12 , , , 10_1101-2021_02_08_430070 75 13 xn xn NNP 10_1101-2021_02_08_430070 75 14 ) ) -RRB- 10_1101-2021_02_08_430070 75 15 of of IN 10_1101-2021_02_08_430070 75 16 n n DT 10_1101-2021_02_08_430070 75 17 elements element NNS 10_1101-2021_02_08_430070 75 18 where where WRB 10_1101-2021_02_08_430070 75 19 xi xi NNP 10_1101-2021_02_08_430070 75 20 ∈ ∈ NNP 10_1101-2021_02_08_430070 75 21 Rdx Rdx NNP 10_1101-2021_02_08_430070 75 22 , , , 10_1101-2021_02_08_430070 75 23 we -PRON- PRP 10_1101-2021_02_08_430070 75 24 calculate calculate VBP 10_1101-2021_02_08_430070 75 25 query query NN 10_1101-2021_02_08_430070 75 26 Q q NN 10_1101-2021_02_08_430070 75 27 , , , 10_1101-2021_02_08_430070 75 28 key key JJ 10_1101-2021_02_08_430070 75 29 K k NN 10_1101-2021_02_08_430070 75 30 and and CC 10_1101-2021_02_08_430070 75 31 value value NN 10_1101-2021_02_08_430070 75 32 V V NNP 10_1101-2021_02_08_430070 75 33 vectors vector NNS 10_1101-2021_02_08_430070 75 34 of of IN 10_1101-2021_02_08_430070 75 35 dimension dimension NN 10_1101-2021_02_08_430070 75 36 dk dk NN 10_1101-2021_02_08_430070 75 37 based base VBN 10_1101-2021_02_08_430070 75 38 on on IN 10_1101-2021_02_08_430070 75 39 the the DT 10_1101-2021_02_08_430070 75 40 embedding embed VBG 10_1101-2021_02_08_430070 75 41 vector vector NN 10_1101-2021_02_08_430070 75 42 of of IN 10_1101-2021_02_08_430070 75 43 embed(x embed(x NNP 10_1101-2021_02_08_430070 75 44 ) ) -RRB- 10_1101-2021_02_08_430070 75 45 . . . 
10_1101-2021_02_08_430070 76 1 The the DT 10_1101-2021_02_08_430070 76 2 attention attention NN 10_1101-2021_02_08_430070 76 3 module module NN 10_1101-2021_02_08_430070 76 4 generates generate VBZ 10_1101-2021_02_08_430070 76 5 a a DT 10_1101-2021_02_08_430070 76 6 new new JJ 10_1101-2021_02_08_430070 76 7 sequence sequence NN 10_1101-2021_02_08_430070 76 8 z z NN 10_1101-2021_02_08_430070 76 9 = = NFP 10_1101-2021_02_08_430070 76 10 ( ( -LRB- 10_1101-2021_02_08_430070 76 11 z1 z1 NNP 10_1101-2021_02_08_430070 76 12 , , , 10_1101-2021_02_08_430070 76 13 ... ... : 10_1101-2021_02_08_430070 76 14 , , , 10_1101-2021_02_08_430070 76 15 zn zn NNP 10_1101-2021_02_08_430070 76 16 ) ) -RRB- 10_1101-2021_02_08_430070 76 17 of of IN 10_1101-2021_02_08_430070 76 18 the the DT 10_1101-2021_02_08_430070 76 19 same same JJ 10_1101-2021_02_08_430070 76 20 length length NN 10_1101-2021_02_08_430070 76 21 as as IN 10_1101-2021_02_08_430070 76 22 x. x. NNP 10_1101-2021_02_08_430070 76 23 zi zi NNP 10_1101-2021_02_08_430070 76 24 is be VBZ 10_1101-2021_02_08_430070 76 25 calculated calculate VBN 10_1101-2021_02_08_430070 76 26 as as IN 10_1101-2021_02_08_430070 76 27 a a DT 10_1101-2021_02_08_430070 76 28 weighted weighted JJ 10_1101-2021_02_08_430070 76 29 sum sum NN 10_1101-2021_02_08_430070 76 30 of of IN 10_1101-2021_02_08_430070 76 31 linearly linearly JJ 10_1101-2021_02_08_430070 76 32 transformed transform VBN 10_1101-2021_02_08_430070 76 33 input input NN 10_1101-2021_02_08_430070 76 34 elements element NNS 10_1101-2021_02_08_430070 76 35 as as IN 10_1101-2021_02_08_430070 76 36 follows follow VBZ 10_1101-2021_02_08_430070 76 37 : : : 10_1101-2021_02_08_430070 76 38 zi zi NN 10_1101-2021_02_08_430070 76 39 = = SYM 10_1101-2021_02_08_430070 76 40 n∑ n∑ VB 10_1101-2021_02_08_430070 76 41 j=1 j=1 NNS 10_1101-2021_02_08_430070 76 42 aij(xjW aij(xjW NNP 10_1101-2021_02_08_430070 76 43 V v NN 10_1101-2021_02_08_430070 76 44 ) ) -RRB- 10_1101-2021_02_08_430070 76 45 aij aij NN 
10_1101-2021_02_08_430070 76 46 = = SYM 10_1101-2021_02_08_430070 76 47 exp exp NN 10_1101-2021_02_08_430070 76 48 eij∑n eij∑n NNP 10_1101-2021_02_08_430070 76 49 k=1 k=1 JJ 10_1101-2021_02_08_430070 76 50 exp exp NN 10_1101-2021_02_08_430070 76 51 eik eik NNP 10_1101-2021_02_08_430070 76 52 eij eij NNP 10_1101-2021_02_08_430070 76 53 = = NFP 10_1101-2021_02_08_430070 76 54 ( ( -LRB- 10_1101-2021_02_08_430070 76 55 xiW xiW NNP 10_1101-2021_02_08_430070 76 56 Q)(xjW q)(xjw NN 10_1101-2021_02_08_430070 76 57 T t NN 10_1101-2021_02_08_430070 76 58 ) ) -RRB- 10_1101-2021_02_08_430070 76 59 T T NNP 10_1101-2021_02_08_430070 76 60 √ √ NNP 10_1101-2021_02_08_430070 76 61 dz dz NN 10_1101-2021_02_08_430070 76 62 , , , 10_1101-2021_02_08_430070 76 63 where where WRB 10_1101-2021_02_08_430070 76 64 W W NNP 10_1101-2021_02_08_430070 76 65 Q Q NNP 10_1101-2021_02_08_430070 76 66 , , , 10_1101-2021_02_08_430070 76 67 W W NNP 10_1101-2021_02_08_430070 76 68 K k NN 10_1101-2021_02_08_430070 76 69 , , , 10_1101-2021_02_08_430070 76 70 W w NN 10_1101-2021_02_08_430070 76 71 T t NN 10_1101-2021_02_08_430070 76 72 ∈ ∈ JJ 10_1101-2021_02_08_430070 76 73 Rdx×dz rdx×dz NN 10_1101-2021_02_08_430070 76 74 are be VBP 10_1101-2021_02_08_430070 76 75 parameter parameter NN 10_1101-2021_02_08_430070 76 76 matrices matrix NNS 10_1101-2021_02_08_430070 76 77 . . . 
10_1101-2021_02_08_430070 77 1 The the DT 10_1101-2021_02_08_430070 77 2 self self NN 10_1101-2021_02_08_430070 77 3 - - HYPH 10_1101-2021_02_08_430070 77 4 attention attention NN 10_1101-2021_02_08_430070 77 5 computes compute NNS 10_1101-2021_02_08_430070 77 6 a a DT 10_1101-2021_02_08_430070 77 7 pairwise pairwise NN 10_1101-2021_02_08_430070 77 8 correlation correlation NN 10_1101-2021_02_08_430070 77 9 of of IN 10_1101-2021_02_08_430070 77 10 embed(xi embed(xi ADD 10_1101-2021_02_08_430070 77 11 ) ) -RRB- 10_1101-2021_02_08_430070 77 12 and and CC 10_1101-2021_02_08_430070 77 13 embed(xj embed(xj ADD 10_1101-2021_02_08_430070 77 14 ) ) -RRB- 10_1101-2021_02_08_430070 77 15 , , , 10_1101-2021_02_08_430070 77 16 which which WDT 10_1101-2021_02_08_430070 77 17 can can MD 10_1101-2021_02_08_430070 77 18 be be VB 10_1101-2021_02_08_430070 77 19 calculated calculate VBN 10_1101-2021_02_08_430070 77 20 in in IN 10_1101-2021_02_08_430070 77 21 a a DT 10_1101-2021_02_08_430070 77 22 parallel parallel JJ 10_1101-2021_02_08_430070 77 23 way way NN 10_1101-2021_02_08_430070 77 24 . . . 10_1101-2021_02_08_430070 78 1 While while IN 10_1101-2021_02_08_430070 78 2 in in IN 10_1101-2021_02_08_430070 78 3 a a DT 10_1101-2021_02_08_430070 78 4 biRNN birnn JJ 10_1101-2021_02_08_430070 78 5 , , , 10_1101-2021_02_08_430070 78 6 recurrent recurrent JJ 10_1101-2021_02_08_430070 78 7 hidden hide VBN 10_1101-2021_02_08_430070 78 8 units unit NNS 10_1101-2021_02_08_430070 78 9 are be VBP 10_1101-2021_02_08_430070 78 10 required require VBN 10_1101-2021_02_08_430070 78 11 to to TO 10_1101-2021_02_08_430070 78 12 be be VB 10_1101-2021_02_08_430070 78 13 calculated calculate VBN 10_1101-2021_02_08_430070 78 14 successively successively RB 10_1101-2021_02_08_430070 78 15 . . . 
10_1101-2021_02_08_430070 79 1 This this DT 10_1101-2021_02_08_430070 79 2 architecture architecture NN 10_1101-2021_02_08_430070 79 3 difference difference NN 10_1101-2021_02_08_430070 79 4 makes make VBZ 10_1101-2021_02_08_430070 79 5 BERT BERT NNP 10_1101-2021_02_08_430070 79 6 can can MD 10_1101-2021_02_08_430070 79 7 be be VB 10_1101-2021_02_08_430070 79 8 optimized optimize VBN 10_1101-2021_02_08_430070 79 9 for for IN 10_1101-2021_02_08_430070 79 10 fast fast JJ 10_1101-2021_02_08_430070 79 11 inference inference NN 10_1101-2021_02_08_430070 79 12 . . . 10_1101-2021_02_08_430070 80 1 2.1.3 2.1.3 CD 10_1101-2021_02_08_430070 80 2 Relative relative JJ 10_1101-2021_02_08_430070 80 3 position position NN 10_1101-2021_02_08_430070 80 4 representation representation NN 10_1101-2021_02_08_430070 80 5 in in IN 10_1101-2021_02_08_430070 80 6 self self NN 10_1101-2021_02_08_430070 80 7 - - HYPH 10_1101-2021_02_08_430070 80 8 attention attention NN 10_1101-2021_02_08_430070 80 9 heads head NNS 10_1101-2021_02_08_430070 80 10 For for IN 10_1101-2021_02_08_430070 80 11 nanopore nanopore JJ 10_1101-2021_02_08_430070 80 12 sequencing sequencing NN 10_1101-2021_02_08_430070 80 13 , , , 10_1101-2021_02_08_430070 80 14 signals signal NNS 10_1101-2021_02_08_430070 80 15 are be VBP 10_1101-2021_02_08_430070 80 16 supposed suppose VBN 10_1101-2021_02_08_430070 80 17 to to TO 10_1101-2021_02_08_430070 80 18 be be VB 10_1101-2021_02_08_430070 80 19 more more RBR 10_1101-2021_02_08_430070 80 20 affected affected JJ 10_1101-2021_02_08_430070 80 21 by by IN 10_1101-2021_02_08_430070 80 22 the the DT 10_1101-2021_02_08_430070 80 23 nucleotide nucleotide JJ 10_1101-2021_02_08_430070 80 24 passing pass VBG 10_1101-2021_02_08_430070 80 25 through through IN 10_1101-2021_02_08_430070 80 26 the the DT 10_1101-2021_02_08_430070 80 27 pore pore NN 10_1101-2021_02_08_430070 80 28 . . . 
10_1101-2021_02_08_430070 81 1 Its -PRON- PRP$ 10_1101-2021_02_08_430070 81 2 surrounding surround VBG 10_1101-2021_02_08_430070 81 3 nucleotides nucleotide NNS 10_1101-2021_02_08_430070 81 4 may may MD 10_1101-2021_02_08_430070 81 5 also also RB 10_1101-2021_02_08_430070 81 6 have have VB 10_1101-2021_02_08_430070 81 7 effects effect NNS 10_1101-2021_02_08_430070 81 8 on on IN 10_1101-2021_02_08_430070 81 9 the the DT 10_1101-2021_02_08_430070 81 10 current current JJ 10_1101-2021_02_08_430070 81 11 signals signal NNS 10_1101-2021_02_08_430070 81 12 . . . 10_1101-2021_02_08_430070 82 1 For for IN 10_1101-2021_02_08_430070 82 2 those those DT 10_1101-2021_02_08_430070 82 3 nucleotides nucleotide NNS 10_1101-2021_02_08_430070 82 4 that that WDT 10_1101-2021_02_08_430070 82 5 are be VBP 10_1101-2021_02_08_430070 82 6 too too RB 10_1101-2021_02_08_430070 82 7 far far RB 10_1101-2021_02_08_430070 82 8 away away RB 10_1101-2021_02_08_430070 82 9 in in IN 10_1101-2021_02_08_430070 82 10 a a DT 10_1101-2021_02_08_430070 82 11 context context NN 10_1101-2021_02_08_430070 82 12 window window NN 10_1101-2021_02_08_430070 82 13 , , , 10_1101-2021_02_08_430070 82 14 it -PRON- PRP 10_1101-2021_02_08_430070 82 15 is be VBZ 10_1101-2021_02_08_430070 82 16 intuitive intuitive JJ 10_1101-2021_02_08_430070 82 17 to to TO 10_1101-2021_02_08_430070 82 18 assume assume VB 10_1101-2021_02_08_430070 82 19 they -PRON- PRP 10_1101-2021_02_08_430070 82 20 have have VBP 10_1101-2021_02_08_430070 82 21 less less JJR 10_1101-2021_02_08_430070 82 22 effect effect NN 10_1101-2021_02_08_430070 82 23 on on IN 10_1101-2021_02_08_430070 82 24 the the DT 10_1101-2021_02_08_430070 82 25 detected detected JJ 10_1101-2021_02_08_430070 82 26 current current JJ 10_1101-2021_02_08_430070 82 27 signals signal NNS 10_1101-2021_02_08_430070 82 28 . . . 
10_1101-2021_02_08_430070 83 1 In in IN 10_1101-2021_02_08_430070 83 2 the the DT 10_1101-2021_02_08_430070 83 3 refined refined JJ 10_1101-2021_02_08_430070 83 4 BERT BERT NNP 10_1101-2021_02_08_430070 83 5 model model NN 10_1101-2021_02_08_430070 83 6 , , , 10_1101-2021_02_08_430070 83 7 we -PRON- PRP 10_1101-2021_02_08_430070 83 8 add add VBP 10_1101-2021_02_08_430070 83 9 relative relative JJ 10_1101-2021_02_08_430070 83 10 position position NN 10_1101-2021_02_08_430070 83 11 representation representation NN 10_1101-2021_02_08_430070 83 12 in in IN 10_1101-2021_02_08_430070 83 13 the the DT 10_1101-2021_02_08_430070 83 14 attention attention NN 10_1101-2021_02_08_430070 83 15 module module NN 10_1101-2021_02_08_430070 83 16 following follow VBG 10_1101-2021_02_08_430070 83 17 the the DT 10_1101-2021_02_08_430070 83 18 method method NN 10_1101-2021_02_08_430070 83 19 proposed propose VBN 10_1101-2021_02_08_430070 83 20 by by IN 10_1101-2021_02_08_430070 83 21 Shaw Shaw NNP 10_1101-2021_02_08_430070 83 22 et et NNP 10_1101-2021_02_08_430070 83 23 al al NNP 10_1101-2021_02_08_430070 83 24 . . . 10_1101-2021_02_08_430070 84 1 ( ( -LRB- 10_1101-2021_02_08_430070 84 2 2018 2018 CD 10_1101-2021_02_08_430070 84 3 ) ) -RRB- 10_1101-2021_02_08_430070 84 4 . . . 
10_1101-2021_02_08_430070 85 1 For for IN 10_1101-2021_02_08_430070 85 2 any any DT 10_1101-2021_02_08_430070 85 3 two two CD 10_1101-2021_02_08_430070 85 4 input input NN 10_1101-2021_02_08_430070 85 5 elements element NNS 10_1101-2021_02_08_430070 85 6 xi xi NNP 10_1101-2021_02_08_430070 85 7 and and CC 10_1101-2021_02_08_430070 85 8 xj xj NNP 10_1101-2021_02_08_430070 85 9 , , , 10_1101-2021_02_08_430070 85 10 the the DT 10_1101-2021_02_08_430070 85 11 relative relative JJ 10_1101-2021_02_08_430070 85 12 position position NN 10_1101-2021_02_08_430070 85 13 information information NN 10_1101-2021_02_08_430070 85 14 is be VBZ 10_1101-2021_02_08_430070 85 15 modeled model VBN 10_1101-2021_02_08_430070 85 16 with with IN 10_1101-2021_02_08_430070 85 17 two two CD 10_1101-2021_02_08_430070 85 18 distinct distinct JJ 10_1101-2021_02_08_430070 85 19 edge edge NN 10_1101-2021_02_08_430070 85 20 representations representation NNS 10_1101-2021_02_08_430070 85 21 aVij aVij NNP 10_1101-2021_02_08_430070 85 22 , , , 10_1101-2021_02_08_430070 85 23 a a DT 10_1101-2021_02_08_430070 85 24 K K NNP 10_1101-2021_02_08_430070 85 25 ij ij NN 10_1101-2021_02_08_430070 85 26 . . . 
10_1101-2021_02_08_430070 86 1 For for IN 10_1101-2021_02_08_430070 86 2 linear linear JJ 10_1101-2021_02_08_430070 86 3 sequences sequence NNS 10_1101-2021_02_08_430070 86 4 , , , 10_1101-2021_02_08_430070 86 5 those those DT 10_1101-2021_02_08_430070 86 6 edges edge NNS 10_1101-2021_02_08_430070 86 7 are be VBP 10_1101-2021_02_08_430070 86 8 used use VBN 10_1101-2021_02_08_430070 86 9 to to TO 10_1101-2021_02_08_430070 86 10 capture capture VB 10_1101-2021_02_08_430070 86 11 the the DT 10_1101-2021_02_08_430070 86 12 relative relative JJ 10_1101-2021_02_08_430070 86 13 position position NN 10_1101-2021_02_08_430070 86 14 differences difference NNS 10_1101-2021_02_08_430070 86 15 between between IN 10_1101-2021_02_08_430070 86 16 input input NN 10_1101-2021_02_08_430070 86 17 elements element NNS 10_1101-2021_02_08_430070 86 18 . . . 10_1101-2021_02_08_430070 87 1 As as IN 10_1101-2021_02_08_430070 87 2 the the DT 10_1101-2021_02_08_430070 87 3 precise precise JJ 10_1101-2021_02_08_430070 87 4 relative relative JJ 10_1101-2021_02_08_430070 87 5 position position NN 10_1101-2021_02_08_430070 87 6 is be VBZ 10_1101-2021_02_08_430070 87 7 not not RB 10_1101-2021_02_08_430070 87 8 useful useful JJ 10_1101-2021_02_08_430070 87 9 beyond beyond IN 10_1101-2021_02_08_430070 87 10 a a DT 10_1101-2021_02_08_430070 87 11 certain certain JJ 10_1101-2021_02_08_430070 87 12 distance distance NN 10_1101-2021_02_08_430070 87 13 , , , 10_1101-2021_02_08_430070 87 14 we -PRON- PRP 10_1101-2021_02_08_430070 87 15 clip clip VBP 10_1101-2021_02_08_430070 87 16 the the DT 10_1101-2021_02_08_430070 87 17 maximum maximum JJ 10_1101-2021_02_08_430070 87 18 distance distance NN 10_1101-2021_02_08_430070 87 19 ( ( -LRB- 10_1101-2021_02_08_430070 87 20 e.g. e.g. 
RB 10_1101-2021_02_08_430070 88 1 ±3bp ±3bp NNP 10_1101-2021_02_08_430070 88 2 ) ) -RRB- 10_1101-2021_02_08_430070 88 3 in in IN 10_1101-2021_02_08_430070 88 4 calculating calculate VBG 10_1101-2021_02_08_430070 88 5 attention attention NN 10_1101-2021_02_08_430070 88 6 aij aij NN 10_1101-2021_02_08_430070 88 7 ∈ ∈ NNP 10_1101-2021_02_08_430070 88 8 A. a. NN 10_1101-2021_02_08_430070 88 9 a a DT 10_1101-2021_02_08_430070 88 10 K k NN 10_1101-2021_02_08_430070 88 11 ij ij NN 10_1101-2021_02_08_430070 88 12 = = NFP 10_1101-2021_02_08_430070 88 13 W W NNP 10_1101-2021_02_08_430070 88 14 K K NNP 10_1101-2021_02_08_430070 88 15 clip(j−i clip(j−i NNP 10_1101-2021_02_08_430070 88 16 , , , 10_1101-2021_02_08_430070 88 17 k k LS 10_1101-2021_02_08_430070 88 18 ) ) -RRB- 10_1101-2021_02_08_430070 88 19 a a DT 10_1101-2021_02_08_430070 88 20 V v NN 10_1101-2021_02_08_430070 88 21 ij ij NN 10_1101-2021_02_08_430070 88 22 = = NN 10_1101-2021_02_08_430070 88 23 W w NN 10_1101-2021_02_08_430070 88 24 V V NNP 10_1101-2021_02_08_430070 88 25 clip(j−i clip(j−i NNP 10_1101-2021_02_08_430070 88 26 , , , 10_1101-2021_02_08_430070 88 27 k k NN 10_1101-2021_02_08_430070 88 28 ) ) -RRB- 10_1101-2021_02_08_430070 88 29 clip(x clip(x NN 10_1101-2021_02_08_430070 88 30 , , , 10_1101-2021_02_08_430070 88 31 k k LS 10_1101-2021_02_08_430070 88 32 ) ) -RRB- 10_1101-2021_02_08_430070 88 33 = = SYM 10_1101-2021_02_08_430070 88 34 max(−k max(−k NNP 10_1101-2021_02_08_430070 88 35 , , , 10_1101-2021_02_08_430070 88 36 min(k min(k NN 10_1101-2021_02_08_430070 88 37 , , , 10_1101-2021_02_08_430070 88 38 x x NNP 10_1101-2021_02_08_430070 88 39 ) ) -RRB- 10_1101-2021_02_08_430070 88 40 ) ) -RRB- 10_1101-2021_02_08_430070 88 41 2.1.4 2.1.4 CD 10_1101-2021_02_08_430070 88 42 Final final JJ 10_1101-2021_02_08_430070 88 43 full full JJ 10_1101-2021_02_08_430070 88 44 connection connection NN 10_1101-2021_02_08_430070 88 45 layer layer NN 10_1101-2021_02_08_430070 88 46 After after IN 
10_1101-2021_02_08_430070 88 47 the the DT 10_1101-2021_02_08_430070 88 48 stacked stacked JJ 10_1101-2021_02_08_430070 88 49 transformer transformer NN 10_1101-2021_02_08_430070 88 50 blocks block NNS 10_1101-2021_02_08_430070 88 51 , , , 10_1101-2021_02_08_430070 88 52 hidden hide VBN 10_1101-2021_02_08_430070 88 53 units unit NNS 10_1101-2021_02_08_430070 88 54 of of IN 10_1101-2021_02_08_430070 88 55 the the DT 10_1101-2021_02_08_430070 88 56 center center NN 10_1101-2021_02_08_430070 88 57 position position NN 10_1101-2021_02_08_430070 88 58 feed feed VBP 10_1101-2021_02_08_430070 88 59 to to IN 10_1101-2021_02_08_430070 88 60 a a DT 10_1101-2021_02_08_430070 88 61 full full JJ 10_1101-2021_02_08_430070 88 62 connection connection NN 10_1101-2021_02_08_430070 88 63 linear linear NN 10_1101-2021_02_08_430070 88 64 layer layer NN 10_1101-2021_02_08_430070 88 65 that that WDT 10_1101-2021_02_08_430070 88 66 makes make VBZ 10_1101-2021_02_08_430070 88 67 the the DT 10_1101-2021_02_08_430070 88 68 final final JJ 10_1101-2021_02_08_430070 88 69 prediction prediction NN 10_1101-2021_02_08_430070 88 70 of of IN 10_1101-2021_02_08_430070 88 71 whether whether IN 10_1101-2021_02_08_430070 88 72 a a DT 10_1101-2021_02_08_430070 88 73 given give VBN 10_1101-2021_02_08_430070 88 74 input input NN 10_1101-2021_02_08_430070 88 75 contains contain VBZ 10_1101-2021_02_08_430070 88 76 a a DT 10_1101-2021_02_08_430070 88 77 methylated methylated JJ 10_1101-2021_02_08_430070 88 78 motif motif NN 10_1101-2021_02_08_430070 88 79 or or CC 10_1101-2021_02_08_430070 88 80 not not RB 10_1101-2021_02_08_430070 88 81 . . . 
10_1101-2021_02_08_430070 89 1 In in IN 10_1101-2021_02_08_430070 89 2 the the DT 10_1101-2021_02_08_430070 89 3 refined refined JJ 10_1101-2021_02_08_430070 89 4 BERT BERT NNP 10_1101-2021_02_08_430070 89 5 , , , 10_1101-2021_02_08_430070 89 6 besides besides IN 10_1101-2021_02_08_430070 89 7 the the DT 10_1101-2021_02_08_430070 89 8 hidden hide VBN 10_1101-2021_02_08_430070 89 9 units unit NNS 10_1101-2021_02_08_430070 89 10 of of IN 10_1101-2021_02_08_430070 89 11 the the DT 10_1101-2021_02_08_430070 89 12 center center NN 10_1101-2021_02_08_430070 89 13 position position NN 10_1101-2021_02_08_430070 89 14 , , , 10_1101-2021_02_08_430070 89 15 hidden hide VBN 10_1101-2021_02_08_430070 89 16 units unit NNS 10_1101-2021_02_08_430070 89 17 in in IN 10_1101-2021_02_08_430070 89 18 its -PRON- PRP$ 10_1101-2021_02_08_430070 89 19 surrounding surround VBG 10_1101-2021_02_08_430070 89 20 window window NN 10_1101-2021_02_08_430070 89 21 ( ( -LRB- 10_1101-2021_02_08_430070 89 22 e.g. e.g. RB 10_1101-2021_02_08_430070 89 23 , , , 10_1101-2021_02_08_430070 89 24 ±3bp ±3bp NN 10_1101-2021_02_08_430070 89 25 ) ) -RRB- 10_1101-2021_02_08_430070 89 26 are be VBP 10_1101-2021_02_08_430070 89 27 concatenated concatenate VBN 10_1101-2021_02_08_430070 89 28 as as IN 10_1101-2021_02_08_430070 89 29 the the DT 10_1101-2021_02_08_430070 89 30 input input NN 10_1101-2021_02_08_430070 89 31 of of IN 10_1101-2021_02_08_430070 89 32 the the DT 10_1101-2021_02_08_430070 89 33 final final JJ 10_1101-2021_02_08_430070 89 34 full full JJ 10_1101-2021_02_08_430070 89 35 connection connection NN 10_1101-2021_02_08_430070 89 36 layer layer NN 10_1101-2021_02_08_430070 89 37 . . . 
10_1101-2021_02_08_430070 90 1 2.2 2.2 CD 10_1101-2021_02_08_430070 90 2 Applying apply VBG 10_1101-2021_02_08_430070 90 3 BERT BERT NNP 10_1101-2021_02_08_430070 90 4 models model NNS 10_1101-2021_02_08_430070 90 5 for for IN 10_1101-2021_02_08_430070 90 6 nanopore nanopore JJ 10_1101-2021_02_08_430070 90 7 methylation methylation NN 10_1101-2021_02_08_430070 90 8 detection detection NN 10_1101-2021_02_08_430070 90 9 The the DT 10_1101-2021_02_08_430070 90 10 BERT BERT NNP 10_1101-2021_02_08_430070 90 11 models model NNS 10_1101-2021_02_08_430070 90 12 are be VBP 10_1101-2021_02_08_430070 90 13 then then RB 10_1101-2021_02_08_430070 90 14 applied apply VBN 10_1101-2021_02_08_430070 90 15 to to TO 10_1101-2021_02_08_430070 90 16 replace replace VB 10_1101-2021_02_08_430070 90 17 different different JJ 10_1101-2021_02_08_430070 90 18 classification classification NN 10_1101-2021_02_08_430070 90 19 models model NNS 10_1101-2021_02_08_430070 90 20 ( ( -LRB- 10_1101-2021_02_08_430070 90 21 e.g. e.g. RB 10_1101-2021_02_08_430070 91 1 biRNN birnn LS 10_1101-2021_02_08_430070 91 2 ) ) -RRB- 10_1101-2021_02_08_430070 91 3 in in IN 10_1101-2021_02_08_430070 91 4 a a DT 10_1101-2021_02_08_430070 91 5 typical typical JJ 10_1101-2021_02_08_430070 91 6 model model NN 10_1101-2021_02_08_430070 91 7 - - HYPH 10_1101-2021_02_08_430070 91 8 based base VBN 10_1101-2021_02_08_430070 91 9 methylation methylation NN 10_1101-2021_02_08_430070 91 10 detection detection NN 10_1101-2021_02_08_430070 91 11 framework framework NN 10_1101-2021_02_08_430070 91 12 . . . 
10_1101-2021_02_08_430070 92 1 In in IN 10_1101-2021_02_08_430070 92 2 this this DT 10_1101-2021_02_08_430070 92 3 framework framework NN 10_1101-2021_02_08_430070 92 4 , , , 10_1101-2021_02_08_430070 92 5 raw raw JJ 10_1101-2021_02_08_430070 92 6 signals signal NNS 10_1101-2021_02_08_430070 92 7 of of IN 10_1101-2021_02_08_430070 92 8 each each DT 10_1101-2021_02_08_430070 92 9 read read NN 10_1101-2021_02_08_430070 92 10 are be VBP 10_1101-2021_02_08_430070 92 11 first first RB 10_1101-2021_02_08_430070 92 12 translated translate VBN 10_1101-2021_02_08_430070 92 13 into into IN 10_1101-2021_02_08_430070 92 14 nucleotide nucleotide JJ 10_1101-2021_02_08_430070 92 15 sequences sequence NNS 10_1101-2021_02_08_430070 92 16 ( ( -LRB- 10_1101-2021_02_08_430070 92 17 basecalling basecalle VBG 10_1101-2021_02_08_430070 92 18 ) ) -RRB- 10_1101-2021_02_08_430070 92 19 . . . 10_1101-2021_02_08_430070 93 1 Signals signal NNS 10_1101-2021_02_08_430070 93 2 are be VBP 10_1101-2021_02_08_430070 93 3 then then RB 10_1101-2021_02_08_430070 93 4 aligned align VBN 10_1101-2021_02_08_430070 93 5 to to IN 10_1101-2021_02_08_430070 93 6 corresponding correspond VBG 10_1101-2021_02_08_430070 93 7 reference reference NN 10_1101-2021_02_08_430070 93 8 nucleotides nucleotide NNS 10_1101-2021_02_08_430070 93 9 through through IN 10_1101-2021_02_08_430070 93 10 the the DT 10_1101-2021_02_08_430070 93 11 re re JJ 10_1101-2021_02_08_430070 93 12 - - NN 10_1101-2021_02_08_430070 93 13 squiggle squiggle JJ 10_1101-2021_02_08_430070 93 14 process process NN 10_1101-2021_02_08_430070 93 15 . . . 10_1101-2021_02_08_430070 94 1 After after IN 10_1101-2021_02_08_430070 94 2 that that DT 10_1101-2021_02_08_430070 94 3 , , , 10_1101-2021_02_08_430070 94 4 the the DT 10_1101-2021_02_08_430070 94 5 target target NN 10_1101-2021_02_08_430070 94 6 motif motif NN 10_1101-2021_02_08_430070 94 7 ( ( -LRB- 10_1101-2021_02_08_430070 94 8 e.g. e.g. 
RB 10_1101-2021_02_08_430070 95 1 CpG CpG NNP 10_1101-2021_02_08_430070 95 2 ) ) -RRB- 10_1101-2021_02_08_430070 95 3 and and CC 10_1101-2021_02_08_430070 95 4 its -PRON- PRP$ 10_1101-2021_02_08_430070 95 5 context context NN 10_1101-2021_02_08_430070 95 6 regions region NNS 10_1101-2021_02_08_430070 95 7 are be VBP 10_1101-2021_02_08_430070 95 8 localized localize VBN 10_1101-2021_02_08_430070 95 9 through through IN 10_1101-2021_02_08_430070 95 10 nucleotide nucleotide JJ 10_1101-2021_02_08_430070 95 11 matching matching NN 10_1101-2021_02_08_430070 95 12 and and CC 10_1101-2021_02_08_430070 95 13 signals signal NNS 10_1101-2021_02_08_430070 95 14 in in IN 10_1101-2021_02_08_430070 95 15 a a DT 10_1101-2021_02_08_430070 95 16 context context NN 10_1101-2021_02_08_430070 95 17 window window NN 10_1101-2021_02_08_430070 95 18 of of IN 10_1101-2021_02_08_430070 95 19 a a DT 10_1101-2021_02_08_430070 95 20 fixed fix VBN 10_1101-2021_02_08_430070 95 21 length length NN 10_1101-2021_02_08_430070 95 22 ( ( -LRB- 10_1101-2021_02_08_430070 95 23 e.g. e.g. RB 10_1101-2021_02_08_430070 96 1 21bp 21bp LS 10_1101-2021_02_08_430070 96 2 ) ) -RRB- 10_1101-2021_02_08_430070 96 3 are be VBP 10_1101-2021_02_08_430070 96 4 transformed transform VBN 10_1101-2021_02_08_430070 96 5 into into IN 10_1101-2021_02_08_430070 96 6 event event NN 10_1101-2021_02_08_430070 96 7 - - HYPH 10_1101-2021_02_08_430070 96 8 based base VBN 10_1101-2021_02_08_430070 96 9 features feature NNS 10_1101-2021_02_08_430070 96 10 as as IN 10_1101-2021_02_08_430070 96 11 the the DT 10_1101-2021_02_08_430070 96 12 input input NN 10_1101-2021_02_08_430070 96 13 of of IN 10_1101-2021_02_08_430070 96 14 methylation methylation NN 10_1101-2021_02_08_430070 96 15 callers caller NNS 10_1101-2021_02_08_430070 96 16 . . . 
10_1101-2021_02_08_430070 97 1 Typical typical JJ 10_1101-2021_02_08_430070 97 2 event event NN 10_1101-2021_02_08_430070 97 3 - - HYPH 10_1101-2021_02_08_430070 97 4 based base VBN 10_1101-2021_02_08_430070 97 5 features feature NNS 10_1101-2021_02_08_430070 97 6 include include VBP 10_1101-2021_02_08_430070 97 7 signal signal NNP 10_1101-2021_02_08_430070 97 8 mean mean NNP 10_1101-2021_02_08_430070 97 9 , , , 10_1101-2021_02_08_430070 97 10 signal signal NNP 10_1101-2021_02_08_430070 97 11 standard standard JJ 10_1101-2021_02_08_430070 97 12 deviation deviation NN 10_1101-2021_02_08_430070 97 13 , , , 10_1101-2021_02_08_430070 97 14 event event NN 10_1101-2021_02_08_430070 97 15 length length NN 10_1101-2021_02_08_430070 97 16 , , , 10_1101-2021_02_08_430070 97 17 and and CC 10_1101-2021_02_08_430070 97 18 nucleotide nucleotide JJ 10_1101-2021_02_08_430070 97 19 information information NN 10_1101-2021_02_08_430070 97 20 ( ( -LRB- 10_1101-2021_02_08_430070 97 21 Liu Liu NNP 10_1101-2021_02_08_430070 97 22 et et NNP 10_1101-2021_02_08_430070 97 23 al al NNP 10_1101-2021_02_08_430070 97 24 . . NNP 10_1101-2021_02_08_430070 97 25 , , , 10_1101-2021_02_08_430070 97 26 2019 2019 CD 10_1101-2021_02_08_430070 97 27 ) ) -RRB- 10_1101-2021_02_08_430070 97 28 . . . 
10_1101-2021_02_08_430070 98 1 Here here RB 10_1101-2021_02_08_430070 98 2 , , , 10_1101-2021_02_08_430070 98 3 we -PRON- PRP 10_1101-2021_02_08_430070 98 4 utilize utilize VBP 10_1101-2021_02_08_430070 98 5 the the DT 10_1101-2021_02_08_430070 98 6 framework framework NN 10_1101-2021_02_08_430070 98 7 of of IN 10_1101-2021_02_08_430070 98 8 deepMOD deepMOD NNP 10_1101-2021_02_08_430070 98 9 and and CC 10_1101-2021_02_08_430070 98 10 perform perform VB 10_1101-2021_02_08_430070 98 11 the the DT 10_1101-2021_02_08_430070 98 12 same same JJ 10_1101-2021_02_08_430070 98 13 pre pre NN 10_1101-2021_02_08_430070 98 14 - - NN 10_1101-2021_02_08_430070 98 15 process process NN 10_1101-2021_02_08_430070 98 16 for for IN 10_1101-2021_02_08_430070 98 17 the the DT 10_1101-2021_02_08_430070 98 18 data datum NNS 10_1101-2021_02_08_430070 98 19 . . . 10_1101-2021_02_08_430070 99 1 We -PRON- PRP 10_1101-2021_02_08_430070 99 2 use use VBP 10_1101-2021_02_08_430070 99 3 Tombo Tombo NNP 10_1101-2021_02_08_430070 99 4 ( ( -LRB- 10_1101-2021_02_08_430070 99 5 Ver Ver NNP 10_1101-2021_02_08_430070 99 6 1.5.1 1.5.1 NNP 10_1101-2021_02_08_430070 99 7 ) ) -RRB- 10_1101-2021_02_08_430070 99 8 to to TO 10_1101-2021_02_08_430070 99 9 perform perform VB 10_1101-2021_02_08_430070 99 10 re- re- JJ 10_1101-2021_02_08_430070 99 11 squiggling squiggle VBG 10_1101-2021_02_08_430070 99 12 and and CC 10_1101-2021_02_08_430070 99 13 utilize utilize VB 10_1101-2021_02_08_430070 99 14 Minimap2 Minimap2 NNP 10_1101-2021_02_08_430070 99 15 ( ( -LRB- 10_1101-2021_02_08_430070 99 16 Ver Ver NNP 10_1101-2021_02_08_430070 99 17 2.17-r941 2.17-r941 CD 10_1101-2021_02_08_430070 99 18 ) ) -RRB- 10_1101-2021_02_08_430070 99 19 to to TO 10_1101-2021_02_08_430070 99 20 align align VB 10_1101-2021_02_08_430070 99 21 events event NNS 10_1101-2021_02_08_430070 99 22 to to IN 10_1101-2021_02_08_430070 99 23 the the DT 10_1101-2021_02_08_430070 99 24 reference reference NN 10_1101-2021_02_08_430070 99 25 genome genome 
JJ 10_1101-2021_02_08_430070 99 26 . . . 10_1101-2021_02_08_430070 100 1 Here here RB 10_1101-2021_02_08_430070 100 2 , , , 10_1101-2021_02_08_430070 100 3 we -PRON- PRP 10_1101-2021_02_08_430070 100 4 use use VBP 10_1101-2021_02_08_430070 100 5 E.coli e.coli JJ 10_1101-2021_02_08_430070 100 6 K-12 k-12 CD 10_1101-2021_02_08_430070 100 7 MG1655 mg1655 CD 10_1101-2021_02_08_430070 100 8 and and CC 10_1101-2021_02_08_430070 100 9 H.Sapiens H.Sapiens NNP 10_1101-2021_02_08_430070 100 10 GRCh38 GRCh38 NNP 10_1101-2021_02_08_430070 100 11 as as IN 10_1101-2021_02_08_430070 100 12 the the DT 10_1101-2021_02_08_430070 100 13 reference reference NN 10_1101-2021_02_08_430070 100 14 genomes genome VBZ 10_1101-2021_02_08_430070 100 15 . . . 10_1101-2021_02_08_430070 101 1 3 3 LS 10_1101-2021_02_08_430070 101 2 Experiments experiment NNS 10_1101-2021_02_08_430070 101 3 We -PRON- PRP 10_1101-2021_02_08_430070 101 4 compare compare VBP 10_1101-2021_02_08_430070 101 5 BERT BERT NNP 10_1101-2021_02_08_430070 101 6 models model NNS 10_1101-2021_02_08_430070 101 7 with with IN 10_1101-2021_02_08_430070 101 8 the the DT 10_1101-2021_02_08_430070 101 9 state state NN 10_1101-2021_02_08_430070 101 10 - - HYPH 10_1101-2021_02_08_430070 101 11 of of IN 10_1101-2021_02_08_430070 101 12 - - HYPH 10_1101-2021_02_08_430070 101 13 the the DT 10_1101-2021_02_08_430070 101 14 - - HYPH 10_1101-2021_02_08_430070 101 15 art art NN 10_1101-2021_02_08_430070 101 16 biRNN birnn NN 10_1101-2021_02_08_430070 101 17 model model NN 10_1101-2021_02_08_430070 101 18 , , , 10_1101-2021_02_08_430070 101 19 which which WDT 10_1101-2021_02_08_430070 101 20 is be VBZ 10_1101-2021_02_08_430070 101 21 used use VBN 10_1101-2021_02_08_430070 101 22 as as IN 10_1101-2021_02_08_430070 101 23 the the DT 10_1101-2021_02_08_430070 101 24 basic basic JJ 10_1101-2021_02_08_430070 101 25 network network NN 10_1101-2021_02_08_430070 101 26 structure structure NN 10_1101-2021_02_08_430070 101 27 in in IN 
10_1101-2021_02_08_430070 101 28 DeepMOD DeepMOD NNP 10_1101-2021_02_08_430070 101 29 ( ( -LRB- 10_1101-2021_02_08_430070 101 30 Liu Liu NNP 10_1101-2021_02_08_430070 101 31 et et NNP 10_1101-2021_02_08_430070 101 32 al al NNP 10_1101-2021_02_08_430070 101 33 . . NNP 10_1101-2021_02_08_430070 101 34 , , , 10_1101-2021_02_08_430070 101 35 2019 2019 CD 10_1101-2021_02_08_430070 101 36 ) ) -RRB- 10_1101-2021_02_08_430070 101 37 and and CC 10_1101-2021_02_08_430070 101 38 DeepSignal DeepSignal NNP 10_1101-2021_02_08_430070 101 39 ( ( -LRB- 10_1101-2021_02_08_430070 101 40 Ni Ni NNP 10_1101-2021_02_08_430070 101 41 et et FW 10_1101-2021_02_08_430070 101 42 al al NNP 10_1101-2021_02_08_430070 101 43 . . NNP 10_1101-2021_02_08_430070 101 44 , , , 10_1101-2021_02_08_430070 101 45 2019 2019 CD 10_1101-2021_02_08_430070 101 46 ) ) -RRB- 10_1101-2021_02_08_430070 101 47 . . . 10_1101-2021_02_08_430070 102 1 To to TO 10_1101-2021_02_08_430070 102 2 compare compare VB 10_1101-2021_02_08_430070 102 3 with with IN 10_1101-2021_02_08_430070 102 4 other other JJ 10_1101-2021_02_08_430070 102 5 non non JJ 10_1101-2021_02_08_430070 102 6 - - JJ 10_1101-2021_02_08_430070 102 7 deep deep JJ 10_1101-2021_02_08_430070 102 8 - - HYPH 10_1101-2021_02_08_430070 102 9 learning- learning- NN 10_1101-2021_02_08_430070 102 10 based base VBN 10_1101-2021_02_08_430070 102 11 methods method NNS 10_1101-2021_02_08_430070 102 12 , , , 10_1101-2021_02_08_430070 102 13 we -PRON- PRP 10_1101-2021_02_08_430070 102 14 utilized utilize VBD 10_1101-2021_02_08_430070 102 15 the the DT 10_1101-2021_02_08_430070 102 16 CpG CpG NNP 10_1101-2021_02_08_430070 102 17 benchmark benchmark NNP 10_1101-2021_02_08_430070 102 18 pipeline pipeline NN 10_1101-2021_02_08_430070 102 19 ( ( -LRB- 10_1101-2021_02_08_430070 102 20 Yuen Yuen NNP 10_1101-2021_02_08_430070 102 21 et et FW 10_1101-2021_02_08_430070 102 22 al al NNP 10_1101-2021_02_08_430070 102 23 . . 
NNP 10_1101-2021_02_08_430070 102 24 , , , 10_1101-2021_02_08_430070 102 25 2020 2020 CD 10_1101-2021_02_08_430070 102 26 ) ) -RRB- 10_1101-2021_02_08_430070 102 27 as as IN 10_1101-2021_02_08_430070 102 28 a a DT 10_1101-2021_02_08_430070 102 29 pivot pivot NN 10_1101-2021_02_08_430070 102 30 . . . 10_1101-2021_02_08_430070 103 1 3.1 3.1 CD 10_1101-2021_02_08_430070 103 2 Data datum NNS 10_1101-2021_02_08_430070 103 3 and and CC 10_1101-2021_02_08_430070 103 4 model model NN 10_1101-2021_02_08_430070 103 5 parameters parameter NNS 10_1101-2021_02_08_430070 103 6 We -PRON- PRP 10_1101-2021_02_08_430070 103 7 train train VBP 10_1101-2021_02_08_430070 103 8 and and CC 10_1101-2021_02_08_430070 103 9 test test VBP 10_1101-2021_02_08_430070 103 10 the the DT 10_1101-2021_02_08_430070 103 11 models model NNS 10_1101-2021_02_08_430070 103 12 on on IN 10_1101-2021_02_08_430070 103 13 the the DT 10_1101-2021_02_08_430070 103 14 public public JJ 10_1101-2021_02_08_430070 103 15 accessible accessible JJ 10_1101-2021_02_08_430070 103 16 5mC 5mc CD 10_1101-2021_02_08_430070 103 17 ( ( -LRB- 10_1101-2021_02_08_430070 103 18 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 103 19 et et FW 10_1101-2021_02_08_430070 103 20 al al NNP 10_1101-2021_02_08_430070 103 21 . . NNP 10_1101-2021_02_08_430070 103 22 , , , 10_1101-2021_02_08_430070 103 23 2016 2016 CD 10_1101-2021_02_08_430070 103 24 ; ; : 10_1101-2021_02_08_430070 103 25 Simpson Simpson NNP 10_1101-2021_02_08_430070 103 26 et et FW 10_1101-2021_02_08_430070 103 27 al al NNP 10_1101-2021_02_08_430070 103 28 . . 
NNP 10_1101-2021_02_08_430070 103 29 , , , 10_1101-2021_02_08_430070 103 30 2017 2017 CD 10_1101-2021_02_08_430070 103 31 ) ) -RRB- 10_1101-2021_02_08_430070 103 32 and and CC 10_1101-2021_02_08_430070 103 33 6mA 6ma CD 10_1101-2021_02_08_430070 103 34 ( ( -LRB- 10_1101-2021_02_08_430070 103 35 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 103 36 et et FW 10_1101-2021_02_08_430070 103 37 al al NNP 10_1101-2021_02_08_430070 103 38 . . NNP 10_1101-2021_02_08_430070 103 39 , , , 10_1101-2021_02_08_430070 103 40 2016 2016 CD 10_1101-2021_02_08_430070 103 41 ) ) -RRB- 10_1101-2021_02_08_430070 103 42 datasets dataset VBZ 10_1101-2021_02_08_430070 103 43 . . . 10_1101-2021_02_08_430070 104 1 The the DT 10_1101-2021_02_08_430070 104 2 datasets dataset NNS 10_1101-2021_02_08_430070 104 3 include include VBP 10_1101-2021_02_08_430070 104 4 samples sample NNS 10_1101-2021_02_08_430070 104 5 of of IN 10_1101-2021_02_08_430070 104 6 E.coli e.coli JJ 10_1101-2021_02_08_430070 104 7 K-12 k-12 CD 10_1101-2021_02_08_430070 104 8 MG1655 mg1655 NN 10_1101-2021_02_08_430070 104 9 , , , 10_1101-2021_02_08_430070 104 10 K-12 k-12 CD 10_1101-2021_02_08_430070 104 11 ER2925 ER2925 NNP 10_1101-2021_02_08_430070 104 12 , , , 10_1101-2021_02_08_430070 104 13 and and CC 10_1101-2021_02_08_430070 104 14 H.sapiens H.sapiens NNP 10_1101-2021_02_08_430070 104 15 NA12878 NA12878 NNP 10_1101-2021_02_08_430070 104 16 . . . 
10_1101-2021_02_08_430070 105 1 Negative negative JJ 10_1101-2021_02_08_430070 105 2 control control NN 10_1101-2021_02_08_430070 105 3 samples sample NNS 10_1101-2021_02_08_430070 105 4 are be VBP 10_1101-2021_02_08_430070 105 5 amplified amplify VBN 10_1101-2021_02_08_430070 105 6 with with IN 10_1101-2021_02_08_430070 105 7 PCR PCR NNP 10_1101-2021_02_08_430070 105 8 and and CC 10_1101-2021_02_08_430070 105 9 no no DT 10_1101-2021_02_08_430070 105 10 modified modify VBN 10_1101-2021_02_08_430070 105 11 bases basis NNS 10_1101-2021_02_08_430070 105 12 are be VBP 10_1101-2021_02_08_430070 105 13 included include VBN 10_1101-2021_02_08_430070 105 14 . . . 10_1101-2021_02_08_430070 106 1 Positive positive JJ 10_1101-2021_02_08_430070 106 2 control control NN 10_1101-2021_02_08_430070 106 3 samples sample NNS 10_1101-2021_02_08_430070 106 4 are be VBP 10_1101-2021_02_08_430070 106 5 synthetically synthetically RB 10_1101-2021_02_08_430070 106 6 introduced introduce VBN 10_1101-2021_02_08_430070 106 7 by by IN 10_1101-2021_02_08_430070 106 8 specific specific JJ 10_1101-2021_02_08_430070 106 9 enzymes enzyme NNS 10_1101-2021_02_08_430070 106 10 after after IN 10_1101-2021_02_08_430070 106 11 PCR PCR NNP 10_1101-2021_02_08_430070 106 12 amplification amplification NN 10_1101-2021_02_08_430070 106 13 , , , 10_1101-2021_02_08_430070 106 14 which which WDT 10_1101-2021_02_08_430070 106 15 includes include VBZ 10_1101-2021_02_08_430070 106 16 SssI SssI NNP 10_1101-2021_02_08_430070 106 17 , , , 10_1101-2021_02_08_430070 106 18 Hhal Hhal NNP 10_1101-2021_02_08_430070 106 19 , , , 10_1101-2021_02_08_430070 106 20 MpeI MpeI NNP 10_1101-2021_02_08_430070 106 21 methylases methylase VBZ 10_1101-2021_02_08_430070 106 22 for for IN 10_1101-2021_02_08_430070 106 23 5mC 5mc CD 10_1101-2021_02_08_430070 106 24 , , , 10_1101-2021_02_08_430070 106 25 and and CC 10_1101-2021_02_08_430070 106 26 TaqI TaqI NNP 10_1101-2021_02_08_430070 106 27 , , , 10_1101-2021_02_08_430070 106 28 EcoRI 
EcoRI NNP 10_1101-2021_02_08_430070 106 29 , , , 10_1101-2021_02_08_430070 106 30 and and CC 10_1101-2021_02_08_430070 106 31 Dam Dam NNP 10_1101-2021_02_08_430070 106 32 for for IN 10_1101-2021_02_08_430070 106 33 6mA 6ma CD 10_1101-2021_02_08_430070 106 34 modification modification NN 10_1101-2021_02_08_430070 106 35 . . . 10_1101-2021_02_08_430070 107 1 We -PRON- PRP 10_1101-2021_02_08_430070 107 2 use use VBP 10_1101-2021_02_08_430070 107 3 the the DT 10_1101-2021_02_08_430070 107 4 samples sample NNS 10_1101-2021_02_08_430070 107 5 that that WDT 10_1101-2021_02_08_430070 107 6 are be VBP 10_1101-2021_02_08_430070 107 7 sequenced sequence VBN 10_1101-2021_02_08_430070 107 8 with with IN 10_1101-2021_02_08_430070 107 9 Oxford Oxford NNP 10_1101-2021_02_08_430070 107 10 Nanopore Nanopore NNP 10_1101-2021_02_08_430070 107 11 R9 r9 JJ 10_1101-2021_02_08_430070 107 12 flow flow NN 10_1101-2021_02_08_430070 107 13 cells cell NNS 10_1101-2021_02_08_430070 107 14 . . . 10_1101-2021_02_08_430070 108 1 For for IN 10_1101-2021_02_08_430070 108 2 each each DT 10_1101-2021_02_08_430070 108 3 dataset dataset NN 10_1101-2021_02_08_430070 108 4 , , , 10_1101-2021_02_08_430070 108 5 we -PRON- PRP 10_1101-2021_02_08_430070 108 6 randomly randomly RB 10_1101-2021_02_08_430070 108 7 shuffle shuffle VBP 10_1101-2021_02_08_430070 108 8 reads read VBZ 10_1101-2021_02_08_430070 108 9 in in IN 10_1101-2021_02_08_430070 108 10 positive positive JJ 10_1101-2021_02_08_430070 108 11 and and CC 10_1101-2021_02_08_430070 108 12 negative negative JJ 10_1101-2021_02_08_430070 108 13 controls control NNS 10_1101-2021_02_08_430070 108 14 and and CC 10_1101-2021_02_08_430070 108 15 construct construct VB 10_1101-2021_02_08_430070 108 16 the the DT 10_1101-2021_02_08_430070 108 17 training training NN 10_1101-2021_02_08_430070 108 18 , , , 10_1101-2021_02_08_430070 108 19 validate validate NN 10_1101-2021_02_08_430070 108 20 and and CC 10_1101-2021_02_08_430070 108 21 test test NN 
10_1101-2021_02_08_430070 108 22 set set NN 10_1101-2021_02_08_430070 108 23 according accord VBG 10_1101-2021_02_08_430070 108 24 to to IN 10_1101-2021_02_08_430070 108 25 a a DT 10_1101-2021_02_08_430070 108 26 split split JJ 10_1101-2021_02_08_430070 108 27 proportion proportion NN 10_1101-2021_02_08_430070 108 28 of of IN 10_1101-2021_02_08_430070 108 29 80/10/10 80/10/10 CD 10_1101-2021_02_08_430070 108 30 for for IN 10_1101-2021_02_08_430070 108 31 in in IN 10_1101-2021_02_08_430070 108 32 - - HYPH 10_1101-2021_02_08_430070 108 33 sample sample NN 10_1101-2021_02_08_430070 108 34 evaluation evaluation NN 10_1101-2021_02_08_430070 108 35 . . . 10_1101-2021_02_08_430070 109 1 For for IN 10_1101-2021_02_08_430070 109 2 the the DT 10_1101-2021_02_08_430070 109 3 cross cross JJ 10_1101-2021_02_08_430070 109 4 - - JJ 10_1101-2021_02_08_430070 109 5 sample sample JJ 10_1101-2021_02_08_430070 109 6 evaluation evaluation NN 10_1101-2021_02_08_430070 109 7 , , , 10_1101-2021_02_08_430070 109 8 we -PRON- PRP 10_1101-2021_02_08_430070 109 9 train train VBP 10_1101-2021_02_08_430070 109 10 models model NNS 10_1101-2021_02_08_430070 109 11 on on IN 10_1101-2021_02_08_430070 109 12 one one CD 10_1101-2021_02_08_430070 109 13 dataset dataset NN 10_1101-2021_02_08_430070 109 14 and and CC 10_1101-2021_02_08_430070 109 15 test test VB 10_1101-2021_02_08_430070 109 16 on on IN 10_1101-2021_02_08_430070 109 17 the the DT 10_1101-2021_02_08_430070 109 18 other other JJ 10_1101-2021_02_08_430070 109 19 dataset dataset NN 10_1101-2021_02_08_430070 109 20 . . . 
10_1101-2021_02_08_430070 110 1 BiRNN birnn DT 10_1101-2021_02_08_430070 110 2 uses use VBZ 10_1101-2021_02_08_430070 110 3 the the DT 10_1101-2021_02_08_430070 110 4 default default NN 10_1101-2021_02_08_430070 110 5 model model NN 10_1101-2021_02_08_430070 110 6 architecture architecture NN 10_1101-2021_02_08_430070 110 7 and and CC 10_1101-2021_02_08_430070 110 8 parameter parameter NN 10_1101-2021_02_08_430070 110 9 setting setting NN 10_1101-2021_02_08_430070 110 10 of of IN 10_1101-2021_02_08_430070 110 11 DeepMOD DeepMOD NNP 10_1101-2021_02_08_430070 110 12 , , , 10_1101-2021_02_08_430070 110 13 which which WDT 10_1101-2021_02_08_430070 110 14 consists consist VBZ 10_1101-2021_02_08_430070 110 15 of of IN 10_1101-2021_02_08_430070 110 16 three three CD 10_1101-2021_02_08_430070 110 17 stacked stack VBN 10_1101-2021_02_08_430070 110 18 bi bi JJ 10_1101-2021_02_08_430070 110 19 - - JJ 10_1101-2021_02_08_430070 110 20 directional directional JJ 10_1101-2021_02_08_430070 110 21 recurrent recurrent JJ 10_1101-2021_02_08_430070 110 22 layers layer NNS 10_1101-2021_02_08_430070 110 23 ( ( -LRB- 10_1101-2021_02_08_430070 110 24 hidden_size=100 hidden_size=100 NNP 10_1101-2021_02_08_430070 110 25 ) ) -RRB- 10_1101-2021_02_08_430070 110 26 and and CC 10_1101-2021_02_08_430070 110 27 one one CD 10_1101-2021_02_08_430070 110 28 full full JJ 10_1101-2021_02_08_430070 110 29 connection connection NN 10_1101-2021_02_08_430070 110 30 layer layer NN 10_1101-2021_02_08_430070 110 31 for for IN 10_1101-2021_02_08_430070 110 32 the the DT 10_1101-2021_02_08_430070 110 33 center center NN 10_1101-2021_02_08_430070 110 34 position position NN 10_1101-2021_02_08_430070 110 35 . . . 
10_1101-2021_02_08_430070 111 1 The the DT 10_1101-2021_02_08_430070 111 2 total total JJ 10_1101-2021_02_08_430070 111 3 number number NN 10_1101-2021_02_08_430070 111 4 of of IN 10_1101-2021_02_08_430070 111 5 biRNN birnn JJ 10_1101-2021_02_08_430070 111 6 parameters parameter NNS 10_1101-2021_02_08_430070 111 7 is be VBZ 10_1101-2021_02_08_430070 111 8 570,802 570,802 CD 10_1101-2021_02_08_430070 111 9 for for IN 10_1101-2021_02_08_430070 111 10 an an DT 10_1101-2021_02_08_430070 111 11 input input JJ 10_1101-2021_02_08_430070 111 12 length length NN 10_1101-2021_02_08_430070 111 13 of of IN 10_1101-2021_02_08_430070 111 14 21bp 21bp NN 10_1101-2021_02_08_430070 111 15 . . . 10_1101-2021_02_08_430070 112 1 BERTs bert NNS 10_1101-2021_02_08_430070 112 2 use use VBP 10_1101-2021_02_08_430070 112 3 three three CD 10_1101-2021_02_08_430070 112 4 attention attention NN 10_1101-2021_02_08_430070 112 5 layers layer NNS 10_1101-2021_02_08_430070 112 6 ( ( -LRB- 10_1101-2021_02_08_430070 112 7 hidden_size=100 hidden_size=100 UH 10_1101-2021_02_08_430070 112 8 , , , 10_1101-2021_02_08_430070 112 9 attention_head=4 attention_head=4 NNP 10_1101-2021_02_08_430070 112 10 ) ) -RRB- 10_1101-2021_02_08_430070 112 11 and and CC 10_1101-2021_02_08_430070 112 12 one one CD 10_1101-2021_02_08_430070 112 13 full full JJ 10_1101-2021_02_08_430070 112 14 connection connection NN 10_1101-2021_02_08_430070 112 15 layer layer NN 10_1101-2021_02_08_430070 112 16 . . . 
10_1101-2021_02_08_430070 113 1 For for IN 10_1101-2021_02_08_430070 113 2 the the DT 10_1101-2021_02_08_430070 113 3 refined refined JJ 10_1101-2021_02_08_430070 113 4 BERT BERT NNP 10_1101-2021_02_08_430070 113 5 , , , 10_1101-2021_02_08_430070 113 6 learnable learnable JJ 10_1101-2021_02_08_430070 113 7 positional positional JJ 10_1101-2021_02_08_430070 113 8 encoding encoding NN 10_1101-2021_02_08_430070 113 9 , , , 10_1101-2021_02_08_430070 113 10 attention attention NN 10_1101-2021_02_08_430070 113 11 with with IN 10_1101-2021_02_08_430070 113 12 relative relative JJ 10_1101-2021_02_08_430070 113 13 position position NN 10_1101-2021_02_08_430070 113 14 representation representation NN 10_1101-2021_02_08_430070 113 15 and and CC 10_1101-2021_02_08_430070 113 16 center center NN 10_1101-2021_02_08_430070 113 17 - - HYPH 10_1101-2021_02_08_430070 113 18 hidden hide VBN 10_1101-2021_02_08_430070 113 19 - - HYPH 10_1101-2021_02_08_430070 113 20 concatenation concatenation NN 10_1101-2021_02_08_430070 113 21 are be VBP 10_1101-2021_02_08_430070 113 22 used use VBN 10_1101-2021_02_08_430070 113 23 . . . 
10_1101-2021_02_08_430070 114 1 For for IN 10_1101-2021_02_08_430070 114 2 BERT BERT NNP 10_1101-2021_02_08_430070 114 3 and and CC 10_1101-2021_02_08_430070 114 4 refined refine VBN 10_1101-2021_02_08_430070 114 5 BERT BERT NNP 10_1101-2021_02_08_430070 114 6 , , , 10_1101-2021_02_08_430070 114 7 there there EX 10_1101-2021_02_08_430070 114 8 are be VBP 10_1101-2021_02_08_430070 114 9 total total NN 10_1101-2021_02_08_430070 114 10 of of IN 10_1101-2021_02_08_430070 114 11 364,902 364,902 CD 10_1101-2021_02_08_430070 114 12 and and CC 10_1101-2021_02_08_430070 114 13 368,202 368,202 CD 10_1101-2021_02_08_430070 114 14 parameters parameter NNS 10_1101-2021_02_08_430070 114 15 , , , 10_1101-2021_02_08_430070 114 16 which which WDT 10_1101-2021_02_08_430070 114 17 are be VBP 10_1101-2021_02_08_430070 114 18 around around RB 10_1101-2021_02_08_430070 114 19 35 35 CD 10_1101-2021_02_08_430070 114 20 % % NN 10_1101-2021_02_08_430070 114 21 less less JJR 10_1101-2021_02_08_430070 114 22 than than IN 10_1101-2021_02_08_430070 114 23 that that DT 10_1101-2021_02_08_430070 114 24 of of IN 10_1101-2021_02_08_430070 114 25 biRNN birnn NN 10_1101-2021_02_08_430070 114 26 . . . 10_1101-2021_02_08_430070 115 1 More more RBR 10_1101-2021_02_08_430070 115 2 detailed detailed JJ 10_1101-2021_02_08_430070 115 3 information information NN 10_1101-2021_02_08_430070 115 4 on on IN 10_1101-2021_02_08_430070 115 5 the the DT 10_1101-2021_02_08_430070 115 6 model model NN 10_1101-2021_02_08_430070 115 7 structures structure NNS 10_1101-2021_02_08_430070 115 8 is be VBZ 10_1101-2021_02_08_430070 115 9 described describe VBN 10_1101-2021_02_08_430070 115 10 in in IN 10_1101-2021_02_08_430070 115 11 the the DT 10_1101-2021_02_08_430070 115 12 supplement supplement NN 10_1101-2021_02_08_430070 115 13 material material NN 10_1101-2021_02_08_430070 115 14 . . . 
10_1101-2021_02_08_430070 116 1 We -PRON- PRP 10_1101-2021_02_08_430070 116 2 implement implement VBP 10_1101-2021_02_08_430070 116 3 the the DT 10_1101-2021_02_08_430070 116 4 three three CD 10_1101-2021_02_08_430070 116 5 models model NNS 10_1101-2021_02_08_430070 116 6 using use VBG 10_1101-2021_02_08_430070 116 7 Pytorch Pytorch NNP 10_1101-2021_02_08_430070 116 8 . . . 10_1101-2021_02_08_430070 117 1 All all PDT 10_1101-2021_02_08_430070 117 2 the the DT 10_1101-2021_02_08_430070 117 3 models model NNS 10_1101-2021_02_08_430070 117 4 are be VBP 10_1101-2021_02_08_430070 117 5 optimized optimize VBN 10_1101-2021_02_08_430070 117 6 using use VBG 10_1101-2021_02_08_430070 117 7 Adam Adam NNP 10_1101-2021_02_08_430070 117 8 optimizer optimizer NN 10_1101-2021_02_08_430070 117 9 ( ( -LRB- 10_1101-2021_02_08_430070 117 10 Kingma Kingma NNP 10_1101-2021_02_08_430070 117 11 and and CC 10_1101-2021_02_08_430070 117 12 Ba Ba NNP 10_1101-2021_02_08_430070 117 13 , , , 10_1101-2021_02_08_430070 117 14 2014 2014 CD 10_1101-2021_02_08_430070 117 15 ) ) -RRB- 10_1101-2021_02_08_430070 117 16 with with IN 10_1101-2021_02_08_430070 117 17 the the DT 10_1101-2021_02_08_430070 117 18 learning learning NN 10_1101-2021_02_08_430070 117 19 rate rate NN 10_1101-2021_02_08_430070 117 20 of of IN 10_1101-2021_02_08_430070 117 21 1e 1e CD 10_1101-2021_02_08_430070 117 22 − − NNP 10_1101-2021_02_08_430070 117 23 4 4 CD 10_1101-2021_02_08_430070 117 24 and and CC 10_1101-2021_02_08_430070 117 25 maximum maximum JJ 10_1101-2021_02_08_430070 117 26 iteration iteration NN 10_1101-2021_02_08_430070 117 27 epoch epoch NN 10_1101-2021_02_08_430070 117 28 of of IN 10_1101-2021_02_08_430070 117 29 50 50 CD 10_1101-2021_02_08_430070 117 30 . . . 
10_1101-2021_02_08_430070 118 1 Model model NN 10_1101-2021_02_08_430070 118 2 parameters parameter NNS 10_1101-2021_02_08_430070 118 3 are be VBP 10_1101-2021_02_08_430070 118 4 selected select VBN 10_1101-2021_02_08_430070 118 5 based base VBN 10_1101-2021_02_08_430070 118 6 on on IN 10_1101-2021_02_08_430070 118 7 the the DT 10_1101-2021_02_08_430070 118 8 minimum minimum JJ 10_1101-2021_02_08_430070 118 9 validation validation NN 10_1101-2021_02_08_430070 118 10 loss loss NN 10_1101-2021_02_08_430070 118 11 . . . 10_1101-2021_02_08_430070 119 1 3.2 3.2 CD 10_1101-2021_02_08_430070 119 2 Exploring Exploring NNP 10_1101-2021_02_08_430070 119 3 differentiated differentiated JJ 10_1101-2021_02_08_430070 119 4 signal signal NN 10_1101-2021_02_08_430070 119 5 positions position NNS 10_1101-2021_02_08_430070 119 6 in in IN 10_1101-2021_02_08_430070 119 7 the the DT 10_1101-2021_02_08_430070 119 8 context context NN 10_1101-2021_02_08_430070 119 9 window window NN 10_1101-2021_02_08_430070 119 10 surrounding surround VBG 10_1101-2021_02_08_430070 119 11 target target NN 10_1101-2021_02_08_430070 119 12 motifs motif NNS 10_1101-2021_02_08_430070 119 13 Ideally ideally RB 10_1101-2021_02_08_430070 119 14 , , , 10_1101-2021_02_08_430070 119 15 we -PRON- PRP 10_1101-2021_02_08_430070 119 16 assume assume VBP 10_1101-2021_02_08_430070 119 17 a a DT 10_1101-2021_02_08_430070 119 18 modified modify VBN 10_1101-2021_02_08_430070 119 19 nucleotide nucleotide NN 10_1101-2021_02_08_430070 119 20 ( ( -LRB- 10_1101-2021_02_08_430070 119 21 e.g. e.g. 
RB 10_1101-2021_02_08_430070 119 22 , , , 10_1101-2021_02_08_430070 119 23 the the DT 10_1101-2021_02_08_430070 119 24 center center JJ 10_1101-2021_02_08_430070 119 25 position position NN 10_1101-2021_02_08_430070 119 26 of of IN 10_1101-2021_02_08_430070 119 27 XXXXXXXXXXC5mCGXXXXXXXXX XXXXXXXXXXC5mCGXXXXXXXXX NNP 10_1101-2021_02_08_430070 119 28 ) ) -RRB- 10_1101-2021_02_08_430070 119 29 has have VBZ 10_1101-2021_02_08_430070 119 30 different different JJ 10_1101-2021_02_08_430070 119 31 current current JJ 10_1101-2021_02_08_430070 119 32 signals signal NNS 10_1101-2021_02_08_430070 119 33 , , , 
10_1101-2021_02_08_430070 121 1 ( ( -LRB- 10_1101-2021_02_08_430070 121 2 a1 a1 NNP 10_1101-2021_02_08_430070 121 3 ) ) -RRB- 10_1101-2021_02_08_430070 121 4 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 121 5 - - HYPH 10_1101-2021_02_08_430070 121 6 E.coli_Cg_SssI E.coli_Cg_SssI NNP 10_1101-2021_02_08_430070 121 7 ( ( -LRB- 10_1101-2021_02_08_430070 121 8 a2 a2 NNP 10_1101-2021_02_08_430070 121 9 ) ) -RRB- 10_1101-2021_02_08_430070 121 10 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 121 11 - - HYPH 10_1101-2021_02_08_430070 121 12 E.coli_Cg_MpeI E.coli_Cg_MpeI NNP 10_1101-2021_02_08_430070 121 13 ( ( -LRB- 10_1101-2021_02_08_430070 121 14 a3 a3 NNP 10_1101-2021_02_08_430070 121 15 ) ) -RRB- 10_1101-2021_02_08_430070 121 16 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 121 17 - - HYPH 10_1101-2021_02_08_430070 121 18 E.coli_gCgc_Hhal e.coli_gcgc_hhal NN 10_1101-2021_02_08_430070 121 19 ( ( -LRB- 10_1101-2021_02_08_430070 121 20 b1 b1 NN 10_1101-2021_02_08_430070 121 21 ) ) -RRB- 10_1101-2021_02_08_430070 121 22 Simpson Simpson NNP 10_1101-2021_02_08_430070 121 23 - - HYPH 10_1101-2021_02_08_430070 121 24 E.coli_Cg_SssI E.coli_Cg_SssI NNP 10_1101-2021_02_08_430070 121 25 ( ( -LRB- 10_1101-2021_02_08_430070 121 26 b2 b2 NN 10_1101-2021_02_08_430070 121 27 ) ) -RRB- 10_1101-2021_02_08_430070 121 28 Simpson Simpson NNP 10_1101-2021_02_08_430070 121 29 - - HYPH 10_1101-2021_02_08_430070 121 30 H.Sapiens_Cg_SssI H.Sapiens_Cg_SssI NNP 10_1101-2021_02_08_430070 121 31 ( ( -LRB- 10_1101-2021_02_08_430070 121 32 c1 c1 NN 10_1101-2021_02_08_430070 121 33 ) ) -RRB- 10_1101-2021_02_08_430070 121 34 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 121 35 - - HYPH 10_1101-2021_02_08_430070 121 36 E.coli_gaAttc_EcoRI E.coli_gaAttc_EcoRI NNP 10_1101-2021_02_08_430070 121 37 ( ( -LRB- 10_1101-2021_02_08_430070 121 38 c2 c2 NN 10_1101-2021_02_08_430070 121 39 ) ) -RRB- 10_1101-2021_02_08_430070 121 40 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 121 41 - - HYPH 
10_1101-2021_02_08_430070 121 42 E.coli_tcgA_TaqI E.coli_tcgA_TaqI NNP 10_1101-2021_02_08_430070 121 43 ( ( -LRB- 10_1101-2021_02_08_430070 121 44 c3 c3 NN 10_1101-2021_02_08_430070 121 45 ) ) -RRB- 10_1101-2021_02_08_430070 121 46 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 121 47 - - HYPH 10_1101-2021_02_08_430070 121 48 E.coli_gAtc_Dam E.coli_gAtc_Dam NNP 10_1101-2021_02_08_430070 121 49 Fig Fig NNP 10_1101-2021_02_08_430070 121 50 . . . 10_1101-2021_02_08_430070 122 1 2 2 LS 10_1101-2021_02_08_430070 122 2 : : : 10_1101-2021_02_08_430070 122 3 Boxplot boxplot NN 10_1101-2021_02_08_430070 122 4 of of IN 10_1101-2021_02_08_430070 122 5 positional positional JJ 10_1101-2021_02_08_430070 122 6 signal signal NN 10_1101-2021_02_08_430070 122 7 - - HYPH 10_1101-2021_02_08_430070 122 8 shift shift NN 10_1101-2021_02_08_430070 122 9 for for IN 10_1101-2021_02_08_430070 122 10 5mC 5mc CD 10_1101-2021_02_08_430070 122 11 and and CC 10_1101-2021_02_08_430070 122 12 6mA 6ma CD 10_1101-2021_02_08_430070 122 13 datasets dataset NNS 10_1101-2021_02_08_430070 122 14 of of IN 10_1101-2021_02_08_430070 122 15 the the DT 10_1101-2021_02_08_430070 122 16 specific specific JJ 10_1101-2021_02_08_430070 122 17 motif motif NN 10_1101-2021_02_08_430070 122 18 and and CC 10_1101-2021_02_08_430070 122 19 methyltransferase methyltransferase NN 10_1101-2021_02_08_430070 122 20 . . . 
10_1101-2021_02_08_430070 123 1 ( ( -LRB- 10_1101-2021_02_08_430070 123 2 a1),(a2 a1),(a2 NNP 10_1101-2021_02_08_430070 123 3 ) ) -RRB- 10_1101-2021_02_08_430070 123 4 and and CC 10_1101-2021_02_08_430070 123 5 ( ( -LRB- 10_1101-2021_02_08_430070 123 6 a3 a3 NNP 10_1101-2021_02_08_430070 123 7 ) ) -RRB- 10_1101-2021_02_08_430070 123 8 are be VBP 10_1101-2021_02_08_430070 123 9 on on IN 10_1101-2021_02_08_430070 123 10 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 123 11 ’s ’s POS 10_1101-2021_02_08_430070 123 12 E.coli e.coli JJ 10_1101-2021_02_08_430070 123 13 5mC 5mc CD 10_1101-2021_02_08_430070 123 14 dataset dataset NN 10_1101-2021_02_08_430070 123 15 . . . 10_1101-2021_02_08_430070 124 1 ( ( -LRB- 10_1101-2021_02_08_430070 124 2 b1 b1 NN 10_1101-2021_02_08_430070 124 3 ) ) -RRB- 10_1101-2021_02_08_430070 124 4 and and CC 10_1101-2021_02_08_430070 124 5 ( ( -LRB- 10_1101-2021_02_08_430070 124 6 b2 b2 NN 10_1101-2021_02_08_430070 124 7 ) ) -RRB- 10_1101-2021_02_08_430070 124 8 are be VBP 10_1101-2021_02_08_430070 124 9 on on IN 10_1101-2021_02_08_430070 124 10 Simpson Simpson NNP 10_1101-2021_02_08_430070 124 11 ’s ’s NNP 10_1101-2021_02_08_430070 124 12 5mC 5mc CD 10_1101-2021_02_08_430070 124 13 dataset dataset NN 10_1101-2021_02_08_430070 124 14 . . . 
10_1101-2021_02_08_430070 125 1 ( ( -LRB- 10_1101-2021_02_08_430070 125 2 c1 c1 NN 10_1101-2021_02_08_430070 125 3 ) ) -RRB- 10_1101-2021_02_08_430070 125 4 , , , 10_1101-2021_02_08_430070 125 5 ( ( -LRB- 10_1101-2021_02_08_430070 125 6 c2 c2 NNP 10_1101-2021_02_08_430070 125 7 ) ) -RRB- 10_1101-2021_02_08_430070 125 8 and and CC 10_1101-2021_02_08_430070 125 9 ( ( -LRB- 10_1101-2021_02_08_430070 125 10 c3 c3 NN 10_1101-2021_02_08_430070 125 11 ) ) -RRB- 10_1101-2021_02_08_430070 125 12 are be VBP 10_1101-2021_02_08_430070 125 13 on on IN 10_1101-2021_02_08_430070 125 14 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 125 15 ’s ’s POS 10_1101-2021_02_08_430070 125 16 E.coli e.coli JJ 10_1101-2021_02_08_430070 125 17 6mA 6ma CD 10_1101-2021_02_08_430070 125 18 dataset dataset NN 10_1101-2021_02_08_430070 125 19 . . . 10_1101-2021_02_08_430070 126 1 Each each DT 10_1101-2021_02_08_430070 126 2 dataset dataset NN 10_1101-2021_02_08_430070 126 3 is be VBZ 10_1101-2021_02_08_430070 126 4 represented represent VBN 10_1101-2021_02_08_430070 126 5 in in IN 10_1101-2021_02_08_430070 126 6 a a DT 10_1101-2021_02_08_430070 126 7 format format NN 10_1101-2021_02_08_430070 126 8 of of IN 10_1101-2021_02_08_430070 126 9 dataSource_motif_methltansferase datasource_motif_methltansferase ADD 10_1101-2021_02_08_430070 126 10 . . . 10_1101-2021_02_08_430070 127 1 when when WRB 10_1101-2021_02_08_430070 127 2 compared compare VBN 10_1101-2021_02_08_430070 127 3 with with IN 10_1101-2021_02_08_430070 127 4 the the DT 10_1101-2021_02_08_430070 127 5 unmodified unmodified JJ 10_1101-2021_02_08_430070 127 6 one one NN 10_1101-2021_02_08_430070 127 7 . . . 
10_1101-2021_02_08_430070 128 1 As as IN 10_1101-2021_02_08_430070 128 2 the the DT 10_1101-2021_02_08_430070 128 3 boundary boundary NN 10_1101-2021_02_08_430070 128 4 of of IN 10_1101-2021_02_08_430070 128 5 nucleotide nucleotide NNP 10_1101-2021_02_08_430070 128 6 / / SYM 10_1101-2021_02_08_430070 128 7 k- k- NNP 10_1101-2021_02_08_430070 128 8 mer mer NNP 10_1101-2021_02_08_430070 128 9 signals signal NNS 10_1101-2021_02_08_430070 128 10 are be VBP 10_1101-2021_02_08_430070 128 11 not not RB 10_1101-2021_02_08_430070 128 12 rigorous rigorous JJ 10_1101-2021_02_08_430070 128 13 and and CC 10_1101-2021_02_08_430070 128 14 surrounding surround VBG 10_1101-2021_02_08_430070 128 15 nucleotides nucleotide NNS 10_1101-2021_02_08_430070 128 16 may may MD 10_1101-2021_02_08_430070 128 17 also also RB 10_1101-2021_02_08_430070 128 18 be be VB 10_1101-2021_02_08_430070 128 19 affected affect VBN 10_1101-2021_02_08_430070 128 20 , , , 10_1101-2021_02_08_430070 128 21 it -PRON- PRP 10_1101-2021_02_08_430070 128 22 is be VBZ 10_1101-2021_02_08_430070 128 23 worthwhile worthwhile JJ 10_1101-2021_02_08_430070 128 24 investigating investigate VBG 10_1101-2021_02_08_430070 128 25 signal signal JJ 10_1101-2021_02_08_430070 128 26 - - HYPH 10_1101-2021_02_08_430070 128 27 shift shift NN 10_1101-2021_02_08_430070 128 28 patterns pattern NNS 10_1101-2021_02_08_430070 128 29 related relate VBN 10_1101-2021_02_08_430070 128 30 to to IN 10_1101-2021_02_08_430070 128 31 methylation methylation NN 10_1101-2021_02_08_430070 128 32 in in IN 10_1101-2021_02_08_430070 128 33 a a DT 10_1101-2021_02_08_430070 128 34 large large JJ 10_1101-2021_02_08_430070 128 35 context context NN 10_1101-2021_02_08_430070 128 36 . . . 
10_1101-2021_02_08_430070 129 1 To to TO 10_1101-2021_02_08_430070 129 2 identify identify VB 10_1101-2021_02_08_430070 129 3 signal signal NN 10_1101-2021_02_08_430070 129 4 - - HYPH 10_1101-2021_02_08_430070 129 5 shift shift NN 10_1101-2021_02_08_430070 129 6 affected affect VBN 10_1101-2021_02_08_430070 129 7 by by IN 10_1101-2021_02_08_430070 129 8 methylation methylation NN 10_1101-2021_02_08_430070 129 9 for for IN 10_1101-2021_02_08_430070 129 10 a a DT 10_1101-2021_02_08_430070 129 11 specific specific JJ 10_1101-2021_02_08_430070 129 12 dataset dataset NN 10_1101-2021_02_08_430070 129 13 , , , 10_1101-2021_02_08_430070 129 14 we -PRON- PRP 10_1101-2021_02_08_430070 129 15 use use VBP 10_1101-2021_02_08_430070 129 16 a a DT 10_1101-2021_02_08_430070 129 17 simple simple JJ 10_1101-2021_02_08_430070 129 18 quantification quantification NN 10_1101-2021_02_08_430070 129 19 approach approach NN 10_1101-2021_02_08_430070 129 20 to to TO 10_1101-2021_02_08_430070 129 21 calculate calculate VB 10_1101-2021_02_08_430070 129 22 significant significant JJ 10_1101-2021_02_08_430070 129 23 signal signal NN 10_1101-2021_02_08_430070 129 24 changes change NNS 10_1101-2021_02_08_430070 129 25 of of IN 10_1101-2021_02_08_430070 129 26 each each DT 10_1101-2021_02_08_430070 129 27 position position NN 10_1101-2021_02_08_430070 129 28 in in IN 10_1101-2021_02_08_430070 129 29 the the DT 10_1101-2021_02_08_430070 129 30 context context NN 10_1101-2021_02_08_430070 129 31 window window NN 10_1101-2021_02_08_430070 129 32 . . . 
10_1101-2021_02_08_430070 130 1 Given give VBN 10_1101-2021_02_08_430070 130 2 a a DT 10_1101-2021_02_08_430070 130 3 dataset dataset NN 10_1101-2021_02_08_430070 130 4 of of IN 10_1101-2021_02_08_430070 130 5 a a DT 10_1101-2021_02_08_430070 130 6 specific specific JJ 10_1101-2021_02_08_430070 130 7 motif motif NN 10_1101-2021_02_08_430070 130 8 and and CC 10_1101-2021_02_08_430070 130 9 methyltransferase methyltransferase NN 10_1101-2021_02_08_430070 130 10 , , , 10_1101-2021_02_08_430070 130 11 we -PRON- PRP 10_1101-2021_02_08_430070 130 12 first first RB 10_1101-2021_02_08_430070 130 13 cluster cluster VBD 10_1101-2021_02_08_430070 130 14 instances instance NNS 10_1101-2021_02_08_430070 130 15 with with IN 10_1101-2021_02_08_430070 130 16 the the DT 10_1101-2021_02_08_430070 130 17 same same JJ 10_1101-2021_02_08_430070 130 18 nucleotide nucleotide JJ 10_1101-2021_02_08_430070 130 19 sequence sequence NN 10_1101-2021_02_08_430070 130 20 to to TO 10_1101-2021_02_08_430070 130 21 avoid avoid VB 10_1101-2021_02_08_430070 130 22 the the DT 10_1101-2021_02_08_430070 130 23 effect effect NN 10_1101-2021_02_08_430070 130 24 of of IN 10_1101-2021_02_08_430070 130 25 nucleotide nucleotide JJ 10_1101-2021_02_08_430070 130 26 sequences sequence NNS 10_1101-2021_02_08_430070 130 27 . . . 
10_1101-2021_02_08_430070 131 1 We -PRON- PRP 10_1101-2021_02_08_430070 131 2 reserve reserve VBP 10_1101-2021_02_08_430070 131 3 sequence sequence NN 10_1101-2021_02_08_430070 131 4 clusters cluster NNS 10_1101-2021_02_08_430070 131 5 that that WDT 10_1101-2021_02_08_430070 131 6 contain contain VBP 10_1101-2021_02_08_430070 131 7 both both DT 10_1101-2021_02_08_430070 131 8 methylation methylation NN 10_1101-2021_02_08_430070 131 9 and and CC 10_1101-2021_02_08_430070 131 10 unmethylation unmethylation NN 10_1101-2021_02_08_430070 131 11 instances instance NNS 10_1101-2021_02_08_430070 131 12 ( ( -LRB- 10_1101-2021_02_08_430070 131 13 ≥ ≥ UH 10_1101-2021_02_08_430070 131 14 1 1 CD 10_1101-2021_02_08_430070 131 15 ) ) -RRB- 10_1101-2021_02_08_430070 131 16 . . . 10_1101-2021_02_08_430070 132 1 For for IN 10_1101-2021_02_08_430070 132 2 each each DT 10_1101-2021_02_08_430070 132 3 sequence sequence NN 10_1101-2021_02_08_430070 132 4 cluster cluster NN 10_1101-2021_02_08_430070 132 5 , , , 10_1101-2021_02_08_430070 132 6 we -PRON- PRP 10_1101-2021_02_08_430070 132 7 normalize normalize VBP 10_1101-2021_02_08_430070 132 8 event event NN 10_1101-2021_02_08_430070 132 9 signal signal NN 10_1101-2021_02_08_430070 132 10 values value NNS 10_1101-2021_02_08_430070 132 11 of of IN 10_1101-2021_02_08_430070 132 12 methylation methylation NN 10_1101-2021_02_08_430070 132 13 samples sample NNS 10_1101-2021_02_08_430070 132 14 with with IN 10_1101-2021_02_08_430070 132 15 their -PRON- PRP$ 10_1101-2021_02_08_430070 132 16 according accord VBG 10_1101-2021_02_08_430070 132 17 unmodified unmodified JJ 10_1101-2021_02_08_430070 132 18 averaged averaged JJ 10_1101-2021_02_08_430070 132 19 event event NN 10_1101-2021_02_08_430070 132 20 signal signal NN 10_1101-2021_02_08_430070 132 21 values value NNS 10_1101-2021_02_08_430070 132 22 for for IN 10_1101-2021_02_08_430070 132 23 each each DT 10_1101-2021_02_08_430070 132 24 position position NN 10_1101-2021_02_08_430070 132 25 . . . 
10_1101-2021_02_08_430070 133 1 The the DT 10_1101-2021_02_08_430070 133 2 i i NNP 10_1101-2021_02_08_430070 133 3 - - HYPH 10_1101-2021_02_08_430070 133 4 th th NNP 10_1101-2021_02_08_430070 133 5 positional positional JJ 10_1101-2021_02_08_430070 133 6 signal signal JJ 10_1101-2021_02_08_430070 133 7 - - HYPH 10_1101-2021_02_08_430070 133 8 shift shift NN 10_1101-2021_02_08_430070 133 9 is be VBZ 10_1101-2021_02_08_430070 133 10 then then RB 10_1101-2021_02_08_430070 133 11 calculated calculate VBN 10_1101-2021_02_08_430070 133 12 as as IN 10_1101-2021_02_08_430070 133 13 smethi smethi NN 10_1101-2021_02_08_430070 133 14 − − NNP 10_1101-2021_02_08_430070 133 15 avg(s avg(s UH 10_1101-2021_02_08_430070 133 16 unmeth unmeth JJ 10_1101-2021_02_08_430070 133 17 i i PRP 10_1101-2021_02_08_430070 133 18 ) ) -RRB- 10_1101-2021_02_08_430070 133 19 . . . 10_1101-2021_02_08_430070 134 1 For for IN 10_1101-2021_02_08_430070 134 2 those those DT 10_1101-2021_02_08_430070 134 3 normalized normalized JJ 10_1101-2021_02_08_430070 134 4 methylation methylation NN 10_1101-2021_02_08_430070 134 5 samples sample NNS 10_1101-2021_02_08_430070 134 6 , , , 10_1101-2021_02_08_430070 134 7 we -PRON- PRP 10_1101-2021_02_08_430070 134 8 calculate calculate VBP 10_1101-2021_02_08_430070 134 9 basic basic JJ 10_1101-2021_02_08_430070 134 10 statistics statistic NNS 10_1101-2021_02_08_430070 134 11 of of IN 10_1101-2021_02_08_430070 134 12 signal signal NN 10_1101-2021_02_08_430070 134 13 - - HYPH 10_1101-2021_02_08_430070 134 14 shift shift NN 10_1101-2021_02_08_430070 134 15 for for IN 10_1101-2021_02_08_430070 134 16 each each DT 10_1101-2021_02_08_430070 134 17 position position NN 10_1101-2021_02_08_430070 134 18 and and CC 10_1101-2021_02_08_430070 134 19 draw draw VB 10_1101-2021_02_08_430070 134 20 boxplots boxplot NNS 10_1101-2021_02_08_430070 134 21 for for IN 10_1101-2021_02_08_430070 134 22 5mC 5mc CD 10_1101-2021_02_08_430070 134 23 and and CC 10_1101-2021_02_08_430070 134 24 
6mA 6ma CD 10_1101-2021_02_08_430070 134 25 training training NN 10_1101-2021_02_08_430070 134 26 sets set NNS 10_1101-2021_02_08_430070 134 27 . . . 10_1101-2021_02_08_430070 135 1 Shown show VBN 10_1101-2021_02_08_430070 135 2 in in IN 10_1101-2021_02_08_430070 135 3 Figure Figure NNP 10_1101-2021_02_08_430070 135 4 2 2 CD 10_1101-2021_02_08_430070 135 5 , , , 10_1101-2021_02_08_430070 135 6 for for IN 10_1101-2021_02_08_430070 135 7 all all DT 10_1101-2021_02_08_430070 135 8 datasets dataset NNS 10_1101-2021_02_08_430070 135 9 , , , 10_1101-2021_02_08_430070 135 10 we -PRON- PRP 10_1101-2021_02_08_430070 135 11 can can MD 10_1101-2021_02_08_430070 135 12 observed observe VBN 10_1101-2021_02_08_430070 135 13 positions position NNS 10_1101-2021_02_08_430070 135 14 of of IN 10_1101-2021_02_08_430070 135 15 significantly significantly RB 10_1101-2021_02_08_430070 135 16 signal signal NN 10_1101-2021_02_08_430070 135 17 - - HYPH 10_1101-2021_02_08_430070 135 18 shift shift NN 10_1101-2021_02_08_430070 135 19 are be VBP 10_1101-2021_02_08_430070 135 20 located locate VBN 10_1101-2021_02_08_430070 135 21 in in IN 10_1101-2021_02_08_430070 135 22 a a DT 10_1101-2021_02_08_430070 135 23 range range NN 10_1101-2021_02_08_430070 135 24 of of IN 10_1101-2021_02_08_430070 135 25 ±3bp ±3bp NN 10_1101-2021_02_08_430070 135 26 to to IN 10_1101-2021_02_08_430070 135 27 the the DT 10_1101-2021_02_08_430070 135 28 center center JJ 10_1101-2021_02_08_430070 135 29 position position NN 10_1101-2021_02_08_430070 135 30 ( ( -LRB- 10_1101-2021_02_08_430070 135 31 the the DT 10_1101-2021_02_08_430070 135 32 11th 11th NN 10_1101-2021_02_08_430070 135 33 ) ) -RRB- 10_1101-2021_02_08_430070 135 34 in in IN 10_1101-2021_02_08_430070 135 35 which which WDT 10_1101-2021_02_08_430070 135 36 the the DT 10_1101-2021_02_08_430070 135 37 target target NN 10_1101-2021_02_08_430070 135 38 nucleotide nucleotide RB 10_1101-2021_02_08_430070 135 39 is be VBZ 10_1101-2021_02_08_430070 135 40 located 
locate VBN 10_1101-2021_02_08_430070 135 41 . . . 10_1101-2021_02_08_430070 136 1 For for IN 10_1101-2021_02_08_430070 136 2 the the DT 10_1101-2021_02_08_430070 136 3 rest rest NN 10_1101-2021_02_08_430070 136 4 off off IN 10_1101-2021_02_08_430070 136 5 - - HYPH 10_1101-2021_02_08_430070 136 6 center center NN 10_1101-2021_02_08_430070 136 7 positions position NNS 10_1101-2021_02_08_430070 136 8 , , , 10_1101-2021_02_08_430070 136 9 the the DT 10_1101-2021_02_08_430070 136 10 averaged average VBN 10_1101-2021_02_08_430070 136 11 signal signal JJ 10_1101-2021_02_08_430070 136 12 - - HYPH 10_1101-2021_02_08_430070 136 13 shift shift NN 10_1101-2021_02_08_430070 136 14 values value NNS 10_1101-2021_02_08_430070 136 15 are be VBP 10_1101-2021_02_08_430070 136 16 close close JJ 10_1101-2021_02_08_430070 136 17 to to IN 10_1101-2021_02_08_430070 136 18 0 0 CD 10_1101-2021_02_08_430070 136 19 . . . 10_1101-2021_02_08_430070 137 1 This this DT 10_1101-2021_02_08_430070 137 2 indicates indicate VBZ 10_1101-2021_02_08_430070 137 3 a a DT 10_1101-2021_02_08_430070 137 4 modified modify VBN 10_1101-2021_02_08_430070 137 5 nucleotide nucleotide RB 10_1101-2021_02_08_430070 137 6 not not RB 10_1101-2021_02_08_430070 137 7 only only RB 10_1101-2021_02_08_430070 137 8 affect affect VB 10_1101-2021_02_08_430070 137 9 its -PRON- PRP$ 10_1101-2021_02_08_430070 137 10 corresponding correspond VBG 10_1101-2021_02_08_430070 137 11 current current JJ 10_1101-2021_02_08_430070 137 12 signals signal NNS 10_1101-2021_02_08_430070 137 13 but but CC 10_1101-2021_02_08_430070 137 14 also also RB 10_1101-2021_02_08_430070 137 15 the the DT 10_1101-2021_02_08_430070 137 16 signals signal NNS 10_1101-2021_02_08_430070 137 17 of of IN 10_1101-2021_02_08_430070 137 18 its -PRON- PRP$ 10_1101-2021_02_08_430070 137 19 surrounding surround VBG 10_1101-2021_02_08_430070 137 20 nucleotides nucleotide NNS 10_1101-2021_02_08_430070 137 21 . . . 
10_1101-2021_02_08_430070 138 1 Besides besides RB 10_1101-2021_02_08_430070 138 2 , , , 10_1101-2021_02_08_430070 138 3 5mC 5mc CD 10_1101-2021_02_08_430070 138 4 and and CC 10_1101-2021_02_08_430070 138 5 6mA 6mA NNP 10_1101-2021_02_08_430070 138 6 datasets dataset NNS 10_1101-2021_02_08_430070 138 7 show show VBP 10_1101-2021_02_08_430070 138 8 different different JJ 10_1101-2021_02_08_430070 138 9 positional positional JJ 10_1101-2021_02_08_430070 138 10 - - HYPH 10_1101-2021_02_08_430070 138 11 signal signal NN 10_1101-2021_02_08_430070 138 12 - - HYPH 10_1101-2021_02_08_430070 138 13 shift shift NN 10_1101-2021_02_08_430070 138 14 patterns pattern NNS 10_1101-2021_02_08_430070 138 15 . . . 10_1101-2021_02_08_430070 139 1 Specific specific JJ 10_1101-2021_02_08_430070 139 2 positions position NNS 10_1101-2021_02_08_430070 139 3 , , , 10_1101-2021_02_08_430070 139 4 such such JJ 10_1101-2021_02_08_430070 139 5 as as IN 10_1101-2021_02_08_430070 139 6 -2bp -2bp NN 10_1101-2021_02_08_430070 139 7 position position NN 10_1101-2021_02_08_430070 139 8 ( ( -LRB- 10_1101-2021_02_08_430070 139 9 9th 9th NN 10_1101-2021_02_08_430070 139 10 ) ) -RRB- 10_1101-2021_02_08_430070 139 11 in in IN 10_1101-2021_02_08_430070 139 12 the the DT 10_1101-2021_02_08_430070 139 13 5mC 5mc CD 10_1101-2021_02_08_430070 139 14 dataset dataset NN 10_1101-2021_02_08_430070 139 15 and and CC 10_1101-2021_02_08_430070 139 16 +1bp +1bp NN 10_1101-2021_02_08_430070 139 17 position position NN 10_1101-2021_02_08_430070 139 18 ( ( -LRB- 10_1101-2021_02_08_430070 139 19 12th 12th NN 10_1101-2021_02_08_430070 139 20 ) ) -RRB- 10_1101-2021_02_08_430070 139 21 in in IN 10_1101-2021_02_08_430070 139 22 the the DT 10_1101-2021_02_08_430070 139 23 6mA 6ma CD 10_1101-2021_02_08_430070 139 24 dataset dataset NN 10_1101-2021_02_08_430070 139 25 , , , 10_1101-2021_02_08_430070 139 26 have have VBP 10_1101-2021_02_08_430070 139 27 larger large JJR 10_1101-2021_02_08_430070 139 28 averaged average VBN 
10_1101-2021_02_08_430070 139 29 signal- signal- NN 10_1101-2021_02_08_430070 139 30 shift shift NN 10_1101-2021_02_08_430070 139 31 values value NNS 10_1101-2021_02_08_430070 139 32 . . . 10_1101-2021_02_08_430070 140 1 Such such JJ 10_1101-2021_02_08_430070 140 2 pattern pattern NN 10_1101-2021_02_08_430070 140 3 can can MD 10_1101-2021_02_08_430070 140 4 be be VB 10_1101-2021_02_08_430070 140 5 generalized generalize VBN 10_1101-2021_02_08_430070 140 6 across across IN 10_1101-2021_02_08_430070 140 7 the the DT 10_1101-2021_02_08_430070 140 8 different different JJ 10_1101-2021_02_08_430070 140 9 dataset dataset NN 10_1101-2021_02_08_430070 140 10 with with IN 10_1101-2021_02_08_430070 140 11 the the DT 10_1101-2021_02_08_430070 140 12 same same JJ 10_1101-2021_02_08_430070 140 13 motif motif NN 10_1101-2021_02_08_430070 140 14 and and CC 10_1101-2021_02_08_430070 140 15 methyltransferase methyltransferase NN 10_1101-2021_02_08_430070 140 16 . . . 10_1101-2021_02_08_430070 141 1 For for IN 10_1101-2021_02_08_430070 141 2 example example NN 10_1101-2021_02_08_430070 141 3 , , , 10_1101-2021_02_08_430070 141 4 Figure figure NN 10_1101-2021_02_08_430070 141 5 2 2 CD 10_1101-2021_02_08_430070 141 6 ( ( -LRB- 10_1101-2021_02_08_430070 141 7 a1 a1 NNP 10_1101-2021_02_08_430070 141 8 ) ) -RRB- 10_1101-2021_02_08_430070 141 9 , , , 10_1101-2021_02_08_430070 141 10 ( ( -LRB- 10_1101-2021_02_08_430070 141 11 b1 b1 NN 10_1101-2021_02_08_430070 141 12 ) ) -RRB- 10_1101-2021_02_08_430070 141 13 and and CC 10_1101-2021_02_08_430070 141 14 ( ( -LRB- 10_1101-2021_02_08_430070 141 15 b2 b2 NN 10_1101-2021_02_08_430070 141 16 ) ) -RRB- 10_1101-2021_02_08_430070 141 17 show show VBP 10_1101-2021_02_08_430070 141 18 a a DT 10_1101-2021_02_08_430070 141 19 similar similar JJ 10_1101-2021_02_08_430070 141 20 positional positional JJ 10_1101-2021_02_08_430070 141 21 signal signal JJ 10_1101-2021_02_08_430070 141 22 - - HYPH 10_1101-2021_02_08_430070 141 23 shift shift NN 
10_1101-2021_02_08_430070 141 24 pattern pattern NN 10_1101-2021_02_08_430070 141 25 . . . 10_1101-2021_02_08_430070 142 1 For for IN 10_1101-2021_02_08_430070 142 2 different different JJ 10_1101-2021_02_08_430070 142 3 methyltransferases methyltransferase NNS 10_1101-2021_02_08_430070 142 4 , , , 10_1101-2021_02_08_430070 142 5 such such JJ 10_1101-2021_02_08_430070 142 6 as as IN 10_1101-2021_02_08_430070 142 7 Hhal Hhal NNP 10_1101-2021_02_08_430070 142 8 ( ( -LRB- 10_1101-2021_02_08_430070 142 9 Figure Figure NNP 10_1101-2021_02_08_430070 142 10 2(a3 2(a3 NNP 10_1101-2021_02_08_430070 142 11 ) ) -RRB- 10_1101-2021_02_08_430070 142 12 ) ) -RRB- 10_1101-2021_02_08_430070 142 13 also also RB 10_1101-2021_02_08_430070 142 14 shows show VBZ 10_1101-2021_02_08_430070 142 15 a a DT 10_1101-2021_02_08_430070 142 16 similar similar JJ 10_1101-2021_02_08_430070 142 17 pattern pattern NN 10_1101-2021_02_08_430070 142 18 as as IN 10_1101-2021_02_08_430070 142 19 in in IN 10_1101-2021_02_08_430070 142 20 SssI SssI NNP 10_1101-2021_02_08_430070 142 21 , , , 10_1101-2021_02_08_430070 142 22 while while IN 10_1101-2021_02_08_430070 142 23 MpeI MpeI NNP 10_1101-2021_02_08_430070 142 24 does do VBZ 10_1101-2021_02_08_430070 142 25 not not RB 10_1101-2021_02_08_430070 142 26 have have VB 10_1101-2021_02_08_430070 142 27 a a DT 10_1101-2021_02_08_430070 142 28 similar similar JJ 10_1101-2021_02_08_430070 142 29 pattern pattern NN 10_1101-2021_02_08_430070 142 30 obviously obviously RB 10_1101-2021_02_08_430070 142 31 ( ( -LRB- 10_1101-2021_02_08_430070 142 32 Figure figure NN 10_1101-2021_02_08_430070 142 33 2(a2 2(a2 NNP 10_1101-2021_02_08_430070 142 34 ) ) -RRB- 10_1101-2021_02_08_430070 142 35 ) ) -RRB- 10_1101-2021_02_08_430070 142 36 . . . 
10_1101-2021_02_08_430070 143 1 Those those DT 10_1101-2021_02_08_430070 143 2 positional positional JJ 10_1101-2021_02_08_430070 143 3 signal signal JJ 10_1101-2021_02_08_430070 143 4 patterns pattern NNS 10_1101-2021_02_08_430070 143 5 can can MD 10_1101-2021_02_08_430070 143 6 be be VB 10_1101-2021_02_08_430070 143 7 directly directly RB 10_1101-2021_02_08_430070 143 8 modeled model VBN 10_1101-2021_02_08_430070 143 9 by by IN 10_1101-2021_02_08_430070 143 10 a a DT 10_1101-2021_02_08_430070 143 11 biRNN birnn NN 10_1101-2021_02_08_430070 143 12 , , , 10_1101-2021_02_08_430070 143 13 while while IN 10_1101-2021_02_08_430070 143 14 for for IN 10_1101-2021_02_08_430070 143 15 the the DT 10_1101-2021_02_08_430070 143 16 basic basic JJ 10_1101-2021_02_08_430070 143 17 BERT BERT NNP 10_1101-2021_02_08_430070 143 18 , , , 10_1101-2021_02_08_430070 143 19 they -PRON- PRP 10_1101-2021_02_08_430070 143 20 are be VBP 10_1101-2021_02_08_430070 143 21 not not RB 10_1101-2021_02_08_430070 143 22 specifically specifically RB 10_1101-2021_02_08_430070 143 23 considered consider VBN 10_1101-2021_02_08_430070 143 24 in in IN 10_1101-2021_02_08_430070 143 25 its -PRON- PRP$ 10_1101-2021_02_08_430070 143 26 model model NN 10_1101-2021_02_08_430070 143 27 structure structure NN 10_1101-2021_02_08_430070 143 28 . . . 
10_1101-2021_02_08_430070 144 1 In in IN 10_1101-2021_02_08_430070 144 2 a a DT 10_1101-2021_02_08_430070 144 3 biRNN birnn NN 10_1101-2021_02_08_430070 144 4 , , , 10_1101-2021_02_08_430070 144 5 such such JJ 10_1101-2021_02_08_430070 144 6 as as IN 10_1101-2021_02_08_430070 144 7 the the DT 10_1101-2021_02_08_430070 144 8 implementation implementation NN 10_1101-2021_02_08_430070 144 9 of of IN 10_1101-2021_02_08_430070 144 10 deepMOD deepMOD NNP 10_1101-2021_02_08_430070 144 11 , , , 10_1101-2021_02_08_430070 144 12 the the DT 10_1101-2021_02_08_430070 144 13 last last JJ 10_1101-2021_02_08_430070 144 14 full full JJ 10_1101-2021_02_08_430070 144 15 connection connection NN 10_1101-2021_02_08_430070 144 16 layer layer NN 10_1101-2021_02_08_430070 144 17 uses use VBZ 10_1101-2021_02_08_430070 144 18 hidden hide VBN 10_1101-2021_02_08_430070 144 19 units unit NNS 10_1101-2021_02_08_430070 144 20 of of IN 10_1101-2021_02_08_430070 144 21 the the DT 10_1101-2021_02_08_430070 144 22 center center NN 10_1101-2021_02_08_430070 144 23 time time NN 10_1101-2021_02_08_430070 144 24 step step NN 10_1101-2021_02_08_430070 144 25 as as IN 10_1101-2021_02_08_430070 144 26 the the DT 10_1101-2021_02_08_430070 144 27 input input NN 10_1101-2021_02_08_430070 144 28 . . . 
10_1101-2021_02_08_430070 145 1 Meanwhile meanwhile RB 10_1101-2021_02_08_430070 145 2 , , , 10_1101-2021_02_08_430070 145 3 the the DT 10_1101-2021_02_08_430070 145 4 bi bi JJ 10_1101-2021_02_08_430070 145 5 - - JJ 10_1101-2021_02_08_430070 145 6 directional directional JJ 10_1101-2021_02_08_430070 145 7 structure structure NN 10_1101-2021_02_08_430070 145 8 and and CC 10_1101-2021_02_08_430070 145 9 the the DT 10_1101-2021_02_08_430070 145 10 information information NN 10_1101-2021_02_08_430070 145 11 decay decay NN 10_1101-2021_02_08_430070 145 12 from from IN 10_1101-2021_02_08_430070 145 13 both both DT 10_1101-2021_02_08_430070 145 14 ends end NNS 10_1101-2021_02_08_430070 145 15 to to IN 10_1101-2021_02_08_430070 145 16 the the DT 10_1101-2021_02_08_430070 145 17 center center JJ 10_1101-2021_02_08_430070 145 18 position position NN 10_1101-2021_02_08_430070 145 19 render render VB 10_1101-2021_02_08_430070 145 20 the the DT 10_1101-2021_02_08_430070 145 21 model model NN 10_1101-2021_02_08_430070 145 22 focusing focus VBG 10_1101-2021_02_08_430070 145 23 more more RBR 10_1101-2021_02_08_430070 145 24 on on IN 10_1101-2021_02_08_430070 145 25 center center NN 10_1101-2021_02_08_430070 145 26 positions position NNS 10_1101-2021_02_08_430070 145 27 . . . 
10_1101-2021_02_08_430070 146 1 For for IN 10_1101-2021_02_08_430070 146 2 the the DT 10_1101-2021_02_08_430070 146 3 basic basic JJ 10_1101-2021_02_08_430070 146 4 BERT BERT NNP 10_1101-2021_02_08_430070 146 5 , , , 10_1101-2021_02_08_430070 146 6 as as IN 10_1101-2021_02_08_430070 146 7 any any DT 10_1101-2021_02_08_430070 146 8 arbitrary arbitrary JJ 10_1101-2021_02_08_430070 146 9 time- time- NN 10_1101-2021_02_08_430070 146 10 step step NN 10_1101-2021_02_08_430070 146 11 pair pair NN 10_1101-2021_02_08_430070 146 12 is be VBZ 10_1101-2021_02_08_430070 146 13 processed process VBN 10_1101-2021_02_08_430070 146 14 with with IN 10_1101-2021_02_08_430070 146 15 the the DT 10_1101-2021_02_08_430070 146 16 same same JJ 10_1101-2021_02_08_430070 146 17 attention attention NN 10_1101-2021_02_08_430070 146 18 module module NN 10_1101-2021_02_08_430070 146 19 , , , 10_1101-2021_02_08_430070 146 20 the the DT 10_1101-2021_02_08_430070 146 21 importance importance NN 10_1101-2021_02_08_430070 146 22 of of IN 10_1101-2021_02_08_430070 146 23 center center NN 10_1101-2021_02_08_430070 146 24 positions position NNS 10_1101-2021_02_08_430070 146 25 are be VBP 10_1101-2021_02_08_430070 146 26 not not RB 10_1101-2021_02_08_430070 146 27 specifically specifically RB 10_1101-2021_02_08_430070 146 28 considered consider VBN 10_1101-2021_02_08_430070 146 29 in in IN 10_1101-2021_02_08_430070 146 30 the the DT 10_1101-2021_02_08_430070 146 31 model model NN 10_1101-2021_02_08_430070 146 32 . . . 
10_1101-2021_02_08_430070 147 1 Therefore therefore RB 10_1101-2021_02_08_430070 147 2 , , , 10_1101-2021_02_08_430070 147 3 we -PRON- PRP 10_1101-2021_02_08_430070 147 4 propose propose VBP 10_1101-2021_02_08_430070 147 5 a a DT 10_1101-2021_02_08_430070 147 6 refined refined JJ 10_1101-2021_02_08_430070 147 7 BERT BERT NNP 10_1101-2021_02_08_430070 147 8 model model NN 10_1101-2021_02_08_430070 147 9 to to TO 10_1101-2021_02_08_430070 147 10 solve solve VB 10_1101-2021_02_08_430070 147 11 this this DT 10_1101-2021_02_08_430070 147 12 problem problem NN 10_1101-2021_02_08_430070 147 13 . . . 10_1101-2021_02_08_430070 148 1 We -PRON- PRP 10_1101-2021_02_08_430070 148 2 incorporate incorporate VBP 10_1101-2021_02_08_430070 148 3 relative relative JJ 10_1101-2021_02_08_430070 148 4 - - HYPH 10_1101-2021_02_08_430070 148 5 position position NN 10_1101-2021_02_08_430070 148 6 attention attention NN 10_1101-2021_02_08_430070 148 7 and and CC 10_1101-2021_02_08_430070 148 8 center center NN 10_1101-2021_02_08_430070 148 9 - - HYPH 10_1101-2021_02_08_430070 148 10 hidden hide VBN 10_1101-2021_02_08_430070 148 11 - - HYPH 10_1101-2021_02_08_430070 148 12 units unit NNS 10_1101-2021_02_08_430070 148 13 concatenation concatenation NN 10_1101-2021_02_08_430070 148 14 to to TO 10_1101-2021_02_08_430070 148 15 enable enable VB 10_1101-2021_02_08_430070 148 16 a a DT 10_1101-2021_02_08_430070 148 17 BERT BERT NNP 10_1101-2021_02_08_430070 148 18 model model NN 10_1101-2021_02_08_430070 148 19 to to TO 10_1101-2021_02_08_430070 148 20 pay pay VB 10_1101-2021_02_08_430070 148 21 more more JJR 10_1101-2021_02_08_430070 148 22 attention attention NN 10_1101-2021_02_08_430070 148 23 to to IN 10_1101-2021_02_08_430070 148 24 center center NN 10_1101-2021_02_08_430070 148 25 positions position NNS 10_1101-2021_02_08_430070 148 26 . . . 
10_1101-2021_02_08_430070 149 1 3.3 3.3 CD 10_1101-2021_02_08_430070 149 2 In in IN 10_1101-2021_02_08_430070 149 3 - - HYPH 10_1101-2021_02_08_430070 149 4 sample sample NN 10_1101-2021_02_08_430070 149 5 evaluation evaluation NN 10_1101-2021_02_08_430070 149 6 To to TO 10_1101-2021_02_08_430070 149 7 evaluate evaluate VB 10_1101-2021_02_08_430070 149 8 model model NN 10_1101-2021_02_08_430070 149 9 performance performance NN 10_1101-2021_02_08_430070 149 10 , , , 10_1101-2021_02_08_430070 149 11 we -PRON- PRP 10_1101-2021_02_08_430070 149 12 first first RB 10_1101-2021_02_08_430070 149 13 perform perform VBP 10_1101-2021_02_08_430070 149 14 the the DT 10_1101-2021_02_08_430070 149 15 in in IN 10_1101-2021_02_08_430070 149 16 - - HYPH 10_1101-2021_02_08_430070 149 17 sample sample NN 10_1101-2021_02_08_430070 149 18 evaluation evaluation NN 10_1101-2021_02_08_430070 149 19 on on IN 10_1101-2021_02_08_430070 149 20 5mC 5mC NNP 10_1101-2021_02_08_430070 149 21 and and CC 10_1101-2021_02_08_430070 149 22 6mA 6mA NNP 10_1101-2021_02_08_430070 149 23 datasets dataset NNS 10_1101-2021_02_08_430070 149 24 . . . 10_1101-2021_02_08_430070 150 1 The the DT 10_1101-2021_02_08_430070 150 2 predictions prediction NNS 10_1101-2021_02_08_430070 150 3 of of IN 10_1101-2021_02_08_430070 150 4 different different JJ 10_1101-2021_02_08_430070 150 5 models model NNS 10_1101-2021_02_08_430070 150 6 are be VBP 10_1101-2021_02_08_430070 150 7 evaluated evaluate VBN 10_1101-2021_02_08_430070 150 8 on on IN 10_1101-2021_02_08_430070 150 9 the the DT 10_1101-2021_02_08_430070 150 10 read read JJ 10_1101-2021_02_08_430070 150 11 and and CC 10_1101-2021_02_08_430070 150 12 genomic genomic JJ 10_1101-2021_02_08_430070 150 13 level level NN 10_1101-2021_02_08_430070 150 14 . . . 
10_1101-2021_02_08_430070 151 1 For for IN 10_1101-2021_02_08_430070 151 2 the the DT 10_1101-2021_02_08_430070 151 3 genomic genomic JJ 10_1101-2021_02_08_430070 151 4 level level NN 10_1101-2021_02_08_430070 151 5 evaluation evaluation NN 10_1101-2021_02_08_430070 151 6 , , , 10_1101-2021_02_08_430070 151 7 we -PRON- PRP 10_1101-2021_02_08_430070 151 8 group group VBP 10_1101-2021_02_08_430070 151 9 all all DT 10_1101-2021_02_08_430070 151 10 reads read VBZ 10_1101-2021_02_08_430070 151 11 aligned align VBN 10_1101-2021_02_08_430070 151 12 to to IN 10_1101-2021_02_08_430070 151 13 the the DT 10_1101-2021_02_08_430070 151 14 same same JJ 10_1101-2021_02_08_430070 151 15 genomic genomic JJ 10_1101-2021_02_08_430070 151 16 coordinate coordinate NN 10_1101-2021_02_08_430070 151 17 , , , 10_1101-2021_02_08_430070 151 18 and and CC 10_1101-2021_02_08_430070 151 19 uses use VBZ 10_1101-2021_02_08_430070 151 20 a a DT 10_1101-2021_02_08_430070 151 21 threshold threshold NN 10_1101-2021_02_08_430070 151 22 of of IN 10_1101-2021_02_08_430070 151 23 prediction prediction NN 10_1101-2021_02_08_430070 151 24 methylation methylation NN 10_1101-2021_02_08_430070 151 25 percentage percentage NN 10_1101-2021_02_08_430070 151 26 ≥ ≥ CD 10_1101-2021_02_08_430070 151 27 0.1 0.1 CD 10_1101-2021_02_08_430070 151 28 ( ( -LRB- 10_1101-2021_02_08_430070 151 29 same same JJ 10_1101-2021_02_08_430070 151 30 as as IN 10_1101-2021_02_08_430070 151 31 deepMOD deepMOD NNP 10_1101-2021_02_08_430070 151 32 ) ) -RRB- 10_1101-2021_02_08_430070 151 33 as as IN 10_1101-2021_02_08_430070 151 34 a a DT 10_1101-2021_02_08_430070 151 35 genomic genomic JJ 10_1101-2021_02_08_430070 151 36 position position NN 10_1101-2021_02_08_430070 151 37 prediction prediction NN 10_1101-2021_02_08_430070 151 38 . . . 
10_1101-2021_02_08_430070 152 1 In in IN 10_1101-2021_02_08_430070 152 2 general general JJ 10_1101-2021_02_08_430070 152 3 , , , 10_1101-2021_02_08_430070 152 4 on on IN 10_1101-2021_02_08_430070 152 5 the the DT 10_1101-2021_02_08_430070 152 6 five five CD 10_1101-2021_02_08_430070 152 7 5mC 5mc CD 10_1101-2021_02_08_430070 152 8 datasets dataset NNS 10_1101-2021_02_08_430070 152 9 , , , 10_1101-2021_02_08_430070 152 10 the the DT 10_1101-2021_02_08_430070 152 11 AUC AUC NNP 10_1101-2021_02_08_430070 152 12 performance performance NN 10_1101-2021_02_08_430070 152 13 of of IN 10_1101-2021_02_08_430070 152 14 the the DT 10_1101-2021_02_08_430070 152 15 three three CD 10_1101-2021_02_08_430070 152 16 models model NNS 10_1101-2021_02_08_430070 152 17 are be VBP 10_1101-2021_02_08_430070 152 18 relatively relatively RB 10_1101-2021_02_08_430070 152 19 close close JJ 10_1101-2021_02_08_430070 152 20 on on IN 10_1101-2021_02_08_430070 152 21 both both DT 10_1101-2021_02_08_430070 152 22 read read VBN 10_1101-2021_02_08_430070 152 23 level level NN 10_1101-2021_02_08_430070 152 24 and and CC 10_1101-2021_02_08_430070 152 25 genomic genomic JJ 10_1101-2021_02_08_430070 152 26 level level NN 10_1101-2021_02_08_430070 152 27 . . . 
10_1101-2021_02_08_430070 153 1 The the DT 10_1101-2021_02_08_430070 153 2 basic basic JJ 10_1101-2021_02_08_430070 153 3 BERT BERT NNP 10_1101-2021_02_08_430070 153 4 model model NN 10_1101-2021_02_08_430070 153 5 does do VBZ 10_1101-2021_02_08_430070 153 6 not not RB 10_1101-2021_02_08_430070 153 7 work work VB 10_1101-2021_02_08_430070 153 8 as as RB 10_1101-2021_02_08_430070 153 9 well well RB 10_1101-2021_02_08_430070 153 10 as as IN 10_1101-2021_02_08_430070 153 11 the the DT 10_1101-2021_02_08_430070 153 12 biRNN birnn JJ 10_1101-2021_02_08_430070 153 13 model model NN 10_1101-2021_02_08_430070 153 14 that that IN 10_1101-2021_02_08_430070 153 15 AUC AUC NNP 10_1101-2021_02_08_430070 153 16 scores score NNS 10_1101-2021_02_08_430070 153 17 are be VBP 10_1101-2021_02_08_430070 153 18 lower low JJR 10_1101-2021_02_08_430070 153 19 . . . 10_1101-2021_02_08_430070 154 1 The the DT 10_1101-2021_02_08_430070 154 2 refined refined JJ 10_1101-2021_02_08_430070 154 3 BERT BERT NNP 10_1101-2021_02_08_430070 154 4 model model NN 10_1101-2021_02_08_430070 154 5 achieves achieve VBZ 10_1101-2021_02_08_430070 154 6 equivalent equivalent JJ 10_1101-2021_02_08_430070 154 7 or or CC 10_1101-2021_02_08_430070 154 8 better well JJR 10_1101-2021_02_08_430070 154 9 AUC AUC NNP 10_1101-2021_02_08_430070 154 10 scores score NNS 10_1101-2021_02_08_430070 154 11 on on IN 10_1101-2021_02_08_430070 154 12 the the DT 10_1101-2021_02_08_430070 154 13 genomic- genomic- JJ 10_1101-2021_02_08_430070 154 14 level level NN 10_1101-2021_02_08_430070 154 15 . . . 
10_1101-2021_02_08_430070 155 1 Note note VB 10_1101-2021_02_08_430070 155 2 that that IN 10_1101-2021_02_08_430070 155 3 on on IN 10_1101-2021_02_08_430070 155 4 the the DT 10_1101-2021_02_08_430070 155 5 dataset dataset JJ 10_1101-2021_02_08_430070 155 6 Stoiber_E.coli_CG_MpeI Stoiber_E.coli_CG_MpeI NNP 10_1101-2021_02_08_430070 155 7 and and CC 
10_1101-2021_02_08_430070 156 48 Dataset Dataset NNP 10_1101-2021_02_08_430070 156 49 Species Species NNP 10_1101-2021_02_08_430070 156 50 Motif_Methyltransferase Motif_Methyltransferase NNP 10_1101-2021_02_08_430070 156 51 Model Model NNP 10_1101-2021_02_08_430070 156 52 Single Single NNP 10_1101-2021_02_08_430070 156 53 ( ( -LRB- 10_1101-2021_02_08_430070 156 54 read read VBN 10_1101-2021_02_08_430070 156 55 - - HYPH 10_1101-2021_02_08_430070 156 56 level level NN 10_1101-2021_02_08_430070 156 57 ) ) -RRB- 10_1101-2021_02_08_430070 156 58 Group group NN 10_1101-2021_02_08_430070 156 59 ( ( -LRB- 10_1101-2021_02_08_430070 156 60 > > XX 
10_1101-2021_02_08_430070 156 61 = = SYM 10_1101-2021_02_08_430070 156 62 1 1 CD 10_1101-2021_02_08_430070 156 63 , , , 10_1101-2021_02_08_430070 156 64 genomic genomic JJ 10_1101-2021_02_08_430070 156 65 - - HYPH 10_1101-2021_02_08_430070 156 66 level level NN 10_1101-2021_02_08_430070 156 67 ) ) -RRB- 10_1101-2021_02_08_430070 156 68 AUC AUC NNP 10_1101-2021_02_08_430070 156 69 Precision Precision NNP 10_1101-2021_02_08_430070 156 70 Recall Recall NNP 10_1101-2021_02_08_430070 156 71 AUC AUC NNP 10_1101-2021_02_08_430070 156 72 Precision Precision NNP 10_1101-2021_02_08_430070 156 73 Recall Recall NNP 10_1101-2021_02_08_430070 156 74 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 156 75 E.coli e.coli JJ 10_1101-2021_02_08_430070 156 76 GCGC_HhaI GCGC_HhaI VBZ 10_1101-2021_02_08_430070 156 77 biRNN birnn NN 10_1101-2021_02_08_430070 156 78 0.9205 0.9205 CD 10_1101-2021_02_08_430070 156 79 0.9545 0.9545 CD 10_1101-2021_02_08_430070 156 80 0.8593 0.8593 CD 10_1101-2021_02_08_430070 156 81 0.9322 0.9322 CD 10_1101-2021_02_08_430070 156 82 0.9320 0.9320 CD 10_1101-2021_02_08_430070 156 83 0.9134 0.9134 CD 10_1101-2021_02_08_430070 156 84 BERT_basic BERT_basic NNP 10_1101-2021_02_08_430070 156 85 0.9183 0.9183 CD 10_1101-2021_02_08_430070 156 86 0.9528 0.9528 CD 10_1101-2021_02_08_430070 156 87 0.8556 0.8556 CD 10_1101-2021_02_08_430070 156 88 0.9305 0.9305 CD 10_1101-2021_02_08_430070 156 89 0.9299 0.9299 CD 10_1101-2021_02_08_430070 156 90 0.9113 0.9113 CD 10_1101-2021_02_08_430070 156 91 BERT_refined BERT_refined NNP 10_1101-2021_02_08_430070 156 92 0.9239 0.9239 CD 10_1101-2021_02_08_430070 156 93 0.9563 0.9563 CD 10_1101-2021_02_08_430070 156 94 0.8655 0.8655 CD 10_1101-2021_02_08_430070 156 95 0.9351 0.9351 CD 10_1101-2021_02_08_430070 156 96 0.9341 0.9341 CD 10_1101-2021_02_08_430070 156 97 0.9177 0.9177 CD 10_1101-2021_02_08_430070 156 98 CG_MpeI CG_MpeI NNP 10_1101-2021_02_08_430070 156 99 BiRNN BiRNN NNP 10_1101-2021_02_08_430070 156 100 0.7184 0.7184 CD 
10_1101-2021_02_08_430070 156 101 0.8943 0.8943 CD 10_1101-2021_02_08_430070 156 102 0.4555 0.4555 CD 10_1101-2021_02_08_430070 156 103 0.7482 0.7482 CD 10_1101-2021_02_08_430070 156 104 0.8764 0.8764 CD 10_1101-2021_02_08_430070 156 105 0.5452 0.5452 CD 10_1101-2021_02_08_430070 156 106 BERT_basic BERT_basic NNP 10_1101-2021_02_08_430070 156 107 0.7045 0.7045 CD 10_1101-2021_02_08_430070 156 108 0.8682 0.8682 CD 10_1101-2021_02_08_430070 156 109 0.4316 0.4316 CD 10_1101-2021_02_08_430070 156 110 0.7312 0.7312 CD 10_1101-2021_02_08_430070 156 111 0.8494 0.8494 CD 10_1101-2021_02_08_430070 156 112 0.5211 0.5211 CD 10_1101-2021_02_08_430070 156 113 BERT_refined bert_refine VBN 10_1101-2021_02_08_430070 156 114 0.717 0.717 CD 10_1101-2021_02_08_430070 156 115 0.9017 0.9017 CD 10_1101-2021_02_08_430070 156 116 0.4511 0.4511 CD 10_1101-2021_02_08_430070 156 117 0.7482 0.7482 CD 10_1101-2021_02_08_430070 156 118 0.8848 0.8848 CD 10_1101-2021_02_08_430070 156 119 0.5412 0.5412 CD 10_1101-2021_02_08_430070 156 120 CG_SssI CG_SssI NNP 10_1101-2021_02_08_430070 156 121 BiRNN birnn DT 10_1101-2021_02_08_430070 156 122 0.9017 0.9017 CD 10_1101-2021_02_08_430070 156 123 0.9576 0.9576 CD 10_1101-2021_02_08_430070 156 124 0.8097 0.8097 CD 10_1101-2021_02_08_430070 156 125 0.9127 0.9127 CD 10_1101-2021_02_08_430070 156 126 0.9508 0.9508 CD 10_1101-2021_02_08_430070 156 127 0.8420 0.8420 CD 10_1101-2021_02_08_430070 156 128 BERT_basic BERT_basic NNP 10_1101-2021_02_08_430070 156 129 0.9001 0.9001 CD 10_1101-2021_02_08_430070 156 130 0.9534 0.9534 CD 10_1101-2021_02_08_430070 156 131 0.8071 0.8071 CD 10_1101-2021_02_08_430070 156 132 0.9107 0.9107 CD 10_1101-2021_02_08_430070 156 133 0.9463 0.9463 CD 10_1101-2021_02_08_430070 156 134 0.8395 0.8395 CD 10_1101-2021_02_08_430070 156 135 BERT_refined bert_refine VBN 10_1101-2021_02_08_430070 156 136 0.9068 0.9068 CD 10_1101-2021_02_08_430070 156 137 0.9509 0.9509 CD 10_1101-2021_02_08_430070 156 138 0.821 0.821 CD 
10_1101-2021_02_08_430070 156 139 0.9162 0.9162 CD 10_1101-2021_02_08_430070 156 140 0.9433 0.9433 CD 10_1101-2021_02_08_430070 156 141 0.852 0.852 CD 10_1101-2021_02_08_430070 156 142 Simpson Simpson NNP 10_1101-2021_02_08_430070 156 143 E. E. NNP 10_1101-2021_02_08_430070 156 144 coli coli NNS 10_1101-2021_02_08_430070 156 145 CG_SssI CG_SssI NNP 10_1101-2021_02_08_430070 156 146 BiRNN birnn DT 10_1101-2021_02_08_430070 156 147 0.9514 0.9514 CD 10_1101-2021_02_08_430070 156 148 0.9512 0.9512 CD 10_1101-2021_02_08_430070 156 149 0.9316 0.9316 CD 10_1101-2021_02_08_430070 156 150 0.9284 0.9284 CD 10_1101-2021_02_08_430070 156 151 0.8805 0.8805 CD 10_1101-2021_02_08_430070 156 152 0.9854 0.9854 CD 10_1101-2021_02_08_430070 156 153 BERT_basic BERT_basic NNP 10_1101-2021_02_08_430070 156 154 0.9477 0.9477 CD 10_1101-2021_02_08_430070 156 155 0.9469 0.9469 CD 10_1101-2021_02_08_430070 156 156 0.9268 0.9268 CD 10_1101-2021_02_08_430070 156 157 0.9227 0.9227 CD 10_1101-2021_02_08_430070 156 158 0.8718 0.8718 CD 10_1101-2021_02_08_430070 156 159 0.9845 0.9845 CD 10_1101-2021_02_08_430070 156 160 BERT_refined bert_refine VBN 10_1101-2021_02_08_430070 156 161 0.9464 0.9464 CD 10_1101-2021_02_08_430070 156 162 0.9656 0.9656 CD 10_1101-2021_02_08_430070 156 163 0.9124 0.9124 CD 10_1101-2021_02_08_430070 156 164 0.9456 0.9456 CD 10_1101-2021_02_08_430070 156 165 0.9135 0.9135 CD 10_1101-2021_02_08_430070 156 166 0.9803 0.9803 CD 10_1101-2021_02_08_430070 156 167 H.Sapiens H.Sapiens NNP 10_1101-2021_02_08_430070 156 168 CG_SssI CG_SssI NNS 10_1101-2021_02_08_430070 156 169 BiRNN birnn DT 10_1101-2021_02_08_430070 156 170 0.9004 0.9004 CD 10_1101-2021_02_08_430070 156 171 0.8891 0.8891 CD 10_1101-2021_02_08_430070 156 172 0.9230 0.9230 CD 10_1101-2021_02_08_430070 156 173 0.9010 0.9010 CD 10_1101-2021_02_08_430070 156 174 0.8900 0.8900 CD 10_1101-2021_02_08_430070 156 175 0.9240 0.9240 CD 10_1101-2021_02_08_430070 156 176 BERT_basic bert_basic NN 10_1101-2021_02_08_430070 156 
177 0.8962 0.8962 CD 10_1101-2021_02_08_430070 156 178 0.8813 0.8813 CD 10_1101-2021_02_08_430070 156 179 0.9248 0.9248 CD 10_1101-2021_02_08_430070 156 180 0.8969 0.8969 CD 10_1101-2021_02_08_430070 156 181 0.8823 0.8823 CD 10_1101-2021_02_08_430070 156 182 0.9256 0.9256 CD 10_1101-2021_02_08_430070 156 183 BERT_refined BERT_refined NNP 10_1101-2021_02_08_430070 156 184 0.9045 0.9045 CD 10_1101-2021_02_08_430070 156 185 0.9143 0.9143 CD 10_1101-2021_02_08_430070 156 186 0.8984 0.8984 CD 10_1101-2021_02_08_430070 156 187 0.9053 0.9053 CD 10_1101-2021_02_08_430070 156 188 0.9147 0.9147 CD 10_1101-2021_02_08_430070 156 189 0.9003 0.9003 CD 10_1101-2021_02_08_430070 156 190 Table table NN 10_1101-2021_02_08_430070 156 191 1 1 CD 10_1101-2021_02_08_430070 156 192 . . . 10_1101-2021_02_08_430070 157 1 In in IN 10_1101-2021_02_08_430070 157 2 - - HYPH 10_1101-2021_02_08_430070 157 3 sample sample NN 10_1101-2021_02_08_430070 157 4 evaluation evaluation NN 10_1101-2021_02_08_430070 157 5 of of IN 10_1101-2021_02_08_430070 157 6 different different JJ 10_1101-2021_02_08_430070 157 7 deep deep JJ 10_1101-2021_02_08_430070 157 8 learning learning NN 10_1101-2021_02_08_430070 157 9 models model NNS 10_1101-2021_02_08_430070 157 10 on on IN 10_1101-2021_02_08_430070 157 11 5mC 5mc CD 10_1101-2021_02_08_430070 157 12 datasets dataset NNS 10_1101-2021_02_08_430070 157 13 . . . 10_1101-2021_02_08_430070 158 1 The the DT 10_1101-2021_02_08_430070 158 2 best good JJS 10_1101-2021_02_08_430070 158 3 score score NN 10_1101-2021_02_08_430070 158 4 of of IN 10_1101-2021_02_08_430070 158 5 each each DT 10_1101-2021_02_08_430070 158 6 dataset dataset NN 10_1101-2021_02_08_430070 158 7 is be VBZ 10_1101-2021_02_08_430070 158 8 highlighted highlight VBN 10_1101-2021_02_08_430070 158 9 in in IN 10_1101-2021_02_08_430070 158 10 bold bold JJ 10_1101-2021_02_08_430070 158 11 . . . 
10_1101-2021_02_08_430070 159 1 Dataset Dataset NNP 10_1101-2021_02_08_430070 159 2 Species Species NNP 10_1101-2021_02_08_430070 159 3 Motif_Methyltransferase Motif_Methyltransferase NNP 10_1101-2021_02_08_430070 159 4 Model Model NNP 10_1101-2021_02_08_430070 159 5 Single Single NNP 10_1101-2021_02_08_430070 159 6 ( ( -LRB- 10_1101-2021_02_08_430070 159 7 read read VBN 10_1101-2021_02_08_430070 159 8 - - HYPH 10_1101-2021_02_08_430070 159 9 level level NN 10_1101-2021_02_08_430070 159 10 ) ) -RRB- 10_1101-2021_02_08_430070 159 11 Group group NN 10_1101-2021_02_08_430070 159 12 ( ( -LRB- 10_1101-2021_02_08_430070 159 13 > > XX 10_1101-2021_02_08_430070 159 14 = = SYM 10_1101-2021_02_08_430070 159 15 1 1 CD 10_1101-2021_02_08_430070 159 16 , , , 10_1101-2021_02_08_430070 159 17 genomic genomic JJ 10_1101-2021_02_08_430070 159 18 level level NN 10_1101-2021_02_08_430070 159 19 ) ) -RRB- 10_1101-2021_02_08_430070 159 20 AUC AUC NNP 10_1101-2021_02_08_430070 159 21 Precision Precision NNP 10_1101-2021_02_08_430070 159 22 Recall Recall NNP 10_1101-2021_02_08_430070 159 23 AUC AUC NNP 10_1101-2021_02_08_430070 159 24 Precision Precision NNP 10_1101-2021_02_08_430070 159 25 Recall Recall NNP 10_1101-2021_02_08_430070 159 26 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 159 27 E.coli e.coli JJ 10_1101-2021_02_08_430070 159 28 gaAttc_EcoRI gaattc_ecori NN 10_1101-2021_02_08_430070 159 29 BiRNN birnn JJ 10_1101-2021_02_08_430070 159 30 0.8524 0.8524 CD 10_1101-2021_02_08_430070 159 31 0.8088 0.8088 CD 10_1101-2021_02_08_430070 159 32 0.7497 0.7497 CD 10_1101-2021_02_08_430070 159 33 0.8429 0.8429 CD 10_1101-2021_02_08_430070 159 34 0.7797 0.7797 CD 10_1101-2021_02_08_430070 159 35 0.8035 0.8035 CD 10_1101-2021_02_08_430070 159 36 BERT_basic bert_basic JJ 10_1101-2021_02_08_430070 159 37 0.8607 0.8607 CD 10_1101-2021_02_08_430070 159 38 0.8151 0.8151 CD 10_1101-2021_02_08_430070 159 39 0.7653 0.7653 CD 10_1101-2021_02_08_430070 159 40 0.8591 0.8591 CD 
10_1101-2021_02_08_430070 159 41 0.7969 0.7969 CD 10_1101-2021_02_08_430070 159 42 0.8277 0.8277 CD 10_1101-2021_02_08_430070 159 43 BERT_refined BERT_refined NNP 10_1101-2021_02_08_430070 159 44 0.8611 0.8611 CD 10_1101-2021_02_08_430070 159 45 0.8826 0.8826 CD 10_1101-2021_02_08_430070 159 46 0.7473 0.7473 CD 10_1101-2021_02_08_430070 159 47 0.8655 0.8655 CD 10_1101-2021_02_08_430070 159 48 0.8596 0.8596 CD 10_1101-2021_02_08_430070 159 49 0.7987 0.7987 CD 10_1101-2021_02_08_430070 159 50 tcgA_TaqI tcga_taqi NN 10_1101-2021_02_08_430070 159 51 BiRNN BiRNN NNP 10_1101-2021_02_08_430070 159 52 0.7722 0.7722 CD 10_1101-2021_02_08_430070 159 53 0.7922 0.7922 CD 10_1101-2021_02_08_430070 159 54 0.5750 0.5750 CD 10_1101-2021_02_08_430070 159 55 0.7750 0.7750 CD 10_1101-2021_02_08_430070 159 56 0.7789 0.7789 CD 10_1101-2021_02_08_430070 159 57 0.6290 0.6290 CD 10_1101-2021_02_08_430070 159 58 BERT_basic BERT_basic NNP 10_1101-2021_02_08_430070 159 59 0.7573 0.7573 CD 10_1101-2021_02_08_430070 159 60 0.8168 0.8168 CD 10_1101-2021_02_08_430070 159 61 0.5392 0.5392 CD 10_1101-2021_02_08_430070 159 62 0.7653 0.7653 CD 10_1101-2021_02_08_430070 159 63 0.8063 0.8063 CD 10_1101-2021_02_08_430070 159 64 0.5937 0.5937 CD 10_1101-2021_02_08_430070 159 65 BERT_refined BERT_refined NNP 10_1101-2021_02_08_430070 159 66 0.7857 0.7857 CD 10_1101-2021_02_08_430070 159 67 0.7788 0.7788 CD 10_1101-2021_02_08_430070 159 68 0.6064 0.6064 CD 10_1101-2021_02_08_430070 159 69 0.7843 0.7843 CD 10_1101-2021_02_08_430070 159 70 0.7643 0.7643 CD 10_1101-2021_02_08_430070 159 71 0.6586 0.6586 CD 10_1101-2021_02_08_430070 159 72 gAtc_Dam gatc_dam CD 10_1101-2021_02_08_430070 159 73 BiRNN BiRNN NNP 10_1101-2021_02_08_430070 159 74 0.6123 0.6123 CD 10_1101-2021_02_08_430070 159 75 0.7656 0.7656 CD 10_1101-2021_02_08_430070 159 76 0.247 0.247 CD 10_1101-2021_02_08_430070 159 77 0.6337 0.6337 CD 10_1101-2021_02_08_430070 159 78 0.7631 0.7631 CD 10_1101-2021_02_08_430070 159 79 0.3241 0.3241 CD 
10_1101-2021_02_08_430070 159 80 BERT_basic BERT_basic NNP 10_1101-2021_02_08_430070 159 81 0.6128 0.6128 CD 10_1101-2021_02_08_430070 159 82 0.7329 0.7329 CD 10_1101-2021_02_08_430070 159 83 0.2529 0.2529 CD 10_1101-2021_02_08_430070 159 84 0.631 0.631 CD 10_1101-2021_02_08_430070 159 85 0.7311 0.7311 CD 10_1101-2021_02_08_430070 159 86 0.3305 0.3305 CD 10_1101-2021_02_08_430070 159 87 BERT_refined bert_refine VBN 10_1101-2021_02_08_430070 159 88 0.6188 0.6188 CD 10_1101-2021_02_08_430070 159 89 0.7513 0.7513 CD 10_1101-2021_02_08_430070 159 90 0.2634 0.2634 CD 10_1101-2021_02_08_430070 159 91 0.6385 0.6385 CD 10_1101-2021_02_08_430070 159 92 0.7471 0.7471 CD 10_1101-2021_02_08_430070 159 93 0.3421 0.3421 CD 10_1101-2021_02_08_430070 159 94 Table table NN 10_1101-2021_02_08_430070 159 95 2 2 CD 10_1101-2021_02_08_430070 159 96 . . . 10_1101-2021_02_08_430070 160 1 In in IN 10_1101-2021_02_08_430070 160 2 - - HYPH 10_1101-2021_02_08_430070 160 3 sample sample NN 10_1101-2021_02_08_430070 160 4 evaluation evaluation NN 10_1101-2021_02_08_430070 160 5 of of IN 10_1101-2021_02_08_430070 160 6 different different JJ 10_1101-2021_02_08_430070 160 7 deep deep JJ 10_1101-2021_02_08_430070 160 8 learning learning NN 10_1101-2021_02_08_430070 160 9 models model NNS 10_1101-2021_02_08_430070 160 10 on on IN 10_1101-2021_02_08_430070 160 11 6mA 6ma CD 10_1101-2021_02_08_430070 160 12 datasets dataset NNS 10_1101-2021_02_08_430070 160 13 . . . 10_1101-2021_02_08_430070 160 14 The the DT 10_1101-2021_02_08_430070 160 15 best good JJS 10_1101-2021_02_08_430070 160 16 score score NN 10_1101-2021_02_08_430070 160 17 of of IN 10_1101-2021_02_08_430070 160 18 each each DT 10_1101-2021_02_08_430070 160 19 dataset dataset NN 10_1101-2021_02_08_430070 160 20 is be VBZ 10_1101-2021_02_08_430070 160 21 highlighted highlight VBN 10_1101-2021_02_08_430070 160 22 in in IN 10_1101-2021_02_08_430070 160 23 bold bold JJ 10_1101-2021_02_08_430070 160 24 . . . 
10_1101-2021_02_08_430070 161 1 Simpson_E.coli_CG_SssI Simpson_E.coli_CG_SssI NNP 10_1101-2021_02_08_430070 161 2 , , , 10_1101-2021_02_08_430070 161 3 although although IN 10_1101-2021_02_08_430070 161 4 the the DT 10_1101-2021_02_08_430070 161 5 read read JJ 10_1101-2021_02_08_430070 161 6 - - HYPH 10_1101-2021_02_08_430070 161 7 level level NN 10_1101-2021_02_08_430070 161 8 AUC auc NN 10_1101-2021_02_08_430070 161 9 of of IN 10_1101-2021_02_08_430070 161 10 the the DT 10_1101-2021_02_08_430070 161 11 refined refined JJ 10_1101-2021_02_08_430070 161 12 BERT BERT NNP 10_1101-2021_02_08_430070 161 13 are be VBP 10_1101-2021_02_08_430070 161 14 0.0014 0.0014 CD 10_1101-2021_02_08_430070 161 15 and and CC 10_1101-2021_02_08_430070 161 16 0.005 0.005 CD 10_1101-2021_02_08_430070 161 17 lower low JJR 10_1101-2021_02_08_430070 161 18 than than IN 10_1101-2021_02_08_430070 161 19 that that DT 10_1101-2021_02_08_430070 161 20 of of IN 10_1101-2021_02_08_430070 161 21 biRNN birnn NN 10_1101-2021_02_08_430070 161 22 , , , 10_1101-2021_02_08_430070 161 23 the the DT 10_1101-2021_02_08_430070 161 24 genomic genomic JJ 10_1101-2021_02_08_430070 161 25 - - HYPH 10_1101-2021_02_08_430070 161 26 level level NN 10_1101-2021_02_08_430070 161 27 performance performance NN 10_1101-2021_02_08_430070 161 28 of of IN 10_1101-2021_02_08_430070 161 29 the the DT 10_1101-2021_02_08_430070 161 30 refined refined JJ 10_1101-2021_02_08_430070 161 31 BERT BERT NNP 10_1101-2021_02_08_430070 161 32 is be VBZ 10_1101-2021_02_08_430070 161 33 equal equal JJ 10_1101-2021_02_08_430070 161 34 or or CC 10_1101-2021_02_08_430070 161 35 significantly significantly RB 10_1101-2021_02_08_430070 161 36 better well JJR 10_1101-2021_02_08_430070 161 37 than than IN 10_1101-2021_02_08_430070 161 38 biRNN birnn NN 10_1101-2021_02_08_430070 161 39 . . . 
10_1101-2021_02_08_430070 162 1 This this DT 10_1101-2021_02_08_430070 162 2 can can MD 10_1101-2021_02_08_430070 162 3 be be VB 10_1101-2021_02_08_430070 162 4 explained explain VBN 10_1101-2021_02_08_430070 162 5 by by IN 10_1101-2021_02_08_430070 162 6 the the DT 10_1101-2021_02_08_430070 162 7 more more RBR 10_1101-2021_02_08_430070 162 8 accurate accurate JJ 10_1101-2021_02_08_430070 162 9 prediction prediction NN 10_1101-2021_02_08_430070 162 10 in in IN 10_1101-2021_02_08_430070 162 11 several several JJ 10_1101-2021_02_08_430070 162 12 low low JJ 10_1101-2021_02_08_430070 162 13 read read NN 10_1101-2021_02_08_430070 162 14 - - HYPH 10_1101-2021_02_08_430070 162 15 coverage coverage NN 10_1101-2021_02_08_430070 162 16 regions region NNS 10_1101-2021_02_08_430070 162 17 . . . 10_1101-2021_02_08_430070 163 1 On on IN 10_1101-2021_02_08_430070 163 2 the the DT 10_1101-2021_02_08_430070 163 3 6mA 6ma CD 10_1101-2021_02_08_430070 163 4 dataset dataset NN 10_1101-2021_02_08_430070 163 5 , , , 10_1101-2021_02_08_430070 163 6 the the DT 10_1101-2021_02_08_430070 163 7 refined refined JJ 10_1101-2021_02_08_430070 163 8 BERT BERT NNP 10_1101-2021_02_08_430070 163 9 model model NN 10_1101-2021_02_08_430070 163 10 achieves achieve VBZ 10_1101-2021_02_08_430070 163 11 the the DT 10_1101-2021_02_08_430070 163 12 best good JJS 10_1101-2021_02_08_430070 163 13 AUC AUC NNP 10_1101-2021_02_08_430070 163 14 performance performance NN 10_1101-2021_02_08_430070 163 15 on on IN 10_1101-2021_02_08_430070 163 16 both both DT 10_1101-2021_02_08_430070 163 17 read read NN 10_1101-2021_02_08_430070 163 18 - - HYPH 10_1101-2021_02_08_430070 163 19 level level NN 10_1101-2021_02_08_430070 163 20 and and CC 10_1101-2021_02_08_430070 163 21 genomic genomic JJ 10_1101-2021_02_08_430070 163 22 - - HYPH 10_1101-2021_02_08_430070 163 23 level level NN 10_1101-2021_02_08_430070 163 24 . . . 
10_1101-2021_02_08_430070 164 1 The the DT 10_1101-2021_02_08_430070 164 2 performance performance NN 10_1101-2021_02_08_430070 164 3 of of IN 10_1101-2021_02_08_430070 164 4 the the DT 10_1101-2021_02_08_430070 164 5 basic basic JJ 10_1101-2021_02_08_430070 164 6 BERT BERT NNP 10_1101-2021_02_08_430070 164 7 model model NN 10_1101-2021_02_08_430070 164 8 is be VBZ 10_1101-2021_02_08_430070 164 9 variant variant JJ 10_1101-2021_02_08_430070 164 10 and and CC 10_1101-2021_02_08_430070 164 11 unstable unstable JJ 10_1101-2021_02_08_430070 164 12 . . . 10_1101-2021_02_08_430070 165 1 On on IN 10_1101-2021_02_08_430070 165 2 Stoiber_E.coli_gaAttc_EcoRI Stoiber_E.coli_gaAttc_EcoRI NNP 10_1101-2021_02_08_430070 165 3 and and CC 10_1101-2021_02_08_430070 165 4 Stoiber_E.coli_gAtc_Dam Stoiber_E.coli_gAtc_Dam NNP 10_1101-2021_02_08_430070 165 5 , , , 10_1101-2021_02_08_430070 165 6 the the DT 10_1101-2021_02_08_430070 165 7 basic basic JJ 10_1101-2021_02_08_430070 165 8 BERT BERT NNP 10_1101-2021_02_08_430070 165 9 performs perform VBZ 10_1101-2021_02_08_430070 165 10 slightly slightly RB 10_1101-2021_02_08_430070 165 11 better well JJR 10_1101-2021_02_08_430070 165 12 than than IN 10_1101-2021_02_08_430070 165 13 biRNN birnn NN 10_1101-2021_02_08_430070 165 14 on on IN 10_1101-2021_02_08_430070 165 15 the the DT 10_1101-2021_02_08_430070 165 16 read read JJ 10_1101-2021_02_08_430070 165 17 - - HYPH 10_1101-2021_02_08_430070 165 18 level level NN 10_1101-2021_02_08_430070 165 19 AUC AUC NNP 10_1101-2021_02_08_430070 165 20 , , , 10_1101-2021_02_08_430070 165 21 but but CC 10_1101-2021_02_08_430070 165 22 has have VBZ 10_1101-2021_02_08_430070 165 23 a a DT 10_1101-2021_02_08_430070 165 24 large large JJ 10_1101-2021_02_08_430070 165 25 performance performance NN 10_1101-2021_02_08_430070 165 26 gap gap NN 10_1101-2021_02_08_430070 165 27 on on IN 10_1101-2021_02_08_430070 165 28 Stoiber_E.coli_gaAttc_EcoRI stoiber_e.coli_gaattc_ecori NN 10_1101-2021_02_08_430070 165 29 . . 
. 10_1101-2021_02_08_430070 166 1 In in IN 10_1101-2021_02_08_430070 166 2 summary summary NN 10_1101-2021_02_08_430070 166 3 , , , 10_1101-2021_02_08_430070 166 4 in in IN 10_1101-2021_02_08_430070 166 5 the the DT 10_1101-2021_02_08_430070 166 6 in in IN 10_1101-2021_02_08_430070 166 7 - - HYPH 10_1101-2021_02_08_430070 166 8 sample sample NN 10_1101-2021_02_08_430070 166 9 evaluation evaluation NN 10_1101-2021_02_08_430070 166 10 , , , 10_1101-2021_02_08_430070 166 11 the the DT 10_1101-2021_02_08_430070 166 12 refined refined JJ 10_1101-2021_02_08_430070 166 13 BERT BERT NNP 10_1101-2021_02_08_430070 166 14 model model NN 10_1101-2021_02_08_430070 166 15 can can MD 10_1101-2021_02_08_430070 166 16 achieve achieve VB 10_1101-2021_02_08_430070 166 17 competitive competitive JJ 10_1101-2021_02_08_430070 166 18 or or CC 10_1101-2021_02_08_430070 166 19 better well JJR 10_1101-2021_02_08_430070 166 20 results result NNS 10_1101-2021_02_08_430070 166 21 when when WRB 10_1101-2021_02_08_430070 166 22 compared compare VBN 10_1101-2021_02_08_430070 166 23 with with IN 10_1101-2021_02_08_430070 166 24 the the DT 10_1101-2021_02_08_430070 166 25 biRNN biRNN NNP 10_1101-2021_02_08_430070 166 26 model model NN 10_1101-2021_02_08_430070 166 27 on on IN 10_1101-2021_02_08_430070 166 28 benchmark benchmark JJ 10_1101-2021_02_08_430070 166 29 5mC 5mc CD 10_1101-2021_02_08_430070 166 30 and and CC 10_1101-2021_02_08_430070 166 31 6mA 6mA NNP 10_1101-2021_02_08_430070 166 32 datasets dataset NNS 10_1101-2021_02_08_430070 166 33 . . . 
10_1101-2021_02_08_430070 167 1 3.4 3.4 CD 10_1101-2021_02_08_430070 167 2 Cross cross JJ 10_1101-2021_02_08_430070 167 3 - - JJ 10_1101-2021_02_08_430070 167 4 sample sample NN 10_1101-2021_02_08_430070 167 5 evaluation evaluation NN 10_1101-2021_02_08_430070 167 6 We -PRON- PRP 10_1101-2021_02_08_430070 167 7 then then RB 10_1101-2021_02_08_430070 167 8 conduct conduct VBP 10_1101-2021_02_08_430070 167 9 the the DT 10_1101-2021_02_08_430070 167 10 cross cross JJ 10_1101-2021_02_08_430070 167 11 - - JJ 10_1101-2021_02_08_430070 167 12 sample sample JJ 10_1101-2021_02_08_430070 167 13 evaluation evaluation NN 10_1101-2021_02_08_430070 167 14 . . . 10_1101-2021_02_08_430070 168 1 To to TO 10_1101-2021_02_08_430070 168 2 compare compare VB 10_1101-2021_02_08_430070 168 3 with with IN 10_1101-2021_02_08_430070 168 4 other other JJ 10_1101-2021_02_08_430070 168 5 non- non- NN 10_1101-2021_02_08_430070 168 6 deep deep RB 10_1101-2021_02_08_430070 168 7 - - HYPH 10_1101-2021_02_08_430070 168 8 learning learn VBG 10_1101-2021_02_08_430070 168 9 based base VBN 10_1101-2021_02_08_430070 168 10 methods method NNS 10_1101-2021_02_08_430070 168 11 , , , 10_1101-2021_02_08_430070 168 12 we -PRON- PRP 10_1101-2021_02_08_430070 168 13 utilize utilize VBP 10_1101-2021_02_08_430070 168 14 the the DT 10_1101-2021_02_08_430070 168 15 benchmark benchmark NN 10_1101-2021_02_08_430070 168 16 pipeline pipeline NN 10_1101-2021_02_08_430070 168 17 ( ( -LRB- 10_1101-2021_02_08_430070 168 18 Yuen Yuen NNP 10_1101-2021_02_08_430070 168 19 et et FW 10_1101-2021_02_08_430070 168 20 al al NNP 10_1101-2021_02_08_430070 168 21 . . NNP 10_1101-2021_02_08_430070 168 22 , , , 10_1101-2021_02_08_430070 168 23 2020 2020 CD 10_1101-2021_02_08_430070 168 24 ) ) -RRB- 10_1101-2021_02_08_430070 168 25 as as IN 10_1101-2021_02_08_430070 168 26 a a DT 10_1101-2021_02_08_430070 168 27 pivot pivot NN 10_1101-2021_02_08_430070 168 28 . . . 
10_1101-2021_02_08_430070 169 1 We -PRON- PRP 10_1101-2021_02_08_430070 169 2 test test VBP 10_1101-2021_02_08_430070 169 3 models model NNS 10_1101-2021_02_08_430070 169 4 on on IN 10_1101-2021_02_08_430070 169 5 the the DT 10_1101-2021_02_08_430070 169 6 same same JJ 10_1101-2021_02_08_430070 169 7 benchmark benchmark JJ 10_1101-2021_02_08_430070 169 8 dataset1 dataset1 NN 10_1101-2021_02_08_430070 169 9 , , , 10_1101-2021_02_08_430070 169 10 which which WDT 10_1101-2021_02_08_430070 169 11 is be VBZ 10_1101-2021_02_08_430070 169 12 generated generate VBN 10_1101-2021_02_08_430070 169 13 based base VBN 10_1101-2021_02_08_430070 169 14 on on IN 10_1101-2021_02_08_430070 169 15 Simpson Simpson NNP 10_1101-2021_02_08_430070 169 16 ’s ’s POS 10_1101-2021_02_08_430070 169 17 E.coli e.coli JJ 10_1101-2021_02_08_430070 169 18 dataset dataset NN 10_1101-2021_02_08_430070 169 19 with with IN 10_1101-2021_02_08_430070 169 20 different different JJ 10_1101-2021_02_08_430070 169 21 methylation methylation NN 10_1101-2021_02_08_430070 169 22 levels level NNS 10_1101-2021_02_08_430070 169 23 . . . 
10_1101-2021_02_08_430070 170 1 In in IN 10_1101-2021_02_08_430070 170 2 the the DT 10_1101-2021_02_08_430070 170 3 dataset dataset NN 10_1101-2021_02_08_430070 170 4 , , , 10_1101-2021_02_08_430070 170 5 100 100 CD 10_1101-2021_02_08_430070 170 6 arbitrary arbitrary JJ 10_1101-2021_02_08_430070 170 7 sites site NNS 10_1101-2021_02_08_430070 170 8 are be VBP 10_1101-2021_02_08_430070 170 9 selected select VBN 10_1101-2021_02_08_430070 170 10 , , , 10_1101-2021_02_08_430070 170 11 which which WDT 10_1101-2021_02_08_430070 170 12 contain contain VBP 10_1101-2021_02_08_430070 170 13 singleton singleton NNP 10_1101-2021_02_08_430070 170 14 CpG CpG NNP 10_1101-2021_02_08_430070 170 15 in in IN 10_1101-2021_02_08_430070 170 16 a a DT 10_1101-2021_02_08_430070 170 17 window window NN 10_1101-2021_02_08_430070 170 18 of of IN 10_1101-2021_02_08_430070 170 19 10nt 10nt NN 10_1101-2021_02_08_430070 170 20 from from IN 10_1101-2021_02_08_430070 170 21 both both CC 10_1101-2021_02_08_430070 170 22 methylated methylated JJ 10_1101-2021_02_08_430070 170 23 and and CC 10_1101-2021_02_08_430070 170 24 unmethylated unmethylated JJ 10_1101-2021_02_08_430070 170 25 instances instance NNS 10_1101-2021_02_08_430070 170 26 in in IN 10_1101-2021_02_08_430070 170 27 the the DT 10_1101-2021_02_08_430070 170 28 Simpson Simpson NNP 10_1101-2021_02_08_430070 170 29 ’s ’s POS 10_1101-2021_02_08_430070 170 30 E.coli e.coli JJ 10_1101-2021_02_08_430070 170 31 dataset dataset NN 10_1101-2021_02_08_430070 170 32 . . . 10_1101-2021_02_08_430070 171 1 Yuen Yuen NNP 10_1101-2021_02_08_430070 171 2 et et FW 10_1101-2021_02_08_430070 171 3 al al NNP 10_1101-2021_02_08_430070 171 4 . . . 
10_1101-2021_02_08_430070 172 1 created create VBN 10_1101-2021_02_08_430070 172 2 11 11 CD 10_1101-2021_02_08_430070 172 3 specific specific JJ 10_1101-2021_02_08_430070 172 4 mixtures mixture NNS 10_1101-2021_02_08_430070 172 5 of of IN 10_1101-2021_02_08_430070 172 6 methylated methylated JJ 10_1101-2021_02_08_430070 172 7 and and CC 10_1101-2021_02_08_430070 172 8 unmethylated unmethylated JJ 10_1101-2021_02_08_430070 172 9 reads read NNS 10_1101-2021_02_08_430070 172 10 , , , 10_1101-2021_02_08_430070 172 11 containing contain VBG 10_1101-2021_02_08_430070 172 12 0 0 CD 10_1101-2021_02_08_430070 172 13 % % NN 10_1101-2021_02_08_430070 172 14 , , , 10_1101-2021_02_08_430070 172 15 10 10 CD 10_1101-2021_02_08_430070 172 16 % % NN 10_1101-2021_02_08_430070 172 17 , , , 10_1101-2021_02_08_430070 172 18 ... ... : 10_1101-2021_02_08_430070 172 19 , , , 10_1101-2021_02_08_430070 172 20 100 100 CD 10_1101-2021_02_08_430070 172 21 % % NN 10_1101-2021_02_08_430070 172 22 of of IN 10_1101-2021_02_08_430070 172 23 methylated methylated JJ 10_1101-2021_02_08_430070 172 24 reads read NNS 10_1101-2021_02_08_430070 172 25 . . . 10_1101-2021_02_08_430070 173 1 Each each DT 10_1101-2021_02_08_430070 173 2 mixture mixture NN 10_1101-2021_02_08_430070 173 3 contains contain VBZ 10_1101-2021_02_08_430070 173 4 approximately approximately RB 10_1101-2021_02_08_430070 173 5 2400 2400 CD 10_1101-2021_02_08_430070 173 6 reads read NNS 10_1101-2021_02_08_430070 173 7 . . . 
10_1101-2021_02_08_430070 174 1 More more RBR 10_1101-2021_02_08_430070 174 2 detailed detailed JJ 10_1101-2021_02_08_430070 174 3 information information NN 10_1101-2021_02_08_430070 174 4 can can MD 10_1101-2021_02_08_430070 174 5 be be VB 10_1101-2021_02_08_430070 174 6 found find VBN 10_1101-2021_02_08_430070 174 7 in in IN 10_1101-2021_02_08_430070 174 8 ( ( -LRB- 10_1101-2021_02_08_430070 174 9 Yuen Yuen NNP 10_1101-2021_02_08_430070 174 10 et et FW 10_1101-2021_02_08_430070 174 11 al al NNP 10_1101-2021_02_08_430070 174 12 . . NNP 10_1101-2021_02_08_430070 174 13 , , , 10_1101-2021_02_08_430070 174 14 2020 2020 CD 10_1101-2021_02_08_430070 174 15 ) ) -RRB- 10_1101-2021_02_08_430070 174 16 . . . 10_1101-2021_02_08_430070 175 1 Different different JJ 10_1101-2021_02_08_430070 175 2 from from IN 10_1101-2021_02_08_430070 175 3 the the DT 10_1101-2021_02_08_430070 175 4 deepMOD deepMOD NNP 10_1101-2021_02_08_430070 175 5 model model NN 10_1101-2021_02_08_430070 175 6 used use VBN 10_1101-2021_02_08_430070 175 7 in in IN 10_1101-2021_02_08_430070 175 8 the the DT 10_1101-2021_02_08_430070 175 9 original original JJ 10_1101-2021_02_08_430070 175 10 benchmark benchmark NN 10_1101-2021_02_08_430070 175 11 pipeline pipeline NN 10_1101-2021_02_08_430070 175 12 , , , 10_1101-2021_02_08_430070 175 13 which which WDT 10_1101-2021_02_08_430070 175 14 is be VBZ 10_1101-2021_02_08_430070 175 15 pre pre VBN 10_1101-2021_02_08_430070 175 16 - - VBN 10_1101-2021_02_08_430070 175 17 trained train VBN 10_1101-2021_02_08_430070 175 18 on on IN 10_1101-2021_02_08_430070 175 19 a a DT 10_1101-2021_02_08_430070 175 20 mixture mixture NN 10_1101-2021_02_08_430070 175 21 dataset dataset NN 10_1101-2021_02_08_430070 175 22 of of IN 10_1101-2021_02_08_430070 175 23 all all DT 10_1101-2021_02_08_430070 175 24 5mC 5mc CD 10_1101-2021_02_08_430070 175 25 positive positive JJ 10_1101-2021_02_08_430070 175 26 ( ( -LRB- 10_1101-2021_02_08_430070 175 27 Cg_SssI Cg_SssI NNP 
10_1101-2021_02_08_430070 175 28 , , , 10_1101-2021_02_08_430070 175 29 Cg_MpeI Cg_MpeI NNP 10_1101-2021_02_08_430070 175 30 , , , 10_1101-2021_02_08_430070 175 31 and and CC 10_1101-2021_02_08_430070 175 32 gCgc_Hhal gcgc_hhal CD 10_1101-2021_02_08_430070 175 33 ) ) -RRB- 10_1101-2021_02_08_430070 175 34 and and CC 10_1101-2021_02_08_430070 175 35 negative negative JJ 10_1101-2021_02_08_430070 175 36 controls control NNS 10_1101-2021_02_08_430070 175 37 ( ( -LRB- 10_1101-2021_02_08_430070 175 38 UMR UMR NNP 10_1101-2021_02_08_430070 175 39 , , , 10_1101-2021_02_08_430070 175 40 con1 con1 NNP 10_1101-2021_02_08_430070 175 41 , , , 10_1101-2021_02_08_430070 175 42 and and CC 10_1101-2021_02_08_430070 175 43 con2 con2 NN 10_1101-2021_02_08_430070 175 44 ) ) -RRB- 10_1101-2021_02_08_430070 175 45 . . . 10_1101-2021_02_08_430070 176 1 Here here RB 10_1101-2021_02_08_430070 176 2 , , , 10_1101-2021_02_08_430070 176 3 we -PRON- PRP 10_1101-2021_02_08_430070 176 4 test test VBP 10_1101-2021_02_08_430070 176 5 two two CD 10_1101-2021_02_08_430070 176 6 different different JJ 10_1101-2021_02_08_430070 176 7 models model NNS 10_1101-2021_02_08_430070 176 8 trained train VBN 10_1101-2021_02_08_430070 176 9 on on IN 10_1101-2021_02_08_430070 176 10 a a DT 10_1101-2021_02_08_430070 176 11 single single JJ 10_1101-2021_02_08_430070 176 12 dataset dataset NN 10_1101-2021_02_08_430070 176 13 with with IN 10_1101-2021_02_08_430070 176 14 the the DT 10_1101-2021_02_08_430070 176 15 same same JJ 10_1101-2021_02_08_430070 176 16 methyltransferase methyltransferase NN 10_1101-2021_02_08_430070 176 17 to to TO 10_1101-2021_02_08_430070 176 18 reduce reduce VB 10_1101-2021_02_08_430070 176 19 potential potential JJ 10_1101-2021_02_08_430070 176 20 overlapping overlapping NN 10_1101-2021_02_08_430070 176 21 between between IN 10_1101-2021_02_08_430070 176 22 the the DT 10_1101-2021_02_08_430070 176 23 training training NN 10_1101-2021_02_08_430070 176 24 and and CC 
10_1101-2021_02_08_430070 176 25 testing testing NN 10_1101-2021_02_08_430070 176 26 set set NN 10_1101-2021_02_08_430070 176 27 . . . 10_1101-2021_02_08_430070 177 1 All all DT 10_1101-2021_02_08_430070 177 2 three three CD 10_1101-2021_02_08_430070 177 3 models model NNS 10_1101-2021_02_08_430070 177 4 are be VBP 10_1101-2021_02_08_430070 177 5 trained train VBN 10_1101-2021_02_08_430070 177 6 on on IN 10_1101-2021_02_08_430070 177 7 Stoiber_Ecoli_CG_SssI Stoiber_Ecoli_CG_SssI NNP 10_1101-2021_02_08_430070 177 8 and and CC 10_1101-2021_02_08_430070 177 9 Simpson_Hsapiens_CG_SssI Simpson_Hsapiens_CG_SssI NNP 10_1101-2021_02_08_430070 177 10 , , , 10_1101-2021_02_08_430070 177 11 separately separately RB 10_1101-2021_02_08_430070 177 12 . . . 10_1101-2021_02_08_430070 178 1 Simpson_Hsapiens_CG_SssI Simpson_Hsapiens_CG_SssI NNP 10_1101-2021_02_08_430070 178 2 is be VBZ 10_1101-2021_02_08_430070 178 3 sequenced sequence VBN 10_1101-2021_02_08_430070 178 4 by by IN 10_1101-2021_02_08_430070 178 5 the the DT 10_1101-2021_02_08_430070 178 6 same same JJ 10_1101-2021_02_08_430070 178 7 group group NN 10_1101-2021_02_08_430070 178 8 on on IN 10_1101-2021_02_08_430070 178 9 different different JJ 10_1101-2021_02_08_430070 178 10 species specie NNS 10_1101-2021_02_08_430070 178 11 , , , 10_1101-2021_02_08_430070 178 12 while while IN 10_1101-2021_02_08_430070 178 13 Stoiber_Ecoli_CG_SssI Stoiber_Ecoli_CG_SssI NNP 10_1101-2021_02_08_430070 178 14 is be VBZ 10_1101-2021_02_08_430070 178 15 sequenced sequence VBN 10_1101-2021_02_08_430070 178 16 by by IN 10_1101-2021_02_08_430070 178 17 a a DT 10_1101-2021_02_08_430070 178 18 different different JJ 10_1101-2021_02_08_430070 178 19 group group NN 10_1101-2021_02_08_430070 178 20 on on IN 10_1101-2021_02_08_430070 178 21 the the DT 10_1101-2021_02_08_430070 178 22 same same JJ 10_1101-2021_02_08_430070 178 23 species species NN 10_1101-2021_02_08_430070 178 24 . . . 
10_1101-2021_02_08_430070 179 1 We -PRON- PRP 10_1101-2021_02_08_430070 179 2 use use VBP 10_1101-2021_02_08_430070 179 3 METEORE METEORE NNP 10_1101-2021_02_08_430070 179 4 pipeline pipeline NN 10_1101-2021_02_08_430070 179 5 ( ( -LRB- 10_1101-2021_02_08_430070 179 6 Yuen Yuen NNP 10_1101-2021_02_08_430070 179 7 et et FW 10_1101-2021_02_08_430070 179 8 al al NNP 10_1101-2021_02_08_430070 179 9 . . NNP 10_1101-2021_02_08_430070 179 10 , , , 10_1101-2021_02_08_430070 179 11 2020 2020 CD 10_1101-2021_02_08_430070 179 12 ) ) -RRB- 10_1101-2021_02_08_430070 179 13 to to TO 10_1101-2021_02_08_430070 179 14 generate generate VB 10_1101-2021_02_08_430070 179 15 violin violin NN 10_1101-2021_02_08_430070 179 16 plots plot NNS 10_1101-2021_02_08_430070 179 17 for for IN 10_1101-2021_02_08_430070 179 18 model model NN 10_1101-2021_02_08_430070 179 19 predictions prediction NNS 10_1101-2021_02_08_430070 179 20 on on IN 10_1101-2021_02_08_430070 179 21 each each DT 10_1101-2021_02_08_430070 179 22 mixture mixture NN 10_1101-2021_02_08_430070 179 23 . . . 
10_1101-2021_02_08_430070 180 1 The the DT 10_1101-2021_02_08_430070 180 2 Pearson Pearson NNP 10_1101-2021_02_08_430070 180 3 ’s ’s POS 10_1101-2021_02_08_430070 180 4 correlation correlation NN 10_1101-2021_02_08_430070 180 5 r r NN 10_1101-2021_02_08_430070 180 6 , , , 10_1101-2021_02_08_430070 180 7 coefficient coefficient NN 10_1101-2021_02_08_430070 180 8 of of IN 10_1101-2021_02_08_430070 180 9 determination determination NN 10_1101-2021_02_08_430070 180 10 r2 r2 NN 10_1101-2021_02_08_430070 180 11 and and CC 10_1101-2021_02_08_430070 180 12 root root NN 10_1101-2021_02_08_430070 180 13 mean mean VBP 10_1101-2021_02_08_430070 180 14 square square JJ 10_1101-2021_02_08_430070 180 15 error error NN 10_1101-2021_02_08_430070 180 16 ( ( -LRB- 10_1101-2021_02_08_430070 180 17 RMSE RMSE NNP 10_1101-2021_02_08_430070 180 18 ) ) -RRB- 10_1101-2021_02_08_430070 180 19 are be VBP 10_1101-2021_02_08_430070 180 20 used use VBN 10_1101-2021_02_08_430070 180 21 as as IN 10_1101-2021_02_08_430070 180 22 the the DT 10_1101-2021_02_08_430070 180 23 evaluation evaluation NN 10_1101-2021_02_08_430070 180 24 metrics metric NNS 10_1101-2021_02_08_430070 180 25 for for IN 10_1101-2021_02_08_430070 180 26 each each DT 10_1101-2021_02_08_430070 180 27 model model NN 10_1101-2021_02_08_430070 180 28 . . . 
10_1101-2021_02_08_430070 181 1 With with IN 10_1101-2021_02_08_430070 181 2 the the DT 10_1101-2021_02_08_430070 181 3 training training NN 10_1101-2021_02_08_430070 181 4 data datum NNS 10_1101-2021_02_08_430070 181 5 of of IN 10_1101-2021_02_08_430070 181 6 Simpson_Hsapiens_CG_SssI Simpson_Hsapiens_CG_SssI NNP 10_1101-2021_02_08_430070 181 7 , , , 10_1101-2021_02_08_430070 181 8 all all DT 10_1101-2021_02_08_430070 181 9 three three CD 10_1101-2021_02_08_430070 181 10 models model NNS 10_1101-2021_02_08_430070 181 11 achieve achieve VBP 10_1101-2021_02_08_430070 181 12 performances performance NNS 10_1101-2021_02_08_430070 181 13 ranked rank VBD 10_1101-2021_02_08_430070 181 14 next next RB 10_1101-2021_02_08_430070 181 15 to to IN 10_1101-2021_02_08_430070 181 16 the the DT 10_1101-2021_02_08_430070 181 17 best good JJS 10_1101-2021_02_08_430070 181 18 reported report VBN 10_1101-2021_02_08_430070 181 19 results result NNS 10_1101-2021_02_08_430070 181 20 of of IN 10_1101-2021_02_08_430070 181 21 Megalodon Megalodon NNP 10_1101-2021_02_08_430070 181 22 ( ( -LRB- 10_1101-2021_02_08_430070 181 23 r=0.9860 r=0.9860 NNP 10_1101-2021_02_08_430070 181 24 , , , 10_1101-2021_02_08_430070 181 25 r2 r2 NNP 10_1101-2021_02_08_430070 181 26 = = SYM 10_1101-2021_02_08_430070 181 27 0.9723 0.9723 CD 10_1101-2021_02_08_430070 181 28 , , , 10_1101-2021_02_08_430070 181 29 RMSE=0.0758 RMSE=0.0758 NNP 10_1101-2021_02_08_430070 181 30 ) ) -RRB- 10_1101-2021_02_08_430070 181 31 on on IN 10_1101-2021_02_08_430070 181 32 the the DT 10_1101-2021_02_08_430070 181 33 dataset dataset NN 10_1101-2021_02_08_430070 181 34 ( ( -LRB- 10_1101-2021_02_08_430070 181 35 Yuen Yuen NNP 10_1101-2021_02_08_430070 181 36 et et FW 10_1101-2021_02_08_430070 181 37 al al NNP 10_1101-2021_02_08_430070 181 38 . . NNP 10_1101-2021_02_08_430070 181 39 , , , 10_1101-2021_02_08_430070 181 40 2020 2020 CD 10_1101-2021_02_08_430070 181 41 ) ) -RRB- 10_1101-2021_02_08_430070 181 42 . . . 
10_1101-2021_02_08_430070 182 1 BiRNN birnn DT 10_1101-2021_02_08_430070 182 2 achieves achieve VBZ 10_1101-2021_02_08_430070 182 3 the the DT 10_1101-2021_02_08_430070 182 4 best good JJS 10_1101-2021_02_08_430070 182 5 Pearson Pearson NNP 10_1101-2021_02_08_430070 182 6 correlation correlation NN 10_1101-2021_02_08_430070 182 7 r=0.9828 r=0.9828 NNP 10_1101-2021_02_08_430070 182 8 and and CC 10_1101-2021_02_08_430070 182 9 r2=0.9658 r2=0.9658 NNP 10_1101-2021_02_08_430070 182 10 , , , 10_1101-2021_02_08_430070 182 11 while while IN 10_1101-2021_02_08_430070 182 12 refined refined JJ 10_1101-2021_02_08_430070 182 13 BERT BERT NNP 10_1101-2021_02_08_430070 182 14 achieves achieve VBZ 10_1101-2021_02_08_430070 182 15 minimal minimal JJ 10_1101-2021_02_08_430070 182 16 RMSE rmse NN 10_1101-2021_02_08_430070 182 17 of of IN 10_1101-2021_02_08_430070 182 18 0.0732 0.0732 CD 10_1101-2021_02_08_430070 182 19 among among IN 10_1101-2021_02_08_430070 182 20 the the DT 10_1101-2021_02_08_430070 182 21 evaluated evaluated JJ 10_1101-2021_02_08_430070 182 22 three three CD 10_1101-2021_02_08_430070 182 23 models model NNS 10_1101-2021_02_08_430070 182 24 . . . 10_1101-2021_02_08_430070 183 1 When when WRB 10_1101-2021_02_08_430070 183 2 using use VBG 10_1101-2021_02_08_430070 183 3 Stoiber_Ecoli_CG_SssI Stoiber_Ecoli_CG_SssI NNP 10_1101-2021_02_08_430070 183 4 for for IN 10_1101-2021_02_08_430070 183 5 training training NN 10_1101-2021_02_08_430070 183 6 models model NNS 10_1101-2021_02_08_430070 183 7 , , , 10_1101-2021_02_08_430070 183 8 the the DT 10_1101-2021_02_08_430070 183 9 performances performance NNS 10_1101-2021_02_08_430070 183 10 of of IN 10_1101-2021_02_08_430070 183 11 all all DT 10_1101-2021_02_08_430070 183 12 three three CD 10_1101-2021_02_08_430070 183 13 models model NNS 10_1101-2021_02_08_430070 183 14 decrease decrease VBP 10_1101-2021_02_08_430070 183 15 . . . 
10_1101-2021_02_08_430070 184 1 This this DT 10_1101-2021_02_08_430070 184 2 indicates indicate VBZ 10_1101-2021_02_08_430070 184 3 the the DT 10_1101-2021_02_08_430070 184 4 challenge challenge NN 10_1101-2021_02_08_430070 184 5 of of IN 10_1101-2021_02_08_430070 184 6 using use VBG 10_1101-2021_02_08_430070 184 7 datasets dataset NNS 10_1101-2021_02_08_430070 184 8 sequenced sequence VBN 10_1101-2021_02_08_430070 184 9 by by IN 10_1101-2021_02_08_430070 184 10 different different JJ 10_1101-2021_02_08_430070 184 11 research research NN 10_1101-2021_02_08_430070 184 12 groups group NNS 10_1101-2021_02_08_430070 184 13 . . . 10_1101-2021_02_08_430070 185 1 Here here RB 10_1101-2021_02_08_430070 185 2 , , , 10_1101-2021_02_08_430070 185 3 both both DT 10_1101-2021_02_08_430070 185 4 BERT BERT NNP 10_1101-2021_02_08_430070 185 5 models model NNS 10_1101-2021_02_08_430070 185 6 show show VBP 10_1101-2021_02_08_430070 185 7 better well JJR 10_1101-2021_02_08_430070 185 8 performances performance NNS 10_1101-2021_02_08_430070 185 9 than than IN 10_1101-2021_02_08_430070 185 10 biRNN birnn NN 10_1101-2021_02_08_430070 185 11 , , , 10_1101-2021_02_08_430070 185 12 as as IN 10_1101-2021_02_08_430070 185 13 in in IN 10_1101-2021_02_08_430070 185 14 Figure Figure NNP 10_1101-2021_02_08_430070 185 15 3b 3b NNP 10_1101-2021_02_08_430070 185 16 . . . 
10_1101-2021_02_08_430070 188 1 ( ( -LRB- 10_1101-2021_02_08_430070 188 2 a a LS 10_1101-2021_02_08_430070 188 3 ) ) -RRB- 10_1101-2021_02_08_430070 188 4 Models model NNS 10_1101-2021_02_08_430070 188 5 trained train VBN 10_1101-2021_02_08_430070 188 6 with with IN 10_1101-2021_02_08_430070 188 7 Simpson_Hsapiens_CG_SssI simpson_hsapiens_cg_sssi JJ 10_1101-2021_02_08_430070 188 8 dataset dataset NN 10_1101-2021_02_08_430070 188 9 . . . 
10_1101-2021_02_08_430070 189 1 ( ( -LRB- 10_1101-2021_02_08_430070 189 2 b b LS 10_1101-2021_02_08_430070 189 3 ) ) -RRB- 10_1101-2021_02_08_430070 189 4 Models model NNS 10_1101-2021_02_08_430070 189 5 trained train VBN 10_1101-2021_02_08_430070 189 6 with with IN 10_1101-2021_02_08_430070 189 7 Stoiber_Ecoli_CG_SssI Stoiber_Ecoli_CG_SssI NNP 10_1101-2021_02_08_430070 189 8 dataset dataset NN 10_1101-2021_02_08_430070 189 9 . . . 10_1101-2021_02_08_430070 190 1 Fig fig NN 10_1101-2021_02_08_430070 190 2 . . . 10_1101-2021_02_08_430070 191 1 3 3 LS 10_1101-2021_02_08_430070 191 2 : : : 10_1101-2021_02_08_430070 191 3 Violin Violin NNP 10_1101-2021_02_08_430070 191 4 plots plot NNS 10_1101-2021_02_08_430070 191 5 of of IN 10_1101-2021_02_08_430070 191 6 prediction prediction NN 10_1101-2021_02_08_430070 191 7 results result NNS 10_1101-2021_02_08_430070 191 8 of of IN 10_1101-2021_02_08_430070 191 9 models model NNS 10_1101-2021_02_08_430070 191 10 trained train VBN 10_1101-2021_02_08_430070 191 11 on on IN 10_1101-2021_02_08_430070 191 12 different different JJ 10_1101-2021_02_08_430070 191 13 datasets dataset NNS 10_1101-2021_02_08_430070 191 14 . . . 
10_1101-2021_02_08_430070 186 1 The the DT 10_1101-2021_02_08_430070 186 2 refined refined JJ 
10_1101-2021_02_08_430070 192 1 BERT BERT NNP 10_1101-2021_02_08_430070 192 2 achieves achieve VBZ 10_1101-2021_02_08_430070 192 3 the the DT 10_1101-2021_02_08_430070 192 4 best good JJS 10_1101-2021_02_08_430070 192 5 r=0.9446 r=0.9446 NNPS 10_1101-2021_02_08_430070 192 6 , , , 10_1101-2021_02_08_430070 192 7 r2=0.8924 r2=0.8924 NNP 10_1101-2021_02_08_430070 192 8 and and CC 10_1101-2021_02_08_430070 192 9 RMSE RMSE NNP 10_1101-2021_02_08_430070 192 10 of of IN 10_1101-2021_02_08_430070 192 11 0.1449 0.1449 CD 10_1101-2021_02_08_430070 192 12 among among IN 10_1101-2021_02_08_430070 192 13 the the DT 10_1101-2021_02_08_430070 192 14 three three CD 10_1101-2021_02_08_430070 192 15 models model NNS 10_1101-2021_02_08_430070 192 16 , , , 10_1101-2021_02_08_430070 192 17 which which WDT 10_1101-2021_02_08_430070 192 18 demonstrates demonstrate VBZ 10_1101-2021_02_08_430070 192 19 the the DT 10_1101-2021_02_08_430070 192 20 generalization generalization NN 10_1101-2021_02_08_430070 192 21 ability ability NN 10_1101-2021_02_08_430070 192 22 on on IN 10_1101-2021_02_08_430070 192 23 datasets dataset NNS 10_1101-2021_02_08_430070 192 24 sequenced sequence VBN 10_1101-2021_02_08_430070 192 25 by by IN 10_1101-2021_02_08_430070 192 26 different different JJ 10_1101-2021_02_08_430070 192 27 research research NN 10_1101-2021_02_08_430070 192 28 groups group NNS 10_1101-2021_02_08_430070 192 29 . . . 
10_1101-2021_02_08_430070 193 1 Based base VBN 10_1101-2021_02_08_430070 193 2 on on IN 10_1101-2021_02_08_430070 193 3 the the DT 10_1101-2021_02_08_430070 193 4 reported report VBN 10_1101-2021_02_08_430070 193 5 benchmark benchmark JJ 10_1101-2021_02_08_430070 193 6 results result NNS 10_1101-2021_02_08_430070 193 7 , , , 10_1101-2021_02_08_430070 193 8 the the DT 10_1101-2021_02_08_430070 193 9 Pearson Pearson NNP 10_1101-2021_02_08_430070 193 10 correlation correlation NN 10_1101-2021_02_08_430070 193 11 ranks rank NNS 10_1101-2021_02_08_430070 193 12 between between IN 10_1101-2021_02_08_430070 193 13 reported report VBN 10_1101-2021_02_08_430070 193 14 deepMOD deepMOD NNP 10_1101-2021_02_08_430070 193 15 and and CC 10_1101-2021_02_08_430070 193 16 deepSignal deepsignal JJ 10_1101-2021_02_08_430070 193 17 ( ( -LRB- 10_1101-2021_02_08_430070 193 18 Megalodon Megalodon NNP 10_1101-2021_02_08_430070 193 19 > > XX 10_1101-2021_02_08_430070 193 20 DeepMODmixModel DeepMODmixModel NNP 10_1101-2021_02_08_430070 193 21 ( ( -LRB- 10_1101-2021_02_08_430070 193 22 0.9467 0.9467 CD 10_1101-2021_02_08_430070 193 23 ) ) -RRB- 10_1101-2021_02_08_430070 193 24 > > XX 10_1101-2021_02_08_430070 193 25 refined refined JJ 10_1101-2021_02_08_430070 193 26 BERT BERT NNP 10_1101-2021_02_08_430070 193 27 > > XX 10_1101-2021_02_08_430070 193 28 DeepSignalhuman_hx1 DeepSignalhuman_hx1 NNP 10_1101-2021_02_08_430070 193 29 ( ( -LRB- 10_1101-2021_02_08_430070 193 30 0.9420 0.9420 CD 10_1101-2021_02_08_430070 193 31 ) ) -RRB- 10_1101-2021_02_08_430070 193 32 > > XX 10_1101-2021_02_08_430070 193 33 Guppy Guppy NNP 10_1101-2021_02_08_430070 193 34 > > XX 10_1101-2021_02_08_430070 193 35 Nanopolish nanopolish JJ 10_1101-2021_02_08_430070 193 36 > > NN 10_1101-2021_02_08_430070 193 37 Tombo Tombo NNP 10_1101-2021_02_08_430070 193 38 ) ) -RRB- 10_1101-2021_02_08_430070 193 39 . . . 
10_1101-2021_02_08_430070 194 1 3.5 3.5 CD 10_1101-2021_02_08_430070 194 2 Model model NN 10_1101-2021_02_08_430070 194 3 inference inference NN 10_1101-2021_02_08_430070 194 4 speed speed NN 10_1101-2021_02_08_430070 194 5 The the DT 10_1101-2021_02_08_430070 194 6 main main JJ 10_1101-2021_02_08_430070 194 7 motivation motivation NN 10_1101-2021_02_08_430070 194 8 of of IN 10_1101-2021_02_08_430070 194 9 applying apply VBG 10_1101-2021_02_08_430070 194 10 BERT BERT NNP 10_1101-2021_02_08_430070 194 11 models model NNS 10_1101-2021_02_08_430070 194 12 is be VBZ 10_1101-2021_02_08_430070 194 13 to to TO 10_1101-2021_02_08_430070 194 14 use use VB 10_1101-2021_02_08_430070 194 15 a a DT 10_1101-2021_02_08_430070 194 16 non non JJ 10_1101-2021_02_08_430070 194 17 - - JJ 10_1101-2021_02_08_430070 194 18 recurrent recurrent JJ 10_1101-2021_02_08_430070 194 19 modeling modeling NN 10_1101-2021_02_08_430070 194 20 approach approach NN 10_1101-2021_02_08_430070 194 21 for for IN 10_1101-2021_02_08_430070 194 22 the the DT 10_1101-2021_02_08_430070 194 23 nanopore nanopore JJ 10_1101-2021_02_08_430070 194 24 methylation methylation NN 10_1101-2021_02_08_430070 194 25 detection detection NN 10_1101-2021_02_08_430070 194 26 task task NN 10_1101-2021_02_08_430070 194 27 to to TO 10_1101-2021_02_08_430070 194 28 improve improve VB 10_1101-2021_02_08_430070 194 29 the the DT 10_1101-2021_02_08_430070 194 30 model model NN 10_1101-2021_02_08_430070 194 31 inference inference NN 10_1101-2021_02_08_430070 194 32 speed speed NN 10_1101-2021_02_08_430070 194 33 . . . 
10_1101-2021_02_08_430070 195 1 We -PRON- PRP 10_1101-2021_02_08_430070 195 2 performed perform VBD 10_1101-2021_02_08_430070 195 3 a a DT 10_1101-2021_02_08_430070 195 4 speed speed NN 10_1101-2021_02_08_430070 195 5 test test NN 10_1101-2021_02_08_430070 195 6 on on IN 10_1101-2021_02_08_430070 195 7 a a DT 10_1101-2021_02_08_430070 195 8 server server NN 10_1101-2021_02_08_430070 195 9 with with IN 10_1101-2021_02_08_430070 195 10 24 24 CD 10_1101-2021_02_08_430070 195 11 CPU cpu NN 10_1101-2021_02_08_430070 195 12 cores core NNS 10_1101-2021_02_08_430070 195 13 ( ( -LRB- 10_1101-2021_02_08_430070 195 14 Intel(R intel(r NN 10_1101-2021_02_08_430070 195 15 ) ) -RRB- 10_1101-2021_02_08_430070 195 16 Xeon(R Xeon(R NNP 10_1101-2021_02_08_430070 195 17 ) ) -RRB- 10_1101-2021_02_08_430070 195 18 Gold gold NN 10_1101-2021_02_08_430070 195 19 6126 6126 CD 10_1101-2021_02_08_430070 195 20 CPU CPU NNP 10_1101-2021_02_08_430070 195 21 @ @ SYM 10_1101-2021_02_08_430070 195 22 2.60GHz 2.60ghz CD 10_1101-2021_02_08_430070 195 23 ) ) -RRB- 10_1101-2021_02_08_430070 195 24 and and CC 10_1101-2021_02_08_430070 198 1 one one CD 10_1101-2021_02_08_430070 198 2 V100 V100 NNP 10_1101-2021_02_08_430070 198 3 NVIDIA NVIDIA NNP 10_1101-2021_02_08_430070 198 4 GPU GPU NNP 10_1101-2021_02_08_430070 198 5 card card NN 10_1101-2021_02_08_430070 198 6 . . . 
10_1101-2021_02_08_430070 196 48 Model Model NNP 10_1101-2021_02_08_430070 196 49 Model Model NNP 10_1101-2021_02_08_430070 196 50 inference inference NN 10_1101-2021_02_08_430070 196 51 time time NN 10_1101-2021_02_08_430070 196 52 Total total JJ 10_1101-2021_02_08_430070 196 53 running running NN 10_1101-2021_02_08_430070 196 54 time time NN 10_1101-2021_02_08_430070 196 55 biRNN birnn NN 10_1101-2021_02_08_430070 196 56 162.91 162.91 NNP 10_1101-2021_02_08_430070 196 57 s s NNP 10_1101-2021_02_08_430070 196 58 711.56 711.56 NNP 10_1101-2021_02_08_430070 196 59 s s NNP 10_1101-2021_02_08_430070 196 60 BERT_basic BERT_basic NNP 10_1101-2021_02_08_430070 196 61 22.71 22.71 NNP 10_1101-2021_02_08_430070 196 62 s s NNP 10_1101-2021_02_08_430070 196 63 615.36 615.36 CD 10_1101-2021_02_08_430070 196 64 s s NNP 10_1101-2021_02_08_430070 196 65 BERT_refined BERT_refined NNP 10_1101-2021_02_08_430070 196 66 27.29 27.29 CD 10_1101-2021_02_08_430070 196 67 s s NNPS 10_1101-2021_02_08_430070 196 68 622.73 622.73 NNP 10_1101-2021_02_08_430070 196 69 s s NNP 10_1101-2021_02_08_430070 196 70 Table table NN 10_1101-2021_02_08_430070 196 71 3 3 CD 10_1101-2021_02_08_430070 196 72 . . . 10_1101-2021_02_08_430070 197 1 Model model NN 10_1101-2021_02_08_430070 197 2 inference inference NN 10_1101-2021_02_08_430070 197 3 and and CC 10_1101-2021_02_08_430070 197 4 total total JJ 10_1101-2021_02_08_430070 197 5 running running NN 10_1101-2021_02_08_430070 197 6 time time NN 10_1101-2021_02_08_430070 197 7 on on IN 10_1101-2021_02_08_430070 197 8 the the DT 10_1101-2021_02_08_430070 197 9 benchmark benchmark JJ 10_1101-2021_02_08_430070 197 10 dataset1 dataset1 NN 10_1101-2021_02_08_430070 197 11 for for IN 10_1101-2021_02_08_430070 197 12 all all DT 10_1101-2021_02_08_430070 197 13 26402 26402 CD 10_1101-2021_02_08_430070 197 14 reads read NNS 10_1101-2021_02_08_430070 197 15 . . . 
10_1101-2021_02_08_430070 199 1 In in IN 10_1101-2021_02_08_430070 199 2 the the DT 10_1101-2021_02_08_430070 199 3 running running NN 10_1101-2021_02_08_430070 199 4 , , , 10_1101-2021_02_08_430070 199 5 CPUs cpu NNS 10_1101-2021_02_08_430070 199 6 are be VBP 10_1101-2021_02_08_430070 199 7 responsible responsible JJ 10_1101-2021_02_08_430070 199 8 for for IN 10_1101-2021_02_08_430070 199 9 data datum NNS 10_1101-2021_02_08_430070 199 10 loading loading NN 10_1101-2021_02_08_430070 199 11 and and CC 10_1101-2021_02_08_430070 199 12 feature feature NN 10_1101-2021_02_08_430070 199 13 extraction extraction NN 10_1101-2021_02_08_430070 199 14 , , , 10_1101-2021_02_08_430070 199 15 while while IN 10_1101-2021_02_08_430070 199 16 GPU GPU NNP 10_1101-2021_02_08_430070 199 17 works work VBZ 10_1101-2021_02_08_430070 199 18 for for IN 10_1101-2021_02_08_430070 199 19 model model NN 10_1101-2021_02_08_430070 199 20 inference inference NN 10_1101-2021_02_08_430070 199 21 . . . 10_1101-2021_02_08_430070 200 1 We -PRON- PRP 10_1101-2021_02_08_430070 200 2 tested test VBD 10_1101-2021_02_08_430070 200 3 the the DT 10_1101-2021_02_08_430070 200 4 model model NN 10_1101-2021_02_08_430070 200 5 inference inference NN 10_1101-2021_02_08_430070 200 6 time time NN 10_1101-2021_02_08_430070 200 7 and and CC 10_1101-2021_02_08_430070 200 8 total total JJ 10_1101-2021_02_08_430070 200 9 running running NN 10_1101-2021_02_08_430070 200 10 time time NN 10_1101-2021_02_08_430070 200 11 of of IN 10_1101-2021_02_08_430070 200 12 the the DT 10_1101-2021_02_08_430070 200 13 three three CD 10_1101-2021_02_08_430070 200 14 models model NNS 10_1101-2021_02_08_430070 200 15 on on IN 10_1101-2021_02_08_430070 200 16 the the DT 10_1101-2021_02_08_430070 200 17 benchmark benchmark JJ 10_1101-2021_02_08_430070 200 18 dataset1 dataset1 NN 10_1101-2021_02_08_430070 200 19 . . . 
10_1101-2021_02_08_430070 201 1 For for IN 10_1101-2021_02_08_430070 201 2 each each DT 10_1101-2021_02_08_430070 201 3 mixture mixture NN 10_1101-2021_02_08_430070 201 4 split split NN 10_1101-2021_02_08_430070 201 5 , , , 10_1101-2021_02_08_430070 201 6 we -PRON- PRP 10_1101-2021_02_08_430070 201 7 repeated repeat VBD 10_1101-2021_02_08_430070 201 8 5 5 CD 10_1101-2021_02_08_430070 201 9 times time NNS 10_1101-2021_02_08_430070 201 10 running run VBG 10_1101-2021_02_08_430070 201 11 and and CC 10_1101-2021_02_08_430070 201 12 took take VBD 10_1101-2021_02_08_430070 201 13 the the DT 10_1101-2021_02_08_430070 201 14 averaged average VBN 10_1101-2021_02_08_430070 201 15 value value NN 10_1101-2021_02_08_430070 201 16 . . . 10_1101-2021_02_08_430070 202 1 As as IN 10_1101-2021_02_08_430070 202 2 shown show VBN 10_1101-2021_02_08_430070 202 3 in in IN 10_1101-2021_02_08_430070 202 4 Table table NN 10_1101-2021_02_08_430070 202 5 3 3 CD 10_1101-2021_02_08_430070 202 6 , , , 10_1101-2021_02_08_430070 202 7 the the DT 10_1101-2021_02_08_430070 202 8 model model NN 10_1101-2021_02_08_430070 202 9 inference inference NN 10_1101-2021_02_08_430070 202 10 speed speed NN 10_1101-2021_02_08_430070 202 11 of of IN 10_1101-2021_02_08_430070 202 12 BERT BERT NNP 10_1101-2021_02_08_430070 202 13 models model NNS 10_1101-2021_02_08_430070 202 14 is be VBZ 10_1101-2021_02_08_430070 202 15 around around RB 10_1101-2021_02_08_430070 202 16 6x∼7x 6x∼7x CD 10_1101-2021_02_08_430070 202 17 faster fast JJR 10_1101-2021_02_08_430070 202 18 than than IN 10_1101-2021_02_08_430070 202 19 biRNN biRNN NNP 10_1101-2021_02_08_430070 202 20 model model NNP 10_1101-2021_02_08_430070 202 21 ( ( -LRB- 10_1101-2021_02_08_430070 202 22 BERT_refined:5.96x BERT_refined:5.96x NNP 10_1101-2021_02_08_430070 202 23 , , , 10_1101-2021_02_08_430070 202 24 BERT_basic:7.16x BERT_basic:7.16x NNP 10_1101-2021_02_08_430070 202 25 ) ) -RRB- 10_1101-2021_02_08_430070 202 26 . . . 
10_1101-2021_02_08_430070 203 1 The the DT 10_1101-2021_02_08_430070 203 2 inference inference NN 10_1101-2021_02_08_430070 203 3 time time NN 10_1101-2021_02_08_430070 203 4 of of IN 10_1101-2021_02_08_430070 203 5 refined refined JJ 10_1101-2021_02_08_430070 203 6 BERT BERT NNP 10_1101-2021_02_08_430070 203 7 is be VBZ 10_1101-2021_02_08_430070 203 8 only only RB 10_1101-2021_02_08_430070 203 9 slightly slightly RB 10_1101-2021_02_08_430070 203 10 slower slow JJR 10_1101-2021_02_08_430070 203 11 than than IN 10_1101-2021_02_08_430070 203 12 the the DT 10_1101-2021_02_08_430070 203 13 basic basic JJ 10_1101-2021_02_08_430070 203 14 BERT BERT NNP 10_1101-2021_02_08_430070 203 15 model model NN 10_1101-2021_02_08_430070 203 16 . . . 10_1101-2021_02_08_430070 204 1 The the DT 10_1101-2021_02_08_430070 204 2 gap gap NN 10_1101-2021_02_08_430070 204 3 of of IN 10_1101-2021_02_08_430070 204 4 the the DT 10_1101-2021_02_08_430070 204 5 total total JJ 10_1101-2021_02_08_430070 204 6 time time NN 10_1101-2021_02_08_430070 204 7 is be VBZ 10_1101-2021_02_08_430070 204 8 not not RB 10_1101-2021_02_08_430070 204 9 that that RB 10_1101-2021_02_08_430070 204 10 large large JJ 10_1101-2021_02_08_430070 204 11 ( ( -LRB- 10_1101-2021_02_08_430070 204 12 BERT_refined:1.14x BERT_refined:1.14x NNP 10_1101-2021_02_08_430070 204 13 , , , 10_1101-2021_02_08_430070 204 14 BERT_basic:1.16x bert_basic:1.16x ADD 10_1101-2021_02_08_430070 204 15 ) ) -RRB- 10_1101-2021_02_08_430070 204 16 , , , 10_1101-2021_02_08_430070 204 17 as as IN 10_1101-2021_02_08_430070 204 18 the the DT 10_1101-2021_02_08_430070 204 19 data datum NNS 10_1101-2021_02_08_430070 204 20 I -PRON- PRP 10_1101-2021_02_08_430070 204 21 / / SYM 10_1101-2021_02_08_430070 204 22 O o NN 10_1101-2021_02_08_430070 204 23 and and CC 10_1101-2021_02_08_430070 204 24 feature feature NN 10_1101-2021_02_08_430070 204 25 extraction extraction NN 10_1101-2021_02_08_430070 204 26 take take VBP 10_1101-2021_02_08_430070 204 27 major major 
JJ 10_1101-2021_02_08_430070 204 28 time time NN 10_1101-2021_02_08_430070 204 29 . . . 10_1101-2021_02_08_430070 205 1 In in IN 10_1101-2021_02_08_430070 205 2 the the DT 10_1101-2021_02_08_430070 205 3 current current JJ 10_1101-2021_02_08_430070 205 4 implementation implementation NN 10_1101-2021_02_08_430070 205 5 of of IN 10_1101-2021_02_08_430070 205 6 BERT BERT NNP 10_1101-2021_02_08_430070 205 7 , , , 10_1101-2021_02_08_430070 205 8 we -PRON- PRP 10_1101-2021_02_08_430070 205 9 use use VBP 10_1101-2021_02_08_430070 205 10 reads read VBZ 10_1101-2021_02_08_430070 205 11 as as IN 10_1101-2021_02_08_430070 205 12 the the DT 10_1101-2021_02_08_430070 205 13 basic basic JJ 10_1101-2021_02_08_430070 205 14 data datum NNS 10_1101-2021_02_08_430070 205 15 unit unit NN 10_1101-2021_02_08_430070 205 16 and and CC 10_1101-2021_02_08_430070 205 17 integrate integrate VB 10_1101-2021_02_08_430070 205 18 the the DT 10_1101-2021_02_08_430070 205 19 data data NN 10_1101-2021_02_08_430070 205 20 pre pre JJ 10_1101-2021_02_08_430070 205 21 - - JJ 10_1101-2021_02_08_430070 205 22 processing processing JJ 10_1101-2021_02_08_430070 205 23 part part NN 10_1101-2021_02_08_430070 205 24 during during IN 10_1101-2021_02_08_430070 205 25 a a DT 10_1101-2021_02_08_430070 205 26 read read JJ 10_1101-2021_02_08_430070 205 27 - - HYPH 10_1101-2021_02_08_430070 205 28 batch batch NN 10_1101-2021_02_08_430070 205 29 loading loading NN 10_1101-2021_02_08_430070 205 30 process process NN 10_1101-2021_02_08_430070 205 31 . . . 
10_1101-2021_02_08_430070 206 1 The the DT 10_1101-2021_02_08_430070 206 2 data datum NNS 10_1101-2021_02_08_430070 206 3 I i NN 10_1101-2021_02_08_430070 206 4 / / SYM 10_1101-2021_02_08_430070 206 5 O o NN 10_1101-2021_02_08_430070 206 6 and and CC 10_1101-2021_02_08_430070 206 7 feature feature NN 10_1101-2021_02_08_430070 206 8 extraction extraction NN 10_1101-2021_02_08_430070 206 9 part part NN 10_1101-2021_02_08_430070 206 10 can can MD 10_1101-2021_02_08_430070 206 11 be be VB 10_1101-2021_02_08_430070 206 12 further further RB 10_1101-2021_02_08_430070 206 13 accelerated accelerate VBN 10_1101-2021_02_08_430070 206 14 . . . 10_1101-2021_02_08_430070 207 1 4 4 LS 10_1101-2021_02_08_430070 207 2 Discussion Discussion NNP 10_1101-2021_02_08_430070 207 3 A A NNP 10_1101-2021_02_08_430070 207 4 BERT BERT NNP 10_1101-2021_02_08_430070 207 5 commonly commonly RB 10_1101-2021_02_08_430070 207 6 works work VBZ 10_1101-2021_02_08_430070 207 7 in in IN 10_1101-2021_02_08_430070 207 8 a a DT 10_1101-2021_02_08_430070 207 9 pre pre JJ 10_1101-2021_02_08_430070 207 10 - - JJ 10_1101-2021_02_08_430070 207 11 training training JJ 10_1101-2021_02_08_430070 207 12 and and CC 10_1101-2021_02_08_430070 207 13 fine fine JJ 10_1101-2021_02_08_430070 207 14 - - HYPH 10_1101-2021_02_08_430070 207 15 tuning tuning NN 10_1101-2021_02_08_430070 207 16 approach approach NN 10_1101-2021_02_08_430070 207 17 . . . 
10_1101-2021_02_08_430070 208 1 In in IN 10_1101-2021_02_08_430070 208 2 the the DT 10_1101-2021_02_08_430070 208 3 pre pre JJ 10_1101-2021_02_08_430070 208 4 - - JJ 10_1101-2021_02_08_430070 208 5 training training JJ 10_1101-2021_02_08_430070 208 6 phase phase NN 10_1101-2021_02_08_430070 208 7 , , , 10_1101-2021_02_08_430070 208 8 a a DT 10_1101-2021_02_08_430070 208 9 BERT BERT NNP 10_1101-2021_02_08_430070 208 10 learns learn VBZ 10_1101-2021_02_08_430070 208 11 bi bi JJ 10_1101-2021_02_08_430070 208 12 - - JJ 10_1101-2021_02_08_430070 208 13 directional directional JJ 10_1101-2021_02_08_430070 208 14 representations representation NNS 10_1101-2021_02_08_430070 208 15 from from IN 10_1101-2021_02_08_430070 208 16 unlabeled unlabeled JJ 10_1101-2021_02_08_430070 208 17 data datum NNS 10_1101-2021_02_08_430070 208 18 . . . 10_1101-2021_02_08_430070 209 1 After after IN 10_1101-2021_02_08_430070 209 2 that that DT 10_1101-2021_02_08_430070 209 3 , , , 10_1101-2021_02_08_430070 209 4 learned learn VBN 10_1101-2021_02_08_430070 209 5 feature feature NN 10_1101-2021_02_08_430070 209 6 representations representation NNS 10_1101-2021_02_08_430070 209 7 are be VBP 10_1101-2021_02_08_430070 209 8 used use VBN 10_1101-2021_02_08_430070 209 9 on on IN 10_1101-2021_02_08_430070 209 10 task- task- NN 10_1101-2021_02_08_430070 209 11 specific specific JJ 10_1101-2021_02_08_430070 209 12 data datum NNS 10_1101-2021_02_08_430070 209 13 for for IN 10_1101-2021_02_08_430070 209 14 further further JJ 10_1101-2021_02_08_430070 209 15 fine fine NN 10_1101-2021_02_08_430070 209 16 - - HYPH 10_1101-2021_02_08_430070 209 17 tuning tuning NN 10_1101-2021_02_08_430070 209 18 . . . 
10_1101-2021_02_08_430070 210 1 It -PRON- PRP 10_1101-2021_02_08_430070 210 2 has have VBZ 10_1101-2021_02_08_430070 210 3 led lead VBN 10_1101-2021_02_08_430070 210 4 to to IN 10_1101-2021_02_08_430070 210 5 several several JJ 10_1101-2021_02_08_430070 210 6 state state NN 10_1101-2021_02_08_430070 210 7 - - HYPH 10_1101-2021_02_08_430070 210 8 of of IN 10_1101-2021_02_08_430070 210 9 - - HYPH 10_1101-2021_02_08_430070 210 10 the the DT 10_1101-2021_02_08_430070 210 11 - - HYPH 10_1101-2021_02_08_430070 210 12 art art NN 10_1101-2021_02_08_430070 210 13 results result NNS 10_1101-2021_02_08_430070 210 14 on on IN 10_1101-2021_02_08_430070 210 15 many many JJ 10_1101-2021_02_08_430070 210 16 downstream downstream JJ 10_1101-2021_02_08_430070 210 17 tasks task NNS 10_1101-2021_02_08_430070 210 18 in in IN 10_1101-2021_02_08_430070 210 19 language language NN 10_1101-2021_02_08_430070 210 20 understanding understanding NN 10_1101-2021_02_08_430070 210 21 . . . 10_1101-2021_02_08_430070 211 1 According accord VBG 10_1101-2021_02_08_430070 211 2 to to IN 10_1101-2021_02_08_430070 211 3 the the DT 10_1101-2021_02_08_430070 211 4 data data NN 10_1101-2021_02_08_430070 211 5 scale scale NN 10_1101-2021_02_08_430070 211 6 , , , 10_1101-2021_02_08_430070 211 7 the the DT 10_1101-2021_02_08_430070 211 8 number number NN 10_1101-2021_02_08_430070 211 9 of of IN 10_1101-2021_02_08_430070 211 10 BERT BERT NNP 10_1101-2021_02_08_430070 211 11 parameters parameter NNS 10_1101-2021_02_08_430070 211 12 is be VBZ 10_1101-2021_02_08_430070 211 13 usually usually RB 10_1101-2021_02_08_430070 211 14 large large JJ 10_1101-2021_02_08_430070 211 15 , , , 10_1101-2021_02_08_430070 211 16 and and CC 10_1101-2021_02_08_430070 211 17 training train VBG 10_1101-2021_02_08_430070 211 18 such such PDT 10_1101-2021_02_08_430070 211 19 a a DT 10_1101-2021_02_08_430070 211 20 model model NN 10_1101-2021_02_08_430070 211 21 requires require VBZ 10_1101-2021_02_08_430070 211 22 a a DT 
10_1101-2021_02_08_430070 211 23 huge huge JJ 10_1101-2021_02_08_430070 211 24 amount amount NN 10_1101-2021_02_08_430070 211 25 of of IN 10_1101-2021_02_08_430070 211 26 computational computational JJ 10_1101-2021_02_08_430070 211 27 resources resource NNS 10_1101-2021_02_08_430070 211 28 . . . 10_1101-2021_02_08_430070 212 1 For for IN 10_1101-2021_02_08_430070 212 2 example example NN 10_1101-2021_02_08_430070 212 3 , , , 10_1101-2021_02_08_430070 212 4 the the DT 10_1101-2021_02_08_430070 212 5 BERT BERT NNP 10_1101-2021_02_08_430070 212 6 used use VBN 10_1101-2021_02_08_430070 212 7 for for IN 10_1101-2021_02_08_430070 212 8 natural natural JJ 10_1101-2021_02_08_430070 212 9 language language NN 10_1101-2021_02_08_430070 212 10 modeling modeling NN 10_1101-2021_02_08_430070 212 11 has have VBZ 10_1101-2021_02_08_430070 212 12 a a DT 10_1101-2021_02_08_430070 212 13 parameter parameter NN 10_1101-2021_02_08_430070 212 14 scale scale NN 10_1101-2021_02_08_430070 212 15 ranging range VBG 10_1101-2021_02_08_430070 212 16 from from IN 10_1101-2021_02_08_430070 212 17 110 110 CD 10_1101-2021_02_08_430070 212 18 M m NN 10_1101-2021_02_08_430070 212 19 to to IN 10_1101-2021_02_08_430070 212 20 340 340 CD 10_1101-2021_02_08_430070 212 21 M m NN 10_1101-2021_02_08_430070 212 22 ( ( -LRB- 10_1101-2021_02_08_430070 212 23 Devlin Devlin NNP 10_1101-2021_02_08_430070 212 24 et et NNP 10_1101-2021_02_08_430070 212 25 al al NNP 10_1101-2021_02_08_430070 212 26 . . NNP 10_1101-2021_02_08_430070 212 27 , , , 10_1101-2021_02_08_430070 212 28 2018 2018 CD 10_1101-2021_02_08_430070 212 29 ) ) -RRB- 10_1101-2021_02_08_430070 212 30 . . . 
10_1101-2021_02_08_430070 213 1 In in IN 10_1101-2021_02_08_430070 213 2 this this DT 10_1101-2021_02_08_430070 213 3 work work NN 10_1101-2021_02_08_430070 213 4 , , , 10_1101-2021_02_08_430070 213 5 we -PRON- PRP 10_1101-2021_02_08_430070 213 6 did do VBD 10_1101-2021_02_08_430070 213 7 not not RB 10_1101-2021_02_08_430070 213 8 follow follow VB 10_1101-2021_02_08_430070 213 9 this this DT 10_1101-2021_02_08_430070 213 10 schema schema NN 10_1101-2021_02_08_430070 213 11 . . . 10_1101-2021_02_08_430070 214 1 Instead instead RB 10_1101-2021_02_08_430070 214 2 , , , 10_1101-2021_02_08_430070 214 3 we -PRON- PRP 10_1101-2021_02_08_430070 214 4 utilized utilize VBD 10_1101-2021_02_08_430070 214 5 the the DT 10_1101-2021_02_08_430070 214 6 model model NN 10_1101-2021_02_08_430070 214 7 architecture architecture NN 10_1101-2021_02_08_430070 214 8 of of IN 10_1101-2021_02_08_430070 214 9 BERT BERT NNP 10_1101-2021_02_08_430070 214 10 to to TO 10_1101-2021_02_08_430070 214 11 provide provide VB 10_1101-2021_02_08_430070 214 12 a a DT 10_1101-2021_02_08_430070 214 13 lightweight lightweight JJ 10_1101-2021_02_08_430070 214 14 and and CC 10_1101-2021_02_08_430070 214 15 non non JJ 10_1101-2021_02_08_430070 214 16 - - JJ 10_1101-2021_02_08_430070 214 17 recurrent recurrent JJ 10_1101-2021_02_08_430070 214 18 solution solution NN 10_1101-2021_02_08_430070 214 19 to to TO 10_1101-2021_02_08_430070 214 20 replace replace VB 10_1101-2021_02_08_430070 214 21 the the DT 10_1101-2021_02_08_430070 214 22 recurrent recurrent JJ 10_1101-2021_02_08_430070 214 23 biRNN biRNN NNP 10_1101-2021_02_08_430070 214 24 model model NN 10_1101-2021_02_08_430070 214 25 . . . 
10_1101-2021_02_08_430070 215 1 In in IN 10_1101-2021_02_08_430070 215 2 our -PRON- PRP$ 10_1101-2021_02_08_430070 215 3 experiment experiment NN 10_1101-2021_02_08_430070 215 4 , , , 10_1101-2021_02_08_430070 215 5 the the DT 10_1101-2021_02_08_430070 215 6 BERT BERT NNP 10_1101-2021_02_08_430070 215 7 uses use VBZ 10_1101-2021_02_08_430070 215 8 three three CD 10_1101-2021_02_08_430070 215 9 attention attention NN 10_1101-2021_02_08_430070 215 10 layers layer NNS 10_1101-2021_02_08_430070 215 11 with with IN 10_1101-2021_02_08_430070 215 12 4 4 CD 10_1101-2021_02_08_430070 215 13 attention attention NN 10_1101-2021_02_08_430070 215 14 heads head NNS 10_1101-2021_02_08_430070 215 15 and and CC 10_1101-2021_02_08_430070 215 16 100 100 CD 10_1101-2021_02_08_430070 215 17 hidden hide VBN 10_1101-2021_02_08_430070 215 18 units unit NNS 10_1101-2021_02_08_430070 215 19 for for IN 10_1101-2021_02_08_430070 215 20 each each DT 10_1101-2021_02_08_430070 215 21 layer layer NN 10_1101-2021_02_08_430070 215 22 . . . 
10_1101-2021_02_08_430070 216 1 The the DT 10_1101-2021_02_08_430070 216 2 total total JJ 10_1101-2021_02_08_430070 216 3 number number NN 10_1101-2021_02_08_430070 216 4 of of IN 10_1101-2021_02_08_430070 216 5 model model NN 10_1101-2021_02_08_430070 216 6 parameters parameter NNS 10_1101-2021_02_08_430070 216 7 is be VBZ 10_1101-2021_02_08_430070 216 8 around around RB 10_1101-2021_02_08_430070 216 9 0.37 0.37 CD 10_1101-2021_02_08_430070 216 10 M M NNP 10_1101-2021_02_08_430070 216 11 , , , 10_1101-2021_02_08_430070 216 12 which which WDT 10_1101-2021_02_08_430070 216 13 is be VBZ 10_1101-2021_02_08_430070 216 14 even even RB 10_1101-2021_02_08_430070 216 15 less less JJR 10_1101-2021_02_08_430070 216 16 than than IN 10_1101-2021_02_08_430070 216 17 that that DT 10_1101-2021_02_08_430070 216 18 of of IN 10_1101-2021_02_08_430070 216 19 biRNN birnn NN 10_1101-2021_02_08_430070 216 20 ( ( -LRB- 10_1101-2021_02_08_430070 216 21 0.57 0.57 CD 10_1101-2021_02_08_430070 216 22 M m NN 10_1101-2021_02_08_430070 216 23 ) ) -RRB- 10_1101-2021_02_08_430070 216 24 . . . 
10_1101-2021_02_08_430070 217 1 In in IN 10_1101-2021_02_08_430070 217 2 the the DT 10_1101-2021_02_08_430070 217 3 future future NN 10_1101-2021_02_08_430070 217 4 , , , 10_1101-2021_02_08_430070 217 5 when when WRB 10_1101-2021_02_08_430070 217 6 more more JJR 10_1101-2021_02_08_430070 217 7 nanopore nanopore JJ 10_1101-2021_02_08_430070 217 8 methylation methylation NN 10_1101-2021_02_08_430070 217 9 data datum NNS 10_1101-2021_02_08_430070 217 10 becomes become VBZ 10_1101-2021_02_08_430070 217 11 available available JJ 10_1101-2021_02_08_430070 217 12 , , , 10_1101-2021_02_08_430070 217 13 a a DT 10_1101-2021_02_08_430070 217 14 larger large JJR 10_1101-2021_02_08_430070 217 15 BERT BERT NNP 10_1101-2021_02_08_430070 217 16 model model NN 10_1101-2021_02_08_430070 217 17 and and CC 10_1101-2021_02_08_430070 217 18 pre pre JJ 10_1101-2021_02_08_430070 217 19 - - JJ 10_1101-2021_02_08_430070 217 20 training training JJ 10_1101-2021_02_08_430070 217 21 and and CC 10_1101-2021_02_08_430070 217 22 fine fine JJ 10_1101-2021_02_08_430070 217 23 - - HYPH 10_1101-2021_02_08_430070 217 24 tuning tuning NN 10_1101-2021_02_08_430070 217 25 scheme scheme NN 10_1101-2021_02_08_430070 217 26 can can MD 10_1101-2021_02_08_430070 217 27 be be VB 10_1101-2021_02_08_430070 217 28 further further RB 10_1101-2021_02_08_430070 217 29 explored explore VBN 10_1101-2021_02_08_430070 217 30 . . . 
10_1101-2021_02_08_430070 218 1 5 5 CD 10_1101-2021_02_08_430070 218 2 Conclusion Conclusion NNP 10_1101-2021_02_08_430070 218 3 In in IN 10_1101-2021_02_08_430070 218 4 this this DT 10_1101-2021_02_08_430070 218 5 work work NN 10_1101-2021_02_08_430070 218 6 , , , 10_1101-2021_02_08_430070 218 7 we -PRON- PRP 10_1101-2021_02_08_430070 218 8 explored explore VBD 10_1101-2021_02_08_430070 218 9 applying apply VBG 10_1101-2021_02_08_430070 218 10 BERT BERT NNP 10_1101-2021_02_08_430070 218 11 models model NNS 10_1101-2021_02_08_430070 218 12 for for IN 10_1101-2021_02_08_430070 218 13 nanopore nanopore JJ 10_1101-2021_02_08_430070 218 14 methylation methylation NN 10_1101-2021_02_08_430070 218 15 detection detection NN 10_1101-2021_02_08_430070 218 16 , , , 10_1101-2021_02_08_430070 218 17 which which WDT 10_1101-2021_02_08_430070 218 18 aims aim VBZ 10_1101-2021_02_08_430070 218 19 to to TO 10_1101-2021_02_08_430070 218 20 use use VB 10_1101-2021_02_08_430070 218 21 a a DT 10_1101-2021_02_08_430070 218 22 non non JJ 10_1101-2021_02_08_430070 218 23 - - JJ 10_1101-2021_02_08_430070 218 24 recurrent recurrent JJ 10_1101-2021_02_08_430070 218 25 modeling modeling NN 10_1101-2021_02_08_430070 218 26 approach approach NN 10_1101-2021_02_08_430070 218 27 for for IN 10_1101-2021_02_08_430070 218 28 fast fast JJ 10_1101-2021_02_08_430070 218 29 inference inference NN 10_1101-2021_02_08_430070 218 30 . . . 
10_1101-2021_02_08_430070 219 1 We -PRON- PRP 10_1101-2021_02_08_430070 219 2 quantified quantify VBD 10_1101-2021_02_08_430070 219 3 positional positional JJ 10_1101-2021_02_08_430070 219 4 signal signal NN 10_1101-2021_02_08_430070 219 5 - - HYPH 10_1101-2021_02_08_430070 219 6 shift shift NN 10_1101-2021_02_08_430070 219 7 related relate VBN 10_1101-2021_02_08_430070 219 8 to to IN 10_1101-2021_02_08_430070 219 9 methylation methylation NN 10_1101-2021_02_08_430070 219 10 for for IN 10_1101-2021_02_08_430070 219 11 different different JJ 10_1101-2021_02_08_430070 219 12 datasets dataset NNS 10_1101-2021_02_08_430070 219 13 of of IN 10_1101-2021_02_08_430070 219 14 specific specific JJ 10_1101-2021_02_08_430070 219 15 motif motif NN 10_1101-2021_02_08_430070 219 16 / / SYM 10_1101-2021_02_08_430070 219 17 methylase methylase NN 10_1101-2021_02_08_430070 219 18 and and CC 10_1101-2021_02_08_430070 219 19 found find VBD 10_1101-2021_02_08_430070 219 20 patterns pattern NNS 10_1101-2021_02_08_430070 219 21 across across IN 10_1101-2021_02_08_430070 219 22 datasets dataset NNS 10_1101-2021_02_08_430070 219 23 . . . 
10_1101-2021_02_08_430070 220 1 In in IN 10_1101-2021_02_08_430070 220 2 the the DT 10_1101-2021_02_08_430070 220 3 process process NN 10_1101-2021_02_08_430070 220 4 of of IN 10_1101-2021_02_08_430070 220 5 evaluation evaluation NN 10_1101-2021_02_08_430070 220 6 , , , 10_1101-2021_02_08_430070 220 7 we -PRON- PRP 10_1101-2021_02_08_430070 220 8 found find VBD 10_1101-2021_02_08_430070 220 9 the the DT 10_1101-2021_02_08_430070 220 10 original original JJ 10_1101-2021_02_08_430070 220 11 BERT BERT NNP 10_1101-2021_02_08_430070 220 12 architecture architecture NN 10_1101-2021_02_08_430070 220 13 does do VBZ 10_1101-2021_02_08_430070 220 14 not not RB 10_1101-2021_02_08_430070 220 15 work work VB 10_1101-2021_02_08_430070 220 16 as as RB 10_1101-2021_02_08_430070 220 17 well well RB 10_1101-2021_02_08_430070 220 18 as as IN 10_1101-2021_02_08_430070 220 19 biRNN birnn NN 10_1101-2021_02_08_430070 220 20 . . . 10_1101-2021_02_08_430070 221 1 We -PRON- PRP 10_1101-2021_02_08_430070 221 2 proposed propose VBD 10_1101-2021_02_08_430070 221 3 a a DT 10_1101-2021_02_08_430070 221 4 refined refined JJ 10_1101-2021_02_08_430070 221 5 BERT BERT NNP 10_1101-2021_02_08_430070 221 6 considering consider VBG 10_1101-2021_02_08_430070 221 7 task task NN 10_1101-2021_02_08_430070 221 8 - - HYPH 10_1101-2021_02_08_430070 221 9 specific specific JJ 10_1101-2021_02_08_430070 221 10 characters character NNS 10_1101-2021_02_08_430070 221 11 into into IN 10_1101-2021_02_08_430070 221 12 the the DT 10_1101-2021_02_08_430070 221 13 modeling modeling NN 10_1101-2021_02_08_430070 221 14 . . . 
10_1101-2021_02_08_430070 222 1 Compared compare VBN 10_1101-2021_02_08_430070 222 2 with with IN 10_1101-2021_02_08_430070 222 3 the the DT 10_1101-2021_02_08_430070 222 4 original original JJ 10_1101-2021_02_08_430070 222 5 BERT BERT NNP 10_1101-2021_02_08_430070 222 6 , , , 10_1101-2021_02_08_430070 222 7 the the DT 10_1101-2021_02_08_430070 222 8 refined refined JJ 10_1101-2021_02_08_430070 222 9 BERT BERT NNP 10_1101-2021_02_08_430070 222 10 uses use VBZ 10_1101-2021_02_08_430070 222 11 learnable learnable JJ 10_1101-2021_02_08_430070 222 12 positional positional JJ 10_1101-2021_02_08_430070 222 13 encoding encoding NN 10_1101-2021_02_08_430070 222 14 and and CC 10_1101-2021_02_08_430070 222 15 self self NN 10_1101-2021_02_08_430070 222 16 - - HYPH 10_1101-2021_02_08_430070 222 17 attention attention NN 10_1101-2021_02_08_430070 222 18 with with IN 10_1101-2021_02_08_430070 222 19 relative relative JJ 10_1101-2021_02_08_430070 222 20 position position NN 10_1101-2021_02_08_430070 222 21 representation representation NN 10_1101-2021_02_08_430070 222 22 , , , 10_1101-2021_02_08_430070 222 23 and and CC 10_1101-2021_02_08_430070 222 24 focuses focus VBZ 10_1101-2021_02_08_430070 222 25 more more RBR 10_1101-2021_02_08_430070 222 26 on on IN 10_1101-2021_02_08_430070 222 27 the the DT 10_1101-2021_02_08_430070 222 28 center center NN 10_1101-2021_02_08_430070 222 29 positions position NNS 10_1101-2021_02_08_430070 222 30 in in IN 10_1101-2021_02_08_430070 222 31 a a DT 10_1101-2021_02_08_430070 222 32 ±3bp ±3bp NN 10_1101-2021_02_08_430070 222 33 range range NN 10_1101-2021_02_08_430070 222 34 . . . 
10_1101-2021_02_08_430070 223 1 The the DT 10_1101-2021_02_08_430070 223 2 experiment experiment NN 10_1101-2021_02_08_430070 223 3 results result NNS 10_1101-2021_02_08_430070 223 4 show show VBP 10_1101-2021_02_08_430070 223 5 that that IN 10_1101-2021_02_08_430070 223 6 the the DT 10_1101-2021_02_08_430070 223 7 refined refined JJ 10_1101-2021_02_08_430070 223 8 BERT BERT NNP 10_1101-2021_02_08_430070 223 9 can can MD 10_1101-2021_02_08_430070 223 10 achieve achieve VB 10_1101-2021_02_08_430070 223 11 competitive competitive JJ 10_1101-2021_02_08_430070 223 12 and and CC 10_1101-2021_02_08_430070 223 13 even even RB 10_1101-2021_02_08_430070 223 14 better well JJR 10_1101-2021_02_08_430070 223 15 results result NNS 10_1101-2021_02_08_430070 223 16 than than IN 10_1101-2021_02_08_430070 223 17 the the DT 10_1101-2021_02_08_430070 223 18 state- state- NN 10_1101-2021_02_08_430070 223 19 of of IN 10_1101-2021_02_08_430070 223 20 - - HYPH 10_1101-2021_02_08_430070 223 21 the the DT 10_1101-2021_02_08_430070 223 22 - - HYPH 10_1101-2021_02_08_430070 223 23 art art NN 10_1101-2021_02_08_430070 223 24 biRNN birnn NN 10_1101-2021_02_08_430070 223 25 model model NN 10_1101-2021_02_08_430070 223 26 on on IN 10_1101-2021_02_08_430070 223 27 a a DT 10_1101-2021_02_08_430070 223 28 set set NN 10_1101-2021_02_08_430070 223 29 of of IN 10_1101-2021_02_08_430070 223 30 5mC 5mc CD 10_1101-2021_02_08_430070 223 31 and and CC 10_1101-2021_02_08_430070 223 32 6mA 6ma CD 10_1101-2021_02_08_430070 223 33 benchmark benchmark NN 10_1101-2021_02_08_430070 223 34 datasets dataset NNS 10_1101-2021_02_08_430070 223 35 , , , 10_1101-2021_02_08_430070 223 36 while while IN 10_1101-2021_02_08_430070 223 37 the the DT 10_1101-2021_02_08_430070 223 38 model model NN 10_1101-2021_02_08_430070 223 39 inference inference NN 10_1101-2021_02_08_430070 223 40 speed speed NN 10_1101-2021_02_08_430070 223 41 is be VBZ 10_1101-2021_02_08_430070 223 42 about about RB 10_1101-2021_02_08_430070 223 43 6x 
6x CD 10_1101-2021_02_08_430070 223 44 faster fast RBR 10_1101-2021_02_08_430070 223 45 . . . 10_1101-2021_02_08_430070 224 1 On on IN 10_1101-2021_02_08_430070 224 2 the the DT 10_1101-2021_02_08_430070 224 3 cross cross JJ 10_1101-2021_02_08_430070 224 4 - - JJ 10_1101-2021_02_08_430070 224 5 sample sample JJ 10_1101-2021_02_08_430070 224 6 evaluation evaluation NN 10_1101-2021_02_08_430070 224 7 , , , 10_1101-2021_02_08_430070 224 8 for for IN 10_1101-2021_02_08_430070 224 9 the the DT 10_1101-2021_02_08_430070 224 10 case case NN 10_1101-2021_02_08_430070 224 11 that that WDT 10_1101-2021_02_08_430070 224 12 train train NN 10_1101-2021_02_08_430070 224 13 and and CC 10_1101-2021_02_08_430070 224 14 test test NN 10_1101-2021_02_08_430070 224 15 data datum NNS 10_1101-2021_02_08_430070 224 16 from from IN 10_1101-2021_02_08_430070 224 17 different different JJ 10_1101-2021_02_08_430070 224 18 research research NN 10_1101-2021_02_08_430070 224 19 groups group NNS 10_1101-2021_02_08_430070 224 20 , , , 10_1101-2021_02_08_430070 224 21 BERTs BERTs NNP 10_1101-2021_02_08_430070 224 22 ( ( -LRB- 10_1101-2021_02_08_430070 224 23 include include VBP 10_1101-2021_02_08_430070 224 24 the the DT 10_1101-2021_02_08_430070 224 25 original original JJ 10_1101-2021_02_08_430070 224 26 BERT BERT NNP 10_1101-2021_02_08_430070 224 27 ) ) -RRB- 10_1101-2021_02_08_430070 224 28 show show VBP 10_1101-2021_02_08_430070 224 29 a a DT 10_1101-2021_02_08_430070 224 30 better well JJR 10_1101-2021_02_08_430070 224 31 performance performance NN 10_1101-2021_02_08_430070 224 32 than than IN 10_1101-2021_02_08_430070 224 33 biRNN birnn NN 10_1101-2021_02_08_430070 224 34 . . . 
10_1101-2021_02_08_430070 225 1 Acknowledgements acknowledgement NNS 10_1101-2021_02_08_430070 225 2 We -PRON- PRP 10_1101-2021_02_08_430070 225 3 would would MD 10_1101-2021_02_08_430070 225 4 like like VB 10_1101-2021_02_08_430070 225 5 to to TO 10_1101-2021_02_08_430070 225 6 thank thank VB 10_1101-2021_02_08_430070 225 7 Marcus Marcus NNP 10_1101-2021_02_08_430070 225 8 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 225 9 and and CC 10_1101-2021_02_08_430070 225 10 Jared Jared NNP 10_1101-2021_02_08_430070 225 11 Simpson Simpson NNP 10_1101-2021_02_08_430070 225 12 for for IN 10_1101-2021_02_08_430070 225 13 making make VBG 10_1101-2021_02_08_430070 225 14 nanopore nanopore JJ 10_1101-2021_02_08_430070 225 15 methylation methylation NN 10_1101-2021_02_08_430070 225 16 data datum NNS 10_1101-2021_02_08_430070 225 17 publicly publicly RB 10_1101-2021_02_08_430070 225 18 available available JJ 10_1101-2021_02_08_430070 225 19 , , , 10_1101-2021_02_08_430070 225 20 Zaka Zaka NNP 10_1101-2021_02_08_430070 225 21 Wing Wing NNP 10_1101-2021_02_08_430070 225 22 - - HYPH 10_1101-2021_02_08_430070 225 23 Sze Sze NNP 10_1101-2021_02_08_430070 225 24 Yuen Yuen NNP 10_1101-2021_02_08_430070 225 25 for for IN 10_1101-2021_02_08_430070 225 26 providing provide VBG 10_1101-2021_02_08_430070 225 27 the the DT 10_1101-2021_02_08_430070 225 28 benchmark benchmark JJ 10_1101-2021_02_08_430070 225 29 dataset dataset NN 10_1101-2021_02_08_430070 225 30 and and CC 10_1101-2021_02_08_430070 225 31 pipeline pipeline NN 10_1101-2021_02_08_430070 225 32 , , , 10_1101-2021_02_08_430070 225 33 authors author NNS 10_1101-2021_02_08_430070 225 34 of of IN 10_1101-2021_02_08_430070 225 35 deepMOD deepMOD NNP 10_1101-2021_02_08_430070 225 36 and and CC 10_1101-2021_02_08_430070 225 37 deepSignal deepsignal JJ 10_1101-2021_02_08_430070 225 38 for for IN 10_1101-2021_02_08_430070 225 39 providing provide VBG 10_1101-2021_02_08_430070 225 40 their -PRON- PRP$ 10_1101-2021_02_08_430070 225 41 
source source NN 10_1101-2021_02_08_430070 225 42 codes code NNS 10_1101-2021_02_08_430070 225 43 . . . 10_1101-2021_02_08_430070 226 1 References References NNPS 10_1101-2021_02_08_430070 226 2 Devlin Devlin NNP 10_1101-2021_02_08_430070 226 3 , , , 10_1101-2021_02_08_430070 226 4 J. J. NNP 10_1101-2021_02_08_430070 226 5 et et NNP 10_1101-2021_02_08_430070 226 6 al al NNP 10_1101-2021_02_08_430070 226 7 . . . 10_1101-2021_02_08_430070 227 1 ( ( -LRB- 10_1101-2021_02_08_430070 227 2 2018 2018 CD 10_1101-2021_02_08_430070 227 3 ) ) -RRB- 10_1101-2021_02_08_430070 227 4 . . . 10_1101-2021_02_08_430070 228 1 Bert Bert NNP 10_1101-2021_02_08_430070 228 2 : : : 10_1101-2021_02_08_430070 228 3 Pre pre JJ 10_1101-2021_02_08_430070 228 4 - - JJ 10_1101-2021_02_08_430070 228 5 training training NN 10_1101-2021_02_08_430070 228 6 of of IN 10_1101-2021_02_08_430070 228 7 deep deep JJ 10_1101-2021_02_08_430070 228 8 bidirectional bidirectional JJ 10_1101-2021_02_08_430070 228 9 transformers transformer NNS 10_1101-2021_02_08_430070 228 10 for for IN 10_1101-2021_02_08_430070 228 11 language language NN 10_1101-2021_02_08_430070 228 12 understanding understanding NN 10_1101-2021_02_08_430070 228 13 . . . 10_1101-2021_02_08_430070 229 1 arXiv arXiv NNP 10_1101-2021_02_08_430070 229 2 preprint preprint NN 10_1101-2021_02_08_430070 229 3 arXiv:1810.04805 arXiv:1810.04805 NNP 10_1101-2021_02_08_430070 229 4 . . . 10_1101-2021_02_08_430070 230 1 Huang Huang NNP 10_1101-2021_02_08_430070 230 2 , , , 10_1101-2021_02_08_430070 230 3 Z. Z. NNP 10_1101-2021_02_08_430070 230 4 et et FW 10_1101-2021_02_08_430070 230 5 al al NNP 10_1101-2021_02_08_430070 230 6 . . . 10_1101-2021_02_08_430070 231 1 ( ( -LRB- 10_1101-2021_02_08_430070 231 2 2020 2020 CD 10_1101-2021_02_08_430070 231 3 ) ) -RRB- 10_1101-2021_02_08_430070 231 4 . . . 
10_1101-2021_02_08_430070 232 1 Improve improve VB 10_1101-2021_02_08_430070 232 2 transformer transformer NN 10_1101-2021_02_08_430070 232 3 models model NNS 10_1101-2021_02_08_430070 232 4 with with IN 10_1101-2021_02_08_430070 232 5 better well JJR 10_1101-2021_02_08_430070 232 6 relative relative JJ 10_1101-2021_02_08_430070 232 7 position position NN 10_1101-2021_02_08_430070 232 8 embeddings embedding NNS 10_1101-2021_02_08_430070 232 9 . . . 10_1101-2021_02_08_430070 233 1 arXiv arXiv NNP 10_1101-2021_02_08_430070 233 2 preprint preprint NN 10_1101-2021_02_08_430070 233 3 arXiv:2009.13658 arxiv:2009.13658 VB 10_1101-2021_02_08_430070 233 4 . . . 10_1101-2021_02_08_430070 234 1 Kim Kim NNP 10_1101-2021_02_08_430070 234 2 , , , 10_1101-2021_02_08_430070 234 3 D. D. NNP 10_1101-2021_02_08_430070 234 4 et et NNP 10_1101-2021_02_08_430070 234 5 al al NNP 10_1101-2021_02_08_430070 234 6 . . . 10_1101-2021_02_08_430070 235 1 ( ( -LRB- 10_1101-2021_02_08_430070 235 2 2020 2020 CD 10_1101-2021_02_08_430070 235 3 ) ) -RRB- 10_1101-2021_02_08_430070 235 4 . . . 10_1101-2021_02_08_430070 236 1 The the DT 10_1101-2021_02_08_430070 236 2 architecture architecture NN 10_1101-2021_02_08_430070 236 3 of of IN 10_1101-2021_02_08_430070 236 4 sars sars NNP 10_1101-2021_02_08_430070 236 5 - - HYPH 10_1101-2021_02_08_430070 236 6 cov-2 cov-2 NNP 10_1101-2021_02_08_430070 236 7 transcriptome transcriptome DT 10_1101-2021_02_08_430070 236 8 . . . 10_1101-2021_02_08_430070 237 1 Cell cell NN 10_1101-2021_02_08_430070 237 2 , , , 10_1101-2021_02_08_430070 237 3 181(4 181(4 CD 10_1101-2021_02_08_430070 237 4 ) ) -RRB- 10_1101-2021_02_08_430070 237 5 , , , 10_1101-2021_02_08_430070 237 6 914–921 914–921 CD 10_1101-2021_02_08_430070 237 7 . . . 10_1101-2021_02_08_430070 238 1 Kingma Kingma NNP 10_1101-2021_02_08_430070 238 2 , , , 10_1101-2021_02_08_430070 238 3 D. D. NNP 10_1101-2021_02_08_430070 238 4 P. P. 
NNP 10_1101-2021_02_08_430070 238 5 and and CC 10_1101-2021_02_08_430070 238 6 Ba Ba NNP 10_1101-2021_02_08_430070 238 7 , , , 10_1101-2021_02_08_430070 238 8 J. J. NNP 10_1101-2021_02_08_430070 239 1 ( ( -LRB- 10_1101-2021_02_08_430070 239 2 2014 2014 CD 10_1101-2021_02_08_430070 239 3 ) ) -RRB- 10_1101-2021_02_08_430070 239 4 . . . 10_1101-2021_02_08_430070 240 1 Adam Adam NNP 10_1101-2021_02_08_430070 240 2 : : : 10_1101-2021_02_08_430070 240 3 A a DT 10_1101-2021_02_08_430070 240 4 method method NN 10_1101-2021_02_08_430070 240 5 for for IN 10_1101-2021_02_08_430070 240 6 stochastic stochastic JJ 10_1101-2021_02_08_430070 240 7 optimization optimization NN 10_1101-2021_02_08_430070 240 8 . . . 10_1101-2021_02_08_430070 241 1 arXiv arXiv NNP 10_1101-2021_02_08_430070 241 2 preprint preprint NN 10_1101-2021_02_08_430070 241 3 arXiv:1412.6980 arXiv:1412.6980 NNP 10_1101-2021_02_08_430070 241 4 . . . 10_1101-2021_02_08_430070 242 1 Liu Liu NNP 10_1101-2021_02_08_430070 242 2 , , , 10_1101-2021_02_08_430070 242 3 Q. Q. NNP 10_1101-2021_02_08_430070 242 4 et et NNP 10_1101-2021_02_08_430070 242 5 al al NNP 10_1101-2021_02_08_430070 242 6 . . . 10_1101-2021_02_08_430070 243 1 ( ( -LRB- 10_1101-2021_02_08_430070 243 2 2019 2019 CD 10_1101-2021_02_08_430070 243 3 ) ) -RRB- 10_1101-2021_02_08_430070 243 4 . . . 
10_1101-2021_02_08_430070 244 1 Detection detection NN 10_1101-2021_02_08_430070 244 2 of of IN 10_1101-2021_02_08_430070 244 3 dna dna NN 10_1101-2021_02_08_430070 244 4 base base NN 10_1101-2021_02_08_430070 244 5 modifications modification NNS 10_1101-2021_02_08_430070 244 6 by by IN 10_1101-2021_02_08_430070 244 7 deep deep JJ 10_1101-2021_02_08_430070 244 8 recurrent recurrent JJ 10_1101-2021_02_08_430070 244 9 neural neural JJ 10_1101-2021_02_08_430070 244 10 network network NN 10_1101-2021_02_08_430070 244 11 on on IN 10_1101-2021_02_08_430070 244 12 oxford oxford NNP 10_1101-2021_02_08_430070 244 13 nanopore nanopore NNP 10_1101-2021_02_08_430070 244 14 sequencing sequence VBG 10_1101-2021_02_08_430070 244 15 data datum NNS 10_1101-2021_02_08_430070 244 16 . . . 10_1101-2021_02_08_430070 245 1 Nature nature NN 10_1101-2021_02_08_430070 245 2 communications communication NNS 10_1101-2021_02_08_430070 245 3 , , , 10_1101-2021_02_08_430070 245 4 10(1 10(1 CD 10_1101-2021_02_08_430070 245 5 ) ) -RRB- 10_1101-2021_02_08_430070 245 6 , , , 10_1101-2021_02_08_430070 245 7 1–11 1–11 NNP 10_1101-2021_02_08_430070 245 8 . . . 10_1101-2021_02_08_430070 246 1 Ni Ni NNP 10_1101-2021_02_08_430070 246 2 , , , 10_1101-2021_02_08_430070 246 3 P. P. NNP 10_1101-2021_02_08_430070 246 4 et et NNP 10_1101-2021_02_08_430070 246 5 al al NNP 10_1101-2021_02_08_430070 246 6 . . . 10_1101-2021_02_08_430070 247 1 ( ( -LRB- 10_1101-2021_02_08_430070 247 2 2019 2019 CD 10_1101-2021_02_08_430070 247 3 ) ) -RRB- 10_1101-2021_02_08_430070 247 4 . . . 
10_1101-2021_02_08_430070 248 1 Deepsignal deepsignal JJ 10_1101-2021_02_08_430070 248 2 : : : 10_1101-2021_02_08_430070 248 3 detecting detect VBG 10_1101-2021_02_08_430070 248 4 dna dna NN 10_1101-2021_02_08_430070 248 5 methylation methylation NN 10_1101-2021_02_08_430070 248 6 state state NN 10_1101-2021_02_08_430070 248 7 from from IN 10_1101-2021_02_08_430070 248 8 nanopore nanopore JJ 10_1101-2021_02_08_430070 248 9 sequencing sequencing NN 10_1101-2021_02_08_430070 248 10 reads read NNS 10_1101-2021_02_08_430070 248 11 using use VBG 10_1101-2021_02_08_430070 248 12 deep deep JJ 10_1101-2021_02_08_430070 248 13 - - HYPH 10_1101-2021_02_08_430070 248 14 learning learning NN 10_1101-2021_02_08_430070 248 15 . . . 10_1101-2021_02_08_430070 249 1 Bioinformatics bioinformatic NNS 10_1101-2021_02_08_430070 249 2 , , , 10_1101-2021_02_08_430070 249 3 35(22 35(22 CD 10_1101-2021_02_08_430070 249 4 ) ) -RRB- 10_1101-2021_02_08_430070 249 5 , , , 10_1101-2021_02_08_430070 249 6 4586–4595 4586–4595 CD 10_1101-2021_02_08_430070 249 7 . . . 10_1101-2021_02_08_430070 250 1 Shaw Shaw NNP 10_1101-2021_02_08_430070 250 2 , , , 10_1101-2021_02_08_430070 250 3 P. P. NNP 10_1101-2021_02_08_430070 250 4 et et NNP 10_1101-2021_02_08_430070 250 5 al al NNP 10_1101-2021_02_08_430070 250 6 . . . 10_1101-2021_02_08_430070 251 1 ( ( -LRB- 10_1101-2021_02_08_430070 251 2 2018 2018 CD 10_1101-2021_02_08_430070 251 3 ) ) -RRB- 10_1101-2021_02_08_430070 251 4 . . . 10_1101-2021_02_08_430070 252 1 Self self NN 10_1101-2021_02_08_430070 252 2 - - HYPH 10_1101-2021_02_08_430070 252 3 attention attention NN 10_1101-2021_02_08_430070 252 4 with with IN 10_1101-2021_02_08_430070 252 5 relative relative JJ 10_1101-2021_02_08_430070 252 6 position position NN 10_1101-2021_02_08_430070 252 7 representations representation NNS 10_1101-2021_02_08_430070 252 8 . . . 
10_1101-2021_02_08_430070 253 1 arXiv arXiv NNP 10_1101-2021_02_08_430070 253 2 preprint preprint NN 10_1101-2021_02_08_430070 253 3 arXiv:1803.02155 arXiv:1803.02155 NNP 10_1101-2021_02_08_430070 253 4 . . . 10_1101-2021_02_08_430070 254 1 Simpson Simpson NNP 10_1101-2021_02_08_430070 254 2 , , , 10_1101-2021_02_08_430070 254 3 J. J. NNP 10_1101-2021_02_08_430070 254 4 T. T. NNP 10_1101-2021_02_08_430070 254 5 et et NNP 10_1101-2021_02_08_430070 254 6 al al NNP 10_1101-2021_02_08_430070 254 7 . . . 10_1101-2021_02_08_430070 255 1 ( ( -LRB- 10_1101-2021_02_08_430070 255 2 2017 2017 CD 10_1101-2021_02_08_430070 255 3 ) ) -RRB- 10_1101-2021_02_08_430070 255 4 . . . 10_1101-2021_02_08_430070 256 1 Detecting detect VBG 10_1101-2021_02_08_430070 256 2 dna dna NN 10_1101-2021_02_08_430070 256 3 cytosine cytosine NN 10_1101-2021_02_08_430070 256 4 methylation methylation NN 10_1101-2021_02_08_430070 256 5 using use VBG 10_1101-2021_02_08_430070 256 6 nanopore nanopore JJ 10_1101-2021_02_08_430070 256 7 sequencing sequencing NN 10_1101-2021_02_08_430070 256 8 . . . 10_1101-2021_02_08_430070 257 1 Nature nature NN 10_1101-2021_02_08_430070 257 2 methods method NNS 10_1101-2021_02_08_430070 257 3 , , , 10_1101-2021_02_08_430070 257 4 14(4 14(4 CD 10_1101-2021_02_08_430070 257 5 ) ) -RRB- 10_1101-2021_02_08_430070 257 6 , , , 10_1101-2021_02_08_430070 257 7 407 407 CD 10_1101-2021_02_08_430070 257 8 . . . 10_1101-2021_02_08_430070 258 1 Stoiber Stoiber NNP 10_1101-2021_02_08_430070 258 2 , , , 10_1101-2021_02_08_430070 258 3 M. M. NNP 10_1101-2021_02_08_430070 258 4 H. H. NNP 10_1101-2021_02_08_430070 258 5 et et NNP 10_1101-2021_02_08_430070 258 6 al al NNP 10_1101-2021_02_08_430070 258 7 . . . 10_1101-2021_02_08_430070 259 1 ( ( -LRB- 10_1101-2021_02_08_430070 259 2 2016 2016 CD 10_1101-2021_02_08_430070 259 3 ) ) -RRB- 10_1101-2021_02_08_430070 259 4 . . . 
10_1101-2021_02_08_430070 260 1 De De NNP 10_1101-2021_02_08_430070 260 2 novo novo NNP 10_1101-2021_02_08_430070 260 3 identification identification NN 10_1101-2021_02_08_430070 260 4 of of IN 10_1101-2021_02_08_430070 260 5 dna dna NNP 10_1101-2021_02_08_430070 260 6 modifications modification NNS 10_1101-2021_02_08_430070 260 7 enabled enable VBN 10_1101-2021_02_08_430070 260 8 by by IN 10_1101-2021_02_08_430070 260 9 genome genome NN 10_1101-2021_02_08_430070 260 10 - - HYPH 10_1101-2021_02_08_430070 260 11 guided guide VBN 10_1101-2021_02_08_430070 260 12 nanopore nanopore JJ 10_1101-2021_02_08_430070 260 13 signal signal NN 10_1101-2021_02_08_430070 260 14 processing processing NN 10_1101-2021_02_08_430070 260 15 . . . 10_1101-2021_02_08_430070 261 1 BioRxiv biorxiv NN 10_1101-2021_02_08_430070 261 2 , , , 10_1101-2021_02_08_430070 261 3 page page NN 10_1101-2021_02_08_430070 261 4 094672 094672 CD 10_1101-2021_02_08_430070 261 5 . . . 10_1101-2021_02_08_430070 262 1 Vaswani Vaswani NNP 10_1101-2021_02_08_430070 262 2 , , , 10_1101-2021_02_08_430070 262 3 A. A. NNP 10_1101-2021_02_08_430070 262 4 et et FW 10_1101-2021_02_08_430070 262 5 al al NNP 10_1101-2021_02_08_430070 262 6 . . . 10_1101-2021_02_08_430070 263 1 ( ( -LRB- 10_1101-2021_02_08_430070 263 2 2017 2017 CD 10_1101-2021_02_08_430070 263 3 ) ) -RRB- 10_1101-2021_02_08_430070 263 4 . . . 10_1101-2021_02_08_430070 264 1 Attention attention NN 10_1101-2021_02_08_430070 264 2 is be VBZ 10_1101-2021_02_08_430070 264 3 all all DT 10_1101-2021_02_08_430070 264 4 you -PRON- PRP 10_1101-2021_02_08_430070 264 5 need need VBP 10_1101-2021_02_08_430070 264 6 . . . 10_1101-2021_02_08_430070 265 1 pages page NNS 10_1101-2021_02_08_430070 265 2 5998–6008 5998–6008 CD 10_1101-2021_02_08_430070 265 3 . . . 10_1101-2021_02_08_430070 266 1 Yuen Yuen NNP 10_1101-2021_02_08_430070 266 2 , , , 10_1101-2021_02_08_430070 266 3 Z. Z. NNP 10_1101-2021_02_08_430070 266 4 W.-S. W.-S. 
NNP 10_1101-2021_02_08_430070 266 5 et et NNP 10_1101-2021_02_08_430070 266 6 al al NNP 10_1101-2021_02_08_430070 266 7 . . . 10_1101-2021_02_08_430070 267 1 ( ( -LRB- 10_1101-2021_02_08_430070 267 2 2020 2020 CD 10_1101-2021_02_08_430070 267 3 ) ) -RRB- 10_1101-2021_02_08_430070 267 4 . . . 10_1101-2021_02_08_430070 268 1 Systematic systematic JJ 10_1101-2021_02_08_430070 268 2 benchmarking benchmarking NN 10_1101-2021_02_08_430070 268 3 of of IN 10_1101-2021_02_08_430070 268 4 tools tool NNS 10_1101-2021_02_08_430070 268 5 for for IN 10_1101-2021_02_08_430070 268 6 cpg cpg NNP 10_1101-2021_02_08_430070 268 7 methylation methylation NN 10_1101-2021_02_08_430070 268 8 detection detection NN 10_1101-2021_02_08_430070 268 9 from from IN 10_1101-2021_02_08_430070 268 10 nanopore nanopore JJ 10_1101-2021_02_08_430070 268 11 sequencing sequencing NN 10_1101-2021_02_08_430070 268 12 . . . 10_1101-2021_02_08_430070 269 1 bioRxiv biorxiv NN 10_1101-2021_02_08_430070 269 2 . . . 10_1101-2021_02_08_430070 270 1 .license .license NNP 10_1101-2021_02_08_430070 270 2 CC CC NNP 10_1101-2021_02_08_430070 270 3 - - HYPH 10_1101-2021_02_08_430070 270 4 BY BY NNP 10_1101-2021_02_08_430070 270 5 - - HYPH 10_1101-2021_02_08_430070 270 6 NC NC NNP 10_1101-2021_02_08_430070 270 7 - - HYPH 10_1101-2021_02_08_430070 270 8 ND ND NNP 10_1101-2021_02_08_430070 270 9 4.0 4.0 CD 10_1101-2021_02_08_430070 270 10 Internationalpeer Internationalpeer NNP 10_1101-2021_02_08_430070 270 11 review review NN 10_1101-2021_02_08_430070 270 12 ) ) -RRB- 10_1101-2021_02_08_430070 270 13 is be VBZ 10_1101-2021_02_08_430070 270 14 the the DT 10_1101-2021_02_08_430070 270 15 author author NN 10_1101-2021_02_08_430070 270 16 / / SYM 10_1101-2021_02_08_430070 270 17 funder funder NN 10_1101-2021_02_08_430070 270 18 , , , 10_1101-2021_02_08_430070 270 19 who who WP 10_1101-2021_02_08_430070 270 20 has have VBZ 10_1101-2021_02_08_430070 270 21 granted grant VBN 10_1101-2021_02_08_430070 270 22 bioRxiv biorxiv IN 
10_1101-2021_02_08_430070 270 23 a a DT 10_1101-2021_02_08_430070 270 24 license license NN 10_1101-2021_02_08_430070 270 25 to to TO 10_1101-2021_02_08_430070 270 26 display display VB 10_1101-2021_02_08_430070 270 27 the the DT 10_1101-2021_02_08_430070 270 28 preprint preprint NN 10_1101-2021_02_08_430070 270 29 in in IN 10_1101-2021_02_08_430070 270 30 perpetuity perpetuity NN 10_1101-2021_02_08_430070 270 31 . . . 10_1101-2021_02_08_430070 271 1 It -PRON- PRP 10_1101-2021_02_08_430070 271 2 is be VBZ 10_1101-2021_02_08_430070 271 3 made make VBN 10_1101-2021_02_08_430070 271 4 available available JJ 10_1101-2021_02_08_430070 271 5 under under IN 10_1101-2021_02_08_430070 271 6 a a DT 10_1101-2021_02_08_430070 271 7 The the DT 10_1101-2021_02_08_430070 271 8 copyright copyright NN 10_1101-2021_02_08_430070 271 9 holder holder NN 10_1101-2021_02_08_430070 271 10 for for IN 10_1101-2021_02_08_430070 271 11 this this DT 10_1101-2021_02_08_430070 271 12 preprint preprint NN 10_1101-2021_02_08_430070 271 13 ( ( -LRB- 10_1101-2021_02_08_430070 271 14 which which WDT 10_1101-2021_02_08_430070 271 15 was be VBD 10_1101-2021_02_08_430070 271 16 not not RB 10_1101-2021_02_08_430070 271 17 certified certify VBN 10_1101-2021_02_08_430070 271 18 bythis bythis DT 10_1101-2021_02_08_430070 271 19 version version NN 10_1101-2021_02_08_430070 271 20 posted post VBD 10_1101-2021_02_08_430070 271 21 February February NNP 10_1101-2021_02_08_430070 271 22 10 10 CD 10_1101-2021_02_08_430070 271 23 , , , 10_1101-2021_02_08_430070 271 24 2021 2021 CD 10_1101-2021_02_08_430070 271 25 . . . 
10_1101-2021_02_08_430070 271 26 ; ; : 10_1101-2021_02_08_430070 271 27 https://doi.org/10.1101/2021.02.08.430070doi https://doi.org/10.1101/2021.02.08.430070doi NFP 10_1101-2021_02_08_430070 271 28 : : : 10_1101-2021_02_08_430070 271 29 bioRxiv biorxiv VB 10_1101-2021_02_08_430070 271 30 preprint preprint NN 10_1101-2021_02_08_430070 271 31 https://doi.org/10.1101/2021.02.08.430070 https://doi.org/10.1101/2021.02.08.430070 UH 10_1101-2021_02_08_430070 271 32 http://creativecommons.org/licenses/by-nc-nd/4.0/ http://creativecommons.org/licenses/by-nc-nd/4.0/ NN