Natural language processing (NLP) models the different techniques computers use to understand and interpret human languages. NLP covers a wide range of sub-topics such as syntax (analyzing whether the words in an utterance are well arranged), semantics (understanding the meaning of combined words), and discourse. Most state-of-the-art NLP systems feed large amounts of natural language text into different models for training and testing. One problem with natural language corpora is the unbalanced frequency of rare terms against commonly used words.
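To make this imbalance concrete, the following sketch counts word frequencies in a small, made-up corpus; the corpus and the numbers it prints are purely illustrative and not taken from the experiments, but any realistic corpus shows the same long-tailed, Zipf-like shape in which a handful of word types accounts for most tokens while most types occur only a few times.

    from collections import Counter

    # Toy corpus; real corpora show the same long-tailed (Zipf-like) shape.
    corpus = (
        "the cat sat on the mat . the dog sat on the rug . "
        "a rare aardvark ambled past the mat ."
    ).split()

    counts = Counter(corpus)
    total = sum(counts.values())

    # Word types ranked from most to least frequent.
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        print(f"{rank:2d}  {word:10s}  {freq:2d}  {freq / total:.2%}")

    # The long tail: word types that occur exactly once.
    singletons = sum(1 for f in counts.values() if f == 1)
    print(f"{singletons} of {len(counts)} word types occur exactly once")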
The word-level frequency in natural language creates irregular sparsity patterns, and these patterns generate sparse data structures that do not perform well on parallel architectures. Asynchronous methods work best on specific sparse distributions. Ideally, the entire computation time should be spent on dense values only, and computation time on sparse elements should be minimized. Graphics processing units (GPUs) are widely used to process a large quantity of operations in parallel.
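As an illustration of why such patterns are awkward for parallel hardware, the sketch below builds a hypothetical word-by-context co-occurrence matrix in CSR form and reports the number of nonzeros per row; the matrix is invented for the example, but the uneven row lengths are exactly the irregularity that makes a naive one-thread-per-row GPU mapping inefficient.

    import numpy as np
    from scipy.sparse import csr_matrix

    # Hypothetical word-by-context co-occurrence counts for a tiny vocabulary.
    # Frequent words touch many contexts, rare words touch few, so the number
    # of nonzeros per row varies wildly (irregular sparsity).
    rows = np.array([0, 0, 0, 0, 0, 1, 1, 2, 3])
    cols = np.array([0, 1, 2, 3, 4, 0, 2, 4, 1])
    data = np.ones_like(rows, dtype=np.float32)
    cooc = csr_matrix((data, (rows, cols)), shape=(4, 5))

    # Work per row is uneven: a thread handling row 0 does five times the
    # work of a thread handling row 2, so many threads sit idle.
    nnz_per_row = np.diff(cooc.indptr)
    print("nonzeros per row:", nnz_per_row)          # [5 2 1 1]
    print("overall density:", cooc.nnz / (cooc.shape[0] * cooc.shape[1]))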
A problem with the use of these accelerators is that not all computation problems can be parallelized, and some parallel adaptations run slower than a serial CPU counterpart. Using GPUs to process sparse structures of different sizes poses additional problems. A large part of the computation time will be spent on sparse regions if the parallel implementations do not take advantage of the partially dense properties of the input. Significant speedups are achieved when a parallel implementation is tailored to the sparsity pattern of the problem being solved and to the targeted architecture.
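One common way to exploit partially dense structure, sketched below, is to bin rows by their nonzero count and hand each bin to a kernel suited to it; this row-binning idea is a generic technique shown here only as an illustration, not as the specific scheme developed in this dissertation.

    import numpy as np
    from scipy.sparse import random as sparse_random

    # Hypothetical input: a sparse matrix whose rows differ a lot in density.
    A = sparse_random(8, 64, density=0.10, format="csr", random_state=0)
    nnz_per_row = np.diff(A.indptr)

    # Illustrative cut-off between "dense enough" and "sparse" rows.
    threshold = 8

    # Rows at or above the threshold could be handed to a dense, regular
    # kernel (good GPU utilization); the rest keep a sparse treatment.
    dense_rows = np.flatnonzero(nnz_per_row >= threshold)
    sparse_rows = np.flatnonzero(nnz_per_row < threshold)
    print("dense-like rows:", dense_rows)
    print("sparse rows:    ", sparse_rows)

Hybrid sparse formats (for example, ELL combined with COO) follow the same intuition of separating the regular part of the data from the irregular remainder.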
Our work adapts methods used in NLP to run efficiently on a parallel architecture using high-performance computing concepts. All contributions focus mainly on the GPU, a device designed to carry out a large number of computations faster than several off-the-shelf CPU architectures. This dissertation covers different adaptations of sparse NLP algorithms to the GPU architecture. We carry out experiments using different GPU architectures and compare the performance on different datasets.
Our results demonstrate that GPU adaptations can significantly reduce the execution time of different sparse NLP algorithms: 6000x speedup on the Viterbi task, 4.5x speedup on the composition task, 7x speedup on a batched forward-backward method, and 50x improvement on batched operations seen in deep learning.
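To make the sparsity opportunity behind these numbers concrete, the following is a minimal, serial sketch of Viterbi decoding whose inner loop visits only the nonzero transitions stored in a CSR matrix; the function name, arguments, and structure are illustrative assumptions, and the dissertation's GPU kernels, which parallelize this recurrence, are not reproduced here.

    import numpy as np

    def sparse_viterbi(log_start, log_trans_csr, log_emit, obs):
        """Viterbi decoding that only visits nonzero transitions.

        log_trans_csr[i, j] holds log P(state j | state i); entries absent
        from the sparse matrix are treated as impossible transitions, so
        the inner loop skips them. Serial sketch for illustration only.
        """
        n_states = log_trans_csr.shape[0]
        indptr, indices, data = (log_trans_csr.indptr,
                                 log_trans_csr.indices,
                                 log_trans_csr.data)
        delta = log_start + log_emit[:, obs[0]]
        back = []
        for t in range(1, len(obs)):
            new_delta = np.full(n_states, -np.inf)
            argmax = np.zeros(n_states, dtype=int)
            for i in range(n_states):                      # source state
                for k in range(indptr[i], indptr[i + 1]):  # nonzeros only
                    j = indices[k]                         # target state
                    score = delta[i] + data[k]
                    if score > new_delta[j]:
                        new_delta[j], argmax[j] = score, i
            delta = new_delta + log_emit[:, obs[t]]
            back.append(argmax)
        # Backtrack the best state sequence.
        path = [int(np.argmax(delta))]
        for argmax in reversed(back):
            path.append(int(argmax[path[-1]]))
        return path[::-1]

On a GPU, each (source state, nonzero transition) pair can be mapped to a thread and the per-target maxima combined with atomic updates or a segmented reduction; how that mapping is organized around the sparsity pattern is what determines speedups of the kind reported above.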