id:        work_cziuyuoc3ffuznqfiluyvprwpq
author:    Jie Zhou
title:     Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
date:      2016
pages:     14
extension: .pdf
mime:      application/pdf
words:     8452
sentences: 1051
flesch:    78
summary:   Deep Recurrent Models with Fast-Forward Connections for Neural Machine … This is the first time that a single NMT model achieves state-of-the-art performance and outperforms the best conventional model by 0.7 BLEU points. After special handling of unknown words and model ensembling, we obtain the best score reported to date. … 2003; Durrani et al., 2014), which consist of multiple separately tuned components, NMT models encode the source sequence into continuous representation space and generate the target sequence in an … the systems based on these models can achieve similar performance to conventional SMT systems (Luong et al., 2015; Jean et al., 2015). With our deep attention model, the BLEU score can be improved to … layers of the two columns process the word representations of the source sequence in different directions. Next we list the effect of the novel F-F connections in our Deep-Att model of shallow topology in … Table 5: BLEU score of Deep-Att with different model …
cache:     ./cache/work_cziuyuoc3ffuznqfiluyvprwpq.pdf
txt:       ./txt/work_cziuyuoc3ffuznqfiluyvprwpq.txt
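The summary above mentions the paper's fast-forward (F-F) connections, in which each stacked block emits both its recurrent output and a linear, recurrence-free projection of its input. The sketch below is a minimal, hypothetical NumPy illustration of that idea only: it uses a plain tanh RNN cell in place of the paper's LSTM blocks, and all names, sizes, and initializations are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(in_dim, hid_dim, ff_dim):
    # Hypothetical parameters; the paper stacks LSTM blocks, but a
    # simple tanh RNN cell keeps the F-F idea visible.
    return {
        "W":  rng.standard_normal((hid_dim, in_dim)) * 0.1,   # input -> hidden
        "U":  rng.standard_normal((hid_dim, hid_dim)) * 0.1,  # hidden -> hidden
        "b":  np.zeros(hid_dim),
        "Wf": rng.standard_normal((ff_dim, in_dim)) * 0.1,    # F-F path: linear, no recurrence
    }

def run_stack(xs, layers, hid_dim):
    """xs: list of per-time-step input vectors.
    Each block emits [h_t ; f_t], where f_t = Wf @ x_t skips the
    recurrent nonlinearity entirely -- the fast-forward connection."""
    seq = xs
    for layer in layers:
        h = np.zeros(hid_dim)
        out = []
        for x in seq:
            h = np.tanh(layer["W"] @ x + layer["U"] @ h + layer["b"])
            f = layer["Wf"] @ x                    # fast-forward path
            out.append(np.concatenate([h, f]))     # block output = [h ; f]
        seq = out
    return seq

# Toy usage: a 4-step sequence through 2 stacked blocks.
D, H, F = 8, 16, 8
layers = [make_layer(D, H, F), make_layer(H + F, H, F)]
xs = [rng.standard_normal(D) for _ in range(4)]
ys = run_stack(xs, layers, H)
```

Because `f_t` bypasses the nonlinear recurrence, gradients have a shorter linear route through a deep stack, which is the motivation the paper gives for training very deep topologies.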