id	author	title	date	pages	extension	mime	words	sentences	flesch	summary	cache	txt
work_rujstqwje5gkdm2bqer4drugey	Ellie Pavlick	The Language Demographics of Amazon Mechanical Turk	2014	14	.pdf	application/pdf	7172	862	67	bilingual parallel corpora in six Indian languages, and use them to train statistical machine translation systems. study by posting 25 sentences to MTurk for Spanish, Chinese, Hindi, Telugu, Urdu, and Haitian Creole. crowdsourcing to construct a 1.5 million word parallel corpus of dialect Arabic and English, training a statistical machine translation system that produced higher quality translations of dialect Arabic We created parallel corpora by translating the 100 most viewed Wikipedia pages in Bengali, Malayalam, Hindi, Tamil, Telugu, and Urdu into Figure 4: Translation quality for languages with at least 50 Turkers. For single word translations, we calculate the quality of translations on the level of individual assignments and aggregated over workers and languages. for the 51 foreign languages that Google Translate covered at the time of the study. Table 3 shows the differences in translation quality when computed using in-region versus out-of-region Turkers, for the languages with the greatest	./cache/work_rujstqwje5gkdm2bqer4drugey.pdf	./txt/work_rujstqwje5gkdm2bqer4drugey.txt