id: work_j4ym2limgffn3isereirctplfi
author: Wei Xu
title: Problems in Current Text Simplification Research: New Data Can Help
date: 2015
pages: 16
extension: .pdf
mime: application/pdf
words: 8077
sentences: 899
flesch: 61
summary: Simple Wikipedia has dominated simplification research in the past 5 years. reasons: 1) It is prone to automatic sentence alignment errors; 2) It contains a large proportion of inadequate simplifications; 3) It generalizes poorly to the sentence pairs in the PWKP corpus are not simplifications. Table 1: Example sentence pairs (NORM-SIMP) aligned between English Wikipedia and Simple English The breakdown in percentages is obtained through manual examination of 200 randomly sampled sentence pairs in the Parallel Wikipedia Simplification (PWKP) corpus. Siddharthan (2014)'s excellent survey of text simplification research states The Parallel Wikipedia Simplification (PWKP) corpus (Zhu et al., 2010) contains approximately Table 3: Example of sentences written at multiple levels of text complexity from the Newsela data set. Table 5: This table shows the vocabulary changes between different levels of simplification in the Newsela topics and degree of simplification between the Simple Wikipedia and the Newsela corpus.
cache: ./cache/work_j4ym2limgffn3isereirctplfi.pdf
txt: ./txt/work_j4ym2limgffn3isereirctplfi.txt