id: work_zv5hinl2kzfuzlpf37agkaqtpq
author: Najlah Gali
title: Using linguistic features to automatically extract web page title
date: 2017
pages: 28
extension: .pdf
mime: application/pdf
words: 12281
sentences: 1373
flesch: 74
summary: Please cite this article as: Najlah Gali, Radu Mariescu Istodor, Pasi Fränti, Using Linguistic Features to Automatically Extract Web page Title, Expert Systems With Applications (2017), doi: … Abstract: Existing methods for extracting titles from HTML web pages mostly rely on visual and structural features. … morphosyntactic characteristics of known titles and define part-of-speech tag patterns that help to extract candidate phrases … However, modern web page design allows the title to appear as part of other phrases in the text node of the … features: syntactic structure, similarity with the link of the web page, appearance in the title tag, appearance in meta tags, … In both tree representations, existing methods use the entire text of the leaf nodes as candidate titles. We have proposed a new method to extract the title from HTML web pages using text segmentation, statistical features of the …
cache: ./cache/work_zv5hinl2kzfuzlpf37agkaqtpq.pdf
txt: ./txt/work_zv5hinl2kzfuzlpf37agkaqtpq.txt
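The summary lists several features used to rank candidate title phrases, among them similarity with the link (URL) of the page, appearance in the title tag, and appearance in meta tags. The following is a minimal sketch of how three of those features could be combined, not the authors' method: it assumes the candidate phrases have already been extracted (the text segmentation and part-of-speech patterns are not shown), and the function names, the equal default weights, the token-overlap URL similarity, and the substring tests are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def url_similarity(candidate: str, url: str) -> float:
    """Share of the candidate's tokens that also occur in the URL host or path.

    Token overlap is an assumed similarity measure, chosen for simplicity.
    """
    parsed = urlparse(url)
    url_tokens = tokens(parsed.netloc + " " + parsed.path)
    cand = tokens(candidate)
    return len(cand & url_tokens) / len(cand) if cand else 0.0


def score_candidate(candidate: str, url: str, title_tag: str,
                    meta_texts: list[str],
                    weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical linear score over three of the features named in the summary.

    The equal default weights and the substring tests are illustrative
    assumptions, not values taken from the paper.
    """
    cand = candidate.lower()
    f_url = url_similarity(candidate, url)                         # similarity with the link
    f_title = 1.0 if cand in title_tag.lower() else 0.0            # appears in <title>
    f_meta = 1.0 if any(cand in m.lower() for m in meta_texts) else 0.0  # appears in a meta tag
    w_url, w_title, w_meta = weights
    return w_url * f_url + w_title * f_title + w_meta * f_meta


def best_title(candidates: list[str], url: str, title_tag: str,
               meta_texts: list[str]) -> str:
    """Return the highest-scoring candidate phrase (ties keep the first one)."""
    return max(candidates, key=lambda c: score_candidate(c, url, title_tag, meta_texts))


if __name__ == "__main__":
    # Made-up example page; none of these values come from the paper.
    url = "https://example.com/pizza-roma/menu"
    title_tag = "Pizza Roma | Menu and prices"
    meta = ["Pizza Roma serves Italian food"]
    candidates = ["Welcome to our restaurant", "Pizza Roma", "Opening hours"]
    print(best_title(candidates, url, title_tag, meta))  # Pizza Roma
```

In this made-up example, "Pizza Roma" scores 3.0 because it matches the URL path tokens, the title tag, and the meta description, while the other two candidates score 0. A learned model or tuned weights would replace the fixed linear sum in practice.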