Title

Extracting Content for News Web Pages based on DOM

Author

Hua Geng, Qiang Gao, Jingui Pan

Citation

Vol. 7, No. 2, pp. 124-129

Abstract

RSS is an increasingly popular format for Web applications, and many well-known Web sites now offer RSS feeds to their users. However, creating RSS files by hand is tedious, and most sites still do not provide feeds. This paper describes the design, implementation, and evaluation of HTML2RSS, a system that extracts content from HTML Web pages based on their DOM structure and automatically generates RSS files from the extracted content. We introduce two algorithms for extracting information from semi-structured Web data. The goal of HTML2RSS is to provide users with RSS files as a substitute for the HTML pages.

Keywords

Web information extraction, DOM, XML, time pattern, RSS

URL

http://paper.ijcsns.org/07_book/200702/200702A17.pdf
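
Example

The abstract describes a pipeline that extracts content from the structure of an HTML page and emits an RSS file. The sketch below illustrates that general idea only; it is not the HTML2RSS system or either of the paper's two algorithms, which this record does not detail. It assumes, for illustration, that every hyperlink with visible text is a news item, and the names `LinkCollector` and `html_to_rss` are hypothetical. It uses only the Python standard library.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for each <a href=...> with non-empty text.

    Assumption for illustration: every such link is a candidate news item.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append((self._href, title))
            self._href = None


def html_to_rss(html, channel_title, channel_link):
    """Return an RSS 2.0 document (as a string) built from the page's links."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Generated from " + channel_link
    for href, title in collector.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")
```

Calling `html_to_rss(page_source, "Example News", "http://example.com/")` returns the feed as a string. A real extractor would need to separate news links from navigation and advertising links, and the "time pattern" keyword suggests the paper uses date or time information for this, which the sketch does not attempt.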