IJCSNS - International Journal of Computer Science and Network Security

Title	Extracting Content for News Web Pages based on DOM
Author	Hua Geng, Qiang Gao, Jingui Pan
Citation	Vol. 7 No. 2 pp. 124-129
Abstract	RSS is becoming a popular topic for Web applications, and many well-known Web sites now provide RSS feeds for their users. However, creating RSS files manually is tedious, and so far most sites do not offer such a service. In this paper, we describe the design, implementation, and evaluation of HTML2RSS, a system that extracts content from HTML Web pages based on their DOM structure and automatically generates RSS files from the extracted content. We introduce two algorithms for extracting information from semi-structured Web data. The goal of HTML2RSS is to provide users with RSS files as a substitute for the HTML pages.
Keywords	Web information extracting, DOM, XML, time pattern, RSS
URL	http://paper.ijcsns.org/07_book/200702/200702A17.pdf