



Knowledge Discovery from Web Usage Data: Complete Preprocessing Methodology


G T Raju, P S Satyanarayana


Vol. 8  No. 1  pp. 179-186


The exponential growth of the Web over the last decade, in terms of both Web sites and their users, has generated a huge amount of data about users' interactions with Web sites. This data is recorded in the Web access log files of Web servers and is usually referred to as Web Usage Data (WUD). Knowledge Discovery from Web Usage Data (KDWUD) is the area of Web mining that applies data mining techniques to extract interesting knowledge from the WUD. As Web sites continue to grow in size and complexity, the results of KDWUD have become critical for the efficient and effective management of activities related to e-business, e-education, e-commerce, personalization, Web site design and management, network traffic analysis, caching, proxies, the great diversity of Web pages within a site, and search engine complexity, as well as for predicting users' actions. In this paper, we propose a complete methodology for preprocessing, one of the important steps in the KDWUD process. Several heuristics are proposed for cleaning the WUD, which is then aggregated and recorded in a relational data model. To validate the efficiency of the proposed preprocessing methodology, several experiments were conducted; the results show that the methodology reduces the size of Web access log files down to 73-82% of their initial size and offers richer logs that are structured for the later stages of KDWUD.
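The abstract does not list the specific cleaning heuristics, so the following is only a minimal sketch of what log cleaning and relational storage can look like, assuming Common Log Format input. The filtering rules (drop image/style/script requests, non-GET methods, unsuccessful status codes, and robots.txt fetches) are common choices in Web usage mining and are assumed here for illustration; they are not claimed to be the authors' heuristics. The names `clean_log`, `remaining_percent`, `store`, and the `log_entry` table schema are hypothetical, and the aggregation step the paper mentions is not modeled.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Common Log Format: host ident authuser [time] "method url protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: \S+)?" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumed (hypothetical) rule: embedded resources are not user page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


@dataclass
class LogEntry:
    host: str
    time: str
    method: str
    url: str
    status: int


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one CLF line; return None if the line is malformed."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return LogEntry(m["host"], m["time"], m["method"], m["url"], int(m["status"]))


def keep(entry: LogEntry) -> bool:
    """Apply the assumed cleaning heuristics to a parsed entry."""
    path = entry.url.split("?", 1)[0].lower()
    if path.endswith(IGNORED_SUFFIXES):   # images, stylesheets, scripts
        return False
    if entry.method != "GET":             # keep page requests only
        return False
    if not 200 <= entry.status < 400:     # drop failed requests
        return False
    if path == "/robots.txt":             # crawler housekeeping request
        return False
    return True


def clean_log(lines: Iterable[str]) -> List[str]:
    """Return the raw lines that parse correctly and pass every heuristic."""
    kept = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and keep(entry):
            kept.append(line)
    return kept


def remaining_percent(original: List[str], cleaned: List[str]) -> float:
    """Size of the cleaned log as a percentage of the original (in bytes)."""
    before = sum(len(line.encode()) for line in original)
    after = sum(len(line.encode()) for line in cleaned)
    return 100.0 * after / before if before else 0.0


def store(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Record cleaned entries in a single relational table; return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_entry ("
        "id INTEGER PRIMARY KEY, host TEXT, time TEXT, "
        "method TEXT, url TEXT, status INTEGER)"
    )
    rows = [
        (e.host, e.time, e.method, e.url, e.status)
        for e in map(parse_line, lines)
        if e is not None
    ]
    conn.executemany(
        "INSERT INTO log_entry (host, time, method, url, status) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Keeping the raw lines in `clean_log` lets `remaining_percent` compare byte sizes before and after cleaning, which is one way to measure a size reduction like the one reported in the abstract; the percentage removed is simply 100 minus that value. A real pipeline would typically go on to identify users and sessions before aggregation.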


Keywords: Preprocessing, Knowledge Discovery, Web Usage Data, Web Usage Mining.