Volume : IX, Issue : VI, June - 2019

A COMPREHENSIVE STUDY ON CLASSIFICATION OF AUTOMATED CATEGORIZATION OF WEB SITES: A PROPOSED METHODOLOGY

Amish D. Vyas, Dr. Yogesh Kumar Sharma

Abstract :

The contemporary web comprises trillions of pages, and every day a tremendous number of requests are made to add more pages to the WWW. Managing the information present on the web has become more difficult than creating it. Web page categorization can be defined as an approach that assigns web pages to a set of predefined categories in order to manage large volumes of web content. Yahoo! and ODP are examples of web directories in which pages are categorized manually or semi-automatically, which is a very time-consuming task. There are many ways of categorizing web pages, based on different techniques, approaches, and features. One approach categorizes web pages automatically on the basis of their characteristics, using a neural-network-based single discrete perceptron training algorithm that is extended by selecting web-page-specific features so that pages of predefined categories are categorized with high accuracy. The idea is presented with the help of two specific, major categories of web pages chosen for categorization: newspaper and education. Classification of web pages is a challenging and important task because the number of web pages on the internet grows day by day. In this paper, a soft computing approach is proposed for classifying websites based on features extracted from URLs alone. The Open Directory Project dataset was considered, and the proposed system classified the websites into various categories using the Naive Bayes approach. The agenda of this paper is first to introduce the concepts related to web mining and then to provide a comprehensive review of different classification techniques. One of the classification algorithms used in WebDoc is based on Bayes' theorem from probability theory.
This paper focuses on three aspects of this approach: different event models for the naive Bayes method, different probability smoothing methods, and different feature selection methods. We report the performance of each method in terms of recall, precision, and F-measure. Experimental results show that the WebDoc system can classify Web documents effectively and efficiently.
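
As an illustration only, the following minimal sketch shows what a naive Bayes classifier over URL-derived features could look like, using the multinomial event model with add-alpha (Laplace) smoothing and the recall, precision, and F-measure metrics named in the abstract. It is not the authors' implementation or the WebDoc system: the tokenizer, the class and function names, and the training URLs (placeholder example.com and example.edu addresses) are all assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def url_tokens(url):
    """Split a URL into lowercase alphabetic tokens (hypothetical feature extractor)."""
    return [t for t in re.split(r"[^a-z]+", url.lower()) if len(t) > 1]


class MultinomialNB:
    """Naive Bayes with the multinomial event model and add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha=1.0 corresponds to Laplace smoothing

    def fit(self, urls, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for url, label in zip(urls, labels):
            self.token_counts[label].update(url_tokens(url))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, url):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in url_tokens(url):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up training data for the two categories named in the abstract.
TRAIN_URLS = [
    "http://dailynews.example.com/headlines/today",
    "http://times.example.com/news/editorial",
    "http://www.example.edu/admissions/courses",
    "http://university.example.edu/faculty/courses",
]
TRAIN_LABELS = ["newspaper", "newspaper", "education", "education"]

model = MultinomialNB(alpha=1.0).fit(TRAIN_URLS, TRAIN_LABELS)

if __name__ == "__main__":
    for url in ["http://example.com/news/today", "http://www.example.edu/courses"]:
        print(url, "->", model.predict(url))
```

Working in log space avoids numeric underflow when many token probabilities are multiplied, and the smoothing term keeps a token that is absent from one category from driving that category's probability to zero. A real evaluation would use a held-out portion of a labeled dataset such as the Open Directory Project rather than the four toy URLs above.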

Journal DOI : 10.15373/2249555X

Cite This Article:

A COMPREHENSIVE STUDY ON CLASSIFICATION OF AUTOMATED CATEGORIZATION OF WEB SITES: A PROPOSED METHODOLOGY, Amish D. Vyas, Dr. Yogesh Kumar Sharma, INDIAN JOURNAL OF APPLIED RESEARCH : Volume-9 | Issue-6 | June-2019



