Effective Dimension Reduction Techniques for Text Documents


P. Ponmuthuramalingam, T. Devi


Vol. 10, No. 7, pp. 101-109


Frequent term based text clustering is a text clustering technique that uses frequent term sets to dramatically decrease the dimensionality of the document vector space, thereby addressing two central problems: the very high dimensionality of the data and the very large size of the databases. The Frequent Term based Clustering algorithm (FTC) has shown significant efficiency compared to some well-known text clustering methods, but the quality of its clustering still needs further enhancement. In this paper, morphological variant words, stop words and grammatical words are identified and removed for further dimension reduction. Two effective dimension reduction algorithms, an improved stemming algorithm and a frequent term generation algorithm, are presented. Experiments on classical text documents as well as on web documents demonstrate that the developed algorithms yield good dimension reduction.


Dimension reduction, Latent semantic, Information retrieval, Text representation, Text documents
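
The pipeline the abstract describes (removing stop words, collapsing morphological variants, then keeping only frequent terms) can be illustrated with a minimal Python sketch. This is not the paper's improved stemming or frequent term generation algorithm, whose details the abstract does not give. The stop-word list, the suffix-stripping rule in `naive_stem`, the `min_docs` threshold and the sample documents are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "for", "with"}

# Naive suffix list, checked in order; a stand-in, NOT the paper's improved stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def frequent_terms(documents, min_docs):
    """Return the stemmed terms that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document (document frequency).
        doc_freq.update(set(preprocess(doc)))
    return {term for term, count in doc_freq.items() if count >= min_docs}


# Hypothetical sample documents, not taken from the paper's experiments.
docs = [
    "Clustering documents with frequent terms",
    "The cluster of documents is large",
    "Frequent terms reduce the dimension of documents",
]

# Raw vocabulary before any reduction, for comparison.
vocabulary = {t for d in docs for t in re.findall(r"[a-z]+", d.lower())}
reduced = frequent_terms(docs, min_docs=2)

print(len(vocabulary), "raw terms ->", len(reduced), "frequent terms")
print(sorted(reduced))
```

In this sketch each retained term becomes one dimension of the document vector space, so the drop from the raw vocabulary to the frequent-term set is the dimension reduction. Raising `min_docs` shrinks the space further at the cost of discarding rarer terms.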