A Compromise between N-gram Length and Classifier Characteristics for Protein Classification


Faouzi Mhamdi, Ricco Rakotomalala, Mourad Elloumi


Vol. 6  No. 4  pp. 82-87


Many scientific works address the protein classification problem, using a variety of learning methods and descriptors. In this paper, we systematically analyze the behavior of learning algorithms according to the features extracted from the primary description of proteins. We use n-gram descriptors and test the interaction between the n-gram length n and the characteristics of the supervised learning methods. The main conclusion is that n-grams of moderate length (n = 2 or n = 3, ...) combined with a linear support vector machine (SVM) classifier give the best compromise. However, a thorough analysis of the results puts this conclusion into perspective: the main characteristic influencing the accuracy of the classifier seems to be the dimensionality of the representation space.
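
To illustrate why the n-gram length drives the dimensionality of the representation space, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the 20-letter amino-acid alphabet, and the toy sequence are assumptions made for illustration only. It counts overlapping n-grams in a sequence and computes the upper bound on the number of distinct n-gram features.

```python
from collections import Counter

# Assumption: the 20 standard amino acids, in one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_counts(sequence, n):
    """Count the overlapping n-grams of length n in a protein sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def max_dimensionality(n, alphabet=AMINO_ACIDS):
    """Upper bound on the number of distinct n-grams over the alphabet."""
    return len(alphabet) ** n


# Hypothetical toy sequence, not taken from any dataset in the paper.
toy_sequence = "MKVLAAGMKV"
bigrams = ngram_counts(toy_sequence, 2)

# The feature space grows exponentially with n: 20, 400, 8000, 160000, ...
dimensions = {n: max_dimensionality(n) for n in range(1, 5)}
```

Under these assumptions, moving from n = 2 to n = 3 raises the maximum number of features from 400 to 8000, which is the kind of growth in dimensionality the abstract identifies as the main influence on classifier accuracy.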


Data mining, Protein Classification, n-grams, KNN, SVM, CART