Application of Modified General Regression Model to Cluster Protein Sequences


G Lavanya Devi, Allam Appa Rao, A Damodaram, GR Sridhar, G Jaya Suma


Vol. 8, No. 4, pp. 225-231


Cluster analysis is the study of techniques for finding the most representative cluster prototypes. The linear relationship between two sequences can be modeled with the classical linear regression model. Protein sequence clustering has many applications, such as classifying a new sequence, predicting the structure of an unknown protein sequence, and finding family and subfamily relationships among protein sequences. When clustering a repository of protein sequences into groups whose members have a strong linear relationship with each other, comparing the sequences one by one is prohibitively expensive. In this paper, we propose a new technique, the General Regression Model Technique (GRMT1), to test the linearity of the sequences. We then apply the General Regression Model Technique Clustering Algorithm (GRMTCA) to cluster the protein sequences. The performance of the algorithm was evaluated on 50 protein sequences. We used BLAST to annotate the clusters obtained by GRMTCA and observed that the clusters have biological significance.
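
As a rough illustration of the idea of grouping sequences by the strength of their linear relationship, the sketch below may help. It is not the GRMT or GRMTCA of the paper, whose details the abstract does not give. It assumes that each protein sequence is first turned into a fixed-length numeric vector (here, a hypothetical amino-acid composition encoding), that linearity is scored by the coefficient of determination of an ordinary simple linear regression, and that a new sequence is compared only against one prototype per cluster rather than against every other sequence. The names `composition`, `r_squared`, `greedy_cluster` and the threshold value are all illustrative assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Hypothetical encoding: relative frequency of each standard residue."""
    seq = seq.upper()
    total = len(seq)
    if total == 0:
        raise ValueError("empty sequence")
    return [seq.count(aa) / total for aa in AMINO_ACIDS]


def r_squared(x, y):
    """R^2 of the least-squares line y = a*x + b (equals squared Pearson r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no linear information
    return (sxy * sxy) / (sxx * syy)


def greedy_cluster(vectors, threshold=0.9):
    """Assign each vector to the first cluster whose prototype fits it.

    The prototype of a cluster is its first member, so each vector is
    compared with at most one vector per existing cluster.
    """
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for members in clusters:
            if r_squared(vectors[members[0]], vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no prototype fits: start a new cluster
    return clusters
```

Under these assumptions, a call such as `greedy_cluster([composition(s) for s in sequences])` returns lists of sequence indices, one list per cluster. The result depends on input order and on the chosen threshold, which is one reason this should be read as a sketch rather than a substitute for the published algorithm.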


Clustering, BLAST, General Regression Model, Protein Sequences