

Amended Data Fusion Similarity Measurement based on Genetic Algorithm for Chemical Database Retrieval


Yahya Ali Abdelrahman Ali, Ahmed Hamza Osman, and Suad Mohammed


Vol. 22  No. 1  pp. 530-538


Virtual screening (VS) is a computational technique used in drug discovery, often in computer-aided searches for novel lead compounds based on chemical similarity. Similarity searching identifies molecules that are structurally similar to a target compound, which is beneficial in the discovery of new medicines. In most traditional similarity methods, molecular features related and unrelated to biological activity are given equal weight. However, it has been shown that some distinguishing features are more significant than others, depending on the chemical structure, so the more significant features should be assigned higher weights. The main objective of this study is to optimize the weights of different similarity measures in data fusion for chemical database searching by applying a genetic algorithm (GA). Various coefficient fusions were compared. The results show that the Tanimoto, Cosine, Kulczynski (2) and Fossum coefficients are the best single coefficients. The Cosine and Fossum coefficients gave the best 2-coefficient fusion, with weightings of 0.960 and 0.937, respectively. For 3-coefficient fusion, the Russell-Rao, Tanimoto and Cosine coefficients, with weightings of 0.972, 0.960 and 0.960, respectively, gave the best result. Combinations of the Tanimoto and Cosine coefficients perform well and retrieve a large number of actives. Combinations using weights between 0.0 and 1.0 generated by the genetic algorithm retrieved more actives than the non-weighted combinations. Combining the Cosine and Fossum coefficients without weights yields an average of 21.89% among the top 10% of compounds, whereas using the GA to combine them with weights of 0.960 and 0.937, respectively, yields an average of 22.16% among the top 10% of compounds. In general, combinations of coefficients performed better than single coefficients.
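
The weighted fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the fusion rule, the fingerprint representation, or the GA operators, so the code assumes a weighted sum of similarity scores, fingerprints stored as sets of "on" bit positions, and a simple GA (truncation selection, uniform crossover, Gaussian mutation). All function names and parameters are hypothetical, and only the Tanimoto and Cosine coefficients are shown.

```python
import math
import random


def tanimoto(x, y):
    """Tanimoto coefficient: shared bits / bits set in either fingerprint."""
    union = len(x | y)
    return len(x & y) / union if union else 0.0


def cosine(x, y):
    """Cosine coefficient for binary fingerprints: a / sqrt(|x| * |y|)."""
    denom = math.sqrt(len(x) * len(y))
    return len(x & y) / denom if denom else 0.0


def fused_score(query, mol, weights, coeffs=(tanimoto, cosine)):
    """Assumed fusion rule: weighted sum of the individual coefficients."""
    return sum(w * c(query, mol) for w, c in zip(weights, coeffs))


def actives_in_top(query, database, active_ids, weights, fraction=0.1):
    """Count known actives in the top `fraction` of the fused ranking."""
    ranked = sorted(
        database,
        key=lambda mol_id: fused_score(query, database[mol_id], weights),
        reverse=True,
    )
    cutoff = max(1, int(len(ranked) * fraction))
    return sum(1 for mol_id in ranked[:cutoff] if mol_id in active_ids)


def optimise_weights(query, database, active_ids, n_weights=2,
                     pop_size=20, generations=30, seed=0):
    """Toy GA searching for weights in [0.0, 1.0] (operators are assumptions)."""
    rng = random.Random(seed)

    def fitness(w):
        return actives_in_top(query, database, active_ids, w)

    pop = [[rng.random() for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]   # uniform crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.05)))  # clamped mutation
                     for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

With a dictionary mapping compound identifiers to fingerprints and a set of known active identifiers, `optimise_weights(query, database, active_ids)` returns one weight per coefficient. The fitness here is the raw count of actives in the top 10% of the ranking for a single query; averaging over several queries, as the reported percentages suggest, would be a natural extension.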


Compounds; Data Fusion; Similarity; Chemical Database; Genetic Algorithm; Coefficients