Towards A New Token Based Framework for Record Linkage in Arabic Data Set


Hesham H. Abdel Ghafour, Ali El-Bastawissy, Abdelfatah A. Hegazy


Vol. 11, No. 6, pp. 146-151


Record linkage is the process of determining whether two records represent the same real-world entity. It is one of the most important and most thoroughly investigated issues in the data quality literature. However, most existing research has been applied to English data and does not describe the modifications needed to apply it in other contexts, such as Arabic. Applying record linkage algorithms to Arabic data is challenging because of the unique morphological and orthographical characteristics of the Arabic language. This paper proposes a token-based framework for record linkage in Arabic data sets. The framework uses a new technique for tokenizing Arabic names and a new approach to similarity computation.
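The abstract does not spell out the authors' tokenization technique or similarity measure. The following is only a minimal, generic sketch of token-based name matching for Arabic, not the paper's method. It assumes three things. First, a few common orthographic normalizations: stripping diacritics and tatweel, unifying alef variants, ة to ه, and ى to ي. Second, a hypothetical rule that joins compound-name prefixes (عبد, ابو) to the following word. Third, Jaccard similarity over the resulting token sets, with a hypothetical match threshold of 0.5.

```python
import re

# Arabic diacritics (tashkeel, U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

# Assumed orthographic normalization: spelling variants treated as equal.
_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0629": "\u0647",  # taa marbuta -> haa
    "\u0649": "\u064A",  # alef maqsura -> yaa
})

# Hypothetical prefixes that form one compound name with the next word.
_COMPOUND_PREFIXES = {"\u0639\u0628\u062F", "\u0627\u0628\u0648"}  # "abd", "abu"


def normalize(name: str) -> str:
    """Remove diacritics/tatweel and unify common letter variants."""
    return _DIACRITICS.sub("", name).translate(_CHAR_MAP)


def tokenize(name: str) -> list[str]:
    """Split a normalized name on whitespace, joining compound prefixes."""
    words = normalize(name).split()
    tokens: list[str] = []
    i = 0
    while i < len(words):
        if words[i] in _COMPOUND_PREFIXES and i + 1 < len(words):
            # e.g. "abd allah" and "abdallah" both become one token
            tokens.append(words[i] + words[i + 1])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two names."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_match(a: str, b: str, threshold: float = 0.5) -> bool:
    """Declare two names a match if similarity meets a hypothetical threshold."""
    return jaccard(a, b) >= threshold
```

For example, `jaccard("محمد عبد الله", "محمد عبدالله")` returns `1.0` because the spaced and joined spellings of the compound name yield the same token. `jaccard("أحمد علي", "احمد محمد")` returns about `0.333`: the two names share one normalized token (احمد) out of three distinct tokens. A real framework would likely also score near-matching tokens with a character-level measure rather than requiring exact token equality.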


Arabic Data Cleaning, Data Quality, Duplicate Detection, Data Warehouse, Entity Resolution, Record Linkage, Object Identification, String Similarity