Reconstruction of a Complete Dataset from an Incomplete Dataset by Expectation Maximization Technique: Some Results


Sameer S. Prabhune, S.R. Sathe


Vol. 10  No. 11  pp. 141-144


Preprocessing is a crucial step in a variety of data warehousing and data mining tasks. Real-world data is noisy and often suffers from corrupted or incomplete values that may affect the models built from it. The accuracy of any mining algorithm depends greatly on its input datasets. In this paper we describe a novel idea for predicting the missing values in a dataset using the well-known principle of EM (Expectation Maximization). Once the EM filter has been implemented and applied, the dataset is completed with values estimated by expectation maximization over the attribute instances. We demonstrate the efficacy of the approach on real datasets as a preprocessing step. The first section gives a brief introduction to the topic chosen for the implementation. The second section describes the preliminary tools required to develop this EM-based filter. The third section gives the pseudocode for the EM technique for estimating the missing values. The fourth section discusses the implementation details for designing this EM filter and adding it to the WEKA workbench (WEKA version 3-5-4). Lastly, experimental results on real-world datasets demonstrate the effectiveness of our method.


Data mining, Data preprocessing, Missing data
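
The abstract does not reproduce the paper's pseudocode or its WEKA filter, so the following is only a minimal sketch of the general idea of EM-based imputation, not the authors' implementation. It assumes a simplified setting that is not taken from the paper: two numeric attributes that are jointly normal, where x is fully observed and y has missing entries marked as None. The function name em_impute and all parameter names are hypothetical.

```python
from typing import List, Optional


def em_impute(x: List[float], y: List[Optional[float]],
              max_iter: int = 500, tol: float = 1e-10) -> List[float]:
    """Fill missing entries of y (None) by EM under a bivariate normal model.

    Hypothetical helper for illustration only; x must be fully observed.
    """
    n = len(x)
    observed = [v for v in y if v is not None]
    if not observed:
        raise ValueError("y needs at least one observed value")
    mu_x = sum(x) / n
    s_xx = sum((a - mu_x) ** 2 for a in x) / n
    if s_xx == 0:
        raise ValueError("x must not be constant")

    # Initial guess: mean and variance of the observed y, zero covariance.
    mu_y = sum(observed) / len(observed)
    s_yy = sum((v - mu_y) ** 2 for v in observed) / len(observed)
    s_xy = 0.0

    for _ in range(max_iter):
        # E-step: expected y and y^2 for each row given x and current parameters.
        slope = s_xy / s_xx
        resid_var = s_yy - slope * s_xy  # conditional variance of y given x
        e_y = [v if v is not None else mu_y + slope * (a - mu_x)
               for a, v in zip(x, y)]
        e_y2 = [e * e + (0.0 if v is not None else resid_var)
                for e, v in zip(e_y, y)]

        # M-step: re-estimate the parameters from the expected statistics.
        new_mu_y = sum(e_y) / n
        new_s_yy = sum(e_y2) / n - new_mu_y ** 2
        new_s_xy = sum(a * e for a, e in zip(x, e_y)) / n - mu_x * new_mu_y

        change = max(abs(new_mu_y - mu_y), abs(new_s_yy - s_yy),
                     abs(new_s_xy - s_xy))
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if change < tol:
            break

    # Complete the dataset with the conditional means of the missing values.
    slope = s_xy / s_xx
    return [v if v is not None else mu_y + slope * (a - mu_x)
            for a, v in zip(x, y)]


# Toy data (made up for this sketch): observed pairs satisfy y = 2x + 1.
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, None, 9, None, 13]
print(em_impute(x, y))
```

Because the observed pairs in this toy data lie exactly on y = 2x + 1, the two imputed values converge towards 7 and 11. A filter for real datasets would need to handle more than two attributes, missing values in any column, and nominal attributes, none of which this sketch attempts.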