Determining Feature-Size for Text to Numeric Conversion based on BOW and TF-IDF


Hasan J. Alyamani


Vol. 22  No. 1  pp. 283-287


Machine learning is the most popular method used in data science. Data growth involves not only numeric data but also text data. Most supervised and unsupervised machine learning algorithms operate on numeric data, so text data must first be converted into numeric form. Many techniques exist for this conversion, and researchers are often unsure which technique is best in which situation. This work studies BOW (Bag-of-Words) and TF-IDF (Term Frequency-Inverse Document Frequency) across different feature sizes to determine the better method. In experiments on text data, both TF-IDF and BOW performed better when the number of features was in the range of 100 to 150.


Machine Learning, Supervised and Unsupervised Learning, TF-IDF, BOW
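
As an illustration of the conversion the abstract describes, the following is a minimal Python sketch of BOW and TF-IDF vectorization with a capped feature size. It is not the author's implementation: the function names (`tokenize`, `build_vocabulary`, `bow_vector`, `tfidf_vectors`), the whitespace tokenizer, the toy corpus, and the unsmoothed `log(N / df)` IDF formula are all assumptions chosen for clarity. Libraries commonly use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real pipelines
    # usually also strip punctuation and stop words.
    return text.lower().split()


def build_vocabulary(docs, max_features):
    # Keep the max_features most frequent terms in the corpus.
    # Ties are broken alphabetically so the result is deterministic.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]


def bow_vector(doc, vocab):
    # Bag-of-Words: raw count of each vocabulary term in the document.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    # TF-IDF with the plain textbook IDF, log(N / df).
    # Every vocabulary term comes from the corpus, so df >= 1.
    n_docs = len(docs)
    token_sets = [set(tokenize(d)) for d in docs]
    df = {t: sum(t in toks for toks in token_sets) for t in vocab}
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    return [
        [tf * idf[t] for tf, t in zip(bow_vector(d, vocab), vocab)]
        for d in docs
    ]


# Hypothetical toy corpus; the paper's data set is not reproduced here.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# The feature size is the knob the paper studies (100 to 150 there);
# 3 is used here only because the toy corpus is tiny.
vocab = build_vocabulary(docs, max_features=3)
bow = [bow_vector(d, vocab) for d in docs]
tfidf = tfidf_vectors(docs, vocab)

print(vocab)   # ['the', 'on', 'sat']
print(bow[0])  # [2, 1, 1]
print(tfidf[0])
```

Changing `max_features` changes the length of every output vector, which is the quantity the study varies when comparing the two methods.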