A Novel Information Search Approach for Languages without Word Delimiters


Lianlong Wu


Vol. 6  No. 5  pp. 59-63


Many languages have no word delimiters in their text, which makes articles in those languages difficult to index. Chinese search engines, for example, routinely face the problem of segmenting an article into Chinese words. This paper proposes a suffix tree based search approach that avoids Chinese word segmentation altogether. Suffix tree algorithms are studied and a set of optimal algorithms for index building is proposed. Based on these algorithms, a prototype Chinese information search system is developed and applied to the Chinese Web Test collection of 100 GB of Web pages (CWT-100g). The experimental results show that the system can search Chinese information without word segmentation and that index build time is reduced to the theoretical limit.
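
To illustrate why a suffix-based index removes the need for word segmentation, here is a minimal sketch. It is not the paper's algorithm: it uses a naively built suffix array as a simple stand-in for a suffix tree, and the function names `build_suffix_array` and `find_occurrences` are hypothetical. The point it shows is that every substring of the text is a prefix of some suffix, so any query string can be located without first deciding where words begin and end.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction for illustration only; it is far slower than the
    linear-time suffix tree construction the abstract refers to.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], query: str) -> list[int]:
    """Return the sorted start positions at which `query` occurs in `text`."""
    if not query:
        return []
    m = len(query)

    # Lower bound: first suffix whose first m characters are >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > query.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


# Assumed sample text: unsegmented Chinese with no word delimiters.
text = "中文信息检索系统支持中文检索"
sa = build_suffix_array(text)

print(find_occurrences(text, sa, "检索"))  # [4, 12]
print(find_occurrences(text, sa, "中文"))  # [0, 10]
print(find_occurrences(text, sa, "搜索"))  # []
```

The query "检索" is found at both positions even though the text was never split into words; the index treats the document as a plain character sequence.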


Search engine, segmentation of Chinese words, suffix tree, information system