Title

A Finite State Model for Urdu Nastalique Optical Character Recognition

Author

Sohail Abdul Sattar, Shams-ul Haque, Mahmood Khan Pathan

Citation

Vol. 9  No. 9  pp. 116-122

Abstract

Finite state technology has long been used to model NLP (Natural Language Processing) applications, and it has been applied especially successfully to machine translation and speech recognition systems. Character recognition in cursive scripts and in handwritten Latin script has also attracted researchers' attention, and some research has been done in this area. Optical character recognition (OCR) is the translation of optically scanned bitmaps of printed or written text into digitally editable data files. OCR systems developed for many of the world's languages are already in effective use, but none exists for Nastalique, a calligraphic adaptation of the Arabic script used for Urdu, just as Jawi is an adaptation for Malay. Urdu has 39 characters against Arabic's 28. Each character has two to four different shapes depending on its position in the word: initial, medial, final, or isolated. In Nastalique, the overlapping of words and characters makes optical recognition more complex, whereas optical character recognition of the Latin script is relatively easier. This paper, based on research on Nastalique OCR, discusses a proposed finite state model for the optical recognition of printed Nastalique text.

Keywords

Nastalique, Script, Cursiveness, Feature, Automata, OCR

URL

http://paper.ijcsns.org/07_book/200909/20090915.pdf