Abstract
Segmenting handwritten document images is a complex task because writing styles vary widely. A segmentation technique has to cope with non-uniformly skewed, overlapping and touching lines, yet very few works have addressed these issues so far. This paper presents a novel methodology for segmenting handwritten Malayalam documents into their constituent lines, words and characters while addressing these issues. A water flow technique is used to extract text lines, and an algorithm is proposed for dealing with touching and overlapping lines. Words are detected within the text lines using the Spiral Run Length Smearing Algorithm (SRLSA). Skew correction is then applied to the extracted words, and the skew-corrected words are passed on to character segmentation. Skew correction is incorporated to ease the recognition stage in handwritten Malayalam OCR.
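To give a feel for the word-detection step, the sketch below shows classic horizontal run-length smearing on a binary text-line image. It is not the spiral variant (SRLSA) named above, whose details are not given here. The function names `smear_row` and `word_spans`, the `max_gap` threshold, and the list-of-lists image format (1 for ink, 0 for background) are all assumptions made for illustration.

```python
def smear_row(row, max_gap):
    """Fill background runs of length <= max_gap lying between two ink pixels."""
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen
    for i, px in enumerate(row):
        if px:
            if last_ink is not None and 0 < i - last_ink - 1 <= max_gap:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


def word_spans(line_img, max_gap):
    """Return (start, end) column spans of words in a binary text-line image.

    Assumption: gaps inside a word are at most max_gap pixels wide,
    while gaps between words are wider.
    """
    width = len(line_img[0])
    smeared = [smear_row(r, max_gap) for r in line_img]
    # Column projection: a column counts as ink if any smeared row has ink there.
    profile = [any(r[c] for r in smeared) for c in range(width)]
    spans, start = [], None
    for c, has_ink in enumerate(profile):
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            spans.append((start, c - 1))
            start = None
    if start is not None:
        spans.append((start, width - 1))
    return spans


# Toy text line: two ink clusters separated by a four-pixel gap.
line = [
    [1, 1, 0, 1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 0, 0, 1, 0],
]
print(word_spans(line, max_gap=2))
```

With a small `max_gap` the two clusters stay separate; raising it above the inter-word gap merges them into a single span, which is why the threshold would need tuning to the handwriting at hand.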