Publication date: 2016/02/14
Pages: 1–5
This paper proposes a fast, robust, strip-based text detection algorithm for low-cost embedded devices such as scanners and printers, designed to operate with minimal memory. In such devices the whole document is never available at once, and this, together with tight memory and processing-speed constraints, poses a significant challenge. Whereas conventional approaches process the whole image/page with computationally intensive algorithms to obtain the desired result, our algorithm processes the page one strip at a time, efficiently in terms of both speed and memory allocation. To this end, a DCT block-based approach, combined with appropriate pre- and post-processing algorithms, is used to create a map of text pixels from the original page while suppressing any non-text background, graphics, or images. The proposed algorithm detects text pixels in documents with varying backgrounds, colors, and non-textual content. The algorithm was implemented in both MATLAB and C and tested on a wide variety of documents, using a BeagleBoard to emulate a CPU with limited processing power. The average execution time for a full 8.5 × 11 inch page scanned at 300 dpi is approximately 0.5 s with the C implementation and about 3 s on the BeagleBoard.
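To illustrate the DCT block-based idea and the strip-at-a-time memory model, here is a minimal Python sketch. It is not the authors' implementation: the 8-row strip height, the text score (sum of absolute AC coefficients of each 8×8 DCT block), the threshold of 200, and the helper names `dct2`, `classify_strip`, `detect_text`, and `synthetic_rows` are all hypothetical, and the paper's pre- and post-processing steps are omitted.

```python
import math

BLOCK = 8  # assumed 8x8 DCT blocks (JPEG-style); one strip = BLOCK image rows

# Orthonormal 1-D DCT-II basis: BASIS[u][x] = c(u) * cos((2x + 1) * u * pi / 16)
BASIS = [
    [(math.sqrt(1 / BLOCK) if u == 0 else math.sqrt(2 / BLOCK))
     * math.cos((2 * x + 1) * u * math.pi / (2 * BLOCK))
     for x in range(BLOCK)]
    for u in range(BLOCK)
]


def dct2(block):
    """Separable 2-D DCT-II of an 8x8 block; result is indexed [v][u]."""
    # 1-D DCT along each row: rows[y][u]
    rows = [[sum(BASIS[u][x] * r[x] for x in range(BLOCK)) for u in range(BLOCK)]
            for r in block]
    # 1-D DCT along each column of the row-transformed block
    return [[sum(BASIS[v][y] * rows[y][u] for y in range(BLOCK)) for u in range(BLOCK)]
            for v in range(BLOCK)]


def classify_strip(strip, threshold=200.0):
    """Flag each 8x8 block of an 8-row strip as text (True) or non-text (False).

    Hypothetical score: sum of |AC coefficients|. Sharp text edges spread
    energy into AC terms; flat background leaves almost all energy in DC.
    Trailing columns that do not fill a whole block are ignored.
    """
    width = len(strip[0])
    flags = []
    for bx in range(0, width - BLOCK + 1, BLOCK):
        coeffs = dct2([row[bx:bx + BLOCK] for row in strip])
        ac = sum(abs(coeffs[v][u])
                 for v in range(BLOCK) for u in range(BLOCK) if (u, v) != (0, 0))
        flags.append(ac > threshold)
    return flags


def detect_text(rows, threshold=200.0):
    """Consume grayscale rows one at a time, buffering only one strip.

    Returns a block-level text map (one list of flags per strip).
    A final partial strip (< BLOCK rows) is dropped in this sketch.
    """
    strip, block_map = [], []
    for row in rows:
        strip.append(row)
        if len(strip) == BLOCK:
            block_map.append(classify_strip(strip, threshold))
            strip = []  # release the strip: memory stays at one strip
    return block_map


def synthetic_rows(height=16, width=24):
    """Hypothetical 'scanner': flat white on the left, 0/255 stripes on the right."""
    for _ in range(height):
        yield [255 if x < 16 else (0 if x % 2 == 0 else 255) for x in range(width)]


if __name__ == "__main__":
    print(detect_text(synthetic_rows()))  # [[False, False, True], [False, False, True]]
```

Here `synthetic_rows` stands in for a scanner that delivers one row at a time, so the full page is never held in memory. The flat white blocks have near-zero AC energy and are marked non-text, while the high-contrast stripes exceed the threshold. A real detector would also need to reject busy photographs and graphics, which this simple score does not attempt.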