The OCR problem

Many documents are now scanned and stored as images, so they must be converted back into text before a computer can edit or process them. This conversion, known as optical character recognition (OCR), involves a number of challenges:

  • Variety of text fonts
  • Text size
  • Image noise
  • Handwritten text (manuscripts)

Despite these challenges, humans can easily read text even in a poor-quality image, because they are already familiar with the characters and words of their language. An algorithm must likewise become acquainted with these elements (characters, digits, punctuation, and so on) in order to recognize text in images successfully.

Simplifying the task – digit recognition

Although a variety of OCR tools are available on the market, properly recognizing text in images is still a big challenge for an algorithm. We will therefore restrict our application to a small domain and deal with a relatively simple problem: in this chapter, we will implement a neural network to recognize the digits 0 to 9 represented in images. For the sake of simplicity, the images will have small, standardized dimensions.

Approach to digit representation

We use a standard dimension of 5 × 5 pixels for the grayscale images, which yields 25 grayscale values per image, as shown in the following figure:

[Figure: Approach to digit representation]

In the preceding figure, the circular shape on the left represents the digit 0, and the accompanying matrix holds the gray values for the same digit.

We apply this preprocessing to represent all 10 digits in this application.
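
To make the representation concrete, the following is a minimal sketch in Python of how a 5 × 5 grayscale digit could be turned into the 25 values described above. The pixel values in DIGIT_ZERO are illustrative and are not taken from the figure, the convention that 0 is black and 255 is white is an assumption, and the function names flatten_image and normalize are hypothetical. Scaling the values to the range 0 to 1 is a common optional step before feeding a neural network, not something this section prescribes.

```python
# Hypothetical 5 x 5 grayscale image of the digit 0.
# Values are illustrative (assumed convention: 0 = black, 255 = white).
DIGIT_ZERO = [
    [255,   0,   0,   0, 255],
    [  0, 255, 255, 255,   0],
    [  0, 255, 255, 255,   0],
    [  0, 255, 255, 255,   0],
    [255,   0,   0,   0, 255],
]


def flatten_image(pixels, rows=5, cols=5):
    """Flatten a rows x cols grid into a single list, row by row."""
    if len(pixels) != rows or any(len(row) != cols for row in pixels):
        raise ValueError(f"expected a {rows} x {cols} image")
    return [value for row in pixels for value in row]


def normalize(values, max_value=255):
    """Scale gray levels from [0, max_value] to [0.0, 1.0] (assumed step)."""
    return [value / max_value for value in values]


# 25 values, one per pixel, usable as the inputs for one digit image.
features = normalize(flatten_image(DIGIT_ZERO))
print(len(features))
```

Each of the 10 digits would get its own grid of the same size, so every image produces a list of exactly 25 inputs regardless of which digit it shows.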
