Optical Character Recognition using Python and Google Tesseract OCR


Anirudh Mergu - May 11, 2018

In this article, we will install Tesseract OCR on our system, verify the installation, and try Tesseract on a few sample images.

Step One – Installing Tesseract OCR

For macOS users, we’ll be using Homebrew to install Tesseract:

brew install tesseract

If you’re using the Ubuntu operating system, simply use apt-get to install Tesseract OCR:

sudo apt-get install tesseract-ocr

For Windows, please consult the Tesseract documentation.

Step Two – Verifying the Installation of Tesseract OCR

To validate that Tesseract has been successfully installed on your machine, execute the following command:

tesseract -v

You should see the Tesseract version printed on your screen, along with a list of the image file format libraries Tesseract is compatible with. For example:

tesseract 3.05.01
leptonica-1.74.1
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0

If the Tesseract version is not displayed on your screen, you may instead see a blank window open and close immediately.

If you get errors instead, re-install Tesseract and make sure its install directory is included in your PATH variable. If that does not help, try opening the console or the IDE you are using with administrative privileges.
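The same check can be scripted from Python. The sketch below is a minimal example rather than part of the original tutorial: the helper name `tesseract_version` is hypothetical, and it assumes the executable is called `tesseract`. It looks for the binary on PATH and returns the first line of the version banner, or None when the binary cannot be found.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of the version banner, or None if not found.

    `tesseract_version` is a hypothetical helper written for this sketch.
    """
    exe = shutil.which(executable)
    if exe is None:
        return None  # the executable is not on PATH

    # Some releases print the banner to stderr, so merge both streams.
    result = subprocess.run(
        [exe, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(tesseract_version() or "Tesseract was not found on PATH")
```

If Tesseract is installed but still not found, pytesseract also lets you point at the executable directly by setting `pytesseract.pytesseract.tesseract_cmd` to its full path.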

Step Three – Testing out Tesseract OCR

In order to obtain reasonable results, you need to supply images that are cleanly pre-processed and crisp.

Recommendations:

  • Use images with the highest resolution and DPI possible.
  • Make sure that the text is clearly visible, with no pixelation or deformation.
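As a rough illustration of the resolution advice, the sketch below computes how large a scan would need to be to reach a target DPI. The 300 DPI default is an assumption based on commonly cited Tesseract guidance, and `upscaled_size` is a hypothetical helper. Enlarging a blurry image does not add detail, so treat this as a starting point only.

```python
def upscaled_size(width, height, current_dpi, target_dpi=300):
    """Return the (width, height) needed to reach target_dpi.

    Hypothetical helper; the 300 DPI default is an assumption.
    Images already at or above the target keep their size.
    """
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    scale = max(1.0, target_dpi / current_dpi)
    return round(width * scale), round(height * scale)


# A 150 DPI scan of 800x600 pixels needs to be doubled in each dimension.
print(upscaled_size(800, 600, 150))
```

The resulting size could then be passed to OpenCV, for example `cv2.resize(image, upscaled_size(w, h, 150), interpolation=cv2.INTER_CUBIC)`.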

The GitHub repository for this tutorial will be available here

Let’s start coding now:

Create a file named ocr_main.py (the name is my choice; you can call it whatever you want).

1. Import necessary libraries

import cv2
import pytesseract
from PIL import Image

2. Get the path of the image file we are working on. I’m going to store the path to the file in a variable called path.

# Get the file path from the user
path = input("Enter the file path : ").strip()

3. Load the image data and store it in the variable image

# load the image
image = cv2.imread(path)
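One thing to watch for: cv2.imread does not raise an error when a file is missing or unreadable. It returns None, and the failure only shows up later as a confusing error in the next step. A minimal sketch of a guarded loader follows; `load_image` is a hypothetical helper, not part of the original script.

```python
import os


def load_image(path):
    """Load an image with OpenCV, failing early with a clear message.

    `load_image` is a hypothetical helper written for this sketch.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError("No such file: {}".format(path))

    # Imported here so the path check above works even without OpenCV.
    import cv2

    image = cv2.imread(path)
    if image is None:
        # imread returns None for files it cannot decode.
        raise ValueError("OpenCV could not decode: {}".format(path))
    return image
```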

4. Convert the image to grayscale for better recognition of text and store the data in gray

# Converting to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
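To see what the conversion does, here is a plain-Python sketch of the weighted sum that OpenCV documents for its RGB-to-gray conversion (Y = 0.299 R + 0.587 G + 0.114 B). OpenCV loads images in blue-green-red order, which is why the flag is COLOR_BGR2GRAY rather than an RGB variant. `bgr_to_gray` is a hypothetical helper for illustration, and OpenCV's own rounding may differ slightly.

```python
def bgr_to_gray(b, g, r):
    """Approximate the gray value of one pixel given in B, G, R order."""
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


print(bgr_to_gray(255, 255, 255))  # white keeps full brightness
print(bgr_to_gray(0, 0, 255))      # pure red becomes a fairly dark gray
print(bgr_to_gray(0, 255, 0))      # pure green becomes a lighter gray
```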

5. Optionally pre-process the image: enter 1 to apply Otsu thresholding, 2 to apply a median blur (labelled “Grey” in the menu), or anything else to skip this step.

temp = input("Do you want to pre-process the image ?\nThreshold : 1\nGrey : 2\nNone : 0\nEnter your choice : ").strip()
# If the user enters 1, apply Otsu thresholding; if the user enters 2, apply a median blur. Otherwise, do nothing.
if temp == "1":
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
elif temp == "2":
    gray = cv2.medianBlur(gray, 3)
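To make the two options concrete, here is a plain-Python sketch of both operations on a single row of pixels. It is an illustration only: `binary_threshold` uses a fixed cutoff, whereas THRESH_OTSU picks the cutoff automatically from the image histogram, and `median_filter_row` is a one-dimensional stand-in for the 3×3 window used by cv2.medianBlur(gray, 3). Both helper names and the repeat-the-edge border handling are assumptions made for this sketch.

```python
from statistics import median


def binary_threshold(row, cutoff, max_value=255):
    """Pixels above the cutoff become max_value, the rest become 0."""
    return [max_value if p > cutoff else 0 for p in row]


def median_filter_row(row, size=3):
    """Replace each pixel with the median of its neighbourhood (odd size)."""
    half = size // 2
    # Repeat the edge pixels so every position has a full window.
    padded = [row[0]] * half + list(row) + [row[-1]] * half
    return [median(padded[i:i + size]) for i in range(len(row))]


row = [12, 10, 255, 11, 200, 210, 205]
print(binary_threshold(row, 128))
print(median_filter_row(row))
```

In the second result the isolated bright pixel disappears, which is the kind of speckle noise a median blur is meant to remove before OCR.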

6. Save the pre-processed temporary file as temp.png

filename = "{}.png".format("temp")
cv2.imwrite(filename, gray)
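Writing to a fixed temp.png leaves the file behind and can clash if two runs share a directory. One possible alternative, sketched here with the standard library only, is a context manager that hands out a unique .png path and removes the file afterwards. `temporary_png` is a hypothetical helper, not something the original script uses.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_png():
    """Yield a unique .png path and delete the file afterwards."""
    fd, path = tempfile.mkstemp(suffix=".png")
    os.close(fd)  # only the path is needed; OpenCV writes the file itself
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


# Intended use with the variables from the steps above:
# with temporary_png() as filename:
#     cv2.imwrite(filename, gray)
#     ... run OCR on filename ...
```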

7. Apply OCR and print the output string.

text = pytesseract.image_to_string(Image.open(filename))
print(text)
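The returned string often carries blank lines or trailing whitespace, and some Tesseract versions append a form-feed character at the end of a page. A small clean-up step can make the output easier to read or compare. This is a sketch under those assumptions, and `clean_ocr_text` is a hypothetical helper.

```python
def clean_ocr_text(text):
    """Drop form feeds, trailing spaces, and blank lines from OCR output."""
    lines = []
    for line in text.replace("\x0c", "").splitlines():
        line = line.rstrip()
        if line:
            lines.append(line)
    return "\n".join(lines)


print(clean_ocr_text("You are awesome.\n\n\x0c"))
```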

And the final code will be:

import cv2
import pytesseract
from PIL import Image


def main():
    # Get the file path from the user
    path = input("Enter the file path : ").strip()

    # Load the image
    image = cv2.imread(path)

    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    temp = input("Do you want to pre-process the image ?\nThreshold : 1\nGrey : 2\nNone : 0\nEnter your choice : ").strip()

    # If the user enters 1, apply Otsu thresholding; if 2, apply a median blur
    if temp == "1":
        gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    elif temp == "2":
        gray = cv2.medianBlur(gray, 3)

    # Store the grayscale image as a temp file to apply OCR
    filename = "{}.png".format("temp")
    cv2.imwrite(filename, gray)

    # Load the temp file as a PIL/Pillow image and apply OCR
    text = pytesseract.image_to_string(Image.open(filename))

    print(text)


try:
    main()
except Exception as e:
    # Print the error details instead of a full traceback
    print(e.args)
    print(e.__cause__)
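The script above prompts for its inputs interactively. If you would rather pass them on the command line, one possible approach is argparse from the standard library. This is a sketch only: the `--preprocess` flag and its choices are hypothetical names, not options of the original script.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical non-interactive ocr_main.py."""
    parser = argparse.ArgumentParser(description="Run Tesseract OCR on an image.")
    parser.add_argument("path", help="path to the image file")
    parser.add_argument(
        "--preprocess",
        choices=["threshold", "blur", "none"],
        default="none",
        help="pre-processing to apply before OCR",
    )
    return parser


# Parse an explicit argument list here; a real script would call
# parse_args() with no arguments to read from sys.argv.
args = build_parser().parse_args(["sample1.png", "--preprocess", "threshold"])
print(args.path, args.preprocess)
```

The parsed values could then replace the two input() calls, with "threshold" and "blur" mapping to choices 1 and 2.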

Step Four – Putting Our Code to the Test

Here are some of the sample pictures to test Tesseract.

Normal Text

Italic Text

Handwriting

Before testing out Tesseract, I recommend downloading the GitHub repository from here

Text in bold represents output and the italic text indicates input.

Let’s try it on the first sample.

Sample 1

python ocr_main.py
Enter the file path: sample1.png
Do you want to pre-process the image?
Threshold: 1
Grey: 2
None : 0
Enter your choice: 1
You are awesome.

It works well on Sample Image 1. Let’s try it on Sample Image 2.

Sample 2

python ocr_main.py
Enter the file path: sample1.png
Do you want to pre-process the image?
Threshold: 1
Grey: 2
None : 0
Enter your choice: 1
Some italic text.

And finally on the last sample.

Sample 3

python ocr_main.py
Enter the file path: sample1.png
Do you want to pre-process the image?
Threshold: 1
Grey: 2
None : 0
Enter your choice: 1
Hawdwriting
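The last result is close but not exact. One way to put a number on how close is a similarity ratio from Python's difflib module. The sketch below assumes the handwriting sample reads "Handwriting", which is a guess based on the sample's label, and `similarity` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def similarity(ocr_text, expected):
    """Return a ratio between 0 and 1; 1 means the strings are identical."""
    return SequenceMatcher(None, ocr_text.strip(), expected.strip()).ratio()


# Assumption: the handwriting sample is expected to read "Handwriting".
print(round(similarity("Hawdwriting", "Handwriting"), 2))
```

A ratio just below 1 matches what we see here: a single misread character in an otherwise correct word.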

Thanks for taking the time to read this article. A big thumbs up to all of you.

If you have any queries regarding this article, I would be glad to help you out. Please let me know in the comments section below 🙂

Anirudh Mergu

https://anirudhmergu.com
Highly motivated graphic designer and coder aiming to inspire people using the latest technologies. An engineer by Profession, a designer by Heart ♥.