
OCR Implementation


Previously, you learned about Optical Character Recognition. This time, we are going to talk about OCR implementations.

In this topic, we will discuss how to implement OCR with different Python libraries, including Keras-OCR, EasyOCR, and our favourite Transformers. We will also point you to some libraries for Handwritten Text Recognition.

Implementation in Keras-OCR

Keras is a high-level API for TensorFlow (you import it as keras or tensorflow.keras). Keras-OCR is another Python library that offers a pre-trained OCR model, so you don't need to train and load a model yourself (as we have done in Hugging Face Transformers). The original model on which the library is based is stored in TensorFlow Hub.

Install the library in the following way:

!pip install keras-ocr

Though you install it as keras-ocr with a hyphen (-), you should import the library with an underscore (_). In addition, you will need the matplotlib library for visualization:

import matplotlib.pyplot as plt
import keras_ocr

Now, let's initialize the pipeline:

pipeline = keras_ocr.pipeline.Pipeline()

This uses the default pipeline. After initialization, we need to load the images. In our case, there are two of them:

  • A road sign somewhere in Portugal (public domain);

  • A picture of a metro tunnel with a train coming out of it; at the top there is a station name (Métro Sèvres-Babylone, Paris, France; photograph by Luidger, 27 August 2005).

Make keras-ocr read these photos with the following code:

images = [
    keras_ocr.tools.read(img) for img in ['street_sign.jpg',  ## Road sign
                                          'subway.jpg']       ## Paris metro
]

Now, let the model recognize all the characters:

prediction_groups = pipeline.recognize(images)
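
Before visualizing, you can also inspect the raw output; here is a minimal sketch (assuming the two images loaded above) that prints every recognized word along with one corner of its bounding box:

for image_predictions in prediction_groups:
    for word, box in image_predictions:
        ## each prediction is a (word, box) pair; box is a 4x2 array of corner coordinates
        print(word, box[0])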

Let's make a visualization:

fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))  ## one subplot per image
for ax, image, predictions in zip(axs, images, prediction_groups):
    keras_ocr.tools.drawAnnotations(image=image, predictions=predictions, ax=ax)  ## draw boxes and recognized words
plt.show()

The extracted characters are placed on the left and right sides of the image and coloured in red. Most characters are not recognized correctly, though.

Implementation in Transformers

A Vision Encoder Decoder model consists of a vision encoder and a decoder taken from a language Transformer. Vision Transformer (ViT) is a supervised Transformer model trained on the ImageNet-21k dataset. Bidirectional Encoder representation from Image Transformers (BEiT) is another vision Transformer model based on BERT. While ViT is a supervised model, BEiT is purely self-supervised; it was trained on the ImageNet-22k dataset. Other vision Transformer models include DeiT, DiT, ImageGPT, etc.
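
To see how such an encoder-decoder pair works end to end, here is a minimal sketch that runs a Vision Encoder Decoder checkpoint directly; it assumes the publicly available microsoft/trocr-base-printed model and the street_sign.jpg image from above:

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-printed')

image = Image.open('street_sign.jpg').convert('RGB')  ## any image with printed text
pixel_values = processor(images=image, return_tensors='pt').pixel_values  ## input for the vision encoder
generated_ids = model.generate(pixel_values)  ## the language decoder generates token ids
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])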

OCR models are available on Hugging Face since the Transformer architecture is applicable to this task, too. Take a look at the Manga OCR model; it's a Vision Encoder Decoder model.

from transformers import pipeline

model = pipeline('image-to-text', model='kha-white/manga-ocr-base')

Now let's give the model a picture as input.

predicted = model(['antonio-prado-ckApI31wbZw-unsplash.jpg'])

print(predicted[0][0]['generated_text'])

##  は ぁ 、 は ぁ っ

This model is poor at reading large text or signboards. Let's take another image as an example:

predicted = model(['ryoji-iwata-3vk8Dgkd_Sc-unsplash.jpg'])

print(predicted[0][0]['generated_text'])

##  そ う い え ば 、

Other possible implementations

There are many other possible implementations of OCR. Here are some libraries for common OCR tasks:

Tesseract is a package with an OCR engine and a command-line program. It supports 100+ languages and can analyze images in formats such as .jpg, .png, and .tif.
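
If you want to call Tesseract from Python, a minimal sketch with the pytesseract wrapper (our assumption here; the engine itself is a command-line tool and has to be installed separately) looks like this:

import pytesseract
from PIL import Image

## extract text from an image; lang='eng' selects the English language data
text = pytesseract.image_to_string(Image.open('street_sign.jpg'), lang='eng')
print(text)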

SwiftOCR, an OCR library written in Swift, is a fast and simple image recognition library that uses a neural network for image processing. It is best suited for recognizing short, one-line alphanumeric codes (e.g., gift card codes) rather than long, regular text.

EasyOCR is an excellent OCR library that supports more than 80 languages. You can process an image with just three lines of code (don't forget to install easyocr first):

import easyocr

reader = easyocr.Reader(['ru', 'en'])  ## you can choose several languages at once: ru - Russian, en - English
result = reader.readtext('image.jpg')  ## image.jpg -- your image
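
The result is a list of detections. Assuming the call above, each item contains the bounding box, the recognized text, and a confidence score, so you can go through them like this:

for bbox, text, confidence in result:
    print(text, round(confidence, 2))  ## recognized string and its confidence score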

Here is some software for Handwritten Text Recognition (HTR):

Tegaki is software for HTR in Japanese and Chinese. You first need to install the package as you would any other and then choose a model. Tegaki offers about a dozen models for Traditional/Simplified Chinese and Japanese.

Kraken is a good OCR system for Persian, Arabic, Urdu, Hebrew, Old and Modern French, Italian, English, and Turkic languages. The system is designed for HTR on historical texts.

Some other HTR software includes ABBYY FineReader (a mediocre solution), Pen to Print (a good one), MyScript Nebo (an app), and others.

Another option is to train your own model. Initially, we wanted to show you how to train a model in TensorFlow, but this would take too much space. Instead, we offer you a good notebook on Kaggle to check out for yourself.

Conclusion

In this topic, we've introduced you to the keras-ocr library, which allows you to use a model for OCR without any TensorFlow skills. We've also shown how to implement OCR with image-to-text Transformers models, and you've learned about some other libraries and apps for OCR and HTR.
