Image to Text converter (OCR) for Ubuntu / Linux Mint

Tesseract is, in my experience, the best program for converting images to text on Ubuntu/Linux. I’ve tried several OCR (Optical Character Recognition) applications, and its accuracy has been higher than that of any of the others.

Tesseract is a simple, easy-to-use command-line utility. It’s a cross-platform application and, of course, free and open source software! It accepts various input image formats and can recognize text in 60+ languages.

Installing Tesseract on Ubuntu / Linux Mint

sudo apt-get install tesseract-ocr

You can also install additional language packs if required. On Ubuntu each one is packaged as tesseract-ocr-<code>, for example tesseract-ocr-hin for Hindi.

Now you can start using Tesseract:
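
To check whether a particular language is already available before installing anything, a small helper along these lines may be useful. This is only a sketch: has_lang is a hypothetical name chosen for this example, and it assumes that tesseract --list-langs prints a header line followed by one language code per line.

```shell
# has_lang is a hypothetical helper name, not part of Tesseract.
# It succeeds (exit status 0) if the given language code is installed.
has_lang() {
    # --list-langs prints a header line, then one language code per line.
    # Some older releases write the list to stderr, hence the 2>&1.
    tesseract --list-langs 2>&1 | grep -qx -- "$1"
}
```

It could then be used as: has_lang hin || sudo apt-get install tesseract-ocr-hin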

tesseract your_scanned_file.png output_content

The results will be saved to the file output_content.txt. To OCR text in another language, pass its language code with the -l option (you will, of course, have to install that language pack first).

For example, to scan an image that contains Hindi text:

tesseract your_scanned_paper.png output_content -l hin
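
If you have a whole folder of scans, the same command can be wrapped in a loop. The following is a minimal sketch, not an official recipe: ocr_dir and its two arguments are hypothetical names for this example, and it assumes the images are PNG files and that the requested language pack is installed.

```shell
#!/usr/bin/env bash
# ocr_dir is a hypothetical helper: it runs tesseract on every PNG in a directory.
# Usage: ocr_dir [directory] [language-code]
ocr_dir() {
    local dir="${1:-.}"      # directory holding the scanned images (default: current)
    local lang="${2:-eng}"   # language code passed to -l (default: English)
    local img
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue   # skip the unexpanded pattern when nothing matches
        # Strip the .png extension; tesseract appends .txt to the output name itself.
        tesseract "$img" "${img%.png}" -l "$lang"
    done
}
```

For instance, ocr_dir ~/scans hin would write one .txt file next to each PNG in ~/scans.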

Visit the official project page for more details.