Tesseract is the best program I have found for converting images to text on Ubuntu/Linux. I've tried several OCR (Optical Character Recognition) applications, and in my experience its accuracy is higher than any of the others.
Tesseract is a simple, easy-to-use command-line utility. It is cross-platform and, of course, free and open source software! It accepts various image formats as input and can recognize text in 60+ languages.
Installing Tesseract in Ubuntu / Linux
sudo apt-get install tesseract-ocr
You can also install additional language packs if required.
Now you can start using Tesseract:
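On Ubuntu and Debian, the language data is usually packaged as tesseract-ocr-<code>, where <code> is the three-letter code Tesseract uses for that language (hin for Hindi, for example). Here is a minimal sketch of that naming; the helper name lang_pkg is made up for illustration and is not part of Tesseract:

```shell
# Hypothetical helper: map a Tesseract language code to the
# Ubuntu/Debian package name that usually provides its data.
lang_pkg() {
    printf 'tesseract-ocr-%s\n' "$1"
}

# Example usage (left commented out, since it needs root):
#   sudo apt-get install "$(lang_pkg hin)"
#   tesseract --list-langs    # lists the languages Tesseract can see
```

If the package name is different on your distribution, search the package manager for "tesseract" to find the right one.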
tesseract your_scanned_file.png output_content
The result is saved to output_content.txt (Tesseract adds the .txt extension itself). To OCR text in another language, pass its language code with the -l option. You have to install that language pack first, of course.
For example, to scan an image that contains Hindi text:
tesseract your_scanned_paper.png output_content -l hin
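If you have a whole folder of scans, the same command can be wrapped in a loop. This is only a rough sketch, assuming the scans are PNG files; the function names out_base and ocr_dir are hypothetical, and the language defaults to eng if none is given:

```shell
# Hypothetical helper: strip the extension, so scans/page1.png -> scans/page1
# (tesseract appends .txt to the output base by itself).
out_base() {
    printf '%s\n' "${1%.*}"
}

# Hypothetical helper: OCR every PNG in a directory.
#   $1 = directory with scanned images
#   $2 = language code (optional, defaults to eng)
ocr_dir() {
    dir=$1
    lang=${2:-eng}
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue    # skip when the glob matched nothing
        tesseract "$img" "$(out_base "$img")" -l "$lang"
    done
}
```

Calling ocr_dir ~/scans hin would then write a .txt file next to each PNG in that folder. Several languages can be combined with a plus sign, e.g. -l eng+hin, provided both language packs are installed.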
Visit the official project page for more details about Tesseract.
