The key to optimizing OCR for camera images is to binarize the image correctly. Binarization is the process of converting an image to black and white. Tesseract does this internally, but it can make mistakes, particularly when the page background is unevenly dark or the image suffers from uneven lighting or blur, all of which are typical of camera images.

Noise is also a problem. Noise is random variation of brightness or colour in an image, and it can make the text harder to read. Tesseract cannot remove certain types of noise in its binarization step, which can cause accuracy to drop. You can find more information about this problem here: https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality

The DevScope OCR SDK has a special parameter for processing this kind of camera image. The SDK tries to produce the best possible image, but it cannot succeed in every case. When feeding the OCR engine a camera-based image, make sure that you set UseLocalAdaptiveThresholding to true, as in the example below:

    // Create the OCR job request
    var request = new TesseractOcrJobRequest();
    request.AutoDeskew = true;
    request.OrientationMode = TesseractOcrOrientationMode.None;
    request.JobName = "MyJob " + DateTime.Now.ToString();
    ....
    request.FileName = pathToImageDocument;
    request.UseLocalAdaptiveThresholding = true;
    ...

This setting activates features for handling photos of documents, but not text in natural scenes. Internally, the SDK uses a combination of methods to improve the image quality. Bear in mind that if you are trying to recognize text in natural scenes, this setting will give you poor results. Recognizing text in natural images is a very advanced topic and requires pre-processing designed specifically for that kind of image.
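To make the idea of local adaptive thresholding more concrete, here is a minimal, generic sketch in C#. It is not the SDK's internal implementation, which the text above does not describe. The class AdaptiveThresholdSketch, the method Binarize and the parameters radius and offset are hypothetical names chosen for this illustration. The sketch assumes an 8-bit grayscale image stored as a byte[,] array and marks a pixel as ink when it is darker than the mean of its neighbourhood by more than a given offset, so the decision adapts to uneven lighting instead of relying on a single global cut-off.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the DevScope OCR SDK.
public static class AdaptiveThresholdSketch
{
    // gray:   8-bit grayscale pixels indexed as [row, column]
    // radius: half-size of the square neighbourhood around each pixel
    // offset: how much darker than the local mean a pixel must be to count as ink
    // Returns true for ink (black) pixels and false for background (white) pixels.
    public static bool[,] Binarize(byte[,] gray, int radius, int offset)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);

        // Integral image: sum[y, x] is the sum of all pixels in rows < y and columns < x.
        var sum = new long[h + 1, w + 1];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                sum[y + 1, x + 1] = gray[y, x] + sum[y, x + 1] + sum[y + 1, x] - sum[y, x];

        var ink = new bool[h, w];
        for (int y = 0; y < h; y++)
        {
            // Clamp the neighbourhood to the image borders.
            int y0 = Math.Max(0, y - radius), y1 = Math.Min(h, y + radius + 1);
            for (int x = 0; x < w; x++)
            {
                int x0 = Math.Max(0, x - radius), x1 = Math.Min(w, x + radius + 1);
                long area = (long)(y1 - y0) * (x1 - x0);
                long total = sum[y1, x1] - sum[y0, x1] - sum[y1, x0] + sum[y0, x0];

                // Same test as "pixel < localMean - offset", written without division.
                ink[y, x] = gray[y, x] * area < total - offset * area;
            }
        }
        return ink;
    }
}
```

A call such as AdaptiveThresholdSketch.Binarize(gray, 2, 20) compares each pixel with the mean of its 5 x 5 neighbourhood. On a page whose background gets steadily darker from one side to the other, this keeps the background white throughout, whereas a single global threshold would turn the darker side black.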