OCR.NET 2.0 SDK Tesseract Engine
**NEW** 1D barcode reader engine included
The 2.0 engine now includes a fast barcode detector engine capable of interpreting 1D Barcodes on a page image.
The codes that the engine is able to interpret are:
- Code 2 of 5
- Code Interleaved 2 of 5
- Code 93
- Code 39
- Codabar
- UPC A
- EAN 13
- Code 128
**UPDATED** Windows Phone 8.1 Updated Demo App with full source code
The OCR SDK 2.0 will soon be available for download. It contains the source code of a full demo app for Windows Phone 8.1 that includes advanced techniques and components for speed and memory management in picture handling: a custom camera control with auto-detection of the optimum capture resolution, an image viewer with pinch and zoom, image re-coding for low memory usage using the WP8 native DecodePixelHeight, and more.
The SDK also includes a Windows Forms application with full C# source code that demonstrates the use of the OCR engine on Windows with the .NET Framework 4.5.
Introduction
DevScope OCR SDK is an Optical Character Recognition toolkit based on Google's open-source Tesseract OCR v3. It lets you develop Microsoft .NET applications that accurately recognize characters in a scanned document image, without the need to track and pay for each desktop, server or mobile deployment.
It's 100% royalty free.
It is available as a free trial download or as a full-featured license. It is compatible with the Microsoft .NET Framework and is the first to support Windows Desktop and Server, Windows Phone 8.1 and Windows Store apps.
The Tesseract OCR engine was originally developed by Hewlett-Packard UK. It was one of the top three engines in the 1995 UNLV Accuracy test and is probably one of the most accurate open source OCR engines available. Since then it has been extensively revised with sponsorship from Google.
Licensing
- Per-developer licenses: This license type entitles the specified number of developer/build machines at a single physical address to write software with access to DevScope OCR SDK.
Main Features
- New 1D Barcode Reader Engine
- New image-processing actions for pre-processing an image before running the OCR engine
- New image format loading in the Windows Store and WP8 versions: PNG, TIFF, JPEG and BMP
- New dictionaries for MICR and OCR-A/B, optimized for reading numbers
- New DLLs for Windows x86, x64
- New DLLs for WP8 ARM, x86
- New DLLs for Windows Store apps ARM, x86
- New class reference and usage documentation
- New ability to OCR an image directly from a WriteableBitmap raw buffer.
- Full Unicode Support.
- Multi-thread, Multi-Instance Support. Optimal for batch processing.
- Works as async task on mobile devices for keeping the UI responsive
- Full-Featured C# demo application included.
- Character recognition confidence retrieval.
- Outputs a Document Object Model for easy navigation and extraction of the resulting OCR entities: block, paragraph, line, word and character location.
- Outputs results in Searchable PDF, Text, HOCR and UNLV formats.
- Outputs the optimized thresholded image used for OCR.
- Included support for document Auto-Deskew and Auto-Orientation detection.
- Stand-Alone document Auto-Orientation detection feature.
- Included support for Local Adaptive Binarization for processing camera-captured documents.
- Support for nearly 60 languages, including Arabic, Brazilian Portuguese, Bulgarian, Catalan, Chinese (Simplified and Traditional), Croatian, Czech, Danish, Dutch, English, Finnish, French, German (standard and Fraktur script), Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak (standard and Fraktur script), Slovenian, Spanish, Swedish, Tagalog, Thai, Turkish, Ukrainian and Vietnamese.
- Can recognize only digits, only alpha or only "white listed" characters.
- Can skip "black listed" characters.
- Multiple OCR engine context support. Allows the engine to process a document image as a single word, single character, text block, line, uniform block of text, vertical text and so on.
- Highly Optimized for fast area processing.
- Available in 32 bit, 64 bit and ARM versions.
- Includes an enhanced image viewer component with mouse-wheel zoom and region highlighting.
So why wouldn't I just use Tesseract? What are DevScope OCR benefits?
- Stability. The original Tesseract is built around a command-line process, where it does not matter much if it occasionally terminates, crashes or leaks memory. If you are running a modern in-process application you need safer behavior. DevScope OCR resolves these issues and presents you with a 100% stable platform.
- Performance. DevScope OCR is highly optimized for fast code and for Windows-based operating systems. It also adds multithread support, so you can spread the load over multiple CPUs or cores and use it safely from multithreaded APIs like ASP.NET.
- Compatibility. Tesseract is a 32-bit process and cannot be used in 64-bit applications. This is a significant issue now that so many operating systems are based on a 64-bit address space. DevScope OCR eliminates this restriction and allows you to run in either x86 or x64 mode by just referencing the appropriate assembly.
- Mobile. DevScope OCR is the first to run on ARM-based Windows Phone 8 and WinRT devices.
- Simplicity. We provide a single library DLL; all that is needed is to reference it in your project. It presents a clean and straightforward API, and a full-featured example is included so that you can start using it right away.
FAQ
Where can I use or evaluate the DevScope OCR SDK?
You can get the library and a 30-day trial license by clicking the "get free trial version" button. You can also purchase licenses here.
The library will need to be unlocked with a supplied key, see "How can I unlock the DevScope OCR SDK?".
How can I unlock the DevScope OCR SDK?
Just call the SetLicense() method, passing your company name, email and supplied license key as parameters. For example:
TesseractOcrEngine.SetLicense(companyName, email, suppliedKey);
Can I use DevScope OCR SDK for barcode recognition?
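Yes, as of version 2.0. The Tesseract engine itself is for text recognition only, but the 2.0 SDK includes a separate 1D barcode reader engine that interprets the barcode types listed at the top of this page.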
What image formats are supported?
The image formats that can be processed by the OCR engine are:
- bmp, jpg, png and tiff in the Windows version.
- bmp, jpg and png in the WP8 and WinRT versions (no tiff support yet).
Is there a Minimum Text Size? (It won't read screen text!)
There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be "noise removed".
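The quick check above can be sketched as a small calculation. This is only an illustration and is not part of the SDK: the class and method names are hypothetical, and the x-height ratio of 0.48 is an assumption chosen so that 10pt at 300dpi gives about 20 pixels, as stated above. Real fonts vary, so measuring the pixels in your own image is more reliable.

```csharp
using System;

public static class TextSizeCheck
{
    // Assumption: x-height is roughly 0.48 of the point size. Real fonts differ.
    public const double AssumedXHeightRatio = 0.48;

    // Estimates the x-height in pixels for a point size at a given resolution.
    public static double EstimateXHeightPixels(double pointSize, double dpi,
                                               double xHeightRatio = AssumedXHeightRatio)
    {
        double emPixels = pointSize * dpi / 72.0; // 1 point = 1/72 inch
        return emPixels * xHeightRatio;
    }

    // Applies the two thresholds quoted in this FAQ (8 and 10 pixels).
    public static string Classify(double xHeightPixels)
    {
        if (xHeightPixels < 8) return "likely removed as noise";
        if (xHeightPixels < 10) return "very little chance of accurate results";
        return "large enough to attempt OCR";
    }

    public static void Main()
    {
        // A 300dpi scan of 10pt text versus 12pt text rendered on a 96dpi screen.
        double scanned = EstimateXHeightPixels(10, 300);
        double onScreen = EstimateXHeightPixels(12, 96);
        Console.WriteLine("Scanned 10pt at 300dpi: " + Classify(scanned));
        Console.WriteLine("Screen 12pt at 96dpi: " + Classify(onScreen));
    }
}
```

Under these assumptions, 12pt text at a typical 96dpi screen resolution comes out below 8 pixels of x-height, which is why screen captures usually need to be upscaled before OCR.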
I am getting poor recognition performance.
DevScope OCR SDK is targeted at books and articles scanned on a flatbed scanner at 300-600dpi. It works well on a variety of other printed materials, in multiple languages. It will not work on the following inputs:
- handwriting
- unprocessed digital camera-captured documents
- text in photographic images
- CAPTCHAs
Where can I download all supported dictionary languages?
You can download each of the supported OCR languages by clicking on the following links.
Please note: you must unzip each download and place the dictionary files inside the tessdata folder.
Afrikaans,
Albanian,
Ancient Greek,
Arabic,
Azerbaijani,
Basque,
Belarusian,
Bengali,
Bulgarian,
Catalan,
Cherokee,
Chinese (Simplified),
Chinese (Traditional),
Croatian,
Czech,
Danish,
Dutch,
English,
Esperanto,
Esperanto alternative,
Estonian,
Finnish,
Frankish,
French,
Galician,
German,
Greek,
Hebrew,
Hebrew (community),
Hindi,
Hungarian,
Icelandic,
Indonesian,
Italian,
Japanese,
Kannada,
Korean,
Latvian,
Lithuanian,
Macedonian,
Malay,
Malayalam,
Maltese,
Middle English (1100-1500),
Middle French (ca. 1400-1600),
Norwegian,
Polish,
Portuguese,
Romanian,
Russian,
Serbian (Latin),
Slovakian,
Slovakian Fraktur,
Slovenian,
Spanish,
Swahili,
Swedish,
Tagalog,
Tamil,
Telugu,
Thai,
Turkish,
Ukrainian,
Vietnamese.
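Before running the engine, it can help to confirm that the dictionaries you need actually ended up in the tessdata folder. The following is a minimal sketch and not part of the SDK: the class name is hypothetical, and it assumes the unzipped files follow the Tesseract 3 naming convention of `<code>.traineddata` (for example `eng.traineddata`), which may differ for the files supplied here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TessdataCheck
{
    // Returns the language codes that have no data file in the tessdata folder.
    // Assumption: files are named "<code>.traineddata", as in Tesseract 3.
    public static List<string> FindMissingLanguages(string tessdataDir,
                                                    IEnumerable<string> languageCodes)
    {
        var missing = new List<string>();
        foreach (string code in languageCodes)
        {
            string path = Path.Combine(tessdataDir, code + ".traineddata");
            if (!File.Exists(path)) missing.Add(code);
        }
        return missing;
    }

    public static void Main()
    {
        // Hypothetical location: a tessdata folder next to the executable.
        string dir = Path.Combine(AppContext.BaseDirectory, "tessdata");
        foreach (string code in FindMissingLanguages(dir, new[] { "eng", "por" }))
            Console.WriteLine("Missing dictionary: " + code);
    }
}
```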
How do I perform OCR on a specific zone of an image?
- Load the image.
- Define the zone (also called the region of interest) by setting the appropriate request property.
For example: request.ScanArea = new Rectangle(100, 100, 250, 50);
- Perform the OCR using the DoOCR() method.
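A zone that extends past the edge of the image is an easy mistake to make when coordinates come from user input. The sketch below clips a zone to the image bounds before it is used. It is an illustration only: the helper class is hypothetical, and it assumes the Rectangle in the example above is System.Drawing.Rectangle, which may not hold on the WP8 and WinRT versions. The SDK calls themselves are shown only as a comment.

```csharp
using System;
using System.Drawing;

public static class ScanAreaHelper
{
    // Clips a requested zone to the image bounds.
    // Returns Rectangle.Empty when the zone lies entirely outside the image.
    public static Rectangle ClampToImage(Rectangle zone, int imageWidth, int imageHeight)
    {
        var imageBounds = new Rectangle(0, 0, imageWidth, imageHeight);
        return Rectangle.Intersect(zone, imageBounds);
    }

    public static void Main()
    {
        // Same zone as the example above: x=100, y=100, width=250, height=50.
        var zone = new Rectangle(100, 100, 250, 50);

        // Hypothetical image size that is smaller than the zone's right and bottom edges.
        Rectangle safeZone = ClampToImage(zone, 300, 120);

        if (safeZone.IsEmpty)
            Console.WriteLine("Zone is outside the image.");
        else
            Console.WriteLine("Clipped zone: " + safeZone);

        // The clipped rectangle would then be assigned to request.ScanArea
        // before calling DoOCR(), as described in the steps above.
    }
}
```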
Rules and advice
- If you find a bug, please create an issue by contacting our support service. Make sure you are able to replicate the problem with DevScope OCR SDK on the specific platform, and please check our forums as well.
- Use the latest official release (and check whether the problem has already been solved in a newer version).
- Use the correct language dictionary files.
- If you have a question, post it to the DevScope OCR SDK developer forum.
- Do not ask for support in comments - such comments will be deleted.
- Post example files. If your problem involves an input file, posting only the error messages is not sufficient; the source of the problem is usually hidden in the input file.
- Do not post programs or libraries - post a link where they can be downloaded.
- Try to find the optimal format for example images - a 20 MB image is not helpful. A multi-page tiff is useful only if you have a problem with multi-page functionality. For example, a 2-colour png provides the same information as a truecolour uncompressed tiff (Tesseract will convert it to 2 colours anyway).
- Copy the error message from the terminal/console/command-line window instead of sending a screenshot.
- Read the FAQ and search the forum and the issues (including closed ones) before you post your issue or question. It may already have been solved.