Most OCR (Optical Character Recognition) utilities on the market convert scanned archives into PDF, which holds both the image and the recognized text in the same standard container. Alfresco provides a content transformation framework into which you can plug a third-party content transformation engine to convert a document from one format to another.
This gives you great flexibility when converting an image document, such as a TIFF file, to a machine-readable format such as PDF, RTF, or TXT.
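The plug-in idea can be pictured as a lookup from a (source format, target format) pair to a conversion function. The following is a minimal Python sketch of that idea only; `TransformationRegistry` and `fake_ocr_to_text` are hypothetical names invented for illustration and are not part of Alfresco's API.

```python
from typing import Callable, Dict, Tuple

# A transformer takes the source bytes and returns the converted bytes.
Transformer = Callable[[bytes], bytes]


class TransformationRegistry:
    """Hypothetical registry, for illustration only (not Alfresco's API)."""

    def __init__(self) -> None:
        self._transformers: Dict[Tuple[str, str], Transformer] = {}

    def register(self, source_mimetype: str, target_mimetype: str,
                 transformer: Transformer) -> None:
        # Plug in an engine for one source -> target pair.
        self._transformers[(source_mimetype, target_mimetype)] = transformer

    def is_supported(self, source_mimetype: str, target_mimetype: str) -> bool:
        return (source_mimetype, target_mimetype) in self._transformers

    def transform(self, data: bytes, source_mimetype: str,
                  target_mimetype: str) -> bytes:
        key = (source_mimetype, target_mimetype)
        if key not in self._transformers:
            raise ValueError(
                f"No transformer registered for {source_mimetype} -> {target_mimetype}")
        return self._transformers[key](data)


def fake_ocr_to_text(image_bytes: bytes) -> bytes:
    # Placeholder standing in for a real OCR engine.
    return b"recognized text (%d image bytes)" % len(image_bytes)


registry = TransformationRegistry()
registry.register("image/tiff", "text/plain", fake_ocr_to_text)
```

With this arrangement, supporting another target format is a matter of registering one more function; the code that requests a transformation does not change.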
The following figure illustrates the process of scanning a paper document with a network scanner and transferring it, in an image format, into the Alfresco repository. Once the image document is in the repository, you can trigger a business rule that converts it to a PDF document. You can still keep the image document in the repository for future reference. The quality and accuracy of the output PDF document depend on the OCR utility that you use for the transformation.
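The behavior of such a rule can be sketched in a few lines of Python. This is only a model of the flow described above, assuming a folder represented as a plain list; `Document`, `apply_ocr_rule`, and the `ocr_to_pdf` callable are hypothetical names, and the real rule would be configured in Alfresco rather than written this way.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    name: str
    mimetype: str
    content: bytes


def apply_ocr_rule(folder: List[Document], incoming: Document,
                   ocr_to_pdf: Callable[[bytes], bytes]) -> None:
    """Store the incoming document; if it is a TIFF, also store a PDF copy."""
    folder.append(incoming)  # the original image is kept for future reference
    if incoming.mimetype != "image/tiff":
        return  # the rule only applies to TIFF images
    stem = incoming.name.rsplit(".", 1)[0]
    pdf = Document(stem + ".pdf", "application/pdf", ocr_to_pdf(incoming.content))
    folder.append(pdf)


folder: List[Document] = []
scan = Document("invoice.tif", "image/tiff", b"II*\x00")
# The lambda is a placeholder for whichever OCR engine is plugged in.
apply_ocr_rule(folder, scan, lambda data: b"%PDF-1.4 placeholder")
print([d.name for d in folder])  # ['invoice.tif', 'invoice.pdf']
```

Passing the OCR step in as a callable mirrors the point made above: the rule stays the same, and the output quality depends entirely on the engine supplied.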
Intelliant sells an OCR-Alfresco bundle, which can be downloaded from their web site. You can find more information about their offerings at http://www.intelliant.fr/en/alfresco-ocr-bundle.php.
Their OCR utility is integrated with the Alfresco repository as a content transformation. Intelliant's OCR utility converts TIFF images into PDF, RTF, and TXT documents. Follow the tutorial provided on their web site to download and install the bundle.
Carry out the following steps to enable OCR in Alfresco:
<extension> folder.
You can follow this same process to integrate any OCR utility into Alfresco.
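As one illustration of what wrapping a different OCR utility might involve, the sketch below builds a command line for the open-source Tesseract engine, which is swapped in here purely as an example and is not the Intelliant utility. It assumes the standard `tesseract <image> <outputbase> [configfile]` invocation, where the `pdf` config produces a searchable PDF and the default output is plain text. `build_ocr_command` is a hypothetical helper name, and Tesseract does not produce RTF, so that format is rejected.

```python
from pathlib import Path
from typing import List

# Target formats this sketch supports, mapped to the extra Tesseract
# arguments needed. Plain text is Tesseract's default output, so it
# needs no config name; "pdf" selects the bundled PDF config.
_EXTRA_ARGS = {"txt": [], "pdf": ["pdf"]}


def build_ocr_command(source: str, target_format: str) -> List[str]:
    """Return the argument list that would OCR `source` into `target_format`."""
    if target_format not in _EXTRA_ARGS:
        raise ValueError(f"Unsupported target format: {target_format}")
    src = Path(source)
    # Tesseract appends the file extension to the output base itself.
    output_base = src.with_suffix("")
    return ["tesseract", str(src), str(output_base)] + _EXTRA_ARGS[target_format]


print(build_ocr_command("invoice.tif", "pdf"))
# ['tesseract', 'invoice.tif', 'invoice', 'pdf']
```

The returned list can be passed to `subprocess.run(command, check=True)` on a machine where Tesseract is installed. Building the command separately from running it keeps the format mapping easy to check without the OCR engine present.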