
Key Features

The OmniPage Search Indexer is the first product to make all PDF, fax and scanned document content visible to desktop search engines. Until now, image-based content was invisible to them: unless optical character recognition (OCR) is applied to an image, a search engine has no way to find the text it contains.

The Best - Speed and Accuracy

Accuracy is the single most important feature of any OCR program - without it, a search won't find your document. However, speed is also important, because indexing image documents requires the extra, time-consuming step of OCR. ScanSoft has more than six OCR engines in its product portfolio, which gives it the unique ability to apply the best balance of speed and accuracy to search indexing.

Complete PDF Indexing - Normal and Scanned

While most people may think there is a single flavor of PDF, in fact there are many:

  • PDF Normal
    When you create a PDF file from a desktop application, such as Microsoft Word or QuarkXPress, the PDF contains internal information that can be used by a search indexer. This includes text, font and positioning information. What matters when indexing PDF Normal documents is the speed of reading and indexing - exactly where ScanSoft technology excels.

  • PDF Image
    When a PDF file is created from a scanner, or from an online fax service such as eFax, the PDF does not contain text information (other than the file name). The PDF is an image of the document - similar to a photograph. ScanSoft's OCR technology re-creates the text information from the image content - without changing the original file. This is important, especially if the image document has legal implications, as a receipt, contract or piece of correspondence does.

  • PDF Image+Text
    This kind of PDF file is a hybrid of the two types described above. The visual document is a PDF image, but there is a hidden layer of text behind the image, which ideally matches the image. This isn't always the case, since a user can edit the text without changing the image representation. ScanSoft's PDF Overlay matching technology looks at the image and the hidden text, and compares the two.
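
The three PDF types described above can be told apart with a simple check, and the hidden text of an Image+Text file can be compared with OCR output. The following Python sketch illustrates the idea only; it is not ScanSoft's PDF Overlay matching technology. The Page record, its fields, the function names and the 0.9 threshold are all hypothetical, and a real indexer would fill the fields from a PDF parser and an OCR engine.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class Page:
    # Hypothetical summary of one PDF page (not a real SDK type).
    visible_text: str = ""        # text drawn normally on the page
    hidden_text: str = ""         # invisible text layer behind a page image
    has_page_image: bool = False  # True if the page is a scanned image


def classify_pdf(pages: List[Page]) -> str:
    """Return 'normal', 'image' or 'image+text' for a document."""
    # Any scanned page that carries a hidden text layer makes it a hybrid.
    if any(p.has_page_image and p.hidden_text.strip() for p in pages):
        return "image+text"
    # A scanned page with no text at all needs OCR before it can be indexed.
    if any(p.has_page_image and not p.visible_text.strip() for p in pages):
        return "image"
    return "normal"


def overlay_matches(hidden_text: str, ocr_text: str,
                    threshold: float = 0.9) -> bool:
    """Compare the hidden text layer with OCR output from the page image.

    Case and whitespace are normalized first; the threshold is an assumed
    value chosen for illustration.
    """
    a = " ".join(hidden_text.lower().split())
    b = " ".join(ocr_text.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

In this sketch, a document classified as "image" would be sent to OCR, while an "image+text" document whose hidden layer fails overlay_matches could be re-recognized from the image rather than trusted as is.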

Integration with Google Desktop Search

ScanSoft has a formal developer relationship with Google, which gives us access to the interfaces needed to add features seamlessly to Google Desktop Search. We also provide suggestions and comments back to Google, so that our combined products can deliver the most value to the user.

  • Indexing During Idle PC Time
    The OmniPage Search Indexer runs only when you are not using your PC. This means that your PC won't slow down while you are working, and that your content will already be indexed by the time you need it. Most users will find that all of their image content is indexed in a few hours or less.

  • Google Desktop Search Preferences
    You can turn the ScanSoft OmniPage Search Indexer on or off from the Desktop Preferences menu in Google Desktop Search.
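
Idle-time indexing of the kind described above can be pictured as a loop that checks for user activity before each file. The Python sketch below is an illustration under stated assumptions, not the product's actual scheduler: index_while_idle, idle_seconds, index_one and the 60-second threshold are hypothetical names and values, and idle detection itself is platform-specific, so it is passed in as a function.

```python
from collections import deque
from typing import Callable, Deque


def index_while_idle(pending: Deque[str],
                     idle_seconds: Callable[[], float],
                     index_one: Callable[[str], None],
                     min_idle: float = 60.0) -> int:
    """Index queued files one at a time while the PC stays idle.

    idle_seconds reports how long the user has been inactive. It is
    checked before every file, so indexing stops as soon as the user
    returns. Files that were not reached stay in the queue.
    """
    done = 0
    while pending and idle_seconds() >= min_idle:
        index_one(pending.popleft())
        done += 1
    return done
```

Because unprocessed files remain in the queue, the same call can simply be repeated the next time the PC becomes idle.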

Formats and Languages Indexed

ScanSoft's OmniPage Capture SDK, which was used to develop the OmniPage Search Indexer, supports a wide array of file formats for both import and export. The product also provides OCR for over 120 languages, including languages written in the Latin and Cyrillic scripts as well as Asian languages (Chinese, Japanese and Korean). ScanSoft is also developing Arabic OCR under contract to a government agency.

To reduce the download size of the OmniPage Search Indexer, ScanSoft has limited the initial beta release to the following:

  • Formats - PDF (Normal, Image and Image+Text); JPG/JPEG; TIF/TIFF; PaperPort MAX
  • Languages - English (US/UK), French, German and Italian
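
As an illustration of the format restriction above, the following Python sketch selects files by extension. The extension set is an assumption inferred from the format names listed (including .max for PaperPort MAX), and is_indexable is a hypothetical helper rather than part of the product.

```python
from pathlib import Path

# Extensions assumed to correspond to the beta formats listed above.
BETA_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".tif", ".tiff", ".max"}


def is_indexable(path: str) -> bool:
    """Return True if the file extension is one of the assumed beta formats."""
    return Path(path).suffix.lower() in BETA_EXTENSIONS
```

The comparison ignores case, so a scanner that writes SCAN.TIF is treated the same as one that writes scan.tif.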
© 2002-2009 Nuance Communications, Inc. All rights reserved.