PDF Converter API and Server Platform 7.2 - Extract text using OCR, MSG Improvements

Posted at: 17:55 on 09 April 2014 by Muhimbi


We are happy to announce version 7.2 of the popular Muhimbi PDF Converter API and Server Platform. This release builds on the OCR facility and MSG improvements introduced in the previous version, adding support for extracting text from bitmap-based content and for rendering MSG-based calendar entries.

A quick introduction for those not familiar with the product: The Muhimbi PDF Converter API and Server Platform is an ‘on premises’ server-based SDK that allows software developers to convert typical Office files to PDF format using a robust, scalable but friendly Web Services interface from Java, .NET, Ruby & PHP based solutions. It supports a large number of file types, including MS-Office and ODF file formats as well as HTML, MSG (email), EML, AutoCAD and image-based files, and is used by some of the largest organisations in the world for mission-critical document conversions. In addition to converting documents, the product ships with a sophisticated watermarking engine, PDF splitting and merging facilities, an OCR facility and the ability to secure PDF files. A separate SharePoint-specific version is available as well.

  Example of a converted Calendar entry with an (OLE) embedded Excel sheet

In addition to the changes listed above, some of the main changes and additions in the new version are as follows:

ID | Area | Type | Description
2100 | Excel | New | Optionally scale Excel to page width & height
2059 | HTML | Fix | System.ArgumentException: uri - string can not be empty
1996 | HTML | Improvement | Reduce white space causing occasional extra empty PDF pages at end of file
1802 | Merging | Fix | Bookmark targets bottom of page
2093 | Merging | Fix | "Unexpected token Unknown before 107448" while merging file
2078 | Merging | Fix | Kernel Error while loading PDF
2073 | Merging | Fix | System.IndexOutOfRangeException while merging
2074 | Merging | Fix | System.NullReferenceException while merging
2075 | Merging | Fix | System.NullReferenceException while merging
2076 | Merging | Fix | Some HTML Converted files cannot be saved in Acrobat Pro after merging
2126 | MSG | Fix | "System.InvalidOperationException: Stack empty" during conversion of 3rd party generated MSG files
2133 | MSG | Fix | "Parameter is not valid" during conversion of 3rd party generated MSG files
2136 | MSG | Fix | Content missing from converted MSG file
2106 | MSG | Fix | Fixed MSG body for 3rd party generated MSG files
2116 | MSG | Fix | Conversion of MSG files with an attached MSG that is signed
2124 | MSG | Fix | "System.IndexOutOfRangeException" Converting German email
2125 | MSG | Fix | Conversion of email never finishes
2105 | MSG | Fix | "Invalid Compressed RTF header" during conversion of 3rd party generated emails
2090 | MSG | Fix | Extra '}' in body text
2058 | MSG | Fix | No bookmark generated for certain attachments
2056 | MSG | Fix | ‘Sent date’ not correct on some 3rd party generated emails
2057 | MSG | Fix | Unicode converter issue (also with EML)
2088 | MSG | Improvement | Add support for attendees to meeting invitations
2086 | MSG | Improvement | Optionally throw error if embedded content is encountered that cannot be converted
2013 | MSG | Improvement | From address shows LDAP path
2046 | MSG | Improvement | Web Service support for MSGConverterFullFidelity.EmailAddressDisplayMode and FromEmailAddressDisplayMode
2087 | MSG | New | Convert the visual representation of embedded objects
2068 | MSG | New | Add support for the conversion of Calendar Entries
2050 | MSG | New | Add config value to allow MSG attachments list to be displayed, even when attachments are disabled
2113 | MSG/HTML | Fix | Rendering error in very long emails / HTML pages
2066 | MSG/HTML | Fix | Sometimes content is truncated on systems running IE9, IE10 or IE11
2005 | MSG/HTML | Fix | Fonts look weird in some emails
1786 | OCR | Fix | Handle leak during OCR
2054 | OCR | Fix | Some Mixed content (MS-Word files with scanned images) does not always OCR
1999 | OCR | Fix | Arabic training data causes exception
1788 | OCR | Improvement | Increase OCR Performance
2089 | OCR | Improvement | Update Diagnostics tool to display OCRed text
2081 | OCR | Improvement | In-line images are recognised but text is not placed on it correctly
1998 | OCR | Improvement | Add support for Hebrew
2048 | OCR | New | Support for extracting text from bitmap based content using OCR
2072 | Other | New | Allow timeouts to be specified on web service call
2102 | Watermarking | Fix | Chinese & Japanese fonts are not displayed in watermarks
2103 | Watermarking | Fix | Watermarking some documents causes problem in Adobe Reader 9

For more information check out the following resources:

As always, feel free to contact us using Twitter, our Blog, regular email or subscribe to our newsletter.

Download your free trial here (39MB).



