PDF Converter API and Server Platform 7.1 - Optical Character Recognition, EML and MSG overhaul

Posted at: 13:17 on 13 November 2013 by Muhimbi


We are happy to announce version 7.1 of our popular Muhimbi PDF Converter API and Server Platform, as well as the OCR and PDF/A Archiving API and Server Platform. The main new features are support for OCR (Optical Character Recognition) to convert scanned documents into fully searchable and indexable PDF files, and a completely overhauled converter for the EML (email) format that should particularly benefit organisations that don’t use MS-Outlook’s MSG format to store email.

 
A quick introduction for those not familiar with the product: the Muhimbi PDF Converter Services is an on-premises, server-based SDK that allows software developers to convert typical Office files to PDF format using a robust, scalable but friendly Web Services interface from Java, .NET, Ruby and PHP based solutions. It supports a large number of file types, including MS-Office and ODF file formats as well as HTML, MSG (email), EML, AutoCAD and image-based files, and is used by some of the largest organisations in the world for mission-critical document conversions. In addition to converting documents, the product ships with a sophisticated watermarking engine, PDF splitting and merging facilities, an OCR facility and the ability to secure PDF files. A separate SharePoint-specific version is available as well.
 

Scanned Document with OCRed text selected


In addition to the changes listed above, some of the main changes and additions in the new version are as follows (issue number, area, type and description):

1901 CAD Fix CAD Conversion - AccessViolationException
1931 CAD Improvement CAD Converter does not resolve externally referenced files
1850 CAD Improvement Add support for AutoCAD 2013
1916 Conversion Fix TIFF to PDF Conversion uses dimensions of first page for all pages
1853 Conversion Fix Post processing PDF generated from TIF as 'Screen Optimised' scrambles PDF
676 Conversion Improvement Excel Conversion - Add support for PDF/A
1930 Cross-conversion Fix Folder with Temp files cannot be deleted when converting DOC to HTML for some locales / regions
1879 EML New Implement conversion of RFC2045 / RFC5322 based EML files
1965 HTML Fix HTML Converter hangs on 0.5 page margin
1920 HTML Fix Not all URLs are recognised by HTML Converter
1827 HTML Fix HTML to PDF Conversion for some non-Roman languages loses characters
1840 HTML Fix Last line is truncated when converting HTML to PDF
1953 HTML Improvement Mixed fonts in same sentence are vertically offset when converting HTML to PDF
1940 HTML Improvement HTML Conversion doesn't convert unencoded quotes
1884 HTML Improvement Add configurable delay to HTML to PDF conversion for pages heavy on JavaScript / DHTML (e.g. pages containing Google Maps)
2009 InfoPath Improvement Fix InfoPath forms colour being lost on IE10 systems
1939 InfoPath Improvement InfoPath does not export to PDF well on systems with IE10
2010 Merging Fix System.NullReferenceException when saving merged file
2012 Merging Fix Internal hyperlinks are broken when merging documents
1990 Merging Fix Unexpected token DictionaryEnd while merging
1982 Merging Fix System.IndexOutOfRangeException ('Index was outside the bounds of the array.') while merging PDF
1984 Merging Fix Bookmark targets bottom of page
1968 Merging Fix Nullreference error in PdfLoadedFormFieldCollection.GetFieldType while merging
1978 Merging Fix Error in 'PdfLoadedPageCollection.GetPage' while merging file
1967 Merging Fix Blank pages while merging
1943 Merging Fix Fatal Error at 9670 while merging
1935 Merging Fix Merged file is empty when merging large bitmapped PDFs
1895 Merging Fix Fatal Error when merging
1892 Merging Fix System.NullReferenceException when merging
2007 MSG Fix MSG - Unexpected line break using plain text conversion
2014 MSG Fix MSG - Unicode / character encoding problem in HTML email
2006 MSG Fix MSG - Hyperlink breaks during conversion
1958 MSG Fix MSG - System.Exception: compressed-RTF CRC32 failed
1959 MSG Fix MSG/EML Converter - Last line is missing from some converted emails
1925 MSG Fix MSG to PDF - Plain text email carriage return handling is incorrect
1913 MSG Fix MSG to PDF - RTF HTML MSG - incorrectly converted accents / diacritics
1914 MSG Fix MSG to PDF - RTF HTML MSG - RTL languages not converted in correct order
1904 MSG Fix MSG to PDF - Sometimes Attachment is not processed
1911 MSG Fix MSG to PDF - Possible regression on in-line images
1912 MSG Fix MSG to PDF - RTF HTML MSG - Azerbaijani, Maltese - some Unicode characters not converted, left as \uXXXX
1899 MSG Fix MSG to PDF - German special characters are sometimes not properly converted
1882 MSG Fix MSG to PDF - RTF email is missing portion of first line in body text
1885 MSG Fix MSG to PDF - Handle and Memory leak when converting signed MSG files
1862 MSG Fix MSG to PDF - Incorrect font
1863 MSG Fix MSG to PDF - Numbered list items not rendered
1601 MSG Improvement MSG to PDF - Improve line spacing in HTML to PDF Conversion
1660 MSG Improvement MSG to PDF - Test / Implement remaining languages
1917 MSG Improvement MSG to PDF - RTF HTML MSG - some languages causing small fonts
1903 MSG Improvement MSG to PDF - Implement Best Body Algorithm from MS-OXBBODY specification
1881 MSG Improvement MSG to PDF - Text opaque signed MIME messages lose formatting
2015 MSG New MSG to PDF - Include email address in 'To' field
995 OCR New OCR - Add support for OCR of PDF data to allow searchable PDFs
1985 Other Fix Cannot set PDF Creator / Processor meta data for some files
1972 Other Fix Loading a PDF 1.7 document into a PDFDocument resets it to PDF 1.5
1952 Other Fix Certain PDFs do not permit viewerpreferences to be read
1906 Other Fix Occasional Access Denied in Task Monitor on Win2K12 / InfoPath 2013
1799 Other Improvement Upgrade to .NET 3.5
2061 Pro Fix Converting between PDF Versions on a locale that uses ',' as a decimal separator sets the PDF Version to 1.1
1945 Pro Fix PDF/A conversion - The DateTime represented by the string is not supported in calendar System.Globalization.GregorianCalendar.
1922 Pro Fix Re-processing existing PDF/A files for PDF/A output fails
1909 Pro Fix PDF/A Conversion fails when certain characters occur in the PDF Title
1888 Pro Fix Improve reliability of PDF/A-2b conversions
1849 Pro Fix Linearization in combination with PDF/A fails
1979 Pro Improvement Always post process for PDF/A when _outputFormatSpecificSettings.PostProcessFile == true
1843 Pro Improvement Allow transparent content in PDF/A-2b documents
1974 Security Fix When security is removed from PDF files their contents still show as encrypted

 


As always, feel free to contact us using Twitter, our Blog or regular email, or subscribe to our newsletter.

Download your free trial here (37MB).



PDF Converter for SharePoint 7.1 - Optical Character Recognition, EML and MSG overhaul

Posted at: 12:15 on 12 November 2013 by Muhimbi


The version of the PDF Converter for SharePoint released earlier this year went down a storm, as it was one of the first major third-party products to ship with support for SharePoint 2013 and Nintex Workflow 2013. Today we are happy to announce yet another release of this popular product, version 7.1, which benefits users of all supported SharePoint versions: 2007, 2010 and 2013.

The main new features are support for OCR (Optical Character Recognition) to convert scanned documents into fully searchable and indexable PDF files, and a completely overhauled converter for the EML (email) format that should particularly benefit people with mail-enabled document libraries.

 
For those not familiar with the product, the PDF Converter for SharePoint is a lightweight solution that allows end-users to watermark, merge, split, secure, OCR and convert common document types, including InfoPath, AutoCAD, MSG (email), MS-Office, HTML and images, to PDF as well as other formats from within SharePoint. This can be done through a friendly user interface, workflows or a web service call, without the need to install any client-side software or Adobe Acrobat. It integrates at a deep level with SharePoint and leverages facilities such as the Audit log, Nintex Workflow, localisation, security and tracing. It runs on SharePoint 2007, 2010 & 2013 and is available in English, German, Dutch, French, Traditional Chinese and Japanese. For detailed information check out the product page.
 

Scanned Document with OCRed text selected


In addition to the changes listed above, some of the main changes and additions in the new version are as follows (issue number, area, type and description):

1901 CAD Fix CAD Conversion - AccessViolationException
1931 CAD Improvement CAD Converter does not resolve externally referenced files
1850 CAD Improvement Add support for AutoCAD 2013
1916 Conversion Fix TIFF to PDF Conversion uses dimensions of first page for all pages
1853 Conversion Fix Post processing PDF generated from TIF as 'Screen Optimised' scrambles PDF
676 Conversion Improvement Excel Conversion - Add support for PDF/A
1930 Cross-conversion Fix Folder with Temp files cannot be deleted when converting DOC to HTML for some locales / regions
1879 EML New Implement conversion of RFC2045 / RFC5322 based EML files
1965 HTML Fix HTML Converter hangs on 0.5 page margin
1920 HTML Fix Not all URLs are recognised by HTML Converter
1894 HTML Fix URL is not decoded properly when validating security settings
1827 HTML Fix HTML to PDF Conversion for some non-Roman languages loses characters
1840 HTML Fix Last line is truncated when converting HTML to PDF
1953 HTML Improvement Mixed fonts in same sentence are vertically offset when converting HTML to PDF
1940 HTML Improvement HTML Conversion doesn't convert unencoded quotes
1884 HTML Improvement Add configurable delay to HTML to PDF conversion for pages heavy on JavaScript / DHTML (e.g. pages containing Google Maps)
2009 InfoPath Improvement Fix InfoPath forms colour being lost on IE10 systems
1939 InfoPath Improvement InfoPath does not export to PDF well on systems with IE10
2010 Merging Fix System.NullReferenceException when saving merged file
2012 Merging Fix Internal hyperlinks are broken when merging documents
1990 Merging Fix Unexpected token DictionaryEnd while merging
1982 Merging Fix System.IndexOutOfRangeException ('Index was outside the bounds of the array.') while merging PDF
1984 Merging Fix Bookmark targets bottom of page
1968 Merging Fix Nullreference error in PdfLoadedFormFieldCollection.GetFieldType while merging
1978 Merging Fix Error in 'PdfLoadedPageCollection.GetPage' while merging file
1967 Merging Fix Blank pages while merging
1943 Merging Fix Fatal Error at 9670 while merging
1935 Merging Fix Merged file is empty when merging large bitmapped PDFs
1895 Merging Fix Fatal Error when merging
1892 Merging Fix System.NullReferenceException when merging
1957 Meta-data Fix Icon of destination file is incorrect if source file was created using the Content and structure page.
1944 Meta-data Fix Content type is not copied when it contains a user/group field that is populated
1946 Meta-data Fix Content Type ID is copied by field description, not internal field name
2007 MSG Fix MSG - Unexpected line break using plain text conversion
2014 MSG Fix MSG - Unicode / character encoding problem in HTML email
2006 MSG Fix MSG - Hyperlink breaks during conversion
1958 MSG Fix MSG - System.Exception: compressed-RTF CRC32 failed
1959 MSG Fix MSG/EML Converter - Last line is missing from some converted emails
1925 MSG Fix MSG to PDF - Plain text email carriage return handling is incorrect
1913 MSG Fix MSG to PDF - RTF HTML MSG - incorrectly converted accents / diacritics
1914 MSG Fix MSG to PDF - RTF HTML MSG - RTL languages not converted in correct order
1904 MSG Fix MSG to PDF - Sometimes Attachment is not processed
1911 MSG Fix MSG to PDF - Possible regression on in-line images
1912 MSG Fix MSG to PDF - RTF HTML MSG - Azerbaijani, Maltese - some Unicode characters not converted, left as \uXXXX
1899 MSG Fix MSG to PDF - German special characters are sometimes not properly converted
1882 MSG Fix MSG to PDF - RTF email is missing portion of first line in body text
1885 MSG Fix MSG to PDF - Handle and Memory leak when converting signed MSG files
1862 MSG Fix MSG to PDF - Incorrect font
1863 MSG Fix MSG to PDF - Numbered list items not rendered
1601 MSG Improvement MSG to PDF - Improve line spacing in HTML to PDF Conversion
1660 MSG Improvement MSG to PDF - Test / Implement remaining languages
1917 MSG Improvement MSG to PDF - RTF HTML MSG - some languages causing small fonts
1903 MSG Improvement MSG to PDF - Implement Best Body Algorithm from MS-OXBBODY specification
1881 MSG Improvement MSG to PDF - Text opaque signed MIME messages lose formatting
2015 MSG New MSG to PDF - Include email address in 'To' field
995 OCR New OCR - Add support for OCR of PDF data to allow searchable PDFs
1985 Other Fix Cannot set PDF Creator / Processor meta data for some files
1972 Other Fix Loading a PDF 1.7 document into a PDFDocument resets it to PDF 1.5
1952 Other Fix Certain PDFs do not permit viewerpreferences to be read
1906 Other Fix Occasional Access Denied in Task Monitor on Win2K12 / InfoPath 2013
1799 Other Improvement Upgrade to .NET 3.5
2061 Pro Fix Converting between PDF Versions on a locale that uses ',' as a decimal separator sets the PDF Version to 1.1
1945 Pro Fix PDF/A conversion - The DateTime represented by the string is not supported in calendar System.Globalization.GregorianCalendar.
1922 Pro Fix Re-processing existing PDF/A files for PDF/A output fails
1909 Pro Fix PDF/A Conversion fails when certain characters occur in the PDF Title
1888 Pro Fix Improve reliability of PDF/A-2b conversions
1849 Pro Fix Linearization in combination with PDF/A fails
1979 Pro Improvement Always post process for PDF/A when _outputFormatSpecificSettings.PostProcessFile == true
1843 Pro Improvement Allow transparent content in PDF/A-2b documents
1974 Security Fix When security is removed from PDF files their contents still show as encrypted
1966 UI Fix SharePoint 2013 ribbon icon is broken for Site Collection scoped Feature
1937 UI Fix Users with only 'read' permission on a doclib, but not on the main site, cannot open the conversion screen
1923 Watermarking Fix Random ArgumentException when applying XML based watermarks




As always, feel free to contact us using Twitter, our Blog or regular email, or subscribe to our newsletter.

Download your free trial here (43MB).



Convert MSG and EML based emails to PDF using PDF Converter Services / SharePoint

Posted at: 12:26 on 11 November 2013 by Muhimbi

Muhimbi’s range of server-based PDF Conversion products has supported the conversion of MSG-based emails for years. Based on conversations with customers of the PDF Converter for SharePoint and the PDF Converter API & Server Platform (for Java, PHP, Ruby, C#, .NET), we understand that this facility is extremely popular, as few (if any) other solutions provide the same level of fidelity, performance and features as our software does.

Prior to the 7.1 release our main focus was on the conversion of MSG-based emails. EML-based emails were supported as well, but in a much more limited manner that focused on the conversion of the email body. With the release of 7.1, in addition to the changes made to the MSG converter, the EML converter has been brought up to par with it, so both file formats are now treated the same.

The key features are as follows:

  1. Support for all common email content types, including HTML, RTF, plain text and combinations of the three.
  2. Conversion of rich content including in-line images and tables.
  3. Support for the conversion of signed emails, both S/MIME and clear text. The name of the person who signed the email is displayed in the header as well.
  4. Attachments can optionally be converted as well and attached to the converted email, resulting in a single PDF file containing all documents. Ideal for archiving purposes.
  5. Broad internationalisation support. A German MSG-based email with an EML-based email attached in Japanese? No problem!
  6. Localisation of email labels such as From, To and Subject (as of version 7.2.1).
  7. Control how email addresses are displayed in the header: just the sender’s name, just the email address or both.
  8. Output resembles what comes out of MS-Outlook without the need for Outlook to be installed on the Conversion Server.
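
To give a feel for what converting an RFC 5322 / RFC 2045 EML file involves, here is a minimal sketch in Python using only the standard library's `email` package. It is our illustration, not Muhimbi's implementation: it picks the best body part (HTML over plain text), lists the non-body parts a converter could also process as attachments, and renders the From and To headers in one of three display modes that mirror feature 7. The helper names `summarise_eml` and `format_addresses` and the `NAME` / `ADDRESS` / `BOTH` constants are hypothetical; RTF and TNEF (winmail.dat) bodies, signatures and embedded emails are out of scope.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Hypothetical header display modes mirroring feature 7:
# display name only, email address only, or both.
NAME, ADDRESS, BOTH = "name", "address", "both"


def format_addresses(header, mode: str = BOTH) -> str:
    """Render a parsed address header (e.g. From or To) in the chosen display mode."""
    if header is None:  # the header is absent from the message
        return ""
    parts = []
    for address in header.addresses:
        name, addr = address.display_name, address.addr_spec
        if mode == NAME:
            parts.append(name or addr)  # fall back to the address when there is no name
        elif mode == ADDRESS:
            parts.append(addr)
        else:
            parts.append(f"{name} <{addr}>" if name else addr)
    return "; ".join(parts)


def summarise_eml(raw: bytes, mode: str = BOTH) -> dict:
    """Parse an RFC 5322 / RFC 2045 message and collect what a converter would render."""
    msg: EmailMessage = BytesParser(policy=policy.default).parsebytes(raw)
    # Choose the best body candidate, preferring HTML over plain text.
    body = msg.get_body(preferencelist=("html", "plain"))
    return {
        "from": format_addresses(msg["From"], mode),
        "to": format_addresses(msg["To"], mode),
        "subject": str(msg["Subject"] or ""),
        "body_type": body.get_content_type() if body is not None else None,
        "body": body.get_content() if body is not None else "",
        # Non-body parts: the attachments that could be converted and attached as well.
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }
```

A real converter would then lay out the header fields and the selected body and render them to PDF; that rendering step is what the product itself handles and is not shown here.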

 

Example output of a regular email conversation as well as part of a web based newsletter

 

Many of our customers are sitting on gigabytes of emails that need to be archived for eDiscovery, Freedom of Information requests and SOX, SEC, FTS, FCC, EPA, NLRB, IRS, EEOC, OSH and OFCOM retention regulations. Being able to access these emails 10, 20 or even 40 years down the line, in a universally accepted format such as PDF (including PDF/A), is absolutely essential. Muhimbi’s range of PDF Conversion products makes this possible for all common file formats as well as some uncommon ones such as MSG, EML and even InfoPath.

As email conversion is part of our highly scalable PDF Conversion platform, it automatically benefits from all of the platform’s features, including reliability, scalability, the watermarking engine, cross-platform support, the web services based API, PDF security, SharePoint integration, Nintex Workflow integration, Java, PHP and Ruby support, InfoPath attachments, Windows Azure, etc.

Any questions? Contact us or leave a reply below.


