Although our software also supports cross-conversion between a number of file types, at Muhimbi our focus has always been on converting a wide range of document types to PDF with perfect fidelity. But why only let people convert to PDF? Why not look at ways to allow conversion from PDF as well?
Finding a good 3rd party command line converter to use as an example is always a challenge for us. Our developers have spent countless hours refining Muhimbi’s own converters and so we rarely find one that completely meets our standards. We are proud of our products and don’t want to suggest integrating them with another converter that won’t produce the same high quality, consistent results that we insist upon for our own converters. The search has been especially difficult for this particular post as the range of supported features between the various converters, not to mention the price, is so vast. So, while we’ll focus on one Converter, we’ll mention some others that provide some specific conversion options that might be more suitable for your particular needs.
Please note that we do not have any formal or informal relationships with the companies mentioned in this post. They are merely the result of a brief Google Search session.
PDF to Excel conversion is tricky, since the process is a bit like trying to rebuild a sand castle after a wave has hit it: all the sand (data) is still there, but the shape (cell formatting) is gone. Other formats are easier, so the approach we’ll take in this example is to use a converter that handles multiple formats and automates the creation of the cells based on its best guess. This isn’t perfect, but it simplifies the conversion tremendously while still providing a good example of how the conversion works. Total PDF Converter X from CoolUtils is a lightweight conversion application with a simple command line that provides good quality conversion. It’s available as a free trial and at the time of writing costs $159.90 to purchase. It supports conversion from PDF to a wide variety of formats: DOC, RTF, XLSX, HTML, EPS, PS, TXT, CSV and images (BMP, JPEG, GIF, WMF, EMF, PNG, TIFF).
As always, we start with the assumption that the Converter for SharePoint or Converter Services is properly installed. Once that is done carry out the following steps:
Modify the ‘Muhimbi.DocumentConverter.Service.exe.config’ file as described here and add the following entry to the <MuhimbiDocumentConverters> section. This tells the Converter that PDFs can be converted to XLSX. If you installed the 3rd party software in a different path then please update the content of the parameter attribute.
parameter="C:\Program Files (x86)\Total PDF ConverterX\pdfconverterx.exe | {0} {1} -c doc"/>
The same changes can be made to allow conversion to any of the supported formats. And, before you ask: YES! You can add multiple entries to the config file to allow conversion from PDF to multiple types.
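To make the shape of the parameter attribute more concrete, here is a minimal Python sketch of how such a template could be expanded into a command line. It assumes, which this post does not state, that the text before the `|` is the executable, that `{0}` stands for the source file and that `{1}` stands for the destination file. The function name and the sample file paths are hypothetical, and this is an illustration only, not Muhimbi's actual implementation.

```python
def expand_parameter(parameter, source_path, target_path):
    """Split 'executable | argument template' and fill in the placeholders."""
    executable, _, arg_template = parameter.partition("|")
    # Assumption: {0} is the input file and {1} is the output file.
    arguments = arg_template.strip().format(source_path, target_path)
    return executable.strip(), arguments


# The parameter value as it appears in the config entry above.
PARAMETER = (
    r"C:\Program Files (x86)\Total PDF ConverterX\pdfconverterx.exe"
    r" | {0} {1} -c doc"
)

# Hypothetical file names, used purely for illustration.
exe, args = expand_parameter(PARAMETER, r"C:\Temp\quote.pdf", r"C:\Temp\quote.doc")
print(exe)
print(args)
```

Seen this way, an entry for a different target format would differ only in the argument template and in the output formats it declares.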
So, what do the results look like? Here we have an image of a typical PDF based quote:
Then here, you have the conversion to an Excel file:
So, as we’ve seen, this is a good option that provides a great selection of conversion choices. Depending on your needs, though, it might not provide the conversion flexibility required for something like Excel, and if you just need PDF to Word conversion it might not be worth the expense.
The addition of this converter allows SharePoint to take advantage of this conversion ability through workflows simply by changing the output format of the workflow to the desired type in the ‘Convert Document’ workflow action for SharePoint Designer and Nintex workflows. In order to allow SharePoint to process PDF files, which by default are not routed to the Converter, please run the following command from a SharePoint Command or PowerShell prompt:
For PDF to Excel conversion there are a few other options:
Intelligent Converters has the PDF-To-Excel converter for $29. It provides good conversion from PDF to Excel if you are primarily interested in extracting the data from the PDF and saving it in Excel format.
A-PDF.COM has the A-PDF to Excel converter for $39. It provides about the same level of conversion as PDF-To-Excel, but also uses a template file for conversion. This could be very helpful if you have a large number of consistent PDFs that need to be converted and want to specify cells and formatting for the conversions.
For PDF to Word conversion there is a decent free option from Weeny Software, Free PDF to Word Converter. The conversion has some problems with complex formatting (it creates quite good RTF documents), but if you are more concerned about getting the data into Word format than the way it looks, it is a good option.
Any questions or feedback? Leave a comment in the section below or contact us; we love talking to our customers.
Both the Muhimbi PDF Converter for SharePoint and Muhimbi PDF Converter API & Server Platform have long been able to cross-convert one file format to another out of the box, not just to PDF. This works very well, but what happens when you really want to mix it up, say by converting a grid-based format like Excel to a multi-page image format like TIFF? As we told the customer who requested this: “I’m not sure, but I know we can figure it out for you!”
And so here it is! The solution is actually fairly simple and relies on two features that already exist within Muhimbi’s Converter products. The first is the multi-step converter we built to allow cross-conversions and the second is our ability to integrate 3rd party converters within Muhimbi’s own conversion process.
The multi-step converter allows for an intermediate format to be used when there is no converter that can directly convert between the file types requested. This stepping stone approach happens inside the Converter and is completely invisible to the user. There is only one step to creating the multi-step TIFF converter:
Modify the ‘Muhimbi.DocumentConverter.Service.exe.config’ file as described here and add the following entry to the <MuhimbiDocumentConverters> section. This tells the Converter that all the listed extensions can be converted to TIFF by first converting them to PDF and then from there to TIFF.
The 3rd party converter integration actually allows the Converter to do the conversion from PDF to TIFF and complete the job. This step is really no different than adding any 3rd party converter. It also has the added benefit of allowing PDF to TIFF conversion.
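The stepping stone idea can be illustrated with a short Python sketch. Note that in the product the intermediate format is declared explicitly in the config entry; the breadth-first search below is a generalisation used only to show how a chain such as Excel to PDF to TIFF fits together. The converter table and the function name are hypothetical.

```python
from collections import deque


def find_route(converters, source_ext, target_ext):
    """Breadth-first search for the shortest chain of formats."""
    queue = deque([[source_ext]])
    seen = {source_ext}
    while queue:
        path = queue.popleft()
        if path[-1] == target_ext:
            return path
        for src, dst in converters:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    return None  # no chain of converters reaches the target format


# Hypothetical converter table: (input format, output format).
CONVERTERS = [
    ("xlsx", "pdf"),
    ("docx", "pdf"),
    ("msg", "pdf"),
    ("pdf", "tiff"),  # the step supplied by the 3rd party converter
]

print(find_route(CONVERTERS, "xlsx", "tiff"))  # ['xlsx', 'pdf', 'tiff']
print(find_route(CONVERTERS, "pdf", "tiff"))   # ['pdf', 'tiff']
print(find_route(CONVERTERS, "tiff", "xlsx"))  # None
```

The second call shows why adding the PDF to TIFF step also enables direct PDF to TIFF conversion: it is simply a route with a single hop.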
Download the Ghostscript GPL Release from the Ghostscript website (please ensure you download the Windows version, preferably version 9.16, as newer versions have a problem with fonts).
Install Ghostscript in a location of your choice on every server that runs the Muhimbi Conversion Service. Please make note of the location of the installation so you can point the Converter to it in the XML fragment listed below.
The next step is to modify the ‘Muhimbi.DocumentConverter.Service.exe.config’ file again and add the following entry to the same <MuhimbiDocumentConverters> section. Please remove the line wrapping from the content of the parameter attribute, as this example has been reformatted to make it fit in a browser window. Please update the location of the Ghostscript executable as well.
More details on the Muhimbi parameters can be found here, though you shouldn’t need to change them. You can also review the various Ghostscript options here, especially the resolution that is used (-r300x300) as you may wish to change this in order to suit your specific needs.
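For readers who want to experiment with the resolution outside the Converter first, the following Python sketch builds a plain Ghostscript command line for PDF to TIFF conversion. The switches used (`-dNOPAUSE`, `-dBATCH`, `-q`, `-sDEVICE`, `-r`, `-sOutputFile`) are standard Ghostscript options, but the choice of the `tiff24nc` device, the installation path and the file names are assumptions made for illustration; they are not necessarily the values used in the config entry.

```python
def ghostscript_args(gs_exe, source_pdf, target_tiff, dpi=300):
    """Build the argument list for a PDF to multi-page TIFF conversion."""
    return [
        gs_exe,
        "-dNOPAUSE",                    # do not pause between pages
        "-dBATCH",                      # exit when the input has been processed
        "-q",                           # suppress start-up messages
        "-sDEVICE=tiff24nc",            # assumed device: 24-bit colour TIFF
        f"-r{dpi}x{dpi}",               # resolution, e.g. -r300x300
        f"-sOutputFile={target_tiff}",
        source_pdf,
    ]


# Assumed installation path for the Windows build of Ghostscript 9.16.
GS_EXE = r"C:\Program Files\gs\gs9.16\bin\gswin64c.exe"

args = ghostscript_args(GS_EXE, r"C:\Temp\quote.pdf", r"C:\Temp\quote.tiff")
print(args)

# To run the conversion for real:
#   import subprocess; subprocess.run(args, check=True)
```

Lowering `dpi` produces smaller files at the cost of image quality, which is the trade-off to keep in mind when changing the resolution.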
NOTE: For this to work you must be on at least version 7.2.1 of the Muhimbi PDF Converter.
If the source file you wish to convert is in PDF format, and you are converting via our SharePoint Workflow Actions, then please make sure the option to send PDF files to the server is enabled, as PDF files are skipped by default. For details see the SkipPDFFiles option in this blog post.
There you go: your Muhimbi Converter can now convert all supported formats to multi-page TIFF as well! Want to convert an InfoPath or MSG file (including all attachments) to TIFF and apply watermarks in the process? Now you can.
When releasing several new versions of a product each year - like we do with the Muhimbi PDF Converter API and Server Platform - it's easy to overlook the overall stability and performance of the product in pursuit of new features. Sure, adding new features is fun, but adding stability and performance improvements is just as important, if not more so.
To make sure that everything will continue to work smoothly and reliably, we have dedicated an entire release to just this topic. We have fixed a number of important issues and improved performance, but all work and no play isn't the way to go either. So, we STILL found some time to sneak a few new features in, particularly the ability to convert files attached to PDFs and translate email and calendar labels for the language of your choice.
We haven't just been looking through our code; we've also been talking with our customers about how they're using the Converter. This has led to a number of new blog posts suggesting new ways to use the features already present in the Converter. For example, you can easily add new conversion types as described in these recent blog posts about adding PCL and XPS to PDF Conversion using free third party software.
A quick introduction for those not familiar with the product: The Muhimbi PDF Converter Services is an ‘on premises’ server based SDK that allows software developers to convert typical Office files to PDF format using a robust, scalable but friendly Web Services interface from Java, .NET, Ruby & PHP based solutions. It supports a large number of file types including MS-Office and ODF file formats as well as HTML, MSG (email), EML, AutoCAD and Image based files and is used by some of the largest organisations in the world for mission critical document conversions. In addition to converting documents the product ships with a sophisticated watermarking engine, PDF Splitting and Merging facilities, an OCR facility and the ability to secure PDF files. A separate SharePoint specific version is available as well.
Converted Calendar Entry in English and German
In addition to the changes listed above, some of the main changes and additions in the new version are as follows:
ID | Area | Type | Description
2064 | CAD | Fix | CAD Converter - Object reference not set to an instance of an object.
 | | | CAD Converter - ArgumentOutOfRangeException while converting CAD file.
2115 | CAD | Fix | CAD Converter - Hatch Lines fill not correct.
2217 | CAD | Improvement | CAD Converter - Increase performance, reduce file size and improve compatibility.
2158 | Excel | Fix | Excel fails to load certain documents in Excel 2013 with the following error: "Unable to get the Open property of the Workbooks class".
2159 | Excel | Improvement | Excel files with external links open very slowly.
2145 | HTML | Fix | Occasionally in-line images go missing when converting HTML/MSG to PDF.
2012 | Merging | Fix | Internal hyperlinks are broken when merging certain documents.
2167 | Merging | Fix | Merge operations cannot be executed as PDF/A due to problem with security settings.
2184 | MSG | Fix | MSG to PDF - Attachment name not recognised when MSG is exported using ANSI.
2208 | MSG | Fix | MSG to PDF - Converting email returns empty PDF.
1898 | MSG | Improvement | MSG to PDF - Allow email labels to be translated.
2154 | OCR | Fix | OCR Speeds 'fast' and 'rapid' stopped working.
2156 | OCR | Fix | OCR - Occasional error under load.
2230 | Other | Improvement | Add support for specifying additional output formats such as 'TIFF, PNG, GIF, JPG, PS, BMP, PCL' (This does not include native support for converting to these formats, which requires third party plug-ins).
When releasing several new versions of a product each year - like we do with the Muhimbi PDF Converter for SharePoint - it's easy to overlook the overall stability and performance of the product in pursuit of new features. Sure, adding new features is fun, but adding stability and performance improvements is just as important, if not more so.
To make sure that everything will continue to work smoothly and reliably, we have dedicated an entire release to just this topic. We have fixed a number of important issues and improved performance, but all work and no play isn't the way to go either. So, we STILL found some time to sneak a few new features in, particularly the ability to convert files attached to PDFs and translate email and calendar labels for the language of your choice.
We haven't just been looking through our code; we've also been talking with our customers about how they're using the Converter. This has led to a number of new blog posts suggesting new ways to use the features already present in the Converter. For example, you can easily add new conversion types as described in these recent blog posts about adding PCL and XPS to PDF Conversion using free third party software.
For those not familiar with the product, the PDF Converter for SharePoint is a lightweight solution that allows end-users to merge, split, watermark, secure, OCR and convert common document types (including InfoPath, AutoCAD, MSG (email), MS-Office, HTML and images) to PDF as well as other formats from within SharePoint using a friendly user interface, workflows or a web service call, without the need to install any client side software or Adobe Acrobat. It integrates at a deep level with SharePoint and leverages facilities such as the Audit log, Nintex Workflow, localisation, security and tracing. It runs on SharePoint 2007, 2010 & 2013 and is available in English, German, Dutch, French, Traditional Chinese and Japanese. For detailed information check out the product page.
Converted Calendar Entry in English and German
In addition to the changes listed above, some of the main changes and additions in the new version are as follows:
ID | Area | Type | Description
2064 | CAD | Fix | CAD Converter - Object reference not set to an instance of an object.
 | | | Converting List Item Attachments using Merge Facility.
2167 | Merging | Fix | Merge operations cannot be executed as PDF/A due to problem with security settings.
2184 | MSG | Fix | MSG to PDF - Attachment name not recognised when MSG is exported using ANSI.
2208 | MSG | Fix | MSG to PDF - Converting email returns empty PDF.
1898 | MSG | Improvement | MSG to PDF - Allow email labels to be translated.
2154 | OCR | Fix | OCR Speeds 'fast' and 'rapid' stopped working.
2156 | OCR | Fix | OCR - Occasional error under load.
2230 | Other | Improvement | Add support for specifying additional output formats such as 'TIFF, PNG, GIF, JPG, PS, BMP, PCL' (This does not include native support for converting to these formats, which requires third party plug-ins).