
Programmatically Converting and Merging files attached to PDF Documents

Posted at: 13:54 on 17 April 2014 by Muhimbi

One of the cool things about having a comprehensive PDF conversion and processing platform such as the Muhimbi PDF Converter Services is that you can add relatively complex facilities you didn’t originally envision with relative ease.

When we originally set out we never thought of converting PDF files to PDF format. Why would anyone do that? Well, it turns out there are several good reasons, including converting PDF to PDF/A (or other PDF versions), changing PDF viewer preferences, or embedding or stripping fonts in a PDF. Starting with version 7.2.1 we are adding another scenario to the mix: the ability to convert files attached to PDF documents.

Similar to emails, a PDF document can have other files attached. Previously we simply ignored these files, but now we actively inspect PDF attachments and offer the option to convert them and merge them into the main PDF, which is ideal for archiving or printing purposes.

This new facility is accessible from our Web Services interface (see below), as well as from SharePoint Designer and Nintex Workflows using our XML Override syntax. Conversion of PDF attachments can be controlled globally in the Conversion Service’s configuration file by modifying the PDF.ConvertAttachments and PDF.ConvertAttachmentMode keys.
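As a rough illustration, the two keys might look like this in the configuration file. This is only a sketch: the key names come from this post, but the surrounding appSettings layout and the values shown are assumptions, so check them against the configuration file that ships with your installation.

```xml
<!-- Hypothetical fragment; only the two key names are taken from this post. -->
<appSettings>
  <!-- Assumed value: enable conversion of files attached to PDF documents. -->
  <add key="PDF.ConvertAttachments" value="true" />
  <!-- Assumed to accept the same mode names as the Web Services interface
       (RemoveAll or RemoveSupported). -->
  <add key="PDF.ConvertAttachmentMode" value="RemoveSupported" />
</appSettings>
```

Settings made here would apply to all conversions, whereas the Web Services settings described next apply per request.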
 

ConverterSpecificSettings_PDF 
The syntax is simple. Create a new instance of ConverterSpecificSettings_PDF, set its properties to the appropriate values and assign it to ConversionSettings.ConverterSpecificSettings before kicking off the conversion operation. A brief code example that can easily be plugged into our standard sample code follows.

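// Create the PDF specific settings and enable conversion of attachments.
ConverterSpecificSettings_PDF csc = new ConverterSpecificSettings_PDF();
csc.ConvertAttachments = true;
// Remove only the attachments the converter supports; others stay in the PDF.
csc.ConvertAttachmentMode = PDFConvertAttachmentMode.RemoveSupported;
// Assign the settings before kicking off the conversion.
conversionSettings.ConverterSpecificSettings = csc;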

 
The approach for Java, Ruby and PHP is similar, but the code needs to be adapted to the syntax of those environments.

 

The possible values for ConvertAttachmentMode are as follows:

  • RemoveAll: When a PDF file is processed, all attachments are converted and merged into the main PDF. All attachments are then removed from the PDF, including those whose file type is not recognised by the converter.
  • RemoveSupported: When a PDF file is processed, all attachments are converted and merged into the main PDF, but only those attachments that are supported by the converter are removed from the PDF. All other attachments remain present in the main file.

Naturally these values are only used when ConvertAttachments is set to True.

 

As this behaviour is part of the PDF Conversion Service’s processing pipeline, this new facility can be used in combination with all Merging, Watermarking, OCR, PDF Encryption and PDF/A post processing facilities.

 

Any questions or feedback? Leave a comment in the section below or contact us; we love talking to our customers.

 


PDF Converter Services 7.2 - Extract text using OCR, MSG Improvements

Posted at: 17:55 on 09 April 2014 by Muhimbi


We are happy to announce version 7.2 of the popular Muhimbi PDF Converter Services. This new release further extends the OCR facility and MSG improvements introduced in the previous version and adds support for extracting text from bitmap based content and rendering of MSG based calendar entries.

A quick introduction for those not familiar with the product: The Muhimbi PDF Converter Services is an ‘on premises’ server based SDK that allows software developers to convert typical Office files to PDF format using a robust, scalable but friendly Web Services interface from Java, .NET, Ruby & PHP based solutions. It supports a large number of file types including MS-Office and ODF file formats as well as HTML, MSG (email), EML, AutoCAD and Image based files and is used by some of the largest organisations in the world for mission critical document conversions. In addition to converting documents the product ships with a sophisticated watermarking engine, PDF Splitting and Merging facilities, an OCR facility and the ability to secure PDF files. A separate SharePoint specific version is available as well.
 

  Example of a converted Calendar entry with an (OLE) embedded Excel sheet


In addition to the changes listed above, some of the main changes and additions in the new version are as follows:

ID | Area | Type | Description
2100 | Excel | New | Optionally scale Excel to page width & height
2059 | HTML | Fix | System.ArgumentException: uri - string can not be empty
1996 | HTML | Improvement | Reduce white space causing occasional extra empty PDF pages at end of file.
1802 | Merging | Fix | Bookmark targets bottom of page
2093 | Merging | Fix | "Unexpected token Unknown before 107448" while merging file
2078 | Merging | Fix | Kernel Error while loading PDF
2073 | Merging | Fix | System.IndexOutOfRangeException while merging
2074 | Merging | Fix | System.NullReferenceException while merging
2075 | Merging | Fix | System.NullReferenceException while merging
2076 | Merging | Fix | Some HTML Converted files cannot be saved in Acrobat Pro after merging
2126 | MSG | Fix | "System.InvalidOperationException: Stack empty" during conversion of 3rd party generated MSG files
2133 | MSG | Fix | "Parameter is not valid" during conversion of 3rd party generated MSG files
2136 | MSG | Fix | Content missing from converted MSG file
2106 | MSG | Fix | Fixed MSG body for 3rd party generated MSG files
2116 | MSG | Fix | Conversion of MSG files with an attached MSG that is signed
2124 | MSG | Fix | "System.IndexOutOfRangeException" Converting German email
2125 | MSG | Fix | Conversion of email never finishes
2105 | MSG | Fix | "Invalid Compressed RTF header" during conversion of 3rd party generated emails
2090 | MSG | Fix | Extra '}' in body text
2058 | MSG | Fix | No bookmark generated for certain attachments
2056 | MSG | Fix | 'Sent date' not correct on some 3rd party generated emails
2057 | MSG | Fix | Unicode converter issue (also with EML)
2088 | MSG | Improvement | Add support for attendees to meeting invitations
2086 | MSG | Improvement | Optionally throw error if embedded content is encountered that cannot be converted
2013 | MSG | Improvement | From address shows LDAP path
2046 | MSG | Improvement | Web Service support for MSGConverterFullFidelity.EmailAddressDisplayMode and FromEmailAddressDisplayMode
2087 | MSG | New | Convert the visual representation of embedded objects
2068 | MSG | New | Add support for the conversion of Calendar Entries
2050 | MSG | New | Add config value to allow MSG attachments list to be displayed, even when attachments are disabled
2113 | MSG/HTML | Fix | Rendering error in very long emails / HTML pages
2066 | MSG/HTML | Fix | Sometimes content is truncated on systems running IE9, IE10 or IE11
2005 | MSG/HTML | Fix | Fonts look weird in some emails
1786 | OCR | Fix | Handle leak during OCR
2054 | OCR | Fix | Some Mixed content (MS-Word files with scanned images) does not always OCR
1999 | OCR | Fix | Arabic training data causes exception
1788 | OCR | Improvement | Increase OCR Performance
2089 | OCR | Improvement | Update Diagnostics tool to display OCRed text
2081 | OCR | Improvement | In-line images are recognised but text is not placed on it correctly
1998 | OCR | Improvement | Add support for Hebrew
2048 | OCR | New | Support for extracting text from bitmap based content using OCR
2072 | Other | New | Allow timeouts to be specified on web service call
2102 | Watermarking | Fix | Chinese & Japanese fonts are not displayed in watermarks
2103 | Watermarking | Fix | Watermarking some documents causes problem in Adobe Reader 9

 
For more information check out the following resources:


As always, feel free to contact us using Twitter, our Blog, regular email or subscribe to our newsletter.

Download your free trial here (39MB).


PDF Converter for SharePoint 7.2 - OCR Workflow Activities, MSG Improvements

Posted at: 17:24 on by Muhimbi


The new features introduced with version 7.1 of the PDF Converter for SharePoint have proven to be popular with our customers. Today we are happy to announce version 7.2, which takes the existing features and elevates them to the next level while staying compatible with all SharePoint versions including SharePoint 2007, 2010 and 2013.

In addition to a number of bug fixes, the main new features are OCR Workflow Actions for SharePoint Designer and Nintex workflow, the ability to extract text from bitmap based content using OCR as well as further improvements to the MSG and EML based converters, specifically in the area of embedded (OLE) content and calendar entries.

 
For those not familiar with the product, the PDF Converter for SharePoint is a lightweight solution that allows end-users to merge, split, watermark, secure, OCR and convert common document types - including InfoPath, AutoCAD, MSG (email), MS-Office, HTML and images - to PDF as well as other formats from within SharePoint using a friendly user interface, workflows or a web service call without the need to install any client side software or Adobe Acrobat. It integrates at a deep level with SharePoint and leverages facilities such as the Audit log, Nintex Workflow, localisation, security and tracing. It runs on SharePoint 2007, 2010 & 2013 and is available in English, German, Dutch, French, Traditional Chinese and Japanese. For detailed information check out the product page.
 

 
Example of a converted Calendar entry with an (OLE) embedded Excel sheet


In addition to the changes listed above, some of the main changes and additions in the new version are as follows:

ID | Area | Type | Description
2100 | Excel | New | Optionally scale Excel to page width & height
2059 | HTML | Fix | System.ArgumentException: uri - string can not be empty
1996 | HTML | Improvement | Reduce white space causing occasional extra empty PDF pages at end of file.
1802 | Merging | Fix | Bookmark targets bottom of page
2093 | Merging | Fix | "Unexpected token Unknown before 107448" while merging file
2078 | Merging | Fix | Kernel Error while loading PDF
2073 | Merging | Fix | System.IndexOutOfRangeException while merging
2074 | Merging | Fix | System.NullReferenceException while merging
2075 | Merging | Fix | System.NullReferenceException while merging
2076 | Merging | Fix | Some HTML Converted files cannot be saved in Acrobat Pro after merging
2126 | MSG | Fix | "System.InvalidOperationException: Stack empty" during conversion of 3rd party generated MSG files
2133 | MSG | Fix | "Parameter is not valid" during conversion of 3rd party generated MSG files
2136 | MSG | Fix | Content missing from converted MSG file
2106 | MSG | Fix | Fixed MSG body for 3rd party generated MSG files
2116 | MSG | Fix | Conversion of MSG files with an attached MSG that is signed
2124 | MSG | Fix | "System.IndexOutOfRangeException" Converting German email
2125 | MSG | Fix | Conversion of email never finishes
2105 | MSG | Fix | "Invalid Compressed RTF header" during conversion of 3rd party generated emails
2090 | MSG | Fix | Extra '}' in body text
2058 | MSG | Fix | No bookmark generated for certain attachments
2056 | MSG | Fix | 'Sent date' not correct on some 3rd party generated emails
2057 | MSG | Fix | Unicode converter issue (also with EML)
2088 | MSG | Improvement | Add support for attendees to meeting invitations
2086 | MSG | Improvement | Optionally throw error if embedded content is encountered that cannot be converted
2013 | MSG | Improvement | From address shows LDAP path
2046 | MSG | Improvement | Web Service support for MSGConverterFullFidelity.EmailAddressDisplayMode and FromEmailAddressDisplayMode
2087 | MSG | New | Convert the visual representation of embedded objects
2068 | MSG | New | Add support for the conversion of Calendar Entries
2050 | MSG | New | Add config value to allow MSG attachments list to be displayed, even when attachments are disabled
2113 | MSG/HTML | Fix | Rendering error in very long emails / HTML pages
2066 | MSG/HTML | Fix | Sometimes content is truncated on systems running IE9, IE10 or IE11
2005 | MSG/HTML | Fix | Fonts look weird in some emails
1786 | OCR | Fix | Handle leak during OCR
2054 | OCR | Fix | Some Mixed content (MS-Word files with scanned images) does not always OCR
1999 | OCR | Fix | Arabic training data causes exception
1788 | OCR | Improvement | Increase OCR Performance
2089 | OCR | Improvement | Update Diagnostics tool to display OCRed text
2081 | OCR | Improvement | In-line images are recognised but text is not placed on it correctly
1998 | OCR | Improvement | Add support for Hebrew
1975 | OCR | New | SharePoint Designer OCR Workflow Activity for generating searchable PDFs
1975 | OCR | New | SharePoint Designer OCR Workflow Activity for extracting text from bitmaps
1976 | OCR | New | Nintex Workflow OCR Activity for generating searchable PDFs
1976 | OCR | New | Nintex Workflow OCR Workflow Activity for extracting text from bitmaps
2048 | OCR | New | Support for extracting text from bitmap based content using OCR
2072 | Other | New | Allow timeouts to be specified on web service call
2102 | Watermarking | Fix | Chinese & Japanese fonts are not displayed in watermarks
2103 | Watermarking | Fix | Watermarking some documents causes problem in Adobe Reader 9
2049 | Watermarking | New | Add support for USER_NAME in addition to the existing REMOTE_USER and LOGON_USER in watermarks


For more information check out the following resources:


As always, feel free to contact us using Twitter, our Blog, regular email or subscribe to our newsletter.

Download your free trial here (46MB).


Get Outlook Mail in and out of SharePoint and Convert it to PDF - The Easy Way

Posted at: 15:53 on 03 April 2014 by David Radford

At Muhimbi we’re always looking to add great new features and functionality to our products. At the same time, we need to be careful not to start throwing features in just because they’re cool. A full-featured product is great; a schizophrenic one isn’t.

A good example of a feature that doesn’t belong in a conversion product is the transferring of documents in and out of the SharePoint environment. There are so many ways to implement this, that it’d really be its own completely separate product… And it is! Mail2Share from Techtra provides a clean interface to SharePoint from within Outlook, allowing easy SharePoint adoption and integration from the familiar Outlook workspace.

The reasons for storing e-mails as PDFs are not always obvious, but with corporate Document Management Strategies becoming more complex and file formats always changing, there are some clear advantages:

  • PDF, particularly PDF/A, is the ideal file format for long term archiving.
  • PDF files can be viewed with a high level of fidelity on mobile devices. For example, if you received an AutoCAD file (dxf, dwg) and want to preview or share it with users that do not have an AutoCAD preview handler, with Mail2Share, you would be able to send the attachment to the configured mount point and receive back a converted version in PDF format.
  • The Muhimbi PDF Converter does not require the installation of a PDF writer on the local machine.

So, how can our PDF Converter and Mail2Share help with all this? Well, as it turns out, quite easily! Combining the PDF Converter’s outstanding e-mail conversion, workflow integration and watermarking features with Mail2Share’s innovative Outlook connectivity turns complex scenarios like the following into a few simple steps.

 

The scenario:

You have a number of regional offices that need to send in their current sales forecasts: an Excel spreadsheet with the details and a written summary of the reasoning behind them. When these e-mails arrive, they need to be redirected to various internal groups, but they’re also sensitive, so access needs to be tracked and restricted. How can this all be managed easily in a central manner? How can we do this while also having a single file to move around that contains both the e-mail AND the attachment?


The steps:

  1. Install and configure The Muhimbi PDF Converter for SharePoint following the installation instructions from Chapter 2 of the Administration Guide.
     
  2. Install the Techtra Mail2Share desktop application (there is no server side component to worry about). To configure Mail2Share, just choose your SharePoint server, select the site you want to add libraries from, and then add the libraries you want to see in Outlook (and have SharePoint rights to) and you’re good to go.  
     
    [Screenshot: select SharePoint]
     
  3. Once that is done, you will have some additional folders in Outlook. To move an e-mail from Outlook to SharePoint, simply drag and drop the selected e-mail to the library you want from the list.

     

    [Screenshot: drag and drop]

  4. Once the e-mail arrives in the ‘Incoming Sales Projections’ library, it gets picked up by a simple SharePoint workflow that uses our conversion action and is set to run when new files are created in that library. The workflow converts both the body of the e-mail with the reasoning AND the Excel attachment with the details into an easy to manage PDF and then copies it to a different library.
    [Screenshot: workflow]

     

  5. The library the newly created PDF is sent to has our Watermark on Open feature enabled (you might want to add our PDF Security on Open feature as well, or use it instead). This watermarks the PDF with the date, location and username of the person opening the PDF. In this case we have added the following text to the bottom left of every PDF every time it is opened in SharePoint or through Mail2Share.
     

    [Screenshot: Watermark on Open]

  6. The library is available to specific users in Outlook, using Mail2Share, based on their SharePoint rights.  In this case, the user only has rights to the ‘Outgoing Sales Projections’ library. The user then browses it like any other folder, selects the e-mail and XLS combined PDF, previews it if required, and then simply right clicks to send it as an attachment. The act of downloading it from SharePoint watermarks the PDF in the background and is seamless to the user.
     

    [Screenshot: send as attachment]

  7. Ah, no, that’s it. You’re done!

 

You can now easily convert e-mails (with attachments!) that need to be shared into PDF and make them available in a central location. Instead of just restricting access, you can track where and when a specific copy of that PDF was created and by whom. All automatically, without users needing to navigate into SharePoint or do anything more than drag-and-drop!

This is just a small sample of how Muhimbi’s PDF Converter for SharePoint and Techtra’s Mail2Share applications can work together to facilitate sharing and collaboration for users, while also becoming valuable tools for corporate Document Management Strategies.

 

 

