Converting HTML / Web content to PDF Using SharePoint, C#, Java and PHP

Posted at: 5:41 PM on 21 June 2017 by Muhimbi

When we originally released the Muhimbi PDF Converter (SharePoint on-premise, SharePoint Online, non-SharePoint), our assumption was that the majority of our customers wanted to convert MS-Office content such as Word, Excel, InfoPath, Visio and PowerPoint to PDF. Although that is certainly a common use case, we were surprised by the number of people wanting to convert HTML content, specifically SharePoint pages such as Wikis, publishing pages, Nintex Forms, and the ‘properties page’ for List and Document Library items.

Although this worked very well for a long time, some of our back-end logic leverages Internet Explorer’s internal rendering engine, which has caused more and more issues over the years as Microsoft, for good reasons, made some fundamental changes to IE’s internal workings. As a result, when using a pre-8.3 release of the converter, you may find that HTML5 pages in particular (e.g. SP2013 and later) are rendered as bitmaps and ‘overflow CSS’ elements are cut off.

As web pages become increasingly rich, with JavaScript and clever CSS constructs, we decided to make a completely overhauled HTML Converter the main focus of the 8.3 release. We couldn’t be happier with the results, as it makes our converter compatible with the latest and greatest web technologies. HTML5 content is converted properly, the output intent can be switched between the Print and Screen CSS media types and, here’s the kicker, SharePoint Online content can be converted as well.
 

The new Converter is part of the 8.3 beta release. Contact us for access and mention your platform (Online or on-premise).
 

Original SharePoint Web page (left), converted to PDF using the Screen Media type (middle) and the Print Media type (right)

 
The key features are as follows:

  1. Brand new conversion engine with support for JavaScript, CSS and HTML5.
  2. Support for Print and Screen CSS media types to optimise output for Print / PDF Conversion.
  3. Available to all client technologies, including Nintex Workflow, Nintex Forms, SharePoint Designer, Flow, Logic Apps, REST & Web Services API as well as the SharePoint User interface.
  4. Enabled by default, with the option to switch back to the legacy HTML Converter using our API or config file.
  5. The MSG and EML Converter uses the new engine by default for HTML based emails.
  6. Support for converting SharePoint Online URLs.
  7. Improved error reporting, including authentication related issues.
  8. Control the conversion delay, i.e. the time between initially loading the page (including JavaScript rendering) and starting the actual PDF Conversion process.
  9. Modify the ‘View Port’ size to allow responsive web content to output the appropriate version (e.g. mobile or desktop version).
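
To make the options in this list a little more concrete, here is a minimal sketch in Python of the settings a single HTML conversion request might carry. Everything in it is an assumption made for illustration: the class name, the field names and the default values are hypothetical and are not taken from the Muhimbi API or config file, so check the product documentation for the real names.

```python
from dataclasses import dataclass
from enum import Enum


class MediaType(Enum):
    """CSS media type the page is rendered with before conversion."""
    PRINT = "print"
    SCREEN = "screen"


@dataclass(frozen=True)
class HtmlConversionSettings:
    """Hypothetical container for the options in the list above.

    Names and defaults are illustrative only, not the product's real API.
    """
    media_type: MediaType = MediaType.PRINT
    conversion_delay_ms: int = 0      # wait after page load before converting
    viewport_width_px: int = 1280     # a wide viewport selects a desktop layout
    viewport_height_px: int = 1024
    use_legacy_engine: bool = False   # the new engine is on unless switched back

    def __post_init__(self):
        # Reject values that cannot describe a real conversion request.
        if self.conversion_delay_ms < 0:
            raise ValueError("conversion_delay_ms must be >= 0")
        if self.viewport_width_px <= 0 or self.viewport_height_px <= 0:
            raise ValueError("viewport dimensions must be positive")


# Example: ask for the mobile version of a responsive page, using Screen CSS
# and a one second delay so that JavaScript has time to finish rendering.
mobile = HtmlConversionSettings(
    media_type=MediaType.SCREEN,
    conversion_delay_ms=1000,
    viewport_width_px=375,
    viewport_height_px=667,
)
```

Grouping the options in one immutable object keeps the media type, delay and view port size together, which makes it easier to experiment with different combinations for the same page.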

    We support it all: Nintex Workflow 2007, 2010, 2013 and 2016

     

    SharePoint Designer Workflows are supported as well, ranging from SP2007 to SP2016, including Workflow Manager

     

    Caveats

    Please keep in mind that HTML is not the best format for print or PDF Conversion purposes. Although our new HTML Converter is much improved, it is not magic. Depending on your exact needs / system settings you may need to experiment with the various settings. Our support desk staff is very experienced, so contact us if you have any questions or require assistance.

    Some points to take into account:

    1. When converting SharePoint content, it is recommended to disable the ‘Minimal Download Strategy’ SharePoint Feature as it really gets in the way. (See this article for details). If this is not an option, then set the conversion delay (see below) to 1000 (milliseconds).
    2. The Conversion Service will need to authenticate against the page you are looking to convert. The HTML Converter fits in with Windows’ standard security model, so you may need to tweak the server’s internet settings as per this Knowledge Base article. Naturally, you have to make sure that the account the Conversion Service runs under has the appropriate privileges to read the page that is being converted.
    3. Modern web based content is VERY complex. It is no longer a couple of HTML elements that make up the design of a page. External JavaScript is loaded, as is third party content in iframes. Part of the page is rendered by JavaScript, and CSS modifies the look and feel of the page depending on the media type. As a result, there is no clear point in time for our Converter to start the PDF Conversion process. Our software tries to make the most of it though, and in many cases succeeds using the default settings. For those situations where HTML content is converted too early (e.g. a ‘please wait, loading’ or similar message is displayed in the PDF) it is possible to tweak the Conversion Delay setting.
    4. The converter has no knowledge of the current user’s browser session. If the user has modified the page (e.g. collapsed / opened certain sections) or has made changes to the page without saving, then the converter will not reflect these changes. When converting HTML to PDF, the Converter requests the specified URL from scratch using the credentials of the account the Conversion Service runs under.
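
One way to experiment with the Conversion Delay described in point 3 is to retry with a progressively longer delay until the output no longer looks like a half-loaded page. The sketch below, in Python, only illustrates that idea. The `convert` callable is a hypothetical stand-in for whatever client you use to call the converter and is assumed to return the text of the resulting PDF. The marker strings and the list of delays are also assumptions, not product settings.

```python
from typing import Callable, Iterable, Tuple

# Phrases suggesting the page was captured before it finished loading.
# These are illustrative; pick markers that match your own pages.
LOADING_MARKERS = ("please wait", "loading")


def looks_unfinished(pdf_text: str, markers: Iterable[str] = LOADING_MARKERS) -> bool:
    """Return True if the converted text still contains a loading message."""
    lowered = pdf_text.lower()
    return any(marker in lowered for marker in markers)


def convert_with_increasing_delay(
    convert: Callable[[str, int], str],
    url: str,
    delays_ms: Iterable[int] = (0, 1000, 3000),
) -> Tuple[int, str]:
    """Try each delay in turn and return the first one that gives a finished page.

    `convert(url, delay_ms)` is a hypothetical wrapper around the real
    conversion call; it must return the text content of the produced PDF.
    """
    for delay in delays_ms:
        text = convert(url, delay)
        if not looks_unfinished(text):
            return delay, text
    raise RuntimeError(f"{url} still looks unfinished after all delays were tried")


def fake_convert(url: str, delay_ms: int) -> str:
    """Stand-in converter for the demo: the page 'finishes' after 1000 ms."""
    return "Quarterly report" if delay_ms >= 1000 else "Please wait, loading..."


delay_used, text = convert_with_increasing_delay(fake_convert, "https://example.com/page")
```

With the stand-in converter, the first attempt (no delay) returns the loading message and the second attempt (1000 ms) succeeds. Bear in mind that a marker such as "loading" can also appear in legitimate page content, so the check is a heuristic rather than a guarantee.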

     

    Any questions or comments? Leave a message below or contact our friendly support desk, we love to help.
