
Programmatically Converting Web / HTML pages to PDF format

Posted at: 3:12 PM on 19 August 2010 by Muhimbi


As part of our ongoing series about new features in the PDF Converter for SharePoint 4.0 and PDF Converter Services, we would like to showcase our new HTML to PDF conversion functionality.

Please note that this article mentions SharePoint as well as .NET a number of times. Rest assured that, as the PDF Converter Services is Web Services based, it works just as well from Java, C#, Ruby and other web services capable environments.

We anticipate that most of our customers will use this functionality to convert SharePoint pages, including lists, to PDF format. However, rather than displaying a boring old SharePoint site, let’s show how well this works with a real website, in this case one of our landing pages.

UPDATE: A workflow activity for converting HTML to PDF is now available as well, as is an update to the SharePoint user interface for converting SharePoint pages to PDF format.

The following image shows the original HTML page on the left-hand side and the converted PDF file on the right. As you can see, the layout, images and text of the original page carry over to the PDF.
 

HTML-to-PDF Example of the original web page (left) and the converted PDF file (right)

 
A summary of the new HTML features is as follows. Although this new functionality is available in both the PDF Converter Services and the PDF Converter for SharePoint, the more SharePoint-centric features in the list are exclusive to the SharePoint version.

  1. Built on top of Muhimbi’s rock solid service platform. No need to worry about runaway or orphaned processes. Everything is nicely controlled and scales over multiple CPUs and conversion servers.
     
  2. Integrates with the full feature set of Muhimbi’s PDF Conversion platform including full control over watermarks as well as PDF Security settings.
     
  3. High fidelity conversion (see image above) including multi-page documents and JavaScript output. The generated PDF file contains real (searchable) text and is not just a low resolution screenshot of the converted web page.
     
  4. Supports conversion by URL as well as manually specified HTML fragments. Ideal for creating PDF-based reports using generated HTML tables.
     
  5. Convert HTML documents stored inside SharePoint document libraries.
     
  6. Convert SharePoint pages to PDF format from the user’s Personal Actions menu.
     
  7. Convert web pages to PDF format from SharePoint workflows. Works great in combination with publishing sites.

 

HTML to PDF Conversion is accessible via the web services based interface as well. Listed below is a simple C# example of how to carry out a conversion from your own code. To keep things short the code is not complete: it calls into shared functions from our main C# example (OpenService, CloseService and NavigateBrowser).

Our existing Java-based examples can easily be extended to carry out the same type of conversions. Contact us if you need a hand; we love to help and are very responsive.
 

// Requires "using System.IO;" plus the web service proxy types from the main C# example.
/// <summary>
/// Simple sample to convert either a URL or HTML code fragment to PDF format
/// </summary>
/// <param name="htmlOnly">A flag indicating if an HTML Code fragment (true)
/// or URL (false) should be converted.</param>
private void ConvertHTML(bool htmlOnly)
{
    DocumentConverterServiceClient client = null;
 
    try
    {
        string sourceFileName = null;
        byte[] sourceFile = null;
 
        client = OpenService("http://localhost:41734/Muhimbi.DocumentConverter.WebService/");
 
        OpenOptions openOptions = new OpenOptions();
 
        //** Specify optional authentication settings for the web page
        openOptions.UserName = "";
        openOptions.Password = "";
 
        if (htmlOnly)
        {
            //** Specify the HTML to convert
            sourceFile = System.Text.Encoding.UTF8.GetBytes("Hello <b>world</b>");
        }
        else
        {
            // ** Specify the URL to convert
            openOptions.OriginalFileName = "http://www.muhimbi.com/";
        }
        openOptions.FileExtension = "html";
        //** Generate a temp file name that is later used to write the PDF to
        sourceFileName = Path.GetTempFileName();
        File.Delete(sourceFileName);
 
        // ** Enable JavaScript on the page to convert. 
        openOptions.AllowMacros = MacroSecurityOption.All;
 
        // ** Set the various conversion settings
        ConversionSettings conversionSettings = new ConversionSettings();
        conversionSettings.Fidelity = ConversionFidelities.Full;
        conversionSettings.PDFProfile = PDFProfile.PDF_1_5;
        conversionSettings.PageOrientation = PageOrientation.Portrait;
        conversionSettings.Quality = ConversionQuality.OptimizeForPrint;
 
        // ** Carry out the actual conversion
        byte[] convertedFile = client.Convert(sourceFile, openOptions, conversionSettings);
 
        // ** Write the PDF file to the local file system.
        string destinationFileName = Path.Combine(
                                            Path.GetDirectoryName(sourceFileName),
                                            Path.GetFileNameWithoutExtension(sourceFileName) + 
                                            "." + conversionSettings.Format);
        using (FileStream fs = File.Create(destinationFileName))
        {
            // ** The using block closes the stream, no explicit Close() is needed.
            fs.Write(convertedFile, 0, convertedFile.Length);
        }
 
        // ** Display the converted file in a PDF viewer.
        NavigateBrowser(destinationFileName);
    }
    finally
    {
        CloseService(client);
    }
}
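
The htmlOnly branch in the sample above converts a hard-coded fragment. For the report scenario mentioned in the feature list (generated HTML tables), the fragment could be built from data first. The following is a minimal sketch only: HtmlReportBuilder, BuildTable and ToSourceFile are hypothetical names that are not part of the Muhimbi API, and it assumes .NET 4.0 or later for System.Net.WebUtility. It only produces the byte array; the conversion call itself is unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper (not part of the Muhimbi API) that generates an HTML
// table fragment suitable for passing to the converter as source bytes.
public static class HtmlReportBuilder
{
    // Builds a <table> from a header row and data rows.
    // Every cell value is HTML-encoded so characters such as & and < are safe.
    public static string BuildTable(string[] headers, IEnumerable<string[]> rows)
    {
        StringBuilder html = new StringBuilder();
        html.Append("<table border=\"1\">");

        // Header row
        html.Append("<tr>");
        foreach (string header in headers)
        {
            html.Append("<th>").Append(WebUtility.HtmlEncode(header)).Append("</th>");
        }
        html.Append("</tr>");

        // Data rows
        foreach (string[] row in rows)
        {
            html.Append("<tr>");
            foreach (string cell in row)
            {
                html.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
            }
            html.Append("</tr>");
        }

        html.Append("</table>");
        return html.ToString();
    }

    // Encodes the fragment as UTF-8 bytes, the same encoding the sample
    // above uses for its "Hello <b>world</b>" fragment.
    public static byte[] ToSourceFile(string html)
    {
        return Encoding.UTF8.GetBytes(html);
    }
}
```

Under these assumptions, the line that assigns sourceFile in the htmlOnly branch could become something like sourceFile = HtmlReportBuilder.ToSourceFile(HtmlReportBuilder.BuildTable(headers, rows)); with the rest of ConvertHTML left as it is.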

All in all some pretty exciting functionality. Don’t hesitate to leave a comment below if you have any questions or contact us to discuss any of our products.


5 Comments:

  • I tried the sample code on this page and had problems getting Infopath forms to render using Forms Server. Just running the URL path to the XML file through the Muhimbi Convert function did not work for me. The PDF files were blank. After some trial and error I found that Sharepoint normally redirects calls to an Infopath XML file to the forms server so that the final URL is something like http://www.myserver.com/sites/thissite/_layouts/FormServer.aspx?XmlLocation=/sites/thissite/MyFormsLibrary/MyInfoPathFile.xml@DefaultItemOpen=1 . When sending that URL to the Muhimbi Convert function rather than a direct call to http://www.myserver.com/sites/thissite/MyFormsLibrary/MyInfoPathFile.xml the InfoPath form was successfully rendered and converted to PDF.

    By Anonymous Anonymous, At 02 September, 2011 16:00  

  • It is good to hear you have managed to resolve the problem, but the link provided in our blog post should work correctly, see http://blog.muhimbi.com/2010/08/using-sharepoint-forms-services-to.html

    By Blogger Muhimbi, At 07 September, 2011 11:14  

  • According to this post, using Muhimbi to convert an HTML file to PDF results in a:

    "High fidelity conversion...The generated PDF file contains real (searchable) text and is not just a low resolution screenshot of the converted web page."

    However, in my experience, I am seeing just the opposite. In other words, the converted PDF is simply an image of the original HTML that is *not* searchable and also appears rather grainy.

    I used the Muhimbi-provided sample code to convert http://www.muhimbi.com/default.aspx and the resulting PDF appears grainy too.

    Please let me know how to convert HTML to PDF and preserve the full fidelity of the source document.

    Thanks.

    By Anonymous Jeremy Jameson, At 20 April, 2012 23:31  

  • Hi Jeremy,

    It sounds like IE9 is installed on your conversion server. Microsoft has made some changes in IE9 that cause this ‘output as bitmap’ behaviour. I will email a workaround to you that solves this problem.

    This problem will be permanently fixed in the upcoming 6.0 release of our software.

    By Blogger Muhimbi, At 23 April, 2012 09:41  

  • Thanks, Jeroen.

    The workaround you provided results in "actual text" in the PDF (instead of an image). There's a minor issue with hyperlinks (the bottom part of the link text is cut off and the links are not functional), but that won't be a showstopper (at least, not for now).

    Wonderful customer service -- as usual. Much appreciated.

    By Anonymous Jeremy Jameson, At 23 April, 2012 15:32  
