Server-Side Table of Contents Generation for PDFs and Merged Files

Does software get a 7-year itch?  We’re not sure, and since at the time of writing both our Muhimbi PDF Converter for SharePoint and our Muhimbi PDF Converter Services (for Java, PHP, Ruby, and .NET) have been on the market for about 7 years, we didn’t want to find out!

To keep things fresh, we regularly look at actual customer requests to decide which features should be added to the Converter. One of the more popular ones has been the ability to convert and merge multiple documents into a single PDF, all in one operation. Although this facility works very well, and includes the ability to generate PDF bookmarks to aid with navigation inside the merged document, it seemed like it could do with a little ‘freshening up’.

The results have been, well, just awesome! As of version 7.3 of the Converter, we now have the ability to create a real Table of Contents (TOC) page. This is not a facility that just blindly converts section headings into bookmarks and puts them at the front of the PDF. It is a truly feature-rich facility that gives you full control over the generated TOC, from start to finish.

Let’s have a look at how this works.

Object Model

In this post the focus is on generating a Table of Contents via our API. However, if you wish to use this functionality from a SharePoint Workflow, then please continue reading as well. We might have to get a little bit technical, but once you’re familiar with the concepts, you can use our XML-based workflow syntax to generate a table of contents from SharePoint Designer, Nintex and K2 based workflows.

The classes relevant to dealing with TOCs are displayed below and are as follows:

TOC Diagram

  • MergeSettings: When merging multiple files and generating a single table of contents, follow the normal procedure for merging files (sample code) and populate the MergeSettings.TOCSettings property as per the sample code below.
  • ConversionSettings: To generate a table of contents for a single document, follow the normal procedure for converting or processing a single file (sample code) and populate ConversionSettings.TOCSettings as per the sample code below.
  • TOCSettings: All settings related to the generation of the table of contents can be found in this class. The available properties are as follows:
    • Bookmark: The TOC itself can have its own PDF bookmark to aid with navigation. Specify the text in this property.

    • Location: TOCs can be added to the Front or Back of the document. Enter the relevant option here.

    • MinimumEntries: For certain simple documents that only have one or two bookmarks, it may not make sense to add a table of contents. Specify the minimum number of entries required before a TOC is generated. The default value is ‘0’, which will always create a TOC regardless of the number of entries.

    • PageMargins: Page margins in the format set out below. It defaults to a uniform half inch margin.

      "#{dim}" - for a uniform margin or

      "#{dim},#{dim},#{dim},#{dim}" - for individual margins

      where

      - # is a numeric value

      - {dim} is the dimension. Either empty (meaning inches) or "mm", "in", "in.", "inch" or "inches".

    • PageOrientation: The orientation used by the TOC. Portrait, Landscape or Default. The Default option uses the same orientation as the page following (or preceding) the TOC depending on the value specified in Location.

    • PaperSize: A named paper size such as A4 or Letter (See MSDN) or a custom size in "{width}{dim}{sep}{height}{dim}" format where:

      - {width} and {height} are numerical values (please use a period '.' as the decimal separator).

      - {dim} is the dimension, which can be 'mm', 'in.' or 'inches'. (It defaults to inches when nothing is specified.)

      - {sep} separates the width and the height, either 'by', a comma (,) or the letter 'x'. Example: "8.5 in. by 6 in."

    • Properties: Optional properties to pass to the XSL template for display or processing purposes. For details see below.

    • Template: The XSL template (details below) to use for formatting purposes. This can either be a string containing all the XSL, a path - local to the server running the conversion service - to the location of the XSL file, or a URL to the XSL file on a web (or SharePoint) server.
  • NameValuePair: A single value that can be passed into the XSL using TOCSettings.Properties.
  • TOCLocation: Used by TOCSettings.Location to determine where the TOC should go.
  • BookmarkGenerationOption: As explained under the XML Source Data section below, the TOC system is based on the content and structure of PDF Bookmarks. It is therefore essential that during the conversion of the source documents ConversionSettings.GenerateBookmarks is set to Automatic.
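The PageMargins and PaperSize string formats described in the list above can be checked on the client before a request is sent. The following Python sketch is one possible reading of those formats, not the conversion service's own parser. The function names are hypothetical, everything is normalised to inches, the same unit spellings are accepted for both properties, and the order of the four individual margins is simply preserved as entered because the list above does not name the sides. Named paper sizes such as A4 are not handled.

```python
import re

MM_PER_INCH = 25.4

# A number followed by an optional unit. No unit means inches.
_DIM = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|inches|inch|in\.|in)?", re.IGNORECASE)


def to_inches(token):
    """Convert a single '#{dim}' token such as '12mm' or '0.5 in.' to inches."""
    match = _DIM.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"unrecognised dimension: {token!r}")
    value = float(match.group(1))
    unit = (match.group(2) or "in").lower()
    return value / MM_PER_INCH if unit == "mm" else value


def parse_margins(text="0.5in"):
    """Return four margins in inches from a uniform or four-part string."""
    parts = [to_inches(part) for part in text.split(",")]
    if len(parts) == 1:
        return tuple(parts * 4)   # uniform margin applied to all four sides
    if len(parts) == 4:
        return tuple(parts)       # order kept exactly as supplied
    raise ValueError("expected one or four margin values")


def parse_custom_paper_size(text):
    """Return (width, height) in inches for a custom size string."""
    parts = re.split(r"\s*(?:by|,|x)\s*", text.strip(), flags=re.IGNORECASE)
    if len(parts) != 2:
        raise ValueError("expected '{width}{dim}{sep}{height}{dim}'")
    return to_inches(parts[0]), to_inches(parts[1])


print(parse_margins())                             # (0.5, 0.5, 0.5, 0.5)
print(parse_margins("25.4mm,1in,0.5,2 inches"))    # (1.0, 1.0, 0.5, 2.0)
print(parse_custom_paper_size("8.5 in. by 6 in.")) # (8.5, 6.0)
```

Validating these strings up front gives a clearer error than waiting for a conversion request to fail.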

Based on the previously described list of classes and properties, adding a TOC may sound complex, but nothing could be further from the truth. The easiest way to get started is to take our sample code, populate a TOCSettings object (referred to as tocSettings in this post), and then pass it into either ConversionSettings.TOCSettings or MergeSettings.TOCSettings.
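To make the shape of these settings concrete, here is a small Python stand-in that mirrors the TOCSettings properties listed earlier. It is not the Muhimbi client API: the class name, the snake_case field names, the `should_generate_toc` helper and the `toc.xsl` file name are all hypothetical. Only the MinimumEntries and PageMargins defaults come from this post, and treating MinimumEntries as a "greater than or equal to" threshold is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TocSettingsSketch:
    """Hypothetical mirror of the TOCSettings properties described in this post."""
    bookmark: Optional[str] = None         # text of the TOC's own PDF bookmark
    location: str = "Front"                # "Front" or "Back" (illustrative value)
    minimum_entries: int = 0               # default per this post: always create a TOC
    page_margins: str = "0.5in"            # default per this post: uniform half inch
    page_orientation: str = "Default"      # "Portrait", "Landscape" or "Default"
    paper_size: Optional[str] = None       # e.g. "A4" or "8.5 in. by 6 in."
    properties: Dict[str, str] = field(default_factory=dict)  # values for the XSL
    template: Optional[str] = None         # inline XSL, server-local path, or URL


def should_generate_toc(settings, entry_count):
    """Assumed rule: create the TOC once the entry count reaches the minimum."""
    return entry_count >= settings.minimum_entries


toc = TocSettingsSketch(
    bookmark="Table of Contents",
    minimum_entries=3,
    properties={"title": "Administration Guide"},
    template="toc.xsl",  # hypothetical file name
)
print(should_generate_toc(toc, 2))                  # False
print(should_generate_toc(TocSettingsSketch(), 0))  # True
```

In a real integration the equivalent object would be the TOCSettings class from the client for your platform, assigned to ConversionSettings.TOCSettings or MergeSettings.TOCSettings.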

You can certainly use your own code, and even pass the tocSettings to both ConversionSettings.TOCSettings AND MergeSettings.TOCSettings to generate TOCs for each individual document in a merge operation, and then add an overall TOC for the entire merged document.

The big question is what to specify in the Template property. Read on for details.

XML Source Data

To determine what entries to include in the TOC, the conversion service looks at the PDF Bookmarks present in the PDF. If the source file is not already in PDF format, it will be converted to PDF and, where possible, PDF bookmarks will be generated based on the internal structure of the document. For example, when converting an MS-Word file the various headings determine the structure of the PDF Bookmarks.

Although in most cases it is not important for our customers to have any knowledge about the internals of the Muhimbi Conversion Service, in this particular case, and by design, it is. Internally, an XML document is generated that represents the content and structure of the PDF Bookmarks. This XML document is then transformed using XSL into HTML. It is this HTML, the language that underpins every website on the internet, that determines the formatting of the TOC. Developers have full control over the XSL, providing an enormous amount of flexibility.

Let’s take our comprehensive Administration Guide as an example. When converted to PDF a set of nested PDF bookmarks is generated, which internally generates the following XML.

Generated XML

(Ignore the occasional text encoding artefacts. They are caused by our blogging software; generated TOCs do not have this problem.)

The generated XML is fairly straightforward: a number of nested topic elements make up the structure. Each element has a descriptive title attribute, a level attribute (which matches the nesting level), a page attribute containing the page number, and a target attribute which is used for internal processing purposes.
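For readers who want to experiment with this structure, the following Python sketch builds a comparable document from a nested list of bookmarks. The topic element and its title, level, page and target attributes follow the description above. The root element name (`toc`), the shape of the individual property elements, and the sample bookmark titles are assumptions made for illustration, so compare against the XML your own installation produces before relying on them.

```python
import xml.etree.ElementTree as ET


def add_topics(parent, bookmarks, level=0):
    """Recursively append one topic element per (title, page, children) tuple."""
    for title, page, children in bookmarks:
        topic = ET.SubElement(parent, "topic", {
            "title": title,
            "level": str(level),   # matches the nesting depth
            "page": str(page),
            "target": "",          # used internally by the service; blank here
        })
        add_topics(topic, children, level + 1)


def build_toc_xml(bookmarks, properties):
    root = ET.Element("toc")                    # assumed root element name
    add_topics(root, bookmarks)
    props = ET.SubElement(root, "properties")   # follows the list of topics
    for name, value in properties.items():
        # Assumed shape for a single property entry.
        ET.SubElement(props, "property", {"name": name, "value": value})
    return root


bookmarks = [
    ("Introduction", 2, [("Prerequisites", 2, []), ("Licensing", 3, [])]),
    ("Installation", 4, []),
]
root = build_toc_xml(bookmarks, {"title": "Administration Guide"})
print(ET.tostring(root, encoding="unicode"))
```

A hand-built document like this is useful as test input when developing an XSL template away from the conversion server.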

Please note: All page numbers in the TOC reflect the physical page number of that page in the generated PDF, including the addition of the TOC page itself. If the source document(s) already contain page numbers, then these may no longer be the same as the page number listed in the TOC or their actual page number in the generated PDF. If you wish to change the page numbers displayed in the footer of a document then please use our watermarking facilities.
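The effect described in this note is simple arithmetic, sketched below in Python. The function name and the "Front"/"Back" strings are illustrative, and the sketch assumes a TOC placed at the front pushes every later page back by the number of TOC pages, while a TOC at the back leaves earlier pages untouched. How the service computes this internally is not described in this post.

```python
def physical_pages(entries, toc_page_count, location="Front"):
    """Shift (title, page) entries to their page numbers in the final PDF."""
    offset = toc_page_count if location == "Front" else 0
    return [(title, page + offset) for title, page in entries]


# Page numbers as they were before a one-page TOC was added.
entries = [("Introduction", 1), ("Installation", 5)]

print(physical_pages(entries, 1, "Front"))  # [('Introduction', 2), ('Installation', 6)]
print(physical_pages(entries, 1, "Back"))   # [('Introduction', 1), ('Installation', 5)]
```

This is why a footer that still reads "Page 1" in the source document can end up on physical page 2 of the generated PDF.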

The list of topic elements is followed by a properties section (Line 84 onwards). This section, and its contents, consists of a number of optional values that may have been passed into the request. This allows, for example, the addition of information to the TOC to display the document's status, author, title or any other kind of information. In this example we are passing in the title of the document.

XSL Transformation

Although the XML document’s content may differ between requests, the structure is always the same. As a result we can use the XSL industry standard to convert the XML into an attractive looking HTML document. Although XSL may look daunting to the uninitiated, the following sample (download) is a good starting point and can be amended to suit your particular needs (or used as is).

Although this is a standard XSL file, the following sections are of particular interest:

  • Lines 12-56: Standard HTML CSS style sheet which controls the look of the generated HTML.
  • Line 60: Insert a custom property passed into the conversion request. In our example the document’s title.
  • Line 76: An empty template for the properties element to prevent this information from being displayed as a plain list.
  • Lines 78-97: XSL template for generating HTML associated with all Level 0 topics. If you wish to control the generated HTML for a specific level then copy the topic[@level='0'] template and change the level number to match the appropriate nesting level.
  • Lines 99-118: XSL Template for all topic levels that do not have an explicit template defined.
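The dispatch logic in that list (one template for level 0 topics, a fallback for every other level, and a properties element that is read but never listed) can be mimicked in plain Python to make the flow easier to follow. This is a stand-in written with the standard library, not XSL and not the sample template itself. The CSS class names, the indentation rule, and the assumed `toc`/`property` element shapes are all illustrative.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_topic(topic):
    """Render one topic and, recursively, its nested topics."""
    level = int(topic.get("level"))
    # Level 0 gets its own styling; all other levels share a fallback.
    css = "level0" if level == 0 else "levelN"
    row = (f'<div class="{css}" style="margin-left:{level * 20}px">'
           f'{escape(topic.get("title"))} '
           f'<span class="page">{escape(topic.get("page"))}</span></div>')
    return row + "".join(render_topic(child) for child in topic.findall("topic"))


def render_toc(root):
    # The properties element is only consulted for the title, never listed.
    title = root.find("properties/property[@name='title']")
    heading = f"<h1>{escape(title.get('value'))}</h1>" if title is not None else ""
    body = "".join(render_topic(topic) for topic in root.findall("topic"))
    return f"<html><body>{heading}{body}</body></html>"


xml_text = """<toc>
  <topic title="Introduction" level="0" page="2" target="">
    <topic title="Licensing &amp; Support" level="1" page="3" target=""/>
  </topic>
  <properties><property name="title" value="Administration Guide"/></properties>
</toc>"""

page_html = render_toc(ET.fromstring(xml_text))
print(page_html)
```

In the real template the same decisions are expressed declaratively through XSL match patterns, which is what makes it easy to add a dedicated template for one specific level.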

Not sure what is going on and this all looks like a load of gobbledygook? Don’t worry! Just take the XSL sample provided above and use it in your project. As you can see in the following screenshot, it looks pretty good.

Sample TOC

Testing & Troubleshooting

Although not the most attractive looking application ever devised, the PDF Converter comes with a handy Diagnostics Tool (including full source code) to test the Table of Contents facility. While this is merely a handy test tool, not the official user interface for the TOC facility, it can be incredibly helpful in quickly testing various XSL template designs before integrating them into your solution.

Diagnostics Tool

To test the XSL and TOC output, enable the Table of Contents as per the screenshot above, modify the XSL template if needed, specify any optional properties, select a file or folder in the WS Convert tab and choose either the Convert or Merge button.

Any questions? Please leave a comment below or contact us directly.

Labels: Articles, Merging, News, pdf, PDF Converter, PDF Converter Services


© Muhimbi Ltd. 2008 - 2024