Server Side Table Of Contents generation for PDFs and Merged files

Posted at: 11:49 AM on 21 January 2015 by Muhimbi

Does software get a 7-year itch? We’re not sure, and since at the time of writing both our Muhimbi PDF Converter for SharePoint and our Muhimbi PDF Converter Services (for Java, PHP, Ruby, and .NET) have been on the market for about 7 years, we didn’t want to find out!

To keep things fresh, we regularly look at actual customer requests to decide what features should be added to the Converter. One of the more popular ones has been the ability to convert and merge multiple documents into a single PDF, all in one operation. Although this facility works very well, and includes the ability to generate PDF bookmarks to aid with navigation inside the merged document, it seemed like it could do with a little ‘freshening up’.

The results have been, well, just awesome! As of version 7.3 of the Converter, we now have the ability to create a real Table of Contents (TOC) page. This is not a facility that just blindly converts section headings into bookmarks and then puts them at the front of the PDF. No, this is a truly feature-rich facility that gives you full control over the generated TOC, from start to finish.

Let’s have a look at how this works.

 

Object Model

In this post the focus is on generating a Table Of Contents via our API. However, if you wish to use this functionality from a SharePoint Workflow, then please continue reading as well. We might have to get a little bit technical, but once you’re familiar with the concepts, you can use our XML based workflow syntax to generate a table of contents from SharePoint Designer, Nintex and K2 based workflows.

The classes relevant to dealing with TOCs are shown in the diagram below and are as follows:

TOC Diagram

  • MergeSettings: When merging multiple files and generating a single table of contents, follow the normal procedure for merging files (sample code) and populate the MergeSettings.TOCSettings property as per the sample code below.
  • ConversionSettings: To generate a table of contents for a single document, follow the normal procedure for converting or processing a single file (sample code) and populate ConversionSettings.TOCSettings as per the sample code below.
  • TOCSettings: All settings related to the generation of the table of contents can be found in this class. The available properties are as follows:
    • Bookmark: The TOC itself can have its own PDF bookmark to aid with navigation. Specify the text in this property.
    • Location: TOCs can be added to the Front or Back of the document. Enter the relevant option here.
    • MinimumEntries: For certain simple documents that only have one or two bookmarks, it may not make sense to add a table of contents. Specify the minimum number of entries required before a TOC is generated. The default value is ‘0’, which will always create a TOC regardless of the number of entries.
    • PageMargins: Page margins in the format set out below. It defaults to a uniform half inch margin. 
          "#{dim}" - for a uniform margin or
          "#{dim},#{dim},#{dim},#{dim}" - for individual margins
      where
         - # is a numeric value
         - {dim} is dimension. Either empty (meaning inches) or "mm", "in", "in.", "inch" or "inches".
    • PageOrientation: The orientation used by the TOC. Portrait, Landscape or Default. The Default option uses the same orientation as the page following (or preceding) the TOC depending on the value specified in Location.
    • PaperSize: A named paper size such as A4 or Letter (See MSDN) or a custom size in "{width}{dim}{sep}{height}{dim}" format where:
          - {width} and {height} are numerical values (please use a period '.' as the decimal separator).
          - {dim} is the dimension which can be 'mm', 'in.' or 'inches'. (It defaults to inches when nothing is specified)
          - {sep} separates the width and the height, either 'by', comma (,) or the letter 'x'. Example: "8.5 in. by 6 in."
    • Properties: Optional properties to pass to the XSL template for display or processing purposes. For details see below.
    • Template: The XSL template (details below) to use for formatting purposes. This can either be a string containing all the XSL, a path - local to the server running the conversion service - to the location of the XSL file, or a URL to the XSL file on a web (or SharePoint) server.
  • NameValuePair: A single value that can be passed into the XSL using TOCSettings.Properties.
  • TOCLocation: Used by TOCSettings.Location to determine where the TOC should go.
  • BookmarkGenerationOption: As explained under the XML Source Data section below, the TOC system is based on the content and structure of PDF Bookmarks. It is therefore essential that during the conversion of the source documents ConversionSettings.GenerateBookmarks is set to Automatic.
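
To make the PageMargins format described above more concrete, the following sketch checks a margin string before it is assigned to TOCSettings.PageMargins. It is only an illustration: the MarginFormat class, the IsValid helper and the regular expression are hypothetical and not part of the Converter's API, and details such as whether whitespace is tolerated between values are assumptions. The conversion service remains the authority on what it accepts.

```csharp
using System;
using System.Text.RegularExpressions;

public class MarginFormat
{
    // One margin value: a number followed by an optional unit.
    // An empty unit means inches, as described in the property list.
    const string One = @"\d+(\.\d+)?\s*(mm|inches|inch|in\.|in)?";

    // Either a single (uniform) value or exactly four comma-separated values.
    // Tolerating whitespace around values is an assumption of this sketch.
    static readonly Regex Pattern =
        new Regex(@"^\s*" + One + @"((\s*,\s*" + One + @"){3})?\s*$");

    // Hypothetical helper: true when the string follows the documented shape.
    public static bool IsValid(string margins)
    {
        return margins != null && Pattern.IsMatch(margins);
    }

    public static void Main()
    {
        string[] samples = { "0.5", "10mm", "1in,1in,0.5in,0.5in", "1in,2in", "5cm" };
        foreach (string sample in samples)
        {
            // The first three follow the documented format; the last two do not.
            Console.WriteLine(sample + " -> " + IsValid(sample));
        }
    }
}
```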

 

Based on the previously described list of classes and properties, adding a TOC may sound complex, but nothing could be further from the truth. The easiest way to get started is to take our sample code, add the following code and then pass tocSettings into either ConversionSettings.TOCSettings or MergeSettings.TOCSettings.

// ** Create any custom properties that need to be passed into the TOC.
NameValuePair[] properties = new NameValuePair[2];
properties[0] = new NameValuePair() { Name = "title", Value = "Development Guide" };
properties[1] = new NameValuePair() { Name = "status", Value = "Draft" };
 
// ** Specify the various TOCSettings
TOCSettings tocSettings = new TOCSettings
{
    MinimumEntries = 0,
    Bookmark = "Table Of Contents",
    Location = TOCLocation.Front,
    Properties = properties,
    Template = @"C:\templates\toc.xsl",
};
 
// ** Pass the TOC Settings into the conversion
conversionSettings.TOCSettings = tocSettings;

 
You can certainly use your own code, and even pass the tocSettings to both ConversionSettings.TOCSettings AND MergeSettings.TOCSettings to generate TOCs for each individual document in a merge operation, and then add an overall TOC for the entire merged document.

The big question is what to specify in the Template property. Read on for details.

 

XML Source Data

To determine what entries to include in the TOC, the conversion service looks at the PDF Bookmarks present in the PDF. If the source file is not already in PDF format, it will be converted to PDF and, where possible, PDF bookmarks will be generated based on the internal structure of the document. For example, when converting an MS-Word file the various headings determine the structure of the PDF Bookmarks.

Although in most cases our customers do not need any knowledge of the internals of the Muhimbi Conversion Service, in this particular case, by design, they do. Internally, an XML document is generated that represents the content and structure of the PDF Bookmarks. This XML document is then transformed into HTML using XSL. It is this HTML, the language that underpins every website on the internet, that determines the formatting of the TOC. Developers have full control over the XSL, providing an enormous amount of flexibility.

Let’s take our comprehensive Administration Guide as an example. When converted to PDF a set of nested PDF bookmarks is generated, which internally generates the following XML.

 1  <?xml version="1.0" encoding="utf-8"?>
 2  <toc>
 3    <topics>
 4      <topic title="Administration Guide - Contents" target="muhimbi:60162e96-3f60-45e6-ad04-e7b01bdf5174" level="0" page="1" />
 5      <topic title="1 Introduction" target="muhimbi:a2119755-cc8e-448e-95c8-18b5d685ae9f" level="0" page="8">
 6        <topic title="1.1 Relevant articles on the Muhimbi Blog" target="muhimbi:69dfdada-a887-44f3-b951-ce6952899737" level="1" page="9" />
 7        <topic title="1.2 Prerequisites" target="muhimbi:77b4d393-1a66-4e86-9e38-22b623d2bdb6" level="1" page="10" />
 8        <topic title="1.3 Solution architecture" target="muhimbi:6793ecf6-e589-4f6e-9f4e-647ca4bc88ec" level="1" page="11" />
 9      </topic>
10      <topic title="2 Deployment" target="muhimbi:901d9df3-1bac-4dd9-83cf-77d9d87e548f" level="0" page="12">
11        <topic title="2.1 Quick start" target="muhimbi:1616a20f-3347-4b00-8e1e-7e841b2635fc" level="1" page="12">
12          <topic title="2.1.1 Installing the Windows Service" target="muhimbi:051c054d-e570-46b0-94fc-9cbf1d8bfa79" level="2" page="12" />
13          <topic title="2.1.2 Installing the SharePoint Front End &amp; Workflow actions" target="muhimbi:333eb6c7-b10f-4a3b-ba12-836020ea8fed" level="2" page="13" />
14        </topic>
15        <topic title="2.2 Detailed Installation instructions" target="muhimbi:8ba96044-8d42-4cd1-8a71-5e010051fac2" level="1" page="14">
16          <topic title="2.2.1 Installing the Windows Service" target="muhimbi:0872834c-1d40-4de0-a65f-45e01d150ced" level="2" page="14" />
17          <topic title="2.2.2 Installing the SharePoint Front End" target="muhimbi:a8168468-9917-4162-95fd-2e2d6bba59e6" level="2" page="15" />
18          <topic title="2.2.3 Feature Activation / Deactivation" target="muhimbi:f76a0716-0551-4999-8fbf-be0ff1e72d0b" level="2" page="16" />
19          <topic title="2.2.4 Installing the License" target="muhimbi:19795149-0e18-4f32-a00a-b75142a47bae" level="2" page="17" />
20        </topic>
21        <topic title="2.3 Post Installation configuration" target="muhimbi:6153f30e-3d5f-4206-888c-57fff52b7994" level="1" page="18">
22          <topic title="2.3.1 Enabling converters / Specifying location of Conversion Service" target="muhimbi:ab558ca8-ee42-4049-be82-fd5ea2b5dc57" level="2" page="18" />
23          <topic title="2.3.2 Tuning the Document Conversion service" target="muhimbi:4334943e-4628-4fce-b019-aab9f0eaccb1" level="2" page="19" />
24        </topic>
25        <topic title="2.4 Un-installation" target="muhimbi:0f338760-7386-41af-826d-409b21534d4a" level="1" page="32">
26          <topic title="2.4.1 Un-installing the Document converter service" target="muhimbi:e4ba3c70-57d1-4daa-8d4a-f70edb7f0219" level="2" page="32" />
27          <topic title="2.4.2 Un-installing the SharePoint Front End via the command line" target="muhimbi:4a840f2a-85d0-4d59-94f5-5a6852b8b3c4" level="2" page="32" />
28          <topic title="2.4.3 Un-installing the SharePoint Front End via Central Administration" target="muhimbi:a2868d43-2f80-472e-976b-ed5a7cd7c135" level="2" page="33" />
29        </topic>
30        <topic title="2.5 Upgrading from a previous version" target="muhimbi:4af4be9f-bad0-4711-b59c-c2268c312a10" level="1" page="33" />
31      </topic>
32      <topic title="3 Troubleshooting &amp; Other common tasks" target="muhimbi:886a480c-64bf-4ab6-9fef-8310be060613" level="0" page="35">
33        <topic title="3.1 Windows Event Log" target="muhimbi:44ec6735-2571-4f36-be84-d2dd4a4a7754" level="1" page="35" />
34        <topic title="3.2 SharePoint Trace Log" target="muhimbi:4dfb8185-2639-4314-9103-419a0a13f656" level="1" page="35" />
35        <topic title="3.3 Document Converter Trace Log" target="muhimbi:0d1b2b06-34df-40bd-a8f5-443357cdc13a" level="1" page="36" />
36        <topic title="3.4 SharePoint audit log" target="muhimbi:630bf1fc-73e8-4a05-b50b-399a578443ed" level="1" page="36" />
37        <topic title="3.5 Common issues &amp; Errors" target="muhimbi:15f62df4-a947-4a67-9f2b-224fc62e9194" level="1" page="36">
38          <topic title="3.5.1 Your account is not allowed to deploy SharePoint Solutions" target="muhimbi:4e969499-5845-4c5b-94bc-811329c65f14" level="2" page="36" />
39          <topic title="3.5.2 Errors on newly added servers" target="muhimbi:4a67b0d0-d10a-46e0-a41f-3474f83dd8e8" level="2" page="36" />
40          <topic title="3.5.3 An evaluation message is displayed in the UI and converted documents" target="muhimbi:2ca37697-8e8e-4770-aa6d-aa0aff195b4d" level="2" page="37" />
41          <topic title="3.5.4 ‘Unknown Error’ or ‘resource object not found’" target="muhimbi:33dcea86-ea14-4493-9e9a-7cb16d60454a" level="2" page="38" />
42          <topic title="3.5.5 Documents using non standard fonts (e.g. Japanese) are not converted properly / The fonts in the destination document are not correct" target="muhimbi:5e585578-b274-4f37-8fbf-2a06e28779da" level="2" page="39" />
43          <topic title="3.5.6 Error messages related to printer drivers or the printer spooler are logged" target="muhimbi:8a0d2155-02fe-4dec-b660-9b7166f16b93" level="2" page="39" />
44          <topic title="3.5.7 The PDF Converter functionality is not visible in a Document Library" target="muhimbi:579e7d7e-939d-4704-aa0c-b52e8ae26fae" level="2" page="40" />
45          <topic title="3.5.8 Problems converting InfoPath forms without a shared XSN file" target="muhimbi:a2a42c5a-4b06-4413-900d-2eb67756c2e7" level="2" page="40" />
46          <topic title="3.5.9 The ‘Convert to PDF’ context menu is displayed twice" target="muhimbi:f50c69f4-8cad-487a-a1f3-a3143d4cdabc" level="2" page="41" />
47          <topic title="3.5.10 InfoPath forms using Ink controls fail to convert" target="muhimbi:2eaa219e-a7b7-42c8-ac9a-832bd62e9e3e" level="2" page="41" />
48          <topic title="3.5.11 Error 403 (Forbidden) when converting InfoPath forms" target="muhimbi:3f1b9637-483e-4fe0-8f95-320e8b0560e3" level="2" page="41" />
49          <topic title="3.5.12 InfoPath files are converted using an old version of the XSN template" target="muhimbi:9bad33c2-8b32-4d8b-86ae-7c2456aef741" level="2" page="41" />
50          <topic title="3.5.13 Nintex Workflow Activities are not working as expected after upgrading" target="muhimbi:324dcada-0548-4177-bc06-c3ff4d9d9e53" level="2" page="42" />
51          <topic title="3.5.14 Event Manager error after uninstallation" target="muhimbi:9d4087b5-41a9-4b3a-9d3f-7520e9dca303" level="2" page="42" />
52          <topic title="3.5.15 Files uploaded via Windows Explorer do not trigger ‘Insert’ watermarks" target="muhimbi:0cb8fcf4-c797-4f00-a2e8-196fc8b39f9b" level="2" page="42" />
53          <topic title="3.5.16 ‘Watermark on Open’ does not show watermarks" target="muhimbi:25d56426-b43f-480b-9a05-12d6765d84cc" level="2" page="42" />
54          <topic title="3.5.17 Problems with HTML to PDF Conversion of SharePoint 2010 pages" target="muhimbi:b5e9b07b-7cd9-4da6-bce7-73f3710a4857" level="2" page="43" />
55          <topic title="3.5.18 Changing the default bookmark and sort fields when merging files" target="muhimbi:76ff6ccc-fa0b-4817-8da0-594e289440db" level="2" page="43" />
56          <topic title="3.5.19 Deploying the Conversion Service on Windows Server 2012 and later" target="muhimbi:9f985db8-b0be-4709-8850-9f9576d43bb5" level="2" page="43" />
57        </topic>
58      </topic>
59      <topic title="Appendix - Installing converter dependencies" target="muhimbi:081c9e2e-6d32-4ec7-9db2-d707bac33528" level="0" page="44">
60        <topic title="Supported formats &amp; their dependencies" target="muhimbi:3378d431-8157-40c8-9f0a-f2ae596b662b" level="1" page="44" />
61        <topic title="Using Office 2007" target="muhimbi:e104ce16-8435-46b9-b984-6cf01ade7505" level="1" page="44" />
62        <topic title="Using Office 2010 / 2013" target="muhimbi:62255d08-d241-4d2a-a999-d10c4d5ebbb2" level="1" page="45" />
63      </topic>
64      <topic title="Appendix - Using InfoPath with External Data Sources" target="muhimbi:4786bed5-ca2d-4c62-8867-9d5d81451c57" level="0" page="46">
65        <topic title="Details for InfoPath 2007" target="muhimbi:030c36b7-a323-4dd1-b3d8-ee17a7c52ce8" level="1" page="46" />
66        <topic title="Details for InfoPath 2010 &amp; 2013" target="muhimbi:1869d607-03ac-49e1-8f97-e17d2f2da27d" level="1" page="48">
67          <topic title="3.5.20 Digitally signing forms" target="muhimbi:4f7e14e8-ee87-4fdc-94a1-15cc4cf413c8" level="2" page="48" />
68          <topic title="3.5.21 Using Muhimbi’s ‘AutoTrustForms’ feature" target="muhimbi:d5a3842b-99f0-4f13-84f6-3b020ef85eba" level="2" page="48" />
69        </topic>
70      </topic>
71      <topic title="Appendix - Post processing PDF output to PDF/A" target="muhimbi:0e86a1c2-de45-4aa6-8985-6c9c03eed681" level="0" page="50" />
72      <topic title="Appendix - Unattended (un)installation" target="muhimbi:2bccd9eb-2443-4e2a-b12e-4bd953736c01" level="0" page="52">
73        <topic title="Installation" target="muhimbi:9aff89ed-ab3e-4db9-bc7f-e6d024018e7b" level="1" page="52" />
74        <topic title="Uninstallation" target="muhimbi:f1e17053-31bc-4c83-950b-788215e524f2" level="1" page="52" />
75        <topic title="Upgrading" target="muhimbi:8604e3a4-357b-4096-abbc-87fce487d979" level="1" page="52" />
76      </topic>
77      <topic title="Appendix - Advanced Deployment Scenarios" target="muhimbi:16dcc443-1bee-45c0-b2db-4a8f9589a525" level="0" page="53" />
78      <topic title="Appendix - Using Word Automation Services" target="muhimbi:1f437a13-c848-4cd5-82e3-7ea0fb2c0db6" level="0" page="57" />
79      <topic title="Appendix - STSADM Commands" target="muhimbi:bf4ebe1f-e32f-4946-946f-95ec0ea1d777" level="0" page="61" />
80      <topic title="Appendix - Creating Custom Converters" target="muhimbi:e00a133a-209a-4d42-9028-688270d15320" level="0" page="62" />
81      <topic title="Appendix - Invoke 3rd party Converters" target="muhimbi:0ad6b377-679e-4067-a805-0528899c6e23" level="0" page="67" />
82      <topic title="Appendix - Licensing" target="muhimbi:256bcd24-0697-40ee-bfa9-062e1e6961ac" level="0" page="69" />
83    </topics>
84    <properties>
85      <property name="title">Document with Generated TOC</property>
86    </properties>
87  </toc>

Generated XML
(Ignore the occasional text encoding artefacts; they are caused by our blogging software. Generated TOCs do not have this problem.)

The generated XML is fairly straightforward: a number of nested topic elements make up the structure. Each element has a descriptive title attribute, a level attribute (which matches the nesting level), a page attribute containing the page number, and a target attribute which is used for internal processing purposes.
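
To get a feel for this structure outside of the conversion service, the XML can be walked with standard .NET classes. The sketch below is an illustration only: the TocOutline class and its Render method are hypothetical names, and the XML is a heavily abbreviated copy of the listing above with the target attributes left out. It prints each topic indented by its level attribute.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public class TocOutline
{
    // Returns one line per topic, indented two spaces per nesting level.
    public static string Render(string xml)
    {
        var output = new StringBuilder();
        Append(XDocument.Parse(xml).Root.Element("topics"), output);
        return output.ToString();
    }

    // Recursively visits the nested topic elements in document order.
    static void Append(XElement parent, StringBuilder output)
    {
        foreach (XElement topic in parent.Elements("topic"))
        {
            int level = (int)topic.Attribute("level");
            output.Append(' ', level * 2)
                  .Append((string)topic.Attribute("title"))
                  .Append(" (page ")
                  .Append((string)topic.Attribute("page"))
                  .AppendLine(")");
            Append(topic, output);
        }
    }

    public static void Main()
    {
        // Abbreviated sample; a real document also carries target attributes.
        string xml = @"<toc>
  <topics>
    <topic title='1 Introduction' level='0' page='8'>
      <topic title='1.2 Prerequisites' level='1' page='10' />
    </topic>
    <topic title='2 Deployment' level='0' page='12'>
      <topic title='2.1 Quick start' level='1' page='12'>
        <topic title='2.1.1 Installing the Windows Service' level='2' page='12' />
      </topic>
    </topic>
  </topics>
</toc>";
        Console.Write(Render(xml));
    }
}
```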

Please note: All page numbers in the TOC reflect the physical page number of that page in the generated PDF, including the addition of the TOC page itself. If the source document(s) already contain page numbers, then these may no longer be the same as the page number listed in the TOC or their actual page number in the generated PDF. If you wish to change the page numbers displayed in the footer of a document then please use our watermarking facilities.

The list of topic elements is followed by a properties section (Line 84 onwards). This section, and its contents, consists of a number of optional values that may have been passed into the request. This allows, for example, the addition of information to the TOC to display the document's status, author, title or any other kind of information. In this example we are passing in the title of the document.
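
As a rough illustration of how such properties can be picked up in XSL, the sketch below runs a tiny stylesheet against a properties section using the standard .NET XslCompiledTransform class. The TocProperties class and its Apply method are hypothetical names, and the XML assumes that a second property called status arrives in the same shape as the title property shown in the listing above.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public class TocProperties
{
    // Applies an XSL stylesheet to an XML string and returns the result.
    public static string Apply(string xml, string xsl)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(xsl)))
        {
            transform.Load(styleReader);
        }

        var result = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            transform.Transform(input, null, result);
        }
        return result.ToString();
    }

    public static void Main()
    {
        // Assumed shape: one property element per value passed into the request.
        string xml = @"<toc>
  <topics />
  <properties>
    <property name='title'>Development Guide</property>
    <property name='status'>Draft</property>
  </properties>
</toc>";

        // Each property is selected by its name attribute.
        string xsl = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' />
  <xsl:template match='/toc'>
    <h1><xsl:value-of select=""properties/property[@name='title']"" /></h1>
    <p>Status: <xsl:value-of select=""properties/property[@name='status']"" /></p>
  </xsl:template>
</xsl:stylesheet>";

        Console.WriteLine(Apply(xml, xsl));
    }
}
```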

 

XSL Transformation

Although the XML document’s content may differ between requests, the structure is always the same. As a result we can use the XSL industry standard to convert the XML into an attractive looking HTML document. Although XSL may look daunting to the uninitiated, the following sample (download) is a good starting point and can be amended to suit your particular needs (or used as is).
 

  1  <?xml version="1.0" encoding="utf-8"?>
  2  <xsl:stylesheet version="1.0"
  3      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  4      xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  5      exclude-result-prefixes="msxsl">
  6
  7    <xsl:output method="html" indent="yes"/>
  8
  9    <xsl:template match="/toc">
 10      <html>
 11        <head>
 12          <style type="text/css">
 13            ul.toc
 14            {
 15              margin: 0;
 16              padding: 0;
 17              list-style: none;
 18            }
 19            ol.toc
 20            {
 21              margin: 0;
 22              padding: 0;
 23              margin-left: 10px;
 24              list-style: none;
 25            }
 26            ul.toc li
 27            {
 28              clear: both;
 29              overflow: hidden;
 30            }
 31            ol.toc li
 32            {
 33              overflow: hidden;
 34            }
 35            span.title
 36            {
 37              float: left;
 38              padding-right: 4px;
 39            }
 40            span.page
 41            {
 42              float: right;
 43              padding-left: 4px;
 44            }
 45            span.dots
 46            {
 47              font-size: 0px;
 48              width:100%;
 49              border-bottom: 2px dotted black;
 50            }
 51            a.toc
 52            {
 53              text-decoration: none;
 54              color: #000;
 55            }
 56          </style>
 57        </head>
 58        <body>
 59          <h1>
 60            <xsl:value-of select="properties/property[@name='title']"/>
 61          </h1>
 62          <br/>
 63          <br/>
 64          <xsl:apply-templates/>
 65        </body>
 66      </html>
 67    </xsl:template>
 68
 69    <xsl:template match="topics">
 70      <ul class="toc">
 71        <xsl:apply-templates/>
 72      </ul>
 73    </xsl:template>
 74
 75    <!-- Empty template so properties are not appearing -->
 76    <xsl:template match="properties"></xsl:template>
 77
 78    <xsl:template match="topic[@level='0']">
 79      <li>
 80        <xsl:element name="a">
 81          <xsl:attribute name="href">
 82            <xsl:value-of select="@target"/>
 83          </xsl:attribute>
 84          <xsl:attribute name="class">toc</xsl:attribute>
 85          <span class="title" style="font-weight: 900;">
 86            <xsl:value-of select="@title"/>
 87          </span>
 88          <span class="page">
 89            <xsl:value-of select="@page"/>
 90          </span>
 91          <span class="dots"></span>
 92        </xsl:element>
 93      </li>
 94      <ol class="toc">
 95        <xsl:apply-templates/>
 96      </ol>
 97    </xsl:template>
 98
 99    <xsl:template match="topic">
100      <li>
101        <xsl:element name="a">
102          <xsl:attribute name="href">
103            <xsl:value-of select="@target"/>
104          </xsl:attribute>
105          <xsl:attribute name="class">toc</xsl:attribute>
106          <span class="title">
107            <xsl:value-of select="@title"/>
108          </span>
109          <span class="page">
110            <xsl:value-of select="@page"/>
111          </span>
112          <span class="dots"></span>
113        </xsl:element>
114      </li>
115      <ol class="toc">
116        <xsl:apply-templates/>
117      </ol>
118    </xsl:template>
119
120  </xsl:stylesheet>

 

Although this is a standard XSL file, the following sections are of particular interest:

  • Lines 12-56: Standard HTML CSS style sheet which controls the look of the generated HTML.
  • Line 60: Insert a custom property passed into the conversion request. In our example the document’s title.
  • Line 76: An empty template for the properties element to prevent this information from being displayed as a plain list.
  • Lines 78-97: XSL template for generating HTML associated with all Level 0 topics. If you wish to control the generated HTML for a specific level then copy the topic[@level='0'] template and change the level number to match the appropriate nesting level.
  • Lines 99-118: XSL Template for all topic levels that do not have an explicit template defined.
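
The level-specific approach described in the list above can be tried out locally. The following sketch is a simplified stand-in for the sample stylesheet, not a replacement for it: it adds a dedicated template for level 1 topics (rendered in italics here, purely as an example) next to the generic topic template, and runs it with the standard .NET XslCompiledTransform class. The TocLevels class, the Apply method and the markup it produces are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public class TocLevels
{
    // Applies an XSL stylesheet to an XML string and returns the result.
    public static string Apply(string xml, string xsl)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(xsl)))
        {
            transform.Load(styleReader);
        }

        var result = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            transform.Transform(input, null, result);
        }
        return result.ToString();
    }

    // A pattern with a predicate takes precedence over the plain 'topic'
    // pattern, so level 1 topics use the first template and all others
    // fall back to the second one.
    public const string Xsl = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' />
  <xsl:template match='/toc'>
    <ul><xsl:apply-templates select='topics/topic' /></ul>
  </xsl:template>
  <!-- Level 1 only: italic title -->
  <xsl:template match=""topic[@level='1']"">
    <li><i><xsl:value-of select='@title' /></i></li>
    <xsl:apply-templates select='topic' />
  </xsl:template>
  <!-- Fallback for every level without an explicit template -->
  <xsl:template match='topic'>
    <li><xsl:value-of select='@title' /></li>
    <xsl:apply-templates select='topic' />
  </xsl:template>
</xsl:stylesheet>";

    public static void Main()
    {
        // Abbreviated sample data; target and page attributes are omitted.
        string xml = @"<toc>
  <topics>
    <topic title='2 Deployment' level='0'>
      <topic title='2.1 Quick start' level='1'>
        <topic title='2.1.1 Installing the Windows Service' level='2' />
      </topic>
    </topic>
  </topics>
</toc>";
        Console.WriteLine(Apply(xml, Xsl));
    }
}
```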

 

Not sure what is going on, and this all looks like a load of gobbledygook? Don’t worry! Just take the XSL sample provided above and use it in your project. As you can see in the following screenshot, it looks pretty good.

Sample TOC

 

Testing & Troubleshooting

Although not the most attractive looking application ever devised, the PDF Converter comes with a handy Diagnostics Tool (including full source code) to test the Table Of Contents facility. It is merely a test tool, not the official user interface for the TOC facility, but it can be incredibly helpful for quickly testing various XSL template designs before integrating them into your solution.

Diagnostics Tool

To test the XSL and TOC output, enable the Table of Contents as per the screenshot above, modify the XSL template if needed, specify any optional properties, select a file or folder in the WS Convert tab and choose either the Convert or Merge button.

 

Any questions? Please leave a comment below or contact us directly.
