
Splitting PDF Files using the PDF Converter Web Service and .NET / C#

Posted at: 17:02 on 27 October 2011 by Muhimbi

To support the new PDF Splitting facility in our PDF Converter for SharePoint, we have added the ability to split a single file into multiple files to our core PDF Conversion engine. This engine is shared between our SharePoint product and our generic Java / .NET oriented PDF Converter Services.

In this post we’ll describe in detail how to invoke this new splitting facility from your own code. This demo uses C# and .NET, but the web services based interface is identical when used from Java (see our generic PDF Conversion sample).
 

This post is part of a series related to manipulating PDF files using web services.


Key Features

The key features of the new splitting facility are as follows:

  1. Split a single PDF file into one or more individual PDF files.
  2. Split based on the number of pages or on bookmarks.
  3. Automatically generate numbered file names using .NET’s formatting syntax, e.g. 'split-{0:D3}.pdf' will use 3 digits for the sequential numbers, starting at ‘split-001.pdf’. When splitting by bookmark, an optional {1} parameter can be inserted in the file name to include the name of the bookmark as well.
  4. Can be combined with other actions, e.g. convert & merge.
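To make features 2 and 3 more concrete, here is a minimal C# sketch of how splitting by number of pages and a numbered file name template relate to each other. It runs entirely on the client and does not call the web service; SplitPlanner, PlannedFile and PlanByPages are hypothetical names used only for illustration and are not part of the PDF Converter API. The file names are produced with the standard String.Format method, which is the formatting syntax the {0:D3} template refers to.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Muhimbi API) that mimics how a document
// would be divided when splitting by number of pages.
public static class SplitPlanner
{
    // One planned output file: its generated name and its 1-based, inclusive page range.
    public sealed class PlannedFile
    {
        public string FileName;
        public int FirstPage;
        public int LastPage;
    }

    // Divide 'pageCount' pages into batches of at most 'batchSize' pages and name
    // each batch using a .NET format template such as "split-{0:D3}.pdf".
    public static List<PlannedFile> PlanByPages(int pageCount, int batchSize, string fileNameTemplate)
    {
        if (pageCount < 1) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var files = new List<PlannedFile>();
        int sequence = 1;
        for (int first = 1; first <= pageCount; first += batchSize)
        {
            files.Add(new PlannedFile
            {
                // {0} receives the sequential number; D3 pads it to 3 digits.
                FileName = string.Format(fileNameTemplate, sequence++),
                FirstPage = first,
                // The last batch may contain fewer pages than the batch size.
                LastPage = Math.Min(first + batchSize - 1, pageCount)
            });
        }
        return files;
    }
}
```

For a 12 page document, PlanByPages(12, 5, "split-{0:D3}.pdf") yields split-001.pdf (pages 1-5), split-002.pdf (pages 6-10) and split-003.pdf (pages 11-12). When splitting by bookmark, the same formatting call with a second argument, e.g. string.Format("split-{0:D3} {1}.pdf", 2, "Chapter 2"), produces "split-002 Chapter 2.pdf".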


A note about splitting based on bookmark levels: PDFs store bookmarks at the page level, so there is no way to tell where on a page a heading starts or ends. As a result, an extra page will always be exported for each file split based on bookmark levels.

For example, let’s assume the following document:

  • Page 1: Contains Chapter 1 and sections 1.1 and 1.2.
  • Page 2: Contains the last paragraph of 1.2 and all of Chapter 2.
  • Page 3: Contains Chapter 3.

When splitting this document based on bookmarks, using ‘1’ as the batch size, the following files will be created:

  • File 1: Contains pages 1 and 2, as expected.
  • File 2: Contains pages 2 and 3, even though Chapter 2 is only really part of page 2. This is because there is no way to know whether Chapter 2 runs over into page 3.
  • File 3: Contains page 3 (Chapter 3).
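The page ranges in this example follow a simple rule: each file starts on the page of its own bookmark and ends on the page where the next bookmark at the same level starts, while the last file runs to the end of the document. The C# sketch below reproduces that rule so you can predict the output for your own documents. BookmarkSplitSketch and RangesFromBookmarkPages are hypothetical names; this is an illustration of the behaviour described above, not the conversion engine's actual code, and it assumes the bookmark start pages are already known and sorted in document order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration (not the Muhimbi engine) of bookmark-based page ranges.
public static class BookmarkSplitSketch
{
    // Given the 1-based start page of each bookmark at the chosen level (in document
    // order) and the total page count, return the inclusive page range of each split file.
    // Because a bookmark only identifies a page, not a position on that page, every
    // file except the last also includes the page on which the next bookmark starts.
    public static List<Tuple<int, int>> RangesFromBookmarkPages(IList<int> startPages, int pageCount)
    {
        var ranges = new List<Tuple<int, int>>();
        for (int i = 0; i < startPages.Count; i++)
        {
            int first = startPages[i];
            int last = (i + 1 < startPages.Count) ? startPages[i + 1] : pageCount;
            ranges.Add(Tuple.Create(first, last));
        }
        return ranges;
    }
}
```

Calling RangesFromBookmarkPages(new[] { 1, 2, 3 }, 3) for the example document returns the ranges 1-2, 2-3 and 3-3, matching files 1 to 3 above.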

 

Object Model

The object model is relatively straightforward. The classes related to PDF Splitting are displayed below. The various classes also use a number of enumerations; these can be found in our original post about Converting files using the Web Services interface.

 

[Class diagram: classes related to PDF Splitting]
The Web Service method that controls splitting (as well as merging) of files is called ProcessBatch. It accepts a ProcessingOptions object that holds all information about the files to process and the operations to apply. It returns a BatchResults object which, when splitting files, contains one or more results that hold the contents of each file as well as the suggested output file name, which you may use to save the file locally.

As the ProcessingOptions class accepts both MergeSettings and SplitOptions, it is possible to convert and merge a set of input files and then split up the result, all in a single web service call. Just populate the various properties and the system will take care of the rest.

 

Example code

The following sample describes the steps needed to split up a single PDF file based on the number of pages. We are using Visual Studio and C#, but any environment that can invoke web services should be able to access this functionality. Note that the WSDL can be found at http://localhost:41734/Muhimbi.DocumentConverter.WebService/?wsdl.

A generic Java-based PDF Conversion example is installed alongside the product and discussed in the User & Developer Guide. Its source code can be found in the folder that the Muhimbi Conversion Service has been installed to.

  1. Start a new Visual Studio project and create the project type of your choice. In this example we are using a standard .NET 3.0 project of type Console Application. Name it ‘Split PDF’.
  2. In the Solution Explorer window, right-click References and select Add Service Reference. (Do not use web references!)
  3. In the Address box enter the WSDL address listed above. If the Conversion Service is located on a different machine, substitute localhost with the server’s name.
  4. Accept the default Namespace of ServiceReference1 and click the OK button to generate the proxy classes.
  5. Optionally add a PDF file to the solution, and set its Build Action to None and Copy to Output Directory to Copy if newer. This ensures there is always a valid test file in the same directory as the compiled executable.
  6. Replace the contents of Program.cs with the following code.

 

using System;
using System.IO;
using System.ServiceModel;
using Split_PDF.ServiceReference1;
 
namespace Split_PDF
{
    class Program
    {
        // ** The URL where the Web Service is located. Amend host name if needed.
        static string SERVICE_URL = "http://localhost:41734/Muhimbi.DocumentConverter.WebService/";
 
        static void Main(string[] args)
        {
            DocumentConverterServiceClient client = null;
 
            try
            {
                // ** Determine the source file and read it into a byte array.
                string sourceFileName = null;
                if (args.Length == 0)
                {
                    //** Delete any split files from a previous test run.
                    foreach (string file in Directory.GetFiles(Directory.GetCurrentDirectory(), "spf-*.pdf"))
                    {
                        File.Delete(file);
                    }
 
                    // ** If nothing is specified then read the first PDF file.
                    string[] sourceFiles = Directory.GetFiles(Directory.GetCurrentDirectory(), "*.pdf");
                    if (sourceFiles.Length > 0)
                        sourceFileName = sourceFiles[0];
                    else
                    {
                        Console.WriteLine("Please specify a document to split.");
                        Console.ReadKey();
                        return;
                    }
                }
                else
                    sourceFileName = args[0];
 
                byte[] sourceFile = File.ReadAllBytes(sourceFileName);
 
                // ** Open the service and configure the bindings
                client = OpenService(SERVICE_URL);
 
                //** Set the absolute minimum open options
                OpenOptions openOptions = new OpenOptions();
                openOptions.OriginalFileName = Path.GetFileName(sourceFileName);
                openOptions.FileExtension = "pdf";
 
                // ** Set the absolute minimum conversion settings.
                ConversionSettings conversionSettings = new ConversionSettings();
 
                // ** Create the ProcessingOptions for the splitting task.
                ProcessingOptions processingOptions = new ProcessingOptions()
                {
                    MergeSettings = null,
                    SplitOptions = new FileSplitOptions()
                    {
                        FileNameTemplate = "spf-{0:D3}",
                        FileSplitType = FileSplitType.ByNumberOfPages,
                        BatchSize = 5,
                        BookmarkLevel = 0
                    },
                    SourceFiles = new SourceFile[1]
                    {
                        new SourceFile()
                        {
                            MergeSettings = null,
                            OpenOptions = openOptions,
                            ConversionSettings = conversionSettings,
                            File = sourceFile
                        }
                    }
                };
 
                // ** Carry out the splitting.
                Console.WriteLine("Splitting file " + sourceFileName);
                BatchResults batchResults = client.ProcessBatch(processingOptions);
 
                // ** Process the returned files
                foreach (BatchResult result in batchResults.Results)
                {
                    Console.WriteLine("Writing split file " + result.FileName);
                    File.WriteAllBytes(result.FileName, result.File);
                }
 
                Console.WriteLine("Finished.");
            }
            catch (FaultException<WebServiceFaultException> ex)
            {
                Console.WriteLine("FaultException occurred: ExceptionType: " +
                                ex.Detail.ExceptionType.ToString());
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
            finally
            {
                CloseService(client);
            }
            Console.ReadKey();
        }
 
 
        /// <summary>
        /// Configure the Bindings, endpoints and open the service using the specified address.
        /// </summary>
        /// <returns>An instance of the Web Service.</returns>
        public static DocumentConverterServiceClient OpenService(string address)
        {
            DocumentConverterServiceClient client = null;
 
            try
            {
                BasicHttpBinding binding = new BasicHttpBinding();
                // ** Use standard Windows Security.
                binding.Security.Mode = BasicHttpSecurityMode.TransportCredentialOnly;
                binding.Security.Transport.ClientCredentialType =
                                                                HttpClientCredentialType.Windows;
                // ** Increase the client Timeout to deal with (very) long running requests.
                binding.SendTimeout = TimeSpan.FromMinutes(30);
                binding.ReceiveTimeout = TimeSpan.FromMinutes(30);
                // ** Set the maximum document size to 50MB
                binding.MaxReceivedMessageSize = 50 * 1024 * 1024;
                binding.ReaderQuotas.MaxArrayLength = 50 * 1024 * 1024;
                binding.ReaderQuotas.MaxStringContentLength = 50 * 1024 * 1024;
 
                // ** Specify an identity (any identity) in order to get it past .net3.5 sp1
                EndpointIdentity epi = EndpointIdentity.CreateUpnIdentity("unknown");
                EndpointAddress epa = new EndpointAddress(new Uri(address), epi);
 
                client = new DocumentConverterServiceClient(binding, epa);
 
                client.Open();
 
                return client;
            }
            catch (Exception)
            {
                CloseService(client);
                throw;
            }
        }
 
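        /// <summary>
        /// Close the client if it is open, or abort it if the channel has faulted.
        /// </summary>
        /// <param name="client">The client to close</param>
        public static void CloseService(DocumentConverterServiceClient client)
        {
            if (client == null)
                return;

            // ** A faulted channel cannot be closed gracefully, so abort it instead.
            if (client.State == CommunicationState.Faulted)
                client.Abort();
            else if (client.State == CommunicationState.Opened)
                client.Close();
        }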
 
    }
}

 

Compile the application and run it, either from the command prompt with the path of the PDF file to split as an argument or, if a PDF file is present in the executable’s folder, without any arguments.

Note that in this example we are programmatically configuring the WCF bindings and endpoints. If you prefer, you can use the declarative approach and configure them in the application's config file instead.

This new functionality is available as of version 5.2 of our software.


Splitting PDF Files using a SharePoint workflow and the Muhimbi PDF Converter

Posted at: 11:59 on by Muhimbi

Some time ago we introduced a facility in the Muhimbi PDF Converter for SharePoint to merge files using the User Interface, Workflow and Web Service calls. Naturally, we have to support the ‘other side of the equation’ as well, resulting in the new PDF Split functionality described in this article.

This post shows how to use a SharePoint Designer Workflow to automatically split up an existing PDF file into multiple files containing 10 pages each. This is quite a common scenario for organisations that deal with massive documents and frequently split these kinds of files into batches of 100 pages to keep them manageable. If your document uses a format other than PDF, make sure you use our Convert to PDF Workflow Activity first.

The SharePoint Designer Workflow Activity is named Split PDF. After adding it to your workflow, you will see the following Workflow Sentence.
 

[Screenshot: the Split PDF workflow sentence]

 
The workflow sentence is consistent with our other Workflow Activities (e.g. converting and watermarking) and is largely self-explanatory. The following fields are available:

  • This document: The document to split up. For most workflows selecting Current Item will suffice, but some custom scenarios may require looking up a different item. You may also want to check that the file type of the document is ‘pdf’ before trying to split it up.
  • This file: The name and location to write the split files to. Leave this field empty to use the same folder and file name as the source file, but with sequential numbers added. Optionally, you can specify a path and / or a file name template.
  • Path: Enter a path, including the Document Library and any folder names, to write the split files to, e.g. “shared documents/split files/”. You can even specify a different site collection by starting the path with a '/' (never start with 'http:'). When specifying only a path, without a file name, make sure to add a trailing ‘/’.
  • File Name: The file name can be anything and supports the standard .NET string formatting facilities for numbering, e.g. 'split-{0:D3}' will use 3 digits for the sequential numbers, starting at ‘split-001.pdf’. When splitting by bookmark, an optional {1} parameter can be inserted in the file name to include the name of the bookmark as well.
  • Number of pages / bookmark level: Specify whether to split based on the number of pages or on the bookmark level.
  • Batch size: When splitting based on the number of pages, set this parameter to the maximum number of pages to include in each split file. When splitting based on the bookmark level, this parameter should contain the ‘depth’ at which to split, e.g. specify ‘1’ to split on top level chapters (Chapter 1, Chapter 2, etc.) or a higher number to split at a deeper level (e.g. ‘2’ splits on Chapter 1, 1.1, 1.2, 2, 2.1 etc.).
  • Parameter ‘List ID’: The ID of the list the split files were written to. Later in the workflow this can be used to perform additional tasks on the files, such as checking them in or out.
  • Parameter ‘List Item IDs’: Unlike our other workflow activities, this parameter returns a string with ‘;’ separated values of the generated item IDs. This list can then be used by other (custom) activities, e.g. the ones created by our Workflow Power Pack, to process the individual files further.
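If you consume the ‘List Item IDs’ value in your own custom workflow activity, it first needs to be split on ‘;’. The minimal C# sketch below assumes the value looks like “12;13;14”; whether a trailing separator or spaces can appear is an assumption, so empty entries and whitespace are tolerated. ListItemIdParser and ParseListItemIds are hypothetical names, not part of the product.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for processing the 'List Item IDs' workflow parameter.
public static class ListItemIdParser
{
    // Convert a ';' separated string of list item IDs into integers.
    // Empty entries (e.g. from a trailing ';') and surrounding whitespace are ignored;
    // a non-numeric entry causes int.Parse to throw a FormatException.
    public static List<int> ParseListItemIds(string value)
    {
        var ids = new List<int>();
        if (string.IsNullOrEmpty(value))
            return ids;

        foreach (string part in value.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string trimmed = part.Trim();
            if (trimmed.Length == 0)
                continue;
            ids.Add(int.Parse(trimmed, NumberStyles.Integer, CultureInfo.InvariantCulture));
        }
        return ids;
    }
}
```

Each resulting ID can then be looked up with SPList.GetItemById in the list identified by the ‘List ID’ parameter.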

 
A note about splitting based on bookmark levels: PDFs store bookmarks at the page level, so there is no way to tell where on a page a heading starts or ends. As a result, an extra page will always be exported for each file split based on bookmark levels.

For example, let’s assume the following document:

  • Page 1: Contains Chapter 1 and sections 1.1 and 1.2.
  • Page 2: Contains the last paragraph of 1.2 and all of Chapter 2.
  • Page 3: Contains Chapter 3.

When splitting this document based on bookmarks, using ‘1’ as the batch size, the following files will be created:

  • File 1: Contains pages 1 and 2, as expected.
  • File 2: Contains pages 2 and 3, even though Chapter 2 is only really part of page 2. This is because there is no way to know whether Chapter 2 runs over into page 3.
  • File 3: Contains page 3 (Chapter 3).

 

With all the theory out of the way, let’s create a simple example to split up PDF files in batches of 10 pages.

  1. Download and install the Muhimbi PDF Converter for SharePoint.  Version 5.2 or newer is required.
  2. Make sure you have the appropriate privileges to create workflows on a site collection.
  3. Create a new workflow using SharePoint Designer.
  4. Associate the workflow with the library of your choice. Do not tick any of the boxes next to the ‘Automatically start…’ options; we want to start this workflow manually. If you wish to run this workflow automatically, you may want to add an extra column to determine if a file has been split before, similar to the technique used in one of our earlier posts.
  5. Design the workflow as shown in the following screenshot. In summary, it does the following:
     
    • Check if the file is in PDF format. Otherwise it cannot be split.
    • The split files are written to a folder named ‘Split Files’, so make sure this folder exists, e.g. “Shared Documents/Split Files/spf-{0:D5}.pdf”. You can keep our sample file name or include the source file’s name using workflow lookups.
    • Log the generated list of Item IDs to the workflow history.

[Screenshot: the Split PDF workflow design]

 
Publish the workflow and create / convert / upload a PDF file in the document library. From the file's context menu select 'Workflows' and run your workflow. Depending on the size of the document, the split files will be generated in a matter of seconds.
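To illustrate how a ‘This file’ value such as the one used in step 5 breaks down into a folder and a file name template, here is a small C# sketch that applies the rules listed for the Path and File Name fields. It is a hypothetical helper (SplitTargetSketch.Parse) meant only to show the documented conventions; it is not how the Split PDF activity is implemented internally, and rejecting 'https:' in addition to the documented 'http:' is an assumption.

```csharp
using System;

// Hypothetical illustration of the 'This file' conventions; not the activity's own code.
public static class SplitTargetSketch
{
    // Split a 'This file' value into a folder part and a file name template part.
    // Empty template: derive names from the source file (with sequential numbers).
    // A folder starting with '/' refers to another site collection.
    public static void Parse(string value, out string folder, out string fileNameTemplate)
    {
        folder = "";
        fileNameTemplate = "";
        if (string.IsNullOrEmpty(value))
            return; // Same folder and name as the source file, with numbers added.

        // Documented rule: never start with 'http:'. Treating 'https:' the same is an assumption.
        if (value.StartsWith("http:", StringComparison.OrdinalIgnoreCase) ||
            value.StartsWith("https:", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Use a library-relative path, or start with '/' for another site collection.");

        int lastSlash = value.LastIndexOf('/');
        folder = value.Substring(0, lastSlash + 1);        // "" when the value contains no '/'
        fileNameTemplate = value.Substring(lastSlash + 1); // "" when the value ends with '/'
    }
}
```

With the value from step 5, the template part is "spf-{0:D5}.pdf", so string.Format(template, 1) produces spf-00001.pdf for the first batch, spf-00002.pdf for the second, and so on.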


Announcement: Acquisition of SharePoint Audit Product Line by Idera

Posted at: 13:00 on 03 October 2011 by Muhimbi

Today is a very exciting day as we are announcing the acquisition of the entire Muhimbi SharePoint Audit product line by Idera. This is great news for all existing users of this product as Idera has an enormous amount of experience in the field of Audit and Security software. Muhimbi will continue to focus on the popular SharePoint PDF Conversion and Manipulation line of products.

 

Idera Announces New SharePoint Auditing Solution
Idera SharePoint audit Joins SharePoint enterprise manager to Provide Comprehensive SharePoint Security, Compliance Solution


ANAHEIM, CALIF. – OCTOBER 3, 2011 – In booth #617 at the Microsoft SharePoint Conference 2011, Idera, a leading provider of Microsoft SharePoint management and administration tools, today announced Idera SharePoint audit™. Idera SharePoint audit is a comprehensive SharePoint auditing solution that captures and records all audit events, manages the audit data captured in a central repository and provides comprehensive reporting and analytics regarding who is doing what, when and where in a SharePoint environment.

Idera SharePoint audit and Idera SharePoint enterprise manager™ work hand-in-hand to provide an overall SharePoint security and compliance solution. Idera SharePoint audit enables administrators to simplify audit administration, automatically turning on auditing for new SharePoint site collections and configuring specialized auditing at the web application or site collection level. Leveraging proven technology acquired from Muhimbi, it collects the data needed to comply with regulations and data security requirements such as Sarbanes-Oxley (SOX) and HIPAA. Idera SharePoint enterprise manager then ensures security standards are properly applied in SharePoint by remediating potential security or compliance issues.

“As we started to research what functionality we would add to our SharePoint product portfolio, a powerful SharePoint auditing solution that can audit activity across all versions of SharePoint with minimal performance impact rose to the top of the list,” said Wayne Washburn, Director of SharePoint Products at Idera. “We invite all SharePoint professionals to try SharePoint audit.”

In related news, Idera also announced SharePoint backup™ 3.1 and SharePoint diagnostic manager™ 2.6 today. To learn more go to http://www.idera.com/News/Press-Releases/ or visit Idera in booth #617 at the Microsoft SharePoint Conference where the company will be doing informative product demonstrations and hosting giveaways.

 

Pricing and Availability
Idera SharePoint audit is currently in Public Beta. To download a Beta copy, please visit www.idera.com/Products/SharePoint/SharePoint-audit/.

The product will be generally available in November and will be priced at $9,995 per web front end.

 

About Idera
Idera provides tools for Microsoft SQL Server, SharePoint and PowerShell management and administration. Our products provide solutions for performance monitoring, backup and recovery, security and auditing and PowerShell scripting. Headquartered in Houston, Texas, Idera is a Microsoft Gold Partner and has over 9,000 customers worldwide. To learn more, please contact Idera at +1-713.523.4433 or visit www.idera.com.

 

About Muhimbi
Muhimbi Ltd is the market leader for SharePoint based PDF Conversion, watermarking and security products with full support for InfoPath, AutoCAD, MS-Office, SharePoint Designer and Nintex workflows. Operating from St. Albans, United Kingdom, Muhimbi is working with hundreds of customers around the world to deploy their products in large as well as small SharePoint environments. To learn more, please contact Muhimbi at jritmeijer@muhimbi.com or visit www.muhimbi.com

 

Idera is a division of BBS Technologies, Inc. Idera, SharePoint audit, SharePoint backup, SharePoint diagnostic manager and SharePoint enterprise manager are trademarks or registered trademarks of BBS Technologies, Inc. or its subsidiaries in the United States and other jurisdictions. All other company and product names may be trademarks or registered trademarks of their respective companies.

###

Press and Analyst contact:
Carrie Ward, PR for Idera, 832-407-5347, carrie.ward@idera.com
