
Converting PDF document to PDF/A1b using the Muhimbi PDF Converter Web Service

Posted at: 3:20 PM on 27 September 2011 by Muhimbi

As Muhimbi’s range of PDF Conversion products, including the PDF Converter for SharePoint and the PDF Converter Services, now provides the ability to post process any converted document for output in PDF/A format, one obvious use for this brilliant new functionality is to convert regular PDF files to PDF/A format.

In this post we’ll provide a simple .NET sample that invokes our Web Services interface to carry out the conversion from PDF to PDF/A1b. You can apply the same changes to the Java sample to achieve the same in that language. The code is nearly identical to the code to convert and watermark a simple MS-Word file, with the following exceptions:

  1. openOptions.FileExtension is set to pdf.
  2. conversionSettings.PDFProfile is set to PDFProfile.PDF_A1B.
  3. conversionSettings.OutputFormatSpecificSettings is set to an instance of OutputFormatSpecificSettings_PDF with the PostProcessFile property set to True.
  4. The client.ProcessChanges() method is invoked rather than client.Convert().
  5. All references to watermarks have been removed as they are not part of this sample.
     

Some minor clean-up has been carried out as well to make the code even shorter. After running the example the resulting file validates perfectly according to Acrobat X Pro.
 

PDFA Check

 

Sample Code

Listed below is sample code to convert PDF to PDF/A. You can either copy the code from this blog post, download the Visual Studio Project or open the project from the Sample Code folder in the Windows Start Menu.

The sample code expects the path of the PDF file on the command line. If the path is omitted then the first PDF file found in the current directory will be used.

  1. Download and install the Muhimbi PDF Converter Services or PDF Converter for SharePoint.
     
  2. Install the prerequisites as described here. There is no need to make any changes to the configuration file.
     
  3. Create a new Visual Studio C# Console application named PDFA_Conversion.
     
  4. Add a Service Reference to the following URL and specify ConversionService as the namespace

        http://localhost:41734/Muhimbi.DocumentConverter.WebService/?wsdl
     
  5. Paste the following code into Program.cs
     
    using System;
    using System.Diagnostics;
    using System.IO;
    using System.ServiceModel;
    using PDFA_Conversion.ConversionService;
     
    namespace PDFA_Conversion
    {
        class Program
        {
            // ** The URL where the Web Service is located. Amend host name if needed.
            static string SERVICE_URL = "http://localhost:41734/Muhimbi.DocumentConverter.WebService/";
     
            static void Main(string[] args)
            {
                DocumentConverterServiceClient client = null;
     
                try
                {
                    // ** Determine the source file and read it into a byte array.
                    string sourceFileName = null;
                    if (args.Length == 0)
                    {
                        // ** If nothing is specified then read the first PDF file from the folder.
                        string[] sourceFiles = Directory.GetFiles(
                            Directory.GetCurrentDirectory(), "*.pdf");
                        if (sourceFiles.Length > 0)
                            sourceFileName = sourceFiles[0];
                        else
                        {
                            Console.WriteLine("Please specify a document to convert to PDF/A.");
                            Console.ReadKey();
                            return;
                        }
                    }
                    else
                        sourceFileName = args[0];
     
                    byte[] sourceFile = File.ReadAllBytes(sourceFileName);
     
                    // ** Open the service and configure the bindings
                    client = OpenService(SERVICE_URL);
     
                    // ** Set the absolute minimum open options.
                    OpenOptions openOptions = new OpenOptions();
                    openOptions.OriginalFileName = Path.GetFileName(sourceFileName);
                    openOptions.FileExtension = "pdf";
     
                    // ** Set the absolute minimum conversion settings.
                    ConversionSettings conversionSettings = new ConversionSettings();
                    conversionSettings.PDFProfile = PDFProfile.PDF_A1B;
     
                    // ** Specify output settings as we want to force post processing of files.
                    OutputFormatSpecificSettings_PDF osf = new OutputFormatSpecificSettings_PDF();
                    osf.PostProcessFile = true;
                    // ** We need to specify ALL values of an object, so use these for PDF/A
                    osf.FastWebView = false;
                    osf.EmbedAllFonts = true;
                    osf.SubsetFonts = false;
                    conversionSettings.OutputFormatSpecificSettings = osf;
     
                    // ** Carry out the conversion.
                    Console.WriteLine("Converting file " + sourceFileName + " to PDF/A.");
                    byte[] convFile = client.ProcessChanges(sourceFile, openOptions,
                        conversionSettings);
     
                    // ** Write the converted file back to the file system using the same name.
                    string destinationFileName = Path.GetFileName(sourceFileName);
                    using (FileStream fs = File.Create(destinationFileName))
                    {
                        fs.Write(convFile, 0, convFile.Length);
                        fs.Close();
                    }
     
                    Console.WriteLine("File converted to " + destinationFileName);
     
                    // ** Open the generated PDF/A file in a PDF Reader
                    Console.WriteLine("Launching file in PDF Reader");
                    Process.Start(destinationFileName);
                }
                catch (FaultException<WebServiceFaultException> ex)
                {
                    Console.WriteLine("FaultException occurred: ExceptionType: " +
                                    ex.Detail.ExceptionType.ToString());
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.ToString());
                }
                finally
                {
                    CloseService(client);
                }
                Console.ReadKey();
            }
     
     
            /// <summary>
            /// Configure the Bindings, endpoints and open the service using the specified address.
            /// </summary>
            /// <returns>An instance of the Web Service.</returns>
            public static DocumentConverterServiceClient OpenService(string address)
            {
                DocumentConverterServiceClient client = null;
     
                try
                {
                    BasicHttpBinding binding = new BasicHttpBinding();
                    // ** Use standard Windows Security.
                    binding.Security.Mode = BasicHttpSecurityMode.TransportCredentialOnly;
                    binding.Security.Transport.ClientCredentialType =
                                                                  HttpClientCredentialType.Windows;
                    // ** Increase the client Timeout to deal with (very) long running requests.
                    binding.SendTimeout = TimeSpan.FromMinutes(30);
                    binding.ReceiveTimeout = TimeSpan.FromMinutes(30);
                    // ** Set the maximum document size to 50MB
                    binding.MaxReceivedMessageSize = 50 * 1024 * 1024;
                    binding.ReaderQuotas.MaxArrayLength = 50 * 1024 * 1024;
                    binding.ReaderQuotas.MaxStringContentLength = 50 * 1024 * 1024;
     
                    // ** Specify an identity (any identity) in order to get it past .net3.5 sp1
                    EndpointIdentity epi = EndpointIdentity.CreateUpnIdentity("unknown");
                    EndpointAddress epa = new EndpointAddress(new Uri(address), epi);
     
                    client = new DocumentConverterServiceClient(binding, epa);
     
                    client.Open();
     
                    return client;
                }
                catch (Exception)
                {
                    CloseService(client);
                    throw;
                }
            }
     
            /// <summary>
            /// Check if the client is open and then close it.
            /// </summary>
            /// <param name="client">The client to close</param>
            public static void CloseService(DocumentConverterServiceClient client)
            {
                if (client != null && client.State == CommunicationState.Opened)
                    client.Close();
            }
     
        }
    }
  6. Make sure the output folder contains a PDF file.
     
  7. Compile and execute the application. The converted PDF/A file will automatically be opened in your system’s PDF reader.

 

As all this functionality is exposed via a Web Services interface, it works equally well from Java and other web services enabled environments. Please note that you need the PDF Converter Professional add-on license in addition to a valid PDF Converter for SharePoint or PDF Converter Services License in order to use this functionality.

This code is merely an example of what is possible. Feel free to adapt it to your own needs. The possibilities are endless.


PDF/A Support in the Muhimbi PDF Converter Services & SharePoint

Posted at: 12:29 PM on 26 September 2011 by Muhimbi


Muhimbi’s range of PDF Conversion products, including the PDF Converter for SharePoint and the PDF Converter Services, has provided some level of PDF/A support since the very beginning. We knew this was going to be important so we even built it into our web service’s object model. However, up until now (version 5.1), any request to output the file in PDF/A format was passed on directly to the underlying converter, which may, or may not, do a good job of sticking to the PDF/A standard (more on this subject below).

Update: As of version 7.0, the Muhimbi PDF Converter also supports the PDF/A2b standard.

The converters that support PDF/A1b (not PDF/A2b) natively are:

  • MS-Word
  • PowerPoint
  • MS-Publisher
  • Visio
  • TIFF

This all changes with version 5.2 of our software, which allows any PDF file to be post processed and converted to PDF/A1b. More about this later; let’s first have a look at the history of, and reasons behind, the PDF/A standard.

Over the last 20 years the PDF document format has grown from a relatively basic file type that displays information on screen as if it were printed on paper, to an all singing and dancing 3D, Audio / Video digital printing format. Although at the moment it is easy and free to download a viewer that can display the latest and greatest content, what about the computers used 50 years from now? Who knows, perhaps the holographic AppleMicroGoogle™ quantum computers we’ll be using by then won’t even understand the concept of embedded JavaScript, or no one could be bothered to implement it in the version of Acrobat Reader that was rewritten from the ground up after the Great Computer Failure of 2038. All kidding aside, the PDF/A standard was created to go ‘back to basics’ and use PDF for what it was originally intended for and give future computers and operating systems a remote chance of displaying a file’s content similar to how it was originally intended to look.

A good summary of the main aims and rules with regards to PDF/A can be found on the pdfa.org website:

What does PDF/A aim to achieve? PDF/A aims to produce files with static content that can therefore be visually reproduced completely precisely today and in many years’ time. Files that are subject to long-term archiving should work regardless of the device or operating system used. The future usability of PDF/A files must also be guaranteed in a manufacturer-independent manner – and this includes Adobe. PDF/A is a ‘complete’ format. This means that PDF files that comply with the PDF/A standard are complete in themselves and use no external references or non-PDF data. The PDF/A-1 standard is based on PDF specification 1.4, which means that it works within the technical scope of the functions available in Acrobat 5.

A range of rules must be observed when generating PDF/A files in order to meet the goals named above. For example, when generating PDF/A, it is important to embed all fonts and clearly specify all colors. Forms, comments, and notes are only permitted to a limited extent. Compression is allowed as a general rule, but LZW and JPEG2000 are excluded. Transparent objects and layers (Optional Content Groups) are not permitted. PDF/A uses rules for metadata that are based on XMP (Extensible Metadata Platform). Finally, a PDF/A file must identify itself as such.
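The last rule quoted above, that a PDF/A file must identify itself as such, can be checked roughly in code. The sketch below is not part of the Muhimbi API and is no substitute for a real validator such as the one in Acrobat. It only looks for the pdfaid:part and pdfaid:conformance entries that a PDF/A file carries in its XMP metadata. The class and method names are made up for illustration, and the code assumes the XMP packet is stored uncompressed in UTF-8 or another single-byte-compatible encoding.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any Muhimbi library.
static class PdfAIdentification
{
    // Match both the element form (<pdfaid:part>1</pdfaid:part>)
    // and the attribute form (pdfaid:part="1") used in XMP packets.
    static readonly Regex PartPattern = new Regex(
        @"pdfaid:part\s*(?:=\s*[""']|>)\s*(\d)");
    static readonly Regex ConformancePattern = new Regex(
        @"pdfaid:conformance\s*(?:=\s*[""']|>)\s*([A-Za-z])");

    /// <summary>
    /// Returns a label such as "PDF/A-1b" when the file claims PDF/A
    /// conformance in its XMP metadata, or null when no claim is found.
    /// A claim is not proof of compliance; only a validator can tell.
    /// </summary>
    public static string Describe(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one character, so the
        // binary parts of the PDF cannot break the decoding step.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        Match part = PartPattern.Match(text);
        if (!part.Success)
            return null;

        Match conformance = ConformancePattern.Match(text);
        string level = conformance.Success
            ? conformance.Groups[1].Value.ToLowerInvariant()
            : "";
        return "PDF/A-" + part.Groups[1].Value + level;
    }
}
```

Calling PdfAIdentification.Describe(File.ReadAllBytes(path)) on a file gives a quick first indication before running it through a full validator.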

Please note that you need the PDF Converter Professional add-on license in addition to a valid PDF Converter for SharePoint or PDF Converter Services License in order to use this functionality.
 

Muhimbi’s interpretation of the PDF/A standard

As is generally the case with these kinds of standards, there are as many PDF/A validators as there are interpretations of the specification. The validator we use internally for testing purposes is the one that comes with Adobe Acrobat X Pro. There may well be stricter validators, but if Acrobat says a document validates fine then there is a pretty good chance it can be opened 50 years from now.

In the screenshot below you can see the validation results of a PowerPoint file that was converted to PDF by the Muhimbi PDF Converter and subsequently post processed for output as PDF/A1b. As you can see it validates perfectly well. The same is true for conversion to PDF/A2b.
 

PDFA Check PDF/A File generated by the Muhimbi PDF Converter

 

As you can see in the screenshot below, the same document saved by PowerPoint itself in PDF/A format does not validate successfully. 
 

PDFA Check - Non-compliant PDF/A File generated by PowerPoint

  
Clearly the file post-processed by the Muhimbi PDF Converter validates better, but that doesn’t mean that the one generated by PowerPoint cannot be displayed 50 years from now. The validator checks many rules including important ones such as font embedding, but also rules that are perhaps not so important like the fact that the Modification Date stored in the PDF is exactly the same as the one stored in the XMP meta-data.

 

Configuring the Muhimbi PDF Converter to use PDF/A

The Muhimbi PDF Converter relies on 3rd party software to carry out the PDF/A post processing step. Fortunately this software is free to download and install for both individuals and organisations. The actual use of the software is far from trivial, but our software takes care of all the complexities.

Installation of this software is optional and only required if you intend to carry out any PDF/A post processing. The steps are as follows:

  1. If not already installed, install the Muhimbi PDF Converter for SharePoint (or PDF Converter Services) version 5.2 or newer. (7.0 for PDF/A2b support)
  2. Download the latest GPL Release from the Ghostscript website. Depending on your hardware and operating system you will need to download either the 32 or 64 bit version. Muhimbi has tested the PDF/A1b post processing with versions 9.04 and later. PDF/A2b requires Ghostscript 9.06 or later.
  3. Install Ghostscript in a location of your choice on every server that runs the Muhimbi Conversion Service. If you accept the default location, or the default location on a different drive, then the Muhimbi PDF Converter will automatically detect Ghostscript’s file path.
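To verify that Ghostscript ended up somewhere that looks like a default location, a small check along the following lines may help. This is only a sketch and not the detection logic used by the Muhimbi PDF Converter, which is not documented here. It assumes the usual Windows layout of Program Files\gs\gs<version>\bin with the gswin64c.exe or gswin32c.exe command line executables, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of any Muhimbi library.
static class GhostscriptLocator
{
    // Command line executables of the 64 and 32 bit Windows builds.
    static readonly string[] ExeNames = { "gswin64c.exe", "gswin32c.exe" };
    static readonly string[] ProgramFolders = { "Program Files", "Program Files (x86)" };

    /// <summary>
    /// Looks below each root for [root]\Program Files\gs\gs*\bin and
    /// returns the full path of every Ghostscript executable found.
    /// </summary>
    public static List<string> FindCandidates(IEnumerable<string> roots)
    {
        List<string> found = new List<string>();
        foreach (string root in roots)
        {
            foreach (string programFolder in ProgramFolders)
            {
                string gsRoot = Path.Combine(Path.Combine(root, programFolder), "gs");
                if (!Directory.Exists(gsRoot))
                    continue;

                // Each installed version lives in its own gs<version> folder.
                foreach (string versionDir in Directory.GetDirectories(gsRoot, "gs*"))
                {
                    foreach (string exe in ExeNames)
                    {
                        string candidate = Path.Combine(Path.Combine(versionDir, "bin"), exe);
                        if (File.Exists(candidate))
                            found.Add(candidate);
                    }
                }
            }
        }
        return found;
    }

    /// <summary>Runs the search across all ready, fixed drives.</summary>
    public static List<string> FindOnFixedDrives()
    {
        List<string> roots = new List<string>();
        foreach (DriveInfo drive in DriveInfo.GetDrives())
        {
            if (drive.DriveType == DriveType.Fixed && drive.IsReady)
                roots.Add(drive.RootDirectory.FullName);
        }
        return FindCandidates(roots);
    }
}
```

If FindOnFixedDrives() returns an empty list on the conversion server, Ghostscript was probably installed in a custom location and the path may need to be specified manually in the configuration file.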

 

Once Ghostscript has been installed you are ready to go, provided you access our Web Services based interface from your own software and the PDFA.PostProcessing configuration value discussed below is set appropriately (details for use with SharePoint can be found further below). Just set the web service’s ConversionSettings.PDFProfile property to PDF_A1B and any converter that doesn’t natively support PDF/A1b output will automatically send the generated PDF file to the post processor. However, depending on your exact needs you may want to update a number of settings in the Muhimbi Conversion Service’s config file.

  1. Open ‘Muhimbi.DocumentConverter.Service.exe.config’ in your favourite text editor. The file is stored in the directory where the Muhimbi Conversion Service has been installed. You can find a shortcut to this folder in the ‘Muhimbi Document Converter’ group in your start menu.
  2. Change the following settings if needed:
    • Ghostscript.Path: Leave the path empty to auto detect the path. When manually specifying the path include the executable as well, e.g. "E:\Program Files\gs\gs9.04\bin\gswin64c.exe"
    • PDFA.PostProcessing:  Not all converters are able to provide native PDF/A1b output. Use this setting to post process any generated PDF file. Valid values are 'All' (Post Process files generated by all converters, including the ones that are supposed to already support PDF/A1b), 'WhenNeeded' (Post process files for only those converters that do not support native PDF/A1b output) or 'None' (Do not post process files generated by any converters. This is the default option). Please note that these values will only be used if the output format is set to PDF_A1B, either in the web service call or via the global 'ConversionSettings.ForcePDFProfile' config value.
    • PDFA.RasterizeTransparentContent: Define how transparent content is dealt with during conversion to PDF/A. The default setting (False) removes all transparency. If you wish to retain transparent objects then set this value to True, which will result in pages being rasterized resulting in considerably larger and slower PDF files.
    • ConversionSettings.ForcePDFProfile: Override the web service’s ConversionSettings.PDFProfile value during conversion.  Leave empty to use the setting specified in the web service call. Accepted values are members of the Muhimbi.DocumentConverter.WebService.Data.PDFProfile enumeration or an empty string. For example: 'PDF_1_5' (Use PDF Version 1.5) or 'PDF_A1B' (Use the PDF/A standard for long term archiving).
  3. Restart the Muhimbi Conversion Service from the Windows Services Management Console.
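As a rough illustration, the four settings above might look as follows once edited. The fragment assumes the config file uses the standard .NET appSettings key/value layout, which is an assumption on our part here. Check the keys and surrounding structure against the actual file before changing anything. The values shown enable post processing only for converters without native PDF/A1b support and leave everything else at the defaults described above.

```xml
<!-- Sketch of the relevant entries; the appSettings layout is assumed. -->
<appSettings>
  <!-- Empty value: let the service auto detect Ghostscript. -->
  <add key="Ghostscript.Path" value="" />
  <!-- Valid values: All, WhenNeeded or None (the default). -->
  <add key="PDFA.PostProcessing" value="WhenNeeded" />
  <!-- False (the default) removes transparency instead of rasterizing pages. -->
  <add key="PDFA.RasterizeTransparentContent" value="False" />
  <!-- Empty value: use the PDFProfile passed in the web service call. -->
  <add key="ConversionSettings.ForcePDFProfile" value="" />
</appSettings>
```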

 

If you don’t directly interact with Muhimbi’s Web Services interface, but rather use the SharePoint Front End functionality that comes with the PDF Converter for SharePoint, e.g. workflow activities or manual PDF Conversion, then you MUST set the ForcePDFProfile configuration value to PDF_A1B or PDF_A2B. Please note that this is a global switch that forces all functionality provided by our product to output in PDF/A format. This may have side effects when applying PDF Security. When using SharePoint Workflows you can also specify the PDF Profile on a conversion by conversion basis, see this post.

Sample code to convert PDF Documents to PDF/A using our web services interface can be found here. If you don’t wish to use Ghostscript, or if your organisation already uses a different PDF to PDF/A converter that you wish to integrate with, then please contact us.

 

Issues and limitations

At Muhimbi we like to under promise and over deliver, so interpret the following in that light. We prefer to be upfront about any issues you may encounter to prevent surprises at a later date.

At this moment in time we are aware of the following issues:

  1. 64 bit only: The 32 bit version of Ghostscript 9.04 contains a bug that may interfere with PDF/A output. As a result you will need to install the 64 bit version. In other words you can only run the entire solution successfully on a 64 bit machine running a 64 bit Operating System. This bug has been fixed in ‘post 9.04 versions’.
  2. Merged Documents: When documents are merged using the Muhimbi PDF Converter, and subsequently post processed for PDF/A output, then Acrobat’s validator shows the ‘CIDSystemInfo and CMap dict not compatible’ validation error. It doesn’t happen for all document types and we are confident this message does not have any significant side effects.

 
There are other side effects inherent to the PDF/A1b standard that we can’t do anything about. For example, as transparency is not supported, any documents that use (semi) transparent objects may not look the same as the source document. Similarly, because fonts MUST be embedded in PDF/A compliant documents, the resulting PDF file may be larger than expected, although in many cases you will find them to be smaller than the source file.

We love working with our customers and we’ll do our very best to solve any problems you may experience, so please drop us a line if you have any questions or require assistance.


PDF Converter Services 5.1 – Maintenance release

Posted at: 1:27 PM on 13 September 2011 by Muhimbi

Earlier this week we released version 5.1 of the PDF Converter for SharePoint, which ships with an improved version of our popular PDF Conversion engine. Today we are releasing an update to the standalone version of the Muhimbi PDF Converter Services that includes all new functionality and fixes, including the ability to convert InfoPath 2010 forms that include external data connections.

A quick introduction for those not familiar with the product: The Muhimbi PDF Converter Services is an ‘on premises’ server based SDK that allows software developers to convert typical Office files to PDF format using a robust, scalable but friendly Web Services interface from Java and .NET based solutions. It supports a large number of file types including MS-Office and ODF file formats as well as HTML, AutoCAD and Image based files and is used by some of the largest organisations in the world for mission critical document conversions. In addition to converting documents the product ships with a sophisticated watermarking engine, PDF Merging facilities and the ability to secure PDF files. A separate SharePoint specific version is available as well.


Some of the main changes in the new version are as follows:

1304 New - InfoPath: Allow external data sources during PDF Conversion in InfoPath 2010
1429 New - InfoPath: Limit conversion of InfoPath attachments to just the active views
1359 Fix - InfoPath: Attachments are sometimes converted twice
1444 Fix - InfoPath: IP2010 still executes disabled 'Form Load' rule
1492 Fix - InfoPath: 'Busy' during conversion
1445 Fix - InfoPath: forms containing Ink controls do not convert
1355 New - Watermarking: Optimise memory use during watermarking
1473 Fix - Watermarking certain bitmap based PDF Files loses all content
1400 Fix - Watermarking: Special characters in MergeField values are not escaped for RTF watermarks
1383 New - Make automatic stripping of MS-Word templates optional
1427 New - Allow all conversion default values to be globally overridden in config file.
1453 Fix - Excel sheets with large OLAP cubes may fail to convert
1470 Fix - Protected MS-Word documents don't convert in Office 2010
1454 Fix - CAD drawing returns empty PDF
1206 Fix - HTML to PDF Conversion hangs on long pages
1419 Fix - PDF Files secured using the PDF Converter may cause problems with searching inside documents
1420 Fix - Image with large dimensions does not convert

For more information check out the following resources:


As always, feel free to contact us using Twitter, our Blog, regular email or subscribe to our newsletter.

Download your free trial here (8MB).


PDF Converter for SharePoint 5.1 – PDF Merging using workflows & Nintex improvements

Posted at: 11:02 AM on 12 September 2011 by Muhimbi


It is difficult to capture all the amazing new functionality of the 5.1 release of our popular Muhimbi PDF Converter for SharePoint in a single subject line without exceeding the number of available characters, so I am afraid ‘PDF merging using workflows & Nintex improvements’ will have to do.

The list of new features is considerable but the main ones are as follows:


For those not familiar with the product, the PDF Converter for SharePoint is a lightweight solution that allows end-users to convert common document types - including InfoPath, AutoCAD, MS-Office, HTML and images - to PDF format from within SharePoint using a friendly user interface, workflows or a web service call without the need to install any client side software or Adobe Acrobat. It integrates at a deep level with SharePoint and leverages facilities such as the Audit log, localisation, security and tracing. It runs on WSS 3, MOSS as well as SharePoint 2010 and is available in English, German, Dutch, French, Traditional Chinese and Japanese. For detailed information check out the product page.
 


 
In addition to the changes listed above, some of the main changes in the new version are as follows:

1425 New - Nintex: Add Nintex error handling facility to existing Convert Activity
1200 New - Nintex: Create Watermark Activity for Nintex Workflow
1199 New - Nintex: Create PDF Security Activity for Nintex Workflow
1424 New - Nintex: Create PDF Merge Activity for Nintex Workflow
1423 New - Nintex: Create HTML Conversion Activity for Nintex Workflow
1304 New - InfoPath: Allow external data sources during PDF Conversion in InfoPath 2010
1429 New - InfoPath: Limit conversion of InfoPath attachments to just the active views
1359 Fix - InfoPath: Attachments are sometimes converted twice
1444 Fix - InfoPath: IP2010 still executes disabled 'Form Load' rule
1492 Fix - InfoPath: 'Busy' during conversion
1445 Fix - InfoPath: forms containing Ink controls do not convert
1395 Fix - Filtering in SP2010 doesn't take the regional settings of current site into account.
1338 New - Filtering: Add support for checking if a user is in a SharePoint group.
1337 New - Filtering: New comparison operator 'is in list'.
1339 New - Filtering: Add support for filtering for the current user using '[me]'.
1336 New - Filtering: Add support for filtering for the current day using '[today]'.
1394 New - Watermarking: Move watermark handler to a new and separate SharePoint Feature
1355 New - Watermarking: Optimise memory use during watermarking
1473 Fix - Watermarking certain bitmap based PDF Files loses all content
1407 Fix - Watermarking: On SP2007 opening a PDF File and then cancelling it writes an error to the event log
1350 Fix - Watermarking: Opening PDF Files behaves differently with the Muhimbi handler installed
1417 Fix - Watermarking: Opening large PDF files fails
1428 Fix - Watermarking: Image watermarking  using a variable path does not validate in SharePoint Designer 2010
1397 Fix - Watermarking: When a field code value is empty then the field name is displayed in the watermark
1400 Fix - Watermarking: Special characters in MergeField values are not escaped for RTF watermarks
1329 Fix - Watermarking: Conversion service merge fields are only supported in text watermarks
1328 Fix - Watermarking: Image watermark only supports resources from the current web
1187 New - New SharePoint Designer Workflow activity to allow PDF files to be merged
1385 New - Add limits to the number of files that can be merged
1383 New - Make automatic stripping of MS-Word templates optional
1371 New - PDF Merging - Allow field to be specified for sorting
1427 New - Allow all conversion default values to be globally overridden in config file.
1408 New - Add SharePoint auditing for the various PDF Operations.
1016 Fix - Update translations for all supported languages
1453 Fix - Excel sheets with large OLAP cubes may fail to convert
1201 Fix - Selecting a destination folder that does not exist will revert to the parent folder
1470 Fix - Protected MS-Word documents don't convert in Office 2010
1454 Fix - CAD drawing returns empty PDF
1374 Fix - PDF Merging - Using Calculated field as source for bookmarks shows the data type
1206 Fix - HTML to PDF Conversion hangs on long pages
1419 Fix - PDF Files secured using the PDF Converter may cause problems with searching inside documents
1420 Fix - Image with large dimensions does not convert
1260 Fix - Using a hyperlink to 'convert and download' a web page on an anonymous site is not possible

 
For more information check out the following resources:


As always, feel free to contact us using Twitter, our Blog, regular email or subscribe to our newsletter.

Download your free trial here (13MB).
