The views to convert can be specified during design time, or at run time using a little bit of custom code or InfoPath rules. However, we are asked quite frequently about how to control which view to convert using a SharePoint workflow. The example provided below uses a SharePoint Designer workflow, but it works equally well in Nintex and Visual Studio workflows.
Creating the InfoPath form
Open InfoPath 2010 Designer.
Select Blank Form / Design Form.
Change the title at the top to 'View 1'.
Create a Text box and name it '_MuhimbiViews'.
On the 'Page Design' tab select 'New View'.
Name the new view 'View 2'.
Change the Title at the top to 'View 2'.
From the ‘File’ menu select 'Publish to SharePoint Server'.
Enter the location of your SharePoint site collection.
In the 'What do you want to create or modify' tab accept the default settings (Form Library).
Accept the default for 'Create a new form library'. We named it 'MuhimbiViews', but it can be anything.
On the next screen make the '_MuhimbiViews' field available as a column on the library and select 'Allow users to edit data in this field...'
Finish the publishing process.
(There are multiple ways to set InfoPath fields from a workflow; see this article for more details.)
To test that all steps have been carried out correctly and that the PDF Converter is working as expected carry out the following steps:
Open the new Form Library in SharePoint and fill out a new form. Don’t enter any data, just save it.
Convert the file that was just generated from the context menu. When the PDF is opened you'll see View 1.
Fill out another form, but this time enter 'View 2'. Save it under a new name.
Convert the file we just generated from the context menu. When the PDF is opened you'll see View 2.
Creating the workflow
From the Library ribbon in SharePoint select Workflow Settings / Create a workflow in SharePoint Designer.
Name the workflow whatever you like, we named it 'Muhimbi Views'.
In this workflow you will need to set the value of the _MuhimbiViews field before carrying out the conversion. The problem is that the workflow does not persist the value of the changed field before executing the Conversion step, so we need to trick SharePoint a bit, which is not uncommon.
There are two ways to trick the workflow:
Option 1 - Using an Impersonation step
Position the cursor before Step 1 and click the 'Impersonation Step' button in the ribbon.
Position the cursor inside the Impersonation Step and add a 'set field in current item' action.
Use this action to set the value of the ‘_MuhimbiViews’ field to 'View 2;View 1'. (Note that when the field was exported it probably defaulted to a different name, e.g. ‘Muhimbi Views’. It is mapped correctly internally though.)
Position the cursor after the Impersonation Step and click the ‘Step’ button to add a new step.
In this new step add the 'Convert Document' action and fill in the blanks.
Publish the workflow and execute it on a previously filled out form.
The resulting PDF should contain views 2 and 1 (in that order)
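The value written to the field in the steps above is simply a list of view names separated by semicolons, in the order the views should appear in the PDF. As a rough illustration only, a value like that could be assembled as shown below. The helper name is made up for this sketch, and the separator rule is inferred purely from the example value used in this article, not from any documented specification.

```python
def build_views_value(view_names):
    """Join view names into one semicolon-separated string, keeping their order.

    Assumption: the field expects exact view names separated by ';',
    as suggested by the example value in this article.
    """
    # Drop empty entries and surrounding whitespace so the result stays tidy.
    cleaned = [name.strip() for name in view_names if name.strip()]
    return ";".join(cleaned)


# Views listed in the order they should appear in the PDF.
value = build_views_value(["View 2", "View 1"])
print(value)
```

The resulting string is what you would paste into the 'set field in current item' action.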
Option 2 - Using 'Pause for duration'
The steps below assume the default, empty workflow state, not the workflow entered as part of Option 1.
Position the cursor inside the first Step and add a 'set field in current item' action.
Set the value of the ‘_MuhimbiViews’ field to 'View 2;View 1'
Add the 'Pause for duration' action as the next step and let it pause for 1 minute (it may actually take 5, but that can be tweaked).
Add the 'Convert Document' action and fill in the blanks.
Publish the workflow and execute it on a previously filled out form.
It may take a while to execute due to the Pause action, but the resulting PDF should contain views 2 and 1 (in that order)
A third option exists, which is to specify Converter Specific Settings using the XML Override facility in the Convert Document Workflow activity. For more information see this example (scroll to the top of that page for an introduction).
That is all. Any questions or comments? Use the feedback facility below or contact us directly.
So, if you are still with me, let’s talk about building some nice hardware. At Muhimbi we are big fans of virtualisation so we already have some nice hardware to run a large number of Test and Development environments on. However, the Dell T605 Servers we bought a couple of years ago are becoming a bit long in the tooth, are difficult to upgrade and are extremely ‘jet-engine’ loud (Dell sells them as ‘silent servers’, right!).
So we decided to get some new kit. There always is the option to buy something off-the-shelf from HP, but for something with these specifications that would be extremely expensive. It wouldn’t give us a high level of control either and would probably be very loud as well, so we decided to go for a custom build. We approached Boston, Broadberry and Armari, who all provided competitive quotes for custom builds of silent servers. If you have no desire to build your own system, or if you are not a complete control freak, then by all means contact these guys. In the end we decided to completely build our own system, not because it would be cheaper (it was), but because it would give us ultimate control and be a lot of fun as well.
Buying the hardware
So we freshened up on the latest Intel Xeon processors and bought the following hardware. Most of it was ordered from Scan, but they let us down with the delivery because they couldn’t source the Xeons quickly enough.
Dual Xeon E5-2670: Purchased from Ballicom, one of the very few suppliers with stock. These processors are extremely pricey (£1080 / $1740 each, before tax), but the E5-2670 offers the best price/performance ratio without paying an additional premium.
Asus Z9PE-D8 WS: The choice of dual Xeon boards is extremely limited. We decided to go with Asus because they have a good track record for both compatibility and extensibility.
8 x 8GB DDR3 Hynix: The maximum memory supported by the motherboard for ECC Registered memory is 64GB (256GB for non-ECC). We decided to go for the Hynix 1600MHz ECC Registered memory because it is very affordable and available from Scan.
2 x OCZ RevoDrive3 X2: We don’t need crazy amounts of storage, but as the system will run a number of operating systems in parallel, including databases, we need storage to be fast with high levels of concurrent IO. The logical choice is SSD drives, and the fastest of the affordable SSD drives are these PCI-Express based cards that bypass the SATA bus altogether. Both 480GB cards will be installed in a RAID-1 (mirrored) configuration. We bought these cards from Kikatek who had them in stock at a great price.
2 x Samsung 830 SSD: We need something to boot from and they almost give these drives away. They perform brilliantly on SATA 6Gbps so we installed two 128GB units in a RAID-1 configuration. Purchased from Scan, but they were cheaper on Amazon.
2 x Zalman CNPS9900-MAX: CPUs need to be cooled and these CPU coolers work very well. They are quiet and keep the cores below 50 degrees centigrade under load. We bought these from Scan and purchased the Socket 2011 support clips for this cooler from Quiet PC (unnecessary as the clips were included in the box, but that was not clear from the description).
Corsair AX850 Power supply: This professional series power supply is highly efficient and very quiet. Purchased from Scan as well.
Fractal Design Case: We need to put everything in a box so we bought this stylish looking Define XL computer case. It supports the E-ATX form factor and comes with quiet fans. Purchased from Box.co.uk who had it in stock.
That is all. You don’t need anything else other than perhaps a Phillips screwdriver. All cables, including plenty of SATA cables, are included with the various components.
Building the System
If you have never put a PC together then you may feel a bit discouraged by the sheer amount of cables, connectors, little screws and components littering your floor after opening the boxes (the picture below was taken after cleaning most of it away). However, with the help of someone experienced, or strangers on the internet, it is not too difficult to put together. There is no ‘Ikea like’ step-by-step guide, but after reading the documentation (and Googling like crazy) things quickly fall into place.
I won’t bore you with the details, but a few issues deserve mentioning:
EEB Form Factor: The Asus motherboard uses the EEB form factor, which is usually reserved for servers. It is not easy to find a case that uses this exact same form factor, but you can buy a regular E-ATX case. The only real difference is that not all of the holes in the motherboard match the case, only 7 out of 11. This is good enough, just don’t blindly screw all 11 risers into the case as the 4 that do not have a corresponding hole in the motherboard will most likely cause a short circuit on the back of the board.
Direction of CPU Coolers: Each CPU cooler has a big fan that can be pointed either left or right. My concern was that if I pointed them both in the same direction then the ‘hot air’ from fan 1 would be blown into CPU / Fan 2. I went ahead with this anyway as both fans blow towards the nearest and most powerful case fan.
I/O Shield: The motherboard comes with a nice ‘plate’ to cover the block of ports at the back of the motherboard (USB, LAN, Keyboard etc). I was unable to make this fit as there simply was no room in the case. Not the end of the world, but I would have preferred fewer open air holes in the back.
Fan connectors: The Fractal Design case fans have 3 pins while the motherboard has 4-pin connectors. Don’t worry about it, just plug them in, they only fit in one way. The industry cannot decide on a single standard apparently.
Power plug connectors: Similar to the Fan connectors, there doesn’t appear to be an agreed standard for power connectors to the CPU. The Corsair power supply comes with 2 ‘EPS’ connectors while the motherboard requires 2 ATX12V connectors. Turns out these are the same, but as the pattern on the plugs (defined by square and round pins) doesn’t match I hesitated for a full 24 hours before just forcing the plugs in. I wasn’t too keen to fry £2500 worth of motherboard and CPUs. After contacting Corsair they told me they get that question a lot and that you can push the 8 pin EPS Connector into the ATX12V slot. It worked, phew.
System panel connectors: Various little wires come out of the case to deal with the reset and power button as well as the case light. They are clearly labeled, but have a good look at your motherboard manual and remember that white & black both mean ‘ground’. (A racial equality decision, hurrah!)
RAID select: The documentation for the Z9PE-D8 WS motherboard is incorrect. On page 2-28 it shows the jumper for the RAID select at the bottom, but it is located just right of the Intel C602 chip (level with the topmost SATA port on the right-hand side). A digital copy of the documentation can be found here and allows you to zoom all the way in.
These were the only real stumbling blocks encountered (well, and one of the RevoDrives being DOA). It was our expectation to put it all together in 4 hours, but the problem with the power plug connectors delayed successful completion by 24 hours.
Installing the OS / Software
With the system built we were keen to get the OS installed to carry out a burn-in test and run some benchmarks. We decided to install Windows Server 2012. As we don’t have a DVD drive in the server (it is 2012 for god’s sake) we used the Windows 7 USB Download tool to create a bootable Win2K12 installer on a memory stick. Works like a charm!
Unfortunately the Windows 2012 installer crashed spectacularly. It turns out that the default BIOS of the Z9PE-D8 WS motherboard is not compatible with Windows 8 / Win2K12. Downloading and installing the latest BIOS solved the problem. Just make sure you write it to a FAT/FAT32 memory stick or you get the ‘Rom file is not an EFI BIOS’ message. (The Win2K12 USB Stick uses NTFS, so don’t use that to upgrade the BIOS.)
As the boot volume is mirrored (in our case) we downloaded and unpacked the latest drivers for the motherboard and placed them on the same USB stick as the Windows Installer. Depending on which RAID solution you pick (the motherboard has 3) you may be asked to load a driver during installation. A separate blog post will go deeper into the topic of RAID and SSD and why not to use the on-board LSI RAID solution (seriously, don’t do it, it is painfully slow, pick the Intel or Marvell based one).
Even though all storage drivers were present on the USB stick, the Win2K12 installation failed with the “we couldn't create a partition or locate an existing one” error when selecting the installation disk. It turns out that the RAID1 boot disks could not be selected when the RevoDrive is also present in the system, so we temporarily disabled the PCIE Option Rom in the system BIOS for the slot that contains the RevoDrive.
With that change made Windows installed smoothly and quickly. That was until we enabled ‘Event Logging’ in the BIOS, which caused the Win2K12 boot process to slow down to a crawl (Another 24 hours lost before we figured out this was the cause).
Finally, we ran a burn-in test for 24 hours to check the system does not overheat under stress. This test was executed by running Prime95 for several hours and measuring the temperature for each CPU core using CPU Thermometer 1.2. The results are impressive: CPU core temperatures are between 18 and 30 degrees (centigrade) when idle and between 39 and 50 degrees under maximum load.
The ambient noise level in the office is 40dB with the server off (same as a Library) and, measured at 1 meter distance, 45dB with the server on (same as a Library, just including the background bird calls).
Any questions or remarks, did we do anything stupid or wrong? Let us know in the comments below.
We regularly receive requests about official performance metrics for our popular PDF Converter Service for Java / .NET / PHP and our PDF Converter for SharePoint. Other than giving our stock answer of ‘it performs extremely well and scales both up and out’ we have never been comfortable publishing official figures. Not because we have something to hide, but because there is no official industry metric. The number of conversions per second / minute / hour depends on a number of factors including the source file format, size of the document, complexity of the document and naturally the hardware being used to carry out the conversions.
Although from an engineering perspective this is a perfectly valid answer, some people have extreme requirements and need as many details as they can get. So here goes, official metrics for Muhimbi’s range of PDF Conversion products.
Dual Xeon E5-2670 - 2.6 GHz. Each CPU has 8 cores for a total of 16.
Windows Server 2012.
The CPUs have enough cores to demonstrate how well the software scales up. As the system uses SSD drives, reading and writing documents adds no significant overhead.
Please note that out-of-the-box Windows Server 2012 (and 2008 as well) defaults to the Balanced Power Plan. Although we too are keen to be green, we changed the power plan to High Performance during these tests, which made a measurable difference.
A relatively simple approach was used to carry out the tests. The Diagnostics tool that ships with version 6.2 of the PDF Converter allows the number of concurrent requests to be specified (please don’t use any earlier versions of this tool to carry out your own performance tests). Tests were then carried out with 1, 2, 4, 8 and 16 concurrent requests. Verbose logging was disabled to make sure no CPU cycles were wasted on displaying log data. Everything, including the Diagnostics Tool, runs on the same machine.
The Conversion Service’s configuration file was updated to allow 16 concurrent requests for each document type (Concurrency.MaximumInstances.*). The total number of concurrent operations (maxConcurrentCalls) was set to 16 as well.
Each test document was duplicated 1000 times and saved in a single folder. The path to this folder was entered in the Diagnostics Tool after which the Convert Folder button was clicked. At the end of the test the first timestamp in the log window was subtracted from the last, resulting in the exact amount of time taken to convert 1000 documents. This data was then fed into a simple Excel sheet to draw a pretty graph.
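For those who prefer a script over Excel, the calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name is ours, and the timestamp format (HH:MM:SS) is an assumption, so adjust it to whatever your log window actually shows.

```python
from datetime import datetime


def conversions_per_hour(first_stamp, last_stamp, document_count, fmt="%H:%M:%S"):
    """Turn the first and last log timestamps into a conversions-per-hour rate.

    Assumptions: both timestamps use the same format and fall on the same day
    (a test run that crosses midnight is not handled by this sketch).
    """
    start = datetime.strptime(first_stamp, fmt)
    end = datetime.strptime(last_stamp, fmt)
    elapsed_seconds = (end - start).total_seconds()
    if elapsed_seconds <= 0:
        raise ValueError("last timestamp must be later than the first")
    return document_count * 3600 / elapsed_seconds


# Made-up timestamps, purely to show the call; these are not measured results.
print(conversions_per_hour("10:00:00", "10:05:00", 1000))
```

With these made-up timestamps the run takes 300 seconds, which works out to 12000 conversions per hour.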
A number of real world documents were selected to carry out the tests. Specifically:
License Agreement.doc (MS-Word small): A simple .doc file containing 5 pages of formatted text and a logo in each page header. File size: 141KB.
Administration Guide.doc (MS-Word med): The Administration Guide that ships with the PDF Converter Services. 47 pages of text and images using complex formatting. File size: 773KB.
User Guide.doc (MS-Word large): The User & Developer Guide that ships with the PDF Converter Services. 78 pages of text and many images using complex formatting. File size: 2.8MB.
HTML: A page from the Muhimbi website saved as a self-contained MHT file (includes all images, CSS and scripts). This was done to take network latency and performance of our public website out of the equation. File size: 37KB.
TIFF: A typical scanned TIFF file containing a 17 page patent application. File size: 1.3MB.
MSG (RTF): An email that uses RTF based text. No attachments are included as we want to focus on the conversion of e-mail. File size: 30KB
MSG (HTML): A rich HTML email containing the Muhimbi Newsletter, no attachments. File size: 58KB
InfoPath: The standard Expense Form template that comes with InfoPath. No InfoPath attachments were included as we only want to test InfoPath conversion.
We didn’t include all possible document types as the Muhimbi PDF Converter supports dozens of file types. When converting Office files such as Excel, Visio and Publisher you should see similar figures as those reported for MS-Word.
The scalability tests described in this article focus on conversion. The PDF Converter can also be used for applying PDF Security, Watermarking, Splitting and Merging, but as those operations only take a fraction of the time of a typical conversion we did not include them in this report.
Results on a physical (non-virtualised) server
The chart largely speaks for itself.
As you can see the PDF Converter provides near linear scalability for each additional CPU core with the following exceptions:
MS-Word (small): This document is so quick to convert that we cannot send enough requests to the converter to keep all cores busy all the time. For this reason, in this particular test, there is no benefit between having 8 and 16 cores.
InfoPath: InfoPath is limited to only a single instance, a limitation of the product itself. Therefore performance does not improve beyond 4 parallel requests. In the extreme situation that more performance is needed, InfoPath can be configured to scale further by using a virtualisation solution such as Hyper-V and then deploying the PDF Converter to multiple virtual machines on the same system. A simple HTTP load balancer, including the free NLBS one that comes with Windows, can then be used to distribute the requests.
As you can see, MSG results for RTF based emails very closely match the MS-Word (small) converter. This is because internally RTF conversion is carried out by the MS-Word converter. Please note that we artificially lowered the line for the MSG converter by 10% as it would otherwise overlap with the results of the MS-Word converter test.
The figures indicate ‘better than linear’ scalability. We can’t explain this other than attributing this to the Turbo Boost feature in Intel’s line of Xeon E5 CPUs that automatically overclocks the frequency of the CPU under high load.
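If you run your own tests, one way to check for 'near linear' or 'better than linear' scaling is to compare each run against the single-request run. The sketch below is a hypothetical helper, not part of the Diagnostics tool: it takes the total time per concurrency level and reports the speedup and the efficiency (speedup divided by the number of concurrent requests). An efficiency close to 1.0 means linear scaling, and anything above 1.0 is better than linear.

```python
def scaling_report(durations_by_concurrency):
    """Return {concurrency: (speedup, efficiency)} relative to the 1-request run.

    durations_by_concurrency maps the number of concurrent requests to the
    total seconds needed to convert the same batch of documents.
    """
    baseline = durations_by_concurrency[1]
    report = {}
    for level, seconds in sorted(durations_by_concurrency.items()):
        speedup = baseline / seconds
        report[level] = (speedup, speedup / level)
    return report


# Placeholder durations for illustration only; these are NOT our test results.
for level, (speedup, efficiency) in scaling_report({1: 800.0, 2: 400.0, 4: 200.0}).items():
    print(level, speedup, efficiency)
```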
The same test was performed by installing Hyper-V on the same server, creating a new virtual machine and assigning all 32 logical processors to the VM as well as 12GB of memory. The OS inside the VM was, in this case, Win2K8R2 and the version of MS-Office was 2013.
As there are some differences here (OS, available memory, Office version etc.) please do not consider this a like for like test or a test of how well Hyper-V scales. These figures are purely intended to show what kind of figures you could theoretically expect when running the PDF Converter inside a virtualised environment.
Although we did not retest each and every file format, the mix of tests carried out makes clear that in a virtualised environment the Muhimbi PDF Converter scales as well as in non-virtualised environments.
Summary & Conclusion
On a system such as the one described in this article you can easily achieve north of 15,000 conversions per hour (360,000/day). Naturally the exact figure depends on the documents being converted, but the mix of files used in this test represents a good combination of real world documents.
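The daily figure above is simply the hourly rate multiplied by 24, which assumes the server is kept busy around the clock. A tiny sketch of that arithmetic, with function names of our own choosing:

```python
def per_day(conversions_per_hour):
    """Scale an hourly rate to a full day, assuming 24 hours of sustained load."""
    return conversions_per_hour * 24


def per_second(conversions_per_hour):
    """Express the same hourly rate as conversions per second."""
    return conversions_per_hour / 3600


print(per_day(15000))
print(round(per_second(15000), 2))
```

Plug in your own hourly figure to see what it means per day or per second.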
If more performance is needed then linear scalability can be achieved by adding additional conversion servers. These servers can then be load balanced using a standard HTTP Load Balancer. For details see this overview of typical deployment scenarios.
A quick introduction for those not familiar with the product: The Muhimbi PDF Converter Services is an ‘on premises’ server based SDK that allows software developers to convert typical Office files to PDF format using a robust, scalable but friendly Web Services interface from Java, .NET & PHP based solutions. It supports a large number of file types including MS-Office and ODF file formats as well as HTML, MSG (email), AutoCAD and Image based files and is used by some of the largest organisations in the world for mission critical document conversions. In addition to converting documents the product ships with a sophisticated watermarking engine, PDF Splitting and Merging facilities and the ability to secure PDF files. A separate SharePoint specific version is available as well.
The main changes in the new version are as follows:
Internal parser corrupts some documents when converting DOC to DOCX.
It has been almost four months since we released version 6.0 of the PDF Converter for SharePoint, a release that has generated a large amount of positive feedback, specifically in the area of cross-conversion. Naturally we have not been sitting back, we pretty much worked flat out on new features and improvements, resulting in today’s 6.1 release.
The list of new features and improvements is considerable but the main ones are as follows:
For those not familiar with the product, the PDF Converter for SharePoint is a lightweight solution that allows end-users to watermark, merge, split, secure and convert common document types - including InfoPath, AutoCAD, MSG (email), MS-Office, HTML and images - to PDF as well as other formats from within SharePoint using a friendly user interface, workflows or a web service call without the need to install any client side software or Adobe Acrobat. It integrates at a deep level with SharePoint and leverages facilities such as the Audit log, Nintex Workflow, localisation, security and tracing. It runs on WSS 3, MOSS as well as SharePoint 2010 and is available in English, German, Dutch, French, Traditional Chinese and Japanese. For detailed information check out the product page.
In addition to the changes listed above, some of the main changes and additions in the new version are as follows:
Internal parser corrupts some documents when converting DOC to DOCX.