Posted at: 12:02 PM on 29 October 2012 by Muhimbi
We regularly receive requests for official performance metrics for our popular PDF Converter Service for Java / .NET / PHP and our PDF Converter for SharePoint. Other than giving our stock answer of ‘it performs extremely well and scales both up and out’ we have never been comfortable publishing official figures. Not because we have something to hide, but because there is no official industry metric. The number of conversions per second, minute or hour depends on a number of factors, including the source file format, the size and complexity of the document and, naturally, the hardware used to carry out the conversions.
Although from an engineering perspective this is a perfectly valid answer, some people have extreme requirements and need as many details as they can get. So here goes: official metrics for Muhimbi’s range of PDF Conversion products.
We recently got our hands on a nice new server and decided to hijack it to carry out these performance tests before the system is repurposed as a virtualisation rig. The relevant specifications are as follows:
- Dual Xeon E5-2670 - 2.6 GHz. Each CPU has 8 cores for a total of 16.
- 64GB RAM.
- SSD Drives.
- Windows Server 2012.
The CPUs have enough cores to demonstrate how well the software scales up. As the system uses SSD drives, reading and writing documents adds no significant overhead.
Please note that out-of-the-box Windows Server 2012 (and 2008 as well) defaults to the Balanced Power Plan. Although we too are keen to be green, we changed the power plan to High Performance during these tests, which made a measurable difference.
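For readers who want to script the same power plan change, the following is a minimal sketch rather than the procedure we used. It assumes the standard Windows powercfg utility and its built-in SCHEME_MIN alias for the High Performance plan (run powercfg /aliases on your own machine to confirm). The function names are made up for this illustration.

```python
import platform
import subprocess

# Built-in powercfg alias assumed to map to the High Performance plan.
# Confirm on the target machine with `powercfg /aliases`.
HIGH_PERFORMANCE_ALIAS = "SCHEME_MIN"


def power_plan_command(alias: str = HIGH_PERFORMANCE_ALIAS) -> list[str]:
    """Build the powercfg command line that activates the given plan."""
    return ["powercfg", "/setactive", alias]


def activate_high_performance() -> bool:
    """Activate the plan on Windows; do nothing on other platforms.

    Returns True when the command was executed, False when it was skipped.
    """
    if platform.system() != "Windows":
        return False
    subprocess.run(power_plan_command(), check=True)
    return True
```

Remember to switch back to the Balanced plan (alias SCHEME_BALANCED) once testing is complete.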
A relatively simple approach was used to carry out the tests. The Diagnostics tool that ships with version 6.2 of the PDF Converter allows the number of concurrent requests to be specified (please don’t use any earlier versions of this tool to carry out your own performance tests). Tests were then carried out with 1, 2, 4, 8 and 16 concurrent requests. Verbose logging was disabled to make sure no CPU cycles were wasted on displaying log data. Everything, including the Diagnostics Tool, runs on the same machine.
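The idea of running the same batch at increasing concurrency levels can be sketched as follows. This is not how the Diagnostics Tool works internally; it is an illustrative harness in which fake_convert is a hypothetical stand-in that only sleeps, so the timings it prints say nothing about the PDF Converter itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_convert(document: str) -> str:
    """Hypothetical stand-in for one conversion request; it only sleeps."""
    time.sleep(0.01)
    return document + ".pdf"


def run_batch(documents: list[str], concurrency: int) -> tuple[list[str], float]:
    """Process all documents with `concurrency` parallel workers.

    Returns the results (in input order) and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fake_convert, documents))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    docs = [f"doc{i:04d}.doc" for i in range(64)]
    # Same concurrency levels as used in the tests described above.
    for level in (1, 2, 4, 8, 16):
        _, seconds = run_batch(docs, level)
        print(f"{level:>2} concurrent: {seconds:.2f}s")
```

To measure a real service, replace fake_convert with a call to your own conversion client and keep the rest of the loop unchanged.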
The Conversion Service’s configuration file was updated to allow 16 concurrent requests for each document type (Concurrency.MaximumInstances.*). The total number of concurrent operations (maxConcurrentCalls) was set to 16 as well.
Each test document was duplicated 1000 times and saved in a single folder. The path to this folder was entered in the Diagnostics Tool, after which the Convert Folder button was clicked. At the end of the test the first timestamp in the log window was subtracted from the last, giving the exact amount of time taken to convert 1000 documents. This data was then fed into a simple Excel sheet to draw a pretty graph.
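The timestamp subtraction is easy to reproduce outside Excel. The sketch below assumes the log shows times as hours:minutes:seconds.milliseconds, which may not match the actual log layout, and the timestamps in the usage example are invented rather than taken from our test runs.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to whatever the log window actually shows.
LOG_TIME_FORMAT = "%H:%M:%S.%f"


def conversions_per_hour(first: str, last: str, documents: int) -> float:
    """Turn the first and last log timestamps into conversions per hour."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(last, LOG_TIME_FORMAT)
    elapsed_seconds = (end - start).total_seconds()
    if elapsed_seconds <= 0:
        raise ValueError("last timestamp must be later than the first")
    return documents * 3600 / elapsed_seconds


if __name__ == "__main__":
    # Invented example: 1000 documents converted in exactly four minutes.
    print(conversions_per_hour("10:00:00.000", "10:04:00.000", 1000))
```

Note that this simple version does not handle a test run that crosses midnight.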
A number of real world documents were selected to carry out the tests. Specifically:
- License Agreement.doc (MS-Word small): A simple .doc file containing 5 pages of formatted text and a logo in each page header. File size: 141KB.
- Administration Guide.doc (MS-Word med): The Administration Guide that ships with the PDF Converter Services. 47 pages of text and images using complex formatting. File size: 773KB.
- User Guide.doc (MS-Word large): The User & Developer Guide that ships with the PDF Converter Services. 78 pages of text and many images using complex formatting. File size: 2.8MB.
- HTML: A page from the Muhimbi website saved as a self-contained MHT file (includes all images, CSS and scripts). This was done to take network latency and performance of our public website out of the equation. File size: 37KB.
- TIFF: A typical scanned TIFF file containing a 17 page patent application. File size: 1.3MB.
- MSG (RTF): An email that uses RTF based text. No attachments are included as we want to focus on the conversion of e-mail. File size: 30KB.
- MSG (HTML): A rich HTML email containing the Muhimbi Newsletter, no attachments. File size: 58KB.
- InfoPath: The standard Expense Form template that comes with InfoPath. No InfoPath attachments were included as we only want to test InfoPath conversion.
We didn’t include all possible document types as the Muhimbi PDF Converter supports dozens of file types. When converting other Office files such as Excel, Visio and Publisher you should see figures similar to those reported for MS-Word.
The scalability tests described in this article focus on conversion. The PDF Converter can also be used for applying PDF Security, Watermarking, Splitting and Merging, but as those operations only take a fraction of the time of a typical conversion we did not include them in this report.
Results on a physical (non-virtualised) server
The chart largely speaks for itself.
As you can see, the PDF Converter provides near-linear scalability for each additional CPU core, with the following exceptions:
- MS-Word (small): This document is so quick to convert that we cannot send enough requests to the converter to keep all cores busy all the time. For this reason, in this particular test, there is no gain in going from 8 to 16 cores.
- InfoPath: InfoPath is limited to a single instance, a limitation of the product itself. Therefore performance does not improve beyond 4 parallel requests. In the extreme situation that more performance is needed, InfoPath can be configured to scale further by using a virtualisation solution such as Hyper-V and then deploying the PDF Converter to multiple virtual machines on the same system. A simple HTTP load balancer, including the free NLBS one that comes with Windows, can then be used to distribute the requests.
The MSG results for RTF based emails very closely match those of the MS-Word (small) test. This is because internally RTF conversion is carried out by the MS-Word converter. Please note that we artificially lowered the line for the MSG converter by 10% as it would otherwise overlap with the results of the MS-Word converter test.
The figures indicate ‘better than linear’ scalability. We can’t explain this other than attributing this to the Turbo Boost feature in Intel’s line of Xeon E5 CPUs that automatically overclocks the frequency of the CPU under high load.
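To make 'near linear' and 'better than linear' concrete, speedup and parallel efficiency can be computed from the throughput at each concurrency level. The sketch below is one common way to do this and assumes a measurement at concurrency 1 is available as the baseline. The numbers in the usage example are purely illustrative and are not the figures from our chart.

```python
def scaling_table(throughput: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map each concurrency level to (speedup, efficiency).

    speedup    = throughput at this level / throughput at level 1
    efficiency = speedup / level (1.0 is linear, above 1.0 is better than linear)
    """
    if 1 not in throughput:
        raise ValueError("a baseline measurement at concurrency 1 is required")
    baseline = throughput[1]
    table = {}
    for level in sorted(throughput):
        speedup = throughput[level] / baseline
        table[level] = (speedup, speedup / level)
    return table


if __name__ == "__main__":
    # Illustrative conversions-per-hour values only, not measured results.
    sample = {1: 100.0, 2: 200.0, 4: 420.0}
    for level, (speedup, efficiency) in scaling_table(sample).items():
        print(f"{level:>2} concurrent: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency above 1.0 at a given level is what the text above calls 'better than linear'.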
An archive with all source files, as well as the PDF versions, can be downloaded here.
Results on a virtual server
The same test was performed by installing Hyper-V on the same server, creating a new virtual machine and assigning all 32 logical processors as well as 12GB of memory to the VM. The OS inside the VM was Windows Server 2008 R2 and the version of MS-Office was 2013.
As there are some differences here (OS, available memory, Office version etc.) please do not consider this a like-for-like test or a test of how well Hyper-V scales. These results are purely intended to show what kind of figures you could theoretically expect when running the PDF Converter inside a virtualised environment.
Although we did not retest each and every file format, the mix of tests carried out makes clear that in a virtualised environment the Muhimbi PDF Converter scales as well as in non-virtualised environments.
Summary & Conclusion
On a system such as the one described in this article you can easily achieve north of 15,000 conversions per hour (360,000/day). Naturally the exact figure depends on the documents being converted, but the mix of files used in this test represents a good combination of real world documents.
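The per-day figure follows directly from the hourly rate, assuming the server converts around the clock at a constant pace. A small helper (the function name is ours, for illustration only) makes the conversion between units explicit:

```python
def rates_from_hourly(per_hour: float) -> dict[str, float]:
    """Express an hourly conversion rate per second, per minute and per day.

    Assumes a constant rate sustained 24 hours a day.
    """
    return {
        "per_second": per_hour / 3600,
        "per_minute": per_hour / 60,
        "per_day": per_hour * 24,
    }


if __name__ == "__main__":
    for unit, value in rates_from_hourly(15000).items():
        print(f"{unit}: {value:.2f}")
```

In practice few workloads are spread evenly over 24 hours, so peak hourly volume is usually the more useful number to plan against.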
If more performance is needed, linear scalability can be achieved by adding conversion servers. These servers can then be load balanced using a standard HTTP Load Balancer. For details see this overview of typical deployment scenarios.
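As a rough planning aid, the number of conversion servers for a target volume can be estimated as follows. This sketch assumes perfectly linear scaling across servers and ignores load balancer overhead. The per-server rate should come from your own measurements, and the function name is hypothetical.

```python
import math


def servers_needed(target_per_hour: float, per_server_per_hour: float) -> int:
    """Estimate how many servers are required, assuming linear scale-out."""
    if per_server_per_hour <= 0:
        raise ValueError("per-server rate must be positive")
    return max(1, math.ceil(target_per_hour / per_server_per_hour))


if __name__ == "__main__":
    # Example: a target of 50,000 conversions per hour at 15,000 per server.
    print(servers_needed(50000, 15000))
```

Adding one server beyond the estimate leaves headroom for peaks and for taking a machine out of rotation during maintenance.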
Any questions or comments? Use the feedback facility below or contact us directly.