PDF Converter Services 8.2 - Maintenance Release

Posted at: 14:15 on 13 January 2017 by Muhimbi


We are happy to announce a new version of the Muhimbi PDF Converter Services. As the product is mature and stable, this is largely a maintenance release that solves a number of issues and introduces some refinements.

A quick introduction for those not familiar with the product: the Muhimbi PDF Converter Services is an ‘on premises’ server-based SDK that allows software developers to convert typical Office files to PDF format using a robust, scalable but friendly Web Services interface from Java, .NET, Ruby and PHP based solutions. It supports a large number of file types, including MS-Office and ODF file formats as well as HTML, MSG (email), EML, AutoCAD and image-based files, and is used by some of the largest organisations in the world for mission-critical document conversions. In addition to converting documents, the product ships with a sophisticated watermarking engine, PDF splitting and merging facilities, an OCR facility and the ability to secure PDF files. A separate SharePoint-specific version is available as well.

Some of the main changes and additions in the new version are as follows:

ID    Area       Type         Description
2923  CAD        Fix          Incorrect text size and alignment during CAD to PDF conversion
2674  CAD        Fix          Lines are too thick when converting DWG
2818  CAD        Improvement  AutoCAD x-ref search path does not search in sub folders
2940  HTML       Fix          When converting HTML to PDF, large images that are loaded from an absolute path or URL are skipped
2792  HTML       Fix          Some PDF iFilters do not pick up PDF Documents that have been converted from HTML to PDF
2890  HTML       Fix          HTML Conversion problems on extremely large files
2908  Merge      Fix          Error while merging certain files
2928  Merge      Improvement  Merge operations timeout after 30 minutes
2942  OCR        Fix          OCR overlay is rotated for certain PDF files
2537  OCR        Fix          PDF Syntax (validation) errors after carrying out OCR
2708  OCR        Fix          OCR temp files not cleaned up in case of an error
2915  OCR        Fix          Content of some OCR-ed PDFs not picked up by iFilters
2919  PDFA       Fix          PDF/A Color intent mismatch for certain source documents
2917  Service    Improvement  Move DocumentConverter logs to 'log' sub folder
2900  Service    New          Add 'EPS' to OutputFormat.cs and related code (workflow actions etc)
2810  Setup      Fix          Switching to new printer driver after installing with the old driver doesn't work
2812  Setup      Fix          Improve installer on systems with wide range of .net framework versions
2813  Setup      Fix          Fix automatic uninstall steps when switching back and forth in installer
2826  Setup      Fix          Improve checking for local Admin rights on non-English operating systems
2809  Setup      Fix          Last step of uninstallation hangs under certain circumstances
2815  Setup      Fix          Duplicate Windows firewall rules created by installer
2807  Setup      Improvement  Installer doesn't write Exceptions to log until after error dialog is closed
2720  Watermark  Fix          Free Text & RTF watermarks do not show Arabic text consistently

As always, feel free to contact us via Twitter, our blog or regular email, or subscribe to our newsletter.

Download your free trial here (54MB).



Converting and Archiving InfoPath files in PDF format

Posted at: 13:52 on 10 January 2017 by Muhimbi

You know that archiving InfoPath forms is important, and that it can be difficult for a host of reasons. Luckily, converting from InfoPath to PDF with Muhimbi’s range of server-side PDF Conversion products is a simple and effective way to keep an organization in compliance with regulations.

Once you have determined that converting documents to PDF is the route that best addresses your organization’s needs, the next question is how to archive those PDFs. Simply storing a file is not enough; for the resulting PDF to be useful, it is critical to know where documents are stored and to be able to find them again. There are a lot of suggested best practices in this arena, which can make determining the specifics of your process more convoluted than it needs to be.

To architect a more straightforward storage plan, there are three essential points to address: ensuring metadata is retained in the converted document, making the conversion and storage of a document part of a workflow, and planning the first two points in advance in order to avoid ad hoc policy decisions.


Why Metadata is Important

Metadata allows you to tag documents with information that can be accessed later, without involving a user. This not only relieves an end user of remembering which type of data needs to be stored with specific documents, it also yields two additional benefits:

First, the ability to search SharePoint storage by metadata allows very specific queries to be used when looking for documentation; the more specific the query, the more accurate the results. This becomes more and more useful when hundreds of thousands of documents are stored and may need to be queried. For example, it would be easy to filter archived PDFs by InfoPath form title, author, and creation date range (all of which are default metadata settings).

Second, the metadata within a document can be used by Muhimbi’s PDF Converter for SharePoint to create watermarks for that document, meaning that not only is the metadata there for search, but it is also attached in a viewable and “un-touchable” (encrypted) way. Some regulations and compliance rules require that metadata cannot be edited.

Selecting and adding new metadata is a well-documented area of SharePoint usage, so we won’t go into depth about it in this article. The ability to maintain metadata during conversion has been baked into Muhimbi’s PDF Converter for SharePoint since the very first version, and version 6 added a workflow step that copies metadata and sets the content type in a single operation. The PDF Converter for SharePoint can also secure the document so that it can be viewed but NOT changed.


Simplify using Workflows

Now that a document is converted and its metadata has been retained, it still needs to be pushed to its final storage location. Manual storage is an option, but in reality it is only appropriate for very small and highly disciplined teams. Like most manual processes, it quickly becomes cumbersome when the volume or the number of contributors grows. One form that isn’t stored in the proper place won’t be an issue… until that form is needed, often months or even years later. Mistakes also become more frequent as document volume increases, especially when documents are stored manually.

Furthermore, when PDFs are archived manually, protocols have to be adhered to by hand every single time a PDF is saved. For example, let’s assume we are manually archiving customer invoices. We’d want them saved in a folder tree like this:

    Year -> Customer -> Billing -> Invoices -> unique invoice number.

This process will almost always work, but one misclick means that the invoice is saved under the wrong year, or even the wrong customer. Again, manual processes become cumbersome and increasingly error-prone as volume increases, so unless the team can be counted on 100% of the time to remember to convert a form to PDF and then place that PDF in the right location, manual conversion and archiving should be avoided.
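The folder tree above can be derived from a document's metadata instead of being picked by hand. The following is a minimal Python sketch of that idea only; the function name, the field names (year, customer, invoice_number) and the `.pdf` file name are assumptions made for illustration and are not part of any Muhimbi or SharePoint API.

```python
from pathlib import PurePosixPath


def invoice_archive_path(year: int, customer: str, invoice_number: str) -> PurePosixPath:
    """Build Year/Customer/Billing/Invoices/<invoice number>.pdf from metadata.

    Hypothetical helper: the metadata fields are assumed to be already
    available, e.g. copied from the original form during conversion.
    """
    customer = customer.strip()
    invoice_number = invoice_number.strip()
    if not customer or not invoice_number:
        # Refuse to guess a location when metadata is missing.
        raise ValueError("customer and invoice_number must not be empty")
    return PurePosixPath(str(year), customer, "Billing", "Invoices",
                         f"{invoice_number}.pdf")


print(invoice_archive_path(2017, "Contoso", "INV-00042"))
# prints: 2017/Contoso/Billing/Invoices/INV-00042.pdf
```

Because the location is computed from the metadata, the same invoice always lands in the same place, and missing metadata is reported instead of being filed somewhere arbitrary.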

Luckily, workflows automate these processes, and are pretty easy to set up. Planning ahead and setting up workflows not only makes the process easier for everyone, it also eliminates headaches and mitigates the risk of human error, such as plain forgetfulness.

Creating a workflow with Muhimbi takes just a few steps and works in most common workflow environments, including SharePoint Designer, Nintex Workflow, K2, Visual Studio and Microsoft Flow. It’s easy enough that there is no real reason not to use workflows for any type of content that will be routinely created and needs to be stored.

Saving documentation in a safe, secure and reliable manner is a core business need, and regulatory compliance requirements only make the need more prominent. While the archiving of data may seem complex, it doesn’t need to be complicated as long as an automated process is developed that leverages metadata use and automated workflows.


If you are dealing with InfoPath in your organisation, you MUST think about archiving this kind of information in a file format that is accessible. For more information, read our InfoPath Archiving whitepaper or contact our friendly support desk.



The How and Why of OCR / Providing document access to the visually impaired

Posted at: 17:00 on 06 January 2017 by Muhimbi

While Optical Character Recognition (OCR) may seem like a newer technology, it’s been around for more than 50 years. In fact, OCR has become embedded in our daily life without much fanfare. For example, if you’ve ever inserted a check directly into an ATM and the ATM displayed the amount, that was OCR working for you. Of course, OCR functionality goes well beyond depositing Grandma’s birthday check.

Due to an overwhelming number of user requests, OCR has become an important part of Muhimbi’s range of server-side PDF Conversion products (SharePoint, SDK for Java / PHP / C#, SharePoint Online / Office 365). Implementing software to recognize text in images and convert it to alphanumeric characters was no trivial task, but thankfully it’s much easier to explain than it was to actually implement!

When an image is entered into a system it is reviewed for recognizable text. That text is then deciphered by the system with its best guess for each individual character. The system then creates a hidden data layer that contains the deciphered content, synced to the appropriate space on the image.
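To make the idea of a hidden data layer concrete, here is a minimal Python sketch. It is an illustration only: the `OcrWord` structure, its fields and the `find_word` helper are hypothetical and do not describe how Muhimbi's products or any particular OCR engine store their results.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OcrWord:
    """One recognised word plus its position on the page image (assumed layout)."""
    text: str          # the engine's best guess for this word
    confidence: float  # 0.0 to 1.0, how sure the engine is
    x: int             # left edge on the image, in pixels
    y: int             # top edge on the image, in pixels
    width: int
    height: int


def find_word(layer: List[OcrWord], term: str) -> List[OcrWord]:
    """Return every word in the hidden layer that matches term, ignoring case."""
    wanted = term.casefold()
    return [word for word in layer if word.text.casefold() == wanted]


# A tiny hand-written layer standing in for real OCR output.
layer = [
    OcrWord("Invoice", 0.98, 40, 30, 120, 24),
    OcrWord("Total", 0.95, 40, 400, 80, 24),
    OcrWord("invoice", 0.91, 300, 400, 110, 24),
]
hits = find_word(layer, "invoice")
print([(word.x, word.y) for word in hits])
# prints: [(40, 30), (300, 400)]
```

Keeping the coordinates with each word is what lets a viewer highlight a search hit at the right spot on the scanned image.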

There are a lot of ways this can be useful for a business, and we have included a few examples below. Perhaps more than one will sound familiar given your organization’s needs.

If an organization needs to digitize old orders and invoices, doing so manually would involve discrete steps for scanning in the paper copies, renaming them, and storing them in the correct place. However, with OCR technology it’s possible to scan the images in, set rules to look for key information, rename files, and create settings to automatically store them appropriately. SharePoint workflows become super helpful with tasks like these!
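A rule that looks for key information in OCRed text and derives a file name from it can be sketched in a few lines of Python. The invoice number pattern (`INV-` followed by digits), the function name and the output naming scheme are all assumptions chosen for illustration; a real rule would depend on how your own documents are laid out.

```python
import re
from typing import Optional

# Hypothetical rule: an invoice number looks like "INV-" followed by digits.
INVOICE_RULE = re.compile(r"\bINV-(\d+)\b", re.IGNORECASE)


def filename_from_ocr_text(ocr_text: str) -> Optional[str]:
    """Derive a file name from OCRed text, or None if the rule does not match."""
    match = INVOICE_RULE.search(ocr_text)
    if match is None:
        return None  # no invoice number found, leave the scan for manual review
    # Zero-pad the number so that file names sort in numeric order.
    return f"invoice-{int(match.group(1)):06d}.pdf"


print(filename_from_ocr_text("ACME Ltd\nInvoice no: inv-4521\nTotal due: 99.00"))
# prints: invoice-004521.pdf
```

Returning None when nothing matches keeps unrecognised scans out of the archive until someone has looked at them.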

Another example involves InfoPath, always a popular topic among users of our PDF Converter for SharePoint. It’s not uncommon for InfoPath forms to allow (or require) the attaching of relevant documents. Those documents are most often attached as images or as non-OCRed PDFs. Having OCR scan and digitize the content of those files makes them significantly more usable later.

OCR also offers advantages when it comes to searchability. The content in the hidden data layer attached to the file is searchable using a PDF reader or web browser, which allows for “search by content” functionality. Additionally, this text layer can be made crawlable and indexable, allowing search engines to return the OCRed document in their results. Naturally, this makes such documents much more convenient to work with.

[Image: Scanned document with OCRed text selected]

Perhaps most meaningfully, OCR can empower visually impaired users to access content that would be otherwise impossible; the data layer can be used by a text-to-speech system to ‘read’ content to a user. Of course, even though expanding available content to the visually impaired has obvious business value, the impact goes well beyond office work.

For a bit of history on how OCR became a dominant technology in providing content access to the visually impaired, we should start by mentioning that many governments around the world have implemented standards based on the Web Content Accessibility Guidelines (WCAG), which have helped formalize how web content should be created so that it can be accessed by any machine. Some examples of governmental implementation include US Section 508 and the UK Equality Act 2010, meaning that all US and UK government websites must adhere to the standards set in the WCAG.

The WCAG is a lasting legacy of the Web Accessibility Initiative, which spun out of the personal computer boom of the 1990s. As recently as a few years ago, only about 1% of published books became available in braille, so the WCAG and Web Accessibility Initiative have played an important role in setting up useful guidelines to make sure that online content was held to a higher standard.

The wide adoption of these standards means more electronic content has become available to the visually impaired, both through electronic braille readers (which can cost upwards of $3,000) and through the less expensive combination of OCR technology and a screen reader. A screen reader, either as a desktop application or a browser extension, provides text-to-speech capabilities for both rich-text content and OCRed content.

Furthermore, while personal, organizational, or corporate sites aren’t required to comply with these standards, most do because they’ve become widely accepted best practices. This increases the prevalence and frequency with which OCR technology is used.

There are plenty of solid business reasons for including OCR capabilities in Muhimbi’s range of server-side PDF Conversion products. However, we can’t help but think that bringing new content and more options to those with a visual impairment is perhaps the most notable.

An overview of the various OCR facilities provided by our product range can be found in this Knowledge Base Article.

What do you think? Is this something that could be useful for your organisation? Leave a comment below or contact our friendly support desk for more information. We love to help.

