Extract PDF Forms Data (FDF, XFDF, XML) using SharePoint or C#, Java, PHP

Posted at: 15:08 on 08 October 2021 by Muhimbi

Muhimbi's range of PDF Conversion products is primarily used for, well, as the name says, conversion of various file formats to PDF. This all works great, but once you start generating PDF files, you start to get a lot of questions from customers about other PDF facilities. As a result, over the years we have added PDF merging, splitting and securing.

Happy customers all around, but there is always room for improvement and additional features. As we are very active in the Electronic Forms market (InfoPath to PDF, Microsoft Forms, Nintex Forms, etc.), we decided to add support for the most popular form-filling technology: PDF Forms.

As a result, it is now possible to take a typical PDF form, be it a government-issued W9 or a form specific to your industry or organisation, and use our existing conversion facilities to extract forms data in the FDF, XFDF and XML standards.


Just use our existing conversion facilities, be it our API, Nintex Workflow, or SharePoint Designer, and pass in a PDF file containing form data. Select FDF, XFDF or XML as the output format and the form data is extracted in the desired format. Once you have the extracted data, use it in whatever way is required: save it in a SharePoint column, insert it into a database, make a decision in your code based on the value of a field, etc.
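To give an idea of what "use it in whatever way is required" can look like in code, here is a minimal Python sketch that reads extracted XFDF, stores the values in a database and makes a decision based on one field. It only deals with the data after extraction and does not call the converter itself. The sample XFDF, the field names (`Name`, `Approved`), the table name `form_data` and the form ID are all made up for illustration; your own forms will have different fields.

```python
import sqlite3
import xml.etree.ElementTree as ET

# XFDF elements live in Adobe's XFDF namespace.
XFDF_NS = {"x": "http://ns.adobe.com/xfdf/"}

# Hypothetical XFDF for a small form; real field names come from your PDF.
SAMPLE_XFDF = """<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields>
    <field name="Name"><value>Jane Doe</value></field>
    <field name="Approved"><value>Yes</value></field>
  </fields>
</xfdf>"""


def xfdf_to_dict(xfdf_text):
    """Return {field name: value} for the top-level fields in an XFDF document."""
    root = ET.fromstring(xfdf_text)
    values = {}
    for field in root.findall("x:fields/x:field", XFDF_NS):
        value = field.find("x:value", XFDF_NS)
        # A field without a <value> child is stored as None.
        values[field.get("name")] = value.text if value is not None else None
    return values


def store_fields(conn, form_id, fields):
    """Insert one row per form field into a simple (assumed) table layout."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS form_data (form_id TEXT, field TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO form_data VALUES (?, ?, ?)",
        [(form_id, name, value) for name, value in fields.items()],
    )
    conn.commit()


fields = xfdf_to_dict(SAMPLE_XFDF)

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
store_fields(conn, "form-0001", fields)

# Make a decision based on the value of a field.
if fields.get("Approved") == "Yes":
    print("Form approved for", fields["Name"])
```

This sketch only reads top-level fields. Forms with hierarchical field names nest `field` elements inside each other, which would need a recursive walk.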

Please note that at the time of writing (version 10.1.2) we support the AcroForms standard, which has broad support across the PDF market. Support for the XML Forms Architecture (XFA) standard will not be available until the 10.2 release.


So let's have a look at what this looks like in the real world, in this case via Nintex Workflow.

Assuming that the Muhimbi PDF Converter is deployed in your environment, and the Nintex Workflow Integration has been activated, it is a matter of adding Muhimbi's Convert Document action to the workflow. Fill in the blanks and make sure you set the Output Format to something that is relatively easy to 'parse', in this example XFDF. In this particular case we capture the Output Item ID, which is the ID of the generated XFDF file. We need this so we can read the content of the generated file in the next step.



The Convert Document action generates XFDF, an XML-based format specific to the form fields defined in the source PDF file. An example can be found below:



In the next step we want to extract data from the generated XFDF file, so we can use it in the rest of our workflow. Insert Nintex's standard Query XML action and specify the ItemURL, which is based on the Item ID captured in the previous step. Then use the XPath builder to generate a path to the field to extract. (Please ignore the XPath in the screenshot below; it will be different for your particular document.)
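If you are working in code rather than in a workflow, the same "query one field by XPath" idea can be sketched in Python with the standard library. This is an illustration only, not how the Query XML action works internally. The sample document and the field names (`BusinessName`, `TaxClassification`) are hypothetical, and the helper `query_field` is a name chosen for this example.

```python
import xml.etree.ElementTree as ET

# XFDF elements are in Adobe's XFDF namespace, so the XPath needs a prefix
# mapped to it; an unprefixed path such as ".//field" would match nothing.
XFDF_NS = {"x": "http://ns.adobe.com/xfdf/"}

# Hypothetical XFDF; the field names in your document will differ.
SAMPLE_XFDF = """<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields>
    <field name="BusinessName"><value>Contoso Ltd</value></field>
    <field name="TaxClassification"><value>LLC</value></field>
  </fields>
</xfdf>"""


def query_field(xfdf_text, field_name):
    """Return the value of the named field, or None if it is not present.

    Assumes field_name contains no single quotes, since it is placed
    directly inside the XPath predicate.
    """
    root = ET.fromstring(xfdf_text)
    node = root.find(f".//x:field[@name='{field_name}']/x:value", XFDF_NS)
    return None if node is None else node.text


print(query_field(SAMPLE_XFDF, "BusinessName"))
```

The XPath used here selects the `value` element of the `field` whose `name` attribute matches. Whatever tool you use to evaluate the XPath, keep the namespace in mind, as it is a common reason for a query returning no results.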



This is a very high-level example, which assumes you are familiar with how to create workflows and parse XML.

If you have any questions or comments, leave a note below or reach out to our support desk. We love to help!


