Python XML to JSON, XML to Dict


Today we will learn how to convert XML to JSON and XML to Dict in Python. The xmltodict module reads XML data and converts it to a Python dictionary, which can then be serialized to JSON. It can also stream over large XML files, handing each element at a chosen depth to a callback as a dictionary. Before stepping into the coding part, let’s first understand why XML conversion is necessary.
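The streaming mode mentioned above works through the item_depth and item_callback arguments of xmltodict.parse(...). The following is a minimal sketch that uses a made-up <catalog> document held in memory. For a real file, you would pass a file opened in binary mode instead of io.BytesIO. The callback name handle_book is hypothetical.

```python
import io

import xmltodict

# Made-up sample data; a real program would use open("big.xml", "rb")
xml_data = b"""<catalog>
  <book id="1"><title>First</title></book>
  <book id="2"><title>Second</title></book>
</catalog>"""

titles = []

def handle_book(path, book):
    # path lists the (tag, attributes) pairs leading to this element;
    # book is the <book> element converted to a dictionary
    titles.append(book["title"])
    return True  # a truthy return value tells xmltodict to keep parsing

# item_depth=2 means every element directly under <catalog> is passed to the callback
xmltodict.parse(io.BytesIO(xml_data), item_depth=2, item_callback=handle_book)
print(titles)  # ['First', 'Second']
```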

Converting XML to Dict/JSON

XML has slowly been losing ground, but there are still large systems on the web that use this format. XML is more verbose than JSON, so many developers prefer JSON in their applications.

When an application needs to consume XML provided by another source, converting it to JSON by hand can be a tedious task. The xmltodict module in Python makes this task easy and straightforward.

Getting started with xmltodict

Before we can use the xmltodict module, we need to install it. We will mainly use pip for the installation.

Install xmltodict module

Here is how we can install the xmltodict module with pip, which downloads it from the Python Package Index (PyPI):


pip install xmltodict

The installation finishes quickly, as xmltodict is a very lightweight module.

A nice thing about this module is that it does not depend on any third-party packages, so it stays lightweight and avoids version conflicts.

On Debian-based systems, the module can also be installed with the apt tool:


sudo apt install python3-xmltodict

Being available as an official Debian package is another plus point for this module.

Python XML to JSON

The best place to start with this module is the operation it is most often used for: converting XML to JSON. Let’s look at a code snippet that shows how this can be done:


import json
import xmltodict

my_xml = """
    <audience>
      <id what="attribute">123</id>
      <name>Shubham</name>
    </audience>
"""

# parse() turns the XML into a dictionary; json.dumps() serializes it,
# and indent=4 pretty-prints the resulting JSON string
print(json.dumps(xmltodict.parse(my_xml), indent=4))


Here, the parse(...) function converts the XML data into a Python dictionary, and json.dumps(...) then serializes that dictionary into a JSON string.
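Note that xmltodict returns all element text as strings, so a value like 123 ends up as the JSON string "123". If numbers should stay numeric in the JSON, parse(...) accepts a postprocessor callback that can rewrite each key/value pair. This is a minimal sketch, assuming that every value that looks like an integer should become one; the to_number function and the <scores> document are made up for illustration.

```python
import json

import xmltodict

def to_number(path, key, value):
    # xmltodict calls this for each parsed value and keeps the returned (key, value) pair
    try:
        return key, int(value)
    except (ValueError, TypeError):
        # Non-numeric text, nested dictionaries and None are kept unchanged
        return key, value

xml_data = "<scores><math>92</math><english>99</english></scores>"
doc = xmltodict.parse(xml_data, postprocessor=to_number)
print(json.dumps(doc))  # {"scores": {"math": 92, "english": 99}}
```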

Converting XML File to JSON

Keeping XML data inside the code itself is neither always possible nor realistic. Usually, we keep our data in a database or in files, and we can convert XML files to JSON directly as well. Let’s look at a code snippet that performs the conversion on an XML file:


import json
import xmltodict

# Read the XML file and parse its contents into a dictionary
with open('person.xml') as fd:
    doc = xmltodict.parse(fd.read())

# Serialize the dictionary to an indented JSON string
print(json.dumps(doc, indent=4))


Apart from the open(...) call, the code is the same as before: open(...) gives us a file object, we read its contents and pass them to parse(...) to get a dictionary, and json.dumps(...) turns that dictionary into JSON.
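parse(...) also accepts a file-like object directly, so the read() call is optional. The sketch below goes one step further and writes the result to a .json file with json.dump(...). The helper name xml_file_to_json_file and the sample person.xml content are assumptions made so that the example runs on its own inside a temporary directory.

```python
import json
import os
import tempfile

import xmltodict

def xml_file_to_json_file(xml_path, json_path):
    # Open in binary mode so the XML parser handles the document's encoding itself
    with open(xml_path, "rb") as xml_file:
        doc = xmltodict.parse(xml_file)
    # Write the dictionary out as indented JSON
    with open(json_path, "w", encoding="utf-8") as json_file:
        json.dump(doc, json_file, indent=4)
    return doc

with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "person.xml")
    json_path = os.path.join(tmp, "person.json")
    # Hypothetical sample content for person.xml
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write("<person><name>Shubham</name></person>")

    xml_file_to_json_file(xml_path, json_path)
    with open(json_path, encoding="utf-8") as f:
        result = json.load(f)

print(result)  # {'person': {'name': 'Shubham'}}
```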

Python XML to Dict

As the module name suggests, xmltodict converts the XML data we provide into a Python dictionary. This means we can access the data with ordinary dictionary keys. Here is a sample program:


import xmltodict

my_xml = """
    <audience>
      <id what="attribute">123</id>
      <name>Shubham</name>
    </audience>
"""
my_dict = xmltodict.parse(my_xml)
# <id> has an attribute, so it becomes a nested dictionary
print(my_dict['audience']['id'])
# Attribute values are stored under keys prefixed with '@'
print(my_dict['audience']['id']['@what'])


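So, tag names work as keys, and so do attribute names: attribute keys are simply prefixed with the @ symbol. When an element has attributes, its text content is stored under the #text key, so my_dict['audience']['id']['#text'] holds '123'.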
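Two more conventions are worth knowing when accessing the dictionary: repeated sibling tags are collected into a list, and the force_list argument makes a tag a list even when it appears only once. The @ and #text prefixes can also be changed through the attr_prefix and cdata_key arguments. A minimal sketch, using a made-up <cart> document:

```python
import xmltodict

one_item = '<cart><item sku="a1">pen</item></cart>'
two_items = '<cart><item sku="a1">pen</item><item sku="b2">ink</item></cart>'

# An element with attributes becomes a dict: '@'-prefixed attributes plus '#text'
doc = xmltodict.parse(one_item)
print(doc["cart"]["item"]["@sku"], doc["cart"]["item"]["#text"])  # a1 pen

# Repeated sibling tags are collected into a list
doc_many = xmltodict.parse(two_items)
print([item["#text"] for item in doc_many["cart"]["item"]])  # ['pen', 'ink']

# force_list keeps 'item' a list even when there is only one of it
doc_forced = xmltodict.parse(one_item, force_list=("item",))
print(len(doc_forced["cart"]["item"]))  # 1

# The attribute prefix and the text key are configurable
doc_custom = xmltodict.parse(one_item, attr_prefix="", cdata_key="text")
print(doc_custom["cart"]["item"]["sku"], doc_custom["cart"]["item"]["text"])  # a1 pen
```

Using force_list for tags that may repeat keeps the calling code simpler, because it never has to check whether it got a single dict or a list.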

Supporting Namespaces in XML

XML data often declares namespaces that qualify its element and attribute names. When converting to JSON, it may be necessary to preserve these namespaces in the output as well. Let us consider this sample XML file:


<root xmlns="http://defaultns.com/"
        xmlns:a="http://a.com/">
    <audience>
        <id what="attribute">123</id>
        <name>Shubham</name>
    </audience>
</root>

Assuming this XML is saved as person.xml, here is a sample program that keeps the XML namespaces in the parsed data and in the JSON output:


import json
import xmltodict

# process_namespaces=True expands namespace prefixes into full URIs in the keys
with open('person.xml') as fd:
    doc = xmltodict.parse(fd.read(), process_namespaces=True)

print(json.dumps(doc, indent=4))

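With process_namespaces=True, each namespaced key contains the full namespace URI joined to the tag name with a colon, for example 'http://defaultns.com/:root'. If those long keys get in the way, parse(...) also takes a namespaces mapping that shortens URIs or, when mapped to None, drops them. A minimal sketch using the sample XML above as an inline string:

```python
import xmltodict

xml_data = """<root xmlns="http://defaultns.com/" xmlns:a="http://a.com/">
    <audience>
        <id what="attribute">123</id>
        <name>Shubham</name>
    </audience>
</root>"""

# Without a mapping, the full URI is part of every namespaced key
full = xmltodict.parse(xml_data, process_namespaces=True)
ns = "http://defaultns.com/:"
print(full[ns + "root"][ns + "audience"][ns + "name"])  # Shubham

# Drop the default namespace and shorten http://a.com/ to "a"
namespaces = {"http://defaultns.com/": None, "http://a.com/": "a"}
short = xmltodict.parse(xml_data, process_namespaces=True, namespaces=namespaces)
print(short["root"]["audience"]["name"])  # Shubham
```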

JSON to XML conversion

Although converting XML to JSON is the main use of this module, xmltodict also supports the reverse operation: its unparse(...) function turns a dictionary, such as one loaded from JSON, back into XML. We will provide the data in the program itself. Here is a sample program:


import xmltodict

student = {
  "data" : {
    "name" : "Shubham",
    "marks" : {
      "math" : 92,
      "english" : 99
    },
    "id" : "s387hs3"
  }
}

print(xmltodict.unparse(student, pretty=True))

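unparse(...) follows the same conventions as parse(...) in reverse: keys starting with @ become attributes, and a list becomes repeated elements. The sketch below uses made-up data and round-trips it through XML. Note that the numbers come back as strings, because XML text carries no type information.

```python
import xmltodict

student = {
    "student": {
        "@id": "s387hs3",                  # '@' prefix -> XML attribute
        "name": "Shubham",
        "marks": {"math": 92, "english": 99},
        "subject": ["math", "english"],    # a list -> repeated <subject> elements
    }
}

xml_text = xmltodict.unparse(student, pretty=True)
print(xml_text)

# Parse the generated XML back into a dictionary
back = xmltodict.parse(xml_text)
print(back["student"]["@id"])            # s387hs3
print(back["student"]["marks"]["math"])  # 92 (now the string '92')
print(back["student"]["subject"])        # ['math', 'english']
```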

Please note that the data must have exactly one key at the top level for this to work. Suppose we modify our program so that it contains multiple keys at the very first level of the data:


import xmltodict

student = {
    "name" : "Shubham",
    "marks" : {
        "math" : 92,
        "english" : 99
    },
    "id" : "s387hs3"
}

print(xmltodict.unparse(student, pretty=True))

In this case, we have three keys at the root level. If we try to unparse data in this form, xmltodict raises a ValueError instead of producing XML.

This happens because xmltodict uses the top-level key as the root XML tag, and an XML document can have only one root element. This means that there should be exactly one key at the root level of the data.
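If your data has several top-level keys, one option is to wrap it under a single root key before calling unparse(...). The to_xml helper below is a hypothetical sketch of that idea, and the root tag name is an assumption you would choose to suit your own data.

```python
import xmltodict

def to_xml(data, root_tag="root", **kwargs):
    """Hypothetical helper: wrap data under one root key when it has several."""
    if len(data) != 1:
        data = {root_tag: data}
    return xmltodict.unparse(data, **kwargs)

student = {
    "name": "Shubham",
    "marks": {"math": 92, "english": 99},
    "id": "s387hs3",
}

# Calling unparse directly fails because there are three top-level keys
try:
    xmltodict.unparse(student)
    failed = False
except ValueError as err:
    failed = True
    print("unparse failed:", err)

# Wrapping the data under <student> gives a document with a single root
print(to_xml(student, root_tag="student", pretty=True))
```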

Conclusion

In this lesson, we studied an excellent Python module that can parse XML, convert it to JSON, and convert JSON back to XML. We also learned how to convert XML to a dictionary using the xmltodict module.

Reference: API Doc
