Working with JSON data in Python

March 30, 2021 by Aveek Das

In this article, I am going to write about the various ways we can work with JSON data in Python. JSON stands for JavaScript Object Notation and has become one of the most important formats for storing and transferring data between systems. This is due to its easy-to-understand structure and because it is very lightweight. You can easily write both simple and nested data structures in JSON, and programs can read them just as easily. In my opinion, JSON is much more human-readable than XML, although both are used to store and transfer data. Modern web applications use JSON by default to transfer information.

Understanding the JSON data structure

First, let’s begin by understanding how JSON looks and how to deal with it.

Figure 1 – A sample JSON structure

In the figure above you can see a sample data structure represented in JSON. The sample is a representation of this article. The top-level node of the sample is data, whose value is a list (an array, in JSON terms) written with square brackets []. Inside the brackets, you can have multiple JSON objects or strings as required. To keep things simple, I have used only one item in the list. That item contains the type, id, attributes, and author of the submitted article. The attributes and author are nested objects: attributes expands to title, description, created and updated, while author expands to id and name.

A quick glance at the overall structure is enough to see the relationship between the article and its author, which makes the format easy for both humans and machines to understand.
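Since the figure is not reproduced here, the structure described above can be sketched roughly as follows. Only the key names come from the description; every value is a placeholder I have assumed for illustration. The json.loads call used to navigate the document is explained in the next section.

```python
import json

# Placeholder values: only the key names follow the description in the text.
SAMPLE_JSON = """
{
  "data": [
    {
      "type": "articles",
      "id": "1",
      "attributes": {
        "title": "Working with JSON data in Python",
        "description": "Placeholder description",
        "created": "2021-03-30",
        "updated": "2021-03-30"
      },
      "author": {
        "id": "42",
        "name": "Aveek Das"
      }
    }
  ]
}
"""

document = json.loads(SAMPLE_JSON)
article = document["data"][0]           # first (and only) item in the list
print(article["attributes"]["title"])   # nested object: attributes
print(article["author"]["name"])        # nested object: author
```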

Concept of serialization and deserialization of the JSON

So far, we have seen what JSON looks like and how to interpret a JSON data structure. Now, we should understand how to use this data in Python and operate on it as required. While dealing with JSON, we often come across two terms: serialization and deserialization. JSON itself is just text, a string that holds data as key-value pairs. For a program to work with this string, it needs to be converted into an object that the interpreter can consume. The process of converting a JSON string into a Python object is called deserialization, and the process of converting a Python object back to a JSON string is called serialization.
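As a rough illustration of the two terms, the sketch below deserializes a small JSON string and serializes it again. The field names and values are made up for the example. Note how the JSON literals true and null become True and None on the Python side.

```python
import json

# Hypothetical JSON text; the field names are placeholders.
json_text = '{"title": "JSON in Python", "views": 100, "published": true, "editor": null}'

# Deserialization: JSON string -> Python object (here, a dict)
record = json.loads(json_text)
print(type(record).__name__)
print(record["published"], record["editor"])

# Serialization: Python object -> JSON string
round_trip = json.dumps(record)
print(type(round_trip).__name__)
```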

Let’s now try this out in Python.

https://gist.github.com/aveek22/4dffd4379d33104381ffca5fe10b6cba

Figure 2 – Console output from the above snippet

If you look at the code above, you will notice that I have imported the json module into the script. This module ships with Python's standard library and is the default way to work with JSON data. You can read more about this library in the official documentation. It provides four basic functions:

  • json.dump – Serializes a Python object in memory and writes the resulting JSON directly to a file or any other file-like object
  • json.dumps – Serializes a Python object in memory to a string in the JSON format. The difference between the two is that the former writes to a stream, while the latter returns a string
  • json.load – Reads JSON from an open file (or any file-like object), parses it and deserializes the data into a Python object
  • json.loads – Similar to json.load; the only difference is that it reads from a string that contains data in the JSON format
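The four functions listed above can be sketched side by side as follows. This is a minimal example rather than the script from the gist: the article dictionary is made up, and io.StringIO stands in for a real file opened with open() so that the snippet does not touch the file system.

```python
import io
import json

# Hypothetical data used only for this illustration.
article = {"id": 1, "title": "Working with JSON data in Python", "tags": ["JSON", "Python"]}

# json.dumps: Python object -> JSON string
as_text = json.dumps(article, indent=2)

# json.loads: JSON string -> Python object
from_text = json.loads(as_text)

# json.dump: Python object -> file-like object
# (io.StringIO stands in for a file opened with open(path, "w"))
buffer = io.StringIO()
json.dump(article, buffer)

# json.load: file-like object -> Python object
buffer.seek(0)  # rewind before reading what was just written
from_file = json.load(buffer)

print(from_text == article, from_file == article)
```

The "s" suffix is an easy way to remember the pairing: dumps and loads work with strings, while dump and load work with file objects.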

From my experience, you will use json.loads and json.dumps far more often than their file-based counterparts. An important point worth mentioning is that, out of the box, the json library works only with built-in Python data types such as strings, integers, floats, booleans, lists and dictionaries. If you want to work with a custom data type, you first need to convert it to one of these types, typically a Python dictionary, and then serialize that to JSON.
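A minimal sketch of that conversion step is shown below. The Author class is hypothetical and exists only for this example; a dataclass is used because dataclasses.asdict gives a ready-made conversion to a dictionary, but any hand-written to_dict method would serve the same purpose.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Author:
    """Hypothetical custom type, used only for illustration."""
    id: int
    name: str


author = Author(id=1, name="Aveek Das")

# Serializing the custom object directly fails with a TypeError.
try:
    json.dumps(author)
except TypeError as exc:
    print(f"Cannot serialize directly: {exc}")

# Convert to a dictionary first, then serialize.
payload = json.dumps(asdict(author))
print(payload)

# Going the other way: deserialize to a dict, then rebuild the custom object.
restored = Author(**json.loads(payload))
print(restored == author)
```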

Using Pandas to read JSON data

So far, we have learned how to use Python's json library to work with JSON data. Now let us look at the Pandas library and how to read and write JSON with it. As you might be aware, Pandas is used extensively in data science to analyze existing data and discover insights from it.

https://gist.github.com/aveek22/c7fc11b226504420c6ec980534a94ba5

If you run the code above, you will get the data loaded into a Pandas dataframe.

Figure 3 – JSON Data loaded as Pandas Dataframe

As you can see in the figure above, the read_json() method in Pandas reads JSON from a string or a file and converts it into a Pandas dataframe. This method also accepts several other parameters, the most important of which are described below.

  • path_or_buf – The first parameter accepted by this method is the source of the JSON: a file path, a URL, or a file-like object. Older Pandas versions also accept the JSON string itself, passed either through a variable or directly as an argument, but recent releases deprecate this and expect the string to be wrapped in io.StringIO
  • orient – This parameter defines the layout of the incoming JSON. Accepted values include split, records, index, columns, values and table
  • typ – This defines the type of object the method returns. By default it returns a dataframe (frame), but it can be set to series to return a series instead
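The parameters above can be tried out with a short sketch like the one below. The JSON strings are made up for the example, and each is wrapped in io.StringIO so that the call works on both older and newer Pandas versions.

```python
import io

import pandas as pd

# Hypothetical JSON in the "records" layout: a list of row objects.
records_json = '[{"id": 1, "title": "Intro", "views": 10}, {"id": 2, "title": "Pandas", "views": 25}]'

# Wrap the literal string in a file-like object before passing it in.
df = pd.read_json(io.StringIO(records_json), orient="records")
print(df)

# typ="series" returns a Series instead of a DataFrame.
series_json = '{"a": 1, "b": 2, "c": 3}'
s = pd.read_json(io.StringIO(series_json), typ="series")
print(s)
```

Passing a file path instead of the StringIO object reads the same data from disk.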

So far, we have seen how to read JSON-formatted data using Pandas. Now, let us see how to export data from a Pandas dataframe back to JSON. In other words, we are going to serialize a Pandas dataframe to a JSON string.

https://gist.github.com/aveek22/cd96bcef996d45db7c03059918b7bc69

Figure 4 – Converting Pandas DataFrame to JSON

As you can see in the figure above, when we execute the snippet, the Pandas dataframe is converted into a JSON string, which is then printed to the console. This is done with the to_json() method in Pandas, which helps us convert existing data to a JSON string. The important parameters accepted by this method are as follows.

  • path_or_buf – This parameter works differently from the one we saw in the previous section. It is an optional file path or file-like object to which the serialized JSON is written. If it is omitted, the method returns the JSON as a string
  • orient – This defines the layout in which the data is exported. Accepted values for a dataframe are split, records, index, columns, values and table. By default, a dataframe is exported with the columns orientation
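To get a feel for how orient changes the output, here is a small sketch. The dataframe is made up for the example, and only three of the available orientations are shown.

```python
import json

import pandas as pd

# Hypothetical dataframe used only for this illustration.
df = pd.DataFrame({"id": [1, 2], "title": ["Intro", "Pandas"]})

# With no path given, to_json returns the JSON as a string.
columns_json = df.to_json()                  # default for a dataframe: orient="columns"
records_json = df.to_json(orient="records")  # one object per row
split_json = df.to_json(orient="split")      # separate columns, index and data

print(columns_json)
print(records_json)
print(split_json)

# The output is plain JSON, so the json module can read it back.
rows = json.loads(records_json)
print(rows[0]["title"])
```

The records layout is usually the most convenient one when the JSON is meant for an API or another program, since each row becomes a self-contained object.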

You can follow the official documentation from Pandas to learn more about handling JSON data with Pandas.

Conclusion

In this article, we have seen what JSON is and how to work with JSON data in Python using the built-in json module and Pandas. JSON is a flexible data format that is used in almost every modern application. It is easy for both humans and machines to read, which has made it very popular with developers. JSON data is usually semi-structured: it can follow a strict, regular layout or vary from one record to the next. It is also the usual format for responses from REST APIs, and it represents objects as key-value pairs, much like the Python dictionary.

Table of contents

Setting up Visual Studio Code for Python Development
How to debug Python scripts in Visual Studio Code
Deploy Python apps to Azure Functions using Visual Studio Code
Getting started with Amazon S3 and Python
Getting started with Pandas in Python
Working with Pandas Dataframes in Python
Exploring databases in Python using Pandas
Best practices to follow while programming in Python
Exporting data with Pandas in Python
Create REST APIs in Python using Flask
Working with JSON data in Python
Understanding *args and *kwargs arguments in Python

About Aveek Das

Aveek is an experienced Data and Analytics Engineer, currently working in Dublin, Ireland. His main areas of technical interest include SQL Server, SSIS/ETL, SSAS, Python, Big Data tools like Apache Spark, Kafka, and cloud technologies such as AWS/Amazon and Azure. He is a prolific author, with over 100 articles published on various technical blogs, including his own blog, and a frequent contributor to different technical forums. In his leisure time, he enjoys amateur photography, mostly street imagery and still life. Some glimpses of his work can be found on Instagram. You can also find him on LinkedIn.