Working With JSON in Python: Parse, Read & Write Data





Sarah Whitmore
Coding Tutorials
Understanding JSON: The Lingua Franca of Data Exchange
In the world of software development, JSON has become a cornerstone for exchanging data in a textual format. You'll bump into it everywhere – from frontend frameworks interacting with backends, to APIs delivering information, and even configuration files.
This guide will walk you through the essentials of handling JSON data using Python, covering how to parse it, read it from files, and write it back out.
So, What Exactly is JSON?
JSON, which stands for JavaScript Object Notation, is a lightweight format designed for easy data interchange. Despite its name, it's language-independent, and its syntax will look familiar to anyone who has used C-family languages or Python's dict and list literals.
Data in JSON is built around key-value pairs. The keys are always strings (enclosed in double quotes), while the values can be strings, numbers, booleans (`true`/`false`), arrays (ordered lists, enclosed in square brackets `[]`), other JSON objects (unordered collections of key-value pairs, enclosed in curly braces `{}`), or the special value `null`. For more detail on the supported data types, see the JSON specification (RFC 8259).
These key-value pairs can be nested. An object might contain other objects or arrays as values. For instance, you could represent a list of users like this:
[
    {
        "username": "coder_gal",
        "join_date": "2023-01-15",
        "active": true
    },
    {
        "username": "py_ninja",
        "join_date": "2022-11-01",
        "active": false
    }
]
This structure shows an array (`[]`) containing two objects (`{}`). Each object represents a user with properties like "username", "join_date", and "active" status.
Its simplicity and human-readability, coupled with its direct mapping to data structures found in many programming languages like Python and JavaScript, make JSON extremely popular for web APIs, configuration files, and simple data storage tasks.
Handling JSON Strings in Python
Often, JSON arrives in your Python script as a plain text string, most commonly in the body of a web API response. That holds whether you call a service directly or go through a proxy to manage connections or reach geo-specific content.
Let's imagine fetching some data from a public API:
import requests
import json
# Fetch data from a sample API endpoint
response = requests.get("https://jsonplaceholder.typicode.com/posts/1")
post_string = response.text
print(post_string)
# Output might look like:
# {
# "userId": 1,
# "id": 1,
# "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
# "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
# }
print(type(post_string))
# Output: <class 'str'>
A raw string like `post_string` is awkward to work with: you can't look up a field by name. To access fields like "title" or "userId" easily, we need to convert this JSON-formatted string into a native Python data structure.
Python's built-in `json` library is perfect for this. First, make sure to import it:
import json
The library provides the `json.loads()` function (short for "load string"). It takes a JSON string as input, parses it, and converts it into the corresponding Python objects: JSON objects become dictionaries, JSON arrays become lists, strings become strings, numbers become `int` or `float` (numbers with a fractional part or exponent become floats), `true`/`false` become `True`/`False`, and `null` becomes `None`.
import requests
import json
response = requests.get("https://jsonplaceholder.typicode.com/posts/1")
post_string = response.text
# Parse the JSON string into a Python dictionary
post_data = json.loads(post_string)
print(type(post_data))
# Output: <class 'dict'>
print(post_data["id"])
# Output: 1
print(post_data["title"])
# Output: sunt aut facere repellat provident occaecati excepturi optio reprehenderit
print(post_data["userId"])
# Output: 1
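To see the type mapping from `json.loads()` without calling an API, here is a small offline sketch that parses the users array from earlier. The `nickname` field is a hypothetical addition so that a `null` value appears; note that the dates stay plain strings, since JSON has no date type.

```python
import json

# The users array from the start of this guide; "nickname" is a
# hypothetical extra field added to show how null is decoded
users_json = """
[
    {"username": "coder_gal", "join_date": "2023-01-15", "active": true},
    {"username": "py_ninja", "join_date": "2022-11-01", "active": false, "nickname": null}
]
"""

users = json.loads(users_json)

print(type(users))                  # <class 'list'>  (JSON array)
print(type(users[0]))               # <class 'dict'>  (JSON object)
print(users[0]["active"])           # True            (JSON true)
print(users[1]["nickname"])         # None            (JSON null)
print(type(users[0]["join_date"]))  # <class 'str'>   (no date type in JSON)
```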
Now that `post_data` is a Python dictionary, manipulating it is straightforward. Let's say we want to create a function to add a prefix to the post title:
def prefix_title(post_dict, prefix="[UPDATED] "):
    post_dict['title'] = prefix + post_dict['title']
    return post_dict
We can apply this function:
updated_post_data = prefix_title(post_data)
print(updated_post_data['title'])
# Output: [UPDATED] sunt aut facere repellat provident occaecati excepturi optio reprehenderit
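One detail worth knowing: `prefix_title()` modifies the dictionary it receives, so `post_data` and `updated_post_data` refer to the same object afterwards. The sketch below checks that with a small made-up post, and contrasts it with a hypothetical non-mutating helper, `prefixed_copy()`, that returns a shallow copy instead (nested dicts inside it would still be shared).

```python
import json

def prefix_title(post_dict, prefix="[UPDATED] "):
    # Same as the guide's version: modifies the dict it receives, then returns it
    post_dict["title"] = prefix + post_dict["title"]
    return post_dict

def prefixed_copy(post_dict, prefix="[UPDATED] "):
    # Hypothetical alternative: builds a new shallow copy, input left untouched
    return {**post_dict, "title": prefix + post_dict["title"]}

post_data = json.loads('{"id": 1, "title": "hello"}')
updated_post_data = prefix_title(post_data)
print(updated_post_data is post_data)  # True: both names point to one dict
print(post_data["title"])              # [UPDATED] hello

fresh_data = json.loads('{"id": 1, "title": "hello"}')
new_post = prefixed_copy(fresh_data)
print(fresh_data["title"])             # hello
print(new_post["title"])               # [UPDATED] hello
```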
After modifying the data, you might need to convert it back into a JSON string, perhaps to send it to another API or save it. Use the `json.dumps()` function (short for "dump string") for this:
updated_post_string = json.dumps(updated_post_data)
print(updated_post_string)
By default, this prints the entire object on a single line, with no indentation or line breaks:
{"userId": 1, "id": 1, "title": "[UPDATED] sunt aut facere repellat provident occaecati excepturi optio reprehenderit", "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"}
Whitespace between JSON tokens is insignificant, so this output is perfectly valid. However, for readability during debugging or logging, you can use the `indent` argument of `json.dumps()`:
pretty_updated_post_string = json.dumps(updated_post_data, indent=4)
print(pretty_updated_post_string)
This produces a much more human-friendly output:
{
    "userId": 1,
    "id": 1,
    "title": "[UPDATED] sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
    "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}
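`json.dumps()` has other formatting parameters besides `indent`. This sketch, using a small made-up dictionary, shows `separators=(",", ":")` for truly compact output (the default separators still put a space after commas and colons) and `sort_keys=True` for a deterministic key order, which helps when diffing or caching output.

```python
import json

data = {"title": "[UPDATED] hello", "id": 1, "tags": ["a", "b"]}

default_out = json.dumps(data)                          # default separators ", " and ": "
compact_out = json.dumps(data, separators=(",", ":"))   # no spaces at all
sorted_out = json.dumps(data, sort_keys=True)           # keys in alphabetical order

print(default_out)  # {"title": "[UPDATED] hello", "id": 1, "tags": ["a", "b"]}
print(compact_out)  # {"title":"[UPDATED] hello","id":1,"tags":["a","b"]}
print(sorted_out)   # {"id": 1, "tags": ["a", "b"], "title": "[UPDATED] hello"}
```

All three strings decode back to the same dictionary; only their layout differs.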
Here's the complete sequence for handling a JSON string:
import requests
import json
# Function to modify the data
def prefix_title(post_dict, prefix="[UPDATED] "):
    post_dict['title'] = prefix + post_dict['title']
    return post_dict
# Fetch data
response = requests.get("https://jsonplaceholder.typicode.com/posts/1")
post_string = response.text
# Parse JSON string to Python dict
post_data = json.loads(post_string)
# Modify the Python dict
updated_post_data = prefix_title(post_data)
# Convert Python dict back to JSON string (pretty-printed)
updated_post_string_pretty = json.dumps(updated_post_data, indent=4)
# Print the result
print(updated_post_string_pretty)
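A round trip through `json.dumps()` and `json.loads()` doesn't always give back exactly what you started with, because JSON object keys are always strings and JSON has only one array type. A minimal illustration with a made-up dictionary:

```python
import json

# An int key and a tuple value: neither survives a JSON round trip unchanged
data = {1: "one", "pair": (3, 4), "ok": True}

text = json.dumps(data)
print(text)              # {"1": "one", "pair": [3, 4], "ok": true}

restored = json.loads(text)
print(restored)          # {'1': 'one', 'pair': [3, 4], 'ok': True}
print(restored == data)  # False: the key is now "1" and the tuple is now a list
```

If exact round trips matter, convert such values yourself before encoding and after decoding.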
Reading JSON Data From a File
Sometimes, JSON data resides in a file (typically with a `.json` extension) rather than being received as a string directly in your script.
First, let's create an example file named `tasks.json` and populate it with a JSON array of task objects:
[
    {
        "taskId": 101,
        "description": "Setup Python environment",
        "status": "done",
        "priority": 1
    },
    {
        "taskId": 102,
        "description": "Read JSON file",
        "status": "in progress",
        "priority": 2
    },
    {
        "taskId": 103,
        "description": "Write updated JSON file",
        "status": "pending",
        "priority": 2
    },
    {
        "taskId": 104,
        "description": "Learn about custom objects",
        "status": "pending",
        "priority": 3
    }
]
Reading this file involves two main steps in Python: opening the file and then parsing its JSON content.
We use Python's built-in `open()` function, preferably within a `with` statement to ensure the file is automatically closed afterward.
# Make sure 'json' library is imported
import json
try:
    with open("tasks.json", "r") as file_handle:
        # File is open, now parse its content
        pass  # Placeholder for parsing logic
except FileNotFoundError:
    print("Error: tasks.json not found.")
Inside the `with` block, we use the `json.load()` function (note: `load` without the 's'). This function reads from a file object (like our `file_handle`), unlike `json.loads()` which reads from a string.
import json
tasks_list = []
try:
    with open("tasks.json", "r") as file_handle:
        tasks_list = json.load(file_handle)
    print(f"Successfully loaded {len(tasks_list)} tasks.")
    print(type(tasks_list))  # Output: <class 'list'>
    print(type(tasks_list[0]))  # Output: <class 'dict'>
except FileNotFoundError:
    print("Error: tasks.json not found.")
except json.JSONDecodeError:
    print("Error: Could not decode JSON from tasks.json.")
Now, `tasks_list` holds the data from `tasks.json` as a Python list of dictionaries, ready for processing.
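Because `json.load()` only needs an object with a `.read()` method, you can experiment with it without touching the disk. The sketch below uses `io.StringIO` in place of a real file and shows what a `json.JSONDecodeError` (a subclass of `ValueError`) reports for malformed input. The exact wording of `err.msg` differs between Python versions, so only the line position is relied on here.

```python
import io
import json

# json.load() reads from any object with a .read() method, so an in-memory
# io.StringIO stands in for a real file in this sketch
good_file = io.StringIO('[{"taskId": 101, "status": "done"}]')
tasks = json.load(good_file)
print(tasks[0]["taskId"])  # 101

# Trailing commas are not valid JSON, so this input fails to decode
bad_file = io.StringIO('[{"taskId": 101, "status": "done",}]')
error_position = None
try:
    json.load(bad_file)
except json.JSONDecodeError as err:
    # msg, lineno, colno and pos describe where parsing stopped
    error_position = (err.lineno, err.colno)
    print(f"Invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
```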
Writing Python Data to a JSON File
Just as we read from a file, we can also write our Python data structures (like lists and dictionaries) back into a JSON file.
Let's continue with our `tasks_list`. Suppose we want to update the status of all "pending" tasks to "scheduled". We can iterate through the list and modify the dictionaries:
# (Assuming tasks_list is loaded as shown previously)
for task in tasks_list:
    if task.get("status") == "pending":
        task["status"] = "scheduled"
        print(f"Updated task {task.get('taskId')} to scheduled.")
To save this modified list back to a file, we again use `open()` but this time in write mode (`"w"`). Then, we use the `json.dump()` function (again, `dump` without the 's'), providing the Python data structure and the file object.
The code below creates a new file named `updated_tasks.json` and writes the modified `tasks_list` into it, using indentation for readability.
# (Assuming tasks_list is loaded and modified as shown previously)
import json
output_filename = "updated_tasks.json"
try:
    with open(output_filename, "w") as file_handle:
        json.dump(tasks_list, file_handle, indent=4)
    print(f"Successfully wrote updated tasks to {output_filename}.")
except OSError:  # IOError is an alias of OSError in Python 3
    print(f"Error: Could not write to file {output_filename}.")
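The examples in this guide open files without an `encoding` argument, which means Python uses the platform's default encoding. If your data may contain non-ASCII text, a safer pattern is to pass `encoding="utf-8"` explicitly. The sketch below also shows `ensure_ascii`, which controls whether `json.dump()`/`json.dumps()` escape non-ASCII characters; the task data and the temporary file name `tasks_utf8.json` are made up for illustration.

```python
import json
import os
import tempfile

tasks = [{"taskId": 105, "description": "Review café menu", "status": "pending"}]

# Default: non-ASCII characters are written as \uXXXX escapes
print(json.dumps(tasks[0]["description"]))                      # "Review caf\u00e9 menu"
# ensure_ascii=False writes the characters themselves
print(json.dumps(tasks[0]["description"], ensure_ascii=False))  # "Review café menu"

# Hypothetical output path inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "tasks_utf8.json")

# Pass the encoding explicitly so the file is UTF-8 regardless of platform defaults
with open(path, "w", encoding="utf-8") as f_out:
    json.dump(tasks, f_out, indent=4, ensure_ascii=False)

with open(path, "r", encoding="utf-8") as f_in:
    reloaded = json.load(f_in)

print(reloaded == tasks)  # True
```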
Here's the combined code for reading, modifying, and writing:
import json
import sys
input_filename = "tasks.json"
output_filename = "updated_tasks.json"
tasks_list = []
# Read the JSON file
try:
    with open(input_filename, "r") as f_in:
        tasks_list = json.load(f_in)
    print(f"Loaded {len(tasks_list)} tasks from {input_filename}.")
except FileNotFoundError:
    print(f"Error: {input_filename} not found.")
    sys.exit(1)  # Stop with a non-zero status if the input file is missing
except json.JSONDecodeError:
    print(f"Error: Could not decode JSON from {input_filename}.")
    sys.exit(1)
# Modify the data
updated_count = 0
for task in tasks_list:
    if task.get("status") == "pending":
        task["status"] = "scheduled"
        updated_count += 1
print(f"Marked {updated_count} pending tasks as scheduled.")
# Write the updated data to a new JSON file
try:
    with open(output_filename, "w") as f_out:
        json.dump(tasks_list, f_out, indent=4)
    print(f"Successfully wrote updated tasks to {output_filename}.")
except OSError:  # IOError is an alias of OSError in Python 3
    print(f"Error: Could not write to file {output_filename}.")
Decoding JSON into Custom Python Objects
While Python dictionaries are flexible, converting JSON directly into custom Python classes often makes more sense for complex applications. Classes allow you to bundle data attributes with methods that operate on that data, improving code organization and enabling features like dot notation access.
Let's define a `Task` class to represent our task data:
class Task:
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority
    def __repr__(self):
        # Provides a developer-friendly string representation
        return f"Task(id={self.task_id}, desc='{self.description}', status='{self.status}', prio={self.priority})"
    def mark_complete(self):
        self.status = "completed"
        print(f"Task {self.task_id} marked as completed.")
    def increase_priority(self):
        self.priority += 1
        print(f"Task {self.task_id} priority increased to {self.priority}.")
With this class, actions such as marking a task complete or raising its priority become simple method calls:
# Example usage (not related to JSON loading yet)
task_obj = Task(101, "Setup Python environment", "done", 1)
print(task_obj)
task_obj.increase_priority()
print(task_obj)
But how do we automatically convert the JSON data we loaded (which defaults to dictionaries) into instances of our `Task` class?
The `json.load()` and `json.loads()` functions accept an optional `object_hook` argument. This argument takes a function that will be called for every JSON object (which is initially decoded into a dictionary). This function can then transform the dictionary into any object you want, like an instance of our `Task` class. See the official documentation for more details.
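To see when the hook actually runs, this sketch passes a recording hook that stores each dictionary it receives and returns it unchanged. Nested objects are completed, and handed to the hook, before the object that contains them; the `project`/`tasks` document is a made-up example.

```python
import json

calls = []

def recording_hook(dct):
    # Record a copy of each decoded JSON object, then return it unchanged
    calls.append(dict(dct))
    return dct

doc = '{"project": "demo", "tasks": [{"taskId": 101}, {"taskId": 102}]}'
result = json.loads(doc, object_hook=recording_hook)

for c in calls:
    print(c)
# {'taskId': 101}
# {'taskId': 102}
# {'project': 'demo', 'tasks': [{'taskId': 101}, {'taskId': 102}]}
```

Whatever the hook returns replaces the dictionary in the final result, which is what makes it possible to swap in custom objects.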
Let's add a static method to our `Task` class to act as the object hook:
class Task:
    # ... (previous methods __init__, __repr__, etc.) ...
    @staticmethod
    def from_json_dict(dct):
        # Check if the dictionary has the keys we expect for a Task
        if "taskId" in dct and "description" in dct and "status" in dct and "priority" in dct:
            return Task(
                task_id=dct["taskId"],
                description=dct["description"],
                status=dct["status"],
                priority=dct["priority"]
            )
        # If it doesn't look like a Task, return the original dict
        return dct
This `from_json_dict` function takes a dictionary (`dct`). If the dictionary contains the keys expected for a task, it creates and returns a `Task` object using those values. Otherwise, it returns the dictionary unmodified (this is important for handling nested structures where not everything is a task).
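The sketch below checks that behavior with a hypothetical wrapper document whose outer object has no task keys. It repeats a trimmed version of the `Task` class so it runs on its own, and writes the key check with `all()` instead of chained `and`s (the behavior is the same).

```python
import json

REQUIRED_KEYS = ("taskId", "description", "status", "priority")

class Task:
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority

    def __repr__(self):
        return f"Task(id={self.task_id}, status='{self.status}')"

    @staticmethod
    def from_json_dict(dct):
        # Convert only dicts that carry every task key; pass others through
        if all(key in dct for key in REQUIRED_KEYS):
            return Task(dct["taskId"], dct["description"], dct["status"], dct["priority"])
        return dct

# Hypothetical wrapper document: the outer object is not a task
doc = """
{
    "project": "json-tutorial",
    "tasks": [
        {"taskId": 101, "description": "Setup Python environment", "status": "done", "priority": 1},
        {"taskId": 102, "description": "Read JSON file", "status": "in progress", "priority": 2}
    ]
}
"""

data = json.loads(doc, object_hook=Task.from_json_dict)
print(type(data))     # <class 'dict'>
print(data["tasks"])  # [Task(id=101, status='done'), Task(id=102, status='in progress')]
```

The outer dictionary survives as a plain `dict`, while every entry in its `tasks` list becomes a `Task`.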
Now, we use this method as the `object_hook` when loading:
import json
# (Task class definition including from_json_dict goes here)
task_objects = []
# Read the file written in the previous section, where pending tasks became "scheduled"
input_filename = "updated_tasks.json"
try:
    with open(input_filename, "r") as f_in:
        # Use the object_hook here!
        task_objects = json.load(f_in, object_hook=Task.from_json_dict)
    print(f"Loaded {len(task_objects)} task objects.")
    print(type(task_objects[0]))  # Output: <class '__main__.Task'>
    # Now we can use Task methods directly
    for task in task_objects:
        if task.status == "scheduled":
            task.mark_complete()
except FileNotFoundError:
    print(f"Error: {input_filename} not found.")
except json.JSONDecodeError:
    print(f"Error: Could not decode JSON from {input_filename}.")
Now, `task_objects` is a list of `Task` instances, allowing you to work with your data in a much more object-oriented way.
Encoding Custom Python Objects using JSONEncoder
Converting our custom `Task` objects back into JSON format requires a bit more setup than decoding. The standard `json.dump()` function doesn't inherently know how to serialize arbitrary objects like our `Task` instances.
We need to tell the JSON library how to handle these objects. This is done by creating a custom encoder class that inherits from `json.JSONEncoder` and overriding its `default()` method. The `default()` method is called whenever the encoder encounters an object it doesn't recognize (like our `Task`). Inside this method, we must convert the object into a JSON-serializable format, typically a dictionary.
import json
# (Task class definition goes here)
class TaskEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Task):
            # If the object is a Task instance, return its dictionary representation
            return {
                "taskId": obj.task_id,
                "description": obj.description,
                "status": obj.status,
                "priority": obj.priority
                # You could add "__type__": "Task" here if needed for complex decoding
            }
        # Let the base class default method handle other types
        return super().default(obj)
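This `TaskEncoder` checks whether the object `obj` it receives is an instance of `Task`. If so, it returns a dictionary containing the task's data, which the encoder then serializes as usual. Otherwise, it defers to the base class's `default()`, which raises a `TypeError` for any object it doesn't know how to serialize. Standard types (dicts, lists, strings, numbers, booleans, and `None`) never reach `default()` at all; the encoder handles them itself.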
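Subclassing `json.JSONEncoder` isn't the only option: `json.dump()` and `json.dumps()` also accept a `default` argument, a plain function called for any object the encoder can't serialize on its own. A minimal sketch of that alternative, with a trimmed `Task` class and a hypothetical `task_to_dict()` helper:

```python
import json

class Task:
    # Trimmed copy of the guide's Task class (data attributes only)
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority

def task_to_dict(obj):
    # json.dumps calls this only for objects it cannot serialize natively
    if isinstance(obj, Task):
        return {
            "taskId": obj.task_id,
            "description": obj.description,
            "status": obj.status,
            "priority": obj.priority,
        }
    # Anything else is still an error, as with JSONEncoder.default()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

tasks = [Task(101, "Setup Python environment", "done", 1)]
encoded = json.dumps(tasks, default=task_to_dict)
print(encoded)
# [{"taskId": 101, "description": "Setup Python environment", "status": "done", "priority": 1}]

# Without a default hook, the same call fails with TypeError
raised_type_error = False
try:
    json.dumps(tasks)
except TypeError:
    raised_type_error = True
print(raised_type_error)  # True
```

A subclass is handy when you want to reuse the encoder through `cls=` in many places; a function is lighter for one-off calls.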
To use this custom encoder, pass it to the `cls` argument of `json.dump()` or `json.dumps()`:
# (Assuming task_objects is a list of Task instances modified previously)
import json
# (Task class and TaskEncoder class definitions go here)
output_filename = "completed_tasks.json"
try:
    with open(output_filename, "w") as f_out:
        # Use the custom encoder via the 'cls' argument
        json.dump(task_objects, f_out, indent=4, cls=TaskEncoder)
    print(f"Successfully wrote completed tasks to {output_filename} using custom encoder.")
except OSError:  # IOError is an alias of OSError in Python 3
    print(f"Error: Could not write to file {output_filename}.")
This will create a `completed_tasks.json` file containing the JSON representation of your `Task` objects, formatted nicely thanks to `indent=4`.
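You can check that the encoder and the object hook agree with each other by doing a full round trip in memory with `json.dumps()` and `json.loads()`, with no files involved. This sketch repeats compact versions of `Task`, `from_json_dict()`, and `TaskEncoder` so it runs on its own; the sample task is made up.

```python
import json

class Task:
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority

    @staticmethod
    def from_json_dict(dct):
        # Turn dicts with all task keys into Task objects; leave others alone
        if all(k in dct for k in ("taskId", "description", "status", "priority")):
            return Task(dct["taskId"], dct["description"], dct["status"], dct["priority"])
        return dct

class TaskEncoder(json.JSONEncoder):
    def default(self, obj):
        # Map Task attributes back to the original JSON key names
        if isinstance(obj, Task):
            return {"taskId": obj.task_id, "description": obj.description,
                    "status": obj.status, "priority": obj.priority}
        return super().default(obj)

original = [Task(103, "Write updated JSON file", "scheduled", 2)]

# Encode with the custom encoder, then decode with the object hook
as_text = json.dumps(original, cls=TaskEncoder)
restored = json.loads(as_text, object_hook=Task.from_json_dict)

print(as_text)
# [{"taskId": 103, "description": "Write updated JSON file", "status": "scheduled", "priority": 2}]
print(type(restored[0]).__name__, restored[0].task_id, restored[0].status)
# Task 103 scheduled
```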
Here's the full code combining custom decoding and encoding:
import json
import sys
# --- Custom Task Class ---
class Task:
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority
    def __repr__(self):
        return f"Task(id={self.task_id}, desc='{self.description}', status='{self.status}', prio={self.priority})"
    def mark_complete(self):
        self.status = "completed"
    def increase_priority(self):
        self.priority += 1
    @staticmethod
    def from_json_dict(dct):
        if "taskId" in dct and "description" in dct and "status" in dct and "priority" in dct:
            return Task(dct["taskId"], dct["description"], dct["status"], dct["priority"])
        return dct
# --- Custom JSON Encoder ---
class TaskEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Task):
            return {
                "taskId": obj.task_id,
                "description": obj.description,
                "status": obj.status,
                "priority": obj.priority
            }
        return super().default(obj)
# --- Main Script Logic ---
# Input is the file written earlier, where pending tasks were marked "scheduled"
input_filename = "updated_tasks.json"
output_filename = "final_tasks.json"
task_objects = []
# Decode JSON file into Task objects using object_hook
try:
    with open(input_filename, "r") as f_in:
        task_objects = json.load(f_in, object_hook=Task.from_json_dict)
    print(f"Loaded {len(task_objects)} Task objects from {input_filename}.")
except FileNotFoundError:
    print(f"Error: {input_filename} not found.")
    sys.exit(1)
except json.JSONDecodeError:
    print(f"Error: Could not decode JSON from {input_filename}.")
    sys.exit(1)
# Modify the Task objects
for task in task_objects:
    if task.status in ("scheduled", "in progress"):
        task.mark_complete()
        print(f"Marked task {task.task_id} as complete.")
# Encode Task objects back to JSON file using custom encoder
try:
    with open(output_filename, "w") as f_out:
        json.dump(task_objects, f_out, indent=4, cls=TaskEncoder)
    print(f"Successfully wrote final task list to {output_filename}.")
except OSError:  # IOError is an alias of OSError in Python 3
    print(f"Error: Could not write to file {output_filename}.")
And there you have it! You've explored several ways to handle JSON in Python, from simple string parsing to working with files and mapping data to custom objects. These techniques cover the most common scenarios you'll encounter when dealing with JSON data in your Python projects.
A great way to practice these skills is by working on projects that involve interacting with web APIs. Many public APIs return data in JSON format, offering ample opportunity to parse, process, and maybe even store the results. You can find a vast collection of public APIs to experiment with.
Understanding JSON: The Lingua Franca of Data Exchange
In the world of software development, JSON has become a cornerstone for exchanging data in a textual format. You'll bump into it everywhere – from frontend frameworks interacting with backends, to APIs delivering information, and even configuration files.
This guide will walk you through the essentials of handling JSON data using Python, covering how to parse it, read it from files, and write it back out.
So, What Exactly is JSON?
JSON, which stands for JavaScript Object Notation, is a lightweight format designed for easy data interchange. Despite its name, it's language-independent, but its structure is familiar to programmers who've worked with C-family languages, including Python.
Data in JSON is built around key-value pairs. The keys are always strings (enclosed in double quotes), while the values can be strings, numbers, booleans (true
/false
), arrays (ordered lists, enclosed in square brackets []
), other JSON objects (unordered collections of key-value pairs, enclosed in curly braces {}
), or the special value null
. You can find more details on supported data types here.
These key-value pairs can be nested. An object might contain other objects or arrays as values. For instance, you could represent a list of users like this:
[
{
"username": "coder_gal",
"join_date": "2023-01-15",
"active": true
},
{
"username": "py_ninja",
"join_date": "2022-11-01",
"active": false
}
]
This structure shows an array ([]
) containing two objects ({}
). Each object represents a user with properties like "username", "join_date", and "active" status.
Its simplicity and human-readability, coupled with its direct mapping to data structures found in many programming languages like Python and JavaScript, make JSON extremely popular for web APIs, configuration files, and simple data storage tasks.
Handling JSON Strings in Python
Often, you'll get JSON data as a plain text string within your Python script. This is common when fetching data from web APIs. When interacting with external services, perhaps via proxies to manage connections or access geo-specific content, the responses are frequently JSON-formatted strings.
Let's imagine fetching some data from a public API:
import requests
import json
# Fetch data from a sample API endpoint
response = requests.get("https://jsonplaceholder.typicode.com/posts/1")
post_string = response.text
print(post_string)
# Output might look like:
# {
# "userId": 1,
# "id": 1,
# "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
# "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
# }
print(type(post_string))
# Output: <class 'str'>
Working directly with this `post_string` isn't very Pythonic. To easily access fields like "title" or "userId", we need to convert this JSON-formatted string into a native Python data structure.
Python's built-in json
library is perfect for this. First, make sure to import it:
import json
The library provides the json.loads()
function (short for "load string"). It takes a JSON string as input, parses it, and converts it into corresponding Python objects: JSON objects become dictionaries, JSON arrays become lists, strings become strings, numbers become integers or floats, booleans become `True`/`False`, and `null` becomes `None`.
import requests
import json
response = requests.get("https://jsonplaceholder.typicode.com/posts/1")
post_string = response.text
# Parse the JSON string into a Python dictionary
post_data = json.loads(post_string)
print(type(post_data))
# Output: <class 'dict'>
print(post_data["id"])
# Output: 1
print(post_data["title"])
# Output: sunt aut facere repellat provident occaecati excepturi optio reprehenderit
print(post_data["userId"])
# Output: 1
Now that `post_data` is a Python dictionary, manipulating it is straightforward. Let's say we want to create a function to add a prefix to the post title:
def prefix_title(post_dict, prefix="[UPDATED] "):
post_dict['title'] = prefix + post_dict['title']
return post_dict
We can apply this function:
updated_post_data = prefix_title(post_data)
print(updated_post_data['title'])
# Output: [UPDATED] sunt aut facere repellat provident occaecati excepturi optio reprehenderit
After modifying the data, you might need to convert it back into a JSON string, perhaps to send it to another API or save it. Use the `json.dumps()` function (short for "dump string") for this:
updated_post_string = json.dumps(updated_post_data)
print(updated_post_string)
This will likely print a compact JSON string without extra whitespace:
{
"userId": 1,
"id": 1,
"title": "[UPDATED] sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
"body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}
JSON's structure ignores whitespace, so this is perfectly valid. However, for readability during debugging or logging, you can use the `indent` argument in `json.dumps()`:
pretty_updated_post_string = json.dumps(updated_post_data, indent=4)
print(pretty_updated_post_string)
This produces a much more human-friendly output:
{
"userId": 1,
"id": 1,
"title": "[UPDATED] sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
"body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}
Here's the complete sequence for handling a JSON string:
import requests
import json
# Function to modify the data
def prefix_title(post_dict, prefix="[UPDATED] "):
post_dict['title'] = prefix + post_dict['title']
return post_dict
# Fetch data
response = requests.get("https://jsonplaceholder.typicode.com/posts/1")
post_string = response.text
# Parse JSON string to Python dict
post_data = json.loads(post_string)
# Modify the Python dict
updated_post_data = prefix_title(post_data)
# Convert Python dict back to JSON string (pretty-printed)
updated_post_string_pretty = json.dumps(updated_post_data, indent=4)
# Print the result
print(updated_post_string_pretty)
Reading JSON Data From a File
Sometimes, JSON data resides in a file (typically with a .json
extension) rather than being received as a string directly in your script.
First, let's create an example file named tasks.json
. Populate it with a JSON array of task objects:
[
{
"taskId": 101,
"description": "Setup Python environment",
"status": "done",
"priority": 1
},
{
"taskId": 102,
"description": "Read JSON file",
"status": "in progress",
"priority": 2
},
{
"taskId": 103,
"description": "Write updated JSON file",
"status": "pending",
"priority": 2
},
{
"taskId": 104,
"description": "Learn about custom objects",
"status": "pending",
"priority": 3
}
]
Reading this file involves two main steps in Python: opening the file and then parsing its JSON content.
We use Python's built-in `open()` function, preferably within a `with` statement to ensure the file is automatically closed afterward.
# Make sure 'json' library is imported
import json
try:
with open("tasks.json", "r") as file_handle:
# File is open, now parse its content
pass # Placeholder for parsing logic
except FileNotFoundError:
print("Error: tasks.json not found.")
Inside the `with` block, we use the `json.load()` function (note: `load` without the 's'). This function reads from a file object (like our `file_handle`), unlike `json.loads()` which reads from a string.
import json
tasks_list = []
try:
with open("tasks.json", "r") as file_handle:
tasks_list = json.load(file_handle)
print(f"Successfully loaded {len(tasks_list)} tasks.")
print(type(tasks_list)) # Output: <class 'list'>
print(type(tasks_list[0])) # Output: <class 'dict'>
except FileNotFoundError:
print("Error: tasks.json not found.")
except json.JSONDecodeError:
print("Error: Could not decode JSON from tasks.json.")
Now, `tasks_list` holds the data from `tasks.json` as a Python list of dictionaries, ready for processing.
Writing Python Data to a JSON File
Just as we read from a file, we can also write our Python data structures (like lists and dictionaries) back into a JSON file.
Let's continue with our `tasks_list`. Suppose we want to update the status of all "pending" tasks to "scheduled". We can iterate through the list and modify the dictionaries:
# (Assuming tasks_list is loaded as shown previously)
for task in tasks_list:
if task.get("status") == "pending":
task["status"] = "scheduled"
print(f"Updated task {task.get('taskId')} to scheduled.")
To save this modified list back to a file, we again use `open()` but this time in write mode (`"w"`). Then, we use the `json.dump()` function (again, `dump` without the 's'), providing the Python data structure and the file object.
The code below creates a new file named `updated_tasks.json` and writes the modified `tasks_list` into it, using indentation for readability.
# (Assuming tasks_list is loaded and modified as shown previously)
import json
output_filename = "updated_tasks.json"
try:
with open(output_filename, "w") as file_handle:
json.dump(tasks_list, file_handle, indent=4)
print(f"Successfully wrote updated tasks to {output_filename}.")
except IOError:
print(f"Error: Could not write to file {output_filename}.")
Here's the combined code for reading, modifying, and writing:
import json
input_filename = "tasks.json"
output_filename = "updated_tasks.json"
tasks_list = []
# Read the JSON file
try:
with open(input_filename, "r") as f_in:
tasks_list = json.load(f_in)
print(f"Loaded {len(tasks_list)} tasks from {input_filename}.")
except FileNotFoundError:
print(f"Error: {input_filename} not found.")
exit() # Exit if input file is missing
except json.JSONDecodeError:
print(f"Error: Could not decode JSON from {input_filename}.")
exit()
# Modify the data
updated_count = 0
for task in tasks_list:
if task.get("status") == "pending":
task["status"] = "scheduled"
updated_count += 1
print(f"Marked {updated_count} pending tasks as scheduled.")
# Write the updated data to a new JSON file
try:
with open(output_filename, "w") as f_out:
json.dump(tasks_list, f_out, indent=4)
print(f"Successfully wrote updated tasks to {output_filename}.")
except IOError:
print(f"Error: Could not write to file {output_filename}.")
Decoding JSON into Custom Python Objects
While Python dictionaries are flexible, converting JSON directly into custom Python classes often makes more sense for complex applications. Classes allow you to bundle data attributes with methods that operate on that data, improving code organization and enabling features like dot notation access.
Let's define a `Task` class to represent our task data:
class Task:
def __init__(self, task_id, description, status, priority):
self.task_id = task_id
self.description = description
self.status = status
self.priority = priority
def __repr__(self):
# Provides a developer-friendly string representation
return f"Task(id={self.task_id}, desc='{self.description}', status='{self.status}', prio={self.priority})"
def mark_complete(self):
self.status = "completed"
print(f"Task {self.task_id} marked as completed.")
def increase_priority(self):
self.priority += 1
print(f"Task {self.task_id} priority increased to {self.priority}.")
With this class, actions like marking a task complete become more intuitive:
# Example usage (not related to JSON loading yet)
task_obj = Task(101, "Setup Python environment", "done", 1)
print(task_obj)
task_obj.increase_priority()
print(task_obj)
But how do we automatically convert the JSON data we loaded (which defaults to dictionaries) into instances of our `Task` class?
The `json.load()` and `json.loads()` functions accept an optional `object_hook` argument. This argument takes a function that will be called for every JSON object (which is initially decoded into a dictionary). This function can then transform the dictionary into any object you want, like an instance of our `Task` class. See the official documentation for more details.
Let's add a static method to our `Task` class to act as the object hook:
class Task: # ... (previous methods __init__, __repr__, etc.) ...
@staticmethod
def from_json_dict(dct):
# Check if the dictionary has the keys we expect for a Task
if "taskId" in dct and "description" in dct and "status" in dct and "priority" in dct:
return Task(
task_id=dct["taskId"],
description=dct["description"],
status=dct["status"],
priority=dct["priority"]
)
# If it doesn't look like a Task, return the original dict
return dct
This `from_json_dict` function takes a dictionary (`dct`). If the dictionary contains the keys expected for a task, it creates and returns a `Task` object using those values. Otherwise, it returns the dictionary unmodified (this is important for handling nested structures where not everything is a task).
Now, we use this method as the `object_hook` when loading:
```python
import json

# (Task class definition including from_json_dict goes here)

task_objects = []
# The file written in the previous section, where "pending" tasks became "scheduled"
# (the original tasks.json contains no "scheduled" tasks)
input_filename = "updated_tasks.json"

try:
    with open(input_filename, "r") as f_in:
        # Use the object_hook here!
        task_objects = json.load(f_in, object_hook=Task.from_json_dict)
    print(f"Loaded {len(task_objects)} task objects.")
    print(type(task_objects[0]))  # Output: <class '__main__.Task'>
    # Now we can use Task methods directly
    for task in task_objects:
        if task.status == "scheduled":
            task.mark_complete()
except FileNotFoundError:
    print(f"Error: {input_filename} not found.")
except json.JSONDecodeError:
    print(f"Error: Could not decode JSON from {input_filename}.")
```
Now, `task_objects` is a list of `Task` instances, allowing you to work with your data in a much more object-oriented way.
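The loader above only reports that decoding failed. `json.JSONDecodeError` (a subclass of `ValueError`) also records where the problem is, through its `msg`, `lineno`, and `colno` attributes, which can make a broken file much quicker to fix. A small sketch using a deliberately malformed string (the missing comma is intentional):

```python
import json

# Missing comma after 101, so the decoder fails at the start of line 3
bad_json = '{\n    "taskId": 101\n    "status": "done"\n}'

try:
    json.loads(bad_json)
except json.JSONDecodeError as exc:
    details = (exc.msg, exc.lineno, exc.colno)
    print(f"{exc.msg} at line {exc.lineno}, column {exc.colno}")
    # Output: Expecting ',' delimiter at line 3, column 5
```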
Encoding Custom Python Objects using JSONEncoder
Converting our custom `Task` objects back into JSON takes a bit more setup than decoding. Out of the box, `json.dump()` only knows how to serialize built-in types such as dicts, lists, strings, numbers, booleans, and `None`; handed a `Task` instance, it raises a `TypeError`.
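To see the problem concretely, here's a minimal sketch that passes a bare `Task` (a trimmed copy of the class, without its methods) to `json.dumps()`. The message comes from the standard library; the text in the comment matches Python 3.7 and later, and older versions word it slightly differently.

```python
import json


class Task:
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority


task = Task(101, "Setup Python environment", "done", 1)

try:
    json.dumps(task)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
    # Output (Python 3.7+): Object of type Task is not JSON serializable
```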
We need to tell the JSON library how to handle these objects. This is done by creating a custom encoder class that inherits from `json.JSONEncoder` and overriding its `default()` method. The `default()` method is called whenever the encoder encounters an object it doesn't recognize (like our `Task`). Inside this method, we must convert the object into a JSON-serializable format, typically a dictionary.
```python
import json

# (Task class definition goes here)


class TaskEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Task):
            # If the object is a Task instance, return its dictionary representation
            return {
                "taskId": obj.task_id,
                "description": obj.description,
                "status": obj.status,
                "priority": obj.priority,
                # You could add "__type__": "Task" here if needed for complex decoding
            }
        # Anything else is unsupported: the base class implementation raises a TypeError
        return super().default(obj)
```
This `TaskEncoder` checks whether the object `obj` it receives is a `Task`. If so, it returns a dictionary containing the task's data, which the encoder then serializes as usual. Otherwise, it defers to the base class's `default()`, which raises a `TypeError`. Standard types such as lists, dicts, and strings never reach `default()` in the first place; the encoder serializes them itself.
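A small sketch of both paths, using a trimmed-down `Task` and the `cls` argument described just below. The `payload` structure is invented for illustration: the surrounding dict, list, and string are encoded natively, the `Task` goes through `default()`, and a `set` (neither a native JSON type nor a `Task`) falls through to the base class and raises.

```python
import json


class Task:
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority


class TaskEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Task):
            return {
                "taskId": obj.task_id,
                "description": obj.description,
                "status": obj.status,
                "priority": obj.priority,
            }
        # Unsupported type: the base implementation raises TypeError
        return super().default(obj)


# The dict, list, and string are encoded natively; only the Task goes through default()
payload = {"owner": "coder_gal", "tasks": [Task(101, "Setup Python environment", "done", 1)]}
encoded = json.dumps(payload, cls=TaskEncoder)
print(encoded)
# Output: {"owner": "coder_gal", "tasks": [{"taskId": 101, "description": "Setup Python environment", "status": "done", "priority": 1}]}

# A set is not a native JSON type, so default() hands it to the base class
try:
    json.dumps({"tags": {"python"}}, cls=TaskEncoder)
except TypeError as exc:
    set_error = str(exc)
    print(set_error)
    # Output (Python 3.7+): Object of type set is not JSON serializable
```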
To use this custom encoder, pass it to the `cls` argument of `json.dump()` or `json.dumps()`:
```python
# (Assuming task_objects is a list of Task instances modified previously)
import json

# (Task class and TaskEncoder class definitions go here)

output_filename = "completed_tasks.json"

try:
    with open(output_filename, "w") as f_out:
        # Use the custom encoder via the 'cls' argument
        json.dump(task_objects, f_out, indent=4, cls=TaskEncoder)
    print(f"Successfully wrote completed tasks to {output_filename} using custom encoder.")
except OSError:
    print(f"Error: Could not write to file {output_filename}.")
```
This will create a `completed_tasks.json` file containing the JSON representation of your `Task` objects, formatted nicely thanks to `indent=4`.
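If subclassing feels heavy for a single type, `json.dump()` and `json.dumps()` also accept a `default=` argument: a plain function that is called for any object the encoder can't handle, playing the same role as the overridden method above. A minimal sketch, where `task_to_dict` is a hypothetical helper name and `Task` is a trimmed-down copy of the class:

```python
import json


class Task:
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority


def task_to_dict(obj):
    """Hypothetical helper passed as default=; called only for unsupported objects."""
    if isinstance(obj, Task):
        return {
            "taskId": obj.task_id,
            "description": obj.description,
            "status": obj.status,
            "priority": obj.priority,
        }
    # Mirror JSONEncoder.default(): reject anything we don't know how to convert
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


tasks = [
    Task(101, "Setup Python environment", "completed", 1),
    Task(102, "Read JSON file", "completed", 2),
]
text = json.dumps(tasks, indent=4, default=task_to_dict)
print(text)
```

Both approaches should produce the same JSON here; the subclass tends to be handier when you want to reuse one encoder across many calls or handle several custom types in one place.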
Here's the full code combining custom decoding and encoding:
```python
import json
import sys


# --- Custom Task Class ---
class Task:
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority

    def __repr__(self):
        return f"Task(id={self.task_id}, desc='{self.description}', status='{self.status}', prio={self.priority})"

    def mark_complete(self):
        self.status = "completed"

    def increase_priority(self):
        self.priority += 1

    @staticmethod
    def from_json_dict(dct):
        if "taskId" in dct and "description" in dct and "status" in dct and "priority" in dct:
            return Task(dct["taskId"], dct["description"], dct["status"], dct["priority"])
        return dct


# --- Custom JSON Encoder ---
class TaskEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Task):
            return {
                "taskId": obj.task_id,
                "description": obj.description,
                "status": obj.status,
                "priority": obj.priority,
            }
        return super().default(obj)


# --- Main Script Logic ---
# Written by the previous section, where "pending" tasks became "scheduled"
input_filename = "updated_tasks.json"
output_filename = "final_tasks.json"
task_objects = []

# Decode JSON file into Task objects using object_hook
try:
    with open(input_filename, "r") as f_in:
        task_objects = json.load(f_in, object_hook=Task.from_json_dict)
    print(f"Loaded {len(task_objects)} Task objects from {input_filename}.")
except FileNotFoundError:
    print(f"Error: {input_filename} not found.")
    sys.exit(1)
except json.JSONDecodeError:
    print(f"Error: Could not decode JSON from {input_filename}.")
    sys.exit(1)

# Modify the Task objects
for task in task_objects:
    if task.status in ("scheduled", "in progress"):
        task.mark_complete()
        print(f"Marked task {task.task_id} as complete.")

# Encode Task objects back to JSON file using custom encoder
try:
    with open(output_filename, "w") as f_out:
        json.dump(task_objects, f_out, indent=4, cls=TaskEncoder)
    print(f"Successfully wrote final task list to {output_filename}.")
except OSError:
    print(f"Error: Could not write to file {output_filename}.")
```
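The encoder earlier carried a comment suggesting a `"__type__": "Task"` marker. Matching on keys works for this file, but it could misfire if some unrelated object happened to have the same four keys. One way to make the round trip explicit, sketched here under the assumption that none of your real data uses a `__type__` key, is to tag tasks on the way out and check the tag on the way in. `TaggedTaskEncoder` and `tagged_hook` are illustrative names, and `Task` is trimmed down.

```python
import json


class Task:
    def __init__(self, task_id, description, status, priority):
        self.task_id = task_id
        self.description = description
        self.status = status
        self.priority = priority

    def __repr__(self):
        return f"Task(id={self.task_id}, status='{self.status}')"


class TaggedTaskEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Task):
            # Tag the dict so the decoder knows exactly what to rebuild
            return {
                "__type__": "Task",
                "taskId": obj.task_id,
                "description": obj.description,
                "status": obj.status,
                "priority": obj.priority,
            }
        return super().default(obj)


def tagged_hook(dct):
    # Only dicts carrying the tag become Task objects; everything else stays a dict
    if dct.get("__type__") == "Task":
        return Task(dct["taskId"], dct["description"], dct["status"], dct["priority"])
    return dct


original = [Task(101, "Setup Python environment", "completed", 1)]
text = json.dumps(original, cls=TaggedTaskEncoder)
restored = json.loads(text, object_hook=tagged_hook)
print(restored)
# Output: [Task(id=101, status='completed')]
```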
And there you have it! You've explored several ways to handle JSON in Python, from simple string parsing to working with files and mapping data to custom objects. These techniques cover the most common scenarios you'll encounter when dealing with JSON data in your Python projects.
A great way to practice these skills is by working on projects that involve interacting with web APIs. Many public APIs return data in JSON format, offering ample opportunity to parse, process, and maybe even store the results. You can find a vast collection of public APIs to experiment with.

Author
Sarah Whitmore
Digital Privacy & Cybersecurity Consultant
About Author
Sarah is a cybersecurity strategist with a passion for online privacy and digital security. She explores how proxies, VPNs, and encryption tools protect users from tracking, cyber threats, and data breaches. With years of experience in cybersecurity consulting, she provides practical insights into safeguarding sensitive data in an increasingly digital world.