Structured Query Language (SQL) is an excellent tool when working with a database that follows a traditional relational model – one that contains data that is organized into structured tables with fixed schema and established relationships. However, things have the potential to break down when introducing unstructured data into the mix. Doing so requires careful consideration to prevent massive headaches that are looming right around the corner.
Despite the obvious benefits to a neatly structured database, there are times where unstructured data needs to be accommodated, and JSON is a massive culprit. In this article, I’ll be explaining what JSON is, how and when it might fit into a database, and how to work with it using PostgreSQL.
What is JSON?
JSON (JavaScript Object Notation) is a data interchange format that is frequently used when transmitting information. This format manages to be both human readable and easily parsed by machines, and it is primarily used between a server and web applications.
JSON typically consists of an object containing multiple key-value pairs called “fields”. Each key is followed by a value, separated by a colon. These values can take 5 different forms:
- Integers
- Strings
- Booleans
- Objects
- Arrays
{
"name": "John Doe",
"age": 30,
"isStudent": false
}
In the above example, we have a single object representing a person enclosed by curly braces. This object contains a string field (name), an integer field (age), and a boolean field (isStudent), along with their corresponding values.
In more complex examples, JSON objects can contain other nested objects. A good example of this can be shown by adding an address field:
{
"name": "John Doe",
"age": 30,
"isStudent": false,
"homeAddress": {
"street": "123 Main St.",
"city": "Las Vegas",
"state": "NV",
"country": "United States",
"zip": "89102"
}
}
Aside from continuing to nest more and more objects within themselves, likely the most complicated example we can look at would include an array. Arrays are enclosed by brackets, and include comma separated fields, which can be one of any of the above listed types – including objects, and even more arrays.
{
"name": "John Doe",
"age": 30,
"isStudent": false,
"homeAddress": {
"street": "123 Main St.",
"city": "Las Vegas",
"state": "NV",
"country": "United States",
"zip": "89102"
},
"languages": ["English", "Spanish", "Italian"]
}
In the above example, we’ve added a “languages” array of strings to keep track of what languages the person speaks.
When is JSON useful?
There are certain circumstances where JSON can be a viable option when expanding a database. JSON can be very useful when working with data that is semi-structured. In particular, it’s a useful addition when working with optional or differing fields on a particular object.
JSON is also useful when you need to easily store and retrieve complex and potentially nested data structures, where you don’t need to perform much logic on the data at the database level. Configuration settings or user preferences make perfect examples of information that could be stored within a JSON column.
One of the major pitfalls with JSON data in a relational database is the potential performance drawback if used incorrectly. It’s likely to be a massive mistake to store information you want to search or filter by within a JSON blob, because a text search needs to be done every time you search on it. This is particularly troublesome with deeply nested data, and can lead to major slowdowns in your application. This can be alleviated slightly by using PostgreSQL’s jsonb
type instead of json
, but queries will still be much more complicated.
JSON in PostgreSQL
When working with PostgreSQL, we have access to the json
and jsonb
data types. The types are nearly identical, but it’s important to note that json
is stored as text, while jsonb
is stored in a binary representation. The trade off between the two is important; jsonb
may take up slightly more storage space and incurs some overhead during INSERT
operations due to the conversion, but ends up allowing for much more efficient queries and indexing, significantly increasing performance in that regard. This data type was added by release 9.4 of PostgreSQL, and has become the clear choice in most circumstances.
Adding a jsonb
column to a table is simple with PostgreSQL. Like any other column, the ALTER
TABLE
statement with an ADD
COLUMN
clause will be necessary, specifying the jsonb data type.
ALTER TABLE customers
ADD COLUMN address jsonb;
For our example, we’ll be adding an address field to a customers table. Let’s assume the table also has an ID column and a Name column. The example will make use of multiple standard SQL commands – for a refresher, check out this article on data querying and manipulation.
Inserting JSON Data
To insert a new row that contains a json column, we’ll use an INSERT
statement like normal. For the address value, we’ll include the entire JSON object between single quotes like so:
INSERT INTO customers (name,address)
VALUES ('John Doe', '{"street": "123 Main St.", "city": "Las Vegas", "state": "NV", "zip": "89102"}');
Querying JSON Data
Retrieving JSON data is also somewhat straightforward. To obtain the jsonb
column we can reference the column name in a SELECT
statement as usual, but we can also obtain specific fields within the JSON object. To do so, we’ll use the double arrow operator "->>"
:
SELECT id, name, address->>'street' AS street
FROM customers
WHERE id = 1;
id | name | street
----+----------+--------------
1 | John Doe | 123 Main St.
The arrow operator comes in two forms which are useful in different situations. The "->>"
operator (which we used in the above query) returns the following JSON field as text. The "->"
operator, on the other hand, returns the following JSON field as a JSON object. This would be useful in cases where we need to traverse deeper into a nested JSON object, for example.
Updating JSON Data
So what happens if we want to edit just the “street” field within the address column? In order to actually get in and edit values within a JSON structure, we will need to use a function like jsonb_set
. This function takes in four arguments:
- The target
jsonb
column - The path to the desired field
- The new value for the field
- The
create_missing
boolean field
UPDATE customers
SET address = jsonb_set(
address,
'{street}',
'"321 Oak St."',
false,
)
WHERE id = 1;
The create_missing
boolean is used to determine whether the field should be created if it isn’t found. In our example case, the “street” field already exists, so there’s no need to set the argument to true. This argument in particular will default to “true” if it isn’t set.
In summary, JSON can be a valuable addition to a database in specific circumstances – as long as the disadvantages are kept in mind. PostgreSQL provides all of the necessary tools to interact with JSON, and we discussed how to use these tools to insert new JSON into the database, query specific fields from these columns, and how to update fields within a JSON object. If you have any questions, please feel free to leave a comment below.
Leave a Reply