Convert JSON to CSV with bash script

A tale of two formats: Converting JSON to CSV

Peter Andersson
May 5, 2016 · 4 min read
 

So this just happened:
- “Peter, can you send me a CSV of those search results?”
- “…yeah, sure…give me a minute”

We’d been looking up what metadata about podcasts we could get from querying iTunes. I the hacker, he the hustler.
I was playing around with the iTunes Search API using curl:

curl "https://itunes.apple.com/search?term=<some search term>&media=podcast&limit=200"

This gives you a JSON-formatted result. I thought it’d be easy transforming this data into CSV. I mean, of course there’s a simple-to-use tool out there for this, right?

WRONG!

It should be noted that I expected this to take about 2 minutes and then I’d be back doing what I do.
I initiated my usual procedure: Problem -> Google -> Solution (yeah that’s how we do it these days). After not too much digging I found a couple of possible solutions. Let’s have a look at them.

jsonv

A Bash command line tool for converting JSON to CSV
Seemed promising, but required the user to specify every single field to include in the result. This was suboptimal, I just wanted all the fields (why isn’t this the default?). Installed it anyway, but couldn’t get it to play well with the data from iTunes. My patience ran out.

jq

Here’s a popular and powerful tool for manipulating JSON data and there’s even a function in there for converting results into CSV. Score!
.. but here’s one of the first examples I found:

<input data> | jq -c ‘.results[] | {name, party, cash: .cash_on_hand} | select(.cash | tonumber > 1000000)’

To the trained eye, this may look straightforward except for the missing CSV conversion. To me however, at this point, it might’ve just as well been written in Wingdings.
I remembered having used jq sometime in the past, but now I was painfully reminded I’d forgotten how to use it. So how did I modify the command to fit my data? Well, my 2 minutes were long gone and I was starting to get frustrated. Still hoping to find that one simple solution I decided to move on.

json2csv

The name sure implies it can do the job and the example from the GitHub page looks simple:

cat input.json | json2csv -k user.name,remote_ip > output.csv

..but wait.. do I need to specify the fields?! Installed it, tried it, couldn’t get it to play well with my data, gave up.

Now I was seriously fed up. FINE, I’ll dig into jq.

I’ll spare you the gory details of my going through the manual. Instead, let’s look at my first attempt:

<itunes data> | jq -r '.results[] | [.[] | tostring] | @csv'

jq is built around filters. You specify what filters to use from left to right and connect them through the common pipe symbol ‘|’ to create a new filter. Here are the subfilters used above:

(1) .results[](2) [.[] | tostring](3) @csv

iTunes returns a JSON object with 2 fields: resultCount and results. In results we find an array with a JSON object for each search result.
A filter with a dot + a field name extracts the value of that field, so in filter (1) we extract the array using .results, then append brackets [] to unpack the array and return each object as a separate entity. This is what makes it possible to get one result per row in the final result.
Using the filter .[] we can extract the values for all fields in an object and return the results on separate rows. However, doing so would be counter-productive since we would no longer be able to differentiate values from different objects. To counteract this we’re grouping values in arrays by wrapping the filter in brackets [.[]] . If we were to apply filter (3) now we would get errors complaining about invalid CSV rows. This is due to to some fields having arrays as values in this particular dataset. An example is genre: [‘Podcasting’, ‘Education’, ‘Business’]. We can get around this by converting all values to strings, which leads us to filter (2). It’s important we pipe these subfilters together inside the brackets to get this to work.
In the last filter (3) we convert each array to a CSV row, but there is one final problem to consider. @csv consider values like “podcast” as strings including the quotes. The quotes are then escaped and extra quotes are added so the final result is “\”podcast\””. For this reason we add the flag -r to jq which drops one set of quotes. Victory!

This story could have ended here, but after coming this far, I made a heartbreaking discovery. Each search result object wasn’t including the same set of data. Fields could be present in some objects, but not in others. Also, in a JSON object the order of the fields is not relevant.. and in the iTunes data, the order can differ from object to object, even if they have the same set of fields…

………

………

………

………

…so it turns out I need to specify which fields to use and in what order! F**K!
Now, do I go back to trying jsonv/json2csv again? No way. I know better now, let’s do this with jq. Here’s the final solution at the very end of my journey:

<itunes data> | jq -r '.results[] | [.artistName, .collectionName, .feedUrl, .releaseDate, .country, .primaryGenreName, .genres | tostring] | @csv'

I’m not even going to try explaining this command, by now I hope you’re able to interpret it and if need be save the data to a file.

Lesson: learn powerful tools and learn them properly. It will make life as a developer a lot easier.

Peter out.

5 (2)
Article Rating (2 Votes)
Rate this article
Attachments
There are no attachments for this article.
Comments (2)
Comment By frank - Mon, Nov 29th, 2021 7:09 AM
Thanks for this. saved me hours of digging through azure json files.
Comment By Stephan Wenderlich - Tue, Aug 24th, 2021 11:23 AM
Hi I know how powerful jq can be and it takes a while to understand all its details. However, you article is spot on and saved some valuable time.
Full Name
Email Address
Security Code Security Code
Related Articles RSS Feed
Transform XML to CSV Format | Unix String Pattern Manipulation The Ugly Way
Viewed 5197 times since Sun, Jan 9, 2022
Unix - Find command examples
Viewed 4501 times since Fri, Jun 8, 2018
To do a quick check on the number of path present (does not mean all are Enabled] using for loop
Viewed 3794 times since Fri, Jun 8, 2018
java Simple Java JDBC Program To Execute SQL Update Statement
Viewed 12487 times since Sun, Jan 9, 2022
Unix - Examples for grep command
Viewed 2978 times since Fri, Jun 8, 2018
bash for do done AIX
Viewed 1775 times since Mon, Jun 4, 2018
Convert CSV to JSON with jq
Viewed 27155 times since Mon, Jan 20, 2020
O’Reilly’s CD bookshelf
Viewed 12345 times since Wed, Jun 27, 2018
Unix - eval command example
Viewed 1936 times since Fri, Jun 8, 2018
Epoch & Unix Timestamp Conversion Tools
Viewed 61498 times since Fri, Jun 22, 2018