Trifacta Wrangler

Trifacta Wrangler is a free program (currently in beta) that helps you clean up data sets and gives you a first cut at basic analysis. It is great for quickly turning messy data into structured, manageable formats.

In the past few days I’ve used it to analyze huge log files and turn messy JSON into structured CSVs that I could import into SQL.

Quick tips:

  • splitrows always has to come first. The program usually tries to split on \n (newline) first, but that doesn’t always work for JSON. Try splitting on something like },{ instead. If you want to keep the curly brackets for a later unnest, do a quick find and replace first ( },{ becomes }|||{ ) and then split on |||.
  • unnest is very powerful for splitting JSON values out into separate columns titled by their keys.
  • flatten works better than unnest in cases where the JSON does not have keys. It creates new rows and repeats other values in adjacent columns to keep the relation. This works well if you have an ID column and are going to eventually stuff things into a relational database.
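
If you want to see what these three steps do without opening Wrangler, here is a rough plain-Python sketch of the same ideas. It is not how Wrangler implements them, and it does not use Wrangler's syntax: the function names split_rows, unnest and flatten just mirror the transform names, and the sample data is made up for illustration.

```python
import json

def split_rows(raw, delimiter="},{", marker="|||"):
    """Split a run of JSON objects into one string per object, keeping the braces."""
    body = raw.strip()
    # Drop the enclosing array brackets, if any, so only the objects remain.
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    # Same trick as the find and replace tip: "},{" becomes "}|||{", then split on "|||".
    return body.replace(delimiter, "}" + marker + "{").split(marker)

def unnest(record):
    """Parse one JSON object so that its keys become column names (one level deep)."""
    return dict(json.loads(record))

def flatten(row, column):
    """Emit one row per element of a list-valued column, repeating the other values."""
    return [{**row, column: item} for item in row[column]]

# Made-up sample: two objects on a single line, so splitting on "\n" would not help.
raw = '[{"id": 1, "tags": ["a", "b"]},{"id": 2, "tags": ["c"]}]'

rows = [unnest(r) for r in split_rows(raw)]
flat = [out for row in rows for out in flatten(row, "tags")]

for row in flat:
    print(row)
```

The sample ends up as three rows, one per tag, with the id repeated on each. Note that the split here is deliberately naive: it breaks if },{ shows up inside a string value or between nested objects, and the ||| marker must not already occur in the data.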

Here is documentation for the Transforms.
