It’s time to lose your fear of spreadsheets and jump into the numbers
There was always an awkward moment when another reporter asked me: “What does a data journalist do?”
There would be a pause while I searched for yet another way to explain it. But no matter what I came up with, it never really seemed to answer the question. And the reason, I’ve finally worked out, is that the distinction is false. All journalism is data journalism.
Every story is made up of data in the form of interviews, statistics, findings, observations and background information. Unfortunately, much of that information is in an unstructured format – transcripts, notes and clippings – that can’t easily be manipulated.
This is in contrast to structured information, most commonly seen in a spreadsheet: information that can be sorted, aggregated and combined with other datasets.
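For readers who want to see what “sorted, aggregated and combined” means in practice, here is a minimal Python sketch of those three operations. It treats each spreadsheet row as a dictionary. The function names and the field names used with them are hypothetical and chosen only for illustration; a spreadsheet program does the same job without any code.

```python
from collections import defaultdict


def sort_rows(rows, field, descending=True):
    """Sort rows on a numeric field (largest first by default)."""
    return sorted(rows, key=lambda row: row[field], reverse=descending)


def aggregate(rows, group_field, value_field):
    """Total value_field for each distinct value of group_field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def combine(rows, lookup, key_field):
    """Join each row with a matching record from a second dataset.

    `lookup` maps a key (for example a region name) to a dict of extra
    fields. Rows with no match are kept without the extra fields, so
    missing matches can still be spotted later.
    """
    combined = []
    for row in rows:
        extra = lookup.get(row[key_field], {})
        combined.append({**row, **extra})
    return combined
```

For example, aggregate(rows, "region", "value") would give one total per region, and combine could then attach population figures from a second dataset so those totals can be turned into per-capita rates.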
What data journalism does is take advantage of two technological shifts that have opened up that world of structured information for reporters.
The first change has been the ever-growing body of publicly available sets of structured data that journalists can explore, cross reference, mash together and manipulate.
The federal government’s data.gov.au website alone lists about 1100 datasets that various agencies have released.
In many cases the data comes as a spreadsheet that can be explored in Microsoft Excel.
Working with spreadsheets and structured data will be familiar ground to many business reporters, who have been data journalists for years, manipulating aggregated stockmarket information, economic indicators and survey results to produce stories.
The second change has been a steady increase in the number of tools that non-experts can use to explore and visualise the datasets.
Tools such as Google Fusion Tables, GeoCommons and Tableau let journalists map and chart data to analyse it in ways previously available only to experts.
The overall process of data journalism is similar to what we would normally do as journalists – with a few extra steps along the way.
1. Get the data
This can be as easy as going to a government website such as the Australian Bureau of Statistics, tenders.gov.au or BOCSAR (the Bureau of Crime Statistics and Research) and downloading a spreadsheet. Or it can be as difficult as manually entering information that a government website has published as a PDF of a photocopied page.
A story I worked on with The Australian Financial Review’s accounting reporter, Agnes King, about auditor independence turned on the definition of audit and non-audit work. The only way to get the figures was to go through each annual report and record the data manually.
Often, being a data journalist really means being a data entry journalist.
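Whether a spreadsheet was downloaded or typed in by hand, it can help to save it as CSV and load it in a way that flags bad cells instead of quietly hiding them. The sketch below uses Python’s standard csv module. The function name load_csv and the choice to strip thousands separators are assumptions made for illustration, not a fixed recipe.

```python
import csv


def load_csv(path, numeric_fields):
    """Read a CSV exported from a spreadsheet into a list of dicts.

    Values in `numeric_fields` are converted to floats after removing
    thousands separators such as "1,200". Cells that cannot be converted
    are set to None and recorded as (line_number, field, raw_value) so
    they can be checked by hand.
    """
    rows, problems = [], []
    # "utf-8-sig" also copes with the byte-order mark some spreadsheet
    # programs write at the start of a UTF-8 CSV file.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        for row in reader:
            for field in numeric_fields:
                raw = row.get(field) or ""  # a missing or short row becomes ""
                try:
                    row[field] = float(raw.replace(",", "").strip())
                except ValueError:
                    # reader.line_num is the source line just read
                    problems.append((reader.line_num, field, raw))
                    row[field] = None
            rows.append(row)
    return rows, problems
```

An empty problems list does not prove the data entry was right, but a non-empty one points straight at the cells worth rechecking against the original document.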
2. Analyse the data
This is where you “interview” the data to see if it has anything interesting to say. It can involve finding the aggregates of the information, ranking the fields or visualising the data.
We usually use Excel, OpenOffice Calc or Google Spreadsheets to find the key features of the data, such as averages and totals, and to chart it. If we need more complex visualisations, especially if there is a geographic element involved, we use GeoCommons, Google Fusion Tables or Tableau to map the data.
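The totals, averages and rankings a spreadsheet gives you can also be computed in a few lines of code, which makes the steps easy to rerun when the data changes. Here is a minimal Python sketch using only the standard library. The function names are hypothetical, and the median is included as an extra because a handful of outliers can drag an average a long way.

```python
from statistics import mean, median


def summarise(rows, field):
    """Basic features of one numeric column, ignoring blank (None) cells.

    Raises statistics.StatisticsError if the column has no values.
    """
    values = [row[field] for row in rows if row[field] is not None]
    return {
        "count": len(values),
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
    }


def top_n(rows, field, n=10):
    """Rank rows on a numeric field and return the n highest."""
    ranked = [row for row in rows if row[field] is not None]
    ranked.sort(key=lambda row: row[field], reverse=True)
    return ranked[:n]
```

Reporting the count alongside the total makes it obvious when blank cells have been left out of the calculation.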
Another story I worked on at the AFR involved looking for the suburbs with the highest percentage of unoccupied properties across the country.
The lightbulb moment came when I mapped the data using Google Fusion Tables.
It clearly showed that the most unoccupied areas tended to be those coastal regions popular with sea-changers.
It was like getting a good tip-off.
A few calls later Ben Hurley, one of our property journalists, had a story that showed that many sea-changers were heading back to the city because the coastal towns were too far away from family and friends and lacked many of the services they needed as they got older.
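The key step in an analysis like that is turning raw counts into a rate, so that large suburbs don’t top the list simply because they are large. Here is a minimal Python sketch that assumes each suburb row has hypothetical dwellings and unoccupied fields. The 100-dwelling cut-off is an arbitrary assumption, meant to stop tiny places with a few empty houses from producing misleadingly extreme percentages.

```python
def share_unoccupied(rows, min_dwellings=100):
    """Add a percentage-unoccupied field and rank suburbs by it.

    Suburbs with fewer than `min_dwellings` dwellings (or none at all)
    are skipped, because small denominators give unstable percentages.
    """
    ranked = []
    for row in rows:
        total = row["dwellings"]
        if total == 0 or total < min_dwellings:
            continue
        pct = 100 * row["unoccupied"] / total
        ranked.append({**row, "pct_unoccupied": round(pct, 1)})
    ranked.sort(key=lambda row: row["pct_unoccupied"], reverse=True)
    return ranked
```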
3. Audit the data
This is the most painful part of the process, but it is critical, because one wrong cell or figure can make everything you have produced worthless.
I often redo my analysis a few times to make sure the end result tallies up, and talk with internal and external experts about how I went about my calculations to try to find a flaw in the process.
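Part of that rechecking can be scripted so it runs every time the analysis is redone. The Python sketch below is one possible set of sanity checks, not a complete audit. It assumes rows are dictionaries with a field that should be unique (such as a company or suburb name) and a numeric value field. The published-total comparison only applies when the source reports a total, and the 0.5 tolerance for rounding is an assumption.

```python
from collections import Counter


def audit(rows, key_field, value_field, published_total=None, tolerance=0.5):
    """Run simple sanity checks and return a list of problems found.

    An empty list means these particular checks passed; it does not
    prove the data is right.
    """
    problems = []
    # 1. The same record entered twice is a classic data-entry slip.
    counts = Counter(row[key_field] for row in rows)
    for key, n in counts.items():
        if n > 1:
            problems.append(f"{key!r} appears {n} times")
    # 2. Blank or unconverted cells silently shrink totals.
    missing = [row[key_field] for row in rows if row[value_field] is None]
    if missing:
        problems.append(f"missing {value_field} for {missing}")
    # 3. Does our total tally with the figure the source publishes?
    if published_total is not None:
        total = sum(row[value_field] for row in rows if row[value_field] is not None)
        if abs(total - published_total) > tolerance:
            problems.append(f"total {total} differs from published {published_total}")
    return problems
```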
4. Report
Like any story, there is often no correlation between the amount of effort that goes into getting and analysing the data and the value of the results. Most often, the analysis simply provides a pointer in the direction the reporter should head. But there are also times when hours, days and even weeks of analytical work produce… very little that is newsworthy.
I’ve carried out many Census data crunches that led to conclusions that were completely obvious or already known.
The challenge there was to take a breath, let it go and move on to the next story.
Often the information became useful in another context as part of a series of charts building an argument or narrative.
5. Deliver the data
The most straightforward approach is to simply do it the old-fashioned way: write a story, create a graphic and commission a relevant photo.
A recent analysis I did of detailed employment data centred on two charts showing where jobs were being created and where they were being lost.
In this case, the charts were the heart of the story.
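Charts like those rest on a simple calculation: the change in each category between two periods, split into gains and losses. Here is a minimal Python sketch that assumes the job counts for each period are held in a dictionary keyed by a hypothetical category, such as an industry or region. It illustrates the idea and is not the method behind the analysis described above.

```python
def job_changes(before, after):
    """Compare employment counts for two periods.

    Returns two lists of (category, change) tuples: where jobs were
    created (largest gain first) and where they were lost (largest
    loss first). A category present in only one period is treated as
    zero in the other; that may reflect a reclassification rather than
    real jobs, so such cases deserve a manual check.
    """
    changes = {
        category: after.get(category, 0) - before.get(category, 0)
        for category in set(before) | set(after)
    }
    gains = sorted(
        ((c, d) for c, d in changes.items() if d > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    losses = sorted(
        ((c, d) for c, d in changes.items() if d < 0),
        key=lambda item: item[1],
    )
    return gains, losses
```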
These days there is also the opportunity to tell stories in a number of more exciting ways. You can provide the primary documents, publish interactive graphics, or publish interactive tools that give the readers the opportunity to explore the underlying information in your story.
This matters because the outlier information – the unusual or unexpected fact – is often the core of the story, whereas readers are usually more interested in the data that directly affects their own lives.
A story I did with another AFR writer, Katie Walsh, focused on the suburbs with the highest number of rich singles (the top suburb was Mosman in Sydney, if you must know). We published the full set of data.
This gave readers the critical bit of information: which suburbs near them had the highest number of rich singles. After all, it’s only fair that readers get to do their own bit of data journalism as well.
DIY DATA MINING
Tools
GeoCommons: geocommons.com
Google Fusion Tables: google.com/fusiontables
Google Spreadsheets: google.com/google-d-s/spreadsheets
Microsoft Excel: office.microsoft.com/en-au/excel/
OpenOffice Calc: openoffice.org
Tableau: tableausoftware.com
Books
The Guardian: Facts are Sacred: The power of data by Simon Rogers (Guardian Shorts), Kindle edition US$2.99 on amazon.com
The Data Journalism Handbook by Jonathan Gray, Lucy Chambers and Liliana Bounegru (O’Reilly Media), Kindle edition US$9.99 on amazon.com
Web resources
Online journalism blog: “The inverted pyramid of data journalism”
onlinejournalismblog.com/2011/07/07/the-inverted-pyramid-of-data-journalism/
Poynter: “Using data visualization as a reporting tool can reveal story’s shape”
www.poynter.org/latest-news/top-stories/95154/using-data-visualization-as-a-reporting-tool-can-reveal-storys-shape/
journalism.co.uk: “How to: get to grips with data journalism”
www.journalism.co.uk/skills/how-to-get-to-grips-with-data-journalism/s7/a542402/
First published in WALKLEY MAGAZINE #73