Nkandla vs. Oscar Pistorius in tweets

Earlier this week I put together a small set of scripts to track the amount of attention the Oscar Pistorius trial was getting on Twitter. With not only local but also international audiences keen to follow the murder trial of the celebrity athlete, it wasn’t surprising that Oscar was big on the social network. As the chart shows, Twitter activity around the Oscar trial regularly topped 2,000 tweets an hour while court was in session, and on Wednesday peaked at just under 4,000 an hour.

So, when the Public Protector Thuli Madonsela finally released her investigative report into Jacob Zuma’s Nkandla residence on Wednesday, I decided to do the same to see whether people cared as much about the allegations against the president as they did about the gory details of a celebrity murder trial.

The good news is that they do, and significantly so (hover over chart for details):

Using exactly the same method as with the Oscar Pistorius trial, I tracked mentions of Nkandla on Twitter from 10am on the morning of the announcement until 1pm the following day. There were just over 500 Nkandla-related tweets an hour at 12pm, half an hour before the announcement, but that quickly spiked to just over 6,000 an hour by 2pm, just over an hour into the announcement.
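The scripts themselves aren’t shown in this post, but the counting step is straightforward. A minimal sketch, assuming each collected tweet carries an ISO-format timestamp string (the sample data and function name are my own, not from the original scripts):

```python
from collections import Counter
from datetime import datetime

def tweets_per_hour(timestamps):
    """Bucket ISO-format tweet timestamps into hourly counts."""
    counts = Counter()
    for ts in timestamps:
        # Truncate each timestamp to the top of its hour
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
    return counts

# Three hypothetical tweets, two of them in the same hour
sample = ["2014-03-19 13:05:00", "2014-03-19 13:42:10", "2014-03-19 14:01:00"]
print(tweets_per_hour(sample))
```

Feeding the hourly totals into a charting library is all that remains to reproduce a graph like the one above.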

Oscar Pistorius: day 11 morning session in words

We’re into the 11th day of the Oscar Pistorius murder trial, and during a particularly long, drawn-out pre-lunch session I was messing around with a large collection of tweets I had gathered from the morning’s session. What to do with almost 10,000 tweets but make a word cloud? So here it is: a quick word cloud generated from 9,996 tweets collected over the course of the morning.


It’s worth mentioning that the word cloud was created with the excellent Wordle word cloud generator. The tweets used here were collected automatically by a series of scripts I wrote over the weekend. But more on that later.
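Wordle does the drawing, but the word-frequency counting it relies on is easy to reproduce. A hypothetical sketch in Python — the stopword list and cleanup rules below are my own assumptions, not what Wordle actually uses:

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "rt", "is", "it"}

def word_frequencies(tweets):
    """Count word frequencies across tweets, ignoring case,
    @mentions, URLs and common stopwords."""
    counts = Counter()
    for tweet in tweets:
        # Strip URLs and @mentions before tokenising
        tweet = re.sub(r"https?://\S+|@\w+", "", tweet.lower())
        for word in re.findall(r"[a-z']+", tweet):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

# Hypothetical sample tweets
sample = ["Nel questions Oscar @OscarHardTruth", "RT Oscar breaks down in court"]
print(word_frequencies(sample).most_common(3))
```

The resulting frequency table is exactly the kind of input a cloud generator scales its words by.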

Setting PDF data free with Tabula


This week I spent some time working on an education project I have been dabbling with for a few weeks. Inevitably I hit the PDF wall: the only place to get the extra data I needed to move the project forward was locked in a 276-page PDF. Even worse, the tables I wanted to pull out of the PDF were huge and spanned most of those 276 pages.

Split the PDF
Fortunately the tables were neatly divided into sections, so the first thing I did was split the PDF up into smaller chunks. The easiest way to do this was to open the PDF in the Chrome browser and then “print” the pages I wanted to a new PDF. It’s pretty painless: open the PDF in Chrome, hover over the bottom right of the screen and select the printer icon. In the print dialog, switch from your standard printer to PDF and select the pages you want to print.

Convert to CSV
Next I tried every online PDF-to-Excel converter I could find. Each one failed, perhaps because the files were still fairly large, or because of the awkward layout, which had the column header text rotated 90 degrees. Either way it was going nowhere, so I decided to give Tabula a try.

Tabula is a free, Java-based program designed specifically to “liberate data” stuck in PDFs. It’s pretty simple to install and it runs in your browser, a lot like OpenRefine.

Select and download
Once Tabula is open in your browser you need to find the PDF you want to extract data from. Upload the file and it opens in the editing window. Next, drag a rectangle around the table you want. If the table continues on the next page(s), click the “Repeat Selection” button and the selection area is duplicated across the following pages.

Scrolling down, you can review the selections: move them if they’re slightly off, resize them, or remove certain pages from the selection altogether. When you’re done, click the download button and save the content as a CSV or TSV file.

My selections covered dozens of pages at a time, but Tabula did an excellent job. There were a few minor formatting issues with the final files, but given that I was working with 276 pages and more than 6,000 lines of data in the final document, they were relatively insignificant.
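The post doesn’t show how those formatting issues were fixed, but if — as is common with Tabula output — the debris is mostly stray whitespace and blank rows, a hypothetical cleanup pass could be as simple as:

```python
import csv
import io

def clean_csv(text):
    """Strip stray whitespace from cells and drop rows that are
    entirely empty -- typical minor debris in extracted tables."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        if any(cells):  # keep only rows with at least one non-empty cell
            rows.append(cells)
    return rows

# Hypothetical messy extract: padded cells and an empty row
raw = "School , Learners \n,\n A. School , 1200 \n"
print(clean_csv(raw))
```

For anything messier than this — merged cells, rotated headers — a tool like OpenRefine is a better fit.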


SA Hacks List gets an update


A couple of years ago I started a small project to collect the Twitter handles of South African journalists. That first iteration was done using a Google spreadsheet, but it quickly got complicated. The second version was built using a database, some PHP code and the Twitter API, which worked nicely until Twitter updated its API to require OAuth for authorisation. I didn’t have the time to redo the code when this happened, so the Hacks List languished for a while.
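The rewritten code isn’t shown in this post, but the OAuth change that broke version one boils down to request signing: every API call must carry an OAuth 1.0a HMAC-SHA1 signature (RFC 5849). A minimal sketch of that signing step, with placeholder credentials and a deliberately trimmed-down parameter set (a real request also needs `oauth_consumer_key`, `oauth_timestamp` and friends):

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def oauth1_signature(method, url, params, consumer_secret, token_secret):
    """Build an OAuth 1.0a HMAC-SHA1 signature (RFC 5849)."""
    # RFC 5849 percent-encoding: only unreserved characters are safe
    enc = lambda s: quote(s, safe="-._~")
    # Parameters are sorted, encoded and joined into one string
    param_str = "&".join(f"{enc(k)}={enc(v)}" for k, v in sorted(params.items()))
    # Signature base string: METHOD & encoded-URL & encoded-params
    base = "&".join([method.upper(), enc(url), enc(param_str)])
    # Signing key: consumer secret & token secret
    key = f"{enc(consumer_secret)}&{enc(token_secret)}".encode()
    digest = hmac.new(key, base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Placeholder credentials -- real values come from a registered Twitter app
sig = oauth1_signature(
    "GET", "https://api.twitter.com/1.1/statuses/user_timeline.json",
    {"screen_name": "alastairotter", "oauth_nonce": "abc123"},
    "CONSUMER_SECRET", "TOKEN_SECRET",
)
print(sig)
```

In practice a library such as a Twitter client with OAuth support handles all of this; the sketch just shows why the old unauthenticated calls stopped working.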

Until now. The new Hacks List (v2.0) is a complete rewrite of the code. It is still being developed (when I get the time) and is missing a few features, but overall I think it works okay. Take a look and let me know what you think (@alastairotter).

Hacks List 1.0 is still in place for now (even in its broken form) but I will be replacing that shortly.

Re-inventing the newsroom

I was fortunate enough to spend part of Tuesday listening to David Boardman (@dlboardman), the former editor of the Seattle Times, talk about how his newspaper had re-invented itself to take on the significant challenges the newsprint industry is facing. It is a success story of note, especially as the only other daily newspaper in Seattle closed down in 2009, while the Seattle Times not only survived but went on to win Pulitzer Prizes for its journalism.

Listening to Boardman, it was apparent that the changes made at the Times weren’t small and cosmetic but large, sweeping and touching every aspect of the business. There were many highlights, but the most notable for me were the following:

  • Establishing the mission statement of the Seattle Times as it looked to reinvent itself was a process that involved all staff.
  • There were cultural challenges getting staff to see technology as tools for their jobs and not just as delivery systems.
  • There were challenges in breaking down the barriers between business and news.
  • There were cultural challenges in getting everyone to see news “as a conversation”.
  • The focus was on getting “feet on the street” doing stories and reducing the numbers of staff just processing copy.
  • The number of copy-editors was reduced significantly but other new roles (multimedia etc) expanded.
  • The newsroom was re-organised into three areas: creation, curation and community. The creation team was about generating content, including words, pictures, graphics, videos. The curation team about editing, design, prioritising news. The community team focused on community outreach projects and bringing the community into the newsroom.
  • All staff were trained in new technologies such as SEO, analytics and social media. What was most notable was that all staff underwent this training, not just a handful of “digital” people.
  • Staff were trained in social media and editors were made responsible for ensuring reporters did use social media.
  • The Seattle Times partnered with local bloggers. They started with five blogs and eventually grew that to around 60. Articles from these blogs were promoted on the Times’ front page and drove traffic to the bloggers. Boardman said that this was initially a hard sell because it seemed counter-intuitive to drive traffic away from the Seattle Times’ website. But, he said, in practice pushing readers to high-quality content meant they would come back for more.
  • The new media world is not about being “platform agnostic” but about taking advantage of the opportunities each new platform offers.
  • Perhaps the most interesting project instituted as part of the makeover was the idea of bringing community members into the newsroom. Every Wednesday a member of the community, from CEOs to activists to religious representatives, is invited to the planning meeting. At that meeting the first 30 minutes are for the diary meeting. The second half of the hour is allocated to the community member to talk about what they are interested in.

Boardman concluded with perhaps the most striking example of how the newsroom had been re-skilled and re-energised. The promotional video below was made by journalists in the newsroom and not by a specialist marketing agency.


Boardman was speaking at the Future of News colloquium organised by the Wits School of Journalism. One of the other presentations, by Jos Kuper, on news-reading trends among the youth, has been neatly captured by Anton Harber.

Media tools: ThingLink makes images interactive

Static pictures on the web are *so* yesterday. Today’s readers want information, interaction and loads of stuff to click on. And today’s online media tool of the week does exactly that: it adds life to otherwise static pictures.

ThingLink makes it simple to add clickable hotspots to any image. The hotspots can include pop-up video links, pop-up descriptions or even links to other sites.

It really is simple. Create an account at ThingLink, upload a picture and add the links you want by clicking on the picture. Give each one a description, a link if desired, choose an icon and save. Audio and video links are automatically recognised and a pop-up viewer is added to the image. You can also search for related audio and video links in the left-hand bar of the editing window.

When you’re finished the image can be shared on a range of sites and services such as Google+, Twitter and Facebook, as well as being embeddable in most other sites. On platforms such as WordPress you’ll need the ThingLink plugin to get it all working.

Take a look at the image below. It’s a sample I knocked up in just a couple of minutes. It’s probably not the best illustration of ThingLink around but it does show the basics. For some infinitely better examples of ThingLink in action take a look at the ThingLink featured page.

Have you used ThingLink on your site? Let me know what you think of it.