An occasional blog by Alastair Otter, writer, editor, hacker, all-round geek. Topics are likely to include media, data journalism, gadgets and software. Follow me on Twitter for more: @alastairotter

Hacking education data: a school finder

Earlier this year, as part of another project, I downloaded the South African education department’s list of schools. At the same time I had a copy of the 2013 matric results for all public schools (at that stage in PDF format). It occurred to me it might be worth trying to put the two together to create some sort of schools information tool.

Of course, like most things, that is easier said than done. The first major hurdle was getting the matric results out of hundreds of heavily designed PDF pages. Pretty much nothing was up to the task, although Tabula did the best of the lot. In the end I gave up on extracting the data myself and went straight to the education department. It took a day or two but eventually I had a copy of the 2013 results.

With two handy spreadsheets in hand it seemed a simple task to join them together (they both had school IDs which ought to match). Again, not so simple. There were lots of discrepancies between the two lists and they were in serious need of cleaning (OpenRefine to the rescue).
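
For anyone curious what that kind of join looks like in code, here is a minimal sketch in Python. It is not how I did it (I used OpenRefine), and the column names, school IDs and sample rows are all made up for illustration. It tidies the IDs, joins the two lists on them, and reports which IDs appear in only one list.

```python
import csv
import io

# Hypothetical sample data: the real column names and IDs will differ.
SCHOOLS_CSV = """school_id,name,province
100001,Example High School,Gauteng
100002,Sample Secondary School,Limpopo
100003,Demo College,Western Cape
"""

RESULTS_CSV = """school_id,wrote,passed
100001,120,96
 100002 ,80,40
100009,50,45
"""


def load_by_id(text, id_field="school_id"):
    """Read CSV text into a dict keyed by the whitespace-stripped school ID."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        clean = {key: value.strip() for key, value in row.items()}
        rows[clean[id_field]] = clean
    return rows


def join_schools(schools, results):
    """Return (joined rows, IDs only in schools, IDs only in results)."""
    joined = []
    for school_id in sorted(schools.keys() & results.keys()):
        merged = dict(schools[school_id])
        merged.update(results[school_id])
        joined.append(merged)
    only_schools = sorted(schools.keys() - results.keys())
    only_results = sorted(results.keys() - schools.keys())
    return joined, only_schools, only_results


schools = load_by_id(SCHOOLS_CSV)
results = load_by_id(RESULTS_CSV)
joined, only_schools, only_results = join_schools(schools, results)

print(len(joined), "schools matched")
print("No results for:", only_schools)
print("No school record for:", only_results)
```

The two "only in one list" reports are the useful part: they are the rows that need checking by hand before the joined table can be trusted.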

Because I was doing this as a side project I spread the work over a couple of weeks, picking it up when I had some spare time. Verifying the joined tables was pretty time consuming, as was getting the data into a format that I could insert into a MySQL database. But eventually it was done and I could start building a front end to it.

The current version is a relatively simple School Finder. It includes all of the schools info I could get my hands on, for both public and private schools. For now it only includes the matric results for public schools. I am now looking to see how I can add in private schools data.

The final product is built using PHP, MySQL, jQuery and Google Maps. The maps are static for now because I had some issues using the interactive version with a large number of data points.

Instagramming the city

As part of some testing earlier today I was playing around with Instagram. I’ve never been a heavy user of Instagram but I was looking at setting up an account for our newsroom. Along the way I got distracted by some of the great pictures I found. To test Instagram’s embed capabilities I put together a small gallery of photos of Johannesburg, one of my favourite photographic subjects. I even indulgently included one of my own photos. Click on my name if you want to follow me.


Easy map visualisations with Mapstarter

In the world of data visualisation d3.js is the toolset of choice for most interactive journalists. But D3 also comes with a steep learning curve that makes it relatively inaccessible to the average journalist with just a small amount of coding experience.

I’ve done some very basic D3 stuff in the past but it took absolutely ages to get it working properly and I broke the visualisations more often than I improved them. And I certainly never got close to making anything that even resembled a map visualisation in all the time I spent trying to learn D3. So Mapstarter is something of a blessing.

Mapstarter makes it simple to create a basic interactive map from shapefiles as well as GeoJSON and TopoJSON files. And even if you do know how to programme your own D3 map, Mapstarter speeds up the time to get a map from concept to reality.

I heard about Mapstarter a couple of days ago (ht: @siyafrica) and decided to give it a spin. I downloaded a shapefile from the South African Demarcation Board website (in this case an election ward map for Johannesburg) and 10 minutes later I had a functioning map of the number of registered voters in each ward.

Mapstarter is literally that, a “starter”. Once you have the map created you can tweak the information and styles and, if you know some programming, you can build out even more impressive map visualisations.

The original map I made popped up the number of voters in each ward when you hovered over it, but it was extremely basic. So I opened up the shapefile’s database file in LibreOffice, added another column for label text and rebuilt the map. This version is still very basic but at least you can tell what the numbers represent.
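
If your map data is in GeoJSON rather than a shapefile, the same label column can be added with a few lines of code. This is only a sketch of the idea and not what I did (I edited the file by hand in LibreOffice). The property names WARD_ID, VOTERS and LABEL and the sample values are assumptions made up for the example.

```python
import json

# Hypothetical GeoJSON: property names and values are made up for illustration.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"WARD_ID": "W001", "VOTERS": 12345},
     "geometry": null},
    {"type": "Feature",
     "properties": {"WARD_ID": "W002", "VOTERS": 9876},
     "geometry": null}
  ]
}
"""


def add_labels(collection, count_field="VOTERS", label_field="LABEL"):
    """Add a human-readable label property to every feature."""
    for feature in collection["features"]:
        props = feature["properties"]
        props[label_field] = "{:,} registered voters".format(props[count_field])
    return collection


wards = add_labels(json.loads(SAMPLE_GEOJSON))
for feature in wards["features"]:
    print(feature["properties"]["WARD_ID"], "-", feature["properties"]["LABEL"])
```

The labelled collection can then be written back out with json.dumps and used to rebuild the map.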

Once the map is created you can download it as an SVG or image file, which you could use in Illustrator as the basis for an illustration, or you can download the code and include that in your website. That’s what I’ve done above.


Learning curves, and graphs

I’m rather pleased with this interactive graph, not because it’s particularly good but because it’s my first foray into D3.js.

Over the past few months I’ve been doing a lot of data-based work and, while there are some excellent data visualisation tools available on the web, I’m always frustrated because I don’t have complete control over the final product. So this weekend I set out to learn D3.js. For those not familiar with it, D3 is a JavaScript library, similar to jQuery but specialising in data-driven visual effects.

So, after spending most of my long weekend glued to my computer I can finally show off something that I wrote from scratch. In all honesty it’s not particularly wonderful and not as attractive as some of the other graphs I’ve done using free tools on the web but it is a start.

If for some reason you decide to try to learn D3.js, do yourself a favour and start with this excellent tutorial by Scott Murray: Interactive Data Visualization. If it weren’t for this introduction I’m pretty sure I wouldn’t be anywhere close to where I have managed to get to.

Nkandla vs. Oscar Pistorius in tweets

Earlier this week I put together a small set of scripts to track the amount of attention the Oscar Pistorius trial was getting on Twitter. With not only local but also international audiences keen to follow the murder trial of the celebrity athlete, it wasn’t surprising that Oscar was big on the social network. As you can see from this, the Twitter activity around the Oscar trial regularly topped 2,000 tweets an hour while court was in session and on Wednesday peaked at just under 4,000 an hour.

So, when on Wednesday the Public Protector Thuli Madonsela finally released her investigative report into Jacob Zuma’s Nkandla residence I decided to do the same to see if people cared as much about the allegations against the president as they did about the gory details of a celebrity murder trial.

The good news is that they do, and significantly so (hover over chart for details):

Using the exact same method as I used with the Oscar Pistorius trial, I tracked mentions of Nkandla on Twitter from 10am on the morning of the announcement until 1pm the following day. As we can see there were just over 500 Nkandla-related tweets an hour at 12pm, half an hour before the announcement, but that quickly spiked to just over 6,000 an hour by 2pm, just over an hour into the announcement.
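
The counting itself is simple once the tweets are collected. Here is a minimal sketch in Python of how tweets-per-hour figures like these can be produced. It is an illustration rather than my actual scripts, and the timestamps and their format are made-up sample data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet timestamps: a real run would read these from collected tweets.
SAMPLE_TIMESTAMPS = [
    "2014-01-01 11:58:03",
    "2014-01-01 12:05:41",
    "2014-01-01 12:47:10",
    "2014-01-01 13:02:55",
    "2014-01-01 13:15:20",
    "2014-01-01 13:59:59",
]


def tweets_per_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count tweets in each clock hour, returned in chronological order."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, fmt)
        # Truncate to the start of the hour so all tweets in that hour share a key.
        hour = moment.replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
    return dict(sorted(counts.items()))


hourly = tweets_per_hour(SAMPLE_TIMESTAMPS)
for hour, count in hourly.items():
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```

Each hour's total then becomes one point on the chart.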

Oscar Pistorius: day 11 morning session in words

We’re into the 11th day of the Oscar Pistorius murder trial and during a particularly long, drawn-out pre-lunch session I was messing around with a large collection of tweets that I had collected from the morning’s session. What to do with almost 10,000 tweets but make a word cloud? So here it is: a quick word cloud generated from 9,996 tweets over the course of the morning’s session.


It’s worth mentioning that the word cloud was created with the excellent Wordle word cloud generator. The tweets that were used in this were collected automatically using a series of scripts I wrote over the weekend. But more on that later.
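
Under the hood a word cloud is just a word count with bigger type for bigger numbers. Here is a rough Python sketch of that counting step. It is not how Wordle works internally and not the scripts I used. The sample tweets, the hashtag and the stopword list are made up for illustration.

```python
import re
from collections import Counter

# A tiny, assumed stopword list: a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "rt", "on", "at", "for"}

# Hypothetical sample tweets, not taken from the real collection.
SAMPLE_TWEETS = [
    "RT @example: The court is back in session #OscarTrial",
    "Court adjourns for lunch #OscarTrial",
    "The witness returns to the stand after lunch",
]


def word_frequencies(tweets, stopwords=STOPWORDS):
    """Count words across tweets, ignoring links, @mentions and stopwords."""
    counts = Counter()
    for tweet in tweets:
        # Blank out URLs and @mentions before splitting into words.
        text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in stopwords and len(w) > 2)
    return counts


frequencies = word_frequencies(SAMPLE_TWEETS)
for word, count in frequencies.most_common(5):
    print(word, count)
```

The resulting counts are what a word cloud generator scales its type sizes by.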