
Mapping postcodes with ScraperWiki and the Google Maps API

After my last, film and videogame related post, we’re back to familiar ground to look at the ace mashup/data processing tool ScraperWiki. ScraperWiki is a great little online tool with a very specific focus: extracting information from webpages, files, you name it. It has a database you can use to store your data, and it generates URIs you can point people at to retrieve that data. One of the really cool things you can do is take the data you’ve “scraped” and do “things” with it through the views functionality. There is a choice of three programming languages, depending on your knowledge (Python, Ruby and PHP), with a number of libraries available for each.

Playing around with this last month, I used it to rebuild some broken RSS feeds on the City of Lincoln Council website that we had previously been running through Yahoo! Pipes. The process, once I’d learned the foibles of ScraperWiki by working through a few examples, was really painless and helped me gain a better understanding of writing in Python, which seems to be the best way of getting data off the page and into the ScraperWiki datastore.
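
For anyone who hasn’t seen one, the basic scraper pattern is only a few lines. Here’s a minimal Python sketch (the URL and CSS selector are invented for illustration, not the council’s actual pages):

    import scraperwiki
    import lxml.html

    # Grab the page and parse it (URL and selector are illustrative only)
    html = scraperwiki.scrape("http://www.example.gov.uk/news")
    root = lxml.html.fromstring(html)

    for link in root.cssselect("div.news-item a"):
        # Save each headline and URL into the ScraperWiki datastore;
        # a view can then re-serve the stored rows as an RSS feed
        scraperwiki.sqlite.save(
            unique_keys=["url"],
            data={"url": link.get("href"), "title": link.text_content().strip()})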

After the RSS feed I wanted a new challenge, and had an idea. I’d been playing with the Google Maps API on my local development machine for a while, working with a few functions that would take a UK postcode and convert it to long/lat for displaying on a map. While this was quite fun, I didn’t really have an application for it, nor anywhere to host one. So I thought I could use the views in ScraperWiki. I had a number of datasets for the Lincoln Decent Homes Scheme (you can find out more about that here) which I thought would be great to visualise on the website as a map, allowing people to find their area and expand it to see what work has been done to council properties. Previously this had all been published as tables in PDF files.
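
The postcode-to-long/lat idea is simple enough to sketch against Google’s geocoding web service (a minimal Python sketch, not the code I was actually playing with; the postcode is just an example):

    import json
    import urllib
    import urllib2

    def postcode_to_latlng(postcode):
        # Ask Google's geocoding web service where a UK postcode is;
        # returns (lat, lng), or None if nothing was found
        params = urllib.urlencode({"address": postcode + ", UK", "sensor": "false"})
        url = "http://maps.googleapis.com/maps/api/geocode/json?" + params
        result = json.loads(urllib2.urlopen(url).read())
        if result["status"] != "OK":
            return None
        location = result["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]

    print postcode_to_latlng("LN1 1AA")  # roughly (53.2, -0.5) - central Lincoln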

Getting the data was fairly straightforward; I simply uploaded the eight CSV files to our server to serve as the raw data. As I would need to combine each property line under one postcode, I set about writing a scraper for these CSVs. You can find the code here (written in Python; there are some comments so I’m not going to go into detail!) and the output here. What this basically does is read each line of the eight CSV files and, where it finds a matching postcode, combine the informative bits (properties, work done) into an HTML table – this is used in the eventual map. It takes a good few seconds to run as it has a ton of data to process, but it goes to show how robust ScraperWiki is.
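
Stripped right down, the shape of the scraper is something like this (a simplified Python sketch – the file URLs and three-column layout are invented; the real code is linked above):

    import csv
    import urllib2
    import scraperwiki

    # Illustrative URLs - the real scraper reads eight CSV files from our server
    CSV_URLS = ["http://www.example.gov.uk/data/decent-homes-%d.csv" % i
                for i in range(1, 9)]

    by_postcode = {}
    for url in CSV_URLS:
        for postcode, address, work_done in csv.reader(urllib2.urlopen(url)):
            # Gather every property line under its postcode
            by_postcode.setdefault(postcode, []).append((address, work_done))

    for postcode, properties in by_postcode.items():
        # One HTML table per postcode, ready to drop into the map's popups
        table = ("<table>" +
                 "".join("<tr><td>%s</td><td>%s</td></tr>" % p for p in properties) +
                 "</table>")
        scraperwiki.sqlite.save(unique_keys=["postcode"],
                                data={"postcode": postcode, "details": table})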

Once I’d got my data I needed to turn it into the map. I created a new view based on the data, coded in PHP so that I could, effectively, create a new web page on the ScraperWiki framework. I used a hybrid of PHP and the JavaScript libraries I had been playing with to build a page which chucks all the ScraperWiki data into an array, calls the Google Maps API and then iterates through the array, creating points on the map. There’s also a bit of CSS to help style the tables and the map in general. You can find the code here, and the map here. Again, it’s a hefty chunk of code and the map can take a few seconds to render, but it’s worth it.
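
The real view is PHP, but the idea sketches neatly in Python too (assumptions here: the scraper also stored a lat/lng for each postcode, and the scraper name is made up):

    import json
    import scraperwiki

    # Pull the scraped rows into the view (scraper name is illustrative)
    scraperwiki.sqlite.attach("lincoln_decent_homes")
    rows = scraperwiki.sqlite.select("postcode, lat, lng, details from swdata")

    # One addMarker() call per postcode; json.dumps makes the HTML JS-safe
    markers = "\n".join(
        "addMarker(%s, %s, %s);" % (r["lat"], r["lng"], json.dumps(r["details"]))
        for r in rows)

    print """<html><head>
    <script src="http://maps.google.com/maps/api/js?sensor=false"></script>
    <script>
    var map;
    function addMarker(lat, lng, html) {
      var marker = new google.maps.Marker({
        position: new google.maps.LatLng(lat, lng), map: map});
      google.maps.event.addListener(marker, 'click', function() {
        new google.maps.InfoWindow({content: html}).open(map, marker);
      });
    }
    function init() {
      map = new google.maps.Map(document.getElementById('map'), {
        center: new google.maps.LatLng(53.23, -0.54), zoom: 13,
        mapTypeId: google.maps.MapTypeId.ROADMAP});
    %s
    }
    </script></head>
    <body onload="init()"><div id="map" style="width:100%%;height:500px"></div>
    </body></html>""" % markers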

All in all, this took me about two days to put together, and a lot of that was learning and troubleshooting. What we now have, however, is a very easy way to keep this data up to date. Instead of having to compile new PDF files and alter links on the website, all we do is update the master CSV files and everything else falls into place. This is open data in its purest form: simple source files feeding into automated systems which maintain themselves.

If anyone wants to go into more detail about how I did this, simply hit up my contact form and drop me a line!

Mapped videos with YouTube, Google Docs and Fusion Tables

Yesterday I attended the Hyperlocal West Midlands Govcamp in Walsall (Addictomatic linky here). One of the sessions I attended/chipped in on was the “Blue Peter Open Data” session hosted by Will Perrin. There was some very interesting, layman’s-terms open data stuff coming out of it; not much I didn’t already know, but I do like sessions of that kind!

Part way through, Will showed how he’d used Google’s Fusion Tables to analyse datasets created or hosted in Google Docs. A pretty cool tool in itself, it also comes with the ability to analyse and map geodata. This is great if you have a set of street names, for example, that you want to map. One of the attendees, Lucie Kerley, was there from MACE (the Media Archive for Central England), looking for home videos from around the Midlands. She asked whether something like this could be used to show pictures and video next to the points – I suggested that, knowing the infinite magics employed by Google (pixies and brownies and the like – we all know that’s their secret), it should, in theory, be possible.

So, today, I thought I’d give it a crack. I set up a very simple three-column spreadsheet in my Google Docs. This contained a location (street names in Lincoln), a link to a video, and a comment. Importing the spreadsheet into Fusion Tables, I hit the Visualise > Map button and was presented with a map containing my points. Snazzy. But clicking on the points didn’t automatically show the video. I tried replacing the link with the YouTube embed code, but Fusion Tables didn’t really like this either, and stripped out the tags. Where to go from here?

I did some digging and found that you can customise the HTML within the popup bubbles on a Fusion Tables map. Maybe this was the key. I tried something…

The one thing that is constant, and identifies videos on YouTube, is an ID string which appears in the video’s URL. For example, this URL: http://www.youtube.com/watch?feature=player_profilepage&v=FL2Bb6-vD1E identifies the video FL2Bb6-vD1E. The first thing I did was reduce the video column down to just these IDs – no other embed code or URL information. I then dug into the templating.
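
If you have a whole column of these to trim down, a couple of lines of Python will do it (a quick sketch):

    import urlparse

    def youtube_id(url):
        # Pull the v= parameter out of a YouTube watch URL
        query = urlparse.urlparse(url).query
        return urlparse.parse_qs(query)["v"][0]

    url = "http://www.youtube.com/watch?feature=player_profilepage&v=FL2Bb6-vD1E"
    print youtube_id(url)  # FL2Bb6-vD1E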

I created a custom template which contained very simple HTML. What was that HTML? The standard YouTube embed code. For the video given above, that’s something like:
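
    <iframe width="560" height="315" src="http://www.youtube.com/embed/FL2Bb6-vD1E" frameborder="0" allowfullscreen></iframe>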

The custom HTML can refer to columns in the Fusion Table using a {} notation; in my case, the video column was {Video}. So, I replaced the video ID in the HTML with {Video}. This gave me something like:
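
    <iframe width="560" height="315" src="http://www.youtube.com/embed/{Video}" frameborder="0" allowfullscreen></iframe>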

Did it work? Did it ever! The map now had points which, when clicked, showed a video in the popup. This is purely a proof of concept – of course none of the videos refer to the roads they’ve been slapped on. But it works. Which is, quite frankly, awesome.

There is absolutely no reason that, with a bit of thought, this couldn’t be extended to include playlists, Picasa (maybe Flickr?) galleries and other comments. It’s a brilliant way to create an interactive multimedia map! The only thing I’ve not tested is whether the Fusion Table updates with the Docs Spreadsheet.

Resources:

You can find the original spreadsheet here.

Here’s the spreadsheet imported into Fusion Tables.

Here’s the map.

Extract KML data from Google My Maps

So this is something that’s been bugging me for a while now. At the City of Lincoln Council I promote not only the use of web-based tools such as those Google supplies, but also the use of open data. To that end, I created the site data.lincoln.gov.uk, a WordPress-powered site which pushes out Lincoln datasets in a nice, easy, machine-readable way. CSV, RDF – those boxes got ticked early on. But KML was eluding me. We create most of our maps by hand in Google My Maps. Now, I know this produces KML – it can be used in Google Earth, there’s a link for it and all. But downloading the “KML” file from this link produces what looks like a simple hook-in for Google Earth to retrieve the online KML data.

Hmm.

Let’s take a look at the problem. We’ll use this map here for reference. The URL from the “Download to Google Earth” button (which you can get by right-clicking on the button) defaults to this:

http://www.google.co.uk/maps/ms?ie=UTF8&hl=en&vps=1&jsv=280a&msa=0&output=nl&msid=100185268874394349858.000459d5fe17d48f39f66

This gives us our KML file that hooks into Google Earth; great for that use, not so good for my data site! Take a look at the URL – see that bit that says “output=nl”? Let’s change it to “output=kml”. That gives us the link:

http://www.google.co.uk/maps/ms?ie=UTF8&hl=en&vps=1&jsv=280a&msa=0&output=kml&msid=100185268874394349858.000459d5fe17d48f39f66

…which produces EXACTLY what we want – the makeup of the map in raw KML. Spiffing! I can’t take credit for that – this article helped me out.

What I have done, however, is take it further. How can we engineer this to get the KML of any map? The key is in the msid part, which is what Google uses to identify a map. In the map, click on the “Link” button and take a look at the link – the msid is encoded in there. Simply replace the msid in the link above with the msid of the map you want the KML for and you’re away!
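
And if, like me, you have a pile of maps to do, the trick scripts nicely (a Python sketch, using the example map’s msid from above):

    import urllib2

    def my_maps_kml(msid):
        # Build the KML export URL for a My Maps msid and fetch the raw KML
        url = ("http://www.google.co.uk/maps/ms?ie=UTF8&hl=en&vps=1"
               "&jsv=280a&msa=0&output=kml&msid=" + msid)
        return urllib2.urlopen(url).read()

    # The msid of the example map above
    kml = my_maps_kml("100185268874394349858.000459d5fe17d48f39f66")
    open("map.kml", "w").write(kml)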

Job done!

Hopefully that’s all clear and of some use to you! I’m off to convert all of our maps to KML links so that we can provide them on our data site! Ta ta!

Implementing custom HTML in WordPress

As you will know from the last post, I’m working on getting Lincoln’s election data into the Open Election Data project. RDFa-ifying the content on our website was fairly straightforward, involving little more than adding some funky attributes to the HTML pages. The next step, however, was a bit tougher, as links to the pages had to be discoverable. Again, this involved adding some custom RDFa attributes to the HTML – this was not possible in our admin system without some severe jiggery-pokery, so I turned to the recently launched data.lincoln.gov.uk.

What should have been straightforward, however, still proved problematic. I set up this post initially and gritted my teeth upon the first publish when I realised that WordPress had filtered out all the RDFa goodness. This is down to the built-in kses filter, which ensures that no ‘dodgy’ code is put into posts and comments. Comments I’m more than happy to have protected but, as I’m the one who controls the posts, I was more than a little grumpy that I was being denied free rein.

I did some digging, found a number of people having the same problem, tried editing the kses file, hit head on wall, and then eventually came across this plugin. It’s a beauty of a plugin and does exactly what I needed – it gives you free rein to add whatever HTML you want to your WordPress posts. Installed, activated, and now the ward list on data.lincoln.gov.uk is fully RDFa-ified. Simples.

So, if you need to fiddle with the HTML on your WordPress blog, maybe adding some RDFa or microformats, make sure you fire that plugin up.

Open Data – Lincoln Style

With the recent rush of activity around open data on the web and the launch of data.gov.uk, it seemed only fitting that I try to follow suit with the data that Lincoln holds. Experimenting with Google Docs has shown that it’s pretty straightforward to get data up in CSV format, visualise it and provide it in a number of flavours for consumption.

So what’s next? As data.gov.uk has given central government an ever-expanding repository for open data files, I thought it would be best for Lincoln to follow suit as well. That brings me to data.lincoln.gov.uk. Currently in beta, this is a VERY basic WordPress build which will act as a shop front for Lincoln’s open data. Information will go onto our website in one form or another and we will create links through to the data site for consumption, often using Google Docs as a conduit where appropriate.

Why use WordPress? A number of reasons:

  • Flexibility – I could create a stripped-down, basic frontend for the site that just works
  • Categorising – I love WordPress’ category system and the way it builds itself up as you add categories; this seemed sensible when I couldn’t guarantee from the get-go what datasets we would be providing
  • RSS – WordPress generates an RSS feed, something our website CMS does not do, so users will quickly be able to register for updates to the data library

So that’s it! Basic, simple to implement and maintain. I’m going to publicise this in Lincoln and see what kind of reaction we get to it.