This damn economic crisis/swine flu outbreak isn’t quite that bad yet, but nevertheless there is a very limited sense where the UK is quite anarchic: geocoding addresses using Google Maps.
Having completed my download of addresses for my new Google Maps website the next stage was to geocode them so that I can plot them on the map. I had no idea how tricky it would be when I started out.
The most irritating and fundamental difficulty is that geocodes for UK postcodes are not available for free. The data is owned by the Royal Mail, and there is at least one website where you can buy access to this information (it has a free trial, but I discovered that this is just for about 10 or so geocodes). You can search by postcode on google maps, but if you put a postcode e.g. LL13 7YH into the geocoder API you’re given the geocode for LL13 7 – not accurate enough to be of any real use.
So you have to go for geocoding full addresses instead. The geocodes for these data isn’t owned by Royal Mail, but by the Ordnance Survey, and for some reason they are less restrictive about sharing the information. But there’s still a long hard slog before you can get the geocodes out of this.
Google offer a really useful turorial on geocoding addresses, and this, combined with my approach to iterating over a large number of records meant I was collecting the geocodes in no time. However, it wasn’t as peachy as it seemed.
For example, the address Llantysilio, Denbighshire, UK brings up a pretty accurate geocode for the village of Llantysilio in North Wales. However, the full address, including the postal town is Llantysilio, Llangollen, Denbighshire, UK, and this unexpectedly brings up the geocode for an address on Castle Street, right in the middle of Llangollen. So a more complete address leads to a far less accurate geocode. This is immensely problematic.
In general I was feeding in the longest possible address made up out of the data I had, so in my php script I had something like the following:
while(count($arr_address > 1) && !$str_lat)
{
$str_address = implode(', ', $arr_address);
attempt_geocode($url.$str_name.', '.$str_address.', '.$str_county.', UK');
attempt_geocode($url.$str_address.', '.$str_county.', UK');
attempt_geocode($url.$str_address.', UK');
array_pop($arr_address);
}
This starts with the longest, most detailed address string, and then gradually cuts the string down (possibly sacrificing accuracy in order to get a passable geocode), with attempt_geocode() exiting the loop on success.
But the fact that longer addresses can lead to incorrect geocodes meant I had to work in a way to start off with shorter addresses, and if that doesn’t get a geocode then gradually shorten them and keep trying to geocode. So I’ now have:
attempt_geocode($url.$arr_address[0].', '.$str_county.', UK');
attempt_geocode($url.$arr_address[0].', '.end($arr_address).', '.$str_county.', UK');
while(count($arr_address > 1) && !$str_lat)
{
$str_address = implode(', ', $arr_address);
attempt_geocode($url.$str_address.', '.$str_county.', UK');
attempt_geocode($url.$str_address.', UK');
array_pop($arr_address);
}
A long process in order to get a geocode that could still quite likely be wrong, and even if it’s basically correct might not be as accurate as a postcode; but nevertheless an improvement on what I had before.
A glimmer of hope though is that google maps itself doesn’t suffer from this issue – both address versions return the same accurate point on the map, and as someone pointed out to me on stackoverflow, google maps is in beta, so maybe teh geocoder API just hasn’t been updated to the newer, better address parser, and maybe one day reliable geocoding for free in the UK will be a reality. Also, somebody has found a way to geocode in the UK using postcodes, by hacking together the google maps and search APIs, and I may well try it, as this address geocoding malarchy leaves a lot to be desired. (*edit – turns out it’s heavily reliant on javascript so can’t be used for geocoding masses of pages without slowing down your browser.)
Finally, if this article wasn’t any help, there’s loads of geocoding links here.