Posts Tagged ‘geocoding’

Where you headin’, luv?

Friday, August 21st, 2009

In previous posts I’ve covered the inherent difficulty of geocoding addresses and postcodes in the UK, specifically in order to use the geocodes with Google Maps. I learned a lot about the limitations of Google’s various geocoding services. To sum up the situation:

Previously I’d been geocoding large batches of addresses and postcodes so that I could plot points on the map. With that done, I wanted to add search functionality to the map, so that it would zoom in on a given address or postcode. It needed to be accurate for both, so using what I’d learned, I wrote a JavaScript function, geocodeUKAddress, which always returns the best geocode that Google can offer, so your website can be as reliable as a London cabbie. (Again, I can’t embed it here, as JavaScript has a habit of breaking WordPress, though there must be some way to make it safe. I will research.)

You will need to include the following in the head of your webpage too:

<script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=your_api_key_here" type="text/javascript"></script>
<script src="http://www.google.com/jsapi" type="text/javascript"></script>

I can’t take credit for the regular expression that recognises UK postcodes, but apart from that, it’s all my own work, which anyone can feel free to use. Unless you maybe work for Rupert Murdoch.
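Since the function itself isn’t reproduced here, the following is only a rough sketch of the dispatch idea: decide whether the search text looks like a postcode, and route it accordingly. The pattern is a simplified one I’m assuming for illustration (it is not the regular expression used in geocodeUKAddress and ignores special cases), and geocodeQuery, geocodePostcode and geocodeAddress are hypothetical names for functions you would supply yourself.

```javascript
// Simplified UK postcode pattern. This is an assumption for illustration,
// not the regular expression used in geocodeUKAddress.
var UK_POSTCODE = /^[A-Z]{1,2}[0-9][0-9A-Z]?\s*[0-9][A-Z]{2}$/i;

// True if the whole query (ignoring surrounding spaces) looks like a postcode.
function looksLikeUKPostcode(query) {
  return UK_POSTCODE.test(query.trim());
}

// Hypothetical dispatcher: both geocoders are supplied by the caller and
// are expected to call `callback` with whatever result they find.
function geocodeQuery(query, geocodePostcode, geocodeAddress, callback) {
  if (looksLikeUKPostcode(query)) {
    geocodePostcode(query.trim(), callback);
  } else {
    geocodeAddress(query, callback);
  }
}
```

The point of splitting it this way is that postcodes and addresses need different Google services to get the best result, so the search box only has to call one function.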

Geocoding in the UK

Sunday, August 16th, 2009

The art of geocoding addresses in the UK, as I previously explained, is a soul-destroying process, fraught with inaccuracy, bugs and convoluted workarounds. And for all that work you end up with a set of points, a great many of which are probably somewhat inaccurate and at least some of which are completely wrong. UK addresses (and probably those elsewhere in the world) are complicated creatures, which Google’s geocoding engine often interprets wrongly.

Postcodes, on the other hand, are rather easier; there is a well-defined relationship between a UK postcode and its corresponding (usually pretty small) piece of the British countryside. But Google’s geocoding API will only return a geocode for the postcode sector (i.e. it will give a geocode for LL12 5 when you searched for LL12 5TH). However, someone did figure out a way of using Google’s local search API combined with Google Maps to geocode UK postcodes. Since he blogged about it the API has changed, so below is an outline of how to geocode a batch of postcodes in the UK using just some simple PHP, the current Google AJAX Search API and a little JavaScript (jQuery isn’t essential, but cuts down on coding a bit). The JavaScript is the crucial step.
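To make the “postcode sector” point concrete, here is a small sketch of what gets thrown away. It relies on the fact that the second half of a UK postcode is always three characters long; the function name postcodeSector is my own and isn’t part of any Google API.

```javascript
// Returns the postcode sector: the first half of the postcode plus the
// first character of the second half, e.g. "LL12 5TH" -> "LL12 5".
// Returns null if the input is too short to be a full postcode.
function postcodeSector(postcode) {
  var compact = postcode.replace(/\s+/g, '').toUpperCase();
  if (compact.length < 5) {
    return null;
  }
  var inward = compact.slice(-3);     // e.g. "5TH"
  var outward = compact.slice(0, -3); // e.g. "LL12"
  return outward + ' ' + inward.charAt(0);
}
```

So every postcode from LL12 5AA to LL12 5ZZ collapses to the same sector, which is why the geocoder API result covers too wide an area to be useful for plotting individual points.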

Assuming you have a database full of postcodes and id numbers, and 2 empty columns to store latitude and longitude values, this is how it’s done. (Download source geocode.zip).

1. Create an HTML page geocode.html with the following content:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >

<head>
<title></title>
<meta name="description" content="" />
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<link rel="stylesheet" href="" type="text/css" media="screen" />
<script type="text/javascript" src="jquery-1.3.2.js"></script>
<script src="http://www.google.com/jsapi" type="text/javascript"></script>
<script type="text/javascript" src="geocode.js"></script>
</head>
<body>
<div id="counter"></div>
</body>
</html>

(Make sure you specify the correct location for your local javascript files)

2. Create a php file (in the same directory), geocode.php, with the following rough structure (it will only be accessed via ajax, so is very stripped down):

<?php
require_once ('mysqlConnect.php'); //or other database connection details
if($_GET)
{
 //var_dump($_GET);
 update_record();
 send_new_data();
}

//gets the next record without a geocode and sends the id and postcode to the browser
function send_new_data() {
 $query = @mysql_query("SELECT id, postcode FROM geocode_table WHERE lat = '' AND postcode != '' ORDER BY id LIMIT 1");
 if ($query && mysql_num_rows($query)) {
  $row = mysql_fetch_array($query, MYSQL_ASSOC);
  echo $row['id'].','.$row['postcode'];
 } else {
  echo 'stop';
 }
}

//updates the last record with data sent from browser
function update_record() {
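 //clean the values from the browser before they go anywhere near the SQL
 $id = (int) $_GET['id'];
 $lat = mysql_real_escape_string($_GET['lat']);
 $lng = mysql_real_escape_string($_GET['lng']);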
 if($id > 0)
 {
  $update = "UPDATE geocode_table SET lat = '".$lat."', lng = '".$lng."' WHERE id = ".$id;
  $result = @mysql_query($update);
  if (!$result) {
   die('Invalid query: ' . mysql_error());
  }
 }
}
?>

3. Create a javascript file geocode.js, saved in the same directory again (I would paste it here but it keeps breaking wordpress)

4. Running the code

Once you’ve altered the database connection details, and SQL query to suit your setup, simply open geocode.html in your browser. A counter will tell you which record you’re on. To stop the code simply close your browser/browser tab.

How it all works

In a nutshell (ignoring the special case of starting off the loop) the code repeatedly performs the following process:

….in geocode.php, send_new_data() finds a record which has no latitude value and sends its id number and postcode as an ajax response to set_and_get_next(). This keeps track of the id in a global variable and sends the postcode to getPointFromPostcode(), which uses Google’s local search to get a geocode. Once it’s found a geocode it passes it to set_and_get_next(), which sends it to geocode.php in an ajax request. There update_record()… well… updates the record, and send_new_data() finds a record which has no la….
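The loop described above can be sketched roughly as follows. This is not the real geocode.js: askServer and geocodePostcode are hypothetical stand-ins that you pass in (in the real thing they would be a jQuery ajax call to geocode.php and the Google local search lookup), and the start parameter on the first request is my assumption about how the loop gets kicked off.

```javascript
// Sketch of the browser-side loop. Both helpers are hypothetical and are
// supplied by the caller; neither is a real Google or jQuery function.
//   askServer(params, callback)         calls callback("id,postcode") or callback("stop")
//   geocodePostcode(postcode, callback) calls callback({ lat: ..., lng: ... })
function runGeocodeLoop(askServer, geocodePostcode, onDone) {
  var processed = 0;

  // Handles one response from the server: either stop, or geocode the
  // postcode it names and send the result back.
  function handleResponse(response) {
    if (response === 'stop') {
      onDone(processed);
      return;
    }
    var parts = response.split(',');
    var id = parts[0];
    var postcode = parts[1];
    geocodePostcode(postcode, function (point) {
      processed += 1;
      // Sending the geocode doubles as the request for the next record.
      askServer({ id: id, lat: point.lat, lng: point.lng }, handleResponse);
    });
  }

  // Special case: the first request carries no geocode (assumed start flag).
  askServer({ start: 1 }, handleResponse);
}
```

Each request does two jobs at once, saving the last geocode and fetching the next postcode, which is what keeps the whole thing down to a single ajax call per record.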

Compared to my previous approach to iterating a script over large sets of data, using ajax is very sleek. As with a pure PHP script, I can load it from a browser, with much of the resource-intensive work taking place on my server or Google’s. But with ajax there’s no problem with the browser timing out from time to time, or baulking at the number of times a page is requested. It’s a little harder to code, and probably less efficient… but I like it. And I’ll definitely be using my shiny new geocoded postcode data.

Anarchy in the UK

Monday, July 13th, 2009

This damn economic crisis/swine flu outbreak isn’t quite that bad yet, but nevertheless there is a very limited sense where the UK is quite anarchic: geocoding addresses using Google Maps.

Having completed my download of addresses for my new Google Maps website the next stage was to geocode them so that I can plot them on the map. I had no idea how tricky it would be when I started out.

The most irritating and fundamental difficulty is that geocodes for UK postcodes are not available for free. The data is owned by the Royal Mail, and there is at least one website where you can buy access to this information (it has a free trial, but I discovered that this is just for about 10 or so geocodes). You can search by postcode on google maps, but if you put a postcode e.g. LL13 7YH into the geocoder API you’re given the geocode for LL13 7 – not accurate enough to be of any real use.

So you have to go for geocoding full addresses instead. The geocode data for these isn’t owned by Royal Mail, but by the Ordnance Survey, and for some reason they are less restrictive about sharing the information. But there’s still a long hard slog before you can get the geocodes out of this.

Google offer a really useful tutorial on geocoding addresses, and this, combined with my approach to iterating over a large number of records, meant I was collecting the geocodes in no time. However, it wasn’t as peachy as it seemed.

For example, the address Llantysilio, Denbighshire, UK brings up a pretty accurate geocode for the village of Llantysilio in North Wales. However, the full address, including the postal town is Llantysilio, Llangollen, Denbighshire, UK, and this unexpectedly brings up the geocode for an address on Castle Street, right in the middle of Llangollen. So a more complete address leads to a far less accurate geocode. This is immensely problematic.

In general I was feeding in the longest possible address made up out of the data I had, so in my php script I had something like the following:

//keep going while there is more than one address line left and no geocode yet
while(count($arr_address) > 1 && !$str_lat)
{
 $str_address = implode(', ', $arr_address);
 attempt_geocode($url.$str_name.', '.$str_address.', '.$str_county.', UK');
 attempt_geocode($url.$str_address.', '.$str_county.', UK');
 attempt_geocode($url.$str_address.', UK');
 array_pop($arr_address);
}

This starts with the longest, most detailed address string, and then gradually cuts the string down (possibly sacrificing accuracy in order to get a passable geocode), with attempt_geocode() exiting the loop on success.

But the fact that longer addresses can lead to incorrect geocodes meant I had to work in a way to start off with shorter addresses, and only if those don’t get a geocode move on to the full address, gradually shortening it and trying again each time. So I now have:

//shortest forms first: first line only, then first line plus last line
attempt_geocode($url.$arr_address[0].', '.$str_county.', UK');
attempt_geocode($url.$arr_address[0].', '.end($arr_address).', '.$str_county.', UK');
//then the full address, dropping the last line each time round
while(count($arr_address) > 1 && !$str_lat)
{
 $str_address = implode(', ', $arr_address);
 attempt_geocode($url.$str_address.', '.$str_county.', UK');
 attempt_geocode($url.$str_address.', UK');
 array_pop($arr_address);
}

A long process in order to get a geocode that could still quite likely be wrong, and even if it’s basically correct might not be as accurate as a postcode; but nevertheless an improvement on what I had before.
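To see the order of attempts more plainly, here is the same strategy sketched in JavaScript as a function that just lists the address strings it would try, without calling any geocoder. The names candidateAddresses, lines and county are my own for illustration; the PHP above is what I actually ran.

```javascript
// Builds the list of address strings to try, in order.
// `lines` is the address split into parts (most specific first) and
// `county` is the county name; both names are illustrative.
function candidateAddresses(lines, county) {
  var candidates = [];
  var remaining = lines.slice(); // copy so the caller's array is untouched

  // 1. Shortest forms first: first line alone, then first line + last line.
  candidates.push([remaining[0], county, 'UK'].join(', '));
  candidates.push([remaining[0], remaining[remaining.length - 1], county, 'UK'].join(', '));

  // 2. Then the full address, dropping the last line each time round.
  while (remaining.length > 1) {
    var joined = remaining.join(', ');
    candidates.push(joined + ', ' + county + ', UK');
    candidates.push(joined + ', UK');
    remaining.pop();
  }
  return candidates;
}
```

For the Llantysilio example, candidateAddresses(['Llantysilio', 'Llangollen'], 'Denbighshire') puts “Llantysilio, Denbighshire, UK” first, which is the version that geocodes accurately, ahead of the longer form that lands on Castle Street.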

A glimmer of hope, though, is that Google Maps itself doesn’t suffer from this issue: both address versions return the same accurate point on the map. As someone pointed out to me on Stack Overflow, Google Maps is in beta, so maybe the geocoder API just hasn’t been updated to the newer, better address parser, and maybe one day reliable geocoding for free in the UK will be a reality. Also, somebody has found a way to geocode in the UK using postcodes, by hacking together the Google Maps and search APIs, and I may well try it, as this address geocoding malarkey leaves a lot to be desired. (*edit: turns out it’s heavily reliant on JavaScript so can’t be used for geocoding masses of pages without slowing down your browser.)

Finally, if this article wasn’t any help, there’s loads of geocoding links here.