The nearest neighbour functionality of a K-D tree naturally lends itself to the task of reverse-geocoding. Consider the blue point I randomly dropped into the Thames (51.500478, -0.122118) – we want to find the closest postcode to this point (perhaps within a maximum cut-off of 1 Mi)…
In a similar spirit to this: we use Numpy to load all the postcodes in the UK into a scipy.spatial.cKDTree and then query it with a list of points we are interested in. We then extracted the postcode for the closest match and save it as a CSV:
""" Nearest Neighbour Search / Reverse Geocoding Get nearest postcode for each point in list: points(latitude,longitudes) """ import numpy as np from pyproj import Proj, transform from scipy.spatial import cKDTree # Postcodes location (full 1.7 mill live UK postcodes) # https://geoportal.statistics.gov.uk/geoportal/catalog/content/filelist.page?&search=onspd postcodes_csv = '../_postcodes.csv' # Points to get nearest postcode for (lat|lon) in_csv = '.../nn_in.csv' # Output out_csv = '.../nn_out.csv' # Projection: http://epsg.io/27700-5339 (accuracy: 1.0m) uk = '+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 \ +x_0=400000 +y_0=-100000 +ellps=airy \ +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs' def proj_arr(points,proj_to): """ Project geographic co-ordinates to get cartesian x,y points = [lat,lon] Transform(origin|destination|lon|lat) """ inproj = Proj(init='epsg:4326') outproj = Proj(proj_to) func = lambda x: transform(inproj,outproj,x,x) return np.array(list(map(func, points))) # Load postcodes postcodes = np.genfromtxt(postcodes_csv, delimiter=',', names=True, usecols=(0,1,2), dtype=('U7',float,float)) proj_postcodes = proj_arr(postcodes[['lat','lon']],uk) print("We have %d postcodes" % len(proj_postcodes)) # Load points to query: points = np.genfromtxt(in_csv, delimiter=',', names=True, usecols=(0,1), dtype=(float,float)) proj_points = proj_arr(points[['lat','lon']],uk) print("We have %d points to query" % len(proj_points)) # Create KD tree for nearest-neighbour %time tree = cKDTree(proj_postcodes) %time out_postcodes = tree.query(proj_points) # Save the postcodes np.savetxt(out_csv, postcodes['pcd'][out_postcodes],fmt=('%s'))
The closest post-code returned for my test-point was SW1A2JH – I thought that SW1A0AA would be closer, fortunately it was .02 Miles further away. This slightly bastardises the notion that a postcode is a polygon and not a point (or a polygon centroid), which means that although Big Ben is closer, its postcode ‘SW10AA’ covers a large area and the ‘centre’ returned is actually further away – but the idea still stands.
To check some timings I ran this on a list of 10,000 co-ordinates and it took seconds:
2.54 seconds to build the KD Tree and 24ms to query it with 10,000 points