Falsehoods programmers believe about addresses and location data

The following is a list of misconceptions that I have seen other programmers, as well as myself, run into throughout my career. It is, of course, modeled on the famous Falsehoods Programmers Believe About Names by Patrick McKenzie.

As someone who currently works as a software developer, has had a bit of a career as a postie and has worked on a couple of location databases, I think a lot of these are common sense, though probably not for everyone. Needless to say, I don’t think this is the first post anyone has written about the topic.

I myself have had to do weird things with address forms, such as zero-filling postal code fields until the validator was happy, finding a plausible state for my address even though the country I live in does not have that subdivision, and making up a house number where there was none, because German address forms insist on every address having one.

Then there is the story of a friend of mine who lived in a building where his entrance had a different address and postal code from the entrance around the corner. As a result, his phone company’s service reps had a mini meltdown every single time he called them, even though the local phone technicians knew exactly what was going on.

The moral of the story is probably that addresses are convoluted things meant for human use, and that postal codes are meant for mail sorting and not supposed to be tinkered with by mortals.

The following is a list of falsehoods, all of which are or may be wrong:

Basic validation and formatting

  • Every location has an English name.
  • Locations with weird foreign characters in them always have a simplified/anglicized spelling of their name.
  • Every location can be written using letters found in ASCII or the ISO encoding that I use.
  • Every location can be written using Unicode encoded characters.
  • Addresses are always formatted or structured in the same way.
  • At least addresses have the same format within the same country.
  • Pizza delivery drivers, official maps and the post office use the same street names and addresses for the same location.
  • Every address indicates a name, street address, postal code, state or region and city.
  • Every street address has a house number.
  • Every building has only a single address.
  • Addresses always have a clearly defined structure.
  • Every location has an address.
  • But surely every address indicates a location that can be found on a map.
  • No?
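
One way to act on the list above is to store an address as the lines a person actually wrote and to treat every structured part as optional. The following is only a minimal sketch under that assumption; the `Address` class and its field names are hypothetical and not taken from any particular library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Address:
    """Hypothetical record: free-form lines first, structure only if known."""

    # The lines exactly as entered; no house number or street is parsed out.
    lines: List[str] = field(default_factory=list)
    # Every structured part is optional, since any of them may not exist.
    locality: Optional[str] = None
    region: Optional[str] = None
    postal_code: Optional[str] = None
    country_code: Optional[str] = None

    def is_usable(self) -> bool:
        # The only requirement: at least one non-blank line of text.
        return any(line.strip() for line in self.lines)

    def as_label(self) -> str:
        # Keep the writer's own order; just drop blank lines and padding.
        return "\n".join(line.strip() for line in self.lines if line.strip())


farm = Address(lines=["Hof Nr. 3", "’s-Hertogenbosch"])
```

Here `farm` is accepted even though it has no postal code, no region and a locality that does not start with a letter in the A-Z range. Whether that is lenient enough, or too lenient, depends on what the address is used for.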

Postal codes

  • Postal codes are always called “Zip Codes” in English.
  • Postal codes are always 5 numerical digits.
  • Postal codes are always numerical.
  • Then at least postal codes should be validated somehow.
  • Well, then I can strip out spaces and dashes from them.
  • Postal codes always indicate the nearest city/town/village.
  • Postal codes never change.
  • I can assume things unrelated to mail sorting, such as real-estate value, from a postal code.
  • Postal codes don’t cross administrative boundaries such as county, state or region.
  • Every postal code indicates a fixed location.
  • Every location in every country has an associated postal code.
  • Every country uses postal codes and I can depend on that!
  • Postal codes are not personally identifiable information.
  • At least there aren’t multiple postal codes per building, right?
  • Right?
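
If the falsehoods above are taken seriously, about the only safe thing left to do with a postal code is to tidy up whitespace and store it as text. The sketch below assumes exactly that; `clean_postal_code` and the `MAX_LENGTH` cap are hypothetical names and values chosen for illustration, not a rule from any postal authority.

```python
from typing import Optional

# Assumed generous cap, only there to protect storage from junk input.
MAX_LENGTH = 32


def clean_postal_code(raw: Optional[str]) -> Optional[str]:
    """Tidy a postal code without guessing its format.

    Spaces and dashes inside the code are kept, letters are kept,
    and a missing code is a valid answer.
    """
    if raw is None:
        return None
    # Trim the ends and collapse runs of whitespace to a single space.
    code = " ".join(raw.split())
    if not code:
        return None  # no postal code at all is allowed
    if len(code) > MAX_LENGTH:
        raise ValueError("postal code is implausibly long")
    return code
```

With this, `clean_postal_code("  SW1A  1AA ")` gives `"SW1A 1AA"`, `"00-950"` comes back unchanged, and an empty field becomes `None` instead of a row of zeros.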

Localities and locations

  • Everyone agrees about which country a city or town belongs to.
  • Well, then buildings don’t cross country borders.
  • Every city or town has a single, canonical name.
  • Elements like street name and locality always start with a character in the A-Z range.
  • What do you mean by ’s-Hertogenbosch?
  • People never use administrative divisions that officially don’t exist anymore.
  • Localities such as cities have official names that everyone agrees on.
  • Everyone agrees what a county, municipality, hamlet, village, town or city actually are.
  • Also, those never merge with, change or cross each other’s hierarchical boundaries.
  • Street and locality names are unique.
  • Street and locality names never change.
  • At least street and place names are unique within the same municipality, town or city.
  • Then the combination of city or postal code and place name is always unique.
  • Businesses don’t have more than one branch in the same street.
  • Buildings and venues never move places.
  • At least they don’t move on a day-by-day basis.
  • No, really?

Using an external API

  • Information from the Google Maps API is always correct and up to date.
  • At least there is a single, correct location database that I can always use.
  • The location data in my external API always shows the location of each place, but not of the car park that is 300 meters away.
  • Corrections and submissions to OpenStreetMap and Google Maps will stick.
  • Also, they will not be overwritten by a bot, without any comment, in less than 12 hours.

Routing

  • Routing is just about finding the shortest distance between two points.
  • I can always depend on popular APIs such as Google Maps or HERE Maps for routing.
  • An external API is better at routing than a human driver.
  • One-way streets, traffic islands that block right-hand turns and dead ends are edge cases and I don’t need to account for them.
  • You mean people also drive on the wrong side of the road?
  • Drivers happily accept having their route planned by an automated system, with an estimated time for each stop, because it saves them time and effort.
  • Navigation routes never change.
  • Well, but not within the same day, right?
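
To illustrate why one-way streets are not an edge case, here is a small sketch that treats the street network as a directed graph, so the way there and the way back can differ. It uses a plain Dijkstra search; the `shortest_route` function and the toy `streets` network with its distances are made up for this example and ignore everything else on the list, such as traffic, turn restrictions and routes changing during the day.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Directed graph: each street segment is listed only in the direction
# in which it may be driven.
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_route(
    graph: Graph, start: str, goal: str
) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra search; returns (distance, path) or None if unreachable."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, length in graph.get(node, []):
            if neighbour not in settled:
                heapq.heappush(
                    queue, (cost + length, neighbour, path + [neighbour])
                )
    return None


# Hypothetical network, distances in meters. Every segment is one-way.
streets: Graph = {
    "depot": [("corner", 100.0)],
    "corner": [("customer", 50.0)],
    "customer": [("loop", 200.0)],  # no direct way back to the corner
    "loop": [("depot", 300.0)],
}

there = shortest_route(streets, "depot", "customer")
back = shortest_route(streets, "customer", "depot")
```

In this toy network the trip to the customer is 150 meters, while the trip back is 500 meters because the driver has to go around the loop. A place with no inbound segment at all yields `None`, which a real system would have to handle as well.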