
https://technology.blog.gov.uk/2016/12/06/how-we-represent-data-when-developing-apis/

How we represent data when developing APIs


[Image: "Make things open, it makes things better" sticker]

At GDS, “make things open” is something you hear on a daily basis. In my new role as the GDS Open Standards Lead, I’m helping to make sure we do this by ensuring we choose the best open standards and promote them as widely as possible.

This post looks at the conventions we’ve adopted, and builds on a previous post from one of our technical architects on how we tackle API versioning. The post assumes the use of JSON as the response format, because that's what we've standardised on.

There are issues with JSON's portability, which usually boil down to the inappropriate use of the JSON number type (JSON holds multiple types of data, including numbers, strings, and booleans). Where possible, we avoid using JSON numbers because parsers map them onto so many different numeric types (integers, floating-point values of varying precision and so on). It’s often easier to work with text/strings and then convert them into numerical form.
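As a rough illustration of the point above, here is a minimal Python sketch. The field name `amount` and both payloads are hypothetical. It shows the same value sent once as a JSON number and once as a string: the number becomes a binary float when parsed, while the string can be converted deliberately by the consumer.

```python
import json
from decimal import Decimal

# Hypothetical payloads: the same value sent as a JSON number and as a string.
as_number = json.loads('{"amount": 0.1}')
as_string = json.loads('{"amount": "0.1"}')

# Python's json module parses a fractional JSON number into a binary float,
# so simple arithmetic can drift away from the expected decimal result.
float_total = as_number["amount"] * 3

# A string is left untouched, so the consumer chooses the numeric type.
decimal_total = Decimal(as_string["amount"]) * 3

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
```

Other languages make different default choices for JSON numbers, which is why converting from a string can be the more predictable route.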

Representing time and date

JSON does not specify how to represent dates or times. For this reason, it’s important for producers of APIs to use recognised standards. Broadly speaking, there are 2 sensible ways of representing times in JSON.

The first is by using ISO 8601, an international standard published by the International Organization for Standardization in the 1980s. ISO 8601 remains the recommended format for representing date and time, and helps ensure people read the time correctly.

The advantages of using ISO 8601 include:

  • avoiding confusion in international communication (in the US, 2 January is written as "01/02", which people in Europe would understand to be 1 February)
  • the ability to specify a time zone, which is vital for international communications
  • it’s of arbitrary precision - so we can store the time an event occurred as 2016-10-19T12:24:12+00:00 (this tells us the event was on 19 October 2016 at 24 minutes and 12 seconds past midday, with no offset from UTC, which is the same as GMT)
  • ease of representing dates in the distant past and future - I’ll explain in a future post how this offers a distinct advantage over the UNIX Timestamp
  • no ambiguity between seconds and milliseconds (as there is with other standards such as the UNIX Timestamp)
  • humans have to read your code sometimes - it's a lot easier to understand 2016-12-25 than 1482667932.
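To make the list above concrete, here is a small Python sketch using only the standard library. It parses the example timestamp from the list and converts it for a hypothetical consumer at UTC+05:30 (that offset is an assumption chosen for illustration).

```python
from datetime import datetime, timedelta, timezone

# The example timestamp used in the list above.
event = datetime.fromisoformat("2016-10-19T12:24:12+00:00")

# The offset travels with the value, so the instant is unambiguous.
print(event.utcoffset() == timedelta(0))  # True

# Convert the same instant for a hypothetical consumer at UTC+05:30.
local = event.astimezone(timezone(timedelta(hours=5, minutes=30)))

print(event.isoformat())  # 2016-10-19T12:24:12+00:00
print(local.isoformat())  # 2016-10-19T17:54:12+05:30
```

Both strings describe the same moment, which is the property that makes the format safe to pass between systems in different time zones.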

A second standard used within government is the UNIX Timestamp, which represents time and date as an integer count of the seconds or milliseconds that have passed since midnight UTC on 1 January 1970.

Like ISO 8601, the UNIX Timestamp is well understood by computers, but it does carry a number of disadvantages. Despite these disadvantages, the UNIX Timestamp still gets used within government because it’s the default way to store times and dates in many systems, so it’s used automatically.

Use of the UNIX Timestamp can cause confusion over precision: some UNIX timestamps use seconds, and others use milliseconds. The standard is also unclear on how to store dates before 1970, so behaviour is highly language-dependent. Most systems using signed integers (that accept both positive and negative values) will take a negative number and convert it into a historic date. However, systems assuming unsigned integers (that don’t accept negative values) may not be able to process the date at all.
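The ambiguity is easy to demonstrate. The Python sketch below reads the same integer first as seconds and then as milliseconds, and also shows a negative value. The helper names `from_seconds` and `from_milliseconds` are made up for this example, and the arithmetic is done from an explicit epoch rather than relying on platform-specific conversion functions.

```python
from datetime import datetime, timedelta, timezone

# Midnight UTC on 1 January 1970, the reference point for UNIX timestamps.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_seconds(value: int) -> datetime:
    """Read value as seconds since the epoch (negative means before 1970)."""
    return EPOCH + timedelta(seconds=value)


def from_milliseconds(value: int) -> datetime:
    """Read value as milliseconds since the epoch."""
    return EPOCH + timedelta(milliseconds=value)


raw = 1482667932  # nothing in the number says which unit it uses

print(from_seconds(raw).isoformat())       # a date in December 2016
print(from_milliseconds(raw).isoformat())  # a date in January 1970
print(from_seconds(-86400).isoformat())    # one day before the epoch
```

A consumer that guesses the wrong unit gets a valid but wildly wrong date, with no error to warn them.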

JSON can store the timestamp as a number or as a string (text that can hold a range of characters, including digits). As we approach the Unix Epochalypse (19 January 2038, when a count of seconds overflows a signed 32-bit integer), the size of numbers is of particular concern. JavaScript only supports 53 bits of precision for numbers, and different languages implementing the JSON specification may behave unpredictably with numbers exceeding this range. For this reason, when using the UNIX Timestamp, it’s better to use strings, although this isn’t specified by the standard.
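The two numeric limits mentioned here can be checked with a short Python sketch. It assumes a consumer that stores JSON numbers as 64-bit floats (as JavaScript does) and imitates that with `float()`. The field name `timestamp` is hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The largest value a signed 32-bit integer can hold, read as seconds.
MAX_INT32 = 2**31 - 1
print((EPOCH + timedelta(seconds=MAX_INT32)).isoformat())  # 2038-01-19T03:14:07+00:00

# A 64-bit float has 53 bits of precision, so it cannot tell these apart.
big = 2**53 + 1
print(float(big) == float(2**53))  # True: the last digit is silently lost

# Sent as a string, the value survives the round trip unchanged.
payload = json.dumps({"timestamp": str(big)})
restored = int(json.loads(payload)["timestamp"])
print(restored == big)  # True
```

Python's own integers do not overflow, so the float conversion here only stands in for what a double-based parser would do.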

Representing a physical location

There are many different ways of representing the physical location of something.

We use the GeoJSON open standard. Because it is built on the JSON format, it can be parsed by any software that reads JSON. We have chosen this data format for these reasons:

  • it matches all our goals for an Open Standard
  • it has excellent library support in a wide variety of languages
  • it doesn't limit us to only latitude and longitude - we can add metadata as needed
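As an illustration, here is a minimal GeoJSON Feature built and read back with Python's standard `json` module. The coordinates and the property names `name` and `opened` are made up for the example. Note that GeoJSON lists longitude before latitude.

```python
import json

# A hypothetical GeoJSON Feature: a point plus free-form metadata.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON order is [longitude, latitude].
        "coordinates": [-0.1276, 51.5072],
    },
    # "properties" carries whatever metadata the publisher needs.
    "properties": {"name": "Example office", "opened": "2016-12-06"},
}

# Any JSON library can serialise and parse it; no special tooling is needed.
encoded = json.dumps(feature)
decoded = json.loads(encoded)

longitude, latitude = decoded["geometry"]["coordinates"]
print(decoded["properties"]["name"], latitude, longitude)
```

Dedicated GeoJSON libraries add validation and geometry operations on top, but the plain JSON structure is all a basic consumer has to handle.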

Versioning

We all know the feeling. You've released your API, people are using it, people are loving it - but now it is time to change things. Perhaps you need to add more data, remove some fields, or rename others. How do you do it in the least disruptive way for your users?

Our lead technical architect on Government as a Platform (GaaP) recently went into detail on how to version transactional services, but it’s also worth noting how we version our payloads. The best way to version your API is to design its payload with an eye on the future. This doesn't mean your design has to stay static, but rather that your design can adapt without breaking compatibility.

Let’s look at a trivial example. Suppose as part of your data, you consider representing some personal information:

{
  "person": {
    "Name": "Alice Wonderland",
    "DOB": "1999-01-01",
    "married": true
  }
}

Before settling on the payload above, you’d need to think about whether the fields are sufficient. Perhaps the payload could better be structured with separate first and last names. You’d also need to think about how the design will cope with names from cultures which don't have first and last names.

You need to focus on creating the optimal description for each of your data payload fields. In the above example, you’d need to consider whether the abbreviated DOB makes sense or whether it’s better to spell out the field - you don’t need to save space. You’d also consider whether DOB makes sense when combined with DOD (date of death) or DOJ (date of joining).

Some fields, like the marriage example above, shouldn’t always be reduced to a simple true/false answer since there may be more than two states you wish to record. For example "married", "unmarried", "divorced", "widowed", "estranged", "annulled" and so on.
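One possible shape for a revised payload, sketched in Python. The field names `name`, `date_of_birth` and `marital_status`, the `MaritalStatus` enum and the `build_person` helper are all assumptions made for illustration, not an agreed schema. Field names are spelled out, and marital status is a string drawn from a fixed set of values rather than a boolean.

```python
import json
from enum import Enum


class MaritalStatus(Enum):
    """Hypothetical set of states; more can be added without changing the type."""
    MARRIED = "married"
    UNMARRIED = "unmarried"
    DIVORCED = "divorced"
    WIDOWED = "widowed"


def build_person(name: str, date_of_birth: str, status: MaritalStatus) -> dict:
    """Build a payload with spelled-out field names and a string-valued status."""
    return {
        "person": {
            "name": name,
            "date_of_birth": date_of_birth,  # ISO 8601 date, kept as a string
            "marital_status": status.value,
        }
    }


payload = build_person("Alice Wonderland", "1999-01-01", MaritalStatus.MARRIED)
print(json.dumps(payload, indent=2))

# A consumer can map the string back to a known state, or reject unknown ones.
print(MaritalStatus(payload["person"]["marital_status"]))
```

Adding a new state later means adding one enum value, whereas changing a boolean into a string would change the structure of the payload.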

Adding fields to your response is usually easy to do. Make sure you update your documentation and inform the consumers of your data. But any change that removes data or fundamentally alters the structure of the JSON returned must be accompanied by a change in the API version. If you don't change the version, you run the risk of breaking a user's application or library when it tries to consume your data. If you want people to keep using your API, you have to make life easy for them.
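A rough way to check whether a change needs a new version is to compare the old and new payloads for fields that have disappeared. The Python sketch below is only an illustration: the helper `missing_fields` and the `nickname` field are hypothetical, and it checks only for removed keys, not for changed types or meanings.

```python
def missing_fields(old: dict, new: dict, path: str = "") -> list:
    """List dotted paths of fields present in old but absent from new."""
    missing = []
    for key, old_value in old.items():
        key_path = f"{path}.{key}" if path else key
        if key not in new:
            missing.append(key_path)
        elif isinstance(old_value, dict) and isinstance(new[key], dict):
            # Recurse into nested objects such as "person".
            missing.extend(missing_fields(old_value, new[key], key_path))
    return missing


current = {"person": {"Name": "Alice Wonderland", "DOB": "1999-01-01", "married": True}}

# Adding a field: existing consumers still find everything they expect.
with_extra = {"person": {**current["person"], "nickname": "Al"}}

# Removing a field: existing consumers may break, so bump the version.
without_married = {"person": {"Name": "Alice Wonderland", "DOB": "1999-01-01"}}

print(missing_fields(current, with_extra))       # []
print(missing_fields(current, without_married))  # ['person.married']
```

An empty result suggests the change is additive. Anything else is a signal that the API version should change.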

The easiest way to version your API is by changing the URL. For instance, example.com/api/v1/getTimeline becomes example.com/api/v2/getTimeline.
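On the server or client side, the version can then be read straight from the path. This Python sketch assumes URLs shaped like the example above, with an `https://` scheme added so that the standard library parses them as full URLs. The `api_version` helper is hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumes paths of the form /api/v<number>/..., as in the example URLs.
VERSION_PATTERN = re.compile(r"^/api/v(\d+)/")


def api_version(url: str):
    """Return the integer API version from the URL path, or None if absent."""
    match = VERSION_PATTERN.match(urlparse(url).path)
    return int(match.group(1)) if match else None


print(api_version("https://example.com/api/v1/getTimeline"))  # 1
print(api_version("https://example.com/api/v2/getTimeline"))  # 2
print(api_version("https://example.com/about"))               # None
```

Keeping the version in the path means old and new versions can be served side by side while consumers migrate.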

Conventions that make things futureproof

Most importantly, understand your users' current needs and consider how these needs may change. Finding out what users want now is good. But keeping things open to extension and iteration in the months and years to come will ensure your design is always fit for the future. A good design at the start will save difficult versioning decisions later on.

You can follow Terence on Twitter, sign up now for email updates from this blog or subscribe to the feed.
