
This blog post was published under the 2015-2024 Conservative Administration

https://technology.blog.gov.uk/2021/01/12/getting-the-document-checking-service-ready-for-take-off/

Getting the Document Checking Service ready for take-off

Categories: GOV.UK Verify, Tools, Transformation


Prior to the now-launched Document Checking Service (DCS) pilot, the DCS was a service available exclusively to Identity Providers (IDPs) that were part of GOV.UK Verify. The DCS helps people prove their identity online.

Organisations outside GOV.UK Verify that connect to the DCS as part of the DCS pilot have different requirements and constraints from IDPs. It was important to support both types of user equally and make changes to the service to reflect this.

Creating public user-led documentation

For the first time, the DCS team at the Government Digital Service (GDS) would be publishing a significant portion of the DCS interface specification, along with instructions on how to connect.

When considering documentation for the DCS pilot, it was important to follow the Technology Code of Practice and put users first.

We tested the documentation with volunteer developers in a lab day, a session where we could observe how useful they found it. We gave them a sheet of instructions and 3 pieces of sample passport data. The developers then tried to complete tasks similar to those new users of the DCS would need to do. For example, to use the DCS to check:

  • which sample passport data is valid
  • which sample passport data is invalid
  • which sample passport data causes a service error

We asked the volunteers to narrate what they were doing so we had an insight into their thought process. The DCS team observed the volunteers attempting the task and made notes of what did and did not go well.

The first lab day showed the documentation was confusing, and the users fell back on using search engines to find out how to do certain tasks. Nobody was able to complete the task in the allotted time.

This provided a good starting point to improve the documentation and respond to user needs. We held a “doc-a-thon” to audit and identify the sections which were unclear or not meeting user needs.

The DCS’s technical writer paired up with subject matter experts on the team to rewrite the documentation, focusing on the user needs of pilot participants. We made the pages more task-driven - for example, instead of a heading being “Message structure”, it is now “Sign and encrypt a DCS payload”. The updated documentation also changed to follow a structure similar to how a typical user would connect to the DCS pilot.

We ran a second lab day with different developers and asked them to complete the same task, supported by the new documentation. One of the developers was able to complete the task, and the other got significantly further than anyone had in the first lab day.

When the pilot launched with actual users, the documentation received positive feedback and the service has received very few support requests relating to the documentation. As the pilot continues, the team reviews and iterates the documentation to make sure it is always meeting user needs.

Improving the DCS’s error messages

One of the most important insights from the user testing was that the error messages for the DCS were either unclear or non-existent. We decided to review all the diagnostic messages from user-caused errors, and refine them one by one.

For example, the DCS requires the forenames field in the payload to be no more than 30 characters in total. The original error message was “Total length of strings and separators in list exceeds valid value”, which does not make clear what is wrong or how a user can fix it.

The DCS’s technical writer rewrote this to a plain English alternative: “Length must be between 1 and 30 characters”. The improved error messages are now clearer on how a user can resolve the error and proceed with their task.
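As an illustration of the kind of check that sits behind this message, here is a minimal sketch in Python. It assumes the forenames arrive as a list of strings that are joined with single spaces before being measured, which the original message suggests but this post does not confirm. The function name and the constant are hypothetical and not part of the DCS.

```python
from typing import List, Optional

MAX_FORENAMES_LENGTH = 30  # limit stated in this post


def validate_forenames(forenames: List[str]) -> Optional[str]:
    """Return a plain English error message, or None if the value is valid.

    Assumption: the total length counts every name plus one separator
    (a single space) between consecutive names.
    """
    total = len(" ".join(forenames))
    if total < 1 or total > MAX_FORENAMES_LENGTH:
        return f"Length must be between 1 and {MAX_FORENAMES_LENGTH} characters"
    return None
```

Returning the message rather than raising keeps the sketch easy to plug into whatever response format a service uses.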

We also changed what we return when a client has misconfigured credentials. Previously, using the wrong certificate or omitting mandatory fields in the JOSE payload returned an HTTP error response with no content explaining what went wrong. The response now includes content which clearly explains the problem.

Enforcing fixed quotas for pilot clients

Each participant requests a number of checks as part of joining the pilot, which we refer to as their quota. Once a participant has reached this number of checks, they cannot make any more. The concept of a quota did not exist before the pilot, as there was no need to cap requests from identity providers which were supporting government services.

Our audit store, a PostgreSQL database, records the number of requests we receive from each DCS client but is not suitable for querying quota numbers on every request. Having a new and separate shared counter for quota numbers also provides a single source of truth, as we run multiple instances of the DCS’s entry-point application.

To meet these needs, we introduced a Redis instance (provided by AWS ElastiCache) that is both updated from the service entry point and periodically refreshed from the audit store.
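The idea of a shared counter that is updated on each request and periodically corrected from an audit record can be sketched as follows. This is not the DCS implementation: it uses an in-memory Python class as a stand-in for the shared store, and the class, method and client names are all hypothetical.

```python
from typing import Dict


class QuotaCounter:
    """In-memory stand-in for a shared quota counter (hypothetical)."""

    def __init__(self, quotas: Dict[str, int]) -> None:
        self._quotas = dict(quotas)      # checks each client may make
        self._used: Dict[str, int] = {}  # checks counted so far

    def try_consume(self, client_id: str) -> bool:
        """Count one check if the client still has quota left."""
        used = self._used.get(client_id, 0)
        if used >= self._quotas.get(client_id, 0):
            return False
        self._used[client_id] = used + 1
        return True

    def refresh_from_audit(self, audited_counts: Dict[str, int]) -> None:
        """Overwrite the counters with totals read from the audit record."""
        self._used.update(audited_counts)

    def remaining(self, client_id: str) -> int:
        """Return how many checks the client has left, never below zero."""
        used = self._used.get(client_id, 0)
        return max(0, self._quotas.get(client_id, 0) - used)
```

In a store shared between several application instances, the check and the increment in `try_consume` would need to happen atomically, which this single-process sketch does not attempt.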

The new quota limits are not enforced for the existing IDPs and only apply to the pilot participants. We deployed new HTTP headers to communicate available quota and rate-limits to connected clients.
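To show what communicating quota and rate-limits through headers might look like, here is a small Python sketch. This post does not name the headers the DCS uses, so the header names below are invented for illustration, as are the function and its parameters.

```python
from typing import Dict


def quota_headers(quota: int, used: int, rate_limit_per_second: int) -> Dict[str, str]:
    """Build response headers describing a client's quota and rate-limit.

    The header names are hypothetical examples, not the DCS's real headers.
    """
    remaining = max(0, quota - used)  # never report a negative allowance
    return {
        "X-Example-Quota-Limit": str(quota),
        "X-Example-Quota-Remaining": str(remaining),
        "X-Example-RateLimit-Limit": str(rate_limit_per_second),
    }
```

A client could read these values after each response and slow down or stop before it runs out of checks.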

Protecting existing GOV.UK Verify traffic

We wanted to make sure IDP traffic needed to support GOV.UK Verify was isolated from pilot requests. This was so that no pilot participant could use up our own rate-limit agreed with Her Majesty’s Passport Office (HMPO).

The team deployed changes to the DCS so it could support separate rate limits depending on client permissions. DCS pilot participants cannot use up requests in a way that would prevent GOV.UK Verify operating, and GOV.UK Verify cannot prevent the DCS pilot operating.

The DCS’s rate-limiting storage is built on top of the same shared Redis instance that we use for quota storage. We also moved our existing egress rate-limit, which protects our upstream providers, onto this Redis instance.
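One way to picture separate rate limits for different kinds of client is a limiter that keeps an independent counter per group. The Python sketch below uses a simple one-second fixed window held in memory. The post does not say which algorithm the DCS uses, so the windowing approach, the class name and the group names are all assumptions.

```python
from typing import Dict, Tuple


class GroupRateLimiter:
    """Fixed-window limiter with an independent allowance per client group."""

    def __init__(self, limits_per_second: Dict[str, int]) -> None:
        self._limits = dict(limits_per_second)
        # Requests counted per (group, one-second window).
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, group: str, now: float) -> bool:
        """Return True if this group may make another request at time `now`."""
        key = (group, int(now))  # int(now) identifies the current window
        count = self._counts.get(key, 0)
        if count >= self._limits.get(group, 0):
            return False
        self._counts[key] = count + 1
        return True
```

Because each group has its own counter, exhausting the allowance for one group leaves the other untouched. The sketch never discards old windows, which a long-running service would need to do.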

We confirmed our different rate-limits were working by testing the DCS with Gatling scenarios covering different blends of traffic types. For example, if we set a rate-limit of 10 requests per second and fired 15 requests per second, we saw 33% of traffic being returned with HTTP status code 429 (Too Many Requests).
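The 33% figure follows from simple arithmetic: of 15 requests sent each second, 10 are allowed and 5 are rejected, and 5 out of 15 is one third. The Python sketch below works this out for any limit and sending rate. It is only the arithmetic, not a Gatling scenario, and the function name is hypothetical.

```python
def expected_rejected_share(limit_per_second: int, sent_per_second: int) -> float:
    """Return the share of requests expected to be rejected with HTTP 429."""
    if sent_per_second <= 0:
        return 0.0
    rejected = max(0, sent_per_second - limit_per_second)
    return rejected / sent_per_second


share = expected_rejected_share(10, 15)
print(f"{share:.0%}")  # prints "33%"
```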

What’s next

2020 was an exciting year for the DCS, from launching the pilot to the first participants successfully integrating with the service. During the course of the pilot we’ll continue to conduct user research with its participants. Our work will contribute to our wider ambition of making it easier and safer for people and businesses to access services online.

Find out more about the DCS pilot on GOV.UK.

Thanks to Chris Clayson, Dan Besbrode, and Bibi Burahee for their contributions to this post.
