Replacement of Category 150 location history fields

Chris d L, UKB Community team Data Analyst

Overview

Some data quality issues have been identified in the home location history fields in Category 150 (Data-Fields 22700, 22701, 22702, 22703, 22704). UK Biobank has superseded these fields in the Q1 2025 showcase update with a new set of improved fields (Data-Fields 32220, 32221, 32222, 32223, 32224).

The old fields will be retired in Q2 2025. Until then, we will keep them available but flagged as obsolete so that researchers already using them are not disrupted.

We strongly recommend that researchers use the new fields. Because the new fields have the same structure as the old fields, moving existing pipelines to them should cause minimal disruption.
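As an illustration of what such a move might look like in an analysis script, the sketch below rewrites column names from the old field IDs to the new ones. It rests on two assumptions that are not stated in this article: that the old and new fields pair up in numerical order (22700 with 32220, and so on), and that column names begin with the field ID, optionally prefixed with "p". The helper name `migrate_column` is hypothetical. Please confirm each pairing against Showcase before using anything like this.

```python
import re

# ASSUMPTION: old and new Data-Fields pair up in numerical order.
# This pairing is not confirmed by the article; check it against Showcase.
OLD_TO_NEW = {
    "22700": "32220",
    "22701": "32221",
    "22702": "32222",
    "22703": "32223",
    "22704": "32224",
}

# ASSUMPTION: column names start with the field ID, optionally prefixed
# with "p", followed by any instance/array suffix.
_FIELD_ID = re.compile(r"^(p?)(\d+)(.*)$")


def migrate_column(name: str) -> str:
    """Return the column name with an old field ID swapped for the new one.

    Names that do not start with a field ID, or whose field ID is not in
    OLD_TO_NEW, are returned unchanged.
    """
    match = _FIELD_ID.match(name)
    if match is None:
        return name
    prefix, field_id, rest = match.groups()
    return prefix + OLD_TO_NEW.get(field_id, field_id) + rest


columns = ["eid", "p22700_i0_a1", "p22686_i0"]
migrated = [migrate_column(c) for c in columns]
```

Only the field ID is rewritten; any instance or array suffix is carried over untouched, which relies on the new fields sharing the old structure.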

Please continue reading to find out why we have made this change and the impact it will have on your research.

Data problems with old Data-Fields 22700-22704

Inaccurate locations

Some locations drawn from linkage data were identified as not representing participants’ home addresses. For example, a given address might be the location where emergency services interacted with a participant, rather than the participant's home address.

Inaccurate dates

An import error caused some day/month values to be transposed in Data-Field 22700. When this happened, any value larger than 12 in the ‘months’ position was carried forward as additional months. For example, a raw date of 31/01/2000 (31 January 2000) was read as day 01, month 31 of 2000. The 19 ‘excess’ months beyond December 2000 were added, resulting in the date being imported as 01/07/2002 (1 July 2002).

  • The largest date gap that this issue caused was 30 months – the example given above.
  • 21% of recruitment and post-recruitment locations were affected by this issue.
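The arithmetic behind the example above can be reproduced with a short sketch. This is only an illustration of the behaviour described in this article, not the actual import code; the function names `misimported_date` and `month_gap` are hypothetical, and raw dates are assumed to be DD/MM/YYYY strings.

```python
from datetime import date


def misimported_date(raw: str) -> date:
    """Simulate the described import error on a raw DD/MM/YYYY string."""
    day, month, year = (int(part) for part in raw.split("/"))
    # The faulty import swapped the two positions: the day value was
    # treated as the month, and the month value as the day.
    wrong_month, wrong_day = day, month
    # Month values above 12 roll forward into later years, e.g. month 31
    # of 2000 is December 2000 plus 19 further months.
    months_from_january = wrong_month - 1
    out_year = year + months_from_january // 12
    out_month = months_from_january % 12 + 1
    return date(out_year, out_month, wrong_day)


def month_gap(raw: str) -> int:
    """Calendar-month difference between the true and the imported date."""
    day, month, year = (int(part) for part in raw.split("/"))
    imported = misimported_date(raw)
    return (imported.year - year) * 12 + (imported.month - month)


imported = misimported_date("31/01/2000")  # date(2002, 7, 1)
gap = month_gap("31/01/2000")              # 30
```

Under these assumptions the gap is largest when the day value is as high as possible (31) and the month value as low as possible (January), which is why the example above gives the 30-month maximum.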

Please note that a date recorded against location data is the date on which the change was recorded in the IT system, which is not necessarily the date on which the participant moved.

Differences in new Data-Fields 32220-32224

  1. Pre-baseline locations (recorded before a participant's recruitment) have been removed. These came exclusively from linkage sources where UK Biobank could not confirm their accuracy. Pre-baseline locations comprise 75% of the data in the old Data-Fields. We have found that most of these locations match the home location at recruitment in the new Data-Fields:

    • 40% of recruitment locations exactly match the removed pre-baseline values.
    • A further 45% of recruitment locations are within 1 km² of the removed pre-baseline values.

  2. New location history data are compiled primarily from direct interactions with UK Biobank. This allows us to be more confident in the accuracy of the location and date being recorded. Locations previously derived from linkage data have been retained only where they could be corroborated by an additional source of information (e.g. direct report).

  3. The new fields have more complete history post-recruitment and will be updated periodically.

Impact on 'location at assessment' Data-Fields 22686-22689

There are no changes to location at assessment Data-Fields as part of this update. These fields do not contain the errors outlined above.

Impact on existing research projects

If you have used Data-Fields 22700-22704 in your research and wish to re-run your analyses with Data-Fields 32220-32224, please contact the Access team to discuss this. Furthermore, if you have returned data to us based on these fields, please contact us so that we can update Showcase with that information.
