Restriction of OMOP derivation dataset

Laura B, Data Analyst

The Observational Medical Outcomes Partnership (OMOP) data derivation, produced and returned as part of Project 26041 and published on the UK Biobank Research Analysis Platform (UKB-RAP) in the record tables attached to Data-Field 20142, is being restricted. This is because the dataset is a static copy that is no longer maintained and is therefore out of date. In addition, research groups using this OMOP derivation dataset have identified some quality issues, which are currently under further investigation.

Researchers are welcome to request access to the OMOP derivation dataset on the UKB-RAP if they wish to use it. They will be asked to confirm that they have read this article and are aware of the following identified issues:

  • Read v3 (CTV3) code mapping in the primary care data

The primary care data (Category 3000) contains some local codes specific to the particular software system supplier. Local codes from the English data provider TPP are now listed in Data-Coding 8708, but were not available at the time of the OMOP mapping. These codes, which start with “Y”, were instead mapped to Read v3 Term IDs, which do not appear in the data. Consequently, these concepts appear with incorrect concept_name values in the omop_concept table (Resource 1545).

For example, the concept_id “2000187102” derives from the TPP local code “Y0384” and should be mapped to “Smear Under GMS” (i.e. a cervical smear test under the General Medical Services programme), but is instead defined as “Closure of atrial septal defect”. Researchers are advised to re-map any concepts under the vocabulary_id “UKB_GP_Clinical_read_3” whose concept_code begins with “Y” to their correct meanings in Data-Coding 8708. Note that it is the concept_code, not the numeric concept_id, that carries the “Y” prefix.
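The re-mapping advised above could be scripted roughly as follows. This is a minimal sketch, not a UK Biobank tool: it assumes omop_concept rows are available as Python dicts using the standard CDM column names (concept_id, concept_code, vocabulary_id, concept_name), and that Data-Coding 8708 has been downloaded as a tab-separated file with "coding" and "meaning" columns. That file layout is an assumption, so check it against your own copy. Codes absent from the coding are left untouched and reported rather than guessed.

```python
import csv
import io

# Vocabulary and prefix named in the article; file layout below is assumed.
READ3_VOCAB = "UKB_GP_Clinical_read_3"
TPP_LOCAL_PREFIX = "Y"


def load_coding(tsv_text):
    """Parse a Data-Coding export into {code: meaning}.

    Assumes tab-separated text with 'coding' and 'meaning' header columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["coding"]: row["meaning"] for row in reader}


def remap_tpp_local_codes(concepts, coding_8708):
    """Return (fixed_rows, unmatched_codes) for a list of omop_concept rows.

    Only rows whose vocabulary_id is the Read v3 vocabulary AND whose
    concept_code starts with "Y" are relabelled. Codes missing from the
    coding keep their original concept_name and are listed as unmatched.
    """
    fixed, unmatched = [], []
    for row in concepts:
        row = dict(row)  # copy so the caller's rows are not mutated
        code = row["concept_code"]
        if row["vocabulary_id"] == READ3_VOCAB and code.startswith(TPP_LOCAL_PREFIX):
            if code in coding_8708:
                row["concept_name"] = coding_8708[code]
            else:
                unmatched.append(code)
        fixed.append(row)
    return fixed, unmatched


if __name__ == "__main__":
    # Toy input built from the example given in this article.
    coding = load_coding("coding\tmeaning\nY0384\tSmear Under GMS\n")
    concepts = [{
        "concept_id": 2000187102,
        "concept_code": "Y0384",
        "vocabulary_id": READ3_VOCAB,
        "concept_name": "Closure of atrial septal defect",
    }]
    fixed, unmatched = remap_tpp_local_codes(concepts, coding)
    print(fixed[0]["concept_name"])  # Smear Under GMS
    print(unmatched)                 # []
```

Returning the unmatched codes separately, rather than blanking their names, keeps the decision about how to handle unmapped local codes with the analyst.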

  • Inconsistent formatting with OMOP CDM Specification 5.3

We have received reports of formatting that is inconsistent with OMOP CDM Specification 5.3; some of these reports are still being investigated. For example, the omop_death table includes multiple records for some participants, covering both primary and secondary causes of death, whereas the specification for the death table states that only one record per person should be present.
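One way to check for, and optionally work around, the multiple-record issue in omop_death is sketched below. It is an illustration under stated assumptions, not a recommended correction: rows are assumed to be Python dicts keyed by the CDM column person_id. This article does not say which field distinguishes primary from secondary causes, so the collapsing step takes a caller-supplied predicate, and the "rank" field in the toy example is purely hypothetical.

```python
from collections import defaultdict


def find_multi_record_deaths(death_rows):
    """Return {person_id: [rows]} for people with more than one death row.

    CDM 5.3 expects at most one death record per person, so every entry
    returned here departs from the specification.
    """
    by_person = defaultdict(list)
    for row in death_rows:
        by_person[row["person_id"]].append(row)
    return {pid: rows for pid, rows in by_person.items() if len(rows) > 1}


def collapse_to_one_per_person(death_rows, is_primary):
    """Keep exactly one row per person_id.

    `is_primary` is a caller-supplied predicate (an assumption: the source
    does not define how primary causes are flagged). The first row that
    satisfies it is kept; if none does, the first row seen is kept.
    """
    chosen = {}
    for row in death_rows:
        pid = row["person_id"]
        if pid not in chosen or (is_primary(row) and not is_primary(chosen[pid])):
            chosen[pid] = row
    return list(chosen.values())


if __name__ == "__main__":
    # Toy rows; "rank" is a HYPOTHETICAL field where 1 marks the primary cause.
    rows = [
        {"person_id": 1, "cause_source_value": "A", "rank": 2},
        {"person_id": 1, "cause_source_value": "B", "rank": 1},
        {"person_id": 2, "cause_source_value": "C", "rank": 1},
    ]
    print(sorted(find_multi_record_deaths(rows)))   # [1]
    kept = collapse_to_one_per_person(rows, lambda r: r["rank"] == 1)
    print([r["cause_source_value"] for r in kept])  # ['B', 'C']
```

Collapsing discards the secondary causes, so analyses that need them should keep the original table alongside any one-row-per-person view.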


For more information about this OMOP derivation dataset, please see Resource 1420. Due to the proprietary nature of the conversion, we are unable to provide granular details of the mappings used.

Please be aware that OMOP tables not containing individual-level participant data are published as Showcase resources attached to Data-Field 20142 in the "Resources" tab, rather than as data tables on the UKB-RAP.

We will update this article if new issues are identified. UK Biobank is assessing the research community's interest in converting UK Biobank data to common data models such as OMOP. Updates on these developments will be included in this article and/or the future timelines page.

