This article provides an overview of the types of data available on the UK Biobank Research Analysis Platform (UKB-RAP) and how the data is organised on the platform. It is part of the overview series introducing researchers to the data and functionality of the UKB-RAP.
Introduction
All UKB data is described on the UKB Showcase website. The Showcase page for each data field specifies its item type: Bulk, Records or Data (Data means tabular data). On the UKB-RAP, UKB data can be found in two places: the Bulk folder and the Parquet database.
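Because the item type is listed on each field's Showcase page, it can be handy to generate those page links in bulk when checking a list of fields. The sketch below assumes the public Showcase URL pattern `https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=<field ID>`; the helper name `showcase_url` is hypothetical, and the pattern should be checked against the live site.

```python
# Sketch: build the UKB Showcase page URL for a data field so that its item
# type (Bulk, Records or Data) can be looked up. The URL pattern is an
# assumption based on the public Showcase site.
SHOWCASE_FIELD_URL = "https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id={field_id}"


def showcase_url(field_id: int) -> str:
    """Return the Showcase page URL for a numeric UKB data-field ID (hypothetical helper)."""
    if field_id <= 0:
        raise ValueError("UKB data-field IDs are positive integers")
    return SHOWCASE_FIELD_URL.format(field_id=field_id)


if __name__ == "__main__":
    # Print links for a few of the fields mentioned in this article.
    for fid in (22418, 23159, 21022):
        print(fid, showcase_url(fid))
```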
Bulk folder
The UKB Bulk data type represents particularly large or complex items, such as genotyping array data, genome sequencing data and raw imaging data. These data types often require specialist software to view or analyse.
Bulk data can be found in the Bulk folder in your UKB-RAP project. A screenshot of what this folder looks like is shown below:
Data in the Bulk folder can be accessed programmatically using software apps from the tools library or using interactive workstations like JupyterLab and RStudio.
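As a starting point for working with Bulk data from JupyterLab, the sketch below lists the files in one Bulk sub-folder. It assumes the project is mounted read-only at `/mnt/project` inside the workstation, so the Bulk folder appears at `/mnt/project/Bulk`; the function name `list_bulk_files` and any sub-folder name you pass in are illustrative, so check the actual folder names in your own project.

```python
from pathlib import Path

# Assumption: in a UKB-RAP JupyterLab session the project is mounted at
# /mnt/project, so Bulk data is visible under /mnt/project/Bulk.
PROJECT_ROOT = Path("/mnt/project")


def list_bulk_files(root: Path, subfolder: str, pattern: str = "*") -> list[str]:
    """Return sorted file paths (relative to root/Bulk) under one Bulk sub-folder.

    Searches recursively and keeps only regular files whose names match pattern.
    """
    bulk = root / "Bulk"
    base = bulk / subfolder
    if not base.is_dir():
        raise FileNotFoundError(f"No such Bulk sub-folder: {base}")
    return sorted(
        str(p.relative_to(bulk))
        for p in base.rglob(pattern)
        if p.is_file()
    )
```

For example, `list_bulk_files(PROJECT_ROOT, "<sub-folder>", "*.bgen")` would list the BGEN files under a given sub-folder, where `<sub-folder>` is a placeholder for a real folder name in your project. Reading through the mount avoids copying large files before inspecting what is there.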
Data fields in the Bulk folder can be item type Records or item type Bulk on the Showcase website. For example:
- Data-field 22418 Genotype calls is item type Records.
- Data-field 23159 Whole Exome Sequencing BGEN files is item type Bulk.
Parquet database
Tabular data such as demographic information, questionnaire results and assay measurements (for example, NMR metabolomics) can be found in the Parquet database in your UKB-RAP project, within the sub-table entity participant.
Record-level health outcomes linkage data such as the HES tables and the GP tables are also found in the Parquet database, in separate sub-table entities such as hesin_diag and gp_scripts.
Clicking the Dataset Record item (named "appNNNN.dataset") in your project opens the cohort browser. A screenshot of what this looks like is shown below:
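To locate the Dataset Record from code rather than from the project view, one option is to search the project root for records whose names match the "appNNNN.dataset" pattern. The sketch below assumes the `dxpy` package is available (it is preinstalled on DNAnexus workstations, otherwise `pip install dxpy`) and uses its `find_data_objects` search with a glob name match; the helper names and the `app*.dataset` pattern are assumptions drawn from the naming shown in this article.

```python
import fnmatch

# Assumption: dataset records follow the "appNNNN.dataset" naming used on UKB-RAP.
DATASET_PATTERN = "app*.dataset"


def select_dataset_names(names: list[str]) -> list[str]:
    """Hypothetical helper: keep the names that look like a dataset record, sorted."""
    return sorted(n for n in names if fnmatch.fnmatchcase(n, DATASET_PATTERN))


def find_dataset_records(project_id: str) -> list[dict]:
    """Hypothetical helper: search the project root for dataset records via dxpy."""
    import dxpy  # imported lazily so the rest of this sketch runs without dxpy

    return list(
        dxpy.find_data_objects(
            classname="record",       # Dataset items are records, not files
            name=DATASET_PATTERN,
            name_mode="glob",
            project=project_id,
            folder="/",
            recurse=False,
            describe=True,            # include each record's description (name etc.)
        )
    )
```

The name filter is kept separate from the platform call so it can be checked offline; `find_dataset_records` only works inside an authenticated DNAnexus session.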
Both assay and questionnaire data have item type Data, meaning that they contain tabular values of elementary types or values with simple structures. Some examples include:
- Data-field 23474 3-Hydroxybutyrate
- Data-field 21022 Age at recruitment
Example fields with item type Records that are in the Parquet database include:
- Data-field 41234 Records in HES inpatient diagnoses dataset
- Data-field 42039 GP prescription records
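The practical difference between the participant entity and record-level entities such as hesin_diag is row shape: participant holds one row per participant, while a record-level table can hold many rows per participant (one per record). The toy sketch below uses made-up rows to show a common first step, counting linked records per participant. The column names `eid`, `age_at_recruitment` and `diag_icd10` and the ICD-10 values are illustrative assumptions, not a description of the real tables.

```python
from collections import Counter

# Made-up rows for illustration only.
participant = [  # participant-level: one row per participant
    {"eid": 1001, "age_at_recruitment": 56},
    {"eid": 1002, "age_at_recruitment": 62},
    {"eid": 1003, "age_at_recruitment": 48},
]
hesin_diag = [  # record-level: zero or more rows per participant
    {"eid": 1001, "diag_icd10": "I10"},
    {"eid": 1001, "diag_icd10": "E119"},
    {"eid": 1003, "diag_icd10": "I10"},
]


def records_per_participant(participants, records):
    """Attach to each participant-level row the number of linked records."""
    counts = Counter(r["eid"] for r in records)
    # Participants with no linked records get a count of 0.
    return [{**p, "n_records": counts.get(p["eid"], 0)} for p in participants]


for row in records_per_participant(participant, hesin_diag):
    print(row)
```

Aggregating record-level rows down to one value per participant before joining keeps the result at one row per participant.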
Summary table of data types and where to find them
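| Data type | Example | Item type on UKB Showcase | Location on UKB-RAP |
|---|---|---|---|
| Bulk | Field 22418 - Genotyping array calls | Records | Bulk folder |
| Bulk | Field 23159 - WES BGEN files | Bulk | Bulk folder |
| Tabular | Field 21022 - Age | Data | Parquet database accessed via "appNNNN.dataset" |
| Tabular | Field 50 - Height | Data | Parquet database accessed via "appNNNN.dataset" |
| Tabular | Field 41270 - ICD10 diagnosis | Data | Parquet database accessed via "appNNNN.dataset" |
| Tabular | Field 23474 - Metabolomics | Data | Parquet database accessed via "appNNNN.dataset" |
| Tabular | Field 42040 - GP clinical event records | Records | Parquet database accessed via "appNNNN.dataset" |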
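One point the summary table makes is that the Showcase item type alone does not determine where a field lives: Records fields appear both in the Bulk folder and in the Parquet database. The sketch below encodes the table's example fields as a lookup to make that explicit. It covers only the seven example fields listed here, and the names `EXAMPLE_FIELDS` and `locations_for_item_type` are hypothetical.

```python
from typing import NamedTuple


class FieldLocation(NamedTuple):
    item_type: str  # item type shown on the UKB Showcase
    location: str   # where the field is found on the UKB-RAP


# Transcribed from the example rows of the summary table; not a complete catalogue.
EXAMPLE_FIELDS = {
    22418: FieldLocation("Records", "Bulk folder"),
    23159: FieldLocation("Bulk", "Bulk folder"),
    21022: FieldLocation("Data", "Parquet database"),
    50: FieldLocation("Data", "Parquet database"),
    41270: FieldLocation("Data", "Parquet database"),
    23474: FieldLocation("Data", "Parquet database"),
    42040: FieldLocation("Records", "Parquet database"),
}


def locations_for_item_type(item_type: str) -> set[str]:
    """Return every location in which the example fields of this item type appear."""
    return {f.location for f in EXAMPLE_FIELDS.values() if f.item_type == item_type}


if __name__ == "__main__":
    for item_type in ("Bulk", "Records", "Data"):
        print(item_type, sorted(locations_for_item_type(item_type)))
```

For any field not in this list, the Showcase page and the project itself remain the authoritative sources.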
Sub-folders and sub-categories
The sub-folder structure within the UKB-RAP Bulk folder is similar to the category and sub-category structure in the UKB Showcase, but it is not identical, particularly for the imaging fields.
The sub-folder structure within the cohort browser is very similar to the category and sub-category structure in the UKB Showcase.
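Because the Bulk folder layout does not exactly match the Showcase categories, it can help to print the sub-folder tree and compare it side by side with the Showcase. The sketch below assumes the project is mounted at `/mnt/project` in a JupyterLab session; `folder_tree` is a hypothetical helper that lists only folders, not files, down to a chosen depth.

```python
from pathlib import Path


def folder_tree(root: Path, max_depth: int = 2) -> list[str]:
    """Return indented sub-folder names under root, down to max_depth levels."""
    lines: list[str] = []

    def walk(folder: Path, depth: int) -> None:
        if depth > max_depth:
            return
        # Visit sub-folders in sorted order so the output is stable.
        for child in sorted(p for p in folder.iterdir() if p.is_dir()):
            lines.append("  " * (depth - 1) + child.name)
            walk(child, depth + 1)

    walk(root, 1)
    return lines
```

On the platform, `print("\n".join(folder_tree(Path("/mnt/project/Bulk"))))` would show the first two levels of the Bulk folder (assuming the mount path above), which can then be checked against the corresponding Showcase categories.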