This article provides an overview of the types of data available on the UK Biobank Research Analysis Platform (UKB-RAP) and how it is organised on the platform. This tutorial is part of the overview series introducing researchers to the data and functionality of the UKB-RAP.
Introduction
Let's first talk about the types of data available to you on the UKB-RAP and how they are organised on the platform. In general, data can be found in two places on the platform: the Bulk folder and the dataset file in your project.
Bulk data
The UKB bulk data type represents particularly large and/or complex items, such as genotyping array data, genome sequencing data and raw imaging data. These data types require specialist software to view or analyse.
Bulk data can be found in the Bulk folder in your UKB-RAP project directory space. A screenshot of what this folder looks like is shown below:
Data can also be accessed programmatically using software apps from the tool library or using interactive workstations like JupyterLab and RStudio.
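As a rough illustration of programmatic access from an interactive workstation, the sketch below lists the subfolders of the Bulk folder with Python's standard library. It assumes the project is mounted read-only at /mnt/project inside a JupyterLab session, so that the Bulk folder appears at /mnt/project/Bulk. Both that path and the helper name `list_subfolders` are assumptions for illustration and should be adjusted to your own environment.

```python
from pathlib import Path


def list_subfolders(bulk_root):
    """Return the sorted names of the folders directly under bulk_root.

    Returns an empty list if bulk_root does not exist or is not a folder.
    """
    root = Path(bulk_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Assumed mount point of the project inside a JupyterLab session.
    for name in list_subfolders("/mnt/project/Bulk"):
        print(name)
```

Listing folder names is a cheap first step. The bulk files themselves still need specialist software to view or analyse.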
Bulk data can be of either item type ‘records’ or ‘bulk’ on the Showcase website. For example:
- Data-field 22418 (genotype calls) is bulk data, with an item type of records.
- Data-field 23159 (Whole Exome Sequencing BGEN files) is bulk data, with an item type of bulk.
Phenotype data
Phenotype data (such as demographic information or questionnaire results) as well as assay measurements (such as NMR metabolomics) can be found in the app###.dataset file in your UKB-RAP project directory space.
Clicking on this dataset will take you directly to the cohort browser. A screenshot of what the file looks like is shown below:
Both assay and phenotype data have the item type ‘data’ on the Showcase website, meaning they contain values of elementary types or with simple structures. Some examples include:
- Data-field 23474 (3-Hydroxybutyrate)
- Data-field 21022 (age at recruitment)
Summary table of data types and where to find them
So, to summarise:
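Data type | Example | Item type (UKB Showcase) | Location on UKB-RAP |
Bulk | Field 22418 - Genotyping array calls | Records | /Bulk folder |
Bulk | Field 23159 - WES BGEN files | Bulk | /Bulk folder |
Phenotype | Field 21022 - Age | Data | Dataset in root folder "appNNNN.dataset" |
Phenotype | Field 50 - Height | Data | Dataset in root folder "appNNNN.dataset" |
Phenotype | Field 41270 - ICD10 diagnosis | Data | Dataset in root folder "appNNNN.dataset" |
Assay results | Field 23474 - Metabolomics | Data | Dataset in root folder "appNNNN.dataset" |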
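The summary above can also be written as a small lookup, which may be handy when deciding where to look for a field based on its Showcase item type. This is only a sketch. It covers just the three item types discussed in this article, and the names `ITEM_TYPE_LOCATION` and `location_for_item_type` are hypothetical helpers, not part of any UKB-RAP tool.

```python
# Where each Showcase item type covered in this article is found on the UKB-RAP.
ITEM_TYPE_LOCATION = {
    "records": "/Bulk folder",
    "bulk": "/Bulk folder",
    "data": 'Dataset in root folder ("appNNNN.dataset")',
}


def location_for_item_type(item_type):
    """Return the UKB-RAP location for a Showcase item type (case-insensitive)."""
    key = item_type.strip().lower()
    if key not in ITEM_TYPE_LOCATION:
        raise ValueError(f"Item type not covered in this article: {item_type!r}")
    return ITEM_TYPE_LOCATION[key]


# Field 22418 (genotype calls) has item type 'Records'.
print(location_for_item_type("Records"))
# Field 21022 (age at recruitment) has item type 'Data'.
print(location_for_item_type("Data"))
```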