Finding data and how it is organised

UKB Communications

This article provides an overview of the types of data available on the UK Biobank Research Analysis Platform (UKB-RAP) and how it is organised on the platform. This tutorial is part of the overview series introducing researchers to the data and functionality of the UKB-RAP.

Introduction

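Let's first talk about the types of data available to you on the UKB-RAP and how they are organised on the platform. In general, data can be found in two places in your project: the Bulk folder, which holds bulk data, and the dataset file, which holds phenotype and assay data.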

Bulk data

The UKB bulk data type represents particularly large and/or complex items, such as genotyping array data, genome sequencing data and raw imaging data. These data types require specialist software to view or analyse.

Bulk data can be found in the Bulk folder of your UKB-RAP project. A screenshot of what this folder looks like is shown below:

[Screenshot: Bulk data image.png]

Data can also be accessed programmatically using software apps from the tool library or using interactive workstations like JupyterLab and RStudio.
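As a minimal sketch of programmatic access from an interactive workstation, the Python snippet below lists the sub-folders of the Bulk folder. It assumes that the project is mounted at /mnt/project inside a JupyterLab session; that path, and the helper name list_bulk_subfolders, are assumptions made for illustration and should be checked against your own environment.

```python
from pathlib import Path

# Assumption: in a UKB-RAP JupyterLab session the project storage is mounted
# at /mnt/project. Change this if your environment uses a different path.
PROJECT_MOUNT = Path("/mnt/project")


def list_bulk_subfolders(project_root=PROJECT_MOUNT, bulk_name="Bulk"):
    """Return the sorted names of the sub-folders inside the Bulk folder.

    Returns an empty list if the Bulk folder is not present, for example
    when the code is run outside the platform.
    """
    bulk_dir = Path(project_root) / bulk_name
    if not bulk_dir.is_dir():
        return []
    return sorted(p.name for p in bulk_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    for name in list_bulk_subfolders():
        print(name)
```

Run outside the platform, the function simply returns an empty list, so it is safe to try locally before launching a session.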

Bulk data can have an item type of either ‘records’ or ‘bulk’ on the Showcase website. For example:

  • Data-field 22418 (genotype calls) is bulk data, with an item type of records.
  • Data-field 23159 (Whole Exome Sequencing BGEN files) is bulk data, with an item type of bulk.

Phenotype data

Phenotype data (such as demographic information or questionnaire results) as well as assay measurements (such as NMR metabolomics) can be found in the app###.dataset file in your UKB-RAP project directory space.

Clicking on this dataset will take you directly to the cohort browser. A screenshot of what the file looks like is shown below:

[Screenshot: Phenotype data.png]
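Besides the cohort browser, values can be pulled out of the dataset from the command line. The sketch below only builds a `dx extract_dataset` command as a list of arguments; it does not run it. The dataset name app12345.dataset is a hypothetical placeholder for your own app###.dataset file, and the field names (eid, p21022) and the "participant" entity are assumptions about how fields are named in the dataset, so check them against your dataset's own field listing before use.

```python
import shlex


def build_extract_command(dataset, field_names, entity="participant",
                          output="phenotypes.csv"):
    """Build (but do not run) a dx extract_dataset command.

    dataset     -- name of the dataset file, e.g. your app###.dataset
    field_names -- field names as they appear in the dataset (assumed here)
    entity      -- entity prefix for each field (assumed to be "participant")
    output      -- name of the CSV file to write
    """
    if not field_names:
        raise ValueError("at least one field name is required")
    fields = ",".join(f"{entity}.{name}" for name in field_names)
    return ["dx", "extract_dataset", dataset, "--fields", fields, "-o", output]


if __name__ == "__main__":
    # "app12345.dataset" is a hypothetical name used only for illustration.
    command = build_extract_command("app12345.dataset", ["eid", "p21022"])
    print(shlex.join(command))
```

Building the command as a list keeps it easy to inspect before passing it to subprocess.run in a session where the dx command-line client is available and logged in.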

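Both assay and phenotype data have an item type of ‘data’ on the Showcase website, meaning they contain values of elementary types or with simple structures. For example:

  • Data-field 21022 (age) is phenotype data, with an item type of data.
  • Data-field 50 (height) is phenotype data, with an item type of data.
  • Data-field 41270 (ICD10 diagnosis) is phenotype data, with an item type of data.
  • Data-field 23474 (metabolomics) is assay data, with an item type of data.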

Summary table of data types and where to find them

So, to summarise:

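| Data type     | Example                              | Item type (UKB Showcase) | Location on UKB-RAP                      |
|---------------|--------------------------------------|--------------------------|------------------------------------------|
| Bulk          | Field 22418 - Genotyping array calls | Records                  | /Bulk folder                             |
| Bulk          | Field 23159 - WES BGEN files         | Bulk                     | /Bulk folder                             |
| Phenotype     | Field 21022 - Age                    | Data                     | Dataset in root folder "appNNNN.dataset" |
| Phenotype     | Field 50 - Height                    | Data                     | Dataset in root folder "appNNNN.dataset" |
| Phenotype     | Field 41270 - ICD10 diagnosis        | Data                     | Dataset in root folder "appNNNN.dataset" |
| Assay results | Field 23474 - Metabolomics           | Data                     | Dataset in root folder "appNNNN.dataset" |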
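
The rule in the summary above can also be written as a small lookup: given the item type shown for a field on the Showcase website, it returns where to look on the UKB-RAP. This is only an illustrative sketch; the names LOCATION_BY_ITEM_TYPE and where_to_find are hypothetical and not part of any platform tool.

```python
# Mapping taken from the summary above: Showcase item type -> location.
LOCATION_BY_ITEM_TYPE = {
    "records": "/Bulk folder",
    "bulk": "/Bulk folder",
    "data": "dataset file in the project root folder",
}


def where_to_find(item_type):
    """Return the UKB-RAP location for a Showcase item type.

    The comparison ignores case and surrounding whitespace.
    """
    key = item_type.strip().lower()
    if key not in LOCATION_BY_ITEM_TYPE:
        raise ValueError(f"unknown item type: {item_type!r}")
    return LOCATION_BY_ITEM_TYPE[key]


if __name__ == "__main__":
    for item_type in ("Records", "Bulk", "Data"):
        print(item_type, "->", where_to_find(item_type))
```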
