Finding data and how it is organised

This article gives an overview of the types of data available on the UK Biobank Research Analysis Platform (UKB-RAP) and how they are organised on the platform. It is part of the overview series introducing researchers to the data and functionality of the UKB-RAP.

Introduction

All UKB data is described on the UKB Showcase website. The Showcase page for each data field specifies whether its item type is Bulk, Records or Data (Data means tabular data). On the UKB-RAP, UKB data is found in two places: the Bulk folder and the Parquet database.

Bulk folder

The UKB Bulk data type represents particularly large or complex items, such as genotyping array data, genome sequencing data and raw imaging data. These data types often require specialist software to view or analyse.

Bulk data can be found in the Bulk folder in your UKB-RAP project. The screenshot below shows what this folder looks like:

[Screenshot: Bulk data image.png]

Data in the Bulk folder can be accessed programmatically using software apps from the tools library or using interactive workstations like JupyterLab and RStudio.
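As a rough illustration of programmatic access from an interactive workstation, the sketch below lists the top-level sub-folders of the Bulk folder. It assumes that project storage is mounted at /mnt/project inside a JupyterLab session and that the folder is named "Bulk"; both the path and the helper name list_bulk_subfolders are assumptions for illustration, so adjust them to match your own project.

```python
from pathlib import Path


def list_bulk_subfolders(bulk_root):
    """Return the sorted names of the immediate sub-folders of bulk_root.

    Returns an empty list if bulk_root does not exist, for example when
    the project is mounted somewhere else in your environment.
    """
    root = Path(bulk_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Assumed location of the Bulk folder inside a JupyterLab session.
    for name in list_bulk_subfolders("/mnt/project/Bulk"):
        print(name)
```

The function only reads directory names, so it is cheap to run before deciding which sub-folder to explore in more detail.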

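Data fields in the Bulk folder can have item type Records or item type Bulk on the Showcase website. For example, the summary table below lists Field 22418 (Genotyping array calls) with item type Records and Field 23159 (WES BGEN files) with item type Bulk.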

Parquet database

Tabular data, such as demographic information, questionnaire results and assay measurements (for example NMR metabolomics), can be found in the Parquet database in your UKB-RAP project, within the sub-table entity participant.

Record-level health outcomes linkage data, such as the HES tables and the GP tables, are also found in the Parquet database, in separate sub-table entities such as hesin_diag and gp_scripts.

Clicking on the dataset record item in your project (named in the form "appNNNN.dataset") opens the cohort browser. The screenshot below shows what this item looks like:

[Screenshot: Phenotype data.png]

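Both assay and questionnaire data have item type Data, meaning that they contain tabular values of elementary types or with simple structures. Examples of item type Data in the summary table below include Field 50 (Height) and Field 23474 (Metabolomics).

Some fields with item type Records are also in the Parquet database. The summary table below gives Field 42040 (GP clinical event records) as an example.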

Summary table of data types and where to find them

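| Data type | Example | Item type on UKB Showcase | Location on UKB-RAP |
|---|---|---|---|
| Bulk | Field 22418 - Genotyping array calls | Records | Bulk folder |
| Bulk | Field 23159 - WES BGEN files | Bulk | Bulk folder |
| Tabular | Field 21022 - Age | Data | Parquet database accessed via "appNNNN.dataset" |
| Tabular | Field 50 - Height | Data | Parquet database accessed via "appNNNN.dataset" |
| Tabular | Field 41270 - ICD10 diagnosis | Data | Parquet database accessed via "appNNNN.dataset" |
| Tabular | Field 23474 - Metabolomics | Data | Parquet database accessed via "appNNNN.dataset" |
| Tabular | Field 42040 - GP clinical event records | Records | Parquet database accessed via "appNNNN.dataset" |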
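The example rows of the summary table can also be captured as a small lookup, which makes one point explicit: the Showcase item type alone does not determine where a field lives, because Records fields appear in both locations. The sketch below is illustrative only. The names EXAMPLE_FIELDS, location_of and locations_for_item_type are hypothetical, and it assumes that the item type Data applies to all four tabular fields listed before Field 42040.

```python
# Example fields from the summary table:
# field ID -> (description, Showcase item type, location on UKB-RAP).
# Assumption: "Data" applies to fields 21022, 50, 41270 and 23474.
EXAMPLE_FIELDS = {
    22418: ("Genotyping array calls", "Records", "Bulk folder"),
    23159: ("WES BGEN files", "Bulk", "Bulk folder"),
    21022: ("Age", "Data", "Parquet database"),
    50: ("Height", "Data", "Parquet database"),
    41270: ("ICD10 diagnosis", "Data", "Parquet database"),
    23474: ("Metabolomics", "Data", "Parquet database"),
    42040: ("GP clinical event records", "Records", "Parquet database"),
}


def location_of(field_id):
    """Return the UKB-RAP location of an example field, or None if unknown."""
    entry = EXAMPLE_FIELDS.get(field_id)
    return entry[2] if entry else None


def locations_for_item_type(item_type):
    """Return the set of locations in which an item type appears."""
    return {
        location
        for _, field_item_type, location in EXAMPLE_FIELDS.values()
        if field_item_type == item_type
    }


if __name__ == "__main__":
    print(location_of(22418))
    print(sorted(locations_for_item_type("Records")))
```

For fields that are not in this small table, the Showcase page for the field remains the place to check the item type.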

 

Sub-folders and sub-categories

The sub-folder structure within the UKB-RAP Bulk folder is similar to the category and sub-category structure in the UKB Showcase, but it is not identical, particularly for the imaging fields.

The sub-folder structure within the cohort browser is very similar to the category and sub-category structure in the UKB Showcase.
